Mathematical models are recommended by the ICH Q8(R2) guideline on pharmaceutical development to generate enhanced process understanding and meet Quality-by-Design (QbD) requirements. Such models can be built using two fundamentally different paradigms: statistical or mechanistic (Table 1). We will discuss the differences between statistical and mechanistic models, and how each can be used to improve your process development.

Table 1. Mechanistic vs statistical models

| | Mechanistic models | Statistical models |
|---|---|---|
| Principle | Use natural laws | Find patterns in existing data |
| Equations | Complex equations from natural sciences | Simple equations, derived from statistics and regression analysis |
| Database | Very few data needed (3-10 experiments) | A lot of data needed (the more data, the better) |
| Implementation | Very high effort to program a simulation tool. Once a model is implemented and calibrated, very low cost of ownership | Low programming effort, low ownership cost |
| Calibration effort | Little effort to generate data for model calibration | Very high initial effort to generate data and initialize model |
| Process flexibility | Yes | No |
| Interpolation | Yes | Yes |
| Extrapolation | Yes | No |
| Generates process understanding | Yes | Limited |
| Qualification downstream | Ideal for downstream, the same model is used throughout the process lifecycle | Sub-optimal for downstream, solves only one problem at a time |
| Qualification upstream | Very complex, only few examples of industrial application | Frequently used to guide process optimization and scale-up |

## Understanding statistical models

Statistical approaches such as big data analytics, machine learning, and artificial intelligence use statistics to predict trends and patterns. All of these models learn from experience provided in the form of data. The more experience they are given, the better the model will be.

Typically, a lot of data is generated within a given parameter space. The model equations are derived by developing a probabilistic model that best describes the relationship between the dependent and independent variables. The resulting model is therefore based on correlations in the data.

Statistical models are, however, bound to their calibration range and can only predict results within the data space they are calibrated from. In particular, they do not allow any major change in the process set-up. Since they are based on correlation and not causality, statistical models provide limited mechanistic process understanding.
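The calibration-range limitation can be illustrated with a minimal sketch. It assumes a hypothetical process that follows a Langmuir-type saturation curve (`true_process`, with made-up values `q_max=100` and `k=2`) and uses a plain linear regression as the statistical model. None of these names or numbers come from a real process; they are chosen only to show how a purely data-driven fit behaves inside and outside the data it was calibrated on.

```python
# Illustrative only: the "true" process is a hypothetical Langmuir-type
# saturation curve, used here just to generate synthetic calibration data.
def true_process(c, q_max=100.0, k=2.0):
    """Bound amount q at concentration c (assumed ground truth)."""
    return q_max * k * c / (1.0 + k * c)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Calibration data covers only a narrow range of concentrations.
cal_c = [0.1, 0.2, 0.3, 0.4, 0.5]
cal_q = [true_process(c) for c in cal_c]
slope, intercept = fit_line(cal_c, cal_q)


def predict(c):
    """Statistical (linear regression) prediction."""
    return slope * c + intercept


inside, outside = 0.25, 5.0  # inside vs. far outside the calibration range
err_inside = abs(predict(inside) - true_process(inside))
err_outside = abs(predict(outside) - true_process(outside))

print(f"error inside calibration range:  {err_inside:.1f}")
print(f"error outside calibration range: {err_outside:.1f}")
```

In this sketch the regression tracks the data reasonably well between the calibration points, but far outside them it keeps rising past the assumed capacity `q_max`, because a straight line carries no notion of saturation.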

## How are mechanistic models set up?

Mechanistic models are based on the fundamental laws of natural sciences, including physical and biochemical principles. Because the model structure is already given by these laws, less experimental data is needed than for a statistical model: the experiments only serve to calibrate the model and determine unknown model parameters, such as adsorption coefficients, diffusivities, or material properties. An essential benefit of mechanistic over statistical models is that the model parameters have an actual physical meaning, which facilitates the scientific interpretation of the results.
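As a rough sketch of what calibrating a mechanistic model can look like, the example below assumes a Langmuir adsorption isotherm as the governing equation and estimates its two parameters (capacity `q_max` and equilibrium constant `k`) from only three experiments. The isotherm choice, the function names, and the noise-free synthetic data (generated with hypothetical values `q_max=100`, `k=2`) are all assumptions for illustration; real chromatography models are considerably more complex and are usually fitted to noisy data with nonlinear least squares.

```python
def langmuir(c, q_max, k):
    """Langmuir isotherm: bound amount q at liquid concentration c."""
    return q_max * k * c / (1.0 + k * c)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate_langmuir(cs, qs):
    """Estimate (q_max, k) from the linearized form
    c/q = c/q_max + 1/(k * q_max)."""
    slope, intercept = fit_line(cs, [c / q for c, q in zip(cs, qs)])
    q_max = 1.0 / slope
    k = slope / intercept
    return q_max, k


# Three synthetic "experiments" (hypothetical true values: q_max=100, k=2).
exp_c = [0.1, 0.3, 0.5]
exp_q = [langmuir(c, 100.0, 2.0) for c in exp_c]

q_max_fit, k_fit = calibrate_langmuir(exp_c, exp_q)

# Predict far outside the calibrated concentration range.
q_extrapolated = langmuir(5.0, q_max_fit, k_fit)

print(f"q_max = {q_max_fit:.2f}, k = {k_fit:.2f}")
print(f"predicted q at c = 5.0: {q_extrapolated:.2f}")
```

Both fitted parameters have a physical interpretation, and because the saturation behavior is built into the equation, the prediction at a concentration ten times higher than any calibration point stays below the fitted capacity.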

## Building a digital twin using mechanistic models

Both mechanistic and statistical models have their pros and cons. However, mechanistic models are ideal for building digital twins of downstream chromatography processes. Watch our video on digital downstream process twins to see what impact the difference between statistical and mechanistic models has on building a digital twin.

## Benefits of mechanistic vs statistical models

Since mechanistic models are based on natural laws, they are valid far beyond the calibration space. In practice, this means that you can easily change the process setup and parameters, such as switching from a step elution to a gradient or vice versa, changing from batch to continuous processing, changing column dimensions, and much more. As they are based on natural principles, mechanistic models allow you to generate mechanistic process understanding and thus fulfill QbD obligations, which statistical models can do only to a limited extent.

This opens a wide range of applications using the same mechanistic model without any further experimentation, including early-stage process development, process characterization and validation, and process monitoring and control. Even completely different scenarios can be simulated with no additional experimental effort, such as overloaded conditions, flow-through operations, or continuous chromatography. The model evolves as the development lifecycle proceeds and supports holistic knowledge management, allowing lab experiments to be replaced with faster, lower-cost computer simulations.