March 19, 2018

Q and A from webinar about Design of Experiments (DoE) in protein purification method development

By Tuomo Frigård, Scientist, GE Healthcare Life Sciences

Design of experiments (DoE) is a technique for planning experiments and analyzing the information obtained. The technique allows use of a minimum number of experiments, in which we systematically vary several experimental parameters simultaneously to obtain sufficient information. In the webinar, Apply DoE for efficient protein purification method development, on November 15, I introduced DoE, presented how to apply DoE in protein purification and discussed two case studies. Following the presentation there was a live Q&A with many interesting questions from the audience. We have gathered the questions to which my colleagues Jon Lundqvist and Lotta Hedkvist and I responded during the session, plus a few additional ones that we did not have time to answer during the live event.

Read on to learn about the questions your peers were asking, or find the question that you submitted.

How do I select an experimental design?
The selection of a design is based on

  • your objective, mainly screening or optimization
  • the number of variables you wish to study
  • the cost and time you can spend on experimentation to answer the challenges you are facing

You sometimes talk about model, mathematical functions and transfer functions. Are they all the same?
Yes. People from different fields simply use different terminology for the same thing.

Center points are mentioned in the presentation. Are replicates of corner experiments just as useful to assess variability?
Process variability can be assessed everywhere in a design, but most often we use center points to do this. Besides process variation, we use the center points to assess if we have non-linear behavior in our process. This is not readily determined by replication of the corner points. Using other experimental conditions as replicates should be considered if the process output is skewed towards a specific region of the design. We also sometimes consider repeating the entire design, for example, if there are only minor detectable differences in the measured response.
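As a minimal sketch of how center points serve both purposes, the snippet below uses the spread of replicated center points as an estimate of process variation and compares the center-point mean with the corner-point mean as a simple curvature check. The response values and the two-factor layout are hypothetical, chosen only for illustration; they are not data from the webinar.

```python
from statistics import mean, stdev

# Hypothetical yield values (%) for a two-factor design; not webinar data.
corner_responses = [60.0, 72.0, 65.0, 80.0]   # one run per corner
center_responses = [78.0, 79.5, 78.6]         # replicated center point

# Spread of the replicates estimates run-to-run process variation.
process_noise = stdev(center_responses)

# With only main effects and interactions (no curvature), the center mean
# would be expected to match the mean of the corner points.
curvature = mean(center_responses) - mean(corner_responses)

print(f"process noise (SD of center points): {process_noise:.2f}")
print(f"curvature (center mean - corner mean): {curvature:.2f}")
```

A curvature value that is large compared with the process noise suggests non-linear behavior, which replicated corner points alone would not reveal.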

Do we need to run the experiments in any order, or do we have to run in random? I mean all corner points at once, start points at once and center points at once. Is it OK to run like this?
It is part of the DoE discipline to have the experimental order randomized. Randomization is important to prevent bias that may unintentionally influence the results. The experiments should NOT be run in a pre-determined order. However, there may be several reasons to divide a design into blocks, for example, if several buffer systems are needed to cover an extensive pH range.

Do we need to run the blocks in the same randomized order to avoid bias?
That would be the most reasonable way to perform it. Remember that randomization is the researchers’ insurance against bias and the introduction of unidentified variables.
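One possible way to combine blocking with randomization is sketched below: runs are grouped by block, and the run order is shuffled within each block. The function name, the run dictionaries and the "buffer A"/"buffer B" split are hypothetical and only illustrate the idea; this is not how UNICORN generates its run order.

```python
import random

def randomized_run_order(runs, block_of, seed=None):
    """Group runs by block and shuffle the run order inside each block."""
    rng = random.Random(seed)
    blocks = {}
    for run in runs:
        blocks.setdefault(block_of(run), []).append(run)
    ordered = []
    for block in sorted(blocks):
        members = blocks[block][:]
        rng.shuffle(members)  # random order within the block
        ordered.extend((block, run) for run in members)
    return ordered

# Hypothetical example: 8 runs, with two buffer systems covering the pH range.
ph_values = [5.0, 5.0, 6.0, 6.0, 7.0, 7.0, 8.0, 8.0]
runs = [{"id": i, "pH": ph} for i, ph in enumerate(ph_values, start=1)]
plan = randomized_run_order(
    runs, lambda r: "buffer A" if r["pH"] < 6.5 else "buffer B", seed=1
)
for block, run in plan:
    print(block, run)
```

Passing a seed makes the randomized order reproducible, which can help when documenting the experimental plan.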

On slide 27 you mention “quadratic” terms. What is that?
A quadratic term is a term in the model, the polynomial equation, that is used to describe a non-linear behavior in our data.
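As an illustration, a two-factor model with quadratic terms can be written as below. The symbols are assumptions introduced here, not notation from the slides: y is the response, x_1 and x_2 are the coded factors, the beta values are the model coefficients, and epsilon is the residual.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
  + \beta_{12} x_1 x_2
  + \beta_{11} x_1^2 + \beta_{22} x_2^2
  + \varepsilon
```

Here \beta_{11} x_1^2 and \beta_{22} x_2^2 are the quadratic terms, and \beta_{12} x_1 x_2 is the interaction term.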

On slide 38 you mention that "Similar significant models were obtained also for the responses: HMW1, LMW2, and A300nm". Were all responses included in one DoE experiment?
We performed a DoE using an experimental design that included the factors and factor ranges described. We then measured several responses from our collected fractions, including HMW1, LMW2 and A300nm. In the statistical analysis, an individual model was established for each of these responses. Remember that the set of significant terms can differ from one response model to another.

What is the difference between DoE vs Peak Overlaying?
Different experimentation methods are frequently used. I am not exactly sure what “peak overlaying” is referring to, but generally speaking there are few experimental methods as efficient as DoE. DoE covers several variables in the same experimental design, measures interaction effects and/or non-linear effects, and gives a useable, defined and qualified model for future use. Furthermore, few other experimental methods have software for the analysis that equals the software available for DoE.

What are “quantitated effects”?
Quantitated effects are the effects a controlled input parameter has on a measured output response.

Please explain more about Q2 and R2 in the summary of fit chart. How do these numbers affect the choice of design?
R2 and Q2 are model coefficients that give a picture of the quality of the modelling: R2 describes how well the model fits the measured data, and Q2 estimates how well the model predicts new data. They are calculated using multiple linear regression and cross-validation. The goal of the modelling is to maximize these coefficients, which involves editing the model. Thus, a relevant model has a unique set of model terms based on the data set/experimentation results we are analyzing.
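To make the two coefficients concrete, here is a minimal sketch that computes R2 from an ordinary least-squares fit and Q2 from leave-one-out cross-validation. The leave-one-out scheme, the function name `r2_q2` and the response values are assumptions made for illustration; DoE software may use a different cross-validation procedure.

```python
import numpy as np

def r2_q2(X, y):
    """Return (R2, Q2) for a least-squares model with design matrix X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    ss_total = np.sum((y - y.mean()) ** 2)

    # R2: how well the model fits the data it was built on.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_resid = np.sum((y - X @ coef) ** 2)

    # Q2: refit without run i, then predict run i (leave-one-out).
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += (y[i] - X[i] @ coef_i) ** 2

    return 1 - ss_resid / ss_total, 1 - press / ss_total

# Hypothetical two-factor design in coded units: 4 corners + 3 center points.
x1 = [-1, 1, -1, 1, 0, 0, 0]
x2 = [-1, -1, 1, 1, 0, 0, 0]
X = np.column_stack([np.ones(7), x1, x2])     # intercept + main effects
y = [62.0, 75.0, 68.0, 83.0, 72.0, 71.0, 73.0]  # illustrative responses

r2, q2 = r2_q2(X, y)
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}")
```

Q2 can never exceed R2 in this calculation, so a large gap between the two is a hint that the model fits the existing runs better than it predicts new ones.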

I would like to know how to optimize analysis when the Q2 is low for a DoE study?
Usually there is a reason for a low Q2. The predictive power of the model has been compromised. The important thing is to analyze possible reasons for the low Q2.

  • Check if residuals are normally distributed.
  • Check for patterns or deviating experiments.
  • Check the residuals vs. variable plot.

What is a reasonable R2 for a full factorial chromatography DoE?
Usually we consider an R2 of 0.7 or above to be a good model. However, you must also consider the other model coefficients, especially Q2.

How many experiments are needed for a three-variable design?
It’s easier to picture the number of experiments as a cube. Each corner point represents one experiment, and together the corner points are called the factorial part of the design. The repeated experiments are often three or more in number and positioned in the center of the cube. For optimization purposes and for quantifying non-linear effects, we have to position additional experiments away from the center, either on the faces of the cube or outside it. For example, running three variables in a full factorial design means running all corner points, i.e. all low-high combinations of all included variables, which gives 2 x 2 x 2 = 8 runs. If we include three center points, the total number of runs is 11. Running a response surface modelling (RSM) design with three variables in a CCC (central composite circumscribed) or a CCF (central composite face-centered) design setup will add one experiment per side of the "cube" (six in total), which gives a total of 17 experiments.
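The run counts above can be reproduced with a short sketch that enumerates the design points in coded units (-1 = low, 0 = center, +1 = high). The function names are hypothetical. The face-centered (CCF) variant is shown; a CCC design has the same number of runs, but its axial points lie outside the cube.

```python
from itertools import product

def full_factorial(n_factors, n_center=3):
    """All low/high corner combinations plus replicated center points."""
    corners = [tuple(p) for p in product((-1, 1), repeat=n_factors)]
    centers = [(0,) * n_factors] * n_center
    return corners + centers

def face_centered_ccd(n_factors, n_center=3):
    """Full factorial with center points plus one point per cube face."""
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(tuple(point))
    return full_factorial(n_factors, n_center) + axial

print(len(full_factorial(3)))     # 8 corners + 3 center points = 11
print(len(face_centered_ccd(3)))  # 11 + 6 face-center points = 17
```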

Can percent purity be used as an output?
Yes, percent purity is often used as an output. It is most often measured externally, but sometimes we use the area percentage from the peak integration as a measure for purity, as well.

How many actual experiments were run for case study 2? How much time would that take?
There were 27 experiments in the DoE design, and the total run time was about 11-12 hours.

Are the ranges tested restricted, or do you also test larger ranges? Does this influence how the DoE is determined?
We do screen larger ranges in our screening studies, but the range selection is often based also on previous knowledge about the application, the elution pattern determined in a gradient run, or other known factors.

Can you please elaborate on co-efficient plot?
The coefficient plot is a description of the magnitude of the effect each factor has on a specific response. It helps clarify which model terms are significant and should be included in the modelling, and which terms can be excluded (insignificant terms) as the model is refined.

What is the reason for defining response goals as "minimize" or "maximize"?
Sometimes we try to maximize (e.g. yield) or minimize (e.g. aggregate content) our output responses. The specification limits may be set differently depending on whether it is a capture, intermediate or polishing step.

You discussed having several chromatography steps in one protocol. How can I determine how many steps I need for my purification?
First, determine the purity and yield you need for your protein, because the number of chromatography steps depends on these targets. Then take one step at a time and develop the protocol until you obtain the purity and yield you need. Often, the most efficient improvement in an overall protein purification strategy is to add a second purification step instead of optimizing a single-step protocol. Each purification technique has inherent characteristics, which determine its suitability for the different purification steps. As a rule of thumb, two simple principles are applied:

  • Combine techniques that apply different separation mechanisms.
  • Minimize sample handling between purification steps by combining techniques that omit the need for sample conditioning before the next step.

Is it necessary to assess both static and dynamic binding capacity?
No, it is not necessary to have both. However, static capacity is a good response if you need to perform a first screening of resins. The most efficient method is to use HTPD plates, where you can test many resins at the same time. Next, when you have reduced the number of possible resins, pack them in smaller column formats and determine the dynamic binding capacity, which is a measure of how much protein you can bind without losing protein.

Why are aggregates the same % at pH 6 and pH 7 in the presentation in slide 24?
Data is based on experimental results showing the behavior of this specific protein. It’s difficult to say if there was a specific reason for these results. The data could be related to the measurement system, in this case SEC aggregate analysis. It is always crucial to keep the measurement system under control and be certain that the obtained data is correct.

Can we separate or remove acidic and basic variants in IEX? How can we set up a design, or can we remove some variants via a washing step, to be similar to the originator molecule in biosimilar studies without a lot of yield loss?
Yes, IEX is commonly used for charge-based separation. The success of this method is difficult to predict because the elution behavior depends on many variables:

  • Sample characteristics, including protein conformation, size, sequence variants, glycosylation, and post-translational modifications
  • Salt (ionic strength) or pH gradients used to elute the protein from the IEX column
  • Method parameters such as column types and other mobile phase components

DoE screening of buffer conditions in a high-throughput process development (HTPD) format, e.g. 96-well filter plates such as Predictor plates, followed by further DoE screening/optimization in column format, has proved to be an efficient strategy.

Is it always an advantage to use DoE when I need to purify my protein?
A part of the DoE method is to determine experimental targets for your purification. If you meet your targets for yield, purity and activity by running a pre-test based on your expertise or what you have found in the literature, that may be sufficient and you can settle for that. However, if you have challenges in setting up a protocol or meeting your purification targets, you need to perform further experiments. Then, DoE can be used as a tool for optimizing your protocol. Ultimately, the use of DoE can minimize the number of experiments needed to find the optimal conditions for your purification.

Why are the Capto MMC and Capto MMC ImpRes contour plots not in alignment in slide 24, considering that the only differences are ligand density and bead size, and bead size should not affect much at batch binding format?
We apologize for this: the reason was purely a graphical issue preparing the presentation. The contour plots should be aligned.

Was screening for load/wash/elution conditions performed using predictor plates?
Yes, we often use 96-well filter plates for our screening efforts. They come pre-packed as Predictor plates with different resins.

Do you sometimes use DoE during the screening of the steps?
Yes, DoE is a tool that makes it possible to find the best conditions for protein purification. If you have difficulties obtaining enough purity and/or yield in your protocol, then DoE is a useful tool. However, if you find that the protocol you are using meets your needs, then it may be better to spend time on other challenges that you have in your work. You should always do a trial test first and try to zoom in on the values of your start parameters. Then you can screen by using DoE, starting with the first step.

What is a good column size for a DoE run?
Column size depends on your sample size and the amount needed for further work, such as analysis and/or subsequent use of the sample. The most frequently used column format is 1 ml HiTrap columns, which give reasonable run times as well as product amounts sufficient for the intended use.

Do you recommend, in early phase development, that DoE be applied after having some ideas about what works for the process, i.e. after some single-factor experiments have been performed to find several promising resins?
Yes, that is correct. We often try to determine the most likely scenarios based on previous work, short screening experiments, and trying to answer simple questions with short efficient test runs.

For which ÄKTA systems is DoE available in the software?
The DoE functionality is available in UNICORN versions 6.0 or higher. It is included with ÄKTA avant and can be added to ÄKTA pure.

Is the DoE software add-on directly linked to UNICORN such that it would make method writing easier? In other words, is there a benefit to getting that add-on versus other DoE software?
There are a number of benefits to using the DoE function in UNICORN. In the method editor, you will get support for setting up the experimental plan, and a scouting scheme will be automatically generated to match the plan. The evaluation of the results is done using the integrated DoE function in the Evaluation module. For ÄKTA pure, there is a separate UNICORN DoE software license that will give you access to the DoE functionality in UNICORN; for ÄKTA avant, this license is included.

Which is preferable for DoE: manual runs or method runs?
To be able to utilize the DoE function in UNICORN, you must perform method runs. The experimental plan will automatically generate scouting protocols to match the factor settings, and the results files will be used for the DoE analysis. Running pre-programmed methods will also ensure that experiments are done in the same way in each run.

Is it possible to include a parameter "total protein concentration" in UNICORN 7.0?
Yes, we often use "user defined" variables based on some external entity in our design setup. We also include a predefined list of the most commonly used variables.

What is the maximum number of parameters that can be easily handled with GE UNICORN software?
The number of factors can be large, and I really do not think the software capability is an issue. More critically, as the number of parameters increases, the number of experiments will also increase, entailing a larger effort in buffer preparations, sample handling and analysis. How this can be handled practically depends on available resources, methods and strategy.

How different is DoE in UNICORN to other software, e.g. Design Expert?
DoE in UNICORN software is built on Modde from Umetrics. Although different software packages essentially do the same thing, there are also differences, mostly in how data is presented and what tools are included in the software.

What software was used for the DoE studies presented in the webinar?
We use Assist software for setting up plate studies, and DoE in UNICORN software for column runs.

Would you like to recommend any primary literature / textbook on DoE?
The DoE handbook from GE will guide you to getting started with Design of Experiments. There are also three video tutorials posted on YouTube explaining the principles of DoE.

What types of designs are available in UNICORN?
UNICORN includes a comprehensive list of screening as well as optimization designs. Please download the DoE handbook or the UNICORN manual for more information.

Is there a sample or a model file that you can share which we can open in UNICORN to play around with an experiment?
Please contact our scientific support team to get access to DoE sample results: on our website, select Support, then Contact us.

Is GE planning any advanced DoE design and analysis webinars?
We always try to respond positively to this type of request. Even if this is not currently planned, we will listen to your request and consider it. Meanwhile, check out the DoE training offerings in our FastTrack training schedules.

If you missed the webinar, it is still available on demand. If you would like to learn more about DoE, I recommend that you download the DoE handbook.