Application of Dynamic Light Scattering (DLS) to Protein Therapeutic Formulations: Principles, Measurements and Analysis - 3. DLS Deconvolution Algorithms

The basic types of DLS deconvolution algorithms used to extract the intensity weighted particle size distribution from the measured correlogram.

A Malvern Instruments' Bioscience Development Initiative

Executive Summary

Dynamic light scattering (DLS) is an analytical technique used to measure the particle size distribution of protein formulations across the oligomer and sub-micron size ranges of approximately 1 nm to 1 µm. The popularity of DLS within the biopharmaceutical industry is a consequence of its wide working size and extended sample concentration ranges, as well as its low volume requirements. With that said, the challenge that remains with the application of DLS to protein therapeutic formulations is centered around data interpretation. In this four-part white paper series, common issues and questions surrounding the principles, measurements and analysis of DLS data are discussed in order to help minimize the time required for and complexity of acquiring and interpreting DLS data that is critical throughout the development process. In this third white paper of the series, we cover the basic types of DLS deconvolution algorithms used to extract the intensity weighted particle size distribution from the measured correlogram.

Dynamic Light Scattering

Dynamic light scattering (DLS) is an analytical technique used within bioapplications to measure particle size distributions across the oligomer and sub-micron size ranges. In a DLS measurement, scattering intensity fluctuations are correlated across small time spans, yielding a correlogram. The distribution of particle diffusion coefficients is extracted from the measured correlogram using various deconvolution algorithms, each of which can yield different results. Identifying the proper algorithm, and hence the correct distribution of diffusion coefficients, requires a basic understanding of the operational limitations of the algorithm.

DLS Correlogram

For particles in solution moving under the influence of Brownian motion, the measured scattering intensity will fluctuate with time, producing an intensity trace which appears to represent random fluctuations about a mean value. The signal fluctuation rate will depend upon the rate of change of the position of the particles, with slow moving particles leading to slow fluctuations and fast moving particles leading to fast fluctuations.

Correlation is a second-order statistical technique for measuring the degree of non-randomness in an apparently random data set. When applied to a time-dependent intensity trace, the intensity correlation coefficients, G₂(τ), are calculated as shown below, where τ is the delay time and the subscript 2 in G₂ indicates the intensity rather than the field autocorrelation.
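In the usual notation, and assuming a stationary intensity trace I(t) observed over a time T, the intensity autocorrelation can be written as a time average:

$$
G_2(\tau) = \langle I(t)\, I(t+\tau) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_0^T I(t)\, I(t+\tau)\, dt
$$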

For direct application, the correlation equation can be expressed as the summation shown below and detailed in Figure 1.

Figure 1: Schematic detailing measurement and construction of the DLS correlogram.
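The summation can be sketched in a few lines of code. This is a minimal illustration, assuming a uniformly sampled intensity trace and dividing by the squared mean intensity so that uncorrelated data tend toward 1; the function name and the sample trace are hypothetical and are not taken from any instrument software.

```python
def intensity_autocorrelation(trace, max_lag):
    """Return normalized G2 for lags 0..max_lag (in sampling intervals).

    G2(k) = mean(I[j] * I[j + k]) / mean(I)**2
    """
    n = len(trace)
    mean_sq = (sum(trace) / n) ** 2
    g2 = []
    for k in range(max_lag + 1):
        pairs = n - k  # number of (j, j + k) products available at this lag
        acc = sum(trace[j] * trace[j + k] for j in range(pairs))
        g2.append(acc / pairs / mean_sq)
    return g2


# Hypothetical trace that alternates about a mean of 5 (arbitrary units).
trace = [4.0, 6.0, 4.0, 6.0, 4.0, 6.0, 4.0, 6.0]
g2 = intensity_autocorrelation(trace, 2)

# A perfectly constant trace carries no fluctuations at all.
g2_flat = intensity_autocorrelation([5.0] * 8, 2)
```

Real correlators use hardware or multi-tau schemes with logarithmically spaced lags; the direct loop above is only meant to show what each correlation coefficient represents.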

Typically, the correlation coefficients are normalized, such that G₂(∞) = 1. For monochromatic laser light, this normalization imposes an upper correlation curve limit of 2 for G₂(τ₀) and a lower baseline limit of 1 for G₂(∞). In practice, the theoretical upper limit can only be achieved in carefully optimized optical systems. Typical experimental upper limits are around 1.8 to 1.9 for G₂ or 0.8 to 0.9 for G₂ - 1, which is what is usually displayed in DLS correlogram figures (Figure 2).

Figure 2: DLS measured correlograms for 6 nm ovalbumin and 95 nm silicon dioxide in PBS.

Deconvolution

The scattering intensity fluctuations measured during a DLS experiment are a manifestation of fluctuations in the electric field generated by the ensemble collection of solution particles. The electric field fluctuations are a consequence of the superposition of fields (or waves) generated by each of the scattering particles as they diffuse through the solution. So information regarding particle motion is contained within the electric field function, as indicated in the field autocorrelation expression (G₁) given below, where E is the field function, Γ is the decay rate, D is the mean diffusion coefficient, q is the scattering vector, λ₀ is the vacuum laser wavelength, ñ is the medium refractive index, θ is the scattering angle, and the 1 subscript in G₁ indicates the field autocorrelation.
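For a single population of diffusing particles, and with G₁ normalized to unity at zero delay, these relations take the standard form:

$$
G_1(\tau) = \frac{\langle E(t)\, E^{*}(t+\tau) \rangle}{\langle |E(t)|^2 \rangle} = e^{-\Gamma \tau},
\qquad
\Gamma = D q^2,
\qquad
q = \frac{4 \pi \tilde{n}}{\lambda_0} \sin\!\left(\frac{\theta}{2}\right)
$$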

The intensity, which is the actual parameter measured in a light scattering experiment, is equivalent to the square of the field magnitude (I = |E|²), with the respective autocorrelation functions related to each other through the Siegert relationship, where γ is a coherence factor expressing the efficiency of the photon collection system.
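With the intensity autocorrelation normalized so that G₂(∞) = 1, as described earlier, the relationship is commonly written as:

$$
G_2(\tau) = 1 + \gamma \left[ G_1(\tau) \right]^2
$$

This form is consistent with the correlogram limits given above: for an ideal system γ approaches 1 and G₂(τ₀) approaches 2, while typical instruments give γ of roughly 0.8 to 0.9.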

So the Siegert relationship can be used to deconvolute particle motion information from the DLS measured intensity autocorrelation function.

Cumulants Analysis

Cumulants analysis is the ISO recommended approach for extracting the mean or Z average size from a DLS measured correlogram. In the cumulants analysis, a single particle size family is assumed and the correlogram is fitted to a single exponential as shown below, where A is the amplitude or intercept of the correlation function, B is the baseline, and Γ is the correlation decay rate.
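Using the symbols defined above, the single exponential fitting expression can be written as follows, where the factor of 2 in the exponent comes from squaring the field autocorrelation function in the Siegert relationship:

$$
G_2(\tau) = B + A\, e^{-2 \Gamma \tau}
$$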

The exponential fitting expression is expanded to account for polydispersity or peak broadening effects and then linearized, as shown below, where the first moment (a₁) is equivalent to the decay rate (Γ) and the second moment (a₂) is proportional to the distribution width (µ₂).
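A commonly used form of the linearized expression, truncated at second order and assumed here to match the definitions of a₁ and a₂ in the text, is:

$$
y(\tau) = \tfrac{1}{2} \ln\!\left[ G_2(\tau) - B \right] \approx a_0 - a_1 \tau + a_2 \tau^2,
\qquad
a_0 = \tfrac{1}{2} \ln A, \quad a_1 = \Gamma, \quad a_2 = \frac{\mu_2}{2}
$$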

As shown earlier, the decay rate is related to the mean diffusion coefficient (D), which facilitates determination of the mean hydrodynamic size using the Stokes-Einstein relationship, where k is the Boltzmann constant, T is the absolute temperature, η is the viscosity of the medium, and R_H is the hydrodynamic radius. The mean size determined from the cumulants analysis is described as the Z (or intensity weighted) average.
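With D obtained from the decay rate through Γ = Dq², the Stokes-Einstein relationship gives the hydrodynamic radius as:

$$
R_H = \frac{k T}{6 \pi \eta D}
$$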

As noted above, the second moment from the cumulants analysis is related to the width of the distribution. That relationship is given in the expression below, where PdI is the polydispersity index and σ is the standard deviation of a hypothetical Gaussian distribution centered on the Z average size.
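In terms of the fitted coefficients, and writing Z_D for the Z average size (a symbol introduced here for convenience), the relationship is usually expressed as:

$$
\mathrm{PdI} = \frac{\mu_2}{\Gamma^2} = \frac{2 a_2}{a_1^2} = \frac{\sigma^2}{Z_D^2}
$$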

The cumulants analysis differs from other DLS algorithms in that it yields only a mean size and polydispersity index, with no additional information regarding the modality of the distribution. While it is not uncommon to see hypothetical Gaussian distributions derived from the mean and PdI in print, it is not recommended, due to the potential for misinterpretation. If distribution information is desired, the best approach is to utilize one of the NNLS algorithms, all of which produce a full particle size distribution.
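The cumulants procedure described above can be sketched as a quadratic fit followed by a Stokes-Einstein conversion. This is a minimal illustration under stated assumptions, not the ISO or vendor implementation: the baseline is taken as known (B = 1), the fit is an unweighted polynomial fit over all supplied points, and the wavelength, scattering angle, temperature and viscosity in the example are illustrative values only. The function names are hypothetical.

```python
import math

import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K


def cumulants_fit(tau, g2, baseline=1.0):
    """Fit 0.5*ln(G2 - B) = a0 - a1*tau + a2*tau**2 and return (Gamma, PdI)."""
    y = 0.5 * np.log(np.asarray(g2) - baseline)
    c2, c1, _c0 = np.polyfit(tau, y, 2)  # coefficients, highest power first
    gamma = -c1                          # a1 = Gamma
    pdi = 2.0 * c2 / gamma**2            # PdI = 2*a2 / a1**2
    return gamma, pdi


def hydrodynamic_radius(gamma, wavelength, n_medium, theta, temp, viscosity):
    """Convert a decay rate to R_H via Gamma = D*q**2 and Stokes-Einstein."""
    q = 4.0 * math.pi * n_medium / wavelength * math.sin(theta / 2.0)
    diffusion = gamma / q**2
    return K_B * temp / (6.0 * math.pi * viscosity * diffusion)


# Synthetic, noise-free correlogram for a single decay rate (assumed values).
tau = np.linspace(1.0e-6, 2.0e-4, 50)            # delay times in seconds
gamma_true = 1.0e4                               # decay rate in 1/s
g2 = 1.0 + 0.9 * np.exp(-2.0 * gamma_true * tau)

gamma, pdi = cumulants_fit(tau, g2)
r_h = hydrodynamic_radius(
    gamma,
    wavelength=633e-9,           # vacuum wavelength in m (illustrative)
    n_medium=1.33,               # refractive index of water (illustrative)
    theta=math.radians(173.0),   # scattering angle (illustrative)
    temp=298.15,                 # absolute temperature in K
    viscosity=0.89e-3,           # viscosity in Pa*s (illustrative)
)
```

For a noise-free single exponential the fit returns the input decay rate and a polydispersity index of essentially zero. With measured data, the fit is normally restricted to the early part of the correlogram, where G₂ − B is well above the noise.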

NNLS Algorithms

The DLS particle size distribution is derived from the measured correlogram using a non-negatively constrained least squares (NNLS) fitting algorithm. While there are subtle differences between the various algorithms, all use a similar non-negative least squares fitting approach.

In contrast to cumulants analysis, which assumes an ideal field correlation function of identical diffusing spheres and fits the measured intensity correlation curve to a single exponential, NNLS algorithms make no assumption with regard to the number of particle families represented in the intensity correlation function. The challenge, of course, is in the identification of "how many" particle families or decays are actually present. In mathematical circles this is viewed as an ill-posed problem, since relatively small amounts of noise can significantly alter the solution of the integral equation and hence the number of predicted size families.

The fitting function, G₁^Fit, for the field autocorrelation function can be represented as a summation of single exponential decays, where the factor A_i is the area under the curve for each exponential contribution, and represents the strength of that particular i-th exponential function.
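Written out, with Γ_i denoting the decay rate of the i-th contribution, the summation is:

$$
G_1^{Fit}(\tau) = \sum_i A_i\, e^{-\Gamma_i \tau}
$$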

The best fit is found by minimizing the deviation of the fitting function from the measured data points, where a weighting factor σ is incorporated to place more emphasis on the strongly correlated, rather than the low correlation (and noisy), data points.
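One common way to write this weighted deviation is given below. The symbol G₁^Exp for the measured field autocorrelation, the index j over the measured delay times, and the placement of the weighting factor as a simple multiplier are assumptions made here for illustration:

$$
\xi^2 = \sum_j \sigma_j \left[ G_1^{Exp}(\tau_j) - \sum_i A_i\, e^{-\Gamma_i \tau_j} \right]^2
$$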

The weighting factor is proportional to the correlation coefficient, i.e. the correlation function value at a given τ value. As mentioned above, this serves to give more weight to highly correlated data points. As an example, consider the correlation curve shown in Figure 3. As evident in the inset view, there is experimental noise in the baseline. In the absence of a weighting factor, this noise could be interpreted as 'decays' arising from the presence of very large particles. So the weighting function provides one means of addressing experimental noise in the ill-posed deconvolution problem.

Figure 3: Example autocorrelation function highlighting baseline noise.

Identifying the solution of A_i's in the G₁^Fit fitting expression is accomplished by minimizing the deviation in ξ² with respect to each A_i, and then solving the resulting system of equations.
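Assuming a weighted sum of squares in which the weighting factor σ_j multiplies each squared residual, and with G₁^Exp again denoting the measured data (a symbol assumed here), setting each derivative to zero gives one linear equation per A_i:

$$
\frac{\partial \xi^2}{\partial A_i} = 0
\quad \Longrightarrow \quad
\sum_j \sigma_j\, e^{-\Gamma_i \tau_j} \left[ G_1^{Exp}(\tau_j) - \sum_k A_k\, e^{-\Gamma_k \tau_j} \right] = 0
$$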

While the details are left to more advanced texts, the standard procedure for solving the above equation is to construct the solution as a linear combination of eigenfunctions. With that said, when the eigenvalues are small, a small amount of noise can make the solution (or the number of Γ_i) extremely large, hence the previous ill-posed problem classification. To mitigate this problem, a stabilizer (α) is added to the system of equations. This parameter is called the regularizer, and with its incorporation, we are performing a first order regularization of the linear combination of eigenfunctions in the deconvoluted solution.
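One common way to write a first order regularization, assumed here because it matches the description in the text, adds a penalty on the differences between neighbouring A_i to the weighted sum of squares. The symbol Z for the combined expression follows its use later in this section, and the discrete difference is only one possible representation of the first derivative:

$$
Z = \xi^2 + \alpha \sum_i \left( A_{i+1} - A_i \right)^2
$$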

The above expression is called a first order regularization because the first derivative (in A_i) is added to the system of equations. The alpha (α) parameter or regularizer determines how much emphasis we put on this derivative. In other words, it defines the degree of smoothness in the solution. If α is small, it has little influence and the solution can be quite 'choppy'; whereas a larger α will force the solution to be very smooth.

In addition to the smooth solution constraint, NNLS algorithms also require that the solution be physical, i.e. all A_i ≥ 0. With these constraints, the regularized sum of squares, Z, is minimized by requiring that the first derivatives with respect to A_i be zero. As indicated previously, this minimization corresponds to solving a system of linear equations in A_i. The solution of A_i values is found using an iterative approach called the gradient projection method.
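The constrained minimization can be illustrated with a fixed-step projected gradient iteration, which is one simple member of the gradient projection family. This is a sketch under stated assumptions rather than the method used in any particular instrument: the penalty is a squared first difference in A_i, the weights are taken as proportional to the correlation value, and the grid of trial decay rates, the iteration count and the function names are all hypothetical.

```python
import numpy as np


def regularized_nnls(tau, g1, gammas, weights, alpha, n_iter=5000):
    """Minimize Z = sum_j w_j * (g1_j - sum_i A_i*exp(-Gamma_i*tau_j))**2
                    + alpha * sum_i (A_{i+1} - A_i)**2,  subject to A_i >= 0."""
    kernel = np.exp(-np.outer(tau, gammas))        # shape (n_tau, n_gamma)
    diff = np.diff(np.eye(len(gammas)), axis=0)    # first-difference operator
    hess = kernel.T @ (weights[:, None] * kernel) + alpha * diff.T @ diff
    rhs = kernel.T @ (weights * g1)
    step = 1.0 / np.linalg.eigvalsh(hess)[-1]      # 1 / largest eigenvalue
    amps = np.zeros(len(gammas))
    for _ in range(n_iter):
        half_grad = hess @ amps - rhs              # half of dZ/dA
        amps = np.maximum(amps - step * half_grad, 0.0)  # project onto A_i >= 0
    return amps


def objective(tau, g1, gammas, weights, alpha, amps):
    """Evaluate Z for a given set of amplitudes."""
    resid = g1 - np.exp(-np.outer(tau, gammas)) @ amps
    return float(np.sum(weights * resid**2) + alpha * np.sum(np.diff(amps) ** 2))


# Synthetic, noise-free field correlation function with two decay rates.
tau = np.logspace(-6, -2, 60)      # delay times in seconds (illustrative)
gammas = np.logspace(2, 5, 25)     # trial decay rates in 1/s (illustrative)
g1 = 0.5 * np.exp(-1.0e3 * tau) + 0.5 * np.exp(-3.0e4 * tau)
weights = g1.copy()                # weight proportional to the correlation value

amps = regularized_nnls(tau, g1, gammas, weights, alpha=0.01)
z_start = objective(tau, g1, gammas, weights, 0.01, np.zeros_like(gammas))
z_end = objective(tau, g1, gammas, weights, 0.01, amps)
```

The projection step is what enforces the physical constraint: any amplitude that a gradient step would push below zero is reset to zero. Rerunning the sketch with a smaller or larger alpha is a convenient way to see the trade-off between smoothness and apparent resolution on synthetic data.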

The normalized display of A_i vs. R_i (or A_i vs. diameter) is called the intensity particle size distribution. Mean peak sizes are intensity weighted average values, and are obtained directly from the size histogram using the following expression:
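With the sums taken over the histogram bins that make up the peak, the intensity weighted mean takes the usual form:

$$
\langle R \rangle = \frac{\sum_i A_i R_i}{\sum_i A_i}
$$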

The peak width or standard deviation (σ), indicative of the unresolved distribution in the peak itself, can also be obtained directly from the particle size distribution histogram:
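Assuming the usual definition of an intensity weighted standard deviation, with the sums again restricted to the bins in the peak, the width is:

$$
\sigma = \sqrt{\langle R^2 \rangle - \langle R \rangle^2},
\qquad
\langle R^2 \rangle = \frac{\sum_i A_i R_i^2}{\sum_i A_i}
$$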

Figure 4 shows an example of a DLS-derived intensity weighted particle size distribution for a 60 nm latex standard, calculated using an NNLS type algorithm, along with the mean and standard deviation of the DLS peak.

Figure 4: Example DLS results for 60 nm latex, in histogram format.
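The peak mean and width calculations can be sketched directly from a histogram. The function name and the three-bin histogram below are hypothetical and chosen only so the arithmetic is easy to follow; they are not the data behind Figure 4.

```python
import math


def peak_stats(sizes, amplitudes):
    """Return the intensity weighted mean and standard deviation of a peak."""
    total = sum(amplitudes)
    mean = sum(a * r for r, a in zip(sizes, amplitudes)) / total
    mean_sq = sum(a * r * r for r, a in zip(sizes, amplitudes)) / total
    # Guard against a tiny negative variance caused by rounding.
    sigma = math.sqrt(max(mean_sq - mean * mean, 0.0))
    return mean, sigma


# Hypothetical peak: bin sizes in nm and their intensity amplitudes.
sizes = [50.0, 60.0, 70.0]
amplitudes = [1.0, 2.0, 1.0]
mean, sigma = peak_stats(sizes, amplitudes)
```

For this symmetric example the mean is 60 nm and the variance is 50 nm², so the standard deviation is about 7.1 nm. For a multimodal distribution, the function would be applied separately to the bins belonging to each peak.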

What's The "Best" NNLS Algorithm?

A common question from users of DLS instrumentation is "what is the best NNLS algorithm?". Intuitively, one might think that the obvious best method for fitting the correlogram would be to use an iterative approach until the sum of squares error is minimized. For a perfect noise free correlation function, this approach would be ideal. But in practice, there is no such thing as a perfect noise free correlogram, and minimizing the sum of squares error in the presence of noise can lead to erroneous results, with no reproducibility and minimal validity. So the question, "what's the best NNLS algorithm", is a good one.

The problem here is that the answer depends very much on the type of sample being analyzed, the working size range of the instrument being used, and most importantly, the level of noise in the measured correlogram. There are a variety of named NNLS type algorithms available to light scattering researchers, either through the web or through the collection of DLS instrument vendors. While the algorithms are all NNLS based, what generally makes them unique is the locking of certain variables, such as the weighting factor or the regularizer, in order to optimize the algorithm for a given set of instrument and sample conditions. Some examples of named algorithms include:

CONTIN

The CONTIN algorithm was originally written by Steven Provencher and has become the industry standard for general DLS analysis. CONTIN is considered to be a conservative algorithm, in that the choice of the alpha (α) parameter controlling the smoothness of the distribution assumes a moderate level of noise in the correlogram. As a consequence, particle distribution peaks which are close in terms of size tend to be blended together in a CONTIN-derived size distribution. See http://s-provencher.com/pages/contin.shtml for additional information.

Regularization

The Regularization algorithm, written by Maria Ivanova, is a more aggressive algorithm which is optimized for dust-free small particle samples, such as pure proteins and micelles. The Regularization algorithm utilizes a small α parameter, thereby assuming a low level of noise in the measured correlogram. As a consequence, Regularization derived distributions tend to have sharper peaks. However, this low noise estimate can lead to phantom (nonexistent) peaks if noise is present in the correlogram.

GP & MNM

The GP and MNM algorithms, distributed with the Zetasizer Nano instrument, are general NNLS algorithms that have been optimized for the wide range of sample sizes and concentrations suitable for measurement with the Nano system. The GP (General Purpose) algorithm is conservative, with a moderate estimate of noise, and is suitable for milled or naturally-occurring samples. The MNM (Multiple Narrow Mode) algorithm is more aggressive, with a lower noise estimate, and is better suited for mixtures of narrow polydispersity particles such as latices and pure proteins.

REPES & DYNALS

The REPES and DYNALS algorithms are available for purchase through various internet sites. Both are similar to the industry standard CONTIN, although more aggressive with regard to noise estimates.

There are two primary parameters that are varied in NNLS algorithms: the weighting factor and the alpha parameter (regularizer). The table below compares the default values of these two parameters for some of the algorithms cited above. The algorithms are listed in order of increasing aggressiveness.

Algorithm	Weighting scheme	α Parameter
CONTIN	Quartic	Variable
General Purpose	Quadratic	0.01
Multiple narrow mode	Quadratic	0.001
Regularization	Quadratic	0.00002

Data Weighting

As described earlier, data weighting in the DLS deconvolution algorithm places emphasis on the larger and more significant correlation coefficients rather than the less important smaller baseline values. Figure 5 shows the effects of data weighting on a DLS correlogram for 1 mg/mL lysozyme, after filtration through a 20 nm Anotop filter. As shown in this figure, the weighting serves to stretch out the correlogram along the Y axis. In the absence of data weighting, noise in the baseline can lead to the appearance of ghost or noise peaks.

Figure 5: Comparison of quadratic and quartic weighting on the measured correlogram for a 1 mg/mL lysozyme sample, after filtration through a 20 nm filter, along with the resultant size distributions derived using the Malvern General Purpose algorithm.

Alpha (α) Parameter Or Regularizer

The Regularizer or α parameter in NNLS based deconvolution algorithms controls the acceptable degree of spikiness in the resultant distribution. Large α values generate smoother, less resolved distributions, whereas smaller alpha values generate more spiky distributions, with an appearance of better resolution. The α parameter then, can be loosely described as an estimate of the expected level of noise in the measured correlogram.

There is no ideal or best alpha parameter. The appropriate value depends upon the sample being analyzed. For mixtures of narrow mode (low polydispersity) and strongly scattering particles, decreasing the α parameter can sometimes enhance the resolution in the intensity particle size distribution. Consider Figure 6 for example, which shows the distribution dependence on the α parameter for a mixture of 60 nm and 220 nm latices. The results derived using the default regularizer for the Malvern General Purpose and Multiple Narrow Mode algorithms, α = 0.01 and 0.001 respectively, are noted for comparison.

Figure 6: Intensity particle size distribution dependence on the α parameter for a mixture of 60 and 220 nm latex particles.

As evident in the above figure, a decrease in the α parameter leads to an increase in both the number of resolved modes and the sharpness of the peaks. It is also instructive to note that once baseline resolution is achieved, the resultant sizes (peak positions) are independent of the α value, with only the apparent width of the peaks changing with further changes in the regularizer.

The influence of the α parameter on the measured size distribution for a monomodal 220 nm latex sample is shown in Figure 7. As with the latex mixture, reduction of the regularizer has no influence on the measured particle size, and serves only to decrease the apparent polydispersity of the peak, i.e. decrease the peak width.

Figure 7: Intensity particle size distribution dependence on the α parameter for a monomodal 220 nm latex sample.

In the previous two examples for low polydispersity latices, the more aggressive algorithms utilizing smaller α values generated results that were consistent with the sample properties. For samples that are not composed of narrow mode particle families however, aggressive reduction in the α parameter can lead to over-interpretation of the measured data, and the generation of more modes or peaks than are actually present in the sample.

Figure 8 shows the influence of the α parameter on the resultant size distribution for a dilute protein sample. The monomeric protein has a known hydrodynamic diameter of 3.8 nm. Under the conditions employed here, the protein is also known to exist as a mixture of low order oligomers. As evident in this figure, the intensity weighted mean size of the sample is independent of the α parameter selected, and is consistent with the expected average size of an oligomeric mix. If the General Purpose algorithm is selected, with an α value of 0.01, the peak width is also representative of the expected polydispersity for a mix of protein oligomers (~ 25-30%). Over-reduction of the α parameter (< 0.01), however, generates a phantom peak at circa 2 nm, and leads to the erroneous conclusion that the sample is composed of only two particle sizes, one of which is much smaller than the monomer itself.

Figure 8: Influence of the α parameter on the resultant size distribution for a 0.3 mg/mL lysozyme sample in PBS at pH 6.8.

The results shown in Figure 9 represent another example where the less aggressive α value for the Malvern General Purpose algorithm is the appropriate value to use in the generation of the particle size distribution. In the absence of stabilizing agents, hemoglobin (Hb) denatures and aggregates at temperatures >38°C. When the protein denatures, the aggregates formed are random in size, with no specificity, i.e. very polydisperse. As such, the distribution best representative of the actual sample is that generated using the Malvern General Purpose algorithm, with an α value of 0.01. Reduction of the α parameter to values < 0.01 leads to the generation of two apparently unique size classes in the 300 nm & 800 nm regions that are inconsistent with the actual properties of the sample.

Figure 9: Influence of the α parameter on the resultant size distribution for denatured hemoglobin at 44 °C in PBS buffer.

Multiple Solutions (CONTIN)

CONTIN is unique among DLS algorithms, in that it generates a collection of solutions, each with a set of qualifying descriptors. The descriptors used to identify the most probable solution are 1) the number of peaks, 2) the degrees of freedom, 3) the α parameter, and 4) the probability to reject. The most probable solution is selected using the principle of parsimony, which states that after elimination of all solutions inconsistent with a priori information, the best solution is the one revealing the least amount of new information.

Figure 10 shows a comparison of the CONTIN generated solution set for the 60 nm and 220 nm latex mixture discussed earlier. As seen in this figure, one of the solutions (CONTIN 1) is consistent with the results generated using the Malvern Multiple Narrow Mode algorithm (α = 0.001) and is a good representation of the actual sample. The CONTIN determined most probable solution however, is CONTIN 6, which shows a blending of the populations to form a single peak of high polydispersity.

Figure 10: Comparison of the CONTIN generated solution set of size distributions for the 60 nm and 220 nm latex mixture.

Compared with the Malvern General Purpose and Multiple Narrow Mode algorithms, CONTIN tends to be the most conservative of the three, more so even than the GP algorithm. While this works well for noise recognition and management (dilute protein in Figure 11), it can also lead to a reduction in apparent particle size resolution for mixtures (latex mixture in Figure 11).

Figure 11: Comparison of CONTIN (▬), General Purpose (▬), and Multiple Narrow Mode (▬) results for a mixture of 60 nm and 220 nm latices and a dilute protein (0.3 mg/mL lysozyme) sample.

To finally address the question of "what is the best DLS algorithm?", the truthful answer is that there is no best algorithm. Each of the algorithms gives useful information to the researcher. The best approach is to couple what you know with what you suspect about the sample, compare results from various algorithms, recognizing the strengths and limitations of each, and then look for robustness and repeatability in the results. In other words, if multiple measurements all indicate a shoulder in a wide peak, which resolves itself into a unique repeatable population upon application of a more aggressive algorithm, the chances are strong that this unique population is real. If repeat measurements generate inconsistencies, then it is best to err on the side of a more conservative algorithm, such as the Malvern General Purpose or CONTIN.

Additional Reading

Braginskaya, Dobitchin, Ivanova, Klyubin, Lomakin, Noskin, Shmelev, & Tolpina "Analysis of the polydispersity by PCS. Regularization procedure", Phys. Scr. 1983, 28, 73.

Frisken "Revisiting the method of cumulants for the analysis of dynamic light scattering data", Applied Optics 2001, 40(24), 4087-4091.

ISO 13321 "Particle size analysis: Photon correlation spectroscopy", 1996.

Liu, Arnott, & Hallett "Particle size distribution retrieval from multispectral optical depth: Influences of particle nonsphericity and refractive index", Journal of Geophysical Research 1999, 104(D24), 31753.

Pecora "Dynamic Light Scattering: Applications of Photon Correlation Spectroscopy", Plenum Press, 1985.

Provencher "Contin: a general purpose constrained regularization program for inverting noisy linear algebraic and integral equations", Computer Physics Communications 1982, 27, 229-242.

Provencher "A constrained regularization method for inverting data represented by linear algebraic or integral equations", Computer Physics Communications 1982, 27, 213-227.

Schmitz "An Introduction To Dynamic Light Scattering By Macromolecules", Academic Press, New York, 1990.

Stepanek "Data analysis in dynamic light scattering" in Light scattering: principles and development, Ed: Brown, Pub: Clarendon Press, Oxford, 1996, 177-241.

"Application of dynamic light scattering (DLS) to protein therapeutic formulations: part I - Basic Principles". Inform White Paper. Malvern Instruments Limited.

"Application of dynamic light scattering (DLS) to protein therapeutic formulations: part II - concentration effects and particle interactions". Inform White Paper. Malvern Instruments Limited.

"Application of dynamic light scattering (DLS) to protein therapeutic formulations: part IV - frequently asked questions". Inform White Paper. Malvern Instruments Limited.

"A basic guide to particle characterization". Inform White Paper, Malvern Instruments Limited.

"Developing a bioformulation stability profile". Inform White Paper, Malvern Instruments Limited.

About Malvern's Bioscience Development Initiative

Malvern Instruments' Bioscience Development Initiative was established to accelerate innovation, development, and the promotion of novel technologies, products, and capabilities to address unmet measurement needs in the biosciences market.