8.9 EXAMPLE: THE APPLICATION OF DOE IN MULTIVARIATE CALIBRATION


DK4712_C008.fm Page 322 Saturday, March 4, 2006 1:59 PM






Practical Guide to Chemometrics



to encounter applications that have many different components. These can include

active components, inactive additives, dyes, etc., and the concentration levels of

these components can vary widely. In such cases, the method of experimental design

can be used to determine the optimum set of standards required to prepare calibration

models. One of the difficulties in developing a new calibration model is that the

mixtures must be within an appropriate range for the instrumental measurement

method. Using mixtures that are too concentrated gives rise to a nonlinear response

(e.g., in UV spectroscopy, for peaks that no longer obey Beer’s Law) and, as such,

the samples must be diluted. Conversely, using mixtures that are too dilute will cause

a loss in the signal-to-noise ratio, and hence introduce unnecessary noise into the

model. Given these constraints, it is possible to use the method of experimental

design with some reference spectra to build up a simulated calibration set with

proposed standards that are in the correct range for the chosen analytical measurement method, without the need to perform any preliminary experiments. These

simulated calibration data can be used to build calibration models using PLS, etc.

to test that there is sufficient variability in the calibration set to be useful for modeling

purposes. The net result of this approach is that the user can very quickly develop

a calibration model and test it prior to performing any actual experiments, which

thus maximizes productivity and reduces waste. A further advantage of the approach

described here is that one can perform some screening to discover inactive or

nonabsorbing components in the mixtures. The combination of screening, DOE, and

simulation of calibration mixture spectra proposed here can significantly reduce the

resources required for performing the actual measurements.

To illustrate the use of experimental designs in an analytical chemistry application, we will examine a problem taken from the agrochemical industry. The problem

under investigation was to develop a robust calibration model for several commercial

products based on UV spectral measurements. By the term “robust model,” we

assume that the model will be able to give acceptable predictions even if there is

some moderate variation in the controlled and uncontrolled variables.

The successful construction of any calibration model depends to a great extent on

the set of calibration points. Considering the components of the products as independent factors, we can construct the respective factor space in the terms discussed earlier

in this chapter (see Section 8.2.2). The calibration set will consist of points distributed

within this factor space, and the best distribution of these points will be achieved by

employing the experimental-design approach. Provided that the number of significant

factors and the type of the required regression model are known, we should be

able to construct a successful experimental design. To implement the calibration design

and perform the necessary measurements, we also need to know the boundaries of the

factor space. The following discussion is directed toward these points.

8.9.1.1 Identifying the Number of Significant Factors

In this example, 12 products, P1 to P12, were to be considered, each consisting of

one to nine components, coded here as C1 to C9. The products, their ingredients (components), and the amount of each are listed in Table 8.13. For reasons

of commercial confidentiality, we omitted the actual names of the products and



© 2006 by Taylor & Francis Group, LLC






Response-Surface Modeling and Experimental Design






TABLE 8.13

List of Products under Consideration and the

Respective Quantities of the Components

Included in Each Product

P1

C6

C9

C7

C1

C8

C2

C3

C5



P2



100

50

2

24

80

0.4

2

10



C6

C7

C1

C8

C2

C3

C5





P5

C6

C9

C7

C1

C8

C2

C3

C5



C6

C7

C1

C8

C2

C3

C5





P6



120

80

0.5

35

117

0.25

2.5

10



C6

-

-

-

-

-

-





P9

C6

C9

C7

C4



200

0.5

35

116.7

0.25

2.5

5





P3



200

-

-

-

-

-

-





C9

C2

C4





200

0.6

12.5





C6

C7

C1

C8

C2

C3

C5





P7

C6

C9

C7

C1

C2

C3

C4





P10

8.77

8.77

0.07

4.6



200

0.5

35

117

0.25

2.5

1





P4



120

80

0.5

10

0.5

2.5

40





P8

C6

C7

C1

C2

C3

C5

C4





P11

C9

C2

C4





200

0.6

150





200

0.5

35

116.7

0.25

5

10





100

0.5

10

0.5

1

10

40



P12



C6

C9

C7

C4



7.93

7.93

0.128

4.2



ingredients and slightly changed the amount of the ingredients in each product.

These changes will not affect the generality of the approach described here.

Figure 8.25 shows the pure-component UV spectra of all nine components. The

respective concentrations of the solutions used to measure the pure-component

spectra are shown in Table 8.14. It is reasonable to assume that components having

large absorption values in the wavelength range of interest will have considerable

influence on the calibration model. Conversely, inactive components that have very

weak absorption values in the wavelength range of interest will have a weak influence

on the model while introducing some additional noise.

The level of UV absorption was used as a screening tool to divide the components

into two sets, active (Figure 8.26) and inactive (Figure 8.27), with the active components having strong UV absorption signals in the range from 250 to 360 nm and

inactive components having weak or insignificant UV absorption signals in this range.
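The screening step described above can be sketched in a few lines. This is only an illustration: the text gives no numeric cutoff, so the `threshold` value, the function name `split_active_inactive`, and the Gaussian stand-in spectra are all assumptions, not the measured data of Figure 8.25.

```python
import numpy as np

def split_active_inactive(spectra, wavelengths, lo=250.0, hi=360.0, threshold=0.1):
    """Split components by their peak absorbance inside the window [lo, hi] nm.

    The threshold is a hypothetical cutoff chosen for illustration only.
    """
    window = (wavelengths >= lo) & (wavelengths <= hi)
    active, inactive = [], []
    for name, absorbance in spectra.items():
        peak = float(np.max(absorbance[window]))
        (active if peak >= threshold else inactive).append(name)
    return active, inactive

# Synthetic stand-ins for pure-component spectra (not the book's measurements).
wavelengths = np.linspace(200.0, 400.0, 201)

def band(center, height, width=15.0):
    """Single Gaussian absorption band on the shared wavelength grid."""
    return height * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

spectra = {
    "strong_absorber": band(280.0, 0.8),
    "weak_absorber": band(300.0, 0.02),
    "absorbs_outside_window": band(210.0, 0.9),
}

active, inactive = split_active_inactive(spectra, wavelengths)
print("active:", active)
print("inactive:", inactive)
```

A component that absorbs strongly only below 250 nm is treated as inactive here, which mirrors the idea that only absorption inside the working wavelength range matters for the calibration model.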

Removing the UV-inactive components reduced the number of the components

to be considered from nine to five. The products and their corresponding UV-active



components are listed in Table 8.15. The number of components present in each

formulation is also shown.

FIGURE 8.25 UV spectra of all nine pure components (C1 to C9). [Plot: absorbance versus wavelength, nm.]

Examining Table 8.15, we observe that products P6, P10, and P11 have only

one UV-active component; P9 and P12 have three UV-active components; P3, P4,

P5, P7, and P8 have four UV-active components; and P1 and P2 have five UV-active

components. Thus four types of experimental designs are needed for 1, 3, 4, and 5

independent (process) variables.



TABLE 8.14
Concentration of Components C1 to C9 Used to Measure Pure-Component Spectra

Component    Concentration, ppm
C1           10
C2           10
C3           10
C4           10
C5           100
C6           100
C7           10
C8           5
C9           1000






FIGURE 8.26 Spectra of the UV-active components (C1, C5, C6, C7, and C9). [Plot: absorbance versus wavelength, nm.]



8.9.1.2 Identifying the Type of the Regression Model

As the goal is to build a calibration model based on spectral data, we assume that

Beer-Lambert’s law is valid,

$$A_w = \sum_{i=1}^{p} \varepsilon_i c_i l, \qquad w = 1, \ldots \tag{8.79}$$



FIGURE 8.27 Spectra of the UV-inactive components (C2, C3, C4, and C8). [Plot: absorbance versus wavelength, nm.]






TABLE 8.15

The “Reduced” Product Formulations

after the Removal of the UV-Inactive

Components Shown in Figure 8.27

P1



P2



P3



P4



1

2

3

4

5



C1

C5

C6

C7

C9

P5



1

2

3

4

5



C1

C2

C5

C6

C7

P6



1

2

3

4



C1

C5

C6

C7

-

P7



1

2

3

4

-



C1

C5

C6

C7



P8



1

2

3

4



C1

C5

C6

C7

P9



1



C9

-

-

-

P10



1

2

3

4



C1

C6

C7

C9

P11



1

2

3

4



C1

C5

C6

C7

P12



1

2

3



C6

C7

C9



1



C9



1



C9



















1

2

3



C6

C7

C9



where l is the sample cell path length, εi is the extinction coefficient of the ith

component at the wth wavelength, and ci is its concentration. Noting the theoretical

linear nature of the response described in Equation 8.79, we assume that a linear

polynomial model for p independent variables will be adequate; thus, the model

selected for our experimental designs has the following structure

$$\hat{y}_j = b_0 + \sum_{i=1}^{p_j} b_i x_i, \qquad j = 1, \ldots, 12 \tag{8.80}$$



where pj, j = 1, …, 12, is the number of UV-active components in the jth product, e.g., p1 = 5 for product P1 and pj = 4 for the four-component products. As

previously noted, the products fall into one of four categories depending on the number

of UV-active components. Thus, four different types of models are needed, one each

for one, three, four, and five process variables. Respectively, the number of regression

coefficients will be k1 = 2, k2 = 4, k3 = 5, and k4 = 6. The minimum number of points

in the designs for each of these models will be determined by the corresponding number

of regression coefficients.

Having the number and the type of the variables (components) and the type of

the regression model required, we can begin the task of constructing the appropriate

experimental designs. In this project it was decided to use exact D-optimal designs

having Ni = ki + 5 points. The number of the points selected provides sufficient

degrees of freedom to calculate the regression coefficients. The resulting D-optimal

designs are shown in Table 8.16.






TABLE 8.16
Catalog of Four Exact D-Optimal Experimental Designs for the Spectroscopic Calibration Problem

ξ1(1,8)
x1c
 -1
 -1
 -1
 -1
  1
  1
  1
  1

ξ2(3,9)
x1c   x2c   x3c
  1    -1    -1
 -1    -1     1
 -1    -1    -1
 -1     1    -1
 -1     1     1
  1     1    -1
  1     1     1
  1    -1     1
  1    -1     1

ξ3(4,10)
x1c   x2c   x3c   x4c
 -1    -1    -1    -1
 -1     1    -1     1
 -1    -1    -1    -1
 -1     1     1     1
 -1    -1     1     1
  1     1     1    -1
  1    -1    -1     1
  1     1     1    -1
  1     1    -1     1
  1    -1     1     1

ξ4(5,11)
x1c   x2c   x3c   x4c   x5c
  1     1    -1    -1    -1
  1    -1     1    -1     1
  1     1     1     1    -1
 -1     1     1     1     1
  1    -1    -1    -1     1
 -1    -1    -1     1    -1
 -1     1     1    -1     1
  1    -1     1     1    -1
  1     1    -1     1     1
 -1     1    -1    -1    -1
 -1    -1     1    -1    -1

Note: The numbers in parentheses after each design label represent, respectively, the number of variables and the number of measurements.
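As a quick sanity check on a design from Table 8.16, one can build the extended design matrix for the first-order model of Equation 8.80 and inspect its information matrix. The sketch below does this for ξ2(3,9), which consists of the eight vertices of the cube plus one replicate. The variable names are illustrative, and the check says nothing about how the designs were actually generated.

```python
import numpy as np

# Design xi2(3,9) from Table 8.16 in coded units (columns x1c, x2c, x3c).
xi2 = np.array([
    [ 1, -1, -1],
    [-1, -1,  1],
    [-1, -1, -1],
    [-1,  1, -1],
    [-1,  1,  1],
    [ 1,  1, -1],
    [ 1,  1,  1],
    [ 1, -1,  1],
    [ 1, -1,  1],
], dtype=float)

# Extended design matrix for the linear model: intercept column plus factors.
F = np.column_stack([np.ones(len(xi2)), xi2])

M = F.T @ F                       # information matrix
rank = np.linalg.matrix_rank(F)   # must equal k = 4 to estimate all coefficients
det_M = np.linalg.det(M)          # D-optimality maximizes this determinant

print("rank of F:", rank)
print("information matrix:\n", M)
print("det(F^T F):", det_M)
```

The off-diagonal elements of the information matrix are small relative to the diagonal, which is what one hopes for in a design intended to estimate the coefficients nearly independently.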



8.9.1.3 Defining the Bounds of the Factor Space

The coded values for the two levels of the controlled factors in these designs are +1 and

–1, which represent the upper and lower boundaries for each variable. To implement the

designs, we transform these two levels into the real values. By finding the lower and

upper boundaries of the variables for each product, the four generic designs (in coded

values) will be transformed to 12 calibration sets (in real values), one for each product.

To define the boundaries, we assume that the models should be valid over a

working range of up to ±10% of each component’s target value in the formulated

products. Considering each of the product formulations individually, we calculate

the bounds using Equation 8.81,

$$x_i^{\min} = 0.90\, p_i^*, \qquad x_i^{\max} = 1.10\, p_i^* \tag{8.81}$$



where pi* designates the target value of the ith component, and ximin and ximax are the lower and upper bounds, respectively. For example, if the target value of the ith factor is x_i^{c} = 200, the respective boundaries will be ximin = 0.9 × 200 = 180 and ximax = 1.1 × 200 = 220. The general formula for the transformation between natural (real) and coded variables is

$$x_{ic} = \frac{x_i - x_i^{c}}{x_i^{\max} - x_i^{c}},$$

where x_{ic} is the coded value, x_i is the real value, and x_i^{c} is the center of the range, which here equals the target value.






TABLE 8.17
List of Components Included in Product P1, with UV-Inactive Components Shaded

Component    Quantity
C6           100
C9           50
C7           2
C1           24
C8 *         80
C2 *         0.4
C3 *         2
C5           10

* Shaded (UV-inactive) row; see Figure 8.27.



For the example given here, the formula becomes:

$$-1 = \frac{x_i - 200}{220 - 200} = \frac{180 - 200}{220 - 200} = \frac{-20}{20}$$



The reverse transformation is obvious.
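For readers who prefer to see both directions written out, the following minimal sketch implements the forward and reverse transformations for the numerical example above (target 200, bounds 180 and 220). The function names `to_coded` and `to_real` are illustrative choices.

```python
def to_coded(x_real, center, x_max):
    """Coded value: (x - center) / (x_max - center)."""
    return (x_real - center) / (x_max - center)

def to_real(x_coded, center, x_max):
    """Reverse transformation: center + coded * (x_max - center)."""
    return center + x_coded * (x_max - center)

target = 200.0
x_min, x_max = 0.90 * target, 1.10 * target   # bounds from Equation 8.81

print("coded lower bound:", to_coded(x_min, target, x_max))
print("coded upper bound:", to_coded(x_max, target, x_max))
print("real value at -1:", to_real(-1.0, target, x_max))
print("real value at +1:", to_real(1.0, target, x_max))
```

Because the bounds are symmetric about the target, the lower bound maps to a coded value of -1 and the upper bound to +1, up to floating-point rounding.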

The process of translating the coded values to real values is illustrated in detail

for the construction of the calibration set for product P1. The target values for each

component in product P1 are shown in Table 8.17, with UV-inactive components

shown as shaded rows.

By taking the entries of Table 8.17 as the target values, we calculate the respective

upper and lower bounds for each component. The results are shown in Table 8.18.

Now, using the correspondence between the real and coded upper and lower

bounds shown in Table 8.18, we can choose the appropriate design from Table 8.16

and replace the coded entries with the real ones. The set of the calibration points in

coded and real values, constructed using design ξ4 (5,11) (5 variables, 11 measurements), is shown in Table 8.19.



TABLE 8.18
Translation of Coded Factor Levels to Real Experimental Levels for Product P1

             Lower Bound        Upper Bound
Component    Real     Coded     Real     Coded
C6           90       -1        110      1
C9           45       -1        55       1
C7           1.8      -1        2.2      1
C1           21.6     -1        26.4     1
C5           9        -1        11       1






TABLE 8.19
Translated Experimental Design for Product P1

Coded values                     Real values
x1c   x2c   x3c   x4c   x5c      C6     C9    C7     C1      C5
  1     1    -1    -1    -1      110    55    1.8    21.6     9
  1    -1     1    -1     1      110    45    2.2    21.6    11
  1     1     1     1    -1      110    55    2.2    26.4     9
 -1     1     1     1     1       90    55    2.2    26.4    11
  1    -1    -1    -1     1      110    45    1.8    21.6    11
 -1    -1    -1     1    -1       90    45    1.8    26.4     9
 -1     1     1    -1     1       90    55    2.2    21.6    11
  1    -1     1     1    -1      110    45    2.2    26.4     9
  1     1    -1     1     1      110    55    1.8    26.4    11
 -1     1    -1    -1    -1       90    55    1.8    21.6     9
 -1    -1     1    -1    -1       90    45    2.2    21.6     9
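The translation from the coded design to the real calibration set for product P1 can be reproduced with a short calculation. The sketch below takes the ξ4(5,11) design and the P1 target values for the UV-active components from Table 8.17, and applies the ±10% working range of Equation 8.81. Variable names are illustrative.

```python
import numpy as np

components = ["C6", "C9", "C7", "C1", "C5"]
targets = np.array([100.0, 50.0, 2.0, 24.0, 10.0])   # P1 targets, UV-active rows only

# Design xi4(5,11) in coded units; columns follow the order of `components`.
coded = np.array([
    [ 1,  1, -1, -1, -1],
    [ 1, -1,  1, -1,  1],
    [ 1,  1,  1,  1, -1],
    [-1,  1,  1,  1,  1],
    [ 1, -1, -1, -1,  1],
    [-1, -1, -1,  1, -1],
    [-1,  1,  1, -1,  1],
    [ 1, -1,  1,  1, -1],
    [ 1,  1, -1,  1,  1],
    [-1,  1, -1, -1, -1],
    [-1, -1,  1, -1, -1],
], dtype=float)

# A coded level of +/-1 corresponds to +/-10% of the target value.
half_range = 0.10 * targets
real = targets + coded * half_range   # broadcasting applies each column's range

for row in real:
    print("  ".join(f"{value:6.1f}" for value in row))
```

Each printed row corresponds to one calibration mixture, in the same run order as the coded design.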



8.9.1.4 Estimating Extinction Coefficients

Using the formulations of the calibration set listed in Table 8.19 and the pure-component spectra measured earlier, we can generate a set of simulated calibration

spectra without performing any experimental work and investigate some important

properties of the calibration set. The first step is to estimate the matrix of extinction

coefficients, E, using the pure-component spectra. Assuming the path length, l = 1,

the ith pure-component spectrum can be represented by

$$\mathbf{A}_i = \boldsymbol{\varepsilon}_i c_i^*, \qquad i = 1, \ldots, m_t \tag{8.82}$$



where Ai is the vector of measured absorbances for the ith component at concentration ci*, and εi is the respective vector of extinction coefficients. Solving for the

vector of extinction coefficients, εi, gives Equation 8.83.



$$\boldsymbol{\varepsilon}_i = \frac{\mathbf{A}_i}{c_i^*}, \qquad i = 1, \ldots, m_t \tag{8.83}$$



The matrix of extinction coefficients for the components of product P1 can be

assembled by arranging the vectors of extinction coefficients into the rows of

EP1 = [ε1, ε5, ε6, ε7, ε9]. The matrix of concentrations, C, for the subset of active

species in product P1 is given in the right-hand side of Table 8.19, or in matrix form,

C = [C1, C5, C6, C7, C9]. According to the Beer-Lambert law in Equation 8.79, the

product of these two matrices gives the matrix of simulated mixture spectra, A, for

the calibration set, where the path length, l, is assumed equal to 1.

$$\mathbf{A} = \mathbf{C}\mathbf{E} \tag{8.84}$$



Figure 8.28 shows the predicted spectra of the calibration mixtures listed in Table 8.19 for product P1.
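The simulation in Equations 8.82 to 8.84 amounts to one division and one matrix product. The sketch below illustrates it with three invented Gaussian "pure-component" spectra and made-up concentrations; these stand in for the measured spectra and are assumptions for illustration only, with the path length taken as 1.

```python
import numpy as np

wavelengths = np.linspace(200.0, 350.0, 151)

def band(center, width, height):
    """Single Gaussian absorption band on the shared wavelength grid."""
    return height * np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

# Hypothetical pure-component spectra, one row per component, each measured
# at its own reference concentration c_star (ppm).
pure_spectra = np.vstack([
    band(230.0, 12.0, 0.9),
    band(265.0, 15.0, 0.6),
    band(300.0, 20.0, 0.4),
])
c_star = np.array([10.0, 100.0, 10.0])

# Equation 8.83: extinction coefficients, one row per component.
E = pure_spectra / c_star[:, None]

# Hypothetical calibration mixtures (rows) by component concentration (columns).
C = np.array([
    [ 9.0,  90.0, 11.0],
    [11.0, 110.0,  9.0],
    [11.0,  90.0, 11.0],
])

# Equation 8.84: simulated mixture spectra, one row per mixture.
A = C @ E

print("simulated spectra shape:", A.shape)
print("largest simulated absorbance:", round(float(A.max()), 3))
```

Because the model is strictly linear, a mixture prepared at exactly the reference concentrations reproduces the sum of the pure-component spectra, which is a convenient check on the bookkeeping of rows and columns.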






FIGURE 8.28 Predicted calibration spectra for product P1. [Plot: absorbance (axis range 0 to 10) versus wavelength, nm.]



For calibration work in the UV range, spectra should have a maximum absorbance less than 1 for Beer’s law to be obeyed and to obtain good linear response.

Clearly, since this condition does not hold for the simulated calibration spectra shown

in Figure 8.28, a simple dilution of the calibration samples should be performed

before measuring their UV spectra. This is applicable to any sample type, as the

dilution only affects the analysis method and not the final result.
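A rough way to choose the dilution from the simulated spectra is to scale the largest predicted absorbance down to the chosen limit. The sketch below assumes that absorbance scales linearly with dilution, which is only an approximation, since the simulated values above 1 are themselves linear extrapolations. The function name and the example numbers are hypothetical.

```python
import numpy as np

def dilution_factor(spectra, max_absorbance=1.0):
    """Smallest dilution factor that brings the largest absorbance to the limit.

    Returns 1.0 (no dilution) when the spectra are already within the limit.
    """
    peak = float(np.max(spectra))
    return max(1.0, peak / max_absorbance)

# Hypothetical simulated absorbances (rows: mixtures, columns: wavelengths).
A = np.array([
    [0.4, 9.5, 2.0],
    [0.3, 8.0, 1.5],
])

factor = dilution_factor(A)
A_diluted = A / factor

print("dilution factor:", factor)
print("largest absorbance after dilution:", float(A_diluted.max()))
```

In practice one would round the factor up to a convenient value and apply the same dilution to every calibration sample.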



8.9.2 IMPROVING QUALITY FROM HISTORICAL DATA



As was mentioned previously, in process analytical applications we are usually

limited in how we do experiments and collect data. Sometimes we are not able to

adjust the controlled factors of a process according to the principles of experimental design because it would cause production of product that fails to meet

quality standards. In such cases, the only option is to measure the process and

deal with the data as received. Experiments performed in this manner are called

passive experiments. The values of the measured variables change according to

normal variation in the production process. This can cause correlation in the

measurements, which in turn can affect the numerical stability of fitting regression

models. In cases where it is desirable to achieve on-line or at-line control with a

regression model derived from measurements of the process, a procedure is needed

to avoid making unnecessary measurements and improve the accuracy of the

resulting models.

As a practical example, we consider data provided by BP Amoco from their

naphtha processing plant in Saltend, Hull, U.K. Briefly, naphtha is a mixture of

hydrocarbons and aromatics. The most important components in the feedstock are

naphthalenes and aromatics. Periodically, samples are collected. The near-infrared

(NIR) spectra of these samples are measured, and the amount of naphthalenes and






aromatics is measured by gas chromatography (GC). A calibration model is

constructed using PCR or PLS (see Chapter 6), and the predicted values of naphthalene and aromatic content are used to control the process. Here the goal is to

replace costly, time-consuming off-line GC measurements with rapid, on-line NIR

measurements. It is possible to collect hundreds or even thousands of NIR spectra

at relatively low cost, while it would be cost prohibitive to perform GC analysis on

each one. By analysis of the design matrix, X, which can be cheaply and quickly

measured, we can select a small subset of samples for GC analysis that will give an

optimal design, thus minimizing the time and expense of performing GC analysis

while maximizing the information we gather as well as the performance of the

regression model that we will build from these measurements.

Once the initial model is developed and placed on-line, a large historical database

of measurements and predictions can be accumulated. If the process or the measurement instrument drifts over time, the usual practice is to recalibrate the NIR model

periodically by collecting new plant samples and performing NIR and GC measurements. To avoid performing costly GC analysis on a large set of samples during

normal process operation, some method of using the inexpensive NIR data is needed

to select the most informative samples for off-line GC analysis. The resulting historical

data can be used in this way to augment the original experimental design with

maximum information and minimum effort. As a side effect, better performance of

the calibration model could be expected.

Following commonly accepted terminology, X represents an N × m data matrix

of NIR spectra with N rows (samples) measured at m variables. The predicted value

ŷi, i = 1, …, N, of the response yi (naphthalene content or aromatic content)

can be estimated using some appropriate form of a regression model,

$$\hat{y}_i = \sum_{j=1}^{k} b_j f_j(\mathbf{x}), \qquad i = 1, \ldots, N \tag{8.85}$$



By applying regression analysis, a k × 1 vector of the regression coefficients, b,

is calculated using the formula in Equation 8.86.

$$\mathbf{b} = (\mathbf{F}^T \mathbf{F})^{-1} \mathbf{F}^T \mathbf{y} \tag{8.86}$$



Using the notation of experimental design, F represents the extended design matrix,

where each row is the transpose of a k × 1 vector, f, whose elements are known functions of x. The matrix

(F^T F) is the Fisher information matrix and its inverse, (F^T F)^(-1), is the dispersion

matrix of the regression coefficients.
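Equation 8.86 can be illustrated on a small synthetic problem. The sketch below uses an assumed first-order model with f(x) = [1, x1, x2], invented "true" coefficients, and a little random noise. It computes b through the dispersion matrix and compares it with NumPy's least-squares solver, which is usually the numerically safer route.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic factor settings and extended design matrix with f(x) = [1, x1, x2].
x = rng.uniform(-1.0, 1.0, size=(20, 2))
F = np.column_stack([np.ones(len(x)), x])

# Invented coefficients and noisy responses, for illustration only.
b_true = np.array([2.0, -1.0, 0.5])
y = F @ b_true + rng.normal(scale=0.01, size=len(x))

M = F.T @ F                      # Fisher information matrix
dispersion = np.linalg.inv(M)    # dispersion matrix of the coefficients
b = dispersion @ F.T @ y         # Equation 8.86

# Same estimate without forming an explicit inverse.
b_lstsq = np.linalg.lstsq(F, y, rcond=None)[0]

print("b from Equation 8.86:", np.round(b, 3))
print("b from lstsq:        ", np.round(b_lstsq, 3))
```

The two estimates agree here because F is well conditioned. The explicit inverse becomes unreliable when the columns of F are nearly collinear.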

As previously noted, in a typical process analytical application, the measured

data set might consist of spectral data recorded at a number of wavelengths much

higher than the number of samples. The rank, R, of the measured matrix of spectra

will be equal to or smaller than the number of the samples N. This causes rank

deficiency in X, and the direct calculation of a regression or calibration model by

use of the matrix inverse using Equation 8.85 and Equation 8.86 is problematic.






This problem can be solved using the multivariate calibration approach of

principal component regression (PCR) or partial least squares (PLS), described in

Chapter 6. In PCR, the matrix of spectra is decomposed into a matrix of principal

component scores, S, consisting of the vectors [s1, s2, …, sR], and loadings, P,

consisting of the eigenvectors [p1, p2,…, pR] of X [32]. During the process of

principal component analysis, we retain an appropriate number of principal components (latent variables), i.e., those that describe statistically significant variation of

the data. By deleting eigenvectors and scores associated with undesirable noise, a

new matrix, X′, is calculated

$$\mathbf{X}' = \mathbf{s}_1\mathbf{p}_1^T + \mathbf{s}_2\mathbf{p}_2^T + \cdots + \mathbf{s}_{pc}\mathbf{p}_{pc}^T, \qquad pc \le R \tag{8.87}$$



so that the rank deficiency problem is resolved.

At the core of this approach is the improvement of the condition number of the

data matrix, X. The condition number of a matrix, cond(X), is the ratio of the largest

and smallest eigenvalue of X. It takes on values from 1 to +infinity, and can be used

as a measure of the numerical stability with which the inverse of X can be computed.

Values in the range from 1 to 1000 usually indicate that the matrix inverse calculation

will be very stable. In the limit, as the smallest eigenvalue of X goes to zero, cond(X)

tends toward infinity, indicating that matrix X is singular, i.e., it has a determinant

equal to zero, in which case the corresponding regression problem is rank deficient

and the inverse of X does not exist. When the condition number of X is extremely

large, the matrix X is close to being singular, which means computation of its inverse

will be numerically unstable. In PCA and PCR, the rank-deficiency problem is solved

by transforming the original variable space into PCA space and deleting principal

components corresponding to the smallest, closest to zero, eigenvalues. The result

is that the condition number of the new matrix X′ is better (lower) than the condition

number of the original matrix, X.



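$$\operatorname{cond}(\mathbf{X}) = \frac{\lambda_1}{\lambda_R} > \operatorname{cond}(\mathbf{X}') = \frac{\lambda_1}{\lambda_{pc}}; \qquad \lambda_R < \lambda_{pc} \tag{8.88}$$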
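The effect described by Equations 8.87 and 8.88 is easy to see on synthetic data. The sketch below builds a nearly rank-3 matrix, truncates it to three components with the singular value decomposition, and compares the two condition numbers. It is an illustration under stated assumptions: the data are random, mean-centering is omitted, and the condition number is computed from singular values of X rather than eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "spectra": three underlying sources plus a very small noise term,
# so X is numerically close to rank 3.
scores = rng.normal(size=(30, 3))
loadings = rng.normal(size=(3, 8))
X = scores @ loadings + 1e-6 * rng.normal(size=(30, 8))

# Singular values are returned in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

pc = 3                                        # number of retained components
X_prime = (U[:, :pc] * s[:pc]) @ Vt[:pc]      # Equation 8.87

cond_full = s[0] / s[-1]          # ratio using the smallest singular value
cond_reduced = s[0] / s[pc - 1]   # ratio using the smallest retained one

print("cond(X):  %.3e" % cond_full)
print("cond(X'): %.3e" % cond_reduced)
```

Dropping the noise-dominated components changes X only slightly but reduces the condition number by several orders of magnitude.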



Finally, we turn to the problem of selecting the best experimental design, i.e.,

a subset of samples for passive experiments, as was outlined in the naphtha example.

To construct an optimal design that is robust against ill conditioning of the design

matrix, X, we use the E-optimality criterion. A design is E-optimal if it minimizes

the maximum eigenvalue of the dispersion matrix, M^(-1) = (F^T F)^(-1). The name of the

criterion originates from the first letter of the word “eigenvalue.”

$$\max_i \delta_i\left[(\mathbf{M}^*)^{-1}\right] = \min_{x} \max_i \delta_i\left[\mathbf{M}^{-1}\right], \qquad i = 1, \ldots, R, \tag{8.89}$$



where δi[X] represents the ith eigenvalue of a matrix X, M* is the information matrix of the E-optimal design, and R designates the rank of the

dispersion matrix.
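To make the E-optimality criterion concrete, the sketch below selects a subset of candidate rows by exhaustive search. It is not the authors' algorithm: brute force is practical only for a handful of candidates, and a real application with hundreds of spectra would need an exchange-type search, typically on PCA scores. The function name `e_optimal_subset` and the toy candidate set are assumptions. For a nonsingular information matrix, minimizing the largest eigenvalue of the dispersion matrix is the same as maximizing the smallest eigenvalue of F^T F, which is what the code does.

```python
import itertools
import numpy as np

def e_optimal_subset(F, n_select):
    """Pick n_select rows of F maximizing the smallest eigenvalue of F_s^T F_s."""
    best_rows, best_min_eig = None, -np.inf
    for rows in itertools.combinations(range(F.shape[0]), n_select):
        F_s = F[list(rows)]
        M = F_s.T @ F_s
        min_eig = np.linalg.eigvalsh(M)[0]   # eigvalsh returns ascending order
        if min_eig > best_min_eig + 1e-12:
            best_rows, best_min_eig = rows, min_eig
    return best_rows, best_min_eig

# Toy candidate set: one factor at five levels, model f(x) = [1, x].
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
F = np.column_stack([np.ones_like(x), x])

rows, min_eig = e_optimal_subset(F, 2)
print("selected candidate rows:", rows)
print("smallest eigenvalue of the information matrix:", min_eig)
```

For this toy case the search picks the two extreme levels, which matches the intuition that points at the edges of the factor space carry the most information for a linear model.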


