1 INTRODUCTION: GENERAL CONCEPT, AMBIGUITIES, RESOLUTION THEOREMS


DK4712_C011.fm Page 419 Thursday, March 16, 2006 3:40 PM



Multivariate Curve Resolution






In the resolution of any multicomponent system, the main goal is to transform the raw experimental measurements into useful information. By doing so, we aim to obtain a clear description of the contribution of each of the components present in the mixture or the process from the overall measured variation in our chemical data. Despite the diverse nature of multicomponent systems, the variation in their related experimental measurements can, in many cases, be expressed as a simple composition-weighted linear additive model of pure responses, with a single term per component contribution. Although such a model is often known to hold because of the nature of the instrumental responses measured (e.g., in the case of spectroscopic measurements), the information related to the individual contributions involved cannot be derived in a straightforward way from the raw measurements. The common purpose of all multivariate resolution methods is to fill this gap and provide a linear model of individual component contributions using solely the raw experimental measurements. Resolution methods are powerful approaches that do not require much prior information, because neither the number nor the nature of the pure components in a system needs to be known beforehand. Any information available about the system may be used, but it is not required. In fact, the only mandatory prerequisite is the inherent linear structure of the data set. These mild requirements have promoted the use of resolution methods to tackle many chemical problems that could not be solved otherwise.

All resolution methods mathematically decompose a global instrumental response of mixtures into the contributions linked to each of the pure components in the system [1–10]. This global response is organized into a matrix D containing raw measurements about all of the components present in the data set. Resolution methods decompose the initial mixture data matrix D into the product of two data matrices, C and S^T, each of them containing the pure response profiles of the n mixture or process components associated with the row and the column directions of the initial data matrix, respectively (see Figure 11.2). In matrix notation, the expression for all resolution methods is:

$$\mathbf{D} = \mathbf{C}\mathbf{S}^{T} + \mathbf{E} \tag{11.1}$$



where D (r × c) is the original data matrix, C (r × n) and S^T (n × c) are the matrices containing the pure-component profiles related to the data variation in the row direction and in the column direction, respectively, and E (r × c) is the error matrix, i.e., the residual variation of the data set that is not related to any chemical contribution. The variables r and c represent the number of rows and the number of columns of the original data matrix, respectively, and n is the number of chemical components in the mixture or process. C and S^T often refer to concentration profiles and spectra (hence their abbreviations and the denomination we will often adopt in this chapter), although resolution methods have proven to work in many other, diverse problems [13–20].
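To make the dimensions in Equation 11.1 concrete, the following Python/NumPy sketch builds a synthetic HPLC-DAD-like data set as D = CS^T + E. The Gaussian elution and spectral profiles, the matrix sizes, the noise level and the helper names `gaussian` and `simulate_hplc_dad` are illustrative assumptions, not data from the chapter.

```python
import numpy as np

def gaussian(x, center, width):
    """Gaussian band, used here for both elution and spectral profiles."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def simulate_hplc_dad(r=50, c=40, noise=0.0, seed=0):
    """Return D = C S^T + E together with the true C (r x n) and S (c x n).

    Rows play the role of retention times, columns of wavelengths; band
    positions, widths and n = 3 are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    times, wavelengths = np.arange(r), np.arange(c)
    C = np.column_stack([gaussian(times, m, 4.0) for m in (15, 25, 33)])
    S = np.column_stack([gaussian(wavelengths, m, 6.0) for m in (10, 20, 28)])
    E = noise * rng.standard_normal((r, c))   # residual (error) matrix
    D = C @ S.T + E
    return D, C, S

D, C, S = simulate_hplc_dad()
print(D.shape, C.shape, S.T.shape)   # (50, 40) (50, 3) (3, 40)
print(np.linalg.matrix_rank(D))      # 3: one term per component
```

With `noise=0.0` the data matrix has exactly rank n, which is the "inner linear structure" that resolution methods rely on; a nonzero `noise` adds the E term.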

From the early days of resolution research, the mathematical decomposition of a single data matrix, no matter the method used, has been known to be subject to ambiguities [1, 2]. This means that many pairs of C- and S^T-type matrices can be found that reproduce the original data set with the same fit quality. In plain words, the correct reproduction of the original data matrix can be achieved by using component profiles that differ in shape (rotational ambiguity) or in magnitude (intensity ambiguity) from the sought (true) ones [21].

© 2006 by Taylor & Francis Group, LLC

Practical Guide to Chemometrics

[Figure: the mixed information in D (absorbance versus retention times and wavelengths) equals the product of C (pure concentration profiles c1, …, cn versus retention times) and S^T (pure signals s1, …, sn versus wavelengths).]

FIGURE 11.2 Resolution of a multicomponent chromatographic HPLC-DAD run (D matrix) into its pure concentration profiles (C matrix, chromatograms) and responses (S^T matrix, spectra) [10].

These two kinds of ambiguity can be easily explained. The basic equation associated with resolution methods, D = CS^T, can be transformed as follows:

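$$\mathbf{D} = \mathbf{C}\,(\mathbf{T}\mathbf{T}^{-1})\,\mathbf{S}^{T} \tag{11.2}$$

$$\mathbf{D} = (\mathbf{C}\mathbf{T})\,(\mathbf{T}^{-1}\mathbf{S}^{T}) \tag{11.3}$$

$$\mathbf{D} = \mathbf{C}'\,\mathbf{S}'^{T} \tag{11.4}$$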



where C′ = CT and S′^T = T^(−1) S^T, with T any invertible n × n matrix, describe the D matrix as correctly as the true C and S^T matrices do, even though C′ and S′^T are not the sought solutions. As a result of this rotational ambiguity, a resolution method can potentially provide as many solutions as there are admissible T matrices. In general, this is an infinite set of solutions unless C and S are forced to obey certain conditions. In a hypothetical case with no rotational ambiguity, that is, when the shapes of the profiles in C and S are correctly recovered, the basic resolution model could still be subject to intensity ambiguity, as shown in Equation 11.5:

$$\mathbf{D} = \sum_{i=1}^{n} \left(\frac{1}{k_i}\,\mathbf{c}_i\right)\left(k_i\,\mathbf{s}_i^{T}\right) \tag{11.5}$$






where k_i are nonzero scalars and n refers to the number of components. Each concentration profile of the new C′ matrix (Equation 11.4) would have the same shape as the real one but would be k_i times smaller, whereas the related spectra of the new S′^T matrix (Equation 11.4) would be equal in shape to the real spectra, though k_i times more intense.
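Both ambiguities are easy to verify numerically. The sketch below assumes random nonnegative "true" profiles and an arbitrarily chosen invertible T and scaling vector k (all hypothetical), and checks that the rotated pair (CT, T^(−1)S^T) of Equations 11.2 to 11.4 and the rescaled pair of Equation 11.5 both reproduce D, although only the rescaled pair keeps the profile shapes.

```python
import numpy as np

def same_shapes(A, B):
    """True if corresponding columns of A and B are proportional (positive factor)."""
    return np.allclose(A / np.linalg.norm(A, axis=0), B / np.linalg.norm(B, axis=0))

rng = np.random.default_rng(1)
r, c, n = 30, 20, 2
C = rng.random((r, n))            # hypothetical "true" concentration profiles
S = rng.random((c, n))            # hypothetical "true" spectra (columns of S)
D = C @ S.T

# Rotational ambiguity (Equations 11.2-11.4): any invertible n x n matrix T
T = np.array([[1.0, 0.4],
              [-0.3, 1.0]])       # det(T) = 1.12, so T is invertible
C_rot = C @ T                     # C' = CT
St_rot = np.linalg.inv(T) @ S.T   # S'^T = T^-1 S^T
print(np.allclose(D, C_rot @ St_rot))   # True: same fit quality
print(same_shapes(C_rot, C))            # False: profile shapes have changed

# Intensity ambiguity (Equation 11.5): scale each dyad by 1/k_i and k_i
k = np.array([2.0, 0.5])
C_int = C / k                     # column i divided by k_i: k_i times smaller
St_int = k[:, None] * S.T         # row i multiplied by k_i: k_i times more intense
print(np.allclose(D, C_int @ St_int))   # True: same fit quality
print(same_shapes(C_int, C))            # True: shapes preserved, only scale differs
```

Because the fit alone cannot distinguish these solutions, conditions such as nonnegativity are what narrow the set of acceptable T matrices.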

The correct performance of any curve-resolution (CR) method depends strongly on the complexity of the multicomponent system. In particular, the ability to correctly recover dyads of pure profiles and spectra for each of the components in the system depends on the degree of overlap among the pure profiles of the different components and on the specific way in which the regions of existence of these profiles (the so-called concentration or spectral windows) are distributed along the row and column directions of the data set. Manne stated the necessary conditions for correct resolution of the concentration profile and spectrum of a component in the following two theorems [22]:

1. The true concentration profile of a compound can be recovered when all of the compounds inside its concentration window are also present outside.
2. The true spectrum of a compound can be recovered if its concentration window is not completely embedded inside the concentration window of a different compound.

According to Figure 11.3, the pure concentration profile of component B can be recovered because A is present both inside and outside B's concentration window; however, B's pure spectrum cannot be recovered because its concentration profile is totally embedded under that of the major compound, A. Analogously, the pure spectrum of A can be obtained, but not its pure concentration profile, because B is present inside A's concentration window but not outside.
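Because both theorems are statements about how windows overlap, they can be checked mechanically once the windows are known. The sketch below represents each compound's concentration window as a set of row indices (the index ranges are hypothetical, chosen to resemble Figure 11.3) and encodes the two conditions as worded above; the function names are illustrative. The same functions apply unchanged to spectral windows expressed as sets of column indices.

```python
def profile_recoverable(target, windows):
    """Theorem 1 as stated above: every other compound found inside the
    target's concentration window must also be present outside it."""
    inside = windows[target]
    for other, rows in windows.items():
        if other != target and rows & inside and not rows - inside:
            return False
    return True


def spectrum_recoverable(target, windows):
    """Theorem 2 as stated above: the target's window must not be completely
    embedded in the window of a different compound."""
    inside = windows[target]
    return not any(other != target and inside <= rows
                   for other, rows in windows.items())


# Hypothetical windows mimicking Figure 11.3: B's window lies inside A's
windows = {"A": set(range(0, 60)), "B": set(range(20, 40))}
for name in windows:
    print(name, profile_recoverable(name, windows), spectrum_recoverable(name, windows))
# A False True
# B True False
```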

The same formulation of these two theorems holds when the "spectral" windows in columns are considered instead of the concentration windows in rows. In this sense, the theorems show that the quality of the resolution results depends more strongly on the features of the data set than on the mathematical background of the CR method selected. Therefore, good knowledge of the properties of a data set before carrying out a resolution calculation gives a clear idea of the quality of the results that can be expected.

[Figure: concentration profiles of components A and B.]

FIGURE 11.3 Concentration profiles for a two-component system (see comments in text related to resolution).






11.2 HISTORICAL BACKGROUND

The field of curve resolution was born in response to the need for a tool to analyze multivariate experimental data from multicomponent dynamic systems. The common goal of all curve-resolution methods is to mathematically decompose the global instrumental response into the pure-component profiles of each of the components in the system. The use of these methods has become a valuable aid for resolving complex systems, especially when obtaining selective signals for individual species is not experimentally possible, too complex, or too time consuming.

Two pioneering papers on curve resolution were published by Lawton and Sylvestre in the early 1970s [1, 2]. In them, a mixture-analysis resolution problem was described in mathematical terms for the case of a simple two-component spectral mixture. Interestingly, several concepts introduced in these early papers were the precursors of the ideas underlying most of the curve-resolution methods developed afterward: pure-component solutions expressed as linear combinations of the measured spectra and vice versa; the subspace spanned by the "true" solutions in relation to the subspace spanned by PCA (principal component analysis) solutions; and a range, or band, of feasible solutions, together with the way to reduce the width of this band by means of constraints such as nonnegativity and closure (mass balance) equations. Later on, these ideas were reformulated more precisely using the concepts of rotational and intensity ambiguities [23], which are found ubiquitously in all factor-analysis bilinear matrix decomposition methods.

The extension of Lawton and Sylvestre's curve resolution from two- to three-component systems was presented by Borgen et al. [3, 4], with a focus on the optimization of ranges of feasible solutions. At the same time, the first edition of Malinowski's book Factor Analysis in Chemistry [24] appeared [25], presenting a review of updated concepts and applications. For many researchers in this field, Malinowski's book could be considered the consolidation of the incipient subject of chemometrics, at a time when this term was not yet widely accepted.

The main goal of factor analysis, i.e., the recovery of the underlying "true" factors causing the observed variance in the data, is identical to the main goal of curve-resolution methods. In factor analysis, "abstract" factors are clearly distinguished from "true" factors, and the key operation is to find a transformation from the abstract factors to the true factors using rotation methods. Two types of rotations are usually used: orthogonal rotations and oblique rotations. Principal component analysis (PCA), or principal factor analysis (PFA), produces an orthogonal bilinear matrix decomposition in which components or factors are obtained sequentially to explain maximum variance (see Chapter 4, Section 4.3, for more details). With these constraints (orthogonality and maximum explained variance) plus normalization, PCA produces unique solutions. These "abstract," unique, and orthogonal (independent) solutions are very helpful in deducing the number of different sources of variation present in the data. However, they are "abstract" in the sense that they are not the "true" underlying factors causing the data variation, but orthogonal linear combinations of them. In curve-resolution methods, on the other hand, the goal is to unravel the "true" underlying sources of data variation. The question is not only how many different sources are present and how they can be interpreted, but also what they actually look like. The price to pay is that curve-resolution methods do not usually yield unique solutions unless external information is provided during the matrix decomposition.
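The role of PCA described above can be sketched with NumPy's singular value decomposition (`np.linalg.svd`). The example assumes a synthetic three-component data set with a known noise standard deviation; counting singular values clearly above the expected largest noise singular value is a simple rule chosen here for illustration, not a method prescribed by the chapter. It shows that the orthogonal abstract factors give the number of components and span the same space as the true profiles, yet, unlike them, take both signs.

```python
import numpy as np

rng = np.random.default_rng(2)
r, c, sigma = 60, 30, 1e-3                     # assumed sizes and noise std
t, w = np.arange(r)[:, None], np.arange(c)[:, None]
# Hypothetical true profiles: three Gaussian elution peaks and three spectra
C = np.exp(-0.5 * ((t - np.array([15, 28, 40])) / 4.0) ** 2)   # r x 3
S = np.exp(-0.5 * ((w - np.array([8, 15, 22])) / 5.0) ** 2)    # c x 3
D = C @ S.T + sigma * rng.standard_normal((r, c))

# PCA via SVD: orthonormal abstract factors in U (rows) and Vt (columns)
U, sv, Vt = np.linalg.svd(D, full_matrices=False)

# A pure-noise r x c matrix has largest singular value near sigma*(sqrt(r)+sqrt(c));
# count singular values clearly above that level (the factor 3 is an assumption).
noise_sv = sigma * (np.sqrt(r) + np.sqrt(c))
n_est = int(np.sum(sv > 3 * noise_sv))
print("estimated number of components:", n_est)

# The first n_est abstract factors span (almost) the same space as the true C ...
U_n = U[:, :n_est]
C_proj = U_n @ (U_n.T @ C)
print("max projection error:", np.abs(C_proj - C).max())
# ... but they are orthogonal combinations with negative parts, unlike C.
print("second abstract factor takes both signs:", U[:, 1].min() < 0 < U[:, 1].max())
```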

Different approaches have been proposed in recent years to improve the solutions obtained by curve-resolution methods, and some of them are summarized in the next sections. The field is already mature and, as has recently been pointed out [26], multivariate curve resolution can be considered a "sleeping giant of chemometrics," with slow but persistent growth.

Whenever the goals of curve resolution are achieved, the understanding of a chemical system is dramatically increased and facilitated, avoiding the use of enhanced and much more costly experimental techniques. Through multivariate-resolution methods, the ubiquitous mixture-analysis problem in chemistry (and in other scientific fields) is solved directly by mathematical and software tools instead of costly analytical chemistry and instrumental tools, such as sophisticated "hyphenated" mass spectrometry-chromatographic methods.



11.3 LOCAL RANK AND RESOLUTION: EVOLVING FACTOR ANALYSIS AND RELATED TECHNIQUES

Manne's resolution theorems clearly stated how the distribution of the concentration and spectral windows of the different components in a data set could affect the quality of the pure profiles recovered after data analysis [22]. Correct knowledge of these windows is the cornerstone of some resolution methods; in others, where it is not essential, information derived from this knowledge can be introduced to improve the results obtained.

The boundaries of the windows of the different components can only be set if we know how the number and nature of the components change in the data set. Obtaining this information is the main goal of local-rank analysis methods, which are used to locate and describe the evolution of each component in a system. This is accomplished by combining the information obtained from multiple rank analyses performed locally on limited zones (row or column windows) of the data set.

Some local-rank analysis methods, such as evolving-factor analysis (EFA) [27–29], are more process oriented and rely on the sequential evolution of the components as a function of time or of any other variable in the data set, while others, such as fixed-size moving-window evolving-factor analysis (FSMW-EFA) [30, 31], can be applied to both processes and mixtures. EFA and FSMW-EFA are the two pioneering local-rank analysis methods and can still be considered the most representative and widely used.

Evolving-factor analysis was born as the chemometric way to monitor evolving chemical processes, such as HPLC diode-array data, batch reactions, or titration data [27, 28]. The evolution of a chemical system is gradually measured by recording a new response vector at each stage of the process under study. Mimicking the experimental protocol, EFA performs principal component analyses on submatrices of gradually increasing size in the process direction, enlarged by adding one row (response) at a time. This procedure is performed from the top to the bottom of the data set (forward EFA) and from the bottom to the top (backward EFA) to investigate the emergence and the decay of the process contributions, respectively. Figure 11.4b displays the information provided by EFA for an HPLC-DAD example and how to interpret the results.

Each time a new row is added to the expanding submatrix (Figure 11.4b), a PCA model is computed and the corresponding singular values or eigenvalues are saved. The forward EFA curves (thin solid lines) are produced by plotting the saved singular values or log(eigenvalues) obtained from PCA of the submatrix expanding in the forward direction. The backward EFA curves (thin dashed lines) are produced by plotting the singular values or log(eigenvalues) obtained from PCA of the submatrix expanding in the backward direction. The lines connecting corresponding singular values (s.v.), i.e., all of the first s.v., all of the second s.v., and so on up to the ith s.v., show the evolution of the singular values along the process and, as a consequence, the variation of the process components. The rise of a new singular value above the noise level, delineated by the pool of nonsignificant singular values, indicates the emergence of a new component (forward EFA) or the disappearance of a component (backward EFA) in the process.
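The forward and backward EFA computations can be sketched in a few lines of NumPy. This sketch records singular values (rather than log eigenvalues) and assumes synthetic, sequentially eluting data with a known noise level, so that a fixed threshold of 0.05 (an assumption) separates significant from nonsignificant singular values; the helper name `efa` is hypothetical.

```python
import numpy as np

def efa(D, n_sv):
    """Forward and backward evolving factor analysis.

    forward[i] holds the n_sv largest singular values of D[:i+1] (rows 0..i),
    backward[i] those of D[i:] (rows i..end). Values that do not exist yet
    (submatrix with fewer than n_sv rows) are left at zero.
    """
    r = D.shape[0]
    forward = np.zeros((r, n_sv))
    backward = np.zeros((r, n_sv))
    for i in range(r):
        sv_f = np.linalg.svd(D[: i + 1], compute_uv=False)[:n_sv]
        sv_b = np.linalg.svd(D[i:], compute_uv=False)[:n_sv]
        forward[i, : sv_f.size] = sv_f
        backward[i, : sv_b.size] = sv_b
    return forward, backward


# Hypothetical sequential process: three Gaussian elution peaks, 60 times x 30 wavelengths
rng = np.random.default_rng(3)
t, w = np.arange(60)[:, None], np.arange(30)[:, None]
C = np.exp(-0.5 * ((t - np.array([15, 28, 40])) / 4.0) ** 2)
S = np.exp(-0.5 * ((w - np.array([8, 15, 22])) / 5.0) ** 2)
D = C @ S.T + 1e-3 * rng.standard_normal((60, 30))

fwd, bwd = efa(D, n_sv=4)
threshold = 0.05  # assumed noise level, above the pool of nonsignificant singular values

# Forward EFA: first row at which the j-th singular value rises above the noise
fwd_emerge = [int(np.argmax(col > threshold)) for col in fwd[:, :3].T]
# Backward EFA: last starting row at which the j-th singular value is still significant
bwd_last = [int(np.flatnonzero(col > threshold)[-1]) for col in bwd[:, :3].T]
print("forward EFA, new component at rows:", fwd_emerge)
print("backward EFA, component gone after rows:", bwd_last)
```

Plotting `fwd` and `bwd` (or their logarithms) against the row index gives the thin solid and dashed curves described above; in this three-component example the fourth column stays in the noise pool.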

Figure 11.4b also shows how to build initial estimates of concentration profiles from the overlapped forward and backward EFA curves, as long as the process evolves in a sequential way (see the thick lines in Figure 11.4b). For a system with n significant components, the profile of the first component is obtained by combining the curve representing the first s.v. of the forward EFA plot and the curve representing the nth s.v. of the backward EFA plot. Note that the nth s.v. in the backward EFA plot is related to the disappearance of the first component that emerged in the forward EFA plot. The profile of the second component is obtained by splicing the curve representing the second s.v. in the forward EFA plot to the curve representing the (n − 1)th s.v. of the backward EFA plot, and so forth. Combining the two curves into one profile is easily accomplished in a computer program by selecting, at each point, the minimum of the two s.v. lines to be combined. It can be seen that the resulting four elution profiles obtained by EFA are good approximations of the real profiles shown in Figure 11.4a.
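The splicing rule just described, pairing the jth forward s.v. with the (n − j + 1)th backward s.v. and keeping the pointwise minimum, is sketched below on hypothetical three-component data. Setting values below the assumed noise threshold to zero is an extra step added here so that each component's concentration window (first and last nonzero row) can be read directly from the estimates; the data, threshold and helper name `efa` are assumptions.

```python
import numpy as np

def efa(D, n_sv):
    """Forward/backward EFA: singular values of D[:i+1] and of D[i:] for each row i."""
    r = D.shape[0]
    fwd, bwd = np.zeros((r, n_sv)), np.zeros((r, n_sv))
    for i in range(r):
        sv_f = np.linalg.svd(D[: i + 1], compute_uv=False)[:n_sv]
        sv_b = np.linalg.svd(D[i:], compute_uv=False)[:n_sv]
        fwd[i, : sv_f.size], bwd[i, : sv_b.size] = sv_f, sv_b
    return fwd, bwd

# Hypothetical sequential process: three overlapping Gaussian elution peaks
rng = np.random.default_rng(3)
t, w = np.arange(60)[:, None], np.arange(30)[:, None]
C = np.exp(-0.5 * ((t - np.array([15, 28, 40])) / 4.0) ** 2)
S = np.exp(-0.5 * ((w - np.array([8, 15, 22])) / 5.0) ** 2)
D = C @ S.T + 1e-3 * rng.standard_normal((60, 30))

n, threshold = 3, 0.05            # assumed number of components and noise level
fwd, bwd = efa(D, n)

# Component j (0-based): j-th forward curve spliced with the (n-1-j)-th backward
# curve by keeping the smaller of the two values at each row.
C_init = np.column_stack([np.minimum(fwd[:, j], bwd[:, n - 1 - j]) for j in range(n)])
C_init[C_init < threshold] = 0.0  # extra step: treat subnoise values as zero concentration

# Concentration window of each component: first and last row with a nonzero estimate
windows = [(int(np.flatnonzero(col)[0]), int(np.flatnonzero(col)[-1])) for col in C_init.T]
print("concentration windows (rows):", windows)
```

Such estimates are only starting points: they locate each component correctly but have the shape of singular-value curves, not true concentrations.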

The information provided by the EFA plots can be used to detect and locate the emergence and decay of the compounds in an evolving process. As a consequence, the concentration window and the zero-concentration region of each component in the system are easily determined for any process in which the emergence and decay of the components occur sequentially. For example, the concentration window of the first component to elute is shown as a shaded zone in Figure 11.4b. This type of information has given rise to most of the noniterative resolution methods explained in Section 11.4 [32–39]. Iterative resolution methods, explained in Section 11.5, use the EFA-derived estimates of the concentration profiles as a starting point for an iterative optimization [40, 41]. The location of selective zones, and of zones with fewer compounds than the total rank, can also be introduced as additional information to minimize the ambiguity in the resolved profiles [21, 41, 42].

As mentioned earlier, FSMW-EFA is not restricted in its applicability to evolving processes, although the interpretation of the final results is richer for this kind of problem.
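FSMW-EFA is only named in this part of the chapter. Going by its name, the sketch below slides a window of fixed size along the rows, one row at a time, and records the singular values of each windowed submatrix; counting the values above the noise level then gives a local rank for every window position. The window size of 5 rows, the threshold of 0.05, the synthetic data and the helper name `fsmw_efa` are assumptions made for illustration.

```python
import numpy as np

def fsmw_efa(D, window=5, n_sv=4):
    """Singular values of each block of `window` consecutive rows of D.

    Row i of the result belongs to the submatrix D[i:i+window]; the window
    keeps its size while it moves one row at a time along D.
    """
    r = D.shape[0]
    out = np.zeros((r - window + 1, n_sv))
    for i in range(r - window + 1):
        sv = np.linalg.svd(D[i:i + window], compute_uv=False)[:n_sv]
        out[i, : sv.size] = sv
    return out

# Hypothetical data: three overlapping Gaussian elution peaks, 60 rows x 30 columns
rng = np.random.default_rng(4)
t, w = np.arange(60)[:, None], np.arange(30)[:, None]
C = np.exp(-0.5 * ((t - np.array([15, 28, 40])) / 4.0) ** 2)
S = np.exp(-0.5 * ((w - np.array([8, 15, 22])) / 5.0) ** 2)
D = C @ S.T + 1e-3 * rng.standard_normal((60, 30))

threshold = 0.05                       # assumed noise level
sv_windows = fsmw_efa(D, window=5)
local_rank = (sv_windows > threshold).sum(axis=1)
print(local_rank)                      # local rank of each 5-row window
```

Unlike forward and backward EFA, nothing here assumes that components appear and disappear in sequence, which is why the fixed-window idea also suits mixtures.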


