
quickly and with low noise, rather than measuring its absorbance at a single wavelength. By properly considering the distribution of multiple variables simultaneously, we obtain more information than could be obtained by considering each variable individually. This is one of the so-called multivariate advantages. The additional information comes to us in the form of correlation: when we look at one variable at a time, we neglect the correlation between variables and hence miss part of the picture.

A recent paper by Bro described four additional advantages of multivariate methods compared with univariate methods [1]. Noise reduction is possible when multiple redundant variables are analyzed simultaneously by proper multivariate methods. For example, low-noise factors can be obtained when principal component analysis is used to extract a few meaningful factors from UV spectra measured at hundreds of wavelengths. Another important multivariate advantage is that partially selective measurements can be used; with proper multivariate methods, results can be obtained that are free of the effects of interfering signals. A third advantage is that false samples can be easily discovered, for example in spectroscopic analysis. For any well-characterized chemometric method, aliquots of material measured in the future should be properly explained by linear combinations of the training-set or calibration spectra. If new, foreign materials are present that give spectroscopic signals slightly different from the expected ingredients, these can be detected in the spectral residuals and the corresponding aliquot flagged as an outlier or "false sample." The advantages of chemometrics are often the consequence of using multivariate methods. The reader will find these and other advantages highlighted throughout the book.



1.4 HOW TO USE THIS BOOK

This book is suitable for use as an introductory textbook in chemometrics or as a self-study guide. Each of the chapters is self-contained, and together they cover many of the main areas of chemometrics. The early chapters cover tutorial topics and fundamental concepts, starting with a review of basic statistics in Chapter 2, including hypothesis testing. The aim of Chapter 2 is to review suitable protocols for the planning of experiments and the analysis of the data, primarily from a univariate point of view. Topics covered include defining a research hypothesis and then implementing statistical tools that can be used to determine whether the stated hypothesis is supported by the data. Chapter 3 builds on the concept of the univariate normal distribution and extends it to the multivariate normal distribution. An example is given showing the analysis of near-infrared spectral data for raw material testing, where two degradation products were detected at 0.5% to 1% by weight. Chapter 4 covers principal component analysis (PCA), one of the workhorse methods of chemometrics and a topic that all basic or introductory courses in chemometrics should cover. Chapter 5 covers multivariate calibration, including partial least-squares, one of the most common application areas for chemometrics. Multivariate calibration refers generally to mathematical methods that transform an instrument's response into an estimate of a more informative chemical or physical variable, e.g., the concentration of a target analyte. Together, Chapters 3, 4, and 5 form the introductory core material of this book.






The remaining chapters of the book introduce some of the advanced topics of chemometrics. The coverage is fairly comprehensive, in that these chapters cover some of the most important advanced topics. Chapter 6 presents the concept of robust multivariate methods, which are insensitive to the presence of outliers. Most of the methods described in Chapter 6 can tolerate data sets contaminated with up to 50% outliers without detrimental effects. Descriptions of algorithms and examples are provided for robust estimators of the multivariate normal distribution, robust PCA, and robust multivariate calibration, including robust PLS. As such, Chapter 6 provides an excellent follow-up to Chapters 3, 4, and 5.

Chapter 7 covers the advanced topic of nonlinear multivariate model estimation, with its primary examples taken from chemical kinetics. Chapter 8 covers the important topic of experimental design. While its position in the arrangement of this book comes somewhat late, we feel it will be much easier for the reader or student to recognize important applications of experimental design after the chapters on calibration and nonlinear model estimation. Chapter 9 covers multivariate classification and pattern recognition. These methods are designed to seek relationships that describe the similarity or dissimilarity between diverse groups of data, thereby revealing common properties among the objects in a data set. With proper multivariate approaches, a large number of features can be studied simultaneously. Examples of applications in this area of chemometrics include identification of the source of pollutants, detection of unacceptable raw materials, classification of unlabeled pharmaceutical products for clinical trials through intact blister packs, detection of the presence or absence of disease in a patient, and food quality testing, to name a few.

Chapter 10, Signal Processing and Digital Filtering, is concerned with mathematical methods that are intended to enhance signals by decreasing the contribution of noise. In this way, the "true" signal can be recovered from a signal distorted by other effects. Chapter 11, Multivariate Curve Resolution, describes methods for the mathematical resolution of multivariate data sets from evolving systems into descriptive models showing the contributions of the pure constituents. The ability to correctly recover pure concentration profiles and spectra for each of the components in the system depends on the degree of overlap among the pure profiles of the different components and the specific way in which the regions of these profiles overlap. Chapter 12 describes three-way calibration methods, an active area of research in chemometrics, including methods such as the generalized rank annihilation method (GRAM) and parallel factor analysis (PARAFAC). The main advantage of three-way calibration methods is their ability to estimate analyte concentrations in the presence of unknown, uncalibrated spectral interferents. Chapter 13 reviews some of the most active areas of research in chemometrics.



1.4.1 SOFTWARE APPLICATIONS

Our experience in learning chemometrics and teaching it to others has demonstrated repeatedly that people learn new techniques by using them to solve interesting problems. For this reason, many of the contributing authors to this book have chosen to illustrate their chemometric methods with examples using Microsoft® Excel, MATLAB, or other powerful computer applications. For many research groups in chemometrics, MATLAB has become a workhorse research tool, and numerous public-domain MATLAB software packages for doing chemometrics can be found on the World Wide Web. MATLAB is an interactive computing environment that takes the drudgery out of using linear algebra to solve complicated problems. It integrates computer graphics, numerical analysis, and matrix computations into one simple-to-use package, available on a wide range of personal computers and workstations, including IBM-compatible and Macintosh computers. It is especially well suited to solving complicated matrix equations using a simple, "algebra-like" notation. Because some of the authors have chosen to use MATLAB, we are able to provide you with some example programs; the equivalent programs in BASIC, Pascal, FORTRAN, or C would be too long and complex for illustrating the examples in this book. It will also be much easier for you to experiment with the methods presented in this book by trying them out on your own data sets and modifying them to suit your special needs. Those who want to learn more about MATLAB should consult the manuals shipped with the program and the numerous web sites that present tutorials describing its use.
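
Because MATLAB figures so prominently in the chapters that follow, a small taste of its notation may help here. The sketch below is our own illustration, not an example from the book, and the data and variable names are hypothetical: it sets up a tiny calibration-style problem and solves the matrix equation with the backslash operator mentioned above.

```matlab
% Hypothetical mixture problem: A*c = x, solved by least squares.
% Columns of A hold pure-component spectra at three wavelengths.
A = [0.10 0.80;
     0.45 0.30;
     0.90 0.05];
c_true = [2.0; 1.5];              % assumed "true" concentrations
x = A*c_true + 0.01*randn(3, 1);  % measured mixture spectrum plus noise

c_hat = A \ x;                    % "algebra-like" least-squares solution
disp(c_hat')                      % close to [2.0 1.5]
```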



1.5 GENERAL READING ON CHEMOMETRICS

A growing number of books, some of a specialized nature, are available on chemometrics. A brief summary of the more general texts is given here as guidance for the reader. Each chapter, however, has its own list of selected references.



JOURNALS

1. Journal of Chemometrics (Wiley) — good for fundamental papers and applications of advanced algorithms.
2. Chemometrics and Intelligent Laboratory Systems (Elsevier) — good for conference information; has a tutorial approach and is not too mathematically heavy.
3. Papers on chemometrics can also be found in many of the more general analytical journals, including Analytica Chimica Acta, Analytical Chemistry, Applied Spectroscopy, Journal of Near Infrared Spectroscopy, Journal of Process Control, and Technometrics.



BOOKS

1. Adams, M.J. Chemometrics in Analytical Spectroscopy, 2nd ed., The Royal Society of Chemistry: Cambridge. 2004.
2. Beebe, K.R., Pell, R.J., and Seasholtz, M.B. Chemometrics: A Practical Guide. John Wiley & Sons: New York. 1998.
3. Box, G.E.P., Hunter, W.G., and Hunter, J.S. Statistics for Experimenters. John Wiley & Sons: New York. 1978.
4. Brereton, R.G. Chemometrics: Data Analysis for the Laboratory and Chemical Plant. John Wiley & Sons: Chichester, U.K. 2002.
5. Draper, N.R. and Smith, H. Applied Regression Analysis, 2nd ed., John Wiley & Sons: New York. 1981.
6. Jackson, J.E. A User's Guide to Principal Components. John Wiley & Sons: New York. 1991.
7. Jolliffe, I.T. Principal Component Analysis. Springer-Verlag: New York. 1986.
8. Kowalski, B.R., Ed. NATO ASI Series C, Mathematical and Physical Sciences, Vol. 138: Chemometrics, Mathematics, and Statistics in Chemistry. Reidel, published in cooperation with the NATO Scientific Affairs Division: Dordrecht. 1984.
9. Kowalski, B.R., Ed. Chemometrics: Theory and Application. ACS Symposium Series 52. American Chemical Society: Washington, DC. 1977.
10. Malinowski, E.R. Factor Analysis in Chemistry, 2nd ed., John Wiley & Sons: New York. 1991.
11. Martens, H. and Næs, T. Multivariate Calibration. John Wiley & Sons: Chichester, U.K. 1989.
12. Massart, D.L., Vandeginste, B.G.M., Buydens, L.M.C., De Jong, S., Lewi, P.J., and Smeyers-Verbeke, J. Handbook of Chemometrics and Qualimetrics, Parts A and B. Elsevier: Amsterdam. 1997.
13. Miller, J.C. and Miller, J.N. Statistics and Chemometrics for Analytical Chemistry, 4th ed., Prentice Hall: Upper Saddle River, N.J. 2000.
14. Otto, M. Chemometrics: Statistics and Computer Application in Analytical Chemistry. Wiley-VCH: New York. 1999.
15. Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge University Press: New York. 1992.
16. Sharaf, M.A., Illman, D.L., and Kowalski, B.R. Chemical Analysis, Vol. 82: Chemometrics. John Wiley & Sons: New York. 1986.



REFERENCES

1. Bro, R. Multivariate calibration. What is in chemometrics for the analytical chemist? Analytica Chimica Acta, 2003, 500(1–2), 185–194.






2 Statistical Evaluation of Data

Anthony D. Walmsley



CONTENTS

Introduction
2.1 Sources of Error
  2.1.1 Some Common Terms
2.2 Precision and Accuracy
2.3 Properties of the Normal Distribution
2.4 Significance Testing
  2.4.1 The F-test for Comparison of Variance (Precision)
  2.4.2 The Student t-Test
  2.4.3 One-Tailed or Two-Tailed Tests
  2.4.4 Comparison of a Sample Mean with a Certified Value
  2.4.5 Comparison of the Means from Two Samples
  2.4.6 Comparison of Two Methods with Different Test Objects or Specimens
2.5 Analysis of Variance
  2.5.1 ANOVA to Test for Differences Between Means
  2.5.2 The Within-Sample Variation (Within-Treatment Variation)
  2.5.3 Between-Sample Variation (Between-Treatment Variation)
  2.5.4 Analysis of Residuals
2.6 Outliers
2.7 Robust Estimates of Central Tendency and Spread
2.8 Software
  2.8.1 ANOVA Using Excel
Recommended Reading
References






INTRODUCTION

Typically, one of the main errors made in analytical chemistry and chemometrics is that chemical experiments are performed with no prior plan or design. It is often the case that a researcher arrives with a pile of data and asks "what does it mean?", to which the answer is usually "well, what do you think it means?" The weakness in collecting data without a plan is that one can quite easily acquire data that are simply not relevant. For example, one may wish to compare a new method with a traditional method, which is common practice, and so aliquots or test materials are tested with both methods and the data are then used to test which method is the best. (Note: by "best" we mean the most suitable for a particular task; in most cases "best" can cover many aspects of a method, from highest purity, lowest error, and smallest limit of detection to speed of analysis, and the "best" method can be defined for each case.) However, this is not a direct comparison. The new method will typically be one in which the researchers have a high degree of domain experience (as they have been developing it), meaning that it is an optimized method, whereas the traditional method may be one with which they have little experience, and so it is more likely to be nonoptimized. Therefore, the question you have to ask is, "Will simply testing objects with both methods result in data that can be used to compare which is the better method, or will the data simply show that the researchers are able to get better results with their method than with the traditional one?" Without some design and planning, a great deal of effort can be wasted and mistakes can easily be made. It is unfortunately very easy to compare an optimized method with a nonoptimized method and hail the new technique as superior, when in fact all that has been demonstrated is an inability to perform both techniques to the same standard.

Practical science should not start with collecting data; it should start with a hypothesis (or several hypotheses) about a problem or technique. With a set of questions, one can plan experiments to ensure that the data collected are useful in answering those questions. Prior to any experimentation, there needs to be some consideration of how the results will be analyzed, to ensure that the data being collected are relevant to the questions being asked. One desirable outcome of a structured approach is that one may find that some variables in a technique have little influence on the results obtained and, as such, can be left out of any subsequent experimental plan, resulting in less rather than more work.

Traditionally, data consisted of a single numerical result from a procedure or assay, for example, the concentration of the active component in a tablet. With modern analytical equipment, however, the result is more often a spectrum, such as a mid-infrared spectrum, and so the use of multivariate calibration models has flourished. This has led to more complex statistical treatments, because the result from a calibration needs to be validated rather than simply recorded as a single value. The quality of calibration models needs to be tested, as does their robustness, all adding to the complexity of the data analysis. In the same way that the spectroscopist relies on the spectra obtained from an instrument, the analyst must rely on the results obtained from the calibration model (which may be based on spectral data); therefore, the rigor of testing must be of the same high standard as that of the instrument manufacturer. The quality of any model is very dependent on the test specimens used to build it, and so sampling plays a very important part in analytical methodology. Obtaining a good representative sample or set of test specimens is not easy without some prior planning, and in cases where natural products or natural materials are used, or where no design is applicable, it is critical to obtain a representative sample of the system.

The aim of this chapter is to demonstrate suitable protocols for the planning of experiments and the analysis of the data. The important question to keep in mind is, "What is the purpose of the experiment and what do I propose as the outcome?" Usually, defining the question takes greater effort than performing any analysis. Defining the question is more technically termed defining the research hypothesis, following which statistical tools can be used to determine whether the stated hypothesis is true.

One can consider the application of statistical tests and chemometric tools to be somewhat akin to torture: if you perform it long enough, your data will tell you anything you wish to know, but most results obtained from torturing your data are likely to be very unstable. A light touch with the correct tools will produce a much more robust and useable result than heavy-handed tactics ever will. Statistics, like torture, benefit from the correct use of the appropriate tool.



2.1 SOURCES OF ERROR

Experimental science is in many cases a quantitative subject that depends on numerical measurements. A numerical measurement is almost totally useless unless it is accompanied by some estimate of the error or uncertainty in the measurement. Therefore, one must get into the habit of estimating the error or degree of uncertainty each time a measurement is made. Statistics are a good way to describe some types of error and uncertainty in our data. Generally, one can consider simple statistics to be a numerical measure of "common sense" when it comes to describing errors in data. If a measurement seems rather high compared with the rest of the measurements in the set, statistics can be employed to give a numerical estimate of just how high. This means that one must not use statistics blindly, but must always relate the results of a given statistical test to the data to which the test has been applied, and relate those results to one's knowledge of the measurement. For example, if you calculate the mean height of a group of students and the mean is returned as 296 cm, or nearly 10 ft, then you must consider that, unless your class is a basketball team, the mean should not be so high. Such an outcome should lead you to re-examine the original data, or to check whether an error has occurred in the calculation of the mean.

One needs to be extremely careful about errors in data, as the largest error will always dominate. If there is a large error in a reference method, for example, small measurement errors will be overwhelmed by the reference errors. For instance, if one used a bench-top balance accurate to one hundredth of a gram to weigh out one gram of substance to standardize a reagent, the resultant standard would have an accuracy of only one part in a hundred, which is usually considered poor for analytical data.






Statistics must not be viewed as a method of making sense out of bad data, as the results of any statistical test are only as good as the data to which they are applied. If the data are poor, then any statistical conclusion that can be made will also be poor.

Experimental scientists generally consider there to be three types of error:

1. Gross error is caused, for example, by an instrumental breakdown such as a power failure, a lamp failing, severe contamination of the specimen, or a simple mislabeling of a specimen (in which the bottle's contents are not as recorded on the label). The presence of gross errors renders an experiment useless. The most easily applied remedy is to repeat the experiment. However, it can be quite difficult to detect these errors, especially if no replicate measurements have been made.

2. Systematic error arises from imperfections in an experimental procedure, leading to a bias in the data, i.e., the errors all lie in the same direction for all measurements (the values are all too high or all too low). These errors can arise from a poorly calibrated instrument or from the incorrect use of volumetric glassware. The errors generated in this way can be either constant or proportional. When the data are plotted and viewed, this type of error can usually be discovered, e.g., the intercept on the y-axis of a calibration is much greater than zero (see the sketch after this list).

3. Random error (commonly referred to as noise) produces results that are spread about the average value. The greater the degree of randomness, the larger the spread. Statistics are often used to describe random errors. Random errors are typically ones that we have no control over, such as electrical noise in a transducer. These errors affect the precision, or reproducibility, of the experimental results. The goal is to have small random errors that lead to good precision in our measurements. The precision of a method is determined from replicate measurements taken at a similar time.
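
The calibration symptom mentioned in point 2 is easy to simulate. The following MATLAB sketch is our own illustration with made-up numbers, not an example from the book: a constant bias added to a straight-line calibration reappears as an intercept far from zero when the line is fitted.

```matlab
% Simulated calibration with a constant systematic error (bias)
conc = 0:10;                              % standard concentrations
bias = 0.15;                              % constant systematic error
resp = 0.08*conc + bias + 0.005*randn(size(conc));  % plus random noise

p = polyfit(conc, resp, 1);               % straight-line fit: [slope, intercept]
fprintf('slope = %.3f, intercept = %.3f\n', p(1), p(2))
% The fitted intercept recovers the bias (about 0.15); an intercept far
% from zero flags the systematic error when the data are plotted and fitted.
```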



2.1.1 SOME COMMON TERMS

Accuracy: An experiment that has small systematic error is said to be accurate, i.e., the measurements obtained are close to the true values.

Precision: An experiment that has small random errors is said to be precise, i.e., the measurements have a small spread of values.

Within-run: This refers to a set of measurements made in succession in the same laboratory using the same equipment.

Between-run: This refers to a set of measurements made at different times, possibly in different laboratories and under different circumstances.

Repeatability: This is a measure of within-run precision.

Reproducibility: This is a measure of between-run precision.

Mean, Variance, and Standard Deviation: Three common statistics can be calculated very easily to give a quick understanding of the quality of a dataset and can also be used for a quick comparison of new data with some prior datasets. For example, one can compare the mean of the dataset with the mean from a standard set. These are very useful exploratory statistics: they are easy to calculate and can also be used in subsequent data analysis tools. The arithmetic mean is a measure of the average or central tendency of a set of data and is usually denoted by the symbol $\bar{x}$. The value for the mean is calculated by summing the data and then dividing this sum by the number of values (n):

$$\bar{x} = \frac{\sum x_i}{n} \qquad (2.1)$$



The variance in the data, a measure of the spread of a set of data, is related to the precision of the data. For example, the larger the variance, the larger the spread of the data and the lower its precision. Variance is usually given the symbol $s^2$ and is defined by the formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n} \qquad (2.2)$$



The standard deviation of a set of data, usually given the symbol s, is the square root of the variance. The difference between the standard deviation and the variance is that the standard deviation has the same units as the data, whereas the variance is in units squared. For example, if the measured unit for a collection of data is meters (m), then the unit for the standard deviation is m and the unit for the variance is m². For large values of n, the population standard deviation is calculated using the formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \qquad (2.3)$$



If the standard deviation is to be estimated from a small set of data, it is more appropriate to calculate the sample standard deviation, denoted by the symbol $\hat{s}$, which is calculated using the following equation:

$$\hat{s} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \qquad (2.4)$$



The relative standard deviation (or coefficient of variation), a dimensionless quantity often expressed as a percentage, is a measure of the relative error, or noise, in some data. It is calculated by the formula:

$$\mathrm{RSD} = \frac{s}{\bar{x}} \qquad (2.5)$$



When making analytical measurements of a quantity x, for example the concentration of lead in drinking water, all the results obtained will contain some random errors; therefore, we need to repeat the measurement a number of times, n. The standard error of the mean, which is a measure of the error in the final answer, is calculated by the formula:

$$s_M = \frac{s}{\sqrt{n}} \qquad (2.6)$$



It is good practice when presenting your results to report them in the form:

$$\bar{x} \pm \frac{s}{\sqrt{n}} \qquad (2.7)$$

Suppose the boiling points of six impure ethanol specimens were measured using a digital thermometer and found to be 78.9, 79.2, 79.4, 80.1, 80.3, and 80.9°C. The mean of the data, $\bar{x}$, is 79.8°C, and the standard deviation, s, is 0.692°C. With n = 6, the standard error, $s_M$, is found to be 0.282°C; thus the true temperature of the impure ethanol lies in the range 79.8 ± 0.282°C (n = 6).
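
These summary statistics are easily reproduced in MATLAB, the workhorse environment mentioned in Chapter 1. The sketch below is our own illustration of Equations 2.1 to 2.7 applied to the boiling-point example; note that MATLAB's built-in std normalizes by n − 1 by default (Equation 2.4), so the weight flag 1 is used to obtain the population form of Equation 2.3 quoted in the example.

```matlab
% Boiling points of six impure ethanol specimens (degrees C)
x = [78.9 79.2 79.4 80.1 80.3 80.9];
n = numel(x);

xbar = mean(x);        % Equation 2.1                 -> 79.8
s    = std(x, 1);      % Equation 2.3 (divide by n)   -> 0.692
shat = std(x);         % Equation 2.4 (divide by n-1) -> 0.759
rsd  = s/xbar;         % Equation 2.5 (multiply by 100 for percent)
sM   = s/sqrt(n);      % Equation 2.6                 -> 0.282

fprintf('%.1f +/- %.3f degrees C (n = %d)\n', xbar, sM, n)  % Equation 2.7
```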



2.2 PRECISION AND ACCURACY

The ability to perform the same analytical measurements and obtain precise and accurate results is critical in analytical chemistry. The quality of the data can be assessed by calculating the precision and accuracy of the data. Various bodies have attempted to define precision. One commonly cited definition is from the International Union of Pure and Applied Chemistry (IUPAC), which defines precision as "relating to the variations between variates, i.e., the scatter between variates" [1]. Accuracy can be defined as the ability of the measured results to match the true value for the data. From this point of view, the standard deviation is a measure of precision and the mean is a measure of the accuracy of the collected data. In an ideal situation, the data would have both high accuracy and high precision (i.e., very close to the true value and with a very small spread). The four common scenarios that relate to accuracy and precision are illustrated in Figure 2.1. In many cases, it is not possible to obtain high precision and accuracy simultaneously, so common practice is to be more concerned with the precision of the data than with the accuracy. Accuracy, or the lack of it, can be compensated for in other ways, for example by using aliquots of a reference material, but low precision cannot be corrected once the data have been collected.

To determine precision, we need to know something about the manner in which data are customarily distributed. For example, high precision (i.e., the data are very close together) produces a very narrow distribution, while low precision (i.e., the data are spread far apart) produces a wide distribution. Assuming that the data are normally distributed (which holds true in many cases and can be used as an approximation in many others) allows us to use the well-understood mathematical distribution known as the normal or Gaussian error distribution. The advantage of using such a model is that we can compare the collected data with a well-understood statistical model to determine the precision of the data.






FIGURE 2.1 The four common scenarios that illustrate accuracy and precision in data: (a) precise but not accurate, (b) accurate but not precise, (c) inaccurate and imprecise, and (d) accurate and precise. (Each panel shows a scatter of measurements around a target.)



Although the standard deviation gives a measure of the spread of a set of results about the mean value, it does not indicate the way in which the results are distributed. To understand this, a large number of results are needed to characterize the distribution. Rather than thinking in terms of a few data points (for example, six), we need to consider, say, 500 data points, so that the mean, $\bar{x}$, is an excellent estimate of the true mean or population mean, µ. The spread of a large number of collected data points will be affected by the random errors in the measurement (i.e., the sampling error and the measurement error), and this will cause the data to follow the normal distribution, shown in Equation 2.8:

$$y = \frac{\exp[-(x-\mu)^2 / 2\sigma^2]}{\sigma\sqrt{2\pi}} \qquad (2.8)$$

where µ is the true mean (or population mean), x is the measured data, and σ is the true standard deviation (or the population standard deviation). The shape of the distribution can be seen in Figure 2.2, where it is clear that the smaller the spread of the data, the narrower the distribution curve.
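
A figure like Figure 2.2 can be regenerated in a few lines of MATLAB. The sketch below is our own illustration, with an arbitrary mean and two arbitrary spreads: it evaluates Equation 2.8 directly and overlays the two curves, the narrower one corresponding to the smaller σ.

```matlab
% Normal error distribution (Equation 2.8) for two different spreads
mu = 0;                          % population mean (arbitrary)
x  = linspace(-3, 3, 400);       % range of measured values

for sigma = [0.5 1.0]            % smaller sigma -> narrower, taller curve
    y = exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
    plot(x, y); hold on
end
hold off
xlabel('x'); ylabel('y'); legend('\sigma = 0.5', '\sigma = 1.0')
```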

It is common to measure only a small number of objects or aliquots, and so one has to rely on the central limit theorem to see that a small set of data will behave in the same manner as a large set of data. The central limit theorem states that as the size of a sample increases (i.e., the number of objects or aliquots measured), the distribution of the sample mean tends towards a normal distribution, whatever the distribution of the individual measurements. If we consider the following case:

$$y = x_1 + x_2 + \cdots + x_n \qquad (2.9)$$
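
Equation 2.9 lends itself to a quick numerical check of the theorem. The MATLAB sketch below is our own illustration: each xᵢ is drawn from a uniform distribution, which is decidedly non-normal, yet a histogram of many such sums y is already close to Gaussian for modest n.

```matlab
% Central limit theorem demonstration using Equation 2.9
n = 20;                    % number of measurements summed per experiment
m = 5000;                  % number of repeated experiments
xs = rand(m, n);           % uniform (non-normal) random "measurements"
y  = sum(xs, 2);           % y = x1 + x2 + ... + xn, one sum per row

histogram(y, 40)           % the sums pile up in a near-Gaussian shape
xlabel('y'); ylabel('frequency')
```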


