9 Method Validation/Evaluation of a New Method
and standard deviation. The detection limit (also called the lower limit of detection) is the mean + 2 SD value. However, the guidelines of the Clinical and Laboratory Standards Institute (CLSI, EP17 protocol) advise that a specimen with no analyte (blank specimen) should be run; the Limit of Blank (LoB) is then calculated as LoB = Mean + 1.645 SD. The LoB should be established by running blank specimens 60 times, but if the manufacturer has already established it, then 20 runs are enough to verify it. The Limit of Quantification is usually defined as the lowest concentration at which the CV is 20% or less [4].
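The limits described above reduce to simple arithmetic on replicate results. The sketch below is a minimal illustration, not a validated implementation of the CLSI EP17 protocol: the helper names (`detection_limit`, `limit_of_blank`, `limit_of_quantification`) and all data values are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import mean, stdev


def detection_limit(zero_replicates):
    """Lower limit of detection as defined in the text: mean + 2 SD of
    repeated measurements of a specimen without analyte."""
    return mean(zero_replicates) + 2 * stdev(zero_replicates)


def limit_of_blank(blank_replicates):
    """Limit of Blank: mean + 1.645 SD of the blank replicates."""
    return mean(blank_replicates) + 1.645 * stdev(blank_replicates)


def limit_of_quantification(replicates_by_conc, max_cv=20.0):
    """Lowest tested concentration whose replicate CV (%) is max_cv or less.

    replicates_by_conc maps a nominal concentration to its replicate results.
    Returns None if no tested concentration meets the CV goal.
    """
    for conc in sorted(replicates_by_conc):
        values = replicates_by_conc[conc]
        cv = stdev(values) / mean(values) * 100
        if cv <= max_cv:
            return conc
    return None


# Hypothetical data for illustration only (the text calls for 20 to 60 blank runs).
blanks = [0.02, 0.05, 0.01, 0.04, 0.03, 0.00, 0.06, 0.02]
print("Limit of Blank:", round(limit_of_blank(blanks), 3))
print("Detection limit:", round(detection_limit(blanks), 3))
print("LoQ:", limit_of_quantification({0.5: [0.3, 0.7, 0.5], 1.0: [0.9, 1.0, 1.1]}))
```

The limit-of-quantification helper simply scans the tested concentrations from lowest to highest and stops at the first one that meets the CV goal.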
Comparison of a new method with an existing method is a very important step in method validation. For this purpose, at least 100 patient specimens must be analyzed in the laboratory with both the existing method and the new method. It is advisable to batch patient samples and then run these specimens by both methods on the same day and, if possible, at the same time (by splitting specimens). The results obtained by the existing method (reference method) should then be plotted on the x-axis, and the corresponding values obtained by the new method should be plotted on the y-axis. Linear regression is the simplest way of comparing results obtained by the existing method in the laboratory with those of the new method. The regression line is the straight line of best fit through all data points. A computer can produce the linear regression line as well as the equation representing this straight line, called the linear regression equation (Equation 4.9):
$$y = mx + b \tag{4.9}$$
Here, “m” is the slope of the line and “b” is the intercept. The computer calculates the regression line using the least squares approach, which minimizes the sum of the squared vertical distances between the data points and the line. The software also calculates “r,” the correlation coefficient.
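The least squares slope, intercept, and correlation coefficient can be computed directly from the paired results. This is a minimal sketch of ordinary least squares with Pearson's r; the function name `least_squares_fit` and the paired values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean


def least_squares_fit(x, y):
    """Ordinary least squares fit of y = m*x + b plus Pearson's r.

    x: results from the existing (reference) method
    y: paired results from the new method
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m = sxy / sxx              # slope
    b = y_bar - m * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return m, b, r


# Hypothetical paired results (reference, new) for illustration.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
new_method = [1.1, 2.1, 3.3, 4.2, 5.4]
m, b, r = least_squares_fit(reference, new_method)
print(f"y = {m:.4f}x + {b:.4f} (r = {r:.4f})")
```

On Python 3.10 or later, `statistics.linear_regression` and `statistics.correlation` return the same slope, intercept, and r; the explicit sums above simply show what the software computes.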
4.10 HOW TO INTERPRET THE REGRESSION EQUATION?
The regression equation (y = mx + b) provides a lot of important information regarding how the new method (y) compares with the reference method (x). Interpretations of a linear regression equation include:
■ Ideal value: m = 1, b = 0, and y = x. In reality, this is never achieved exactly.
■ If the value of m is less than 1.0, the new method shows a negative bias compared with the reference method. Bias can be calculated as 1 − m; for example, if the value of “m” is 0.95, then the negative bias is 1 − 0.95 = 0.05, or 0.05 × 100 = 5%.
CHAPTER 4:
Laboratory Statistics and Quality Control
■ If the value of m is over 1.0, the new method shows a positive bias. For example, if m is 1.07, then the positive bias in the new method is 1.07 − 1 = 0.07, or 0.07 × 100 = 7%.
■ The intercept “b” can be a positive or negative value and should be a relatively small number; a large intercept indicates a constant bias between the two methods.
■ An ideal value of “r” (correlation coefficient) is 1, but any value above 0.95 is considered good, and a value of 0.99 is considered excellent. The correlation coefficient indicates how well the results of the new method correlate with those of the existing method, but it cannot reveal any inherent bias in the new method. Therefore, the slope must be taken into account to determine bias.
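The slope rules in the list above can be written as one expression, (m − 1) × 100, whose sign gives the direction of bias. A minimal sketch; the function name `bias_from_slope` is hypothetical, and it covers only the proportional bias read from the slope (the intercept is not included).

```python
def bias_from_slope(m):
    """Percent proportional bias implied by the regression slope m.

    Returns (m - 1) x 100: negative values mean negative bias (m < 1),
    positive values mean positive bias (m > 1).
    """
    return (m - 1.0) * 100.0


for slope in (0.95, 1.00, 1.07):
    print(f"m = {slope:.2f}: bias = {bias_from_slope(slope):+.1f}%")
```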
In our laboratory, we evaluated a new immunoassay for mycophenolic acid, an immunosuppressant, against an HPLC-UV method (the current method in our laboratory), using de-identified specimens from 60 transplant recipients [5]. The regression equation was as follows (Equation 4.10):
$$y = 1.1204x + 0.0881 \quad (r = 0.98) \tag{4.10}$$
This equation indicated an average positive bias of 12.04% with the new immunoassay compared with the reference HPLC-UV method in determining mycophenolic acid concentration. This was most likely due to cross-reactivity of mycophenolic acid acyl glucuronide, a metabolite, with the antibody used in the mycophenolic acid immunoassay; this metabolite does not interfere with mycophenolic acid determination by HPLC-UV. However, the correlation coefficient of 0.98 indicates a good correlation between the two methods.
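Plugging reference values into Equation 4.10 shows how the slope and intercept combine. The sketch assumes hypothetical HPLC-UV concentrations (units are not given in the text, so results are in the same units as the input), and `predicted_immunoassay` is a hypothetical name. Because the intercept 0.0881 is added at every concentration, the relative difference is larger at low concentrations and approaches the 12.04% implied by the slope as concentration increases.

```python
def predicted_immunoassay(hplc_value, slope=1.1204, intercept=0.0881):
    """New-method (immunoassay) result predicted by Equation 4.10 for a
    reference (HPLC-UV) mycophenolic acid result in the same units."""
    return slope * hplc_value + intercept


# Hypothetical reference concentrations chosen only to show the trend.
for hplc in (0.5, 1.0, 2.0, 5.0, 10.0):
    immuno = predicted_immunoassay(hplc)
    percent_difference = (immuno - hplc) / hplc * 100
    print(f"HPLC {hplc:5.1f} -> immunoassay {immuno:7.4f} ({percent_difference:+.1f}%)")
```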
4.11 BLAND–ALTMAN PLOT
Although linear regression analysis is useful for method comparison, such analysis is affected by extreme values (where one or a series of “x” values differs widely from the corresponding “y” values), because equal weight is given to all points. A Bland–Altman plot compares two methods by plotting the difference between the two measurements on the y-axis and the average of the two measurements on the x-axis. The difference between the two methods can be expressed as an absolute difference or as a percentage difference, and limits such as ±1 SD or ±2 SD of the differences, or a fixed number, can be marked on the plot. Bias between two methods is easier to see on a Bland–Altman plot.
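The values needed for a Bland–Altman plot can be computed as below. This is a minimal sketch: `bland_altman` is a hypothetical name, the limits default to ±2 SD of the differences to match the text (many published analyses use 1.96 instead), and plotting itself is left out.

```python
from statistics import mean, stdev


def bland_altman(reference, new, k=2.0, percent=False):
    """Return the points and summary lines of a Bland-Altman plot.

    x-axis values: average of each pair of measurements.
    y-axis values: new - reference, or that difference as a percent of the average.
    bias: mean difference; limits: bias +/- k SD of the differences.
    """
    averages = [(a + b) / 2 for a, b in zip(reference, new)]
    differences = [b - a for a, b in zip(reference, new)]
    if percent:
        differences = [d / avg * 100 for d, avg in zip(differences, averages)]
    bias = mean(differences)
    sd = stdev(differences)
    return averages, differences, bias, (bias - k * sd, bias + k * sd)


# Hypothetical paired results (reference method, new method).
averages, diffs, bias, limits = bland_altman([10, 20, 30, 40], [11, 22, 31, 43])
print("mean difference:", bias, "limits:", limits)
```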
4.12 RECEIVER–OPERATOR CURVE
A receiver–operator curve (ROC), also called a receiver operating characteristic curve, is often used to select the optimal decision point (cutoff) for a test. The ROC plots the true positive rate of a test (sensitivity) on the y-axis, either on a scale of 0–1 (1 is the highest sensitivity) or as a percent, versus the false positive rate (1 − specificity) on the x-axis. As sensitivity increases, specificity decreases. A hypothetical ROC curve is shown in Figure 4.7. If decision point 1 is selected as the cutoff, the sensitivity of the test is 0.57 (57%), but the specificity is very high (99%; 0.01 on the 1 − specificity scale). On the other hand, if decision point 3 is selected, the sensitivity increases to nearly 90%, but the specificity decreases to 42% (0.58 on the 1 − specificity scale) (Figure 4.7). Therefore, the ROC curve can be used to choose the decision point used for making a clinical decision. In general, the closer the decision point is to the y-axis, the better the specificity.
FIGURE 4.7
Receiver–operator curve (ROC) showing various decision points. [Plot of true positive rate (sensitivity) versus false positive rate (1 − specificity), both axes from 0 to 1, with decision points 1, 2, and 3 marked along the curve.]
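Each point on an ROC curve is the (1 − specificity, sensitivity) pair obtained at one cutoff. The sketch below assumes that a result is called positive when it is at or above the cutoff, and that the data include both diseased and non-diseased subjects; the function name `roc_points` and the data are hypothetical.

```python
def roc_points(values, diseased, cutoffs):
    """Sensitivity and false positive rate (1 - specificity) at each cutoff.

    Assumes a result is called positive when value >= cutoff.
    values: test results; diseased: True/False disease status for each subject.
    """
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for v, d in zip(values, diseased) if d and v >= cutoff)
        fn = sum(1 for v, d in zip(values, diseased) if d and v < cutoff)
        fp = sum(1 for v, d in zip(values, diseased) if not d and v >= cutoff)
        tn = sum(1 for v, d in zip(values, diseased) if not d and v < cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        points.append((cutoff, 1 - specificity, sensitivity))
    return points


# Hypothetical results for 8 subjects (4 without disease, 4 with disease).
results = [2.1, 3.4, 4.0, 5.2, 3.9, 5.8, 6.5, 7.1]
disease = [False, False, False, False, True, True, True, True]
for cutoff, fpr, sens in roc_points(results, disease, [3.0, 4.5, 6.0]):
    print(f"cutoff {cutoff}: 1 - specificity = {fpr:.2f}, sensitivity = {sens:.2f}")
```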
4.13 WHAT IS SIX SIGMA?
Six sigma originated from Motorola Corporation’s approach to total quality management in manufacturing, with the objective of reducing manufacturing defects. Although six sigma was originally developed for manufacturing processes, its principles can be applied to total quality improvement of any operation, including a clinical laboratory operation. The goal of six sigma is to achieve an error rate of 3.4 per one million for a process, that is, an error rate of only 0.00034%. An error rate of 0.001% corresponds to 5.8 sigma. The goal of a clinical laboratory operation is to reduce the error rate to at least 0.1% (4.6 sigma), but preferably to 0.01% (5.2 sigma) or lower. Improvement can be made in any phase of the laboratory operation (pre-analytical, analytical, or post-analytical), with the overall goal of reducing laboratory errors.
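The sigma values quoted above follow the conventional six sigma tables, which add a 1.5 sigma long-term shift to the one-sided standard normal quantile of the error rate. The 1.5 shift is a convention of those tables and is an assumption here, not something stated in the text; `sigma_level` is a hypothetical name, and the sketch uses only Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def sigma_level(error_rate, shift=1.5):
    """Sigma level for a fractional error rate (e.g. 0.001 for 0.1%).

    Uses the one-sided standard normal quantile plus the conventional
    1.5 sigma shift used in six sigma tables.
    """
    return NormalDist().inv_cdf(1 - error_rate) + shift


for label, rate in [("0.00034%", 3.4e-6), ("0.001%", 1e-5), ("0.01%", 1e-4), ("0.1%", 1e-3)]:
    print(f"error rate {label}: {sigma_level(rate):.1f} sigma")
```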
4.14 ERRORS ASSOCIATED WITH REFERENCE RANGE
Reference ranges are reported with patients’ values to help clinicians interpret laboratory test results. However, most reference ranges are defined as the mean ± 2 SD of the values observed in a healthy population. Therefore, a reference range accounts for only 95% of the values observed in healthy individuals for a particular test, and statistically 5% of the values from a healthy population fall outside the reference range. If more than one test is ordered, the chance that at least one result falls outside its reference range increases. Assuming the tests are independent, the likelihood of all “n” test results falling within the reference range can be calculated with Equation 4.11:
$$\%\ \text{Results falling within normal range} = 0.95^{n} \times 100 \tag{4.11}$$
The percent chance that at least one result in a healthy person falls outside the reference range is shown in Equation 4.12:
$$\%\ \text{Results falling outside normal range} = (1 - 0.95^{n}) \times 100 \tag{4.12}$$
For example, if five tests are ordered for health screening of a healthy person, then Equation 4.13 applies:
$$\begin{aligned}\%\ \text{Results falling outside normal range} &= (1 - 0.95^{5}) \times 100\\ &= (1 - 0.774) \times 100 = 22.6\%\end{aligned} \tag{4.13}$$
Table 4.2 gives, for different numbers of tests, the chance that all results fall within the reference range and the chance that at least one result falls outside it.
Table 4.2 Testing and Reference Range*
Number of Tests   Results within Reference Range   Results outside Reference Range
1                 95%                              5%
2                 90%                              10%
3                 85.7%                            14.3%
4                 81.5%                            18.5%
5                 77.4%                            22.6%
6                 73.5%                            26.5%
10                59.9%                            40.1%
*For multiple tests ordered in a healthy subject: the chance that all test results fall within the reference range and the chance that at least one result falls outside the reference range.
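Equations 4.11 and 4.12 can be evaluated for any number of tests to reproduce a table like Table 4.2. A minimal sketch, assuming the tests are independent and each has a 95% chance of a within-range result in a healthy person; `reference_range_chances` is a hypothetical name, and printed values are rounded to one decimal place.

```python
def reference_range_chances(n_tests, p_within=0.95):
    """Percent chance that all n independent test results from a healthy
    person fall within their reference ranges (Equation 4.11), and that at
    least one falls outside (Equation 4.12)."""
    within = p_within ** n_tests * 100
    return within, 100 - within


for n in (1, 2, 3, 4, 5, 6, 10):
    within, outside = reference_range_chances(n)
    print(f"{n:2d} tests: {within:4.1f}% within, {outside:4.1f}% outside")
```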
4.15 BASIC STATISTICAL ANALYSIS: STUDENT t-TEST AND RELATED TESTS
A new method can be validated against an existing method by using regression analysis, as stated earlier in the chapter. Bias can be calculated from the slope or from a Bland–Altman plot. However, in some instances, the bias between the two methods can be significant, and in this case a laboratory professional needs to know whether values of an analyte determined by the reference method are significantly different from the values determined by the new method. This can be assessed with the Student t-test, which uses the mean and standard deviation of each set of values:
■ The Student t-test is useful to determine if one set of values is different from another set of values, based on the difference between the mean values and the standard deviations. This statistical test is also useful in clinical research to see if values of an analyte in the normal state are significantly different from the values observed in a disease state.
■ The Student t-test is only applicable if both distributions of values are normal (Gaussian).
■ If the “t” value is significant for the degrees of freedom (n1 + n2 − 2, where n1 and n2 represent the number of values in set 1 and set 2), then the null hypothesis (that there is no difference between the two sets of values) is rejected, and values in set 1 are considered statistically different from values in set 2. The critical value of t for a given number of degrees of freedom can be obtained from published tables.
■ The F-test is a measure of differences in variances and can also be used to see if one set of data is different from another set of data. When multiple sets of data are analyzed, the F-test is used in analysis of variance (ANOVA).
■ If the distribution of data is non-Gaussian, then neither the t-test nor the F-test can be used. In this case, the Wilcoxon rank sum test (also known as the Mann–Whitney U test) should be used.
The formulas for the t-test and the Mann–Whitney U test can be found in any textbook on statistics. However, a detailed discussion of these statistical methods is beyond the scope of this book.
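For readers who want to see the arithmetic, the sketch below computes the two-sample Student t statistic with a pooled variance (degrees of freedom n1 + n2 − 2) and the variance ratio used in the F-test. The function names `student_t` and `f_ratio` and the data are hypothetical; the resulting |t| would be compared with the critical value from a published t table. Libraries such as SciPy also provide `scipy.stats.ttest_ind` (which reports a p value) and `scipy.stats.mannwhitneyu` for non-Gaussian data.

```python
from math import sqrt
from statistics import mean, variance


def student_t(set1, set2):
    """Two-sample Student t statistic using a pooled variance.

    Returns (t, degrees_of_freedom), with degrees of freedom n1 + n2 - 2.
    Assumes both sets are Gaussian with similar variances.
    """
    n1, n2 = len(set1), len(set2)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(set1) + (n2 - 1) * variance(set2)) / df
    t = (mean(set1) - mean(set2)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


def f_ratio(set1, set2):
    """F statistic for comparing two variances (larger over smaller)."""
    v1, v2 = variance(set1), variance(set2)
    return max(v1, v2) / min(v1, v2)


# Hypothetical analyte values from two groups, for illustration only.
group_a = [4.1, 4.5, 3.9, 4.3, 4.0, 4.4]
group_b = [4.8, 5.1, 4.6, 5.0, 4.9, 5.3]
t, df = student_t(group_a, group_b)
print(f"t = {t:.2f} with {df} degrees of freedom; F = {f_ratio(group_a, group_b):.2f}")
```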
KEY POINTS
■ The formula for coefficient of variation (CV): CV = SD/mean × 100.
■ Standard error of the mean = SD/√n, where n is the number of data points in the set.
■ If a distribution is normal, the mean, median, and mode have the same value. However, the mean, median, and mode may differ if the distribution is skewed (not a Gaussian distribution).
■ In a Gaussian distribution, the mean ± 1 SD contains 68.2% of all values, the mean ± 2 SD contains 95.5% of all values, and the mean ± 3 SD contains 99.7% of all values in the distribution.
■ When a reference range is determined by measuring an analyte in at least 100 healthy people and the values follow a Gaussian distribution, the reference range is calculated as mean ± 2 SD.
■ For calculating sensitivity, specificity, and predictive value of a test, the following formulas can be used, where TP = true positive, FP = false positive, TN = true negative, and FN = false negative: (a) Sensitivity (individuals with disease who show positive test results) = (TP/(TP + FN)) × 100; (b) Specificity (individuals without disease who show negative test results) = (TN/(TN + FP)) × 100; and (c) Positive predictive value = (TP/(TP + FP)) × 100.
■ In a clinical laboratory, three types of control materials are used: assayed controls, in which the value of the analyte is predetermined; un-assayed controls, in which the target value is not predetermined; and homemade controls, used when control material is not readily available commercially (e.g. for an esoteric test).
■ Quality control in the laboratory may be internal or external. Internal quality control is essential, and its results are plotted on a Levey–Jennings chart; the most common example of external quality control is the analysis of CAP (College of American Pathologists) proficiency samples.
■ “Waived tests” are not complex, and laboratories can perform such tests as long as they follow the manufacturer’s protocol. Enrolling in an external proficiency-testing program such as a CAP survey is not required for waived tests.
■ “Non-waived tests” are moderately complex or highly complex tests, and laboratories performing such tests are subject to all CLIA regulations and must be inspected by CLIA inspectors every two years or by inspectors from non-government organizations such as CAP or the Joint Commission on Accreditation of Healthcare Organizations (JCAHO). In addition, for all non-waived tests, laboratories must participate in an external proficiency program, most commonly CAP proficiency surveys, and must successfully pass proficiency testing in order to operate legally.
■ A laboratory must produce correct results for four of five external proficiency specimens for each analyte, and must have a score of at least 80% for three consecutive challenges.
■ Since April 2003, clinical laboratories must perform method validation for each new test, even if the test is already FDA approved.
■ A Levey–Jennings chart is a graphical representation of all control values for an assay during an extended period of laboratory operation. In this graphical representation, values are plotted with respect to the calculated mean and standard deviation. If all controls are within mean ± 2 SD, then all control values were within acceptable limits, and all runs during that period had acceptable performance. A Levey–Jennings chart must be constructed for each control (low and high control, or low, medium, and high control) for each assay the laboratory offers. The laboratory director or designee must review all