10.6 Risks: Differences, Ratios, and Odds Ratios
                    Disease present (D)   No disease present (D^c)   Total
Exposed (E)         a                     b                          n1 = a + b
Nonexposed (E^c)    c                     d                          n2 = c + d
Total               m1 = a + c            m2 = b + d                 n = a + b + c + d
In clinical trials, the risk factor status ($E/E^c$) can be replaced by treatment/control or new treatment/old treatment, while the disease status ($D/D^c$) can be replaced by improvement/nonimprovement.
Remark. In the context of epidemiology, the studies leading to tabulated data can be prospective or retrospective. In a prospective study, a group of $n$ disease-free individuals is identified and followed over a period of time. At the end of the study, the group, typically called the cohort, is assessed and tabulated with respect to disease development and exposure to the risk factor of interest.
In a retrospective study, groups of $m_1$ individuals with the disease (cases) and $m_2$ disease-free individuals (controls) are identified and their prior exposure histories are assessed. In this case, the table summarizes the numbers of cases and controls exposed to the risk factor under consideration.
10.6.1 Risk Differences
Let $p_1$ and $p_2$ be the population risks of a disease for exposed and nonexposed (control) subjects. These are the probabilities that subjects in the two groups, exposed and nonexposed, will develop the disease during a fixed interval of time.
Let $\hat p_1 = a/n_1$ be an estimator of the risk of a disease for exposed subjects and $\hat p_2 = c/n_2$ be an estimator of the risk of that disease for control subjects.
The $(1-\alpha)100\%$ confidence interval for the risk difference coincides with the confidence interval for the difference of proportions from p. 378:
$$\hat p_1 - \hat p_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.$$
Sometimes, better precision is achieved by a confidence interval with continuity corrections:
$$\left[\hat p_1 - \hat p_2 \pm \left(\frac{1}{2n_1} + \frac{1}{2n_2}\right) - z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}},\;\; \hat p_1 - \hat p_2 \pm \left(\frac{1}{2n_1} + \frac{1}{2n_2}\right) + z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}\right],$$
where the sign of the correction factor $1/(2n_1) + 1/(2n_2)$ is taken as “+” if $\hat p_1 - \hat p_2 < 0$ and as “−” if $\hat p_1 - \hat p_2 > 0$. The recommended sample sizes for the validity of the interval should satisfy $\min\{n_1 p_1(1-p_1),\, n_2 p_2(1-p_2)\} \ge 10$. For large sample sizes, the difference between continuity-corrected and uncorrected intervals is negligible.
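The two intervals can be compared numerically. The following is a minimal sketch, not part of the book's code, that uses the Framingham counts from Example 10.12 later in this section; the variable names (ci, ciCC, cc) are illustrative, and norminv assumes the Statistics Toolbox is available.

```matlab
% Counts from the Framingham table of Example 10.12
a = 95;  b = 201;    % exposed:    diseased, not diseased
c = 173; d = 894;    % nonexposed: diseased, not diseased
alpha = 0.05;

n1 = a + b;  n2 = c + d;
p1 = a/n1;   p2 = c/n2;
rd = p1 - p2;                                  % estimated risk difference
se = sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2);    % standard error of rd
z  = norminv(1 - alpha/2);                     % Statistics Toolbox function

ci = [rd - z*se, rd + z*se];                   % uncorrected interval

% Continuity correction: subtracted when rd > 0, added when rd < 0
cc   = 1/(2*n1) + 1/(2*n2);
ciCC = (rd - sign(rd)*cc) + [-z*se, z*se];

fprintf('uncorrected: [%.4f, %.4f]\n', ci);
fprintf('corrected:   [%.4f, %.4f]\n', ciCC);
```

With these counts the uncorrected interval is [0.1012, 0.2164], as reported in Example 10.12, and the correction moves both bounds by roughly 0.002 toward zero, which illustrates how little it matters for samples of this size.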
10.6.2 Risk Ratio
The risk ratio in a population is the quantity $R = p_1/p_2$. It is estimated by $r = \hat p_1/\hat p_2$. The sampling distribution of $r$ does not have a simple form, and, moreover, it is typically skewed (Fig. 10.3b). If the logarithm is taken, the risk ratio is “symmetrized”: the log ratio is the difference between the logarithms, and, given the independence of populations, the CLT applies. It is evident in Fig. 10.3c that the log risk ratios are well approximated by a normal distribution.
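The variance of the log risk ratio used in this section can be motivated by a first-order (delta-method) argument. The lines below are a sketch under the usual large-sample assumptions, with $X_i \sim \mathcal{B}in(n_i, p_i)$ independent and $\hat p_i = X_i/n_i$.

```latex
\begin{align*}
\mathrm{Var}(\log \hat p_i) &\approx \left(\frac{1}{p_i}\right)^2 \frac{p_i(1-p_i)}{n_i}
   = \frac{1-p_i}{n_i\, p_i}, \\
\mathrm{Var}(\log r) &= \mathrm{Var}(\log \hat p_1) + \mathrm{Var}(\log \hat p_2)
   \approx \frac{1-p_1}{n_1 p_1} + \frac{1-p_2}{n_2 p_2}.
\end{align*}
```

Replacing $p_i$ by $\hat p_i$ gives the sample standard deviation $s_{\log r}$ on which the confidence interval for the risk ratio is built.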
Fig. 10.3 Two samples of size 10,000 are generated from $\mathcal{B}in(80, 0.21)$ and $\mathcal{B}in(60, 0.25)$ populations, and risks $\hat p_1$ and $\hat p_2$ are estimated for each pair. The panels show histograms of (a) risk differences, (b) risk ratios, and (c) log risk ratios.
The following MATLAB code (simulrisks.m) simulates 10,000 pairs from $\mathcal{B}in(60, 0.25)$ and $\mathcal{B}in(80, 0.21)$ populations representing exposed and nonexposed subjects, respectively. From each pair the risks are estimated, and histograms of risk differences, risk ratios, and log risk ratios are shown in Fig. 10.3a–c.
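% numbers of diseased subjects among 60 exposed and 80 nonexposed
disexposed = binornd(60, 0.25, [1 10000]);
disnonexposed = binornd(80, 0.21, [1 10000]);
% estimated risks for each simulated pair
p1s = disexposed/60; p2s = disnonexposed/80;
figure; hist(p1s - p2s, 25)          % (a) risk differences
figure; hist(p1s./p2s, 25)           % (b) risk ratios
figure; hist(log(p1s./p2s), 25)      % (c) log risk ratios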
10.6.3 Odds Ratios
For a particular proportion, $p$, the odds are defined as $\frac{p}{1-p}$. For two proportions $p_1$ and $p_2$, the odds ratio is defined as $O = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$, and its sample counterpart is $o = \frac{\hat p_1/(1-\hat p_1)}{\hat p_2/(1-\hat p_2)}$.
As evident in Fig. 10.4, the odds ratio is symmetrized by the log transformation, and it is in the log domain that the normal approximations are used. The sample standard deviation for $\log o$ is $s_{\log o} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$, and the $(1-\alpha)100\%$ confidence interval for the log odds ratio is
$$\left[\log o - z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}},\;\; \log o + z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right].$$
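Of course, the confidence interval for the odds ratio is obtained by exponentiating the bounds:
$$\left[\exp\left\{\log o - z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right\},\;\; \exp\left\{\log o + z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right\}\right].$$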
Many authors argue that only odds ratios should be reported and used because of their superior properties over risk differences and risk ratios (Edwards, 1963; Mosteller, 1968). For small sample sizes, replacing the counts $a$, $b$, $c$, and $d$ by $a + 1/2$, $b + 1/2$, $c + 1/2$, and $d + 1/2$ leads to more stable inference.
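The effect of the 1/2 adjustment is easiest to see in a table with one small cell. The sketch below is illustrative rather than taken from the book: it uses the smoking counts from Example 10.13 later in this section, arranged as in the 2 × 2 table above (smokers as the exposed group), and the identity $o = ad/(bc)$, which follows from the definition of the sample odds ratio. The loop variable shift is a name introduced here, and norminv assumes the Statistics Toolbox.

```matlab
% Counts from Example 10.13, arranged as in the 2 x 2 table
a = 83; b = 72;    % smokers:    cancer, no cancer
c = 3;  d = 14;    % nonsmokers: cancer, no cancer
alpha = 0.05;
z = norminv(1 - alpha/2);          % Statistics Toolbox function

counts = [a b c d];
for shift = [0 0.5]                % 0: raw counts, 0.5: add 1/2 to each cell
    k = counts + shift;
    o = (k(1)*k(4))/(k(2)*k(3));   % sample odds ratio ad/(bc)
    s = sqrt(sum(1./k));           % standard deviation of log o
    ci = exp(log(o) + [-z*s, z*s]);
    fprintf('shift = %.1f: o = %.4f, CI = [%.4f, %.4f]\n', shift, o, ci);
end
```

For these counts the raw estimate is 1162/216 ≈ 5.38, while the adjusted estimate is about 4.77; the adjustment pulls the estimate toward 1 and shortens the interval, mostly because of the cell with only three observations.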
Fig. 10.4 For the data leading to Fig. 10.3, the histograms of (a) odds ratios and (b) log odds ratios are shown.
Measure           Parameter                                 Estimator                                                   St. deviation
Risk difference   $D = p_1 - p_2$                           $d = \hat p_1 - \hat p_2$                                   $s_d = \sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}$
Relative risk     $R = p_1/p_2$                             $r = \hat p_1/\hat p_2$                                     $s_{\log r} = \sqrt{q_1/(n_1 p_1) + q_2/(n_2 p_2)}$
Odds ratio        $O = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$     $o = \frac{\hat p_1/(1-\hat p_1)}{\hat p_2/(1-\hat p_2)}$   $s_{\log o} = \sqrt{1/a + 1/b + 1/c + 1/d}$

Here $q_i = 1 - p_i$, $i = 1, 2$.
Interpretations of the values of RR and OR are provided in the following table:

Value in       Effect of exposure
[0, 0.4)       Strong benefit
[0.4, 0.6)     Moderate benefit
[0.6, 0.9)     Weak benefit
[0.9, 1.1]     No effect
(1.1, 1.6]     Weak hazard
(1.6, 2.5]     Moderate hazard
> 2.5          Strong hazard
Example 10.12. Framingham Data. The table below gives the coronary heart disease status after 18 years, by level of systolic blood pressure (SBP). Levels of SBP ≥ 165 are considered as exposure to a risk factor.

SBP (mmHg)   Coronary disease   No coronary disease   Total
≥ 165        95                 201                   296
< 165        173                894                   1067
Total        268                1095                  1363

Find 95% confidence intervals for the risk difference, risk ratio, and odds ratio.
The function risk.m calculates confidence intervals for risk differences, risk ratios, and odds ratios and will be used in this example.
function [rd, rdl, rdu, rr, rrl, rru, or, orl, oru] = risk(a, b, c, d, alpha)
% RISK  CIs for the risk difference, risk ratio, and odds ratio.
%
%                Disease   No disease   Total
%   Exposed         a          b         n1
%   Nonexposed      c          d         n2
%
if nargin < 5
    alpha = 0.05;
end
n1 = a + b;
n2 = c + d;
hatp1 = a/n1; hatp2 = c/n2;
%
% risk difference (rd) and CI [rdl, rdu]
rd = hatp1 - hatp2;
stdrd = sqrt(hatp1 * (1 - hatp1)/n1 + hatp2 * (1 - hatp2)/n2);
rdl = rd - norminv(1 - alpha/2) * stdrd;
rdu = rd + norminv(1 - alpha/2) * stdrd;
%
% risk ratio (rr) and CI [rrl, rru]
rr = hatp1/hatp2;
lrr = log(rr);
stdlrr = sqrt(b/(a * n1) + d/(c * n2));
lrrl = lrr - norminv(1 - alpha/2) * stdlrr;
rrl = exp(lrrl);
lrru = lrr + norminv(1 - alpha/2) * stdlrr;
rru = exp(lrru);
%
% odds ratio (or) and CI [orl, oru]
or = ( hatp1/(1 - hatp1) )/( hatp2/(1 - hatp2) );
lor = log(or);
stdlor = sqrt(1/a + 1/b + 1/c + 1/d);
lorl = lor - norminv(1 - alpha/2) * stdlor;
orl = exp(lorl);
loru = lor + norminv(1 - alpha/2) * stdlor;
oru = exp(loru);
The solution is:
[rd, rdl, rdu, rr, rrl, rru, or, orl, oru] = risk(95, 201, 173, 894)
%rd = 0.1588
%[rdl, rdu] = [0.1012, 0.2164]
%rr = 1.9795
%[rrl, rru] = [1.5971, 2.4534]
%or = 2.4424
%[orl, oru] = [1.8215, 3.2750]
Example 10.13. Retrospective Analysis of Smoking Habits. This example is adapted from Johnson and Albert (1999), who use data collected in a study by Dorn (1954). A sample of 86 lung-cancer patients and a sample of 86 controls were questioned about their smoking habits. The two groups were chosen to represent random samples from a subpopulation of lung-cancer patients and an otherwise similar population of cancer-free individuals. Of the cancer patients, 83 out of 86 were smokers; among the control group, 72 out of 86 were smokers. The scientific question of interest was to assess the difference between the smoking habits in the two groups. Uniform priors on the population proportions were used as a noninformative choice.
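model{
  for(i in 1:2){
    r[i] ~ dbin(p[i], n[i])   # smokers among cases (i=1) and controls (i=2)
    p[i] ~ dunif(0, 1)        # uniform prior on each proportion
  }
  RD <- p[1] - p[2]
  RD.gt0 <- step(RD)          # indicator of RD >= 0
  RR <- p[1]/p[2]
  RR.gt1 <- step(RR - 1)
  OR <- (p[1]/(1 - p[1]))/(p[2]/(1 - p[2]))
  OR.gt1 <- step(OR - 1)
}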
DATA
list(r=c(83,72),n=c(86,86))
INITS
#Generate Inits
         mean     sd        MC error   val2.5pc   median   val97.5pc   start   sample
OR       5.818    4.556     0.01398    1.556      4.613    17.29       1001    100000
OR.gt1   0.9978   0.04675   1.469E-4   1.0        1.0      1.0         1001    100000
RD       0.125    0.0455    1.478E-4   0.0385     0.1237   0.2179      1001    100000
RD.gt0   0.9978   0.04675   1.469E-4   1.0        1.0      1.0         1001    100000
RR       1.153    0.06276   2.038E-4   1.044      1.148    1.291       1001    100000
RR.gt1   0.9978   0.04675   1.469E-4   1.0        1.0      1.0         1001    100000
p[1]     0.9546   0.02209   7.06E-5    0.9022     0.958    0.9873      1001    100000
p[2]     0.8296   0.03991   1.26E-4    0.7444     0.8322   0.9002      1001    100000
Note that the 95% credible sets for the risk ratio and odds ratio are above 1, and that the set for the risk difference does not contain 0. By all three measures the proportion of smokers among subjects with cancer is significantly larger than the proportion among the controls. In Bayesian testing the hypotheses $H_1: p_1 > p_2$, $H_1': p_1/p_2 > 1$, and $H_1'': \frac{p_1/(1-p_1)}{p_2/(1-p_2)} > 1$ have posterior probabilities of 0.9978 each. Therefore, in this retrospective study, smoking status is indicated as a significant risk factor for lung cancer.
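Because the priors are uniform and the likelihoods are binomial, the posteriors in this model are available in closed form, which gives a quick check on the simulated means. The lines below are a sketch based on standard beta-binomial conjugacy and on the posterior independence of $p_1$ and $p_2$; they use the beta moments $E(1/p) = (\alpha+\beta-1)/(\alpha-1)$, $E\{p/(1-p)\} = \alpha/(\beta-1)$, and $E\{(1-p)/p\} = \beta/(\alpha-1)$ for $p \sim \mathcal{B}e(\alpha, \beta)$.

```latex
\begin{align*}
p_1 \mid r_1 &\sim \mathcal{B}e(r_1 + 1,\; n_1 - r_1 + 1) = \mathcal{B}e(84, 4), &
p_2 \mid r_2 &\sim \mathcal{B}e(73, 15), \\
E(p_1 \mid r_1) &= 84/88 \approx 0.9545, &
E(p_2 \mid r_2) &= 73/88 \approx 0.8295, \\
E(RD \mid r) &= 11/88 = 0.125, &
E(RR \mid r) &= \frac{84}{88}\cdot\frac{87}{72} \approx 1.1534, \\
E(OR \mid r) &= \frac{84}{3}\cdot\frac{15}{72} \approx 5.833. &&
\end{align*}
```

These exact values agree with the simulated means in the output above (0.9546, 0.8296, 0.125, 1.153, and 5.818) to within Monte Carlo error.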
10.7 Two Poisson Rates*
There are several methods for devising confidence intervals for the difference or the ratio of two Poisson rates. We will focus on the method for the ratio that modifies well-known binomial confidence intervals.
Let $X_1 \sim \mathcal{P}oi(\lambda_1 t_1)$ and $X_2 \sim \mathcal{P}oi(\lambda_2 t_2)$ be two Poisson counts with rates $\lambda_1$ and $\lambda_2$ observed during time intervals of length $t_1$ and $t_2$.
We are interested in the confidence interval for the ratio $\lambda = \lambda_1/\lambda_2$.
Since $X_1$, given the sum $X_1 + X_2 = n$, is binomial $\mathcal{B}in(n, p)$ with $p = \frac{\lambda_1 t_1}{\lambda_1 t_1 + \lambda_2 t_2}$ (Exercise 5.5), the strategy is to find the confidence interval for $p$ and, from its confidence bounds $LB_p$ and $UB_p$, work out the bounds for the ratio $\lambda$:
$$LB_\lambda = \frac{LB_p}{1 - LB_p}\,\frac{t_2}{t_1}, \qquad UB_\lambda = \frac{UB_p}{1 - UB_p}\,\frac{t_2}{t_1}.$$
Several methods for finding $LB_p$ and $UB_p$ are covered in Chap. 7. Note that there $\hat p = X_1/n$ and $n = X_1 + X_2$.
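The map from bounds on $p$ to bounds on $\lambda$ follows from solving the expression for $p$ for the ratio of rates. A short sketch of the algebra:

```latex
\begin{align*}
p = \frac{\lambda_1 t_1}{\lambda_1 t_1 + \lambda_2 t_2}
\;\Longrightarrow\;
\frac{p}{1-p} = \frac{\lambda_1 t_1}{\lambda_2 t_2}
\;\Longrightarrow\;
\lambda = \frac{\lambda_1}{\lambda_2} = \frac{p}{1-p}\,\frac{t_2}{t_1}.
\end{align*}
```

Since $p/(1-p)$ is increasing in $p$ on $(0, 1)$, the lower bound for $p$ is carried to the lower bound for $\lambda$ and the upper bound to the upper bound, with the coverage unchanged.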
The design question can be addressed as well, but the “sample size” formulation needs to be expressed in terms of the sampling durations $t_1$ and $t_2$. The sampling time frames $t_1$ and $t_2$, if assumed equal, can be determined on the basis of the elicited precision for the confidence interval and preliminary estimates of the rates. Let $\lambda_1$ and $\lambda_2$ be preexperimental assessments of the rates and let the precision be elicited in the form of (a) the length of the interval $UB_\lambda - LB_\lambda = w$ or (b) the ratio of the bounds $UB_\lambda/LB_\lambda = w$.
Then, to achieve $(1-\alpha)100\%$ confidence with the elicited precision $w$, the sampling time frame required is
(a)
$$t\,(= t_1 = t_2) = \frac{z_{1-\alpha/2}^2\,(1/\lambda_1 + 1/\lambda_2)}{\arcsin\left(\frac{\lambda_2}{\lambda_1} \times \frac{w}{2}\right)}$$
and
(b)
$$t\,(= t_1 = t_2) = \frac{4 z_{1-\alpha/2}^2\,(1/\lambda_1 + 1/\lambda_2)}{\log^2(w)}.$$
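Formula (b) can be recovered by a short argument on the log scale. The sketch below assumes a normal-theory interval for $\log(\lambda_1/\lambda_2)$ of the form $\log\hat\lambda \pm z_{1-\alpha/2}\, s$, with the large-sample approximation $\mathrm{Var}(\log X_i) \approx 1/(\lambda_i t)$; the symbol $s$ is introduced here for illustration only.

```latex
\begin{align*}
s^2 &\approx \frac{1}{\lambda_1 t} + \frac{1}{\lambda_2 t}
     = \frac{1}{t}\left(\frac{1}{\lambda_1} + \frac{1}{\lambda_2}\right), \\
\frac{UB_\lambda}{LB_\lambda} &= \exp\{2 z_{1-\alpha/2}\, s\} = w
\;\Longrightarrow\; s = \frac{\log w}{2 z_{1-\alpha/2}}, \\
t &= \frac{1}{s^2}\left(\frac{1}{\lambda_1} + \frac{1}{\lambda_2}\right)
   = \frac{4 z_{1-\alpha/2}^2\,(1/\lambda_1 + 1/\lambda_2)}{\log^2(w)}.
\end{align*}
```

The required time frame grows with the square of the normal quantile and shrinks as the tolerated ratio of bounds $w$ increases.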
Example 10.14. Wire Failures. Price and Bonett (2000) provide an example with data from Gardner and Ringlee (1968), who found that bare wire had $X_1 = 69$ failures in a sample of $t_1 = 1079.6$ thousand foot-years, and a polyethylene-covered tree wire had $X_2 = 12$ failures in a sample of $t_2 = 467.9$ thousand foot-years. We are interested in a 95% confidence interval for the ratio of population failure rates.
The associated MATLAB file ratiopoissons.m calculates the 95% confidence interval for the ratio $\lambda = \lambda_1/\lambda_2$ using Wilson’s proposal (“add two successes and two failures”). There, $\hat p = (X_1 + 2)/(n + 4)$, and the interval for $p$ is [0.7564, 0.9141]. After transforming the bounds to the $\lambda$ domain, the final interval is [1.3461, 4.6147].
Suppose we want to replicate this study using a new shipment of each type of wire. We want to estimate the failure rate ratio with 99% confidence and $UB_\lambda/LB_\lambda = 2$. Using $\lambda_1 = 69/1079.6 = 0.0639$ and $\lambda_2 = 12/467.9 = 0.0256$ as our planning estimates of $\lambda_1$ and $\lambda_2$, we would sample
$$t = \frac{4(2.5758)^2\,(1/0.0639 + 1/0.0256)}{\log^2(2)} \approx 3018$$
thousand foot-years from each shipment (the value 3018 is obtained with unrounded rates). If we want to complete the study in $k$ years, then we would sample $3018/k$ thousand linear feet of wire from each shipment.
%CI for Ratio of Two Poissons
X1 = 69; t1 = 1079.6;
X2 = 12; t2 = 467.9;
n = X1 + X2;
phat = X1/n;                 %0.8519
phat1 = (X1 + 2)/(n + 4);    %0.8353
qhat1 = 1 - phat1;           %0.1647
% Agresti-Coull CI for prop was selected.
LBp = phat1 - norminv(0.975)*sqrt(phat1*qhat1/(n+4))   %0.7564
UBp = phat1 + norminv(0.975)*sqrt(phat1*qhat1/(n+4))   %0.9141
LBlam = LBp/(1 - LBp) * t2/t1;   %back to lambda
UBlam = UBp/(1 - UBp) * t2/t1;
[LBlam, UBlam]               %[1.3461  4.6147]
%Frame size in Poisson Sampling
lambar1 = 69/1079.6;         %0.0639
lambar2 = 12/467.9;          %0.0256
w = 2;
td = 4 * norminv(0.995)^2 * (1/lambar1 + 1/lambar2)/...
    (asin(lambar2/lambar1 * w/2));        %3511.8
tr = 4 * norminv(0.995)^2 *...
    (1/lambar1 + 1/lambar2)/(log(w))^2;   %3018.1
Cox (1953) gives an approximate test and confidence interval for the ratio that uses an $F$ distribution. He shows that the statistic
$$F = \frac{t_1 \lambda_1}{t_2 \lambda_2}\cdot\frac{X_2 + 1/2}{X_1 + 1/2}$$
has an approximate $F$ distribution with $2X_1 + 1$ and $2X_2 + 1$ degrees of freedom. From this, an approximate $(1-\alpha)100\%$ confidence interval for $\lambda_1/\lambda_2$ is
$$\left[\frac{t_2}{t_1}\cdot\frac{X_1 + 1/2}{X_2 + 1/2}\, F_{2X_1+1,\,2X_2+1,\,\alpha/2},\;\; \frac{t_2}{t_1}\cdot\frac{X_1 + 1/2}{X_2 + 1/2}\, F_{2X_1+1,\,2X_2+1,\,1-\alpha/2}\right].$$
In the context of Example 10.14, the 95% confidence interval for the ratio $\lambda_1/\lambda_2$ is [1.3932, 4.7497].
%Cox
LBlamc = t2/t1*(X1+1/2)/(X2+1/2)*finv(0.025, 2*X1+1, 2*X2+1);
UBlamc = t2/t1*(X1+1/2)/(X2+1/2)*finv(0.975, 2*X1+1, 2*X2+1);
[LBlamc, UBlamc]    %[1.3932  4.7497]
Note that this interval does not contain 1, which is equivalent to a rejection of $H_0: \lambda_1 = \lambda_2$ in a test against the two-sided alternative, at the level $\alpha = 0.05$.
The test of $H_0: \lambda_1 = \lambda_2$ can be conducted using the statistic
$$F = \frac{t_1}{t_2}\cdot\frac{X_2 + 1/2}{X_1 + 1/2},$$
which under $H_0$ has an $F$ distribution with $df_1 = 2X_1 + 1$ and $df_2 = 2X_2 + 1$ degrees of freedom.
Alternative          α-level rejection region                                  p-value
H1: λ1 < λ2          [F_{df1,df2,1−α}, ∞)                                      1-fcdf(F,df1,df2)
H1: λ1 ≠ λ2          [0, F_{df1,df2,α/2}] ∪ [F_{df1,df2,1−α/2}, ∞)             2*fcdf(min(F,1/F),df1,df2)
H1: λ1 > λ2          [0, F_{df1,df2,α}]                                        fcdf(F,df1,df2)
In Example 10.14, the failure rate $\lambda_1$ for the bare wire is found to be significantly larger (p-value of 0.00066) than that of the polyethylene-covered wire, $\lambda_2$.
%test against H_1: lambda1 > lambda2
pval = fcdf(t1/t2*(X2+1/2)/(X1+1/2), 2*X1 + 1, 2*X2 + 1)
%6.6417e-004
10.8 Equivalence Tests*
In standard testing of two means, the goal is to show that one population mean
is significantly smaller, larger, or different than the other. The null hypothesis
is that there is no difference between the means. By not rejecting the null,
the equality of means is not established – the test simply did not find enough
statistical evidence for the alternative hypothesis. Absence of evidence is not
evidence of absence.
In many situations (drug and medical procedure testing, device performance, etc.), one wishes to test the equivalence hypothesis, which states that the population means or population proportions differ by no more than a small tolerance value preset by a regulatory agency. If, for example, manufacturers of a generic drug are able to demonstrate bioequivalence to the brand-name product, they do not need to conduct costly clinical trials in order to demonstrate the safety and efficacy of their generic product. More importantly, established bioequivalence protects the public from unsafe or ineffective drugs.
In this kind of inference it is desired that “no difference” constitutes the research hypothesis $H_1$ and that the significance level $\alpha$ relates to the probability of falsely rejecting the hypothesis that there is a difference, that is, of declaring equivalence when in fact the means are not equivalent. In other words, we want to control the type I error and design the power properly in this context.
In drug equivalence testing typical measurements are the area under the concentration curve (AUC) or the maximum concentration ($C_{max}$). The two drugs are bioequivalent if the population means of the AUC and $C_{max}$ are sufficiently close.
Let $\eta_T$ denote the population mean AUC for the generic (test) drug and let $\eta_R$ denote the population mean for the brand-name (reference) drug.
We are interested in testing
$$H_0: \eta_T/\eta_R < \delta_L \ \text{ or } \ \eta_T/\eta_R > \delta_U \quad \text{versus} \quad H_1: \delta_L \le \eta_T/\eta_R \le \delta_U,$$
where $\delta_L$ and $\delta_U$ are the lower and upper tolerance limits, respectively. The FDA recommends $\delta_L = 4/5$ and $\delta_U = 5/4$ (FDA, 2001).
This hypothesis can be tested in the domain of original measurements (Berger and Hsu, 1996) or after taking the logarithm. This second approach is more common in practice since (i) AUC and $C_{max}$ measurements are consistent with the lognormal distribution (the pharmacokinetic rationale based on multiplicative compartmental models) and (ii) normal theory can be applied to logarithms of observations. The FDA also recommends a log transformation of data by providing three rationales: clinical, pharmacokinetic, and statistical (FDA, 2001, Appendix D).
Since for lognormal distributions the mean $\eta$ is connected with the parameters of the associated normal distribution, $\mu$ and $\sigma^2$ (p. 218), by assuming equal variances we get $\eta_T = \exp\{\mu_T + \sigma^2/2\}$ and $\eta_R = \exp\{\mu_R + \sigma^2/2\}$. The equivalence hypotheses for the log-transformed data now take the form
$$H_0: \mu_T - \mu_R \le \theta_L \ \text{ or } \ \mu_T - \mu_R \ge \theta_U \quad \text{versus} \quad H_1: \theta_L < \mu_T - \mu_R < \theta_U,$$
where $\theta_L = \log(\delta_L)$ and $\theta_U = \log(\delta_U)$ are known constants. Note that if $\delta_U = 1/\delta_L$, then the bounds $\theta_L$ and $\theta_U$ are symmetric about zero, $\theta_L = -\theta_U$.
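The step from the ratio of means to the difference of log-scale means, and the numerical limits implied by the FDA tolerance values quoted above, can be written out as follows (a worked sketch; natural logarithms are used throughout).

```latex
\begin{align*}
\frac{\eta_T}{\eta_R} &= \frac{\exp\{\mu_T + \sigma^2/2\}}{\exp\{\mu_R + \sigma^2/2\}}
   = \exp\{\mu_T - \mu_R\}, \\
\theta_L &= \log(4/5) \approx -0.2231, \qquad
\theta_U = \log(5/4) \approx 0.2231.
\end{align*}
```

The common variance cancels in the ratio, which is why the equal-variance assumption is what makes the hypotheses about $\eta_T/\eta_R$ and about $\mu_T - \mu_R$ correspond.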
Equivalence testing is an active research area and many classical and
Bayesian solutions exist, as dictated by experimental designs in practice. The
monograph by Wellek (2010) provides comprehensive coverage. We focus only
on the case of testing the equivalence of two population means when unknown
population variances are the same.
TOST. Schuirmann (1981) proposed two one-sided tests (TOSTs) for testing bioequivalence. Two t-statistics are calculated:
$$t_L = \frac{\bar X_T - \bar X_R - \theta_L}{s_p\sqrt{1/n_1 + 1/n_2}} \qquad \text{and} \qquad t_U = \frac{\bar X_T - \bar X_R - \theta_U}{s_p\sqrt{1/n_1 + 1/n_2}},$$
where $\bar X_T$ and $\bar X_R$ are the test and reference means, $n_1$ and $n_2$ are the test and reference sample sizes, and $s_p$ is the pooled sample standard deviation, as on p. 357. Note that here the test statistic involves the acceptable bounds $\theta_L$ and $\theta_U$ in the numerator, unlike the standard two-sample t-test, where the numerator would be $\bar X_T - \bar X_R$.
The TOST is now carried out as follows.
(i) Using the statistic $t_L$, test $H_0': \mu_T - \mu_R = \theta_L$ versus $H_1': \mu_T - \mu_R > \theta_L$.
(ii) Using the statistic $t_U$, test $H_0'': \mu_T - \mu_R = \theta_U$ versus $H_1'': \mu_T - \mu_R < \theta_U$.
(iii) Reject $H_0$ at level $\alpha$, that is, declare the drugs equivalent, if both hypotheses $H_0'$ and $H_0''$ are rejected at level $\alpha$, that is, if
$$t_L > t_{n_1+n_2-2,\,1-\alpha} \qquad \text{and} \qquad t_U < t_{n_1+n_2-2,\,\alpha}.$$
Equivalently, if $p_L$ and $p_U$ are the p-values associated with the statistics $t_L$ and $t_U$, $H_0$ is rejected when $\max\{p_L, p_U\} < \alpha$.
Westlake’s Confidence Interval. An equivalent methodology to test for equivalence is Westlake’s confidence interval (Westlake, 1976). Bioequivalence is established at significance level $\alpha$ if a t-interval of confidence $(1-2\alpha)100\%$ is contained in the interval $(\theta_L, \theta_U)$:
$$\left[\bar X_T - \bar X_R - t_{n_1+n_2-2,\,1-\alpha}\, s_p\sqrt{1/n_1 + 1/n_2},\;\; \bar X_T - \bar X_R + t_{n_1+n_2-2,\,1-\alpha}\, s_p\sqrt{1/n_1 + 1/n_2}\right] \subset (\theta_L, \theta_U).$$
Here, the usual $t_{n_1+n_2-2,\,1-\alpha/2}$ is replaced by $t_{n_1+n_2-2,\,1-\alpha}$, and Westlake’s interval coincides with the standard $(1-2\alpha)100\%$ confidence interval for a difference of normal means.
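The TOST and Westlake's interval can be run side by side to see that they lead to the same decision. The sketch below uses hypothetical log-scale summary statistics chosen only for illustration (they are not data from the book), with the FDA limits 4/5 and 5/4; all variable names are introduced here, and tcdf and tinv assume the Statistics Toolbox.

```matlab
% Hypothetical summary statistics on the log scale (illustrative only)
xbarT = 4.52;  n1 = 12;     % test (generic) drug: mean log AUC, sample size
xbarR = 4.50;  n2 = 12;     % reference (brand-name) drug
sp    = 0.15;               % pooled sample standard deviation
alpha = 0.05;

thetaL = log(4/5);  thetaU = log(5/4);   % tolerance limits on the log scale
df = n1 + n2 - 2;
se = sp*sqrt(1/n1 + 1/n2);

% TOST: two one-sided t-tests
tL = (xbarT - xbarR - thetaL)/se;
tU = (xbarT - xbarR - thetaU)/se;
pL = 1 - tcdf(tL, df);      % alternative: muT - muR > thetaL
pU = tcdf(tU, df);          % alternative: muT - muR < thetaU
equivTOST = max(pL, pU) < alpha;

% Westlake: the (1 - 2*alpha)100% t-interval must lie inside (thetaL, thetaU)
half = tinv(1 - alpha, df)*se;
ci = (xbarT - xbarR) + [-half, half];
equivCI = ci(1) > thetaL && ci(2) < thetaU;

fprintf('tL = %.3f, tU = %.3f, max p = %.4g\n', tL, tU, max(pL, pU));
fprintf('CI = [%.4f, %.4f], equivalent: %d / %d\n', ci, equivTOST, equivCI);
```

The two flags always agree, since $t_L > t_{n_1+n_2-2,\,1-\alpha}$ is the same inequality as the lower confidence bound exceeding $\theta_L$, and similarly for the upper bound.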
Example 10.15. Equivalence of Generic and Brand-Name Drugs. A manufacturer wishes to demonstrate that their generic drug for a particular metabolic disorder is equivalent to a brand-name drug. One indication of the disorder is an abnormally low concentration of levocarnitine, an amino acid derivative, in the plasma. Treatment with the brand-name drug substantially increases this concentration.