10.6 Risks: Differences, Ratios, and Odds Ratios

                     Disease present (D)   No disease present (D^c)   Total
Exposed (E)                   a                       b               n1 = a + b
Nonexposed (E^c)              c                       d               n2 = c + d
Total                    m1 = a + c              m2 = b + d           n = a + b + c + d

In clinical trial studies, the risk factor status (E/E^c) can be replaced by treatment/control or new treatment/old treatment, while the disease status (D/D^c) can be replaced by improvement/nonimprovement.
Remark. In the context of epidemiology, the studies leading to tabulated data can be prospective or retrospective. In a prospective study, a group of n disease-free individuals is identified and followed over a period of time. At the end of the study, the group, typically called the cohort, is assessed and tabulated with respect to disease development and exposure to the risk factor of interest.
In a retrospective study, groups of m1 individuals with the disease (cases) and m2 disease-free individuals (controls) are identified and their prior exposure histories are assessed. In this case, the table summarizes the numbers of cases and controls who were exposed to the risk factor under consideration.

10.6.1 Risk Differences
Let p 1 and p 2 be the population risks of a disease for exposed and nonexposed (control) subjects. These are probabilities that the subjects will develop
the disease during the fixed interval of time for the two groups, exposed and
nonexposed.
Let pˆ 1 = a/n 1 be an estimator of the risk of a disease for exposed subjects
and pˆ 2 = c/n 2 be an estimator of the risk of that disease for control subjects.
The (1 − α)100% confidence interval for the risk difference coincides with
the confidence interval for the difference of proportions from p. 378:

$$\hat p_1 - \hat p_2 \pm z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.$$

Sometimes, better precision is achieved by a confidence interval with continuity corrections:

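$$\left[\ \hat p_1 - \hat p_2 \pm \left(\frac{1}{2n_1} + \frac{1}{2n_2}\right) - z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}},\right.$$

$$\left.\ \hat p_1 - \hat p_2 \pm \left(\frac{1}{2n_1} + \frac{1}{2n_2}\right) + z_{1-\alpha/2}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}\ \right],$$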

where the sign of the correction factor 1/(2n 1 ) + 1/(2n 2 ) is taken as “+” if pˆ 1 −
pˆ 2 < 0 and as “−” if pˆ 1 − pˆ 2 > 0. The recommended sample sizes for the validity
of the interval should satisfy min{ n 1 p 1 (1 − p 1 ), n 2 p 2 (1 − p 2 )} ≥ 10. For large
sample sizes, the difference between “continuity-corrected” and uncorrected
intervals is negligible.
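
The two intervals can be compared numerically. The following Python sketch is an illustration rather than part of the book's MATLAB code; the function name `risk_difference_ci` and its arguments are assumptions made here, and the continuity term follows the sign rule quoted above. The counts are those of the Framingham table in Example 10.12.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(a, n1, c, n2, alpha=0.05, continuity=False):
    """Normal-approximation interval for p1 - p2.

    a of n1 exposed and c of n2 nonexposed subjects have the disease.
    With continuity=True the center is shifted by 1/(2*n1) + 1/(2*n2),
    using "+" when the estimated difference is negative and "-" when positive.
    """
    p1, p2 = a / n1, c / n2
    diff = p1 - p2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if continuity:
        cc = 1 / (2 * n1) + 1 / (2 * n2)
        if diff > 0:
            diff -= cc
        elif diff < 0:
            diff += cc
    return diff - half, diff + half


plain = risk_difference_ci(95, 296, 173, 1067)
corrected = risk_difference_ci(95, 296, 173, 1067, continuity=True)
print("uncorrected:", plain)
print("corrected:  ", corrected)
```

With samples of this size the correction term is about 0.002, which illustrates why the two intervals are nearly indistinguishable for large n1 and n2.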

10.6.2 Risk Ratio
The risk ratio in a population is the quantity R = p 1 /p 2 . It is estimated by
r = pˆ 1 / pˆ 2 . The empirical distribution of r does not have a simple form, and,
moreover, it is typically skewed (Fig. 10.3b). If the logarithm is taken, the risk
ratio is “symmetrized,” the log ratio is equivalent to the difference between
logarithms, and, given the independence of populations, the CLT applies. It is
evident in Fig. 10.3c that the log risk ratios are well approximated by a normal
distribution.

Fig. 10.3 Two samples of size 10,000 are generated from Bin(80, 0.21) and Bin(60, 0.25) populations and risks p̂1 and p̂2 are estimated for each pair. The panels show histograms of (a) risk differences, (b) risk ratios, and (c) log risk ratios.

The following MATLAB code (simulrisks.m) simulates 10,000 pairs from Bin(60, 0.25) and Bin(80, 0.21) populations representing exposed and nonexposed subjects, respectively. For each pair the risks are estimated, and histograms of risk differences, risk ratios, and log risk ratios are shown in Fig. 10.3a–c.
% numbers of diseased subjects among 60 exposed and 80 nonexposed
disexposed = binornd(60, 0.25, [1 10000]);
disnonexposed = binornd(80, 0.21, [1 10000]);
% estimated risks for each simulated pair
p1s = disexposed/60; p2s = disnonexposed/80;
figure; hist(p1s - p2s, 25)          % (a) risk differences
figure; hist(p1s./p2s, 25)           % (b) risk ratios
figure; hist( log( p1s./p2s ), 25 )  % (c) log risk ratios
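
The symmetrizing effect of the logarithm can also be checked numerically instead of visually. The sketch below repeats the simulation in Python with the standard library only and reports the sample skewness of the risk ratios and of their logarithms. The helper names `skewness` and `binomial` and the seed are choices made for this illustration, not part of simulrisks.m.

```python
import random
from math import log
from statistics import mean, pstdev


def skewness(xs):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return mean((x - m) ** 3 for x in xs) / s ** 3


def binomial(rng, n, p):
    """One Bin(n, p) draw, generated as a sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(2024)  # arbitrary seed, used only for reproducibility
ratios, log_ratios = [], []
for _ in range(10000):
    x1 = binomial(rng, 60, 0.25)  # diseased among 60 exposed
    x2 = binomial(rng, 80, 0.21)  # diseased among 80 nonexposed
    if x1 == 0 or x2 == 0:        # the (log) ratio is undefined; skip such pairs
        continue
    r = (x1 / 60) / (x2 / 80)
    ratios.append(r)
    log_ratios.append(log(r))

print("skewness of risk ratios:    ", round(skewness(ratios), 3))
print("skewness of log risk ratios:", round(skewness(log_ratios), 3))
```

A skewness near zero for the log risk ratios, against a clearly positive value for the raw ratios, is what the normal approximation in the log domain relies on.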

10.6.3 Odds Ratios
For a particular proportion, p, the odds are defined as p/(1 − p). For two proportions p1 and p2, the odds ratio is defined as

$$O = \frac{p_1/(1-p_1)}{p_2/(1-p_2)},$$

and its sample counterpart is

$$o = \frac{\hat p_1/(1-\hat p_1)}{\hat p_2/(1-\hat p_2)}.$$
As evident in Fig. 10.4, the odds ratio is symmetrized by the log transformation, and it is the log domain where the normal approximations are used. The sample standard deviation for log o is

$$s_{\log o} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}},$$

and the (1 − α)100% confidence interval for the log odds ratio is

$$\left[\log o - z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}},\ \ \log o + z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right].$$

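Of course, the confidence interval for the odds ratio is obtained by taking the exponents of the bounds:

$$\left[\exp\left\{\log o - z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right\},\ \ \exp\left\{\log o + z_{1-\alpha/2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right\}\right].$$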

Many authors argue that only odds ratios should be reported and used
because of their superior properties over risk differences and risk ratios (Edwards, 1963; Mosteller, 1968). For small sample sizes replacing counts a, b, c,
and d by a + 1/2, b + 1/2, c + 1/2, and d + 1/2 leads to a more stable inference.
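
As a rough illustration of the interval and of the 1/2 adjustment, the Python sketch below computes the odds ratio and its confidence bounds from the four cell counts. The function name `odds_ratio_ci` and the flag `add_half` are assumptions introduced here. The first call uses the Framingham counts of Example 10.12; the second table is hypothetical and contains a zero cell, for which the unadjusted formula cannot be evaluated.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, alpha=0.05, add_half=False):
    """Odds ratio o = (a/b)/(c/d) with a normal-approximation CI on the log scale.

    With add_half=True every count is replaced by count + 1/2, the
    small-sample adjustment mentioned in the text.
    """
    if add_half:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    o = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return o, exp(log(o) - z * se), exp(log(o) + z * se)


print(odds_ratio_ci(95, 201, 173, 894))           # Framingham counts
print(odds_ratio_ci(5, 0, 2, 8, add_half=True))   # hypothetical table with a zero cell
```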


Fig. 10.4 For the data leading to Fig. 10.3, the histograms of (a) odds ratios and (b) log odds ratios are shown.

Measure           Parameter                      Estimator                      St. deviation
Risk difference   D = p1 − p2                    d = p̂1 − p̂2                    s_d = sqrt( p1 q1/n1 + p2 q2/n2 )
Relative risk     R = p1/p2                      r = p̂1/p̂2                      s_log r = sqrt( q1/(n1 p1) + q2/(n2 p2) )
Odds ratio        O = [p1/(1−p1)]/[p2/(1−p2)]    o = [p̂1/(1−p̂1)]/[p̂2/(1−p̂2)]    s_log o = sqrt( 1/a + 1/b + 1/c + 1/d )

Here q_i = 1 − p_i, i = 1, 2.

Interpretations of values for RR and OR are provided in the following table:

Value in      Effect of exposure
[0, 0.4)      Strong benefit
[0.4, 0.6)    Moderate benefit
[0.6, 0.9)    Weak benefit
[0.9, 1.1]    No effect
(1.1, 1.6]    Weak hazard
(1.6, 2.5]    Moderate hazard
> 2.5         Strong hazard
Example 10.12. Framingham Data. The table below gives the coronary heart disease status after 18 years, by level of systolic blood pressure (SBP). The levels of SBP ≥ 165 are considered as an exposure to a risk factor.

SBP (mmHg)   Coronary disease   No coronary disease   Total
≥ 165               95                  201             296
< 165              173                  894            1067
Total              268                 1095            1363

Find 95% confidence intervals for the risk difference, risk ratio, and odds ratio.
The function risk.m calculates confidence intervals for risk differences, risk ratios, and odds ratios and will be used in this example.


function [rd rdl rdu rr rrl rru or orl oru] = risk(a, b, c, d, alpha)
% --------------------------------------------------------
%              |   Disease    No disease   |   Total
% --------------------------------------------------------
% Exposed      |      a           b        |    n1
% Nonexposed   |      c           d        |    n2
% --------------------------------------------------------
if nargin < 5
    alpha = 0.05;
end
%--------
n1 = a + b;
n2 = c + d;
hatp1 = a/n1; hatp2 = c/n2;
%-------risk difference (rd) and CI [rdl, rdu] ---------
rd = hatp1 - hatp2;
stdrd = sqrt(hatp1 * (1-hatp1)/n1 + hatp2 * (1- hatp2)/n2 );
rdl = rd - norminv(1-alpha/2) * stdrd;
rdu = rd + norminv(1-alpha/2) * stdrd;
%----------risk ratio (rr) and CI [rrl, rru] -----------
rr = hatp1/hatp2;
lrr = log(rr);
stdlrr = sqrt(b/(a * n1) + d/(c*n2));
lrrl = lrr - norminv(1-alpha/2)*stdlrr;
rrl = exp(lrrl);
lrru = lrr + norminv(1-alpha/2)*stdlrr;
rru = exp(lrru);
%---------odds ratio (or) and CI [orl, oru] ------------
or = ( hatp1/(1-hatp1) )/(hatp2/(1-hatp2));
lor = log(or);
stdlor = sqrt(1/a + 1/b + 1/c + 1/d);
lorl = lor - norminv(1-alpha/2)*stdlor;
orl = exp(lorl);
loru = lor + norminv(1-alpha/2)*stdlor;
oru = exp(loru);

The solution is:
[rd rdl rdu rr rrl rru or orl oru] = risk(95,201,173,894)
%rd = 0.1588
%[rdl, rdu] = [0.1012, 0.2164]
%rr = 1.9795
%[rrl, rru] = [1.5971, 2.4534]
%or = 2.4424
%[orl, oru] = [1.8215, 3.2750]

Example 10.13. Retrospective Analysis of Smoking Habits. This example is adapted from Johnson and Albert (1999), who use data collected in a study by Dorn (1954). A sample of 86 lung-cancer patients and a sample of 86 controls were questioned about their smoking habits. The two groups were chosen to represent random samples from a subpopulation of lung-cancer patients and an otherwise similar population of cancer-free individuals. Of the cancer patients, 83 out of 86 were smokers; among the control group, 72 out of 86 were smokers. The scientific question of interest was to assess the difference between the smoking habits in the two groups. Uniform priors on the population proportions were used as a noninformative choice.
model{
for(i in 1:2){
r[i] ~ dbin(p[i],n[i])
p[i] ~ dunif(0,1)
}
RD <- p[1] - p[2]
RD.gt0 <- step(RD)
RR <- p[1]/p[2]
RR.gt1 <- step(RR - 1)
OR <- (p[1]/(1-p[1]))/(p[2]/(1-p[2]))
OR.gt1 <- step(OR - 1)
}
DATA
list(r=c(83,72),n=c(86,86))
INITS
#Generate Inits

node     mean     sd        MC error   val2.5pc   median   val97.5pc   start   sample
OR       5.818    4.556     0.01398    1.556      4.613    17.29       1001    100000
OR.gt1   0.9978   0.04675   1.469E-4   1.0        1.0      1.0         1001    100000
RD       0.125    0.0455    1.478E-4   0.0385     0.1237   0.2179      1001    100000
RD.gt0   0.9978   0.04675   1.469E-4   1.0        1.0      1.0         1001    100000
RR       1.153    0.06276   2.038E-4   1.044      1.148    1.291       1001    100000
RR.gt1   0.9978   0.04675   1.469E-4   1.0        1.0      1.0         1001    100000
p[1]     0.9546   0.02209   7.06E-5    0.9022     0.958    0.9873      1001    100000
p[2]     0.8296   0.03991   1.26E-4    0.7444     0.8322   0.9002      1001    100000

Note that 95% credible sets for the risk ratio and odds ratio are above 1, and that the set for the risk difference does not contain 0. By all three measures the proportion of smokers among subjects with cancer is significantly larger than the proportion among the controls. In Bayesian testing the hypotheses H1: p1 > p2, H1: p1/p2 > 1, and H1: [p1/(1 − p1)]/[p2/(1 − p2)] > 1 have posterior probabilities of 0.9978 each. Therefore, in this retrospective study, smoking status is indicated as a significant risk factor for lung cancer.
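
Because the priors are uniform and the likelihoods binomial, each posterior is a beta distribution, Beta(r + 1, n − r + 1), so the summaries above can be approximated without MCMC. The Python sketch below draws directly from the two posteriors. It is a cross-check under that conjugacy argument, not a reproduction of the WinBUGS run; the seed and variable names are arbitrary choices made here.

```python
import random

rng = random.Random(1)           # arbitrary seed, for reproducibility only
r, n = (83, 72), (86, 86)        # smokers and group sizes: cases, controls
draws = 100000

rd, rr, odds = [], [], []
for _ in range(draws):
    # Uniform prior + binomial likelihood gives a Beta(r + 1, n - r + 1) posterior.
    p1 = rng.betavariate(r[0] + 1, n[0] - r[0] + 1)
    p2 = rng.betavariate(r[1] + 1, n[1] - r[1] + 1)
    rd.append(p1 - p2)
    rr.append(p1 / p2)
    odds.append((p1 / (1 - p1)) / (p2 / (1 - p2)))

mean_rd = sum(rd) / draws
mean_rr = sum(rr) / draws
mean_or = sum(odds) / draws
prob_rd_gt0 = sum(x > 0 for x in rd) / draws   # posterior probability of p1 > p2

print("posterior mean RD:", round(mean_rd, 4))
print("posterior mean RR:", round(mean_rr, 4))
print("posterior mean OR:", round(mean_or, 3))
print("P(RD > 0 | data): ", round(prob_rd_gt0, 4))
```

Up to Monte Carlo error, the posterior means and the probability that the risk difference is positive should be close to the corresponding rows of the WinBUGS output.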

10.7 Two Poisson Rates*

There are several methods for devising confidence intervals on differences or
the ratios of two Poisson rates. We will focus on the method for the ratio that
modifies well-known binomial confidence intervals.
Let X1 ∼ Poi(λ1 t1) and X2 ∼ Poi(λ2 t2) be two Poisson counts with rates λ1 and λ2 observed during time intervals of length t1 and t2.
We are interested in the confidence interval for the ratio λ = λ1/λ2.
Since X1, given the sum X1 + X2 = n, is binomial Bin(n, p) with p = λ1 t1/(λ1 t1 + λ2 t2) (Exercise 5.5), the strategy is to find the confidence interval for p and, from its confidence bounds LBp and UBp, work out the bounds for the ratio λ:

$$LB_\lambda = \frac{LB_p}{1 - LB_p}\cdot\frac{t_2}{t_1}, \qquad UB_\lambda = \frac{UB_p}{1 - UB_p}\cdot\frac{t_2}{t_1}.$$

For finding the LB p and UB p several methods are covered in Chap. 7. Note
that there pˆ = X 1 /n and n = X 1 + X 2 .
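
The step from the bounds on p to the bounds on λ is a short algebraic inversion, spelled out here for completeness:

```latex
\begin{aligned}
p &= \frac{\lambda_1 t_1}{\lambda_1 t_1 + \lambda_2 t_2},
\qquad
1 - p = \frac{\lambda_2 t_2}{\lambda_1 t_1 + \lambda_2 t_2}, \\[4pt]
\frac{p}{1-p} &= \frac{\lambda_1 t_1}{\lambda_2 t_2} = \lambda\,\frac{t_1}{t_2}
\quad\Longrightarrow\quad
\lambda = \frac{p}{1-p}\cdot\frac{t_2}{t_1}.
\end{aligned}
```

Since p/(1 − p) is increasing in p on (0, 1), the lower bound for p maps to the lower bound for λ and the upper bound to the upper bound.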
The design question can be addressed as well, but the “sample size” formulation needs to be expressed in terms of sampling durations t 1 and t 2 .
The sampling time frames t 1 and t 2 , if assumed equal, can be determined on
the basis of elicited precision for the confidence interval and preliminary estimates of the rates. Let λ1 and λ2 be preexperimental assessments of the rates
and let the precision be elicited in the form of (a) the length of the interval
UBλ − LBλ = w or (b) the ratio of the bounds UBλ /LBλ = w.
Then, for achieving (1 − α)100% confidence with an interval of length w,
the sampling time frame required is
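(a)

$$t\ (= t_1 = t_2) = \frac{z_{1-\alpha/2}^2\,(1/\lambda_1 + 1/\lambda_2)}{\arcsin\left(\dfrac{\lambda_2}{\lambda_1}\times\dfrac{w}{2}\right)}$$

and
(b)

$$t\ (= t_1 = t_2) = \frac{4 z_{1-\alpha/2}^2\,(1/\lambda_1 + 1/\lambda_2)}{\log^2(w)}.$$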

Example 10.14. Wire Failures. Price and Bonett (2000) provide an example with data from Gardner and Ringlee (1968), who found that bare wire had X1 = 69 failures in a sample of t1 = 1079.6 thousand foot-years, and a polyethylene-covered tree wire had X2 = 12 failures in a sample of t2 = 467.9 thousand foot-years. We are interested in a 95% confidence interval for the ratio of population failure rates.
The associated MATLAB file ratiopoissons.m calculates the 95% confidence interval for the ratio λ = λ1/λ2 using Wilson's proposal ("add two successes and two failures"). There, p̂ = (X1 + 2)/(n + 4), and the interval for p is [0.7564, 0.9141]. After transforming the bounds to the λ domain, the final interval is [1.3461, 4.6147].
Suppose we want to replicate this study using a new shipment of each type of wire. We want to estimate the failure rate ratio with 99% confidence and UBλ/LBλ = 2. Using λ1 = 69/1079.6 = 0.0639 and λ2 = 12/467.9 = 0.0256 as our planning estimates of λ1 and λ2, we would sample

$$t = \frac{4(2.5758)^2\,(1/0.0639 + 1/0.0256)}{\log^2(2)} \approx 3018$$

foot-years from each shipment (the value 3018 is obtained with unrounded rates). If we want to complete the study in k years, then we would sample 3018/k linear feet of wire from each shipment.

%CI for Ratio of Two Poissons
X1=69; t1 = 1079.6;
X2=12; t2=467.9;
n=X1 + X2;
phat = X1/n;               %0.8519
phat1 = (X1 +2)/(n + 4);   %0.8353
qhat1 = 1 - phat1;         %0.1647
% Agresti-Coull CI for prop was selected.
LBp=phat1-norminv(0.975)*sqrt(phat1*qhat1/(n+4)) %0.7564
UBp=phat1+norminv(0.975)*sqrt(phat1*qhat1/(n+4)) %0.9141
LBlam = LBp/(1 - LBp) * t2/t1; %back to lambda
UBlam = UBp/(1 - UBp) * t2/t1;
[LBlam, UBlam]             %[1.3461 4.6147]
%Frame size in Poisson Sampling
lambar1 = 69/1079.6;       %0.0639
lambar2 = 12/467.9;        %0.0256
w = 2;
td = 4 * norminv(0.995)^2 * (1/lambar1 + 1/lambar2)/...
     (asin(lambar2/lambar1 * w/2));          %3511.8
tr = 4 * norminv(0.995)^2 *...
     ( 1/lambar1 + 1/lambar2 )/(log(w))^2;   %3018.1

Cox (1953) gives an approximate test and confidence interval for the ratio
that uses an F distribution. He shows that the statistic

$$F = \frac{t_1 \lambda_1}{t_2 \lambda_2}\cdot\frac{X_2 + 1/2}{X_1 + 1/2}$$

has an approximate F distribution with 2X1 + 1 and 2X2 + 1 degrees of freedom. From this, an approximate (1 − α)100% confidence interval for λ1/λ2 is

$$\left[\frac{t_2}{t_1}\cdot\frac{X_1 + 1/2}{X_2 + 1/2}\,F_{2X_1+1,\,2X_2+1,\,\alpha/2},\ \ \frac{t_2}{t_1}\cdot\frac{X_1 + 1/2}{X_2 + 1/2}\,F_{2X_1+1,\,2X_2+1,\,1-\alpha/2}\right].$$


In the context of Example 10.14, the 95% confidence interval for the ratio λ1 /λ2
is [1.3932, 4.7497].
%Cox
LBlamc= t2/t1*(X1+1/2)/(X2+1/2)*finv(0.025, 2*X1+1, 2*X2+1);
UBlamc= t2/t1*(X1+1/2)/(X2+1/2)*finv(0.975, 2*X1+1, 2*X2+1);
[LBlamc, UBlamc]    %[1.3932 4.7497]

Note that this interval does not contain 1, which is equivalent to a rejection
of H0 : λ1 = λ2 in a test against the two-sided alternative, at the level α = 0.05.
The test of H0: λ1 = λ2 can be conducted using the statistic

$$F = \frac{t_1}{t_2}\cdot\frac{X_2 + 1/2}{X_1 + 1/2},$$

which under H0 has an F distribution with df1 = 2X1 + 1 and df2 = 2X2 + 1 degrees of freedom.

Alternative     α-level rejection region                            p-value
H1: λ1 < λ2     [F_{df1,df2,1−α}, ∞)                                1-fcdf(F,df1,df2)
H1: λ1 ≠ λ2     [0, F_{df1,df2,α/2}] ∪ [F_{df1,df2,1−α/2}, ∞)       2*fcdf(min(F,1/F),df1,df2)
H1: λ1 > λ2     [0, F_{df1,df2,α}]                                  fcdf(F,df1,df2)

In Example 10.14, the failure rate λ1 for the bare wire is found to be significantly larger (p-value of 0.00066) than that of polyethylene-covered wire, λ2.
%test against H_1: lambda1 > lambda2
pval = fcdf(t1/t2*(X2+1/2)/(X1+1/2), 2*X1 + 1, 2*X2 + 1)
%6.6417e-004
10.8 Equivalence Tests*
In standard testing of two means, the goal is to show that one population mean
is significantly smaller, larger, or different than the other. The null hypothesis
is that there is no difference between the means. By not rejecting the null,
the equality of means is not established – the test simply did not find enough
statistical evidence for the alternative hypothesis. Absence of evidence is not
evidence of absence.
In many situations (drug and medical procedure testing, device performance, etc.), one wishes to test the equivalence hypothesis, which states that the population means or population proportions differ by no more than a small tolerance value preset by a regulatory agency. If, for example, manufacturers of a generic drug are able to demonstrate bioequivalence to the brand-name product, they do not need to conduct costly clinical trials in order to demonstrate the safety and efficacy of their generic product. More importantly, established bioequivalence protects the public from unsafe or ineffective drugs.
In this kind of inference it is desired that "no difference" constitutes the research hypothesis H1 and that the significance level α relates to the probability of falsely rejecting the hypothesis that there is a difference when in fact the means are not equivalent. In other words, we want to control the type I error and design the power properly in this context.
In drug equivalence testing typical measurements are the area under the concentration curve (AUC) or maximum concentration (Cmax). The two drugs are bioequivalent if the population means of the AUC and Cmax are sufficiently close.
Let ηT denote the population mean AUC for the generic (test) drug and let ηR denote the population mean for the brand-name (reference) drug.
We are interested in testing

$$H_0: \eta_T/\eta_R < \delta_L \ \text{ or } \ \eta_T/\eta_R > \delta_U \quad \text{versus} \quad H_1: \delta_L \le \eta_T/\eta_R \le \delta_U,$$

where δL and δU are the lower and upper tolerance limits, respectively. The
FDA recommends δL = 4/5 and δU = 5/4 (FDA, 2001).
This hypothesis can be tested in the domain of original measurements
(Berger and Hsu, 1996) or after taking the logarithm. This second approach
is more common in practice since (i) AUC and Cmax measurements are consistent with the lognormal distribution (the pharmacokinetic rationale based
on multiplicative compartmental models) and (ii) normal theory can be applied
to logarithms of observations. The FDA also recommends a log-transformation
of data by providing three rationales: clinical, pharmacokinetic, and statistical
(FDA, 2001, Appendix D).
Since for lognormal distributions the mean η is connected with the parameters of the associated normal distribution, µ and σ² (p. 218), by assuming equal variances we get ηT = exp{µT + σ²/2} and ηR = exp{µR + σ²/2}. The equivalence hypotheses for the log-transformed data now take the form

$$H_0: \mu_T - \mu_R \le \theta_L \ \text{ or } \ \mu_T - \mu_R \ge \theta_U \quad \text{versus} \quad H_1: \theta_L < \mu_T - \mu_R < \theta_U,$$

where θL = log(δL ) and θU = log(δU ) are known constants. Note that if δU =
1/δL , then the bounds θL and θU are symmetric about zero, θL = −θU .
Equivalence testing is an active research area and many classical and
Bayesian solutions exist, as dictated by experimental designs in practice. The
monograph by Wellek (2010) provides comprehensive coverage. We focus only
on the case of testing the equivalence of two population means when unknown
population variances are the same.
TOST. Schuirmann (1981) proposed two one-sided tests (TOSTs) for testing bioequivalence. Two t-statistics are calculated:

$$t_L = \frac{\bar X_T - \bar X_R - \theta_L}{s_p\sqrt{1/n_1 + 1/n_2}} \quad \text{and} \quad t_U = \frac{\bar X_T - \bar X_R - \theta_U}{s_p\sqrt{1/n_1 + 1/n_2}},$$

where X̄T and X̄R are test and reference means, n1 and n2 are test and reference sample sizes, and s_p is the pooled sample standard deviation, as on p. 357. Note that here, the test statistic involves the acceptable bounds θL and θU in the numerator, unlike the standard two-sample t-test, where the numerator would be X̄T − X̄R.
The TOST is now carried out as follows.
(i) Using the statistic t_L, test H0′: µT − µR = θL versus H1′: µT − µR > θL.
(ii) Using the statistic t_U, test H0″: µT − µR = θU versus H1″: µT − µR < θU.
(iii) Reject H0 at level α, that is, declare the drugs equivalent, if both hypotheses H0′ and H0″ are rejected at level α, that is, if

$$t_L > t_{n_1+n_2-2,\,1-\alpha} \quad \text{and} \quad t_U < t_{n_1+n_2-2,\,\alpha}.$$

Equivalently, if p_L and p_U are the p-values associated with statistics t_L and t_U, H0 is rejected when max{p_L, p_U} < α.

Westlake’s Confidence Interval. An equivalent methodology to test for
equivalence is Westlake’s confidence interval (Westlake, 1976). Bioequivalence
is established at significance level α if a t-interval of confidence (1 − 2α)100%
is contained in the interval (θL , θU ):

$$\left[\bar X_T - \bar X_R - t_{n_1+n_2-2,\,1-\alpha}\, s_p\sqrt{1/n_1 + 1/n_2},\ \ \bar X_T - \bar X_R + t_{n_1+n_2-2,\,1-\alpha}\, s_p\sqrt{1/n_1 + 1/n_2}\right] \subset (\theta_L, \theta_U).$$

Here, the usual t n1 +n2 −2,1−α/2 is replaced by t n1 +n2 −2,1−α , and Westlake’s
interval coincides with the standard (1 − 2α)100% confidence interval for a
difference of normal means.
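
The agreement between TOST and Westlake's interval can be seen in a few lines of code. The Python sketch below is a minimal illustration with hypothetical log-scale summary statistics. The function name `tost` is an assumption made here, and the critical value t_{n1+n2−2, 1−α} is passed in by the user (here t_{18, 0.95} = 1.734 from a t-table), since the standard library has no t quantile function.

```python
from math import log, sqrt


def tost(mean_t, mean_r, sp, n1, n2, theta_l, theta_u, t_crit):
    """Two one-sided tests and Westlake's interval from summary statistics.

    t_crit is the quantile t_{n1+n2-2, 1-alpha}; by symmetry of the
    t distribution, t_{n1+n2-2, alpha} = -t_crit.
    """
    se = sp * sqrt(1 / n1 + 1 / n2)
    diff = mean_t - mean_r
    t_l = (diff - theta_l) / se
    t_u = (diff - theta_u) / se
    equivalent = t_l > t_crit and t_u < -t_crit
    westlake = (diff - t_crit * se, diff + t_crit * se)
    return t_l, t_u, equivalent, westlake


# Tolerance limits on the log scale for delta_L = 4/5 and delta_U = 5/4.
theta_l, theta_u = log(4 / 5), log(5 / 4)

# Hypothetical data: means of log(AUC) 4.05 (test) and 4.00 (reference),
# pooled standard deviation 0.12, n1 = n2 = 10, alpha = 0.05.
t_l, t_u, equivalent, (lo, hi) = tost(4.05, 4.00, 0.12, 10, 10,
                                      theta_l, theta_u, 1.734)
print("t_L =", round(t_l, 3), " t_U =", round(t_u, 3))
print("equivalent:", equivalent)
print("Westlake interval:", (round(lo, 4), round(hi, 4)),
      "inside", (round(theta_l, 4), round(theta_u, 4)))
```

Rearranging t_L > t_crit and t_U < −t_crit gives exactly the two conditions that the interval endpoints lie inside (θL, θU), so the two procedures always reach the same decision.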
Example 10.15. Equivalence of Generic and Brand-Name Drugs. A manufacturer wishes to demonstrate that their generic drug for a particular
metabolic disorder is equivalent to a brand-name drug. One indication of the
disorder is an abnormally low concentration of levocarnitine, an amino acid
derivative, in the plasma. Treatment with the brand-name drug substantially
increases this concentration.