8.6 EXAMPLE: TWO-WAY TREATMENT STRUCTURE WITH ONE COVARIATE
Analysis of Messy Data, Volume III: Analysis of Covariance
TABLE 8.9
SAS® System Code to Perform Preliminary Computations to Compare WOOD*STOVE Models
*Fit one model to all data (uses the most recently created data set);
PROC REG; MODEL ENERGY = MOISTURE; RUN; QUIT;
*Fit a model to each level of wood (BY processing needs data sorted by WOOD);
PROC SORT; BY WOOD; RUN;
PROC REG; BY WOOD; MODEL ENERGY = MOISTURE; RUN; QUIT;
*Fit a model to each level of stove type;
PROC SORT; BY STOVE; RUN;
PROC REG; BY STOVE; MODEL ENERGY = MOISTURE; RUN; QUIT;
*Fit a model to each combination of WOOD*STOVE;
PROC SORT; BY STOVE WOOD; RUN;
PROC REG; BY STOVE WOOD; MODEL ENERGY = MOISTURE; RUN; QUIT;
TABLE 8.10
Residual Sums of Squares and Degrees of Freedom for One Model for All Data, One Model for Each Wood, and One Model for Each Stove, Used to Compute SSRES(COMBINED), SSRES(WOOD), and SSRES(STOVE)

(1) One model: SSRES(COMBINED) = 326.64870, df = 86

(2) WOOD

| WOOD | SSRES(WOOD_i) | df |
|---|---|---|
| Black Walnut | 19.05508 | 20 |
| Osage Orange | 8.70268 | 22 |
| Red Oak | 9.66393 | 18 |
| White Pine | 10.71815 | 20 |
| SSRES(WOOD) | 48.13984 | 80 |

(3) STOVE

| STOVE | SSRES(STOVE_j) | df |
|---|---|---|
| Type A | 173.27729 | 28 |
| Type B | 70.59879 | 23 |
| Type C | 82.32894 | 31 |
| SSRES(STOVE) | 326.20502 | 82 |
for the levels of wood, the equality of models for the levels of stove, and testing for
wood by stove interaction effects on the models. By summing the model over the
levels of stove for each level of wood, a wood level model can be obtained as
$$y_{i\cdot k} = \alpha_{i\cdot} + \beta_{i\cdot} M_{i\cdot k} + \varepsilon_{i\cdot k}$$
© 2002 by CRC Press LLC
Comparing Models for Several Treatments
TABLE 8.11
Residual Sums of Squares and Degrees of Freedom for Various Wood*Stove Combinations Used to Compute SSRES(POOLED)

| STOVE | WOOD | SSRES(W_i, S_j) | df |
|---|---|---|---|
| Type A | Black Walnut | 0.10304 | 4 |
| Type A | Osage Orange | 0.34459 | 7 |
| Type A | Red Oak | 0.77695 | 3 |
| Type A | White Pine | 0.66661 | 8 |
| Type B | Black Walnut | 0.54988 | 6 |
| Type B | Osage Orange | 1.00689 | 4 |
| Type B | Red Oak | 0.25559 | 4 |
| Type B | White Pine | 1.17348 | 3 |
| Type C | Black Walnut | 1.02172 | 6 |
| Type C | Osage Orange | 1.47882 | 7 |
| Type C | Red Oak | 1.29265 | 7 |
| Type C | White Pine | 0.48916 | 5 |
| SSRES(POOLED) | | 9.15941 | 64 |
To compute the sum of squares for testing the equality of the wood models, fit a model to the data from each level of wood, using the code in the second part of Table 8.9. The residual sums of squares for the levels of wood are in part (2) of Table 8.10, where SSRES(WOOD) = 48.13984 is based on 80 degrees of freedom. The sum of squares due to deviations from the hypothesis of equal models for the levels of wood is SS_WOOD = 326.64870 – 48.13984 = 278.50886, based on 86 – 80 = 6 degrees of freedom. The value of the F statistic is 324.34, as displayed in Table 8.12. The significance level is very small, providing sufficient evidence to conclude that the wood models are not all identical. Recall that this tests a Type II hypothesis (Milliken and Johnson (1992)).

TABLE 8.12
Sums of Squares, Degrees of Freedom and F Statistics for Wood, Stove and Wood*Stove

$$\hat\sigma^2 = \frac{9.15941}{64} = 0.143116$$

$$SS_{WOOD} = 326.64870 - 48.13984 = 278.50886, \quad df = 6, \quad F_{cw} = \frac{46.41814}{0.143116} = 324.34$$

$$SS_{STOVE} = 326.64870 - 326.20502 = 0.44368, \quad df = 4, \quad F_{cs} = \frac{0.11092}{0.143116} = 0.78$$

$$SS_{WOOD \times STOVE} = 48.13984 + 326.20502 - 326.64870 - 9.15941 = 38.53675, \quad df = 80 + 82 - 86 - 64 = 12, \quad F_{cws} = \frac{3.21140}{0.143116} = 22.4391$$
Next, investigate the effect of the levels of stove on the regression models by testing the equality of the models for the levels of stove. By summing the model over the levels of wood for each level of stove, a stove level model can be obtained as

$$y_{\cdot jk} = \alpha_{\cdot j} + \beta_{\cdot j} M_{\cdot jk} + \varepsilon_{\cdot jk}$$

To compute the sum of squares for testing the equality of the stove models, fit a model to the data from each level of stove, using the code in the third part of Table 8.9. The residual sums of squares for the levels of stove are in part (3) of Table 8.10, where SSRES(STOVE) = 326.20502 is based on 82 degrees of freedom. The sum of squares due to deviations from the hypothesis of equal models for the levels of stove is SS_STOVE = 326.64870 – 326.20502 = 0.44368, based on 86 – 82 = 4 degrees of freedom. The value of the F statistic is 0.78, as displayed in Table 8.12. The significance level is very large, so there is no evidence that the models for the levels of stove differ.
The final step is to investigate the possibility of a wood by stove interaction effect on the regression models. The hypothesis of no interaction effect on the regression models is equivalent to the hypothesis that all 2 × 2 table differences (Milliken and Johnson (1992)) of the vectors of parameters are equal to zero. This no interaction hypothesis can be stated as

$$H_{0WS}: \begin{pmatrix}\alpha_{ij}\\ \beta_{ij}\end{pmatrix} - \begin{pmatrix}\alpha_{is}\\ \beta_{is}\end{pmatrix} - \begin{pmatrix}\alpha_{rj}\\ \beta_{rj}\end{pmatrix} + \begin{pmatrix}\alpha_{rs}\\ \beta_{rs}\end{pmatrix} = \begin{pmatrix}0\\ 0\end{pmatrix} \text{ for all } i, j, r, s \quad \text{vs.} \quad H_a: \text{not } H_{0WS}$$

The sum of squares due to deviations from the above hypothesis is SS_WOOD × STOVE = 48.13984 + 326.20502 – 326.64870 – 9.15941 = 38.53675, based on 80 + 82 – 86 – 64 = 12 degrees of freedom. The value of the F statistic is 22.44, as shown in Table 8.12. The significance level is <0.0001, providing sufficient evidence to conclude that there is a wood by stove interaction effect on the models. Table 8.13 contains the analysis of variance table summarizing the above factorial effects on the models. Remember that the test statistics compare entire models, not just an individual parameter of the models. Thus the numbers of degrees of freedom for the main effects and interaction in Table 8.13 are the usual numbers of degrees of freedom multiplied by the number of parameters in a treatment combination model (two here: an intercept and a slope). Remember, these are Type II sums of squares.
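The model comparison arithmetic behind Tables 8.10 through 8.12 can be reproduced from the tabled residual sums of squares alone. The sketch below, in Python with only the standard library, pools the residual sums of squares and degrees of freedom over separately fitted models, forms each deviation sum of squares, and divides its mean square by $\hat\sigma^2$ = SSRES(POOLED)/64. The variable and function names are illustrative, not from the text. Summing the twelve rounded entries of Table 8.11 gives 9.15938 rather than the tabled 9.15941, which does not change any F value to two decimal places.

```python
# Residual sums of squares (SS, df) copied from Tables 8.10 and 8.11.
SSRES_COMBINED, DF_COMBINED = 326.64870, 86

WOOD_FITS = [(19.05508, 20), (8.70268, 22), (9.66393, 18), (10.71815, 20)]
STOVE_FITS = [(173.27729, 28), (70.59879, 23), (82.32894, 31)]
CELL_FITS = [  # one fit per WOOD*STOVE combination, in Table 8.11 order
    (0.10304, 4), (0.34459, 7), (0.77695, 3), (0.66661, 8),
    (0.54988, 6), (1.00689, 4), (0.25559, 4), (1.17348, 3),
    (1.02172, 6), (1.47882, 7), (1.29265, 7), (0.48916, 5),
]


def pool(fits):
    """Sum residual SS and residual df over separately fitted models."""
    return sum(ss for ss, _ in fits), sum(df for _, df in fits)


ss_wood, df_wood = pool(WOOD_FITS)        # SSRES(WOOD)
ss_stove, df_stove = pool(STOVE_FITS)     # SSRES(STOVE)
ss_pooled, df_pooled = pool(CELL_FITS)    # SSRES(POOLED)
sigma2 = ss_pooled / df_pooled            # error mean square


def f_stat(ss, df):
    """Return (SS, df, F) where F = (SS / df) / sigma2."""
    return ss, df, (ss / df) / sigma2


tests = {
    "WOOD": f_stat(SSRES_COMBINED - ss_wood, DF_COMBINED - df_wood),
    "STOVE": f_stat(SSRES_COMBINED - ss_stove, DF_COMBINED - df_stove),
    "WOOD*STOVE": f_stat(ss_wood + ss_stove - SSRES_COMBINED - ss_pooled,
                         df_wood + df_stove - DF_COMBINED - df_pooled),
}

for source, (ss, df, f) in tests.items():
    print(f"{source:<11} df={df:>2}  SS={ss:10.5f}  F={f:8.2f}")
```

Working from residual sums of squares keeps the computation within reach of any multiple regression program, which is the point made in the discussion of this method.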
TABLE 8.13
Analysis of Variance Table to Summarize the Factorial Effects of Wood and Stove on the Regression Models

| Source | df | SS(II) | F |
|---|---|---|---|
| WOOD | 6 | 278.50886 | 324.34 |
| STOVE | 4 | 0.44368 | 0.78 |
| WOOD*STOVE | 12 | 38.53675 | 22.44 |
| ERROR | 64 | 9.15941 | |

8.7 DISCUSSION
A very useful method is presented for testing hypotheses of identical models. The method could be used to perform preliminary tests of the equality of models before examining the individual parameters, in order to control the Type I error rate. That is, one would continue with the analysis of covariance process of simplifying the form of the model only if there were adequate evidence to believe the models are not identical. The process is easily implemented with a multiple regression computer package; that is, these comparisons can be accomplished without a general linear models computer package.
REFERENCES
Hinds, M. A. and Milliken, G. A. (1987). Statistical methods to use nonlinear models to
compare silage treatments, Biometrical Journal 29(6), 825–834.
Milliken, G. A. and Johnson, D. E. (1992). Analysis of Messy Data, Volume I: Designed
Experiments, London: Chapman & Hall.
Schaff, D. A., Milliken, G. A., and Clayberg, C. D. (1988). A method for analyzing nonlinear
models when the data are from a split-plot or repeated measures design, Biometrical
Journal 30(2), 139–146.
EXERCISES
EXERCISE 8.1: Compare the equality of the models for the data in Section 3.4.
EXERCISE 8.2: Compare the equality of the models for the data in Section 4.5.
EXERCISE 8.3: Compare the equality of the models for the data in Section 5.4.
EXERCISE 8.4: Compare the equality of the models for the data in Section 5.7.
EXERCISE 8.5: For the data in Section 5.5, use contrast statements to obtain tests
for equal wood models, equal stove models, and the no interaction hypothesis and
compare the results to those in Section 8.6. The contrast statements will provide
tests of Type III hypotheses.
9 Two Treatments in a Randomized Complete Block Design Structure
9.1 INTRODUCTION
The introduction of blocking into the analysis of covariance models presents another
dimension in the analysis. That dimension involves obtaining information about the
slopes of the lines from the block means or totals, a process called the recovery of
interblock information, as well as obtaining information about the slopes from the
within block comparisons. The recovery of interblock information about treatment
effects is used in the analysis of incomplete block designs, but there is no interblock
information about treatment effects in complete block designs. This chapter develops
the general methodology for analysis of covariance models when the data are collected in complete blocks. The next step is to consider the analysis of covariance in incomplete block designs, which include split-plot and repeated measures designs (discussed in later chapters). A simple experiment involving two treatments
in six blocks is used throughout this chapter to demonstrate the various concepts.
The last section gives an example with equal slopes in 20 blocks.
9.2 COMPLETE BLOCK DESIGNS
An experiment was conducted to evaluate the effect of two herbicides on soybean
yield. The experimental design consists of a one-way treatment structure in a randomized complete block design structure. The herbicides were preemergence herbicides and were incorporated into the soil. The activity of the herbicides was thought
to be influenced by the amount of organic matter in the soil, so the organic matter
content of each plot was determined and was used as a possible covariate. For
purposes of discussion, it is assumed that organic matter affects the yield of the
plots linearly. As in any modeling process, this assumption must be substantiated
before the analysis can continue. Since the herbicides were of differing chemical
compositions, there was the possibility of unequal slopes.
A model that can be used to describe the yield of the soybeans grown on a plot
in the jth block treated by the ith herbicide is
$$y_{ij} = \alpha_i + \beta_i x_{ij} + b_j + \varepsilon_{ij}, \quad i = 1, 2, \; j = 1, 2, \ldots, 6, \qquad (9.1)$$
where yij denotes the observed yield per plot in bushels per acre,
αi denotes the mean response of the ith herbicide when the value of the
covariate (organic matter) is zero,
xij denotes the value of the covariate (organic matter) measured on the experimental unit in the jth block receiving the ith herbicide,
βi is the slope of the regression line for the ith herbicide,
bj is the random block effect associated with the jth block, where the block effects are assumed to be distributed iid $N(0, \sigma_b^2)$, and
εij denotes the random error, where the errors are assumed to be distributed iid $N(0, \sigma_\varepsilon^2)$.
Model 9.1 is a mixed model in that its parameters include two components of variance (Milliken and Johnson, 1992, and Littell et al., 1996). To help explain how the mixed models analysis operates, the within block analysis, the between block analysis, and the combined within and between block analysis are discussed. Before the development of mixed models software, the analysis of Model 9.1 could be carried out in two parts: the Within Block Analysis (which most computer codes perform) and the Between Block Analysis (which most computer codes do not perform).
9.3 WITHIN BLOCK ANALYSIS
The within block analysis provides estimates of the slopes which are based on the
within block information, i.e., the estimates of the slopes are based on contrasts of
the observations computed within each block. Within block information is free of
block effects, and the variance of a within block estimate is a scalar multiple of $\sigma_\varepsilon^2$.
To demonstrate this idea, consider the data in Table 9.1 which represent the yield of
soybeans in bushels per acre where the treatments are two herbicides, the covariate
is the percent organic matter, and there are six blocks.
The within block analysis of Model 9.1 can be carried out by taking contrasts
within the blocks and analyzing the corresponding models. Since there are only two
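treatments per block, the only contrast within each block is the difference $d_j = y_{1j} - y_{2j}$. The within block model is constructed by taking the difference of the models, i.e., the model for $d_j$ is

$$\begin{aligned} d_j &= (\alpha_1 + \beta_1 x_{1j} + b_j + \varepsilon_{1j}) - (\alpha_2 + \beta_2 x_{2j} + b_j + \varepsilon_{2j}) \\ &= \alpha_1 + \beta_1 x_{1j} + \varepsilon_{1j} - \alpha_2 - \beta_2 x_{2j} - \varepsilon_{2j} \\ &= \alpha_1 - \alpha_2 + \beta_1 x_{1j} - \beta_2 x_{2j} + \varepsilon_{1j} - \varepsilon_{2j} \end{aligned}$$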
The difference model is free of the block effects, and the variance of each difference is $2\sigma_\varepsilon^2$. By letting $\alpha_d = \alpha_1 - \alpha_2$ and $e_j = \varepsilon_{1j} - \varepsilon_{2j}$, the model for the differences (called a within block model) can be expressed as the two independent variable multiple regression model

$$d_j = \alpha_d + \beta_1 x_{1j} - \beta_2 x_{2j} + e_j.$$

TABLE 9.1
Yield of Soybeans (in bu/acre) for Herbicide Treatments with Percent Organic Matter (OM) as a Covariate in RCB Design Structure

| Block | Herbicide 1 Yield | Herbicide 1 OM | Herbicide 2 Yield | Herbicide 2 OM |
|---|---|---|---|---|
| 1 | 26.6 | 0.91 | 30.2 | 1.02 |
| 2 | 31.1 | 1.22 | 29.2 | 0.89 |
| 3 | 34.7 | 1.43 | 32.1 | 1.39 |
| 4 | 34.4 | 1.45 | 31.9 | 1.47 |
| 5 | 32.1 | 1.33 | 30.2 | 1.27 |
| 6 | 28.5 | 1.10 | 31.0 | 1.12 |
By fitting the multiple regression model to the data in Table 9.1, one obtains estimates of the estimable functions of the parameters of the original model, i.e., α1 – α2, β1, β2, and $\sigma_\varepsilon^2$. The matrix form of the model for the differences computed from the data in Table 9.1 is

$$\begin{bmatrix} -3.60\\ 1.90\\ 2.60\\ 2.50\\ 1.90\\ -2.50 \end{bmatrix} = \begin{bmatrix} 1 & 0.91 & -1.02\\ 1 & 1.22 & -0.89\\ 1 & 1.43 & -1.39\\ 1 & 1.45 & -1.47\\ 1 & 1.33 & -1.27\\ 1 & 1.10 & -1.12 \end{bmatrix} \begin{bmatrix} \alpha_d\\ \beta_1\\ \beta_2 \end{bmatrix} + \mathbf{e},$$

or $\mathbf{d} = Z\eta + \mathbf{e}$, where $\eta' = [\alpha_d, \beta_1, \beta_2]$.
The least squares estimates of the parameters of the model for the differences are

$$\begin{bmatrix}\hat\alpha_d\\ \hat\beta_1\\ \hat\beta_2\end{bmatrix} = (Z'Z)^{-1}Z'\mathbf{d} = \begin{bmatrix}-13.82302281\\ 16.95662402\\ 5.64513210\end{bmatrix}. \qquad (9.2)$$
The residual sum of squares for the difference model is 1.44, which is based on 3 degrees of freedom. The residual mean square, 1.44/3 = 0.48, is an estimate of $2\sigma_\varepsilon^2$, the variance of $d_j$. The estimated covariance matrix of the estimated parameter vector is

$$2\hat\sigma_\varepsilon^2 (Z'Z)^{-1} = \begin{bmatrix} 3.633765 & -2.0436818 & 0.8542064\\ -2.0436818 & 5.1684106 & 3.6579537\\ 0.8542064 & 3.6579537 & 4.5168137 \end{bmatrix} = \hat\Sigma_w. \qquad (9.3)$$

The estimated covariance matrix corresponding to the within block analysis estimates of the two slopes is the lower 2 × 2 partition of $\hat\Sigma_w$, or

$$\hat\Sigma_{\beta_w} = \begin{bmatrix} 5.1684106 & 3.6579537\\ 3.6579537 & 4.5168137 \end{bmatrix}.$$
The estimates of the standard errors of the parameter estimates are obtained by
taking the square root of the diagonal elements of the covariance matrix. These
standard errors can be used to construct confidence intervals about the respective
parameters. The above estimates of the parameters are what would be obtained from
a computer code that performs the within block analysis, but there is additional
information about the slopes and treatment effects from the between block analysis.
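The within block analysis can be checked with a few lines of code. The following sketch, in Python using only the standard library, forms the differences $d_j$ from Table 9.1, builds $Z$ with columns $1$, $x_{1j}$, and $-x_{2j}$, and computes $(Z'Z)^{-1}Z'\mathbf{d}$, the residual sum of squares, and the residual mean square times $(Z'Z)^{-1}$. The helpers `inverse` and `ols` are illustrative names, not part of any package; any multiple regression program should return the same quantities.

```python
# Table 9.1: yield (bu/acre) and percent organic matter (OM), blocks 1-6.
y1 = [26.6, 31.1, 34.7, 34.4, 32.1, 28.5]   # herbicide 1 yield
x1 = [0.91, 1.22, 1.43, 1.45, 1.33, 1.10]   # herbicide 1 OM
y2 = [30.2, 29.2, 32.1, 31.9, 30.2, 31.0]   # herbicide 2 yield
x2 = [1.02, 0.89, 1.39, 1.47, 1.27, 1.12]   # herbicide 2 OM


def inverse(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(X, y):
    """Least squares fit: estimates, residual SS, residual df, MSE * (X'X)^-1."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    xtx_inv = inverse(xtx)
    est = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(est, r)) for r, yi in zip(X, y)]
    rss = sum(e * e for e in resid)
    df = len(y) - p
    cov = [[(rss / df) * v for v in row] for row in xtx_inv]
    return est, rss, df, cov


# Within block model: d_j = alpha_d + beta1 * x1j + beta2 * (-x2j) + e_j
d = [a - b for a, b in zip(y1, y2)]
Z = [[1.0, u, -v] for u, v in zip(x1, x2)]
eta_hat, rss_w, df_w, sigma_w = ols(Z, d)

print("alpha_d, beta1, beta2:", [round(v, 5) for v in eta_hat])
print(f"RSS = {rss_w:.4f} on {df_w} df; residual mean square = {rss_w / df_w:.4f}")
```

The residual mean square here estimates $2\sigma_\varepsilon^2$, so `sigma_w` corresponds to $\hat\Sigma_w$ in Equation 9.3.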
9.4 BETWEEN BLOCK ANALYSIS
The between block analysis utilizes the model of the block totals and provides information about the parameters contained in those block totals. Let $t_j = y_{1j} + y_{2j}$ denote the total of the two observations in the jth block. The block total model is constructed by taking the totals of the corresponding models, i.e.,

$$\begin{aligned} t_j &= (\alpha_1 + \beta_1 x_{1j} + b_j + \varepsilon_{1j}) + (\alpha_2 + \beta_2 x_{2j} + b_j + \varepsilon_{2j}) \\ &= \alpha_1 + \alpha_2 + \beta_1 x_{1j} + \beta_2 x_{2j} + 2b_j + \varepsilon_{1j} + \varepsilon_{2j}, \end{aligned}$$

which can be expressed as $t_j = \alpha_t + \beta_1 x_{1j} + \beta_2 x_{2j} + r_j$, where $\alpha_t = \alpha_1 + \alpha_2$ and $r_j = 2b_j + \varepsilon_{1j} + \varepsilon_{2j}$. The variance of a block total is $4\sigma_b^2 + 2\sigma_\varepsilon^2 = 2(\sigma_\varepsilon^2 + 2\sigma_b^2)$. A multiple regression program can be used to fit the above model to the block totals and obtain the between block estimates of $\alpha_t$, $\beta_1$, $\beta_2$, and $2(\sigma_\varepsilon^2 + 2\sigma_b^2)$.
The matrix form of the block total model for the data in Table 9.1 is

$$\begin{bmatrix} 56.8\\ 60.3\\ 66.8\\ 66.3\\ 62.3\\ 59.5 \end{bmatrix} = \begin{bmatrix} 1 & 0.91 & 1.02\\ 1 & 1.22 & 0.89\\ 1 & 1.43 & 1.39\\ 1 & 1.45 & 1.47\\ 1 & 1.33 & 1.27\\ 1 & 1.10 & 1.12 \end{bmatrix} \begin{bmatrix}\alpha_t\\ \beta_1\\ \beta_2\end{bmatrix} + \mathbf{r},$$

or $\mathbf{t} = M\tau + \mathbf{r}$, where $\tau' = [\alpha_t, \beta_1, \beta_2]$.
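The between block estimates are

$$\begin{bmatrix}\hat\alpha_t\\ \hat\beta_1\\ \hat\beta_2\end{bmatrix} = (M'M)^{-1}M'\mathbf{t} = \begin{bmatrix}38.48854914\\ 13.83912245\\ 5.32201593\end{bmatrix}. \qquad (9.4)$$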
The residual sum of squares from the between block model is 3.25 and is based on 3 degrees of freedom. The between block residual mean square, 3.25/3, is an estimate of $2(\sigma_\varepsilon^2 + 2\sigma_b^2)$. The estimated covariance matrix of the between block estimates is

$$(3.25/3)(M'M)^{-1} = \begin{bmatrix} 8.218374 & -4.622627 & -1.932139\\ -4.622627 & 11.690547 & -8.274010\\ -1.932139 & -8.274010 & 10.216685 \end{bmatrix} = \hat\Sigma_b. \qquad (9.5)$$

The estimated covariance matrix corresponding to the between block analysis estimates of the two slopes is the lower 2 × 2 partition of $\hat\Sigma_b$, or

$$\hat\Sigma_{\beta_b} = \begin{bmatrix} 11.690547 & -8.274010\\ -8.274010 & 10.216685 \end{bmatrix}.$$
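The between block fit can be reproduced the same way. This sketch (Python, standard library only; the helpers `inverse` and `ols` are illustrative, not from the text) regresses the block totals $t_j$ on columns $1$, $x_{1j}$, and $x_{2j}$ and scales $(M'M)^{-1}$ by the between block residual mean square.

```python
# Table 9.1: yield (bu/acre) and percent organic matter (OM), blocks 1-6.
y1 = [26.6, 31.1, 34.7, 34.4, 32.1, 28.5]   # herbicide 1 yield
x1 = [0.91, 1.22, 1.43, 1.45, 1.33, 1.10]   # herbicide 1 OM
y2 = [30.2, 29.2, 32.1, 31.9, 30.2, 31.0]   # herbicide 2 yield
x2 = [1.02, 0.89, 1.39, 1.47, 1.27, 1.12]   # herbicide 2 OM


def inverse(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def ols(X, y):
    """Least squares fit: estimates, residual SS, residual df, MSE * (X'X)^-1."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    xtx_inv = inverse(xtx)
    est = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(est, r)) for r, yi in zip(X, y)]
    rss = sum(e * e for e in resid)
    df = len(y) - p
    cov = [[(rss / df) * v for v in row] for row in xtx_inv]
    return est, rss, df, cov


# Between block model: t_j = alpha_t + beta1 * x1j + beta2 * x2j + r_j
t = [a + b for a, b in zip(y1, y2)]
M = [[1.0, u, v] for u, v in zip(x1, x2)]
tau_hat, rss_b, df_b, sigma_b = ols(M, t)

print("alpha_t, beta1, beta2:", [round(v, 5) for v in tau_hat])
print(f"RSS = {rss_b:.4f} on {df_b} df; residual mean square = {rss_b / df_b:.4f}")
```

The residual mean square estimates $2(\sigma_\varepsilon^2 + 2\sigma_b^2)$, so `sigma_b` corresponds to $\hat\Sigma_b$ in Equation 9.5.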
9.5 COMBINING WITHIN BLOCK AND BETWEEN BLOCK INFORMATION
The vector of parameters for Model 9.1 is $\theta = [\alpha_1, \alpha_2, \beta_1, \beta_2]'$. The within block model provides estimates of $\theta_1 = [\alpha_1 - \alpha_2, \beta_1, \beta_2]'$ and the between block model provides estimates of $\theta_2 = [\alpha_1 + \alpha_2, \beta_1, \beta_2]'$. $\theta_1$ and $\theta_2$ are linear transforms of $\theta$, expressed as $\theta_1 = H_1\theta$ and $\theta_2 = H_2\theta$, where

$$H_1 = \begin{bmatrix} 1 & -1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad H_2 = \begin{bmatrix} 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

The estimators can be expressed as beta-hat models (Chapter 6):

$$\hat\theta_1 = H_1\theta + e_1, \quad e_1 \sim N(0, \Sigma_w),$$

and

$$\hat\theta_2 = H_2\theta + e_2, \quad e_2 \sim N(0, \Sigma_b).$$
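The two estimators are independently distributed; thus the joint model is

$$\begin{bmatrix}\hat\theta_1\\ \hat\theta_2\end{bmatrix} = \begin{bmatrix}H_1\\ H_2\end{bmatrix}\theta + \begin{bmatrix}e_1\\ e_2\end{bmatrix}, \quad \text{where} \quad \begin{bmatrix}e_1\\ e_2\end{bmatrix} \sim N\left(\begin{bmatrix}0\\ 0\end{bmatrix}, \begin{bmatrix}\Sigma_w & 0\\ 0 & \Sigma_b\end{bmatrix}\right).$$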
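If the variances are known, then the BLUE (Best Linear Unbiased Estimator) of θ is

$$\hat\theta_B = \left[H_1'\Sigma_w^{-1}H_1 + H_2'\Sigma_b^{-1}H_2\right]^{-1}\left[H_1'\Sigma_w^{-1}\hat\theta_1 + H_2'\Sigma_b^{-1}\hat\theta_2\right]$$

with sampling distribution

$$\hat\theta_B \sim N\left(\theta, \left[H_1'\Sigma_w^{-1}H_1 + H_2'\Sigma_b^{-1}H_2\right]^{-1}\right),$$

or $\hat\theta_B \sim N(\theta, \Sigma_\theta)$.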
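The variance components are most likely unknown; thus the weighted least squares combined estimator of θ is

$$\hat\theta = \left[H_1'\hat\Sigma_w^{-1}H_1 + H_2'\hat\Sigma_b^{-1}H_2\right]^{-1}\left[H_1'\hat\Sigma_w^{-1}\hat\theta_1 + H_2'\hat\Sigma_b^{-1}\hat\theta_2\right]$$

with approximate estimated covariance matrix

$$\widehat{\mathrm{Var}}(\hat\theta) = \left[H_1'\hat\Sigma_w^{-1}H_1 + H_2'\hat\Sigma_b^{-1}H_2\right]^{-1} = \hat\Sigma_\theta.$$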
In this estimator, $\hat\theta_1$, $\hat\Sigma_w$, $\hat\theta_2$, and $\hat\Sigma_b$ are from Equations 9.2, 9.3, 9.4, and 9.5, respectively. $\hat\theta$ is the mixed models estimate of the parameters obtained when the above estimates of the variance components are used in place of the actual variance components (Littell et al., 1996). The combined estimate of θ for the data in Table 9.1 is

$$\hat\theta = \begin{bmatrix}\hat\alpha_1\\ \hat\alpha_2\\ \hat\beta_1\\ \hat\beta_2\end{bmatrix} = \begin{bmatrix}11.94177\\ 25.41262\\ 15.55772\\ 4.48663\end{bmatrix}$$

with estimated covariance matrix

$$\hat\Sigma_\theta = \begin{bmatrix} 2.6379 & 0.7031 & -2.0748 & -0.5681\\ 0.7031 & 2.1475 & -0.5467 & -1.7450\\ -2.0748 & -0.5467 & 1.6733 & 0.4581\\ -0.5681 & -1.7450 & 0.4581 & 1.4623 \end{bmatrix}.$$
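The combined estimator can be evaluated directly from the published within and between block results. The sketch below, in Python with only the standard library, copies $\hat\theta_1$, $\hat\Sigma_w$, $\hat\theta_2$, and $\hat\Sigma_b$ from Equations 9.2 through 9.5, forms $H_1'\hat\Sigma_w^{-1}H_1 + H_2'\hat\Sigma_b^{-1}H_2$, inverts it to obtain $\hat\Sigma_\theta$, and applies that to the weighted right-hand side. The helper names (`inverse`, `matmul`, `transpose`, `combine`) are illustrative; because the inputs are rounded published values, the output should agree with the reported $\hat\theta$ and $\hat\Sigma_\theta$ only up to rounding.

```python
theta1 = [-13.82302281, 16.95662402, 5.64513210]   # Eq. 9.2: alpha_d, beta1, beta2
sigma_w = [[3.633765, -2.0436818, 0.8542064],
           [-2.0436818, 5.1684106, 3.6579537],
           [0.8542064, 3.6579537, 4.5168137]]      # Eq. 9.3
theta2 = [38.48854914, 13.83912245, 5.32201593]    # Eq. 9.4: alpha_t, beta1, beta2
sigma_b = [[8.218374, -4.622627, -1.932139],
           [-4.622627, 11.690547, -8.274010],
           [-1.932139, -8.274010, 10.216685]]      # Eq. 9.5
H1 = [[1, -1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]    # theta -> theta1
H2 = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]     # theta -> theta2


def inverse(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of rows."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def combine(h1, s1, t1, h2, s2, t2):
    """Weighted least squares combination of two independent estimators."""
    h1w = matmul(transpose(h1), inverse(s1))    # H1' S1^-1
    h2w = matmul(transpose(h2), inverse(s2))    # H2' S2^-1
    info = [[p + q for p, q in zip(r1, r2)]
            for r1, r2 in zip(matmul(h1w, h1), matmul(h2w, h2))]
    cov = inverse(info)                          # estimated Sigma_theta
    rhs = [p[0] + q[0] for p, q in zip(matmul(h1w, [[v] for v in t1]),
                                      matmul(h2w, [[v] for v in t2]))]
    theta = [sum(c * r for c, r in zip(row, rhs)) for row in cov]
    return theta, cov


theta_hat, sigma_theta = combine(H1, sigma_w, theta1, H2, sigma_b, theta2)
print("alpha1, alpha2, beta1, beta2:", [round(v, 4) for v in theta_hat])
```

Writing the combination as a generic function of $(H, \hat\Sigma, \hat\theta)$ pairs mirrors the beta-hat model formulation, so the same code applies to any pair of independent estimators of linear functions of θ.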
The combined estimate of the slopes should be used when its variance is smaller than the variance of the within block estimates of the slopes (Ash, 1982). When the number of blocks is small, the between block information may not be useful. There must be at least one more block than treatments before a between block estimate can be computed, and more blocks are needed to obtain a within block residual mean square with adequate degrees of freedom.
The within block model was used to obtain estimates of α1 – α2, β1, and β2, and the between block model was used to obtain estimates of α1 + α2, β1, and β2. Neither the within block model nor the between block model provides estimates of α1 and α2, but the combined estimator does provide estimates of all of the parameters, α1, α2, β1, and β2, where α1 = (αt + αd)/2 and α2 = (αt – αd)/2. For further discussion of intra-block models (within block), interblock models (between block), and the process of combining estimators, see John (1971) and Fergen (1997).
Once θ has been estimated and its covariance matrix has been estimated, it is generally of interest to estimate linear combinations of θ, such as a′θ. Some choices for a are (1) to provide estimates of the regression models evaluated at X = X0, by letting a1′ = (1, 0, X0, 0) and a2′ = (0, 1, 0, X0), or (2) to provide estimates of the difference of the regression models evaluated at X = X0, by letting a′ = (1, –1, X0, –X0). The approximate sampling distribution of a linear combination of θ̂ is a′θ̂ ~ N(a′θ, a′Σθa).
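To illustrate these linear combinations, the sketch below evaluates a′θ̂ and its approximate standard error $\sqrt{a'\hat\Sigma_\theta a}$ using the combined estimate and estimated covariance matrix reported for the Table 9.1 data. The covariate value X0 = 1.2 percent organic matter is an arbitrary choice for illustration only, and the function name `linear_combination` is hypothetical.

```python
import math

# Combined estimate and covariance for Table 9.1 (order: alpha1, alpha2, beta1, beta2).
THETA_HAT = [11.94177, 25.41262, 15.55772, 4.48663]
SIGMA_THETA = [
    [2.6379, 0.7031, -2.0748, -0.5681],
    [0.7031, 2.1475, -0.5467, -1.7450],
    [-2.0748, -0.5467, 1.6733, 0.4581],
    [-0.5681, -1.7450, 0.4581, 1.4623],
]


def linear_combination(a):
    """Return (a' theta_hat, sqrt(a' Sigma_theta a)) for coefficient vector a."""
    estimate = sum(ai * ti for ai, ti in zip(a, THETA_HAT))
    variance = sum(a[i] * SIGMA_THETA[i][j] * a[j]
                   for i in range(len(a)) for j in range(len(a)))
    return estimate, math.sqrt(variance)


x0 = 1.2  # illustrative percent organic matter (an assumption, not from the text)
contrasts = {
    "herbicide 1 at X0": [1, 0, x0, 0],
    "herbicide 2 at X0": [0, 1, 0, x0],
    "difference at X0": [1, -1, x0, -x0],
}
results = {name: linear_combination(a) for name, a in contrasts.items()}
for name, (est, se) in results.items():
    print(f"{name:<18} estimate={est:8.4f}  std. error={se:.4f}")
```

Because the slopes differ, the difference between the herbicide models changes with X0, so such comparisons are usually made at several covariate values.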
9.6 DETERMINING THE FORM OF THE MODEL
Thus far the discussion about the model in Equation 9.1 assumes that the slopes are
unequal. The next step in the analysis, after determining that straight lines are
adequate to describe the data for each treatment, is to test the equality of slopes.
Generally there is sufficient within block information to test the equal slopes hypothesis without considering the between block information, although, if there were
many blocks, a test based on the combined estimate could be quite a bit more
powerful. The model comparison method can be used to construct a statistic based
on the within block information to test the equal slope hypothesis
$$H_0: \beta_1 = \beta_2 = \beta_0 \quad \text{vs.} \quad H_a: (\text{not } H_0)$$

where β0 is unspecified.
The model under the conditions of the null hypothesis is

$$y_{ij} = \alpha_i + \beta_0 x_{ij} + b_j + \varepsilon_{ij}, \quad i = 1, 2, \; j = 1, 2, \ldots, a. \qquad (9.6)$$
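A minimal sketch of the within block version of this comparison for the Table 9.1 data follows, in Python with only the standard library. Under H0 the difference model becomes $d_j = \alpha_d + \beta_0(x_{1j} - x_{2j}) + e_j$, so the sketch fits the full difference model (columns $1$, $x_{1j}$, $-x_{2j}$) and the reduced model (columns $1$, $x_{1j} - x_{2j}$) and forms the usual model comparison statistic $F = [(RSS(H_0) - RSS)/1] / [RSS/df_{RSS}]$. The helpers `solve` and `rss` are illustrative names, and only within block information is used.

```python
# Table 9.1: yield (bu/acre) and percent organic matter (OM), blocks 1-6.
y1 = [26.6, 31.1, 34.7, 34.4, 32.1, 28.5]   # herbicide 1 yield
x1 = [0.91, 1.22, 1.43, 1.45, 1.33, 1.10]   # herbicide 1 OM
y2 = [30.2, 29.2, 32.1, 31.9, 30.2, 31.0]   # herbicide 2 yield
x2 = [1.02, 0.89, 1.39, 1.47, 1.27, 1.12]   # herbicide 2 OM


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(X, y):
    """Residual sum of squares and residual df from a least squares fit of y on X."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(xtx, xty)
    resid = [yi - sum(bk * v for bk, v in zip(b, r)) for r, yi in zip(X, y)]
    return sum(e * e for e in resid), len(y) - p


d = [a - b for a, b in zip(y1, y2)]
full = [[1.0, u, -v] for u, v in zip(x1, x2)]    # unequal slopes
reduced = [[1.0, u - v] for u, v in zip(x1, x2)]  # common slope beta0
rss_full, df_full = rss(full, d)
rss_h0, df_h0 = rss(reduced, d)
f_stat = ((rss_h0 - rss_full) / (df_h0 - df_full)) / (rss_full / df_full)
print(f"RSS = {rss_full:.4f} ({df_full} df), RSS(H0) = {rss_h0:.4f} ({df_h0} df)")
print(f"F = {f_stat:.2f} with {df_h0 - df_full} and {df_full} df")
```

Because H0 imposes a single linear restriction on the within block model, this F statistic equals the square of the t statistic for $\hat\beta_1 - \hat\beta_2$ computed from $\hat\Sigma_w$.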
Let RSS(H0) denote the residual sum of squares for the model under the conditions of H0, which is based on (2 – 1)(a – 1) – 1 = dfRSS(H0) degrees of freedom, where a is the number of blocks in the experiment. Let RSS denote the residual sum of squares for the unrestricted Model 9.1, which is based on (2 – 1)(a – 1) – 2 = dfRSS degrees of freedom. The sum of squares due to deviations from the equal slope