14.8 EXAMPLE: ONE-WAY TREATMENT STRUCTURE WITH UNEQUAL VARIANCES
Analysis of Covariance Models with Heterogeneous Errors
TABLE 14.2
Variances for Each of the Study Methods Using Unequal Slopes Model with Pre as the Covariate

proc sort data=oneway; by method;
proc reg; model post=pre; by method;

Method   df   MS
1        6    15.6484
2        4     0.2266
3        6    16.5795
4        5    17.4375
5        5     0.7613
TABLE 14.3
PROC GLM Code to Carry Out Levene’s Test of Equality of Variances

proc glm data = oneway; class method;
model post=method pre*method/ solution;
output out=prebymethod r=r;
data prebymethod; set prebymethod; absr=abs(r);
proc glm data = prebymethod;
class method;
model absr=method; lsmeans method/pdiff;

Source            df   SS         MS        FValue   ProbF
Model             4     48.1557   12.0389   4.04     0.0095
Error             31    92.4285    2.9816
Corrected Total   35   140.5842

Source   df   SS        MS        FValue   ProbF
Method   4    48.1557   12.0389   4.04     0.0095

Method   LSMean
1        2.8598
2        0.3488
3        3.0895
4        2.6136
5        0.5984

RowName   _1       _2       _3       _4       _5
1                  0.0113   0.7920   0.7848   0.0167
2         0.0113            0.0062   0.0249   0.7967
3         0.7920   0.0062            0.5982   0.0090
4         0.7848   0.0249   0.5982            0.0367
5         0.0167   0.7967   0.0090   0.0367

© 2002 by CRC Press LLC
Analysis of Messy Data, Volume III: Analysis of Covariance
TABLE 14.4
PROC GLM Code to Provide Tests for Equality of Variances between Groups (Code) and within Groups of Study Methods

proc glm data = prebymethod;
class code method;
model absr=code method(code);
lsmeans code/pdiff;

Source            df   SS         MS        FValue   ProbF
Model             4     48.1557   12.0389   4.04     0.0095
Error             31    92.4285    2.9816
Corrected Total   35   140.5842

Source         df   SS(III)   MS        FValue   ProbF
Code           1    46.8294   46.8294   15.71    0.0004
Method(code)   3     1.0468    0.3489    0.12    0.9494

Effect   Code   LSMean   ProbtDiff
Code     1      2.8543   0.0004
Code     2      0.4736
statement was used to provide the means of the absolute values of the residuals for
each of the study methods in an attempt to possibly group the methods as to the
magnitude of the variances. The bottom part of Table 14.3 contains the significance
levels for the pairwise comparisons of the study method means of the absolute values
of residuals. The approximate multiple comparison process indicates that methods
2 and 5 have smaller variances than methods 1, 3, and 4. A new variable, code, was
constructed where code=1 for methods 1, 3, and 4 and code=2 for methods 2 and
5. The PROC GLM code in Table 14.4 provides another analysis of the absolute
values of the residuals where the effects in the model are code and method(code).
The F-value corresponding to source code provides a Levene-type test of the
hypothesis that the two code variances are equal. The significance level is 0.0004,
indicating there is sufficient evidence to conclude that the two variances are not
equal. The F-value corresponding to the source method(code) provides a test of the
hypothesis that the three variances for methods 1, 3, and 4 are equal and the two
variances for methods 2 and 5 are equal. The significance level is 0.9494, indicating
there is not sufficient evidence to conclude the variances within each value of code
are unequal. Thus, the simplest model for the variances involves two variances, one
for each level of code. Table 14.5 contains the PROC GLM code used to compute
the mean square residual for each code. The variance for code=1 is 16.5033, which
is based on 17 degrees of freedom, and the variance for code=2 is 0.5237, which is
based on 9 degrees of freedom. These estimates can also be computed by pooling
the respective variances from Table 14.2, weighting each by its degrees of freedom.
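The pooling step can be checked by hand. The following is a minimal Python sketch, not part of the original SAS analysis, that takes the degrees of freedom and mean squares from Table 14.2 and forms a weighted average within each level of code. The names method_variances, code_groups, and pooled_variance are illustrative only. Because the tabled mean squares are rounded to four decimals, the pooled values agree with Table 14.5 only up to rounding.

```python
# Pool the per-method residual mean squares from Table 14.2 into one
# variance per level of code, weighting each by its degrees of freedom.

# method: (df, mean square), copied from Table 14.2
method_variances = {
    1: (6, 15.6484),
    2: (4, 0.2266),
    3: (6, 16.5795),
    4: (5, 17.4375),
    5: (5, 0.7613),
}

# Grouping described in the text: code=1 for methods 1, 3, 4; code=2 for 2, 5
code_groups = {1: [1, 3, 4], 2: [2, 5]}


def pooled_variance(methods):
    """Return (pooled df, df-weighted average of the mean squares)."""
    total_df = sum(method_variances[m][0] for m in methods)
    total_ss = sum(method_variances[m][0] * method_variances[m][1] for m in methods)
    return total_df, total_ss / total_df


pooled = {code: pooled_variance(methods) for code, methods in code_groups.items()}
for code, (df, var) in pooled.items():
    print(f"code={code}: df={df}, pooled variance={var:.4f}")
```

The pooled degrees of freedom (17 and 9) are the sums of the per-method degrees of freedom, which is why the two-variance model in Table 14.5 reproduces these values.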
Another method that can be used to determine an adequate form for the variance
part of the model is to use an information criterion (Littell et al., 1996) that is
TABLE 14.5
Variances for Each Group of Study Methods (Code)

proc sort data=oneway; by code;
proc glm data=oneway; class method;
model post=method pre*method; by code;

Code   df   MS
1      17   16.5033
2      9     0.5237
TABLE 14.6
PROC MIXED Code to Fit Equal Variance Model with Unequal Slopes to the Study Method Data

proc mixed data = oneway cl covtest ic;
class method;
model post=method pre pre*method/ddfm=satterth solution;

CovParm    Estimate   StdErr   ZValue   ProbZ    Alpha   Lower    Upper
Residual   10.9719    3.0430   3.61     0.0002   0.05    6.8046   20.6061

Neg2LogLike   Parameters   AIC     AICC    HQIC    BIC     CAIC
175.3         1            177.3   177.5   177.6   178.5   179.5

Effect       NumDF   DenDF   FValue   ProbF
method       4       26       1.77    0.1642
pre          1       26      15.92    0.0005
pre*method   4       26       2.04    0.1184
available in PROC MIXED. For the demonstration here, the value of AIC was used
where the smaller the AIC value, the better the variance structure. Table 14.6 contains
the PROC MIXED code to fit Model 14.1 where the variance structure consists of
assuming all variances are equal. The value of AIC is 177.3. Table 14.7 contains the
PROC MIXED code to fit a variance model with unequal variances for each of the
methods. The statement “repeated/group=method;” specifies that the residuals have a
different variance for each level of method. Although this model does not involve
repeated measures, the repeated statement is still used; here it is best thought of as
a residual statement, since it specifies the variances of the residual part of the
model. The value of AIC is 163.3. The
estimates of the variances in Table 14.7 are the same as those in Table 14.2. Finally,
the PROC MIXED code in Table 14.8 fits the model with different variances for
each level of code. The value of AIC is 158.8. Based on the assumption that the
variance structure with the smaller AIC is more adequate, the structure with different
variances for each level of code would be selected. When the variance structures of
TABLE 14.7
PROC MIXED Code to Fit the Unequal Variance Model for Each Study Method with Unequal Slopes

proc mixed data = oneway cl covtest ic;
class method;
model post=method pre pre*method/ddfm=satterth solution;
repeated/group=method;

CovParm    Group      Estimate   StdErr    ZValue   ProbZ    Alpha   Lower    Upper
Residual   Method 1   15.6484     9.0346   1.73     0.0416   0.05    6.4979    75.8805
Residual   Method 2    0.2266     0.1602   1.41     0.0786   0.05    0.0814     1.8713
Residual   Method 3   16.5795     9.5722   1.73     0.0416   0.05    6.8845    80.3958
Residual   Method 4   17.4375    11.0285   1.58     0.0569   0.05    6.7943   104.8923
Residual   Method 5    0.7613     0.4815   1.58     0.0569   0.05    0.2966     4.5796

Neg2LogLike   Parameters   AIC     AICC    HQIC    BIC     CAIC
153.3         5            163.3   166.3   166.1   171.3   176.3

Effect       NumDF   DenDF   FValue   ProbF
method       4        8.6    22.12    0.0001
pre          1       17.1    20.15    0.0003
pre*method   4        8.8     2.32    0.1371
TABLE 14.8
PROC MIXED Code to Fit the Unequal Variance Model for Each Group of Study Methods with Unequal Study Method Slopes

proc mixed data = oneway cl covtest ic;
class method;
model post=method pre pre*method/ddfm=satterth solution;
repeated/group=code;

CovParm    Group     Estimate   StdErr   ZValue   ProbZ    Alpha   Lower    Upper
Residual   Group 1   16.5033    5.6606   2.92     0.0018   0.05    9.2927   37.0899
Residual   Group 2    0.5237    0.2469   2.12     0.0169   0.05    0.2478    1.7454

Neg2LogLike   Parameters   AIC     AICC    HQIC    BIC     CAIC
154.8         2            158.8   159.4   159.9   162.0   164.0

Effect       NumDF   DenDF   FValue   ProbF
method       4       15.2    21.75    0.0000
pre          1       18.0    20.11    0.0003
pre*method   4       14.6     2.51    0.0870
models become more and more complicated, a Levene-type test statistic will not
necessarily exist, so an information criterion can be used to select an adequate
covariance structure.
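The AIC comparison can be reproduced from the fit statistics. The sketch below is an illustration in Python rather than SAS. It assumes the smaller-is-better form AIC = (-2 log likelihood) + 2 × (number of covariance parameters), which is consistent with the Neg2LogLike, Parameters, and AIC values reported in Tables 14.6 through 14.8. The dictionary fits and the function aic are illustrative names.

```python
# Recompute AIC = Neg2LogLike + 2 * (number of covariance parameters)
# for the three variance structures, using values from Tables 14.6-14.8.

# label: (Neg2LogLike, number of covariance parameters)
fits = {
    "equal variances (Table 14.6)": (175.3, 1),
    "one variance per method (Table 14.7)": (153.3, 5),
    "one variance per code (Table 14.8)": (154.8, 2),
}


def aic(neg2_loglike, n_params):
    """Smaller-is-better AIC, as assumed in the lead-in."""
    return neg2_loglike + 2 * n_params


aic_values = {label: aic(*stats) for label, stats in fits.items()}
best = min(aic_values, key=aic_values.get)

for label, value in aic_values.items():
    print(f"{label}: AIC = {value:.1f}")
print("smallest AIC:", best)
```

The five-variance model has the smallest -2 log likelihood, but the penalty for its three extra parameters outweighs that gain, so the two-variance model is selected.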
For this model, one could use Bartlett’s test and/or Hartley’s test (modified for
unequal sample sizes) to test the equal variance hypothesis. The Levene-type test
is easy to compute when the software can compute the residuals,
store them, and then analyze their absolute values. PROC GLM can
provide several different tests of homogeneity of variance when the
model is a one-way treatment structure in a completely randomized design structure. Some of those
methods use other functions of the residuals, which can be easily adapted for the
analysis of covariance model.
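The Levene-type computation described above is a one-way analysis of variance on the absolute values of the residuals. The following Python sketch shows that calculation under stated assumptions: the function levene_type_f and the residuals in example are hypothetical and are not the study method data. The last line recomputes the F-value of Table 14.3 from its tabled mean squares.

```python
# Levene-type test: one-way ANOVA F statistic computed on the absolute
# values of the residuals, grouped by treatment.


def levene_type_f(residuals_by_group):
    """Return (F, df_model, df_error) for a one-way ANOVA on |residuals|."""
    abs_groups = [[abs(r) for r in group] for group in residuals_by_group.values()]
    group_means = [sum(g) / len(g) for g in abs_groups]
    n_total = sum(len(g) for g in abs_groups)
    grand_mean = sum(sum(g) for g in abs_groups) / n_total

    # Between-group and within-group sums of squares of the |residuals|
    ss_model = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(abs_groups, group_means)
    )
    ss_error = sum(
        (x - m) ** 2 for g, m in zip(abs_groups, group_means) for x in g
    )

    df_model = len(abs_groups) - 1
    df_error = n_total - len(abs_groups)
    return (ss_model / df_model) / (ss_error / df_error), df_model, df_error


# Hypothetical residuals for two groups (illustration only, not the book's data)
example = {"A": [-1.0, 2.0, -3.0], "B": [2.0, -3.0, 4.0]}
f_stat, df1, df2 = levene_type_f(example)
print(f"F = {f_stat:.2f} on {df1} and {df2} df")

# Check against Table 14.3: F = MS(Model) / MS(Error)
table_f = 12.0389 / 2.9816
print(f"Table 14.3 F = {table_f:.2f}")
```

In the analysis of covariance setting, the residuals would come from the fitted unequal slopes model, as in the PROC GLM code of Table 14.3, rather than from deviations about the group means.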
Based on the covariance structure selected above, the next step is to investigate
the form of the covariate part of the model. (Note: There is the covariate part of the
model that has to do with the covariates and the covariance part of the model that
has to do with the variances and covariances of the data.) Using the fixed effects
analysis from Table 14.8, the significance level corresponding to pre*method is
0.0870. The conclusion is that there is not sufficient evidence to conclude the slopes
are unequal, so an equal slopes model was selected to continue the analysis (the
residuals should be plotted for each method before this type of conclusion
is reached). The PROC MIXED code in Table 14.9 is used to fit a common slope
model with a different variance for each level of code. (Note that code is included
TABLE 14.9
PROC MIXED Code to Fit the Unequal Variance Model for Each Group of Study Methods with Equal Study Method Slopes

proc mixed data = oneway cl covtest ic;
class method code;
model post=method pre/ddfm=satterth solution;
repeated/group=code;

CovParm    Group     Estimate   StdErr   ZValue   ProbZ    Alpha   Lower     Upper
Residual   Group 1   21.0000    6.7071   3.13     0.0009   0.05    12.2351   44.1750
Residual   Group 2    0.5555    0.2521   2.20     0.0138   0.05     0.2689    1.7473

Neg2LogLike   Parameters   AIC     AICC    HQIC    BIC     CAIC
156.4         2            160.4   160.9   161.6   163.6   165.6

Effect   NumDF   DenDF   FValue   ProbF
method   4       16.9    166.97   0.0000
pre      1       10.8     30.02   0.0002

Effect      Method   Estimate   StdErr   df      tValue   Probt
Intercept             67.3975   0.8367   10.66    80.55   0.0000
method      1        -11.8149   1.6551   21.31    -7.14   0.0000
method      2        -11.4306   0.4495   9.865   -25.43   0.0000
method      3         -5.8980   1.6455   20.86    -3.58   0.0018
method      4         -1.3898   1.7650   21.11    -0.79   0.4398
method      5          0.0000
pre                    0.1616   0.0295   10.78     5.48   0.0002
in the class statement, but it does not have to be in the model statement.) Since the
covariate part of the model has been simplified, the estimates of the two variances
are a little larger. The estimate of the slope is 0.1616, indicating that the post exam
score increased by 0.1616 points for each additional point on the pre exam. The F-value
corresponding to source method provides a test of the equal intercepts hypothesis,
but since the lines are parallel, it is also a test of the hypothesis that the models are
equal when evaluated at any given value of pre. In this case, the significance level is 0.0000, indicating there
is sufficient evidence to conclude that the models are not equal. The LSMEANS
statements in Table 14.10 provide estimated values for the post scores at three values
TABLE 14.10
LSMEAN Statements to Provide Adjusted Means at Three Values of Pre and Pairwise Comparison of the Means at Pre = 23.5, the Mean of the Pre Scores

lsmeans method/pdiff at means;
lsmeans method/pdiff at pre=10;
lsmeans method/pdiff at pre=40;

Method   Pre    Estimate   StdErr   df     tValue   Probt
1        23.5   59.3800    1.6228   19.7    36.59   0.0000
2        23.5   59.7642    0.3143    9.8   190.17   0.0000
3        23.5   65.2968    1.6272   19.9    40.13   0.0000
4        23.5   69.8051    1.7346   19.7    40.24   0.0000
5        23.5   71.1949    0.2972    9.8   239.54   0.0000
1        10.0   57.1985    1.6488   21.0    34.69   0.0000
2        10.0   57.5828    0.4412   10.3   130.52   0.0000
3        10.0   63.1153    1.7108   23.7    36.89   0.0000
4        10.0   67.6236    1.7584   20.8    38.46   0.0000
5        10.0   69.0134    0.5677   10.5   121.56   0.0000
1        40.0   62.0462    1.7205   24.1    36.06   0.0000
2        40.0   62.4305    0.6419   10.5    97.25   0.0000
3        40.0   67.9631    1.6546   21.2    41.08   0.0000
4        40.0   72.4714    1.8270   23.6    39.67   0.0000
5        40.0   73.8612    0.4826   10.4   153.06   0.0000

Method   _Method   Pre    Estimate   StdErr   df     tValue   Probt
1        2         23.5    -0.3843   1.6486   21.0    -0.23   0.8179
1        3         23.5    -5.9169   2.3042   20.0    -2.57   0.0183
1        4         23.5   -10.4251   2.3717   19.6    -4.40   0.0003
1        5         23.5   -11.8149   1.6551   21.3    -7.14   0.0000
2        3         23.5    -5.5326   1.6644   21.7    -3.32   0.0031
2        4         23.5   -10.0409   1.7586   20.8    -5.71   0.0000
2        5         23.5   -11.4306   0.4495    9.9   -25.43   0.0000
3        4         23.5    -4.5083   2.3844   20.0    -1.89   0.0732
3        5         23.5    -5.8980   1.6455   20.9    -3.58   0.0018
4        5         23.5    -1.3898   1.7650   21.1    -0.79   0.4398
[Figure: "Exam Scores by Pretest Scores for Study Methods". Horizontal axis: Score on Pretest (0 to 50). Vertical axis: Score on Exam (50 to 80). One line for each study method, 1 through 5.]
FIGURE 14.1 Plot of the regression lines for each of the five study methods using the parallel lines model.
of the pre scores: 23.5 (the mean), 10, and 40. The adjusted means at 23.5 can be used
to report the results, and the adjusted means at 10 and 40 can be used to provide a
graph of the estimated regression lines as displayed in Figure 14.1. The bottom part
of Table 14.10 provides LSD-type pairwise comparisons between the study method
means, where the method pairs (1,2), (3,4), and (4,5) are not significantly different and all
other comparisons are significantly different (p < 0.05).
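For the parallel lines model, each adjusted mean is the intercept plus the method effect plus the slope times the chosen value of pre. The Python sketch below, offered as an illustration rather than as part of the original SAS analysis, rebuilds the adjusted means from the solution estimates in Table 14.9. The names intercept, slope, method_effect, and adjusted_mean are illustrative. Since the tabled slope is rounded to four decimals, the results match Table 14.10 only to about three decimals.

```python
# Adjusted (least squares) means for the common-slope model:
# intercept + method effect + slope * pre, using estimates from Table 14.9.

intercept = 67.3975
slope = 0.1616
# Method 5 is the reference level, so its effect is zero
method_effect = {1: -11.8149, 2: -11.4306, 3: -5.8980, 4: -1.3898, 5: 0.0}


def adjusted_mean(method, pre):
    """Predicted post score for a study method at a given pre score."""
    return intercept + method_effect[method] + slope * pre


for pre in (23.5, 10.0, 40.0):
    row = ", ".join(f"{m}: {adjusted_mean(m, pre):.2f}" for m in method_effect)
    print(f"pre={pre}: {row}")

# With parallel lines, a difference between two methods is the same at every pre
diff_1_5 = adjusted_mean(1, 23.5) - adjusted_mean(5, 23.5)
print(f"method 1 minus method 5: {diff_1_5:.4f}")
```

Because the slope term cancels in any difference between two methods, the pairwise comparisons at the bottom of Table 14.10 would be the same at pre=10 or pre=40 as at the mean.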
The analysis of this model is a preview of the process to be used in later chapters.
The process is to first determine an adequate set of regression models that describes
the mean of the dependent variable as a function of the covariates. Then use that
covariate model to study the relationship among the variances of the treatment
combinations. Once an adequate covariance structure is selected, go back to the
regression or covariate part of the model and simplify it as much as possible using
strategies described in previous chapters.
14.9 EXAMPLE: TWO-WAY TREATMENT STRUCTURE
WITH UNEQUAL VARIANCES
The data in Table 14.11 are from an experiment designed to evaluate the effect of
the speed in rpm of a circular bar of steel and the feed rate into a cutting tool on
the roughness of the surface finish of the final turned product, where the depth of
cut was set to 0.02 in. The treatment structure is a two-way with four levels of speed
(rpm) and four levels of feed rate (feed). The design structure is a completely randomized design with eight replications per treatment combination. There is variation
TABLE 14.11
Cutting Tool Data Run at Different Feed Rates and Speeds with Roughness the Dependent Variable and Hardness the Possible Covariate

Feed    Speed 100 rpm          Speed 200 rpm          Speed 400 rpm          Speed 800 rpm
Rate    Roughness  Hardness    Roughness  Hardness    Roughness  Hardness    Roughness  Hardness
0.01    50         61          65         59          84         64          111        50
0.01    53         65          55         44          104        70          142        52
0.01    56         57          59         52          73         50          147        47
0.01    41         43          55         46          89         55          135        44
0.01    44         46          62         56          89         58          134        62
0.01    43         51          63         59          87         55          148        53
0.01    42         41          67         66          84         58          162        68
0.01    48         53          59         48          83         56          139        65
0.02    64         54          81         65          108        41          192        68
0.02    61         58          81         53          118        67          152        49
0.02    64         60          70         43          136        69          190        46
0.02    62         57          61         48          109        69          166        46
0.02    65         60          67         43          101        61          167        65
0.02    58         50          71         44          104        62          152        45
0.02    72         67          79         69          93         44          139        64
0.02    63         63          69         40          104        62          188        60
0.04    97         62          103        61          123        41          197        59
0.04    79         48          105        53          137        41          216        46
0.04    86         55          121        69          153        64          190        55
0.04    90         54          101        50          137        58          187        58
0.04    92         59          107        61          111        50          212        63
0.04    74         42          103        57          131        46          211        47
0.04    92         57          102        57          137        42          207        40
0.04    82         44          108        63          155        68          220        64
0.08    141        66          158        56          192        69          279        55
0.08    142        61          154        49          195        57          293        52
0.08    132        49          166        59          180        52          266        52
0.08    119        41          159        55          187        56          303        69
0.08    147        69          164        65          210        62          281        61
0.08    119        42          171        68          174        44          256        41
0.08    145        57          162        64          204        63          289        58
0.08    147        63          163        68          198        48          287        47
in the hardness values of the bar stock used in the experiment; thus the hardness of
each piece of bar was measured to be used as a possible covariate. A model to
describe the linear relationship between roughness and hardness (plot the data to
see if this is an adequate assumption) is