3.4 EFFECT OF DIET ON CHOLESTEROL LEVEL: AN EXCEPTION TO THE BASIC ANALYSIS OF COVARIANCE STRATEGY
C0317 ch03 frame Page 67 Monday, June 25, 2001 10:04 PM
Examples: One-Way Analysis of Covariance
TABLE 3.25
PROC GLM Code and Analysis of Variance Table to Provide a Test
of the Hypothesis That All of the Slopes Are Zero for Example 3.4

proc glm data=one; class diet;
model post_chol = diet pre_chol*diet/noint ss3 solution;

Source           df   SS            MS           FValue   ProbF
Model             8   1428650.09    178581.26    732.28   0.0000
Error            24      5852.91       243.87
Uncor Total      32   1434503.00

Source           df   SS (Type III)   MS          FValue   ProbF
diet              4   59368.57        14842.14    60.86    0.0000
Pre_Chol*diet     4    2336.71          584.18     2.40    0.0785
TABLE 3.26
PROC GLM Code and Analysis of Variance for the Less
Than Full Rank Model to Test the Equal Slopes Hypothesis
for Example 3.4

proc glm; class diet;
model Post_chol = diet pre_chol Pre_chol*diet/noint;

Source               df   SS            MS           FValue   ProbF
Model                 8   1428650.09    178581.26    732.28   0.0000
Error                24      5852.91       243.87
Uncorrected Total    32   1434503.00

Source           df   SS (Type III)   MS          FValue   ProbF
diet              4   59368.57        14842.14    60.86    0.0000
Pre_Chol          1       1.82            1.82     0.01    0.9318
Pre_Chol*diet     3    2334.81          778.27     3.19    0.0417
lines reveals that the slopes for diets 1 and 2 are positive while the slopes for diets
3 and 4 are negative. The negative slopes and the positive slopes are not quite
significantly different from zero (using a Bonferroni adjustment), but the positive
slopes are significantly different from the negative slopes. Thus, a model with
unequal slopes is needed to adequately describe the data set. Comparisons among
the diets are accomplished by using the unequal slopes model. Since the slopes are
unequal, the diets need to be compared at least at three values of PRE_CHOL. For
this study, the three values are the 75th percentile, median, and 25th percentile of the
study's PRE_CHOL data, which are 281, 227.5, and 180, respectively. The least squares means
computed at these three values are in Table 3.28. Pairwise comparisons among
the levels of DIET were carried out using the Tukey method for multiple comparisons
© 2002 by CRC Press LLC
Analysis of Messy Data, Volume III: Analysis of Covariance
TABLE 3.27
Estimates of the Intercepts and Slopes for Full
Rank Model of Table 3.25

Parameter          Estimate   StdErr   tValue   Probt
diet 1             137.63     27.27     5.05    0.0000
diet 2             195.74     32.03     6.11    0.0000
diet 3             223.73     33.71     6.64    0.0000
diet 4             276.60     23.67    11.69    0.0000
Pre_Chol*diet 1      0.2333    0.1113   2.10    0.0467
Pre_Chol*diet 2      0.0450    0.1367   0.33    0.7448
Pre_Chol*diet 3     –0.0232    0.1476  –0.16    0.8763
Pre_Chol*diet 4     –0.2333    0.1038  –2.25    0.0341
TABLE 3.28
PROC GLM Code and Corresponding Least
Squares or Adjusted Means Evaluated at Three
Values of PRE_CHOL of Example 3.4

lsmeans diet/pdiff stderr at pre_chol=281 adjust=Tukey; ***75th percentile;
lsmeans diet/pdiff stderr at pre_chol=227.5 adjust=Tukey; ***median or 50th percentile;
lsmeans diet/pdiff stderr at pre_chol=180 adjust=Tukey; ***25th percentile;

Pre_Chol   Diet   LSMean (Post_Chol)   StdErr   LSMean Number
281        1      203.19               7.16     1
281        2      208.39               8.81     2
281        3      217.20               9.91     3
281        4      211.05               8.26     4
227.5      1      190.71               5.69     1
227.5      2      205.98               5.54     2
227.5      3      218.45               5.53     3
227.5      4      223.53               5.55     4
180        1      179.63               8.66     1
180        2      203.84               8.87     2
180        3      219.55               8.67     3
180        4      234.61               7.02     4
within each level of PRE_CHOL. The significance levels of the Tukey comparisons
are in Table 3.29.
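The adjusted means in Table 3.28 can be checked by hand from the estimates in Table 3.27, since each one is simply the fitted line for a diet evaluated at the chosen PRE_CHOL value. The following Python sketch is only an illustration of that arithmetic (the names `intercepts`, `slopes`, and `adjusted_mean` are ours, not part of the SAS analysis), and because it uses the rounded estimates from Table 3.27 it reproduces the tabled values only to within rounding.

```python
# Intercepts and slopes copied from Table 3.27 (unequal slopes model).
intercepts = {1: 137.63, 2: 195.74, 3: 223.73, 4: 276.60}
slopes = {1: 0.2333, 2: 0.0450, 3: -0.0232, 4: -0.2333}


def adjusted_mean(diet, pre_chol):
    """Fitted POST_CHOL for a diet at a chosen PRE_CHOL value."""
    return intercepts[diet] + slopes[diet] * pre_chol


# 75th percentile, median, and 25th percentile of PRE_CHOL.
pre_chol_values = (281, 227.5, 180)

adjusted = {
    x: {diet: adjusted_mean(diet, x) for diet in intercepts}
    for x in pre_chol_values
}

for x in pre_chol_values:
    row = "  ".join(f"diet {d}: {m:7.2f}" for d, m in adjusted[x].items())
    print(f"PRE_CHOL={x:>5}: {row}")
```

The spread among the four fitted values shrinks as PRE_CHOL increases, which is the pattern the pairwise comparisons pick up.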
There were no significant differences among the diet means of POST_CHOL
at PRE_CHOL=281, but the mean of DIET 1 is significantly lower than the means of
TABLE 3.29
Tukey Significance Levels for Comparing the Levels
of Diet at Each Value of Pre_Chol

Pre Chol   Row Name   1        2        3        4
281        1                   0.9675   0.6653   0.8886
281        2          0.9675            0.9092   0.9961
281        3          0.6653   0.9092            0.9635
281        4          0.8886   0.9961   0.9635
227.5      1                   0.2456   0.0094   0.0020
227.5      2          0.2456            0.4013   0.1416
227.5      3          0.0094   0.4013            0.9149
227.5      4          0.0020   0.1416   0.9149
180        1                   0.2334   0.0164   0.0003
180        2          0.2334            0.5918   0.0541
180        3          0.0164   0.5918            0.5411
180        4          0.0003   0.0541   0.5411
[Plot titled "Diet and Cholesterol". Vertical axis: Post Cholesterol Level (160 to 250). Horizontal axis: Pre Cholesterol Level (150 to 300). Legend: data points and fitted model lines for Diets 1, 2, 3, and 4.]
FIGURE 3.12 Graph of the diet models and data as a function of the pre-diet values for
Example 3.4.
DIETs 3 and 4 at PRE_CHOL=227.5 and 180. A graph of the data and of the
estimated models is in Figure 3.12, indicating there are large diet differences at low
pre-diet cholesterol levels and negligible differences between the diets at high pre-diet cholesterol levels.
This example demonstrates that, even though there is not enough information
from the individual models to conclude that any slopes are different from zero, the
slopes could be significantly different from each other when some are positive and
some are negative. Hence, the analyst must be careful to check for this case when
doing analysis of covariance.
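The two kinds of evidence in this example can be laid side by side with a few lines of arithmetic. The sketch below is illustrative only: it takes the slope p-values from Table 3.27 and the equal slopes p-value from Table 3.26, and it assumes the Bonferroni adjustment is made over the four individual slope tests at an overall level of 0.05 (the family size is our assumption).

```python
# p-values for H0: slope = 0, copied from Table 3.27, keyed by diet.
slope_p_values = {1: 0.0467, 2: 0.7448, 3: 0.8763, 4: 0.0341}

alpha = 0.05
# Assumed family of four tests, so each is judged at 0.05 / 4 = 0.0125.
bonferroni_alpha = alpha / len(slope_p_values)

significant_unadjusted = [d for d, p in slope_p_values.items() if p < alpha]
significant_bonferroni = [
    d for d, p in slope_p_values.items() if p < bonferroni_alpha
]

# p-value for H0: all slopes equal (source Pre_Chol*diet in Table 3.26).
equal_slopes_p = 0.0417
slopes_differ = equal_slopes_p < alpha

print("Slopes nonzero without adjustment:", significant_unadjusted)
print("Slopes nonzero with Bonferroni:   ", significant_bonferroni)
print("Equal slopes hypothesis rejected: ", slopes_differ)
```

No individual slope survives the adjustment, yet the equal slopes hypothesis is rejected, which is exactly the case the analyst is warned to check for.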
3.5 CHANGE FROM BASE LINE ANALYSIS USING
EFFECT OF DIET ON CHOLESTEROL LEVEL DATA
There is a lot of confusion about the analysis of change from base line data. It might
be of much interest to the dietician to evaluate the change in cholesterol level from
the base line measurement, i.e., the pre-diet cholesterol level discussed in Section 3.4.
Some researchers think that by calculating the change from base line and then using
analysis of variance to analyze that change, there is no need to consider base line
as a covariate in the modeling process. The data set from Example 3.4 is used in
the following to shed some light on the analysis of change from base line data.
Table 3.30 contains the analysis of variance of the change from base line data
calculated for each person as post cholesterol minus pre cholesterol. The estimate
of the variance from Table 3.30 is 2695.58, compared to 243.87 for the analysis of
covariance model from Table 3.25. In fact, the estimate of the variance based on the
analysis of variance of just the post cholesterol values (ignoring the pre measurements)
is 292.49 (analysis is not shown). So the change from base line data have
far more variability than the post diet cholesterol data. The analysis in
Table 3.30 provides an F statistic for comparing diet means with a significance level
of 0.2621. The analysis of variance on the post cholesterol values (without the covariate)
provides an F statistic with a significance level of 0.0054, and using multiple
comparisons one finds that the mean cholesterol level of diet 1 is significantly
less than the means of diets 3 and 4. Therefore, the analysis of change from base
line data without the covariate does not necessarily provide appropriate information
about the effect of diets on a person’s cholesterol level.
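A rough way to see why the change scores can be so much more variable is sketched below. It assumes that, within a single diet, the post value follows a straight line in the pre value, $y = \alpha + \beta x + \varepsilon$ with $\operatorname{Var}(\varepsilon) = \sigma^2$, and that the pre values vary from person to person with variance $\sigma_x^2$ independently of $\varepsilon$. The symbols $\sigma^2$ and $\sigma_x^2$ are introduced here for illustration only.

```latex
\operatorname{Var}(y) = \beta^{2}\sigma_x^{2} + \sigma^{2},
\qquad
\operatorname{Var}(y - x) = (\beta - 1)^{2}\sigma_x^{2} + \sigma^{2}.
```

Under these assumptions, a slope near zero leaves the variance of the post values close to $\sigma^2$, while the variance of the change scores is close to $\sigma_x^2 + \sigma^2$. This is in line with the ordering of the three estimates quoted above (243.87, 292.49, and 2695.58). The change scores carry no extra variability only when $\beta = 1$.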
TABLE 3.30
PROC GLM Code and Analysis of Variance of Change
from Base Line, Post Minus Pre, without the Covariate

proc glm data=one;class diet;
model change=diet;

Source             df   SS          MS         FValue   ProbF
Model               3   11361.09    3787.03    1.40     0.2621
Error              28   75476.13    2695.58
Corrected Total    31   86837.22

Source   df   SS (Type III)   MS         FValue   ProbF
diet      3   11361.09        3787.03    1.40     0.2621
What happens if the covariate is also used in the analysis of the change from
base line data? Assume there are t treatments where y represents the response variable
or post measurement and x denotes the covariate or pre measurement. Also, assume
the simple linear regression model describes the relationship between the mean of
y given x and x for each of the treatments. Then the model is
$$y_{ij} = \alpha_i + \beta_i x_{ij} + \varepsilon_{ij}, \quad i = 1, \ldots, t, \; j = 1, \ldots, n_i .$$
Next compute the change from base line data as $c_{ij} = y_{ij} - x_{ij}$. The corresponding
model for $c_{ij}$ is
$$\begin{aligned}
c_{ij} = y_{ij} - x_{ij} &= \alpha_i + (\beta_i - 1)\, x_{ij} + \varepsilon_{ij} \\
&= \alpha_i + \gamma_i x_{ij} + \varepsilon_{ij}, \quad i = 1, \ldots, t, \; j = 1, \ldots, n_i ,
\end{aligned}$$
where the slope of the model for $c_{ij}$, $\gamma_i = \beta_i - 1$, is equal to the slope for the $y_{ij}$ model minus 1.
Thus testing $H_0\colon \gamma_1 = \cdots = \gamma_t = 0$ vs. $H_a\colon$ (not $H_0$) is equivalent to testing
$H_0\colon \beta_1 = \cdots = \beta_t = 1$ vs. $H_a\colon$ (not $H_0$). Therefore, the analysis of the change from
base line data without the covariate is appropriate only when the slopes of the models
for $y_{ij}$ are all equal to 1. The following analyses demonstrate the importance of
using the pre value as a covariate in the analysis of the change from base line. Tables
3.31 and 3.32 contain the results of fitting Model 2.1 to the change = post cholesterol
minus pre cholesterol values, where the full rank model is fit to get the results in
Table 3.31 and the less than full rank model is fit to get the results in Table 3.32.
The estimate of the variance from Table 3.31 is 243.87, the same as the estimate of
the variance obtained from the analysis of covariance model in Table 3.25. The
F statistic for source Pre_Chol*diet in Table 3.32 tests the equal slopes hypothesis of Equation 2.11.
The significance level is 0.0417, the same as in Table 3.26.
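The relationship between the two models is easy to confirm numerically. The sketch below fits a simple linear regression to a post measurement and to the corresponding change score for a single treatment group. The `pre` and `post` values are hypothetical numbers made up for illustration (they are not the Example 3.4 data), and `simple_regression` is a helper written for this sketch.

```python
def simple_regression(x, y):
    """Least squares intercept, slope, and residual sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sse


# Hypothetical pre (x) and post (y) measurements for one treatment group.
pre = [150.0, 180.0, 210.0, 240.0, 270.0, 300.0]
post = [172.0, 181.0, 179.0, 190.0, 186.0, 195.0]
change = [yi - xi for xi, yi in zip(pre, post)]

a_y, b_y, sse_y = simple_regression(pre, post)
a_c, b_c, sse_c = simple_regression(pre, change)

print(f"post   on pre: intercept={a_y:.3f} slope={b_y:.4f} SSE={sse_y:.3f}")
print(f"change on pre: intercept={a_c:.3f} slope={b_c:.4f} SSE={sse_c:.3f}")
```

The intercept and the residual sum of squares are the same for both fits and the slope for the change score is the post slope minus 1, so once the covariate is in the model the two analyses carry the same information.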
TABLE 3.31
PROC GLM Code and Analysis of Variance Table
for Change from Base Line Data with the Covariate
to Test Slopes Equal Zero

proc glm data=one;class diet;
model change=diet Pre_chol*diet/solution noint;

Source               df   SS          MS          FValue   ProbF
Model                 8   92122.09    11515.26    47.22    0.0000
Error                24    5852.91      243.87
Uncorrected Total    32   97975.00

Source           df   SS (Type III)   MS          FValue   ProbF
diet              4   59368.57        14842.14    60.86    0.0000
Pre_Chol*diet     4   69623.21        17405.80    71.37    0.0000
TABLE 3.32
PROC GLM Code and Analysis of Variance
for the Change from Base Line Data to Provide
the Test of the Slopes Equal Hypothesis

proc glm data=one;class diet;
model change=diet Pre_chol Pre_chol*diet;

Source             df   SS          MS          FValue   ProbF
Model               7   80984.31    11569.19    47.44    0.0000
Error              24    5852.91      243.87
Corrected Total    31   86837.22

Source           df   SS (Type III)   MS          FValue   ProbF
diet              3    3718.77         1239.59      5.08   0.0073
Pre_Chol          1   60647.40        60647.40    248.69   0.0000
Pre_Chol*diet     3    2334.81          778.27      3.19   0.0417
TABLE 3.33
Estimates of the Parameters from Full Rank Analysis
of Covariance Model for Change from Base Line

Parameter          Estimate   StdErr   tValue   Probt
diet 1             137.63     27.27      5.05   0.0000
diet 2             195.74     32.03      6.11   0.0000
diet 3             223.73     33.71      6.64   0.0000
diet 4             276.60     23.67     11.69   0.0000
Pre_Chol*diet 1     –0.767     0.111    –6.89   0.0000
Pre_Chol*diet 2     –0.955     0.137    –6.98   0.0000
Pre_Chol*diet 3     –1.023     0.148    –6.93   0.0000
Pre_Chol*diet 4     –1.233     0.104   –11.88   0.0000
The estimates of the intercepts and slopes for the model in Table 3.31 are
displayed in Table 3.33. The intercepts are identical to those in Table 3.27 and the
slopes are the slopes in Table 3.27 minus 1. Just as in the analysis of the post
cholesterol data, an unequal slopes model is needed to adequately describe the data
set. The adjusted means or least squares means are computed at pre cholesterol
levels of 281, 227.5, and 180. Those least squares means are presented in Table 3.34.
Most of the time it is not of interest to consider the t-tests associated with the least
squares means, since each t-statistic tests the hypothesis that the respective population
mean is equal to zero. Here, however, the adjusted means are change from base line
values, so a test against zero asks whether the cholesterol level changed. The adjusted
means for diets 3 and 4 at Pre_chol=227.5 and for diet 1 at Pre_chol=180 are not significantly
different from zero. Table 3.35 consists of the significance levels for Tukey adjusted
multiple comparisons for all pairwise comparisons of the diet means within the
Pre_chol values of 281, 227.5, and 180. These significance levels are identical to
those in Table 3.29.
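The agreement between the two sets of Tukey significance levels follows from the fact that, at a fixed pre cholesterol value, every adjusted mean of the change score is the adjusted mean of the post value shifted by the same constant. The sketch below checks this using the least squares means printed in Tables 3.28 and 3.34. The helper `pairwise_differences` is written for this illustration and is not part of the SAS analysis.

```python
# Least squares means for diets 1 to 4, keyed by the PRE_CHOL value.
post_lsmeans = {    # Table 3.28
    281: [203.19, 208.39, 217.20, 211.05],
    227.5: [190.71, 205.98, 218.45, 223.53],
    180: [179.63, 203.84, 219.55, 234.61],
}
change_lsmeans = {  # Table 3.34
    281: [-77.81, -72.61, -63.80, -69.95],
    227.5: [-36.79, -21.52, -9.05, -3.97],
    180: [-0.37, 23.84, 39.55, 54.61],
}


def pairwise_differences(means):
    """Differences between all pairs of diet means, keyed by (i, j)."""
    k = len(means)
    return {
        (i + 1, j + 1): means[i] - means[j]
        for i in range(k)
        for j in range(i + 1, k)
    }


for x in post_lsmeans:
    post_diffs = pairwise_differences(post_lsmeans[x])
    change_diffs = pairwise_differences(change_lsmeans[x])
    for pair in post_diffs:
        print(f"PRE_CHOL={x:>5} diets {pair}: "
              f"post {post_diffs[pair]:7.2f}  change {change_diffs[pair]:7.2f}")
```

Because the pairwise differences and their standard errors are unchanged, the comparisons among diets do not depend on whether the response is the post value or the change score, provided the covariate is in the model.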
TABLE 3.34
PROC GLM Code and Least Squares Means
for the Change from Base Line Analysis

lsmeans diet/pdiff stderr at pre_chol=281 adjust=Tukey;***75th percentile;
lsmeans diet/pdiff stderr at pre_chol=227.5 adjust=Tukey;***median or 50th percentile;
lsmeans diet/pdiff stderr at pre_chol=180 adjust=Tukey;***25th percentile;

Pre_CHOL   Diet   LSMean    StdErr   Probt    LSMeanNumber
281        1      –77.81    7.16     0.0000   1
281        2      –72.61    8.81     0.0000   2
281        3      –63.80    9.91     0.0000   3
281        4      –69.95    8.26     0.0000   4
227.5      1      –36.79    5.69     0.0000   1
227.5      2      –21.52    5.54     0.0007   2
227.5      3       –9.05    5.53     0.1148   3
227.5      4       –3.97    5.55     0.4820   4
180        1       –0.37    8.66     0.9660   1
180        2       23.84    8.87     0.0128   2
180        3       39.55    8.67     0.0001   3
180        4       54.61    7.02     0.0000   4
TABLE 3.35
Tukey Adjusted Significance Levels for Pairwise
Comparisons of the Diets’ Means at Three Levels
of Pre Cholesterol

PRE_CHOL   RowName   _1       _2       _3       _4
281        1                  0.9675   0.6653   0.8886
281        2         0.9675            0.9092   0.9961
281        3         0.6653   0.9092            0.9635
281        4         0.8886   0.9961   0.9635
227.5      1                  0.2456   0.0094   0.0020
227.5      2         0.2456            0.4013   0.1416
227.5      3         0.0094   0.4013            0.9149
227.5      4         0.0020   0.1416   0.9149
180        1                  0.2334   0.0164   0.0003
180        2         0.2334            0.5918   0.0541
180        3         0.0164   0.5918            0.5411
180        4         0.0003   0.0541   0.5411
In summary, when it is of interest to evaluate change from base line data, carry out
the analysis on the change from base line variable, but still consider the base line
values as possible covariates. The only time the analysis of variance of change from
base line data without the covariate is appropriate is when the slopes are all equal
to 1. As this example shows, ignoring the base line values can have a considerable
effect on the estimate of the variance (2695.58 without the covariate vs. 243.87 with
it) and thus on the conclusions one draws from the data analysis.
3.6 SHOE TREAD DESIGN DATA FOR EXCEPTION
TO THE BASIC STRATEGY
The data in Table 3.36 are the times it took males to run an obstacle course with a
particular tread design on the soles of their shoes (Tread Time, sec). To help remove
the effect of person-to-person differences, the time required to run the same course
while wearing a slick-soled shoe was also measured (Slick Time, sec). Fifteen
subjects were available for the study and were randomly assigned to one of three
tread designs, five per design. The data are from a one-way treatment structure with
one covariate in a completely randomized design structure. It is of interest to compare
mean times for tread designs for a constant time to run the course with the slick
sole shoes. Table 3.37 contains the analysis to test the hypothesis that the slopes are
all equal to zero (Equation 2.6), which one fails to reject (p = 0.2064). None of the
individual slopes are significantly different from zero, but they are all about 0.3 in magnitude.
The main problem is that there are only five observations per treatment group
and it is difficult to detect a non-zero slope when the sample size is small. The basic
TABLE 3.36
Obstacle Course Time Data for Three Shoe
Tread Designs

   Tread Design 1            Tread Design 2            Tread Design 3
Slick Time  Tread Time    Slick Time  Tread Time    Slick Time  Tread Time
    34          36            37          29            58          38
    40          36            50          40            57          32
    48          38            38          35            36          29
    35          32            52          34            55          34
    42          39            45          29            48          31

Note: Tread Time (sec) denotes time to run the course with the
assigned tread and Slick Time (sec) denotes time to run the same
course using a slick-soled shoe to be considered as a possible
covariate for Example 3.6.
TABLE 3.37
PROC GLM Code, Analysis of Variance Table, and Estimates
of the Parameters for the Full Rank Model for Example 3.6

proc glm data=two; class tread_ds;
model Tread_time = tread_ds Slick_Time*tread_ds/noint solution ss3;

Source               df   SS          MS         FValue   ProbF
Model                 6   17620.18    2936.70    282.65   0.0000
Error                 9      93.51      10.39
Uncorrected Total    15   17713.69

Source                 df   SS (Type III)   MS       FValue   ProbF
tread_ds                3   122.92          40.97    3.94     0.0476
Slick_Time*tread_ds     3    58.04          19.35    1.86     0.2064

Parameter                Estimate   StdErr   tValue   Probt
tread_ds 1               23.51      11.75    2.00     0.0765
tread_ds 2               20.07      10.55    1.90     0.0897
tread_ds 3               18.24       8.88    2.05     0.0703
Slick_Time*tread_ds 1     0.325      0.294   1.11     0.2975
Slick_Time*tread_ds 2     0.301      0.236   1.28     0.2342
Slick_Time*tread_ds 3     0.286      0.173   1.65     0.1324
TABLE 3.38
PROC GLM Code and the Analysis of Variance Table for the
Analysis of Tread Time without the Covariate for Example 3.6

proc glm data=two; class tread_ds;where tread_time ne .;
model Tread_time = tread_ds /solution;

Source             df   SS        MS       FValue   ProbF
Model               2    38.05    19.03    1.51     0.2608
Error              12   151.55    12.63
Corrected Total    14   189.60

Source     df   SS       MS       FValue   ProbF
tread_ds    2   38.05    19.03    1.51     0.2608
strategy says to continue the analysis of the shoe tread designs via analysis of
variance, i.e., without the covariate. The analysis of variance of the time to run the
obstacle course to compare the tread designs without using the covariate is in
Table 3.38. The analysis of variance indicates there are no significant differences
among the shoe tread design means. Table 3.39 displays the means for each of the
tread designs and the significance levels indicate the means of the tread designs are
not significantly different. If the basic strategy is ignored and other models are used
(such as a common slope model), it becomes evident that the covariate is important
TABLE 3.39
PROC GLM Code, Least Squares Means and p-Values
for Making Pairwise Comparisons among Shoe Tread
Design Means for Tread Time (sec) of Example 3.6
without the Covariate

lsmeans tread_ds/stderr pdiff;

tread_ds   LSMean   StdErr   Probt    LSMeanNumber
1          36.40    1.59     0.0000   1
2          33.40    1.59     0.0000   2
3          32.74    1.59     0.0000   3

RowName   _1       _2       _3
1                  0.2067   0.1294
2         0.2067            0.7740
3         0.1294   0.7740
TABLE 3.40
PROC GLM Code, Analysis of Variance Table and Parameter
Estimates for the Common Slope Model of Example 3.6

proc glm data=two; class tread_ds;
model Tread_time = tread_ds Slick_Time/solution;

Source             df   SS        MS       FValue   ProbF
Model               3    95.96    31.99    3.76     0.0444
Error              11    93.64     8.51
Corrected Total    14   189.60

Source       df   SS (Type III)   MS       FValue   ProbF
tread_ds      2   85.95           42.97    5.05     0.0278
Slick_Time    1   57.91           57.91    6.80     0.0243

Parameter     Estimate   StdErr   tValue   Probt
Intercept     17.67      5.92     2.98     0.0125
tread_ds 1     6.92      2.23     3.10     0.0100
tread_ds 2     2.54      1.98     1.28     0.2261
tread_ds 3     0.00
Slick_Time     0.298     0.114    2.61     0.0243
in the comparison of the tread designs. The common slope model analysis is displayed
in Table 3.40, which indicates there is a significant effect due to the covariate
(p = 0.0243), i.e., the common slope is significantly different from zero.
Thus, while there is not enough information from the individual tread designs’ data
to conclude their slopes are different from zero, the combined data sets for a common
slope do provide an estimate of the slope which is significantly different from zero.
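The gain from pooling can be seen by computing the slopes directly. The sketch below uses the values as printed in Table 3.36, assuming the Slick Time and Tread Time columns pair up row by row within each design. The printed values appear to be rounded (their mean Slick Time is 45, not the 44.88667 used for the adjusted means), so the results only approximate Tables 3.37 and 3.40. The two-sided 5% critical values 2.262 (9 df) and 2.201 (11 df) are standard t table entries, and all variable names are ours.

```python
from math import sqrt

# (slick_time, tread_time) pairs as printed in Table 3.36; the row-wise
# pairing within each design is an assumption of this sketch.
data = {
    1: [(34, 36), (40, 36), (48, 38), (35, 32), (42, 39)],
    2: [(37, 29), (50, 40), (38, 35), (52, 34), (45, 29)],
    3: [(58, 38), (57, 32), (36, 29), (55, 34), (48, 31)],
}


def corrected_sums(pairs):
    """Within-design corrected sums of squares and cross products."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    return sxx, sxy, syy


sums = {design: corrected_sums(pairs) for design, pairs in data.items()}

# Separate slopes model: 15 observations, 6 parameters, 9 error df.
sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums.values())
mse_separate = sse_separate / 9
separate_slopes = {d: sxy / sxx for d, (sxx, sxy, _) in sums.items()}
separate_t = {
    d: separate_slopes[d] / sqrt(mse_separate / sxx)
    for d, (sxx, _, _) in sums.items()
}

# Common slope model: pool the within-design sums; 11 error df.
sxx_w = sum(s[0] for s in sums.values())
sxy_w = sum(s[1] for s in sums.values())
syy_w = sum(s[2] for s in sums.values())
common_slope = sxy_w / sxx_w
mse_common = (syy_w - sxy_w ** 2 / sxx_w) / 11
common_t = common_slope / sqrt(mse_common / sxx_w)

for d in data:
    print(f"design {d}: slope={separate_slopes[d]:.3f} t={separate_t[d]:.2f}")
print(f"common   : slope={common_slope:.3f} t={common_t:.2f}")
```

The pooled slope is close to each of the individual slopes, but it is estimated from all fifteen observations, so its standard error is much smaller and its t statistic exceeds the critical value while none of the individual ones do.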
TABLE 3.41
PROC GLM Code, Least Squares Means, and p-Values
for Comparing the Shoe Tread Design Means

lsmeans tread_ds/stderr pdiff e;

tread_ds   LSMean   StdErr   Probt    LSMeanNumber
1          37.94    1.43     0.0000   1
2          33.57    1.31     0.0000   2
3          31.03    1.46     0.0000   3

RowName   _1       _2       _3
1                  0.0436   0.0100
2         0.0436            0.2261
3         0.0100   0.2261
The overall test of TREAD_DS in Table 3.40 indicates there is enough information
to conclude that the tread designs yield different times, a different decision than
from the analysis without the covariate (Table 3.38).
The adjusted means (LSMEANS at Slick Time = 44.88667) in Table 3.41 indicate
that designs 2 and 3 are possibly better than design 1. If a Bonferroni adjustment
is used, then runners using design 3 run significantly faster than runners using design 1 (α = 0.05).
The graph of the estimated regression lines with a common slope is in Figure 3.13.
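The adjusted means and the Bonferroni conclusion can be reproduced from the printed output. The sketch below evaluates the common slope model of Table 3.40 at the mean Slick Time of 44.88667 and then compares the pairwise p-values of Table 3.41 with 0.05 divided by the number of comparisons. Treating the three pairwise comparisons as the Bonferroni family is our assumption, and because the parameter estimates are rounded the adjusted means agree with Table 3.41 only to within rounding.

```python
# Parameter estimates copied from Table 3.40 (common slope model).
intercept = 17.67
tread_effect = {1: 6.92, 2: 2.54, 3: 0.00}
common_slope = 0.298
mean_slick_time = 44.88667  # value at which the LSMEANS are evaluated

adjusted_means = {
    design: intercept + effect + common_slope * mean_slick_time
    for design, effect in tread_effect.items()
}

# Pairwise p-values copied from Table 3.41.
pairwise_p = {(1, 2): 0.0436, (1, 3): 0.0100, (2, 3): 0.2261}

# Assumed family of three comparisons at an overall level of 0.05.
bonferroni_alpha = 0.05 / len(pairwise_p)
significant_pairs = [
    pair for pair, p in pairwise_p.items() if p < bonferroni_alpha
]

for design, mean in adjusted_means.items():
    print(f"design {design}: adjusted mean = {mean:.2f}")
print(f"per-comparison level = {bonferroni_alpha:.4f}")
print("significant pairs:", significant_pairs)
```

Only the comparison of designs 1 and 3 falls below the per-comparison level, which matches the statement above.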
This example shows two important aspects of analysis of covariance. First, there
could be enough evidence to conclude that a common slope model is necessary to
[Plot titled "Shoe Tread Designs". Vertical axis: Time (sec) with Tread Soles (25 to 45). Horizontal axis: Time (sec) with Slick Soles (30 to 60). Legend: data points and fitted model lines for Designs 1, 2, and 3.]
FIGURE 3.13 Plot of data and estimated models for each of the three tread designs for the
common slope model of Example 3.6.