7.3 EXAMPLE: ONE-WAY TREATMENT STRUCTURE WITH EQUAL SLOPES MODEL




Analysis of Messy Data, Volume III: Analysis of Covariance



TABLE 7.2
Analysis of Covariance Using All Five of the Possible Covariates

proc glm data=ex_7_1; classes treat;
model y=treat x1 x2 x3 x4 x5/solution;

Source        df    SS        MS       FValue   ProbF
Model          8    130.117   16.265    5.673   0.0002
Error         31     88.883    2.867
Corr Total    39    219.000

Source   df   SS (III)   MS       FValue   ProbF
treat     3    91.166    30.389   10.599   0.0001
x1        1    16.172    16.172    5.640   0.0239
x2        1    13.198    13.198    4.603   0.0399
x3        1     2.818     2.818    0.983   0.3292
x4        1     0.710     0.710    0.248   0.6222
x5        1    12.406    12.406    4.327   0.0459

Parameter   Estimate   StdErr   tValue   Probt
x1            0.337    0.142     2.375   0.0239
x2           –0.839    0.391    –2.145   0.0399
x3            0.027    0.027     0.991   0.3292
x4           –0.074    0.149    –0.498   0.6222
x5            0.162    0.078     2.080   0.0459



TABLE 7.3
Adjusted Means and Pairwise Comparisons Based on the Model with Five Covariates

lsmeans treat/pdiff stderr;

Treat   LSM Num   LSMEAN   StdErr
1          1      17.720   0.542
2          2      18.312   0.555
3          3      20.280   0.560
4          4      21.688   0.551

LSMEAN Number     _1       _2       _3       _4
1                 .       0.4487   0.0028   0.0000
2               0.4487    .        0.0207   0.0002
3               0.0028   0.0207    .        0.0844
4               0.0000   0.0002   0.0844    .



the two variables are possibly not needed in the model. The estimate of the variance

is 2.867, which is based on 31 degrees of freedom. Table 7.3 contains the adjusted

or least squares means for the four treatments as well as pairwise comparisons of

the treatments. Using a Fisher’s protected LSD approach, the means of Treatments 1

© 2002 by CRC Press LLC



Variable Selection in the Analysis of Covariance Model






TABLE 7.4
PROC GLM Code to Fit the Analysis of Variance Model to the Response Variable and Each of the Possible Covariates and Compute the Residuals for Each

proc glm; classes treat;
model y x1 x2 x3 x4 x5=treat; * fit models 7.1 and 7.3;
output out=resids r=ry r1 r2 r3 r4 r5; * compute the residuals;



and 2 and of Treatments 3 and 4 are not significantly different while all other

comparisons have significance levels less than 0.05.

Since at least two of the possible covariates have slopes that are not significantly

different from zero, the model building process described in Section 7.2 is used to

carry out variable selection for determining the adequate set of covariates for the

model. The PROC GLM statement in Table 7.4 fits Model 7.1 to the response variable

y and Model 7.3 to each of the possible covariates x1, x2, x3, x4, and x5. The main

product of these analyses is the computation of the sets of residuals for each of the

variables. The output statement provides a file, called “resids,” that contains all of

the residuals, ry, r1, r2, r3, r4, and r5. The REG procedure in Tables 7.5 to 7.9 uses

the computed residuals and model selection procedures to select variables for the

analysis of covariance.
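The residual step in Table 7.4 can be mirrored outside SAS. For a one-way treatment structure the fitted value from Model 7.1 (or 7.3) is just the treatment mean, so each residual is a within-treatment deviation. A minimal Python sketch, using illustrative data rather than the example's data:

```python
def group_residuals(values, groups):
    """Residuals from a one-way ANOVA fit: each value minus its group mean."""
    means = {}
    for g in set(groups):
        obs = [v for v, gi in zip(values, groups) if gi == g]
        means[g] = sum(obs) / len(obs)
    return [v - means[g] for v, g in zip(values, groups)]

treat = [1, 1, 1, 2, 2, 2]                 # hypothetical treatment labels
y     = [11.0, 13.0, 12.0, 20.0, 22.0, 21.0]
x1    = [2.0, 4.0, 3.0, 5.0, 7.0, 6.0]

ry = group_residuals(y, treat)             # plays the role of ry in Table 7.4
r1 = group_residuals(x1, treat)            # plays the role of r1
```

Applying the same function to each of x2 through x5 would complete the set of residuals that the OUTPUT statement writes to the file "resids."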

Five out of several available different variable selection methods (Draper and

Smith, 1981; SAS, 1989; and Ott, 1988) were used to demonstrate some of the

aspects of model building. The methods used were stepwise, forward, backward,

adjusted R2, and CP. There is no guarantee that these procedures will yield the same

model, and in most cases that involve many possible covariates, the sets of selected

variables will not be identical. Tables 7.5 through 7.9 contain the results of the model

building processes.

The PROC REG code and results of using the stepwise method are in Table 7.5.

The stepwise variable selection method starts with no variables in the model and

includes variables one at a time. After each new variable is included, the variable with the largest significance level is eliminated whenever that significance level exceeds a preset value. In this case variables r1, r5, and r2 were selected.
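The forward part of this process can be sketched in a few lines. The version below is a greedy selection by sequential orthogonalization: the increase in regression sum of squares from adding a candidate equals the regression sum of squares of the current residuals on that candidate, once the candidate has been swept free of the variables already entered. The data in the test are hypothetical; the elimination check of the full stepwise procedure is noted in the docstring rather than implemented.

```python
def slope(x, y):
    # Least-squares slope of y on x through the origin; adequate here
    # because residuals from the treatment model sum to zero.
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def forward_select(candidates, y, steps):
    """Greedy forward selection by sequential orthogonalization.

    At each step the candidate giving the largest increase in regression
    sum of squares enters.  A full stepwise procedure would additionally
    drop any entered variable whose significance level later exceeds a
    preset cutoff.
    """
    chosen = []
    work = {name: list(col) for name, col in candidates.items()}
    resid_y = list(y)

    def gain(name):
        b = slope(work[name], resid_y)
        return b * b * sum(v * v for v in work[name])  # regression SS

    for _ in range(steps):
        best = max(work, key=gain)
        chosen.append(best)
        bx = work.pop(best)
        b = slope(bx, resid_y)
        resid_y = [r - b * v for r, v in zip(resid_y, bx)]
        for name in work:  # sweep the entered variable out of the rest
            c = slope(bx, work[name])
            work[name] = [w - c * v for w, v in zip(work[name], bx)]
    return chosen
```

With the residuals ry and r1 through r5 from Table 7.4 in place of toy columns, this sketch would be expected to reproduce the entry order r1, r5, r2 shown in Table 7.5.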



TABLE 7.5
PROC REG Code to Use the Stepwise Variable Selection Procedure and Results

proc reg data=resids;
stepwise: model ry=r1 r2 r3 r4 r5/selection = stepwise;

Step   Entered   Var In   PartialR**2   ModelR**2   Cp      FValue   ProbF
1        r1        1        0.161        0.161      8.506    7.301   0.010
2        r5        2        0.082        0.243      6.159    4.005   0.053
3        r2        3        0.094        0.337      3.155    5.124   0.030






TABLE 7.6
PROC REG Code to Use the Forward Variable Selection Procedure and Results

proc reg data=resids;
forward: model ry = r1 r2 r3 r4 r5 /selection = forward;

Step   Entered   Var In   PartialR**2   ModelR**2   Cp      FValue   ProbF
1        r1        1        0.161        0.161      8.506    7.301   0.010
2        r5        2        0.082        0.243      6.159    4.005   0.053
3        r2        3        0.094        0.337      3.155    5.124   0.030
4        r3        4        0.017        0.354      4.272    0.902   0.349



TABLE 7.7
PROC REG Code to Use the Backward Variable Selection Procedure and Results

proc reg data=resids;
backward: model ry = r1 r2 r3 r4 r5 /selection = backward;

Step   Removed   Var In   PartialR**2   ModelR**2   Cp      FValue   ProbF
1        r4        4        0.005        0.354      4.272    0.272   0.606
2        r3        3        0.017        0.337      3.155    0.902   0.349



TABLE 7.8
PROC REG Code to Use the adjrsq Variable Selection Procedure and Results for Top Five Combinations of Variables

proc reg data=resids;
adjrsq: model ry = r1 r2 r3 r4 r5 /selection = adjrsq;

Dependent   Var In   Adjrsq   RSquare   VarsInModel
ry            3      0.2822   0.3374    r1 r2 r5
ry            4      0.2802   0.3541    r1 r2 r3 r5
ry            5      0.2649   0.3592    r1 r2 r3 r4 r5
ry            4      0.2633   0.3389    r1 r2 r4 r5
ry            2      0.2022   0.2431    r1 r5



Thus, the analysis indicates that X1, X2, and X5 are needed as possible covariates

in the analysis of the response variable.

Table 7.6 contains the PROC REG code and results of using the forward method.

The forward variable selection process starts with no variables in the model and

includes the next most important variable at each step. The forward variable selection method selects X1, X2, X3, and X5, although the significance level for X3 in




TABLE 7.9
PROC REG Code to Use the CP Variable Selection Procedure and Results for Top Five Combinations of Variables

proc reg data=resids;
cp: model ry = r1 r2 r3 r4 r5 /selection = cp;

Dependent   Var In   Cp      RSquare   VarsInModel
ry            3      3.155    0.337    r1 r2 r5
ry            4      4.272    0.354    r1 r2 r3 r5
ry            4      5.078    0.339    r1 r2 r4 r5
ry            5      6.000    0.359    r1 r2 r3 r4 r5
ry            2      6.159    0.243    r1 r5



the final model is 0.349. This indicates that X3 is most likely not needed in the

model.

The backward variable selection PROC REG code and results are in Table 7.7.

The backward variable selection method starts with all covariates in the model and

eliminates the least important variable at each step (that variable with the largest

significance level). The backward method eliminated variables r3 and r4, indicating

that r1, r2, and r5 remain in the model.

Table 7.8 contains the PROC REG code to use the method “adjrsq” to select

variables for the model. The process is to fit models that include all possible

combinations of the variables and compute the adjusted R2 for each model. The

selected model consists of that set of variables with the largest adjusted R2. With

five variables, this process fits 2^5 – 1 = 31 models. The results of the five sets of

variables with the largest adjusted R2 are included in Table 7.8. The set of variables

with the largest adjusted R2 consists of r1, r2, and r5.
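The all-subsets search can be sketched directly. The version below enumerates the 2^q − 1 nonempty subsets and scores each by adjusted R²; no intercept is fitted because the inputs are residuals with mean zero, and the standard adjusted R² penalty is used. The data are hypothetical, and (as discussed later in the chapter) the degrees of freedom used here are the inflated ones the residual regression reports.

```python
import itertools

def rss(cols, y):
    # Residual SS from least squares via the normal equations
    # (Gaussian elimination without pivoting; adequate for these
    # small, well-conditioned examples).
    n, k = len(y), len(cols)
    A = [[sum(cols[a][i] * cols[b][i] for i in range(n)) for b in range(k)]
         for a in range(k)]
    rhs = [sum(cols[a][i] * y[i] for i in range(n)) for a in range(k)]
    for c in range(k):
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for cc in range(c, k):
                A[r][cc] -= f * A[c][cc]
            rhs[r] -= f * rhs[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        s = sum(A[c][cc] * beta[cc] for cc in range(c + 1, k))
        beta[c] = (rhs[c] - s) / A[c][c]
    return sum((y[i] - sum(cols[j][i] * beta[j] for j in range(k))) ** 2
               for i in range(n))

def best_adjrsq(columns, y, n):
    # Enumerate all 2^q - 1 nonempty subsets and keep the one with the
    # largest adjusted R^2.
    tss = sum(v * v for v in y)
    best_adj, best_set = None, None
    for size in range(1, len(columns) + 1):
        for subset in itertools.combinations(sorted(columns), size):
            r2 = 1.0 - rss([columns[name] for name in subset], y) / tss
            adj = 1.0 - (1.0 - r2) * (n - 1) / (n - size - 1)
            if best_adj is None or adj > best_adj:
                best_adj, best_set = adj, subset
    return best_set

# Hypothetical residuals: ry is an exact combination of r1 and r2,
# while r3 is unrelated to ry.
cols = {'r1': [1.0, -1.0, 2.0, -2.0, 0.0, 0.0],
        'r2': [1.0, 1.0, -1.0, -1.0, 0.0, 0.0],
        'r3': [0.0, 0.0, 0.0, 0.0, 1.0, -1.0]}
ry = [3.0, -1.0, 3.0, -5.0, 0.0, 0.0]
best = best_adjrsq(cols, ry, n=6)
```

As in the chapter, the subset that genuinely drives the response wins, and adding an unrelated column cannot improve the adjusted R².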

Finally, Table 7.9 contains the PROC REG code and results of using the CP

method of variable selection. As for the “adjrsq” method, the CP method fits models

with all possible combinations of variables and selects that model where CP

approaches “p,” the number of parameters in the model including the intercept. That

combination of variables with the CP value closest to “p” is r1, r2, r3, and r5 with

CP = 4.272. When fitting a model with all four of these variables, the significance

level corresponding to X3 is 0.3705, indicating that given the other variables are in

the model, variable X3 is not needed. Just as for the adjusted R2 method, the CP

method fits all possible combinations of the variables, which can become an unmanageable number when the number of possible covariates becomes large.
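The CP screening can be checked arithmetically. The definition assumed here is Mallows' Cp = RSS_p/s²(full) + 2p − n, where s²(full) is the mean square error of the full model; the specific s² below is a reconstruction that takes the full residual regression (intercept plus five covariate residuals) to report 40 − 6 = 34 error degrees of freedom with error SS 88.883 (the error SS of Table 7.2).

```python
def mallows_cp(rss_p, p, s2_full, n):
    # Cp = RSS_p / s^2(full model) + 2p - n; values near p suggest a
    # subset with little bias.
    return rss_p / s2_full + 2 * p - n

# Reconstructing the Cp = 3.155 reported for {r1, r2, r5}: the RSS for
# that subset is 91.903 (Table 7.12), s^2(full) is an assumed
# 88.883 / 34, and p = 4 (intercept plus three slopes).
cp = mallows_cp(91.903, 4, 88.883 / 34, 40)
```

Under these assumptions the computed value agrees with the 3.155 shown in Table 7.9.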

Using the approach of not including any variables with large significance levels

in the model, all of the procedures indicate that the needed variables are X1, X2, and

X5. Remember that the degrees of freedom associated with the residual sum of

squares for any of the above models are larger than they should be, since the

regression code does not take into account the fact that the data being analyzed are

residuals. In this case, four degrees of freedom for residual were used to estimate

the means for the treatments for use in computing the residuals. Thus, all of the




TABLE 7.10
PROC GLM Code to Fit the Final Model with Three Covariates, Sums of Squares, and Estimates of the Slopes for Each of the Covariates

proc glm data=ex_7_1; classes treat;
model y=treat x1 x2 x5/solution;
lsmeans treat/pdiff stderr;

Source        df    SS        MS       FValue   ProbF
Model          6    127.097   21.183    7.606   0.0000
Error         33     91.903    2.785
Corr Total    39    219.000

Source   df   SS (III)   MS       FValue   ProbF
treat     3    91.799    30.600   10.988   0.0000
x1        1    17.678    17.678    6.348   0.0168
x2        1    13.081    13.081    4.697   0.0375
x5        1    17.419    17.419    6.255   0.0175

Parameter   Estimate   StdErr   tValue   Probt
x1            0.340    0.135     2.519   0.0168
x2           –0.833    0.384    –2.167   0.0375
x5            0.184    0.073     2.501   0.0175



degrees of freedom associated with a residual sum of squares are inflated by four.

This means that the computed t-statistics, adjusted R2 values, and CP values are not

correct. The values of these statistics could be recomputed before decisions are made concerning the variables to be included in the model, but even without recomputation the model building procedures provide good approximations and an adequate basis for making decisions.

Table 7.10 contains the PROC GLM code to fit the final model with X1, X2, and

X5 as covariates. The mean square error has a value of 2.785 as compared to the

mean square error for the model with all five covariates (see Table 7.2) which has

a value of 2.867. When covariates are included in the model that are not needed,

the degrees of freedom for error are reduced more than the error sum of squares are

reduced, thus increasing the value of the estimate of the variance. The significance

levels corresponding to the statistics for testing the individual slopes of the covariates

are equal to zero are 0.0168, 0.0375, and 0.0175 for X1, X2, and X5, respectively.

The significance level corresponding to source Treat is 0.0000, indicating that the intercepts are not equal, i.e., that the distances between the various parallel hyperplanes are not zero. Table 7.11 contains the adjusted means, the predicted values on the hyperplanes at the average values of X1, X2, and X5, which are 25.68, 4.05, and 36.20,

respectively. Using a Fisher’s protected LSD method to make pairwise comparisons

of the treatment means indicates that Treatments 1 and 2 are not significantly

different while all other comparisons have significance levels less than 0.05. There

is one additional comparison, 3 vs. 4, that is significant for the model with three

covariates but not for the model with five covariates.




TABLE 7.11
Adjusted Means and Pairwise Comparisons Using the Final Model with Three Covariates

lsmeans treat/pdiff stderr;

treat   LSM Num   LSMean   StdErr   Probt
1          1      17.710   0.532    0.0000
2          2      18.366   0.544    0.0000
3          3      20.190   0.535    0.0000
4          4      21.734   0.541    0.0000

LSM Num     _1       _2       _3       _4
1           .       0.3925   0.0025   0.0000
2         0.3925    .        0.0242   0.0001
3         0.0025   0.0242    .        0.0492
4         0.0000   0.0001   0.0492    .



TABLE 7.12
PROC GLM Code to Fit the Residual Model with Three Covariates to Provide Sums of Squares and Estimates of the Slopes for Each of the Covariates

proc glm data=resids;
model ry=r1 r2 r5/solution;

Source            df    SS        MS       FValue   ProbF
Model              3     46.799   15.600    6.111   0.0018
Error             36     91.903    2.553
Corrected Total   39    138.702

Source   df   SS(III)   MS       FValue   ProbF
r1        1   17.678    17.678   6.925    0.0124
r2        1   13.081    13.081   5.124    0.0297
r5        1   17.419    17.419   6.823    0.0130

Parameter   Estimate   StdErr   tValue   Probt
Intercept     0.000    0.253     0.000   1.0000
r1            0.340    0.129     2.631   0.0124
r2           –0.833    0.368    –2.264   0.0297
r5            0.184    0.070     2.612   0.0130



For comparison purposes, the residuals of y were regressed on the residuals of

X1, X2, and X5 and the results are in Table 7.12. The error sum of squares is 91.903,

the same as in Table 7.10. The mean square error is 2.553 = 91.903/36 instead of

2.785 = 91.903/33 since the degrees of freedom for error from the residual model

are 36 instead of the 33 as in Table 7.10. The estimates of the slopes are identical

for both models (as shown by the theory), but the estimated standard errors from




the residual model are smaller than those from the final model. Again this is the

result of using 36 degrees of freedom for error rather than using 33 degrees of

freedom. The standard errors from Table 7.12 can be recomputed as

stderr(slope, final model) = stderr(slope, residual model) × √(36/33)

for example, 0.135 = 0.129 × √(36/33).
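The rescaling can be verified numerically; the residual regression divides the same error sum of squares by 36 instead of 33, so its standard errors are too small by the square root of that ratio.

```python
import math

def stderr_final(stderr_residual_model, df_residual, df_final):
    # Undo the df inflation: multiply the residual-model standard error
    # by sqrt(df_residual / df_final) to recover the final-model value.
    return stderr_residual_model * math.sqrt(df_residual / df_final)

# x1 slope standard error: 0.129 in Table 7.12, 0.135 in Table 7.10
corrected = stderr_final(0.129, 36, 33)
```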

Even though the variable selection procedure is not exact, the results are adequate for selecting effective models for carrying out the analysis of covariance.



7.4 SOME THEORY

The analysis of covariance model can be expressed in general matrix notation as

y = Mµ + Xβ + ε    (7.7)



where y is the n × 1 vector of the dependent variable, M is the design matrix, µ is the vector of parameters corresponding to the treatment and design structures (all

considered as fixed effects for this purpose), X is the matrix of possible covariates,

β is the vector of slopes corresponding to each of the covariates, and ε is the error

distributed N(0, σ 2 In ). The estimates of the slopes can be obtained by using a

stepwise process where the first step is to fit the Mµ part of the model, computing

the residuals, and then the second step is to fit the Xβ part of the model, i.e., first fit

y = Mµ + ε    (7.8)



and compute the residuals as



r = (I − M M⁻) y

where M⁻ denotes a generalized inverse of M (Graybill, 1976).
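This residual computation can be checked numerically. For a design matrix of full column rank, such as the cell-means coding below (one indicator column per treatment), one choice of generalized inverse is M⁻ = (M′M)⁻¹M′, so MM⁻ projects onto the treatment means. A sketch with illustrative data:

```python
def proj_residuals(M, y):
    """r = (I - M M^-) y for full-column-rank M, with M^- = (M'M)^{-1} M'."""
    n, k = len(M), len(M[0])
    # Form M'M and M'y.
    MtM = [[sum(M[i][a] * M[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Mty = [sum(M[i][a] * y[i] for i in range(n)) for a in range(k)]
    # Solve (M'M) c = M'y by Gaussian elimination (small, well-posed case).
    for c in range(k):
        for r in range(c + 1, k):
            f = MtM[r][c] / MtM[c][c]
            for cc in range(c, k):
                MtM[r][cc] -= f * MtM[c][cc]
            Mty[r] -= f * Mty[c]
    coef = [0.0] * k
    for c in reversed(range(k)):
        s = sum(MtM[c][cc] * coef[cc] for cc in range(c + 1, k))
        coef[c] = (Mty[c] - s) / MtM[c][c]
    fitted = [sum(M[i][j] * coef[j] for j in range(k)) for i in range(n)]
    return [y[i] - fitted[i] for i in range(n)]

# Two treatments, three observations each: fitted values are treatment
# means, so the residuals are within-treatment deviations.
M = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]
y = [11.0, 13.0, 12.0, 20.0, 22.0, 21.0]
r = proj_residuals(M, y)
```

For an overparameterized design matrix a true generalized inverse would be needed in place of (M′M)⁻¹M′, but the resulting projection I − MM⁻ is the same.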

A model for these residuals is free of the Mµ effects, since the model for r is

r = (I − M M⁻) X β + ε⁺

where ε⁺ ~ N(0, σ²(I − M M⁻)). Next the BLUE of β (assuming β is estimable) is



β̂ = [X′(I − M M⁻)′(I − M M⁻)X]⁻¹ X′(I − M M⁻)′(I − M M⁻)y
  = [X′(I − M M⁻)X]⁻¹ X′(I − M M⁻)y .    (7.9)






The estimate of β is a function of r = (I − M M⁻)y, the residuals of y from Model 7.8, and of (I − M M⁻)X; each column of (I − M M⁻)X is a set of residuals computed from fitting the model xk = Mµk + εk, where xk is the kth column

of X. Thus, computing the residuals of y and of each candidate covariate from a

model with the design matrix of the treatment and design structures and then

performing a variable selection procedure using those residuals provides the appropriate estimates of the slopes. Since the covariance matrix of the residuals of y, r,

is not positive definite [it is of rank n − Rank(M)], the error degrees of freedom from

using variable selection method on the residuals is inflated by Rank (M). The correct

degrees of freedom could be used in the final steps of the variable selection procedure

to compute the appropriate significance levels. The overall effect of the inflated error

degrees of freedom depends on the sample size and the Rank (M). For example if

n = 100, R (M) = 30, and q = 10 (number of candidate covariates), there is not much

difference between t percentage points with 60 and 90 degrees of freedom. On the

other hand if n = 50, R(M) = 30, and q = 10, there is a big difference between t

percentage points with 10 and 40 degrees of freedom.



7.5 WHEN SLOPES ARE POSSIBLY UNEQUAL

When slopes are unequal, the procedure in Section 7.1 may not determine the

appropriate covariates, particularly when some treatments have positive slopes and

others have negative slopes. To extend the procedure to handle unequal slopes for

each covariate, an independent variable needs to be constructed for each level of the

treatment (or levels of treatment combinations) which has the value of the covariate

corresponding to observations of that treatment and has the value zero for observations not belonging to that treatment. In effect, the following model needs to be

constructed

yij = αi + βi1Xij1 + βi2Xij2 + … + βikXijk + εij .

For “t” treatments and two covariates, construct the matrix model

y = [D, x11, x21, …, xt1, x12, x22, …, xt2] β + ε

where D denotes the part of the design matrix with ones and zeros (an indicator column for each treatment), β = (α1, α2, …, αt, β11, β21, …, βt1, β12, β22, …, βt2)′, and

x′is = (0, 0, …, 0, xi1s, xi2s, …, xins, 0, …, 0),

i.e., xis contains the values of covariate s for the observations in treatment i and zeros elsewhere.

Next fit the models

y = D α + ε
xis = D αis + εis,   i = 1, 2, …, t,  s = 1, 2, …, k

and compute the residuals, denoted by r and ris, i = 1, 2, …, t, s = 1, 2, …, k. Finally, the variable selection procedure can be applied to the resulting sets of residuals as in Section 7.1.
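The construction of the xis columns can be sketched in a few lines (the treatment labels and covariate values below are illustrative):

```python
def split_covariate(x, treat, levels):
    # One column per treatment level: the covariate value where the
    # observation belongs to that level, zero elsewhere (the xis above).
    return {lev: [xi if t == lev else 0.0 for xi, t in zip(x, treat)]
            for lev in levels}

treat = ['A', 'A', 'B', 'B']        # hypothetical one-way assignment
x1 = [2.9, 7.3, 8.1, 2.8]           # one candidate covariate
cols = split_covariate(x1, treat, ['A', 'B'])
```

Fitting a separate slope to each of these columns is what allows one treatment's slope to be positive while another's is negative.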



REFERENCES
Draper, N. R. and Smith, H. (1981). Applied Regression Analysis, Second Edition, New York: John Wiley & Sons.
Graybill, F. A. (1976). Theory and Application of the Linear Model, Pacific Grove, CA: Wadsworth and Brooks/Cole.
Ott, Lyman (1988). An Introduction to Statistical Methods and Data Analysis, Boston: PWS-Kent.
SAS Institute Inc. (1989). SAS/STAT® User's Guide, Version 6, Fourth Edition, Volume 2, Cary, NC.
Yang, S. S. (1989). Personal communication.



EXERCISES

EXERCISE 7.1: Carry out an analysis of covariance for the following data set by

determining the appropriate model and then making the needed treatment comparisons. Y is the response variable and X, Z, and W are the covariates. Use a regression

model building strategy.

EXERCISE 7.2: Use the data in Section 4.6 with the variable selection procedures

to select variables to be included in the model. The discussion in Section 4.6 indicates

there are some unequal slopes, so the method in Section 7.5 will need to be utilized.

EXERCISE 7.3: Use the data in Section 4.4 with the variable selection process to

determine if the models can be improved by including the square of height, the

square of weight, and the cross-product of height and weight in addition to height




and weight as possible covariates. Make the necessary treatment comparisons using

the final model.



Data for Exercise 7.1

TRT    X    Z    W     Y      TRT    X    Z    W     Y
A     2.9  4.9  2.2  11.9     C     5.2  4.2  5.6  13.3
A     7.3  4.2  3.2  17.5     C     7.4  4.7  5.4  15.0
A     4.5  4.2  1.9  21.5     C     5.6  4.2  9.3  15.4
A     4.0  4.8  9.2  18.1     C     5.1  4.5  3.8  12.3
A     2.8  4.6  6.6   9.5     C     2.4  4.3  5.4  13.6
A     6.2  4.3  5.5  16.8     C     4.2  4.9  6.9  18.4
A     5.5  4.3  1.0  14.0     C     8.6  4.3  6.3  13.0
A     3.1  5.0  3.7  16.3     C     6.2  4.6  0.2  12.6
A     3.0  4.7  0.4  13.4     C     6.9  4.0  9.1  17.9
A     3.8  4.3  7.7  15.6     C     7.8  4.2  1.4  18.1
A     5.9  4.7  2.7  20.8     C     6.2  4.1  6.8  16.7
A     2.1  4.7  2.4  13.3     C     3.0  4.6  0.5  21.1
A     3.5  4.7  6.1  13.9     C     2.4  4.5  5.6  15.4
A     6.9  4.7  7.1  15.7     C     8.6  4.3  4.5  13.2
A     4.5  4.7  9.4  16.1     C     6.0  4.7  4.3  14.5
B     8.1  4.6  8.3  11.4     D     3.7  4.2  8.3  19.6
B     2.8  4.1  5.9  13.2     D     2.9  4.5  1.0  20.5
B     6.2  4.9  5.5  16.5     D     4.6  4.8  3.4  12.8
B     3.3  4.8  0.9   6.9     D     2.0  4.2  1.0  23.5
B     4.1  4.8  0.6   8.9     D     7.4  4.1  6.6  17.9
B     5.9  4.9  7.3  12.1     D     5.3  4.1  9.7  11.4
B     5.1  4.1  7.6   8.4     D     4.0  4.4  1.4  21.6
B     8.1  4.1  3.2  14.1     D     5.7  4.5  8.8  24.6
B     8.8  4.6  9.9  12.9     D     7.5  4.7  4.6  17.0
B     7.0  4.1  9.4  10.4     D     7.2  4.0  7.7  18.4
B     5.7  4.8  5.5  12.2     D     2.2  4.7  4.0  16.3
B     2.0  4.7  9.4  15.0     D     6.5  4.7  8.1  15.0
B     5.7  4.7  4.3  10.3     D     7.3  4.9  4.9  16.9
B     5.8  4.5  8.0  12.8     D     8.8  4.2  8.1  12.7
B     3.9  4.1  7.4  12.5     D     8.4  4.9  3.3  18.4



8 COMPARING MODELS FOR SEVERAL TREATMENTS



8.1 INTRODUCTION

Once an adequate covariance model has been selected to describe the relationship

between the dependent variable and the covariates, it often is of interest to see if

the models differ from one treatment to the next or from treatment combination to

treatment combination. If one is concerned about the experiment-wise error rate in

an analysis involving many tests of hypotheses, this procedure can provide that

protection if it is used as a first step in comparing the treatments’ models. Suppose

the selected analysis of covariance model is

yij = αi + βi1x1ij + βi2x2ij + … + βiqxqij + εij    (8.1)



for i = 1, 2, …, t and j = 1, 2, …, ni. The equal model hypothesis is

H0: (α1, β11, β12, …, β1q)′ = (α2, β21, β22, …, β2q)′ = … = (αt, βt1, βt2, …, βtq)′  vs.  Ha: (not H0).

This type of hypothesis can be tested by constructing a set of contrast statements in

either PROC GLM or PROC MIXED or the model comparison method can be used

to compute the value of the test statistic. The methodology described in this chapter

is an application of the model comparison method that can easily be used to test the

equality of models in many different settings. Schaff et al. (1988) and Hinds and

Milliken (1987) used the method to compare nonlinear models. Section 8.2 describes

the methodology to develop the statistics to test the equal model hypothesis for a

one-way treatment structure, and methodology for the two-way treatment structure

is discussed in Section 8.3. For two-way and higher order treatment structures, this

process generates Type II sums of squares (Milliken and Johnson, 1992). Three

examples are used to demonstrate the methods.
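The model comparison method amounts to a full-versus-reduced F statistic computed from the two residual sums of squares; a generic sketch:

```python
def model_comparison_f(rss_reduced, df_reduced, rss_full, df_full):
    # F = [(RSS_reduced - RSS_full) / (df_reduced - df_full)]
    #     / (RSS_full / df_full), compared with an F distribution on
    #     (df_reduced - df_full, df_full) degrees of freedom.
    numerator = (rss_reduced - rss_full) / (df_reduced - df_full)
    return numerator / (rss_full / df_full)
```

For the equal model hypothesis above, the reduced model fits one common set of (α, β1, …, βq) while the full model fits a set per treatment, so the numerator degrees of freedom are (t − 1)(q + 1).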





