5.14 POWER ANALYSIS—A PRIORI DETERMINATION OF SAMPLE SIZE
K-GROUP MANOVA
3. For 2, 3, 4, 5, 6, 8, and 10 groups.
4. For power = .70, .80, .90, and .95.
His tables are specifically for the Hotelling–Lawley trace criterion, and this might
seem to limit their utility. However, as Morrison (1967) noted for large sample size,
and as Olson (1974) showed for small and moderate sample size, the power differences
among the four main multivariate test statistics are generally quite small. Thus, the
sample size requirements for Wilks’ Λ, the Pillai–Bartlett trace, and Roy’s largest root
will be very similar to those for the Hotelling–Lawley trace for the vast majority of
situations.
Lauter’s tables are set up in terms of a certain minimum deviation from the multivariate
null hypothesis, which can be expressed in the following three forms:

1. There exists a variable i such that (1/σi²) Σ_{j=1}^{k} (μij − μ̄i)² ≥ q², where μ̄i is the
total mean and σi² is the variance for variable i.
2. There exists a variable i such that (1/σi)|μij1 − μij2| ≥ d for two groups j1 and j2.
3. There exists a variable i such that for all pairs of groups l and m we have
(1/σi)|μil − μim| > c.
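To make the three forms concrete, the sketch below evaluates their left-hand sides for a single variable with hypothetical group means; the means, standard deviation, and number of groups are invented for illustration and are not taken from Lauter's tables.

```python
import itertools

# Hypothetical population values for one variable i across k = 3 groups
means = [50.0, 54.0, 58.0]   # mu_i1, mu_i2, mu_i3 (invented for illustration)
sigma = 10.0                 # within-group standard deviation
grand = sum(means) / len(means)   # the total mean, mu-bar_i

# Form 1: (1/sigma^2) * sum over groups of (mu_ij - mu-bar_i)^2
form1 = sum((m - grand) ** 2 for m in means) / sigma ** 2

# Form 2: standardized difference for one pair of groups (here groups 1 and 3)
form2 = abs(means[0] - means[2]) / sigma

# Form 3: the smallest standardized difference over all pairs of groups
form3 = min(abs(a - b) / sigma for a, b in itertools.combinations(means, 2))

print(form1, form2, form3)  # 0.32 0.8 0.4
```

Each quantity would then be compared against Lauter's thresholds q², d, or c to decide whether the minimum deviation holds.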
In Table A.5 of Appendix A of this text we present selected situations and power values that we believe will be of most value to social science researchers: for 2, 3,
4, 5, 6, 8, 10, and 15 variables, with 3, 4, 5, and 6 groups, and for power = .70, .80,
and .90. We have also characterized the four different minimum deviation patterns
as very large, large, moderate, and small effect sizes. Although the characterizations
may be somewhat rough, they are reasonable in the following senses: They agree with
Cohen’s definitions of large, medium, and small effect sizes for one variable (Lauter
included the univariate case in his tables), and with Stevens’ (1980) definitions of
large, medium, and small effect sizes for the two-group MANOVA case.
It is important to note that there could be several ways, other than that specified by
Lauter, in which a large, moderate, or small multivariate effect size could occur. But
the essential point is how many participants will be needed for a given effect size,
regardless of the combination of differences on the variables that produced the specific
effect size. Thus, the tables do have broad applicability. We consider shortly a few specific examples of the use of the tables, but first we present a compact table that should
be of great interest to applied researchers:
                        Groups
Effect size      3         4         5         6
Very large     12–16     14–18     15–19     16–21
Large          25–32     28–36     31–40     33–44
Medium         42–54     48–62     54–70     58–76
Small          92–120    105–140   120–155   130–170
This table gives the range of sample sizes needed per group for adequate power (.70)
at α = .05 when there are three to six variables.
Thus, if we expect a large effect size and have four groups, 28 participants per group
are needed for power = .70 with three variables, whereas 36 participants per group are
required if there are six dependent variables.
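For readers who want to automate the lookup, the ranges above can be encoded directly; this sketch simply transcribes the compact table (the endpoints of each range correspond to three and six dependent variables, respectively).

```python
# Per-group sample size ranges for power = .70 at alpha = .05,
# indexed by (effect size, number of groups); each range runs from
# the three-variable requirement to the six-variable requirement.
SAMPLE_SIZE = {
    ("very large", 3): (12, 16), ("very large", 4): (14, 18),
    ("very large", 5): (15, 19), ("very large", 6): (16, 21),
    ("large", 3): (25, 32), ("large", 4): (28, 36),
    ("large", 5): (31, 40), ("large", 6): (33, 44),
    ("medium", 3): (42, 54), ("medium", 4): (48, 62),
    ("medium", 5): (54, 70), ("medium", 6): (58, 76),
    ("small", 3): (92, 120), ("small", 4): (105, 140),
    ("small", 5): (120, 155), ("small", 6): (130, 170),
}

# Large effect size with four groups: 28 per group with three variables,
# 36 per group with six variables
low, high = SAMPLE_SIZE[("large", 4)]
print(low, high)  # 28 36
```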
Now we consider two examples to illustrate the use of the Lauter sample size tables
in the appendix.
Example 5.6
An investigator has a four-group MANOVA with five dependent variables. He wishes
power = .80 at α = .05. From previous research and his knowledge of the nature of the
treatments, he anticipates a moderate effect size. How many participants per group
will he need? Reference to Table A.5 (for four groups) indicates that 70 participants
per group are required.
Example 5.7
A team of researchers has a five-group, seven-dependent-variable MANOVA. They
wish power = .70 at α = .05. From previous research they anticipate a large effect
size. How many participants per group are needed? Interpolating in Table A.5 (for
five groups) between six and eight variables, we see that 43 participants per group are
needed, or a total of 215 participants.
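The interpolation in Example 5.7 is ordinary linear interpolation between adjacent table entries. A sketch of the arithmetic, using hypothetical stand-in values for the six- and eight-variable entries (consult Table A.5 itself for the actual entries):

```python
def interpolate_n(p, p_lo, n_lo, p_hi, n_hi):
    """Linearly interpolate the required per-group sample size for p
    variables between table entries at p_lo and p_hi variables."""
    frac = (p - p_lo) / (p_hi - p_lo)
    return n_lo + frac * (n_hi - n_lo)

# Hypothetical stand-ins for Table A.5 entries (five groups, large effect,
# power = .70): n = 40 at six variables, n = 46 at eight variables.
n7 = interpolate_n(7, 6, 40, 8, 46)
print(n7)  # 43.0
```

With seven variables halfway between six and eight, the interpolated requirement is simply the midpoint of the two table entries.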
5.15 SUMMARY
Cohen’s (1968) seminal article showed social science researchers that univariate ANOVA
could be considered as a special case of regression, by dummy-coding group membership. In this chapter we have pointed out that MANOVA can also be considered as a
special case of regression analysis, except that for MANOVA it is multivariate regression because there are several dependent variables being predicted from the dummy
variables. That is, separation of the mean vectors is equivalent to demonstrating that the
dummy variables (predictors) significantly predict the scores on the dependent variables.
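This equivalence is easy to verify numerically: Wilks' Λ computed from the residuals of a multivariate regression on dummy-coded group membership equals Λ computed from the pooled within-groups SSCP matrix. A minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up data: 3 groups of 10 cases, 2 dependent variables
groups = np.repeat([0, 1, 2], 10)
Y = rng.normal(size=(30, 2))
Y[groups == 1] += [1.0, 0.5]   # shift group means so groups differ
Y[groups == 2] += [0.5, 1.0]

# Dummy-code group membership (intercept plus two dummy variables)
X = np.column_stack([np.ones(30), groups == 1, groups == 2]).astype(float)

# Multivariate regression: E = residual SSCP, T = total (centered) SSCP
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ B
E = resid.T @ resid
Yc = Y - Y.mean(axis=0)
T = Yc.T @ Yc
wilks_reg = np.linalg.det(E) / np.linalg.det(T)

# Same Lambda from the pooled within-groups SSCP (the MANOVA route)
W = np.zeros((2, 2))
for g in range(3):
    d = Y[groups == g] - Y[groups == g].mean(axis=0)
    W += d.T @ d
wilks_manova = np.linalg.det(W) / np.linalg.det(T)

print(np.isclose(wilks_reg, wilks_manova))  # True
```

The two quantities agree because the dummy-coded model reproduces each group's mean vector exactly, so its residuals are precisely the within-group deviations.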
For exploratory research where the focus is on individual dependent variables (and
not linear combinations of these variables), two post hoc procedures were given for
examining group differences for the outcome variables. Each procedure followed up
a significant multivariate test result with univariate ANOVAs for each outcome. If an
F test were significant for a given outcome and more than two groups were present,
pairwise comparisons were conducted using the Tukey procedure. The two procedures differ in that one procedure used a Bonferroni-adjusted alpha for the univariate
F tests and pairwise comparisons while the other did not. Of the two procedures, the
more widely recommended procedure is to use the Bonferroni-adjusted alpha for the
univariate ANOVAs and the Tukey procedure, as this procedure provides for greater
control of the overall type I error rate and a more accurate set of confidence intervals
(in terms of coverage). The procedure that uses no such alpha adjustment should be
considered only when the number of outcomes and groups is small (i.e., two or three).
For confirmatory research, planned comparisons were discussed. The setup of multivariate contrasts on SPSS MANOVA was illustrated. Although uncorrelated contrasts
are desirable because of ease of interpretation and the nice additive partitioning they
yield, it was noted that often the important questions an investigator has will yield
correlated contrasts. The use of SPSS MANOVA to obtain the unique contribution of
each correlated contrast was illustrated.
It was noted that the Roy and Hotelling–Lawley statistics are natural generalizations of
the univariate F ratio. In terms of which of the four multivariate test statistics to use in
practice, two criteria can be used: robustness and power. Wilks’ Λ, the Pillai–Bartlett
trace, and Hotelling–Lawley statistics are equally robust (for equal or approximately
equal group sizes) with respect to the homogeneity of covariance matrices assumption,
and therefore any one of them can be used. The power differences among the four statistics are in general quite small (< .06), so that there is no strong basis for preferring
any one of them over the others on power considerations.
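One way to see why the four statistics behave so similarly is that all of them are functions of the same eigenvalues λi of E⁻¹H, where H and E are the between- and within-groups SSCP matrices. A sketch with made-up matrices:

```python
import numpy as np

# Made-up SSCP matrices (H = between groups, E = within groups)
H = np.array([[8.0, 2.0], [2.0, 5.0]])
E = np.array([[20.0, 4.0], [4.0, 16.0]])

# Eigenvalues of E^{-1}H (real and nonnegative for valid SSCP matrices)
lam = np.linalg.eigvals(np.linalg.solve(E, H)).real

wilks = np.prod(1.0 / (1.0 + lam))     # Wilks' Lambda
pillai = np.sum(lam / (1.0 + lam))     # Pillai-Bartlett trace
hotelling = np.sum(lam)                # Hotelling-Lawley trace
roy = lam.max()                        # Roy's largest root

# Wilks' Lambda can equivalently be written det(E)/det(H + E)
print(np.isclose(wilks, np.linalg.det(E) / np.linalg.det(H + E)))  # True
```

Because all four are monotone summaries of the same eigenvalues, situations that favor one statistic over another require the eigenvalues to be spread in particular ways, which is why the power differences are generally small.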
The important problem, in terms of experimental planning, of a priori determination
of sample size was considered for three-, four-, five-, and six-group MANOVA for the
number of dependent variables ranging from 2 to 15.
5.16 EXERCISES
1. Consider the following data for a three-group, three-dependent-variable
problem:
Group 1           Group 2           Group 3
y1   y2   y3      y1   y2   y3      y1   y2   y3
2.0  1.5  2.0     2.5  1.0  1.5     4.0  3.0  3.5
1.0  1.0  2.5     2.0  3.0  4.0     2.0  3.5  3.0
4.0  3.5  1.0     2.5  2.5  1.5     2.5  3.0  1.0
2.5  3.0  3.5     3.5  1.0  2.0     1.5  1.0  3.0
4.5  1.5  2.5     3.0  4.0  3.5     4.5  3.0  4.5
4.5  4.0  4.0     5.0  2.5  2.5     3.0  4.5  3.5
3.0  3.5  5.0     1.0  1.0  1.5     2.0  2.0  2.5
2.0  1.0  1.0     2.0  2.0  2.0     1.0  2.5  3.0
3.0  2.5  1.0     1.5  3.5  1.0     1.5  1.0  2.0
2.5  2.5  2.5     1.0  1.5  2.5
Use SAS or SPSS to run a one-way MANOVA. Use procedure 1 (with the
adjusted Bonferroni F tests) to do the follow-up tests.
(a) What is the multivariate null hypothesis? Do you reject it at α = .05?
(b) If you reject in part (a), then for which outcomes are there group differences at the .05 level?
(c) For any ANOVAs that are significant, use the post hoc tests to describe
group differences. Be sure to rank order group performance based on the
statistical test results.
2. Consider the following data from Wilkinson (1975):
Group A: 5 6 6 4 5 6 7 7 5 4
Group B: 4 5 3 5 2 2 3 4 3 2 2 3 4 2 1
Group C: 7 5 6 4 4 4 6 3 5 5 3 7 3 5 5 4 5 5 5 4
Run a one-way MANOVA on SAS or SPSS. Do the various multivariate test
statistics agree in a decision on H0?
3. This table shows analysis results for 12 separate ANOVAs. The researchers
were examining differences among three groups for outpatient therapy, using
symptoms reported on the Symptom Checklist-90-Revised.

SCL-90-R Group Main Effects

                            Group 1   Group 2   Group 3
                            (N = 48)  (N = 60)  (N = 57)
Dimension                      x̄         x̄         x̄        F      df     Significance
Somatization                 53.7      53.2      53.7      .03    2,141   ns
Obsessive-compulsive         48.7      53.9      52.2     2.75    2,141   ns
Interpersonal sensitivity    47.3      51.3      52.9     4.84    2,141   p < .01
Depression                   47.5      53.5      53.9     5.44    2,141   p < .01
Anxiety                      48.5      52.9      52.2     1.86    2,141   ns
Hostility                    48.1      54.6      52.4     3.82    2,141   p < .03
Phobic anxiety               49.8      54.2      51.8     2.08    2,141   ns
Dimension                           x̄         x̄         x̄        F      df     Significance
Paranoid ideation                 51.4      54.7      54.0     1.38    2,141   ns
Psychoticism                      52.4      54.6      54.2      .37    2,141   ns
Global Severity Index             49.7      54.4      54.0     2.55    2,141   ns
Positive Symptom Distress Index   49.3      55.8      53.2     3.39    2,141   p < .04
Positive Symptom Total            50.2      52.9      54.4     1.96    2,141   ns
(a) Could we be confident that these results would replicate? Explain.
(b) In this study, the authors did not a priori hypothesize differences on the
specific variables for which significance was found. Given that, what would
have been a better method of analysis?
4. A researcher is testing the efficacy of four drugs in inhibiting undesirable
responses in patients. Drugs A and B are similar in composition, whereas drugs
C and D are distinctly different in composition from A and B, although similar in
their basic ingredients. He takes 100 patients and randomly assigns them to five
groups: Gp 1—control, Gp 2—drug A, Gp 3—drug B, Gp 4—drug C, and Gp 5—
drug D. The following would be four very relevant planned comparisons to test:

             Contrasts
            1      2      3      4
Control     1      0      0      0
Drug A    −.25     1      1      0
Drug B    −.25     1     −1      0
Drug C    −.25    −1      0      1
Drug D    −.25    −1      0     −1
(a) Show that these contrasts are orthogonal.
Now, consider the following set of contrasts, which might also be of interest in the preceding study:

             Contrasts
            1      2      3      4
Control     1      1      1      0
Drug A    −.25   −.5      0      1
Drug B    −.25   −.5      0      1
Drug C    −.25     0    −.5     −1
Drug D    −.25     0    −.5     −1
(b) Show that these contrasts are not orthogonal.
(c) Because neither of these two sets of contrasts is one of the standard sets
that come out of SPSS MANOVA, it would be necessary to use the special
contrast feature to test each set. Show the control lines for doing this for
each set. Assume four criterion measures.
5. Find an article in one of the better journals in your content area from within the
last 5 years that used primarily MANOVA. Answer the following questions:
(a) How many statistical tests (univariate or multivariate or both) were done?
Were the authors aware of this, and did they adjust in any way?
(b) Was power an issue in this study? Explain.
(c) Did the authors address practical importance in ANY way? Explain.
REFERENCES
Clifford, M. M. (1972). Effects of competition as a motivational technique in the classroom. American Educational Research Journal, 9, 123–134.
Cohen, J. (1968). Multiple regression as a general data-analytic system. Psychological Bulletin, 70, 426–443.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
DasGupta, S., & Perlman, M. D. (1974). Power of the noncentral F-test: Effect of additional variates on Hotelling's T²-test. Journal of the American Statistical Association, 69, 174–180.
Dunnett, C. W. (1980). Pairwise multiple comparisons in the homogeneous variance, unequal sample size case. Journal of the American Statistical Association, 75, 789–795.
Hays, W. L. (1981). Statistics (3rd ed.). New York, NY: Holt, Rinehart & Winston.
Ito, K. (1962). A comparison of the powers of two MANOVA tests. Biometrika, 49, 455–462.
Johnson, N., & Wichern, D. (1982). Applied multivariate statistical analysis. Englewood Cliffs, NJ: Prentice Hall.
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper Saddle River, NJ: Prentice Hall.
Keselman, H. J., Murray, R., & Rogan, J. (1976). Effect of very unequal group sizes on Tukey's multiple comparison test. Educational and Psychological Measurement, 36, 263–270.
Lauter, J. (1978). Sample size requirements for the T² test of MANOVA (tables for one-way classification). Biometrical Journal, 20, 389–406.
Levin, J. R., Serlin, R. C., & Seaman, M. A. (1994). A controlled, powerful multiple-comparison strategy for several situations. Psychological Bulletin, 115, 153–159.
Lohnes, P. R. (1961). Test space and discriminant space classification models and related significance tests. Educational and Psychological Measurement, 21, 559–574.
Morrison, D. F. (1967). Multivariate statistical methods. New York, NY: McGraw-Hill.
Novince, L. (1977). The contribution of cognitive restructuring to the effectiveness of behavior rehearsal in modifying social inhibition in females. Unpublished doctoral dissertation, University of Cincinnati, OH.
Olson, C. L. (1973). A Monte Carlo investigation of the robustness of multivariate analysis of variance. Dissertation Abstracts International, 35, 6106B.
Olson, C. L. (1974). Comparative robustness of six tests in multivariate analysis of variance. Journal of the American Statistical Association, 69, 894–908.
Pillai, K., & Jayachandran, K. (1967). Power comparisons of tests of two multivariate hypotheses based on four criteria. Biometrika, 54, 195–210.
Pruzek, R. M. (1971). Methods and problems in the analysis of multivariate data. Review of Educational Research, 41, 163–190.
Stevens, J. P. (1972). Four methods of analyzing between variation for the k-group MANOVA problem. Multivariate Behavioral Research, 7, 499–522.
Stevens, J. P. (1979). Comment on Olson: Choosing a test statistic in multivariate analysis of variance. Psychological Bulletin, 86, 355–360.
Stevens, J. P. (1980). Power of the multivariate analysis of variance tests. Psychological Bulletin, 88, 728–737.
Tatsuoka, M. M. (1971). Multivariate analysis: Techniques for educational and psychological research. New York, NY: Wiley.
Wilkinson, L. (1975). Response variable hypotheses in the multivariate analysis of variance. Psychological Bulletin, 82, 408–412.
Chapter 6
ASSUMPTIONS IN MANOVA
6.1 INTRODUCTION
You may recall that one of the assumptions in analysis of variance is normality; that
is, the scores for the subjects in each group are normally distributed. Why should
we be interested in studying assumptions in ANOVA and MANOVA? Because, in
ANOVA and MANOVA, we set up a mathematical model based on these assumptions,
and all mathematical models are approximations to reality. Therefore, violations of
the assumptions are inevitable. The salient question becomes: How radically must a
given assumption be violated before it has a serious effect on type I and type II error
rates? Thus, we may set our α = .05 and think we are rejecting falsely 5% of the time,
but if a given assumption is violated, we may be rejecting falsely 10%, or if another
assumption is violated, we may be rejecting falsely 40% of the time. For these kinds
of situations, we would certainly want to be able to detect such violations and take
some corrective action, but all violations of assumptions are not serious, and hence it
is crucial to know which assumptions to be particularly concerned about, and under
what conditions.
In this chapter, we consider in detail what effect violating assumptions has on type
IÂ€error and power. There has been plenty of research on violations of assumptions in
ANOVA and a fair amount of research for MANOVA on which to base our conclusions. First, we remind you of some basic terminology that is needed to discuss the
results of simulation (i.e., Monte Carlo) studies, whether univariate or multivariate.
The nominal α (level of significance) is the α level set by the experimenter, and is the
proportion of time one is rejecting falsely when all assumptions are met. The actual
α is the proportion of time one is rejecting falsely if one or more of the assumptions
is violated. We say the F statistic is robust when the actual α is very close to the level
of significance (nominal α). For example, the actual αs for some very skewed (nonnormal) populations may be only .055 or .06, very minor deviations from the level of
significance of .05.
6.2 ANOVA AND MANOVA ASSUMPTIONS
The three statistical assumptions for univariate ANOVA are:
1. The observations are independent. (violation very serious)
2. The observations are normally distributed on the dependent variable in each group.
(robust with respect to type I error)
(skewness has generally very little effect on power, while platykurtosis attenuates
power)
3. The population variances for the groups are equal, often referred to as the homogeneity of variance assumption.
(conditionally robust—robust if group sizes are equal or approximately equal—
largest/smallest < 1.5)
The assumptions for MANOVA are as follows:
1. The observations are independent. (violation very serious)
2. The observations on the dependent variables follow a multivariate normal distribution in each group.
(robust with respect to type I error)
(no studies on effect of skewness on power, but platykurtosis attenuates power)
3. The population covariance matrices for the p dependent variables are equal. (conditionally robust—robust if the group sizes are equal or approximately equal—
largest/smallest < 1.5)
6.3 INDEPENDENCE ASSUMPTION
Note that independence of observations is an assumption for both ANOVA and
MANOVA. We have listed this assumption first and are emphasizing it for three
reasons:
1. A violation of this assumption is very serious.
2. Dependent observations do occur fairly often in social science research.
3. Some statistics books do not mention this assumption, and in some cases where
they do, misleading statements are made (e.g., that dependent observations occur
only infrequently, that random assignment of subjects to groups will eliminate the
problem, or that this assumption is usually satisfied by using a random sample).
Now let us consider several situations in social science research where dependence
among the observations will be present. Cooperative learning has become very popular
since the early 1980s. In this method, students work in small groups, interacting with
each other and helping each other learn the lesson. In fact, the evaluation of the success
of the group is dependent on the individual success of its members. Many studies have
compared cooperative learning versus individualistic learning. It was once common
Chapter 6
â†œæ¸€å±®
â†œæ¸€å±®
that such data were not analyzed properly (Hykle, Stevens, & Markle, 1993). That is,
analyses would be conducted using individual scores while not taking into account the
dependence among the observations. With the increasing use of multilevel modeling,
such analyses are likely not as common.
Teaching methods studies constitute another broad class of situations where dependence of observations is undoubtedly present. For example, a few troublemakers in a
classroom would have a detrimental effect on the achievement of many children in
the classroom. Thus, their posttest achievement would be at least partially dependent
on the disruptive classroom atmosphere. On the other hand, even with a favorable
classroom atmosphere, dependence is introduced, because the achievement of many
of the children will be enhanced by the positive learning situation. Therefore, in either
case (positive or negative classroom atmosphere), the achievement of each child is not
independent of the other children in the classroom.
Another situation in which observations would be dependent is a study comparing
the achievement of students working in pairs at computers versus students working
in groups of three. Here, if Bill and John, say, are working at the same computer, then
obviously Bill’s achievement is partially influenced by John. If individual scores were
to be used in the analysis, clustering effects, due to working at the same computer,
need to be accounted for in the analysis.
Glass and Hopkins (1984) made the following statement concerning situations where
independence may or may not be tenable: “Whenever the treatment is individually
administered, observations are independent. But where treatments involve interaction
among persons, such as discussion method or group counseling, the observations may
influence each other” (p. 353).
6.3.1 Effect of Correlated Observations
We indicated earlier that a violation of the independence of observations assumption
is very serious. We now elaborate on this assertion. Just a small amount of dependence
among the observations causes the actual α to be several times greater than the level
of significance. Dependence among the observations is measured by the intraclass
correlation ICC, where:
ICC = (MSb − MSw) / [MSb + (n − 1)MSw]

MSb and MSw are the numerator and denominator of the F statistic, and n is the number
of participants in each group.
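As a quick sketch, the formula translates directly into code (the mean squares below are made up for illustration):

```python
def intraclass_correlation(ms_between, ms_within, n):
    """ICC from the ANOVA mean squares, with n participants per group."""
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)

# When MSb equals MSw (F = 1), there is no dependence: ICC = 0
print(intraclass_correlation(10.0, 10.0, 25))   # 0.0
# MSb well above MSw yields a positive ICC
print(intraclass_correlation(40.0, 10.0, 10))   # about .23
```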
Table 6.1, from Scariano and Davenport (1987), shows precisely how dramatic an
effect dependence has on type I error. For example, for the three-group case with 10
participants per group and moderate dependence (ICC = .30), the actual α is .54. Also,
for three groups with 30 participants per group and small dependence (ICC = .10), the
actual α is .49, almost 10 times the level of significance. Notice, also, from the table,
that for a fixed value of the intraclass correlation, the situation does not improve with
larger sample size, but gets far worse.

Table 6.1: Actual Type I Error Rates for Correlated Observations in a One-Way ANOVA

                              Intraclass correlation
Number of  Group
groups     size    .00    .01    .10    .30    .50    .70    .90    .95    .99
 2           3   .0500  .0522  .0740  .1402  .2374  .3819  .6275  .7339  .8800
 2          10   .0500  .0606  .1654  .3729  .5344  .6752  .8282  .8809  .9475
 2          30   .0500  .0848  .3402  .5928  .7205  .8131  .9036  .9335  .9708
 2         100   .0500  .1658  .5716  .7662  .8446  .8976  .9477  .9640  .9842
 3           3   .0500  .0529  .0837  .1866  .3430  .5585  .8367  .9163  .9829
 3          10   .0500  .0641  .2227  .5379  .7397  .8718  .9639  .9826  .9966
 3          30   .0500  .0985  .4917  .7999  .9049  .9573  .9886  .9946  .9990
 3         100   .0500  .2236  .7791  .9333  .9705  .9872  .9966  .9984  .9997
 5           3   .0500  .0540  .0997  .2684  .5149  .7808  .9704  .9923  .9997
 5          10   .0500  .0692  .3151  .7446  .9175  .9798  .9984  .9996 1.0000
 5          30   .0500  .1192  .6908  .9506  .9888  .9977  .9998 1.0000 1.0000
 5         100   .0500  .3147  .9397  .9945  .9989  .9998 1.0000 1.0000 1.0000
10           3   .0500  .0560  .1323  .4396  .7837  .9664  .9997 1.0000 1.0000
10          10   .0500  .0783  .4945  .9439  .9957  .9998 1.0000 1.0000 1.0000
10          30   .0500  .1594  .9119  .9986 1.0000 1.0000 1.0000 1.0000 1.0000
10         100   .0500  .4892  .9978 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
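The pattern in Table 6.1 can be reproduced with a small Monte Carlo sketch: generate data in which the null hypothesis is true but observations within each group share a common component (so the within-group correlation is ρ), run an ordinary one-way ANOVA F test at α = .05, and count how often it rejects. With three groups, 10 observations per group, and ρ = .30, the rejection rate should land near the .5379 table entry rather than .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n, rho, reps = 3, 10, 0.30, 4000

rejections = 0
for _ in range(reps):
    # Each group's scores share a common component, giving a within-group
    # correlation of rho; all population means are equal (H0 is true)
    shared = rng.normal(size=(k, 1))
    unique = rng.normal(size=(k, n))
    y = np.sqrt(rho) * shared + np.sqrt(1 - rho) * unique
    _, p = stats.f_oneway(*y)   # ordinary ANOVA, ignoring the dependence
    rejections += p < 0.05

print(rejections / reps)  # roughly .54, far above the nominal .05
```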
6.4 WHAT SHOULD BE DONE WITH CORRELATED OBSERVATIONS?

Given the results in Table 6.1 for a positive intraclass correlation, one route investigators could take if they suspect that the nature of their study will lead to correlated observations is to test at a more stringent level of significance. For the three- and five-group
cases in Table 6.1, with 10 observations per group and intraclass correlation = .10, the
error rates are five to six times greater than the assumed level of significance of .05.
Thus, for this type of situation, it would be wise to test at α = .01, realizing that the
actual error rate will be about .05 or somewhat greater. For the three- and five-group
cases in Table 6.1 with 30 observations per group and intraclass correlation = .10, the
error rates are about 10 times greater than .05. Here, it would be advisable to either test
at .01, realizing that the actual α will be about .10, or test at an even more stringent α
level.
If several small groups (counseling, social interaction, etc.) are involved in each treatment, and there are clear reasons to suspect that observations will be correlated within