11.1 Inferences Concerning the Difference Between Two Population or Treatment Means Using Independent Samples
Before developing inferential procedures concerning $\mu_1 - \mu_2$, we must consider
how the two samples, one from each population, are selected. Two samples are said
to be independent samples if the selection of the individuals or objects that make up
one sample does not influence the selection of individuals or objects in the other
sample. However, when observations from the first sample are paired in some meaningful way with observations in the second sample, the samples are said to be paired.
For example, to study the effectiveness of a speed-reading course, the reading speed
of subjects could be measured before they take the class and again after they complete
the course. This gives rise to two related samples—one from the population of individuals who have not taken this particular course (the “before” measurements) and
one from the population of individuals who have had such a course (the “after” measurements). These samples are paired. The two samples are not independently chosen, because the selection of individuals from the first (before) population completely
determines which individuals make up the sample from the second (after) population.
In this section, we consider procedures based on independent samples. Methods for
analyzing data resulting from paired samples are presented in Section 11.2.
Because $\bar{x}_1$ provides an estimate of $\mu_1$ and $\bar{x}_2$ gives an estimate of $\mu_2$, it is natural
to use $\bar{x}_1 - \bar{x}_2$ as a point estimate of $\mu_1 - \mu_2$. The value of $\bar{x}_1$ varies from sample to
sample (it is a statistic), as does the value of $\bar{x}_2$. Since the difference $\bar{x}_1 - \bar{x}_2$ is calculated from sample values, it is also a statistic and, therefore, has a sampling
distribution.
Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
If the random samples on which $\bar{x}_1$ and $\bar{x}_2$ are based are selected independently
of one another, then
1. $$\mu_{\bar{x}_1 - \bar{x}_2} = (\text{mean value of } \bar{x}_1 - \bar{x}_2) = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$$
The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is always centered at the value of
$\mu_1 - \mu_2$, so $\bar{x}_1 - \bar{x}_2$ is an unbiased statistic for estimating $\mu_1 - \mu_2$.
2. $$\sigma^2_{\bar{x}_1 - \bar{x}_2} = (\text{variance of } \bar{x}_1 - \bar{x}_2) = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
and
$$\sigma_{\bar{x}_1 - \bar{x}_2} = (\text{standard deviation of } \bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
3. If $n_1$ and $n_2$ are both large or the population distributions are (at least
approximately) normal, $\bar{x}_1$ and $\bar{x}_2$ each have (at least approximately) a normal distribution. This implies that the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
also normal or approximately normal.
Properties 1 and 2 follow from the following general results:
1. The mean value of a difference in means is the difference of the two individual
mean values.
2. The variance of a difference of independent quantities is the sum of the two individual variances.
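The two general results above can be checked numerically. The following is a minimal simulation sketch, not part of the original text; the population means, standard deviations, sample sizes, and seed are hypothetical values chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is repeatable

# Hypothetical normal populations and sample sizes (assumptions, not from the text).
mu1, sigma1, n1 = 50.0, 8.0, 12
mu2, sigma2, n2 = 45.0, 5.0, 20

reps = 20000
diffs = []
for _ in range(reps):
    # Draw two independent samples and record the difference in sample means.
    xbar1 = statistics.fmean(random.gauss(mu1, sigma1) for _ in range(n1))
    xbar2 = statistics.fmean(random.gauss(mu2, sigma2) for _ in range(n2))
    diffs.append(xbar1 - xbar2)

theory_mean = mu1 - mu2                          # Property 1
theory_var = sigma1**2 / n1 + sigma2**2 / n2     # Property 2

sim_mean = statistics.fmean(diffs)
sim_var = statistics.variance(diffs)

print(f"mean of differences:     simulated {sim_mean:.3f}, theory {theory_mean:.3f}")
print(f"variance of differences: simulated {sim_var:.3f}, theory {theory_var:.3f}")
```

With many replications, the simulated mean and variance of the differences should land close to the theoretical values, although any single run will differ slightly because of sampling variability.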
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 11 Comparing Two Populations or Treatments
When the sample sizes are large or when the population distributions are approximately normal, the properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ imply that $\bar{x}_1 - \bar{x}_2$ can be standardized to obtain a variable with a sampling distribution that is approximately the standard normal (z) distribution. This leads to the following result.
When two random samples are independently selected and when $n_1$ and $n_2$ are both large
or the population distributions are (at least approximately) normal, the distribution of
$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
is described (at least approximately) by the standard normal (z) distribution.
Although it is possible to base a test procedure and confidence interval on this
result, the values of $\sigma_1^2$ and $\sigma_2^2$ are rarely known. As a result, the z statistic is rarely
used. When $\sigma_1^2$ and $\sigma_2^2$ are unknown, we must estimate them using the corresponding
sample variances, $s_1^2$ and $s_2^2$. The result on which both a test procedure and confidence
interval are based is given in the accompanying box.
When two random samples are independently selected and when $n_1$ and $n_2$ are both
large or when the population distributions are normal, the standardized variable
$$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
has approximately a t distribution with
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$
The computed value of df should be truncated (rounded down) to obtain an integer
value of df.
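The boxed result translates directly into two small functions. This is a minimal sketch; the function names, the `mu_diff` parameter, and the summary values in the usage lines are hypothetical and chosen only so the arithmetic is easy to follow ($V_1 = V_2 = 1$).

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, mu_diff=0.0):
    """Standardized t variable: (xbar1 - xbar2 - mu_diff) / sqrt(V1 + V2)."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (xbar1 - xbar2 - mu_diff) / math.sqrt(v1 + v2)

def two_sample_df(s1, n1, s2, n2):
    """Degrees of freedom from the boxed formula, truncated to an integer."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return math.floor(df)

# Hypothetical summary values (not from the text): V1 = 16/16 = 1 and V2 = 9/9 = 1.
t_value = two_sample_t(20.0, 4.0, 16, 17.0, 3.0, 9)
df_value = two_sample_df(4.0, 16, 3.0, 9)
print(f"t = {t_value:.3f}, df = {df_value}")
```

Here the untruncated df is $4 / (1/15 + 1/8) \approx 20.87$, so the function returns 20.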
If one or both sample sizes are small, we must consider the shape of the population distributions. We can use normal probability plots or boxplots to evaluate
whether it is reasonable to consider the population distributions to be approximately
normal.
Test Procedures
In a test designed to compare two population means, the null hypothesis is of the
form
$H_0: \mu_1 - \mu_2 = \text{hypothesized value}$
Often the hypothesized value is 0, indicating that there is no difference between the
population means. The alternative hypothesis involves the same hypothesized value
but uses one of three inequalities (less than, greater than, or not equal to), depending
on the research question of interest. As an example, let $\mu_1$ and $\mu_2$ denote the average
fuel efficiencies (in miles per gallon, mpg) for two models of a certain type of car
equipped with 4-cylinder and 6-cylinder engines, respectively. The hypotheses under
consideration might be
$H_0: \mu_1 - \mu_2 = 5$ versus $H_a: \mu_1 - \mu_2 > 5$
The null hypothesis is equivalent to the claim that the mean fuel efficiency for the
4-cylinder engine exceeds the mean fuel efficiency for the 6-cylinder engine by
5 mpg. The alternative hypothesis states that the difference between the mean fuel
efficiencies is more than 5 mpg.
A test statistic is obtained by replacing $\mu_1 - \mu_2$ in the standardized t variable
(given in the previous box) with the hypothesized value that appears in $H_0$. Thus, the
t statistic for testing $H_0: \mu_1 - \mu_2 = 5$ is
$$t = \frac{\bar{x}_1 - \bar{x}_2 - 5}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
When the sample sizes are large or when the population distributions are normal,
the sampling distribution of the test statistic is approximately a t distribution when
H0 is true. The P-value for the test is obtained by first computing the appropriate
number of degrees of freedom and then using Appendix Table 4, a graphing calculator, or a statistical software package. The following box gives a general description of
the test procedure.
Summary of the Two-Sample t Test for Comparing Two Populations
Null hypothesis: $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$
Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
The appropriate df for the two-sample t test is
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$
The computed number of degrees of freedom should be truncated (rounded down) to an integer.
Alternative hypothesis and corresponding P-value:
$H_a: \mu_1 - \mu_2 > \text{hypothesized value}$: Area under appropriate t curve to the right of the computed t
$H_a: \mu_1 - \mu_2 < \text{hypothesized value}$: Area under appropriate t curve to the left of the computed t
$H_a: \mu_1 - \mu_2 \ne \text{hypothesized value}$: (1) 2(area to the right of the computed t) if t is positive, or (2) 2(area to the left of the computed t) if t is negative
Assumptions:
1. The two samples are independently selected random samples from the populations of interest.
2. The sample sizes are large (generally 30 or larger) or the population distributions are (at least approximately) normal.
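The three P-value rules in the summary box can be sketched in code. The text obtains tail areas from Appendix Table 4 or software; as a stand-in, this sketch integrates the t density numerically with Simpson's rule. The function names and the `alternative` labels are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def area_right(t, df, steps=2000):
    """Area under the t curve to the right of t (Simpson's rule on [0, |t|])."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    beyond = max(0.0, 0.5 - total * h / 3)  # one-tail area beyond |t|
    return beyond if t >= 0 else 1.0 - beyond

def p_value(t, df, alternative):
    """P-value for Ha of the form 'greater', 'less', or 'two-sided'."""
    right = area_right(t, df)
    if alternative == "greater":
        return right
    if alternative == "less":
        return 1.0 - right
    if alternative == "two-sided":
        return 2.0 * min(right, 1.0 - right)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Illustrative values: a computed t of 3.1 with 15 df.
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(3.1, 15, alt), 4))
```

Doubling the smaller tail area is equivalent to the two-part rule in the box, since the smaller tail lies to the right when t is positive and to the left when t is negative.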
EXAMPLE 11.1
Brain Size
Do children diagnosed with attention deficit/hyperactivity disorder (ADHD) have
smaller brains than children without this condition? This question was the topic of a
research study described in the paper “Developmental Trajectories of Brain Volume
Abnormalities in Children and Adolescents with Attention Deficit/Hyperactivity
Disorder” (Journal of the American Medical Association [2002]: 1740–1747). Brain
scans were completed for 152 children with ADHD and 139 children of similar age
without ADHD. Summary values for total cerebral volume (in milliliters) are given
in the following table:
                           n      $\bar{x}$      s
Children with ADHD        152     1059.4     117.5
Children without ADHD     139     1104.5     111.3
Do these data provide evidence that the mean brain volume of children with ADHD
is smaller than the mean for children without ADHD? Let’s test the relevant hypotheses using a .05 level of significance.
1. $\mu_1$ = true mean brain volume for children with ADHD
$\mu_2$ = true mean brain volume for children without ADHD
$\mu_1 - \mu_2$ = difference in mean brain volume
2. $H_0: \mu_1 - \mu_2 = 0$ (no difference in mean brain volume)
3. $H_a: \mu_1 - \mu_2 < 0$ (mean brain volume is smaller for children with ADHD)
4. Significance level: $\alpha = .05$
5. Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
6. Assumptions: The paper states that the study controlled for age and that the
participants were “recruited from the local community.” This is not equivalent
to random sampling, but the authors of the paper (five of whom were doctors at
well-known medical institutions) believed that it was reasonable to regard these
samples as representative of the two groups under study. Both sample sizes are
large, so it is reasonable to proceed with the two-sample t test.
7. Calculation:
$$t = \frac{(1059.4 - 1104.5) - 0}{\sqrt{\frac{(117.5)^2}{152} + \frac{(111.3)^2}{139}}} = \frac{-45.10}{\sqrt{90.831 + 89.120}} = \frac{-45.10}{13.415} = -3.36$$
8. P-value: We first compute the df for the two-sample t test:
$$V_1 = \frac{s_1^2}{n_1} = 90.831 \qquad V_2 = \frac{s_2^2}{n_2} = 89.120$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} = \frac{(90.831 + 89.120)^2}{\frac{(90.831)^2}{151} + \frac{(89.120)^2}{138}} = \frac{32{,}382.362}{112.191} = 288.636$$
We truncate the number of degrees of freedom to 288. Appendix Table 4 shows
that the area under the t curve with 288 df (using the z critical value column
because 288 is larger than 120 df) to the left of $-3.36$ is approximately 0.
Therefore,
P-value $\approx 0$
9. Conclusion: Because P-value $\approx 0 \le .05$, we reject $H_0$. There is convincing evidence that the mean brain volume for children with ADHD is smaller than the
mean for children without ADHD.
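The arithmetic in Steps 7 and 8 of Example 11.1 can be reproduced from the summary table. This is a minimal sketch; the variable names are illustrative. Because the df is well above 120, it mirrors the text by using the standard normal curve for the tail area.

```python
import math

# Summary values from the Example 11.1 table.
n1, xbar1, s1 = 152, 1059.4, 117.5   # children with ADHD
n2, xbar2, s2 = 139, 1104.5, 111.3   # children without ADHD

v1 = s1 ** 2 / n1
v2 = s2 ** 2 / n2
t = (xbar1 - xbar2 - 0) / math.sqrt(v1 + v2)
df = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))

# Lower-tail area under the standard normal curve, used because df > 120.
p_approx = 0.5 * math.erfc(-t / math.sqrt(2))

print(f"V1 = {v1:.3f}, V2 = {v2:.3f}")
print(f"t = {t:.2f}, df = {df}")          # t = -3.36, df = 288
print(f"approximate P-value = {p_approx:.5f}")
```

The tail area is a small positive number well below .05, consistent with the text's statement that the P-value is approximately 0.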
EXAMPLE 11.2
Sex and Salary
Are women still paid less than men for comparable work? The authors of the paper
“Sex and Salary: A Survey of Purchasing and Supply Professionals” (Journal of
Purchasing and Supply Management [2008]: 112–124) carried out a study in which
salary data was collected from a random sample of men and a random sample of
women who worked as purchasing managers and who were subscribers to Purchasing
magazine. Salary data consistent with summary quantities given in the paper appear
below (the actual sample sizes for the study were much larger):
Annual Salary (in thousands of dollars)
Men      81   69   81   76   76   74   69   76   79   65
Women    78   60   67   61   62   73   71   58   68   48
Even though the samples were selected from subscribers of a particular magazine, the
authors of the paper believed that it was reasonable to view the samples in the study
as representative of the two populations of interest—male purchasing managers and
female purchasing managers. For purposes of this example, we will assume that it is
also reasonable to consider the two samples given here as representative of the populations. We will use the given data and a significance level of .05 to determine if there
is convincing evidence that the mean annual salary for male purchasing managers is
greater than the mean annual salary for female purchasing managers.
1. $\mu_1$ = mean annual salary for male purchasing managers
$\mu_2$ = mean annual salary for female purchasing managers
$\mu_1 - \mu_2$ = difference in mean annual salary
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 > 0$
4. Significance level: $\alpha = .05$
5. Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
6. Assumptions: For the two-sample t test to be appropriate, we must be willing to
assume that the two samples can be viewed as independently selected random
samples from the two populations of interest. As previously noted, we assume that
this is reasonable. Because both of the sample sizes are small, it is also necessary to
assume that the salary distribution is approximately normal for each of these two
populations. Boxplots constructed using the sample data are shown here:
[Figure: comparative boxplots of annual salary (thousands of dollars) for the Male and Female samples; horizontal axis from 50 to 85.]
Because the boxplots are reasonably symmetric and because there are no outliers,
it is reasonable to proceed with the two-sample t-test.
7. Calculation: For the given data:
$\bar{x}_1 = 74.6$, $s_1 = 5.4$, $\bar{x}_2 = 64.6$, $s_2 = 8.6$, and
$$t = \frac{(74.6 - 64.6) - 0}{\sqrt{\frac{(5.4)^2}{10} + \frac{(8.6)^2}{10}}} = \frac{10}{\sqrt{2.916 + 7.396}} = \frac{10}{3.211} = 3.11$$
8. P-value: We first compute the df for the two-sample t test:
$$V_1 = \frac{s_1^2}{n_1} = 2.916 \qquad V_2 = \frac{s_2^2}{n_2} = 7.396$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} = \frac{(2.916 + 7.396)^2}{\frac{(2.916)^2}{9} + \frac{(7.396)^2}{9}} = \frac{106.337}{7.023} = 15.14$$
We truncate df to 15. Appendix Table 4 shows that the area under the t curve
with 15 df to the right of 3.1 is .004, so P-value = .004.
9. Conclusion: Because the P-value of .004 is less than .05, we reject H0. There is
convincing evidence to support the claim that mean annual salary for male purchasing managers is higher than mean annual salary for female purchasing managers.
Suppose the computed value of the test statistic in Step 7 had been 1.13
rather than 3.11. Then the P-value would have been .143 (the area to the right
of 1.1 under the t curve with 15 df) and the decision would have been to not
reject the null hypothesis. We then would have concluded that there was not
convincing evidence that the mean annual salary was higher for males than for
females. Notice that when we fail to reject the null hypothesis of no difference
between the population means, we are not saying that there is convincing evidence that the means are equal—we can only say that we were not convinced that
they were different.
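The summary statistics and test statistic in Example 11.2 can be recomputed from the raw salaries. This sketch assumes the flattened data table alternates between the Men and Women rows; that split reproduces the reported sample means of 74.6 and 64.6. Variable names are illustrative.

```python
import math
import statistics

# Salaries in thousands of dollars (assumed row assignment; see lead-in).
men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]

n1, n2 = len(men), len(women)
xbar1, xbar2 = statistics.mean(men), statistics.mean(women)
s1, s2 = statistics.stdev(men), statistics.stdev(women)  # sample standard deviations

v1 = s1 ** 2 / n1
v2 = s2 ** 2 / n2
t = (xbar1 - xbar2 - 0) / math.sqrt(v1 + v2)
df_exact = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
df = math.floor(df_exact)

print(f"men:   mean = {xbar1:.1f}, s = {s1:.2f}")
print(f"women: mean = {xbar2:.1f}, s = {s2:.2f}")
print(f"t = {t:.2f}, df = {df_exact:.2f} (truncated to {df})")
```

Working from unrounded standard deviations gives a df slightly different from the hand calculation's 15.14, but both truncate to 15.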
Many statistical computer packages can perform the calculations for the two-sample t test. The accompanying partial SPSS output shows summary statistics for
the two groups of Example 11.2. The second part of the output gives the number of
degrees of freedom, the test-statistic value, and a two-sided P-value. Since the test in
Example 11.2 is a one-sided test, we need to divide the two-sided P-value in half to
obtain the correct value for our test, which is $\frac{.0072}{2} = .0036$. This P-value differs
from the value in
Example 11.2 because Appendix Table 4 gives only tail areas for t values to one decimal place and so we rounded the test statistic to 3.1. As a consequence, the P-value
given in the example is only approximate; the P-value from SPSS is more accurate.
GROUP STATISTICS
          Sex       N     Mean    Std. Deviation   Std. Error Mean
Salary    male     10    74.60         5.40              1.71
          female   10    64.60         8.62              2.73

Equal Variances Not Assumed
                                                                          95% CONFIDENCE INTERVAL
                                                                             OF THE DIFFERENCE
   t       df     Sig. (2-tailed)   Mean Difference   Std. Error Difference    Lower      Upper
  3.11   15.14        .0072             10.000               3.211            3.1454    16.8546
Comparing Treatments
When an experiment is carried out to compare two treatments (or to compare a single
treatment with a control), the investigator is interested in the effect of the treatments
on some response variable. The treatments are “applied” to individuals (as in an experiment to compare two different medications for decreasing blood pressure) or objects
(as in an experiment to compare two different baking temperatures on the density of
bread), and the value of some response variable (for example, blood pressure, density)
is recorded. Based on the resulting data, the investigator might wish to determine
whether there is a difference in the mean response for the two treatments.
In many actual experimental situations, the individuals or objects to which the
treatments will be applied are not selected at random from some larger population. A
consequence of this is that it is not possible to generalize the results of the experiment
to some larger population. However, if the experimental design provides for random
assignment of the individuals or objects used in the experiment to treatments (or for random assignment of treatments to the individuals or objects), it is possible to test hypotheses
about treatment differences.
It is common practice to use the two-sample t test statistic previously described
if the experiment employs random assignment and if either the sample sizes are large
or it is reasonable to think that the treatment response distributions (the distributions
of response values that would result if the treatments were applied to a very large
number of individuals or objects) are approximately normal.
Two-Sample t Test for Comparing Two Treatments
When
1. individuals or objects are randomly assigned to treatments (or vice versa; that
is, treatments are randomly assigned to individuals or objects), and
2. the sample sizes are large (generally 30 or larger)
or the treatment response distributions are approximately normal,
the two-sample t test can be used to test $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$,
where $\mu_1$ and $\mu_2$ represent the mean response for treatments 1 and 2,
respectively.
In this case, these two conditions replace the assumptions previously stated for
comparing two population means. Whether the assumption of normality of the
treatment response distributions is reasonable can be assessed by constructing
normal probability plots or boxplots of the response values in each sample.
When the two-sample t test is used to compare two treatments when the individuals or objects used in the experiment are not randomly selected from some population, it is only an approximate test (the reported P-values are only approximate).
However, this is still the most common way to analyze such data.
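Random assignment of individuals to two treatments can be sketched as a shuffle followed by a split. This is only an illustration: the function name, the seed, and the subject labels are hypothetical, and real experiments may use other randomization schemes.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two treatment groups."""
    rng = random.Random(seed)   # a private generator; seed only for repeatability
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels 1..10.
treatment_1, treatment_2 = randomly_assign(range(1, 11), seed=42)
print("Treatment 1:", sorted(treatment_1))
print("Treatment 2:", sorted(treatment_2))
```

Every subject lands in exactly one group, and chance alone determines which one, which is the condition the two-sample t test for treatments relies on.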
EXAMPLE 11.3
Reading Emotions
The paper “How Happy Was I, Anyway? A Retrospective Impact Bias” (Social
Cognition [2003]: 421–446) reported on an experiment designed to assess the extent
to which people rationalize poor performance. In this study, 246 college undergraduates were assigned at random to one of two groups—a negative feedback group or
a positive feedback group. Each participant took a test in which they were asked to
guess the emotions displayed in photographs of faces. At the end of the test, those
in the negative feedback group were told that they had correctly answered 21 of the
40 items and were assigned a “grade” of D. Those in the positive feedback group were
told that they had answered 35 of 40 correctly and were assigned an A grade. After a
brief time, participants were asked to answer two sets of questions. One set of questions asked about the validity of the test and the other set of questions asked about
the importance of being able to read faces. The researchers hypothesized that those in
the negative feedback group would tend to rationalize their poor performance by rating both the validity of the test and the importance of being a good face reader lower
than those in the positive feedback group. Do the data from this experiment support
the researchers’ hypotheses?
                                    TEST VALIDITY RATING          FACE READING IMPORTANCE RATING
Group               Sample Size    Mean   Standard Deviation      Mean   Standard Deviation
Negative feedback       123        5.51          .79              5.36         1.00
Positive feedback       123        6.95         1.09              6.62         1.19
We will test the relevant hypotheses using a significance level of .01, beginning with
the hypotheses about the test validity rating.
1. Let $\mu_1$ denote the mean test validity score for the negative feedback group and
define $\mu_2$ analogously for the positive feedback group. Then $\mu_1 - \mu_2$ is the difference between the mean test validity scores for the two treatment groups.
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 < 0$
4. Significance level: $\alpha = .01$
5. Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
6. Assumptions: Subjects were randomly assigned to the treatment groups, and
both sample sizes are large, so use of the two-sample t test is reasonable.
7. Calculation:
$$t = \frac{(5.51 - 6.95) - 0}{\sqrt{\frac{(.79)^2}{123} + \frac{(1.09)^2}{123}}} = \frac{-1.44}{0.1214} = -11.86$$
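8. P-value: We first compute the df for the two-sample t test:
$$V_1 = \frac{s_1^2}{n_1} = .0051 \qquad V_2 = \frac{s_2^2}{n_2} = .0097$$
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} = \frac{(.0051 + .0097)^2}{\frac{(.0051)^2}{122} + \frac{(.0097)^2}{122}} = \frac{.000219}{.000001} = 219$$
(The intermediate values here are heavily rounded; carrying more decimal places gives a df of about 222, which does not change the conclusion.)
This is a lower-tailed test, so the P-value is the area under the t curve with df =
219 and to the left of $-11.86$. Since $-11.86$ is so far out in the lower tail of this
t curve, P-value $\approx 0$.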
9. Conclusion: Since P-value $\le \alpha$, $H_0$ is rejected. There is evidence that the mean
validity rating score for the positive feedback group is higher. The data support
the conclusion that those who received negative feedback did not rate the validity of the test, on average, as highly as those who thought they had done well on
the test.
We will use Minitab to test the researchers’ hypothesis that those in the negative
feedback group would also not rate the importance of being able to read faces as
highly as those in the positive group.
1. Let $\mu_1$ denote the mean face reading importance rating for the negative feedback
group and define $\mu_2$ analogously for the positive feedback group. Then $\mu_1 - \mu_2$
is the difference between the mean face reading ratings for the two treatment
groups.
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 < 0$
4. Significance level: $\alpha = .01$
5. Test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
6. Assumptions: Subjects were randomly assigned to the treatment groups, and
both sample sizes are large, so use of the two-sample t test is reasonable.
7. Calculation: Minitab output is shown here. From the output, t = -8.99.
Two-Sample T-Test and CI
Sample     N    Mean   StDev   SE Mean
1        123    5.36    1.00     0.090
2        123    6.62    1.19      0.11
Difference = mu (1) - mu (2)
Estimate for difference: -1.26000
95% upper bound for difference: -1.02856
T-Test of difference = 0 (vs <): T-Value = -8.99   P-Value = 0.000   DF = 236
8. P-value: From the Minitab output, P-value = 0.000.
9. Conclusion: Since P-value $\le \alpha$, $H_0$ is rejected. There is evidence that the mean
face reading importance rating for the positive feedback group is higher.
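Both analyses in Example 11.3 can be recomputed from the summary table at full precision. This is a minimal sketch with an illustrative helper name; it returns the untruncated df so the effect of rounding in hand calculations is visible.

```python
import math

def t_and_df(xbar1, s1, n1, xbar2, s2, n2):
    """Two-sample t statistic (hypothesized value 0) and untruncated df."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Negative feedback group is sample 1, positive feedback group is sample 2.
validity = t_and_df(5.51, 0.79, 123, 6.95, 1.09, 123)
importance = t_and_df(5.36, 1.00, 123, 6.62, 1.19, 123)

for label, (t, df) in (("test validity", validity),
                       ("face reading importance", importance)):
    print(f"{label}: t = {t:.2f}, df = {df:.2f} (truncated to {math.floor(df)})")
```

For the importance rating, the truncated df of 236 agrees with the Minitab output. For the validity rating, the full-precision df truncates to 222 rather than the hand-calculated 219; with a t value this extreme, the P-value is essentially 0 either way.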
You have probably noticed that evaluating the formula for number of degrees of
freedom for the two-sample t test involves quite a bit of arithmetic. An alternative
approach is to compute a conservative estimate of the P-value—one that is close to
but larger than the actual P-value. If H0 is rejected using this conservative estimate,
then it will also be rejected if the actual P-value is used. A conservative estimate of the
P-value for the two-sample t test can be found by using the t curve with the number of
degrees of freedom equal to the smaller of $(n_1 - 1)$ and $(n_2 - 1)$.
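The conservative approach can be compared with the formula-based df using the values from Example 11.2 (t = 3.11, $n_1 = n_2 = 10$, formula df truncated to 15). In this sketch the tail areas come from numerically integrating the t density rather than from Appendix Table 4; the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def area_right(t, df, steps=2000):
    """Area to the right of a nonnegative t, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

t, n1, n2 = 3.11, 10, 10
df_formula = 15                          # truncated df from Example 11.2
df_conservative = min(n1 - 1, n2 - 1)    # smaller of (n1 - 1) and (n2 - 1)

p_formula = area_right(t, df_formula)
p_conservative = area_right(t, df_conservative)

print(f"df = {df_formula}: P-value = {p_formula:.4f}")
print(f"df = {df_conservative}: conservative P-value = {p_conservative:.4f}")
```

The conservative P-value is the larger of the two, yet it is still below .05 here, so $H_0$ would be rejected under either choice of df.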
The Pooled t Test
The two-sample t test procedure just described is appropriate when it is reasonable to
assume that the population distributions are approximately normal. If it is also
known that the variances of the two populations are equal $(\sigma_1^2 = \sigma_2^2)$, an alternative
procedure known as the pooled t test can be used. This test procedure combines information from both samples to obtain a “pooled” estimate of the common variance and
then uses this pooled estimate of the variance in place of $s_1^2$ and $s_2^2$ in the t test statistic.
This test procedure was widely used in the past, but it has fallen into some disfavor
because it is quite sensitive to departures from the assumption of equal population
variances. If the population variances are equal, the pooled t procedure has a slightly
better chance of detecting departures from H0 than does the two-sample t test of this
section. However, P-values based on the pooled t procedure can be seriously in error
if the population variances are not equal, so, in general, the two-sample t procedure
is a better choice than the pooled t test.
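The text describes the pooled estimate without giving its formula. The sketch below uses the standard textbook form, $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ with $df = n_1 + n_2 - 2$, which is an assumption about what this book intends. The summary values are those of Example 11.2, and the function names are illustrative.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized=0.0):
    """Pooled t statistic and its df (standard form; assumes equal variances)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2 - hypothesized) / se, n1 + n2 - 2

def unpooled_t(xbar1, s1, n1, xbar2, s2, n2, hypothesized=0.0):
    """Two-sample t statistic of this section, for comparison."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (xbar1 - xbar2 - hypothesized) / se

t_pooled, df_pooled = pooled_t(74.6, 5.4, 10, 64.6, 8.6, 10)
t_unpooled = unpooled_t(74.6, 5.4, 10, 64.6, 8.6, 10)
print(f"pooled:   t = {t_pooled:.2f}, df = {df_pooled}")
print(f"unpooled: t = {t_unpooled:.2f}")
```

When the two sample sizes are equal, the two statistics coincide and only the degrees of freedom differ (18 here versus 15 for the unpooled test). With unequal sample sizes and unequal variances the statistics themselves diverge, which is where the pooled procedure's sensitivity shows up.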
Comparisons and Causation
If the assignment of treatments to the individuals or objects used in a comparison of
treatments is not made by the investigators, the study is observational. As an example,
the article “Lead and Cadmium Absorption Among Children near a Nonferrous
Metal Plant” (Environmental Research [1978]: 290–308) reported data on blood
lead concentrations for two different samples of children. The first sample was drawn
from a population residing within 1 km of a lead smelter, whereas those in the second
sample were selected from a rural area much farther from the smelter. It was the parents of the children, rather than the investigators, who determined whether the children would be in the close-to-smelter group or the far-from-smelter group. As a second example, a letter in the Journal of the American Medical Association (May 19,
1978) reported on a comparison of doctors’ longevity after medical school graduation
for those with an academic affiliation and those in private practice. (The letter writer’s
stated objective was to see whether “publish or perish” really meant “publish and perish.”) Here again, an investigator did not start out with a group of doctors, assigning
some to academic and others to nonacademic careers. The doctors themselves selected
their groups.
The difficulty with drawing conclusions based on an observational study is that
a statistically significant difference may be due to some underlying factors that have
not been controlled rather than to conditions that define the groups. Does the type
of medical practice itself have an effect on longevity, or is the observed difference in
lifetimes caused by other factors, which themselves led graduates to choose academic
or nonacademic careers? Similarly, is the observed difference in blood lead concentration levels due to proximity to the smelter? Perhaps other physical and socioeconomic
factors are related both to choice of living area and to concentration.
In general, rejection of $H_0: \mu_1 - \mu_2 = 0$ in favor of $H_a: \mu_1 - \mu_2 > 0$ suggests
that, on average, higher values of the variable are associated with individuals in the first
population or receiving the first treatment than with those in the second population
or receiving the second treatment. But association does not imply causation. Strong
statistical evidence for a causal relationship can be built up over time through many
different comparative studies that point to the same conclusions (as in the many investigations linking smoking to lung cancer). A randomized controlled experiment,
in which investigators assign subjects at random to the treatments or conditions being
compared, is particularly effective in suggesting causality. With such random assignment, the investigator and other interested parties can have more confidence in the
conclusion that an observed difference is caused by the difference in treatments or
conditions.
A Confidence Interval
A confidence interval for $\mu_1 - \mu_2$ is easily obtained from the basic t variable of this
section. Both the derivation of and the formula for the interval are similar to those of
the one-sample t interval discussed in Chapter 9.
The Two-Sample t Confidence Interval for the
Difference Between Two Population or Treatment Means
The general formula for a confidence interval for $\mu_1 - \mu_2$ when
1. the two samples are independently chosen random samples, and
2. the sample sizes are both large (generally $n_1 \ge 30$ and $n_2 \ge 30$)
or
the population distributions are approximately normal
is
$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
The t critical value is based on
$$df = \frac{(V_1 + V_2)^2}{\frac{V_1^2}{n_1 - 1} + \frac{V_2^2}{n_2 - 1}} \quad \text{where } V_1 = \frac{s_1^2}{n_1} \text{ and } V_2 = \frac{s_2^2}{n_2}$$
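The interval formula can be sketched in code using the summary statistics shown in the SPSS output for Example 11.2. The t critical value is normally read from a table or software; here it is found by bisection on a numerically integrated tail area, which is an implementation choice made for this illustration. The df is truncated as in the test procedure, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def area_right(t, df, steps=2000):
    """Area to the right of a nonnegative t, via Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 0.5 - total * h / 3)

def t_critical(conf_level, df):
    """Value c with central area conf_level between -c and c (bisection)."""
    tail = (1 - conf_level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if area_right(mid, df) > tail:
            lo = mid    # too much area remains in the tail: move right
        else:
            hi = mid
    return (lo + hi) / 2

def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, conf_level=0.95):
    """Two-sample t confidence interval for mu1 - mu2."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    df = math.floor((v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1)))
    margin = t_critical(conf_level, df) * math.sqrt(v1 + v2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

lower, upper = two_sample_ci(74.60, 5.40, 10, 64.60, 8.62, 10)
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")
```

The result should be close to the interval in the SPSS output (3.1454 to 16.8546); small differences are expected because SPSS uses the untruncated df and unrounded summary values.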