
11.1 Inferences Concerning the Difference Between Two Population or Treatment Means Using Independent Samples



Before developing inferential procedures concerning $\mu_1 - \mu_2$, we must consider how the two samples, one from each population, are selected. Two samples are said to be independent samples if the selection of the individuals or objects that make up one sample does not influence the selection of individuals or objects in the other sample. However, when observations from the first sample are paired in some meaningful way with observations in the second sample, the samples are said to be paired.

For example, to study the effectiveness of a speed-reading course, the reading speed of subjects could be measured before they take the class and again after they complete the course. This gives rise to two related samples: one from the population of individuals who have not taken this particular course (the “before” measurements) and one from the population of individuals who have had such a course (the “after” measurements). These samples are paired. The two samples are not independently chosen, because the selection of individuals from the first (before) population completely determines which individuals make up the sample from the second (after) population.

In this section, we consider procedures based on independent samples. Methods for analyzing data resulting from paired samples are presented in Section 11.2.

Because $\bar{x}_1$ provides an estimate of $\mu_1$ and $\bar{x}_2$ gives an estimate of $\mu_2$, it is natural to use $\bar{x}_1 - \bar{x}_2$ as a point estimate of $\mu_1 - \mu_2$. The value of $\bar{x}_1$ varies from sample to sample (it is a statistic), as does the value of $\bar{x}_2$. Since the difference $\bar{x}_1 - \bar{x}_2$ is calculated from sample values, it is also a statistic and, therefore, has a sampling distribution.



Properties of the Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

If the random samples on which $\bar{x}_1$ and $\bar{x}_2$ are based are selected independently of one another, then

1. $$\mu_{\bar{x}_1 - \bar{x}_2} = (\text{mean value of } \bar{x}_1 - \bar{x}_2) = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} = \mu_1 - \mu_2$$

The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is always centered at the value of $\mu_1 - \mu_2$, so $\bar{x}_1 - \bar{x}_2$ is an unbiased statistic for estimating $\mu_1 - \mu_2$.

2. $$\sigma^2_{\bar{x}_1 - \bar{x}_2} = (\text{variance of } \bar{x}_1 - \bar{x}_2) = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

and

$$\sigma_{\bar{x}_1 - \bar{x}_2} = (\text{standard deviation of } \bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

3. If $n_1$ and $n_2$ are both large or the population distributions are (at least approximately) normal, $\bar{x}_1$ and $\bar{x}_2$ each have (at least approximately) a normal distribution. This implies that the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is also normal or approximately normal.

Properties 1 and 2 follow from the following general results:

1. The mean value of a difference in means is the difference of the two individual mean values.

2. The variance of a difference of independent quantities is the sum of the two individual variances.
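A quick simulation can make Properties 1 and 2 concrete. The sketch below is illustrative only: the helper name `simulate_diff_of_means`, the normal populations, and all parameter values are hypothetical choices for the demonstration, not values from the text. It repeatedly draws two independent samples, records $\bar{x}_1 - \bar{x}_2$, and compares the simulated mean and variance with $\mu_1 - \mu_2$ and $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

```python
import random
import statistics


def simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=10_000, seed=1):
    """Return the simulated mean and variance of xbar1 - xbar2 over `reps`
    pairs of independently drawn samples from two normal populations."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.fmean(diffs), statistics.variance(diffs)


# Hypothetical populations (not from the text)
mu1, sigma1, n1 = 50.0, 10.0, 25
mu2, sigma2, n2 = 45.0, 6.0, 16

sim_mean, sim_var = simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2)
theory_mean = mu1 - mu2                        # Property 1
theory_var = sigma1**2 / n1 + sigma2**2 / n2   # Property 2: 4.0 + 2.25

print(f"simulated mean {sim_mean:.3f} vs theoretical {theory_mean}")
print(f"simulated variance {sim_var:.3f} vs theoretical {theory_var}")
```

Increasing `reps` tightens the agreement; the fixed seed only makes a run repeatable.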

When the sample sizes are large or when the population distributions are approximately normal, the properties of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ imply that $\bar{x}_1 - \bar{x}_2$ can be standardized to obtain a variable with a sampling distribution that is approximately the standard normal ($z$) distribution. This leads to the following result.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.



When two random samples are independently selected and when $n_1$ and $n_2$ are both large or the population distributions are (at least approximately) normal, the distribution of

$$z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is described (at least approximately) by the standard normal ($z$) distribution.
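As a sketch, the $z$ statistic can be computed directly in the rare case that $\sigma_1$ and $\sigma_2$ are known. The function name `two_sample_z` and the numbers in the usage lines are hypothetical, chosen only to illustrate the formula; `statistics.NormalDist` from the Python standard library supplies the standard normal curve.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, hypothesized=0.0):
    """Standardize xbar1 - xbar2 using known population standard deviations."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard deviation of xbar1 - xbar2
    return (xbar1 - xbar2 - hypothesized) / se


# Hypothetical inputs: sigma1**2 / n1 = 36/36 and sigma2**2 / n2 = 64/64, so se = sqrt(2)
z = two_sample_z(52.0, 50.0, 6.0, 8.0, 36, 64)
upper_tail = 1 - NormalDist().cdf(z)   # area to the right of z under the z curve
print(f"z = {z:.3f}, upper-tail area = {upper_tail:.4f}")
```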



Although it is possible to base a test procedure and confidence interval on this result, the values of $\sigma_1^2$ and $\sigma_2^2$ are rarely known. As a result, the $z$ statistic is rarely used. When $\sigma_1^2$ and $\sigma_2^2$ are unknown, we must estimate them using the corresponding sample variances, $s_1^2$ and $s_2^2$. The result on which both a test procedure and a confidence interval are based is given in the accompanying box.



When two random samples are independently selected and when $n_1$ and $n_2$ are both large or when the population distributions are normal, the standardized variable

$$t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

has approximately a $t$ distribution with

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \quad \text{where} \quad V_1 = \frac{s_1^2}{n_1} \quad \text{and} \quad V_2 = \frac{s_2^2}{n_2}$$

The computed value of df should be truncated (rounded down) to obtain an integer value of df.
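A minimal sketch of the df computation, following the boxed formula (often called the Welch-Satterthwaite approximation) and the truncation rule. The helper name `welch_df` and the summary statistics in the usage lines are hypothetical. The final check reflects a general property of this formula: the computed df always lies between the smaller of $n_1 - 1$ and $n_2 - 1$ and the value $n_1 + n_2 - 2$.

```python
import math


def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the two-sample t procedures, before truncation."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical summary statistics (not from the text)
s1, n1 = 4.0, 12
s2, n2 = 9.0, 20

df = welch_df(s1, n1, s2, n2)
df_used = math.floor(df)   # truncate (round down), as the box instructs
print(f"computed df = {df:.3f}, truncated df = {df_used}")

# The computed df is never below min(n1 - 1, n2 - 1) or above n1 + n2 - 2
assert min(n1 - 1, n2 - 1) <= df <= n1 + n2 - 2
```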



If one or both sample sizes are small, we must consider the shape of the population distributions. We can use normal probability plots or boxplots to evaluate whether it is reasonable to consider the population distributions to be approximately normal.



Test Procedures

In a test designed to compare two population means, the null hypothesis is of the form

$$H_0: \mu_1 - \mu_2 = \text{hypothesized value}$$

Often the hypothesized value is 0, indicating that there is no difference between the population means. The alternative hypothesis involves the same hypothesized value but uses one of three inequalities (less than, greater than, or not equal to), depending on the research question of interest. As an example, let $\mu_1$ and $\mu_2$ denote the average fuel efficiencies (in miles per gallon, mpg) for two models of a certain type of car equipped with 4-cylinder and 6-cylinder engines, respectively. The hypotheses under consideration might be

$$H_0: \mu_1 - \mu_2 = 5 \quad \text{versus} \quad H_a: \mu_1 - \mu_2 > 5$$



The null hypothesis is equivalent to the claim that the mean fuel efficiency for the 4-cylinder engine exceeds the mean fuel efficiency for the 6-cylinder engine by 5 mpg. The alternative hypothesis states that the difference between the mean fuel efficiencies is more than 5 mpg.

A test statistic is obtained by replacing $\mu_1 - \mu_2$ in the standardized $t$ variable (given in the previous box) with the hypothesized value that appears in $H_0$. Thus, the $t$ statistic for testing $H_0: \mu_1 - \mu_2 = 5$ is

$$t = \frac{\bar{x}_1 - \bar{x}_2 - 5}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$



When the sample sizes are large or when the population distributions are normal, the sampling distribution of the test statistic is approximately a $t$ distribution when $H_0$ is true. The P-value for the test is obtained by first computing the appropriate number of degrees of freedom and then using Appendix Table 4, a graphing calculator, or a statistical software package. The following box gives a general description of the test procedure.



Summary of the Two-Sample t Test for Comparing Two Populations

Null hypothesis: $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$

Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

The appropriate df for the two-sample t test is

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \quad \text{where} \quad V_1 = \frac{s_1^2}{n_1} \quad \text{and} \quad V_2 = \frac{s_2^2}{n_2}$$

The computed number of degrees of freedom should be truncated (rounded down) to an integer.

Alternative hypothesis: $H_a: \mu_1 - \mu_2 > \text{hypothesized value}$
P-value: Area under appropriate $t$ curve to the right of the computed $t$

Alternative hypothesis: $H_a: \mu_1 - \mu_2 < \text{hypothesized value}$
P-value: Area under appropriate $t$ curve to the left of the computed $t$

Alternative hypothesis: $H_a: \mu_1 - \mu_2 \neq \text{hypothesized value}$
P-value: (1) 2(area to the right of the computed $t$) if $t$ is positive, or (2) 2(area to the left of the computed $t$) if $t$ is negative

Assumptions:
1. The two samples are independently selected random samples from the populations of interest.
2. The sample sizes are large (generally 30 or larger) or the population distributions are (at least approximately) normal.
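The boxed procedure can be sketched in Python with only the standard library. Because the standard library has no $t$ distribution, this sketch finds tail areas by numerically integrating the $t$ density with Simpson's rule; that numerical approach and all function names are assumptions of the sketch, not part of the text (Appendix Table 4 or statistical software would normally supply the area). As in the box, df is truncated before the P-value is found.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=4000):
    """Area under the t curve to the right of t (steps must be even).

    Uses symmetry: the area to the right of |t| is 0.5 minus the integral of
    the density from 0 to |t|, approximated here with Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = max(0.0, 0.5 - total * h / 3)
    return tail if t >= 0 else 1.0 - tail


def two_sample_t_test(xbar1, s1, n1, xbar2, s2, n2,
                      hypothesized=0.0, alternative="greater"):
    """Return (t, truncated df, P-value) for the two-sample t test."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2 - hypothesized) / math.sqrt(v1 + v2)
    df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))
    if alternative == "greater":        # Ha: mu1 - mu2 > hypothesized value
        p = t_upper_tail(t, df)
    elif alternative == "less":         # Ha: mu1 - mu2 < hypothesized value
        p = 1.0 - t_upper_tail(t, df)
    elif alternative == "two-sided":    # Ha: mu1 - mu2 != hypothesized value
        p = 2 * t_upper_tail(abs(t), df)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, df, p


# Summary statistics from Example 11.2 (men vs. women, Ha: mu1 - mu2 > 0)
t, df, p = two_sample_t_test(74.6, 5.4, 10, 64.6, 8.6, 10, alternative="greater")
print(f"t = {t:.2f}, df = {df}, P-value = {p:.4f}")
```

The two-sided branch doubles the tail beyond $|t|$, which is the same as the box's rule of doubling the area on the side where the computed $t$ falls.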






EXAMPLE 11.1 Brain Size



Do children diagnosed with attention deficit/hyperactivity disorder (ADHD) have smaller brains than children without this condition? This question was the topic of a research study described in the paper “Developmental Trajectories of Brain Volume Abnormalities in Children and Adolescents with Attention Deficit/Hyperactivity Disorder” (Journal of the American Medical Association [2002]: 1740–1747). Brain scans were completed for 152 children with ADHD and 139 children of similar age without ADHD. Summary values for total cerebral volume (in milliliters) are given in the following table:



Group                    n     x̄ (mL)    s (mL)
Children with ADHD       152   1059.4    117.5
Children without ADHD    139   1104.5    111.3



Do these data provide evidence that the mean brain volume of children with ADHD is smaller than the mean for children without ADHD? Let’s test the relevant hypotheses using a .05 level of significance.

1. $\mu_1$ = true mean brain volume for children with ADHD
$\mu_2$ = true mean brain volume for children without ADHD
$\mu_1 - \mu_2$ = difference in mean brain volume
2. $H_0: \mu_1 - \mu_2 = 0$ (no difference in mean brain volume)
3. $H_a: \mu_1 - \mu_2 < 0$ (mean brain volume is smaller for children with ADHD)
4. Significance level: $\alpha = .05$
5. Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$



6. Assumptions: The paper states that the study controlled for age and that the participants were “recruited from the local community.” This is not equivalent to random sampling, but the authors of the paper (five of whom were doctors at well-known medical institutions) believed that it was reasonable to regard these samples as representative of the two groups under study. Both sample sizes are large, so it is reasonable to proceed with the two-sample t test.

7. Calculation:

$$t = \frac{(1059.4 - 1104.5) - 0}{\sqrt{\dfrac{(117.5)^2}{152} + \dfrac{(111.3)^2}{139}}} = \frac{-45.10}{\sqrt{90.831 + 89.120}} = \frac{-45.10}{13.415} = -3.36$$



8. P-value: We first compute the df for the two-sample t test:

$$V_1 = \frac{s_1^2}{n_1} = 90.831 \qquad V_2 = \frac{s_2^2}{n_2} = 89.120$$

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} = \frac{(90.831 + 89.120)^2}{\dfrac{(90.831)^2}{151} + \dfrac{(89.120)^2}{138}} = \frac{32{,}382.362}{112.191} = 288.636$$

We truncate the number of degrees of freedom to 288. Appendix Table 4 shows that the area under the $t$ curve with 288 df (using the $z$ critical value column because 288 is larger than 120 df) to the left of $-3.36$ is approximately 0. Therefore,

$$\text{P-value} \approx 0$$

9. Conclusion: Because P-value $\approx 0 \le .05$, we reject $H_0$. There is convincing evidence that the mean brain volume for children with ADHD is smaller than the mean for children without ADHD.
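The arithmetic of Steps 7 and 8 can be checked with a few lines of Python. This is a sketch: because df exceeds 120, it follows the text in approximating the $t$ curve by the standard normal curve, here via `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist

# Summary values from the Example 11.1 table
n1, xbar1, s1 = 152, 1059.4, 117.5   # children with ADHD
n2, xbar2, s2 = 139, 1104.5, 111.3   # children without ADHD

v1, v2 = s1**2 / n1, s2**2 / n2      # V1 and V2 from Step 8
t = (xbar1 - xbar2 - 0) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Lower-tailed test; with df this large the z curve approximates the t curve well
p_value = NormalDist().cdf(t)
print(f"t = {t:.2f}, df = {math.floor(df)}, P-value = {p_value:.4f}")
```

The P-value comes out at roughly .0004, which the text reports as approximately 0; either way it is far below $\alpha = .05$.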



EXAMPLE 11.2 Sex and Salary

Are women still paid less than men for comparable work? The authors of the paper “Sex and Salary: A Survey of Purchasing and Supply Professionals” (Journal of Purchasing and Supply Management [2008]: 112–124) carried out a study in which salary data was collected from a random sample of men and a random sample of women who worked as purchasing managers and who were subscribers to Purchasing magazine. Salary data consistent with summary quantities given in the paper appear below (the actual sample sizes for the study were much larger):



Annual Salary (in thousands of dollars)

Men     81  69  81  76  76  74  69  76  79  65
Women   78  60  67  61  62  73  71  58  68  48

Data set available online

Even though the samples were selected from subscribers of a particular magazine, the authors of the paper believed that it was reasonable to view the samples in the study as representative of the two populations of interest: male purchasing managers and female purchasing managers. For purposes of this example, we will assume that it is also reasonable to consider the two samples given here as representative of the populations. We will use the given data and a significance level of .05 to determine if there is convincing evidence that the mean annual salary for male purchasing managers is greater than the mean annual salary for female purchasing managers.

1. $\mu_1$ = mean annual salary for male purchasing managers
$\mu_2$ = mean annual salary for female purchasing managers
$\mu_1 - \mu_2$ = difference in mean annual salary
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 > 0$
4. Significance level: $\alpha = .05$
5. Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$



6. Assumptions: For the two-sample t test to be appropriate, we must be willing to assume that the two samples can be viewed as independently selected random samples from the two populations of interest. As previously noted, we assume that this is reasonable. Because both of the sample sizes are small, it is also necessary to assume that the salary distribution is approximately normal for each of these two populations. Boxplots constructed using the sample data are shown here:



[Figure: side-by-side boxplots of annual salary (thousands of dollars) for the Male and Female samples; horizontal axis from 50 to 85]



Because the boxplots are reasonably symmetric and because there are no outliers, it is reasonable to proceed with the two-sample t test.

7. Calculation: For the given data, $\bar{x}_1 = 74.6$, $s_1 = 5.4$, $\bar{x}_2 = 64.6$, $s_2 = 8.6$, and

$$t = \frac{(74.6 - 64.6) - 0}{\sqrt{\dfrac{(5.4)^2}{10} + \dfrac{(8.6)^2}{10}}} = \frac{10}{\sqrt{2.916 + 7.396}} = \frac{10}{3.211} = 3.11$$

8. P-value: We first compute the df for the two-sample t test:

$$V_1 = \frac{s_1^2}{n_1} = 2.916 \qquad V_2 = \frac{s_2^2}{n_2} = 7.396$$

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} = \frac{(2.916 + 7.396)^2}{\dfrac{(2.916)^2}{9} + \dfrac{(7.396)^2}{9}} = \frac{106.337}{7.023} = 15.14$$

We truncate df to 15. Appendix Table 4 shows that the area under the $t$ curve with 15 df to the right of 3.1 is .004, so P-value = .004.

9. Conclusion: Because the P-value of .004 is less than .05, we reject $H_0$. There is convincing evidence to support the claim that the mean annual salary for male purchasing managers is higher than the mean annual salary for female purchasing managers.

Suppose the computed value of the test statistic in Step 7 had been 1.13 rather than 3.11. Then the P-value would have been .143 (the area to the right of 1.1 under the $t$ curve with 15 df), and the decision would have been to not reject the null hypothesis. We then would have concluded that there was not convincing evidence that the mean annual salary was higher for males than for females. Notice that when we fail to reject the null hypothesis of no difference between the population means, we are not saying that there is convincing evidence that the means are equal; we can only say that we were not convinced that they were different.
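Starting from the raw salaries rather than the rounded summary values, the Python `statistics` module reproduces Steps 7 and 8. This sketch reads the table as two rows of ten salaries. Because it carries unrounded standard deviations ($s_2 \approx 8.618$ rather than 8.6), its df comes out near 15.1 instead of 15.14; both truncate to 15.

```python
import math
import statistics

# Annual salaries (thousands of dollars) from the Example 11.2 table
men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]

n1, n2 = len(men), len(women)
xbar1, xbar2 = statistics.mean(men), statistics.mean(women)
s1, s2 = statistics.stdev(men), statistics.stdev(women)   # sample SDs (n - 1 divisor)

v1, v2 = s1**2 / n1, s2**2 / n2
t = (xbar1 - xbar2 - 0) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"xbar1 = {xbar1:.1f}, s1 = {s1:.2f}, xbar2 = {xbar2:.1f}, s2 = {s2:.2f}")
print(f"t = {t:.2f}, df = {df:.2f} (truncated to {math.floor(df)})")
```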






Many statistical computer packages can perform the calculations for the two-sample t test. The accompanying partial SPSS output shows summary statistics for the two groups of Example 11.2. The second part of the output gives the number of degrees of freedom, the test-statistic value, and a two-sided P-value. Since the test in Example 11.2 is a one-sided test and the computed $t$ is positive (in the direction specified by $H_a$), we need to divide the two-sided P-value in half to obtain the correct value for our test, which is $.0072/2 = .0036$. This P-value differs from the value in Example 11.2 because Appendix Table 4 gives only tail areas for $t$ values to one decimal place, so we rounded the test statistic to 3.1. As a consequence, the P-value given in the example is only approximate; the P-value from SPSS is more accurate.

GROUP STATISTICS

Salary
Sex       N    Mean    Std. Deviation   Std. Error Mean
male      10   74.60   5.40             1.71
female    10   64.60   8.62             2.73

Equal Variances Not Assumed

                                                                          95% CONFIDENCE INTERVAL
                                                                          OF THE DIFFERENCE
t      df      Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower     Upper
3.11   15.14   .0072             10.000            3.211                   3.1454    16.8546



Comparing Treatments

When an experiment is carried out to compare two treatments (or to compare a single treatment with a control), the investigator is interested in the effect of the treatments on some response variable. The treatments are “applied” to individuals (as in an experiment to compare two different medications for decreasing blood pressure) or objects (as in an experiment to compare the effects of two different baking temperatures on the density of bread), and the value of some response variable (for example, blood pressure or density) is recorded. Based on the resulting data, the investigator might wish to determine whether there is a difference in the mean response for the two treatments.

In many actual experimental situations, the individuals or objects to which the treatments will be applied are not selected at random from some larger population. A consequence of this is that it is not possible to generalize the results of the experiment to some larger population. However, if the experimental design provides for random assignment of the individuals or objects used in the experiment to treatments (or for random assignment of treatments to the individuals or objects), it is possible to test hypotheses about treatment differences.

It is common practice to use the two-sample t test statistic previously described if the experiment employs random assignment and if either the sample sizes are large or it is reasonable to think that the treatment response distributions (the distributions of response values that would result if the treatments were applied to a very large number of individuals or objects) are approximately normal.






Two-Sample t Test for Comparing Two Treatments

When
1. individuals or objects are randomly assigned to treatments (or vice versa; that is, treatments are randomly assigned to individuals or objects), and
2. the sample sizes are large (generally 30 or larger) or the treatment response distributions are approximately normal,

the two-sample t test can be used to test $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$, where $\mu_1$ and $\mu_2$ represent the mean response for treatments 1 and 2, respectively.

In this case, these two conditions replace the assumptions previously stated for comparing two population means. Whether the assumption of normality of the treatment response distributions is reasonable can be assessed by constructing normal probability plots or boxplots of the response values in each sample.



When the two-sample t test is used to compare two treatments and the individuals or objects used in the experiment are not randomly selected from some population, it is only an approximate test (the reported P-values are only approximate). However, this is still the most common way to analyze such data.



EXAMPLE 11.3 Reading Emotions






The paper “How Happy Was I, Anyway? A Retrospective Impact Bias” (Social Cognition [2003]: 421–446) reported on an experiment designed to assess the extent to which people rationalize poor performance. In this study, 246 college undergraduates were assigned at random to one of two groups: a negative feedback group or a positive feedback group. Each participant took a test in which they were asked to guess the emotions displayed in photographs of faces. At the end of the test, those in the negative feedback group were told that they had correctly answered 21 of the 40 items and were assigned a “grade” of D. Those in the positive feedback group were told that they had answered 35 of 40 correctly and were assigned an A grade. After a brief time, participants were asked to answer two sets of questions. One set of questions asked about the validity of the test and the other set of questions asked about the importance of being able to read faces. The researchers hypothesized that those in the negative feedback group would tend to rationalize their poor performance by rating both the validity of the test and the importance of being a good face reader lower than those in the positive feedback group. Do the data from this experiment support the researchers’ hypotheses?



                    Sample   TEST VALIDITY RATING         FACE READING IMPORTANCE RATING
Group               Size     Mean   Standard Deviation    Mean   Standard Deviation
Negative feedback   123      5.51   .79                   5.36   1.00
Positive feedback   123      6.95   1.09                  6.62   1.19






We will test the relevant hypotheses using a significance level of .01, beginning with the hypotheses about the test validity rating.

1. Let $\mu_1$ denote the mean test validity score for the negative feedback group and define $\mu_2$ analogously for the positive feedback group. Then $\mu_1 - \mu_2$ is the difference between the mean test validity scores for the two treatment groups.
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 < 0$
4. Significance level: $\alpha = .01$
5. Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$



6. Assumptions: Subjects were randomly assigned to the treatment groups, and both sample sizes are large, so use of the two-sample t test is reasonable.

7. Calculation:

$$t = \frac{(5.51 - 6.95) - 0}{\sqrt{\dfrac{(.79)^2}{123} + \dfrac{(1.09)^2}{123}}} = \frac{-1.44}{0.1214} = -11.86$$



8. P-value: We first compute the df for the two-sample t test:

$$V_1 = \frac{s_1^2}{n_1} = .0051 \qquad V_2 = \frac{s_2^2}{n_2} = .0097$$

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} = \frac{(.0051 + .0097)^2}{\dfrac{(.0051)^2}{122} + \dfrac{(.0097)^2}{122}} = \frac{.000219}{.000000984} \approx 222.6$$

We truncate the number of degrees of freedom to 222. This is a lower-tailed test, so the P-value is the area under the $t$ curve with df = 222 and to the left of $-11.86$. Since $-11.86$ is so far out in the lower tail of this $t$ curve, P-value $\approx 0$.

9. Conclusion: Since P-value $\le \alpha$, $H_0$ is rejected. There is evidence that the mean validity rating score for the positive feedback group is higher. The data support the conclusion that those who received negative feedback did not rate the validity of the test, on average, as highly as those who thought they had done well on the test.

We will use Minitab to test the researchers’ hypothesis that those in the negative feedback group would also not rate the importance of being able to read faces as highly as those in the positive group.

1. Let $\mu_1$ denote the mean face reading importance rating for the negative feedback group and define $\mu_2$ analogously for the positive feedback group. Then $\mu_1 - \mu_2$ is the difference between the mean face reading ratings for the two treatment groups.
2. $H_0: \mu_1 - \mu_2 = 0$
3. $H_a: \mu_1 - \mu_2 < 0$
4. Significance level: $\alpha = .01$
5. Test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2 - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$



6. Assumptions: Subjects were randomly assigned to the treatment groups, and both sample sizes are large, so use of the two-sample t test is reasonable.




7. Calculation: Minitab output is shown here. From the output, $t = -8.99$.

Two-Sample T-Test and CI

Sample    N   Mean   StDev   SE Mean
1       123   5.36    1.00     0.090
2       123   6.62    1.19      0.11

Difference = mu (1) - mu (2)
Estimate for difference: -1.26000
95% upper bound for difference: -1.02856
T-Test of difference = 0 (vs <): T-Value = -8.99  P-Value = 0.000  DF = 236



8. P-value: From the Minitab output, P-value = 0.000.

9. Conclusion: Since P-value $\le \alpha$, $H_0$ is rejected. There is evidence that the mean face reading importance rating for the positive feedback group is higher.
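Both tests in this example can be cross-checked from the summary table alone. The sketch below, with a hypothetical helper name `t_and_df`, computes $t$ and the untruncated df for each rating. For the face reading rating it agrees with the Minitab output ($t = -8.99$, DF = 236); for the test validity rating, carrying full precision gives $t = -11.86$ and a df of about 222.5, which truncates to 222.

```python
import math


def t_and_df(xbar1, s1, n1, xbar2, s2, n2, hypothesized=0.0):
    """Two-sample t statistic and its untruncated degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2 - hypothesized) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Group 1 = negative feedback, group 2 = positive feedback (Example 11.3 table)
results = {
    "test validity": t_and_df(5.51, 0.79, 123, 6.95, 1.09, 123),
    "face reading importance": t_and_df(5.36, 1.00, 123, 6.62, 1.19, 123),
}
for rating, (t, df) in results.items():
    print(f"{rating}: t = {t:.2f}, df = {df:.1f} (truncated to {math.floor(df)})")
```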



You have probably noticed that evaluating the formula for the number of degrees of freedom for the two-sample t test involves quite a bit of arithmetic. An alternative approach is to compute a conservative estimate of the P-value, one that is close to but larger than the actual P-value. If $H_0$ is rejected using this conservative estimate, then it will also be rejected if the actual P-value is used. A conservative estimate of the P-value for the two-sample t test can be found by using the $t$ curve with the number of degrees of freedom equal to the smaller of $(n_1 - 1)$ and $(n_2 - 1)$.
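The sketch below compares the two choices of df for the Example 11.2 statistic ($t = 3.11$ with $n_1 = n_2 = 10$). Its tail areas come from Simpson's-rule integration of the $t$ density, an assumption of this sketch rather than a method from the text; any accurate source of $t$ tail areas would serve. The conservative df of 9 gives a larger P-value than the df of 15 obtained from the formula, as expected.

```python
import math


def t_upper_tail(t, df, steps=4000):
    """Area under the t curve to the right of t (assumes t >= 0), via Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)   # 0.5 minus the area from 0 to t


t_stat = 3.11
n1 = n2 = 10
formula_df = 15                           # truncated df from Example 11.2
conservative_df = min(n1 - 1, n2 - 1)     # smaller of (n1 - 1) and (n2 - 1)

p_formula = t_upper_tail(t_stat, formula_df)
p_conservative = t_upper_tail(t_stat, conservative_df)
print(f"df = {formula_df}: P-value = {p_formula:.4f}")
print(f"df = {conservative_df}: P-value = {p_conservative:.4f} (conservative)")
```

Here both P-values fall below .05, so the conservative shortcut leads to the same decision as the full df calculation.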



The Pooled t Test

The two-sample t test procedure just described is appropriate when it is reasonable to assume that the population distributions are approximately normal. If it is also known that the variances of the two populations are equal ($\sigma_1^2 = \sigma_2^2$), an alternative procedure known as the pooled t test can be used. This test procedure combines information from both samples to obtain a “pooled” estimate of the common variance and then uses this pooled estimate of the variance in place of $s_1^2$ and $s_2^2$ in the $t$ test statistic. This test procedure was widely used in the past, but it has fallen into some disfavor because it is quite sensitive to departures from the assumption of equal population variances. If the population variances are equal, the pooled t procedure has a slightly better chance of detecting departures from $H_0$ than does the two-sample t test of this section. However, P-values based on the pooled t procedure can be seriously in error if the population variances are not equal, so, in general, the two-sample t procedure is a better choice than the pooled t test.
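For comparison, here is a sketch of the pooled t statistic on the Example 11.2 salaries. The formulas it uses (pooled variance $s_p^2 = [(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2]/(n_1 + n_2 - 2)$, standard error $s_p\sqrt{1/n_1 + 1/n_2}$, and df $= n_1 + n_2 - 2$) are the standard pooled t formulas; this section describes the procedure only in words. With equal sample sizes the two statistics coincide, so here the procedures differ only in df.

```python
import math
import statistics

# Example 11.2 salaries (thousands of dollars)
men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]

n1, n2 = len(men), len(women)
var1, var2 = statistics.variance(men), statistics.variance(women)
diff = statistics.mean(men) - statistics.mean(women)

# Pooled t: a single estimate of the common variance, df = n1 + n2 - 2
sp_sq = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
t_pooled = diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Two-sample t of this section: separate variances, df from the boxed formula
v1, v2 = var1 / n1, var2 / n2
t_two_sample = diff / math.sqrt(v1 + v2)
df_two_sample = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"pooled:     t = {t_pooled:.3f}, df = {df_pooled}")
print(f"two-sample: t = {t_two_sample:.3f}, df = {df_two_sample:.2f}")
```

With unequal sample sizes the two statistics generally differ, and it is then that unequal population variances can make the pooled P-value misleading.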



Comparisons and Causation

If the assignment of treatments to the individuals or objects used in a comparison of treatments is not made by the investigators, the study is observational. As an example, the article “Lead and Cadmium Absorption Among Children near a Nonferrous Metal Plant” (Environmental Research [1978]: 290–308) reported data on blood lead concentrations for two different samples of children. The first sample was drawn from a population residing within 1 km of a lead smelter, whereas those in the second sample were selected from a rural area much farther from the smelter. It was the parents of the children, rather than the investigators, who determined whether the children would be in the close-to-smelter group or the far-from-smelter group. As a second example, a letter in the Journal of the American Medical Association (May 19, 1978) reported on a comparison of doctors’ longevity after medical school graduation for those with an academic affiliation and those in private practice. (The letter writer’s stated objective was to see whether “publish or perish” really meant “publish and perish.”) Here again, an investigator did not start out with a group of doctors, assigning some to academic and others to nonacademic careers. The doctors themselves selected their groups.

The difficulty with drawing conclusions based on an observational study is that a statistically significant difference may be due to some underlying factors that have not been controlled rather than to conditions that define the groups. Does the type of medical practice itself have an effect on longevity, or is the observed difference in lifetimes caused by other factors, which themselves led graduates to choose academic or nonacademic careers? Similarly, is the observed difference in blood lead concentration levels due to proximity to the smelter? Perhaps other physical and socioeconomic factors are related both to choice of living area and to blood lead concentration.

In general, rejection of $H_0: \mu_1 - \mu_2 = 0$ in favor of $H_a: \mu_1 - \mu_2 > 0$ suggests that, on average, higher values of the variable are associated with individuals in the first population or receiving the first treatment than with those in the second population or receiving the second treatment. But association does not imply causation. Strong statistical evidence for a causal relationship can be built up over time through many different comparative studies that point to the same conclusions (as in the many investigations linking smoking to lung cancer). A randomized controlled experiment, in which investigators assign subjects at random to the treatments or conditions being compared, is particularly effective in suggesting causality. With such random assignment, the investigator and other interested parties can have more confidence in the conclusion that an observed difference is caused by the difference in treatments or conditions.



A Confidence Interval

A confidence interval for $\mu_1 - \mu_2$ is easily obtained from the basic $t$ variable of this section. Both the derivation of and the formula for the interval are similar to those of the one-sample t interval discussed in Chapter 9.



The Two-Sample t Confidence Interval for the Difference Between Two Population or Treatment Means

The general formula for a confidence interval for $\mu_1 - \mu_2$ when
1. the two samples are independently chosen random samples, and
2. the sample sizes are both large (generally $n_1 \ge 30$ and $n_2 \ge 30$) or the population distributions are approximately normal

is

$$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value}) \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

The $t$ critical value is based on

$$\text{df} = \frac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}} \quad \text{where} \quad V_1 = \frac{s_1^2}{n_1} \quad \text{and} \quad V_2 = \frac{s_2^2}{n_2}$$

(continued)


