10.1 Inferences About the Difference Between Two Population Means: σ1 and σ2 Known




The point estimator of the difference between the two population means is the difference between the two sample means.

POINT ESTIMATOR OF THE DIFFERENCE BETWEEN TWO POPULATION MEANS

$$\bar{x}_1 - \bar{x}_2 \tag{10.1}$$

The standard error of x̄1 − x̄2 is the standard deviation of the sampling distribution of x̄1 − x̄2.



Figure 10.1 provides an overview of the process used to estimate the difference between two population means based on two independent simple random samples.

As with other point estimators, the point estimator x̄1 − x̄2 has a standard error that describes the variation in the sampling distribution of the estimator. With two independent simple random samples, the standard error of x̄1 − x̄2 is as follows:



STANDARD ERROR OF x̄1 − x̄2

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.2}$$
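A short sketch of why the standard error takes this form may help. It assumes only that the two samples are independent and that each sample mean has the single-sample standard error σ/√n (no finite population correction); the variance notation Var(·) is introduced here for illustration.

$$\operatorname{Var}(\bar{x}_1 - \bar{x}_2) = \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Because the samples are independent, the variances add even though the means are subtracted. Taking the square root of this variance gives the standard error in expression (10.2).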



If both populations have a normal distribution, or if the sample sizes are large enough that the central limit theorem enables us to conclude that the sampling distributions of x̄1 and x̄2 can be approximated by a normal distribution, the sampling distribution of x̄1 − x̄2 will have a normal distribution with mean given by μ1 − μ2.

As we showed in Chapter 8, an interval estimate is given by a point estimate ± a margin of error. In the case of estimation of the difference between two population means, an interval estimate will take the following form:

x̄1 − x̄2 ± Margin of error

FIGURE 10.1 ESTIMATING THE DIFFERENCE BETWEEN TWO POPULATION MEANS

[Figure: Population 1 (inner-city store customers, μ1 = mean age) and Population 2 (suburban store customers, μ2 = mean age); μ1 − μ2 = difference between the mean ages. Two independent simple random samples of n1 inner-city customers and n2 suburban customers give the sample mean ages x̄1 and x̄2; x̄1 − x̄2 is the point estimator of μ1 − μ2.]






Chapter 10



Inference About Means and Proportions with Two Populations



With the sampling distribution of x̄1 − x̄2 having a normal distribution, we can write the margin of error as follows:

$$\text{Margin of error} = z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2} = z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.3}$$

The margin of error is given by multiplying the standard error by zα/2.



Thus the interval estimate of the difference between two population means is as follows:

INTERVAL ESTIMATE OF THE DIFFERENCE BETWEEN TWO POPULATION MEANS: σ1 AND σ2 KNOWN

$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{10.4}$$

where 1 − α is the confidence coefficient.

Let us return to the Greystone example. Based on data from previous customer demographic studies, the two population standard deviations are known with σ1 = 9 years and σ2 = 10 years. The data collected from the two independent simple random samples of Greystone customers provided the following results.

               Inner-City Store    Suburban Store
Sample Size    n1 = 36             n2 = 49
Sample Mean    x̄1 = 40 years       x̄2 = 35 years



Using expression (10.1), we find that the point estimate of the difference between the mean ages of the two populations is x̄1 − x̄2 = 40 − 35 = 5 years. Thus, we estimate that the customers at the inner-city store have a mean age five years greater than the mean age of the suburban store customers. We can now use expression (10.4) to compute the margin of error and provide the interval estimate of μ1 − μ2. Using 95% confidence and zα/2 = z.025 = 1.96, we have

$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$40 - 35 \pm 1.96\sqrt{\frac{9^2}{36} + \frac{10^2}{49}}$$

$$5 \pm 4.06$$

Thus, the margin of error is 4.06 years and the 95% confidence interval estimate of the difference between the two population means is 5 − 4.06 = .94 years to 5 + 4.06 = 9.06 years.
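The Greystone calculation can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original text: the function name `diff_means_ci_known_sigma` and its parameters are hypothetical, and zα/2 comes from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def diff_means_ci_known_sigma(x1, x2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Interval estimate of mu1 - mu2 when both population sigmas are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)            # z_{alpha/2}
    std_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # expression (10.2)
    margin = z * std_error                             # expression (10.3)
    point = x1 - x2                                    # expression (10.1)
    return point, margin, (point - margin, point + margin)


# Greystone data from the text: sigma1 = 9, sigma2 = 10, n1 = 36, n2 = 49
point, margin, (lower, upper) = diff_means_ci_known_sigma(
    x1=40, x2=35, sigma1=9, sigma2=10, n1=36, n2=49
)
print(f"point estimate = {point}, margin of error = {margin:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

Rounded to two decimals, the margin of error and the interval endpoints agree with the hand calculation above.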



Hypothesis Tests About μ1 − μ2

Let us consider hypothesis tests about the difference between two population means. Using D0 to denote the hypothesized difference between μ1 and μ2, the three forms for a hypothesis test are as follows:

H0: μ1 − μ2 ≥ D0      H0: μ1 − μ2 ≤ D0      H0: μ1 − μ2 = D0
Ha: μ1 − μ2 < D0      Ha: μ1 − μ2 > D0      Ha: μ1 − μ2 ≠ D0






In many applications, D0 = 0. Using the two-tailed test as an example, when D0 = 0 the null hypothesis is H0: μ1 − μ2 = 0. In this case, the null hypothesis is that μ1 and μ2 are equal. Rejection of H0 leads to the conclusion that Ha: μ1 − μ2 ≠ 0 is true; that is, μ1 and μ2 are not equal.

The steps for conducting hypothesis tests presented in Chapter 9 are applicable here. We must choose a level of significance, compute the value of the test statistic, and find the p-value to determine whether the null hypothesis should be rejected. With two independent simple random samples, we showed that the point estimator x̄1 − x̄2 has a standard error σx̄1−x̄2 given by expression (10.2) and, when the sample sizes are large enough, the distribution of x̄1 − x̄2 can be described by a normal distribution. In this case, the test statistic for the difference between two population means when σ1 and σ2 are known is as follows.



TEST STATISTIC FOR HYPOTHESIS TESTS ABOUT μ1 − μ2: σ1 AND σ2 KNOWN

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{10.5}$$



Let us demonstrate the use of this test statistic in the following hypothesis testing example.

As part of a study to evaluate differences in education quality between two training centers, a standardized examination is given to individuals who are trained at the centers. The

difference between the mean examination scores is used to assess quality differences

between the centers. The population means for the two centers are as follows.

μ1 = the mean examination score for the population of individuals trained at center A
μ2 = the mean examination score for the population of individuals trained at center B

We begin with the tentative assumption that no difference exists between the training quality provided at the two centers. Hence, in terms of the mean examination scores, the null hypothesis is that μ1 − μ2 = 0. If sample evidence leads to the rejection of this hypothesis, we will conclude that the mean examination scores differ for the two populations. This conclusion indicates a quality differential between the two centers and suggests that a follow-up study investigating the reason for the differential may be warranted. The null and alternative hypotheses for this two-tailed test are written as follows.

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0



WEB file: ExamScores



The standardized examination given previously in a variety of settings always resulted in an examination score standard deviation near 10 points. Thus, we will use this information to assume that the population standard deviations are known with σ1 = 10 and σ2 = 10. An α = .05 level of significance is specified for the study.

Independent simple random samples of n1 = 30 individuals from training center A and n2 = 40 individuals from training center B are taken. The respective sample means are x̄1 = 82 and x̄2 = 78. Do these data suggest a significant difference between the population






means at the two training centers? To help answer this question, we compute the test statistic using equation (10.5).

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(82 - 78) - 0}{\sqrt{\dfrac{10^2}{30} + \dfrac{10^2}{40}}} = 1.66$$



Next let us compute the p-value for this two-tailed test. Because the test statistic z is in the upper tail, we first compute the area under the curve to the right of z = 1.66. Using the standard normal distribution table, the area to the left of z = 1.66 is .9515. Thus, the area in the upper tail of the distribution is 1.0000 − .9515 = .0485. Because this test is a two-tailed test, we must double the tail area: p-value = 2(.0485) = .0970. Following the usual rule to reject H0 if p-value ≤ α, we see that the p-value of .0970 does not allow us to reject H0 at the .05 level of significance. The sample results do not provide sufficient evidence to conclude the training centers differ in quality.
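The training-center test can be checked numerically. The sketch below is illustrative only: the helper name `z_statistic` is hypothetical, and the normal tail area comes from `statistics.NormalDist` instead of a table. Because the code does not round z to 1.66 before looking up the tail area, its p-value is slightly larger than the table-based .0970 (about .098), which leaves the conclusion unchanged.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x1, x2, sigma1, sigma2, n1, n2, d0=0.0):
    """Test statistic (10.5) for a hypothesis about mu1 - mu2, sigmas known."""
    return ((x1 - x2) - d0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Training-center data from the text
z = z_statistic(x1=82, x2=78, sigma1=10, sigma2=10, n1=30, n2=40)

# Two-tailed p-value: double the area in the tail beyond |z|
upper_tail = 1 - NormalDist().cdf(abs(z))
p_value = 2 * upper_tail

alpha = 0.05
reject_h0 = p_value <= alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```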

In this chapter we will use the p-value approach to hypothesis testing as described in Chapter 9. However, if you prefer, the test statistic and the critical value rejection rule may be used. With α = .05 and zα/2 = z.025 = 1.96, the rejection rule employing the critical value approach would be to reject H0 if z ≤ −1.96 or if z ≥ 1.96. With z = 1.66, we reach the same "do not reject H0" conclusion.

In the preceding example, we demonstrated a two-tailed hypothesis test about the difference between two population means. Lower tail and upper tail tests can also be considered. These tests use the same test statistic as given in equation (10.5). The procedure for

computing the p-value and the rejection rules for these one-tailed tests are the same as those

presented in Chapter 9.
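As a sketch of how the three forms of the test share the test statistic (10.5) and differ only in the tail used, the following Python helpers compute the p-value for each form and apply the critical value rule. The function names and the `tail` parameter are assumptions made for illustration, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def two_sample_z_test(x1, x2, sigma1, sigma2, n1, n2, d0=0.0, tail="two"):
    """Return (z, p_value) for a test about mu1 - mu2 with known sigmas.

    tail: "lower" for Ha: mu1 - mu2 < D0,
          "upper" for Ha: mu1 - mu2 > D0,
          "two"   for Ha: mu1 - mu2 != D0.
    """
    z = ((x1 - x2) - d0) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    if tail == "lower":
        p_value = STD_NORMAL.cdf(z)
    elif tail == "upper":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return z, p_value


def rejects_by_critical_value(z, alpha=0.05, tail="two"):
    """Critical value rule: True if H0 is rejected at level alpha."""
    if tail == "lower":
        return z <= -STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "upper":
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    if tail == "two":
        return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


# Two-tailed training-center example from the text
z_two, p_two = two_sample_z_test(82, 78, 10, 10, 30, 40, tail="two")
print(f"z = {z_two:.2f}, p-value = {p_two:.4f}")
print("reject H0 (critical value rule):", rejects_by_critical_value(z_two))
```

The p-value rule and the critical value rule are two views of the same comparison, so for a given `tail` and α they lead to the same decision.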



Practical Advice

In most applications of the interval estimation and hypothesis testing procedures presented in this section, random samples with n1 ≥ 30 and n2 ≥ 30 are adequate. In cases where either or both sample sizes are less than 30, the distributions of the populations become important considerations. In general, with smaller sample sizes, it is more important for the analyst to be satisfied that it is reasonable to assume that the distributions of the two populations are at least approximately normal.



Exercises



Methods



SELF test



1. The following results come from two independent random samples taken of two populations.

Sample 1        Sample 2
n1 = 50         n2 = 35
x̄1 = 13.6       x̄2 = 11.6
σ1 = 2.2        σ2 = 3.0

a. What is the point estimate of the difference between the two population means?
b. Provide a 90% confidence interval for the difference between the two population means.
c. Provide a 95% confidence interval for the difference between the two population means.




SELF test






2. Consider the following hypothesis test.

H0: μ1 − μ2 ≤ 0
Ha: μ1 − μ2 > 0

The following results are for two independent samples taken from the two populations.

Sample 1        Sample 2
n1 = 40         n2 = 50
x̄1 = 25.2       x̄2 = 22.8
σ1 = 5.2        σ2 = 6.0

a. What is the value of the test statistic?
b. What is the p-value?
c. With α = .05, what is your hypothesis testing conclusion?



3. Consider the following hypothesis test.

H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0

The following results are for two independent samples taken from the two populations.

Sample 1        Sample 2
n1 = 80         n2 = 70
x̄1 = 104        x̄2 = 106
σ1 = 8.4        σ2 = 7.6

a. What is the value of the test statistic?
b. What is the p-value?
c. With α = .05, what is your hypothesis testing conclusion?



Applications



SELF test



4. Condé Nast Traveler conducts an annual survey in which readers rate their favorite cruise

ship. All ships are rated on a 100-point scale, with higher values indicating better service.

A sample of 37 ships that carry fewer than 500 passengers resulted in an average rating of

85.36, and a sample of 44 ships that carry 500 or more passengers provided an average rating of 81.40 (Condé Nast Traveler, February 2008). Assume that the population standard

deviation is 4.55 for ships that carry fewer than 500 passengers and 3.97 for ships that carry

500 or more passengers.

a. What is the point estimate of the difference between the population mean rating for

ships that carry fewer than 500 passengers and the population mean rating for ships

that carry 500 or more passengers?

b. At 95% confidence, what is the margin of error?

c. What is a 95% confidence interval estimate of the difference between the population

mean ratings for the two sizes of ships?

5. The average expenditure on Valentine’s Day was expected to be $100.89 (USA Today,

February 13, 2006). Do male and female consumers differ in the amounts they spend?

The average expenditure in a sample survey of 40 male consumers was $135.67, and the

average expenditure in a sample survey of 30 female consumers was $68.64. Based on past

surveys, the standard deviation for male consumers is assumed to be $35, and the standard

deviation for female consumers is assumed to be $20.



a. What is the point estimate of the difference between the population mean expenditure for males and the population mean expenditure for females?
b. At 99% confidence, what is the margin of error?
c. Develop a 99% confidence interval for the difference between the two population means.

WEB file: Hotel



6. Suppose that you are responsible for making arrangements for a business convention. Because of budget cuts due to the recent recession, you have been charged with choosing a

city for the convention that has the least expensive hotel rooms. You have narrowed your

choices to Atlanta and Houston. The file named Hotel contains samples of prices for rooms

in Atlanta and Houston that are consistent with the results reported by Smith Travel Research (SmartMoney, March 2009). Because considerable historical data on the prices of

rooms in both cities are available, the population standard deviations for the prices can be

assumed to be $20 in Atlanta and $25 in Houston. Based on the sample data, can you conclude that the mean price of a hotel room in Atlanta is lower than one in Houston?

7. During the 2003 season, Major League Baseball took steps to speed up the play of baseball games in order to maintain fan interest (CNN Headline News, September 30, 2003).

The following results come from a sample of 60 games played during the summer of 2002

and a sample of 50 games played during the summer of 2003. The sample mean shows the

mean duration of the games included in each sample.



2002 Season                    2003 Season
n1 = 60                        n2 = 50
x̄1 = 2 hours, 52 minutes       x̄2 = 2 hours, 46 minutes

a. A research hypothesis was that the steps taken during the 2003 season would reduce the population mean duration of baseball games. Formulate the null and alternative hypotheses.
b. What is the point estimate of the reduction in the mean duration of games during the 2003 season?
c. Historical data indicate a population standard deviation of 12 minutes is a reasonable assumption for both years. Conduct the hypothesis test and report the p-value. At a .05 level of significance, what is your conclusion?
d. Provide a 95% confidence interval estimate of the reduction in the mean duration of games during the 2003 season.
e. What was the percentage reduction in the mean time of baseball games during the 2003 season? Should management be pleased with the results of the statistical analysis? Discuss. Should the length of baseball games continue to be an issue in future years? Explain.

8. Will improving customer service result in higher stock prices for the companies providing the better service? “When a company’s satisfaction score has improved over the prior year’s results and is above the national average (currently 75.7), studies show its shares have a good chance of outperforming the broad stock market in the long run” (BusinessWeek, March 2, 2009). The following satisfaction scores of three companies for the 4th quarters of 2007 and 2008 were obtained from the American Customer Satisfaction Index. Assume that the scores are based on a poll of 60 customers from each company. Because the polling has been done for several years, the standard deviation can be assumed to equal 6 points in each case.

Company        2007 Score    2008 Score
Rite Aid       73            76
Expedia        75            77
J.C. Penney    77            78



a. For Rite Aid, is the increase in the satisfaction score from 2007 to 2008 statistically significant? Use α = .05. What can you conclude?
b. Can you conclude that the 2008 score for Rite Aid is above the national average of 75.7? Use α = .05.
c. For Expedia, is the increase from 2007 to 2008 statistically significant? Use α = .05.
d. When conducting a hypothesis test with the values given for the standard deviation, sample size, and α, how large must the increase from 2007 to 2008 be for it to be statistically significant?
e. Use the result of part (d) to state whether the increase for J.C. Penney from 2007 to 2008 is statistically significant.

10.2 Inferences About the Difference Between Two Population Means: σ1 and σ2 Unknown

In this section we extend the discussion of inferences about the difference between two

population means to the case when the two population standard deviations, σ1 and σ2 , are

unknown. In this case, we will use the sample standard deviations, s1 and s2 , to estimate the

unknown population standard deviations. When we use the sample standard deviations, the

interval estimation and hypothesis testing procedures will be based on the t distribution

rather than the standard normal distribution.



Interval Estimation of μ1 − μ2

In the following example we show how to compute a margin of error and develop an interval estimate of the difference between two population means when σ1 and σ2 are unknown.

Clearwater National Bank is conducting a study designed to identify differences between

checking account practices by customers at two of its branch banks. A simple random

sample of 28 checking accounts is selected from the Cherry Grove Branch and an independent simple random sample of 22 checking accounts is selected from the Beechmont

Branch. The current checking account balance is recorded for each of the checking accounts. A summary of the account balances follows:



WEB file: CheckAcct

                             Cherry Grove    Beechmont
Sample Size                  n1 = 28         n2 = 22
Sample Mean                  x̄1 = $1025      x̄2 = $910
Sample Standard Deviation    s1 = $150       s2 = $125



Clearwater National Bank would like to estimate the difference between the mean

checking account balance maintained by the population of Cherry Grove customers and the

population of Beechmont customers. Let us develop the margin of error and an interval estimate of the difference between these two population means.

In Section 10.1, we provided the following interval estimate for the case when the population standard deviations, σ1 and σ2, are known.

$$\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$






When σ1 and σ2 are estimated by s1 and s2, the t distribution is used to make inferences about the difference between two population means.



With σ1 and σ2 unknown, we will use the sample standard deviations s1 and s2 to estimate

σ1 and σ2 and replace zα/2 with tα/2. As a result, the interval estimate of the difference between two population means is given by the following expression:






INTERVAL ESTIMATE OF THE DIFFERENCE BETWEEN TWO POPULATION MEANS: σ1 AND σ2 UNKNOWN

$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{10.6}$$

where 1 − α is the confidence coefficient.

In this expression, the use of the t distribution is an approximation, but it provides excellent

results and is relatively easy to use. The only difficulty that we encounter in using expression

(10.6) is determining the appropriate degrees of freedom for tα/2. Statistical software packages

compute the appropriate degrees of freedom automatically. The formula used is as follows:

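DEGREES OF FREEDOM: t DISTRIBUTION WITH TWO INDEPENDENT RANDOM SAMPLES

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2} \tag{10.7}$$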

Let us return to the Clearwater National Bank example and show how to use expression (10.6) to provide a 95% confidence interval estimate of the difference between the population mean checking account balances at the two branch banks. The sample data show n1 = 28, x̄1 = $1025, and s1 = $150 for the Cherry Grove branch, and n2 = 22, x̄2 = $910, and s2 = $125 for the Beechmont branch. The calculation for degrees of freedom for tα/2 is as follows:

$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{s_2^2}{n_2}\right)^2} = \frac{\left(\dfrac{150^2}{28} + \dfrac{125^2}{22}\right)^2}{\dfrac{1}{28 - 1}\left(\dfrac{150^2}{28}\right)^2 + \dfrac{1}{22 - 1}\left(\dfrac{125^2}{22}\right)^2} = 47.8$$
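The degrees-of-freedom formula is tedious by hand, so a short Python sketch may be useful for checking the arithmetic. The function name `degrees_of_freedom` is hypothetical; the formula itself is expression (10.7), often called the Welch-Satterthwaite approximation.

```python
from math import floor


def degrees_of_freedom(s1, s2, n1, n2):
    """Degrees of freedom from expression (10.7) for two independent samples."""
    v1 = s1**2 / n1  # s1^2 / n1
    v2 = s2**2 / n2  # s2^2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Clearwater National Bank data from the text
df = degrees_of_freedom(s1=150, s2=125, n1=28, n2=22)
df_for_table = floor(df)  # round down to an integer for a t table lookup
print(f"df = {df:.1f}, rounded down to {df_for_table}")
```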



We round the noninteger degrees of freedom down to 47 to provide a larger t-value and a more conservative interval estimate. Using the t distribution table with 47 degrees of freedom, we find t.025 = 2.012. Using expression (10.6), we develop the 95% confidence interval estimate of the difference between the two population means as follows.

$$\bar{x}_1 - \bar{x}_2 \pm t_{.025}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

$$1025 - 910 \pm 2.012\sqrt{\frac{150^2}{28} + \frac{125^2}{22}}$$

$$115 \pm 78$$
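The interval calculation can be sketched in Python as follows. The function name `diff_means_ci_unknown_sigma` and its parameters are hypothetical. To stay within the standard library, the t value 2.012 is passed in as read from the t table in the text; a statistics package such as SciPy (`scipy.stats.t.ppf(0.975, 47)`) could supply it instead, which is an alternative assumed here and not something the text prescribes.

```python
from math import sqrt


def diff_means_ci_unknown_sigma(x1, x2, s1, s2, n1, n2, t_crit):
    """Interval estimate (10.6) of mu1 - mu2 using sample standard deviations.

    t_crit is t_{alpha/2} for the degrees of freedom from expression (10.7).
    """
    std_error = sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_crit * std_error
    point = x1 - x2
    return point, margin, (point - margin, point + margin)


# Clearwater National Bank data; t.025 = 2.012 with 47 degrees of freedom
point, margin, (lower, upper) = diff_means_ci_unknown_sigma(
    x1=1025, x2=910, s1=150, s2=125, n1=28, n2=22, t_crit=2.012
)
print(f"point estimate = {point}, margin of error = {margin:.0f}")
print(f"95% interval: {lower:.0f} to {upper:.0f}")
```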

The point estimate of the difference between the population mean checking account balances

at the two branches is $115. The margin of error is $78, and the 95% confidence interval


