
# 10.4 Hypothesis Tests for a Population Mean

Because σ, the population standard deviation, is rarely known, we focus our attention on the test procedure for the case in which σ is unknown.

When testing a hypothesis about a population mean, the null hypothesis specifies a particular hypothesized value for μ, specifically, H0: μ = hypothesized value. The alternative hypothesis has one of the following three forms, depending on the research question:

Ha: μ > hypothesized value

Ha: μ < hypothesized value

Ha: μ ≠ hypothesized value

If n is large or if the population distribution is approximately normal, the test statistic

$$t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}}$$

can be used. For example, if the null hypothesis to be tested is H0: μ = 100, the test statistic becomes

$$t = \frac{\bar{x} - 100}{s/\sqrt{n}}$$

Consider the alternative hypothesis Ha: μ > 100, and suppose that a sample of size n = 24 gives x̄ = 104.20 and s = 8.23. The resulting test statistic value is

$$t = \frac{104.20 - 100}{8.23/\sqrt{24}} = \frac{4.20}{1.6799} = 2.50$$

Because this is an upper-tailed test, if the test statistic had been z rather than t, the P-value would be the area under the z curve to the right of 2.50. With a t statistic, the P-value is the area under an appropriate t curve (here with df = 24 − 1 = 23) to the right of 2.50. Appendix Table 4 is a tabulation of t curve tail areas. Each column of the table is for a different number of degrees of freedom: 1, 2, 3, . . . , 30, 35, 40, 60, 120, and a last column for df = ∞, which is the same as for the z curve. The table gives the area under each t curve to the right of values ranging from 0.0 to 4.0 in increments of 0.1. Part of this table appears in Figure 10.3. For example,

area under the 23-df t curve to the right of 2.5 = .010 = P-value for an upper-tailed t test

Suppose that t = −2.7 for a lower-tailed test based on 23 df. Then, because each t curve is symmetric about 0,

P-value = area to the left of −2.7 = area to the right of 2.7 = .006

As is the case for z tests, we double the tail area to obtain the P-value for two-tailed t tests. Thus, if t = 2.6 or if t = −2.6 for a two-tailed t test with 23 df, then

P-value = 2(.008) = .016

Once past 30 df, the tail areas change very little, so the last column (∞) in Appendix Table 4 provides a good approximation.
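The tail areas quoted above can also be approximated numerically rather than read from a table. The following is a minimal sketch, not the method used to build Appendix Table 4: it integrates the t density with Simpson's rule using only the Python standard library. The function names `t_pdf` and `upper_tail_area` and the step count are choices made for this illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_area(t, df, steps=2000):
    """Area under the t curve to the right of t (Simpson's rule on [0, |t|])."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # area between 0 and |t|
    tail = 0.5 - central               # area beyond |t| on one side
    return tail if t >= 0 else 1.0 - tail

# The three situations discussed above, all with 23 df.
print(round(upper_tail_area(2.5, 23), 3))       # upper-tailed test, t = 2.5
print(round(upper_tail_area(2.7, 23), 3))       # lower-tailed test, t = -2.7, by symmetry
print(round(2 * upper_tail_area(2.6, 23), 3))   # two-tailed test, t = 2.6 or -2.6
```

Because the computation works with unrounded areas, the doubled two-tailed value may differ slightly in the last digit from doubling a rounded table entry.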

The following two boxes show how the P-value is obtained as a t curve area and

give a general description of the test procedure.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

Chapter 10

Hypothesis Testing Using a Single Sample

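FIGURE 10.3 Part of Appendix Table 4: t curve tail areas.

| t | df = 1 | df = 2 | ... | df = 22 | df = 23 | df = 24 | ... | df = 60 | df = 120 |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0.1 | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2.5 | ... | ... | ... | .010 | .010 | .010 | ... | ... | ... |
| 2.6 | ... | ... | ... | .008 | .008 | .008 | ... | ... | ... |
| 2.7 | ... | ... | ... | .007 | .006 | .006 | ... | ... | ... |
| 2.8 | ... | ... | ... | .005 | .005 | .005 | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4.0 | ... | ... | ... | ... | ... | ... | ... | ... | ... |

The entry .006 in the df = 23 column of the 2.7 row is the area under the 23-df t curve to the right of 2.7.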

Finding P-Values for a t Test

1. Upper-tailed test: Ha: μ > hypothesized value. P-value = area in the upper tail, to the right of the calculated t, under the t curve for n − 1 df.

2. Lower-tailed test: Ha: μ < hypothesized value. P-value = area in the lower tail, to the left of the calculated t, under the t curve for n − 1 df.

3. Two-tailed test: Ha: μ ≠ hypothesized value. P-value = sum of the areas in the two tails, beyond the calculated t and −t, under the t curve for n − 1 df.

Appendix Table 4 gives upper-tail t curve areas to the right of values 0.0, 0.1, . . . , 4.0. These areas are P-values for upper-tailed tests and, by symmetry, also for lower-tailed tests. Doubling an area gives the P-value for a two-tailed test.


The One-Sample t Test for a Population Mean

Null hypothesis: H0: μ = hypothesized value

Test statistic:

$$t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}}$$

| Alternative Hypothesis | P-Value |
|---|---|
| Ha: μ > hypothesized value | Area to the right of calculated t under t curve with df = n − 1 |
| Ha: μ < hypothesized value | Area to the left of calculated t under t curve with df = n − 1 |
| Ha: μ ≠ hypothesized value | (1) 2(area to the right of t) if t is positive, or (2) 2(area to the left of t) if t is negative |

Assumptions:

1. x̄ and s are the sample mean and sample standard deviation from a random sample.

2. The sample size is large (generally n ≥ 30) or the population distribution is at least approximately normal.
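The procedure in the box can be sketched as a short function that takes summary values and returns the test statistic, the degrees of freedom, and the P-value. This is an illustrative sketch under stated assumptions: the tail area is approximated by Simpson's rule rather than taken from Appendix Table 4, and the function names and the labels 'greater', 'less', and 'two-sided' are hypothetical choices made here. The example call reuses the earlier values n = 24, x̄ = 104.20, s = 8.23 with H0: μ = 100.

```python
import math

def t_upper_tail(t, df, steps=2000):
    """Approximate area to the right of t under the t curve with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                    for i in range(1, steps))
    tail = 0.5 - total * h / 3         # one-tail area beyond |t|
    return tail if t >= 0 else 1.0 - tail

def one_sample_t_test(sample_mean, sample_sd, n, hypothesized_value, alternative):
    """Return (t, df, P-value); alternative is 'greater', 'less', or 'two-sided'."""
    t = (sample_mean - hypothesized_value) / (sample_sd / math.sqrt(n))
    df = n - 1
    if alternative == "greater":
        p = t_upper_tail(t, df)            # area to the right of t
    elif alternative == "less":
        p = t_upper_tail(-t, df)           # area to the left of t, by symmetry
    elif alternative == "two-sided":
        p = 2 * t_upper_tail(abs(t), df)   # double the one-tail area
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, df, p

t, df, p = one_sample_t_test(104.20, 8.23, 24, 100, "greater")
print(f"t = {t:.2f}, df = {df}, P-value = {p:.3f}")
```

The function does not check the two assumptions listed in the box; those still have to be judged from how the sample was collected and from plots of the data.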

EXAMPLE 10.13

Time Stands Still (or So It Seems)

A study conducted by researchers at Pennsylvania State University investigated

whether time perception, an indication of a person’s ability to concentrate, is impaired

during nicotine withdrawal. The study results were presented in the paper “Smoking

Abstinence Impairs Time Estimation Accuracy in Cigarette Smokers” (Psychopharmacology Bulletin [2003]: 90–95). After a 24-hour smoking abstinence, 20 smokers

were asked to estimate how much time had passed during a 45-second period. Suppose

the resulting data on perceived elapsed time (in seconds) were as follows (these data are

artificial but are consistent with summary quantities given in the paper):

69 65 72 73 59 55 39 52 67 57
56 50 70 47 56 45 70 64 67 53

From these data, we obtain

n = 20    x̄ = 59.30    s = 9.84

The researchers wanted to determine whether smoking abstinence had a negative impact on time perception, causing elapsed time to be overestimated. With μ representing the mean perceived elapsed time for smokers who have abstained from smoking for 24 hours, we can answer this question by testing

H0: μ = 45 (no consistent tendency to overestimate the time elapsed)

versus

Ha: μ > 45 (tendency for elapsed time to be overestimated)

The null hypothesis is rejected only if there is convincing evidence that μ > 45. The observed value, 59.30, is certainly larger than 45, but can a sample mean as large as this be plausibly explained by chance variation from one sample to another when μ = 45? To answer this question, we carry out a hypothesis test with a significance level of .05 using the nine-step procedure described in Section 10.3.

1. Population characteristic of interest: μ = mean perceived elapsed time for smokers who have abstained from smoking for 24 hours


2. Null hypothesis: H0: μ = 45

3. Alternative hypothesis: Ha: μ > 45

4. Significance level: α = .05

5. Test statistic:

$$t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} = \frac{\bar{x} - 45}{s/\sqrt{n}}$$

6. Assumptions: This test requires a random sample and either a large sample size or

a normal population distribution. The authors of the paper believed that it was

reasonable to consider this sample as representative of smokers in general, and if this

is the case, it is reasonable to regard it as if it were a random sample. Because the

sample size is only 20, for the t test to be appropriate, we must be willing to assume

that the population distribution of perceived elapsed times is at least approximately

normal. Is this reasonable? The following graph gives a boxplot of the data:

[Boxplot of perceived elapsed time; horizontal axis marked from 40 to 70]

Although the boxplot is not perfectly symmetric, it does not appear to be too

skewed and there are no outliers, so we judge the use of the t test to be reasonable.

7. Computations: n = 20, x̄ = 59.30, and s = 9.84, so

$$t = \frac{59.30 - 45}{9.84/\sqrt{20}} = \frac{14.30}{2.20} = 6.50$$

8. P-value: This is an upper-tailed test (the inequality in Ha is “greater than”), so the P-value is the area to the right of the computed t value. Because df = 20 − 1 = 19, we can use the df = 19 column of Appendix Table 4 to find the P-value. With t = 6.50, we obtain P-value = area to the right of 6.50 ≈ 0 (because 6.50 is greater than 4.0, the largest tabulated value).

9. Conclusion: Because P-value ≤ α, we reject H0 at the .05 level of significance. There is virtually no chance of seeing a sample mean (and hence a t value) this extreme as a result of just chance variation when H0 is true. There is convincing evidence that the mean perceived time elapsed is greater than the actual time elapsed of 45 seconds.

This paper also looked at perception of elapsed time for a sample of nonsmokers and

for a sample of smokers who had not abstained from smoking. The investigators found

that the null hypothesis of μ = 45 could not be rejected for either of these groups.
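The summary quantities and test statistic of Example 10.13 can be checked directly from the 20 observations. The sketch below uses Python's `statistics` module. The outlier screen is the usual 1.5 × IQR boxplot rule, which is an assumption about how the boxplot was drawn, and quartile conventions differ between software packages, so the fences may not match the book's plot exactly.

```python
import math
import statistics

# Perceived elapsed times (seconds) from Example 10.13.
times = [69, 65, 72, 73, 59, 55, 39, 52, 67, 57,
         56, 50, 70, 47, 56, 45, 70, 64, 67, 53]

n = len(times)
x_bar = statistics.mean(times)
s = statistics.stdev(times)              # sample standard deviation (n - 1 divisor)
t = (x_bar - 45) / (s / math.sqrt(n))    # hypothesized value is 45 seconds

# Boxplot-style outlier screen using the 1.5 * IQR rule.
q1, _, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1
outliers = [x for x in times if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"n = {n}, mean = {x_bar:.2f}, s = {s:.2f}, t = {t:.2f}")
print("observations flagged as outliers:", outliers)
```

An empty list of flagged observations agrees with the remark in step 6 that the boxplot shows no outliers.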

EXAMPLE 10.14


Goofing Off at Work

A growing concern of employers is time spent in activities like surfing the Internet

and e-mailing friends during work hours. The San Luis Obispo Tribune summarized

the findings from a survey of a large sample of workers in an article that ran under the

headline “Who Goofs Off 2 Hours a Day? Most Workers, Survey Says” (August 3,


2006). Suppose that the CEO of a large company wants to determine whether the

average amount of wasted time during an 8-hour work day for employees of her company is less than the reported 120 minutes. Each person in a random sample of 10

employees was contacted and asked about daily wasted time at work. (Participants

would probably have to be guaranteed anonymity to obtain truthful responses!) The

resulting data are the following:

108 112 117 130 111 131 113 113 105 128

Summary quantities are n = 10, x̄ = 116.80, and s = 9.45.

Do these data provide evidence that the mean wasted time for this company is less than 120 minutes? To answer this question, let’s carry out a hypothesis test with α = .05.

1. μ = mean daily wasted time for employees of this company

2. H0: μ = 120

3. Ha: μ < 120

4. α = .05

5. Test statistic:

$$t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} = \frac{\bar{x} - 120}{s/\sqrt{n}}$$

6. This test requires a random sample and either a large sample or a normal population distribution. The given sample was a random sample of employees. Because

the sample size is small, we must be willing to assume that the population distribution of times is at least approximately normal. The accompanying normal probability plot appears to be reasonably straight, and although the normal probability

plot and the boxplot reveal some skewness in the sample, there are no outliers.

[Boxplot of wasted time and normal probability plot of normal score versus wasted time; both wasted-time axes run from 105 to 130. Accompanying Minitab output: Correlation of Time and Normal Score = 0.943]


Also, the correlation between the expected normal scores and the observed data for this sample is .943, which is well above the critical r value for n = 10 of .880 (see Chapter 5 for critical r values). Based on these observations, it is plausible that the population distribution is approximately normal, so we proceed with the t test.

7. Test statistic:

$$t = \frac{116.80 - 120}{9.45/\sqrt{10}} = -1.07$$

8. From the df = 9 column of Appendix Table 4 and by rounding the test statistic value to −1.1, we get

P-value = area to the left of −1.1 = area to the right of 1.1 = .150

as shown:

[Figure: t curve with df = 9, with the area .150 to the right of 1.1 marked]

9. Because the P-value > α, we fail to reject H0. There is not sufficient evidence to conclude that the mean wasted time per 8-hour work day for employees at this company is less than 120 minutes.

Minitab could also have been used to carry out the test, as shown in the output below.

One-Sample T: Wasted Time

Test of mu = 120 vs < 120

| Variable | N | Mean | StDev | SE Mean | 95% Upper Bound | T | P |
|---|---|---|---|---|---|---|---|
| Wasted Time | 10 | 116.800 | 9.449 | 2.988 | 122.278 | –1.07 | 0.156 |

Although we had to round the computed t value to −1.1 to use Appendix Table 4, Minitab was able to compute the P-value corresponding to the actual value of the test statistic, which was P-value = 0.156.
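Most of the Minitab output can be reproduced from the raw data without rounding t. The sketch below is one way to do so with the Python standard library. The helper `area_left_of` is a hypothetical name, and it approximates the lower-tail area by Simpson's rule, which is not necessarily how Minitab computes it. The 95% upper bound is omitted because it requires a t critical value.

```python
import math
import statistics

def area_left_of(t, df, steps=2000):
    """Approximate area under the t curve (df degrees of freedom) to the left of t."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h)
                                    for i in range(1, steps))
    beyond = 0.5 - total * h / 3       # one-tail area beyond |t|
    return beyond if t <= 0 else 1.0 - beyond

# Daily wasted time (minutes) from Example 10.14.
wasted = [108, 112, 117, 130, 111, 131, 113, 113, 105, 128]

n = len(wasted)
mean = statistics.mean(wasted)
sd = statistics.stdev(wasted)
se = sd / math.sqrt(n)                 # "SE Mean" in the Minitab output
t = (mean - 120) / se
p = area_left_of(t, n - 1)             # lower-tailed test, df = 9

print(f"N = {n}  Mean = {mean:.3f}  StDev = {sd:.3f}  "
      f"SE Mean = {se:.3f}  T = {t:.2f}  P = {p:.3f}")
```

Working with the unrounded t value is what separates the software P-value of 0.156 from the table-based .150.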

EXAMPLE 10.15

Cricket Love

The article “Well-Fed Crickets Bowl Maidens Over” (Nature Science Update, February 11, 1999) reported that female field crickets are attracted to males that have high

chirp rates and hypothesized that chirp rate is related to nutritional status. The usual

chirp rate for male field crickets was reported to vary around a mean of 60 chirps per

second. To investigate whether chirp rate was related to nutritional status, investigators fed male crickets a high protein diet for 8 days, after which chirp rate was measured. The mean chirp rate for the crickets on the high protein diet was reported to

be 109 chirps per second. Is this convincing evidence that the mean chirp rate for

crickets on a high protein diet is greater than 60 (which would then imply an advantage in attracting the ladies)? Suppose that the sample size and sample standard deviation are n = 32 and s = 40. Let's test the relevant hypotheses with α = .01.


1. μ = mean chirp rate for crickets on a high protein diet

2. H0: μ = 60

3. Ha: μ > 60

4. α = .01

5. Test statistic:

$$t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} = \frac{\bar{x} - 60}{s/\sqrt{n}}$$

6. This test requires a random sample and either a large sample or a normal population distribution. Because the sample size is large (n = 32), it is reasonable to

proceed with the t test as long as we are willing to consider the 32 male field

crickets in this study as if they were a random sample from the population of

male field crickets.

7. Test statistic:

$$t = \frac{109 - 60}{40/\sqrt{32}} = \frac{49}{7.07} = 6.93$$

8. This is an upper-tailed test, so the P-value is the area under the t curve with df = 31 and to the right of 6.93. From Appendix Table 4, P-value ≈ 0.

9. Because P-value ≈ 0, which is less than the significance level, α, we reject H0. There is convincing evidence that the mean chirp rate is higher for male field crickets that eat a high protein diet.
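Because 6.93 lies beyond the largest tabulated value, the table can only report P-value ≈ 0. As a rough check, the sketch below computes t from the summary values and uses the z curve (the df = ∞ column) as a stand-in for the t curve with 31 df. That substitution is an assumption of this sketch: the z curve has lighter tails than a t curve, so it understates the t tail area somewhat, although both areas are far below α = .01 here.

```python
import math
from statistics import NormalDist

# Summary values from Example 10.15.
n, x_bar, s, mu_0 = 32, 109, 40, 60

t = (x_bar - mu_0) / (s / math.sqrt(n))

# Upper-tail area under the z curve, standing in for the t curve with df = 31.
approx_p = 1 - NormalDist().cdf(t)

print(f"t = {t:.2f}")
print(f"approximate upper-tail area = {approx_p:.1e}")
print("reject H0 at alpha = .01:", approx_p <= 0.01)
```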

Statistical Versus Practical Significance

Carrying out a hypothesis test amounts to deciding whether the value obtained

for the test statistic could plausibly have resulted when H0 is true. When the

value of the test statistic leads to rejection of H0, it is customary to say that the

result is statistically significant at the chosen significance level α. The finding of

statistical significance means that, in the investigator’s opinion, the observed deviation from what was expected under H0 cannot reasonably be attributed to only

chance variation. However, statistical significance is not the same as concluding

that the true situation differs from what the null hypothesis states in any practical

sense. That is, even after H0 has been rejected, the data may suggest that there is

no practical difference between the actual value of the population characteristic

and what the null hypothesis states that value to be. This is illustrated in Example

10.16.

EXAMPLE 10.16

“Significant” but Unimpressive Test Score Improvement

Let μ denote the average score on a standardized test for all children in a certain region of the United States. The average score for all children in the United States is 100. Regional education authorities are interested in testing H0: μ = 100 versus Ha: μ > 100 using a significance level of .001. A sample of 2500 children resulted in the values n = 2500, x̄ = 101.0, and s = 15.0. Then

$$t = \frac{101.0 - 100}{15/\sqrt{2500}} = 3.33$$


This is an upper-tailed test, so (using the z column of Appendix Table 4 because df = 2499) P-value = area to the right of 3.33 ≈ .000. Because P-value < .001, we reject H0. There is evidence that the mean score for this region is greater than 100.

However, with n = 2500, the point estimate x̄ = 101.0 is almost surely very close to the true value of μ. Therefore, it looks as though H0 was rejected because μ ≈ 101 rather than 100. And, from a practical point of view, a 1-point difference is most likely of no practical importance.
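To see how much of the result is driven by the sample size, the sketch below recomputes t for the same 1-point difference and the same s = 15 at several sample sizes. Only n = 2500 comes from Example 10.16; the smaller sample sizes are hypothetical values chosen for comparison. The P-value for n = 2500 uses the z curve, as the example does.

```python
import math
from statistics import NormalDist

x_bar, mu_0, s = 101.0, 100.0, 15.0

def t_for(n):
    """t statistic for the observed 1-point difference at sample size n."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Only n = 2500 is from the example; 25, 100, and 400 are hypothetical.
for n in (25, 100, 400, 2500):
    print(f"n = {n:4d}   t = {t_for(n):.2f}")

# Upper-tail area under the z curve for the actual sample (df = 2499).
p_value = 1 - NormalDist().cdf(t_for(2500))
print(f"P-value for n = 2500: {p_value:.4f}")
```

The difference x̄ − 100 is identical in every row. Only the standard error s/√n shrinks as n grows, which is why a practically small difference can still be statistically significant.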

EXERCISES 10.42 - 10.58

10.42 Give as much information as you can about the P-value of a t test in each of the following situations:

a. Upper-tailed test, df = 8, t = 2.0

b. Upper-tailed test, n = 14, t = 3.2

c. Lower-tailed test, df = 10, t = −2.4

d. Lower-tailed test, n = 22, t = −4.2

e. Two-tailed test, df = 15, t = −1.6

f. Two-tailed test, n = 16, t = 1.6

g. Two-tailed test, n = 16, t = 6.3

10.43 Give as much information as you can about the P-value of a t test in each of the following situations:

a. Two-tailed test, df = 9, t = 0.73

b. Upper-tailed test, df = 10, t = −0.5

c. Lower-tailed test, n = 20, t = −2.1

d. Lower-tailed test, n = 20, t = −5.1

e. Two-tailed test, n = 40, t = 1.7

10.44 Paint used to paint lines on roads must reflect enough light to be clearly visible at night. Let μ denote the mean reflectometer reading for a new type of paint under consideration. A test of H0: μ = 20 versus Ha: μ > 20 based on a sample of 15 observations gave t = 3.2. What conclusion is appropriate at each of the following significance levels?

a. α = .05

b. α = .01

c. α = .001

10.45 A certain pen has been designed so that true average writing lifetime under controlled conditions (involving the use of a writing machine) is at least 10 hours. A random sample of 18 pens is selected, the writing lifetime of each is determined, and a normal probability plot of the resulting data supports the use of a one-sample t test. The relevant hypotheses are H0: μ = 10 versus Ha: μ < 10.

a. If t = −2.3 and α = .05 is selected, what conclusion is appropriate?

b. If t = −1.83 and α = .01 is selected, what conclusion is appropriate?

c. If t = 0.47, what conclusion is appropriate?

10.46 The true average diameter of ball bearings of a certain type is supposed to be 0.5 inch. What conclusion is appropriate when testing H0: μ = 0.5 versus Ha: μ ≠ 0.5 inch in each of the following situations:

a. n = 13, t = 1.6, α = .05

b. n = 13, t = −1.6, α = .05

c. n = 25, t = −2.6, α = .01

d. n = 25, t = −3.6

10.47 The paper “Playing Active Video Games Increases Energy Expenditure in Children” (Pediatrics

[2009]: 534–539) describes an interesting investigation

of the possible cardiovascular benefits of active video

games. Mean heart rate for healthy boys age 10 to 13 after

walking on a treadmill at 2.6 km/hour for 6 minutes is

98 beats per minute (bpm). For each of 14 boys, heart

rate was measured after 15 minutes of playing Wii Bowling. The resulting sample mean and standard deviation

were 101 bpm and 15 bpm, respectively. For purposes of

this exercise, assume that it is reasonable to regard the

sample of boys as representative of boys age 10 to 13 and

that the distribution of heart rates after 15 minutes of Wii

Bowling is approximately normal.

a. Does the sample provide convincing evidence that

the mean heart rate after 15 minutes of Wii Bowling

is different from the known mean heart rate after

6 minutes walking on the treadmill? Carry out a

hypothesis test using α = .01.

b. The known resting mean heart rate for boys in this age

group is 66 bpm. Is there convincing evidence that the

mean heart rate after Wii Bowling for 15 minutes is

higher than the known mean resting heart rate for boys

of this age? Use α = .01.


c. Based on the outcomes of the tests in Parts (a) and

(b), write a paragraph comparing the benefits of

treadmill walking and Wii Bowling in terms of raising heart rate over the resting heart rate.

10.48 A study of fast-food intake is described in the

paper “What People Buy From Fast-Food Restaurants”

(Obesity [2009]: 1369–1374). Adult customers at three

hamburger chains (McDonald’s, Burger King, and

Wendy’s) at lunchtime in New York City were approached as they entered the restaurant and asked to

provide their receipt when exiting. The receipts were

then used to determine what was purchased and the

number of calories consumed was determined. In all,

3857 people participated in the study. The sample mean

number of calories consumed was 857 and the sample

standard deviation was 677.

a. The sample standard deviation is quite large. What

does this tell you about number of calories consumed in a hamburger-chain lunchtime fast-food

purchase in New York City?

b. Given the values of the sample mean and standard

deviation and the fact that the number of calories

consumed can’t be negative, explain why it is not

reasonable to assume that the distribution of calories

consumed is normal.

c. Based on a recommended daily intake of 2000 calories, the online Healthy Dining Finder (www.healthydiningfinder.com) recommends a target of

750 calories for lunch. Assuming that it is reasonable

to regard the sample of 3857 fast-food purchases as

representative of all hamburger-chain lunchtime

purchases in New York City, carry out a hypothesis

test to determine if the sample provides convincing

evidence that the mean number of calories in a New

York City hamburger-chain lunchtime purchase is

greater than the lunch recommendation of 750 calories. Use α = .01.

d. Would it be reasonable to generalize the conclusion

of the test in Part (c) to the lunchtime fast-food

purchases of all adult Americans? Explain why or

why not.

e. Explain why it is better to use the customer receipt

to determine what was ordered rather than just asking a customer leaving the restaurant what he or she

purchased.

f. Do you think that asking a customer to provide his

or her receipt before they ordered could have introduced a potential bias? Explain.


10.49 The report “Highest Paying Jobs for 2009–10

of Colleges and Employers, February 2010) states that

the mean yearly salary offer for students graduating with

a degree in accounting in 2010 is \$48,722. Suppose that

a random sample of 50 accounting graduates at a large

university who received job offers resulted in a mean offer of \$49,850 and a standard deviation of \$3300. Do

the sample data provide strong support for the claim that

the mean salary offer for accounting graduates of this

university is higher than the 2010 national average of

\$48,722? Test the relevant hypotheses using α = .05.

10.50 The Economist collects data each year on the price of a Big Mac in various countries around the world. The price of a Big Mac for a sample of McDonald’s restaurants in Europe in May 2009 resulted in the following Big Mac prices (after conversion to U.S. dollars):

3.80 5.89 4.92 3.88 2.65 5.57 6.39 3.24

The mean price of a Big Mac in the U.S. in May 2009 was \$3.57. For purposes of this exercise, assume it is reasonable to regard the sample as representative of European McDonald’s restaurants. Does the sample provide convincing evidence that the mean May 2009 price of a Big Mac in Europe is greater than the reported U.S. price? Test the relevant hypotheses using α = .05.

10.51 A credit bureau analysis of undergraduate students’ credit records found that the average number of

credit cards in an undergraduate’s wallet was 4.09 (“Undergraduate Students and Credit Cards in 2004,”

Nellie Mae, May 2005). It was also reported that in a

random sample of 132 undergraduates, the sample mean

number of credit cards that the students said they carried

was 2.6. The sample standard deviation was not reported, but for purposes of this exercise, suppose that it

was 1.2. Is there convincing evidence that the mean

number of credit cards that undergraduates report carrying is less than the credit bureau’s figure of 4.09?

10.52 Medical research has shown that repeated wrist extension beyond 20 degrees increases the risk of wrist and hand injuries. Each of 24 students at Cornell University used a proposed new computer mouse design, and while using the mouse, each student’s wrist extension was recorded. Data consistent with summary values given in the paper “Comparative Study of Two Computer Mouse Designs” (Cornell Human Factors Laboratory Technical Report RP7992) are given. Use these data to test the hypothesis that the mean wrist extension for people using this new mouse design is greater than 20 degrees. Are any assumptions required in order for it to be appropriate to generalize the results of your test to the population of Cornell students? To the population of all university students?

27 28 24 26 27 25 25 24 24 24 25 28
22 25 24 28 27 26 31 25 28 27 27 25

10.53 The international polling organization Ipsos reported data from a survey of 2000 randomly selected Canadians who carry debit cards (Canadian Account Habits

Survey, July 24, 2006). Participants in this survey were

asked what they considered the minimum purchase

amount for which it would be acceptable to use a debit

card. Suppose that the sample mean and standard deviation were \$9.15 and \$7.60, respectively. (These values are

consistent with a histogram of the sample data that appears in the report.) Do these data provide convincing evidence that the mean minimum purchase amount for

which Canadians consider the use of a debit card to be

appropriate is less than \$10? Carry out a hypothesis test

with a significance level of .01.

10.54 A comprehensive study conducted by the National Institute of Child Health and Human Development tracked more than 1000 children from an early age

through elementary school (New York Times, November 1, 2005). The study concluded that children who

spent more than 30 hours a week in child care before

entering school tended to score higher in math and reading when they were in the third grade. The researchers

cautioned that the findings should not be a cause for

alarm because the effects of child care were found to be

small. Explain how the difference between the sample

mean math score for third graders who spent long hours

in child care and the known overall mean for third graders could be small but the researchers could still reach the

conclusion that the mean for the child care group is significantly higher than the overall mean for third graders.

10.55 In a study of computer use, 1000 randomly selected [...] they spend using the Internet in a typical week (Ipsos Reid, August 9, 2005). The mean of the sample observations was 12.7 hours.

a. The sample standard deviation was not reported, but suppose that it was 5 hours. Carry out a hypothesis test with a significance level of .05 to decide if there is convincing evidence that the mean time spent using the Internet by Canadians is greater than 12.5 hours.

b. Now suppose that the sample standard deviation was 2 hours. Carry out a hypothesis test with a significance level of .05 to decide if there is convincing evidence that the mean time spent using the Internet by Canadians is greater than 12.5 hours.

c. Explain why the null hypothesis was rejected in the test of Part (b) but not in the test of Part (a).

10.56 The paper titled “Music for Pain Relief” (The Cochrane Database of Systematic Reviews, April 19, 2006) concluded, based on a review of 51 studies of the effect of music on pain intensity, that “Listening to music reduces pain intensity levels . . . However, the magnitude of these positive effects is small, the clinical relevance of music for pain relief in clinical practice is unclear.” Are the authors of this paper claiming that the pain reduction attributable to listening to music is not statistically significant, not practically significant, or neither statistically nor practically significant? Explain.

10.57 Many consumers pay careful attention to stated nutritional contents on packaged foods when making purchases. It is therefore important that the information on packages be accurate. A random sample of n = 12 frozen dinners of a certain type was selected from production during a particular period, and the calorie content of each one was determined. (This determination entails destroying the product, so a census would certainly not be desirable!) Here are the resulting observations, along with a boxplot and normal probability plot:

255 244 239 242 265 245 259 248
225 226 251 233

[Figure: boxplot of Calories (axis from 225 to 265) and normal probability plot of Calories versus Normal score]



a. Is it reasonable to test hypotheses about mean calorie content μ by using a t test? Explain why or why not.

b. The stated calorie content is 240. Does the boxplot suggest that true average content differs from the stated value?

c. Carry out a formal test of the hypotheses suggested in Part (b).

10.58 Much concern has been expressed regarding the practice of using nitrates as meat preservatives. In one study involving possible effects of these chemicals, bacteria cultures were grown in a medium containing nitrates. The rate of uptake of radio-labeled amino acid (in dpm, disintegrations per minute) was then determined for each culture, yielding the following observations:

7251 6871 9632 6866 9094 5849 8957 7978
7064 7494 7883 8178 7523 8724 7468

Suppose that it is known that the mean rate of uptake for cultures without nitrates is 8000. Do the data suggest that the addition of nitrates results in a decrease in the mean rate of uptake? Test the appropriate hypotheses using a significance level of .10.

10.5 Power and Probability of Type II Error

In this chapter, we have introduced test procedures for testing hypotheses about population characteristics, such as μ and p. What characterizes a “good” test procedure? It makes sense to think that a good test procedure is one that has both a small probability of rejecting H0 when it is true (a Type I error) and a high probability of rejecting H0 when it is false. The test procedures presented in this chapter allow us to directly control the probability of rejecting a true H0 by our choice of the significance level α. But what about the probability of rejecting H0 when it is false? As we will see, several factors influence this probability. Let’s begin by considering an example.

Suppose that the student body president at a university is interested in studying the amount of money that students spend on textbooks each semester. The director of the financial aid office believes that the average amount spent on books is $500 per semester and uses this figure to determine the amount of financial aid for which a student is eligible. The student body president plans to ask each individual in a random sample of students how much he or she spent on books this semester and has decided to use the resulting data to test

H0: μ = 500   versus   Ha: μ > 500

using a significance level of .05. If the true mean is 500 (or less than 500), the correct decision is to fail to reject the null hypothesis. Incorrectly rejecting the null hypothesis is a Type I error. On the other hand, if the true mean is 525 or 505 or even 501, the correct decision is to reject the null hypothesis. Not rejecting the null hypothesis is a Type II error. How likely is it that the null hypothesis will in fact be rejected?
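One way to get a feel for the two kinds of error is to simulate the textbook-spending study many times. The sketch below is only an illustration under stated assumptions: the population standard deviation (85), the sample size (50), and the normal shape of the population are hypothetical values that do not come from the example, and a z test with known standard deviation stands in for the t test to keep the code short. The function name `rejection_rate` is likewise made up for this sketch.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=500.0, sigma=85.0, n=50,
                   alpha=0.05, reps=4000, seed=1):
    """Estimate how often an upper-tailed z test rejects H0: mu = mu0.

    sigma and n are hypothetical values chosen for illustration only.
    """
    rng = random.Random(seed)                 # fixed seed for repeatable results
    z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value
    se = sigma / n ** 0.5                     # standard error of the sample mean
    rejections = 0
    for _ in range(reps):
        # Draw one random sample from a normal population with the given true mean
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if z > z_crit:
            rejections += 1
    return rejections / reps


# H0 true: every rejection is a Type I error
type_i_rate = rejection_rate(true_mean=500.0)
# H0 false: every rejection is a correct decision
power_at_525 = rejection_rate(true_mean=525.0)
# H0 false: every failure to reject is a Type II error
type_ii_rate_at_525 = 1 - power_at_525

print(f"Estimated P(Type I error) when mean is 500:  {type_i_rate:.3f}")
print(f"Estimated P(reject H0) when mean is 525:     {power_at_525:.3f}")
print(f"Estimated P(Type II error) when mean is 525: {type_ii_rate_at_525:.3f}")
```

When the true mean is 500, the estimated rejection rate should land near the chosen significance level of .05. When the true mean is 525, the rejection rate depends on the assumed standard deviation and sample size, which is why those two values are flagged as assumptions.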

If the true mean is 501, the probability that we reject H0: μ = 500 is not very great. This is because when we carry out the test, we are essentially looking at the sample mean and asking, Does this look like what we would expect to see if the population mean were 500? As illustrated in Figure 10.4, if the true mean is greater than but very close to 500, chances are that the sample mean will look pretty much like what we would expect to see if the population mean were 500, and we will be unconvinced that the null hypothesis should be rejected. If the true mean is 525, it is less likely that the sample will be mistaken for a sample from a population with mean 500; sample means will tend to cluster around 525, and so it is more likely that we will correctly reject H0. If the true mean is 550, rejection of H0 is even more likely.
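The pattern described above (rejection becomes more likely as the true mean moves away from 500) can be made concrete with a short calculation. The following is a rough sketch, not the text's own procedure: it assumes a known population standard deviation of 85 and a sample size of 50, both hypothetical, and uses an upper-tailed z test in place of the t test so that the rejection probability has a simple closed form. The function name `power_upper_tailed_z` is invented for this illustration.

```python
from statistics import NormalDist


def power_upper_tailed_z(true_mean, mu0=500.0, sigma=85.0, n=50, alpha=0.05):
    """Probability that an upper-tailed z test rejects H0: mu = mu0.

    sigma and n are hypothetical values chosen for illustration only.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    se = sigma / n ** 0.5                    # standard error of the sample mean
    shift = (true_mean - mu0) / se           # how far the true mean sits from mu0, in SE units
    # Under the true mean, z is normal with mean `shift` and standard deviation 1
    return 1 - std_normal.cdf(z_crit - shift)


# Rejection probability for the true means discussed in the text
powers = {m: power_upper_tailed_z(m) for m in (500, 501, 525, 550)}
for m, p in powers.items():
    print(f"true mean {m}: P(reject H0) = {p:.3f}")
```

Under these assumptions, the rejection probability at a true mean of 500 equals the significance level, is only slightly higher at 501, and climbs steadily at 525 and 550. Different choices of standard deviation or sample size would change the specific values but not the ordering.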
