10.4 Hypothesis Tests for a Population Mean
Because it is rarely the case that σ, the population standard deviation, is known, we
focus our attention on the test procedure for the case in which σ is unknown.
When testing a hypothesis about a population mean, the null hypothesis specifies
a particular hypothesized value for μ, specifically, H0: μ = hypothesized value. The
alternative hypothesis has one of the following three forms, depending on the research
question being addressed:
Ha: μ > hypothesized value
Ha: μ < hypothesized value
Ha: μ ≠ hypothesized value
If n is large or if the population distribution is approximately normal, the test
statistic
$$ t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} $$
can be used. For example, if the null hypothesis to be tested is H0: μ = 100, the test
statistic becomes
$$ t = \frac{\bar{x} - 100}{s/\sqrt{n}} $$
Consider the alternative hypothesis Ha: μ > 100, and suppose that a sample of
size n = 24 gives x̄ = 104.20 and s = 8.23. The resulting test statistic value is
$$ t = \frac{104.20 - 100}{8.23/\sqrt{24}} = \frac{4.20}{1.6799} = 2.50 $$
Because this is an upper-tailed test, if the test statistic had been z rather than t, the
P-value would be the area under the z curve to the right of 2.50. With a t statistic,
the P-value is the area under an appropriate t curve (here with df = 24 − 1 = 23) to
the right of 2.50. Appendix Table 4 is a tabulation of t curve tail areas. Each column
of the table is for a different number of degrees of freedom: 1, 2, 3, . . . , 30, 35, 40,
60, 120, and a last column for df = ∞, which is the same as for the z curve. The table
gives the area under each t curve to the right of values ranging from 0.0 to 4.0 in
increments of 0.1. Part of this table appears in Figure 10.3. For example,
area under the 23-df t curve to the right of 2.5 = .010
= P-value for an upper-tailed t test
Suppose that t = −2.7 for a lower-tailed test based on 23 df. Then, because each
t curve is symmetric about 0,
P-value = area to the left of −2.7 = area to the right of 2.7 = .006
As is the case for z tests, we double the tail area to obtain the P-value for two-tailed t
tests. Thus, if t = 2.6 or if t = −2.6 for a two-tailed t test with 23 df, then
P-value = 2(.008) = .016
Once past 30 df, the tail areas change very little, so the last column (∞) in Appendix
Table 4 provides a good approximation.
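The tail areas quoted above can also be computed directly instead of being read from Appendix Table 4. The following Python sketch is illustrative only: the function names `t_density` and `upper_tail_area` are hypothetical, and the area is approximated by integrating the t density numerically with Simpson's rule, which is an assumption of this sketch and not necessarily how the table was produced.

```python
import math

def t_density(x, df):
    """Height of the t curve with df degrees of freedom at x."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_area(t, df, steps=2000):
    """Approximate area under the t curve to the right of t.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be
    even) and subtracts the result from 0.5. Symmetry about 0 handles
    negative values of t.
    """
    a = abs(t)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    right_of_abs_t = max(0.0, 0.5 - total * h / 3)
    return right_of_abs_t if t >= 0 else 1.0 - right_of_abs_t

# Sample from the text: n = 24, sample mean 104.20, s = 8.23, H0: mu = 100.
n, xbar, s, mu0 = 24, 104.20, 8.23, 100
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1

upper = upper_tail_area(2.5, df)            # upper-tailed test with t = 2.5
lower = upper_tail_area(2.7, df)            # lower-tailed test with t = -2.7, by symmetry
two_tailed = 2 * upper_tail_area(2.6, df)   # two-tailed test with t = 2.6 or -2.6

print(f"t = {t:.2f}, df = {df}")
print(f"upper-tailed P-value: {upper:.3f}")
print(f"lower-tailed P-value: {lower:.3f}")
print(f"two-tailed P-value:   {two_tailed:.3f}")
```

The three printed P-values should come out close to the tabled values .010, .006, and .016. Small differences in the last digit are possible because the table entries are rounded.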
The following two boxes show how the P-value is obtained as a t curve area and
give a general description of the test procedure.
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 10
Hypothesis Testing Using a Single Sample
FIGURE 10.3 Part of Appendix Table 4: t curve tail areas.

  t \ df    1    2   ...    22     23     24   ...   60   120
  0.0
  0.1
  ...
  2.5                ...   .010   .010   .010   ...
  2.6                ...   .008   .008   .008   ...
  2.7                ...   .007   .006   .006   ...
  2.8                ...   .005   .005   .005   ...
  ...
  4.0

(The entry .006 in the 2.7 row and df = 23 column is the area under the 23-df t curve to the right of 2.7.)
Finding P-Values for a t Test
1. Upper-tailed test: Ha: μ > hypothesized value
   P-value = area in upper tail (to the right of the calculated t) under the t curve for n − 1 df
2. Lower-tailed test: Ha: μ < hypothesized value
   P-value = area in lower tail (to the left of the calculated t) under the t curve for n − 1 df
3. Two-tailed test: Ha: μ ≠ hypothesized value
   P-value = sum of area in two tails (beyond the calculated t and −t) under the t curve for n − 1 df
Appendix Table 4 gives upper-tail t curve areas to the right of values 0.0, 0.1, . . . , 4.0. These areas are P-values for
upper-tailed tests and, by symmetry, also for lower-tailed tests. Doubling an area gives the P-value for a two-tailed
test.
The One-Sample t Test for a Population Mean
Null hypothesis: H0: μ = hypothesized value
Test statistic:
$$ t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} $$
Alternative Hypothesis:          P-Value:
Ha: μ > hypothesized value       Area to the right of calculated t under t curve with df = n − 1
Ha: μ < hypothesized value       Area to the left of calculated t under t curve with df = n − 1
Ha: μ ≠ hypothesized value       (1) 2(area to the right of t) if t is positive, or
                                 (2) 2(area to the left of t) if t is negative
Assumptions: 1. x̄ and s are the sample mean and sample standard deviation from a random sample.
2. The sample size is large (generally n ≥ 30) or the population distribution is at least approximately
normal.
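The procedure in the box can be sketched as a single function. This is a minimal illustration under stated assumptions, not a reference implementation: the name `one_sample_t_test`, the labels "greater", "less", and "two-sided", and the Simpson's-rule approximation of the t curve area are all choices made for this sketch.

```python
import math

def upper_tail_area(t, df, steps=2000):
    """Area under the t curve (df degrees of freedom) to the right of t >= 0,
    approximated by Simpson's rule applied to the t density."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 0.5 - total * h / 3)

def one_sample_t_test(xbar, s, n, mu0, alternative):
    """Return (t, df, p_value) for a test of H0: mu = mu0.

    alternative is "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    t = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    tail = upper_tail_area(abs(t), df)   # area beyond |t| in one tail
    if alternative == "greater":
        p = tail if t >= 0 else 1.0 - tail      # area to the right of t
    elif alternative == "less":
        p = tail if t <= 0 else 1.0 - tail      # area to the left of t
    elif alternative == "two-sided":
        p = 2.0 * tail                          # both tails
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t, df, p

# The n = 24 sample used earlier in this section, under each alternative.
t_stat, df, p_greater = one_sample_t_test(104.20, 8.23, 24, 100, "greater")
_, _, p_less = one_sample_t_test(104.20, 8.23, 24, 100, "less")
_, _, p_two = one_sample_t_test(104.20, 8.23, 24, 100, "two-sided")
print(f"t = {t_stat:.2f}, df = {df}")
print(f"P-values: greater {p_greater:.3f}, less {p_less:.3f}, two-sided {p_two:.3f}")
```

The function takes summary statistics, so checking the assumptions (random sample, and a large n or an approximately normal population) remains a separate step that the code does not perform.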
EXAMPLE 10.13
Time Stands Still (or So It Seems)
A study conducted by researchers at Pennsylvania State University investigated
whether time perception, an indication of a person’s ability to concentrate, is impaired
during nicotine withdrawal. The study results were presented in the paper “Smoking
Abstinence Impairs Time Estimation Accuracy in Cigarette Smokers” (Psychopharmacology Bulletin [2003]: 90–95). After a 24-hour smoking abstinence, 20 smokers
were asked to estimate how much time had passed during a 45-second period. Suppose
the resulting data on perceived elapsed time (in seconds) were as follows (these data are
artificial but are consistent with summary quantities given in the paper):
69 56 65 50 72 70 73 47 59 56
55 45 39 70 52 64 67 67 57 53
From these data, we obtain
n = 20   x̄ = 59.30   s = 9.84
The researchers wanted to determine whether smoking abstinence had a negative
impact on time perception, causing elapsed time to be overestimated. With μ representing the mean perceived elapsed time for smokers who have abstained from smoking for 24 hours, we can answer this question by testing
H0: μ = 45 (no consistent tendency to overestimate the time elapsed)
versus
Ha: μ > 45 (tendency for elapsed time to be overestimated)
The null hypothesis is rejected only if there is convincing evidence that μ > 45. The
observed value, 59.30, is certainly larger than 45, but can a sample mean as large
as this be plausibly explained by chance variation from one sample to another when
μ = 45? To answer this question, we carry out a hypothesis test with a significance
level of .05 using the nine-step procedure described in Section 10.3.
1. Population characteristic of interest:
μ = mean perceived elapsed time for smokers who have abstained from smoking
for 24 hours
2. Null hypothesis: H0: μ = 45
3. Alternative hypothesis: Ha: μ > 45
4. Significance level: α = .05
5. Test statistic:
$$ t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} = \frac{\bar{x} - 45}{s/\sqrt{n}} $$
6. Assumptions: This test requires a random sample and either a large sample size or
a normal population distribution. The authors of the paper believed that it was
reasonable to consider this sample as representative of smokers in general, and if this
is the case, it is reasonable to regard it as if it were a random sample. Because the
sample size is only 20, for the t test to be appropriate, we must be willing to assume
that the population distribution of perceived elapsed times is at least approximately
normal. Is this reasonable? The following graph gives a boxplot of the data:
[Figure: boxplot of perceived elapsed time; horizontal axis runs from 40 to 70]
Although the boxplot is not perfectly symmetric, it does not appear to be too
skewed and there are no outliers, so we judge the use of the t test to be reasonable.
7. Computations: n = 20, x̄ = 59.30, and s = 9.84, so
$$ t = \frac{59.30 - 45}{9.84/\sqrt{20}} = \frac{14.30}{2.20} = 6.50 $$
8. P-value: This is an upper-tailed test (the inequality in Ha is “greater than”), so
the P-value is the area to the right of the computed t value. Because df = 20 −
1 = 19, we can use the df = 19 column of Appendix Table 4 to find the P-value.
With t = 6.50, we obtain P-value = area to the right of 6.50 ≈ 0 (because 6.50 is
greater than 4.0, the largest tabulated value).
9. Conclusion: Because P-value ≤ α, we reject H0 at the .05 level of significance.
There is virtually no chance of seeing a sample mean (and hence a t value) this
extreme as a result of just chance variation when H0 is true. There is convincing
evidence that the mean perceived time elapsed is greater than the actual time
elapsed of 45 seconds.
This paper also looked at perception of elapsed time for a sample of nonsmokers and
for a sample of smokers who had not abstained from smoking. The investigators found
that the null hypothesis of μ = 45 could not be rejected for either of these groups.
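The summary quantities and the test statistic in Example 10.13 can be recomputed from the 20 listed observations. The sketch below uses only Python's standard library. The variable names are illustrative, and the comparison with 4.0 simply mirrors the remark in Step 8 that the computed t lies beyond the largest tabulated value.

```python
import math
import statistics

# Perceived elapsed times (seconds) for the 20 smokers in Example 10.13.
times = [69, 56, 65, 50, 72, 70, 73, 47, 59, 56,
         55, 45, 39, 70, 52, 64, 67, 67, 57, 53]
mu0 = 45                          # hypothesized value under H0

n = len(times)
xbar = statistics.mean(times)     # sample mean
s = statistics.stdev(times)       # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)             # estimated standard deviation of the sample mean
t = (xbar - mu0) / se
df = n - 1

# Appendix Table 4 stops at t = 4.0, so a larger t means the upper-tail
# area is smaller than anything tabulated.
beyond_table = t > 4.0

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"t = {t:.2f} with df = {df}; beyond largest tabulated value: {beyond_table}")
```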
EXAMPLE 10.14
Goofing Off at Work
A growing concern of employers is time spent in activities like surfing the Internet
and e-mailing friends during work hours. The San Luis Obispo Tribune summarized
the findings from a survey of a large sample of workers in an article that ran under the
headline “Who Goofs Off 2 Hours a Day? Most Workers, Survey Says” (August 3,
2006). Suppose that the CEO of a large company wants to determine whether the
average amount of wasted time during an 8-hour work day for employees of her company is less than the reported 120 minutes. Each person in a random sample of 10
employees was contacted and asked about daily wasted time at work. (Participants
would probably have to be guaranteed anonymity to obtain truthful responses!) The
resulting data are the following:
108 112 117 130 111 131 113 113 105 128
Summary quantities are n = 10, x̄ = 116.80, and s = 9.45.
Do these data provide evidence that the mean wasted time for this company is
less than 120 minutes? To answer this question, let’s carry out a hypothesis test with
α = .05.
1. μ = mean daily wasted time for employees of this company
2. H0: μ = 120
3. Ha: μ < 120
4. α = .05
5. $$ t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} = \frac{\bar{x} - 120}{s/\sqrt{n}} $$
6. This test requires a random sample and either a large sample or a normal population distribution. The given sample was a random sample of employees. Because
the sample size is small, we must be willing to assume that the population distribution of times is at least approximately normal. The accompanying normal probability plot appears to be reasonably straight, and although the normal probability
plot and the boxplot reveal some skewness in the sample, there are no outliers.
[Figure: boxplot of wasted time and normal probability plot of normal score versus wasted time; wasted-time axes run from 105 to 130]
Correlations (Pearson)
Correlation of Time and Normal Score = 0.943
Also, the correlation between the expected normal scores and the observed data
for this sample is .943, which is well above the critical r value for n = 10 of .880 (see
Chapter 5 for critical r values). Based on these observations, it is plausible that the
population distribution is approximately normal, so we proceed with the t test.
7. Test statistic:
$$ t = \frac{116.80 - 120}{9.45/\sqrt{10}} = -1.07 $$
8. From the df = 9 column of Appendix Table 4 and by rounding the test statistic
value to −1.1, we get
P-value = area to the left of −1.1 = area to the right of 1.1 = .150
as shown:
[Figure: t curve with df = 9, with a shaded tail area of .150]
9. Because the P-value > α, we fail to reject H0. There is not sufficient evidence to
conclude that the mean wasted time per 8-hour work day for employees at this
company is less than 120 minutes.
Minitab could also have been used to carry out the test, as shown in the output
below.
One-Sample T: Wasted Time
Test of mu = 120 vs < 120
                                                95%
                                              Upper
Variable       N     Mean   StDev  SE Mean    Bound      T      P
Wasted Time   10  116.800   9.449    2.988  122.278  -1.07  0.156
Although we had to round the computed t value to −1.1 to use Appendix Table 4,
Minitab was able to compute the P-value corresponding to the actual value of the test
statistic, which was P-value = 0.156.
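The normality check and the test in Example 10.14 can be approximated without Minitab. The sketch below is a rough illustration under two assumptions that are not from the text: normal scores are taken as standard normal quantiles at the plotting positions (i − 0.375)/(n + 0.25), and the t curve area is obtained by Simpson's rule. Because software packages differ in how they compute normal scores and handle tied observations, the correlation may differ slightly from the reported .943.

```python
import math
from statistics import NormalDist, mean, stdev

wasted = [108, 112, 117, 130, 111, 131, 113, 113, 105, 128]
n = len(wasted)

# Normality check: correlate the ordered data with approximate normal scores.
# The plotting positions (i - 0.375) / (n + 0.25) are an assumption of this sketch.
ordered = sorted(wasted)
scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(ordered, scores)

def upper_tail_area(t, df, steps=2000):
    """Area under the t curve to the right of t >= 0 (Simpson's rule)."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return max(0.0, 0.5 - total * h / 3)

# Lower-tailed test of H0: mu = 120 versus Ha: mu < 120.
xbar, s = mean(wasted), stdev(wasted)
se = s / math.sqrt(n)
t = (xbar - 120) / se
tail = upper_tail_area(abs(t), n - 1)
p_value = tail if t <= 0 else 1.0 - tail   # area to the left of t

print(f"correlation with normal scores: {r:.3f}")
print(f"mean = {xbar:.3f}, s = {s:.3f}, SE = {se:.3f}")
print(f"t = {t:.2f}, P-value = {p_value:.3f}")
```

Working from the unrounded t value is what lets software report a P-value near 0.156, while the table lookup with t rounded to −1.1 gives .150.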
EXAMPLE 10.15
Cricket Love
The article “Well-Fed Crickets Bowl Maidens Over” (Nature Science Update, February 11, 1999) reported that female field crickets are attracted to males that have high
chirp rates and hypothesized that chirp rate is related to nutritional status. The usual
chirp rate for male field crickets was reported to vary around a mean of 60 chirps per
second. To investigate whether chirp rate was related to nutritional status, investigators fed male crickets a high protein diet for 8 days, after which chirp rate was measured. The mean chirp rate for the crickets on the high protein diet was reported to
be 109 chirps per second. Is this convincing evidence that the mean chirp rate for
crickets on a high protein diet is greater than 60 (which would then imply an advantage in attracting the ladies)? Suppose that the sample size and sample standard deviation are n = 32 and s = 40. Let's test the relevant hypotheses with α = .01.
1. μ = mean chirp rate for crickets on a high protein diet
2. H0: μ = 60
3. Ha: μ > 60
4. α = .01
5. $$ t = \frac{\bar{x} - \text{hypothesized value}}{s/\sqrt{n}} = \frac{\bar{x} - 60}{s/\sqrt{n}} $$
6. This test requires a random sample and either a large sample or a normal population distribution. Because the sample size is large (n = 32), it is reasonable to
proceed with the t test as long as we are willing to consider the 32 male field
crickets in this study as if they were a random sample from the population of
male field crickets.
7. Test statistic:
$$ t = \frac{109 - 60}{40/\sqrt{32}} = \frac{49}{7.07} = 6.93 $$
8. This is an upper-tailed test, so the P-value is the area under the t curve with
df = 31 and to the right of 6.93. From Appendix Table 4, P-value ≈ 0.
9. Because P-value ≈ 0, which is less than the significance level, α, we reject H0.
There is convincing evidence that the mean chirp rate is higher for male field
crickets that eat a high protein diet.
Statistical Versus Practical Significance
Carrying out a hypothesis test amounts to deciding whether the value obtained
for the test statistic could plausibly have resulted when H0 is true. When the
value of the test statistic leads to rejection of H0, it is customary to say that the
result is statistically significant at the chosen significance level α. The finding of
statistical significance means that, in the investigator’s opinion, the observed deviation from what was expected under H0 cannot reasonably be attributed to only
chance variation. However, statistical significance is not the same as concluding
that the true situation differs from what the null hypothesis states in any practical
sense. That is, even after H0 has been rejected, the data may suggest that there is
no practical difference between the actual value of the population characteristic
and what the null hypothesis states that value to be. This is illustrated in Example
10.16.
EXAMPLE 10.16
“Significant” but Unimpressive
Test Score Improvement
Let μ denote the average score on a standardized test for all children in a certain region of the United States. The average score for all children in the United States
is 100. Regional education authorities are interested in testing H0: μ = 100 versus
Ha: μ > 100 using a significance level of .001. A sample of 2500 children resulted in
the values n = 2500, x̄ = 101.0, and s = 15.0. Then
$$ t = \frac{101.0 - 100}{15/\sqrt{2500}} = 3.33 $$
This is an upper-tailed test, so (using the z column of Appendix Table 4 because df =
2499) P-value = area to the right of 3.33 ≈ .000. Because P-value < .001, we reject
H0. There is evidence that the mean score for this region is greater than 100.
However, with n = 2500, the point estimate x̄ = 101.0 is almost surely very close
to the true value of μ. Therefore, it looks as though H0 was rejected because μ ≈ 101
rather than 100. And, from a practical point of view, a 1-point difference is most
likely of no practical importance.
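A short calculation shows why a 1-point difference can be statistically significant here. The Python sketch below holds the sample mean and standard deviation from Example 10.16 fixed and varies only the sample size. The sample sizes other than 2500 are hypothetical values chosen for illustration.

```python
import math

mu0, xbar, s = 100, 101.0, 15.0     # values from Example 10.16

# Same 1-point difference, different (partly hypothetical) sample sizes.
t_by_n = {}
for n in (25, 100, 400, 2500):
    t_by_n[n] = (xbar - mu0) / (s / math.sqrt(n))

for n, t in t_by_n.items():
    print(f"n = {n:4d}: t = {t:.2f}")
```

The difference x̄ − 100 never changes, but the denominator s/√n shrinks as n grows, so the t value (and with it the evidence against H0) increases with the sample size alone.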
EXERCISES 10.42 - 10.58
10.42 Give as much information as you can about the
P-value of a t test in each of the following situations:
a. Upper-tailed test, df = 8, t = 2.0
b. Upper-tailed test, n = 14, t = 3.2
c. Lower-tailed test, df = 10, t = −2.4
d. Lower-tailed test, n = 22, t = −4.2
e. Two-tailed test, df = 15, t = −1.6
f. Two-tailed test, n = 16, t = 1.6
g. Two-tailed test, n = 16, t = 6.3
10.43 Give as much information as you can about the
P-value of a t test in each of the following situations:
a. Two-tailed test, df = 9, t = 0.73
b. Upper-tailed test, df = 10, t = −0.5
c. Lower-tailed test, n = 20, t = −2.1
d. Lower-tailed test, n = 20, t = −5.1
e. Two-tailed test, n = 40, t = 1.7
10.44 Paint used to paint lines on roads must reflect
enough light to be clearly visible at night. Let μ denote
the mean reflectometer reading for a new type of paint
under consideration. A test of H0: μ = 20 versus
Ha: μ > 20 based on a sample of 15 observations gave
t = 3.2. What conclusion is appropriate at each of the
following significance levels?
a. α = .05
b. α = .01
c. α = .001
10.45 A certain pen has been designed so that true
average writing lifetime under controlled conditions (involving the use of a writing machine) is at least 10 hours.
A random sample of 18 pens is selected, the writing lifetime of each is determined, and a normal probability plot
of the resulting data supports the use of a one-sample
t test. The relevant hypotheses are H0: μ = 10 versus
Ha: μ < 10.
a. If t = −2.3 and α = .05 is selected, what conclusion
is appropriate?
b. If t = −1.83 and α = .01 is selected, what conclusion is appropriate?
c. If t = 0.47, what conclusion is appropriate?
10.46 The true average diameter of ball bearings of a
certain type is supposed to be 0.5 inch. What conclusion
is appropriate when testing H0: μ = 0.5 versus Ha: μ ≠
0.5 inch in each of the following situations:
a. n = 13, t = 1.6, α = .05
b. n = 13, t = −1.6, α = .05
c. n = 25, t = −2.6, α = .01
d. n = 25, t = −3.6
10.47 The paper “Playing Active Video Games Increases Energy Expenditure in Children” (Pediatrics
[2009]: 534–539) describes an interesting investigation
of the possible cardiovascular benefits of active video
games. Mean heart rate for healthy boys age 10 to 13 after
walking on a treadmill at 2.6 km/hour for 6 minutes is
98 beats per minute (bpm). For each of 14 boys, heart
rate was measured after 15 minutes of playing Wii Bowling. The resulting sample mean and standard deviation
were 101 bpm and 15 bpm, respectively. For purposes of
this exercise, assume that it is reasonable to regard the
sample of boys as representative of boys age 10 to 13 and
that the distribution of heart rates after 15 minutes of Wii
Bowling is approximately normal.
a. Does the sample provide convincing evidence that
the mean heart rate after 15 minutes of Wii Bowling
is different from the known mean heart rate after
6 minutes walking on the treadmill? Carry out a
hypothesis test using α = .01.
b. The known resting mean heart rate for boys in this age
group is 66 bpm. Is there convincing evidence that the
mean heart rate after Wii Bowling for 15 minutes is
higher than the known mean resting heart rate for boys
of this age? Use α = .01.
c. Based on the outcomes of the tests in Parts (a) and
(b), write a paragraph comparing the benefits of
treadmill walking and Wii Bowling in terms of raising heart rate over the resting heart rate.
10.48 A study of fast-food intake is described in the
paper “What People Buy From Fast-Food Restaurants”
(Obesity [2009]: 1369–1374). Adult customers at three
hamburger chains (McDonald’s, Burger King, and
Wendy’s) at lunchtime in New York City were approached as they entered the restaurant and asked to
provide their receipt when exiting. The receipts were
then used to determine what was purchased and the
number of calories consumed was determined. In all,
3857 people participated in the study. The sample mean
number of calories consumed was 857 and the sample
standard deviation was 677.
a. The sample standard deviation is quite large. What
does this tell you about number of calories consumed in a hamburger-chain lunchtime fast-food
purchase in New York City?
b. Given the values of the sample mean and standard
deviation and the fact that the number of calories
consumed can’t be negative, explain why it is not
reasonable to assume that the distribution of calories
consumed is normal.
c. Based on a recommended daily intake of 2000 calories, the online Healthy Dining Finder (www.healthydiningfinder.com) recommends a target of
750 calories for lunch. Assuming that it is reasonable
to regard the sample of 3857 fast-food purchases as
representative of all hamburger-chain lunchtime
purchases in New York City, carry out a hypothesis
test to determine if the sample provides convincing
evidence that the mean number of calories in a New
York City hamburger-chain lunchtime purchase is
greater than the lunch recommendation of 750 calories. Use α = .01.
d. Would it be reasonable to generalize the conclusion
of the test in Part (c) to the lunchtime fast-food
purchases of all adult Americans? Explain why or
why not.
e. Explain why it is better to use the customer receipt
to determine what was ordered rather than just asking a customer leaving the restaurant what he or she
purchased.
f. Do you think that asking a customer to provide his
or her receipt before they ordered could have introduced a potential bias? Explain.
10.49 The report “Highest Paying Jobs for 2009–10
Bachelor’s Degree Graduates” (National Association
of Colleges and Employers, February 2010) states that
the mean yearly salary offer for students graduating with
a degree in accounting in 2010 is $48,722. Suppose that
a random sample of 50 accounting graduates at a large
university who received job offers resulted in a mean offer of $49,850 and a standard deviation of $3300. Do
the sample data provide strong support for the claim that
the mean salary offer for accounting graduates of this
university is higher than the 2010 national average of
$48,722? Test the relevant hypotheses using α = .05.
10.50 The Economist collects data each year on the
price of a Big Mac in various countries around the world.
The price of a Big Mac for a sample of McDonald’s restaurants in Europe in May 2009 resulted in the following Big Mac prices (after conversion to U.S. dollars):
3.80 5.89 4.92 3.88 2.65 5.57 6.39 3.24
The mean price of a Big Mac in the U.S. in May 2009
was $3.57. For purposes of this exercise, assume it is
reasonable to regard the sample as representative of European McDonald’s restaurants. Does the sample provide convincing evidence that the mean May 2009 price
of a Big Mac in Europe is greater than the reported U.S.
price? Test the relevant hypotheses using α = .05.
10.51 A credit bureau analysis of undergraduate students’ credit records found that the average number of
credit cards in an undergraduate’s wallet was 4.09 (“Undergraduate Students and Credit Cards in 2004,”
Nellie Mae, May 2005). It was also reported that in a
random sample of 132 undergraduates, the sample mean
number of credit cards that the students said they carried
was 2.6. The sample standard deviation was not reported, but for purposes of this exercise, suppose that it
was 1.2. Is there convincing evidence that the mean
number of credit cards that undergraduates report carrying is less than the credit bureau’s figure of 4.09?
10.52 Medical research has shown that repeated wrist
extension beyond 20 degrees increases the risk of wrist and
hand injuries. Each of 24 students at Cornell University
used a proposed new computer mouse design, and while
using the mouse, each student’s wrist extension was recorded. Data consistent with summary values given in the
paper “Comparative Study of Two Computer Mouse
Designs” (Cornell Human Factors Laboratory Technical Report RP7992) are given. Use these data to test the
hypothesis that the mean wrist extension for people using
this new mouse design is greater than 20 degrees. Are any
assumptions required in order for it to be appropriate to
generalize the results of your test to the population of Cornell students? To the population of all university students?
27 28 24 26 27 25 25 24 24 24 25 28
22 25 24 28 27 26 31 25 28 27 27 25
10.53 The international polling organization Ipsos reported data from a survey of 2000 randomly selected Canadians who carry debit cards (Canadian Account Habits
Survey, July 24, 2006). Participants in this survey were
asked what they considered the minimum purchase
amount for which it would be acceptable to use a debit
card. Suppose that the sample mean and standard deviation were $9.15 and $7.60, respectively. (These values are
consistent with a histogram of the sample data that appears in the report.) Do these data provide convincing evidence that the mean minimum purchase amount for
which Canadians consider the use of a debit card to be
appropriate is less than $10? Carry out a hypothesis test
with a significance level of .01.
10.54 A comprehensive study conducted by the National Institute of Child Health and Human Development tracked more than 1000 children from an early age
through elementary school (New York Times, November 1, 2005). The study concluded that children who
spent more than 30 hours a week in child care before
entering school tended to score higher in math and reading when they were in the third grade. The researchers
cautioned that the findings should not be a cause for
alarm because the effects of child care were found to be
small. Explain how the difference between the sample
mean math score for third graders who spent long hours
in child care and the known overall mean for third graders could be small but the researchers could still reach the
conclusion that the mean for the child care group is significantly higher than the overall mean for third graders.
10.55 In a study of computer use, 1000 randomly selected Canadian Internet users were asked how much time
they spend using the Internet in a typical week (Ipsos
Reid, August 9, 2005). The mean of the sample observations was 12.7 hours.
a. The sample standard deviation was not reported, but
suppose that it was 5 hours. Carry out a hypothesis
test with a significance level of .05 to decide if there is
convincing evidence that the mean time spent using
the Internet by Canadians is greater than 12.5 hours.
b. Now suppose that the sample standard deviation was
2 hours. Carry out a hypothesis test with a significance level of .05 to decide if there is convincing evidence that the mean time spent using the Internet by
Canadians is greater than 12.5 hours.
c. Explain why the null hypothesis was rejected in the
test of Part (b) but not in the test of Part (a).
10.56 The paper titled “Music for Pain Relief” (The
Cochrane Database of Systematic Reviews, April 19,
2006) concluded, based on a review of 51 studies of the
effect of music on pain intensity, that “Listening to music reduces pain intensity levels . . . However, the magnitude of these positive effects is small, the clinical relevance of music for pain relief in clinical practice is
unclear.” Are the authors of this paper claiming that the
pain reduction attributable to listening to music is not
statistically significant, not practically significant, or neither statistically nor practically significant? Explain.
10.57 Many consumers pay careful attention to
stated nutritional contents on packaged foods when making purchases. It is therefore important that the information on packages be accurate. A random sample of n = 12
frozen dinners of a certain type was selected from production during a particular period, and the calorie content of
each one was determined. (This determination entails destroying the product, so a census would certainly not be
desirable!) Here are the resulting observations, along with
a boxplot and normal probability plot:
255 244 239 242 265 245 259 248
225 226 251 233
[Figure: boxplot of calories and normal probability plot of calories versus normal score; calorie axes run from 225 to 265, normal scores from −1.5 to 1.5]
a. Is it reasonable to test hypotheses about mean calorie
content μ by using a t test? Explain why or why not.
b. The stated calorie content is 240. Does the boxplot
suggest that true average content differs from the
stated value? Explain your reasoning.
c. Carry out a formal test of the hypotheses suggested
in Part (b).
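The computation behind a two-tailed t test like the one requested in Part (c) of Exercise 10.57 can be sketched in a few lines. This is only an illustration, not the book's solution: it assumes the twelve calorie observations are the values listed in the code, takes the stated content of 240 as the hypothesized value, and replaces the lookup in Appendix Table 4 with a numerical integration of the t density. The helper names `t_pdf`, `t_upper_tail`, and `one_sample_t` are made up for this sketch.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the t curve with df degrees of freedom at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Area under the t curve to the right of t (t >= 0).

    Uses Simpson's rule on [0, t] and the symmetry of the t curve about 0.
    steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def one_sample_t(data, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


# Assumed data: the 12 observations of Exercise 10.57.
calories = [255, 244, 239, 242, 265, 245, 259, 248, 225, 226, 251, 233]

t_stat, df = one_sample_t(calories, mu0=240)
# Two-tailed alternative (Ha: mu differs from 240), so double the tail area.
p_two_sided = 2 * t_upper_tail(abs(t_stat), df)

print(f"t = {t_stat:.2f}, df = {df}, P-value = {p_two_sided:.3f}")
```

With these data the statistic comes out at about t = 1.21 on 11 df. The same `t_upper_tail` helper reproduces the table entries quoted earlier in the section, such as the area of about .010 to the right of 2.5 under the 23-df t curve.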
10.58 Much concern has been expressed regarding
the practice of using nitrates as meat preservatives. In one
study involving possible effects of these chemicals, bacteria cultures were grown in a medium containing nitrates.
The rate of uptake of radio-labeled amino acid (in dpm,
disintegrations per minute) was then determined for
each culture, yielding the following observations:
7251 6871 9632 6866 9094 5849 8957 7978
7064 7494 7883 8178 7523 8724 7468
Suppose that it is known that the mean rate of uptake for
cultures without nitrates is 8000. Do the data suggest
that the addition of nitrates results in a decrease in the
mean rate of uptake? Test the appropriate hypotheses
using a significance level of .10.
10.5 Power and Probability of Type II Error
In this chapter, we have introduced test procedures for testing hypotheses about
population characteristics, such as μ and p. What characterizes a “good” test procedure? It makes sense to think that a good test procedure is one that has both a small
probability of rejecting H0 when it is true (a Type I error) and a high probability of
rejecting H0 when it is false. The test procedures presented in this chapter allow us to
directly control the probability of rejecting a true H0 by our choice of the significance
level α. But what about the probability of rejecting H0 when it is false? As we will see,
several factors influence this probability. Let’s begin by considering an example.
Suppose that the student body president at a university is interested in studying
the amount of money that students spend on textbooks each semester. The director
of the financial aid office believes that the average amount spent on books is $500 per
semester and uses this figure to determine the amount of financial aid for which a
student is eligible. The student body president plans to ask each individual in a random sample of students how much he or she spent on books this semester and has
decided to use the resulting data to test
H0: μ = 500 versus Ha: μ > 500
using a significance level of .05. If the true mean is 500 (or less than 500), the correct
decision is to fail to reject the null hypothesis. Incorrectly rejecting the null hypothesis is a Type I error. On the other hand, if the true mean is 525 or 505 or even 501,
the correct decision is to reject the null hypothesis. Failing to reject the null hypothesis
in that case is a Type II error. How likely is it that the null hypothesis will in fact be rejected?
If the true mean is 501, the probability that we reject H0: μ = 500 is not very
large. This is because when we carry out the test, we are essentially looking at the
sample mean and asking, “Does this look like what we would expect to see if the population mean were 500?” As illustrated in Figure 10.4, if the true mean is greater than
but very close to 500, chances are that the sample mean will look pretty much like
what we would expect to see if the population mean were 500, and we will be unconvinced that the null hypothesis should be rejected. If the true mean is 525, it is less
likely that the sample will be mistaken for a sample from a population with mean
500; sample means will tend to cluster around 525, and so it is more likely that we
will correctly reject H0. If the true mean is 550, rejection of H0 is even more likely.
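To make this pattern concrete, the rejection probability can be computed for several possible true means. The sketch below is an illustration under assumptions that are not part of the example above: it treats the population standard deviation as known (σ = 85 is a made-up value), uses a hypothetical sample size of n = 50, and carries out an upper-tailed z test rather than a t test so that only the normal distribution is needed. The function name `rejection_probability` is likewise invented for this sketch.

```python
import math
from statistics import NormalDist


def rejection_probability(mu_true, mu0=500.0, sigma=85.0, n=50, alpha=0.05):
    """P(reject H0: mu = mu0 in favor of Ha: mu > mu0) when the true mean is mu_true.

    Assumes an upper-tailed z test with known sigma (sigma and n are
    hypothetical values chosen for illustration).
    """
    se = sigma / math.sqrt(n)                  # standard deviation of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha)   # critical value, about 1.645 for alpha = .05
    cutoff = mu0 + z_crit * se                 # reject H0 when the sample mean exceeds this
    # The sample mean is (approximately) normal with mean mu_true and sd se.
    return 1 - NormalDist(mu_true, se).cdf(cutoff)


for mu in (500, 501, 525, 550):
    print(f"true mean {mu}: P(reject H0) = {rejection_probability(mu):.3f}")
```

When the true mean equals 500, the computed probability is just the significance level .05. It barely rises for a true mean of 501 and grows steadily as the true mean moves to 525 and then 550, which is the behavior described in the paragraph above.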