10.5 Power and Probability of Type II Error
Chapter 10 Hypothesis Testing Using a Single Sample
FIGURE 10.4 Sampling distribution of x̄ when μ = 500, 505, 525.
When we consider the probability of rejecting the null hypothesis, we are looking
at what statisticians refer to as the power of the test.
The power of a test is the probability of rejecting the null hypothesis.
From the previous discussion, it should be apparent that when a hypothesis about
a population mean is being tested, the power of the test depends on the true value of
the population mean, μ. Because the actual value of μ is unknown (if we knew the
value of μ we wouldn’t be doing the hypothesis test!), we cannot know what the power
is for the actual value of μ. It is possible, however, to gain some insight into the power
of a test by looking at a number of “what if” scenarios. For example, we might ask,
What is the power if the actual mean is 525? or What is the power if the actual mean
is 505? and so on. That is, we can determine the power at μ = 525, the power at
μ = 505, and the power at any other value of interest. Although it is technically possible
to consider power when the null hypothesis is true, an investigator is usually concerned
about the power only at values for which the null hypothesis is false.
In general, when testing a hypothesis about a population characteristic, there are
three factors that influence the power of the test:
1. The size of the difference between the actual value of the population characteristic and the hypothesized value (the value that appears in the null hypothesis);
2. The choice of significance level, α, for the test; and
3. The sample size.
Effect of Various Factors on the Power of a Test
1. The larger the size of the discrepancy between the hypothesized value and
the actual value of the population characteristic, the higher the power.
2. The larger the significance level, α, the higher the power of the test.
3. The larger the sample size, the higher the power of the test.
Let’s consider each of the statements in the box above. The first statement has
already been discussed in the context of the textbook example. Because power is the
probability of rejecting the null hypothesis, it makes sense that the power will be
higher when the actual value of a population characteristic is quite different from the
hypothesized value than when it is close to that value.
The effect of significance level on power is not quite as obvious. To understand
the relationship between power and significance level, it helps to see the relationship
between power and β, the probability of a Type II error.
When H0 is false, power = 1 − β.
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
This relationship follows from the definitions of power and Type II error. A Type
II error results from not rejecting a false H0. Because power is the probability of rejecting H0, it follows that when H0 is false
power = probability of rejecting a false H0
power = 1 − probability of not rejecting a false H0
power = 1 − β
Recall from Section 10.2 that the choice of α, the Type I error probability, affects the
value of β, the Type II error probability. Choosing a larger value for α results in a
smaller value for β (and therefore a larger value for 1 − β). In terms of power, this
means that choosing a larger value for α results in a larger value for the power of the
test. That is, the larger the Type I error probability we are willing to tolerate, the more
likely it is that the test will be able to detect any particular departure from H0.
The third factor that affects the power of a test is the sample size. When H0 is
false, the power of a test is the probability that we will in fact “detect” that H0 is false
and, based on the observed sample, reject H0. Intuition suggests that we will be more
likely to detect a departure from H0 with a large sample than with a small sample.
This is in fact the case—the larger the sample size, the higher the power.
Consider testing the hypotheses presented previously:
H0: μ = 500 versus Ha: μ > 500
The observations about power imply the following, for example:
1. For any value of μ exceeding 500, the power of a test based on a sample of size
100 is higher than the power of a test based on a sample of size 75 (assuming the
same significance level).
2. For any value of μ exceeding 500, the power of a test using a significance level of
.05 is higher than the power of a test using a significance level of .01 (assuming
the same sample size).
3. The power of the test is greater if the actual mean is 550 than if the actual
mean is 525 (assuming the same sample size and significance level).
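The three comparisons in this list can be checked numerically. The following is a minimal sketch, assuming that the population standard deviation is known so that an upper-tailed z test applies. The value σ = 100 and the function name upper_tail_z_power are illustrative assumptions and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_power(mu_true, mu_0, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu_0 when the actual mean is mu_true."""
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha)         # H0 is rejected when z >= z_crit
    shift = (mu_true - mu_0) / (sigma / sqrt(n))
    return 1 - z.cdf(z_crit - shift)      # P(test statistic >= z_crit) when mu = mu_true


SIGMA = 100  # hypothetical value chosen only for illustration

# 1. Larger sample size (same alpha, same actual mean)
power_n100 = upper_tail_z_power(525, 500, SIGMA, n=100, alpha=0.05)
power_n75 = upper_tail_z_power(525, 500, SIGMA, n=75, alpha=0.05)

# 2. Larger significance level (same n, same actual mean)
power_a05 = upper_tail_z_power(525, 500, SIGMA, n=100, alpha=0.05)
power_a01 = upper_tail_z_power(525, 500, SIGMA, n=100, alpha=0.01)

# 3. Actual mean farther from the hypothesized value (same n, same alpha)
power_550 = upper_tail_z_power(550, 500, SIGMA, n=100, alpha=0.05)
power_525 = upper_tail_z_power(525, 500, SIGMA, n=100, alpha=0.05)

print(power_n100 > power_n75, power_a05 > power_a01, power_550 > power_525)
```

Under these assumptions each comparison comes out in the direction stated in the list. A different choice of σ changes the individual power values but not the direction of any comparison.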
As was mentioned previously in this section, it is impossible to calculate the exact
power of a test because in practice we do not know the values of population characteristics. However, we can evaluate the power at a selected alternative value which would tell
us whether the power would be high or low if this alternative value is the actual value.
The following optional subsection shows how Type II error probabilities and
power can be evaluated for selected tests.
Calculating Power and Type II Error Probabilities
for Selected Tests (Optional)
The test procedures presented in this chapter are designed to control the probability
of a Type I error (rejecting H0 when H0 is true) at the desired significance level α.
However, little has been said so far about calculating the value of β, the probability
of a Type II error (not rejecting H0 when H0 is false). Here, we consider the determination of β and power for the hypothesis tests previously introduced.
When we carry out a hypothesis test, we specify the desired value of α, the probability of a Type I error. The probability of a Type II error, β, is the probability of
not rejecting H0 even though it is false. Suppose that we are testing
H0: μ = 1.5 versus Ha: μ > 1.5
Because we do not know the actual value of μ, we cannot calculate the value of β.
However, the vulnerability of the test to Type II error can be investigated by calculating β for several different potential values of μ, such as μ = 1.55, μ = 1.6, and μ = 1.7.
Once the value of β has been determined, the power of the test at the corresponding alternative value is just 1 − β.
EXAMPLE 10.17
Calculating Power
An airline claims that the mean time on hold for callers to its customer service phone
line is 1.5 minutes. We might investigate this claim by testing
H0: μ = 1.5 versus Ha: μ > 1.5
where μ is the actual mean customer hold time. A random sample of n = 36 calls
is to be selected, and the resulting data will be used to reach a conclusion. Suppose
that the standard deviation of hold time (σ) is known to be 0.20 minutes and that a
significance level of .01 is to be used. Our test statistic (because σ = 0.20) is
$$z = \frac{\bar{x} - 1.5}{0.20/\sqrt{n}} = \frac{\bar{x} - 1.5}{0.20/\sqrt{36}} = \frac{\bar{x} - 1.5}{0.0333}$$
The inequality in Ha implies that
P-value = area under z curve to the right of calculated z
From Appendix Table 2, it is easily verified that the z critical value 2.33 captures
an upper-tail z curve area of .01. Thus, P-value ≤ .01 only when z ≥ 2.33. This is
equivalent to the decision rule
reject H0 if calculated z ≥ 2.33
which becomes
reject H0 if
$$\frac{\bar{x} - 1.5}{0.0333} \ge 2.33$$
Solving this inequality for x̄ we get
$$\bar{x} \ge 1.5 + 2.33(0.0333)$$
or
$$\bar{x} \ge 1.578$$
So if x̄ ≥ 1.578, we will reject H0, and if x̄ < 1.578, we will fail to reject H0. This
decision rule corresponds to α = .01.
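The arithmetic behind this cutoff can be reproduced in a few lines. This is only a sketch of the calculation described above: it uses the table value 2.33 for the critical value, and the variable names are illustrative.

```python
from math import sqrt

mu_0 = 1.5      # hypothesized mean hold time (minutes)
sigma = 0.20    # known population standard deviation
n = 36          # sample size
z_crit = 2.33   # z critical value for an upper-tail area of .01 (table value)

se = sigma / sqrt(n)          # standard deviation of the sample mean
cutoff = mu_0 + z_crit * se   # reject H0 when the sample mean is at least this large

print(round(se, 4), round(cutoff, 3))
```

Rounded to the precision used in the example, the standard deviation of the sample mean is 0.0333 and the cutoff is 1.578.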
Suppose now that μ = 1.6 (so that H0 is false). A Type II error will then occur
if x̄ < 1.578. What is the probability that this occurs? If μ = 1.6, the sampling distribution of x̄ is approximately normal, centered at 1.6, and has a standard deviation
of .0333. The probability of observing an x̄ value less than 1.578 can then be determined by finding an area under a normal curve with mean 1.6 and standard deviation
.0333, as illustrated in Figure 10.5.
Because the curve in Figure 10.5 is not the standard normal (z) curve, we must
first convert to a z score before using Appendix Table 2 to find the area. Here,
$$z \text{ score for } 1.578 = \frac{1.578 - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{1.578 - 1.6}{0.0333} = -0.66$$
and
area under z curve to left of −0.66 = .2546
So, if μ = 1.6, β = .2546. This means that if μ is 1.6, about 25% of all samples
would still result in x̄ values less than 1.578 and failure to reject H0.
FIGURE 10.5 β when μ = 1.6 in Example 10.17: the x̄ distribution is normal with mean 1.6 and standard deviation 0.0333, and β = P(x̄ < 1.578) is the area to the left of 1.578.
The power of the test at μ = 1.6 is then
(power at μ = 1.6) = 1 − (β when μ is 1.6)
= 1 − .2546
= .7454
This means that if the actual mean is 1.6, the probability of rejecting H0: μ = 1.5 in
favor of Ha: μ > 1.5 is .7454. That is, if μ is 1.6 and the test is used repeatedly with
random samples selected from the population, in the long run about 75% of the
samples will result in the correct conclusion to reject H0.
Now consider β and power when μ = 1.65. The normal curve in Figure 10.5
would then be centered at 1.65. Because β is the area to the left of 1.578 and the
curve has shifted to the right, β decreases. Converting 1.578 to a z score and using
Appendix Table 2 gives β = .0154. Also,
(power at μ = 1.65) = 1 − .0154 = .9846
As expected, the power at μ = 1.65 is higher than the power at μ = 1.6 because 1.65
is farther from the hypothesized value of 1.5.
Statistical software and graphing calculators can calculate the power for specified
values of σ, α, n, and the difference between the actual and hypothesized values of μ.
The following Minitab output shows power calculations corresponding to those in
Example 10.17:
1-Sample Z Test
Testing mean = null (versus > null)
Alpha = 0.01
Sigma = 0.2
Sample Size = 36
Difference   Power
      0.10   0.7497
      0.15   0.9851
The slight differences between the power values computed by Minitab and those
previously obtained are due to rounding in Example 10.17.
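To see the effect of rounding, the same power calculation can be carried out without rounding the critical value or the standard deviation of x̄. The sketch below uses Python's standard-library NormalDist. The function name z_test_power is an illustrative assumption, and this is not a description of how Minitab performs the computation internally.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(difference, sigma, n, alpha):
    """Power of an upper-tailed one-sample z test.

    difference is (actual mean - hypothesized mean).
    """
    z = NormalDist()
    se = sigma / sqrt(n)               # standard deviation of the sample mean
    z_crit = z.inv_cdf(1 - alpha)      # unrounded critical value (about 2.326 for alpha = .01)
    return 1 - z.cdf(z_crit - difference / se)


# Settings from Example 10.17: sigma = 0.2, n = 36, alpha = .01
for diff in (0.10, 0.15):
    print(diff, round(z_test_power(diff, sigma=0.2, n=36, alpha=0.01), 4))
```

With unrounded intermediate values, the results should land very close to the software output shown above rather than to the hand-calculated .7454 and .9846.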
The probability of a Type II error and the power for z tests concerning a population proportion are calculated in an analogous manner.
EXAMPLE 10.18
Power for Testing Hypotheses
About Proportions
A package delivery service advertises that at least 90% of all packages brought to its
office by 9 a.m. for delivery in the same city are delivered by noon that day. Let p
denote the proportion of all such packages actually delivered by noon. The hypotheses of interest are
H0: p = .9 versus Ha: p < .9
where the alternative hypothesis states that the company’s claim is untrue. The value
p = .8 represents a substantial departure from the company’s claim. If the hypotheses are
tested at level .01 using a sample of n = 225 packages, what is the probability that the departure from H0 represented by this alternative value will go undetected?
At significance level .01, H0 is rejected if P-value ≤ .01. For the case of a lower-tailed test, this is the same as rejecting H0 if
$$z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - .9}{\sqrt{\dfrac{(.9)(.1)}{225}}} = \frac{\hat{p} - .9}{.02} \le -2.33$$
(Because −2.33 captures a lower-tail z curve area of .01, the smallest 1% of all z values satisfy z ≤ −2.33.) This inequality is equivalent to p̂ ≤ .853, so H0 is not rejected
if p̂ > .853. When p = .8, p̂ has approximately a normal distribution with
$$\mu_{\hat{p}} = .8 \qquad \sigma_{\hat{p}} = \sqrt{\frac{(.8)(.2)}{225}} = .0267$$
Then β is the probability of obtaining a sample proportion greater than .853, as illustrated in Figure 10.6.
FIGURE 10.6 β when p = .8 in Example 10.18: the sampling distribution of p̂ is normal with mean 0.8 and standard deviation 0.0267, and β is the area to the right of 0.853.
Converting to a z score results in
$$z = \frac{.853 - .8}{.0267} = 1.99$$
and Appendix Table 2 gives
β = 1 − .9767 = .0233
When p = .8 and a level .01 test is used, less than 3% of all samples of size n = 225
will result in a Type II error. The power of the test at p = .8 is 1 − .0233 = .9767.
This means that the probability of rejecting H0: p = .9 in favor of Ha: p < .9 when p
is really .8 is .9767, which is quite high.
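The steps of this example can be sketched in code. The version below follows the same large-sample normal approximation but keeps unrounded intermediate values, so its result differs slightly from the hand calculation. The function name lower_tail_prop_beta is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_prop_beta(p_true, p_0, n, alpha):
    """Type II error probability for the lower-tailed large-sample z test of H0: p = p_0."""
    z = NormalDist()
    se_null = sqrt(p_0 * (1 - p_0) / n)          # standard deviation of p-hat under H0
    cutoff = p_0 + z.inv_cdf(alpha) * se_null    # reject H0 when p-hat <= cutoff
    se_true = sqrt(p_true * (1 - p_true) / n)    # standard deviation of p-hat when p = p_true
    # beta = P(p-hat > cutoff) when the actual proportion is p_true
    return 1 - z.cdf((cutoff - p_true) / se_true)


beta = lower_tail_prop_beta(p_true=0.8, p_0=0.9, n=225, alpha=0.01)
power = 1 - beta
print(round(beta, 4), round(power, 4))
```

Because the critical value and cutoff are not rounded here, β comes out a little below the .0233 found by hand. The conclusion is unchanged: less than 3% of samples would lead to a Type II error when p = .8.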
β and Power for the t Test (Optional)
The power and β values for t tests can be determined by using a set of curves specially constructed for this purpose or by using appropriate software. As with the z
test, the value of β depends not only on the actual value of μ but also on the selected significance level α; β increases as α is made smaller. In addition, β depends
on the number of degrees of freedom, n − 1. For any fixed significance level α, it
should be easier for the test to detect a specific departure from H0 when n is large
than when n is small. This is indeed the case; for a fixed alternative value, β decreases as n − 1 increases.
Unfortunately, there is one other quantity on which β depends: the population
standard deviation σ. As σ increases, so does σ_x̄. This in turn makes it more likely that
an x̄ value far from μ will be observed just by chance, resulting in an incorrect conclusion. Once α is specified and n is fixed, the determination of β at a particular alternative value of μ requires that a value of σ be chosen, because each different value of σ
yields a different value of β. (This did not present a problem with the z test because
when using a z test, the value of σ is known.) If the investigator can specify a range of
plausible values for σ, then using the largest such value will give a pessimistic β (one
on the high side) and a pessimistic value of power (one on the low side).
Figure 10.7 shows three different β curves for a one-tailed t test (appropriate for
Ha: μ > hypothesized value or for Ha: μ < hypothesized value). A more complete
set of curves for both one- and two-tailed tests when α = .05 and when α = .01 appears in Appendix Table 5. To determine β, first compute the quantity
$$d = \frac{|\text{alternative value} - \text{hypothesized value}|}{\sigma}$$
Then locate d on the horizontal axis, move directly up to the curve for n − 1 df, and
move over to the vertical axis to find β.
FIGURE 10.7 β curves for the one-tailed t test. Horizontal axis: value of d (0 to 3); vertical axis: associated value of β (0 to 1.0). Curves shown: α = .01, df = 6; α = .05, df = 6; α = .01, df = 19.
EXAMPLE 10.19
β and Power for t Tests
Consider testing
H0: μ = 100 versus Ha: μ > 100
and focus on the alternative value μ = 110. Suppose that σ = 10, the sample size is
n = 7, and a significance level of .01 has been selected. For σ = 10,
$$d = \frac{|110 - 100|}{10} = \frac{10}{10} = 1$$
Figure 10.7 (using df = 7 − 1 = 6) gives β ≈ .6. The interpretation is that if σ = 10
and a level .01 test based on n = 7 is used when μ = 110 (and thus H0 is false),
roughly 60% of all samples result in an incorrect decision to not reject H0! Equivalently, the power of the test at μ = 110 is only 1 − .6 = .4. The probability of rejecting H0 when μ = 110 is not very large. If a .05 significance level is used instead, then
β ≈ .3, which is still rather large. Using a .01 significance level with n = 20 (df = 19)
yields, from Figure 10.7, β ≈ .05. At the alternative value μ = 110, for σ = 10
the level .01 test based on n = 20 has smaller β than the level .05 test with n = 7.
Substantially increasing n counterbalances using the smaller α.
Now consider the alternative μ = 105, again with σ = 10, so that
$$d = \frac{|105 - 100|}{10} = \frac{5}{10} = .5$$
Then, from Figure 10.7, β = .95 when α = .01, n = 7; β = .7 when α = .05,
n = 7; and β = .65 when α = .01, n = 20. These values of β are all quite large; with
σ = 10, μ = 105 is too close to the hypothesized value of 100 for any of these three
tests to have a good chance of detecting such a departure from H0. A substantial decrease in β would require using a much larger sample size. For example, from Appendix
Table 5, β = .08 when α = .05 and n = 40.
The curves in Figure 10.7 also give β when testing H0: μ = 100 versus Ha:
μ < 100. If the alternative value μ = 90 is of interest and σ = 10,
$$d = \frac{|90 - 100|}{10} = \frac{10}{10} = 1$$
and values of β are the same as those given in the first paragraph of this example.
Because curves for only selected degrees of freedom appear in Appendix Table 5,
other degrees of freedom require a visual approximation. For example, the 27-df
curve (for n ϭ 28) lies between the 19-df and 29-df curves, which do appear, and it
is closer to the 29-df curve. This type of approximation is adequate because it is the
general magnitude of β (large, small, or moderate) that is of primary concern.
Minitab can also evaluate power for the t test. For example, the following output
shows Minitab calculations for power at μ = 110 for samples of size 7 and 20 when
α = .01. The corresponding approximate values from Appendix Table 5 found in
Example 10.19 are fairly close to the Minitab values.
1-Sample t Test
Testing mean = null (versus > null)
Calculating power for mean = null + 10
Alpha = 0.01
Sigma = 10
Sample Size   Power
          7   0.3968
         20   0.9653
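Power values like these can also be approximated by simulation, which makes the meaning of power concrete: draw many samples from a normal population whose mean equals the alternative value, run the t test on each, and record how often H0 is rejected. The sketch below is one way to do this and is not how Minitab computes power. The function name, the number of repetitions, and the seed are illustrative assumptions. The critical values 3.143 (df = 6) and 2.539 (df = 19) are standard upper-tail .01 t table values.

```python
import random
from math import sqrt


def simulate_t_power(mu_true, mu_0, sigma, n, t_crit, reps=20000, seed=1):
    """Estimate the power of an upper-tailed one-sample t test by simulation."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Draw a sample from a normal population with the alternative mean
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
        t = (mean - mu_0) / sqrt(var / n)                      # one-sample t statistic
        if t >= t_crit:
            rejections += 1
    return rejections / reps   # proportion of samples that led to rejecting H0


# Settings from Example 10.19: H0: mu = 100, alternative mu = 110, sigma = 10, alpha = .01
power_n7 = simulate_t_power(110, 100, 10, n=7, t_crit=3.143)     # df = 6
power_n20 = simulate_t_power(110, 100, 10, n=20, t_crit=2.539)   # df = 19
print(power_n7, power_n20)
```

Because these are simulation estimates, they vary slightly with the seed and the number of repetitions, but they should fall near the power values in the output above.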
The β curves in Appendix Table 5 are those for t tests. When the alternative value in
Ha corresponds to a value of d relatively close to 0, β for a t test may be rather large.
One might wonder whether there is another type of test that has the same level of
significance α as does the t test and smaller values of β. The following result provides
the answer to this question.
When the population distribution is normal, the t test for testing hypotheses about μ
has smaller β than does any other test procedure that has the same level of significance α.
Stated another way, among all tests with level of significance α, the t test makes
β as small as it can possibly be when the population distribution is normal. In this
sense, the t test is a best test. Statisticians have also shown that when the population
distribution is not too far from a normal distribution, no test procedure can improve
on the t test by very much (i.e., no test procedure can have the same α and substantially smaller β). However, when the population distribution is believed to be strongly
nonnormal (heavy-tailed, highly skewed, or multimodal), the t test should not be
used. Then it’s time to consult your friendly neighborhood statistician, who can provide you with alternative methods of analysis.
EXERCISES 10.59 - 10.65
10.59 The power of a test is influenced by the sample
size and the choice of significance level.
a. Explain how increasing the sample size affects the
power (when significance level is held fixed).
b. Explain how increasing the significance level affects
the power (when sample size is held fixed).
10.60 Water samples are taken from water used for
cooling as it is being discharged from a power plant into
a river. It has been determined that as long as the mean
temperature of the discharged water is at most 150°F,
there will be no negative effects on the river’s ecosystem.
To investigate whether the plant is in compliance with
regulations that prohibit a mean discharge water temperature above 150°F, a scientist will take 50 water
samples at randomly selected times and will record the
water temperature of each sample. She will then use a z
statistic
$$z = \frac{\bar{x} - 150}{\sigma/\sqrt{n}}$$
to decide between the hypotheses H0: μ = 150 and
Ha: μ > 150, where μ is the mean temperature of discharged water. Assume that σ is known to be 10.
a. Explain why use of the z statistic is appropriate in
this setting.
b. Describe Type I and Type II errors in this context.
c. The rejection of H0 when z ≥ 1.8 corresponds to
what value of α? (That is, what is the area under the
z curve to the right of 1.8?)
d. Suppose that the actual value for μ is 153 and that
H0 is to be rejected if z ≥ 1.8. Draw a sketch (similar
to that of Figure 10.5) of the sampling distribution
of x̄, and shade the region that would represent β,
the probability of making a Type II error.
e. For the hypotheses and test procedure described,
compute the value of β when μ = 153.
f. For the hypotheses and test procedure described,
what is the value of β if μ = 160?
g. What would be the conclusion of the test if H0 is
rejected when z ≥ 1.8 and x̄ = 152.4? What type of
error might have been made in reaching this conclusion?
10.61 Let μ denote the true average lifetime (in
hours) for a certain type of battery under controlled laboratory conditions. A test of H0: μ = 10 versus Ha:
μ < 10 will be based on a sample of size 36. Suppose
that σ is known to be 0.6, from which σ_x̄ = .1. The appropriate test statistic is then
$$z = \frac{\bar{x} - 10}{0.1}$$
a. What is α for the test procedure that rejects H0 if
z ≤ −1.28?
b. If the test procedure of Part (a) is used, calculate β
when μ = 9.8, and interpret this error probability.
c. Without doing any calculation, explain how β when
μ = 9.5 compares to β when μ = 9.8. Then check
your assertion by computing β when μ = 9.5.
d. What is the power of the test when μ = 9.8? when
μ = 9.5?
10.62 The city council in a large city has become concerned about the trend toward exclusion of renters with
children in apartments within the city. The housing coordinator has decided to select a random sample of 125
apartments and determine for each whether children are
permitted. Let p be the proportion of all apartments that
prohibit children. If the city council is convinced that p is
greater than 0.75, it will consider appropriate legislation.
a. If 102 of the 125 sampled apartments exclude renters with children, would a level .05 test lead you to
the conclusion that more than 75% of all apartments
exclude children?
b. What is the power of the test when p = .8 and
α = .05?
10.63 The amount of shaft wear after a fixed mileage
was determined for each of seven randomly selected
internal combustion engines, resulting in a mean of
0.0372 inch and a standard deviation of 0.0125 inch.
a. Assuming that the distribution of shaft wear is normal,
test at level .05 the hypotheses H0: μ = .035 versus
Ha: μ > .035.
b. Using σ = 0.0125, α = .05, and Appendix Table 5,
what is the approximate value of β, the probability
of a Type II error, when μ = .04?
c. What is the approximate power of the test when
μ = .04 and α = .05?
10.64 Optical fibers are used in telecommunications to
transmit light. Suppose current technology allows production of fibers that transmit light about 50 km. Researchers are trying to develop a new type of glass fiber
that will increase this distance. In evaluating a new fiber,
it is of interest to test H0: μ = 50 versus Ha: μ > 50,
with μ denoting the mean transmission distance for the
new optical fiber.
a. Assuming σ = 10 and n = 10, use Appendix Table
5 to find β, the probability of a Type II error, for
each of the given alternative values of μ when a test
with significance level .05 is employed:
i. 52
ii. 55
iii. 60
iv. 70
b. What happens to β in each of the cases in Part (a) if
σ is actually larger than 10? Explain your
reasoning.
10.65 Let μ denote the mean diameter for bearings of
a certain type. A test of H0: μ = 0.5 versus Ha: μ ≠ 0.5
will be based on a sample of n bearings. The diameter
distribution is believed to be normal. Determine the
value of β in each of the following cases:
a. n = 15, α = .05, σ = 0.02, μ = 0.52
b. n = 15, α = .05, σ = 0.02, μ = 0.48
c. n = 15, α = .01, σ = 0.02, μ = 0.52
d. n = 15, α = .05, σ = 0.02, μ = 0.54
e. n = 15, α = .05, σ = 0.04, μ = 0.54
f. n = 20, α = .05, σ = 0.04, μ = 0.54
g. Is the way in which β changes as n, α, σ, and μ vary
consistent with your intuition? Explain.
10.6 Interpreting and Communicating the Results of Statistical Analyses
The nine-step procedure that we have proposed for testing hypotheses provides a
systematic approach for carrying out a complete test. However, you rarely see the
results of a hypothesis test reported in publications in such a complete way.
Communicating the Results of Statistical Analyses
When summarizing the results of a hypothesis test, it is important that you include
several things in the summary in order to provide all the relevant information. These
are:
1. Hypotheses. Whether specified in symbols or described in words, it is important
that both the null and the alternative hypotheses be clearly stated. If you are using
symbols to define the hypotheses, be sure to describe them in the context of the
problem at hand (for example, μ = population mean calorie intake).
2. Test procedure. You should be clear about what test procedure was used (for example, large-sample z test for proportions) and why you think it was reasonable
to use this procedure. The plausibility of any required assumptions should be
satisfactorily addressed.
3. Test statistic. Be sure to include the value of the test statistic and the P-value.
Including the P-value allows a reader who may have chosen a different significance level to see whether she would have reached the same or a different
conclusion.
4. Conclusion in context. Never end the report of a hypothesis test with the statement
“I rejected (or did not reject) H0.” Always provide a conclusion that is in the
context of the problem and that answers the original research question which the
hypothesis test was designed to answer. Be sure also to indicate the level of significance used as a basis for the decision.
Interpreting the Results of Statistical Analyses
When the results of a hypothesis test are reported in a journal article or other published source, it is common to find only the value of the test statistic and the associated P-value accompanying the discussion of conclusions drawn from the data. Often, especially in newspaper articles, only sample summary statistics are given, with
the conclusion immediately following. You may have to fill in some of the intermediate steps for yourself to see whether or not the conclusion is justified.
For example, the article “Physicians’ Knowledge of Herbal Toxicities and
Adverse Herb-Drug Interactions” (European Journal of Emergency Medicine,
August 2004) summarizes the results of a study to assess doctors’ familiarity with
adverse effects of herbal remedies as follows: “A total of 142 surveys and quizzes were
completed by 59 attending physicians, 57 resident physicians, and 26 medical students. The mean subject score on the quiz was only slightly higher than would have
occurred from random guessing.” The quiz consisted of 16 multiple-choice questions. If each question had four possible choices, the statement that the mean quiz
score was only slightly higher than would have occurred from random guessing suggests that the researchers considered the hypotheses H0: μ = 4 and Ha: μ > 4, where
μ represents the mean score for the population of all physicians and medical students
and the null hypothesis corresponds to the expected number of correct choices for
someone who is guessing. Assuming that it is reasonable to regard this sample as
representative of the population of interest, the data from the sample could be used
to carry out a test of these hypotheses.
What to Look For in Published Data
Here are some questions to consider when you are reading a report that contains the
results of a hypothesis test:
• What hypotheses are being tested? Are the hypotheses about a population mean,
a population proportion, or some other population characteristic?
• Was the appropriate test used? Does the validity of the test depend on any assumptions about the sample or about the population from which the sample was
selected? If so, are the assumptions reasonable?
• What is the P-value associated with the test? Was a significance level reported (as
opposed to simply reporting the P-value)? Is the chosen significance level
reasonable?
• Are the conclusions drawn consistent with the results of the hypothesis test?
For example, consider the following statement from the paper “Didgeridoo Playing as Alternative Treatment for Obstructive Sleep Apnoea Syndrome” (British
Medical Journal [2006]: 266–270): “We found that four months of training of the
upper airways by didgeridoo playing reduces daytime sleepiness in people with snoring and obstructive apnoea syndrome.” This statement was supported by data on a
measure of daytime sleepiness called the Epworth scale. For the 14 participants in the
study, the mean improvement in Epworth scale was 4.4 and the standard deviation
was 3.7. The paper does not indicate what test was performed or what the value of
the test statistic was. It appears that the hypotheses of interest are H0: μ = 0 (no
improvement) versus Ha: μ > 0, where μ represents the mean improvement in Epworth score after four months of didgeridoo playing for all people with snoring and
obstructive sleep apnoea. Because the sample size is not large, the one-sample t test
would be appropriate if the sample can be considered a random sample and the distribution of Epworth scale improvement scores is approximately normal. If these assumptions are reasonable (something that was not addressed in the paper), the t test
results in t = 4.45 and an associated P-value of .000 (that is, less than .0005 when rounded to three decimal places). Because the reported P-value is
so small, H0 would be rejected, supporting the conclusion in the paper that didgeridoo
playing is an effective treatment. (In case you are wondering, a didgeridoo is an Australian Aboriginal woodwind instrument.)
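The intermediate steps in the didgeridoo example can be reproduced from the reported summary statistics alone. The following is a minimal sketch, not the analysis the paper's authors ran: the function names are hypothetical, the test is the one-sample t test described above, and the upper-tail area is approximated by Simpson's rule so that only the Python standard library is needed.

```python
import math

def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample t statistic: (x-bar - hypothesized value) / (s / sqrt(n))."""
    return (sample_mean - hypothesized_mean) / (sample_sd / math.sqrt(n))

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p_value(t, df, steps=10_000):
    """Approximate P(T > t) using Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

# Summary statistics reported in the text: n = 14, mean 4.4, standard deviation 3.7
n = 14
t = t_statistic(4.4, 0, 3.7, n)
p = upper_tail_p_value(t, df=n - 1)
print(round(t, 2), p)
```

The P-value is an upper-tail area because Ha: μ > 0. It is meaningful only if the random-sample and approximate-normality assumptions discussed above are reasonable.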
A Word to the Wise: Cautions and Limitations
There are several things you should watch for when conducting a hypothesis test or
when evaluating a written summary of a hypothesis test.
1. The result of a hypothesis test can never show strong support for the null hypothesis. Make sure that you don’t confuse “There is no reason to believe the null
hypothesis is not true” with the statement “There is convincing evidence that the
null hypothesis is true.” These are very different statements!
2. If you have complete information for the population, don’t carry out a hypothesis test! It should be obvious that no test is needed to answer questions about a
population if you have complete information and don’t need to generalize from
a sample, but people sometimes forget this fact. For example, in an article
on growth in the number of prisoners by state, the San Luis Obispo Tribune
(August 13, 2001) reported “California’s numbers showed a statistically insignificant change, with 66 fewer prisoners at the end of 2000.” The use of the term
“statistically insignificant” implies some sort of statistical inference, which is not
appropriate when a complete accounting of the entire prison population is
known. Perhaps the author confused statistical and practical significance. Which
brings us to . . .
3. Don’t confuse statistical significance with practical significance. When statistical
significance has been declared, be sure to step back and evaluate the result in light
of its practical importance. For example, we may be convinced that the propor