10.2
Errors in Hypothesis Testing
different types of errors that might be made when making a decision in a hypothesis
testing problem. One type of error involves rejecting H0 even though the null hypothesis is true. The second type of error results from failing to reject H0 when it is false.
These errors are known as Type I and Type II errors, respectively.
DEFINITION
Type I error: the error of rejecting H0 when H0 is true
Type II error: the error of failing to reject H0 when H0 is false
The only way to guarantee that neither type of error occurs is to base the decision
on a census of the entire population. Risk of error is the price researchers pay for basing the decision on sample data.
EXAMPLE 10.4
On-Time Arrivals
The U.S. Bureau of Transportation Statistics reports that for 2009, 72% of all
domestic passenger flights arrived on time (meaning within 15 minutes of the scheduled arrival). Suppose that an airline with a poor on-time record decides to offer its
employees a bonus if, in an upcoming month, the airline’s proportion of on-time
flights exceeds the overall 2009 industry rate of .72. Let p be the actual proportion of
the airline’s flights that are on time during the month of interest. A random sample
of flights might be selected and used as a basis for choosing between
H0: p = .72 and Ha: p > .72
In this context, a Type I error (rejecting a true H0) results in the airline rewarding its
employees when in fact the actual proportion of on-time flights did not exceed .72.
A Type II error (not rejecting a false H0) results in the airline employees not receiving
a reward that they deserved.
EXAMPLE 10.5
Slowing the Growth of Tumors
In 2004, Vertex Pharmaceuticals, a biotechnology company, issued a press release
announcing that it had filed an application with the Food and Drug Administration
to begin clinical trials of an experimental drug VX-680 that had been found to reduce
the growth rate of pancreatic and colon cancer tumors in animal studies (New York
Times, February 24, 2004).
Let μ denote the true mean growth rate of tumors for patients receiving the experimental drug. Data resulting from the planned clinical trials can be used to test
H0: μ = mean growth rate of tumors for patients not taking the experimental drug
versus
Ha: μ < mean growth rate of tumors for patients not taking the experimental drug
The null hypothesis states that the experimental drug is not effective—that the mean
growth rate of tumors for patients receiving the experimental drug is the same as for
patients who do not take the experimental drug. The alternative hypothesis states that
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 10
Hypothesis Testing Using a Single Sample
the experimental drug is effective in reducing the mean growth rate of tumors. In this
context, a Type I error consists of incorrectly concluding that the experimental drug
is effective in slowing the growth rate of tumors. A potential consequence of making
a Type I error would be that the company would continue to devote resources to the
development of the drug when it really is not effective. A Type II error consists of
concluding that the experimental drug is ineffective when in fact the mean growth
rate of tumors is reduced. A potential consequence of making a Type II error is that
the company might abandon development of a drug that was effective.
Examples 10.4 and 10.5 illustrate the two different types of error that might occur when testing hypotheses. Type I and Type II errors—and the associated consequences of making such errors—are quite different. The accompanying box introduces the terminology and notation used to describe error probabilities.
DEFINITION
The probability of a Type I error is denoted by α and is called the significance
level of the test. For example, a test with α = .01 is said to have a significance
level of .01.
The probability of a Type II error is denoted by β.
EXAMPLE 10.6
Blood Test for Ovarian Cancer
Women with ovarian cancer usually are not diagnosed until the disease is in an advanced stage, when it is most difficult to treat. The paper “Diagnostic Markers for
Early Detection of Ovarian Cancer” (Clinical Cancer Research [2008]: 1065–1072)
describes a new approach to diagnosing ovarian cancer that is based on using six different blood biomarkers (a blood biomarker is a biochemical characteristic that is
measured in laboratory testing). The authors report the following results using the six
biomarkers:
• For 156 women known to have ovarian cancer, the biomarkers correctly
identified 151 as having ovarian cancer.
• For 362 women known not to have ovarian cancer, the biomarkers correctly
identified 360 of them as being ovarian cancer free.
We can think of using this blood test to choose between two hypotheses:
H0: woman has ovarian cancer
Ha: woman does not have ovarian cancer
Note that although these are not “statistical hypotheses” (statements about a population
characteristic), the possible decision errors are analogous to Type I and Type II errors.
In this situation, believing that a woman with ovarian cancer is cancer free would
be a Type I error—rejecting the hypothesis of ovarian cancer when it is in fact true.
Believing that a woman who is actually cancer free does have ovarian cancer is a
Type II error—not rejecting the null hypothesis when it is in fact false. Based on the
study results, we can estimate the error probabilities. The probability of a Type I error, α, is approximately 5/156 = .032. The probability of a Type II error, β, is approximately 2/362 = .006.
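The arithmetic behind these two estimates can be sketched in a few lines of Python. The function name `estimate_error_rates` and its argument names are illustrative assumptions, not taken from the paper; the counts are the ones reported in the two bullet points above (156 women with ovarian cancer, 362 without).

```python
def estimate_error_rates(cancer_total, cancer_detected, healthy_total, healthy_cleared):
    """Estimate the Type I and Type II error probabilities of the blood test.

    With H0: woman has ovarian cancer,
      Type I error  = declaring a woman who has cancer to be cancer free,
      Type II error = declaring a cancer-free woman to have cancer.
    """
    alpha_hat = (cancer_total - cancer_detected) / cancer_total
    beta_hat = (healthy_total - healthy_cleared) / healthy_total
    return alpha_hat, beta_hat


# Counts reported in the study summary above.
alpha_hat, beta_hat = estimate_error_rates(156, 151, 362, 360)
print(f"estimated Type I error probability:  {alpha_hat:.3f}")  # 5/156
print(f"estimated Type II error probability: {beta_hat:.3f}")   # 2/362
```

Each estimate is simply the number of wrong decisions in a group divided by the size of that group, so the denominator is always the number of women for whom that particular error was possible.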
The ideal test procedure would result in both α = 0 and β = 0. However, if we
must base our decision on incomplete information (a sample rather than a census),
it is impossible to achieve this ideal. The standard test procedures allow us to control
α, but they provide no direct control over β. Because α represents the probability
of rejecting a true null hypothesis, selecting a significance level α = .05 results in a
test procedure that, used over and over with different samples, rejects a true H0 about
5 times in 100. Selecting α = .01 results in a test procedure with a Type I error rate
of 1% in long-term repeated use. Choosing a small value for α implies that the user
wants a procedure for which the risk of a Type I error is quite small.
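The long-run interpretation of the significance level can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: it uses the large-sample z test for a proportion (developed in Section 10.3), borrows the null value .72 from Example 10.4, and picks a sample size of 400 and 4000 repetitions arbitrarily. The function name `type1_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def type1_rate(p0, n, alpha, reps, seed=0):
    """Fraction of simulated samples in which a TRUE H0: p = p0 is rejected
    in favor of Ha: p > p0 by the large-sample z test at level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value
    sd_p_hat = math.sqrt(p0 * (1 - p0) / n)    # std. dev. of p-hat when H0 is true
    rejections = 0
    for _ in range(reps):
        # Draw a sample of size n from a population in which H0 really is true.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / sd_p_hat
        if z > z_crit:
            rejections += 1
    return rejections / reps


rate_05 = type1_rate(p0=0.72, n=400, alpha=0.05, reps=4000)
rate_01 = type1_rate(p0=0.72, n=400, alpha=0.01, reps=4000)
print(f"alpha = .05: rejected a true H0 in {rate_05:.1%} of samples")
print(f"alpha = .01: rejected a true H0 in {rate_01:.1%} of samples")
```

Because every simulated sample comes from a population in which H0 is true, each rejection is a Type I error, and the observed rejection fractions should land near the chosen significance levels (not exactly on them, since the normal approximation and the simulation are both approximate).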
One question arises naturally at this point: If we can select α, the probability of
making a Type I error, why would we ever select α = .05 rather than α = .01? Why
not always select a very small value for α? To achieve a small probability of making a
Type I error, we would need the corresponding test procedure to require the evidence
against H0 to be very strong before the null hypothesis can be rejected. Although this
makes a Type I error unlikely, it increases the risk of a Type II error (not rejecting H0
when it should have been rejected). Frequently the investigator must balance the
consequences of Type I and Type II errors. If a Type II error has serious consequences, it may be a good idea to select a somewhat larger value for α.
In general, there is a compromise between small α and small β, leading to the
following widely accepted principle for specifying a test procedure.
After assessing the consequences of Type I and Type II errors, identify the largest α
that is tolerable for the problem. Then employ a test procedure that uses this maximum acceptable value, rather than anything smaller, as the level of significance
(because using a smaller α increases β). In other words, don’t choose α to be smaller
than it needs to be.
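The compromise between α and β can be made concrete with a rough calculation. The following sketch assumes an upper-tailed large-sample z test of H0: p = .72 (the null value from Example 10.4), a hypothetical actual proportion of .77, and a hypothetical sample size of 400; none of these last two numbers come from the text, and the function name `type2_probability` is illustrative. It relies on the normal approximation to the sampling distribution of the sample proportion.

```python
import math
from statistics import NormalDist


def type2_probability(p0, p1, n, alpha):
    """Approximate beta for the upper-tailed test of H0: p = p0 versus Ha: p > p0
    when the actual proportion is p1 (normal approximation, large n)."""
    std_normal = NormalDist()
    sd_null = math.sqrt(p0 * (1 - p0) / n)   # std. dev. of p-hat if H0 is true
    sd_true = math.sqrt(p1 * (1 - p1) / n)   # std. dev. of p-hat if p = p1
    # H0 is rejected when the sample proportion exceeds this cutoff.
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * sd_null
    # beta = probability that the sample proportion stays at or below the cutoff
    # even though the actual proportion is p1.
    return std_normal.cdf((cutoff - p1) / sd_true)


alphas = [0.10, 0.05, 0.01]
betas = [type2_probability(p0=0.72, p1=0.77, n=400, alpha=a) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  ->  beta is roughly {b:.3f}")
```

Lowering α pushes the rejection cutoff farther from the null value, so a sample from a population in which H0 is false is less likely to clear it. That is why β rises as α falls in the printed values.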
EXAMPLE 10.7
Lead in Tap Water
The Environmental Protection Agency (EPA) has adopted what is known as the Lead
and Copper Rule, which defines drinking water as unsafe if the concentration of lead
is 15 parts per billion (ppb) or greater or if the concentration of copper is 1.3 parts
per million (ppm) or greater. With μ denoting the mean concentration of lead, the
manager of a community water system might use lead level measurements from a
sample of water specimens to test
H0: μ = 15 versus Ha: μ < 15
The null hypothesis (which also implicitly includes the μ > 15 case) states that the
mean lead concentration is excessive by EPA standards. The alternative hypothesis
states that the mean lead concentration is at an acceptable level and that the water
system meets EPA standards for lead.
In this context, a Type I error leads to the conclusion that a water source meets
EPA standards for lead when in fact it does not. Possible consequences of this type of
error include health risks associated with excessive lead consumption (for example,
increased blood pressure, hearing loss, and, in severe cases, anemia and kidney damage). A Type II error is to conclude that the water does not meet EPA standards for
lead when in fact it actually does. Possible consequences of a Type II error include
elimination of a community water source. Because a Type I error might result in
potentially serious public health risks, a small value of α (Type I error probability),
such as α = .01, could be selected. Of course, selecting a small value for α increases
the risk of a Type II error. If the community has only one water source, a Type II
error could also have very serious consequences for the community, and we might
want to rethink the choice of α.
EXERCISES 10.12-10.22
10.12 Researchers at the University of Washington and
Harvard University analyzed records of breast cancer
screening and diagnostic evaluations (“Mammogram
Cancer Scares More Frequent than Thought,” USA
Today, April 16, 1998). Discussing the benefits and
downsides of the screening process, the article states that,
although the rate of false-positives is higher than previously thought, if radiologists were less aggressive in following up on suspicious tests, the rate of false-positives
would fall but the rate of missed cancers would rise. Suppose that such a screening test is used to decide between
a null hypothesis of H0: no cancer is present and an alternative hypothesis of Ha: cancer is present. (Although
these are not hypotheses about a population characteristic, this exercise illustrates the definitions of Type I and
Type II errors.)
a. Would a false-positive (thinking that cancer is present when in fact it is not) be a Type I error or a Type
II error?
b. Describe a Type I error in the context of this problem, and discuss the consequences of making a Type
I error.
c. Describe a Type II error in the context of this problem, and discuss the consequences of making a Type
II error.
d. What aspect of the relationship between the probability of Type I and Type II errors is being described
by the statement in the article that if radiologists
were less aggressive in following up on suspicious
tests, the rate of false-positives would fall but the rate
of missed cancers would rise?
10.13 The paper “MRI Evaluation of the Contralateral Breast in Women with Recently Diagnosed
Breast Cancer” (New England Journal of Medicine
[2007]: 1295–1303) describes a study of the use of MRI
(Magnetic Resonance Imaging) exams in the diagnosis of
breast cancer. The purpose of the study was to determine
if MRI exams do a better job than mammograms of determining
if women who have recently been diagnosed
with cancer in one breast have cancer in the other breast.
The study participants were 969 women who had been
diagnosed with cancer in one breast and for whom a
mammogram did not detect cancer in the other breast.
These women had an MRI exam of the other breast, and
121 of those exams indicated possible cancer. After undergoing biopsies, it was determined that 30 of the 121
did in fact have cancer in the other breast, whereas 91
did not. The women were all followed for one year, and
three of the women for whom the MRI exam did not
indicate cancer in the other breast were subsequently
diagnosed with cancer that the MRI did not detect. The
accompanying table summarizes this information.
                           Cancer     Cancer Not
                           Present    Present       Total
MRI Positive for Cancer      30          91          121
MRI Negative for Cancer       3         845          848
Total                        33         936          969
Suppose that for women recently diagnosed with cancer
in only one breast, the MRI is used to decide between
the two “hypotheses”
H0: woman has cancer in the other breast
Ha: woman does not have cancer in the other breast
(Although these are not hypotheses about a population
characteristic, this exercise illustrates the definitions of
Type I and Type II errors.)
a. One possible error would be deciding that a woman
who does have cancer in the other breast is cancer-free. Is this a Type I or a Type II error? Use the information in the table to approximate the probability of this type of error.
b. There is a second type of error that is possible in this
setting. Describe this error and use the information
in the given table to approximate the probability of
this type of error.
10.14 Medical personnel are required to report suspected cases of child abuse. Because some diseases have
symptoms that mimic those of child abuse, doctors who
see a child with these symptoms must decide between
two competing hypotheses:
H0: symptoms are due to child abuse
Ha: symptoms are due to disease
(Although these are not hypotheses about a population
characteristic, this exercise illustrates the definitions of
Type I and Type II errors.) The article “Blurred Line Between
Illness, Abuse Creates Problem for Authorities”
(Macon Telegraph, February 28, 2000) included the
following quote from a doctor in Atlanta regarding the
consequences of making an incorrect decision: “If it’s disease, the worst you have is an angry family. If it is abuse,
the other kids (in the family) are in deadly danger.”
a. For the given hypotheses, describe Type I and Type
II errors.
b. Based on the quote regarding consequences of the
two kinds of error, which type of error does the doctor quoted consider more serious? Explain.
10.15 Ann Landers, in her advice column of October 24, 1994 (San Luis Obispo Telegram-Tribune), described the reliability of DNA paternity testing as follows: “To get a completely accurate result, you would
have to be tested, and so would (the man) and your
mother. The test is 100% accurate if the man is not the
father and 99.9% accurate if he is.”
a. Consider using the results of DNA paternity testing
to decide between the following two hypotheses:
H0: a particular man is the father
Ha: a particular man is not the father
In the context of this problem, describe Type I and
Type II errors. (Although these are not hypotheses
about a population characteristic, this exercise illustrates the definitions of Type I and Type II errors.)
b. Based on the information given, what are the values
of α, the probability of a Type I error, and β, the
probability of a Type II error?
c. Ann Landers also stated, “If the mother is not tested,
there is a 0.8% chance of a false positive.” For the
hypotheses given in Part (a), what is the value of β if
the decision is based on DNA testing in which the
mother is not tested?
10.16 A television manufacturer claims that (at least)
90% of its TV sets will need no service during the first
3 years of operation. A consumer agency wishes to check
this claim, so it obtains a random sample of n = 100
purchasers and asks each whether the set purchased
needed repair during the first 3 years after purchase. Let
p̂ be the sample proportion of responses indicating no
repair (so that no repair is identified with a success). Let
p denote the actual proportion of successes for all sets
made by this manufacturer. The agency does not want to
claim false advertising unless sample evidence strongly
suggests that p < .9. The appropriate hypotheses are
then H0: p = .9 versus Ha: p < .9.
a. In the context of this problem, describe Type I and
Type II errors, and discuss the possible consequences
of each.
b. Would you recommend a test procedure that uses
α = .10 or one that uses α = .01? Explain.
10.17 A manufacturer of hand-held calculators receives
large shipments of printed circuits from a supplier. It is
too costly and time-consuming to inspect all incoming
circuits, so when each shipment arrives, a sample is selected for inspection. Information from the sample is
then used to test H0: p = .01 versus Ha: p > .01, where
p is the actual proportion of defective circuits in the shipment. If the null hypothesis is not rejected, the shipment
is accepted, and the circuits are used in the production of
calculators. If the null hypothesis is rejected, the entire
shipment is returned to the supplier because of inferior
quality. (A shipment is defined to be of inferior quality
if it contains more than 1% defective circuits.)
a. In this context, define Type I and Type II errors.
b. From the calculator manufacturer’s point of view,
which type of error is considered more serious?
c. From the printed circuit supplier’s point of view,
which type of error is considered more serious?
10.18 Water samples are taken from water used for
cooling as it is being discharged from a power plant into
a river. It has been determined that as long as the mean
temperature of the discharged water is at most 150°F,
there will be no negative effects on the river’s ecosystem.
To investigate whether the plant is in compliance with
regulations that prohibit a mean discharge water temperature above 150°F, researchers will take 50 water
samples at randomly selected times and record the temperature of each sample. The resulting data will be used
to test the hypotheses H0: μ = 150°F versus Ha: μ >
150°F. In the context of this example, describe Type I
and Type II errors. Which type of error would you consider more serious? Explain.
10.19 Occasionally, warning flares of the type contained in most automobile emergency kits fail to ignite.
A consumer advocacy group wants to investigate a
claim against a manufacturer of flares brought by a person who claims that the proportion of defective flares is
much higher than the value of .1 claimed by the manufacturer. A large number of flares will be tested, and the
results will be used to decide between H0: p = .1 and
Ha: p > .1, where p represents the proportion of defective flares made by this manufacturer. If H0 is rejected,
charges of false advertising will be filed against the
manufacturer.
a. Explain why the alternative hypothesis was chosen to
be Ha: p > .1.
b. In this context, describe Type I and Type II errors,
and discuss the consequences of each.
10.20 Suppose that you are an inspector for the Fish
and Game Department and that you are given the task
of determining whether to prohibit fishing along part of
the Oregon coast. You will close an area to fishing if it is
determined that fish in that region have an unacceptably
high mercury content.
a. Assuming that a mercury concentration of 5 ppm is
considered the maximum safe concentration, which
of the following pairs of hypotheses would you test:
H0: μ = 5 versus Ha: μ > 5
or
H0: μ = 5 versus Ha: μ < 5
Give the reasons for your choice.
b. Would you prefer a significance level of .1 or .01 for
your test? Explain.
10.21 The National Cancer Institute conducted a 2-year
study to determine whether cancer death rates for areas
near nuclear power plants are higher than for areas without
nuclear facilities (San Luis Obispo Telegram-Tribune,
September 17, 1990). A spokesperson for the Cancer Institute said, “From the data at hand, there was no convincing
evidence of any increased risk of death from any of the
cancers surveyed due to living near nuclear facilities. However, no study can prove the absence of an effect.”
a. Let p denote the proportion of the population in
areas near nuclear power plants who die of cancer
during a given year. The researchers at the Cancer
Institute might have considered the two rival hypotheses of the form
H0: p = value for areas without nuclear facilities
Ha: p > value for areas without nuclear facilities
Did the researchers reject H0 or fail to reject H0?
b. If the Cancer Institute researchers were incorrect
in their conclusion that there is no increased cancer
risk associated with living near a nuclear power
plant, are they making a Type I or a Type II error?
Explain.
c. Comment on the spokesperson’s last statement that
no study can prove the absence of an effect. Do you
agree with this statement?
10.22 An automobile manufacturer is considering using robots for part of its assembly process. Converting to
robots is an expensive process, so it will be undertaken
only if there is strong evidence that the proportion of
defective installations is lower for the robots than for human assemblers. Let p denote the proportion of defective
installations for the robots. It is known that human assemblers have a defect proportion of .02.
a. Which of the following pairs of hypotheses should
the manufacturer test:
H0: p = .02 versus Ha: p < .02
or
H0: p = .02 versus Ha: p > .02
Explain your answer.
b. In the context of this exercise, describe Type I and
Type II errors.
c. Would you prefer a test with α = .01 or α = .1?
Explain your reasoning.
10.3
Large-Sample Hypothesis Tests for a Population Proportion
Now that the basic concepts of hypothesis testing have been introduced, we are ready
to turn our attention to the development of procedures for using sample information
to decide between a null and an alternative hypothesis. There are two possible conclusions: We either reject H0 or we fail to reject H0. The fundamental idea behind
hypothesis-testing procedures is this: We reject the null hypothesis if the observed sample
is very unlikely to have occurred when H0 is true.
In this section, we consider testing hypotheses about a population proportion
when the sample size n is large. Let p denote the proportion of individuals or objects
in a specified population that possess a certain property. A random sample of n individuals or objects is selected from the population. The sample proportion
$$\hat{p} = \frac{\text{number in the sample that possess the property}}{n}$$
is the natural statistic for making inferences about p.
The large-sample test procedure is based on the same properties of the sampling
distribution of p̂ that were used previously to obtain a confidence interval for p,
namely:
1. $\mu_{\hat{p}} = p$
2. $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
3. When n is large, the sampling distribution of p̂ is approximately normal.
These three results imply that the standardized variable
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$$
has approximately a standard normal distribution when n is large. Example 10.8
shows how this information allows us to make a decision.
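The claim that the standardized variable behaves approximately like a standard normal variable can be checked informally by simulation. The sketch below is a rough illustration only: the population proportion .3, the sample size 500, and the 3000 repetitions are arbitrary assumptions, and `simulate_z_values` is a hypothetical helper name.

```python
import math
import random
from statistics import mean, pstdev


def simulate_z_values(p, n, reps, seed=1):
    """Standardize the sample proportion from `reps` random samples of size n
    drawn from a population whose actual proportion is p."""
    rng = random.Random(seed)
    sd_p_hat = math.sqrt(p * (1 - p) / n)   # property 2: std. dev. of p-hat
    z_values = []
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        z_values.append((p_hat - p) / sd_p_hat)   # property 1: mean of p-hat is p
    return z_values


z_values = simulate_z_values(p=0.3, n=500, reps=3000)
z_mean = mean(z_values)
z_sd = pstdev(z_values)
share_within = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(f"mean of z: {z_mean:.3f}, standard deviation of z: {z_sd:.3f}")
print(f"share of z values between -1.96 and 1.96: {share_within:.3f}")
```

For a standard normal variable the mean is 0, the standard deviation is 1, and about 95% of values fall between -1.96 and 1.96, so simulated summaries close to those figures are consistent with the three properties listed above.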
EXAMPLE 10.8
Impact of Food Labels
In June 2006, an Associated Press survey was conducted to investigate how people
use the nutritional information provided on food package labels. Interviews were
conducted with 1003 randomly selected adult Americans, and each participant was
asked a series of questions, including the following two:
Question 1: When purchasing packaged food, how often do you check the nutrition labeling on the package?
Question 2: How often do you purchase foods that are bad for you, even after
you’ve checked the nutrition labels?
It was reported that 582 responded “frequently” to the question about checking labels
and 441 responded very often or somewhat often to the question about purchasing
“bad” foods even after checking the label.
Let’s start by looking at the responses to the first question. Based on these data,
is it reasonable to conclude that a majority of adult Americans frequently check the
nutritional labels when purchasing packaged foods? We can answer this question by
testing hypotheses, where
p = true proportion of adult Americans who frequently check nutritional labels
H0: p = .5
Ha: p > .5 (The proportion of adult Americans who frequently check nutritional labels is greater than .5. That is, more than half (a majority) frequently check nutritional labels.)
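As a sketch of how the standardized variable applies to these hypotheses, the lines below compute the sample proportion and the value of z from the survey counts reported above (582 of 1003 respondents), taking p = .5 as specified by H0. The variable names are illustrative, and the final line, which computes the chance of a z value at least this large when H0 is true, is an assumption about how the comparison might be made rather than a step quoted from the example.

```python
import math
from statistics import NormalDist

n = 1003                   # adults interviewed
checked_frequently = 582   # responded "frequently" to Question 1
p0 = 0.5                   # hypothesized proportion under H0

p_hat = checked_frequently / n
# Standardize p-hat using the mean and standard deviation it would have if H0 were true.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
# Chance of a z value this large or larger when H0 is true (standard normal upper tail).
upper_tail = 1 - NormalDist().cdf(z)

print(f"sample proportion: {p_hat:.3f}")
print(f"z: {z:.2f}")
print(f"upper-tail probability if H0 is true: {upper_tail:.1e}")
```

A sample proportion this many standard deviations above .5 would be very unlikely if H0 were true, which is the kind of evidence the fundamental idea stated earlier treats as grounds for rejecting H0.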