10.2 Errors in Hypothesis Testing

... different types of errors that might be made when making a decision in a hypothesis testing problem. One type of error involves rejecting H0 even though the null hypothesis is true. The second type of error results from failing to reject H0 when it is false. These errors are known as Type I and Type II errors, respectively.

DEFINITION

Type I error: the error of rejecting H0 when H0 is true

Type II error: the error of failing to reject H0 when H0 is false

The only way to guarantee that neither type of error occurs is to base the decision

on a census of the entire population. Risk of error is the price researchers pay for basing the decision on sample data.

EXAMPLE 10.4

On-Time Arrivals

The U.S. Bureau of Transportation Statistics reports that for 2009, 72% of all

domestic passenger flights arrived on time (meaning within 15 minutes of the scheduled arrival). Suppose that an airline with a poor on-time record decides to offer its

employees a bonus if, in an upcoming month, the airline’s proportion of on-time

flights exceeds the overall 2009 industry rate of .72. Let p be the actual proportion of

the airline’s flights that are on time during the month of interest. A random sample

of flights might be selected and used as a basis for choosing between

H0: p = .72 and Ha: p > .72

In this context, a Type I error (rejecting a true H0) results in the airline rewarding its

employees when in fact the actual proportion of on-time flights did not exceed .72.

A Type II error (not rejecting a false H0) results in the airline employees not receiving

a reward that they deserved.

EXAMPLE 10.5

Slowing the Growth of Tumors

In 2004, Vertex Pharmaceuticals, a biotechnology company, issued a press release

announcing that it had filed an application with the Food and Drug Administration

to begin clinical trials of an experimental drug VX-680 that had been found to reduce

the growth rate of pancreatic and colon cancer tumors in animal studies (New York

Times, February 24, 2004).

Let μ denote the true mean growth rate of tumors for patients receiving the experimental drug. Data resulting from the planned clinical trials can be used to test

H0: μ = mean growth rate of tumors for patients not taking the experimental drug

versus

Ha: μ < mean growth rate of tumors for patients not taking the experimental drug

The null hypothesis states that the experimental drug is not effective—that the mean

growth rate of tumors for patients receiving the experimental drug is the same as for

patients who do not take the experimental drug. The alternative hypothesis states that

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.


Chapter 10

Hypothesis Testing Using a Single Sample

the experimental drug is effective in reducing the mean growth rate of tumors. In this

context, a Type I error consists of incorrectly concluding that the experimental drug

is effective in slowing the growth rate of tumors. A potential consequence of making

a Type I error would be that the company would continue to devote resources to the

development of the drug when it really is not effective. A Type II error consists of

concluding that the experimental drug is ineffective when in fact the mean growth

rate of tumors is reduced. A potential consequence of making a Type II error is that

the company might abandon development of a drug that was effective.

Examples 10.4 and 10.5 illustrate the two different types of error that might occur when testing hypotheses. Type I and Type II errors—and the associated consequences of making such errors—are quite different. The accompanying box introduces the terminology and notation used to describe error probabilities.

DEFINITION

The probability of a Type I error is denoted by α and is called the significance level of the test. For example, a test with α = .01 is said to have a significance level of .01.

The probability of a Type II error is denoted by β.

EXAMPLE 10.6

Blood Test for Ovarian Cancer

Women with ovarian cancer usually are not diagnosed until the disease is in an advanced stage, when it is most difficult to treat. The paper “Diagnostic Markers for

Early Detection of Ovarian Cancer” (Clinical Cancer Research [2008]: 1065–1072)

describes a new approach to diagnosing ovarian cancer that is based on using six different blood biomarkers (a blood biomarker is a biochemical characteristic that is

measured in laboratory testing). The authors report the following results using the six

biomarkers:

• For 156 women known to have ovarian cancer, the biomarkers correctly identified 151 as having ovarian cancer.

• For 362 women known not to have ovarian cancer, the biomarkers correctly

identified 360 of them as being ovarian cancer free.

We can think of using this blood test to choose between two hypotheses:

H0: woman has ovarian cancer

Ha: woman does not have ovarian cancer

Note that although these are not “statistical hypotheses” (statements about a population

characteristic), the possible decision errors are analogous to Type I and Type II errors.

In this situation, believing that a woman with ovarian cancer is cancer free would

be a Type I error—rejecting the hypothesis of ovarian cancer when it is in fact true.

Believing that a woman who is actually cancer free does have ovarian cancer is a

Type II error—not rejecting the null hypothesis when it is in fact false. Based on the

study results, we can estimate the error probabilities. The probability of a Type I error, α, is approximately 5/156 = .032. The probability of a Type II error, β, is approximately 2/362 = .006.
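The arithmetic behind these two estimates can be sketched in a few lines of Python. The counts are the ones reported in the example; the variable names are illustrative choices, not part of the study.

```python
# Counts reported in the example.
cancer_total = 156        # women known to have ovarian cancer
cancer_detected = 151     # of these, correctly identified as having cancer
healthy_total = 362       # women known not to have ovarian cancer
healthy_cleared = 360     # of these, correctly identified as cancer free

# Type I error: H0 (woman has ovarian cancer) is true, but it is rejected.
alpha_hat = (cancer_total - cancer_detected) / cancer_total

# Type II error: H0 is false (woman is cancer free), but it is not rejected.
beta_hat = (healthy_total - healthy_cleared) / healthy_total

print(f"estimated Type I error probability:  {alpha_hat:.3f}")
print(f"estimated Type II error probability: {beta_hat:.3f}")
```

Each estimate is simply the fraction of misclassified women within the group for which the corresponding hypothesis status is known.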


The ideal test procedure would result in both α = 0 and β = 0. However, if we must base our decision on incomplete information (a sample rather than a census), it is impossible to achieve this ideal. The standard test procedures allow us to control α, but they provide no direct control over β. Because α represents the probability of rejecting a true null hypothesis, selecting a significance level α = .05 results in a test procedure that, used over and over with different samples, rejects a true H0 about 5 times in 100. Selecting α = .01 results in a test procedure with a Type I error rate of 1% in long-term repeated use. Choosing a small value for α implies that the user wants a procedure for which the risk of a Type I error is quite small.
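The long-run reading of the significance level can be checked by simulation. The sketch below is an illustration only: it borrows the null value .72 from Example 10.4, assumes a hypothetical sample size of 400, and uses a large-sample z rule (reject H0 when the standardized sample proportion exceeds the upper-tail normal cutoff) that this section has not formally introduced. The function name and all settings are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

def simulate_type1_rate(p0, n, alpha, trials, seed=0):
    """Fraction of samples, generated with H0 true, in which H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail cutoff for this level
    se = math.sqrt(p0 * (1 - p0) / n)          # sd of the sample proportion under H0
    rejections = 0
    for _ in range(trials):
        # One simulated sample: count successes when the true proportion is p0.
        successes = sum(rng.random() < p0 for _ in range(n))
        z = (successes / n - p0) / se
        if z > z_crit:                         # evidence favors Ha: p > p0
            rejections += 1
    return rejections / trials

rate_05 = simulate_type1_rate(p0=0.72, n=400, alpha=0.05, trials=5000)
rate_01 = simulate_type1_rate(p0=0.72, n=400, alpha=0.01, trials=5000)
print(f"level .05: rejected a true H0 in {rate_05:.1%} of samples")
print(f"level .01: rejected a true H0 in {rate_01:.1%} of samples")
```

Because the number of on-time flights in a sample is a whole number, the simulated rates land near, not exactly on, the nominal levels.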

One question arises naturally at this point: If we can select α, the probability of making a Type I error, why would we ever select α = .05 rather than α = .01? Why not always select a very small value for α? To achieve a small probability of making a Type I error, we would need the corresponding test procedure to require the evidence against H0 to be very strong before the null hypothesis can be rejected. Although this makes a Type I error unlikely, it increases the risk of a Type II error (not rejecting H0 when it should have been rejected). Frequently the investigator must balance the consequences of Type I and Type II errors. If a Type II error has serious consequences, it may be a good idea to select a somewhat larger value for α.

In general, there is a compromise between small α and small β, leading to the following widely accepted principle for specifying a test procedure.

After assessing the consequences of Type I and Type II errors, identify the largest α that is tolerable for the problem. Then employ a test procedure that uses this maximum acceptable value, rather than anything smaller, as the level of significance (because using a smaller α increases β). In other words, don’t choose α to be smaller than it needs to be.
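The trade-off behind this principle can be made concrete with a small calculation. The sketch below is a hedged illustration, not a procedure from this section: it assumes an upper-tailed large-sample test of H0: p = .72 (the value from Example 10.4), a hypothetical sample size of 400, and a hypothetical true proportion of .78, and it relies on the normal approximation for the sample proportion. The function name is invented for the example.

```python
import math
from statistics import NormalDist

def type2_probability(p0, p_true, n, alpha):
    """Approximate P(fail to reject H0: p = p0) when the true proportion is p_true."""
    std_normal = NormalDist()
    # H0 is rejected only when the sample proportion exceeds this cutoff.
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)
    # Spread of the sample proportion when the true value is p_true.
    sd_true = math.sqrt(p_true * (1 - p_true) / n)
    # A Type II error occurs when the sample proportion falls at or below the cutoff.
    return std_normal.cdf((cutoff - p_true) / sd_true)

betas = {alpha: type2_probability(0.72, 0.78, 400, alpha)
         for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"significance level {alpha:.2f} -> Type II error probability about {beta:.3f}")
```

Under these assumptions, each reduction in the significance level raises the cutoff for rejection, so the probability of missing a real improvement grows.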

EXAMPLE 10.7

The Environmental Protection Agency (EPA) has adopted what is known as the Lead

and Copper Rule, which defines drinking water as unsafe if the concentration of lead

is 15 parts per billion (ppb) or greater or if the concentration of copper is 1.3 parts

per million (ppm) or greater. With μ denoting the mean concentration of lead, the manager of a community water system might use lead level measurements from a sample of water specimens to test

H0: μ = 15 versus Ha: μ < 15

The null hypothesis (which also implicitly includes the μ > 15 case) states that the

mean lead concentration is excessive by EPA standards. The alternative hypothesis

states that the mean lead concentration is at an acceptable level and that the water

system meets EPA standards for lead.

In this context, a Type I error leads to the conclusion that a water source meets

EPA standards for lead when in fact it does not. Possible consequences of this type of

error include health risks associated with excessive lead consumption (for example,

increased blood pressure, hearing loss, and, in severe cases, anemia and kidney damage). A Type II error is to conclude that the water does not meet EPA standards for

lead when in fact it actually does. Possible consequences of a Type II error include

elimination of a community water source. Because a Type I error might result in


potentially serious public health risks, a small value of α (Type I error probability), such as α = .01, could be selected. Of course, selecting a small value for α increases the risk of a Type II error. If the community has only one water source, a Type II error could also have very serious consequences for the community, and we might want to rethink the choice of α.

EXERCISES 10.12-10.22

10.12 Researchers at the University of Washington and

Harvard University analyzed records of breast cancer

screening and diagnostic evaluations (“Mammogram

Cancer Scares More Frequent than Thought,” USA

Today, April 16, 1998). Discussing the benefits and

downsides of the screening process, the article states that,

although the rate of false-positives is higher than previously thought, if radiologists were less aggressive in following up on suspicious tests, the rate of false-positives

would fall but the rate of missed cancers would rise. Suppose that such a screening test is used to decide between

a null hypothesis of H0: no cancer is present and an alternative hypothesis of Ha: cancer is present. (Although

these are not hypotheses about a population characteristic, this exercise illustrates the definitions of Type I and

Type II errors.)

a. Would a false-positive (thinking that cancer is present when in fact it is not) be a Type I error or a Type

II error?

b. Describe a Type I error in the context of this problem, and discuss the consequences of making a Type

I error.

c. Describe a Type II error in the context of this problem, and discuss the consequences of making a Type

II error.

d. What aspect of the relationship between the probability of Type I and Type II errors is being described

by the statement in the article that if radiologists

were less aggressive in following up on suspicious

tests, the rate of false-positives would fall but the rate

of missed cancers would rise?

10.13 The paper “MRI Evaluation of the Contralateral Breast in Women with Recently Diagnosed

Breast Cancer” (New England Journal of Medicine

[2007]: 1295–1303) describes a study of the use of MRI

(Magnetic Resonance Imaging) exams in the diagnosis of

breast cancer. The purpose of the study was to determine

if MRI exams do a better job than mammograms of determining if women who have recently been diagnosed

with cancer in one breast have cancer in the other breast.

The study participants were 969 women who had been

diagnosed with cancer in one breast and for whom a

mammogram did not detect cancer in the other breast.

These women had an MRI exam of the other breast, and

121 of those exams indicated possible cancer. After undergoing biopsies, it was determined that 30 of the 121

did in fact have cancer in the other breast, whereas 91

did not. The women were all followed for one year, and

three of the women for whom the MRI exam did not

indicate cancer in the other breast were subsequently

diagnosed with cancer that the MRI did not detect. The

accompanying table summarizes this information.

                           Cancer Present   Cancer Not Present   Total
MRI Positive for Cancer          30                 91             121
MRI Negative for Cancer           3                845             848
Total                            33                936             969

Suppose that for women recently diagnosed with cancer

in only one breast, the MRI is used to decide between

the two “hypotheses”

H0: woman has cancer in the other breast

Ha: woman does not have cancer in the other breast

(Although these are not hypotheses about a population

characteristic, this exercise illustrates the definitions of

Type I and Type II errors.)

a. One possible error would be deciding that a woman who does have cancer in the other breast is cancer-free. Is this a Type I or a Type II error? Use the information in the table to approximate the probability of this type of error.

b. There is a second type of error that is possible in this

setting. Describe this error and use the information

in the given table to approximate the probability of

this type of error.


10.14 Medical personnel are required to report suspected cases of child abuse. Because some diseases have

symptoms that mimic those of child abuse, doctors who

see a child with these symptoms must decide between

two competing hypotheses:

H0: symptoms are due to child abuse

Ha: symptoms are due to disease

(Although these are not hypotheses about a population

characteristic, this exercise illustrates the definitions of

Type I and Type II errors.) The article “Blurred Line Between Illness, Abuse Creates Problem for Authorities”

(Macon Telegraph, February 28, 2000) included the

following quote from a doctor in Atlanta regarding the

consequences of making an incorrect decision: “If it’s disease, the worst you have is an angry family. If it is abuse,

the other kids (in the family) are in deadly danger.”

a. For the given hypotheses, describe Type I and Type

II errors.

b. Based on the quote regarding consequences of the

two kinds of error, which type of error does the doctor quoted consider more serious? Explain.

10.15 Ann Landers, in her advice column of October 24, 1994 (San Luis Obispo Telegram-Tribune), described the reliability of DNA paternity testing as follows: “To get a completely accurate result, you would

have to be tested, and so would (the man) and your

mother. The test is 100% accurate if the man is not the

father and 99.9% accurate if he is.”

a. Consider using the results of DNA paternity testing

to decide between the following two hypotheses:

H0: a particular man is the father

Ha: a particular man is not the father

In the context of this problem, describe Type I and

Type II errors. (Although these are not hypotheses

about a population characteristic, this exercise illustrates the definitions of Type I and Type II errors.)

b. Based on the information given, what are the values of α, the probability of a Type I error, and β, the probability of a Type II error?

c. Ann Landers also stated, “If the mother is not tested, there is a 0.8% chance of a false positive.” For the hypotheses given in Part (a), what is the value of β if the decision is based on DNA testing in which the mother is not tested?

10.16 A television manufacturer claims that (at least) 90% of its TV sets will need no service during the first 3 years of operation. A consumer agency wishes to check this claim, so it obtains a random sample of n = 100 purchasers and asks each whether the set purchased needed repair during the first 3 years after purchase. Let p̂ be the sample proportion of responses indicating no repair (so that no repair is identified with a success). Let p denote the actual proportion of successes for all sets made by this manufacturer. The agency does not want to claim false advertising unless sample evidence strongly suggests that p < .9. The appropriate hypotheses are then H0: p = .9 versus Ha: p < .9.

a. In the context of this problem, describe Type I and Type II errors, and discuss the possible consequences of each.

b. Would you recommend a test procedure that uses α = .10 or one that uses α = .01? Explain.

10.17 A manufacturer of hand-held calculators receives large shipments of printed circuits from a supplier. It is too costly and time-consuming to inspect all incoming circuits, so when each shipment arrives, a sample is selected for inspection. Information from the sample is then used to test H0: p = .01 versus Ha: p > .01, where p is the actual proportion of defective circuits in the shipment. If the null hypothesis is not rejected, the shipment is accepted, and the circuits are used in the production of calculators. If the null hypothesis is rejected, the entire shipment is returned to the supplier because of inferior quality. (A shipment is defined to be of inferior quality if it contains more than 1% defective circuits.)

a. In this context, define Type I and Type II errors.

b. From the calculator manufacturer’s point of view, which type of error is considered more serious?

c. From the printed circuit supplier’s point of view, which type of error is considered more serious?

10.18 Water samples are taken from water used for cooling as it is being discharged from a power plant into a river. It has been determined that as long as the mean temperature of the discharged water is at most 150°F, there will be no negative effects on the river’s ecosystem. To investigate whether the plant is in compliance with regulations that prohibit a mean discharge water temperature above 150°F, researchers will take 50 water samples at randomly selected times and record the temperature of each sample. The resulting data will be used to test the hypotheses H0: μ = 150°F versus Ha: μ > 150°F. In the context of this example, describe Type I and Type II errors. Which type of error would you consider more serious? Explain.

10.19 Occasionally, warning flares of the type contained in most automobile emergency kits fail to ignite. A consumer advocacy group wants to investigate a claim against a manufacturer of flares brought by a person who claims that the proportion of defective flares is much higher than the value of .1 claimed by the manufacturer. A large number of flares will be tested, and the results will be used to decide between H0: p = .1 and Ha: p > .1, where p represents the proportion of defective flares made by this manufacturer. If H0 is rejected, charges of false advertising will be filed against the manufacturer.

a. Explain why the alternative hypothesis was chosen to be Ha: p > .1.

b. In this context, describe Type I and Type II errors, and discuss the consequences of each.

10.20 Suppose that you are an inspector for the Fish and Game Department and that you are given the task of determining whether to prohibit fishing along part of the Oregon coast. You will close an area to fishing if it is determined that fish in that region have an unacceptably high mercury content.

a. Assuming that a mercury concentration of 5 ppm is considered the maximum safe concentration, which of the following pairs of hypotheses would you test:

H0: μ = 5 versus Ha: μ > 5

or

H0: μ = 5 versus Ha: μ < 5

Give the reasons for your choice.

b. Would you prefer a significance level of .1 or .01 for your test? Explain.

10.21 The National Cancer Institute conducted a 2-year study to determine whether cancer death rates for areas near nuclear power plants are higher than for areas without nuclear facilities (San Luis Obispo Telegram-Tribune, September 17, 1990). A spokesperson for the Cancer Institute said, “From the data at hand, there was no convincing evidence of any increased risk of death from any of the cancers surveyed due to living near nuclear facilities. However, no study can prove the absence of an effect.”

a. Let p denote the proportion of the population in areas near nuclear power plants who die of cancer during a given year. The researchers at the Cancer Institute might have considered the two rival hypotheses of the form

H0: p = value for areas without nuclear facilities

Ha: p > value for areas without nuclear facilities

Did the researchers reject H0 or fail to reject H0?

b. If the Cancer Institute researchers were incorrect in their conclusion that there is no increased cancer risk associated with living near a nuclear power plant, are they making a Type I or a Type II error? Explain.

c. Comment on the spokesperson’s last statement that no study can prove the absence of an effect. Do you agree with this statement?

10.22 An automobile manufacturer is considering using robots for part of its assembly process. Converting to robots is an expensive process, so it will be undertaken only if there is strong evidence that the proportion of defective installations is lower for the robots than for human assemblers. Let p denote the proportion of defective installations for the robots. It is known that human assemblers have a defect proportion of .02.

a. Which of the following pairs of hypotheses should the manufacturer test:

H0: p = .02 versus Ha: p < .02

or

H0: p = .02 versus Ha: p > .02

b. In the context of this exercise, describe Type I and Type II errors.

c. Would you prefer a test with α = .01 or α = .1?

10.3 Large-Sample Hypothesis Tests for a Population Proportion

Now that the basic concepts of hypothesis testing have been introduced, we are ready

to turn our attention to the development of procedures for using sample information

to decide between a null and an alternative hypothesis. There are two possible conclusions: We either reject H0 or we fail to reject H0. The fundamental idea behind


hypothesis-testing procedures is this: We reject the null hypothesis if the observed sample

is very unlikely to have occurred when H0 is true.

In this section, we consider testing hypotheses about a population proportion

when the sample size n is large. Let p denote the proportion of individuals or objects

in a specified population that possess a certain property. A random sample of n individuals or objects is selected from the population. The sample proportion

$$\hat{p} = \frac{\text{number in the sample that possess the property}}{n}$$

is the natural statistic for making inferences about p.

The large-sample test procedure is based on the same properties of the sampling distribution of p̂ that were used previously to obtain a confidence interval for p, namely:

1. $\mu_{\hat{p}} = p$

2. $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$

3. When n is large, the sampling distribution of p̂ is approximately normal.

These three results imply that the standardized variable

$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$

has approximately a standard normal distribution when n is large. Example 10.8 shows how this information allows us to make a decision.
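The three properties can be checked empirically with a short simulation. The following is a minimal sketch under assumed settings: the population proportion .30, the sample size 200, and the number of simulated samples are hypothetical values chosen for illustration, and the function name is invented for the example.

```python
import math
import random
from statistics import mean, pstdev

def simulate_sample_proportions(p, n, samples, seed=1):
    """Draw `samples` random samples of size n and return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(samples)]

p, n = 0.30, 200    # hypothetical population proportion and sample size
p_hats = simulate_sample_proportions(p, n, samples=4000)

# Properties 1 and 2: center and spread of the sampling distribution.
theoretical_sd = math.sqrt(p * (1 - p) / n)

# Property 3: standardized values should behave roughly like a standard normal
# variable, for which about 95% of values fall between -1.96 and 1.96.
z_values = [(p_hat - p) / theoretical_sd for p_hat in p_hats]
share_within = sum(abs(z) < 1.96 for z in z_values) / len(z_values)

print(f"mean of simulated sample proportions: {mean(p_hats):.4f} (p = {p})")
print(f"sd of simulated sample proportions:   {pstdev(p_hats):.4f} "
      f"(formula gives {theoretical_sd:.4f})")
print(f"share of z values between -1.96 and 1.96: {share_within:.3f}")
```

The simulated mean and standard deviation should sit close to the values given by properties 1 and 2, and the share of standardized values within 1.96 of zero should be close to .95.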


EXAMPLE 10.8

Impact of Food Labels

In June 2006, an Associated Press survey was conducted to investigate how people

use the nutritional information provided on food package labels. Interviews were

conducted with 1003 randomly selected adult Americans, and each participant was

asked a series of questions, including the following two:

Question 1: When purchasing packaged food, how often do you check the nutrition labeling on the package?

Question 2: How often do you purchase foods that are bad for you, even after

you’ve checked the nutrition labels?

It was reported that 582 responded “frequently” to the question about checking labels and 441 responded “very often” or “somewhat often” to the question about purchasing “bad” foods even after checking the label.

Let’s start by looking at the responses to the first question. Based on these data,

is it reasonable to conclude that a majority of adult Americans frequently check the

nutritional labels when purchasing packaged foods? We can answer this question by

testing hypotheses, where

p = true proportion of adult Americans who frequently check nutritional labels

H0: p = .5

Ha: p > .5 (The proportion of adult Americans who frequently check nutritional labels is greater than .5. That is, more than half (a majority) frequently check nutritional labels.)
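One way to see how the standardized variable applies to these hypotheses is to compute it for the Question 1 data, taking the H0 value p = .5 as the true proportion. This is a minimal sketch: the variable names are illustrative choices, and the upper-tail area is included only to show how unusual the observed sample would be if H0 were true.

```python
import math
from statistics import NormalDist

n = 1003            # adults interviewed
successes = 582     # answered "frequently" to Question 1
p0 = 0.5            # value of p specified by H0

p_hat = successes / n
# Standardize the sample proportion using the H0 value of p.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Area to the right of z under the standard normal curve.
upper_tail = 1 - NormalDist().cdf(z)

print(f"sample proportion = {p_hat:.3f}, z = {z:.2f}")   # 0.580 and 5.08
print(f"upper-tail area = {upper_tail:.1e}")
```

A sample proportion about five standard deviations above .5 would be very unlikely if H0 were true, which is the kind of evidence the fundamental idea of hypothesis testing treats as grounds for rejecting H0.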
