
# 10.5 Power and Probability of Type II Error


Chapter 10 Hypothesis Testing Using a Single Sample

FIGURE 10.4 Sampling distribution of x̄ when μ = 500, 505, 525.

When we consider the probability of rejecting the null hypothesis, we are looking

at what statisticians refer to as the power of the test.

The power of a test is the probability of rejecting the null hypothesis.

From the previous discussion, it should be apparent that when a hypothesis about a population mean is being tested, the power of the test depends on the true value of the population mean, μ. Because the actual value of μ is unknown (if we knew the value of μ we wouldn’t be doing the hypothesis test!), we cannot know what the power is for the actual value of μ. It is possible, however, to gain some insight into the power of a test by looking at a number of “what if” scenarios. For example, we might ask, What is the power if the actual mean is 525? or What is the power if the actual mean is 505? and so on. That is, we can determine the power at μ = 525, the power at μ = 505, and the power at any other value of interest. Although it is technically possible to consider power when the null hypothesis is true, an investigator is usually concerned about the power only at values for which the null hypothesis is false.

In general, when testing a hypothesis about a population characteristic, there are

three factors that influence the power of the test:

1. The size of the difference between the actual value of the population characteristic and the hypothesized value (the value that appears in the null hypothesis);

2. The choice of significance level, α, for the test; and

3. The sample size.

Effect of Various Factors on the Power of a Test

1. The larger the size of the discrepancy between the hypothesized value and the actual value of the population characteristic, the higher the power.

2. The larger the significance level, α, the higher the power of the test.

3. The larger the sample size, the higher the power of the test.

Let’s consider each of the statements in the box above. The first statement has

already been discussed in the context of the textbook example. Because power is the

probability of rejecting the null hypothesis, it makes sense that the power will be

higher when the actual value of a population characteristic is quite different from the

hypothesized value than when it is close to that value.

The effect of significance level on power is not quite as obvious. To understand the relationship between power and significance level, it helps to see the relationship between power and β, the probability of a Type II error.

When H0 is false, power = 1 − β.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

10.5 Power and Probability of Type II Error


This relationship follows from the definitions of power and Type II error. A Type II error results from not rejecting a false H0. Because power is the probability of rejecting H0, it follows that when H0 is false

power = probability of rejecting a false H0

power = 1 − probability of not rejecting a false H0

power = 1 − β

Recall from Section 10.2 that the choice of α, the Type I error probability, affects the value of β, the Type II error probability. Choosing a larger value for α results in a smaller value for β (and therefore a larger value for 1 − β). In terms of power, this means that choosing a larger value for α results in a larger value for the power of the test. That is, the larger the Type I error probability we are willing to tolerate, the more likely it is that the test will be able to detect any particular departure from H0.

The third factor that affects the power of a test is the sample size. When H0 is

false, the power of a test is the probability that we will in fact “detect” that H0 is false

and, based on the observed sample, reject H0. Intuition suggests that we will be more

likely to detect a departure from H0 with a large sample than with a small sample.

This is in fact the case—the larger the sample size, the higher the power.

Consider testing the hypotheses presented previously:

H0: μ = 500 versus Ha: μ > 500

The observations about power imply the following, for example:

1. For any value of μ exceeding 500, the power of a test based on a sample of size 100 is higher than the power of a test based on a sample of size 75 (assuming the same significance level).

2. For any value of μ exceeding 500, the power of a test using a significance level of .05 is higher than the power of a test using a significance level of .01 (assuming the same sample size).

3. The power of the test is greater if the actual mean is 550 than if the actual mean is 525 (assuming the same sample size and significance level).
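The three comparisons above can be checked numerically. The following is a minimal Python sketch, not part of the original text: it assumes an upper-tailed z test with a known population standard deviation, and the value `SIGMA = 100` is a hypothetical choice made only for illustration (this section does not give a standard deviation for the example). The function name `upper_tail_z_power` is likewise invented here.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_power(mu0, mu_actual, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu0 when the mean is mu_actual."""
    se = sigma / sqrt(n)  # standard deviation of x-bar
    # H0 is rejected when x-bar is at least this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Power = P(x-bar >= cutoff) when x-bar is normal with mean mu_actual.
    return 1 - NormalDist(mu_actual, se).cdf(cutoff)


SIGMA = 100  # hypothetical value, assumed only for this illustration

# 1. Larger sample size (same alpha, same actual mean).
power_by_n = [upper_tail_z_power(500, 525, SIGMA, n, 0.05) for n in (75, 100)]
# 2. Larger significance level (same n, same actual mean).
power_by_alpha = [upper_tail_z_power(500, 525, SIGMA, 100, a) for a in (0.01, 0.05)]
# 3. Actual mean farther from the hypothesized value (same n, same alpha).
power_by_mean = [upper_tail_z_power(500, m, SIGMA, 100, 0.05) for m in (525, 550)]

print(power_by_n, power_by_alpha, power_by_mean)
```

Each list should be increasing, matching the three statements. When the actual mean equals the hypothesized value, the same function returns the significance level itself, since rejecting a true H0 is a Type I error.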

As was mentioned previously in this section, it is impossible to calculate the exact power of a test because in practice we do not know the values of population characteristics. However, we can evaluate the power at a selected alternative value, which tells us whether the power would be high or low if that alternative value were the actual value.

The following optional subsection shows how Type II error probabilities and

power can be evaluated for selected tests.

Calculating Power and Type II Error Probabilities for Selected Tests (Optional)

The test procedures presented in this chapter are designed to control the probability of a Type I error (rejecting H0 when H0 is true) at the desired significance level α. However, little has been said so far about calculating the value of β, the probability of a Type II error (not rejecting H0 when H0 is false). Here, we consider the determination of β and power for the hypothesis tests previously introduced.

When we carry out a hypothesis test, we specify the desired value of α, the probability of a Type I error. The probability of a Type II error, β, is the probability of not rejecting H0 even though it is false. Suppose that we are testing

H0: μ = 1.5 versus Ha: μ > 1.5


Because we do not know the actual value of μ, we cannot calculate the value of β. However, the vulnerability of the test to Type II error can be investigated by calculating β for several different potential values of μ, such as μ = 1.55, μ = 1.6, and μ = 1.7. Once the value of β has been determined, the power of the test at the corresponding alternative value is just 1 − β.

EXAMPLE 10.17

Calculating Power

An airline claims that the mean time on hold for callers to its customer service phone line is 1.5 minutes. We might investigate this claim by testing

H0: μ = 1.5 versus Ha: μ > 1.5

where μ is the actual mean customer hold time. A random sample of n = 36 calls is to be selected, and the resulting data will be used to reach a conclusion. Suppose that the standard deviation of hold time (σ) is known to be 0.20 minutes and that a significance level of .01 is to be used. Our test statistic (because σ = 0.20) is

$$ z = \frac{\bar{x} - 1.5}{.20/\sqrt{n}} = \frac{\bar{x} - 1.5}{.20/\sqrt{36}} = \frac{\bar{x} - 1.5}{.0333} $$

The inequality in Ha implies that

P-value = area under z curve to the right of calculated z

From Appendix Table 2, it is easily verified that the z critical value 2.33 captures an upper-tail z curve area of .01. Thus, P-value ≤ .01 only when z ≥ 2.33. This is equivalent to the decision rule

reject H0 if calculated z ≥ 2.33

which becomes

$$ \text{reject } H_0 \text{ if } \frac{\bar{x} - 1.5}{.0333} \ge 2.33 $$

Solving this inequality for x̄ we get

$$ \bar{x} \ge 1.5 + 2.33(.0333) $$

or

$$ \bar{x} \ge 1.578 $$

So if x̄ ≥ 1.578, we will reject H0, and if x̄ < 1.578, we will fail to reject H0. This decision rule corresponds to α = .01.

Suppose now that μ = 1.6 (so that H0 is false). A Type II error will then occur if x̄ < 1.578. What is the probability that this occurs? If μ = 1.6, the sampling distribution of x̄ is approximately normal, centered at 1.6, and has a standard deviation of .0333. The probability of observing an x̄ value less than 1.578 can then be determined by finding an area under a normal curve with mean 1.6 and standard deviation .0333, as illustrated in Figure 10.5.

Because the curve in Figure 10.5 is not the standard normal (z) curve, we must first convert to a z score before using Appendix Table 2 to find the area. Here,

$$ z \text{ score for } 1.578 = \frac{1.578 - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{1.578 - 1.6}{.0333} = -.66 $$


FIGURE 10.5 β when μ = 1.6 in Example 10.17: the area β = P(x̄ < 1.578) under the x̄ distribution (normal with mean 1.6 and standard deviation 0.0333).

and

area under z curve to left of −0.66 = .2546

So, if μ = 1.6, β = .2546. This means that if μ is 1.6, about 25% of all samples would still result in x̄ values less than 1.578 and failure to reject H0.

The power of the test at μ = 1.6 is then

(power at μ = 1.6) = 1 − (β when μ is 1.6)

= 1 − .2546

= .7454

This means that if the actual mean is 1.6, the probability of rejecting H0: μ = 1.5 in favor of Ha: μ > 1.5 is .7454. That is, if μ is 1.6 and the test is used repeatedly with random samples selected from the population, in the long run about 75% of the samples will result in the correct conclusion to reject H0.

Now consider β and power when μ = 1.65. The normal curve in Figure 10.5 would then be centered at 1.65. Because β is the area to the left of 1.578 and the curve has shifted to the right, β decreases. Converting 1.578 to a z score and using Appendix Table 2 gives β = .0154. Also,

(power at μ = 1.65) = 1 − .0154 = .9846

As expected, the power at μ = 1.65 is higher than the power at μ = 1.6 because 1.65 is farther from the hypothesized value of 1.5.

Statistical software and graphing calculators can calculate the power for specified values of σ, α, n, and the difference between the actual and hypothesized values of μ. The following Minitab output shows power calculations corresponding to those in Example 10.17:

1-Sample Z Test

Testing mean = null (versus > null)

Alpha = 0.01

Sigma = 0.2

Sample Size = 36

Difference    Power

0.10          0.7497

0.15          0.9851

The slight differences between the power values computed by Minitab and those

previously obtained are due to rounding in Example 10.17.
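The effect of rounding can be explored with a short Python sketch of the calculation in Example 10.17. This is an illustration rather than the textbook's own procedure: the helper name `power_at` is invented here, and the sketch carries full precision for the standard deviation of x̄ instead of the rounded value .0333. It compares the table critical value 2.33 with the unrounded critical value returned by `NormalDist().inv_cdf`.

```python
from math import sqrt
from statistics import NormalDist

MU0, SIGMA, N, ALPHA = 1.5, 0.20, 36, 0.01  # values from Example 10.17
SE = SIGMA / sqrt(N)                         # standard deviation of x-bar


def power_at(mu_actual, z_crit):
    """Power of the upper-tailed z test at mu_actual for a given critical value."""
    cutoff = MU0 + z_crit * SE                   # reject H0 when x-bar >= cutoff
    beta = NormalDist(mu_actual, SE).cdf(cutoff)  # P(x-bar < cutoff): Type II error
    return 1 - beta


z_table = 2.33                              # rounded value from the z table
z_exact = NormalDist().inv_cdf(1 - ALPHA)   # unrounded critical value

for mu in (1.6, 1.65):
    print(mu, round(power_at(mu, z_table), 4), round(power_at(mu, z_exact), 4))
```

With the unrounded critical value, the results should land very close to the Minitab figures for differences of 0.10 and 0.15. The hand calculation in the example differs slightly more because it also rounds the cutoff to 1.578 and the z score to two decimal places.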

The probability of a Type II error and the power for z tests concerning a population proportion are calculated in an analogous manner.


EXAMPLE 10.18

Power for Testing Hypotheses

A package delivery service advertises that at least 90% of all packages brought to its office by 9 a.m. for delivery in the same city are delivered by noon that day. Let p denote the proportion of all such packages actually delivered by noon. The hypotheses of interest are

H0: p = .9 versus Ha: p < .9

where the alternative hypothesis states that the company’s claim is untrue. The value p = .8 represents a substantial departure from the company’s claim. If the hypotheses are tested at level .01 using a sample of n = 225 packages, what is the probability that the departure from H0 represented by this alternative value will go undetected?

At significance level .01, H0 is rejected if P-value ≤ .01. For the case of a lower-tailed test, this is the same as rejecting H0 if

$$ z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - .9}{\sqrt{\frac{(.9)(.1)}{225}}} = \frac{\hat{p} - .9}{.02} \le -2.33 $$

(Because −2.33 captures a lower-tail z curve area of .01, the smallest 1% of all z values satisfy z ≤ −2.33.) This inequality is equivalent to p̂ ≤ .853, so H0 is not rejected if p̂ > .853. When p = .8, p̂ has approximately a normal distribution with

$$ \mu_{\hat{p}} = .8 \qquad \sigma_{\hat{p}} = \sqrt{\frac{(.8)(.2)}{225}} = .0267 $$

Then β is the probability of obtaining a sample proportion greater than .853, as illustrated in Figure 10.6.

FIGURE 10.6 β when p = .8 in Example 10.18: the area to the right of 0.853 under the sampling distribution of p̂ (normal with mean 0.8 and standard deviation 0.0267).

Converting to a z score results in

$$ z = \frac{.853 - .8}{.0267} = 1.99 $$

and Appendix Table 2 gives

β = 1 − .9767 = .0233

When p = .8 and a level .01 test is used, less than 3% of all samples of size n = 225 will result in a Type II error. The power of the test at p = .8 is 1 − .0233 = .9767. This means that the probability of rejecting H0: p = .9 in favor of Ha: p < .9 when p is really .8 is .9767, which is quite high.
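The same steps can be sketched in Python for the proportion test of Example 10.18. This is a rough check under the normal approximation used in the example; the function name `lower_tail_prop_beta` is an assumption made for this illustration. Note that the cutoff uses the standard deviation computed under H0 (p = .9), while the Type II error probability uses the standard deviation computed at the alternative value.

```python
from math import sqrt
from statistics import NormalDist

P0, N, ALPHA = 0.9, 225, 0.01  # values from Example 10.18


def lower_tail_prop_beta(p_actual):
    """Approximate Type II error probability of the lower-tailed z test at p_actual."""
    se_null = sqrt(P0 * (1 - P0) / N)
    # inv_cdf(ALPHA) is negative, so the cutoff lies below P0.
    cutoff = P0 + NormalDist().inv_cdf(ALPHA) * se_null
    se_actual = sqrt(p_actual * (1 - p_actual) / N)
    # H0 is NOT rejected when p-hat falls above the cutoff.
    return 1 - NormalDist(p_actual, se_actual).cdf(cutoff)


beta = lower_tail_prop_beta(0.8)
power = 1 - beta
print(round(beta, 4), round(power, 4))
```

Because this sketch does not round the critical value or the cutoff, its β should come out slightly different from the table-based .0233, but still below 3%.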


β and Power for the t Test (Optional)

The power and β values for t tests can be determined by using a set of curves specially constructed for this purpose or by using appropriate software. As with the z test, the value of β depends not only on the actual value of μ but also on the selected significance level α; β increases as α is made smaller. In addition, β depends on the number of degrees of freedom, n − 1. For any fixed significance level α, it should be easier for the test to detect a specific departure from H0 when n is large than when n is small. This is indeed the case; for a fixed alternative value, β decreases as n − 1 increases.

Unfortunately, there is one other quantity on which β depends: the population standard deviation σ. As σ increases, so does σ_x̄. This in turn makes it more likely that an x̄ value far from μ will be observed just by chance, resulting in an incorrect conclusion. Once α is specified and n is fixed, the determination of β at a particular alternative value of μ requires that a value of σ be chosen, because each different value of σ yields a different value of β. (This did not present a problem with the z test because when using a z test, the value of σ is known.) If the investigator can specify a range of plausible values for σ, then using the largest such value will give a pessimistic β (one on the high side) and a pessimistic value of power (one on the low side).

Figure 10.7 shows three different β curves for a one-tailed t test (appropriate for Ha: μ > hypothesized value or for Ha: μ < hypothesized value). A more complete set of curves for both one- and two-tailed tests when α = .05 and when α = .01 appears in Appendix Table 5. To determine β, first compute the quantity

$$ d = \frac{|\text{alternative value} - \text{hypothesized value}|}{\sigma} $$

Then locate d on the horizontal axis, move directly up to the curve for n − 1 df, and move over to the vertical axis to find β.

FIGURE 10.7 β curves for the one-tailed t test. Horizontal axis: value of d (0 to 3); vertical axis: β (0 to 1.0). Curves shown: α = .01, df = 6; α = .05, df = 6; α = .01, df = 19.

EXAMPLE 10.19

β and Power for t Tests

Consider testing

H0: μ = 100 versus Ha: μ > 100


and focus on the alternative value μ = 110. Suppose that σ = 10, the sample size is n = 7, and a significance level of .01 has been selected. For σ = 10,

$$ d = \frac{|110 - 100|}{10} = \frac{10}{10} = 1 $$

Figure 10.7 (using df = 7 − 1 = 6) gives β ≈ .6. The interpretation is that if σ = 10 and a level .01 test based on n = 7 is used when μ = 110 (and thus H0 is false), roughly 60% of all samples result in an incorrect decision to not reject H0! Equivalently, the power of the test at μ = 110 is only 1 − .6 = .4. The probability of rejecting H0 when μ = 110 is not very large. If a .05 significance level is used instead, then β ≈ .3, which is still rather large. Using a .01 significance level with n = 20 (df = 19) yields, from Figure 10.7, β ≈ .05. At the alternative value μ = 110, for σ = 10 the level .01 test based on n = 20 has smaller β than the level .05 test with n = 7. Substantially increasing n counterbalances using the smaller α.

Now consider the alternative μ = 105, again with σ = 10, so that

$$ d = \frac{|105 - 100|}{10} = \frac{5}{10} = .5 $$

Then, from Figure 10.7, β = .95 when α = .01, n = 7; β = .7 when α = .05, n = 7; and β = .65 when α = .01, n = 20. These values of β are all quite large; with σ = 10, μ = 105 is too close to the hypothesized value of 100 for any of these three tests to have a good chance of detecting such a departure from H0. A substantial decrease in β would require using a much larger sample size. For example, from Appendix Table 5, β = .08 when α = .05 and n = 40.

The curves in Figure 10.7 also give β when testing H0: μ = 100 versus Ha: μ < 100. If the alternative value μ = 90 is of interest and σ = 10,

$$ d = \frac{|90 - 100|}{10} = \frac{10}{10} = 1 $$

and values of β are the same as those given in the first paragraph of this example.

Because curves for only selected degrees of freedom appear in Appendix Table 5, other degrees of freedom require a visual approximation. For example, the 27-df curve (for n = 28) lies between the 19-df and 29-df curves, which do appear, and it is closer to the 29-df curve. This type of approximation is adequate because it is the general magnitude of β (large, small, or moderate) that is of primary concern.

Minitab can also evaluate power for the t test. For example, the following output shows Minitab calculations for power at μ = 110 for samples of size 7 and 20 when α = .01. The corresponding approximate values from Appendix Table 5 found in Example 10.19 are fairly close to the Minitab values.

1-Sample t Test

Testing mean = null (versus > null)

Calculating power for mean = null + 10

Alpha = 0.01

Sigma = 10

Sample Size    Power

7              0.3968

20             0.9653
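For readers without Minitab or the β curves, the power of the one-tailed t test can be approximated numerically. The sketch below is one possible approach, not the method the text or Minitab documents: it assumes a normal population, represents the t statistic as (Z + δ)/√(V/df) with Z standard normal, V chi-square with df degrees of freedom, and δ = d√n, and integrates over V with a simple trapezoid-style sum. All function names are invented for this illustration, and only the Python standard library is used.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi2_pdf(v, df):
    """Chi-square density with df degrees of freedom, for v > 0."""
    half = df / 2
    return math.exp((half - 1) * math.log(v) - v / 2
                    - half * math.log(2) - math.lgamma(half))


def prob_t_exceeds(t_cut, df, delta=0.0, steps=4000):
    """P(T > t_cut) for T = (Z + delta) / sqrt(V / df).

    delta = 0 gives the ordinary t distribution; delta > 0 gives the
    distribution of the t statistic when H0 is false. Assumes df >= 3,
    so the integrand is essentially 0 at both ends of the grid.
    """
    upper = df + 15 * math.sqrt(2 * df) + 30  # density is negligible beyond this
    h = upper / steps
    total = 0.0
    for i in range(1, steps):
        v = i * h
        # Given V = v, T > t_cut exactly when Z > t_cut * sqrt(v / df) - delta.
        tail = 1 - STD_NORMAL.cdf(t_cut * math.sqrt(v / df) - delta)
        total += tail * chi2_pdf(v, df)
    return total * h


def t_critical(alpha, df):
    """Upper-tail t critical value, found by bisection on prob_t_exceeds."""
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if prob_t_exceeds(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t_power(d, n, alpha):
    """Approximate power of the one-tailed t test at standardized difference d."""
    df = n - 1
    return prob_t_exceeds(t_critical(alpha, df), df, delta=d * math.sqrt(n))


power_n7 = one_sample_t_power(d=1.0, n=7, alpha=0.01)
power_n20 = one_sample_t_power(d=1.0, n=20, alpha=0.01)
print(round(power_n7, 4), round(power_n20, 4))
```

Under these assumptions the two values should fall close to the Minitab powers shown above for n = 7 and n = 20, and close to 1 − β as read from Figure 10.7. The grid size and integration limit are pragmatic choices; a dedicated statistics library would normally be preferred for production work.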

The β curves in Appendix Table 5 are those for t tests. When the alternative value in Ha corresponds to a value of d relatively close to 0, β for a t test may be rather large. One might wonder whether there is another type of test that has the same level of


significance α as does the t test and smaller values of β. The following result provides the answer.

When the population distribution is normal, the t test for testing hypotheses about μ has smaller β than does any other test procedure that has the same level of significance α.

Stated another way, among all tests with level of significance α, the t test makes β as small as it can possibly be when the population distribution is normal. In this sense, the t test is a best test. Statisticians have also shown that when the population distribution is not too far from a normal distribution, no test procedure can improve on the t test by very much (i.e., no test procedure can have the same α and substantially smaller β). However, when the population distribution is believed to be strongly nonnormal (heavy-tailed, highly skewed, or multimodal), the t test should not be used. Then it’s time to consult your friendly neighborhood statistician, who can provide you with alternative methods of analysis.

EXERCISES 10.59 - 10.65

10.59 The power of a test is influenced by the sample

size and the choice of significance level.

a. Explain how increasing the sample size affects the

power (when significance level is held fixed).

b. Explain how increasing the significance level affects

the power (when sample size is held fixed).

10.60 Water samples are taken from water used for cooling as it is being discharged from a power plant into a river. It has been determined that as long as the mean temperature of the discharged water is at most 150°F, there will be no negative effects on the river’s ecosystem. To investigate whether the plant is in compliance with regulations that prohibit a mean discharge water temperature above 150°F, a scientist will take 50 water samples at randomly selected times and will record the water temperature of each sample. She will then use a z statistic

$$ z = \frac{\bar{x} - 150}{\sigma/\sqrt{n}} $$

to decide between the hypotheses H0: μ = 150 and Ha: μ > 150, where μ is the mean temperature of discharged water. Assume that σ is known to be 10.

a. Explain why use of the z statistic is appropriate in this setting.

b. Describe Type I and Type II errors in this context.

c. The rejection of H0 when z ≥ 1.8 corresponds to what value of α? (That is, what is the area under the z curve to the right of 1.8?)

d. Suppose that the actual value for μ is 153 and that H0 is to be rejected if z ≥ 1.8. Draw a sketch (similar to that of Figure 10.5) of the sampling distribution of x̄, and shade the region that would represent β, the probability of making a Type II error.

e. For the hypotheses and test procedure described, compute the value of β when μ = 153.

f. For the hypotheses and test procedure described, what is the value of β if μ = 160?

g. What would be the conclusion of the test if H0 is rejected when z ≥ 1.8 and x̄ = 152.4? What type of error might have been made in reaching this conclusion?

10.61 Let μ denote the true average lifetime (in hours) for a certain type of battery under controlled laboratory conditions. A test of H0: μ = 10 versus Ha: μ < 10 will be based on a sample of size 36. Suppose that σ is known to be 0.6, from which σ_x̄ = .1. The appropriate test statistic is then

$$ z = \frac{\bar{x} - 10}{0.1} $$


a. What is α for the test procedure that rejects H0 if z ≤ −1.28?

b. If the test procedure of Part (a) is used, calculate β when μ = 9.8, and interpret this error probability.

c. Without doing any calculation, explain how β when μ = 9.5 compares to β when μ = 9.8. Then check your assertion by computing β when μ = 9.5.

d. What is the power of the test when μ = 9.8? when μ = 9.5?

10.62 The city council in a large city has become concerned about the trend toward exclusion of renters with

children in apartments within the city. The housing coordinator has decided to select a random sample of 125

apartments and determine for each whether children are

permitted. Let p be the proportion of all apartments that

prohibit children. If the city council is convinced that p is

greater than 0.75, it will consider appropriate legislation.

a. If 102 of the 125 sampled apartments exclude renters with children, would a level .05 test lead you to

the conclusion that more than 75% of all apartments

exclude children?

b. What is the power of the test when p = .8 and α = .05?

10.63 The amount of shaft wear after a fixed mileage

was determined for each of seven randomly selected

internal combustion engines, resulting in a mean of

0.0372 inch and a standard deviation of 0.0125 inch.

a. Assuming that the distribution of shaft wear is normal, test at level .05 the hypotheses H0: μ = .035 versus Ha: μ > .035.

b. Using σ = 0.0125, α = .05, and Appendix Table 5, what is the approximate value of β, the probability of a Type II error, when μ = .04?


c. What is the approximate power of the test when μ = .04 and α = .05?

10.64 Optical fibers are used in telecommunications to transmit light. Suppose current technology allows production of fibers that transmit light about 50 km. Researchers are trying to develop a new type of glass fiber that will increase this distance. In evaluating a new fiber, it is of interest to test H0: μ = 50 versus Ha: μ > 50, with μ denoting the mean transmission distance for the new optical fiber.

a. Assuming σ = 10 and n = 10, use Appendix Table 5 to find β, the probability of a Type II error, for each of the given alternative values of μ when a test with significance level .05 is employed:

i. 52

ii. 55

iii. 60

iv. 70

b. What happens to β in each of the cases in Part (a) if σ is actually larger than 10? Explain your reasoning.

10.65 Let μ denote the mean diameter for bearings of a certain type. A test of H0: μ = 0.5 versus Ha: μ ≠ 0.5 will be based on a sample of n bearings. The diameter distribution is believed to be normal. Determine the value of β in each of the following cases:

a. n = 15, α = .05, σ = 0.02, μ = 0.52

b. n = 15, α = .05, σ = 0.02, μ = 0.48

c. n = 15, α = .01, σ = 0.02, μ = 0.52

d. n = 15, α = .05, σ = 0.02, μ = 0.54

e. n = 15, α = .05, σ = 0.04, μ = 0.54

f. n = 20, α = .05, σ = 0.04, μ = 0.54

g. Is the way in which β changes as n, α, σ, and μ vary


10.6 Interpreting and Communicating the Results of Statistical Analyses

The nine-step procedure that we have proposed for testing hypotheses provides a

systematic approach for carrying out a complete test. However, you rarely see the

results of a hypothesis test reported in publications in such a complete way.


10.6 Interpreting and Communicating the Results of Statistical Analyses


Communicating the Results of Statistical Analyses

When summarizing the results of a hypothesis test, it is important that you include

several things in the summary in order to provide all the relevant information. These

are:

1. Hypotheses. Whether specified in symbols or described in words, it is important that both the null and the alternative hypotheses be clearly stated. If you are using symbols to define the hypotheses, be sure to describe them in the context of the problem at hand (for example, μ = population mean calorie intake).

2. Test procedure. You should be clear about what test procedure was used (for example, large-sample z test for proportions) and why you think it was reasonable to use this procedure. The plausibility of any required assumptions should be addressed.

3. Test statistic. Be sure to include the value of the test statistic and the P-value.

Including the P-value allows a reader who may have chosen a different significance level to see whether she would have reached the same or a different

conclusion.

4. Conclusion in context. Never end the report of a hypothesis test with the statement

“I rejected (or did not reject) H0.” Always provide a conclusion that is in the

context of the problem and that answers the original research question which the

hypothesis test was designed to answer. Be sure also to indicate the level of significance used as a basis for the decision.

Interpreting the Results of Statistical Analyses

When the results of a hypothesis test are reported in a journal article or other published source, it is common to find only the value of the test statistic and the associated P-value accompanying the discussion of conclusions drawn from the data. Often, especially in newspaper articles, only sample summary statistics are given, with

the conclusion immediately following. You may have to fill in some of the intermediate steps for yourself to see whether or not the conclusion is justified.

For example, the article “Physicians’ Knowledge of Herbal Toxicities and Adverse Herb-Drug Interactions” (European Journal of Emergency Medicine, August 2004) summarizes the results of a study to assess doctors’ familiarity with adverse effects of herbal remedies as follows: “A total of 142 surveys and quizzes were completed by 59 attending physicians, 57 resident physicians, and 26 medical students. The mean subject score on the quiz was only slightly higher than would have occurred from random guessing.” The quiz consisted of 16 multiple-choice questions. If each question had four possible choices, the statement that the mean quiz score was only slightly higher than would have occurred from random guessing suggests that the researchers considered the hypotheses H0: μ = 4 and Ha: μ > 4, where μ represents the mean score for the population of all physicians and medical students and the null hypothesis corresponds to the expected number of correct choices for someone who is guessing. Assuming that it is reasonable to regard this sample as representative of the population of interest, the data from the sample could be used to carry out a test of these hypotheses.
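The hypothesized value of 4 can be traced in one line. As a sketch, assume that a guesser picks each of the four choices with equal probability and answers the 16 questions independently (the article summary does not state this explicitly). The number of correct answers X is then binomial, and its expected value is the value used in the null hypothesis:

```latex
X \sim \text{Binomial}\left(n = 16,\ p = \tfrac{1}{4}\right),
\qquad
E(X) = np = 16 \cdot \tfrac{1}{4} = 4
```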

What to Look For in Published Data

Here are some questions to consider when you are reading a report that contains the

results of a hypothesis test:

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.


• What hypotheses are being tested? Are the hypotheses about a population mean, a population proportion, or some other population characteristic?

• Was the appropriate test used? Does the validity of the test depend on any assumptions about the sample or about the population from which the sample was selected? If so, are the assumptions reasonable?

• What is the P-value associated with the test? Was a significance level reported (as opposed to simply reporting the P-value)? Is the chosen significance level reasonable?

• Are the conclusions drawn consistent with the results of the hypothesis test?

For example, consider the following statement from the paper “Didgeridoo Playing as Alternative Treatment for Obstructive Sleep Apnoea Syndrome” (British Medical Journal [2006]: 266–270): “We found that four months of training of the upper airways by didgeridoo playing reduces daytime sleepiness in people with snoring and obstructive apnoea syndrome.” This statement was supported by data on a measure of daytime sleepiness called the Epworth scale. For the 14 participants in the study, the mean improvement in Epworth scale was 4.4 and the standard deviation was 3.7. The paper does not indicate what test was performed or what the value of the test statistic was. It appears that the hypotheses of interest are H0: μ = 0 (no improvement) versus Ha: μ > 0, where μ represents the mean improvement in Epworth score after four months of didgeridoo playing for all people with snoring and obstructive sleep apnoea. Because the sample size is not large, the one-sample t test would be appropriate if the sample can be considered a random sample and the distribution of Epworth scale improvement scores is approximately normal. If these assumptions are reasonable (something that was not addressed in the paper), the t test results in t = 4.45 and an associated P-value of .000 (to three decimal places). Because this P-value is so small, H0 would be rejected, supporting the conclusion in the paper that didgeridoo playing is an effective treatment. (In case you are wondering, a didgeridoo is an Australian Aboriginal woodwind instrument.)
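The intermediate steps for the didgeridoo example can be filled in from the summary statistics alone. The sketch below is one way to do so in Python using only the standard library. It computes the one-sample t statistic and then approximates the upper-tail P-value by numerically integrating the t density with Simpson's rule. The function names and the integration approach are assumptions made for illustration, not the method used in the paper.

```python
import math

def t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """One-sample t statistic computed from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail_p_value(t, df, steps=10_000):
    """Approximate P(T > t) as 0.5 minus the area under the density from 0 to t.

    The area is computed with Simpson's rule; steps must be even.
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2   # Simpson weights alternate 4, 2, 4, ...
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics quoted in the text: n = 14, mean improvement 4.4, SD 3.7.
n = 14
t = t_statistic(sample_mean=4.4, hypothesized_mean=0.0, sample_sd=3.7, n=n)
p_value = upper_tail_p_value(t, df=n - 1)
print(f"t = {t:.2f}, df = {n - 1}, upper-tail P-value = {p_value:.4f}")
```

The t statistic rounds to the 4.45 given in the text, and the P-value rounds to .000 at three decimal places. Where a statistics library is available, its t-distribution functions would normally replace the hand-rolled integration.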

A Word to the Wise: Cautions and Limitations

There are several things you should watch for when conducting a hypothesis test or when evaluating a written summary of a hypothesis test.

1. The result of a hypothesis test can never show strong support for the null hypothesis. Make sure that you don’t confuse “There is no reason to believe the null hypothesis is not true” with the statement “There is convincing evidence that the null hypothesis is true.” These are very different statements!

2. If you have complete information for the population, don’t carry out a hypothesis test! It should be obvious that no test is needed to answer questions about a population if you have complete information and don’t need to generalize from a sample, but people sometimes forget this fact. For example, in an article on growth in the number of prisoners by state, the San Luis Obispo Tribune (August 13, 2001) reported “California’s numbers showed a statistically insignificant change, with 66 fewer prisoners at the end of 2000.” The use of the term “statistically insignificant” implies some sort of statistical inference, which is not appropriate when a complete accounting of the entire prison population is known. Perhaps the author confused statistical and practical significance. Which brings us to . . .

3. Don’t confuse statistical significance with practical significance. When statistical significance has been declared, be sure to step back and evaluate the result in light of its practical importance. For example, we may be convinced that the propor…
