6.4 Assessing Normality; Normal Probability Plots



KEY FACT 6.7  Guidelines for Assessing Normality Using a Normal Probability Plot

To assess the normality of a variable using sample data, construct a normal probability plot.

• If the plot is roughly linear, you can assume that the variable is approximately normally distributed.
• If the plot is not roughly linear, you can assume that the variable is not approximately normally distributed.

These guidelines should be interpreted loosely for small samples but usually interpreted strictly for large samples.

What Does It Mean? Roughly speaking, a normal probability plot that falls nearly in a straight line indicates a normal variable, and one that does not indicates a nonnormal variable.



In practice, normal probability plots are generated by computer. However, to better

understand these plots, constructing a few by hand is helpful. Table III in Appendix A

gives the normal scores for sample sizes from 5 to 30. In the next example, we explain

how to use Table III to obtain a normal probability plot.



TABLE 6.4 Adjusted gross incomes ($1000s)

 9.7   93.1   33.0   21.2
81.4   51.1   43.5   10.6
12.8    7.8   18.1   12.7

TABLE 6.5 Ordered data and normal scores

Adjusted gross income   Normal score
         7.8               −1.64
         9.7               −1.11
        10.6               −0.79
        12.7               −0.53
        12.8               −0.31
        18.1               −0.10
        21.2                0.10
        33.0                0.31
        43.5                0.53
        51.1                0.79
        81.4                1.11
        93.1                1.64

EXAMPLE 6.16  Normal Probability Plots

Adjusted Gross Incomes  The Internal Revenue Service publishes data on federal individual income tax returns in Statistics of Income, Individual Income Tax Returns. A simple random sample of 12 returns from last year revealed the adjusted gross incomes, in thousands of dollars, shown in Table 6.4. Construct a normal probability plot for these data, and use the plot to assess the normality of adjusted gross incomes.

Solution  Here the variable is adjusted gross income, and the population consists of all of last year’s federal individual income tax returns. To construct a normal probability plot, we first arrange the data in increasing order and obtain the normal scores from Table III. The ordered data are shown in the first column of Table 6.5; the normal scores, from the n = 12 column of Table III, are shown in the second column of Table 6.5.

Next, we plot the points in Table 6.5, using the horizontal axis for the adjusted gross incomes and the vertical axis for the normal scores. For instance, the first point plotted has a horizontal coordinate of 7.8 and a vertical coordinate of −1.64. Figure 6.29 shows all 12 points from Table 6.5. This graph is the normal probability plot for the sample of adjusted gross incomes. Note that the normal probability plot in Fig. 6.29 is curved, not linear.

FIGURE 6.29 Normal probability plot for the sample of adjusted gross incomes
[Figure: normal score (vertical axis, −3 to 3) plotted against adjusted gross income in $1000s (horizontal axis, 10 to 100)]

Interpretation In light of Key Fact 6.7, last year’s adjusted gross incomes

apparently are not (approximately) normally distributed.
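The hand construction in Example 6.16 can also be sketched in Python. This is a minimal sketch and not the method behind Table III: the normal scores are approximated with Blom's formula, the standard normal quantile at (i − 0.375)/(n + 0.25), which is an assumption on our part and may differ from Table III in the second decimal place. The function name `normal_scores` is hypothetical.

```python
from statistics import NormalDist

# Adjusted gross incomes ($1000s) from Table 6.4.
agi = [9.7, 93.1, 33.0, 21.2, 81.4, 51.1, 43.5, 10.6, 12.8, 7.8, 18.1, 12.7]

def normal_scores(n):
    """Approximate normal scores for a sample of size n.

    Assumption: Blom's plotting positions (i - 0.375) / (n + 0.25);
    Table III may have been produced by a different method.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

ordered = sorted(agi)
scores = normal_scores(len(ordered))

# Each pair is one point of the plot: data value on the horizontal axis,
# normal score on the vertical axis.
points = list(zip(ordered, scores))
for x, z in points:
    print(f"{x:6.1f}  {z:6.2f}")
```

At the upper end, the gaps between successive incomes grow much faster than the gaps between successive scores, which is the curvature noted for Fig. 6.29.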

Report 6.4

Exercise 6.127(a), (c)

on page 317



Note: If two or more observations in a sample are equal, you can think of them as slightly different from one another for purposes of obtaining their normal scores.

In some books and statistical technologies, you may encounter one or more of the following differences in normal probability plots:

• The vertical axis is used for the data and the horizontal axis for the normal scores.
• A probability or percent scale is used instead of normal scores.
• An averaging process is used to assign equal normal scores to equal observations.
• The method used for computing normal scores differs from the one used to obtain Table III.



Detecting Outliers with Normal Probability Plots

Recall that outliers are observations that fall well outside the overall pattern of the data.

We can also use normal probability plots to detect outliers.



EXAMPLE 6.17  Using Normal Probability Plots to Detect Outliers

Chicken Consumption  The U.S. Department of Agriculture publishes data on U.S. chicken consumption in Food Consumption, Prices, and Expenditures. The annual chicken consumption, in pounds, for 17 randomly selected people is displayed in Table 6.6. A normal probability plot for these observations is presented in Fig. 6.30(a). Use the plot to discuss the distribution of chicken consumption and to detect any outliers.

TABLE 6.6 Sample of last year’s chicken consumption (lb)

57   69   63   49   63   61
72   65   91   59    0   82
60   75   55   80   73

Solution  Figure 6.30(a) reveals that the normal probability plot falls roughly in a straight line, except for the point corresponding to 0 lb, which falls well outside the overall pattern of the plot.



FIGURE 6.30 Normal probability plots for chicken consumption: (a) original data; (b) data with outlier removed
[Figure: two panels, each plotting normal score (vertical axis, −3 to 3) against chicken consumption in lb (horizontal axis, 10 to 100); in panel (a) the point for 0 lb is labeled as an outlier]



Interpretation The observation of 0 lb is an outlier, which might be a recording

error or due to a person in the sample who does not eat chicken, such as a vegetarian.

If we remove the outlier 0 lb from the sample data and draw a new normal

probability plot, Fig. 6.30(b) shows that this plot is quite linear.

Exercise 6.127(b)

on page 317



Interpretation It appears plausible that, among people who eat chicken, the

amounts they consume annually are (approximately) normally distributed.
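One rough numerical companion to the visual check in Example 6.17 is the correlation between the ordered data and their normal scores: the closer it is to 1, the closer the plot is to a straight line. This is not part of the method described in this section, and the sketch below rests on two assumptions of ours: Blom's approximation for the normal scores (not necessarily the Table III method) and the hypothetical helper names `normal_scores` and `plot_correlation`.

```python
from math import sqrt
from statistics import NormalDist, mean

# Chicken consumption (lb) from Table 6.6.
chicken = [57, 69, 63, 49, 63, 61, 72, 65, 91, 59, 0, 82, 60, 75, 55, 80, 73]

def normal_scores(n):
    # Assumption: Blom's approximation, not necessarily the Table III method.
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def plot_correlation(data):
    """Pearson correlation between the ordered data and their normal scores."""
    xs = sorted(data)
    zs = normal_scores(len(xs))
    x_bar, z_bar = mean(xs), mean(zs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    szz = sum((z - z_bar) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

r_all = plot_correlation(chicken)                             # all 17 observations
r_trimmed = plot_correlation([x for x in chicken if x != 0])  # 0 lb removed
print(f"with 0 lb: r = {r_all:.3f}; without 0 lb: r = {r_trimmed:.3f}")
```

The correlation rises noticeably once the 0 lb observation is removed, in line with the change from Fig. 6.30(a) to Fig. 6.30(b). It is only a summary of the plot and does not replace looking at it.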






Although the visual assessment of normality that we studied in this section is subjective, it is sufficient for most statistical analyses.



THE TECHNOLOGY CENTER

Most statistical technologies have programs that automatically construct normal probability plots. In this subsection, we present output and step-by-step instructions for such

programs.



EXAMPLE 6.18



Using Technology to Obtain Normal Probability Plots

Adjusted Gross Incomes Use Minitab, Excel, or the TI-83/84 Plus to obtain a normal probability plot for the adjusted gross incomes in Table 6.4 on page 313.

Solution We applied the normal-probability-plot programs to the data, resulting

in Output 6.4. Steps for generating that output are presented in Instructions 6.3 on

the following page.



OUTPUT 6.4 Normal probability plots for the sample of adjusted gross incomes
[Output panels: MINITAB, TI-83/84 PLUS, EXCEL]



INSTRUCTIONS 6.3  Steps for generating Output 6.4

MINITAB
1 Store the data from Table 6.4 in a column named AGI
2 Choose Graph ➤ Probability Plot...
3 Click OK
4 Press the F3 key to reset the dialog box
5 Specify AGI in the Graph variables text box
6 Click the Distribution... button
7 Click the Data Display tab, select the Symbols only option button from the Data Display list, and click OK
8 Click the Scale... button
9 Click the Y-Scale Type tab, select the Score option button from the Y-Scale Type list, and click OK
10 Click OK

EXCEL
1 Store the data from Table 6.4 in a column named AGI
2 Choose XLSTAT ➤ Visualizing data ➤ Univariate plots
3 Click the reset button in the lower left corner of the dialog box
4 Click in the Quantitative data selection box and then select the column of the worksheet that contains the AGI data
5 Click the Options tab and uncheck the Descriptive statistics check box
6 Click the Charts (1) tab, uncheck the Box plots check box, and check the Normal Q-Q plots check box
7 Click OK
8 Click the Continue button in the XLSTAT – Selections dialog box

TI-83/84 PLUS
1 Store the data from Table 6.4 in a list named AGI
2 Ensure that all stat plots and all Y = functions are off
3 Press 2nd ➤ STAT PLOT and then press ENTER twice
4 Arrow to the sixth graph icon and press ENTER
5 Press the down-arrow key
6 Press 2nd ➤ LIST
7 Arrow down to AGI and press ENTER
8 Press ZOOM, then 9, and then TRACE



Exercises 6.4

Understanding the Concepts and Skills

6.116 Under what circumstances is using a normal probability plot to assess the normality of a variable usually better than using a histogram, stem-and-leaf diagram, or dotplot?

6.117 Explain why assessing the normality of a variable is often important.

6.118 Explain in detail what a normal probability plot is and how it is used to assess the normality of a variable.

6.119 How is a normal probability plot used to detect outliers?

6.120 Explain how to obtain normal scores from Table III in Appendix A when a sample contains equal observations.

In each of Exercises 6.121–6.126, we have provided a normal probability plot of data from a sample of a population. In each case, assess the normality of the variable under consideration.

6.121–6.126 [Figures: six normal probability plots, one per exercise, each with normal score (−3 to 3) on the vertical axis]

Applying the Concepts and Skills

In Exercises 6.127–6.130,
a. use Table III in Appendix A to construct a normal probability plot of the given data.
b. use part (a) to identify any outliers.
c. use part (a) to assess the normality of the variable under consideration.

6.127 Exam Scores. A sample of the final exam scores in a large introductory statistics course is as follows.

88   67   64   76   86
85   82   39   75   34
90   63   89   90   84
81   96  100   70   96

6.128 Cellular Bills. CTIA–The Wireless Association collects data on cell phones and publishes the results in Semi-annual Wireless Survey. A sample of 15 monthly cell-phone bills gave the following data (to the nearest dollar).

25   95   75   15   55
55   15   45   45   35
45   55   20   65   60

6.129 Sachin Tendulkar. Sachin Tendulkar is a former Indian cricket player who played international cricket from 1989 to 2013. The following table shows the number of runs he scored in each of 20 years of his career in one day international matches.

   0   239   417   704   319  1089   444  1141   412  1425
1611  1011  1894   843  1328   904   741   812   628   460

6.130 Beverage Expenditures. The Bureau of Labor Statistics publishes information on average annual expenditures by consumers in the Consumer Expenditure Survey. In 2010, the mean amount spent by consumers on nonalcoholic beverages was $333. A random sample of 12 consumers yielded the following data, in dollars, on last year’s expenditures on nonalcoholic beverages.

472   287   295   376
370   392   351   384
370   360   305   369

In Exercises 6.131–6.134,
a. obtain a normal probability plot of the given data.
b. use part (a) to identify any outliers.
c. use part (a) to assess the normality of the variable under consideration.

6.131 Suppose a sample of the heights, in inches, of players in the starting lineup of a particular basketball team was obtained and is shown below. Use a normal probability plot to assess whether the sample data could have come from a population that is normally distributed.
a. Construct a normal probability plot of the data.
b. Use the normal probability plot to identify any outliers.
c. Based on the probability plot, does the sample appear to come from a normally distributed population?

6.132 Skydiving Fatalities. In 2013, the United States Parachute Association (USPA) recorded 24 fatal skydiving accidents in the U.S. out of roughly 3.2 million jumps. The data in the following table are the numbers of fatalities that occurred from 1999 to 2013.

32   35   33   25   21
27   21   18   30   16
21   25   19   24   14

6.133 Oxygen Distribution. In the article “Distribution of Oxygen in Surface Sediments from Central Sagami Bay, Japan: In Situ Measurements by Microelectrodes and Planar Optodes” (Deep Sea Research Part I: Oceanographic Research Papers, Vol. 52, Issue 10, pp. 1974–1987), R. Glud et al. explored the distributions of oxygen in surface sediments from central Sagami Bay. The oxygen distribution gives important information on the general biogeochemistry of marine sediments. Measurements were performed at 16 sites. A sample of 22 depths yielded the following data, in millimoles per square meter per day (mmol m⁻² d⁻¹), on diffusive oxygen uptake (DOU).

1.8   2.0   1.8   2.3   3.8   3.4   2.7   1.1
3.3   1.2   3.6   1.9   7.6   2.0   1.5   2.0
1.1   0.7   1.0   1.8   1.8   6.7

6.134 Medieval Cremation Burials. In the article “Material Culture as Memory: Combs and Cremations in Early Medieval Britain” (Early Medieval Europe, Vol. 12, Issue 2, pp. 89–128), H. Williams discussed the frequency of cremation burials found in 17 archaeological sites in eastern England. Here are the data.

83   64   46   48  523   35   34  265  2484
46  385   21   86  429   51  258  119

Working with Large Data Sets

6.135 The following data set gives the body mass index of 20 men. Use technology to obtain a histogram and normal probability plot of the data and use them to assess the (approximate) normality of the data.

26.0   23.8   25.3   23.4   27.7
26.9   32.0   20.7   28.5   19.9
23.1   23.1   26.4   24.1   33.4
25.6   26.7   21.7   24.5   27.3

a. Construct a histogram of the data.
b. Construct a normal probability plot of the data.

6.136 Vegetarians and Omnivores. Philosophical and health issues are prompting an increasing number of Taiwanese to switch to a vegetarian lifestyle. In the paper “LDL of Taiwanese Vegetarians Are Less Oxidizable than Those of Omnivores” (Journal of Nutrition, Vol. 130, pp. 1591–1596), S. Lu et al. compared the daily intake of nutrients by vegetarians and omnivores living in Taiwan. Among the nutrients considered was protein. Too little protein stunts growth and interferes with all bodily functions; too much protein puts a strain on the kidneys, can cause diarrhea and dehydration, and can leach calcium from bones and teeth. The daily protein intakes, in grams, for 51 female vegetarians and 53 female omnivores are provided on the WeissStats site. Use the technology of your choice to do the following for each of the two sets of sample data.
a. Obtain a histogram of the data and use it to assess the (approximate) normality of the variable under consideration.
b. Obtain a normal probability plot of the data and use it to assess the (approximate) normality of the variable under consideration.
c. Compare your results in parts (a) and (b).

6.137 “Chips Ahoy! 1,000 Chips Challenge.” Students in an introductory statistics course at the U.S. Air Force Academy participated in Nabisco’s “Chips Ahoy! 1,000 Chips Challenge” by confirming that there were at least 1000 chips in every 18-ounce bag of cookies that they examined. As part of their assignment, they concluded that the number of chips per bag is approximately normally distributed. Their conclusion was based on the data provided on the WeissStats site, which gives the number of chips per bag for 42 bags. Do you agree with the conclusion of the students? Explain your answer. [SOURCE: B. Warner and J. Rutledge, “Checking the Chips Ahoy! Guarantee,” Chance, Vol. 12(1), pp. 10–14]

Extending the Concepts and Skills

6.138 Finger Length of Criminals. In 1902, W. R. Macdonell published the article “On Criminal Anthropometry and the Identification of Criminals” (Biometrika, Vol. 1, pp. 177–227). Among other things, the author presented data on the left middle finger length, in centimeters. The following table provides the midpoints and frequencies of the finger-length classes used.

Midpoint (cm)   Frequency     Midpoint (cm)   Frequency
     9.5             1            11.6           691
     9.8             4            11.9           509
    10.1            24            12.2           306
    10.4            67            12.5           131
    10.7           193            12.8            63
    11.0           417            13.1            16
    11.3           575            13.4             3

Use these data and the technology of your choice to assess the normality of middle finger length of criminals by using
a. a histogram.
b. a normal probability plot.

6.139 Household Expenditure. In a certain city, household expenditure is normally distributed with a mean of $460 and a standard deviation of $54.
a. Use the technology of your choice to simulate four random samples of 100 households each.
b. Obtain a normal probability plot of each sample in part (a).
c. Are the normal probability plots in part (b) what you expected? Explain your answer.

6.140 Emergency Room Traffic. Desert Samaritan Hospital in Mesa, Arizona, keeps records of emergency room traffic. Those records reveal that the times between arriving patients have a special type of reverse-J-shaped distribution called an exponential distribution. The records also show that the mean time between arriving patients is 8.7 minutes.
a. Use the technology of your choice to simulate four random samples of 75 interarrival times each.
b. Obtain a normal probability plot of each sample in part (a).
c. Are the normal probability plots in part (b) what you expected? Explain your answer.

6.5 Normal Approximation to the Binomial Distribution∗ †

In this section, we demonstrate the approximation of binomial probabilities by using

areas under a suitable normal curve. The development of the mathematical theory for

doing so is credited to Abraham de Moivre (1667–1754) and Pierre-Simon Laplace

(1749–1827). For more information on de Moivre and Laplace, see the biographies at

the end of Chapters 12 and 7, respectively.

First, we need to review briefly the binomial distribution, which we discussed in detail in Section 5.3. Suppose that n identical independent success–failure experiments are performed, with the probability of success on any given trial being p. Let X denote the total number of successes in the n trials. Then, the probability distribution of the random variable X is given by the binomial probability formula,

$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n.$$

We say that X has the binomial distribution with parameters n and p.

† Coverage of the binomial distribution (Section 5.3) is prerequisite to this section.

You might be wondering why we would use normal-curve areas to approximate

binomial probabilities when we can obtain them exactly with the binomial probability

formula. Example 6.19 provides the reason.



EXAMPLE 6.19



The Need to Approximate Binomial Probabilities

Mortality Mortality tables enable actuaries to obtain the probability that a person

at any particular age will live a specified number of years. Insurance companies

and others use such probabilities to determine life-insurance premiums, retirement

pensions, and annuity payments.

According to tables provided by the National Center for Health Statistics in Vital

Statistics of the United States, a person of age 20 years has about an 80% chance of

being alive at age 65 years. In Example 5.12 on pages 264–266, we used the binomial

probability formula to determine probabilities for the number of 20-year-olds out of

three who will be alive at age 65.

For most real-world problems, the number of people under investigation is much

larger than three. Although in principle we can use the binomial probability formula

to determine probabilities regardless of number, in practice we do not. Suppose, for

instance, that 500 people of age 20 years are selected at random. Find the probability that

a. exactly 390 of them will be alive at age 65.

b. between 375 and 425 of them, inclusive, will be alive at age 65.



Solution Let X denote the number of people of the 500 who are alive at age 65.

Then X has the binomial distribution with parameters n = 500 (the 500 people) and

p = 0.8 (the probability a person of age 20 will be alive at age 65). In principle, we

can determine probabilities for X exactly by using the binomial probability formula,

$$P(X = x) = \binom{500}{x} (0.8)^x (0.2)^{500 - x}.$$



Let’s use that formula for parts (a) and (b).

a. The “answer” is

$$P(X = 390) = \binom{500}{390} (0.8)^{390} (0.2)^{110}.$$



However, obtaining the numerical value of the expression on the right-hand side

is not easy, even with a calculator. Such computations often lead to roundoff

errors and to numbers so large or so small that they are outside the range of

the calculator. Fortunately, we can sidestep the calculations altogether by using

normal-curve areas.

b. The “answer” is

$$P(375 \le X \le 425) = P(X = 375) + P(X = 376) + \cdots + P(X = 425)$$
$$= \binom{500}{375} (0.8)^{375} (0.2)^{125} + \binom{500}{376} (0.8)^{376} (0.2)^{124} + \cdots + \binom{500}{425} (0.8)^{425} (0.2)^{75}.$$



Here we have the same computational difficulties as we did in part (a), except that we

must evaluate 51 complex expressions instead of 1. Again, the binomial probability

formula is too difficult to use, and we will need to use normal-curve areas.
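Software with exact integer and rational arithmetic avoids the roundoff and range problems described above, so the two probabilities can be computed directly and kept as a check on the normal-curve areas. The following is a minimal sketch using only Python's standard library; the helper name `binomial_pmf` is ours, not the text's.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """Exact P(X = x) for a binomial variable; pass p as a Fraction."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 500, Fraction(4, 5)  # p = 0.8, held exactly as 4/5

# Part (a): exactly 390 alive at age 65.
p_exactly_390 = binomial_pmf(390, n, p)

# Part (b): between 375 and 425, inclusive (51 terms).
p_375_to_425 = sum(binomial_pmf(x, n, p) for x in range(375, 426))

print(float(p_exactly_390))
print(float(p_375_to_425))
```

Every intermediate value is an exact fraction, and only the final results are converted to floating point. The normal approximation remains the practical route for hand or table-based work.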






The previous example makes clear that using the binomial probability formula

when the number of trials, n, is very large is impractical. Under certain conditions

on n and p, the distribution of a binomial random variable is roughly bell shaped. In

such cases, we can approximate probabilities for the random variable by areas under a

suitable normal curve, as shown in the next example.



EXAMPLE 6.20



Approximating Binomial Probabilities,

Using Normal-Curve Areas

True–False Exams A student is taking a true–false exam with 10 questions. Assume that the student guesses at all 10 questions.

a. Determine the probability that the student gets either 7 or 8 answers correct.

b. Approximate the probability obtained in part (a) by an area under a suitable

normal curve.



TABLE 6.7 Probability distribution of the number of correct answers of 10 by the student

Number correct x   Probability P(X = x)
        0                0.0010
        1                0.0098
        2                0.0439
        3                0.1172
        4                0.2051
        5                0.2461
        6                0.2051
        7                0.1172
        8                0.0439
        9                0.0098
       10                0.0010



Solution Let X denote the number of correct answers by the student. Then X has

the binomial distribution with parameters n = 10 (the 10 questions) and p = 0.5

(the probability of a correct guess).

a. Probabilities for X are given by the binomial probability formula

$$P(X = x) = \binom{10}{x} (0.5)^x (1 - 0.5)^{10 - x}.$$



Using this formula, we get the probability distribution of X , as shown in

Table 6.7. According to that table, the probability the student gets either 7 or

8 answers correct is

P(X = 7 or 8) = P(X = 7) + P(X = 8) = 0.1172 + 0.0439 = 0.1611.
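The entries of Table 6.7 can be reproduced with a few lines of Python. This is a minimal sketch using the standard library's `math.comb` for the binomial coefficient; the variable names are ours.

```python
from math import comb

n, p = 10, 0.5

# P(X = x) for x = 0, 1, ..., 10, rounded to four decimals as in Table 6.7.
table = [round(comb(n, x) * p**x * (1 - p) ** (n - x), 4) for x in range(n + 1)]
for x, prob in enumerate(table):
    print(f"{x:2d}  {prob:.4f}")

# Part (a): probability of 7 or 8 correct answers, computed before rounding.
p_7_or_8 = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in (7, 8))
print(f"P(X = 7 or 8) = {p_7_or_8:.4f}")
```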

b. Referring to Table 6.7, we drew the probability histogram of X in Fig. 6.31.

Because the probability histogram is bell shaped, probabilities for X can be

approximated by areas under a normal curve. The appropriate normal curve is

the one whose parameters are the same as the mean and standard deviation of X ,

which, by Formula 5.2 on page 267, are

$$\mu = np = 10 \cdot 0.5 = 5 \quad\text{and}\quad \sigma = \sqrt{np(1 - p)} = \sqrt{10 \cdot 0.5 \cdot (1 - 0.5)} = 1.58.$$



Therefore, the required normal curve has parameters μ = 5 and σ = 1.58; it is

superimposed on the probability histogram in Fig. 6.31.

FIGURE 6.31 Probability histogram for X with superimposed normal curve
[Figure: probability histogram of P(X = x) for x = 0 to 10 with the normal curve (μ = 5, σ = 1.58) superimposed; the bars for P(X = 7 or 8) are cross-hatched, and the area under the normal curve between 6.5 and 8.5 is shaded]

The probability P(X = 7 or 8) equals the area of the corresponding bars of

the histogram, cross-hatched in Fig. 6.31. Note that the cross-hatched area approximately equals the area under the normal curve between 6.5 and 8.5, shaded

in Fig. 6.31.



Figure 6.31 makes clear why we consider the area under the normal curve between 6.5 and 8.5 instead of between 7 and 8. This adjustment is called the correction for continuity. It is required because we are approximating the distribution of a discrete variable by that of a continuous variable.

Figure 6.31 shows that P(X = 7 or 8) roughly equals the area under the normal curve with parameters μ = 5 and σ = 1.58 that lies between 6.5 and 8.5. To compute this area, we convert to z-scores and then find the corresponding area under the standard normal curve in the usual way, as shown in Fig. 6.32. The last line in Fig. 6.32 shows that the area under the normal curve between 6.5 and 8.5 is 0.1579. This area is close to P(X = 7 or 8), which, as we found in part (a), is 0.1611.

What Does It Mean? The normal-curve area provides an excellent approximation of the exact probability.



FIGURE 6.32 Determination of the area under the normal curve with parameters μ = 5 and σ = 1.58 that lies between 6.5 and 8.5

z-score computations and areas to the left of z:
x = 6.5:  z = (6.5 − 5)/1.58 = 0.95;  area to the left = 0.8289
x = 8.5:  z = (8.5 − 5)/1.58 = 2.22;  area to the left = 0.9868

Shaded area = 0.9868 − 0.8289 = 0.1579
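The same area can be obtained in software without a table lookup. The sketch below uses `statistics.NormalDist` from Python's standard library; the variable names are ours. Because σ and the z-scores are not rounded here, the result may differ from 0.1579 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

n, p = 10, 0.5
mu = n * p                     # 5
sigma = sqrt(n * p * (1 - p))  # about 1.58

curve = NormalDist(mu, sigma)

# Correction for continuity: the bars for 7 and 8 span 6.5 to 8.5.
approx = curve.cdf(8.5) - curve.cdf(6.5)
exact = 0.1172 + 0.0439        # P(X = 7) + P(X = 8) from Table 6.7

print(f"normal-curve area: {approx:.4f}")
print(f"exact probability: {exact:.4f}")
```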



As indicated by the previous example, we can use normal-curve areas to approximate probabilities for binomial random variables that have bell-shaped distributions.

Whether a particular binomial random variable has a bell-shaped distribution depends

on its parameters, n and p. Figure 6.33 on the next page shows nine different binomial

distributions.

As illustrated in Figs. 6.33(a) and 6.33(c), a binomial distribution with p ≠ 0.5 is always skewed. For small n, such a distribution is too skewed to allow a normal approximation but, for large n, is sufficiently bell shaped to permit it. In contrast, Fig. 6.33(b) illustrates that a binomial distribution with p = 0.5 is always symmetric. Nonetheless, such a distribution will not be sufficiently bell shaped to permit a normal approximation if n is too small.

The customary rule of thumb for using the normal approximation is that both np

and n(1 − p) are 5 or greater. This restriction indicates that the farther the success

probability is from 0.5, the larger the number of trials must be to use the normal approximation.
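To see the rule of thumb at work, the sketch below applies it to the nine (n, p) pairs of Fig. 6.33. It also prints the skewness of each distribution using the standard formula (1 − 2p)/√(np(1 − p)), which is not given in the text and is included only as a numerical counterpart to the shapes in the figure. The function names are hypothetical.

```python
from math import sqrt

def normal_approx_ok(n, p):
    """Rule of thumb from the text: both np and n(1 - p) are 5 or greater."""
    return n * p >= 5 and n * (1 - p) >= 5

def skewness(n, p):
    # Standard skewness formula for a binomial distribution (not from the text).
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

for p in (0.3, 0.5, 0.8):
    for n in (5, 10, 20):
        print(f"n = {n:2d}, p = {p}: skewness = {skewness(n, p):+.3f}, "
              f"rule of thumb satisfied: {normal_approx_ok(n, p)}")
```

Of the nine distributions, only those with (n = 10, p = 0.5), (n = 20, p = 0.3), and (n = 20, p = 0.5) satisfy the rule of thumb.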



Procedure for Using the Normal Approximation

to the Binomial Distribution

We can now write a general step-by-step method for approximating binomial probabilities by areas under a normal curve.



PROCEDURE 6.3  To Approximate Binomial Probabilities by Normal-Curve Areas

Step 1 Find n, the number of trials, and p, the success probability.

Step 2 Continue only if both np and n(1 − p) are 5 or greater.

Step 3 Find μ and σ, using the formulas $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.

Step 4 Make the correction for continuity, and find the required area under the normal curve with parameters μ and σ.






FIGURE 6.33 Nine different binomial distributions
[Figure: probability histograms of P(X = x) in three columns, (a) p = 0.3, (b) p = 0.5, and (c) p = 0.8, each shown for n = 5, n = 10, and n = 20]



Step 4 of Procedure 6.3 requires the correction for continuity, as illustrated in

Example 6.20. For instance, when using normal-curve areas to approximate the probability that an observed value of a binomial random variable will be between two whole

numbers, inclusive, we subtract 0.5 from the smaller whole number and add 0.5 to the

larger whole number before finding the area under the normal curve.

In general, we always make the correction for continuity (adding or subtracting 0.5) so that the interval under the normal curve covers exactly the whole numbers in question. For example, if we want to approximate P(X < 16),

the whole numbers in question are 0, 1, 2, . . . , 15; thus, we would find the area under

the normal curve that lies between −0.5 and 15.5. Similarly, if we want to approximate

P(12 < X ≤ 16), the whole numbers in question are 13, 14, 15, and 16; hence, we

would find the area under the normal curve that lies between 12.5 and 16.5.
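The four steps of Procedure 6.3, together with the correction for continuity just described, can be sketched as a single Python function. This is one possible arrangement, not a definitive implementation: the function name and signature are illustrative, and the caller is expected to translate the event into its smallest and largest whole numbers first. The values n = 40 and p = 0.5 in the usage lines are made up for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_binomial(n, p, low, high):
    """Approximate P(low <= X <= high) for whole numbers low <= high.

    Follows the four steps of Procedure 6.3; the name and signature
    are illustrative only.
    """
    # Steps 1 and 2: take n and p, and check the rule of thumb.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("np and n(1 - p) must both be 5 or greater")
    # Step 3: mean and standard deviation of the binomial variable.
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    # Step 4: correction for continuity, then the normal-curve area.
    curve = NormalDist(mu, sigma)
    return curve.cdf(high + 0.5) - curve.cdf(low - 0.5)

# P(X < 16) covers the whole numbers 0, 1, ..., 15 (area from -0.5 to 15.5).
print(normal_approx_binomial(40, 0.5, 0, 15))

# P(12 < X <= 16) covers 13, 14, 15, 16 (area from 12.5 to 16.5).
print(normal_approx_binomial(40, 0.5, 13, 16))
```

Passing the whole-number endpoints rather than the inequality keeps the correction itself to one line; deciding which whole numbers an inequality covers stays with the reader, as in the text.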



EXAMPLE 6.21



Normal Approximation to the Binomial

Mortality  The probability is 0.80 that a person of age 20 years will be alive at age 65 years. Suppose that 500 people of age 20 are selected at random. Determine the probability that

a. exactly 390 of them will be alive at age 65.
b. between 375 and 425 of them, inclusive, will be alive at age 65.



Solution We will approximate the probabilities in parts (a) and (b) by using

Procedure 6.3.

Step 1 Find n, the number of trials, and p, the success probability.

We have n = 500 and p = 0.8.

Step 2 Continue only if both np and n(1 − p) are 5 or greater.

From the values for n and p noted in Step 1,

np = 500 · 0.8 = 400 and



n(1 − p) = 500 · 0.2 = 100.



Both np and n(1 − p) are greater than 5, so we can continue.



Step 3 Find μ and σ, using the formulas $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.

We get $\mu = 500 \cdot 0.8 = 400$ and $\sigma = \sqrt{500 \cdot 0.8 \cdot 0.2} = 8.94$.

Step 4 Make the correction for continuity, and find the required area under the

normal curve with parameters μ and σ .

a. To make the correction for continuity, we subtract 0.5 from 390 and add 0.5

to 390. Thus we need to find the area under the normal curve with parameters

μ = 400 and σ = 8.94 that lies between 389.5 and 390.5. This area, 0.0236, is

found in Fig. 6.34. So, P(X = 390) = 0.0236, approximately.



FIGURE 6.34 Determination of the area under the normal curve with parameters μ = 400 and σ = 8.94 that lies between 389.5 and 390.5

z-score computations and areas to the left of z:
x = 389.5:  z = (389.5 − 400)/8.94 = −1.17;  area to the left = 0.1210
x = 390.5:  z = (390.5 − 400)/8.94 = −1.06;  area to the left = 0.1446

Shaded area = 0.1446 − 0.1210 = 0.0236



Interpretation The probability is about 0.0236 that exactly 390 of the

500 people selected will be alive at age 65.

b. To make the correction for continuity, we subtract 0.5 from 375 and add 0.5

to 425. Thus we need to determine the area under the normal curve with parameters μ = 400 and σ = 8.94 that lies between 374.5 and 425.5. As in

part (a), we convert to z-scores, and then find the corresponding area under the

standard normal curve. This area is 0.9956. So, P(375 ≤ X ≤ 425) = 0.9956,

approximately.

Exercise 6.167

on page 324



Interpretation The probability is approximately 0.9956 that between 375

and 425 of the 500 people selected will be alive at age 65.
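Part (b) states the area without showing the z-score step. A worked version, rounding the z-scores to two decimal places as in Fig. 6.34, might read as follows. The areas 0.0022 and 0.9978 are standard normal table values supplied here, not quoted from the text.

```latex
\begin{align*}
x = 374.5:&\quad z = \frac{374.5 - 400}{8.94} = -2.85, \quad \text{area to the left} = 0.0022,\\
x = 425.5:&\quad z = \frac{425.5 - 400}{8.94} = 2.85, \quad \text{area to the left} = 0.9978,\\
P(375 \le X \le 425) &\approx 0.9978 - 0.0022 = 0.9956.
\end{align*}
```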


