6.4 Assessing Normality; Normal Probability Plots
KEY FACT 6.7
Guidelines for Assessing Normality Using a Normal Probability Plot
To assess the normality of a variable using sample data, construct a normal probability plot.
• If the plot is roughly linear, you can assume that the variable is approximately normally distributed.
• If the plot is not roughly linear, you can assume that the variable is not approximately normally distributed.
These guidelines should be interpreted loosely for small samples but usually interpreted strictly for large samples.
What Does It Mean? Roughly speaking, a normal probability plot that falls nearly in a straight line indicates a normal variable, and one that does not indicates a nonnormal variable.
In practice, normal probability plots are generated by computer. However, to better
understand these plots, constructing a few by hand is helpful. Table III in Appendix A
gives the normal scores for sample sizes from 5 to 30. In the next example, we explain
how to use Table III to obtain a normal probability plot.
TABLE 6.4 Adjusted gross incomes ($1000s)
 9.7   81.4   12.8   93.1
51.1    7.8   33.0   43.5
18.1   21.2   10.6   12.7

TABLE 6.5 Ordered data and normal scores
Adjusted gross income    Normal score
         7.8                −1.64
         9.7                −1.11
        10.6                −0.79
        12.7                −0.53
        12.8                −0.31
        18.1                −0.10
        21.2                 0.10
        33.0                 0.31
        43.5                 0.53
        51.1                 0.79
        81.4                 1.11
        93.1                 1.64
EXAMPLE 6.16 Normal Probability Plots
Adjusted Gross Incomes The Internal Revenue Service publishes data on federal
individual income tax returns in Statistics of Income, Individual Income Tax Returns.
A simple random sample of 12 returns from last year revealed the adjusted gross
incomes, in thousands of dollars, shown in Table 6.4. Construct a normal probability
plot for these data, and use the plot to assess the normality of adjusted gross incomes.
Solution Here the variable is adjusted gross income, and the population consists
of all of last year’s federal individual income tax returns. To construct a normal
probability plot, we first arrange the data in increasing order and obtain the normal
scores from Table III. The ordered data are shown in the first column of Table 6.5;
the normal scores, from the n = 12 column of Table III, are shown in the second
column of Table 6.5.
Next, we plot the points in Table 6.5, using the horizontal axis for the adjusted
gross incomes and the vertical axis for the normal scores. For instance, the first
point plotted has a horizontal coordinate of 7.8 and a vertical coordinate of −1.64.
Figure 6.29 shows all 12 points from Table 6.5. This graph is the normal probability
plot for the sample of adjusted gross incomes. Note that the normal probability plot
in Fig. 6.29 is curved, not linear.
FIGURE 6.29 Normal probability plot for the sample of adjusted gross incomes (horizontal axis: adjusted gross income, $1000s; vertical axis: normal score)
CHAPTER 6 The Normal Distribution
Interpretation In light of Key Fact 6.7, last year’s adjusted gross incomes
apparently are not (approximately) normally distributed.
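The hand construction in Example 6.16 can be sketched in code. The Python sketch below pairs each ordered observation with a normal score. It approximates the scores with Blom's formula, Φ⁻¹((i − 0.375)/(n + 0.25)); that choice is an assumption, because the text does not say how Table III was computed, and the resulting scores agree with Table 6.5 only to within about 0.01. The function names are illustrative.

```python
from statistics import NormalDist

def normal_scores(n):
    """Approximate normal scores for a sample of size n.

    Uses Blom's formula (an assumption; Table III may be computed differently).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def normal_probability_points(data):
    """Pair each ordered observation with its normal score: (observation, score)."""
    ordered = sorted(data)
    return list(zip(ordered, normal_scores(len(ordered))))

# Adjusted gross incomes ($1000s) from Table 6.4
agi = [9.7, 81.4, 12.8, 93.1, 51.1, 7.8, 33.0, 43.5, 18.1, 21.2, 10.6, 12.7]

points = normal_probability_points(agi)
for income, score in points:
    print(f"{income:6.1f}  {score:6.2f}")
```

Plotting these pairs with income on the horizontal axis and normal score on the vertical axis gives a plot like the one in Fig. 6.29.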
Report 6.4
Exercise 6.127(a), (c)
on page 317
Note: If two or more observations in a sample are equal, you can think of them as
slightly different from one another for purposes of obtaining their normal scores.
In some books and statistical technologies, you may encounter one or more of the
following differences in normal probability plots:
• The vertical axis is used for the data and the horizontal axis for the normal scores.
• A probability or percent scale is used instead of normal scores.
• An averaging process is used to assign equal normal scores to equal observations.
• The method used for computing normal scores differs from the one used to obtain Table III.
Detecting Outliers with Normal Probability Plots
Recall that outliers are observations that fall well outside the overall pattern of the data.
We can also use normal probability plots to detect outliers.
EXAMPLE 6.17 Using Normal Probability Plots to Detect Outliers
TABLE 6.6 Sample of last year’s chicken consumption (lb)
57   72   60   69   65   75
63   91   55   49   59   80
63    0   73   61   82
Chicken Consumption The U.S. Department of Agriculture publishes data on
U.S. chicken consumption in Food Consumption, Prices, and Expenditures. The
annual chicken consumption, in pounds, for 17 randomly selected people is displayed in Table 6.6. A normal probability plot for these observations is presented in
Fig. 6.30(a). Use the plot to discuss the distribution of chicken consumption and to
detect any outliers.
Solution Figure 6.30(a) reveals that the normal probability plot falls roughly in a
straight line, except for the point corresponding to 0 lb, which falls well outside the
overall pattern of the plot.
FIGURE 6.30 Normal probability plots for chicken consumption: (a) original data; (b) data with outlier removed (horizontal axis: chicken consumption, lb; vertical axis: normal score; the point for 0 lb is labeled as an outlier)
Interpretation The observation of 0 lb is an outlier, which might be a recording
error or due to a person in the sample who does not eat chicken, such as a vegetarian.
If we remove the outlier 0 lb from the sample data and draw a new normal
probability plot, the result, shown in Fig. 6.30(b), is quite linear.
Exercise 6.127(b)
on page 317
Interpretation It appears plausible that, among people who eat chicken, the
amounts they consume annually are (approximately) normally distributed.
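One rough way to put a number on the difference between the two plots in Fig. 6.30 is the correlation between the ordered data and their normal scores: the closer it is to 1, the more nearly linear the plot. This is a supplement to the visual assessment used in the text, not the text's own method. The sketch below assumes Blom's formula for the normal scores (Table III may differ slightly), and the function names are illustrative.

```python
from statistics import NormalDist, mean

def normal_scores(n):
    """Approximate normal scores via Blom's formula (an assumption)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def plot_correlation(data):
    """Pearson correlation between the ordered data and their normal scores."""
    x = sorted(data)
    y = normal_scores(len(x))
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Chicken consumption (lb) from Table 6.6
chicken = [57, 72, 60, 69, 65, 75, 63, 91, 55, 49, 59, 80, 63, 0, 73, 61, 82]
without_outlier = [v for v in chicken if v != 0]

r_all = plot_correlation(chicken)
r_trimmed = plot_correlation(without_outlier)
print(f"with the 0 lb observation:    r = {r_all:.3f}")
print(f"without the 0 lb observation: r = {r_trimmed:.3f}")
```

A noticeably higher correlation after removing the 0 lb observation is consistent with the visual impression that Fig. 6.30(b) is more nearly linear than Fig. 6.30(a).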
Although the visual assessment of normality that we studied in this section is subjective, it is sufficient for most statistical analyses.
THE TECHNOLOGY CENTER
Most statistical technologies have programs that automatically construct normal probability plots. In this subsection, we present output and step-by-step instructions for such
programs.
EXAMPLE 6.18
Using Technology to Obtain Normal Probability Plots
Adjusted Gross Incomes Use Minitab, Excel, or the TI-83/84 Plus to obtain a normal probability plot for the adjusted gross incomes in Table 6.4 on page 313.
Solution We applied the normal-probability-plot programs to the data, resulting
in Output 6.4. Steps for generating that output are presented in Instructions 6.3 on
the following page.
OUTPUT 6.4 Normal probability plots for the sample of adjusted gross incomes (panels: MINITAB, EXCEL, TI-83/84 PLUS)
INSTRUCTIONS 6.3
Steps for generating Output 6.4
MINITAB
1 Store the data from Table 6.4 in a column named AGI
2 Choose Graph ➤ Probability Plot...
3 Click OK
4 Press the F3 key to reset the dialog box
5 Specify AGI in the Graph variables text box
6 Click the Distribution... button
7 Click the Data Display tab, select the Symbols only option button from the Data Display list, and click OK
8 Click the Scale... button
9 Click the Y-Scale Type tab, select the Score option button from the Y-Scale Type list, and click OK
10 Click OK
EXCEL
1 Store the data from Table 6.4 in a column named AGI
2 Choose XLSTAT ➤ Visualizing data ➤ Univariate plots
3 Click the reset button in the lower left corner of the
dialog box
4 Click in the Quantitative data selection box and then
select the column of the worksheet that contains the
AGI data
5 Click the Options tab and uncheck the Descriptive
statistics check box
6 Click the Charts (1) tab, uncheck the Box plots check
box, and check the Normal Q-Q plots check box
7 Click OK
8 Click the Continue button in the XLSTAT – Selections
dialog box
TI-83/84 PLUS
1 Store the data from Table 6.4 in a list named AGI
2 Ensure that all stat plots and all Y = functions are off
3 Press 2nd ➤ STAT PLOT and then press ENTER twice
4 Arrow to the sixth graph icon and press ENTER
5 Press the down-arrow key
6 Press 2nd ➤ LIST
7 Arrow down to AGI and press ENTER
8 Press ZOOM, then 9, and then TRACE
Exercises 6.4
Understanding the Concepts and Skills
6.116 Under what circumstances is using a normal probability plot to assess the normality of a variable usually better than using a histogram, stem-and-leaf diagram, or dotplot?
6.117 Explain why assessing the normality of a variable is often important.
6.118 Explain in detail what a normal probability plot is and how it is used to assess the normality of a variable.
6.119 How is a normal probability plot used to detect outliers?
6.120 Explain how to obtain normal scores from Table III in Appendix A when a sample contains equal observations.
In each of Exercises 6.121–6.126, we have provided a normal probability plot of data from a sample of a population. In each case, assess the normality of the variable under consideration.
6.121 (normal probability plot)
6.122 (normal probability plot)
6.123 (normal probability plot)
6.124 (normal probability plot)
6.125 (normal probability plot)
6.126 (normal probability plot)
Applying the Concepts and Skills
In Exercises 6.127–6.130,
a. use Table III in Appendix A to construct a normal probability plot of the given data.
b. use part (a) to identify any outliers.
c. use part (a) to assess the normality of the variable under consideration.
6.127 Exam Scores. A sample of the final exam scores in a large introductory statistics course is as follows.
88   85   90   81   67
82   63   96   64   39
89  100   76   75   90
70   86   34   84   96
6.128 Cellular Bills. CTIA–The Wireless Association collects data on cell phones and publishes the results in Semi-annual Wireless Survey. A sample of 15 monthly cell-phone bills gave the following data (to the nearest dollar).
25   55   45   95   15
55   75   45   20   15
45   65   55   35   60
6.129 Sachin Tendulkar. Sachin Tendulkar is a former Indian cricket player who played international cricket from 1989 to 2013. The following table shows the number of runs he scored in One Day International matches in each of 20 years of his career.
   0   239   417   704   319
1089   444  1141   412  1425
1611  1011  1894   843  1328
 904   741   812   628   460
6.130 Beverage Expenditures. The Bureau of Labor Statistics publishes information on average annual expenditures by consumers in the Consumer Expenditure Survey. In 2010, the mean amount spent by consumers on nonalcoholic beverages was $333. A random sample of 12 consumers yielded the following data, in dollars, on last year’s expenditures on nonalcoholic beverages.
472   287   295   376
370   392   351   384
370   360   305   369
In Exercises 6.131–6.134,
a. obtain a normal probability plot of the given data.
b. use part (a) to identify any outliers.
c. use part (a) to assess the normality of the variable under consideration.
6.131 Suppose a sample of the heights, in inches, of players in the starting lineup of a particular basketball team was obtained and is shown below. Use a normal probability plot to assess whether the sample data could have come from a population that is normally distributed.
a. Construct a normal probability plot of the data.
b. Use the normal probability plot to identify any outliers.
c. Based on the probability plot, does the sample appear to come from a normally distributed population?
6.132 Skydiving Fatalities. In 2013, the United States Parachute Association (USPA) recorded 24 fatal skydiving accidents in the U.S. out of roughly 3.2 million jumps. The following table gives the number of fatalities that occurred each year from 1999 to 2013.
32   27   21   35   33
25   21   21   18   30
16   25   19   24   14
6.133 Oxygen Distribution. In the article “Distribution of Oxygen in Surface Sediments from Central Sagami Bay, Japan: In Situ Measurements by Microelectrodes and Planar Optodes” (Deep Sea Research Part I: Oceanographic Research Papers, Vol. 52, Issue 10, pp. 1974–1987), R. Glud et al. explored the distributions of oxygen in surface sediments from central Sagami Bay. The oxygen distribution gives important information on the general biogeochemistry of marine sediments. Measurements were performed at 16 sites. A sample of 22 depths yielded the following data, in millimoles per square meter per day (mmol m⁻² d⁻¹), on diffusive oxygen uptake (DOU).
1.8   3.3   1.1   2.0   1.8   1.2
3.6   0.7   1.0   2.3   3.8   1.9
7.6   1.8   1.8   3.4   2.7   2.0
1.5   6.7   1.1   2.0
6.134 Medieval Cremation Burials. In the article “Material Culture as Memory: Combs and Cremations in Early Medieval Britain” (Early Medieval Europe, Vol. 12, Issue 2, pp. 89–128), H. Williams discussed the frequency of cremation burials found in 17 archaeological sites in eastern England. Here are the data.
 83    64    46    48   523    35
 34   265  2484    46   385    21
 86   429    51   258   119
Working with Large Data Sets
6.135 The following data are the body mass indexes of 20 men. Use technology to obtain a histogram and normal probability plot of the data and use them to assess the (approximate) normality of the data.
26.0   26.9   23.1   25.6   23.8
32.0   23.1   26.7   25.3   20.7
26.4   21.7   23.4   28.5   24.1
24.5   27.7   19.9   33.4   27.3
a. Construct a histogram of the data.
b. Construct a normal probability plot of the data.
6.136 Vegetarians and Omnivores. Philosophical and health issues are prompting an increasing number of Taiwanese to switch to a vegetarian lifestyle. In the paper “LDL of Taiwanese Vegetarians Are Less Oxidizable than Those of Omnivores” (Journal of Nutrition, Vol. 130, pp. 1591–1596), S. Lu et al. compared the daily intake of nutrients by vegetarians and omnivores living in Taiwan. Among the nutrients considered was protein. Too little protein stunts growth and interferes with all bodily functions; too much protein puts a strain on the kidneys, can cause diarrhea and dehydration, and can leach calcium from bones and teeth. The daily protein intakes, in grams, for 51 female vegetarians and 53 female omnivores are provided on the WeissStats site. Use the technology of your choice to do the following for each of the two sets of sample data.
a. Obtain a histogram of the data and use it to assess the (approximate) normality of the variable under consideration.
b. Obtain a normal probability plot of the data and use it to assess the (approximate) normality of the variable under consideration.
c. Compare your results in parts (a) and (b).
6.137 “Chips Ahoy! 1,000 Chips Challenge.” Students in an introductory statistics course at the U.S. Air Force Academy participated in Nabisco’s “Chips Ahoy! 1,000 Chips Challenge” by confirming that there were at least 1000 chips in every 18-ounce bag of cookies that they examined. As part of their assignment, they concluded that the number of chips per bag is approximately normally distributed. Their conclusion was based on the data provided on the WeissStats site, which gives the number of chips per bag for 42 bags. Do you agree with the conclusion of the students? Explain your answer. [SOURCE: B. Warner and J. Rutledge, “Checking the Chips Ahoy! Guarantee,” Chance, Vol. 12(1), pp. 10–14]
Extending the Concepts and Skills
6.138 Finger Length of Criminals. In 1902, W. R. Macdonell published the article “On Criminal Anthropometry and the Identification of Criminals” (Biometrika, Vol. 1, pp. 177–227). Among other things, the author presented data on the left middle finger length, in centimeters. The following table provides the midpoints and frequencies of the finger-length classes used.
Midpoint (cm)    Frequency
     9.5              1
     9.8              4
    10.1             24
    10.4             67
    10.7            193
    11.0            417
    11.3            575
    11.6            691
    11.9            509
    12.2            306
    12.5            131
    12.8             63
    13.1             16
    13.4              3
Use these data and the technology of your choice to assess the normality of middle finger length of criminals by using
a. a histogram.
b. a normal probability plot.
6.139 Household Expenditure. In a certain city, household expenditure is normally distributed with a mean of $460 and a standard deviation of $54.
a. Use the technology of your choice to simulate four random samples of 100 households each.
b. Obtain a normal probability plot of each sample in part (a).
c. Are the normal probability plots in part (b) what you expected? Explain your answer.
6.140 Emergency Room Traffic. Desert Samaritan Hospital in Mesa, Arizona, keeps records of emergency room traffic. Those records reveal that the times between arriving patients have a special type of reverse-J-shaped distribution called an exponential distribution. The records also show that the mean time between arriving patients is 8.7 minutes.
a. Use the technology of your choice to simulate four random samples of 75 interarrival times each.
b. Obtain a normal probability plot of each sample in part (a).
c. Are the normal probability plots in part (b) what you expected? Explain your answer.
6.5 Normal Approximation to the Binomial Distribution∗ †
In this section, we demonstrate the approximation of binomial probabilities by using
areas under a suitable normal curve. The development of the mathematical theory for
doing so is credited to Abraham de Moivre (1667–1754) and Pierre-Simon Laplace
(1749–1827). For more information on de Moivre and Laplace, see the biographies at
the end of Chapters 12 and 7, respectively.
First, we need to review briefly the binomial distribution, which we discussed in
detail in Section 5.3. Suppose that n identical independent success–failure experiments
are performed, with the probability of success on any given trial being p. Let X denote
the total number of successes in the n trials. Then, the probability distribution of the
random variable X is given by the binomial probability formula,
$$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n.$$
We say that X has the binomial distribution with parameters n and p.
† Coverage of the binomial distribution (Section 5.3) is prerequisite to this section.
You might be wondering why we would use normal-curve areas to approximate
binomial probabilities when we can obtain them exactly with the binomial probability
formula. Example 6.19 provides the reason.
EXAMPLE 6.19
The Need to Approximate Binomial Probabilities
Mortality Mortality tables enable actuaries to obtain the probability that a person
at any particular age will live a specified number of years. Insurance companies
and others use such probabilities to determine life-insurance premiums, retirement
pensions, and annuity payments.
According to tables provided by the National Center for Health Statistics in Vital
Statistics of the United States, a person of age 20 years has about an 80% chance of
being alive at age 65 years. In Example 5.12 on pages 264–266, we used the binomial
probability formula to determine probabilities for the number of 20-year-olds out of
three who will be alive at age 65.
For most real-world problems, the number of people under investigation is much
larger than three. Although in principle we can use the binomial probability formula
to determine probabilities regardless of number, in practice we do not. Suppose, for
instance, that 500 people of age 20 years are selected at random. Find the probability that
a. exactly 390 of them will be alive at age 65.
b. between 375 and 425 of them, inclusive, will be alive at age 65.
Solution Let X denote the number of people of the 500 who are alive at age 65.
Then X has the binomial distribution with parameters n = 500 (the 500 people) and
p = 0.8 (the probability a person of age 20 will be alive at age 65). In principle, we
can determine probabilities for X exactly by using the binomial probability formula,
$$P(X = x) = \binom{500}{x} (0.8)^x (0.2)^{500 - x}.$$
Let’s use that formula for parts (a) and (b).
a. The “answer” is
$$P(X = 390) = \binom{500}{390} (0.8)^{390} (0.2)^{110}.$$
However, obtaining the numerical value of the expression on the right-hand side
is not easy, even with a calculator. Such computations often lead to roundoff
errors and to numbers so large or so small that they are outside the range of
the calculator. Fortunately, we can sidestep the calculations altogether by using
normal-curve areas.
b. The “answer” is
$$\begin{aligned}
P(375 \le X \le 425) &= P(X = 375) + P(X = 376) + \cdots + P(X = 425) \\
&= \binom{500}{375} (0.8)^{375} (0.2)^{125} + \binom{500}{376} (0.8)^{376} (0.2)^{124} + \cdots + \binom{500}{425} (0.8)^{425} (0.2)^{75}.
\end{aligned}$$
Here we have the same computational difficulties as we did in part (a), except that we
must evaluate 51 complex expressions instead of 1. Again, the binomial probability
formula is too difficult to use, and we will need to use normal-curve areas.
The previous example makes clear that using the binomial probability formula
when the number of trials, n, is very large is impractical. Under certain conditions
on n and p, the distribution of a binomial random variable is roughly bell shaped. In
such cases, we can approximate probabilities for the random variable by areas under a
suitable normal curve, as shown in the next example.
EXAMPLE 6.20
Approximating Binomial Probabilities,
Using Normal-Curve Areas
True–False Exams A student is taking a true–false exam with 10 questions. Assume that the student guesses at all 10 questions.
a. Determine the probability that the student gets either 7 or 8 answers correct.
b. Approximate the probability obtained in part (a) by an area under a suitable
normal curve.
TABLE 6.7 Probability distribution of the number of correct answers of 10 by the student
Number correct x    Probability P(X = x)
        0                 0.0010
        1                 0.0098
        2                 0.0439
        3                 0.1172
        4                 0.2051
        5                 0.2461
        6                 0.2051
        7                 0.1172
        8                 0.0439
        9                 0.0098
       10                 0.0010
Solution Let X denote the number of correct answers by the student. Then X has
the binomial distribution with parameters n = 10 (the 10 questions) and p = 0.5
(the probability of a correct guess).
a. Probabilities for X are given by the binomial probability formula
$$P(X = x) = \binom{10}{x} (0.5)^x (1 - 0.5)^{10 - x}.$$
Using this formula, we get the probability distribution of X , as shown in
Table 6.7. According to that table, the probability the student gets either 7 or
8 answers correct is
P(X = 7 or 8) = P(X = 7) + P(X = 8) = 0.1172 + 0.0439 = 0.1611.
b. Referring to Table 6.7, we drew the probability histogram of X in Fig. 6.31.
Because the probability histogram is bell shaped, probabilities for X can be
approximated by areas under a normal curve. The appropriate normal curve is
the one whose parameters are the same as the mean and standard deviation of X ,
which, by Formula 5.2 on page 267, are
$$\mu = np = 10 \cdot 0.5 = 5 \quad\text{and}\quad \sigma = \sqrt{np(1 - p)} = \sqrt{10 \cdot 0.5 \cdot (1 - 0.5)} = 1.58.$$
Therefore, the required normal curve has parameters μ = 5 and σ = 1.58; it is
superimposed on the probability histogram in Fig. 6.31.
FIGURE 6.31 Probability histogram for X with superimposed normal curve (μ = 5, σ = 1.58); the bars for P(X = 7 or 8) are cross-hatched, and the area under the normal curve between 6.5 and 8.5 is shaded (horizontal axis: x; vertical axis: P(X = x))
The probability P(X = 7 or 8) equals the area of the corresponding bars of
the histogram, cross-hatched in Fig. 6.31. Note that the cross-hatched area approximately equals the area under the normal curve between 6.5 and 8.5, shaded
in Fig. 6.31.
What Does It Mean? The normal-curve area provides an excellent approximation of the exact probability.
Figure 6.31 makes clear why we consider the area under the normal curve
between 6.5 and 8.5 instead of between 7 and 8. This adjustment is called the
correction for continuity. It is required because we are approximating the distribution of a discrete variable by that of a continuous variable.
Figure 6.31 shows that P(X = 7 or 8) roughly equals the area under the
normal curve with parameters μ = 5 and σ = 1.58 that lies between 6.5 and 8.5.
To compute this area, we convert to z-scores and then find the corresponding
area under the standard normal curve in the usual way, as shown in Fig. 6.32.
The last line in Fig. 6.32 shows that the area under the normal curve between 6.5 and 8.5 is 0.1579. This area is close to P(X = 7 or 8), which, as we
found in part (a), is 0.1611.
FIGURE 6.32 Determination of the area under the normal curve with parameters μ = 5 and σ = 1.58 that lies between 6.5 and 8.5
z-score computations:
x = 6.5: z = (6.5 − 5)/1.58 = 0.95; area to the left of z: 0.8289
x = 8.5: z = (8.5 − 5)/1.58 = 2.22; area to the left of z: 0.9868
Shaded area = 0.9868 − 0.8289 = 0.1579
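The comparison in Example 6.20 can be reproduced with a short Python sketch that uses only the standard library. It computes the exact probability from the binomial probability formula and the normal-curve area between 6.5 and 8.5. Because the code uses unrounded z-scores, the area it reports may differ from 0.1579 in the last digit. The variable and function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.5  # 10 true-false questions, each guessed correctly with probability 0.5

def binomial_pmf(x):
    """Exact P(X = x) from the binomial probability formula."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exact probability of 7 or 8 correct answers
exact = binomial_pmf(7) + binomial_pmf(8)

# Normal curve with the same mean and standard deviation as X
mu = n * p
sigma = sqrt(n * p * (1 - p))
curve = NormalDist(mu, sigma)

# Correction for continuity: area between 6.5 and 8.5
approx = curve.cdf(8.5) - curve.cdf(6.5)

print(f"exact P(X = 7 or 8) = {exact:.4f}")
print(f"normal-curve area   = {approx:.4f}")
```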
As indicated by the previous example, we can use normal-curve areas to approximate probabilities for binomial random variables that have bell-shaped distributions.
Whether a particular binomial random variable has a bell-shaped distribution depends
on its parameters, n and p. Figure 6.33 on the next page shows nine different binomial
distributions.
As illustrated in Figs. 6.33(a) and 6.33(c), a binomial distribution with p ≠ 0.5
is always skewed. For small n, such a distribution is too skewed to allow a normal
approximation but, for large n, is sufficiently bell shaped to permit it. In contrast,
Fig. 6.33(b) illustrates that a binomial distribution with p = 0.5 is always symmetric.
Nonetheless, such a distribution will not be sufficiently bell shaped to permit a normal
approximation if n is too small.
The customary rule of thumb for using the normal approximation is that both np
and n(1 − p) are 5 or greater. This restriction indicates that the farther the success
probability is from 0.5, the larger the number of trials must be to use the normal approximation.
Procedure for Using the Normal Approximation
to the Binomial Distribution
We can now write a general step-by-step method for approximating binomial probabilities by areas under a normal curve.
PROCEDURE 6.3 To Approximate Binomial Probabilities by Normal-Curve Areas
Step 1 Find n, the number of trials, and p, the success probability.
Step 2 Continue only if both np and n(1 − p) are 5 or greater.
Step 3 Find μ and σ, using the formulas $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.
Step 4 Make the correction for continuity, and find the required area under the normal curve with parameters μ and σ.
FIGURE 6.33 Nine different binomial distributions: (a) p = 0.3, (b) p = 0.5, (c) p = 0.8, each shown for n = 5, n = 10, and n = 20 (horizontal axis: x; vertical axis: P(X = x))
Step 4 of Procedure 6.3 requires the correction for continuity, as illustrated in
Example 6.20. For instance, when using normal-curve areas to approximate the probability that an observed value of a binomial random variable will be between two whole
numbers, inclusive, we subtract 0.5 from the smaller whole number and add 0.5 to the
larger whole number before finding the area under the normal curve.
In general, we always make the correction factor (add or subtract 0.5) that leads
us to the original whole numbers. For example, if we want to approximate P(X < 16),
the whole numbers in question are 0, 1, 2, . . . , 15; thus, we would find the area under
the normal curve that lies between −0.5 and 15.5. Similarly, if we want to approximate
P(12 < X ≤ 16), the whole numbers in question are 13, 14, 15, and 16; hence, we
would find the area under the normal curve that lies between 12.5 and 16.5.
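The correction rule just described can be sketched as a small Python helper: list the whole numbers in question, then widen the interval by 0.5 on each side. The function name `corrected_interval` is illustrative, not from the text.

```python
def corrected_interval(smallest, largest):
    """Continuity-corrected endpoints for the whole numbers smallest..largest."""
    if smallest > largest:
        raise ValueError("smallest must not exceed largest")
    return smallest - 0.5, largest + 0.5

# P(X < 16): the whole numbers in question are 0, 1, ..., 15
print(corrected_interval(0, 15))     # (-0.5, 15.5)

# P(12 < X <= 16): the whole numbers in question are 13, 14, 15, 16
print(corrected_interval(13, 16))    # (12.5, 16.5)

# P(X = 390): the single whole number 390
print(corrected_interval(390, 390))  # (389.5, 390.5)
```

Translating the inequality into its list of whole numbers first, as the text does, avoids mistakes with strict versus non-strict inequalities.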
EXAMPLE 6.21
Normal Approximation to the Binomial
Mortality The probability is 0.80 that a person of age 20 years will be alive at
age 65 years. Suppose that 500 people of age 20 are selected at random. Determine
the probability that
a. exactly 390 of them will be alive at age 65.
b. between 375 and 425 of them, inclusive, will be alive at age 65.
Solution We will approximate the probabilities in parts (a) and (b) by using
Procedure 6.3.
Step 1 Find n, the number of trials, and p, the success probability.
We have n = 500 and p = 0.8.
Step 2 Continue only if both np and n(1 − p) are 5 or greater.
From the values for n and p noted in Step 1,
np = 500 · 0.8 = 400 and
n(1 − p) = 500 · 0.2 = 100.
Both np and n(1 − p) are greater than 5, so we can continue.
Step 3 Find μ and σ, using the formulas $\mu = np$ and $\sigma = \sqrt{np(1 - p)}$.
We get $\mu = 500 \cdot 0.8 = 400$ and $\sigma = \sqrt{500 \cdot 0.8 \cdot 0.2} = 8.94$.
Step 4 Make the correction for continuity, and find the required area under the
normal curve with parameters μ and σ.
a. To make the correction for continuity, we subtract 0.5 from 390 and add 0.5
to 390. Thus we need to find the area under the normal curve with parameters
μ = 400 and σ = 8.94 that lies between 389.5 and 390.5. This area, 0.0236, is
found in Fig. 6.34. So, P(X = 390) = 0.0236, approximately.
FIGURE 6.34 Determination of the area under the normal curve with parameters μ = 400 and σ = 8.94 that lies between 389.5 and 390.5
z-score computations:
x = 389.5: z = (389.5 − 400)/8.94 = −1.17; area to the left of z: 0.1210
x = 390.5: z = (390.5 − 400)/8.94 = −1.06; area to the left of z: 0.1446
Shaded area = 0.1446 − 0.1210 = 0.0236
Interpretation The probability is about 0.0236 that exactly 390 of the
500 people selected will be alive at age 65.
b. To make the correction for continuity, we subtract 0.5 from 375 and add 0.5
to 425. Thus we need to determine the area under the normal curve with parameters μ = 400 and σ = 8.94 that lies between 374.5 and 425.5. As in
part (a), we convert to z-scores, and then find the corresponding area under the
standard normal curve. This area is 0.9956. So, P(375 ≤ X ≤ 425) = 0.9956,
approximately.
Exercise 6.167
on page 324
Interpretation The probability is approximately 0.9956 that between 375
and 425 of the 500 people selected will be alive at age 65.
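Procedure 6.3, as applied in Example 6.21, can be sketched in Python with the standard library alone. The function `normal_approximation` follows Steps 2 through 4. The function `exact_binomial` is not part of the procedure: it sums the binomial probability formula in log space (using `math.lgamma`) and is included only as a check on the approximation. Both function names are illustrative. Because the code uses unrounded values of σ and z, its results can differ slightly from the table-based values 0.0236 and 0.9956.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def normal_approximation(n, p, smallest, largest):
    """Approximate P(smallest <= X <= largest) for binomial X by Procedure 6.3."""
    # Step 2: continue only if np and n(1 - p) are both 5 or greater
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("np and n(1 - p) must both be 5 or greater")
    # Step 3: mean and standard deviation of X
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    # Step 4: correction for continuity, then the normal-curve area
    curve = NormalDist(mu, sigma)
    return curve.cdf(largest + 0.5) - curve.cdf(smallest - 0.5)

def exact_binomial(n, p, smallest, largest):
    """Exact P(smallest <= X <= largest), computed in log space as a check."""
    def log_pmf(x):
        log_comb = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
        return log_comb + x * log(p) + (n - x) * log(1 - p)
    return sum(exp(log_pmf(x)) for x in range(smallest, largest + 1))

n, p = 500, 0.8  # Step 1: 500 people, each alive at 65 with probability 0.8

approx_a = normal_approximation(n, p, 390, 390)
approx_b = normal_approximation(n, p, 375, 425)
exact_a = exact_binomial(n, p, 390, 390)
exact_b = exact_binomial(n, p, 375, 425)

print(f"(a) approximate {approx_a:.4f}, exact {exact_a:.4f}")
print(f"(b) approximate {approx_b:.4f}, exact {exact_b:.4f}")
```

Working in log space avoids the very large and very small intermediate numbers that the text warns about in Example 6.19.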