6.2 Probability as a Basis for Making Decisions
TABLE 6.1 Age and Gender Distribution

                                    AGE
Gender    Under 17    17–20    21–24    25–30    31–39    40 and Older
Male        .006       .15      .16      .08      .04         .01
Female      .004       .18      .14      .10      .04         .09
Suppose that you are told that a 43-year-old student from this university is waiting to meet you. You are asked to decide whether the student is male or female. How
would you respond? What if the student had been 27 years old? What about 33 years
old? Are you equally confident in all three of your choices?
A reasonable response to these questions could be based on the probability information in Table 6.1. We would decide that the 43-year-old student was female. We
cannot be certain that this is correct, but we can see that someone in the 40-and-over
age group is much less likely to be male than female. We would also decide that the
27-year-old was female. However, we would be less confident in our conclusion than
we were for the 43-year-old student. For the age group 31–39, the proportion of
males and the proportion of females are equal, so we would think it equally likely that
a 33-year-old student would be male or female. We could decide in favor of male (or
female), but with little confidence in our conclusion; in other words, there is a good
chance of being incorrect.
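The comparison behind these three decisions can be sketched in a few lines of Python. This is only an illustration: the function name `proportion_female` is our own, the age-group labels use plain hyphens, and the numbers are copied from Table 6.1. Within an age group, the proportion of students who are female is the female entry divided by the sum of the male and female entries.

```python
# Proportions from Table 6.1 (each entry is a share of ALL students).
AGE_GROUPS = ["Under 17", "17-20", "21-24", "25-30", "31-39", "40 and Older"]
MALE = [0.006, 0.15, 0.16, 0.08, 0.04, 0.01]
FEMALE = [0.004, 0.18, 0.14, 0.10, 0.04, 0.09]


def proportion_female(age_group):
    """Proportion of the students in one age group who are female."""
    i = AGE_GROUPS.index(age_group)
    return FEMALE[i] / (MALE[i] + FEMALE[i])


# The three students discussed in the text: ages 43, 27, and 33.
for group in ["40 and Older", "25-30", "31-39"]:
    print(f"{group}: proportion female = {proportion_female(group):.2f}")
# 40 and Older: proportion female = 0.90
# 25-30: proportion female = 0.56
# 31-39: proportion female = 0.50
```

The further the proportion is from .50, the more confident the decision: .90 for the 43-year-old, .56 for the 27-year-old, and .50 (no information either way) for the 33-year-old.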
EXAMPLE 6.6 Can You Pass by Guessing?
A professor planning to give a quiz that consists of 20 true–false questions is interested
in knowing how someone who answers by guessing would do on such a test. To investigate, he asks the 500 students in his introductory psychology course to write the
numbers from 1 to 20 on a piece of paper and then to arbitrarily write T or F next to
each number. The students are forced to guess at the answer to each question, because
they are not even told what the questions are. These answer sheets are then collected
and graded using the key for the quiz. The results are summarized in Table 6.2.
TABLE 6.2 Quiz “Guessing” Distribution

Number of Correct    Number of    Proportion
Responses            Students     of Students
 0                       0           .000
 1                       0           .000
 2                       1           .002
 3                       1           .002
 4                       2           .004
 5                       8           .016
 6                      18           .036
 7                      37           .074
 8                      58           .116
 9                      81           .162
10                      88           .176
11                      79           .158
12                      61           .122
13                      39           .078
14                      18           .036
15                       7           .014
16                       1           .002
17                       1           .002
18                       0           .000
19                       0           .000
20                       0           .000
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 6 Probability
Because probabilities are long-run proportions, an entry in the “Proportion of
Students” column of Table 6.2 can be considered an estimate of the probability
of correctly guessing a specific number of responses. For example, a proportion of
.122 (or 12.2%) of the 500 students got 12 of the 20 correct when guessing. We
then estimate the long-run proportion of guessers who would get 12 correct to be
.122, and we say that the probability that a student who is guessing will get 12 correct is (approximately) .122.
Let’s use the information in Table 6.2 to answer the following questions.
1. Would you be surprised if someone who is guessing on a 20-question true–
false quiz got only 3 correct? The approximate probability of a guesser getting
3 correct is .002. This means that, in the long run, only about 2 in 1000 guessers would score exactly 3 correct. This would be an unlikely outcome, and we
would consider its occurrence surprising.
2. If a score of 15 or more correct is required to receive a passing grade on the
quiz, is it likely that someone who is guessing will be able to pass? The long-run proportion of guessers who would pass is the sum of the proportions for all
the passing scores (15, 16, 17, 18, 19, and 20). Then,
probability of passing ≈ .014 + .002 + .002 + .000 + .000 + .000 = .018
It would be unlikely that a student who is guessing would be able to pass.
3. The professor actually gives the quiz, and a student scores 16 correct. Do you
think that the student was just guessing? Let’s begin by assuming that the student
was guessing and determine whether a score as high as 16 is a likely or an unlikely
occurrence. Table 6.2 tells us that the approximate probability of getting a score
at least as high as this student’s score is
probability of scoring 16 or higher ≈ .002 + .002 + .000 + .000 + .000 = .004
That is, in the long run, only about 4 times in 1000 would a guesser score 16 or
higher. This would be rare. There are two possible explanations for the observed
score: (1) The student was guessing and was really lucky, or (2) the student was
not just guessing. Given that the first explanation is highly unlikely, a more plausible choice is the second explanation. We would conclude that the student was
not just guessing at the answers. Although we cannot be certain that we are correct in this conclusion, the evidence is compelling.
4. What score on the quiz would it take to convince us that a student was not just
guessing? We would be convinced that a student was not just guessing if his or
her score was high enough that it was unlikely that a guesser would have been
able to do as well. Consider the following approximate probabilities (computed
from the entries in Table 6.2):
Score           Approximate Probability
20              .000
19 or better    .000 + .000 = .000
18 or better    .000 + .000 + .000 = .000
17 or better    .002 + .000 + .000 + .000 = .002
16 or better    .002 + .002 + .000 + .000 + .000 = .004
15 or better    .014 + .002 + .002 + .000 + .000 + .000 = .018
14 or better    .036 + .014 + .002 + .002 + .000 + .000 + .000 = .054
13 or better    .078 + .036 + .014 + .002 + .002 + .000 + .000 + .000 = .132
We might say that a score of 14 or higher is reasonable evidence that someone is not
guessing, because the approximate probability that a guesser would score this high is
only .054. Of course, if we conclude that a student is not guessing based on a quiz
score of 14 or higher, there is a risk that we are incorrect (about 1 in 20 guessers
would score this high by chance). About 13.2% of the time, a guesser will score 13
or more correct. This would happen by chance often enough that most people would
not rule out the student being a guesser.
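The "score or better" probabilities used in questions 2 through 4 can be reproduced directly from the student counts in Table 6.2. The sketch below is one way to do it. The helper names (`prob_at_least`, `smallest_convincing_score`) and the cutoff passed to the second helper are our own illustrative choices, not part of the example.

```python
# Number of students (out of 500) with 0, 1, ..., 20 correct responses,
# copied from Table 6.2.
COUNTS = [0, 0, 1, 1, 2, 8, 18, 37, 58, 81, 88,
          79, 61, 39, 18, 7, 1, 1, 0, 0, 0]
TOTAL = sum(COUNTS)  # 500 students


def prob_at_least(score):
    """Estimated probability that a guesser gets `score` or more correct."""
    return sum(COUNTS[score:]) / TOTAL


def smallest_convincing_score(cutoff):
    """Lowest score whose 'this score or better' probability is <= cutoff.

    The cutoff is a judgment call made by the decision maker.
    """
    for score in range(len(COUNTS)):
        if prob_at_least(score) <= cutoff:
            return score
    return None


for score in range(20, 12, -1):
    print(f"{score} or better: {prob_at_least(score):.3f}")

print(smallest_convincing_score(0.06))  # 14, since .054 <= .06 but .132 > .06
```

Changing the cutoff changes the decision rule: a stricter cutoff such as .01 would require a score of 16 or better, because the estimated probability of 15 or better (.018) exceeds .01.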
Examples 6.5 and 6.6 show how probability information can be used to make a
decision. This is a primary goal of statistical inference. Later chapters look more formally at the problem of drawing a conclusion based on available but often incomplete
information and then assessing the reliability of such a conclusion.
EXERCISES 6.15–6.18
6.15 Is ultrasound a reliable method for determining
the gender of an unborn baby? Consider the following
data on 1000 births, which are consistent with summary
values that appeared in the online Journal of Statistics
Education (“New Approaches to Learning Probability
in the First Statistics Course” [2001]):
                          Ultrasound          Ultrasound
                          Predicted Female    Predicted Male
Actual Gender Is Female        432                 48
Actual Gender Is Male          130                390
Do you think that a prediction that a baby is male and a
prediction that a baby is female are equally reliable? Explain, using the information in the table to calculate estimates of any probabilities that are relevant to your
conclusion.
6.16 Researchers at UCLA were interested in whether
working mothers were more likely to suffer workplace
injuries than women without children. They studied
1400 working women, and a summary of their findings
was reported in the San Luis Obispo Telegram-Tribune
(February 28, 1995). The information in the following
table is consistent with summary values reported in the
article:
                                  No          Children    Children, but
                                  Children    Under 6     None Under 6     Total
Injured on the Job in 1989           32          68            56            156
Not Injured on the Job in 1989      368         232           644           1244
Total                               400         300           700           1400
The researchers drew the following conclusion: Women
with children younger than age 6 are much more likely
to be injured on the job than childless women or mothers with older children. Provide a justification for the
researchers’ conclusion. Use the information in the table
to calculate estimates of any probabilities that are relevant to your justification.
6.17 A Gallup Poll conducted in November 2002 examined how people perceived the risks associated with
smoking. The following table summarizes data on smoking status and perceived risk of smoking that is consistent
with summary quantities published by Gallup:
                                   Perceived Risk
                  Very       Somewhat    Not Too    Not at All
Smoking Status    Harmful    Harmful     Harmful    Harmful
Current Smoker      60          30          5           1
Former Smoker       78          16          3           2
Never Smoked        86          10          2           1
Assume that it is reasonable to consider these data representative of the U.S. adult population. Consider the following conclusion: Current smokers are less likely to
view smoking as very harmful than either former smokers or those who have never smoked. Provide a justification for this conclusion. Use the information in the table
to calculate estimates of any probabilities that are relevant to your justification.
6.18 Students at a particular university use an online
registration system to select their courses for the next term.
There are four different priority groups, with students in
Group 1 registering first, followed by those in Group 2,
and so on. Suppose that the university provided the
accompanying information on registration for the fall semester. The entries in the table represent the proportion
of students falling into each of the 20 priority–unit
combinations.

            Number of Units Secured During First Attempt to Register
Priority                                              More
Group       0–3      4–6      7–9      10–12      Than 12
1           .01      .01      .06       .10         .07
2           .02      .03      .06       .09         .05
3           .04      .06      .06       .06         .03
4           .04      .08      .07       .05         .01

a. What proportion of students at this university got
10 or more units during the first attempt to
register?
b. Suppose that a student reports receiving 11 units
during the first attempt to register. Is it more likely
that he or she is in the first or the fourth priority
group?
c. If you are in the third priority group next term, is it
likely that you will get more than 9 units during the
first attempt to register? Explain.

6.3 Estimating Probabilities Empirically and by Using Simulation
In the examples presented so far, reaching conclusions required knowledge of the
probabilities of various outcomes. In some cases, this is reasonable, and we know the
true long-run proportion of the time that each outcome will occur. In other situations, these probabilities are not known and must be determined. Sometimes probabilities can be determined analytically, by using mathematical rules and probability
properties, including the basic ones introduced in this chapter.
In this section, we change gears a bit and focus on an empirical approach to probability. When an analytical approach is impossible, impractical, or just beyond the
limited probability tools of the introductory course, we can estimate probabilities
empirically through observation or by using simulation.
Estimating Probabilities Empirically
It is fairly common practice to use observed long-run proportions to estimate probabilities. The process used to estimate probabilities is simple:
1. Observe a large number of chance outcomes under controlled circumstances.
2. By appealing to the interpretation of probability as a long-run relative frequency,
estimate the probability of an outcome by using the observed proportion of
occurrence.
This process is illustrated in Examples 6.7 and 6.8.
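As a minimal sketch of the two-step process, the Python code below computes an observed proportion from a list of recorded outcomes. The function name `estimate_probability` is our own, and the simulated die rolls are only a stand-in for real observations (an assumption made so that the sketch is self-contained and the estimate can be checked against a known probability).

```python
import random


def estimate_probability(observations, event):
    """Observed proportion of observations for which event(obs) is True."""
    observations = list(observations)
    hits = sum(1 for obs in observations if event(obs))
    return hits / len(observations)


# Step 1: "observe" a large number of chance outcomes.
# Here the observations are simulated rolls of a fair six-sided die.
rng = random.Random(2010)  # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(10_000)]

# Step 2: use the observed proportion as the estimate.
estimate = estimate_probability(rolls, lambda roll: roll >= 5)
print(f"estimated P(roll is 5 or 6) = {estimate:.3f} (true value is 1/3)")
```

With 10,000 observations the estimate should fall close to 1/3. With only a handful of observations it could be far off, which is why the process calls for a large number of outcomes.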
EXAMPLE 6.7 Fair Hiring Practices
The biology department at a university plans to recruit a new faculty member and
intends to advertise for someone with a Ph.D. in biology and at least 10 years of
college-level teaching experience. A member of the department expresses the belief
that requiring at least 10 years of teaching experience will exclude most potential applicants and will exclude far more female applicants than male applicants. The biology department would like to determine the probability that someone with a Ph.D.
in biology who is looking for an academic position would be eliminated from consideration because of the experience requirement.
A similar university just completed a search in which there was no requirement
for prior teaching experience but the information about prior teaching experience was
recorded. The 410 applications yielded the following data:
                          NUMBER OF APPLICANTS
          Less Than 10 Years    10 Years of
          of Experience         Experience or More    Total
Male           178                   112                290
Female          99                    21                120
Total          277                   133                410
Let’s assume that the populations of applicants for the two positions can be regarded as
the same. We will use the available information to approximate the probability that an
applicant will fall into each of the four gender–experience combinations. The estimated
probabilities (obtained by dividing the number of applicants for each gender–experience
combination by 410) are given in Table 6.3. From Table 6.3, we calculate
estimate of P(candidate excluded) = .4341 + .2415 = .6756
We can also assess the impact of the experience requirement separately for male applicants and for female applicants. From the given information, we calculate
that the proportion of male applicants who have less than 10 years of experience is
178/290 = .6138, whereas the corresponding proportion for females is 99/120 =
.8250. Therefore, approximately 61% of the male applicants would be eliminated by
the experience requirement, and about 83% of the female applicants would be
eliminated.
TABLE 6.3 Estimated Probabilities for Example 6.7

          Less Than 10 Years    10 Years of
          of Experience         Experience or More
Male          .4341                 .2732
Female        .2415                 .0512
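The calculations in this example can be checked with a short Python sketch that starts from the applicant counts. The dictionary layout, the category labels, and the function name `excluded_given` are our own choices for illustration.

```python
# Applicant counts from Example 6.7: (gender, experience) -> applicants.
COUNTS = {
    ("Male", "<10 years"): 178,
    ("Male", "10+ years"): 112,
    ("Female", "<10 years"): 99,
    ("Female", "10+ years"): 21,
}
TOTAL = sum(COUNTS.values())  # 410 applications

# Estimated probability of each gender-experience combination (Table 6.3).
joint = {cell: n / TOTAL for cell, n in COUNTS.items()}

# Estimated probability that a candidate is excluded (less than 10 years).
p_excluded = sum(
    n for (_, experience), n in COUNTS.items() if experience == "<10 years"
) / TOTAL


def excluded_given(gender):
    """Proportion of applicants of one gender with less than 10 years."""
    n_gender = sum(n for (g, _), n in COUNTS.items() if g == gender)
    return COUNTS[(gender, "<10 years")] / n_gender


print(f"P(excluded)          = {p_excluded:.4f}")                # 0.6756
print(f"P(excluded | male)   = {excluded_given('Male'):.4f}")    # 0.6138
print(f"P(excluded | female) = {excluded_given('Female'):.4f}")  # 0.8250
```

The first quantity divides by all 410 applicants, whereas the other two divide only by the number of applicants of the given gender (290 males, 120 females). That change of denominator is what distinguishes the subgroup proportions from the overall estimate.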
These subgroup proportions—.6138 for males and .8250 for females—are examples of conditional probabilities. As discussed in Section 6.1, outcomes are dependent if the occurrence of one outcome changes our assessment of the probability that
the other outcome will occur. A conditional probability shows how the original probability changes in light of new information. In this example, the probability that a