ACTIVITY 11.2: Thinking About Data Collection
Chapter 11 Comparing Two Populations or Treatments
3. Which of the two proposed designs would you recommend, and why?
4. If assigned to do so by your instructor, carry out one
of your experiments and analyze the resulting data.
Write a brief report that describes the experimental
design, includes both graphical and numerical summaries of the resulting data, and communicates the
conclusions that follow from your data analysis.
ACTIVITY 11.3: A Meaningful Paragraph
Write a meaningful paragraph that includes the following six terms: paired samples, significantly different,
P-value, sample, population, alternative hypothesis.
A “meaningful paragraph” is a coherent piece of
writing in an appropriate context that uses all of the
listed words. The paragraph should show that you understand the meaning of the terms and their relationship
to one another. A sequence of sentences that just define
the terms is not a meaningful paragraph. When choosing
a context, think carefully about the terms you need to
use. Choosing a good context will make writing a meaningful paragraph easier.
Summary of Key Concepts and Formulas
TERM OR FORMULA
COMMENT
Independent samples
Two samples where the individuals or objects in the first
sample are selected independently from those in the second
sample.
Paired samples
Two samples for which each observation in one sample is
paired in a meaningful way with a particular observation in a
second sample.
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
The test statistic for testing $H_0: \mu_1 - \mu_2 = \text{hypothesized value}$ when the samples are independently selected and the sample sizes are large or it is reasonable to assume that both population distributions are normal.
$(\bar{x}_1 - \bar{x}_2) \pm (t \text{ critical value})\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
A formula for constructing a confidence interval for $\mu_1 - \mu_2$ when the samples are independently selected and the sample sizes are large or it is reasonable to assume that the population distributions are normal.
$\text{df} = \dfrac{(V_1 + V_2)^2}{\dfrac{V_1^2}{n_1 - 1} + \dfrac{V_2^2}{n_2 - 1}}$, where $V_1 = \dfrac{s_1^2}{n_1}$ and $V_2 = \dfrac{s_2^2}{n_2}$
The formula for determining df for the two-sample t test and confidence interval.
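As a check on the arithmetic, the two-sample t statistic and its df can be computed directly from summary statistics. This is a minimal sketch using only the Python standard library; `two_sample_t` is a hypothetical helper name, and it returns the unrounded df (when a t table is used, df is commonly rounded down to a whole number). The example call plugs in the male and female summary values from Exercise 11.62.

```python
import math

def two_sample_t(x1, s1, n1, x2, s2, n2, hypothesized=0.0):
    """Return (t, df) for the two-sample t test from summary statistics.

    x1, x2: sample means; s1, s2: sample standard deviations; n1, n2: sample sizes.
    """
    v1 = s1 ** 2 / n1  # V1 = s1^2 / n1
    v2 = s2 ** 2 / n2  # V2 = s2^2 / n2
    t = ((x1 - x2) - hypothesized) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary values for males (sample 1) and females (sample 2) from Exercise 11.62
t, df = two_sample_t(3.42, 1.49, 203, 2.42, 1.35, 224)
print(f"t = {t:.2f}, df = {df:.1f}")
```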
$\bar{x}_d$
The sample mean difference.
$s_d$
The standard deviation of the sample differences.
$\mu_d$
The mean value for the population of differences.
$\sigma_d$
The standard deviation for the population of differences.
$t = \dfrac{\bar{x}_d - \text{hypothesized value}}{s_d / \sqrt{n}}$
The paired t test statistic for testing $H_0: \mu_d = \text{hypothesized value}$.
$\bar{x}_d \pm (t \text{ critical value})\dfrac{s_d}{\sqrt{n}}$
The paired t confidence interval formula.
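The paired t statistic depends only on the n sample differences. A minimal sketch with the standard `statistics` module follows; `paired_t` is a hypothetical helper, and the differences in the example call are made-up numbers chosen so the result is easy to verify by hand.

```python
import math
import statistics

def paired_t(differences, hypothesized=0.0):
    """Return (t, df) for the paired t test, given the n sample differences."""
    n = len(differences)
    xbar_d = statistics.mean(differences)  # sample mean difference
    s_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)
    t = (xbar_d - hypothesized) / (s_d / math.sqrt(n))
    return t, n - 1

# Illustrative differences only (not data from the text)
t, df = paired_t([1, 2, 3, 4, 5])
print(f"t = {t:.4f}, df = {df}")
```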
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
$\hat{p}_c = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
$\hat{p}_c$ is the statistic for estimating the common population proportion when $p_1 = p_2$.
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_1} + \dfrac{\hat{p}_c(1 - \hat{p}_c)}{n_2}}}$
The test statistic for testing $H_0: p_1 - p_2 = 0$ when the samples are independently selected and both sample sizes are large.
$(\hat{p}_1 - \hat{p}_2) \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
A formula for constructing a confidence interval for $p_1 - p_2$ when both sample sizes are large.
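The two large-sample proportion procedures differ only in the standard error: the test uses the pooled estimate, while the interval uses each sample proportion separately. This sketch assumes only the standard library (`statistics.NormalDist` supplies the z critical value); the helper names are hypothetical, and the counts in the example (30 of 100 versus 20 of 100) are invented for illustration.

```python
import math
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled estimate p_c."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)  # same as (n1*p1 + n2*p2) / (n1 + n2)
    se = math.sqrt(pc * (1 - pc) / n1 + pc * (1 - pc) / n2)
    return (p1 - p2) / se

def two_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical counts: 30 successes out of 100 versus 20 out of 100
z = two_prop_z(30, 100, 20, 100)
lo, hi = two_prop_ci(30, 100, 20, 100)
print(f"z = {z:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```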
Chapter Review Exercises 11.61 - 11.82
11.61 Do faculty and students have similar perceptions of
what types of behavior are inappropriate in the classroom?
This question was examined by the author of the article
“Faculty and Student Perceptions of Classroom Etiquette” (Journal of College Student Development [1998]:
515–516). Each individual in a random sample of 173 students in general education classes at a large public university was asked to judge various behaviors on a scale from 1
(totally inappropriate) to 5 (totally appropriate). Individuals in a random sample of 98 faculty members also rated
the same behaviors. The mean ratings for three of the behaviors studied are shown here (the means are consistent
with data provided by the author of the article). The sample standard deviations were not given, but for purposes of
this exercise, assume that they are all equal to 1.0.
Student Behavior                        Student Mean Rating   Faculty Mean Rating
Wearing hats in the classroom           2.80                  3.63
Addressing instructor by first name     2.90                  2.11
Talking on a cell phone                 1.11                  1.10
a. Is there sufficient evidence to conclude that the
mean “appropriateness” score assigned to wearing a
hat in class differs for students and faculty?
b. Is there sufficient evidence to conclude that the
mean “appropriateness” score assigned to addressing
an instructor by his or her first name is higher for
students than for faculty?
c. Is there sufficient evidence to conclude that the
mean “appropriateness” score assigned to talking on
a cell phone differs for students and faculty? Does
the result of your test imply that students and faculty
consider it acceptable to talk on a cell phone during
class?
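A sketch of the three two-sample t statistics, assuming (as the exercise instructs) that every sample standard deviation equals 1.0. Because the sample sizes and standard deviations are the same for all three behaviors, the standard error is computed once; each t value would then be compared with a critical value or converted to a P-value, remembering that part (b) calls for a one-sided alternative.

```python
import math

# Summary values from Exercise 11.61; s = 1.0 for every group, as the exercise instructs
n_students, n_faculty, s = 173, 98, 1.0
behaviors = {
    "Wearing hats in the classroom": (2.80, 3.63),
    "Addressing instructor by first name": (2.90, 2.11),
    "Talking on a cell phone": (1.11, 1.10),
}

se = math.sqrt(s**2 / n_students + s**2 / n_faculty)  # identical for all three behaviors
t_values = {}
for behavior, (student_mean, faculty_mean) in behaviors.items():
    t_values[behavior] = (student_mean - faculty_mean) / se  # students minus faculty
    print(f"{behavior}: t = {t_values[behavior]:.2f}")
```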
11.62 Are girls less inclined to enroll in science courses
than boys? One study (“Intentions of Young Students
to Enroll in Science Courses in the Future: An Examination of Gender Differences,” Science Education
[1999]: 55–76) asked randomly selected fourth-, fifth-,
and sixth-graders how many science courses they intend
to take. The following data were obtained:
          n     Mean   Standard Deviation
Males     203   3.42   1.49
Females   224   2.42   1.35
Calculate a 99% confidence interval for the difference
between males and females in mean number of science
courses planned. Interpret your interval. Based on your
interval, how would you answer the question posed at
the beginning of the exercise?
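Python's standard library has no t distribution, so this sketch approximates the t CDF by Simpson's rule and finds the critical value by bisection. The helpers `t_pdf`, `t_cdf`, and `t_critical` are hypothetical stand-ins for a t table (or a library routine such as SciPy's `scipy.stats.t.ppf`), and df from the two-sample formula is rounded down before use.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value: the point with area (1 + confidence)/2 to its left."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Exercise 11.62 summary values (males minus females)
x1, s1, n1 = 3.42, 1.49, 203
x2, s2, n2 = 2.42, 1.35, 224
v1, v2 = s1**2 / n1, s2**2 / n2
df = math.floor((v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1)))  # round df down
margin = t_critical(0.99, df) * math.sqrt(v1 + v2)
lower, upper = (x1 - x2) - margin, (x1 - x2) + margin
print(f"df = {df}, 99% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```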
11.63 A deficiency of the trace element selenium in
the diet can negatively impact growth, immunity, muscle
and neuromuscular function, and fertility. The introduction of selenium supplements to dairy cows is justified
when pastures have low selenium levels. Authors of the
paper “Effects of Short-Term Supplementation with
Selenised Yeast on Milk Production and Composition
of Lactating Cows” (Australian Journal of Dairy Technology [2004]: 199–203) supplied the following data
on milk selenium concentration (mg/L) for a sample of
cows given a selenium supplement (the treatment group)
and a control sample given no supplement, both initially
and after a 9-day period.
Initial Measurement        After 9 Days
Treatment    Control       Treatment    Control
11.4         9.1           138.3        9.3
9.6          8.7           104.0        8.8
10.1         9.7           96.4         8.8
8.5          10.8          89.0         10.1
10.3         10.9          88.0         9.6
10.6         10.6          103.8        8.6
11.8         10.1          147.3        10.4
9.8          12.3          97.1         12.4
10.9         8.8           172.6        9.3
10.3         10.4          146.3        9.5
10.2         10.9          99.0         8.4
11.4         10.4          122.3        8.7
9.2          11.6          103.0        12.5
10.6         10.9          117.8        9.1
10.8                       121.5
8.2                        93.0
a. Use the given data for the treatment group to determine if there is sufficient evidence to conclude that
the mean selenium concentration is greater after 9
days of the selenium supplement.
b. Are the data for the cows in the control group (no
selenium supplement) consistent with the hypothesis of no significant change in mean selenium concentration over the 9-day period?
c. Would you use the paired t test to determine if there
was a significant difference in the initial mean selenium concentration for the control group and the
treatment group? Explain why or why not.
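The treatment-group question in part (a) pairs each cow's initial and 9-day measurements, so a paired t statistic on the after-minus-initial differences applies; the same computation for the control group bears on part (b). This is a sketch under the assumption that the table rows list each cow's initial and 9-day values side by side (16 treatment cows, 14 control cows); `paired_t` is a hypothetical helper.

```python
import math
import statistics

# Selenium data from Exercise 11.63 (mg/L), one entry per cow
treatment_initial = [11.4, 9.6, 10.1, 8.5, 10.3, 10.6, 11.8, 9.8,
                     10.9, 10.3, 10.2, 11.4, 9.2, 10.6, 10.8, 8.2]
treatment_after = [138.3, 104.0, 96.4, 89.0, 88.0, 103.8, 147.3, 97.1,
                   172.6, 146.3, 99.0, 122.3, 103.0, 117.8, 121.5, 93.0]
control_initial = [9.1, 8.7, 9.7, 10.8, 10.9, 10.6, 10.1,
                   12.3, 8.8, 10.4, 10.9, 10.4, 11.6, 10.9]
control_after = [9.3, 8.8, 8.8, 10.1, 9.6, 8.6, 10.4,
                 12.4, 9.3, 9.5, 8.4, 8.7, 12.5, 9.1]

def paired_t(before, after):
    """Paired t statistic for H0: mu_d = 0, with d = after - before; returns (t, df)."""
    d = [a - b for b, a in zip(before, after)]
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1

t_trt, df_trt = paired_t(treatment_initial, treatment_after)
t_ctl, df_ctl = paired_t(control_initial, control_after)
print(f"treatment: t = {t_trt:.2f}, df = {df_trt}")
print(f"control:   t = {t_ctl:.2f}, df = {df_ctl}")
```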
11.64 The Oregon Department of Health web site
provides information on the cost-to-charge ratio (the
percentage of billed charges that are actual costs to the
hospital). The cost-to-charge ratios for both inpatient
and outpatient care in 2002 for a sample of six hospitals
in Oregon follow.
Hospital   2002 Inpatient Ratio   2002 Outpatient Ratio
1          68                     54
2          100                    75
3          71                     53
4          74                     56
5          100                    74
6          83                     71
Is there evidence that the mean cost-to-charge ratio for
Oregon hospitals is lower for outpatient care than for
inpatient care? Use a significance level of .05.
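Because both ratios come from the same hospital, the data are paired. A minimal sketch of the paired t statistic, taking differences as inpatient minus outpatient so that "lower for outpatient care" corresponds to the alternative $\mu_d > 0$:

```python
import math
import statistics

inpatient = [68, 100, 71, 74, 100, 83]
outpatient = [54, 75, 53, 56, 74, 71]
d = [i - o for i, o in zip(inpatient, outpatient)]  # inpatient minus outpatient
t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
df = len(d) - 1
print(f"mean difference = {statistics.mean(d):.2f}, t = {t:.2f}, df = {df}")
```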
11.65 The article “A ‘White’ Name Found to Help in
Job Search” (Associated Press, January 15, 2003) described an experiment to investigate if it helps to have a
“white-sounding” first name when looking for a job.
Researchers sent 5000 resumes in response to ads that
appeared in the Boston Globe and Chicago Tribune. The
resumes were identical except that 2500 of them had
“white-sounding” first names, such as Brett and Emily,
whereas the other 2500 had “black-sounding” names
such as Tamika and Rasheed. Resumes of the first type
elicited 250 responses and resumes of the second type
only 167 responses. Do these data support the theory
that the proportion receiving responses is higher for
those resumes with “white-sounding” first names?
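The exercise compares two independent proportions with a one-sided alternative, so the pooled z statistic and an upper-tail P-value apply. A sketch using `statistics.NormalDist` for the normal tail area; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x1, n1 = 250, 2500  # responses to resumes with "white-sounding" names
x2, n2 = 167, 2500  # responses to resumes with "black-sounding" names
p1, p2 = x1 / n1, x2 / n2
pc = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
z = (p1 - p2) / math.sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
p_value = 1 - NormalDist().cdf(z)  # upper-tailed: Ha is p1 - p2 > 0
print(f"z = {z:.2f}, P-value = {p_value:.6f}")
```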
11.66 In a study of a proposed approach for diabetes
prevention, 339 people under the age of 20 who were
thought to be at high risk of developing type I diabetes
were assigned at random to two groups. One group received twice-daily injections of a low dose of insulin. The
other group (the control) did not receive any insulin, but
was closely monitored. Summary data (from the article
“Diabetes Theory Fails Test,” USA Today, June 25,
2001) follow.
Group     n     Number Developing Diabetes
Insulin   169   25
Control   170   24
a. Use the given data to construct a 90% confidence
interval for the difference in the proportion that
develop diabetes for the control group and the insulin group.
b. Give an interpretation of the confidence interval and
the associated confidence level.
c. Based on your interval from Part (a), write a few
sentences commenting on the effectiveness of the
proposed prevention treatment.
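A sketch of the large-sample interval for part (a), with the difference taken as control minus insulin, as the exercise specifies. The z critical value for 90% confidence is the 95th percentile of the standard normal distribution (about 1.645).

```python
import math
from statistics import NormalDist

x_control, n_control = 24, 170
x_insulin, n_insulin = 25, 169
p_ctl, p_ins = x_control / n_control, x_insulin / n_insulin
se = math.sqrt(p_ctl * (1 - p_ctl) / n_control + p_ins * (1 - p_ins) / n_insulin)
z_crit = NormalDist().inv_cdf(0.95)  # 90% two-sided leaves 5% in each tail
diff = p_ctl - p_ins
lower, upper = diff - z_crit * se, diff + z_crit * se
print(f"90% CI for p_control - p_insulin: ({lower:.3f}, {upper:.3f})")
```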
11.67 When a surgeon repairs injuries, sutures (stitched
knots) are used to hold together and stabilize the injured
area. If these knots elongate and loosen through use, the
injury may not heal properly because the tissues would
not be optimally positioned. Researchers at the University of California, San Francisco, tied a series of different
types of knots with two types of suture material, Maxon
and Ticron. Suppose that 112 tissue specimens were
available and that for each specimen the type of knot and
suture material were randomly assigned. The investigators tested the knots to see how much the loops elongated; the elongations (in mm) were measured and the
resulting data are summarized here. For purposes of this
exercise, assume it is reasonable to regard the elongation
distributions as approximately normal.
Maxon
Type of knot       n    x̄      sd
Square (control)   10   10.0   .1
Duncan Loop        15   11.0   .3
Overhand           15   11.0   .9
Roeder             10   13.5   .1
Snyder             10   13.5   2.0

Ticron
Type of knot       n    x̄      sd
Square (control)   10   2.5    .06
Duncan Loop        11   10.9   .40
Overhand           11   8.1    1.00
Roeder             10   5.0    .04
Snyder             10   8.1    .06
a. Is there a significant difference in mean elongation
between the square knot and the Duncan loop for
Maxon thread?
b. Is there a significant difference in mean elongation
between the square knot and the Duncan loop for
Ticron thread?
c. For the Duncan loop knot, is there a significant difference in mean elongation between the Maxon and
Ticron threads?
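All three parts compare two independently sampled groups, so each calls for a two-sample t statistic with the df formula from the summary table. A sketch assuming, as the exercise allows, approximately normal elongation distributions; `welch_t` is a hypothetical helper name.

```python
import math

def welch_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic (hypothesized difference 0) and its df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# (mean, sd, n) from the Exercise 11.67 tables
maxon_square, maxon_duncan = (10.0, 0.1, 10), (11.0, 0.3, 15)
ticron_square, ticron_duncan = (2.5, 0.06, 10), (10.9, 0.40, 11)

results = {
    "a: Maxon, square vs Duncan": welch_t(*maxon_square, *maxon_duncan),
    "b: Ticron, square vs Duncan": welch_t(*ticron_square, *ticron_duncan),
    "c: Duncan, Maxon vs Ticron": welch_t(*maxon_duncan, *ticron_duncan),
}
for label, (t, df) in results.items():
    print(f"{label}: t = {t:.2f}, df = {df:.1f}")
```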
11.68 The article “Trial Lawyers and Testosterone:
Blue-Collar Talent in a White-Collar World” (Journal
of Applied Social Psychology [1998]: 84–94) compared
trial lawyers and nontrial lawyers on the basis of mean
testosterone level. Random samples of 35 male trial lawyers, 31 male nontrial lawyers, 13 female trial lawyers,
and 18 female nontrial lawyers were selected for study.
The article includes the following statement: “Trial lawyers had higher testosterone levels than did nontrial
lawyers. This was true for men, t(64) = 3.75, p < .001,
and for women, t(29) = 2.26, p < .05.”
a. Based on the information given, is the mean testosterone level for male trial lawyers significantly higher
than for male nontrial lawyers?
b. Based on the information given, is the mean testosterone level for female trial lawyers significantly
higher than for female nontrial lawyers?
c. Do you have enough information to carry out a test
to determine whether there is a significant difference
in the mean testosterone levels of male and female
trial lawyers? If so, carry out such a test. If not, what
additional information would you need to be able to
conduct the test?
11.69 In a study of memory recall, eight students from
a large psychology class were selected at random and
given 10 minutes to memorize a list of 20 nonsense
words. Each was asked to list as many of the words as he
or she could remember both 1 hour and 24 hours later.
The data are as shown in the accompanying table. Is
there evidence to suggest that the mean number of words
recalled after 1 hour exceeds the mean recall after 24
hours by more than 3? Use a level .01 test.
Subject          1    2    3    4    5    6    7    8
1 hour later     14   12   18   7    11   9    16   15
24 hours later   10   4    14   6    9    6    12   12
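Here the null value is not zero: the claim is that 1-hour recall exceeds 24-hour recall by more than 3 words, so the test is $H_0: \mu_d = 3$ versus $H_a: \mu_d > 3$ with d = (1 hour) − (24 hours). A minimal sketch of the paired t statistic:

```python
import math
import statistics

one_hour = [14, 12, 18, 7, 11, 9, 16, 15]
day_later = [10, 4, 14, 6, 9, 6, 12, 12]
d = [a - b for a, b in zip(one_hour, day_later)]  # 1 hour minus 24 hours
hypothesized = 3  # H0: mu_d = 3 versus Ha: mu_d > 3
t = (statistics.mean(d) - hypothesized) / (statistics.stdev(d) / math.sqrt(len(d)))
print(f"mean difference = {statistics.mean(d)}, t = {t:.3f}, df = {len(d) - 1}")
```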
11.70 As part of a study to determine the effects of allowing the use of credit cards for alcohol purchases in
Canada (see “Changes in Alcohol Consumption Patterns Following the Introduction of Credit Cards in
Ontario Liquor Stores,” Journal of Studies on Alcohol
[1999]: 378–382), randomly selected individuals were
given a questionnaire asking them (among other things)
how many drinks they had consumed during the previous week. A year later (after liquor stores started accepting credit cards for purchases), these same individuals
were again asked how many drinks they had consumed
in the previous week. The data shown are consistent with
summary statistics presented in the article.
                           n     1994 Mean   1995 Mean   x̄d    sd
Credit-Card Shoppers       96    6.72        6.34        .38   5.52
Non-Credit-Card Shoppers   850   4.09        3.97        .12   4.58
a. The standard deviations of the differences were quite
large. Explain how this could be the case.
b. Calculate a 95% confidence interval for the mean
difference in drink consumption for credit-card
shoppers between 1994 and 1995. Is there evidence
that the mean number of drinks decreased?
c. Test the hypothesis that there was no change in the
mean number of drinks between 1994 and 1995 for
the non-credit-card shoppers. Be sure to calculate
and interpret the P-value for this test.
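Both parts work from the summary values alone, with differences taken as 1994 minus 1995 (so a decrease in drinking corresponds to a positive mean difference). The standard library has no t distribution, so this sketch approximates the t CDF by Simpson's rule and the critical value by bisection; `t_pdf`, `t_cdf`, and `t_critical` are hypothetical stand-ins for a t table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value: the point with area (1 + confidence)/2 to its left."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Exercise 11.70 summary values; differences are 1994 minus 1995
n_cc, xbar_cc, sd_cc = 96, 0.38, 5.52    # credit-card shoppers
n_nc, xbar_nc, sd_nc = 850, 0.12, 4.58   # non-credit-card shoppers

# (b) 95% paired t interval for credit-card shoppers
margin = t_critical(0.95, n_cc - 1) * sd_cc / math.sqrt(n_cc)
ci = (xbar_cc - margin, xbar_cc + margin)

# (c) paired t test of H0: mu_d = 0 for non-credit-card shoppers (two-sided)
t_nc = xbar_nc / (sd_nc / math.sqrt(n_nc))
p_value = 2 * (1 - t_cdf(abs(t_nc), n_nc - 1))
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}); t = {t_nc:.2f}, P-value = {p_value:.3f}")
```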
11.71 Several methods of estimating the number of
seeds in soil samples have been developed by ecologists.
An article in the Journal of Ecology (“A Comparison
of Methods for Estimating Seed Numbers in the Soil”
[1990]: 1079–1093) considered three such methods. The
accompanying data give number of seeds detected by the
direct method and by the stratified method for 27 soil
specimens.
Specimen  Direct  Stratified    Specimen  Direct  Stratified
1         24      8             2         32      36
3         0       8             4         60      56
5         20      52            6         64      64
7         40      28            8         8       8
9         12      8             10        92      100
11        4       0             12        68      56
13        76      68            14        24      52
15        32      28            16        0       0
17        36      36            18        16      12
19        92      92            20        4       12
21        40      48            22        24      24
23        0       0             24        8       12
25        12      40            26        16      12
27        40      76
Do the data provide sufficient evidence to conclude that
the mean number of seeds detected differs for the two
methods? Test the relevant hypotheses using α = .05.
11.72 Are college students who take a freshman orientation course more or less likely to stay in college than those
who do not take such a course? The article “A Longitudinal Study of the Retention and Academic Performance
of Participants in Freshmen Orientation Courses”
(Journal of College Student Development [1994]: 444–
449) reported that 50 of 94 randomly selected students
who did not participate in an orientation course returned
for a second year. Of 94 randomly selected students who
did take the orientation course, 56 returned for a second
year. Construct a 95% confidence interval for p1 − p2,
the difference in the proportion returning for students
who do not take an orientation course and those who do.
Give an interpretation of this interval.
11.73 The article “Truth and DARE: Tracking Drug Education to Graduation” (Social Problems [1994]: 448–
456) compared the drug use of 288 randomly selected
high school seniors exposed to a drug education program
(DARE) and 335 randomly selected high school seniors
who were not exposed to such a program. Data for marijuana use are given in the accompanying table. Is there evidence that the proportion using marijuana is lower for
students exposed to the DARE program? Use α = .05.
                      n     Number Who Use Marijuana
Exposed to DARE       288   141
Not Exposed to DARE   335   181
11.74 The article “Softball Sliding Injuries” (American
Journal of Diseases of Children [1988]: 715–716) provided
a comparison of breakaway bases (designed to reduce injuries) and stationary bases. Consider the accompanying data
(which agree with summary values given in the paper).
                   Number of      Number of Games Where a Player
                   Games Played   Suffered a Sliding Injury
Stationary Bases   1250           90
Breakaway Bases    1250           20
Is the proportion of games with a player suffering a sliding injury significantly lower for games using breakaway
bases? Answer by performing a level .01 test. What did
you have to assume in order for your conclusion to be
valid? Do you think it is likely that this assumption was
satisfied in this study?
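The question asks whether the injury proportion is lower with breakaway bases, so the pooled z test is lower-tailed when the difference is taken as breakaway minus stationary. A sketch using `statistics.NormalDist`; it treats the games as independent random samples, which is exactly the assumption the exercise asks you to examine.

```python
import math
from statistics import NormalDist

x_stat, n_stat = 90, 1250  # stationary bases: games with a sliding injury, games played
x_brk, n_brk = 20, 1250    # breakaway bases
p_stat, p_brk = x_stat / n_stat, x_brk / n_brk
pc = (x_stat + x_brk) / (n_stat + n_brk)  # pooled estimate
z = (p_brk - p_stat) / math.sqrt(pc * (1 - pc) * (1 / n_stat + 1 / n_brk))
p_value = NormalDist().cdf(z)  # lower-tailed: Ha is p_breakaway - p_stationary < 0
print(f"z = {z:.2f}, P-value = {p_value:.2e}")
```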
11.75 The positive effect of water fluoridation on dental
health is well documented. One study that validates this
is described in the article “Impact of Water Fluoridation
on Children’s Dental Health: A Controlled Study of
Two Pennsylvania Communities” (American Statistical Association Proceedings of the Social Statistics
Section [1981]: 262–265). Two communities were compared. One had adopted fluoridation in 1966, whereas
the other had no such program. Of 143 randomly selected children from the town without fluoridated water,
106 had decayed teeth, and 67 of 119 randomly selected
children from the town with fluoridated water had decayed teeth. Let p1 denote the proportion of all children in
the community with fluoridated water who have decayed
teeth, and let p2 denote the analogous proportion for
children in the community with unfluoridated water.
Estimate p1 − p2 using a 90% confidence interval. Does
the interval contain 0? Interpret the interval.
11.76 Wayne Gretzky was one of ice hockey’s most prolific scorers when he played for the Edmonton Oilers.
During his last season with the Oilers, Gretzky played in
41 games and missed 17 games due to injury. The article
“The Great Gretzky” (Chance [1991]: 16–21) looked at
the number of goals scored by the Oilers in games with
and without Gretzky, as shown in the accompanying
table. If we view the 41 games with Gretzky as a random
sample of all Oiler games in which Gretzky played and
the 17 games without Gretzky as a random sample of all
Oiler games in which Gretzky did not play, is there evidence that the mean number of goals scored by the Oilers is higher for games in which Gretzky played? Use
α = .01.
                        n    Sample Mean   Sample sd
Games with Gretzky      41   4.73          1.29
Games without Gretzky   17   3.88          1.18
11.77 Here’s one to sink your teeth into: The authors of
the article “Analysis of Food Crushing Sounds During
Mastication: Total Sound Level Studies” (Journal of
Texture Studies [1990]: 165–178) studied the nature of
sounds generated during eating. Peak loudness (in decibels at 20 cm away) was measured for both open-mouth
and closed-mouth chewing of potato chips and of tortilla
chips. Forty subjects participated, with ten assigned at
random to each combination of conditions (such as
closed-mouth potato chip, and so on). We are not making this up! Summary values taken from plots given in
the article appear in the accompanying table. For purposes of this exercise, suppose that it is reasonable to regard the peak loudness distributions as approximately
normal.
                              n    x̄    s
Potato Chip     Open mouth    10   63   13
                Closed mouth  10   54   16
Tortilla Chip   Open mouth    10   60   15
                Closed mouth  10   53   16
a. Construct a 95% confidence interval for the difference in mean peak loudness between open-mouth
and closed-mouth chewing of potato chips. Interpret
the resulting interval.
b. For closed-mouth chewing (the recommended
method!), is there sufficient evidence to indicate that
there is a difference between potato chips and tortilla
chips with respect to mean peak loudness? Test the
relevant hypotheses using α = .01.
c. The means and standard deviations given here were
actually for stale chips. When ten measurements of
peak loudness were recorded for closed-mouth chewing of fresh tortilla chips, the resulting mean and
standard deviation were 56 and 14, respectively. Is
there sufficient evidence to conclude that fresh tortilla chips are louder than stale chips? Use α = .05.
11.78 Are very young infants more likely to imitate actions that are modeled by a person or simulated by an
object? This question was the basis of a research study
summarized in the article “The Role of Person and Object in Eliciting Early Imitation” (Journal of Experimental Child Psychology [1991]: 423–433). One action examined was mouth opening. This action was modeled
repeatedly by either a person or a doll, and the number
of times that the infant imitated the behavior was recorded. Twenty-seven infants participated, with 12 exposed to a human model and 15 exposed to the doll.
Summary values are given in the accompanying table. Is
there sufficient evidence to conclude that the mean number of imitations is higher for infants who watch a human model than for infants who watch a doll? Test the
relevant hypotheses using a .01 significance level.
               x̄      s
Person Model   5.14   1.60
Doll Model     3.46   1.30
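For Exercise 11.78, the two groups of infants are independent, so the two-sample t statistic and its df formula apply (the test is upper-tailed: person model minus doll model greater than 0). A minimal sketch from the summary values:

```python
import math

x1, s1, n1 = 5.14, 1.60, 12  # person model
x2, s2, n2 = 3.46, 1.30, 15  # doll model
v1, v2 = s1**2 / n1, s2**2 / n2
t = (x1 - x2) / math.sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(f"t = {t:.2f}, df = {df:.1f}")
```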
11.79 Dentists make many people nervous (even more
so than statisticians!). To see whether such nervousness
elevates blood pressure, the blood pressure and pulse
rates of 60 subjects were measured in a dental setting and
in a medical setting (“The Effect of the Dental Setting
on Blood Pressure Measurement,” American Journal
of Public Health [1983]: 1210–1214). For each subject, the
difference (dental-setting blood pressure minus medical-setting blood pressure) was calculated. The analogous
differences were also calculated for pulse rates. Summary
data follow.
                          Mean Difference   Standard Deviation of Differences
Systolic Blood Pressure   4.47              8.77
Pulse (beats/min)         −1.33             8.84
a. Do the data strongly suggest that true mean blood
pressure is higher in a dental setting than in a medical setting? Use a level .01 test.
b. Is there sufficient evidence to indicate that true mean
pulse rate in a dental setting differs from the true mean
pulse rate in a medical setting? Use a significance level
of .05.
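Because each subject was measured in both settings, these are paired data, and the paired t statistic can be computed from the mean and standard deviation of the differences alone (n = 60, df = 59). A minimal sketch:

```python
import math

n = 60
summaries = {  # (mean difference, sd of differences); dental minus medical
    "systolic blood pressure": (4.47, 8.77),
    "pulse (beats/min)": (-1.33, 8.84),
}
t_values = {}
for measure, (xbar_d, s_d) in summaries.items():
    t_values[measure] = xbar_d / (s_d / math.sqrt(n))
    print(f"{measure}: t = {t_values[measure]:.2f}, df = {n - 1}")
```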
11.80 Key terms in survey questions too often are not
well understood, and such ambiguity can affect responses. As an example, the article “How Unclear Terms
Affect Survey Data” (Public Opinion Quarterly [1992]:
218–231) described a survey in which each individual in
a sample was asked, “Do you exercise or play sports regularly?” But what constitutes exercise? The following revised question was then asked of each individual in the
same sample: “Do you do any sports or hobbies involving physical activities, or any exercise, including walking,
on a regular basis?” The resulting data are shown in the
accompanying table.
                   Yes   No
Initial Question   48    52
Revised Question   60    40
Is there any difference between the true proportions of
yes responses to these questions? Can a procedure from
this chapter be used to answer the question posed? If yes,
use it; if not, explain why not.
11.81 An electronic implant that stimulates the auditory
nerve has been used to restore partial hearing to a number of deaf people. In a study of implant acceptability
(Los Angeles Times, January 29, 1985), 250 adults born
deaf and 250 adults who went deaf after learning to
speak were followed for a period of time after receiving
an implant. Of those deaf from birth, 75 had removed
the implant, whereas only 25 of those who went deaf
after learning to speak had done so. Does this suggest
that the true proportion who remove the implants differs
for those who were born deaf and those who went deaf
after learning to speak? Test the relevant hypotheses using a .01 significance level.
11.82 Samples of both surface soil and subsoil were
taken from eight randomly selected agricultural locations
in a particular county. The soil samples were analyzed to
determine both surface pH and subsoil pH, with the results shown in the accompanying table.
Location     1     2     3     4     5     6     7     8
Surface pH   6.55  5.98  5.59  6.17  5.92  6.18  6.43  5.68
Subsoil pH   6.78  6.14  5.80  5.91  6.10  6.01  6.18  5.88
a. Compute a 90% confidence interval for the mean
difference between surface and subsoil pH for agricultural land in this county.
b. What assumptions are necessary for the interval in
Part (a) to be valid?
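For Exercise 11.82, surface and subsoil pH come from the same location, so the paired t interval applies with df = 7. The standard library has no t distribution, so this sketch approximates the t CDF by Simpson's rule and the critical value by bisection; `t_pdf`, `t_cdf`, and `t_critical` are hypothetical stand-ins for a t table, and differences are taken as surface minus subsoil.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-sided critical value: the point with area (1 + confidence)/2 to its left."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

surface = [6.55, 5.98, 5.59, 6.17, 5.92, 6.18, 6.43, 5.68]
subsoil = [6.78, 6.14, 5.80, 5.91, 6.10, 6.01, 6.18, 5.88]
d = [a - b for a, b in zip(surface, subsoil)]  # surface minus subsoil
n = len(d)
margin = t_critical(0.90, n - 1) * statistics.stdev(d) / math.sqrt(n)
lower, upper = statistics.mean(d) - margin, statistics.mean(d) + margin
print(f"90% CI for mu_d: ({lower:.3f}, {upper:.3f})")
```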
CHAPTER 12
The Analysis of Categorical Data and Goodness-of-Fit Tests
It is often the case that information is collected on categorical variables, such as political affiliation, gender, or college major. As with numerical data, categorical data sets
can be univariate (consisting of observations on a single
categorical variable), bivariate (observations on two categorical variables), or even multivariate. In this chapter, we
will first consider inferential methods for analyzing univariate categorical data sets and then turn to techniques
appropriate for use with bivariate categorical data.
12.1 Chi-Square Tests for Univariate Data
Univariate categorical data sets arise in a variety of settings. If each student in a
sample of 100 is classified according to whether he or she is enrolled full-time or part-time, data on a categorical variable with two categories result. Each airline passenger
in a sample of 50 might be classified into one of three categories based on type of
ticket—coach, business class, or first class. Each registered voter in a sample of
100 selected from those registered in a particular city might be asked which of the five
city council members he or she favors for mayor. This would yield observations on a
categorical variable with five categories.
Univariate categorical data are most conveniently summarized in a one-way frequency table. For example, the article “Fees Keeping American Taxpayers From
Using Credit Cards to Make Tax Payments” (IPSOS Insight, March 24, 2006) surveyed American taxpayers regarding their intent to pay taxes with a credit card. Suppose that 100 randomly selected taxpayers participated in such a survey, with possible
responses being definitely will use a credit card to pay taxes next year, probably will
use a credit card, probably won’t use a credit card, and definitely won’t use a credit
card. The first few observations might be
Probably will
Definitely will not
Probably will not
Probably will not
Definitely will
Definitely will not
Counting the number of observations of each type might then result in the following
one-way table:
Outcome     Definitely Will   Probably Will   Probably Will Not   Definitely Will Not
Frequency   14                12              24                  50
For a categorical variable with k possible values (k different levels or categories),
sample data are summarized in a one-way frequency table consisting of k cells, which
may be displayed either horizontally or vertically.
In this section, we consider testing hypotheses about the proportion of the population that falls into each of the possible categories. For example, the manager of a tax
preparation company might be interested in determining whether the four possible responses to the tax credit card question occur equally often. If this is indeed the case, the
long-run proportion of responses falling into each of the four categories is 1/4, or .25.
The test procedure to be presented shortly would allow the manager to decide whether
the hypothesis that all four category proportions are equal to .25 is plausible.
Notation
k = number of categories of a categorical variable
p1 = true proportion for Category 1
p2 = true proportion for Category 2
⋮
pk = true proportion for Category k
(Note: $p_1 + p_2 + \cdots + p_k = 1$)
The hypotheses to be tested have the form
H0: p1 = hypothesized proportion for Category 1
    p2 = hypothesized proportion for Category 2
    ⋮
    pk = hypothesized proportion for Category k
Ha: H0 is not true, so at least one of the true category proportions differs
from the corresponding hypothesized value.
For the example involving responses to the tax survey, let
p1 = the proportion of all taxpayers who will definitely pay by credit card
p2 = the proportion of all taxpayers who will probably pay by credit card
p3 = the proportion of all taxpayers who will probably not pay by credit card
and
p4 = the proportion of all taxpayers who will definitely not pay by credit card
The null hypothesis of interest is then
H0: p1 = .25, p2 = .25, p3 = .25, p4 = .25
A null hypothesis of the type just described can be tested by first selecting a random
sample of size n and then classifying each sample response into one of the k possible
categories. To decide whether the sample data are compatible with the null hypothesis,
we compare the observed cell counts (frequencies) to the cell counts that would have
been expected when the null hypothesis is true. The expected cell counts are
Expected cell count for Category 1 = np1
Expected cell count for Category 2 = np2
and so on. The expected cell counts when H0 is true result from substituting the corresponding hypothesized proportion for each pi.
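For the tax survey, the expected counts under H0 follow directly from n times each hypothesized proportion. A minimal sketch using the one-way table of 100 responses given earlier; the dictionary layout is just one convenient way to hold the counts.

```python
observed = {"Definitely will": 14, "Probably will": 12,
            "Probably will not": 24, "Definitely will not": 50}
hypothesized = {category: 0.25 for category in observed}  # H0: all four proportions equal
n = sum(observed.values())
expected = {c: n * p for c, p in hypothesized.items()}  # expected count = n * p_i
for c in observed:
    print(f"{c:<20} observed {observed[c]:>3}  expected {expected[c]:.1f}")
```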
EXAMPLE 12.1 Births and the Lunar Cycle
A common urban legend is that more babies than expected are born during certain
phases of the lunar cycle, especially near the full moon. The paper “The Effect of the
Lunar Cycle on Frequency of Births and Birth Complications” (American Journal
of Obstetrics and Gynecology [2005]: 1462–1464) classified births according to the
lunar cycle. Data for a sample of randomly selected births occurring during 24 lunar
Lunar Phase       Number of Days   Number of Births
New moon          24               7,680
Waxing crescent   152              48,442
First quarter     24               7,579
Waxing gibbous    149              47,814
Full moon         24               7,711
Waning gibbous    150              47,595
Last quarter      24               7,733
Waning crescent   152              48,230