ACTIVITY 12.2: Color and Perceived Taste
Chapter Review Exercises
TERM OR FORMULA
COMMENT
Two-way frequency table (contingency table)
A rectangular table used to summarize a categorical data set; two-way tables are used to compare several populations on the basis of a categorical variable or to determine if an association exists between two categorical variables.
χ² test for homogeneity
The hypothesis test performed to determine whether category proportions are the same for two or more populations or treatments.
χ² test for independence
The hypothesis test performed to determine whether an association exists between two categorical variables.
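Both χ² tests summarized above use the same test statistic: the sum over all cells of (observed − expected)² / expected, where the expected count for a cell is (row total)(column total) / (grand total), with (rows − 1)(columns − 1) degrees of freedom. The following is a minimal sketch of that calculation in Python. The function name and the 2 × 2 table are hypothetical and serve only as an illustration; they do not come from any exercise in this chapter.

```python
def chi_square_statistic(observed):
    """Return (statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected cell count under homogeneity / independence.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
hypothetical_table = [[20, 30],
                      [30, 20]]
statistic, df = chi_square_statistic(hypothetical_table)
print(statistic, df)
```

The statistic would then be compared with a χ² distribution with `df` degrees of freedom to obtain a P-value; that step is left out here to keep the sketch free of external libraries.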
Chapter Review Exercises 12.35 - 12.45
12.35 Each observation in a random sample of 100 bicycle accidents resulting in death was classified according to the day of the week on which the accident occurred. Data consistent with information given on the web site www.highwaysafety.com are given in the following table.

Day of Week    Frequency
Sunday         14
Monday         13
Tuesday        12
Wednesday      15
Thursday       14
Friday         17
Saturday       15

Based on these data, is it reasonable to conclude that the proportion of accidents is not the same for all days of the week? Use α = .05.

12.36 The color vision of birds plays a role in their foraging behavior: Birds use color to select and avoid certain types of food. The authors of the article “Colour Avoidance in Northern Bobwhites: Effects of Age, Sex, and Previous Experience” (Animal Behaviour [1995]: 519–526) studied the pecking behavior of 1-day-old bobwhites. In an area painted white, they inserted four pins with different colored heads. The color of the pin chosen on the bird’s first peck was noted for each of 33 bobwhites, resulting in the accompanying table.

Color     First Peck Frequency
Blue      16
Green     8
Yellow    6
Red       3

Do the data provide evidence of a color preference? Test using α = .01.

12.37 In November 2005, an international study to assess public opinion on the treatment of suspected terrorists was conducted (“Most in U.S., Britain, S. Korea and France Say Torture Is OK in at Least Rare Instances,” Associated Press, December 7, 2005). Each individual in random samples of 1000 adults from each of nine different countries was asked the following question: “Do you feel the use of torture against suspected terrorists to obtain information about terrorism activities is justified?” Responses consistent with percentages given in the article for the samples from Italy, Spain, France, the United States, and South Korea are summarized in the accompanying table. Based on these data, is it reasonable to conclude that the response proportions are not the same for all five countries? Use a .01 significance level to test the appropriate hypotheses.
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 12
The Analysis of Categorical Data and Goodness-of-Fit Tests
                                Response
Country          Never   Rarely   Sometimes   Often   Not Sure
Italy            600     140      140         90      30
Spain            540     160      140         70      90
France           400     250      200         120     30
United States    360     230      270         110     30
South Korea      100     330      470         60      40
12.38 According to Census Bureau data, in 1998 the California population consisted of 50.7% whites, 6.6% blacks, 30.6% Hispanics, 10.8% Asians, and 1.3% other ethnic groups. Suppose that a random sample of 1000 students graduating from California colleges and universities in 1998 resulted in the accompanying data on ethnic group. These data are consistent with summary statistics contained in the article titled “Crumbling Public School System a Threat to California’s Future” (Investor’s Business Daily, November 12, 1999).

Ethnic Group    Number in Sample
White           679
Black           51
Hispanic        77
Asian           190
Other           3

Do the data provide evidence that the proportion of students graduating from colleges and universities in California for these ethnic group categories differs from the respective proportions in the population for California? Test the appropriate hypotheses using α = .01.

12.39 Criminologists have long debated whether there is a relationship between weather and violent crime. The author of the article “Is There a Season for Homicide?” (Criminology [1988]: 287–296) classified 1361 homicides according to season, resulting in the accompanying data. Do these data support the theory that the homicide rate is not the same over the four seasons? Test the relevant hypotheses using a significance level of .05.

Season    Frequency
Winter    328
Spring    334
Summer    372
Fall      327

12.40 Each boy in a sample of Mexican American males, age 10 to 18, was classified according to smoking status and response to a question asking whether he likes to do risky things. The following table is based on data given in the article “The Association Between Smoking and Unhealthy Behaviors Among a National Sample of Mexican-American Adolescents” (Journal of School Health [1998]: 376–379):

                               Smoking Status
                               Smoker    Nonsmoker
Likes Risky Things             45        46
Doesn’t Like Risky Things      36        153

Assume that it is reasonable to regard the sample as a random sample of Mexican-American male adolescents.
a. Is there sufficient evidence to conclude that there is an association between smoking status and desire to do risky things? Test the relevant hypotheses using α = .05.
b. Based on your conclusion in Part (a), is it reasonable to conclude that smoking causes an increase in the desire to do risky things? Explain.

12.41 The article “Cooperative Hunting in Lions: The Role of the Individual” (Behavioral Ecology and Sociobiology [1992]: 445–454) discusses the different roles taken by lionesses as they attack and capture prey. The authors were interested in the effect of the position in line as stalking occurs; an individual lioness may be in the center of the line or on the wing (end of the line) as they advance toward their prey. In addition to position, the role of the lioness was also considered. A lioness could initiate a chase (be the first one to charge the prey), or she could participate and join the chase after it has been initiated. Data from the article are summarized in the accompanying table.

                        Role
Position    Initiate Chase    Participate in Chase
Center      28                48
Wing        66                41

Is there evidence of an association between position and role? Test the relevant hypotheses using α = .01. What assumptions about how the data were collected must be true for the chi-square test to be an appropriate way to analyze these data?
12.42 The authors of the article “A Survey of Parent Attitudes and Practices Regarding Underage Drinking” (Journal of Youth and Adolescence [1995]: 315–334) conducted a telephone survey of parents with preteen and teenage children. One of the questions asked was “How effective do you think you are in talking to your children about drinking?” Responses are summarized in the accompanying 3 × 2 table. Using a significance level of .05, carry out a test to determine whether there is an association between age of children and parental response.

                                        Age of Children
Response                                Preteen    Teen
Very Effective                          126        149
Somewhat Effective                      44         41
Not at All Effective or Don’t Know      51         26

12.43 The article “Regional Differences in Attitudes Toward Corporal Punishment” (Journal of Marriage and Family [1994]: 314–324) presents data resulting from a random sample of 978 adults. Each individual in the sample was asked whether he or she agreed with the following statement: “Sometimes it is necessary to discipline a child with a good, hard spanking.” Respondents were also classified according to the region of the United States in which they lived. The resulting data are summarized in the accompanying table. Is there an association between response (agree, disagree) and region of residence? Use α = .01.

                  Response
Region       Agree    Disagree
Northeast    130      59
West         146      42
Midwest      211      52
South        291      47

12.44 Jail inmates can be classified into one of the following four categories according to the type of crime committed: violent crime, crime against property, drug offenses, and public-order offenses. Suppose that random samples of 500 male inmates and 500 female inmates are selected, and each inmate is classified according to type of offense. The data in the accompanying table are based on summary values given in the article “Profile of Jail Inmates” (USA Today, April 25, 1991). We would like to know whether male and female inmates differ with respect to type of offense.

                      Gender
Type of Crime    Male    Female
Violent          117     66
Property         150     160
Drug             109     168
Public-Order     124     106

a. Is this a test of homogeneity or a test of independence?
b. Test the relevant hypotheses using a significance level of .05.
12.45 Drivers born under the astrological sign of Capricorn are the worst drivers in Australia, according to an article that appeared in the Australian newspaper The Mercury (October 26, 1998). This statement was based on a study of insurance claims that resulted in the following data for male policyholders of a large insurance company.

Astrological Sign    Number of Policyholders
Aquarius             35,666
Aries                37,926
Cancer               38,126
Capricorn            54,906
Gemini               37,179
Leo                  37,354
Libra                37,910
Pisces               36,677
Sagittarius          34,175
Scorpio              35,352
Taurus               37,179
Virgo                37,718

a. Assuming that it is reasonable to treat the male policyholders of this particular insurance company as a random sample of male insured drivers in Australia, are the observed data consistent with the hypothesis that the proportion of male insured drivers is the same for each of the 12 astrological signs?
b. Why do you think that the proportion of Capricorn policyholders is so much higher than would be expected if the proportions are the same for all astrological signs?
c. Suppose that a random sample of 1000 accident claims submitted to this insurance company is selected and each claim classified according to the astrological sign of the driver. (The accompanying table is consistent with accident rates given in the article.)

Astrological Sign    Observed Number in Sample
Aquarius             85
Aries                83
Cancer               82
Capricorn            88
Gemini               83
Leo                  83
Libra                83
Pisces               82
Sagittarius          81
Scorpio              85
Taurus               84
Virgo                81

Test the null hypothesis that the proportion of accident claims submitted by drivers of each astrological sign is consistent with the proportion of policyholders of each sign. Use the given information on the distribution of policyholders to compute expected frequencies and then carry out an appropriate test.
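The expected-frequency step in Part (c) can be sketched as follows. Under the null hypothesis, the proportion of claims for each sign equals that sign's share of policyholders, so each expected count is 1000 times (policyholders for the sign) / (total policyholders). This is only an illustrative sketch of the arithmetic, not a complete solution; the variable names are hypothetical, and the counts are copied from the policyholder table in this exercise.

```python
# Policyholder counts by astrological sign, from the table in Exercise 12.45.
policyholders = {
    "Aquarius": 35666, "Aries": 37926, "Cancer": 38126, "Capricorn": 54906,
    "Gemini": 37179, "Leo": 37354, "Libra": 37910, "Pisces": 36677,
    "Sagittarius": 34175, "Scorpio": 35352, "Taurus": 37179, "Virgo": 37718,
}

sample_size = 1000  # number of accident claims in the sample
total_policyholders = sum(policyholders.values())

# Expected count = sample size * hypothesized proportion for the sign.
expected = {
    sign: sample_size * count / total_policyholders
    for sign, count in policyholders.items()
}

for sign, value in expected.items():
    print(f"{sign:12s} {value:7.2f}")
```

These expected counts would then be compared with the observed counts using the goodness-of-fit statistic, with 12 − 1 = 11 degrees of freedom.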
CHAPTER 13
Simple Linear Regression and Correlation: Inferential Methods
Regression and correlation were introduced in Chapter 5 as techniques for describing and summarizing bivariate numerical data consisting of (x, y) pairs. For example, consider a scatterplot of data on y = percentage of courses taught by teachers with inappropriate or no license and x = spending per pupil for a sample of Missouri public school districts (“Is Teacher Pay Adequate?” Research Working Papers Series, Kennedy School of Government, Harvard University, October 2005). A scatterplot of the data shows a surprising linear pattern. The sample correlation coefficient is r = .27, and the equation of the least-squares line has a positive slope, indicating that school districts with higher expenditures per student also tended to have a higher percentage of courses taught by
teachers with an inappropriate license or no license. Could the pattern observed in
the scatterplot be plausibly explained by chance, or does the sample provide convincing evidence of a linear relationship between these two variables for school districts in
Missouri? If there is evidence of a meaningful relationship between these two variables, the regression line could be used as the basis for predicting the percentage of
teachers with inappropriate or no license for a school district with a specified expenditure per student or for estimating the average percentage of teachers with inappropriate or no license for all school districts with a specified expenditure per student. In
this chapter, we develop inferential methods for bivariate numerical data, including a
confidence interval (interval estimate) for a mean y value, a prediction interval for a
single y value, and a test of hypotheses regarding the extent of correlation in the entire
population of (x, y) pairs.
13.1 Simple Linear Regression Model
A deterministic relationship is one in which the value of y is completely determined by the value of an independent variable x. Such a relationship can be described using traditional mathematical notation, such as y = f(x), where f(x) is a specified function of x. For example, we might have
y = f(x) = 10 + 2x
or
y = f(x) = 4 − (10)2^x
However, in many situations, the variables of interest are not deterministically related. For example, the value of y = first-year college grade point average is certainly not determined solely by x = high school grade point average, and y = crop yield is determined partly by factors other than x = amount of fertilizer used.
A description of the relationship between two variables x and y that are not deterministically related can be given by specifying a probabilistic model. The general form of an additive probabilistic model allows y to be larger or smaller than f(x) by a random amount e. The model equation is of the form
y = deterministic function of x + random deviation = f(x) + e
Thinking geometrically, if e > 0, the corresponding point will lie above the graph of y = f(x). If e < 0, the corresponding point will fall below the graph. If f(x) is a function used in a probabilistic model relating y to x and if observations on y are made for various values of x, the resulting (x, y) points will be distributed about the graph of f(x), some falling above it and some falling below it.
For example, consider the probabilistic model
y = 50 − 10x + x² + e
in which the deterministic part is f(x) = 50 − 10x + x². The graph of the function y = 50 − 10x + x² is shown as the orange curve in Figure 13.1. The observed point (4, 30) is also shown in the figure. Because
f(4) = 50 − 10(4) + 4² = 50 − 40 + 16 = 26
for the point (4, 30), we can write y = f(x) + e, where e = 4. The point (4, 30) falls 4 above the graph of the function y = 50 − 10x + x².
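The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function names `f` and `deviation` are chosen here for illustration, and the only inputs are the function and the observed point from the example.

```python
def f(x):
    """Deterministic part of the model: f(x) = 50 - 10x + x^2."""
    return 50 - 10 * x + x ** 2


def deviation(x, y):
    """Random deviation e = y - f(x) for an observed point (x, y)."""
    return y - f(x)


e = deviation(4, 30)
print(f(4), e)  # height of the curve at x = 4, and the deviation of (4, 30)
```

A positive result from `deviation` means the observed point lies above the curve, and a negative result means it lies below.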
FIGURE 13.1 A deviation from the deterministic part of a probabilistic model. [Plot of the graph of y = 50 − 10x + x², with the observation (4, 30) lying e = 4 above the curve's height of 26 at x = 4.]
Simple Linear Regression
The simple linear regression model is a special case of the general probabilistic model
in which the deterministic function f (x) is linear (so its graph is a straight line).
DEFINITION
The simple linear regression model assumes that there is a line with vertical or y intercept α and slope β, called the population regression line. When a value of the independent variable x is fixed and an observation on the dependent variable y is made,
y = α + βx + e
Without the random deviation e, all observed (x, y) points would fall exactly on the population regression line. The inclusion of e in the model equation recognizes that points will deviate from the line by a random amount.
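To make the definition concrete, the sketch below generates observations from a model of the form y = α + βx + e. Several things here are assumptions made only for illustration and are not taken from the text: the values α = 10 and β = 2, the x values, and the choice of a normal distribution with mean 0 and standard deviation `sigma` for e.

```python
import random


def simulate_observations(alpha, beta, xs, sigma, seed=0):
    """Return (x, y) pairs with y = alpha + beta * x + e.

    e is drawn from a normal distribution with mean 0 and standard
    deviation sigma (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    return [(x, alpha + beta * x + rng.gauss(0.0, sigma)) for x in xs]


alpha, beta = 10.0, 2.0  # hypothetical intercept and slope
xs = [1, 2, 3, 4, 5]     # hypothetical fixed x values

noisy = simulate_observations(alpha, beta, xs, sigma=1.5)
exact = simulate_observations(alpha, beta, xs, sigma=0.0)

for (x, y), (_, y_line) in zip(noisy, exact):
    # y_line is the height of the population regression line at x.
    print(f"x = {x}: line = {y_line:.2f}, observed y = {y:.2f}, e = {y - y_line:+.2f}")
```

With `sigma=0.0` every deviation is 0, so the points fall exactly on the line, which is the situation the definition describes as holding "without the random deviation e".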
Figure 13.2 shows two observations in relation to the population regression line.
FIGURE 13.2 Two observations and deviations from the population regression line. [Plot of the population regression line (slope β, vertical intercept α), with an observation at x = x1 showing a positive deviation e1 and an observation at x = x2 showing a negative deviation e2.]
Before we make an observation on y for any particular value of x, we are uncertain
about the value of e. It could be negative, positive, or even 0. Also, it might be quite