ACTIVITY 13.1: Are Tall Women from “Big” Families?
Summary of Key Concepts and Formulas
TERM OR FORMULA
COMMENT

Simple linear regression model, $y = \alpha + \beta x + e$
This model assumes that there is a line with slope $\beta$ and y intercept $\alpha$, called the population regression line, such that an observation deviates from the line by a random amount $e$. The random deviation is assumed to have a normal distribution with mean zero and standard deviation $\sigma$, and random deviations for different observations are assumed to be independent of one another.

Estimated regression line, $\hat{y} = a + bx$
The least-squares line introduced in Chapter 5.

$s_e = \sqrt{\dfrac{\text{SSResid}}{n - 2}}$
The point estimate of the standard deviation $\sigma$, with associated degrees of freedom $n - 2$.

$s_b = \dfrac{s_e}{\sqrt{S_{xx}}}$
The estimated standard deviation of the statistic $b$.

$b \pm (t \text{ critical value})\, s_b$
A confidence interval for the slope $\beta$ of the population regression line, where the t critical value is based on $n - 2$ degrees of freedom.

$t = \dfrac{b - \text{hypothesized value}}{s_b}$
The test statistic for testing hypotheses about $\beta$. The test is based on $n - 2$ degrees of freedom.

Model utility test, with test statistic $t = \dfrac{b}{s_b}$
A test of $H_0: \beta = 0$, which asserts that there is no useful linear relationship between x and y, versus $H_a: \beta \neq 0$, the claim that there is a useful linear relationship.

Residual analysis
Methods based on the residuals or standardized residuals for checking the assumptions of a regression model.

Standardized residual
A residual divided by its standard deviation.

Standardized residual plot
A plot of the (x, standardized residual) pairs. A pattern in this plot suggests that the simple linear regression model may not be appropriate.

$s_{a+bx^*} = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}$
The estimated standard deviation of the statistic $a + bx^*$, where $x^*$ denotes a particular value of x.

$a + bx^* \pm (t \text{ critical value})\, s_{a+bx^*}$
A confidence interval for $\alpha + \beta x^*$, the mean value of y when $x = x^*$.

$a + bx^* \pm (t \text{ critical value}) \sqrt{s_e^2 + s_{a+bx^*}^2}$
A prediction interval for a single y value to be observed when $x = x^*$.

Population correlation coefficient $\rho$
A measure of the extent to which the x and y values in an entire population are linearly related.

$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$
The test statistic for testing $H_0: \rho = 0$, according to which (assuming a bivariate normal population distribution) x and y are independent of one another.
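The formulas in this summary can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (`fit_line`, `slope_t`, and so on) are hypothetical, the t critical value is passed in by the caller (read from a t table with n − 2 degrees of freedom), and the code assumes the x values are not all equal and the fit is not perfect, so that $S_{xx}$ and $s_b$ are nonzero.

```python
import math


def fit_line(x, y):
    """Least-squares fit returning the quantities named in the summary table."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx                      # estimated slope
    a = y_bar - b * x_bar                # estimated intercept
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))  # estimate of sigma, n - 2 df
    s_b = s_e / math.sqrt(s_xx)          # estimated std. deviation of b
    return {"n": n, "x_bar": x_bar, "s_xx": s_xx, "a": a, "b": b,
            "ss_resid": ss_resid, "s_e": s_e, "s_b": s_b}


def slope_t(fit, hypothesized=0.0):
    """t statistic for the slope; hypothesized=0 gives the model utility test."""
    return (fit["b"] - hypothesized) / fit["s_b"]


def slope_interval(fit, t_crit):
    """Confidence interval b +/- (t critical value) * s_b."""
    half_width = t_crit * fit["s_b"]
    return fit["b"] - half_width, fit["b"] + half_width


def se_mean_response(fit, x_star):
    """Estimated standard deviation of a + b * x_star."""
    return fit["s_e"] * math.sqrt(
        1 / fit["n"] + (x_star - fit["x_bar"]) ** 2 / fit["s_xx"])


def mean_interval(fit, x_star, t_crit):
    """Confidence interval for the mean value of y when x = x_star."""
    center = fit["a"] + fit["b"] * x_star
    half_width = t_crit * se_mean_response(fit, x_star)
    return center - half_width, center + half_width


def prediction_interval(fit, x_star, t_crit):
    """Prediction interval for a single y value observed when x = x_star."""
    center = fit["a"] + fit["b"] * x_star
    half_width = t_crit * math.sqrt(
        fit["s_e"] ** 2 + se_mean_response(fit, x_star) ** 2)
    return center - half_width, center + half_width


def correlation_t(r, n):
    """t statistic for testing that the population correlation is zero."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))
```

For the same data set, `correlation_t` applied to the sample correlation coefficient gives the same value as `slope_t` with a hypothesized slope of zero, which is why the two tests always lead to the same conclusion.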
Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).
Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.
Chapter 13 Simple Linear Regression and Correlation: Inferential Methods
Chapter Review Exercises 13.56 - 13.70
13.56 The effects of grazing animals on grasslands have
been the focus of numerous investigations by ecologists.
One such study, reported in “The Ecology of Plants,
Large Mammalian Herbivores, and Drought in Yellowstone National Park” (Ecology [1992]: 2043–2058),
proposed using the simple linear regression model to relate y = green biomass concentration (g/cm³) to x =
elapsed time since snowmelt (days).
a. The estimated regression equation was given as $\hat{y} = 106.3 - .640x$. What is the estimate of average
change in biomass concentration associated with a
1-day increase in elapsed time?
b. What value of biomass concentration would you
predict when elapsed time is 40 days?
c. The sample size was n = 58, and the reported value
of the coefficient of determination was .470. Does
this suggest that there is a useful linear relationship
between the two variables? Carry out an appropriate
test.
13.57 A random sample of n = 347 students was selected, and each one was asked to complete several questionnaires, from which a Coping Humor Scale value x
and a Depression Scale value y were determined (“Depression and Sense of Humor,” Psychological Reports
[1994]: 1473–1474). The resulting value of the sample correlation coefficient was −.18.
a. The investigators reported that P-value < .05. Do
you agree?
b. Is the sign of r consistent with your intuition? Explain.
(Higher scale values correspond to more developed
sense of humor and greater extent of depression.)
c. Would the simple linear regression model give accurate predictions? Why or why not?
13.58 Example 13.4 gave data on x = treadmill run
time to exhaustion and y = 20-km ski time for a sample
of 11 biathletes. Use the accompanying Minitab output
to answer the following questions.
The regression equation is
ski = 88.8 - 2.33 tread

Predictor      Coef     Stdev   t-ratio       p
Constant     88.796     5.750     15.44   0.000
tread       -2.3335    0.5911     -3.95   0.003

s = 2.188    R-sq = 63.4%    R-sq(adj) = 59.3%

Analysis of Variance
Source        DF        SS        MS        F
Regression     1    74.630    74.630    15.58
Error          9    43.097     4.789
Total         10   117.727

Bold exercises answered in back
Data set available online
a. Carry out a test at significance level .01 to decide
whether the simple linear regression model is useful.
b. Estimate the average change in ski time associated
with a 1-minute increase in treadmill time, and do
so in a way that conveys information about the precision of estimation.
c. Minitab reported that $s_{a+b(10)} = .689$. Predict ski
time for a single biathlete whose treadmill time is 10
minutes, and do so in a way that conveys information about the precision of prediction.
d. Minitab also reported that $s_{a+b(11)} = 1.029$. Why is
this larger than $s_{a+b(10)}$?
13.59 A sample of n = 61 penguin burrows was selected, and values of both y = trail length (m) and x =
soil hardness (force required to penetrate the substrate to
a depth of 12 cm with a certain gauge, in kg) were determined for each one (“Effects of Substrate on the Distribution of Magellanic Penguin Burrows,” The Auk
[1991]: 923–933). The equation of the least-squares line
was $\hat{y} = 11.607 - 1.4187x$, and $r^2 = .386$.
a. Does the relationship between soil hardness and trail
length appear to be linear, with shorter trails associated with harder soil (as the article asserted)? Carry
out an appropriate test of hypotheses.
b. Using $s_e = 2.35$, $\bar{x} = 4.5$, and $\sum (x - \bar{x})^2 = 250$,
predict trail length when soil hardness is 6.0 in a way
that conveys information about the reliability and
precision of the prediction.
c. Would you use the simple linear regression model to
predict trail length when hardness is 10.0? Explain
your reasoning.
13.60 The article “Photocharge Effects in Dye Sensitized Ag[Br,I] Emulsions at Millisecond Range Exposures” (Photographic Science and Engineering [1981]:
138–144) gave the accompanying data on x = % light
absorption and y = peak photovoltage.

x    4.0    8.7   12.7   19.1   21.4   24.6   28.9   29.8   30.5
y   0.12   0.28   0.55   0.68   0.85   1.02   1.15   1.34   1.29
Video Solution available
$\sum x = 179.7$   $\sum x^2 = 4334.41$   $\sum y = 7.28$   $\sum y^2 = 7.4028$   $\sum xy = 178.683$

a. Construct a scatterplot of the data. What does it
suggest?
b. Assuming that the simple linear regression model is
appropriate, obtain the equation of the estimated
regression line. $\hat{y} = -0.08259 + 0.04464x$
c. How much of the observed variation in peak photovoltage can be explained by the model relationship?
d. Predict peak photovoltage when percent absorption
is 19.1, and compute the value of the corresponding
residual.
e. The authors claimed that there is a useful linear relationship between the two variables. Do you agree?
Carry out a formal test.
f. Give an estimate of the average change in peak photovoltage associated with a 1 percentage point increase in
light absorption. Your estimate should convey information about the precision of estimation.
g. Give an estimate of mean peak photovoltage when
percentage of light absorption is 20, and do so in a
way that conveys information about precision.
13.61 Reduced visual performance with increasing
age has been a much-studied phenomenon in recent
years. This decline is due partly to changes in optical
properties of the eye itself and partly to neural degeneration throughout the visual system. As one aspect of this
problem, the article “Morphometry of Nerve Fiber
Bundle Pores in the Optic Nerve Head of the Human”
(Experimental Eye Research [1988]: 559–568) presented the accompanying data on x = age and y = percentage of the cribriform area of the lamina scleralis occupied by pores.

x   22   25   27   39   42   43   44   46   46
y   75   62   50   49   54   49   59   47   54

x   48   50   57   58   63   63   74   74
y   52   58   49   52   49   31   42   41
a. Suppose that prior to this study the researchers had
believed that the average decrease in percentage area
associated with a 1-year age increase was .5%. Do the
data contradict this prior belief? State and test the appropriate hypotheses using a .10 significance level.
b. Estimate true average percentage area covered by
pores for all 50-year-olds in the population in a way
that conveys information about the precision of
estimation.
13.62 Occasionally an investigator may wish to compute a confidence interval for $\alpha$, the y intercept of the
true regression line, or test hypotheses about $\alpha$. The estimated y intercept is simply the height of the estimated
line when x = 0, since a + b(0) = a. This implies that
$s_a$, the estimated standard deviation of the statistic a, results from substituting $x^* = 0$ in the formula for $s_{a+bx^*}$.
The desired confidence interval is then

$a \pm (t \text{ critical value})\, s_a$

and a test statistic is

$t = \dfrac{a - \text{hypothesized value}}{s_a}$
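As a small illustration of the substitution described above, the following Python sketch computes $s_a$ by setting $x^* = 0$ in the formula for $s_{a+bx^*}$ and then forms the test statistic. The function name `intercept_inference` is hypothetical, and the sketch assumes at least three observations, x values that are not all equal, and a fit that is not perfect.

```python
import math


def intercept_inference(x, y, hypothesized=0.0):
    """Return (a, s_a, t) for the y intercept of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))
    # s_{a+bx*} with x* = 0: (0 - x_bar)^2 reduces to x_bar^2.
    s_a = s_e * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    t = (a - hypothesized) / s_a   # compare with a t distribution, n - 2 df
    return a, s_a, t
```

The confidence interval is then `a - t_crit * s_a` to `a + t_crit * s_a`, with the t critical value taken from a table using n − 2 degrees of freedom.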
a. The article “Comparison of Winter-Nocturnal Geostationary Satellite Infrared-Surface Temperature
with Shelter-Height Temperature in Florida” (Remote Sensing of the Environment [1983]: 313–327)
used the simple linear regression model to relate surface temperature as measured by a satellite (y) to actual air temperature (x) as determined from a thermocouple placed on a traversing vehicle. Selected data are
given (read from a scatterplot in the article).

x    −2     −1      0      1     2     3     4     5     6     7
y   −3.9   −2.1   −2.0   −1.2   0.0   1.9   0.6   2.1   1.2   3.0

Estimate the population regression line.
b. Compute the estimated standard deviation sa. Carry
out a test at level of significance .05 to see whether
the y intercept of the population regression line differs from zero.
c. Compute a 95% confidence interval for $\alpha$. Does the
result indicate that $\alpha = 0$ is plausible? Explain.
13.63 In some studies, an investigator has n (x, y)
pairs sampled from one population and m (x, y) pairs
from a second population. Let $\beta$ and $\beta'$ denote the
slopes of the first and second population lines, respectively, and let $b$ and $b'$ denote the estimated slopes calculated from the first and second samples, respectively. The
investigator may then wish to test the null hypothesis
$H_0: \beta - \beta' = 0$ (that is, $\beta = \beta'$) against an appropriate
alternative hypothesis. Suppose that $\sigma^2$, the variance
about the population line, is the same for both populations. Then this common variance can be estimated by

$$s^2 = \frac{\text{SSResid} + \text{SSResid}'}{n + m - 4}$$

where SSResid and SSResid′ are the residual sums of
squares for the first and second samples, respectively. With
$S_{xx}$ and $S'_{xx}$ denoting the quantity $\sum (x - \bar{x})^2$ for the first
and second samples, respectively, the test statistic is

$$t = \frac{b - b'}{\sqrt{\dfrac{s^2}{S_{xx}} + \dfrac{s^2}{S'_{xx}}}}$$

When $H_0$ is true, this statistic has a t distribution based
on $(n + m - 4)$ df.
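The test statistic just described can be sketched in Python as follows. The helper names `line_summary` and `compare_slopes` are hypothetical, and the sketch assumes a common variance about the two population lines, as the exercise does, and that at least one of the two fits has nonzero residuals so that the pooled estimate is positive.

```python
import math


def line_summary(x, y):
    """Return (sample size, estimated slope, S_xx, SSResid) for one sample."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return n, b, s_xx, ss_resid


def compare_slopes(x1, y1, x2, y2):
    """Return (t, df) for testing that two population slopes are equal."""
    n, b1, s_xx1, ss_resid1 = line_summary(x1, y1)
    m, b2, s_xx2, ss_resid2 = line_summary(x2, y2)
    df = n + m - 4
    s2 = (ss_resid1 + ss_resid2) / df          # pooled variance estimate
    t = (b1 - b2) / math.sqrt(s2 / s_xx1 + s2 / s_xx2)
    return t, df
```

The returned t value is compared with a t critical value based on the returned degrees of freedom.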
The data below are a subset of the data in the article
“Diet and Foraging Model of Bufa marinus and Leptodactylus ocellatus” (Journal of Herpetology [1984]:
138–146). The independent variable x is body length
(cm) and the dependent variable y is mouth width (cm),
with n = 9 observations for one type of nocturnal frog
and m = 8 observations for a second type. Carry out a
test to determine if the slopes of the true regression lines
for the two different frog populations are equal. Use a
significance level of .05. (Summary statistics are also given
in the accompanying table.)

Leptodactylus ocellatus
x   3.8   4.0   4.9   7.1   8.1   8.5   8.9   9.1   9.8
y   1.0   1.2   1.7   2.0   2.7   2.5   2.4   2.9   3.2

Bufa marinus
x   3.8   4.3   6.2   6.3   7.8   8.5   9.0   10.0
y   1.6   1.7   2.3   2.5   3.2   3.0   3.5    3.8

                 Leptodactylus     Bufa
Sample size:           9              8
$\sum x$             64.2           55.9
$\sum x^2$          500.78         425.15
$\sum y$             19.6           21.6
$\sum y^2$           47.28          62.92
$\sum xy$           153.36         163.36

13.64 Consider the following four (x, y) data sets: the
first three have the same x values, so these values are
listed only once (from “Graphs in Statistical Analysis,”
American Statistician [1973]: 17–21).

Data Set    1–3      1       2       3       4       4
Variable     x       y       y       y       x       y
           10.0    8.04    9.14    7.46     8.0    6.58
            8.0    6.95    8.14    6.77     8.0    5.76
           13.0    7.58    8.74   12.74     8.0    7.71
            9.0    8.81    8.77    7.11     8.0    8.84
           11.0    8.33    9.26    7.81     8.0    8.47
           14.0    9.96    8.10    8.84     8.0    7.04
            6.0    7.24    6.13    6.08     8.0    5.25
            4.0    4.26    3.10    5.39    19.0   12.50
           12.0   10.84    9.13    8.15     8.0    5.56
            7.0    4.82    7.26    6.42     8.0    7.91
            5.0    5.68    4.74    5.73     8.0    6.89

For each of these data sets, the values of the summary
quantities $\bar{x}$, $\bar{y}$, $\sum (x - \bar{x})^2$, and $\sum (x - \bar{x})(y - \bar{y})$ are
identical, so all quantities computed from these will be
identical for the four sets: the estimated regression line,
SSResid, $s_e$, $r^2$, and so on. The summary quantities provide no way of distinguishing among the four data sets.
Based on a scatterplot for each set, comment on the
appropriateness or inappropriateness of fitting the simple
linear regression model in each case.

13.65 The accompanying scatterplot, based on 34 sediment samples with x = sediment depth (cm) and y = oil
and grease content (mg/kg), appeared in the article
“Mined Land Reclamation Using Polluted Urban Navigable Waterway Sediments” (Journal of Environmental Quality [1984]: 415–422). Discuss the effect that the
observation (20, 33,000) will have on the estimated regression line. If this point were omitted, what can you
say about the slope of the estimated regression line?
What do you think will happen to the slope if this observation is included in the computations?

[Scatterplot: oil and grease (mg/kg) versus subsample mean depth (cm).]
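Returning to the four data sets of Exercise 13.64, the claim that their summary quantities coincide can be checked with a short Python sketch. The data values are the ones listed in the exercise; the function name `summary_quantities` is hypothetical. With the values as printed, the x-based quantities agree exactly, while the y-based quantities agree only to about two decimal places.

```python
X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X_4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]

DATA_SETS = {
    1: (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                6.08, 5.39, 8.15, 6.42, 5.73]),
    4: (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
              5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary_quantities(x, y):
    """Means, S_xx, S_xy, and the least-squares slope and intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return {"x_bar": x_bar, "y_bar": y_bar, "s_xx": s_xx,
            "s_xy": s_xy, "a": a, "b": b}


if __name__ == "__main__":
    # Print the rounded summaries side by side for comparison.
    for label, (x, y) in DATA_SETS.items():
        q = summary_quantities(x, y)
        print(label, {name: round(value, 2) for name, value in q.items()})
```

Because the summaries are indistinguishable, only the scatterplots reveal how differently the four sets behave, which is the point of the exercise.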
13.66 The article “Improving Fermentation Productivity with Reverse Osmosis” (Food Technology [1984]:
92–96) gave the following data (read from a scatterplot)
on y = glucose concentration (g/L) and x = fermentation time (days) for a blend of malt liquor.

x    1    2    3    4    5    6    7    8
y   74   54   52   51   52   53   58   71

a. Use the data to calculate the estimated regression
line. $\hat{y} = 57.964 + 0.0357x$
b. Do the data indicate a linear relationship between y
and x? Test using a .10 significance level.
c. Using the estimated regression line of Part (a), compute the residuals and construct a plot of the residuals versus x (that is, of the (x, residual) pairs).
d. Based on the plot in Part (c), do you think that the
simple linear regression model is appropriate for describing the relationship between y and x? Explain.

13.67 The employee relations manager of a large company was concerned that raises given to employees during a recent period might not have been based strictly on
objective performance criteria. A sample of n = 20 employees was selected, and the values of x, a quantitative
measure of productivity, and y, the percentage salary increase, were determined for each one. A computer package was used to fit the simple linear regression model,
and the resulting output gave the P-value = .0076 for
the model utility test. Does the percentage raise appear
to be linearly related to productivity? Explain.

13.68 The accompanying figure is based on data
from the article “Root and Shoot Competition Intensity Along a Soil Depth Gradient” (Ecology [1995]:
673–682). It shows the relationship between aboveground biomass and soil depth within the experimental
plots. The relationship is described by the estimated regression equation: biomass = 29.85 + 25.29(soil depth)
and $r^2 = .65$; P < 0.001; n = 55. Do you think the
simple linear regression model is appropriate here? Explain. What would you expect to see in a plot of the
standardized residuals versus x?

[Scatterplot: biomass (g/m²) versus soil depth (cm).]

13.69 Give a brief answer, comment, or explanation
for each of the following.
a. What is the difference between $e_1, e_2, \ldots, e_n$ and the
n residuals?
b. The simple linear regression model states that
$y = \alpha + \beta x$.
c. Does it make sense to test hypotheses about b?
d. SSResid is always positive.
e. A student reported that a data set consisting of
n = 6 observations yielded residuals 2, 0, 5, 3, 0,
and 1 from the least-squares line.
f. A research report included the following summary
quantities obtained from a simple linear regression
analysis:
$\sum (y - \bar{y})^2 = 615$   $\sum (y - \hat{y})^2 = 731$
Cumulative Review Exercises CR13.1-CR13.18
CR13.1 The article “You Will Be Tested on This” (The
Chronicle of Higher Education, June 8, 2007) describes
an experiment to investigate the effect of quizzes on student learning. The goal of the experiment was to determine if students who take daily quizzes have better end-of-semester retention than students who attend the same
lectures and complete the same homework assignments
but who do not take the daily quizzes. Describe how you
would design such an experiment using the 400 students
enrolled in an introductory psychology course as
subjects.
CR13.2 The paper “Pistachio Nut Consumption and
Serum Lipid Levels” (Journal of the American College
of Nutrition [2007]: 141–148) describes a study to determine if eating pistachio nuts can have an effect on blood
cholesterol levels in people with high cholesterol. Fifteen
subjects followed their regular diet for 4 weeks and then
followed a diet in which 15% of the daily caloric intake
was from pistachio nuts for 4 weeks. Total blood cholesterol was measured for each subject at the end of each of
the 2 four-week periods, resulting in two samples (one
for the regular diet and one for the pistachio diet).
a. Are the two samples independent or paired?
Explain.
b. The mean difference in total cholesterol (regular
diet − pistachio diet) was 11 mg/dL. The standard
deviation of the differences was 24 mg/dL. Assume
that it is reasonable to regard the 15 study participants as representative of adults with high cholesterol and that total cholesterol differences are approximately normally distributed. Do the data
support the claim that eating the pistachio diet for
4 weeks is effective in reducing total cholesterol
level? Test the relevant hypotheses using $\alpha = .01$.
CR13.3 The article “Fines Show Airline Problems”
(USA Today, February 2, 2010) gave the accompanying
data on the number of fines for violating FAA maintenance regulations assessed against each of the 25 U.S.
airlines from 2004 to 2009.

1   4   2   12   10   2   3   6   1   7   2   0   23
2   0   36   2   6   2   14   2   1   3   3   1
a. Construct a boxplot of these data. Are any of the observations in the data set outliers? If so, which ones?
b. Explain why it may not be reasonable to assume that
the two airlines with the highest number of fines assessed are the worst airlines in terms of maintenance
violations.
CR13.4 The article “Odds Are, It’s Wrong” (Science
News, March 27, 2010) poses the following scenario:
Suppose that a test for steroid use among baseball
players is 95% accurate—that is, it correctly identifies actual steroid users 95% of the time, and misidentifies non-users as users 5 percent of the
time. . . . Now suppose, based on previous testing,
that experts have established that about 5 percent
of professional baseball players use steroids.
Answer the following questions for this scenario.
a. If 400 professional baseball players are selected at
random, how many would you expect to be steroid
users and how many would you expect to be
non-users?
b. How many of the steroid users would you expect to
test positive for steroid use?
c. How many of the players who do not use steroids
would you expect to test positive for steroid use (a
false positive)?
d. Use your answers to Parts (b) and (c) to estimate the
proportion of those who test positive for steroid use
who actually do use steroids.
e. Write a few sentences explaining why, in this scenario, the proportion of those who test positive for
steroid use who actually use steroids is not .95.
CR13.5 The press release “Luxury or Necessity? The
Public Makes a U-Turn” (Pew Research Center, April
23, 2009) summarizes results from a survey of a nationally representative sample of n = 1003 adult Americans.
a. One question in the survey asked participants if they
think of a landline phone as a necessity or as a luxury
that they could do without. Sixty-eight percent said
they thought a landline phone was a necessity. Estimate the proportion of adult Americans who view a
landline phone as a necessity using a 95% confidence interval.
b. In the same survey, 52% said they viewed a television set as a necessity. Is there convincing evidence
that a majority of adult Americans view a television
set as a necessity? Test the relevant hypotheses using
$\alpha = .05$.
c. The press release also described a survey conducted in
2003. When asked about a microwave oven, 68% of
the 2003 sample regarded a microwave oven as a necessity, whereas only 47% of the 2009 sample said
they thought a microwave oven was a necessity. Assume that the sample size for the 2003 survey was also
1003. Is there convincing evidence that the proportion of adult Americans who regard a microwave oven
as a necessity decreased between 2003 and 2009? Test
the appropriate hypotheses using $\alpha = .01$.
CR13.6 The accompanying graphical display appeared
in USA Today (February 19, 2010). It is meant to be a
pie chart, but an oval rather than a circle is used to represent the whole pie. Do you think this graph does a
good job of conveying the proportion falling into each of
the three response categories? Explain why or why not.

[Graphic: USA TODAY Snapshots®, “What nurses say about nursing care today.” Categories: since they entered the profession, care has improved, remained the same, or declined. The chart's percentage labels read 26%, 19%, and 55%. Source: AMN Healthcare’s 2010 Survey of Registered Nurses. By Anne R. Carey and Suzy Parker, USA TODAY.]

CR13.7 The following quote describing 18- to 29-year-olds is from the article “Study: Millennial Generation
More Educated, Less Employed” (USA Today, February
23, 2010): “38% have a tattoo (and half of those with
tattoos have two to five; 18% have six or more).” These
percentages were based on a representative sample of 830
Americans age 18 to 29, but for purposes of this exercise,
suppose that they hold for the population of all Americans in this age group. Define the random variable x =
number of tattoos for a randomly selected American age
18 to 29. Find the following probabilities:
a. $P(x = 0)$
b. $P(x = 1)$
c. $P(2 \le x \le 5)$
d. $P(x > 5)$

CR13.8 To raise revenues, many airlines now charge
fees to check luggage. Suppose that the number of
checked bags was recorded for each person in a random
sample of 100 airline passengers selected before fees were
imposed and also for each person in a random sample of
100 airline passengers selected after fees were imposed,
resulting in the accompanying data. Do the data provide
convincing evidence that the proportions in each of the
number of checked bags categories is not the same before
and after fees were imposed? Test the appropriate hypotheses using a significance level of .05.

                  Number of Checked Bags
                  0      1      2 or more
Before fees       7     70         23
After fees       22     64         14

CR13.9 Consider the following data on y = number
of songs stored on an MP3 player and x = number of
months the user has owned the MP3 player for a sample
of 15 owners of MP3 players.

x    23    35    2    28     5    32    23    10    4    26    1     8    13     9     5
y   486   747   81   581   117   728   445   128   61   476   35   121   266   126   141

a. Construct a scatterplot of the data. Does the relationship between x and y look approximately linear?
b. What is the equation of the estimated regression line?
c. Do you think that the assumptions of the simple
linear regression model are reasonable? Justify your
answer using appropriate graphs.
d. Is the simple linear regression model useful for describing the relationship between x and y? Test the
relevant hypotheses using a significance level of .05.
CR13.10 Many people take ginkgo supplements advertised to improve memory. Are these over-the-counter
supplements effective? In a study reported in the paper
“Ginkgo for Memory Enhancement” (Journal of the
American Medical Association [2002]: 835–840),
elderly adults were assigned at random to either a treatment
group or a control group. The 104 participants who were
assigned to the treatment group took 40 mg of ginkgo
three times a day for 6 weeks. The 115 participants assigned to the control group took a placebo pill three times
a day for 6 weeks. At the end of 6 weeks, the Wechsler
Memory Scale (a test of short-term memory) was administered. Higher scores indicate better memory function.
Summary values are given in the following table.
            n      $\bar{x}$     s
Ginkgo     104     5.6     .6
Placebo    115     5.5     .6

Based on these results, is there evidence that taking 40
mg of ginkgo three times a day is effective in increasing
mean performance on the Wechsler Memory Scale? Test
the relevant hypotheses using $\alpha = .05$.
CR13.11 The Harvard University Institute of Politics
surveys undergraduates across the United States annually. Responses to the question “When it comes to voting, do you consider yourself to be affiliated with the
Democratic Party, the Republican Party, or are you Independent or unaffiliated with a major party?” for the
survey conducted in 2003, 2004, and 2005 are summarized in the given table. The samples for each year were
independently selected and are considered to be representative of the population of undergraduate students in
the year the survey was conducted. Is there evidence that
the distribution of political affiliation is not the same for
all three years for which data are given?

                                    Year
Political Affiliation         2005    2004    2003
Democrat                       397     409     325
Republican                     301     349     373
Independent/unaffiliated       458     397     457
Other                           60      48      48

CR13.12 The survey described in the previous exercise
also asked the following question: “Please tell me whether
you trust the President to do the right thing all of the
time, most of the time, some of the time, or never.” Use
the data in the given table and an appropriate hypothesis
test to determine if there is evidence that trust in the
President was not the same in 2005 as it was in 2002.

                          Year
Response              2005    2002
All of the time        132     180
Most of the time       337     528
Some of the time       554     396
Never                  169      96

CR13.13 The report “Undergraduate Students and
Credit Cards in 2004” (Nellie Mae, May 2005) included
information collected from individuals in a random sample
of undergraduate students in the United States. Students
were classified according to region of residence and whether
or not they have one or more credit cards, resulting in the
accompanying two-way table. Carry out a test to determine
if there is evidence that region of residence and having a
credit card are not independent. Use $\alpha = .05$.

                          Credit Card?
Region        At Least One Credit Card    No Credit Cards
Northeast               401                     164
Midwest                 162                      36
South                   408                     115
West                    104                      23
CR13.14 The report described in the previous exercise
also classified students according to region of residence
and whether or not they had a credit card with a balance
of more than $7000. Do these data support the conclusion that there is an association between region of residence and whether or not the student has a balance exceeding $7000? Test the relevant hypotheses using a .01
significance level.

              Balance Over $7000?
Region         No      Yes
Northeast      28      537
Midwest       162      182
South          42      481
West            9      118
CR13.15 The discharge of industrial wastewater into
rivers affects water quality. To assess the effect of a particular power plant on water quality, 24 water specimens
were taken 16 km upstream and 4 km downstream of the
plant. Alkalinity (mg/L) was determined for each specimen, resulting in the summary quantities in the accompanying table. Do the data suggest that the true mean
alkalinity is higher downstream than upstream by more
than 50 mg/L? Use a .05 significance level.
Location       n     Mean    Standard Deviation
Upstream      24     75.9          1.83
Downstream    24    183.6          1.70
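One way to set up this comparison is a two-sample t test of H0: μ(downstream) − μ(upstream) = 50 against Ha: μ(downstream) − μ(upstream) > 50, computed from the summary quantities. The sketch below assumes independent samples and approximately normal alkalinity distributions; the helper name `two_sample_t` is hypothetical.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (t statistic, approximate df) for H0: mu1 - mu2 = hypothesized_diff."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2 - hypothesized_diff) / math.sqrt(v1 + v2)
    # Approximate df for the unpooled two-sample t test (truncate to an integer
    # before looking up a table value).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Sample 1 = downstream, sample 2 = upstream, so the difference is downstream - upstream.
t_stat, df = two_sample_t(183.6, 1.70, 24, 75.9, 1.83, 24, hypothesized_diff=50)
print(f"t = {t_stat:.1f}, df (before truncation) = {df:.1f}")
```

The test is upper-tailed, so H0 is rejected at the .05 level when t exceeds the upper-tail t critical value for the truncated df.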
CR13.16
The report of a European commission on
radiation protection titled “Cosmic Radiation Exposure
of Aircraft Crew” (2004) measured the exposure to radiation on eight international flights from Madrid using
several different methods for measuring radiation. Data
for two of the methods are given in the accompanying
table. Use these data to test the hypothesis that there is
no significant difference in mean radiation measurement
for the two methods.
Flight    Method 1    Method 2
1           27.5        34.4
2           41.3        38.6
3            3.5         3.5
4           24.3        21.9
5           27.0        24.4
6           17.7        21.4
7           12.0        11.8
8           20.9        24.1
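Because both methods were applied on the same eight flights, the data are paired, and a paired t test on the differences is a natural approach. The sketch below is one possible setup. The exercise does not state a significance level, so the .05 level (two-tailed t critical value 2.365 for 7 df) is an assumption made only for illustration.

```python
import math
import statistics

method1 = [27.5, 41.3, 3.5, 24.3, 27.0, 17.7, 12.0, 20.9]
method2 = [34.4, 38.6, 3.5, 21.9, 24.4, 21.4, 11.8, 24.1]

# Paired differences (Method 1 minus Method 2), one per flight
diffs = [a - b for a, b in zip(method1, method2)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)

# Test statistic for H0: mean difference = 0, based on n - 1 = 7 df
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRITICAL = 2.365  # two-tailed, .05 level, 7 df (assumed significance level)
print(f"mean difference = {mean_diff:.4f}, t = {t_stat:.2f}")
print("reject H0" if abs(t_stat) > T_CRITICAL else "fail to reject H0")
```

The paired t test assumes the differences are approximately normally distributed, which is hard to judge with only eight observations.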
CR13.17 It is hypothesized that when homing pigeons
are disoriented in a certain manner, they will exhibit no
preference for any direction of flight after takeoff. To test
this, 120 pigeons are disoriented and released, and the
direction of flight of each is recorded. The resulting data
are given in the accompanying table.
Direction          Frequency
0° to < 45°           12
45° to < 90°          16
90° to < 135°         17
135° to < 180°        15
180° to < 225°        13
225° to < 270°        20
270° to < 315°        17
315° to < 360°        10
Use the goodness-of-fit test with significance level .10 to
determine whether the data are consistent with this hypothesis.
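Under the hypothesis of no directional preference, each of the eight 45° sectors has probability 1/8, so each expected count is 120/8 = 15. A minimal sketch of the goodness-of-fit computation follows; the critical value 12.017 is the chi-square value for 7 degrees of freedom and upper-tail area .10.

```python
# Observed frequencies for the eight 45-degree direction sectors
frequencies = [12, 16, 17, 15, 13, 20, 17, 10]

n = sum(frequencies)
expected = n / len(frequencies)  # equal expected count in every sector under H0

# Goodness-of-fit statistic: sum of (observed - expected)^2 / expected
x2 = sum((obs - expected) ** 2 / expected for obs in frequencies)
df = len(frequencies) - 1

CRITICAL_10_DF7 = 12.017  # chi-square critical value, df = 7, upper-tail area .10
print(f"n = {n}, X^2 = {x2:.2f}, df = {df}")
print("reject H0" if x2 > CRITICAL_10_DF7 else "fail to reject H0")
```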
CR13.18 The authors of the paper “Inadequate Physician Knowledge of the Effects of Diet on Blood Lipids
and Lipoproteins” (Nutrition Journal [2003]: 19–26)
summarize the responses to a questionnaire on basic
knowledge of nutrition that was mailed to 6000 physicians selected at random from a list of physicians licensed
in the United States. Sixteen percent of those who received the questionnaire completed and returned it. The
authors report that 26 of 120 cardiologists and 222 of
419 internists did not know that carbohydrate was the
diet component most likely to raise triglycerides.
a. Estimate the difference between the proportion of
cardiologists and the proportion of internists who
did not know that carbohydrate was the diet component most likely to raise triglycerides using a 95%
confidence interval.
b. What potential source of bias might limit your ability to generalize the estimate from Part (a) to the
populations of all cardiologists and all internists?
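For Part (a), the large-sample interval for a difference in proportions can be sketched as follows. The function name `diff_proportion_ci` is hypothetical, and the sketch treats the two groups of respondents as independent random samples, which is exactly the assumption Part (b) asks you to question.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Large-sample confidence interval for p1 - p2 (z = 1.96 gives 95%)."""
    p1 = x1 / n1
    p2 = x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * standard_error, diff + z * standard_error


# Sample 1 = cardiologists, sample 2 = internists
low, high = diff_proportion_ci(26, 120, 222, 419)
print(f"95% CI for p(cardiologists) - p(internists): ({low:.3f}, {high:.3f})")
```

Both endpoints are negative, so the interval suggests that the proportion who did not know is smaller for cardiologists than for internists among physicians like those who responded.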
CHAPTER 14
Multiple Regression Analysis
The general objective of regression analysis is to model the
relationship between a dependent variable y and one or
more independent (i.e., predictor or explanatory) variables.
The simple linear regression model y = α + βx + e, discussed in Chapter 13, has been used successfully by many
investigators in a wide variety of disciplines to relate y to a
single predictor variable x. In many situations, the relationship between y and any single predictor variable is not
strong, but knowing the values of several independent variables may considerably reduce uncertainty about the associated y value. For example, some variation in house prices
in a large city can certainly be attributed to house size, but
knowledge of size by itself would not usually enable a bank
appraiser to accurately predict a home’s value. Price is also determined to some extent
by other variables, such as age, lot size, number of bedrooms and bathrooms, and
distance from schools.
In this chapter, we extend the regression methodology developed in the previous
chapter to multiple regression models, which include at least two predictor variables.