Inference About Simple Regression
14.25 Example 2.17 (p. 49) described the heights of a sample of married British women. The ages of the women and their husbands were available in the same dataset for n = 170 couples. For the sample, the linear regression line relating y = husband’s age and x = wife’s age is ŷ = 3.59 + 0.967x. The following results computed with Minitab give a confidence interval and prediction interval for husband’s age when wife’s age is 40:
Fit      SE Fit   95.0% CI           95.0% PI
42.258   0.313    (41.641, 42.875)   (34.200, 50.316)
a. Verify that the “Fit” is consistent with the predicted value that would be given by the regression equation.
b. Interpret the “95% CI” given by Minitab. Be specific about what the interval estimates.
c. Interpret the “95% PI” given by Minitab. Be specific about what the interval estimates.
d. Explain why the “95% PI” is much wider than the “95% CI.”
14.26 Refer to Exercise 14.25. Using the general format for a 95% confidence interval, verify the confidence interval for the mean given by Minitab. Notice that the standard error of the “Fit” is given.
14.27 Suppose a linear regression analysis of the relationship between y = systolic blood pressure and x = age is done for women between 40 and 60 years old. For women who are 45 years old, a 90% confidence interval for E(Y) is determined to be 128.2 to 131.3. Explain why it is incorrect to conclude that about 90% of women who are 45 years old have a systolic blood pressure in this range. Write a sentence that correctly interprets the interval.
14.28 ● There are five conditions listed at the beginning of Section 14.5 that should be at least approximately true for linear regression. Which of the conditions can be checked by using each of the following methods? In each case, list all of them that can be checked.
a. Drawing a histogram of the residuals.
b. Drawing a scatterplot of the residuals versus the x values.
c. Learning how the data were collected.
d. Drawing a scatterplot of the raw data, y versus x.
14.29 ● ◆ Refer to Exercise 14.24 about a linear regression for y = pulse after marching in place and x = pulse before marching in place. The figure for this exercise is a plot of residuals versus the pulse before marching for a sample of 40 students. Discuss what the plot indicates about Conditions 1, 2, and 3 for linear regression listed at the beginning of Section 14.5.
[Figure for Exercise 14.29: plot of residuals versus pulse before marching (vertical axis: Residual; horizontal axis: Pulse before marching)]
14.30 ● ◆ The figure for this exercise is a histogram of the residuals for a linear regression relating y = height (in.) and x = foot length (cm) for a sample of college men. Discuss what the histogram indicates about Conditions 2 and 4 for linear regression listed at the beginning of Section 14.5. (Data source: heightfoot dataset on the CD for this book.)
[Figure for Exercise 14.30: histogram of residuals (horizontal axis: Residual; vertical axis: Frequency)]
● Basic skills
◆ Dataset available but not required
14.31 The figure accompanying this exercise is a histogram of the residuals for a simple linear regression. What does this plot indicate about the necessary conditions for doing a linear regression? Be specific about which of the five necessary conditions are verified in this figure.
[Figure for Exercise 14.31: histogram of residuals (horizontal axis: Residual; vertical axis: Frequency)]
Bold-numbered exercises answered in the back
Chapter 14
14.32 Plot residuals versus x for the data given in Exercise 14.13.
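For exercises like this one, residuals are the observed y values minus the values predicted by the least-squares line. The data for Exercise 14.13 are not reproduced here, so the sketch below uses made-up numbers; the helper names `least_squares` and `residuals` are also hypothetical. It is one possible way to get the residuals before plotting them against x, not a prescribed method.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residuals(x, y):
    """Return observed y minus predicted y for each data point."""
    b0, b1 = least_squares(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical illustration data (NOT the data from Exercise 14.13).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Each (x, residual) pair is one point on a residuals-versus-x plot.
for xi, ei in zip(x, residuals(x, y)):
    print(f"x = {xi}: residual = {ei:+.3f}")
```

The printed pairs can be passed to any plotting tool. A patternless band around zero is what the regression conditions lead one to expect.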
14.33 ◆ Observed data along with the sample regression line for the relationship between body weight (lb.) and neck girth (in.) in 19 female bears of various ages are shown in the figure below. (Note: The data are in the dataset bears-female on the CD for this book.)
Weight = –158.8 + 16.95 Neck girth
s = 40.1264   R² = 79.3%
[Figure for Exercise 14.33: scatterplot with regression line (vertical axis: Weight (lb); horizontal axis: Neck girth (in.))]
a. Which of the necessary conditions for linear regression appears to be violated in this dataset?
b. What corrective actions would you consider in order to properly estimate the relationship between body weight and neck girth?
c. Sketch a histogram that illustrates the pattern of the distribution of the residuals for this problem. Your sketch does not have to be accurate in the numerical details but should correctly show the pattern.
14.34 The figure for this exercise shows data for the relationship between the average stopping distance (ft) of a car when the brakes are applied and vehicle speed (mph). The regression line for these data is also shown on the plot. (Note: The raw data were given in Exercise 5.7.)
Distance = –44.2 + 5.67 Speed
s = 34.2   R² = 95.1%
[Figure for Exercise 14.34: scatterplot with regression line (vertical axis: Distance (ft); horizontal axis: Speed (mph))]
a. Which of the five necessary conditions for linear regression appears to be violated in this situation?
b. What corrective action would you take to correctly estimate the connection between stopping distance and vehicle speed?
c. Make a sketch that illustrates the pattern of a plot of residuals versus speed for the data shown in the figure in this exercise. Your sketch does not need to be numerically accurate but should correctly show the pattern of the plot.
14.35 Data for y = hours of sleep the previous day and x = hours of studying the previous day for n = 116 college students were shown in Figure 5.14 (p. 168) and described in Example 5.14. Some regression results for those data are as follows:
The regression equation is
Sleep = 7.56 – 0.269 Study
Predictor   Coef       SE Coef   T       P
Constant    7.5555     0.2239    33.74   0.000
study       –0.26917   0.06616   ——      0.000
S = 1.509   R-Sq = 12.7%   R-Sq(adj) = 11.9%
a. What is the estimated mean decrease in hours of sleep per 1-hour increase in hours of studying? What notation is used for this value?
b. Calculate an approximate 95% confidence interval for β₁. Write a sentence that interprets this interval.
c. Using proper statistical notation, write null and alternative hypotheses for assessing the statistical significance of the relationship.
d. We omitted from the output the t-statistic for testing the hypotheses of part (c). Compute the value of this t-statistic using other information shown in the output. What are the degrees of freedom for this t-statistic?
e. Is there a statistically significant relationship between hours of sleep and hours of studying? Justify your answer on the basis of information shown in the output.
14.36 Refer to Exercise 14.35 about hours of sleep and hours of study.
a. What is the value of the standard deviation from the regression line? Write a sentence that interprets this value.
b. Calculate the predicted value of hours of sleep the previous day for a student who studied 4 hours the previous day.
c. Using the Empirical Rule, determine an interval that describes hours of sleep for approximately 95% of students who studied 4 hours the previous day.
d. The value of R² is given as 12.7%. Write a sentence that interprets this value.
14.37 Refer to Exercises 14.35 and 14.36 about hours of sleep and hours spent studying. What is the intercept of the regression line? Does this value have a useful interpretation in the context of this problem? If so, what is the interpretation? If not, why not?
14.38 Regression results for the relationship between y = hours of sleep the previous day and x = hours spent studying the previous day were given in Exercise 14.35. The figure for this exercise is a plot of residuals versus hours spent studying. What does the plot indicate about the necessary conditions for doing a linear regression? Be specific about which of the five conditions given in Section 14.5 are verified by this plot.
[Figure for Exercise 14.38: plot of residuals versus hours of study (vertical axis: Residual; horizontal axis: Hours of study)]
14.39 Exercise 14.35 gave linear regression results for the relationship between y = hours of sleep the previous day and x = hours spent studying the previous day. Following is Minitab output showing a confidence interval and a prediction interval for hours of sleep when hours of studying = 3 hours:
Fit     SE Fit   95.0% CI         95.0% PI
6.748   0.142    (6.466, 7.029)   (3.746, 9.750)
Write down the interval given for predicting the hours of sleep when hours of studying is 3 hours. Give two different interpretations of this interval.
14.40 Refer to the output given in Exercise 14.8 about handspan and height. Compute a 90% confidence interval for the population slope. Write a sentence that interprets this interval.
14.41 The following data are x = average on five quizzes before the midterm exam and y = score on the midterm exam for n = 11 students randomly selected from a multiple-section statistics class of about 950 students:
x = Quizzes   80   68   94   72   74   83   56   68   65   75   88
y = Exam      72   71   96   77   82   72   58   83   78   80   92
a. Plot the data, and describe the important features of this plot.
b. Using statistical software, calculate the regression line for this sample.
c. What is the predicted midterm exam score for a student with a quiz average equal to 75?
d. With statistical software, determine a 50% prediction interval for the midterm exam score of a student whose quiz average was 75.
14.42 ◆ This exercise refers to the following Minitab output, relating y = son’s height to x = father’s height for a sample of n = 76 college males. (Note: The data are in the dataset UCDavis1 on the CD for this book.)
The regression equation is
Height = 30.0 + 0.576 dadheight
76 cases used 3 cases contain missing values
Predictor   Coef      SE Coef   T      P
Constant    29.981    5.129     5.85   0.000
dadheigh    0.57568   0.07445   7.73   0.000
S = 2.657   R-Sq = 44.7%
Predicted Values [for dad’s heights of 65, 70, and 74]
Fit      SE Fit   95.0% CI           95.0% PI
67.400   0.415    (66.574, 68.226)   (62.041, 72.759)
70.279   0.318    (69.645, 70.913)   (64.946, 75.612)
72.581   0.494    (71.596, 73.566)   (67.195, 77.967)
a. What is the equation for the regression line?
b. Identify the value of the t-statistic for testing whether or not the slope is 0. Verify that the value is correct using the formula for the t-statistic and the information provided by Minitab for the parts that go into the formula.
c. State and test the hypotheses about whether or not the population slope is 0. Use relevant information provided in the output.
d. Compute a 95% confidence interval for β₁, the slope of the relationship in the population. Write a sentence that interprets this interval.
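The arithmetic behind parts (b) and (d) can be sketched in a few lines. The numbers below are copied from the Minitab output for Exercise 14.42; the variable names are hypothetical, and the multiplier of 2 is an assumption standing in for the exact t* value with n − 2 = 74 degrees of freedom (which is close to 2), so the interval is only approximate.

```python
# Values copied from the Minitab output in Exercise 14.42.
slope = 0.57568     # "Coef" for dadheigh (the sample slope)
se_slope = 0.07445  # "SE Coef" for dadheigh
n = 76              # number of cases used

# t-statistic for testing whether the population slope is 0:
# (sample slope - hypothesized value) / standard error of the slope.
t_stat = (slope - 0) / se_slope
df = n - 2  # two coefficients are estimated in simple regression

# Assumption: 2 is a rough stand-in for the exact t* multiplier.
multiplier = 2
ci = (slope - multiplier * se_slope, slope + multiplier * se_slope)

print(f"t = {t_stat:.2f} with df = {df}")
print(f"approximate 95% CI for the slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The computed t-statistic can be compared with the value in the "T" column of the output. For an exact interval, replace the multiplier with the t* value from a table or software.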
14.43 ◆ Refer to Exercise 14.42.
a. What is the value of R² for the observed linear relationship between height and father’s height? Write a sentence that interprets this value.
b. What is the value of the correlation coefficient r?
14.44 ◆ Refer to Exercises 14.42 and 14.43. The output provides prediction intervals and confidence intervals for father’s heights of 65, 70, and 74 inches.
a. Verify that the “Fit” given by Minitab for father’s height of 65 inches is consistent with the predicted height that would be given by the regression equation.
b. Write down the interval Minitab provided for predicting an individual son’s height if his father’s height is 70 inches. Provide two different interpretations for the interval.
c. Write a sentence that interprets the “95% CI” given for a son’s height when the father’s height is 74 inches.
d. Explain why the prediction interval is much wider than the corresponding confidence interval for each father’s height provided.
e. The accompanying figure is a plot of the residuals versus father’s height. Which of the five necessary conditions for regression listed in Section 14.5 appears to be violated? What corrective actions should you consider?
[Figure for Exercise 14.44(e): plot of residuals versus father’s height (vertical axis: Residual; horizontal axis: Dad height (in.))]
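One way to see where the two intervals in Exercise 14.44 come from is to rebuild them from the reported quantities. The sketch below uses the standard simple-regression relationship in which a prediction interval combines the standard deviation from the regression line (s) with the standard error of the fit. The values are copied from the Minitab output in Exercise 14.42 for a father’s height of 70 inches. The variable names are hypothetical, and backing the multiplier out of the reported confidence interval is a shortcut chosen here to avoid a t-table lookup.

```python
import math

# Values copied from the Minitab output in Exercise 14.42 (father's height = 70).
fit = 70.279       # "Fit"
se_fit = 0.318     # "SE Fit"
ci_upper = 70.913  # upper end of the reported 95.0% CI
s = 2.657          # "S", the standard deviation from the regression line

# Shortcut (an assumption of this sketch): recover the multiplier Minitab used
# from the reported CI instead of looking up t* for df = 74.
t_star = (ci_upper - fit) / se_fit

# The prediction interval adds the variability of an individual (s^2)
# to the uncertainty in the estimated mean (se_fit^2).
se_pred = math.sqrt(s**2 + se_fit**2)

ci = (fit - t_star * se_fit, fit + t_star * se_fit)
pi = (fit - t_star * se_pred, fit + t_star * se_pred)

print(f"multiplier = {t_star:.3f}")
print(f"CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"PI: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Because `se_pred` includes s, it is always larger than `se_fit`, so the prediction interval is wider than the confidence interval. The results can be compared with the "95.0% CI" and "95.0% PI" columns of the output; small differences come from rounding in the reported values.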
14.45 The five steps for hypothesis testing were given in Chapters 12 and 13. Describe those steps as they apply to testing whether there is a relationship between two variables in the simple regression model.
14.46 Explain why rejecting H₀: β₁ = 0 in a simple linear regression model does not prove that the relationship is linear. To answer this question, you might find it helpful to consider the figure in Exercise 14.34, which shows stopping distance and vehicle speed for automobiles.
Dataset Exercises
Datasets are required to solve these exercises and can be found at http://1pass.thomson.com or on your CD.
14.47 Use the dataset letters for this exercise. A sample of 63 students wrote as many letters of the alphabet in order, as capital letters, as they could in 15 seconds using their dominant hand and then repeated this task using their nondominant hand. The variables dom and nondom contain the raw data for the results.
a. Plot y = dom versus x = nondom. Describe the important features of the plot.
b. Compute the simple regression equation for the relationship. What is the equation?
c. What are the values of the standard deviation from the regression line and R²? Interpret these values in the context of this problem.
d. Determine a 95% confidence interval for the population slope. Write a sentence that interprets this interval.
e. Consider the statement, “On average, a student can write about 23 more letters in 15 seconds with the dominant hand than with the nondominant hand.” What regression equation would accompany this statement? Based on your answers to parts (b) and (d), explain why this statement is reasonable.
14.48 Refer to the previous exercise about letters written with the dominant (y) and nondominant (x) hands.
a. Plot residuals versus x = nondom. What does this plot indicate about conditions for using the linear regression model?
b. Create a histogram of the residuals. What does this plot indicate about the conditions for using the linear regression model?
14.49 Use the bears-female dataset for this exercise. Weights (lb) and chest girths (in.) are given for n = 19 female wild bears. The corresponding variable names are Weight and Chest.
a. Plot y = Weight versus x = Chest. Describe the important features of the plot.
b. Compute a simple linear regression equation for y = Weight and x = Chest.
c. What is the value of R² for this relationship? Write a sentence interpreting this value.
d. What is the predicted weight for a bear with a chest girth of 40 inches?
e. Compute a 95% prediction interval for the weight of a bear with a chest girth of 40 inches. Write a sentence interpreting this interval.
f. Compute a 95% confidence interval for the mean weight of bears with a chest girth of 40 inches. Write a sentence interpreting this interval.
14.50 Use the dataset heightfoot for this exercise. Heights (in.) and foot lengths (cm) are given for 33 men.
a. Plot y = height versus x = foot length. What important features are evident in the plot?
b. Omit any outliers evident in the plot in part (a), and compute the linear regression line for predicting y = height from x = foot length.
c. Determine a 90% prediction interval for the height of a man whose foot length is 28 centimeters.
d. Discuss whether height can be accurately predicted from foot length. Use regression results to justify your answer.
e. For the data used for part (b), plot residuals versus x = foot length. What does this plot indicate about the necessary conditions for doing a linear regression?
14.51 Refer to Exercise 14.50 about the relationship between height and foot length.
a. Do not omit any outliers. Use the complete dataset to determine a 90% prediction interval