5 SD Module 2: Sampling Distribution for the Difference in Two Sample Proportions


Inference About Simple Regression




14.25 Example 2.17 (p. 49) described the heights of a sample of married British women. The ages of the women and their husbands were available in the same dataset for n = 170 couples. For the sample, the linear regression line relating y = husband’s age and x = wife’s age is ŷ = 3.59 + 0.967x. The following results computed with Minitab give a confidence interval and prediction interval for husband’s age when wife’s age is 40:

Fit      SE Fit   95.0% CI           95.0% PI
42.258   0.313    (41.641, 42.875)   (34.200, 50.316)

a. Verify that the “Fit” is consistent with the predicted value that would be given by the regression equation.
b. Interpret the “95% CI” given by Minitab. Be specific about what the interval estimates.
c. Interpret the “95% PI” given by Minitab. Be specific about what the interval estimates.
d. Explain why the “95% PI” is much wider than the “95% CI.”

14.26 Refer to Exercise 14.25. Using the general format for a 95% confidence interval, verify the confidence interval for the mean given by Minitab. Notice that the standard error of the “Fit” is given.
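The checks requested in Exercises 14.25(a) and 14.26 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are invented here, and because the degrees of freedom are large (n – 2 = 168) the standard normal multiplier is used in place of the exact t multiplier, so the endpoints agree with Minitab only to about two decimal places.

```python
from statistics import NormalDist

# Values quoted in Exercise 14.25 (fitted line and Minitab output).
intercept, slope = 3.59, 0.967
wife_age = 40
fit, se_fit = 42.258, 0.313

# Exercise 14.25(a): predicted value from the rounded regression equation.
predicted = intercept + slope * wife_age

# Exercise 14.26: general format is estimate +/- multiplier * standard error.
# ASSUMPTION: with 168 degrees of freedom the t multiplier is close to the
# standard normal value, so z* (about 1.96) stands in for t*.
multiplier = NormalDist().inv_cdf(0.975)
ci_low = fit - multiplier * se_fit
ci_high = fit + multiplier * se_fit

print(f"predicted from equation: {predicted:.2f}")
print(f"approximate 95% CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```

The small gap between the predicted value and the Minitab “Fit” comes from using the rounded coefficients of the printed equation.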

14.27 Suppose a linear regression analysis of the relationship between y = systolic blood pressure and x = age is done for women between 40 and 60 years old. For women who are 45 years old, a 90% confidence interval for E(Y) is determined to be 128.2 to 131.3. Explain why it is incorrect to conclude that about 90% of women who are 45 years old have a systolic blood pressure in this range. Write a sentence that correctly interprets the interval.



Section 14.5

14.28 ● There are five conditions listed at the beginning of Section 14.5 that should be at least approximately true for linear regression. Which of the conditions can be checked by using each of the following methods? In each case, list all of them that can be checked.
a. Drawing a histogram of the residuals.
b. Drawing a scatterplot of the residuals versus the x values.
c. Learning how the data were collected.
d. Drawing a scatterplot of the raw data, y versus x.

14.29 ● ◆ Refer to Exercise 14.24 about a linear regression for y = pulse after marching in place and x = pulse before marching in place. The figure for this exercise is a plot of residuals versus the pulse before marching for a sample of 40 students. Discuss what the plot indicates about Conditions 1, 2, and 3 for linear regression listed at the beginning of Section 14.5.

[Figure for Exercise 14.29: plot of residuals versus pulse before marching. Horizontal axis: Pulse before marching; vertical axis: Residual.]

14.30 ● ◆ The figure for this exercise is a histogram of the residuals for a linear regression relating y = height (in.) and x = foot length (cm) for a sample of college men. Discuss what the histogram indicates about Conditions 2 and 4 for linear regression listed at the beginning of Section 14.5. (Data source: heightfoot dataset on the CD for this book.)

[Figure for Exercise 14.30: histogram of the residuals. Horizontal axis: Residual; vertical axis: Frequency.]

14.31 The figure accompanying this exercise is a histogram of the residuals for a simple linear regression. What does this plot indicate about the necessary conditions for doing a linear regression? Be specific about which of the five necessary conditions are verified in this figure.

[Figure for Exercise 14.31: histogram of the residuals. Horizontal axis: Residual; vertical axis: Frequency.]

● Basic skills   ◆ Dataset available but not required   Bold-numbered exercises answered in the back






14.32 Plot residuals versus x for the data given in Exercise 14.13.

Chapter Exercises

14.33 ◆ Observed data along with the sample regression line for the relationship between body weight (lb.) and neck girth (in.) in 19 female bears of various ages are shown in the figure below. (Note: The data are in the dataset bears-female on the CD for this book.)

[Figure for Exercise 14.33: scatterplot with regression line. Weight = –158.8 + 16.95 Neck girth; s = 40.1264; R² = 79.3%. Horizontal axis: Neck girth (in.); vertical axis: Weight (lb).]

a. Which of the necessary conditions for linear regression appears to be violated in this dataset?
b. What corrective actions would you consider in order to properly estimate the relationship between body weight and neck girth?
c. Sketch a histogram that illustrates the pattern of the distribution of the residuals for this problem. Your sketch does not have to be accurate in the numerical details but should correctly show the pattern.

14.34 The figure for this exercise shows data for the relationship between the average stopping distance (ft) of a car when the brakes are applied and vehicle speed (mph). The regression line for these data is also shown on the plot. (Note: The raw data were given in Exercise 5.7.)

[Figure for Exercise 14.34: scatterplot with regression line. Distance = –44.2 + 5.67 Speed; s = 34.2; R² = 95.1%. Horizontal axis: Speed (mph); vertical axis: Distance (ft).]

a. Which of the five necessary conditions for linear regression appears to be violated in this situation?
b. What corrective action would you take to correctly estimate the connection between stopping distance and vehicle speed?
c. Make a sketch that illustrates the pattern of a plot of residuals versus speed for the data shown in the figure in this exercise. Your sketch does not need to be numerically accurate but should correctly show the pattern of the plot.

14.35 Data for y = hours of sleep the previous day and x = hours of studying the previous day for n = 116 college students were shown in Figure 5.14 (p. 168) and described in Example 5.14. Some regression results for those data are as follows:

The regression equation is
Sleep = 7.56 – 0.269 Study

Predictor   Coef       SE Coef   T           P
Constant    7.5555     0.2239    33.74       0.000
Study       –0.26917   0.06616   (omitted)   0.000

S = 1.509   R-Sq = 12.7%   R-Sq(adj) = 11.9%



a. What is the estimated mean decrease in hours of sleep per 1-hour increase in hours of studying? What notation is used for this value?
b. Calculate an approximate 95% confidence interval for β1. Write a sentence that interprets this interval.
c. Using proper statistical notation, write null and alternative hypotheses for assessing the statistical significance of the relationship.
d. We omitted from the output the t-statistic for testing the hypotheses of part (c). Compute the value of this t-statistic using other information shown in the output. What are the degrees of freedom for this t-statistic?
e. Is there a statistically significant relationship between hours of sleep and hours of studying? Justify your answer on the basis of information shown in the output.
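The arithmetic behind parts (b) and (d) of Exercise 14.35 can be sketched as follows. The variable names are illustrative, and the multiplier of 2 for the "approximate" 95% interval is an assumption made here, a common rounding of the t multiplier for large samples.

```python
# Values quoted in the Minitab output for Exercise 14.35.
n = 116
b1 = -0.26917       # sample slope (Coef for Study)
se_b1 = 0.06616     # standard error of the slope (SE Coef)

# t-statistic for H0: beta1 = 0 is (sample slope - 0) / standard error.
t_stat = (b1 - 0) / se_b1
df = n - 2          # simple regression estimates an intercept and a slope

# ASSUMPTION: the approximate 95% interval uses a multiplier of 2.
multiplier = 2
ci_low = b1 - multiplier * se_b1
ci_high = b1 + multiplier * se_b1

print(f"t = {t_stat:.2f} with df = {df}")
print(f"approximate 95% CI for the slope: ({ci_low:.3f}, {ci_high:.3f})")
```

An interval that lies entirely below zero points in the same direction as the small p-value reported for the slope.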

14.36 Refer to Exercise 14.35 about hours of sleep and hours of study.
a. What is the value of the standard deviation from the regression line? Write a sentence that interprets this value.
b. Calculate the predicted value of hours of sleep the previous day for a student who studied 4 hours the previous day.
c. Using the Empirical Rule, determine an interval that describes hours of sleep for approximately 95% of students who studied 4 hours the previous day.
d. The value of R² is given as 12.7%. Write a sentence that interprets this value.
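Parts (b) and (c) of Exercise 14.36 combine the regression equation with the Empirical Rule. A minimal sketch, assuming the rounded equation from Exercise 14.35 and treating the standard deviation from the regression line as the spread of individual students around the predicted mean (the variable names are invented for illustration):

```python
# Values quoted in the Minitab output for Exercise 14.35.
intercept, slope = 7.56, -0.269   # rounded regression equation
s = 1.509                         # standard deviation from the regression line
study_hours = 4

# Part (b): predicted hours of sleep from the regression equation.
predicted_sleep = intercept + slope * study_hours

# Part (c): the Empirical Rule places about 95% of values within
# two standard deviations of the mean.
low = predicted_sleep - 2 * s
high = predicted_sleep + 2 * s

print(f"predicted sleep: {predicted_sleep:.3f} hours")
print(f"about 95% of students: {low:.3f} to {high:.3f} hours")
```

This rough interval ignores the uncertainty in the estimated line itself, which is why a software prediction interval at the same x value would be slightly wider.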

14.37 Refer to Exercises 14.35 and 14.36 about hours of sleep and hours spent studying. What is the intercept of the regression line? Does this value have a useful interpretation in the context of this problem? If so, what is the interpretation? If not, why not?

14.38 Regression results for the relationship between y = hours of sleep the previous day and x = hours spent studying the previous day were given in Exercise 14.35. The figure for this exercise is a plot of residuals versus hours spent studying. What does the plot indicate about the necessary conditions for doing a linear regression? Be specific about which of the five conditions given in Section 14.5 are verified by this plot.

[Figure for Exercise 14.38: plot of residuals versus hours of study. Horizontal axis: Hours of study; vertical axis: Residual.]

14.39 Exercise 14.35 gave linear regression results for the relationship between y = hours of sleep the previous day and x = hours spent studying the previous day. Following is Minitab output showing a confidence interval and a prediction interval for hours of sleep when hours of studying = 3 hours:

Fit     SE Fit   95.0% CI         95.0% PI
6.748   0.142    (6.466, 7.029)   (3.746, 9.750)

Write down the interval given for predicting the hours of sleep when hours of studying is 3 hours. Give two different interpretations of this interval.

14.40 Refer to the output given in Exercise 14.8 about handspan and height. Compute a 90% confidence interval for the population slope. Write a sentence that interprets this interval.

14.41 The following data are x = average on five quizzes before the midterm exam and y = score on the midterm exam for n = 11 students randomly selected from a multiple-section statistics class of about 950 students:

x = Quizzes   80   68   94   72   74   83   56   68   65   75   88
y = Exam      72   71   96   77   82   72   58   83   78   80   92

a. Plot the data, and describe the important features of this plot.
b. Using statistical software, calculate the regression line for this sample.
c. What is the predicted midterm exam score for a student with a quiz average equal to 75?
d. With statistical software, determine a 50% prediction interval for the midterm exam score of a student whose quiz average was 75.

14.42 ◆ This exercise refers to the following Minitab output, relating y = son’s height to x = father’s height for a sample of n = 76 college males. (Note: The data are in the dataset UCDavis1 on the CD for this book.)

The regression equation is
Height = 30.0 + 0.576 dadheight

76 cases used; 3 cases contain missing values

Predictor   Coef      SE Coef   T      P
Constant    29.981    5.129     5.85   0.000
dadheigh    0.57568   0.07445   7.73   0.000

S = 2.657   R-Sq = 44.7%

Predicted Values [for dad’s heights of 65, 70, and 74]

Fit      SE Fit   95.0% CI           95.0% PI
67.400   0.415    (66.574, 68.226)   (62.041, 72.759)
70.279   0.318    (69.645, 70.913)   (64.946, 75.612)
72.581   0.494    (71.596, 73.566)   (67.195, 77.967)

a. What is the equation for the regression line?
b. Identify the value of the t-statistic for testing whether or not the slope is 0. Verify that the value is correct using the formula for the t-statistic and the information provided by Minitab for the parts that go into the formula.
c. State and test the hypotheses about whether or not the population slope is 0. Use relevant information provided in the output.
d. Compute a 95% confidence interval for β1, the slope of the relationship in the population. Write a sentence that interprets this interval.
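For the quiz and exam data of Exercise 14.41, the regression line of part (b) and the prediction of part (c) could also be checked without a statistics package. The following is a minimal sketch using the usual least-squares formulas; the variable names are illustrative and it is not meant to replace the software output the exercise asks for.

```python
# Quiz averages (x) and midterm exam scores (y) from Exercise 14.41.
quizzes = [80, 68, 94, 72, 74, 83, 56, 68, 65, 75, 88]
exams = [72, 71, 96, 77, 82, 72, 58, 83, 78, 80, 92]

n = len(quizzes)
x_bar = sum(quizzes) / n
y_bar = sum(exams) / n

# Least-squares slope: sum of cross-deviations divided by the
# sum of squared deviations of x.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(quizzes, exams))
sxx = sum((x - x_bar) ** 2 for x in quizzes)
b1 = sxy / sxx

# The least-squares line passes through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

# Part (c): predicted exam score for a quiz average of 75.
predicted_exam = b0 + b1 * 75

print(f"Exam = {b0:.2f} + {b1:.3f} Quizzes")
print(f"predicted exam score at quiz average 75: {predicted_exam:.1f}")
```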

14.43 ◆ Refer to Exercise 14.42.
a. What is the value of R² for the observed linear relationship between height and father’s height? Write a sentence that interprets this value.
b. What is the value of the correlation coefficient r?
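In simple linear regression the correlation coefficient can be recovered from R², taking its sign from the sample slope. A short sketch with the values from the Exercise 14.42 output (variable names are illustrative):

```python
import math

r_squared = 0.447   # R-Sq = 44.7% from the Exercise 14.42 output
slope = 0.57568     # sample slope from the same output

# The magnitude of r is the square root of R^2; the sign matches the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {r:.3f}")
```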

14.44 ◆ Refer to Exercises 14.42 and 14.43. The output provides prediction intervals and confidence intervals for father’s heights of 65, 70, and 74 inches.
a. Verify that the “Fit” given by Minitab for father’s height of 65 inches is consistent with the predicted height that would be given by the regression equation.
b. Write down the interval Minitab provided for predicting an individual son’s height if his father’s height is 70 inches. Provide two different interpretations for the interval.
c. Write a sentence that interprets the “95% CI” given for a son’s height when the father’s height is 74 inches.
d. Explain why the prediction interval is much wider than the corresponding confidence interval for each father’s height provided.
e. The accompanying figure is a plot of the residuals versus father’s height. Which of the five necessary conditions for regression listed in Section 14.5 appears to be violated? What corrective actions should you consider?

[Figure for Exercise 14.44(e): plot of residuals versus father’s height. Horizontal axis: Dad height (in.); vertical axis: Residual.]
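The contrast in part (d) can be made concrete with the output for a father’s height of 70 inches. The sketch below uses the standard formulas for the two intervals. The t multiplier is an assumption taken from a t table for n – 2 = 74 degrees of freedom (about 1.99), so the endpoints match Minitab only to about two decimal places.

```python
import math

# Values quoted in the Exercise 14.42 output for a father's height of 70 in.
fit, se_fit = 70.279, 0.318
s = 2.657   # standard deviation from the regression line

# ASSUMPTION: 95% t multiplier for 74 degrees of freedom, from a t table.
t_star = 1.9925

# Confidence interval for the mean: reflects only the uncertainty
# in the estimated regression line.
ci_margin = t_star * se_fit

# Prediction interval for one individual: adds the variation of
# individuals around the line (s^2) to the squared standard error of the fit.
pi_margin = t_star * math.sqrt(s**2 + se_fit**2)

ci = (fit - ci_margin, fit + ci_margin)
pi = (fit - pi_margin, fit + pi_margin)

print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% PI: ({pi[0]:.3f}, {pi[1]:.3f})")
```

Because s is much larger than the standard error of the fit, the prediction margin is dominated by the individual-to-individual variation, which does not shrink as the sample grows.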



14.45 The five steps for hypothesis testing were given in Chapters 12 and 13. Describe those steps as they apply to testing whether there is a relationship between two variables in the simple regression model.

14.46 Explain why rejecting H0: β1 = 0 in a simple linear regression model does not prove that the relationship is linear. To answer this question, you might find it helpful to consider the figure in Exercise 14.34, which shows stopping distance and vehicle speed for automobiles.

Dataset Exercises

Datasets are required to solve these exercises and can be found at http://1pass.thomson.com or on your CD.

14.47 Use the dataset letters for this exercise. A sample of 63 students wrote as many letters of the alphabet in order, as capital letters, as they could in 15 seconds using their dominant hand and then repeated this task using their nondominant hand. The variables dom and nondom contain the raw data for the results.
a. Plot y = dom versus x = nondom. Describe the important features of the plot.
b. Compute the simple regression equation for the relationship. What is the equation?
c. What are the values of the standard deviation from the regression line and R²? Interpret these values in the context of this problem.
d. Determine a 95% confidence interval for the population slope. Write a sentence that interprets this interval.
e. Consider the statement, “On average, a student can write about 23 more letters in 15 seconds with the dominant hand than with the nondominant hand.” What regression equation would accompany this statement? Based on your answers to parts (b) and (d), explain why this statement is reasonable.

14.48 Refer to the previous exercise about letters written with the dominant (y) and nondominant (x) hands.
a. Plot residuals versus x = nondom. What does this plot indicate about conditions for using the linear regression model?
b. Create a histogram of the residuals. What does this plot indicate about the conditions for using the linear regression model?

14.49 Use the bears-female dataset for this exercise. Weights (lb) and chest girths (in.) are given for n = 19 female wild bears. The corresponding variable names are Weight and Chest.
a. Plot y = Weight versus x = Chest. Describe the important features of the plot.
b. Compute a simple linear regression equation for y = Weight and x = Chest.
c. What is the value of R² for this relationship? Write a sentence interpreting this value.
d. What is the predicted weight for a bear with a chest girth of 40 inches?
e. Compute a 95% prediction interval for the weight of a bear with a chest girth of 40 inches. Write a sentence interpreting this interval.
f. Compute a 95% confidence interval for the mean weight of bears with a chest girth of 40 inches. Write a sentence interpreting this interval.

14.50 Use the dataset heightfoot for this exercise. Heights (in.) and foot lengths (cm) are given for 33 men.
a. Plot y = height versus x = foot length. What important features are evident in the plot?
b. Omit any outliers evident in the plot in part (a), and compute the linear regression line for predicting y = height from x = foot length.
c. Determine a 90% prediction interval for the height of a man whose foot length is 28 centimeters.
d. Discuss whether height can be accurately predicted from foot length. Use regression results to justify your answer.
e. For the data used for part (b), plot residuals versus x = foot length. What does this plot indicate about the necessary conditions for doing a linear regression?

14.51 Refer to Exercise 14.50 about the relationship between height and foot length.
a. Do not omit any outliers. Use the complete dataset to determine a 90% prediction interval


