
# 5 SD Module 2: Sampling Distribution for the Difference in Two Sample Proportions


Inference About Simple Regression


14.25 Example 2.17 (p. 49) described the heights of a sample of married British women. The ages of the women and their husbands were available in the same dataset for n = 170 couples. For the sample, the linear regression line relating y = husband’s age and x = wife’s age is ŷ = 3.59 + 0.967x. The following results computed with Minitab give a confidence interval and prediction interval for husband’s age when wife’s age is 40:

| Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|
| 42.258 | 0.313 | (41.641, 42.875) | (34.200, 50.316) |

a. Verify that the “Fit” is consistent with the predicted value that would be given by the regression equation.

b. Interpret the “95% CI” given by Minitab. Be specific about what the interval estimates.

c. Interpret the “95% PI” given by Minitab. Be specific about what the interval estimates.

d. Explain why the “95% PI” is much wider than the “95% CI.”

14.26 Refer to Exercise 14.25. Using the general format for a 95% confidence interval, verify the confidence interval for the mean given by Minitab. Notice that the standard error of the “Fit” is given.
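The checks requested in Exercises 14.25(a) and 14.26 can be sketched in a few lines of Python. This is only an illustration using the numbers quoted in Exercise 14.25. The multiplier `t_star = 1.974` is an assumption: it is a t-table value for 95% confidence with df = 170 - 2 = 168 and does not appear in the Minitab output.

```python
# Values quoted in Exercise 14.25 (rounded regression equation and Minitab output).
b0, b1 = 3.59, 0.967              # intercept and slope of the fitted line
x = 40                            # wife's age
se_fit = 0.313                    # "SE Fit"
ci_reported = (41.641, 42.875)    # "95.0% CI"

# Part (a): predicted value from the rounded equation.
y_hat = b0 + b1 * x

# The midpoint of the reported CI recovers Minitab's unrounded fit.
fit = sum(ci_reported) / 2

# ASSUMPTION: t* for 95% confidence with df = 168, read from a t-table.
t_star = 1.974

# General format: estimate +/- multiplier * standard error.
lower = fit - t_star * se_fit
upper = fit + t_star * se_fit

print(f"y-hat from equation:  {y_hat:.2f}")
print(f"fit from CI midpoint: {fit:.3f}")
print(f"95% CI for the mean:  ({lower:.3f}, {upper:.3f})")
```

Any small disagreement with Minitab’s interval in the last decimal place would come from rounding in the quoted coefficients and in the multiplier.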

14.27 Suppose a linear regression analysis of the relationship between y = systolic blood pressure and x = age is done for women between 40 and 60 years old. For women who are 45 years old, a 90% confidence interval for E(Y) is determined to be 128.2 to 131.3. Explain why it is incorrect to conclude that about 90% of women who are 45 years old have a systolic blood pressure in this range. Write a sentence that correctly interprets the interval.

[Figure for Exercise 14.29: plot of residuals versus pulse before marching]

14.30 ● ◆ The figure for this exercise is a histogram of the residuals for a linear regression relating y = height (in.) and x = foot length (cm) for a sample of college men. Discuss what the histogram indicates about Conditions 2 and 4 for linear regression listed at the beginning of Section 14.5. (Data source: heightfoot dataset on the CD for this book.)

[Figure for Exercise 14.30: histogram of residuals (x-axis: Residual; y-axis: Frequency)]

14.28 ● There are five conditions listed at the beginning of Section 14.5 that should be at least approximately true for linear regression. Which of the conditions can be checked by using each of the following methods? In each case, list all of them that can be checked.

a. Drawing a histogram of the residuals.

b. Drawing a scatterplot of the residuals versus the x values.

c. Learning how the data were collected.

d. Drawing a scatterplot of the raw data, y versus x.

14.29 ● ◆ Refer to Exercise 14.24 about a linear regression for y = pulse after marching in place and x = pulse before marching in place. The figure for this exercise is a plot of residuals versus the pulse before marching for a sample of 40 students. Discuss what the plot indicates about Conditions 1, 2, and 3 for linear regression listed at the beginning of Section 14.5.

● Basic skills

◆ Dataset available but not required

14.31 The figure accompanying this exercise is a histogram of the residuals for a simple linear regression. What does this plot indicate about the necessary conditions for doing a linear regression? Be specific about which of the five necessary conditions are verified in this figure.

[Figure for Exercise 14.31: histogram of residuals (x-axis: Residual, from –4 to 4; y-axis: Frequency)]

Bold-numbered exercises answered in the back


14.32 Plot residuals versus x for the data given in Exercise 14.13.

14.33 ◆ Observed data along with the sample regression line for the relationship between body weight (lb.) and neck girth (in.) in 19 female bears of various ages are shown in the figure below. (Note: The data are in the dataset bears-female on the CD for this book.)

[Figure for Exercise 14.33: scatterplot of weight (lb) versus neck girth (in.) with the fitted line Weight = –158.8 + 16.95 Neck girth; s = 40.1264, R² = 79.3%]

a. Which of the necessary conditions for linear regression appears to be violated in this dataset?

b. What corrective actions would you consider in order to properly estimate the relationship between body weight and neck girth?

c. Sketch a histogram that illustrates the pattern of the distribution of the residuals for this problem. Your sketch does not have to be accurate in the numerical details but should correctly show the pattern.

14.34 The figure for this exercise shows data for the relationship between the average stopping distance (ft) of a car when the brakes are applied and vehicle speed (mph). The regression line for these data is also shown on the plot. (Note: The raw data were given in Exercise 5.7.)

[Figure for Exercise 14.34: scatterplot of stopping distance (ft) versus speed (mph) with the fitted line Distance = –44.2 + 5.67 Speed; s = 34.2, R² = 95.1%]

a. Which of the five necessary conditions for linear regression appears to be violated in this situation?

b. What corrective action would you take to correctly estimate the connection between stopping distance and vehicle speed?

c. Make a sketch that illustrates the pattern of a plot of residuals versus speed for the data shown in the figure in this exercise. Your sketch does not need to be numerically accurate but should correctly show the pattern of the plot.

Chapter Exercises

14.35 Data for y = hours of sleep the previous day and x = hours of studying the previous day for n = 116 college students were shown in Figure 5.14 (p. 168) and described in Example 5.14. Some regression results for those data are as follows:

The regression equation is

Sleep = 7.56 – 0.269 Study

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 7.5555 | 0.2239 | 33.74 | 0.000 |
| study | –0.26917 | 0.06616 | (omitted) | 0.000 |

S = 1.509, R-Sq = 12.7%

a. What is the estimated mean decrease in hours of sleep per 1-hour increase in hours of studying? What notation is used for this value?

b. Calculate an approximate 95% confidence interval for β1. Write a sentence that interprets this interval.

c. Using proper statistical notation, write null and alternative hypotheses for assessing the statistical significance of the relationship.

d. We omitted from the output the t-statistic for testing the hypotheses of part (c). Compute the value of this t-statistic using other information shown in the output. What are the degrees of freedom for this t-statistic?

e. Is there a statistically significant relationship between hours of sleep and hours of studying? Justify your answer on the basis of information shown in the output.
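The arithmetic behind parts (b) and (d) can be sketched as follows, using only the coefficient and standard error quoted in the output for Exercise 14.35. The multiplier 2 for the approximate 95% interval is an assumption here (a common rule of thumb); an exact interval would use a t-table value for df = n - 2 instead.

```python
# Quoted in the Minitab output for Exercise 14.35.
b1 = -0.26917      # sample slope ("Coef" for study)
se_b1 = 0.06616    # standard error of the slope ("SE Coef")
n = 116            # number of students

# Part (d): t-statistic for H0: slope = 0, and its degrees of freedom.
t_stat = b1 / se_b1
df = n - 2

# Part (b): approximate 95% CI, ASSUMING the rule-of-thumb multiplier 2.
multiplier = 2
lower = b1 - multiplier * se_b1
upper = b1 + multiplier * se_b1

print(f"t = {t_stat:.2f} with df = {df}")
print(f"approximate 95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

Both endpoints are negative, which agrees with the small p-value reported for the slope.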

14.36 Refer to Exercise 14.35 about hours of sleep and hours of study.

a. What is the value of the standard deviation from the regression line? Write a sentence that interprets this value.

b. Calculate the predicted value of hours of sleep the previous day for a student who studied 4 hours the previous day.

c. Using the Empirical Rule, determine an interval that describes hours of sleep for approximately 95% of students who studied 4 hours the previous day.

d. The value of R² is given as 12.7%. Write a sentence that interprets this value.
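Parts (b) and (c) combine the fitted line with the standard deviation from the regression line. A minimal sketch, assuming the Empirical Rule is applied as “predicted value plus or minus 2 standard deviations” and using the values quoted in Exercise 14.35:

```python
# Quoted in the Minitab output for Exercise 14.35.
b0, b1 = 7.5555, -0.26917   # intercept and slope
s = 1.509                   # standard deviation from the regression line ("S")

study_hours = 4

# Part (b): predicted hours of sleep for a student who studied 4 hours.
y_hat = b0 + b1 * study_hours

# Part (c): Empirical Rule, ASSUMING "about 95%" means within 2 standard deviations.
low = y_hat - 2 * s
high = y_hat + 2 * s

print(f"predicted sleep: {y_hat:.2f} hours")
print(f"about 95% of such students: {low:.2f} to {high:.2f} hours")
```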

14.37 Refer to Exercises 14.35 and 14.36 about hours of sleep and hours spent studying. What is the intercept of the regression line? Does this value have a useful interpretation in the context of this problem? If so, what is the interpretation? If not, why not?

14.38 Regression results for the relationship between y = hours of sleep the previous day and x = hours spent studying the previous day were given in Exercise 14.35. The figure for this exercise is a plot of residuals versus hours spent studying. What does the plot indicate about the necessary conditions for doing a linear regression? Be specific about which of the five conditions given in Section 14.5 are verified by this plot.

[Figure for Exercise 14.38: plot of residuals versus hours of study]

14.39 Exercise 14.35 gave linear regression results for the relationship between y = hours of sleep the previous day and x = hours spent studying the previous day. Following is Minitab output showing a confidence interval and a prediction interval for hours of sleep when hours of studying = 3 hours:

| Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|
| 6.748 | 0.142 | (6.466, 7.029) | (3.746, 9.750) |

Write down the interval given for predicting the hours of sleep when hours of studying is 3 hours. Give two different interpretations of this interval.

14.40 Refer to the output given in Exercise 14.8 about handspan and height. Compute a 90% confidence interval for the population slope. Write a sentence that interprets this interval.

14.41 The following data are x = average on five quizzes before the midterm exam and y = score on the midterm exam for n = 11 students randomly selected from a multiple-section statistics class of about 950 students:

| x = Quizzes | y = Exam |
|---|---|
| 80 | 72 |
| 68 | 71 |
| 94 | 96 |
| 72 | 77 |
| 74 | 82 |
| 83 | 72 |
| 56 | 58 |
| 68 | 83 |
| 65 | 78 |
| 75 | 80 |
| 88 | 92 |

a. Plot the data, and describe the important features of this plot.

b. Using statistical software, calculate the regression line for this sample.

c. What is the predicted midterm exam score for a student with a quiz average equal to 75?

d. With statistical software, determine a 50% prediction interval for the midterm exam score of a student whose quiz average was 75.

14.42 ◆ This exercise refers to the following Minitab output, relating y = son’s height to x = father’s height for a sample of n = 76 college males. (Note: The data are in the dataset UCDavis1 on the CD for this book.)

The regression equation is

Height = 30.0 + 0.576 dadheight

76 cases used 3 cases contain missing values

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 29.981 | 5.129 | 5.85 | 0.000 |
| dadheight | 0.57568 | 0.07445 | 7.73 | 0.000 |

S = 2.657, R-Sq = 44.7%

Predicted Values [for dad’s heights of 65, 70, and 74]

| Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|
| 67.400 | 0.415 | (66.574, 68.226) | (62.041, 72.759) |
| 70.279 | 0.318 | (69.645, 70.913) | (64.946, 75.612) |
| 72.581 | 0.494 | (71.596, 73.566) | (67.195, 77.967) |

a. What is the equation for the regression line?

b. Identify the value of the t-statistic for testing whether or not the slope is 0. Verify that the value is correct using the formula for the t-statistic and the information provided by Minitab for the parts that go into the formula.

c. State and test the hypotheses about whether or not the population slope is 0. Use relevant information provided in the output.

d. Compute a 95% confidence interval for β1, the slope of the relationship in the population. Write a sentence that interprets this interval.
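A short sketch of the calculations in parts (b) and (d), using the slope and standard error quoted in the output for Exercise 14.42. The multiplier `t_star = 1.993` is an assumption: it is a t-table value for 95% confidence with df = 76 - 2 = 74 and is not part of the Minitab output.

```python
# Quoted in the Minitab output for Exercise 14.42.
b1 = 0.57568       # sample slope for father's height
se_b1 = 0.07445    # standard error of the slope
n = 76             # cases used

# Part (b): t = (sample slope - 0) / standard error of the slope.
t_stat = b1 / se_b1
df = n - 2

# Part (d): 95% CI for the population slope.
t_star = 1.993     # ASSUMPTION: t-table value for df = 74
lower = b1 - t_star * se_b1
upper = b1 + t_star * se_b1

print(f"t = {t_stat:.2f} with df = {df}")
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

The computed t-statistic rounds to the value Minitab prints in the T column for the slope.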

14.43 ◆ Refer to Exercise 14.42.

a. What is the value of R² for the observed linear relationship between height and father’s height? Write a sentence that interprets this value.

b. What is the value of the correlation coefficient r?
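For part (b), the correlation in simple regression is the square root of R², with the same sign as the slope. A minimal sketch using the values quoted in Exercise 14.42:

```python
import math

r_squared = 0.447   # "R-Sq = 44.7%" from the Minitab output
slope = 0.57568     # sample slope; only its sign is used here

# r has magnitude sqrt(R^2) and takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"r = {r:.3f}")
```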

14.44 ◆ Refer to Exercises 14.42 and 14.43. The output provides prediction intervals and confidence intervals for father’s heights of 65, 70, and 74 inches.

a. Verify that the “Fit” given by Minitab for father’s height of 65 inches is consistent with the predicted height that would be given by the regression equation.

b. Write down the interval Minitab provided for predicting an individual son’s height if his father’s height is 70 inches. Provide two different interpretations for the interval.

c. Write a sentence that interprets the “95% CI” given for a son’s height when the father’s height is 74 inches.

d. Explain why the prediction interval is much wider than the corresponding confidence interval for each father’s height provided.

e. The accompanying figure is a plot of the residuals versus father’s height. Which of the five necessary conditions for regression listed in Section 14.5 appears to be violated? What corrective actions should you consider?

[Figure for Exercise 14.44(e): plot of residuals versus father’s height]
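The difference in width asked about in Exercise 14.44(d) can be seen numerically. The sketch below rebuilds both intervals for a father’s height of 70 inches from the quoted “Fit”, “SE Fit”, and S, using the standard simple-regression forms: fit ± t* × SE(fit) for the mean, and fit ± t* × sqrt(s² + SE(fit)²) for an individual. The multiplier `t_star = 1.993` is an assumed t-table value for df = 74, and the helper name `intervals` is hypothetical.

```python
import math

s = 2.657        # "S" from the Minitab output in Exercise 14.42
t_star = 1.993   # ASSUMPTION: t-table multiplier, 95% confidence, df = 76 - 2 = 74

def intervals(fit, se_fit):
    """Return (CI, PI) as (lower, upper) pairs for one predicted value."""
    ci_half = t_star * se_fit
    # The PI adds the variation of individuals around the line (s^2)
    # to the uncertainty in the estimated mean (se_fit^2).
    pi_half = t_star * math.sqrt(s**2 + se_fit**2)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

# Father's height of 70 inches: Fit = 70.279, SE Fit = 0.318.
ci_70, pi_70 = intervals(70.279, 0.318)

print(f"95% CI: ({ci_70[0]:.3f}, {ci_70[1]:.3f})")
print(f"95% PI: ({pi_70[0]:.3f}, {pi_70[1]:.3f})")
```

Because s is much larger than SE(fit), the s² term dominates the prediction interval, which is why it stays wide even when the mean is estimated precisely.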

14.45 The five steps for hypothesis testing were given in Chapters 12 and 13. Describe those steps as they apply to testing whether there is a relationship between two variables in the simple regression model.

14.46 Explain why rejecting H0: β1 = 0 in a simple linear regression model does not prove that the relationship is linear. To answer this question, you might find it helpful to consider the figure in Exercise 14.34, which shows stopping distance and vehicle speed for automobiles.

Dataset Exercises

Datasets are required to solve these exercises and can be found at http://1pass.thomson.com or on your CD.

14.47 Use the dataset letters for this exercise. A sample of 63 students wrote as many letters of the alphabet in order, as capital letters, as they could in 15 seconds using their dominant hand and then repeated this task using their nondominant hand. The variables dom and nondom contain the raw data for the results.

a. Plot y = dom versus x = nondom. Describe the important features of the plot.

b. Compute the simple regression equation for the relationship. What is the equation?

c. What are the values of the standard deviation from the regression line and R²? Interpret these values in the context of this problem.

d. Determine a 95% confidence interval for the population slope. Write a sentence that interprets this interval.

e. Consider the statement, “On average, a student can write about 23 more letters in 15 seconds with the dominant hand than with the nondominant hand.” What regression equation would accompany this statement? Based on your answers to parts (b) and (d), explain why this statement is reasonable.

14.48 Refer to the previous exercise about letters written with the dominant (y) and nondominant (x) hands.

a. Plot residuals versus x = nondom. What does this plot indicate about conditions for using the linear regression model?

b. Create a histogram of the residuals. What does this plot indicate about the conditions for using the linear regression model?

14.49 Use the bears-female dataset for this exercise. Weights (lb) and chest girths (in.) are given for n = 19 female wild bears. The corresponding variable names are Weight and Chest.

a. Plot y = Weight versus x = Chest. Describe the important features of the plot.

b. Compute a simple linear regression equation for y = Weight and x = Chest.

c. What is the value of R² for this relationship? Write a sentence interpreting this value.

d. What is the predicted weight for a bear with a chest girth of 40 inches?

e. Compute a 95% prediction interval for the weight of a bear with a chest girth of 40 inches. Write a sentence interpreting this interval.

f. Compute a 95% confidence interval for the mean weight of bears with a chest girth of 40 inches. Write a sentence interpreting this interval.
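The dataset exercises are meant to be done with statistical software, but the calculations behind parts (b) through (f) can be sketched in plain Python. The function names `fit_line` and `intervals_at` are hypothetical helpers, not part of any package. The data columns and the multiplier `t_star` (a t-table value for df = n - 2) are left for the reader to supply, since the dataset itself is not reproduced here.

```python
import math

def fit_line(x, y):
    """Least-squares line: returns (intercept, slope, s, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals and total sum of squares.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))   # standard deviation from the regression line
    return intercept, slope, s, 1 - sse / sst

def intervals_at(x, y, x0, t_star):
    """Fit, CI for the mean, and PI for an individual at x = x0."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    intercept, slope, s, _ = fit_line(x, y)
    fit = intercept + slope * x0
    se_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    ci_half = t_star * se_fit
    pi_half = t_star * math.sqrt(s ** 2 + se_fit ** 2)
    return fit, (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)
```

With the Chest and Weight columns loaded as lists, `fit_line(chest, weight)` would give the quantities for parts (b) and (c), and `intervals_at(chest, weight, 40, t_star)` those for parts (d) through (f).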

14.50 Use the dataset heightfoot for this exercise. Heights (in.) and foot lengths (cm) are given for 33 men.

a. Plot y = height versus x = foot length. What important features are evident in the plot?

b. Omit any outliers evident in the plot in part (a), and compute the linear regression line for predicting y = height from x = foot length.

c. Determine a 90% prediction interval for the height of a man whose foot length is 28 centimeters.

d. Discuss whether height can be accurately predicted from foot length. Use regression results to

e. For the data used for part (b), plot residuals versus x = foot length. What does this plot indicate about the necessary conditions for doing a linear regression?

14.51 Refer to Exercise 14.50 about the relationship between height and foot length.

a. Do not omit any outliers. Use the complete dataset to determine a 90% prediction interval
