ACTIVITY 13.1: Are Tall Women from “Big” Families?


Summary of Key Concepts and Formulas




TERM OR FORMULA: COMMENT



Simple linear regression model, $y = \alpha + \beta x + e$: This model assumes that there is a line with slope $\beta$ and y intercept $\alpha$, called the population regression line, such that an observation deviates from the line by a random amount $e$. The random deviation is assumed to have a normal distribution with mean zero and standard deviation $\sigma$, and random deviations for different observations are assumed to be independent of one another.



Estimated regression line, $\hat{y} = a + bx$: The least-squares line introduced in Chapter 5.



$s_e = \sqrt{\dfrac{\text{SSResid}}{n - 2}}$: The point estimate of the standard deviation $\sigma$, with associated degrees of freedom $n - 2$.

$s_b = \dfrac{s_e}{\sqrt{S_{xx}}}$: The estimated standard deviation of the statistic $b$.



$b \pm (t \text{ critical value})\, s_b$: A confidence interval for the slope $\beta$ of the population regression line, where the t critical value is based on $n - 2$ degrees of freedom.

$t = \dfrac{b - \text{hypothesized value}}{s_b}$: The test statistic for testing hypotheses about $\beta$. The test is based on $n - 2$ degrees of freedom.

Model utility test, with test statistic $t = \dfrac{b}{s_b}$: A test of $H_0\colon \beta = 0$, which asserts that there is no useful linear relationship between x and y, versus $H_a\colon \beta \neq 0$, the claim that there is a useful linear relationship.
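As a minimal sketch (Python; the function name `slope_inference` and its return keys are illustrative, not from the text), the quantities in the entries above can be computed directly from paired data. Python's standard library has no t distribution, so the t critical value for $n - 2$ df is passed in and is assumed to come from a t table.

```python
import math


def slope_inference(x, y, t_crit, hypothesized=0.0):
    """Least-squares slope b with s_e, s_b, a t statistic, and a CI for beta.

    t_crit is the t critical value for n - 2 df, taken from a t table.
    With hypothesized=0.0 the t statistic is the model utility statistic b / s_b.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))  # point estimate of sigma, n - 2 df
    s_b = s_e / math.sqrt(s_xx)          # estimated standard deviation of b
    t = (b - hypothesized) / s_b
    ci = (b - t_crit * s_b, b + t_crit * s_b)
    return {"a": a, "b": b, "s_e": s_e, "s_b": s_b, "t": t, "ci": ci}


# Toy data; 3.182 is the 95% t critical value for n - 2 = 3 df.
result = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(result)
```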



Residual analysis: Methods based on the residuals or standardized residuals for checking the assumptions of a regression model.

Standardized residual: A residual divided by its standard deviation.

Standardized residual plot: A plot of the (x, standardized residual) pairs. A pattern in this plot suggests that the simple linear regression model may not be appropriate.
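A sketch of standardized residuals, assuming the standard simple-regression result (not stated in this summary) that the estimated standard deviation of the i-th residual is $s_e\sqrt{1 - 1/n - (x_i - \bar{x})^2/S_{xx}}$. The function name is hypothetical; the (x, standardized residual) pairs it produces are what a standardized residual plot displays.

```python
import math


def standardized_residuals(x, y):
    """Each residual divided by its estimated standard deviation.

    Assumes sd(residual_i) = s_e * sqrt(1 - 1/n - (x_i - x_bar)**2 / S_xx).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s_e = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [
        r / (s_e * math.sqrt(1 - 1 / n - (xi - x_bar) ** 2 / s_xx))
        for xi, r in zip(x, resid)
    ]


z = standardized_residuals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print([round(v, 3) for v in z])
```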



$s_{a+bx^*} = s_e\sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{S_{xx}}}$: The estimated standard deviation of the statistic $a + bx^*$, where $x^*$ denotes a particular value of x.



$a + bx^* \pm (t \text{ critical value})\, s_{a+bx^*}$: A confidence interval for $\alpha + \beta x^*$, the mean value of y when $x = x^*$.

$a + bx^* \pm (t \text{ critical value})\sqrt{s_e^2 + s_{a+bx^*}^2}$: A prediction interval for a single y value to be observed when $x = x^*$.
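The two interval formulas differ only in the standard deviation used. The sketch below (Python, hypothetical function name, t critical value supplied by the caller from a t table) computes both at the same $x^*$, so the extra $s_e^2$ term visibly widens the prediction interval.

```python
import math


def interval_estimates(x, y, x_star, t_crit):
    """Return (a + b*x_star, CI for alpha + beta*x_star, PI for a single y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(ss_resid / (n - 2))
    y_hat = a + b * x_star
    # Estimated standard deviation of the statistic a + b*x_star.
    s_fit = s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / s_xx)
    ci = (y_hat - t_crit * s_fit, y_hat + t_crit * s_fit)
    # A single new y also deviates from the line, so add s_e**2.
    s_pred = math.sqrt(s_e ** 2 + s_fit ** 2)
    pi = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)
    return y_hat, ci, pi


# Toy data; 3.182 is the 95% t critical value for n - 2 = 3 df.
y_hat, ci, pi = interval_estimates([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182)
print(y_hat, ci, pi)
```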



Population correlation coefficient $\rho$: A measure of the extent to which the x and y values in an entire population are linearly related.

$t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$: The test statistic for testing $H_0\colon \rho = 0$, according to which (assuming a bivariate normal population distribution) x and y are independent of one another.
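A sketch of the correlation test statistic, plus a helper for r itself; both function names are illustrative. The usage line plugs in the values reported in Exercise 13.57, and the resulting t would still have to be compared with a t critical value (or converted to a P-value) for $n - 2$ df.

```python
import math


def sample_r(x, y):
    """Pearson sample correlation coefficient r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_t(r, n):
    """Test statistic for H0: rho = 0, based on n - 2 df."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


# Values reported in Exercise 13.57: r = -.18 with n = 347.
print(round(correlation_t(-0.18, 347), 2))
```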



Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.






Chapter 13 Simple Linear Regression and Correlation: Inferential Methods



Chapter Review Exercises 13.56 - 13.70

13.56 The effects of grazing animals on grasslands have been the focus of numerous investigations by ecologists. One such study, reported in “The Ecology of Plants, Large Mammalian Herbivores, and Drought in Yellowstone National Park” (Ecology [1992]: 2043–2058), proposed using the simple linear regression model to relate y = green biomass concentration (g/cm³) to x = elapsed time since snowmelt (days).
a. The estimated regression equation was given as $\hat{y} = 106.3 - .640x$. What is the estimate of average change in biomass concentration associated with a 1-day increase in elapsed time?
b. What value of biomass concentration would you predict when elapsed time is 40 days?
c. The sample size was n = 58, and the reported value of the coefficient of determination was .470. Does this suggest that there is a useful linear relationship between the two variables? Carry out an appropriate test.



13.57 A random sample of n = 347 students was selected, and each one was asked to complete several questionnaires, from which a Coping Humor Scale value x and a Depression Scale value y were determined (“Depression and Sense of Humor,” Psychological Reports [1994]: 1473–1474). The resulting value of the sample correlation coefficient was −.18.
a. The investigators reported that P-value < .05. Do you agree?
b. Is the sign of r consistent with your intuition? Explain. (Higher scale values correspond to more developed sense of humor and greater extent of depression.)
c. Would the simple linear regression model give accurate predictions? Why or why not?



13.58 Example 13.4 gave data on x = treadmill run time to exhaustion and y = 20-km ski time for a sample of 11 biathletes. Use the accompanying Minitab output to answer the following questions.

The regression equation is
ski = 88.8 – 2.33 tread

Predictor      Coef     Stdev   t-ratio       p
Constant     88.796     5.750     15.44   0.000
tread       –2.3335    0.5911     –3.95   0.003

s = 2.188    R-sq = 63.4%    R-sq(adj) = 59.3%



Bold exercises answered in back



Data set available online



Analysis of Variance
Source       DF        SS        MS       F
Regression    1    74.630    74.630   15.58
Error         9    43.097     4.789
Total        10   117.727



a. Carry out a test at significance level .01 to decide whether the simple linear regression model is useful.
b. Estimate the average change in ski time associated with a 1-minute increase in treadmill time, and do so in a way that conveys information about the precision of estimation.
c. Minitab reported that $s_{a+b(10)} = .689$. Predict ski time for a single biathlete whose treadmill time is 10 minutes, and do so in a way that conveys information about the precision of prediction.
d. Minitab also reported that $s_{a+b(11)} = 1.029$. Why is this larger than $s_{a+b(10)}$?
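For part (c) of Exercise 13.58, the prediction-interval formula can be applied directly to the numbers Minitab reports. The sketch below assumes $s_e$ = 2.188 (the reported s) and 2.262 as the 95% t critical value for 9 df; `prediction_interval` is a hypothetical helper name.

```python
import math


def prediction_interval(y_hat, s_e, s_fit, t_crit):
    """y_hat +/- t_crit * sqrt(s_e**2 + s_fit**2) for a single new y."""
    half = t_crit * math.sqrt(s_e ** 2 + s_fit ** 2)
    return y_hat - half, y_hat + half


# Values from the Minitab output in Exercise 13.58 (n = 11, so 9 df).
a, b = 88.796, -2.3335
y_hat = a + b * 10                 # predicted ski time at 10 minutes of treadmill time
lo, hi = prediction_interval(y_hat, s_e=2.188, s_fit=0.689, t_crit=2.262)
print(round(y_hat, 3), round(lo, 2), round(hi, 2))
```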



13.59 A sample of n = 61 penguin burrows was selected, and values of both y = trail length (m) and x = soil hardness (force required to penetrate the substrate to a depth of 12 cm with a certain gauge, in kg) were determined for each one (“Effects of Substrate on the Distribution of Magellanic Penguin Burrows,” The Auk [1991]: 923–933). The equation of the least-squares line was $\hat{y} = 11.607 - 1.4187x$, and $r^2 = .386$.
a. Does the relationship between soil hardness and trail length appear to be linear, with shorter trails associated with harder soil (as the article asserted)? Carry out an appropriate test of hypotheses.
b. Using $s_e = 2.35$, $\bar{x} = 4.5$, and $\sum(x - \bar{x})^2 = 250$, predict trail length when soil hardness is 6.0 in a way that conveys information about the reliability and precision of the prediction.
c. Would you use the simple linear regression model to predict trail length when hardness is 10.0? Explain your reasoning.

13.60 The article “Photocharge Effects in Dye Sensitized Ag[Br,I] Emulsions at Millisecond Range Exposures” (Photographic Science and Engineering [1981]: 138–144) gave the accompanying data on x = % light absorption and y = peak photovoltage.
x 4.0 8.7 12.7 19.1 21.4 24.6 28.9 29.8 30.5
y 0.12 0.28 0.55 0.68 0.85 1.02 1.15 1.34 1.29



Video Solution available






$\sum x = 179.7$, $\sum x^2 = 4334.41$, $\sum y = 7.28$, $\sum y^2 = 7.4028$, $\sum xy = 178.683$
a. Construct a scatterplot of the data. What does it suggest?
b. Assuming that the simple linear regression model is appropriate, obtain the equation of the estimated regression line. $\hat{y} = -0.08259 + 0.04465x$
c. How much of the observed variation in peak photovoltage can be explained by the model relationship?
d. Predict peak photovoltage when percent absorption is 19.1, and compute the value of the corresponding residual.
e. The authors claimed that there is a useful linear relationship between the two variables. Do you agree? Carry out a formal test.
f. Give an estimate of the average change in peak photovoltage associated with a 1 percentage point increase in light absorption. Your estimate should convey information about the precision of estimation.
g. Give an estimate of mean peak photovoltage when percentage of light absorption is 20, and do so in a way that conveys information about precision.
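Because Exercise 13.60 supplies summary sums, a sketch that works only from $n$, $\sum x$, $\sum x^2$, $\sum y$, $\sum y^2$, and $\sum xy$ may help; it uses the computational forms $S_{xx} = \sum x^2 - (\sum x)^2/n$, $S_{xy} = \sum xy - (\sum x)(\sum y)/n$, and $\text{SSResid} = \text{SSTo} - bS_{xy}$. The helper name and dictionary keys are illustrative.

```python
import math


def fit_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Least-squares quantities computed from the usual summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    ss_to = sum_y2 - sum_y ** 2 / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    ss_resid = ss_to - b * s_xy
    s_e = math.sqrt(ss_resid / (n - 2))
    return {
        "a": a,
        "b": b,
        "r_sq": 1 - ss_resid / ss_to,   # proportion of variation explained
        "s_e": s_e,
        "s_b": s_e / math.sqrt(s_xx),
    }


fit = fit_from_sums(9, 179.7, 4334.41, 7.28, 7.4028, 178.683)
print({k: round(v, 5) for k, v in fit.items()})
```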



13.61 Reduced visual performance with increasing age has been a much-studied phenomenon in recent years. This decline is due partly to changes in optical properties of the eye itself and partly to neural degeneration throughout the visual system. As one aspect of this problem, the article “Morphometry of Nerve Fiber Bundle Pores in the Optic Nerve Head of the Human” (Experimental Eye Research [1988]: 559–568) presented the accompanying data on x = age and y = percentage of the cribriform area of the lamina scleralis occupied by pores.

x 22 25 27 39 42 43 44 46 46
y 75 62 50 49 54 49 59 47 54

x 48 50 57 58 63 63 74 74
y 52 58 49 52 49 31 42 41



a. Suppose that prior to this study the researchers had believed that the average decrease in percentage area associated with a 1-year age increase was .5%. Do the data contradict this prior belief? State and test the appropriate hypotheses using a .10 significance level.
b. Estimate true average percentage area covered by pores for all 50-year-olds in the population in a way that conveys information about the precision of estimation.
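One way to set up part (a) of Exercise 13.61 is sketched below: it tests $H_0\colon \beta = -.5$ against $H_a\colon \beta \neq -.5$ using the hypothesized-value form of the slope test statistic. It assumes the 17 (x, y) pairs from the exercise's table and uses 1.753, the t critical value for a two-sided .10-level test with 15 df, taken from a t table.

```python
import math

# Data from Exercise 13.61 (x = age, y = percentage of area occupied by pores).
x = [22, 25, 27, 39, 42, 43, 44, 46, 46, 48, 50, 57, 58, 63, 63, 74, 74]
y = [75, 62, 50, 49, 54, 49, 59, 47, 54, 52, 58, 49, 52, 49, 31, 42, 41]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar
s_e = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
s_b = s_e / math.sqrt(s_xx)

# H0: beta = -0.5 (a .5% average decrease per year) versus Ha: beta != -0.5.
t = (b - (-0.5)) / s_b
t_crit = 1.753  # two-sided .10 level, n - 2 = 15 df (from a t table)
print(round(b, 4), round(s_b, 4), round(t, 3), abs(t) > t_crit)
```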




13.62 Occasionally an investigator may wish to compute a confidence interval for $\alpha$, the y intercept of the true regression line, or test hypotheses about $\alpha$. The estimated y intercept is simply the height of the estimated line when $x = 0$, since $a + b(0) = a$. This implies that $s_a$, the estimated standard deviation of the statistic $a$, results from substituting $x^* = 0$ in the formula for $s_{a+bx^*}$. The desired confidence interval is then
$$a \pm (t \text{ critical value})\, s_a$$
and a test statistic is
$$t = \frac{a - \text{hypothesized value}}{s_a}$$



a. The article “Comparison of Winter-Nocturnal Geostationary Satellite Infrared-Surface Temperature with Shelter-Height Temperature in Florida” (Remote Sensing of the Environment [1983]: 313–327) used the simple linear regression model to relate surface temperature as measured by a satellite (y) to actual air temperature (x) as determined from a thermocouple placed on a traversing vehicle. Selected data are given (read from a scatterplot in the article).
x −2 −1 0 1 2 3 4
y −3.9 −2.1 −2.0 −1.2 0.0 1.9 0.6
x 5 6 7
y 2.1 1.2 3.0
Estimate the population regression line.
b. Compute the estimated standard deviation $s_a$. Carry out a test at level of significance .05 to see whether the y intercept of the population regression line differs from zero.
c. Compute a 95% confidence interval for $\alpha$. Does the result indicate that $\alpha = 0$ is plausible? Explain.
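The sketch below applies the intercept formulas of Exercise 13.62 to its data, computing $s_a$ as $s_{a+bx^*}$ at $x^* = 0$, that is, $s_e\sqrt{1/n + \bar{x}^2/S_{xx}}$. It assumes 2.306 as the 95% t critical value for 8 df; variable names are illustrative.

```python
import math

# Data from Exercise 13.62: x = air temperature, y = satellite surface temperature.
x = [-2, -1, 0, 1, 2, 3, 4, 5, 6, 7]
y = [-3.9, -2.1, -2.0, -1.2, 0.0, 1.9, 0.6, 2.1, 1.2, 3.0]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar
s_e = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

# s_a is s_{a+bx*} evaluated at x* = 0.
s_a = s_e * math.sqrt(1 / n + (0 - x_bar) ** 2 / s_xx)
t = (a - 0) / s_a            # test statistic for H0: alpha = 0
t_crit = 2.306               # 95% t critical value for n - 2 = 8 df
ci = (a - t_crit * s_a, a + t_crit * s_a)
print(round(a, 4), round(b, 4), round(s_a, 4), round(t, 2), ci)
```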

13.63 In some studies, an investigator has n (x, y) pairs sampled from one population and m (x, y) pairs from a second population. Let $\beta$ and $\beta'$ denote the slopes of the first and second population lines, respectively, and let $b$ and $b'$ denote the estimated slopes calculated from the first and second samples, respectively. The investigator may then wish to test the null hypothesis $H_0\colon \beta - \beta' = 0$ (that is, $\beta = \beta'$) against an appropriate alternative hypothesis. Suppose that $\sigma^2$, the variance about the population line, is the same for both populations. Then this common variance can be estimated by
$$s^2 = \frac{\text{SSResid} + \text{SSResid}'}{n + m - 4}$$






where SSResid and SSResid′ are the residual sums of squares for the first and second samples, respectively. With $S_{xx}$ and $S'_{xx}$ denoting the quantity $\sum(x - \bar{x})^2$ for the first and second samples, respectively, the test statistic is
$$t = \frac{b - b'}{\sqrt{\dfrac{s^2}{S_{xx}} + \dfrac{s^2}{S'_{xx}}}}$$
When $H_0$ is true, this statistic has a t distribution based on $(n + m - 4)$ df.

The data below are a subset of the data in the article “Diet and Foraging Model of Bufa marinus and Leptodactylus ocellatus” (Journal of Herpetology [1984]: 138–146). The independent variable x is body length (cm) and the dependent variable y is mouth width (cm), with n = 9 observations for one type of nocturnal frog and m = 8 observations for a second type. Carry out a test to determine if the slopes of the true regression lines for the two different frog populations are equal. Use a significance level of .05. (Summary statistics are also given in the accompanying table.)

Leptodactylus ocellatus
x 3.8 4.0 4.9 7.1 8.1 8.5 8.9 9.1 9.8
y 1.0 1.2 1.7 2.0 2.7 2.5 2.4 2.9 3.2

Bufa marinus
x 3.8 4.3 6.2 6.3 7.8 8.5 9.0 10.0
y 1.6 1.7 2.3 2.5 3.2 3.0 3.5 3.8

Data Set
Variable         Leptodactylus   Bufa
Sample size      9               8
$\sum x$         64.2            55.9
$\sum x^2$       500.78          425.15
$\sum y$         19.6            21.6
$\sum y^2$       47.28           62.92
$\sum xy$        153.36          163.36
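A sketch of the slope-comparison test of Exercise 13.63, working from the summary table; `line_from_sums` is a hypothetical helper, and 2.160 is assumed as the two-sided .05-level t critical value for $n + m - 4 = 13$ df (from a t table).

```python
import math


def line_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Return (slope b, SSResid, S_xx) computed from summary sums."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    s_yy = sum_y2 - sum_y ** 2 / n
    b = s_xy / s_xx
    return b, s_yy - b * s_xy, s_xx


# Summary statistics from the table in Exercise 13.63.
b1, ss1, sxx1 = line_from_sums(9, 64.2, 500.78, 19.6, 47.28, 153.36)  # Leptodactylus
b2, ss2, sxx2 = line_from_sums(8, 55.9, 425.15, 21.6, 62.92, 163.36)  # Bufa
n, m = 9, 8

s2 = (ss1 + ss2) / (n + m - 4)                 # pooled estimate of sigma^2
t = (b1 - b2) / math.sqrt(s2 / sxx1 + s2 / sxx2)
t_crit = 2.160                                 # two-sided .05 level, 13 df
print(round(b1, 4), round(b2, 4), round(t, 3), abs(t) > t_crit)
```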



13.64 Consider the following four (x, y) data sets: the first three have the same x values, so these values are listed only once (from “Graphs in Statistical Analysis,” American Statistician [1973]: 17–21).

Data Set   1–3     1      2      3      4      4
Variable   x       y      y      y      x      y
           10.0    8.04   9.14   7.46   8.0    6.58
           8.0     6.95   8.14   6.77   8.0    5.76
           13.0    7.58   8.74   12.74  8.0    7.71
           9.0     8.81   8.77   7.11   8.0    8.84
           11.0    8.33   9.26   7.81   8.0    8.47
           14.0    9.96   8.10   8.84   8.0    7.04
           6.0     7.24   6.13   6.08   8.0    5.25
           4.0     4.26   3.10   5.39   19.0   12.50
           12.0    10.84  9.13   8.15   8.0    5.56
           7.0     4.82   7.26   6.42   8.0    7.91
           5.0     5.68   4.74   5.73   8.0    6.89

For each of these data sets, the values of the summary quantities $\bar{x}$, $\bar{y}$, $\sum(x - \bar{x})^2$, and $\sum(x - \bar{x})(y - \bar{y})$ are identical (to within rounding), so all quantities computed from these will be identical for the four sets: the estimated regression line, SSResid, $s_e$, $r^2$, and so on. The summary quantities provide no way of distinguishing among the four data sets. Based on a scatterplot for each set, comment on the appropriateness or inappropriateness of fitting the simple linear regression model in each case.

13.65 The accompanying scatterplot, based on 34 sediment samples with x = sediment depth (cm) and y = oil and grease content (mg/kg), appeared in the article “Mined Land Reclamation Using Polluted Urban Navigable Waterway Sediments” (Journal of Environmental Quality [1984]: 415–422). Discuss the effect that the observation (20, 33,000) will have on the estimated regression line. If this point were omitted, what can you say about the slope of the estimated regression line? What do you think will happen to the slope if this observation is included in the computations?

[Figure: scatterplot of oil and grease (mg/kg), 0 to 32,000, versus subsample mean depth (cm), 0 to 180]
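The point of Exercise 13.64 can be checked numerically. The sketch below (Python, illustrative names) fits each of the four data sets and shows that the slope, intercept, and $r^2$ agree to about two decimals; only a scatterplot, which the sketch does not draw, reveals how different the sets are.

```python
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Intercept a, slope b, and r^2 for one data set."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    return {"a": y_bar - b * x_bar, "b": b, "r_sq": s_xy ** 2 / (s_xx * s_yy)}


results = {name: summarize(x, y) for name, (x, y) in data.items()}
for name, res in results.items():
    print(name, {k: round(v, 3) for k, v in res.items()})
```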






13.66 The article “Improving Fermentation Productivity with Reverse Osmosis” (Food Technology [1984]: 92–96) gave the following data (read from a scatterplot) on y = glucose concentration (g/L) and x = fermentation time (days) for a blend of malt liquor.
x 1 2 3 4 5 6 7 8
y 74 54 52 51 52 53 58 71
a. Use the data to calculate the estimated regression line. $\hat{y} = 57.964 + 0.0357x$
b. Do the data indicate a linear relationship between y and x? Test using a .10 significance level.
c. Using the estimated regression line of Part (a), compute the residuals and construct a plot of the residuals versus x (that is, of the (x, residual) pairs).
d. Based on the plot in Part (c), do you think that the simple linear regression model is appropriate for describing the relationship between y and x? Explain.

13.67 The employee relations manager of a large company was concerned that raises given to employees during a recent period might not have been based strictly on objective performance criteria. A sample of n = 20 employees was selected, and the values of x, a quantitative measure of productivity, and y, the percentage salary increase, were determined for each one. A computer package was used to fit the simple linear regression model, and the resulting output gave the P-value = .0076 for the model utility test. Does the percentage raise appear to be linearly related to productivity? Explain.

13.68 The accompanying figure is based on data from the article “Root and Shoot Competition Intensity Along a Soil Depth Gradient” (Ecology [1995]: 673–682). It shows the relationship between aboveground biomass and soil depth within the experimental plots. The relationship is described by the estimated regression equation: biomass = −9.85 + 25.29(soil depth) and $r^2 = .65$; $P < 0.001$; n = 55. Do you think the simple linear regression model is appropriate here? Explain. What would you expect to see in a plot of the standardized residuals versus x?

[Figure: scatterplot of biomass (g/m²), 0 to 600, versus soil depth (cm), 0 to 20]

13.69 Give a brief answer, comment, or explanation for each of the following.
a. What is the difference between $e_1, e_2, \ldots, e_n$ and the n residuals?
b. The simple linear regression model states that $y = \alpha + \beta x$.
c. Does it make sense to test hypotheses about b?
d. SSResid is always positive.
e. A student reported that a data set consisting of n = 6 observations yielded residuals 2, 0, 5, 3, 0, and 1 from the least-squares line.
f. A research report included the following summary quantities obtained from a simple linear regression analysis: $\sum(y - \bar{y})^2 = 615$, $\sum(y - \hat{y})^2 = 731$
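For parts (c) and (d) of Exercise 13.66, the sketch below computes the residuals from the least-squares fit and prints a crude text version of the residual plot (a text display is an assumption made here to avoid depending on a plotting library).

```python
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [74, 54, 52, 51, 52, 53, 58, 71]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Text "plot" of the (x, residual) pairs: bar length is |residual| rounded.
for xi, r in zip(x, residuals):
    bar = ("+" if r > 0 else "-") * round(abs(r))
    print(f"{xi:2d} {r:8.3f} {bar}")
```

Large positive residuals at both ends and negative residuals in the middle are the kind of curved pattern that part (d) asks about.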






Cumulative Review Exercises CR13.1-CR13.18

CR13.1 The article “You Will Be Tested on This” (The Chronicle of Higher Education, June 8, 2007) describes an experiment to investigate the effect of quizzes on student learning. The goal of the experiment was to determine if students who take daily quizzes have better end-of-semester retention than students who attend the same lectures and complete the same homework assignments but who do not take the daily quizzes. Describe how you would design such an experiment using the 400 students enrolled in an introductory psychology course as subjects.



CR13.2 The paper “Pistachio Nut Consumption and Serum Lipid Levels” (Journal of the American College of Nutrition [2007]: 141–148) describes a study to determine if eating pistachio nuts can have an effect on blood cholesterol levels in people with high cholesterol. Fifteen subjects followed their regular diet for 4 weeks and then followed a diet in which 15% of the daily caloric intake was from pistachio nuts for 4 weeks. Total blood cholesterol was measured for each subject at the end of each of the 2 four-week periods, resulting in two samples (one for the regular diet and one for the pistachio diet).
a. Are the two samples independent or paired? Explain.
b. The mean difference in total cholesterol (regular diet − pistachio diet) was 11 mg/dL. The standard deviation of the differences was 24 mg/dL. Assume that it is reasonable to regard the 15 study participants as representative of adults with high cholesterol and that total cholesterol differences are approximately normally distributed. Do the data support the claim that eating the pistachio diet for 4 weeks is effective in reducing total cholesterol level? Test the relevant hypotheses using $\alpha = .01$.

CR13.3 The article “Fines Show Airline Problems” (USA Today, February 2, 2010) gave the accompanying data on the number of fines for violating FAA maintenance regulations assessed against each of the 25 U.S. airlines from 2004 to 2009.
1 12 3 7 23 36 6 14 1 3
4 10 6 2 2 2 2 2 3 1
2 2 1 0 0
a. Construct a boxplot of these data. Are any of the observations in the data set outliers? If so, which ones?
b. Explain why it may not be reasonable to assume that the two airlines with the highest number of fines assessed are the worst airlines in terms of maintenance violations.
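A sketch of the outlier rule behind the boxplot in Exercise CR13.3. It assumes the quartiles are the medians of the lower and upper halves of the sorted data (omitting the overall median when n is odd) and the usual 1.5 × IQR and 3 × IQR cutoffs; other quartile conventions can shift the fences slightly. The helper name `quartiles` is illustrative.

```python
import statistics

fines = [1, 12, 3, 7, 23, 36, 6, 14, 1, 3,
         4, 10, 6, 2, 2, 2, 2, 2, 3, 1,
         2, 2, 1, 0, 0]


def quartiles(values):
    """Lower and upper quartiles as medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # for odd n, the middle value is left out
    upper = data[-half:]
    return statistics.median(lower), statistics.median(upper)


q1, q3 = quartiles(fines)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sorted(v for v in fines if v < low_fence or v > high_fence)
extreme = sorted(v for v in fines if v < q1 - 3 * iqr or v > q3 + 3 * iqr)
print(q1, q3, iqr, outliers, extreme)
```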



CR13.4 The article “Odds Are, It’s Wrong” (Science News, March 27, 2010) poses the following scenario:

Suppose that a test for steroid use among baseball

players is 95% accurate—that is, it correctly identifies actual steroid users 95% of the time, and misidentifies non-users as users 5 percent of the

time. . . . Now suppose, based on previous testing,

that experts have established that about 5 percent

of professional baseball players use steroids.

Answer the following questions for this scenario.
a. If 400 professional baseball players are selected at random, how many would you expect to be steroid users and how many would you expect to be non-users?
b. How many of the steroid users would you expect to test positive for steroid use?
c. How many of the players who do not use steroids would you expect to test positive for steroid use (a false positive)?
d. Use your answers to Parts (b) and (c) to estimate the proportion of those who test positive for steroid use who actually do use steroids.
e. Write a few sentences explaining why, in this scenario, the proportion of those who test positive for steroid use who actually use steroids is not .95.
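The expected-count reasoning in Exercise CR13.4 fits in a few lines. The sketch below (hypothetical function name) treats the 95% and 5% figures from the quoted scenario as exact rates.

```python
def positive_test_breakdown(n_players, prevalence, sensitivity, false_positive_rate):
    """Expected users, non-users, true and false positives, and P(user | positive)."""
    users = n_players * prevalence
    non_users = n_players - users
    true_pos = sensitivity * users             # users correctly flagged
    false_pos = false_positive_rate * non_users  # non-users wrongly flagged
    return users, non_users, true_pos, false_pos, true_pos / (true_pos + false_pos)


users, non_users, tp, fp, ppv = positive_test_breakdown(400, 0.05, 0.95, 0.05)
print(users, non_users, tp, fp, ppv)
```

Because non-users vastly outnumber users, the false positives from the large non-user group are as numerous as the true positives.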



CR13.5 The press release “Luxury or Necessity? The Public Makes a U-Turn” (Pew Research Center, April 23, 2009) summarizes results from a survey of a nationally representative sample of n = 1003 adult Americans.
a. One question in the survey asked participants if they think of a landline phone as a necessity or as a luxury that they could do without. Sixty-eight percent said they thought a landline phone was a necessity. Estimate the proportion of adult Americans who view a landline phone as a necessity using a 95% confidence interval.
b. In the same survey, 52% said they viewed a television set as a necessity. Is there convincing evidence that a majority of adult Americans view a television set as a necessity? Test the relevant hypotheses using $\alpha = .05$.






c. The press release also described a survey conducted in 2003. When asked about a microwave oven, 68% of the 2003 sample regarded a microwave oven as a necessity, whereas only 47% of the 2009 sample said they thought a microwave oven was a necessity. Assume that the sample size for the 2003 survey was also 1003. Is there convincing evidence that the proportion of adult Americans who regard a microwave oven as a necessity decreased between 2003 and 2009? Test the appropriate hypotheses using $\alpha = .01$.
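A sketch of the large-sample procedures for Exercise CR13.5 using Python's `statistics.NormalDist`. The helper names are illustrative, and the two-proportion test uses the pooled estimate of the common proportion under $H_0$.

```python
import math
from statistics import NormalDist


def one_prop_ci(p_hat, n, conf=0.95):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def one_prop_z(p_hat, n, p0):
    """z statistic for H0: p = p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def two_prop_z(p1, n1, p2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


lo, hi = one_prop_ci(0.68, 1003)               # part (a): landline phone
z_tv = one_prop_z(0.52, 1003, 0.5)             # part (b): H0 p = .5, Ha p > .5
p_tv = 1 - NormalDist().cdf(z_tv)
z_micro = two_prop_z(0.68, 1003, 0.47, 1003)   # part (c): 2003 minus 2009
p_micro = 1 - NormalDist().cdf(z_micro)        # upper-tailed P-value
print(round(lo, 3), round(hi, 3), round(z_tv, 3), round(p_tv, 4), round(z_micro, 2), p_micro)
```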



CR13.6 The accompanying graphical display appeared in USA Today (February 19, 2010). It is meant to be a pie chart, but an oval rather than a circle is used to represent the whole pie. Do you think this graph does a good job of conveying the proportion falling into each of the three response categories? Explain why or why not.

[Figure: USA TODAY Snapshots® oval "pie chart" titled "What nurses say about nursing care today," showing responses to "Since they entered the profession, care has:" Improved, Declined, or Remained the same, with segments labeled 26%, 19%, and 55%. Source: AMN Healthcare's 2010 Survey of Registered Nurses. By Anne R. Carey and Suzy Parker, USA TODAY]

CR13.7 The following quote describing 18- to 29-year-olds is from the article “Study: Millennial Generation More Educated, Less Employed” (USA Today, February 23, 2010): “38% have a tattoo (and half of those with tattoos have two to five; 18% have six or more).” These percentages were based on a representative sample of 830 Americans age 18 to 29, but for purposes of this exercise, suppose that they hold for the population of all Americans in this age group. Define the random variable x = number of tattoos for a randomly selected American age 18 to 29. Find the following probabilities:
a. $P(x = 0)$
b. $P(x = 1)$
c. $P(2 \le x \le 5)$
d. $P(x > 5)$

CR13.8 To raise revenues, many airlines now charge fees to check luggage. Suppose that the number of checked bags was recorded for each person in a random sample of 100 airline passengers selected before fees were imposed and also for each person in a random sample of 100 airline passengers selected after fees were imposed, resulting in the accompanying data. Do the data provide convincing evidence that the proportions in the number-of-checked-bags categories are not the same before and after fees were imposed? Test the appropriate hypotheses using a significance level of .05.

Number of Checked Bags   Before fees   After fees
0                        7             22
1                        70            64
2 or more                23            14

CR13.9 Consider the following data on y = number of songs stored on an MP3 player and x = number of months the user has owned the MP3 player for a sample of 15 owners of MP3 players.
x 23 35 2 28 5 32 23 10 4 26 1 8 13 9 5
y 486 747 81 581 117 728 445 128 61 476 35 121 266 126 141
a. Construct a scatterplot of the data. Does the relationship between x and y look approximately linear?
b. What is the equation of the estimated regression line?
c. Do you think that the assumptions of the simple linear regression model are reasonable? Justify your answer using appropriate graphs.
d. Is the simple linear regression model useful for describing the relationship between x and y? Test the relevant hypotheses using a significance level of .05.
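For Exercise CR13.8, a sketch of the chi-square test of homogeneity on the 3 × 2 table of checked bags. With (3 − 1)(2 − 1) = 2 df, the chi-square upper-tail area has the closed form $e^{-X^2/2}$, which avoids needing a chi-square table; that shortcut holds only for exactly 2 df.

```python
import math

# Observed counts from Exercise CR13.8: (before fees, after fees) per category.
observed = {
    "0": (7, 22),
    "1": (70, 64),
    "2 or more": (23, 14),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand = sum(col_totals)

chi_sq = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        expected = row_total * col_totals[j] / grand  # (row total)(column total)/n
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (2 - 1)
# Upper-tail area exp(-x/2) is valid only for a chi-square with exactly 2 df.
p_value = math.exp(-chi_sq / 2) if df == 2 else None
print(round(chi_sq, 3), df, round(p_value, 4))
```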




CR13.10 Many people take ginkgo supplements advertised to improve memory. Are these over-the-counter supplements effective? In a study reported in the paper “Ginkgo for Memory Enhancement” (Journal of the American Medical Association [2002]: 835–840), elderly adults were assigned at random to either a treatment group or a control group. The 104 participants who were assigned to the treatment group took 40 mg of ginkgo three times a day for 6 weeks. The 115 participants assigned to the control group took a placebo pill three times a day for 6 weeks. At the end of 6 weeks, the Wechsler Memory Scale (a test of short-term memory) was administered. Higher scores indicate better memory function. Summary values are given in the following table.

          n     $\bar{x}$   s
Ginkgo    104   5.6         .6
Placebo   115   5.5         .6

Based on these results, is there evidence that taking 40 mg of ginkgo three times a day is effective in increasing mean performance on the Wechsler Memory Scale? Test the relevant hypotheses using $\alpha = .05$.

CR13.11 The Harvard University Institute of Politics surveys undergraduates across the United States annually. Responses to the question "When it comes to voting, do you consider yourself to be affiliated with the Democratic Party, the Republican Party, or are you Independent or unaffiliated with a major party?" for the survey conducted in 2003, 2004, and 2005 are summarized in the given table. The samples for each year were independently selected and are considered to be representative of the population of undergraduate students in the year the survey was conducted. Is there evidence that the distribution of political affiliation is not the same for all three years for which data are given?

                            Political Affiliation
Year    Democrat    Republican    Independent/Unaffiliated    Other
2005      397          301                  458                 60
2004      409          349                  397                 48
2003      325          373                  457                 48

CR13.12 The survey described in the previous exercise also asked the following question: "Please tell me whether you trust the President to do the right thing all of the time, most of the time, some of the time, or never." Use the data in the given table and an appropriate hypothesis test to determine if there is evidence that trust in the President was not the same in 2005 as it was in 2002.

                          Year
Response             2005      2002
All of the time       132       180
Most of the time      337       528
Some of the time      554       396
Never                 169        96

CR13.13 The report "Undergraduate Students and Credit Cards in 2004" (Nellie Mae, May 2005) included information collected from individuals in a random sample of undergraduate students in the United States. Students were classified according to region of residence and whether or not they had one or more credit cards, resulting in the accompanying two-way table. Carry out a test to determine if there is evidence that region of residence and having a credit card are not independent. Use α = .05.

                        Credit Card?
Region       At Least One Credit Card    No Credit Cards
Northeast              401                     164
Midwest                162                      36
South                  408                     115
West                   104                      23
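CR13.11, CR13.12, and CR13.13 all use the same chi-square statistic for a two-way table. Each expected count is (row total)(column total)/(grand total). X² is the sum of (observed − expected)²/expected over all cells, with df = (number of rows − 1)(number of columns − 1).

The Python sketch below is one way to do this arithmetic using only the standard library. The function name `chi_square_two_way` is hypothetical. The critical value 7.815 (df = 3, α = .05) is copied from a standard chi-square table rather than computed. The sketch reports the smallest expected count so the large-sample condition (every expected count at least 5) can be checked, but it does not compute P-values.

```python
def chi_square_two_way(table):
    """Return (X^2 statistic, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count for each cell = (row total)(column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# CR13.11: rows are 2005, 2004, 2003; columns are Democrat, Republican,
# Independent/Unaffiliated, Other.
affiliation = [[397, 301, 458, 60], [409, 349, 397, 48], [325, 373, 457, 48]]

# CR13.12: rows are the four responses; columns are 2005 and 2002.
trust = [[132, 180], [337, 528], [554, 396], [169, 96]]

# CR13.13: rows are Northeast, Midwest, South, West; columns are
# "at least one credit card" and "no credit cards".
credit = [[401, 164], [162, 36], [408, 115], [104, 23]]

for name, table in [("CR13.11", affiliation), ("CR13.12", trust), ("CR13.13", credit)]:
    stat, df, expected = chi_square_two_way(table)
    smallest = min(min(row) for row in expected)
    print(f"{name}: X^2 = {stat:.2f}, df = {df}, smallest expected count = {smallest:.1f}")

# CR13.13 at alpha = .05: compare with the tabled critical value for df = 3.
stat, df, _ = chi_square_two_way(credit)
print("reject H0" if stat > 7.815 else "fail to reject H0")
```

For the CR13.13 data the statistic comes out to roughly 15.1 with df = 3, which exceeds 7.815. The same function applies to CR13.11 (df = 6) and CR13.12 (df = 3) once a significance level is chosen.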



CR13.14 The report described in the previous exercise also classified students according to region of residence and whether or not they had a credit card with a balance of more than $7000. Do these data support the conclusion that there is an association between region of residence and whether or not the student has a balance exceeding $7000? Test the relevant hypotheses using a .01 significance level.

              Balance Over $7000?
Region           No        Yes
Northeast       537         28
Midwest         182        162
South           481         42
West            118          9






Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.



Cumulative Review Exercises



CR13.15 The discharge of industrial wastewater into rivers affects water quality. To assess the effect of a particular power plant on water quality, 24 water specimens were taken at each of two locations, 16 km upstream and 4 km downstream of the plant. Alkalinity (mg/L) was determined for each specimen, resulting in the summary quantities in the accompanying table. Do the data suggest that the true mean alkalinity is higher downstream than upstream by more than 50 mg/L? Use a .05 significance level.

Location      n     Mean (mg/L)    Standard Deviation (mg/L)
Upstream     24        75.9                 1.83
Downstream   24       183.6                 1.70
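The alkalinity question is a two-sample t test with a nonzero hypothesized difference: H0: μ_down − μ_up = 50 versus Ha: μ_down − μ_up > 50.

The minimal Python sketch below assumes independent random samples from approximately normal populations. It computes the unpooled t statistic and the Welch-Satterthwaite degrees of freedom from the summary table. The function name `two_sample_t` is illustrative only.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff):
    """Unpooled two-sample t statistic for H0: mu1 - mu2 = hypothesized_diff,
    together with the Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2 - hypothesized_diff) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Population 1 = downstream, population 2 = upstream, so Ha: mu1 - mu2 > 50.
t, df = two_sample_t(183.6, 1.70, 24, 75.9, 1.83, 24, hypothesized_diff=50)
print(f"t = {t:.1f}, df = {df:.1f} (round down to {math.floor(df)} for a t table)")
```

With df rounded down to 45, the upper-tail critical value for α = .05 is about 1.68. A t value above 100 therefore leaves little doubt about the conclusion.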



CR13.16 The report of a European commission on radiation protection titled "Cosmic Radiation Exposure of Aircraft Crew" (2004) measured the exposure to radiation on eight international flights from Madrid using several different methods for measuring radiation. Data for two of the methods are given in the accompanying table. Use these data to test the hypothesis that there is no difference in mean radiation measurement for the two methods.

Flight    Method 1    Method 2
1           27.5        34.4
2           41.3        38.6
3            3.5         3.5
4           24.3        21.9
5           27.0        24.4
6           17.7        21.4
7           12.0        11.8
8           20.9        24.1
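Both methods were applied to the same eight flights, so the appropriate analysis is a paired t test on the differences d = Method 1 − Method 2. The test relies on the differences being approximately normal.

The sketch below uses Python's `statistics` module. The exercise does not state a significance level, so α = .05 is an assumption made for illustration. The two-sided critical value 2.365 for df = 7 comes from a t table.

```python
import math
import statistics

method1 = [27.5, 41.3, 3.5, 24.3, 27.0, 17.7, 12.0, 20.9]
method2 = [34.4, 38.6, 3.5, 21.9, 24.4, 21.4, 11.8, 24.1]

# Each flight was measured by both methods, so analyze the paired differences.
diffs = [a - b for a, b in zip(method1, method2)]
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1 divisor)
t = mean_d / (sd_d / math.sqrt(n))      # H0: mu_d = 0
df = n - 1
print(f"mean difference = {mean_d:.4f}, t = {t:.2f}, df = {df}")

# Two-sided test at an assumed alpha = .05: reject H0 if |t| > 2.365 (t table, df = 7).
print("reject H0" if abs(t) > 2.365 else "fail to reject H0")
```

The mean difference is about −0.74 and t is about −0.59, well inside ±2.365.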



CR13.17 It is hypothesized that when homing pigeons are disoriented in a certain manner, they will exhibit no preference for any direction of flight after takeoff. To test this, 120 pigeons are disoriented and released, and the direction of flight of each is recorded. The resulting data are given in the accompanying table.

Direction            Frequency
0° to <45°               12
45° to <90°              16
90° to <135°             17
135° to <180°            15
180° to <225°            13
225° to <270°            20
270° to <315°            17
315° to <360°            10

Use the goodness-of-fit test with significance level .10 to determine whether the data are consistent with this hypothesis.
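Under the no-preference hypothesis, each of the eight 45° sectors has probability 1/8. The expected count in every sector is therefore 120/8 = 15.

The Python sketch below computes the goodness-of-fit statistic X² = Σ(observed − expected)²/expected with k − 1 = 7 degrees of freedom. The critical value 12.017 for α = .10 is taken from a chi-square table.

```python
observed = [12, 16, 17, 15, 13, 20, 17, 10]   # counts for the eight 45-degree sectors
n = sum(observed)                               # 120 pigeons
k = len(observed)
expected = [n / k] * k                          # H0: no preferred direction -> 15 per sector

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1
print(f"X^2 = {stat:.2f}, df = {df}")

# Tabled chi-square critical value for df = 7 and alpha = .10 is 12.017.
print("reject H0" if stat > 12.017 else "fail to reject H0")
```

The squared deviations from 15 sum to 72, so X² = 72/15 = 4.8. This is well below 12.017.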



CR13.18 The authors of the paper "Inadequate Physician Knowledge of the Effects of Diet on Blood Lipids and Lipoproteins" (Nutrition Journal [2003]: 19–26) summarize the responses to a questionnaire on basic knowledge of nutrition that was mailed to 6000 physicians selected at random from a list of physicians licensed in the United States. Sixteen percent of those who received the questionnaire completed and returned it. The authors report that 26 of 120 cardiologists and 222 of 419 internists did not know that carbohydrate was the diet component most likely to raise triglycerides.

a. Estimate the difference between the proportion of cardiologists and the proportion of internists who did not know that carbohydrate was the diet component most likely to raise triglycerides using a 95% confidence interval.

b. What potential source of bias might limit your ability to generalize the estimate from Part (a) to the populations of all cardiologists and all internists?
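Part (a) calls for the large-sample confidence interval for a difference in proportions, (p̂1 − p̂2) ± z√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2).

The sketch below treats the respondents as independent random samples, which is the very assumption Part (b) asks you to question. The counts of successes and failures (26, 94, 222, and 197) are all at least 10, which is large enough for the normal approximation.

```python
import math

x1, n1 = 26, 120     # cardiologists who did not know
x2, n2 = 222, 419    # internists who did not know
p1, p2 = x1 / n1, x2 / n2

# Large-sample interval: (p1 - p2) +/- z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
z = 1.96  # z critical value for 95% confidence
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
low, high = (p1 - p2) - z * se, (p1 - p2) + z * se
print(f"p1 - p2 = {p1 - p2:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval is roughly (−0.401, −0.225). The proportion of cardiologists who did not know is therefore estimated to be between about .23 and .40 lower than the proportion of internists.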




CHAPTER 14

Multiple Regression Analysis



© David Zimmerman/Getty Images



The general objective of regression analysis is to model the relationship between a dependent variable y and one or more independent (i.e., predictor or explanatory) variables. The simple linear regression model y = α + βx + e, discussed in Chapter 13, has been used successfully by many investigators in a wide variety of disciplines to relate y to a single predictor variable x. In many situations, the relationship between y and any single predictor variable is not strong, but knowing the values of several independent variables may considerably reduce uncertainty about the associated y value. For example, some variation in house prices in a large city can certainly be attributed to house size, but knowledge of size by itself would not usually enable a bank appraiser to accurately predict a home's value. Price is also determined to some extent by other variables, such as age, lot size, number of bedrooms and bathrooms, and distance from schools.

In this chapter, we extend the regression methodology developed in the previous chapter to multiple regression models, which include at least two predictor variables.



Make the most of your study time by accessing everything you need to succeed online with CourseMate.

Visit http://www.cengagebrain.com where you will find:

• An interactive eBook, which allows you to take notes, highlight, bookmark, search the text, and use in-context glossary definitions
• Step-by-step instructions for Minitab, Excel, TI-83/84, SPSS, and JMP
• Video solutions to selected exercises
• Data sets available for selected examples and exercises
• Online quizzes
• Flashcards
• Videos


