
3 Fitting the Model: The Method of Least Squares

94 Chapter 3 Simple Linear Regression

Figure 3.2 Scatterplot for data in Table 3.1

Figure 3.3 Visual straight-line fit to data in Table 3.1 [graph of the line ỹ = −1 + x over x = 1 to 5; at the point (4, 2), the error of prediction (or residual) is y − ỹ = 2 − 3 = −1]

the deviations (i.e., the differences between the observed and the predicted values of y). These deviations, or errors of prediction, are the vertical distances between observed and predicted values of y (see Figure 3.3). The observed and predicted values of y, their differences, and their squared differences are shown in Table 3.2.

Note that the sum of the errors (SE) equals 0 and the sum of squares of the errors

(SSE), which gives greater emphasis to large deviations of the points from the line,

is equal to 2.

By shifting the ruler around the graph, we can ﬁnd many lines for which the sum

of the errors is equal to 0, but it can be shown that there is one (and only one) line

for which the SSE is a minimum. This line is called the least squares line, regression

line, or least squares prediction equation.

To ﬁnd the least squares line for a set of data, assume that we have a sample

of n data points that can be identiﬁed by corresponding values of x and y, say,

(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ). For example, the n = 5 data points shown in Table 3.2


Table 3.2 Comparing observed and predicted values for the visual model

x    y    Prediction        Error of prediction    Squared error
          ỹ = −1 + x        (y − ỹ)                (y − ỹ)²
1    1    0                 (1 − 0) =  1           1
2    1    1                 (1 − 1) =  0           0
3    2    2                 (2 − 2) =  0           0
4    2    3                 (2 − 3) = −1           1
5    4    4                 (4 − 4) =  0           0
                            Sum of errors (SE) = 0
                            Sum of squared errors (SSE) = 2

are (1, 1), (2, 1), (3, 2), (4, 2), and (5, 4). The straight-line model for the response y

in terms of x is

y = β0 + β1 x + ε

The line of means is

E(y) = β0 + β1 x

and the fitted line, which we hope to find, is represented as

ŷ = β̂0 + β̂1 x

The ‘‘hats’’ can be read as ‘‘estimator of.’’ Thus, ŷ is an estimator of the mean value of y, E(y), and a predictor of some future value of y; and β̂0 and β̂1 are estimators of β0 and β1, respectively.

For a given data point, say, (xi, yi), the observed value of y is yi and the predicted value of y is obtained by substituting xi into the prediction equation:

ŷi = β̂0 + β̂1 xi

The deviation of the ith value of y from its predicted value, called the ith residual, is

(yi − ŷi) = [yi − (β̂0 + β̂1 xi)]

Then the sum of squares of the deviations of the y-values about their predicted values (i.e., the sum of squares of residuals) for all of the n data points is

SSE = Σ (i = 1 to n) [yi − (β̂0 + β̂1 xi)]²

The quantities β̂0 and β̂1 that make the SSE a minimum are called the least squares estimates of the population parameters β0 and β1, and the prediction equation ŷ = β̂0 + β̂1 x is called the least squares line.

Deﬁnition 3.1 The least squares line is one that satisﬁes the following two

properties:

1. SE = Σ(yi − ŷi) = 0; i.e., the sum of the residuals is 0.

2. SSE = Σ(yi − ŷi)²; i.e., the sum of squared errors is smaller than for any other straight-line model with SE = 0.


The values of β̂0 and β̂1 that minimize the SSE are given by the formulas in the box.∗

Formulas for the Least Squares Estimates

Slope: β̂1 = SSxy / SSxx

y-intercept: β̂0 = ȳ − β̂1 x̄

where

SSxy = Σ (i = 1 to n) (xi − x̄)(yi − ȳ) = Σ xi yi − n x̄ ȳ

SSxx = Σ (i = 1 to n) (xi − x̄)² = Σ xi² − n(x̄)²

n = Sample size
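As a numerical check, the box formulas can be applied directly to the five advertising–sales data points; this short Python sketch is illustrative and not part of the text:

```python
# Advertising-sales data from Table 3.1 (x = advertising, y = sales)
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

xbar = sum(x) / n  # x-bar = 3
ybar = sum(y) / n  # y-bar = 2

# SSxy = sum(x*y) - n*xbar*ybar and SSxx = sum(x^2) - n*xbar^2, as in the box
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar  # 37 - 30 = 7
ss_xx = sum(xi ** 2 for xi in x) - n * xbar ** 2                # 55 - 45 = 10

beta1_hat = ss_xy / ss_xx            # slope = .7
beta0_hat = ybar - beta1_hat * xbar  # y-intercept = -.1
print(beta1_hat, round(beta0_hat, 10))
```

The intermediate sums (37, 55) match the totals computed in Table 3.3 below.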

Table 3.3 Preliminary computations for the advertising–sales example

xi      yi      xi²     xi yi
1       1       1       1
2       1       4       2
3       2       9       6
4       2       16      8
5       4       25      20
Totals: Σxi = 15   Σyi = 10   Σxi² = 55   Σxi yi = 37
Means:  x̄ = 3     ȳ = 2

Preliminary computations for ﬁnding the least squares line for the advertising–sales example are given in Table 3.3. We can now calculate.†

SSxy = Σ xi yi − n x̄ ȳ = 37 − 5(3)(2) = 37 − 30 = 7

SSxx = Σ xi² − n(x̄)² = 55 − 5(3)² = 55 − 45 = 10

Then, the slope of the least squares line is

β̂1 = SSxy / SSxx = 7/10 = .7

∗ Students who are familiar with calculus should note that the values of β0 and β1 that minimize SSE = Σ(yi − ŷi)² are obtained by setting the two partial derivatives ∂SSE/∂β0 and ∂SSE/∂β1 equal to 0. The solutions to these two equations yield the formulas shown in the box. (The complete derivation is provided in Appendix A.) Furthermore, we denote the sample solutions to the equations by β̂0 and β̂1, where the ‘‘∧’’ (hat) denotes that these are sample estimates of the true population intercept β0 and slope β1.

† Since summations are used extensively from this point on, we omit the limits on Σ when the summation includes all the measurements in the sample (i.e., when the summation is Σ from i = 1 to n, we simply write Σ).


and the y-intercept is

β̂0 = ȳ − β̂1 x̄ = 2 − (.7)(3) = 2 − 2.1 = −.1

The least squares line is then

ŷ = β̂0 + β̂1 x = −.1 + .7x

The graph of this line is shown in Figure 3.4.

Figure 3.4 Plot of the least squares line ŷ = −.1 + .7x

The observed and predicted values of y, the deviations of the y-values about

their predicted values, and the squares of these deviations are shown in Table 3.4.

Note that the sum of squares of the deviations, SSE, is 1.10, and (as we would expect)

this is less than the SSE = 2.0 obtained in Table 3.2 for the visually ﬁtted line.

Table 3.4 Comparing observed and predicted values for the least squares model

x    y    Predicted          Residual (error)     Squared error
          ŷ = −.1 + .7x      (y − ŷ)              (y − ŷ)²
1    1    .6                 (1 − .6)  =  .4      .16
2    1    1.3                (1 − 1.3) = −.3      .09
3    2    2.0                (2 − 2.0) =  0       .00
4    2    2.7                (2 − 2.7) = −.7      .49
5    4    3.4                (4 − 3.4) =  .6      .36
                             Sum of errors (SE) = 0
                             SSE = 1.10
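The SSE values for the two lines can be reproduced with a short function; this is an illustrative sketch, not part of the text:

```python
# Data from Table 3.1
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def sse(b0, b1):
    # Sum of squared deviations of the observed y-values about the line b0 + b1*x
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(round(sse(-1.0, 1.0), 2))  # visual line of Table 3.2:        2.0
print(round(sse(-0.1, 0.7), 2))  # least squares line of Table 3.4: 1.1
```

No choice of b0 and b1 can produce an SSE below the least squares value, which is the sense in which the least squares line is best-fitting.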

The calculations required to obtain β̂0, β̂1, and SSE in simple linear regression, although straightforward, can become rather tedious. Even with the use of a calculator, the process is laborious and susceptible to error, especially when the sample size is large. Fortunately, the use of statistical computer software can significantly reduce the labor involved in regression calculations. The SAS, SPSS, and MINITAB outputs for the simple linear regression of the data in Table 3.1 are displayed in Figure 3.5a–c. The values of β̂0 and β̂1 are highlighted on the printouts. These values, β̂0 = −.1 and β̂1 = .7, agree exactly with our hand-calculated values. The value of SSE = 1.10 is also highlighted on the printouts.

Whether you use a calculator or a computer, it is important that you be

able to interpret the intercept and slope in terms of the data being utilized to ﬁt

the model.

Figure 3.5a SAS printout for advertising–sales regression

Figure 3.5b SPSS printout for advertising–sales regression


Figure 3.5c MINITAB printout for advertising–sales regression

In the advertising–sales example, our interpretation of the least squares slope, β̂1 = .7, is that the mean of sales revenue y will increase .7 unit for every 1-unit increase in advertising expenditure x. Since y is measured in units of $1,000 and x in units of $100, our interpretation is that mean monthly sales revenue increases $700 for every $100 increase in monthly advertising expenditure. (We will attach a measure of reliability to this inference in Section 3.6.)
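The slope interpretation can be made concrete by evaluating the prediction equation at two adjacent x-values; the function name below is illustrative, not from the text:

```python
def y_hat(x):
    # Least squares prediction equation: y-hat = -.1 + .7x
    # (y in units of $1,000; x in units of $100)
    return -0.1 + 0.7 * x

# A 1-unit ($100) increase in advertising raises estimated mean sales
# by exactly the slope, .7 unit ($700), regardless of the starting x
print(round(y_hat(4) - y_hat(3), 10))  # 0.7
```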

The least squares intercept, β̂0 = −.1, is our estimate of mean sales revenue y when advertising expenditure is set at x = $0. Since sales revenue can never be negative, why does such a nonsensical result occur? The reason is that we are attempting to use the least squares model to predict y for a value of x (x = 0) that is outside the range of the sample data and therefore impractical. (We have more to say about predicting outside the range of the sample data—called extrapolation—in Section 3.9.) Consequently, β̂0 will not always have a practical interpretation. Only when x = 0 is within the range of the x-values in the sample and is a practical value will β̂0 have a meaningful interpretation.

Even when the interpretations of the estimated parameters are meaningful,

we need to remember that they are only estimates based on the sample. As such,

their values will typically change in repeated sampling. How much conﬁdence do

we have that the estimated slope, β̂1, accurately approximates the true slope, β1?

This requires statistical inference, in the form of conﬁdence intervals and tests of

hypotheses, which we address in Section 3.6.

To summarize, we have deﬁned the best-ﬁtting straight line to be the one that

satisﬁes the least squares criterion; that is, the sum of the squared errors will be

smaller than for any other straight-line model. This line is called the least squares

line, and its equation is called the least squares prediction equation. In subsequent

sections, we show how to make statistical inferences about the model.

3.3 Exercises

3.6 Learning the mechanics. Use the method of least squares to fit a straight line to these six data points:

EX3_6
x: 1  2  3  4  5  6
y: 1  2  2  3  5  5

(a) What are the least squares estimates of β0 and β1?
(b) Plot the data points and graph the least squares line on the scatterplot.

3.7 Learning the mechanics. Use the method of least squares to fit a straight line to these five data points:

EX3_7
x: −2  −1  0  1   2
y:  4   3  3  1  −1

(a) What are the least squares estimates of β0 and β1?
(b) Plot the data points and graph the least squares line on the scatterplot.

TAMPALMS
PROPERTY   MARKET VALUE (THOUS.)   SALE PRICE (THOUS.)
1          $184.44                 $382.0
2          191.00                  230.0
3          159.83                  220.0
4          189.22                  277.0
5          151.61                  205.0
...
72         263.40                  325.0
73         194.58                  252.0
74         219.15                  270.0
75         322.67                  305.0
76         325.96                  450.0

3.8 Predicting home sales price. Real estate investors,

homebuyers, and homeowners often use the

appraised (or market) value of a property as a

basis for predicting sale price. Data on sale prices

and total appraised values of 76 residential properties sold in 2008 in an upscale Tampa, Florida,

neighborhood named Tampa Palms are saved in

the TAMPALMS ﬁle. The ﬁrst ﬁve and last ﬁve

observations of the data set are listed in the accompanying table.

(a) Propose a straight-line model to relate the

appraised property value x to the sale price y

for residential properties in this neighborhood.

Source: Hillsborough County (Florida) Property Appraiser’s Ofﬁce.

MINITAB Output for Exercise 3.8


(b) A MINITAB scatterplot of the data is shown

on the previous page. [Note: Both sale price

and total market value are shown in thousands

of dollars.] Does it appear that a straight-line

model will be an appropriate ﬁt to the data?

(c) A MINITAB simple linear regression printout is also shown (p. 100). Find the equation

of the best-ﬁtting line through the data on

the printout.

(d) Interpret the y-intercept of the least squares

line. Does it have a practical meaning for this

application? Explain.

(e) Interpret the slope of the least squares line.

Over what range of x is the interpretation

meaningful?

(f) Use the least squares model to estimate

the mean sale price of a property appraised

at $300,000.

3.9 Quantitative models of music. Writing in Chance

(Fall 2004), University of Konstanz (Germany)

statistics professor Jan Beran demonstrated that

certain aspects of music can be described by

quantitative models. For example, the information content of a musical composition (called

entropy) can be quantiﬁed by determining how

many times a certain pitch occurs. In a sample

of 147 famous compositions ranging from the

13th to the 20th century, Beran computed

the Z12-note entropy (y) and plotted it against

the year of birth (x) of the composer. The graph is

reproduced below.

[Graph: Z12-note entropy (y, ranging from about 1.8 to 2.4) plotted against composer's date of birth (x, from 1200 to 1800)]

(a) Do you observe a trend, especially since the

year 1400?

(b) The least squares line for the data since year

1400 is shown on the graph. Is the slope of

the line positive or negative? What does this

imply?


(c) Explain why the line shown is not the true line

of means.

3.10 Wind turbine blade stress.

Mechanical engineers at the University of Newcastle (Australia)

investigated the use of timber in high-efﬁciency

small wind turbine blades (Wind Engineering,

January 2004). The strengths of two types of

timber—radiata pine and hoop pine—were compared. Twenty specimens (called ‘‘coupons’’) of

each timber blade were fatigue tested by measuring the stress (in MPa) on the blade after various

numbers of blade cycles. A simple linear regression

analysis of the data—one conducted for each type

of timber—yielded the following results (where

y = stress and x = natural logarithm of number of

cycles):

Radiata Pine:  ŷ = 97.37 − 2.50x
Hoop Pine:     ŷ = 122.03 − 2.36x

(a) Interpret the estimated slope of each line.
(b) Interpret the estimated y-intercept of each line.

(c) Based on these results, which type of timber

blade appears to be stronger and more fatigue

resistant? Explain.

3.11 In business, do nice guys ﬁnish ﬁrst or last? In

baseball, there is an old saying that ‘‘nice guys

ﬁnish last.’’ Is this true in the business world?

Researchers at Harvard University attempted to

answer this question and reported their results in

Nature (March 20, 2008). In the study, Boston-area

college students repeatedly played a version of the

game ‘‘prisoner’s dilemma,’’ where competitors

choose cooperation, defection, or costly punishment. (Cooperation meant paying 1 unit for the

opponent to receive 2 units; defection meant gaining 1 unit at a cost of 1 unit for the opponent; and

punishment meant paying 1 unit for the opponent

to lose 4 units.) At the conclusion of the games,

the researchers recorded the average payoff and

the number of times cooperation, defection, and

punishment were used for each player. The scattergrams (p. 102) plot average payoff (y) against level

of cooperation use, defection use, and punishment

use, respectively.

(a) Consider cooperation use (x) as a predictor of

average payoff (y). Based on the scattergram,

is there evidence of a linear trend?

(b) Consider defection use (x) as a predictor of

average payoff (y). Based on the scattergram,

is there evidence of a linear trend?

(c) Consider punishment use (x) as a predictor of

average payoff (y). Based on the scattergram,

is there evidence of a linear trend?

102 Chapter 3 Simple Linear Regression

(d) Refer to part c. Is the slope of the line relating punishment use (x) to average payoff (y)

positive or negative?

(e) The researchers concluded that ‘‘winners don’t

punish.’’ Do you agree? Explain.

BLACKBREAM
WEEK   NUMBER OF STRIKES   AGE OF FISH (days)
1      85                  120
2      63                  136
3      34                  150
4      39                  155
5      58                  162
6      35                  169
7      57                  178
8      12                  184
9      15                  190

Source: Shand, J., et al. ‘‘Variability in the location of the retinal ganglion cell area centralis is correlated with ontogenetic changes in feeding behavior in the black bream, Acanthopagrus ‘butcheri’,’’ Brain, Behavior and Evolution, Vol. 55, No. 4, Apr. 2000 (Figure H).

3.13 Sweetness of orange juice. The quality of the orange juice produced by a manufacturer (e.g., Minute Maid, Tropicana) is constantly monitored. There are numerous sensory and chemical components that combine to make the best-tasting orange juice. For example, one manufacturer has developed a quantitative index of the ‘‘sweetness’’ of orange juice. (The higher the index, the


3.12 Feeding behavior of blackbream fish. In Brain, Behavior and Evolution (April 2000), zoologists conducted a study of the feeding behavior of blackbream fish. The zoologists recorded the number of aggressive strikes of two blackbream fish feeding at the bottom of an aquarium in the 10-minute period following the addition of food. The next table lists the weekly number of strikes and age of the fish (in days).

(a) Write the equation of a straight-line model

relating number of strikes (y) to age of ﬁsh (x).

(b) Fit the model to the data using the method

of least squares and give the least squares

prediction equation.

(c) Give a practical interpretation of the value of β̂0, if possible.
(d) Give a practical interpretation of the value of β̂1, if possible.

OJUICE
RUN   SWEETNESS INDEX   PECTIN (ppm)
1     5.2               220
2     5.5               227
3     6.0               259
4     5.9               210
5     5.8               224
6     6.0               215
7     5.8               231
8     5.6               268
9     5.6               239
10    5.9               212
11    5.4               410
12    5.6               256
13    5.8               306
14    5.5               259
15    5.3               284
16    5.3               383
17    5.7               271
18    5.5               264
19    5.7               227
20    5.3               263
21    5.9               232
22    5.8               220
23    5.8               246
24    5.9               241

Note: The data in the table are authentic. For conﬁdentiality reasons, the manufacturer cannot be disclosed.


name and the name of the ﬁrst student, the third

student to say his/her name and the names of the

ﬁrst two students, and so on. After making their

introductions, the students listened to a seminar

speaker for 30 minutes. At the end of the seminar, all students were asked to remember the full

name of each of the other students in their group

and the researchers measured the proportion of

names recalled for each. One goal of the study was

to investigate the linear trend between y = recall

proportion and x = position (order) of the student

during the game. The data (simulated based on

summary statistics provided in the research article) for 144 students in the ﬁrst eight positions

are saved in the NAMEGAME2 ﬁle. The ﬁrst ﬁve

and last ﬁve observations in the data set are listed

in the table. [Note: Since the student in position

1 actually must recall the names of all the other

students, he or she is assigned position number 9

in the data set.] Use the method of least squares to

estimate the line, E(y) = β0 + β1 x. Interpret the β

estimates in the words of the problem.

sweeter the juice.) Is there a relationship between the sweetness index and a chemical measure

such as the amount of water-soluble pectin (parts

per million) in the orange juice? Data collected

on these two variables for 24 production runs at a

juice manufacturing plant are shown in the table

on p. 102. Suppose a manufacturer wants to use

simple linear regression to predict the sweetness

(y) from the amount of pectin (x).

(a) Find the least squares line for the data.

(b) Interpret βˆ0 and βˆ1 in the words of

the problem.

(c) Predict the sweetness index if amount of

pectin in the orange juice is 300 ppm. [Note:

A measure of reliability of such a prediction is

discussed in Section 3.9.]

3.14 Extending the life of an aluminum smelter pot. An

investigation of the properties of bricks used to line

aluminum smelter pots was published in the American Ceramic Society Bulletin (February 2005). Six

different commercial bricks were evaluated. The

life length of a smelter pot depends on the porosity

of the brick lining (the less porosity, the longer the

life); consequently, the researchers measured the

apparent porosity of each brick specimen, as well

as the mean pore diameter of each brick. The data are given in the accompanying table.

SMELTPOT
BRICK   APPARENT POROSITY (%)   MEAN PORE DIAMETER (micrometers)
A       18.0                    12.0
B       18.3                    9.7
C       16.3                    7.3
D       6.9                     5.3
E       17.1                    10.9
F       20.4                    16.8

Source: Bonadia, P., et al. ‘‘Aluminosilicate refractories

for aluminum cell linings,’’ American Ceramic Society

Bulletin, Vol. 84, No. 2, Feb. 2005 (Table II).

(a) Find the least squares line relating porosity

(y) to mean pore diameter (x).

(b) Interpret the y-intercept of the line.

(c) Interpret the slope of the line.

(d) Predict the apparent porosity percentage

for a brick with a mean pore diameter of

10 micrometers.

3.15 Recalling names of students. The Journal of

Experimental Psychology—Applied (June 2000)

published a study in which the ‘‘name game’’ was

used to help groups of students learn the names

of other students in the group. The ‘‘name game’’

requires the ﬁrst student in the group to state

his/her full name, the second student to say his/her


NAMEGAME2
POSITION   RECALL
2          0.04
2          0.37
2          1.00
2          0.99
2          0.79
...
9          0.72
9          0.88
9          0.46
9          0.54
9          0.99

Source: Morris, P.E., and Fritz, C.O. ‘‘The name game:

Using retrieval practice to improve the learning of

names,’’ Journal of Experimental Psychology—Applied,

Vol. 6, No. 2, June 2000 (data simulated from Figure 2).

Copyright © 2000 American Psychological Association,

reprinted with permission.

3.16 Spreading rate of spilled liquid. A contract engineer at DuPont Corp. studied the rate at which

a spilled volatile liquid will spread across a surface (Chemical Engineering Progress, January

2005). Assume 50 gallons of methanol spills onto a

level surface outdoors. The engineer used derived

empirical formulas (assuming a state of turbulent

free convection) to calculate the mass (in pounds)

of the spill after a period of time ranging from 0 to

60 minutes. The calculated mass values are given in

the next table. Do the data indicate that the mass

of the spill tends to diminish as time increases? If

so, how much will the mass diminish each minute?


LIQUIDSPILL
TIME (minutes)   MASS (pounds)
0                6.64
1                6.34
2                6.04
4                5.47
6                4.94
8                4.44
10               3.98
12               3.55
14               3.15
16               2.79
18               2.45
20               2.14
22               1.86
24               1.60
26               1.37
28               1.17
30               0.98
35               0.60
40               0.34
45               0.17
50               0.06
55               0.02
60               0.00

Source: Barry, J. ‘‘Estimating rates of spreading and evaporation of volatile

liquids,’’ Chemical Engineering Progress, Vol. 101, No. 1, Jan. 2005.

3.4 Model Assumptions

In the advertising–sales example presented in Section 3.3, we assumed that the

probabilistic model relating the ﬁrm’s sales revenue y to advertising dollars x is

y = β0 + β1 x + ε

Recall that the least squares estimate of the deterministic component of the model

β0 + β1 x is

yˆ = βˆ0 + βˆ1 x = −.1 + .7x

Now we turn our attention to the random component ε of the probabilistic model

and its relation to the errors of estimating β0 and β1 . In particular, we will see how

the probability distribution of ε determines how well the model describes the true

relationship between the dependent variable y and the independent variable x.

We make four basic assumptions about the general form of the probability

distribution of ε:

Assumption 1 The mean of the probability distribution of ε is 0. That is, the

average of the errors over an inﬁnitely long series of experiments is 0 for each

setting of the independent variable x. This assumption implies that the mean value

of y, E(y), for a given value of x is E(y) = β0 + β1 x.

Assumption 2 The variance of the probability distribution of ε is constant for all settings of the independent variable x. For our straight-line model, this assumption means that the variance of ε is equal to a constant, say, σ², for all values of x.

Assumption 3 The probability distribution of ε is normal.

Assumption 4 The errors associated with any two different observations are independent. That is, the error associated with one value of y has no effect on the errors associated with other y values.
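As an aside, the four assumptions can be mimicked in a small simulation; everything here (the parameter values, the use of Python's random module) is illustrative and not from the text:

```python
import random

random.seed(1)
beta0, beta1, sigma = -0.1, 0.7, 0.5  # hypothetical parameter values

def simulate_y(x):
    # Assumptions 1-4: each error is an independent draw from a normal
    # distribution with mean 0 and the same standard deviation for every x
    eps = random.gauss(0, sigma)
    return beta0 + beta1 * x + eps

# Assumption 1 implies the average error at a fixed x is near 0, so the
# sample mean of y at x = 3 is near E(y) = beta0 + beta1 * 3
errors = [simulate_y(3) - (beta0 + beta1 * 3) for _ in range(100_000)]
print(abs(sum(errors) / len(errors)) < 0.01)
```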

The implications of the ﬁrst three assumptions can be seen in Figure 3.6, which

shows distributions of errors for three particular values of x, namely, x1 , x2 , and x3 .

Note that the relative frequency distributions of the errors are normal, with a mean of 0, and a constant variance σ² (all the distributions shown have the same amount

Source: William Mendenhall, A Second Course in Statistics: Regression Analysis, 7th ed., Prentice Hall, 2011.
