
# 3 Fitting the Model: The Method of Least Squares


94 Chapter 3 Simple Linear Regression

Figure 3.2 Scatterplot for data in Table 3.1

Figure 3.3 Visual straight-line fit to data in Table 3.1 [graph of the line ỹ = −1 + x through the five data points; the error of prediction (or residual) at x = 4 is marked as y − ỹ = 2 − 3 = −1]
the deviations (i.e., the differences between the observed and the predicted values
of y). These deviations, or errors of prediction, are the vertical distances between
observed and predicted values of y (see Figure 3.3). The observed and predicted
values of y, their differences, and their squared differences are shown in Table 3.2.
Note that the sum of the errors (SE) equals 0 and the sum of squares of the errors
(SSE), which gives greater emphasis to large deviations of the points from the line,
is equal to 2.
By shifting the ruler around the graph, we can ﬁnd many lines for which the sum
of the errors is equal to 0, but it can be shown that there is one (and only one) line
for which the SSE is a minimum. This line is called the least squares line, regression
line, or least squares prediction equation.
To ﬁnd the least squares line for a set of data, assume that we have a sample
of n data points that can be identiﬁed by corresponding values of x and y, say,
(x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ). For example, the n = 5 data points shown in Table 3.2


Table 3.2 Comparing observed and predicted values for the visual model

| x | y | Prediction ỹ = −1 + x | Error of prediction (y − ỹ) | Squared error (y − ỹ)² |
|---|---|---|---|---|
| 1 | 1 | 0 | (1 − 0) = 1 | 1 |
| 2 | 1 | 1 | (1 − 1) = 0 | 0 |
| 3 | 2 | 2 | (2 − 2) = 0 | 0 |
| 4 | 2 | 3 | (2 − 3) = −1 | 1 |
| 5 | 4 | 4 | (4 − 4) = 0 | 0 |

Sum of errors (SE) = 0    Sum of squared errors (SSE) = 2

are (1, 1), (2, 1), (3, 2), (4, 2), and (5, 4). The straight-line model for the response y
in terms of x is
y = β0 + β1 x + ε
The line of means is
E(y) = β0 + β1 x
and the ﬁtted line, which we hope to ﬁnd, is represented as
yˆ = βˆ0 + βˆ1 x
The ‘‘hats’’ can be read as ‘‘estimator of.’’ Thus, yˆ is an estimator of the mean value
of y, E(y), and a predictor of some future value of y; and βˆ0 and βˆ1 are estimators
of β0 and β1 , respectively.
For a given data point, say, (xi , yi ), the observed value of y is yi and the predicted
value of y is obtained by substituting xi into the prediction equation:
yˆ i = βˆ0 + βˆ1 xi
The deviation of the ith value of y from its predicted value, called the ith residual, is
(yi − yˆ i ) = [yi − (βˆ0 + βˆ1 xi )]
Then the sum of squares of the deviations of the y-values about their predicted
values (i.e., the sum of squares of residuals) for all of the n data points is
SSE = Σ[yi − (β̂0 + β̂1xi)]²  (summed over i = 1, 2, …, n)

The quantities βˆ0 and βˆ1 that make the SSE a minimum are called the least
squares estimates of the population parameters β0 and β1 , and the prediction
equation yˆ = βˆ0 + βˆ1 x is called the least squares line.

Deﬁnition 3.1 The least squares line is one that satisﬁes the following two
properties:
1. SE = Σ(yi − ŷi) = 0; i.e., the sum of the residuals is 0.
2. SSE = Σ(yi − ŷi)²; i.e., the sum of squared errors is smaller than for any
other straight-line model with SE = 0.

The values of βˆ0 and βˆ1 that minimize the SSE are given by the formulas in
the box.∗

Formulas for the Least Squares Estimates

Slope: β̂1 = SSxy / SSxx

y-intercept: β̂0 = ȳ − β̂1x̄

where

SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − nx̄ȳ

SSxx = Σ(xi − x̄)² = Σxi² − n(x̄)²

(all sums run from i = 1 to n)

n = Sample size
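The formulas in the box are easy to verify numerically. The following short Python sketch (our illustration, not part of the text's examples) applies them to the five advertising–sales data points:

```python
# Least squares estimates via the box formulas, using the n = 5
# advertising-sales data points (1,1), (2,1), (3,2), (4,2), (5,4).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)

x_bar = sum(x) / n  # sample mean of x: 3.0
y_bar = sum(y) / n  # sample mean of y: 2.0

# SSxy = sum(xi*yi) - n*x_bar*y_bar
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
# SSxx = sum(xi^2) - n*x_bar^2
ss_xx = sum(xi * xi for xi in x) - n * x_bar ** 2

beta1_hat = ss_xy / ss_xx              # slope: 7/10 = .7
beta0_hat = y_bar - beta1_hat * x_bar  # y-intercept: 2 - (.7)(3) = -.1

print(beta1_hat, beta0_hat)
```

Running the sketch reproduces the hand calculations that follow: SSxy = 7, SSxx = 10, β̂1 = .7, and β̂0 = −.1 (up to floating-point rounding).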

Table 3.3 Preliminary computations for the advertising–sales example

| | xi | yi | xi² | xiyi |
|---|---|---|---|---|
| | 1 | 1 | 1 | 1 |
| | 2 | 1 | 4 | 2 |
| | 3 | 2 | 9 | 6 |
| | 4 | 2 | 16 | 8 |
| | 5 | 4 | 25 | 20 |
| Totals: | Σxi = 15 | Σyi = 10 | Σxi² = 55 | Σxiyi = 37 |

Means: x̄ = 3, ȳ = 2

Preliminary computations for finding the least squares line for the advertising–sales example are given in Table 3.3. We can now calculate†

SSxy = Σxiyi − nx̄ȳ = 37 − 5(3)(2) = 37 − 30 = 7

SSxx = Σxi² − n(x̄)² = 55 − 5(3)² = 55 − 45 = 10

Then, the slope of the least squares line is

β̂1 = SSxy/SSxx = 7/10 = .7

∗ Students who are familiar with calculus should note that the values of β0 and β1 that minimize SSE = Σ(yi − ŷi)² are obtained by setting the two partial derivatives ∂SSE/∂β0 and ∂SSE/∂β1 equal to 0. The solutions to these two equations yield the formulas shown in the box. (The complete derivation is provided in Appendix A.) Furthermore, we denote the sample solutions to the equations by β̂0 and β̂1, where the ‘‘∧’’ (hat) denotes that these are sample estimates of the true population intercept β0 and slope β1.
† Since summations are used extensively from this point on, we omit the limits on Σ when the summation includes all the measurements in the sample (i.e., when the summation runs over i = 1 to n, we simply write Σ).
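For readers who want the calculus step in the first footnote made explicit, setting the two partial derivatives to zero produces a pair of linear equations in β̂0 and β̂1 (the standard ‘‘normal equations’’); solving them gives the formulas in the box. A sketch of the derivation:

```latex
\begin{aligned}
\frac{\partial\,\mathrm{SSE}}{\partial\hat\beta_0}
  &= -2\sum_{i=1}^{n}\bigl[y_i-(\hat\beta_0+\hat\beta_1 x_i)\bigr]=0
  &&\Longrightarrow\quad n\hat\beta_0+\hat\beta_1\sum x_i=\sum y_i \\[4pt]
\frac{\partial\,\mathrm{SSE}}{\partial\hat\beta_1}
  &= -2\sum_{i=1}^{n} x_i\bigl[y_i-(\hat\beta_0+\hat\beta_1 x_i)\bigr]=0
  &&\Longrightarrow\quad \hat\beta_0\sum x_i+\hat\beta_1\sum x_i^2=\sum x_i y_i
\end{aligned}
```

Dividing the first equation by n gives β̂0 = ȳ − β̂1x̄; substituting this into the second and simplifying yields β̂1 = SSxy/SSxx.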


and the y-intercept is
βˆ0 = y¯ − βˆ1 x¯
= 2 − (.7)(3) = 2 − 2.1 = −.1
The least squares line is then
yˆ = βˆ0 + βˆ1 x = −.1 + .7x
The graph of this line is shown in Figure 3.4.

Figure 3.4 Plot of the least
squares line yˆ = −.1 + .7x

The observed and predicted values of y, the deviations of the y-values about
their predicted values, and the squares of these deviations are shown in Table 3.4.
Note that the sum of squares of the deviations, SSE, is 1.10, and (as we would expect)
this is less than the SSE = 2.0 obtained in Table 3.2 for the visually ﬁtted line.

Table 3.4 Comparing observed and predicted values for the least squares model

| x | y | Predicted ŷ = −.1 + .7x | Residual (error) (y − ŷ) | Squared error (y − ŷ)² |
|---|---|---|---|---|
| 1 | 1 | .6 | (1 − .6) = .4 | .16 |
| 2 | 1 | 1.3 | (1 − 1.3) = −.3 | .09 |
| 3 | 2 | 2.0 | (2 − 2.0) = 0 | .00 |
| 4 | 2 | 2.7 | (2 − 2.7) = −.7 | .49 |
| 5 | 4 | 3.4 | (4 − 3.4) = .6 | .36 |

Sum of errors (SE) = 0    SSE = 1.10
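The SSE comparison between Table 3.4 and Table 3.2 can be checked in a few lines of Python (a sketch added for illustration; the helper name `sse` is ours):

```python
# Compare SSE for the least squares line (yhat = -.1 + .7x)
# with SSE for the visually fitted line (ytilde = -1 + x).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

def sse(b0, b1):
    """Sum of squared prediction errors for the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

sse_least_squares = sse(-0.1, 0.7)  # 1.10, as in Table 3.4
sse_visual = sse(-1.0, 1.0)         # 2, as in Table 3.2
print(sse_least_squares, sse_visual)
```

As expected, the least squares line's SSE (1.10) is smaller than the visual line's (2.0); no other straight line can do better than 1.10 on these data.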

The calculations required to obtain β̂0, β̂1, and SSE in simple linear regression,
although straightforward, can become rather tedious. Even with the use of a
calculator, the process is laborious and susceptible to error, especially when the
sample size is large. Fortunately, the use of statistical computer software can
signiﬁcantly reduce the labor involved in regression calculations. The SAS, SPSS,
and MINITAB outputs for the simple linear regression of the data in Table 3.1 are
displayed in Figure 3.5a–c. The values of βˆ0 and βˆ1 are highlighted on the printouts.
These values, βˆ0 = −.1 and βˆ1 = .7, agree exactly with our hand-calculated values.
The value of SSE = 1.10 is also highlighted on the printouts.
Whether you use a calculator or a computer, it is important that you be
able to interpret the intercept and slope in terms of the data being utilized to ﬁt
the model.

Figure 3.5a SAS printout for the regression

Figure 3.5b SPSS printout for the regression


Figure 3.5c MINITAB printout for the regression

In the advertising–sales example, our interpretation of the least squares slope,
βˆ1 = .7, is that the mean of sales revenue y will increase .7 unit for every 1-unit
increase in advertising expenditure x. Since y is measured in units of \$1,000 and
x in units of \$100, our interpretation is that mean monthly sales revenue increases
\$700 for every \$100 increase in monthly advertising expenditure. (We will attach a
measure of reliability to this inference in Section 3.6.)
The least squares intercept, βˆ0 = −.1, is our estimate of mean sales revenue
y when advertising expenditure is set at x = \$0. Since sales revenue can never
be negative, why does such a nonsensical result occur? The reason is that we are
attempting to use the least squares model to predict y for a value of x (x = 0) that is
outside the range of the sample data and therefore impractical. (We have more to
say about predicting outside the range of the sample data—called extrapolation—in
Section 3.9.) Consequently, βˆ0 will not always have a practical interpretation. Only
when x = 0 is within the range of the x-values in the sample and is a practical value
will βˆ0 have a meaningful interpretation.
Even when the interpretations of the estimated parameters are meaningful,
we need to remember that they are only estimates based on the sample. As such,
their values will typically change in repeated sampling. How much conﬁdence do
we have that the estimated slope, βˆ1 , accurately approximates the true slope, β1 ?
This requires statistical inference, in the form of conﬁdence intervals and tests of
hypotheses, which we address in Section 3.6.
To summarize, we have deﬁned the best-ﬁtting straight line to be the one that
satisﬁes the least squares criterion; that is, the sum of the squared errors will be
smaller than for any other straight-line model. This line is called the least squares
line, and its equation is called the least squares prediction equation. In subsequent
sections, we show how to make statistical inferences about the model.

3.3 Exercises

3.6 Learning the mechanics. Use the method of least squares to fit a straight line to these six data points:

EX3_6

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| y | 1 | 2 | 2 | 3 | 5 | 5 |

(a) What are the least squares estimates of β0 and β1?
(b) Plot the data points and graph the least squares line on the scatterplot.

3.7 Learning the mechanics. Use the method of least squares to fit a straight line to these five data points:

EX3_7

| x | −2 | −1 | 0 | 1 | 2 |
|---|---|---|---|---|---|
| y | 4 | 3 | 3 | 1 | −1 |

(a) What are the least squares estimates of β0 and β1?
(b) Plot the data points and graph the least squares line on the scatterplot.

TAMPALMS

| PROPERTY | MARKET VALUE (THOUS.) | SALE PRICE (THOUS.) |
|---|---|---|
| 1 | $184.44 | $382.0 |
| 2 | 191.00 | 230.0 |
| 3 | 159.83 | 220.0 |
| 4 | 189.22 | 277.0 |
| 5 | 151.61 | 205.0 |
| ⋮ | ⋮ | ⋮ |
| 72 | 263.40 | 325.0 |
| 73 | 194.58 | 252.0 |
| 74 | 219.15 | 270.0 |
| 75 | 322.67 | 305.0 |
| 76 | 325.96 | 450.0 |

3.8 Predicting home sales price. Real estate investors,
homebuyers, and homeowners often use the
appraised (or market) value of a property as a
basis for predicting sale price. Data on sale prices
and total appraised values of 76 residential properties sold in 2008 in an upscale Tampa, Florida,
neighborhood named Tampa Palms are saved in
the TAMPALMS ﬁle. The ﬁrst ﬁve and last ﬁve
observations of the data set are listed in the accompanying table.
(a) Propose a straight-line model to relate the
appraised property value x to the sale price y
for residential properties in this neighborhood.

Source: Hillsborough County (Florida) Property Appraiser’s Ofﬁce.

MINITAB Output for Exercise 3.8


(b) A MINITAB scatterplot of the data is shown
on the previous page. [Note: Both sale price
and total market value are shown in thousands
of dollars.] Does it appear that a straight-line
model will be an appropriate ﬁt to the data?
(c) A MINITAB simple linear regression printout is also shown (p. 100). Find the equation
of the best-ﬁtting line through the data on
the printout.
(d) Interpret the y-intercept of the least squares
line. Does it have a practical meaning for this
application? Explain.
(e) Interpret the slope of the least squares line.
Over what range of x is the interpretation
meaningful?
(f) Use the least squares model to estimate
the mean sale price of a property appraised
at \$300,000.

3.9 Quantitative models of music. Writing in Chance
(Fall 2004), University of Konstanz (Germany)
statistics professor Jan Beran demonstrated that
certain aspects of music can be described by
quantitative models. For example, the information content of a musical composition (called
entropy) can be quantiﬁed by determining how
many times a certain pitch occurs. In a sample
of 147 famous compositions ranging from the
13th to the 20th century, Beran computed
the Z12-note entropy (y) and plotted it against
the year of birth (x) of the composer. The graph is
reproduced below.
[Figure: scatterplot titled ‘‘Z12-Note-entropy vs. date of birth,’’ plotting entropy y (about 1.8 to 2.4) against composer's year of birth x (1200 to 1800)]

(a) Do you observe a trend, especially since the
year 1400?
(b) The least squares line for the data since year
1400 is shown on the graph. Is the slope of
the line positive or negative? What does this
imply?


(c) Explain why the line shown is not the true line
of means.

3.10 Mechanical engineers at the University of Newcastle (Australia)
investigated the use of timber in high-efficiency
small wind turbine blades (Wind Engineering,
January 2004). The strengths of two types of
timber—radiata pine and hoop pine—were compared. Twenty specimens (called ‘‘coupons’’) of
each timber blade were fatigue tested by measuring the stress (in MPa) on the blade after various
numbers of blade cycles. A simple linear regression
analysis of the data—one conducted for each type
of timber—yielded the following results (where
y = stress and x = natural logarithm of number of
cycles):

Radiata Pine: ŷ = 97.37 − 2.50x
Hoop Pine: ŷ = 122.03 − 2.36x

(a) Interpret the estimated slope of each line.
(b) Interpret the estimated y-intercept of each line.
(c) Based on these results, which type of timber
blade appears to be stronger and more fatigue
resistant? Explain.

3.11 In business, do nice guys ﬁnish ﬁrst or last? In
baseball, there is an old saying that ‘‘nice guys
ﬁnish last.’’ Is this true in the business world?
Researchers at Harvard University attempted to
answer this question and reported their results in
Nature (March 20, 2008). In the study, Boston-area
college students repeatedly played a version of the
game ‘‘prisoner’s dilemma,’’ where competitors
choose cooperation, defection, or costly punishment. (Cooperation meant paying 1 unit for the
opponent to receive 2 units; defection meant gaining 1 unit at a cost of 1 unit for the opponent; and
punishment meant paying 1 unit for the opponent
to lose 4 units.) At the conclusion of the games,
the researchers recorded the average payoff and
the number of times cooperation, defection, and
punishment were used for each player. The scattergrams (p. 102) plot average payoff (y) against level
of cooperation use, defection use, and punishment
use, respectively.
(a) Consider cooperation use (x) as a predictor of
average payoff (y). Based on the scattergram,
is there evidence of a linear trend?
(b) Consider defection use (x) as a predictor of
average payoff (y). Based on the scattergram,
is there evidence of a linear trend?
(c) Consider punishment use (x) as a predictor of
average payoff (y). Based on the scattergram,
is there evidence of a linear trend?

(d) Refer to part c. Is the slope of the line relating punishment use (x) to average payoff (y)
positive or negative?
(e) The researchers concluded that ‘‘winners don’t
punish.’’ Do you agree? Explain.

3.12 Feeding behavior of blackbream fish. In Brain,
Behavior and Evolution (April 2000), zoologists
conducted a study of the feeding behavior of blackbream fish. The zoologists recorded the number of
aggressive strikes of two blackbream fish feeding
at the bottom of an aquarium in the 10-minute
period following the addition of food. The next
table lists the weekly number of strikes and age of
the fish (in days).
(a) Write the equation of a straight-line model
relating number of strikes (y) to age of fish (x).
(b) Fit the model to the data using the method
of least squares and give the least squares
prediction equation.
(c) Give a practical interpretation of the value of
β̂0, if possible.
(d) Give a practical interpretation of the value of
β̂1, if possible.

BLACKBREAM

| WEEK | NUMBER OF STRIKES | AGE OF FISH (days) |
|---|---|---|
| 1 | 85 | 120 |
| 2 | 63 | 136 |
| 3 | 34 | 150 |
| 4 | 39 | 155 |
| 5 | 58 | 162 |
| 6 | 35 | 169 |
| 7 | 57 | 178 |
| 8 | 12 | 184 |
| 9 | 15 | 190 |

Source: Shand, J., et al. ‘‘Variability in the location of the
retinal ganglion cell area centralis is correlated with ontogenetic changes in feeding behavior in the Blackbream,
Acanthopagrus ‘butcher’,’’ Brain, Behavior and Evolution, Vol. 55,
No. 4, Apr. 2000 (Figure H).

3.13 Sweetness of orange juice. The quality of the
orange juice produced by a manufacturer (e.g.,
Minute Maid, Tropicana) is constantly monitored.
There are numerous sensory and chemical components that combine to make the best-tasting
orange juice. For example, one manufacturer has
developed a quantitative index of the ‘‘sweetness’’ of orange juice. (The higher the index, the
sweeter the juice.) Is there a relationship between the sweetness index and a chemical measure
such as the amount of water-soluble pectin (parts
per million) in the orange juice? Data collected
on these two variables for 24 production runs at a
juice manufacturing plant are shown in the accompanying table. Suppose a manufacturer wants to use
simple linear regression to predict the sweetness
(y) from the amount of pectin (x).
(a) Find the least squares line for the data.
(b) Interpret β̂0 and β̂1 in the words of
the problem.
(c) Predict the sweetness index if amount of
pectin in the orange juice is 300 ppm. [Note:
A measure of reliability of such a prediction is
discussed in Section 3.9.]

OJUICE

| RUN | SWEETNESS INDEX | PECTIN (ppm) |
|---|---|---|
| 1 | 5.2 | 220 |
| 2 | 5.5 | 227 |
| 3 | 6.0 | 259 |
| 4 | 5.9 | 210 |
| 5 | 5.8 | 224 |
| 6 | 6.0 | 215 |
| 7 | 5.8 | 231 |
| 8 | 5.6 | 268 |
| 9 | 5.6 | 239 |
| 10 | 5.9 | 212 |
| 11 | 5.4 | 410 |
| 12 | 5.6 | 256 |
| 13 | 5.8 | 306 |
| 14 | 5.5 | 259 |
| 15 | 5.3 | 284 |
| 16 | 5.3 | 383 |
| 17 | 5.7 | 271 |
| 18 | 5.5 | 264 |
| 19 | 5.7 | 227 |
| 20 | 5.3 | 263 |
| 21 | 5.9 | 232 |
| 22 | 5.8 | 220 |
| 23 | 5.8 | 246 |
| 24 | 5.9 | 241 |

Note: The data in the table are authentic. For confidentiality reasons, the manufacturer cannot be disclosed.

3.14 Extending the life of an aluminum smelter pot. An
investigation of the properties of bricks used to line
aluminum smelter pots was published in the American Ceramic Society Bulletin (February 2005). Six
different commercial bricks were evaluated. The
life length of a smelter pot depends on the porosity
of the brick lining (the less porosity, the longer the
life); consequently, the researchers measured the
apparent porosity of each brick specimen, as well
as the mean pore diameter of each brick. The data
are given in the accompanying table.

SMELTPOT

| BRICK | APPARENT POROSITY (%) | MEAN PORE DIAMETER (micrometers) |
|---|---|---|
| A | 18.0 | 12.0 |
| B | 18.3 | 9.7 |
| C | 16.3 | 7.3 |
| D | 6.9 | 5.3 |
| E | 17.1 | 10.9 |
| F | 20.4 | 16.8 |

Source: Bonadia, P., et al. ‘‘Aluminosilicate refractories
for aluminum cell linings,’’ American Ceramic Society
Bulletin, Vol. 84, No. 2, Feb. 2005 (Table II).

(a) Find the least squares line relating porosity
(y) to mean pore diameter (x).
(b) Interpret the y-intercept of the line.
(c) Interpret the slope of the line.
(d) Predict the apparent porosity percentage
for a brick with a mean pore diameter of
10 micrometers.

3.15 Recalling names of students. The Journal of
Experimental Psychology—Applied (June 2000)
published a study in which the ‘‘name game’’ was
used to help groups of students learn the names
of other students in the group. The ‘‘name game’’
requires the first student in the group to state
his/her full name, the second student to say his/her
name and the name of the first student, the third
student to say his/her name and the names of the
first two students, and so on. After making their
introductions, the students listened to a seminar
speaker for 30 minutes. At the end of the seminar, all students were asked to remember the full
name of each of the other students in their group
and the researchers measured the proportion of
names recalled for each. One goal of the study was
to investigate the linear trend between y = recall
proportion and x = position (order) of the student
during the game. The data (simulated based on
summary statistics provided in the research article) for 144 students in the first eight positions
are saved in the NAMEGAME2 file. The first five
and last five observations in the data set are listed
in the table. [Note: Since the student in position
1 actually must recall the names of all the other
students, he or she is assigned position number 9
in the data set.] Use the method of least squares to
estimate the line, E(y) = β0 + β1x. Interpret the β
estimates in the words of the problem.

NAMEGAME2

| POSITION | RECALL |
|---|---|
| 2 | 0.04 |
| 2 | 0.37 |
| 2 | 1.00 |
| 2 | 0.99 |
| 2 | 0.79 |
| ⋮ | ⋮ |
| 9 | 0.72 |
| 9 | 0.88 |
| 9 | 0.46 |
| 9 | 0.54 |
| 9 | 0.99 |

Source: Morris, P.E., and Fritz, C.O. ‘‘The name game:
Using retrieval practice to improve the learning of
names,’’ Journal of Experimental Psychology—Applied,
Vol. 6, No. 2, June 2000 (data simulated from Figure 2).
Reprinted with permission.

3.16 Spreading rate of spilled liquid. A contract engineer at DuPont Corp. studied the rate at which
a spilled volatile liquid will spread across a surface (Chemical Engineering Progress, January
2005). Assume 50 gallons of methanol spills onto a
level surface outdoors. The engineer used derived
empirical formulas (assuming a state of turbulent
free convection) to calculate the mass (in pounds)
of the spill after a period of time ranging from 0 to
60 minutes. The calculated mass values are given in
the next table. Do the data indicate that the mass
of the spill tends to diminish as time increases? If
so, how much will the mass diminish each minute?

LIQUIDSPILL

| TIME (minutes) | MASS (pounds) | TIME (minutes) | MASS (pounds) |
|---|---|---|---|
| 0 | 6.64 | 22 | 1.86 |
| 1 | 6.34 | 24 | 1.60 |
| 2 | 6.04 | 26 | 1.37 |
| 4 | 5.47 | 28 | 1.17 |
| 6 | 4.94 | 30 | 0.98 |
| 8 | 4.44 | 35 | 0.60 |
| 10 | 3.98 | 40 | 0.34 |
| 12 | 3.55 | 45 | 0.17 |
| 14 | 3.15 | 50 | 0.06 |
| 16 | 2.79 | 55 | 0.02 |
| 18 | 2.45 | 60 | 0.00 |
| 20 | 2.14 | | |

Source: Barry, J. ‘‘Estimating rates of spreading and evaporation of volatile
liquids,’’ Chemical Engineering Progress, Vol. 101, No. 1, Jan. 2005.

3.4 Model Assumptions
In the advertising–sales example presented in Section 3.3, we assumed that the
probabilistic model relating the ﬁrm’s sales revenue y to advertising dollars x is
y = β0 + β1 x + ε
Recall that the least squares estimate of the deterministic component of the model
β0 + β1 x is
yˆ = βˆ0 + βˆ1 x = −.1 + .7x
Now we turn our attention to the random component ε of the probabilistic model
and its relation to the errors of estimating β0 and β1 . In particular, we will see how
the probability distribution of ε determines how well the model describes the true
relationship between the dependent variable y and the independent variable x.
We make four basic assumptions about the general form of the probability
distribution of ε:
Assumption 1 The mean of the probability distribution of ε is 0. That is, the
average of the errors over an inﬁnitely long series of experiments is 0 for each
setting of the independent variable x. This assumption implies that the mean value
of y, E(y), for a given value of x is E(y) = β0 + β1 x.
Assumption 2 The variance of the probability distribution of ε is constant for all
settings of the independent variable x. For our straight-line model, this assumption
means that the variance of ε is equal to a constant, say, σ 2 , for all values of x.
Assumption 3 The probability distribution of ε is normal.

Assumption 4 The errors associated with any two different observations are
independent. That is, the error associated with one value of y has no effect on the
errors associated with other y values.
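A small simulation can make Assumptions 1–4 concrete. The sketch below (our illustration; the error standard deviation σ = .5 is an arbitrary choice, while β0 and β1 are taken from the fitted advertising–sales line) draws errors that are normal with mean 0, constant variance, and independent, and checks that repeated y-values at a fixed x average out to E(y) = β0 + β1x:

```python
import random

random.seed(42)

# beta0 and beta1 from the fitted advertising-sales line; sigma is arbitrary
beta0, beta1, sigma = -0.1, 0.7, 0.5

def draw_y(x):
    """One observation from y = beta0 + beta1*x + eps, with eps ~ N(0, sigma^2)."""
    return beta0 + beta1 * x + random.gauss(0.0, sigma)

# Many repeated observations at x = 3: their mean should be close to
# E(y) = beta0 + beta1 * 3 = 2.0 (Assumption 1, mean error of 0).
ys = [draw_y(3) for _ in range(100_000)]
print(sum(ys) / len(ys))
```

Changing the fixed x changes the center of the simulated distribution but, by Assumption 2, not its spread.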
The implications of the ﬁrst three assumptions can be seen in Figure 3.6, which
shows distributions of errors for three particular values of x, namely, x1 , x2 , and x3 .
Note that the relative frequency distributions of the errors are normal, with a mean
of 0, and a constant variance σ 2 (all the distributions shown have the same amount