Chapter 12
Multiple Linear Regression and Certain Nonlinear Regression Models

12.1 Introduction
In most research problems where regression analysis is applied, more than one
independent variable is needed in the regression model. The complexity of most
scientific mechanisms is such that in order to be able to predict an important
response, a multiple regression model is needed. When this model is linear in
the coefficients, it is called a multiple linear regression model. For the case of
$k$ independent variables $x_1, x_2, \ldots, x_k$, the mean of $Y \mid x_1, x_2, \ldots, x_k$ is given by the
multiple linear regression model
$$\mu_{Y|x_1,x_2,\ldots,x_k} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,$$
and the estimated response is obtained from the sample regression equation
$$\hat{y} = b_0 + b_1 x_1 + \cdots + b_k x_k,$$
where each regression coefficient $\beta_i$ is estimated by $b_i$ from the sample data using
the method of least squares. As in the case of a single independent variable, the
multiple linear regression model can often be an adequate representation of a more
complicated structure within certain ranges of the independent variables.
Similar least squares techniques can also be applied for estimating the coefficients when the linear model involves, say, powers and products of the independent
variables. For example, when $k = 1$, the experimenter may believe that the means
$\mu_{Y|x}$ do not fall on a straight line but are more appropriately described by the
polynomial regression model
$$\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_r x^r,$$
and the estimated response is obtained from the polynomial regression equation
$$\hat{y} = b_0 + b_1 x + b_2 x^2 + \cdots + b_r x^r.$$
Confusion arises occasionally when we speak of a polynomial model as a linear
model. However, statisticians normally refer to a linear model as one in which the
parameters occur linearly, regardless of how the independent variables enter the
model. An example of a nonlinear model is the exponential relationship
$$\mu_{Y|x} = \alpha\beta^x,$$
whose response is estimated by the regression equation
$$\hat{y} = ab^x.$$
There are many phenomena in science and engineering that are inherently nonlinear in nature, and when the true structure is known, an attempt should certainly
be made to fit the actual model. The literature on estimation by least squares of
nonlinear models is voluminous. The nonlinear models discussed in this chapter
deal with nonideal conditions in which the analyst is certain that the response and
hence the response model error are not normally distributed but, rather, have a
binomial or Poisson distribution. These situations do occur extensively in practice.
A student who wants a more general account of nonlinear regression should
consult Classical and Modern Regression with Applications by Myers (1990; see
the Bibliography).
12.2 Estimating the Coefficients
In this section, we obtain the least squares estimators of the parameters $\beta_0, \beta_1, \ldots, \beta_k$
by fitting the multiple linear regression model
$$\mu_{Y|x_1,x_2,\ldots,x_k} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
to the data points
$$\{(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i);\ i = 1, 2, \ldots, n \text{ and } n > k\},$$
where $y_i$ is the observed response to the values $x_{1i}, x_{2i}, \ldots, x_{ki}$ of the $k$ independent
variables $x_1, x_2, \ldots, x_k$. Each observation $(x_{1i}, x_{2i}, \ldots, x_{ki}, y_i)$ is assumed to satisfy
the following equation.

Multiple Linear Regression Model:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i$$
or
$$y_i = \hat{y}_i + e_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki} + e_i,$$
where $\epsilon_i$ and $e_i$ are the random error and residual, respectively, associated with
the response $y_i$ and fitted value $\hat{y}_i$.

As in the case of simple linear regression, it is assumed that the $\epsilon_i$ are independent
and identically distributed with mean 0 and common variance $\sigma^2$.
In using the concept of least squares to arrive at estimates $b_0, b_1, \ldots, b_k$, we
minimize the expression
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_{1i} - b_2 x_{2i} - \cdots - b_k x_{ki})^2.$$
Differentiating SSE in turn with respect to $b_0, b_1, \ldots, b_k$ and equating to zero, we
generate the set of $k + 1$ normal equations for multiple linear regression.
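Normal Estimation Equations for Multiple Linear Regression:
$$\begin{aligned}
n b_0 + b_1 \sum_{i=1}^{n} x_{1i} + b_2 \sum_{i=1}^{n} x_{2i} + \cdots + b_k \sum_{i=1}^{n} x_{ki} &= \sum_{i=1}^{n} y_i \\
b_0 \sum_{i=1}^{n} x_{1i} + b_1 \sum_{i=1}^{n} x_{1i}^2 + b_2 \sum_{i=1}^{n} x_{1i} x_{2i} + \cdots + b_k \sum_{i=1}^{n} x_{1i} x_{ki} &= \sum_{i=1}^{n} x_{1i} y_i \\
&\ \ \vdots \\
b_0 \sum_{i=1}^{n} x_{ki} + b_1 \sum_{i=1}^{n} x_{ki} x_{1i} + b_2 \sum_{i=1}^{n} x_{ki} x_{2i} + \cdots + b_k \sum_{i=1}^{n} x_{ki}^2 &= \sum_{i=1}^{n} x_{ki} y_i
\end{aligned}$$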
These equations can be solved for b0 , b1 , b2 , . . . , bk by any appropriate method for
solving systems of linear equations. Most statistical software can be used to obtain
numerical solutions of the above equations.
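As an illustration of solving the normal equations numerically, the following is a minimal sketch in Python with NumPy. The function name `solve_normal_equations` and the small noise-free data set are hypothetical, chosen only so that the recovered coefficients are easy to check by eye; they do not come from the text.

```python
import numpy as np

def solve_normal_equations(x_cols, y):
    """Return (b0, b1, ..., bk) by solving the k + 1 normal equations."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (for b0) followed by the k regressors.
    X = np.column_stack([np.ones(len(y))] +
                        [np.asarray(c, dtype=float) for c in x_cols])
    A = X.T @ X   # left-hand sides: n, sums of x's, squares, cross products
    g = X.T @ y   # right-hand sides: sum of y and sums of x_j * y
    return np.linalg.solve(A, g)

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * a - 3 * c for a, c in zip(x1, x2)]

b = solve_normal_equations([x1, x2], y)
print(b)
```

Because the data contain no noise, the solution should reproduce the generating coefficients up to floating-point rounding. With real data, a routine such as `np.linalg.lstsq` is often preferred because it avoids forming the cross-product matrix explicitly.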
Example 12.1: A study was done on a diesel-powered light-duty pickup truck to see if humidity, air
temperature, and barometric pressure influence emission of nitrous oxide (in ppm).
Emission measurements were taken at different times, with varying experimental
conditions. The data are given in Table 12.1. The model is
$$\mu_{Y|x_1,x_2,x_3} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3,$$
or, equivalently,
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \epsilon_i, \quad i = 1, 2, \ldots, 20.$$
Fit this multiple linear regression model to the given data and then estimate the
amount of nitrous oxide emitted for the conditions where humidity is 50%, temperature is 76°F, and barometric pressure is 29.30.
Table 12.1: Data for Example 12.1

Nitrous     Humidity,   Temp.,   Pressure,
Oxide, y    x1          x2       x3
0.90        72.4        76.3     29.18
0.91        41.6        70.3     29.35
0.96        34.3        77.1     29.24
0.89        35.1        68.0     29.27
1.00        10.7        79.0     29.78
1.10        12.9        67.4     29.39
1.15         8.3        66.8     29.69
1.03        20.1        76.9     29.48
0.77        72.2        77.7     29.09
1.07        24.0        67.7     29.60
1.07        23.2        76.8     29.38
0.94        47.4        86.6     29.35
1.10        31.5        76.9     29.63
1.10        10.6        86.3     29.56
1.10        11.2        86.0     29.48
0.91        73.3        76.3     29.40
0.87        75.4        77.9     29.28
0.78        96.6        78.7     29.29
0.82       107.4        86.8     29.03
0.95        54.9        70.9     29.37
Source: Charles T. Hare, “Light-Duty Diesel Emission Correction Factors for Ambient Conditions,” EPA-600/2-77116. U.S. Environmental Protection Agency.
Solution: The solution of the set of estimating equations yields the unique estimates
$$b_0 = -3.507778, \quad b_1 = -0.002625, \quad b_2 = 0.000799, \quad b_3 = 0.154155.$$
Therefore, the regression equation is
$$\hat{y} = -3.507778 - 0.002625\,x_1 + 0.000799\,x_2 + 0.154155\,x_3.$$
For 50% humidity, a temperature of 76°F, and a barometric pressure of 29.30, the
estimated amount of nitrous oxide emitted is
$$\hat{y} = -3.507778 - 0.002625(50.0) + 0.000799(76.0) + 0.154155(29.30) = 0.9384 \text{ ppm}.$$
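The fit in Example 12.1 can be reproduced approximately with a short script. The sketch below is an illustration rather than the method used in the text: it assumes NumPy is available, enters the 20 observations of Table 12.1 by hand, and uses `np.linalg.lstsq`, which gives the same least squares solution as the normal equations. The variable names are hypothetical. The published coefficients are kept in `b_text` so the prediction from the printed equation can be compared with the one from the refit.

```python
import numpy as np

# Table 12.1, entered in the order the 20 observations are listed.
y = np.array([0.90, 0.91, 0.96, 0.89, 1.00, 1.10, 1.15, 1.03, 0.77, 1.07,
              1.07, 0.94, 1.10, 1.10, 1.10, 0.91, 0.87, 0.78, 0.82, 0.95])
humidity = np.array([72.4, 41.6, 34.3, 35.1, 10.7, 12.9, 8.3, 20.1, 72.2, 24.0,
                     23.2, 47.4, 31.5, 10.6, 11.2, 73.3, 75.4, 96.6, 107.4, 54.9])
temp = np.array([76.3, 70.3, 77.1, 68.0, 79.0, 67.4, 66.8, 76.9, 77.7, 67.7,
                 76.8, 86.6, 76.9, 86.3, 86.0, 76.3, 77.9, 78.7, 86.8, 70.9])
pressure = np.array([29.18, 29.35, 29.24, 29.27, 29.78, 29.39, 29.69, 29.48,
                     29.09, 29.60, 29.38, 29.35, 29.63, 29.56, 29.48, 29.40,
                     29.28, 29.29, 29.03, 29.37])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(y), humidity, temp, pressure])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b

# Conditions from the example: 50% humidity, 76 degrees F, pressure 29.30.
x_new = np.array([1.0, 50.0, 76.0, 29.30])
y_hat_fit = x_new @ b

# Coefficients as printed in the solution, for comparison with the refit.
b_text = np.array([-3.507778, -0.002625, 0.000799, 0.154155])
y_hat_text = x_new @ b_text

print("refit coefficients:", b)
print("prediction from refit:", y_hat_fit)
print("prediction from printed equation:", round(y_hat_text, 4))
```

Small differences between the refit and the printed coefficients would indicate rounding or a data-entry discrepancy rather than a different method.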
Polynomial Regression
Now suppose that we wish to fit the polynomial equation
$$\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_r x^r$$
to the $n$ pairs of observations $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$. Each observation, $y_i$,
satisfies the equation
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_r x_i^r + \epsilon_i$$
or
$$y_i = \hat{y}_i + e_i = b_0 + b_1 x_i + b_2 x_i^2 + \cdots + b_r x_i^r + e_i,$$
where $r$ is the degree of the polynomial and $\epsilon_i$ and $e_i$ are again the random error
and residual associated with the response $y_i$ and fitted value $\hat{y}_i$, respectively. Here,
the number of pairs, $n$, must be at least as large as $r + 1$, the number of parameters
to be estimated.
Notice that the polynomial model can be considered a special case of the more
general multiple linear regression model, where we set $x_1 = x, x_2 = x^2, \ldots, x_r = x^r$.
The normal equations assume the same form as those given earlier in this section
for multiple linear regression. They are then solved for $b_0, b_1, b_2, \ldots, b_r$.
Example 12.2: Given the data

x    0     1     2     3     4     5     6     7     8     9
y    9.1   7.3   3.2   4.6   4.8   2.9   5.7   7.1   8.8   10.2

fit a regression curve of the form $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2$ and then estimate $\mu_{Y|2}$.
Solution: From the data given, we find that
$$\begin{aligned}
10b_0 + 45b_1 + 285b_2 &= 63.7, \\
45b_0 + 285b_1 + 2025b_2 &= 307.3, \\
285b_0 + 2025b_1 + 15{,}333b_2 &= 2153.3.
\end{aligned}$$
Solving these normal equations, we obtain
$$b_0 = 8.698, \quad b_1 = -2.341, \quad b_2 = 0.288.$$
Therefore,
$$\hat{y} = 8.698 - 2.341x + 0.288x^2.$$
When $x = 2$, our estimate of $\mu_{Y|2}$ is
$$\hat{y} = 8.698 - (2.341)(2) + (0.288)(2^2) = 5.168.$$
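The arithmetic of Example 12.2 can be checked with a few lines of Python. This is a sketch assuming NumPy. It treats the quadratic as a multiple linear regression with regressors $x$ and $x^2$, forms the normal equations explicitly, and solves them. The variable names are illustrative.

```python
import numpy as np

x = np.arange(10, dtype=float)                     # x = 0, 1, ..., 9
y = np.array([9.1, 7.3, 3.2, 4.6, 4.8, 2.9, 5.7, 7.1, 8.8, 10.2])

# Polynomial regression as a special case of multiple linear regression:
# the columns of the design matrix are 1, x, and x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

A = X.T @ X      # coefficient matrix of the normal equations
g = X.T @ y      # right-hand sides of the normal equations
b = np.linalg.solve(A, g)

# Estimate of the mean response at x = 2.
y_hat_2 = b @ np.array([1.0, 2.0, 4.0])

print(A)
print(g)
print(np.round(b, 3), round(y_hat_2, 3))
```

The printed `A` and `g` should agree with the three normal equations in the solution, and the rounded coefficients with the values quoted there.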
Example 12.3: The data in Table 12.2 represent the percent of impurities that resulted for various
temperatures and sterilizing times during a reaction associated with the manufacturing of a certain beverage. Estimate the regression coefficients in the polynomial
model
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_{11} x_{1i}^2 + \beta_{22} x_{2i}^2 + \beta_{12} x_{1i} x_{2i} + \epsilon_i,$$
for $i = 1, 2, \ldots, 18$.
Table 12.2: Data for Example 12.3

Sterilizing          Temperature, x1 (°C)
Time, x2 (min)       75        100       125
15                   14.05     10.55      7.55
                     14.93      9.48      6.59
20                   16.56     13.63      9.23
                     15.85     11.75      8.78
25                   22.41     18.55     15.93
                     21.66     17.98     16.44
Solution: Using the normal equations, we obtain
$$\begin{aligned}
b_0 &= 56.4411, & b_1 &= -0.36190, & b_2 &= -2.75299, \\
b_{11} &= 0.00081, & b_{22} &= 0.08173, & b_{12} &= 0.00314,
\end{aligned}$$
and our estimated regression equation is
$$\hat{y} = 56.4411 - 0.36190x_1 - 2.75299x_2 + 0.00081x_1^2 + 0.08173x_2^2 + 0.00314x_1 x_2.$$
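A second-order model such as the one in Example 12.3 is fitted in the same way once the squared and product columns are added to the design matrix. The sketch below assumes NumPy. It also assumes a particular reading of Table 12.2: two replicate observations per combination of temperature and time, with the largest impurity values at 75°C. That reading is the one consistent with the fitted equation above, but it is an assumption about the table layout, and the dictionary `impurity` is a hypothetical container for it.

```python
import numpy as np

# ASSUMED layout of Table 12.2: key = (temperature x1, time x2),
# value = the two replicate impurity percentages for that combination.
impurity = {
    (75.0, 15.0): (14.05, 14.93), (100.0, 15.0): (10.55, 9.48), (125.0, 15.0): (7.55, 6.59),
    (75.0, 20.0): (16.56, 15.85), (100.0, 20.0): (13.63, 11.75), (125.0, 20.0): (9.23, 8.78),
    (75.0, 25.0): (22.41, 21.66), (100.0, 25.0): (18.55, 17.98), (125.0, 25.0): (15.93, 16.44),
}

# Flatten to one row per observation: (x1, x2, y).
rows = [(t, m, obs) for (t, m), reps in impurity.items() for obs in reps]
x1, x2, y = (np.array(col) for col in zip(*rows))

# Columns in the order b0, b1, b2, b11, b22, b12.
X = np.column_stack([np.ones_like(y), x1, x2, x1**2, x2**2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b

# Fitted impurity at 75 degrees C and 15 minutes.
y_hat_75_15 = np.array([1.0, 75.0, 15.0, 75.0**2, 15.0**2, 75.0 * 15.0]) @ b

print(b)
print(y_hat_75_15)
```

The raw columns differ in scale by several orders of magnitude, which is one reason a least squares routine is used here instead of inverting the cross-product matrix directly.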
Many of the principles and procedures associated with the estimation of polynomial regression functions fall into the category of response surface methodology, a collection of techniques that have been used quite successfully by scientists
and engineers in many fields. The $x_i^2$ are called pure quadratic terms, and the
$x_i x_j$ ($i \neq j$) are called interaction terms. Such problems as selecting a proper
experimental design, particularly in cases where a large number of variables are
in the model, and choosing optimum operating conditions for $x_1, x_2, \ldots, x_k$ are
often approached through the use of these methods. For an extensive exposure,
the reader is referred to Response Surface Methodology: Process and Product Optimization Using Designed Experiments by Myers, Montgomery, and Anderson-Cook
(2009; see the Bibliography).
12.3 Linear Regression Model Using Matrices
In fitting a multiple linear regression model, particularly when the number of variables exceeds two, a knowledge of matrix theory can facilitate the mathematical
manipulations considerably. Suppose that the experimenter has $k$ independent
variables $x_1, x_2, \ldots, x_k$ and $n$ observations $y_1, y_2, \ldots, y_n$, each of which can be expressed by the equation
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \epsilon_i.$$
This model essentially represents $n$ equations describing how the response values
are generated in the scientific process. Using matrix notation, we can write the
following equation:
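General Linear Model:
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon},$$
where
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix}
1 & x_{11} & x_{21} & \cdots & x_{k1} \\
1 & x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{1n} & x_{2n} & \cdots & x_{kn}
\end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}.$$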
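Then the least squares method for estimation of $\boldsymbol{\beta}$, illustrated in Section 12.2,
involves finding $\mathbf{b}$ for which
$$SSE = (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})$$
is minimized. This minimization process involves solving for $\mathbf{b}$ in the equation
$$\frac{\partial}{\partial \mathbf{b}}(SSE) = \mathbf{0}.$$
We will not present the details regarding solution of the equations above. The
result reduces to the solution of $\mathbf{b}$ in
$$(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}.$$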
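For readers who want the omitted step, the following is a brief sketch of the standard argument, using the same prime notation for the transpose. It relies only on the usual rules for differentiating linear and quadratic forms with respect to a vector, and on the fact that $\mathbf{b}'\mathbf{X}'\mathbf{y}$ is a scalar and therefore equal to its own transpose.

```latex
\begin{aligned}
SSE &= (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b})
     = \mathbf{y}'\mathbf{y} - 2\,\mathbf{b}'\mathbf{X}'\mathbf{y}
       + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}, \\
\frac{\partial}{\partial \mathbf{b}}(SSE)
    &= -2\,\mathbf{X}'\mathbf{y} + 2\,\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0}
    \quad\Longrightarrow\quad
    (\mathbf{X}'\mathbf{X})\,\mathbf{b} = \mathbf{X}'\mathbf{y}.
\end{aligned}
```

Written out row by row, this matrix equation is the same system as the $k + 1$ normal equations of Section 12.2.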
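Notice the nature of the $\mathbf{X}$ matrix. Apart from the initial element, the $i$th row
represents the $x$-values that give rise to the response $y_i$. Writing
$$\mathbf{A} = \mathbf{X}'\mathbf{X} = \begin{bmatrix}
n & \sum_{i=1}^{n} x_{1i} & \sum_{i=1}^{n} x_{2i} & \cdots & \sum_{i=1}^{n} x_{ki} \\
\sum_{i=1}^{n} x_{1i} & \sum_{i=1}^{n} x_{1i}^2 & \sum_{i=1}^{n} x_{1i} x_{2i} & \cdots & \sum_{i=1}^{n} x_{1i} x_{ki} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{ki} & \sum_{i=1}^{n} x_{ki} x_{1i} & \sum_{i=1}^{n} x_{ki} x_{2i} & \cdots & \sum_{i=1}^{n} x_{ki}^2
\end{bmatrix}$$
and
$$\mathbf{g} = \mathbf{X}'\mathbf{y} = \begin{bmatrix}
g_0 = \sum_{i=1}^{n} y_i \\
g_1 = \sum_{i=1}^{n} x_{1i} y_i \\
\vdots \\
g_k = \sum_{i=1}^{n} x_{ki} y_i
\end{bmatrix}$$
allows the normal equations to be put in the matrix form
$$\mathbf{A}\mathbf{b} = \mathbf{g}.$$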
If the matrix $\mathbf{A}$ is nonsingular, we can write the solution for the regression
coefficients as
$$\mathbf{b} = \mathbf{A}^{-1}\mathbf{g} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}.$$
Thus, we can obtain the prediction equation or regression equation by solving a
set of $k + 1$ equations in a like number of unknowns. This involves the inversion of
the $k + 1$ by $k + 1$ matrix $\mathbf{X}'\mathbf{X}$. Techniques for inverting this matrix are explained
in most textbooks on elementary determinants and matrices. Of course, there are
many high-speed computer packages available for multiple regression problems,
packages that not only print out estimates of the regression coefficients but also
provide other information relevant to making inferences concerning the regression
equation.
Example 12.4: The percent survival rate of sperm in a certain type of animal semen, after storage,
was measured at various combinations of concentrations of three materials used to
increase chance of survival. The data are given in Table 12.3. Estimate the multiple
linear regression model for the given data.
Table 12.3: Data for Example 12.4

y (% survival)   x1 (weight %)   x2 (weight %)   x3 (weight %)
25.5              1.74            5.30           10.80
31.2              6.32            5.42            9.40
25.9              6.22            8.41            7.20
38.4             10.52            4.63            8.50
18.4              1.19           11.60            9.40
26.7              1.22            5.85            9.90
26.4              4.10            6.62            8.00
25.9              6.32            8.72            9.10
32.0              4.08            4.42            8.70
25.2              4.15            7.60            9.20
39.7             10.15            4.83            9.40
35.7              1.72            3.12            7.60
26.5              1.70            5.30            8.20

Solution: The least squares estimating equations, $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}$, are
$$\begin{bmatrix}
13.0 & 59.43 & 81.82 & 115.40 \\
59.43 & 394.7255 & 360.6621 & 522.0780 \\
81.82 & 360.6621 & 576.7264 & 728.3100 \\
115.40 & 522.0780 & 728.3100 & 1035.9600
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix} 377.5 \\ 1877.567 \\ 2246.661 \\ 3337.780 \end{bmatrix}.$$
From a computer readout we obtain the elements of the inverse matrix
$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
8.0648 & -0.0826 & -0.0942 & -0.7905 \\
-0.0826 & 0.0085 & 0.0017 & 0.0037 \\
-0.0942 & 0.0017 & 0.0166 & -0.0021 \\
-0.7905 & 0.0037 & -0.0021 & 0.0886
\end{bmatrix},$$
and then, using the relation $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, the estimated regression coefficients
are obtained as
$$b_0 = 39.1574, \quad b_1 = 1.0161, \quad b_2 = -1.8616, \quad b_3 = -0.3433.$$
Hence, our estimated regression equation is
$$\hat{y} = 39.1574 + 1.0161x_1 - 1.8616x_2 - 0.3433x_3.$$
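The matrix computations of Example 12.4 can be retraced in a few lines. The sketch below assumes NumPy and enters the 13 observations of Table 12.3 directly. It forms $\mathbf{A} = \mathbf{X}'\mathbf{X}$ and $\mathbf{g} = \mathbf{X}'\mathbf{y}$ and solves $\mathbf{A}\mathbf{b} = \mathbf{g}$ with `np.linalg.solve` instead of inverting $\mathbf{A}$ explicitly, which is a common numerical choice and not something the text prescribes.

```python
import numpy as np

# Table 12.3: percent survival and the three concentrations (weight %).
y = np.array([25.5, 31.2, 25.9, 38.4, 18.4, 26.7, 26.4,
              25.9, 32.0, 25.2, 39.7, 35.7, 26.5])
x1 = np.array([1.74, 6.32, 6.22, 10.52, 1.19, 1.22, 4.10,
               6.32, 4.08, 4.15, 10.15, 1.72, 1.70])
x2 = np.array([5.30, 5.42, 8.41, 4.63, 11.60, 5.85, 6.62,
               8.72, 4.42, 7.60, 4.83, 3.12, 5.30])
x3 = np.array([10.80, 9.40, 7.20, 8.50, 9.40, 9.90, 8.00,
               9.10, 8.70, 9.20, 9.40, 7.60, 8.20])

# Each row of X is (1, x1, x2, x3) for one observation.
X = np.column_stack([np.ones_like(y), x1, x2, x3])

A = X.T @ X                 # 4 x 4 matrix of sums, squares, cross products
g = X.T @ y                 # right-hand side vector
b = np.linalg.solve(A, g)   # solves A b = g without forming the inverse

print(np.round(A, 4))
print(np.round(g, 3))
print(np.round(b, 4))
```

The printed `A` and `g` should match the estimating equations shown in the solution, and `b` the quoted coefficients to the precision given.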
Exercises
12.1 A set of experimental runs was made to determine a way of predicting cooking time y at various
values of oven width x1 and flue temperature x2. The
coded data were recorded as follows:

y         x1       x2
6.40      1.32     1.15
15.05     2.69     3.40
18.75     3.56     4.10
30.25     4.41     8.75
44.85     5.35     14.82
48.94     6.20     15.15
51.55     7.12     15.32
61.50     8.87     18.18
100.44    9.80     35.19
111.42    10.65    40.40

Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
12.2 In Applied Spectroscopy, the infrared reflectance
spectra properties of a viscous liquid used in the electronics industry as a lubricant were studied. The designed experiment consisted of the effect of band frequency x1 and film thickness x2 on optical density y
using a Perkin-Elmer Model 621 infrared spectrometer.
(Source: Pacansky, J., England, C. D., and Wattman,
R., 1986.)

y        x1      x2
0.231    740     1.10
0.107    740     0.62
0.053    740     0.31
0.129    805     1.10
0.069    805     0.62
0.030    805     0.31
1.005    980     1.10
0.559    980     0.62
0.321    980     0.31
2.948    1235    1.10
1.633    1235    0.62
0.934    1235    0.31

Estimate the multiple linear regression equation
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2.$$
12.3 Suppose in Review Exercise 11.53 on page 437
that we were also given the number of class periods
missed by the 12 students taking the chemistry course.
The complete data are shown.

           Chemistry     Test          Classes
Student    Grade, y      Score, x1     Missed, x2
1          85            65            1
2          74            50            7
3          76            55            5
4          90            65            2
5          85            55            6
6          87            70            3
7          94            65            2
8          98            70            5
9          81            55            4
10         91            70            3
11         76            50            1
12         74            55            4

(a) Fit a multiple linear regression equation of the form
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$.
(b) Estimate the chemistry grade for a student who has
an intelligence test score of 60 and missed 4 classes.
12.4 An experiment was conducted to determine if
the weight of an animal can be predicted after a given
period of time on the basis of the initial weight of the
animal and the amount of feed that was eaten. The
following data, measured in kilograms, were recorded:

Final         Initial        Feed
Weight, y     Weight, x1     Weight, x2
95            42             272
77            33             226
80            33             259
100           45             292
97            39             311
70            36             183
50            32             173
80            41             236
92            40             230
84            38             235

(a) Fit a multiple regression equation of the form
$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
(b) Predict the final weight of an animal having an initial weight of 35 kilograms that is given 250 kilograms of feed.
12.5 The electric power consumed each month by a
chemical plant is thought to be related to the average
ambient temperature x1, the number of days in the
month x2, the average product purity x3, and the tons
of product produced x4. The past year's historical data
are available and are presented in the following table.

y      x1    x2    x3    x4
240    25    24    91    100
236    31    21    90    95
290    45    24    88    110
274    60    25    87    88
301    65    25    91    94
316    72    26    94    99
300    80    25    87    97
296    84    25    86    96
267    75    24    88    110
276    60    25    91    105
288    50    25    90    100
261    38    23    89    98

(a) Fit a multiple linear regression model using the
above data set.
(b) Predict power consumption for a month in which
x1 = 75°F, x2 = 24 days, x3 = 90%, and x4 = 98
tons.
12.6 An experiment was conducted on a new model
of a particular make of automobile to determine the
stopping distance at various speeds. The following data
were recorded.

Speed, v (km/hr)            35    50    65    80    95    110
Stopping Distance, d (m)    16    26    41    62    88    119

(a) Fit a multiple regression curve of the form $\mu_{D|v} = \beta_0 + \beta_1 v + \beta_2 v^2$.
(b) Estimate the stopping distance when the car is
traveling at 70 kilometers per hour.
12.7 An experiment was conducted in order to determine if cerebral blood flow in human beings can be
predicted from arterial oxygen tension (millimeters of
mercury). Fifteen patients participated in the study,
and the following data were collected:

Blood Flow,    Arterial Oxygen
y              Tension, x
84.33          603.40
87.80          582.50
82.20          556.20
78.21          594.60
78.44          558.90
80.01          575.20
83.53          580.10
79.46          451.20
75.22          404.00
76.58          484.00
77.90          452.40
78.80          448.40
80.67          334.80
86.60          320.30
78.20          350.30

Estimate the quadratic regression equation
$$\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2.$$

12.8 The following is a set of coded experimental data
on the compressive strength of a particular alloy at various values of the concentration of some additive:

Concentration,    Compressive
x                 Strength, y
10.0              25.2    27.3    28.7
15.0              29.8    31.1    27.8
20.0              31.2    32.6    29.7
25.0              31.7    30.1    32.3
30.0              29.4    30.8    32.8

(a) Estimate the quadratic regression equation $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2$.
(b) Test for lack of fit of the model.

12.9 (a) Fit a multiple regression equation of the
form $\mu_{Y|x} = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ to the data of Example 11.8 on page 420.
(b) Estimate the yield of the chemical reaction for a
temperature of 225°C.

12.10 The following data are given:

x    0    1    2    3    4    5    6
y    1    4    5    3    2    3    4

(a) Fit the cubic model $\mu_{Y|x} = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$.
(b) Predict Y when x = 2.

12.11 An experiment was conducted to study the size
of squid eaten by sharks and tuna. The regressor variables are characteristics of the beaks of the squid. The
data are given as follows:

x1      x2      x3      x4      x5      y
1.31    1.07    0.44    0.75    0.35    1.95
1.55    1.49    0.53    0.90    0.47    2.90
0.99    0.84    0.34    0.57    0.32    0.72
0.99    0.83    0.34    0.54    0.27    0.81
1.01    0.90    0.36    0.64    0.30    1.09
1.09    0.93    0.42    0.61    0.31    1.22
1.08    0.90    0.40    0.51    0.31    1.02
1.27    1.08    0.44    0.77    0.34    1.93
0.99    0.85    0.36    0.56    0.29    0.64
1.34    1.13    0.45    0.77    0.37    2.08
1.30    1.10    0.45    0.76    0.38    1.98
1.33    1.10    0.48    0.77    0.38    1.90
1.86    1.47    0.60    1.01    0.65    8.56
1.58    1.34    0.52    0.95    0.50    4.49
1.97    1.59    0.67    1.20    0.59    8.49
1.80    1.56    0.66    1.02    0.59    6.17
1.75    1.58    0.63    1.09    0.59    7.54
1.72    1.43    0.64    1.02    0.63    6.36
1.68    1.57    0.72    0.96    0.68    7.63
1.75    1.59    0.68    1.08    0.62    7.78
2.19    1.86    0.75    1.24    0.72    10.15
1.73    1.67    0.64    1.14    0.55    6.88

In the study, the regressor variables and response considered are
x1 = rostral length, in inches,
x2 = wing length, in inches,
x3 = rostral to notch length, in inches,
x4 = notch to wing length, in inches,
x5 = width, in inches,
y = weight, in pounds.
Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2,x_3,x_4,x_5} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5.$$
12.12 The following data reflect information from 17
U.S. Naval hospitals at various sites around the world.
The regressors are workload variables, that is, items
that result in the need for personnel in a hospital. A
brief description of the variables is as follows:
y = monthly labor-hours,
x1 = average daily patient load,
x2 = monthly X-ray exposures,
x3 = monthly occupied bed-days,
x4 = eligible population in the area/1000,
x5 = average length of patient's stay, in days.

Site    x1        x2        x3           x4       x5       y
1       15.57     2463      472.92       18.0     4.45     566.52
2       44.02     2048      1339.75      9.5      6.92     696.82
3       20.42     3940      620.25       12.8     4.28     1033.15
4       18.74     6505      568.33       36.7     3.90     1003.62
5       49.20     5723      1497.60      35.7     5.50     1611.37
6       44.92     11,520    1365.83      24.0     4.60     1613.27
7       55.48     5779      1687.00      43.3     5.62     1854.17
8       59.28     5969      1639.92      46.7     5.15     2160.55
9       94.39     8461      2872.33      78.7     6.18     2305.58
10      128.02    20,106    3655.08      180.5    6.15     3503.93
11      96.00     13,313    2912.00      60.9     5.88     3571.59
12      131.42    10,771    3921.00      103.7    4.88     3741.40
13      127.21    15,543    3865.67      126.8    5.50     4026.52
14      252.90    36,194    7684.10      157.7    7.00     10,343.81
15      409.20    34,703    12,446.33    169.4    10.75    11,732.17
16      463.70    39,204    14,098.40    331.4    7.05     15,414.94
17      510.22    86,533    15,524.00    371.6    6.35     18,854.45

The goal here is to produce an empirical equation that
will estimate (or predict) personnel needs for Naval
hospitals. Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2,x_3,x_4,x_5} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5.$$
12.13 A study was performed on a type of bearing to find the relationship of amount of wear y to
x1 = oil viscosity and x2 = load. The following data
were obtained. (From Response Surface Methodology,
Myers, Montgomery, and Anderson-Cook, 2009.)

y      x1      x2
193    1.6     851
230    15.5    816
172    22.0    1058
91     43.0    1201
113    33.0    1357
125    40.0    1115

(a) Estimate the unknown parameters of the multiple
linear regression equation
$$\mu_{Y|x_1,x_2} = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
(b) Predict wear when oil viscosity is 20 and load is
1200.
12.14 Eleven student teachers took part in an evaluation program designed to measure teacher effectiveness and determine what factors are important. The
response measure was a quantitative evaluation of the
teacher. The regressor variables were scores on four
standardized tests given to each teacher. The data are
as follows:

y      x1    x2     x3       x4
410    69    125    59.00    55.66
569    57    131    31.75    63.97
425    77    141    80.50    45.32
344    81    122    75.00    46.67
324    0     141    49.00    41.21
505    53    152    49.35    43.83
235    77    141    60.75    41.61
501    76    132    41.25    64.57
400    65    157    50.75    42.41
584    97    166    32.25    57.95
434    76    141    54.50    57.90

Estimate the multiple linear regression equation
$$\mu_{Y|x_1,x_2,x_3,x_4} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4.$$
12.15 The personnel department of a certain industrial firm used 12 subjects in a study to determine the
relationship between job performance rating (y) and
scores on four tests. The data are as follows:

y       x1      x2      x3      x4
11.2    56.5    71.0    38.5    43.0
14.5    59.5    72.5    38.2    44.8
17.2    69.2    76.0    42.5    49.0
17.8    74.5    79.5    43.4    56.3
19.3    81.2    84.0    47.5    60.2
24.5    88.0    86.2    47.4    62.0
21.2    78.2    80.5    44.5    58.1
16.9    69.0    72.0    41.8    48.1
14.8    58.1    68.0    42.1    46.0
20.0    80.5    85.0    48.1    60.3
13.2    58.3    71.0    37.5    47.1
22.5    84.0    87.2    51.0    65.2

Estimate the regression coefficients in the model
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4.$$
12.16 An engineer at a semiconductor company
wants to model the relationship between the gain or
hFE of a device (y) and three parameters: emitter-RS
(x1), base-RS (x2), and emitter-to-base-RS (x3). The
data are shown below:

x1,           x2,        x3,        y,
Emitter-RS    Base-RS    E-B-RS     hFE
14.62         226.0      7.000      128.40
15.63         220.0      3.375      52.62
14.62         217.4      6.375      113.90
15.00         220.0      6.000      98.01
14.50         226.5      7.625      139.90
15.25         224.1      6.000      102.60
16.12         220.5      3.375      48.14
15.13         223.5      6.125      109.60
15.50         217.6      5.000      82.68
15.13         228.5      6.625      112.60
15.50         230.2      5.750      97.52
16.12         226.5      3.750      59.06
15.13         226.6      6.125      111.80
15.63         225.6      5.375      89.09
15.38         234.0      8.875      171.90
15.50         230.0      4.000      66.80
14.25         224.3      8.000      157.10
14.50         240.5      10.870     208.40
14.62         223.7      7.375      133.40

(Data from Myers, Montgomery, and Anderson-Cook,
2009.)
(a) Fit a multiple linear regression to the data.
(b) Predict hFE when x1 = 14, x2 = 220, and x3 = 5.

12.4 Properties of the Least Squares Estimators
The means and variances of the estimators $b_0, b_1, \ldots, b_k$ are readily obtained under
certain assumptions on the random errors $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ that are identical to those
made in the case of simple linear regression. When we assume these errors to be
independent, each with mean 0 and variance $\sigma^2$, it can be shown that $b_0, b_1, \ldots, b_k$
are, respectively, unbiased estimators of the regression coefficients $\beta_0, \beta_1, \ldots, \beta_k$.
In addition, the variances of the $b$'s are obtained through the elements of the inverse
of the $\mathbf{A}$ matrix. Note that the off-diagonal elements of $\mathbf{A} = \mathbf{X}'\mathbf{X}$ represent sums
of products of elements in the columns of $\mathbf{X}$, while the diagonal elements of $\mathbf{A}$
represent sums of squares of elements in the columns of $\mathbf{X}$. The inverse matrix,
$\mathbf{A}^{-1}$, apart from the multiplier $\sigma^2$, represents the variance-covariance matrix
of the estimated regression coefficients. That is, the elements of the matrix $\mathbf{A}^{-1}\sigma^2$
display the variances of $b_0, b_1, \ldots, b_k$ on the main diagonal and covariances on the
off-diagonal. For example, in a $k = 2$ multiple linear regression problem, we might
write
$$(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix}
c_{00} & c_{01} & c_{02} \\
c_{10} & c_{11} & c_{12} \\
c_{20} & c_{21} & c_{22}
\end{bmatrix}$$
with the elements below the main diagonal determined through the symmetry of
the matrix. Then we can write
$$\sigma_{b_i}^2 = c_{ii}\sigma^2, \quad i = 0, 1, 2,$$
$$\sigma_{b_i b_j} = \mathrm{Cov}(b_i, b_j) = c_{ij}\sigma^2, \quad i \neq j.$$
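To make the role of $(\mathbf{X}'\mathbf{X})^{-1}$ concrete, the sketch below computes an estimated variance-covariance matrix for the coefficients of Example 12.4. It assumes NumPy and reuses the Table 12.3 data. It also assumes that $\sigma^2$ is estimated by $s^2 = SSE/(n - k - 1)$, the usual unbiased estimate in multiple regression. That formula is supplied here as an assumption, and the names `s2`, `cov_b`, and `std_err` are illustrative.

```python
import numpy as np

# Table 12.3 data from Example 12.4.
y = np.array([25.5, 31.2, 25.9, 38.4, 18.4, 26.7, 26.4,
              25.9, 32.0, 25.2, 39.7, 35.7, 26.5])
x1 = np.array([1.74, 6.32, 6.22, 10.52, 1.19, 1.22, 4.10,
               6.32, 4.08, 4.15, 10.15, 1.72, 1.70])
x2 = np.array([5.30, 5.42, 8.41, 4.63, 11.60, 5.85, 6.62,
               8.72, 4.42, 7.60, 4.83, 3.12, 5.30])
x3 = np.array([10.80, 9.40, 7.20, 8.50, 9.40, 9.90, 8.00,
               9.10, 8.70, 9.20, 9.40, 7.60, 8.20])

X = np.column_stack([np.ones_like(y), x1, x2, x3])
n, p = X.shape                      # p = k + 1 estimated coefficients

A_inv = np.linalg.inv(X.T @ X)      # the c_ij elements
b = A_inv @ (X.T @ y)               # least squares coefficients

sse = float(np.sum((y - X @ b) ** 2))
s2 = sse / (n - p)                  # ASSUMED estimate of sigma^2: SSE / (n - k - 1)

cov_b = s2 * A_inv                  # estimated variance-covariance matrix
std_err = np.sqrt(np.diag(cov_b))   # estimated standard errors of b0, ..., bk

print(np.round(A_inv, 4))
print(s2)
print(std_err)
```

The diagonal of `A_inv` holds the $c_{ii}$ and the off-diagonal entries the $c_{ij}$, so multiplying by the estimate of $\sigma^2$ gives estimated variances and covariances in the same positions.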
Of course, the estimates of the variances and hence the standard errors of these
estimators are obtained by replacing $\sigma^2$ with the appropriate estimate obtained
through experimental data. An unbiased estimate of $\sigma^2$ is once again defined in