
Chapter 12

Multiple Linear Regression and Certain Nonlinear Regression Models

12.1 Introduction

In most research problems where regression analysis is applied, more than one independent variable is needed in the regression model. The complexity of most scientific mechanisms is such that in order to be able to predict an important response, a multiple regression model is needed. When this model is linear in the coefficients, it is called a multiple linear regression model. For the case of k independent variables x1, x2, ..., xk, the mean of Y|x1, x2, ..., xk is given by the multiple linear regression model

    μ_{Y|x1,x2,...,xk} = β0 + β1x1 + ··· + βkxk,

and the estimated response is obtained from the sample regression equation

    ŷ = b0 + b1x1 + ··· + bkxk,

where each regression coefficient βi is estimated by bi from the sample data using the method of least squares. As in the case of a single independent variable, the multiple linear regression model can often be an adequate representation of a more complicated structure within certain ranges of the independent variables.

Similar least squares techniques can also be applied for estimating the coefficients when the linear model involves, say, powers and products of the independent variables. For example, when k = 1, the experimenter may believe that the means μ_{Y|x} do not fall on a straight line but are more appropriately described by the polynomial regression model

    μ_{Y|x} = β0 + β1x + β2x² + ··· + βr xʳ,

and the estimated response is obtained from the polynomial regression equation

    ŷ = b0 + b1x + b2x² + ··· + br xʳ.


Confusion arises occasionally when we speak of a polynomial model as a linear model. However, statisticians normally refer to a linear model as one in which the parameters occur linearly, regardless of how the independent variables enter the model. An example of a nonlinear model is the exponential relationship

    μ_{Y|x} = αβˣ,

whose response is estimated by the regression equation

    ŷ = abˣ.
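Although the text defers estimation of such models, one classical device for this particular form (an illustration here, not a method the text prescribes) is to linearize by taking logarithms, since ln μ_{Y|x} = ln α + x ln β is linear in x. A minimal sketch with numpy, using made-up data:

```python
import numpy as np

# Sketch: estimate a and b in yhat = a * b**x by ordinary least squares
# on the log scale, ln y = ln a + x ln b.  The data below are synthetic,
# generated from an exact exponential so the fit recovers a and b.
x = np.arange(8, dtype=float)
y = 2.0 * 1.5 ** x

slope, intercept = np.polyfit(x, np.log(y), 1)  # straight-line fit on logs
a, b = np.exp(intercept), np.exp(slope)         # back-transform
print(a, b)                                     # recovers a = 2.0, b = 1.5
```

Note that least squares on the log scale is not identical to least squares on the original scale; it is simply a convenient linearization.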

There are many phenomena in science and engineering that are inherently nonlinear in nature, and when the true structure is known, an attempt should certainly be made to fit the actual model. The literature on estimation by least squares of nonlinear models is voluminous. The nonlinear models discussed in this chapter deal with nonideal conditions in which the analyst is certain that the response and hence the response model error are not normally distributed but, rather, have a binomial or Poisson distribution. These situations do occur extensively in practice.

A student who wants a more general account of nonlinear regression should consult Classical and Modern Regression with Applications by Myers (1990; see the Bibliography).

12.2 Estimating the Coefficients

In this section, we obtain the least squares estimators of the parameters β0, β1, ..., βk by fitting the multiple linear regression model

    μ_{Y|x1,x2,...,xk} = β0 + β1x1 + ··· + βkxk

to the data points

    {(x1i, x2i, ..., xki, yi);  i = 1, 2, ..., n and n > k},

where yi is the observed response to the values x1i, x2i, ..., xki of the k independent variables x1, x2, ..., xk. Each observation (x1i, x2i, ..., xki, yi) is assumed to satisfy the following equation.

Multiple Linear Regression Model:

    yi = β0 + β1x1i + β2x2i + ··· + βkxki + εi

or

    yi = ŷi + ei = b0 + b1x1i + b2x2i + ··· + bkxki + ei,

where εi and ei are the random error and residual, respectively, associated with the response yi and fitted value ŷi.

As in the case of simple linear regression, it is assumed that the εi are independent and identically distributed with mean 0 and common variance σ².

In using the concept of least squares to arrive at estimates b0, b1, ..., bk, we minimize the expression

    SSE = Σ ei² = Σ (yi − b0 − b1x1i − b2x2i − ··· − bkxki)²,

with both sums running from i = 1 to n. Differentiating SSE in turn with respect to b0, b1, ..., bk and equating to zero, we generate the set of k + 1 normal equations for multiple linear regression.

Normal Estimation Equations for Multiple Linear Regression (all sums run from i = 1 to n):

    n b0      + b1 Σ x1i      + b2 Σ x2i      + ··· + bk Σ xki      = Σ yi
    b0 Σ x1i  + b1 Σ x1i²     + b2 Σ x1i x2i  + ··· + bk Σ x1i xki  = Σ x1i yi
       ⋮            ⋮               ⋮                      ⋮              ⋮
    b0 Σ xki  + b1 Σ xki x1i  + b2 Σ xki x2i  + ··· + bk Σ xki²     = Σ xki yi

These equations can be solved for b0, b1, b2, ..., bk by any appropriate method for solving systems of linear equations. Most statistical software can be used to obtain numerical solutions of the above equations.
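As a quick illustration (not part of the text), the normal equations can be assembled from the sums above and solved directly with numpy; the small data set below is made up purely for demonstration:

```python
import numpy as np

# Sketch: form the k + 1 normal equations for k = 2 regressors and solve.
# The data are invented for illustration only.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 4.2, 7.3, 8.1, 11.4, 12.0])

n = len(y)
# Left-hand side: sums of squares and cross products; right-hand side:
# sums of products with y.  These are exactly the normal equations.
A = np.array([[n,        x1.sum(),      x2.sum()],
              [x1.sum(), (x1**2).sum(), (x1*x2).sum()],
              [x2.sum(), (x1*x2).sum(), (x2**2).sum()]])
g = np.array([y.sum(), (x1*y).sum(), (x2*y).sum()])

b = np.linalg.solve(A, g)                    # b0, b1, b2

# The same estimates come from a direct least squares fit on the
# design matrix with a leading column of 1s.
X = np.column_stack([np.ones(n), x1, x2])
b_check, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_check))               # True
```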

Example 12.1: A study was done on a diesel-powered light-duty pickup truck to see if humidity, air temperature, and barometric pressure influence emission of nitrous oxide (in ppm). Emission measurements were taken at different times, with varying experimental conditions. The data are given in Table 12.1. The model is

    μ_{Y|x1,x2,x3} = β0 + β1x1 + β2x2 + β3x3,

or, equivalently,

    yi = β0 + β1x1i + β2x2i + β3x3i + εi,   i = 1, 2, ..., 20.

Fit this multiple linear regression model to the given data and then estimate the amount of nitrous oxide emitted for the conditions where humidity is 50%, temperature is 76°F, and barometric pressure is 29.30.

Table 12.1: Data for Example 12.1

    Nitrous    Humidity,  Temp.,  Pressure,  |  Nitrous    Humidity,  Temp.,  Pressure,
    Oxide, y   x1         x2      x3         |  Oxide, y   x1         x2      x3
    0.90        72.4      76.3    29.18      |  1.07        23.2      76.8    29.38
    0.91        41.6      70.3    29.35      |  0.94        47.4      86.6    29.35
    0.96        34.3      77.1    29.24      |  1.10        31.5      76.9    29.63
    0.89        35.1      68.0    29.27      |  1.10        10.6      86.3    29.56
    1.00        10.7      79.0    29.78      |  1.10        11.2      86.0    29.48
    1.10        12.9      67.4    29.39      |  0.91        73.3      76.3    29.40
    1.15         8.3      66.8    29.69      |  0.87        75.4      77.9    29.28
    1.03        20.1      76.9    29.48      |  0.78        96.6      78.7    29.29
    0.77        72.2      77.7    29.09      |  0.82       107.4      86.8    29.03
    1.07        24.0      67.7    29.60      |  0.95        54.9      70.9    29.37

Source: Charles T. Hare, "Light-Duty Diesel Emission Correction Factors for Ambient Conditions," EPA-600/2-77116, U.S. Environmental Protection Agency.

Solution: The solution of the set of estimating equations yields the unique estimates

    b0 = −3.507778,  b1 = −0.002625,  b2 = 0.000799,  b3 = 0.154155.

Therefore, the regression equation is

    ŷ = −3.507778 − 0.002625 x1 + 0.000799 x2 + 0.154155 x3.

For 50% humidity, a temperature of 76°F, and a barometric pressure of 29.30, the estimated amount of nitrous oxide emitted is

    ŷ = −3.507778 − 0.002625(50.0) + 0.000799(76.0) + 0.154155(29.30) = 0.9384 ppm.
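As a quick arithmetic check (our own illustration, plugging the fitted coefficients above into the regression equation):

```python
# Sketch: evaluating the fitted equation of Example 12.1 at the stated
# conditions (humidity 50%, temperature 76 degrees F, pressure 29.30).
b0, b1, b2, b3 = -3.507778, -0.002625, 0.000799, 0.154155

def predict_no(x1, x2, x3):
    """Estimated nitrous oxide (ppm) from the fitted regression equation."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

print(round(predict_no(50.0, 76.0, 29.30), 4))  # 0.9384
```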

Polynomial Regression

Now suppose that we wish to fit the polynomial equation

    μ_{Y|x} = β0 + β1x + β2x² + ··· + βr xʳ

to the n pairs of observations {(xi, yi); i = 1, 2, ..., n}. Each observation, yi, satisfies the equation

    yi = β0 + β1xi + β2xi² + ··· + βr xiʳ + εi

or

    yi = ŷi + ei = b0 + b1xi + b2xi² + ··· + br xiʳ + ei,

where r is the degree of the polynomial and εi and ei are again the random error and residual associated with the response yi and fitted value ŷi, respectively. Here, the number of pairs, n, must be at least as large as r + 1, the number of parameters to be estimated.

Notice that the polynomial model can be considered a special case of the more general multiple linear regression model, where we set x1 = x, x2 = x², ..., xr = xʳ. The normal equations assume the same form as those given above for multiple linear regression. They are then solved for b0, b1, b2, ..., br.

Example 12.2: Given the data

    x | 0    1    2    3    4    5    6    7    8    9
    y | 9.1  7.3  3.2  4.6  4.8  2.9  5.7  7.1  8.8  10.2

fit a regression curve of the form μ_{Y|x} = β0 + β1x + β2x² and then estimate μ_{Y|2}.

Solution: From the data given, we find that

    10 b0 +   45 b1 +    285 b2 =   63.7,
    45 b0 +  285 b1 +   2025 b2 =  307.3,
   285 b0 + 2025 b1 + 15,333 b2 = 2153.3.

Solving these normal equations, we obtain

    b0 = 8.698,  b1 = −2.341,  b2 = 0.288.

Therefore,

    ŷ = 8.698 − 2.341x + 0.288x².

When x = 2, our estimate of μ_{Y|2} is

    ŷ = 8.698 − (2.341)(2) + (0.288)(2²) = 5.168.
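As an illustration (not part of the text), the same quadratic fit can be reproduced numerically; a least squares solve on the design matrix with columns 1, x, x² answers the identical normal equations:

```python
import numpy as np

# Sketch: the quadratic fit of Example 12.2, computed numerically.
x = np.arange(10, dtype=float)
y = np.array([9.1, 7.3, 3.2, 4.6, 4.8, 2.9, 5.7, 7.1, 8.8, 10.2])

# Design matrix with columns 1, x, x^2.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(b, 3))               # [ 8.698 -2.341  0.288]
print(round(float(b @ np.array([1.0, 2.0, 4.0])), 3))  # mu_{Y|2}: 5.168
```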

Example 12.3: The data in Table 12.2 represent the percent of impurities that resulted for various temperatures and sterilizing times during a reaction associated with the manufacturing of a certain beverage. Estimate the regression coefficients in the polynomial model

    yi = β0 + β1x1i + β2x2i + β11x1i² + β22x2i² + β12x1i x2i + εi,

for i = 1, 2, ..., 18.

Table 12.2: Data for Example 12.3

                          Temperature, x1 (°C)
    Sterilizing Time,     75       100      125
    x2 (min)
        15               7.55     10.55    14.05
                         6.59      9.48    14.93
        20               9.23     13.63    16.56
                         8.78     11.75    15.85
        25              15.93     18.55    22.41
                        16.44     17.98    21.66

Solution: Using the normal equations, we obtain

    b0 = 56.4411,   b1 = −0.36190,   b2 = −2.75299,
    b11 = 0.00081,  b22 = 0.08173,   b12 = 0.00314,

and our estimated regression equation is

    ŷ = 56.4411 − 0.36190x1 − 2.75299x2 + 0.00081x1² + 0.08173x2² + 0.00314x1x2.

Many of the principles and procedures associated with the estimation of polynomial regression functions fall into the category of response surface methodology, a collection of techniques that have been used quite successfully by scientists and engineers in many fields. The xi² are called pure quadratic terms, and the xi xj (i ≠ j) are called interaction terms. Such problems as selecting a proper experimental design, particularly in cases where a large number of variables are in the model, and choosing optimum operating conditions for x1, x2, ..., xk are often approached through the use of these methods. For an extensive exposure, the reader is referred to Response Surface Methodology: Process and Product Optimization Using Designed Experiments by Myers, Montgomery, and Anderson-Cook (2009; see the Bibliography).

12.3 Linear Regression Model Using Matrices

In fitting a multiple linear regression model, particularly when the number of variables exceeds two, a knowledge of matrix theory can facilitate the mathematical manipulations considerably. Suppose that the experimenter has k independent variables x1, x2, ..., xk and n observations y1, y2, ..., yn, each of which can be expressed by the equation

    yi = β0 + β1x1i + β2x2i + ··· + βkxki + εi.

This model essentially represents n equations describing how the response values are generated in the scientific process. Using matrix notation, we can write the following equation:

General Linear Model:

    y = Xβ + ε,

where

        ⎡ y1 ⎤          ⎡ 1  x11  x21  ···  xk1 ⎤          ⎡ β0 ⎤          ⎡ ε1 ⎤
        ⎢ y2 ⎥          ⎢ 1  x12  x22  ···  xk2 ⎥          ⎢ β1 ⎥          ⎢ ε2 ⎥
    y = ⎢  ⋮ ⎥,     X = ⎢ ⋮   ⋮    ⋮          ⋮ ⎥,     β = ⎢  ⋮ ⎥,     ε = ⎢  ⋮ ⎥.
        ⎣ yn ⎦          ⎣ 1  x1n  x2n  ···  xkn ⎦          ⎣ βk ⎦          ⎣ εn ⎦

Then the least squares method for estimation of β, illustrated in Section 12.2, involves finding b for which

    SSE = (y − Xb)′(y − Xb)

is minimized. This minimization process involves solving for b in the equation

    ∂/∂b (SSE) = 0.

We will not present the details regarding solution of the equations above. The result reduces to the solution of b in

    (X′X)b = X′y.

Notice the nature of the X matrix. Apart from the initial element, the ith row represents the x-values that give rise to the response yi. Writing

                  ⎡ n        Σ x1i       Σ x2i      ···  Σ xki      ⎤
                  ⎢ Σ x1i    Σ x1i²      Σ x1i x2i  ···  Σ x1i xki ⎥
    A = X′X =     ⎢  ⋮         ⋮           ⋮               ⋮        ⎥
                  ⎣ Σ xki    Σ xki x1i   Σ xki x2i  ···  Σ xki²    ⎦

and

                  ⎡ g0 = Σ yi     ⎤
                  ⎢ g1 = Σ x1i yi ⎥
    g = X′y =     ⎢  ⋮            ⎥,
                  ⎣ gk = Σ xki yi ⎦

with all sums running from i = 1 to n, allows the normal equations to be put in the matrix form

    Ab = g.


If the matrix A is nonsingular, we can write the solution for the regression coefficients as

    b = A⁻¹g = (X′X)⁻¹X′y.

Thus, we can obtain the prediction equation or regression equation by solving a set of k + 1 equations in a like number of unknowns. This involves the inversion of the (k + 1) × (k + 1) matrix X′X. Techniques for inverting this matrix are explained in most textbooks on elementary determinants and matrices. Of course, there are many high-speed computer packages available for multiple regression problems, packages that not only print out estimates of the regression coefficients but also provide other information relevant to making inferences concerning the regression equation.
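As an illustration (not part of the text), the matrix formula can be carried out directly in numpy on a small made-up data set:

```python
import numpy as np

# Sketch: the matrix solution b = (X'X)^{-1} X'y on invented data.
x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
y  = np.array([5.0, 10.0, 12.0, 19.0, 20.0])

X = np.column_stack([np.ones_like(x1), x1, x2])   # first column is all 1s
b_inverse = np.linalg.inv(X.T @ X) @ (X.T @ y)    # textbook formula
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)   # numerically stabler route

print(np.allclose(b_inverse, b_lstsq))            # True
```

In practice one solves (X′X)b = X′y, or calls a least squares routine, rather than forming the inverse explicitly; the explicit inverse is shown only because it mirrors the formula in the text.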

Example 12.4: The percent survival rate of sperm in a certain type of animal semen, after storage, was measured at various combinations of concentrations of three materials used to increase chance of survival. The data are given in Table 12.3. Estimate the multiple linear regression model for the given data.

Table 12.3: Data for Example 12.4

    y (% survival)   x1 (weight %)   x2 (weight %)   x3 (weight %)
    25.5               1.74            5.30           10.80
    31.2               6.32            5.42            9.40
    25.9               6.22            8.41            7.20
    38.4              10.52            4.63            8.50
    18.4               1.19           11.60            9.40
    26.7               1.22            5.85            9.90
    26.4               4.10            6.62            8.00
    25.9               6.32            8.72            9.10
    32.0               4.08            4.42            8.70
    25.2               4.15            7.60            9.20
    39.7              10.15            4.83            9.40
    35.7               1.72            3.12            7.60
    26.5               1.70            5.30            8.20

Solution: The least squares estimating equations, (X′X)b = X′y, are

    ⎡  13.0     59.43      81.82     115.40   ⎤ ⎡b0⎤   ⎡  377.5   ⎤
    ⎢  59.43   394.7255   360.6621   522.0780 ⎥ ⎢b1⎥ = ⎢ 1877.567 ⎥
    ⎢  81.82   360.6621   576.7264   728.3100 ⎥ ⎢b2⎥   ⎢ 2246.661 ⎥
    ⎣ 115.40   522.0780   728.3100  1035.9600 ⎦ ⎣b3⎦   ⎣ 3337.780 ⎦

From a computer readout we obtain the elements of the inverse matrix

                ⎡  8.0648  −0.0826  −0.0942  −0.7905 ⎤
    (X′X)⁻¹ =   ⎢ −0.0826   0.0085   0.0017   0.0037 ⎥
                ⎢ −0.0942   0.0017   0.0166  −0.0021 ⎥
                ⎣ −0.7905   0.0037  −0.0021   0.0886 ⎦

and then, using the relation b = (X′X)⁻¹X′y, the estimated regression coefficients are obtained as

    b0 = 39.1574,  b1 = 1.0161,  b2 = −1.8616,  b3 = −0.3433.

Hence, our estimated regression equation is

    ŷ = 39.1574 + 1.0161x1 − 1.8616x2 − 0.3433x3.
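As a numerical check (our own illustration), refitting the Table 12.3 data by least squares reproduces these coefficients:

```python
import numpy as np

# Sketch: reproducing the Example 12.4 fit from the Table 12.3 data.
y  = np.array([25.5, 31.2, 25.9, 38.4, 18.4, 26.7, 26.4, 25.9, 32.0,
               25.2, 39.7, 35.7, 26.5])
x1 = np.array([1.74, 6.32, 6.22, 10.52, 1.19, 1.22, 4.10, 6.32, 4.08,
               4.15, 10.15, 1.72, 1.70])
x2 = np.array([5.30, 5.42, 8.41, 4.63, 11.60, 5.85, 6.62, 8.72, 4.42,
               7.60, 4.83, 3.12, 5.30])
x3 = np.array([10.80, 9.40, 7.20, 8.50, 9.40, 9.90, 8.00, 9.10, 8.70,
               9.20, 9.40, 7.60, 8.20])

X = np.column_stack([np.ones_like(y), x1, x2, x3])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 4))   # approximately [39.1574  1.0161 -1.8616 -0.3433]
```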

Exercises

12.1 A set of experimental runs was made to determine a way of predicting cooking time y at various values of oven width x1 and flue temperature x2. The coded data were recorded as follows:

    y        x1      x2
      6.40    1.32    1.15
     15.05    2.69    3.40
     18.75    3.56    4.10
     30.25    4.41    8.75
     44.85    5.35   14.82
     48.94    6.20   15.15
     51.55    7.12   15.32
     61.50    8.87   18.18
    100.44    9.80   35.19
    111.42   10.65   40.40

Estimate the multiple linear regression equation μ_{Y|x1,x2} = β0 + β1x1 + β2x2.

12.2 In Applied Spectroscopy, the infrared reflectance spectra properties of a viscous liquid used in the electronics industry as a lubricant were studied. The designed experiment consisted of the effect of band frequency x1 and film thickness x2 on optical density y using a Perkin-Elmer Model 621 infrared spectrometer. (Source: Pacansky, J., England, C. D., and Wattman, R., 1986.)

    y        x1      x2
    0.231     740    1.10
    0.107     740    0.62
    0.053     740    0.31
    0.129     805    1.10
    0.069     805    0.62
    0.030     805    0.31
    1.005     980    1.10
    0.559     980    0.62
    0.321     980    0.31
    2.948    1235    1.10
    1.633    1235    0.62
    0.934    1235    0.31

Estimate the multiple linear regression equation ŷ = b0 + b1x1 + b2x2.

12.3 Suppose in Review Exercise 11.53 on page 437 that we were also given the number of class periods missed by the 12 students taking the chemistry course. The complete data are shown.

    Student   Chemistry Grade, y   Test Score, x1   Classes Missed, x2
       1             85                  65               1
       2             74                  50               7
       3             76                  55               5
       4             90                  65               2
       5             85                  55               6
       6             87                  70               3
       7             94                  65               2
       8             98                  70               5
       9             81                  55               4
      10             91                  70               3
      11             76                  50               1
      12             74                  55               4

(a) Fit a multiple linear regression equation of the form ŷ = b0 + b1x1 + b2x2.
(b) Estimate the chemistry grade for a student who has an intelligence test score of 60 and missed 4 classes.

12.4 An experiment was conducted to determine if the weight of an animal can be predicted after a given period of time on the basis of the initial weight of the animal and the amount of feed that was eaten. The following data, measured in kilograms, were recorded:

    Final Weight, y   Initial Weight, x1   Feed Weight, x2
          95                 42                 272
          77                 33                 226
          80                 33                 259
         100                 45                 292
          97                 39                 311
          70                 36                 183
          50                 32                 173
          80                 41                 236
          92                 40                 230
          84                 38                 235

(a) Fit a multiple regression equation of the form μ_{Y|x1,x2} = β0 + β1x1 + β2x2.
(b) Predict the final weight of an animal having an initial weight of 35 kilograms that is given 250 kilograms of feed.

12.5 The electric power consumed each month by a chemical plant is thought to be related to the average ambient temperature x1, the number of days in the month x2, the average product purity x3, and the tons of product produced x4. The past year's historical data are available and are presented in the following table.

    y     x1    x2    x3    x4
    240   25    24    91    100
    236   31    21    90     95
    290   45    24    88    110
    274   60    25    87     88
    301   65    25    91     94
    316   72    26    94     99
    300   80    25    87     97
    296   84    25    86     96
    267   75    24    88    110
    276   60    25    91    105
    288   50    25    90    100
    261   38    23    89     98

(a) Fit a multiple linear regression model using the above data set.
(b) Predict power consumption for a month in which x1 = 75°F, x2 = 24 days, x3 = 90%, and x4 = 98 tons.

12.6 An experiment was conducted on a new model of a particular make of automobile to determine the stopping distance at various speeds. The following data were recorded.

    Speed, v (km/hr)          35   50   65   80   95   110
    Stopping Distance, d (m)  16   26   41   62   88   119

(a) Fit a multiple regression curve of the form μ_{D|v} = β0 + β1v + β2v².
(b) Estimate the stopping distance when the car is traveling at 70 kilometers per hour.

12.7 An experiment was conducted in order to determine if cerebral blood flow in human beings can be predicted from arterial oxygen tension (millimeters of mercury). Fifteen patients participated in the study, and the following data were collected:

    Blood Flow, y   Arterial Oxygen Tension, x
       84.33             603.40
       87.80             582.50
       82.20             556.20
       78.21             594.60
       78.44             558.90
       80.01             575.20
       83.53             580.10
       79.46             451.20
       75.22             404.00
       76.58             484.00
       77.90             452.40
       78.80             448.40
       80.67             334.80
       86.60             320.30
       78.20             350.30

Estimate the quadratic regression equation μ_{Y|x} = β0 + β1x + β2x².

12.8 The following is a set of coded experimental data on the compressive strength of a particular alloy at various values of the concentration of some additive:

    Concentration, x   Compressive Strength, y
        10.0           25.2   27.3   28.7
        15.0           29.8   31.1   27.8
        20.0           31.2   32.6   29.7
        25.0           31.7   30.1   32.3
        30.0           29.4   30.8   32.8

(a) Estimate the quadratic regression equation μ_{Y|x} = β0 + β1x + β2x².
(b) Test for lack of fit of the model.

12.9 (a) Fit a multiple regression equation of the form μ_{Y|x1,x2} = β0 + β1x1 + β2x2 to the data of Example 11.8 on page 420.
(b) Estimate the yield of the chemical reaction for a temperature of 225°C.

12.10 The following data are given:

    x   0   1   2   3   4   5   6
    y   1   4   5   3   2   3   4

(a) Fit the cubic model μ_{Y|x} = β0 + β1x + β2x² + β3x³.
(b) Predict Y when x = 2.

12.11 An experiment was conducted to study the size of squid eaten by sharks and tuna. The regressor variables are characteristics of the beaks of the squid. In the study, the regressor variables and response considered are

    x1 = rostral length, in inches,
    x2 = wing length, in inches,
    x3 = rostral to notch length, in inches,
    x4 = notch to wing length, in inches,
    x5 = width, in inches,
    y = weight, in pounds.

The data are given as follows:

    x1     x2     x3     x4     x5     y
    1.31   1.07   0.44   0.75   0.35    1.95
    1.55   1.49   0.53   0.90   0.47    2.90
    0.99   0.84   0.34   0.57   0.32    0.72
    0.99   0.83   0.34   0.54   0.27    0.81
    1.01   0.90   0.36   0.64   0.30    1.09
    1.09   0.93   0.42   0.61   0.31    1.22
    1.08   0.90   0.40   0.51   0.31    1.02
    1.27   1.08   0.44   0.77   0.34    1.93
    0.99   0.85   0.36   0.56   0.29    0.64
    1.34   1.13   0.45   0.77   0.37    2.08
    1.30   1.10   0.45   0.76   0.38    1.98
    1.33   1.10   0.48   0.77   0.38    1.90
    1.86   1.47   0.60   1.01   0.65    8.56
    1.58   1.34   0.52   0.95   0.50    4.49
    1.97   1.59   0.67   1.20   0.59    8.49
    1.80   1.56   0.66   1.02   0.59    6.17
    1.75   1.58   0.63   1.09   0.59    7.54
    1.72   1.43   0.64   1.02   0.63    6.36
    1.68   1.57   0.72   0.96   0.68    7.63
    1.75   1.59   0.68   1.08   0.62    7.78
    2.19   1.86   0.75   1.24   0.72   10.15
    1.73   1.67   0.64   1.14   0.55    6.88

Estimate the multiple linear regression equation μ_{Y|x1,x2,x3,x4,x5} = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5.

12.12 The following data reflect information from 17 U.S. Naval hospitals at various sites around the world. The regressors are workload variables, that is, items that result in the need for personnel in a hospital. A brief description of the variables is as follows:

    y = monthly labor-hours,
    x1 = average daily patient load,
    x2 = monthly X-ray exposures,
    x3 = monthly occupied bed-days,
    x4 = eligible population in the area/1000,
    x5 = average length of patient's stay, in days.

    Site   x1       x2      x3         x4      x5      y
     1      15.57    2463     472.92    18.0    4.45     566.52
     2      44.02    2048    1339.75     9.5    6.92     696.82
     3      20.42    3940     620.25    12.8    4.28    1033.15
     4      18.74    6505     568.33    36.7    3.90    1003.62
     5      49.20    5723    1497.60    35.7    5.50    1611.37
     6      44.92   11520    1365.83    24.0    4.60    1613.27
     7      55.48    5779    1687.00    43.3    5.62    1854.17
     8      59.28    5969    1639.92    46.7    5.15    2160.55
     9      94.39    8461    2872.33    78.7    6.18    2305.58
    10     128.02   20106    3655.08   180.5    6.15    3503.93
    11      96.00   13313    2912.00    60.9    5.88    3571.59
    12     131.42   10771    3921.00   103.7    4.88    3741.40
    13     127.21   15543    3865.67   126.8    5.50    4026.52
    14     252.90   36194    7684.10   157.7    7.00   10343.81
    15     409.20   34703   12446.33   169.4   10.75   11732.17
    16     463.70   39204   14098.40   331.4    7.05   15414.94
    17     510.22   86533   15524.00   371.6    6.35   18854.45

The goal here is to produce an empirical equation that will estimate (or predict) personnel needs for Naval hospitals. Estimate the multiple linear regression equation μ_{Y|x1,x2,x3,x4,x5} = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5.

12.13 A study was performed on a type of bearing to find the relationship of amount of wear y to x1 = oil viscosity and x2 = load. The following data were obtained. (From Response Surface Methodology, Myers, Montgomery, and Anderson-Cook, 2009.)

    y     x1     x2
    193    1.6    851
    230   15.5    816
    172   22.0   1058
     91   43.0   1201
    113   33.0   1357
    125   40.0   1115

(a) Estimate the unknown parameters of the multiple linear regression equation μ_{Y|x1,x2} = β0 + β1x1 + β2x2.
(b) Predict wear when oil viscosity is 20 and load is 1200.

12.14 Eleven student teachers took part in an evaluation program designed to measure teacher effectiveness and determine what factors are important. The response measure was a quantitative evaluation of the teacher. The regressor variables were scores on four standardized tests given to each teacher. The data are as follows:

    y     x1    x2     x3      x4
    410   69    125    59.00   55.66
    569   57    131    31.75   63.97
    425   77    141    80.50   45.32
    344   81    122    75.00   46.67
    324    0    141    49.00   41.21
    505   53    152    49.35   43.83
    235   77    141    60.75   41.61
    501   76    132    41.25   64.57
    400   65    157    50.75   42.41
    584   97    166    32.25   57.95
    434   76    141    54.50   57.90

Estimate the multiple linear regression equation μ_{Y|x1,x2,x3,x4} = β0 + β1x1 + β2x2 + β3x3 + β4x4.

12.15 The personnel department of a certain industrial firm used 12 subjects in a study to determine the relationship between job performance rating (y) and scores on four tests. The data are as follows:

    y      x1     x2     x3     x4
    11.2   56.5   71.0   38.5   43.0
    14.5   59.5   72.5   38.2   44.8
    17.2   69.2   76.0   42.5   49.0
    17.8   74.5   79.5   43.4   56.3
    19.3   81.2   84.0   47.5   60.2
    24.5   88.0   86.2   47.4   62.0
    21.2   78.2   80.5   44.5   58.1
    16.9   69.0   72.0   41.8   48.1
    14.8   58.1   68.0   42.1   46.0
    20.0   80.5   85.0   48.1   60.3
    13.2   58.3   71.0   37.5   47.1
    22.5   84.0   87.2   51.0   65.2

Estimate the regression coefficients in the model ŷ = b0 + b1x1 + b2x2 + b3x3 + b4x4.

12.16 An engineer at a semiconductor company wants to model the relationship between the gain or hFE of a device (y) and three parameters: emitter-RS (x1), base-RS (x2), and emitter-to-base-RS (x3). The data are shown below:

    x1, Emitter-RS   x2, Base-RS   x3, E-B-RS   y, hFE
    14.62            226.0          7.000       128.40
    15.63            220.0          3.375        52.62
    14.62            217.4          6.375       113.90
    15.00            220.0          6.000        98.01
    14.50            226.5          7.625       139.90
    15.25            224.1          6.000       102.60
    16.12            220.5          3.375        48.14
    15.13            223.5          6.125       109.60
    15.50            217.6          5.000        82.68
    15.13            228.5          6.625       112.60
    15.50            230.2          5.750        97.52
    16.12            226.5          3.750        59.06
    15.13            226.6          6.125       111.80
    15.63            225.6          5.375        89.09
    15.38            234.0          8.875       171.90
    15.50            230.0          4.000        66.80
    14.25            224.3          8.000       157.10
    14.50            240.5         10.870       208.40
    14.62            223.7          7.375       133.40

(Data from Myers, Montgomery, and Anderson-Cook, 2009.)
(a) Fit a multiple linear regression to the data.
(b) Predict hFE when x1 = 14, x2 = 220, and x3 = 5.

12.4 Properties of the Least Squares Estimators

The means and variances of the estimators b0, b1, ..., bk are readily obtained under certain assumptions on the random errors ε1, ε2, ..., εn that are identical to those made in the case of simple linear regression. When we assume these errors to be independent, each with mean 0 and variance σ², it can be shown that b0, b1, ..., bk are, respectively, unbiased estimators of the regression coefficients β0, β1, ..., βk. In addition, the variances of the b's are obtained through the elements of the inverse of the A matrix. Note that the off-diagonal elements of A = X′X represent sums of products of elements in the columns of X, while the diagonal elements of A represent sums of squares of elements in the columns of X. The inverse matrix, A⁻¹, apart from the multiplier σ², represents the variance-covariance matrix of the estimated regression coefficients. That is, the elements of the matrix A⁻¹σ² display the variances of b0, b1, ..., bk on the main diagonal and covariances on the off-diagonal. For example, in a k = 2 multiple linear regression problem, we might write

                ⎡ c00  c01  c02 ⎤
    (X′X)⁻¹ =   ⎢ c10  c11  c12 ⎥
                ⎣ c20  c21  c22 ⎦

with the elements below the main diagonal determined through the symmetry of the matrix. Then we can write

    σ²_{bi} = cii σ²,   i = 0, 1, 2,
    σ_{bi bj} = Cov(bi, bj) = cij σ²,   i ≠ j.

Of course, the estimates of the variances and hence the standard errors of these estimators are obtained by replacing σ² with the appropriate estimate obtained through experimental data. An unbiased estimate of σ² is once again defined in

Source: Probability & Statistics for Engineers and Scientists, 9th ed., by Walpole and Myers.