B.7 Standard Errors of Estimators, Test Statistics, and Confidence Intervals for $\beta_0, \beta_1, \ldots, \beta_k$
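then it can be shown (proof omitted) that the standard errors of the sampling
distributions of $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are

$$
\begin{aligned}
\sigma_{\hat\beta_0} &= \sigma\sqrt{c_{00}} \\
\sigma_{\hat\beta_1} &= \sigma\sqrt{c_{11}} \\
\sigma_{\hat\beta_2} &= \sigma\sqrt{c_{22}} \\
&\;\;\vdots \\
\sigma_{\hat\beta_k} &= \sigma\sqrt{c_{kk}}
\end{aligned}
$$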
where $\sigma$ is the standard deviation of the random error $\varepsilon$. In other words, the diagonal
elements of $(X'X)^{-1}$ give the values of $c_{00}, c_{11}, \ldots, c_{kk}$ that are required for finding
the standard errors of the estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$. The estimated values of the
standard errors are obtained by replacing $\sigma$ by $s$ in the formulas for the standard
errors. Thus, the estimated standard error of $\hat\beta_1$ is $s_{\hat\beta_1} = s\sqrt{c_{11}}$.
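The standard error formulas are the square roots of the diagonal of the variance-covariance matrix of the least squares estimators. The display below states that standard result without proof, assuming independent random errors with common variance $\sigma^2$; the vector symbol $\hat{\boldsymbol\beta}$ is notation introduced here for illustration.

$$
\operatorname{Var}(\hat{\boldsymbol\beta}) = \sigma^2 (X'X)^{-1}
\quad\Longrightarrow\quad
\operatorname{Var}(\hat\beta_i) = \sigma^2 c_{ii},
\qquad
\sigma_{\hat\beta_i} = \sigma\sqrt{c_{ii}}, \quad i = 0, 1, \ldots, k
$$

The off-diagonal elements play the corresponding role for covariances, $\operatorname{Cov}(\hat\beta_i, \hat\beta_j) = \sigma^2 c_{ij}$.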
The confidence interval for a single $\beta$ parameter, $\beta_i$, is given in the next box.
Confidence Interval for $\beta_i$

$$\hat\beta_i \pm t_{\alpha/2}\,(\text{Estimated standard error of } \hat\beta_i)$$

or

$$\hat\beta_i \pm (t_{\alpha/2})\, s\sqrt{c_{ii}}$$

where $t_{\alpha/2}$ is based on the number of degrees of freedom associated with $s$.
Similarly, the test statistic for testing the null hypothesis $H_0: \beta_i = 0$ is as shown
in the following box.
Test Statistic for $H_0: \beta_i = 0$

$$t = \frac{\hat\beta_i}{s\sqrt{c_{ii}}}$$
Example B.8
Refer to Example B.5 and find the estimated standard error for the sampling
distribution of $\hat\beta_1$, the estimator of the slope of the line, $\beta_1$. Then give a 95%
confidence interval for $\beta_1$.
Solution
The $(X'X)^{-1}$ matrix for the least squares solution in Example B.5 was

$$(X'X)^{-1} = \begin{bmatrix} 1.1 & -.3 \\ -.3 & .1 \end{bmatrix}$$
Therefore, $c_{00} = 1.1$, $c_{11} = .1$, and the estimated standard error for $\hat\beta_1$ is

$$s_{\hat\beta_1} = s\sqrt{c_{11}} = \sqrt{.367}\,(\sqrt{.1}) = .192$$

The value for $s$, $\sqrt{.367}$, was obtained from Example B.7.
A 95% confidence interval for $\beta_1$ is

$$\hat\beta_1 \pm (t_{\alpha/2})\, s\sqrt{c_{11}}$$

$$.7 \pm (3.182)(.192) = (.09, 1.31)$$
The $t$ value, $t_{.025}$, is based on $(n - 2) = 3$ df. Observe that this is the same confidence
interval as the one obtained in Section 3.6.
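The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the inputs are the values quoted in the text ($s^2 = .367$, $c_{11} = .1$, $\hat\beta_1 = .7$, and the tabled $t_{.025} = 3.182$ for 3 df), and the variable names are chosen here for illustration.

```python
import math

# Inputs quoted in Examples B.5, B.7, and B.8.
s_squared = 0.367   # s^2, so s = sqrt(.367)
c11 = 0.1           # second diagonal element of (X'X)^{-1}
beta1_hat = 0.7     # least squares estimate of the slope
t_crit = 3.182      # t_.025 with n - 2 = 3 df, read from a t table

s = math.sqrt(s_squared)
se_beta1 = s * math.sqrt(c11)        # estimated standard error of beta1-hat
half_width = t_crit * se_beta1       # half-width of the 95% interval
ci = (beta1_hat - half_width, beta1_hat + half_width)

print(round(se_beta1, 3))                 # 0.192
print(round(ci[0], 2), round(ci[1], 2))   # 0.09 1.31
```

The critical value is entered by hand rather than computed, so the sketch needs nothing beyond the standard library.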
Example B.9
Refer to Example B.6 and the least squares solution for fitting power usage $y$ to the
size of a home $x$ using the model

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$$

The MINITAB printout for the analysis is reproduced in Figure B.2.
(a) Compute the estimated standard error for $\hat\beta_1$, and compare this result with the
value shaded in Figure B.2.
(b) Compute the value of the test statistic for testing $H_0: \beta_2 = 0$. Compare this
with the value shaded in Figure B.2.
Figure B.2 MINITAB regression printout for power usage model
Solution
The fitted model is

$$\hat y = -1{,}216.14389 + 2.39893x - .00045x^2$$
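The $(X'X)^{-1}$ matrix, obtained in Example B.6, is

$$
(X'X)^{-1} =
\begin{bmatrix}
26.9156 & -.027027 & 6.3554 \times 10^{-6} \\
-.027027 & 2.75914 \times 10^{-5} & -6.5804 \times 10^{-9} \\
6.3554 \times 10^{-6} & -6.5804 \times 10^{-9} & 1.5934 \times 10^{-12}
\end{bmatrix}
$$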
From $(X'X)^{-1}$, we know that

$$
\begin{aligned}
c_{00} &= 26.9156 \\
c_{11} &= 2.75914 \times 10^{-5} \\
c_{22} &= 1.5934 \times 10^{-12}
\end{aligned}
$$

and from the printout, $s = 46.80$.
(a) The estimated standard error of $\hat\beta_1$ is

$$s_{\hat\beta_1} = s\sqrt{c_{11}} = (46.80)\sqrt{2.75914 \times 10^{-5}} = .2458$$

Notice that this agrees with the value of $s_{\hat\beta_1}$ shaded in the MINITAB printout
(Figure B.2).
(b) The value of the test statistic for testing $H_0: \beta_2 = 0$ is

$$t = \frac{\hat\beta_2}{s\sqrt{c_{22}}} = \frac{-.00045}{(46.80)\sqrt{1.5934 \times 10^{-12}}} = -7.62$$

Notice that this value of the $t$ statistic agrees with the value $-7.62$ shaded in
the printout (Figure B.2).
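Both parts of this example follow the same recipe, which the short Python sketch below reproduces. It uses only the numbers quoted in the solution (the diagonal of $(X'X)^{-1}$, $s = 46.80$, and the rounded fitted coefficients); the list names are illustrative choices, not anything taken from the MINITAB printout.

```python
import math

s = 46.80                                     # from the printout
c_diag = [26.9156, 2.75914e-5, 1.5934e-12]    # c00, c11, c22
beta_hat = [-1216.14389, 2.39893, -0.00045]   # fitted coefficients as quoted

# Estimated standard error of each estimator: s * sqrt(c_ii)
std_errors = [s * math.sqrt(c_ii) for c_ii in c_diag]

# t statistic for H0: beta_i = 0 is beta_i-hat divided by its standard error
t_stats = [b / se for b, se in zip(beta_hat, std_errors)]

print(round(std_errors[1], 4))   # 0.2458
print(round(t_stats[2], 2))      # -7.62
```

Because the coefficients are entered as rounded in the text, the other entries of `t_stats` may differ slightly from a printout computed at full precision.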
B.7 Exercises
B.19 Do the data given in Exercise B.16 provide sufficient evidence to indicate that $x$ contributes
information for the prediction of $y$? Test $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$ using $\alpha = .05$.
B.20 Find a 90% confidence interval for the slope of the line in Exercise B.19.
B.21 The term in the second-order model $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$ that controls the
curvature in its graph is $\beta_2 x^2$. If $\beta_2 = 0$, $E(y)$ graphs as a straight line. Do the data
given in Exercise B.18 provide sufficient evidence to indicate curvature in the model for $E(y)$?
Test $H_0: \beta_2 = 0$ against $H_a: \beta_2 \neq 0$ using $\alpha = .10$.
B.8 A Confidence Interval for a Linear Function of the $\beta$ Parameters; a Confidence Interval for $E(y)$
Suppose we were to postulate that the mean value of the productivity, $y$, of a
company is related to the size of the company, $x$, and that the relationship could be
modeled by the expression

$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$$

A graph of $E(y)$ might appear as shown in Figure B.3.
We might have several reasons for collecting data on the productivity and size
of a set of $n$ companies and for finding the least squares prediction equation,

$$\hat y = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2$$

For example, we might wish to estimate the mean productivity for a company of a
given size (say, $x = 2$). That is, we might wish to estimate

$$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2 = \beta_0 + 2\beta_1 + 4\beta_2 \quad \text{where } x = 2$$
Figure B.3 Graph of mean productivity $E(y)$
Figure B.4 Marginal productivity
Or we might wish to estimate the marginal increase in productivity, the slope of a
tangent to the curve, when $x = 2$ (see Figure B.4). The marginal productivity for $y$
when $x = 2$ is the rate of change of $E(y)$ with respect to $x$, evaluated at $x = 2$.∗ The
marginal productivity for a value of $x$, denoted by the symbol $dE(y)/dx$, can be
shown (proof omitted) to be

$$\frac{dE(y)}{dx} = \beta_1 + 2\beta_2 x$$
Therefore, the marginal productivity at $x = 2$ is

$$\frac{dE(y)}{dx} = \beta_1 + 2\beta_2(2) = \beta_1 + 4\beta_2$$
For $x = 2$, both $E(y)$ and the marginal productivity are linear functions of the
unknown parameters $\beta_0, \beta_1, \beta_2$ in the model. The problem we pose in this section
is that of finding confidence intervals for linear functions of $\beta$ parameters or testing
hypotheses concerning their values. The information necessary to solve this problem
is rarely given in a standard multiple regression analysis computer printout, but we
can find these confidence intervals or values of the appropriate test statistics from
knowledge of $(X'X)^{-1}$.
For the model

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$$

we can make an inference about a linear function of the $\beta$ parameters, say,

$$a_0\beta_0 + a_1\beta_1 + \cdots + a_k\beta_k$$

where $a_0, a_1, \ldots, a_k$ are known constants. We will use the corresponding linear
function of least squares estimates,

$$\ell = a_0\hat\beta_0 + a_1\hat\beta_1 + \cdots + a_k\hat\beta_k$$

as our best estimate of $a_0\beta_0 + a_1\beta_1 + \cdots + a_k\beta_k$.
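As a rough illustration of how such a linear function of the estimates is evaluated, the Python sketch below computes it for a quadratic model, once with constants $(1, x, x^2)$ for the mean and once with $(0, 1, 2x)$ for the slope. The text gives no fitted coefficients for the productivity example, so the code borrows the fitted power usage coefficients from Example B.9. The home size $x = 2000$ and the helper names `mean_weights`, `slope_weights`, and `linear_function` are assumptions made only for this illustration.

```python
# Fitted coefficients (beta0-hat, beta1-hat, beta2-hat) quoted in Example B.9.
beta_hat = (-1216.14389, 2.39893, -0.00045)

def mean_weights(x):
    """Constants (a0, a1, a2) for which the linear function equals E(y) at x."""
    return (1.0, x, x ** 2)

def slope_weights(x):
    """Constants (a0, a1, a2) for which the linear function equals dE(y)/dx at x."""
    return (0.0, 1.0, 2.0 * x)

def linear_function(a, estimates):
    """Evaluate a0*b0 + a1*b1 + ... + ak*bk for constants a and estimates b."""
    return sum(a_i * b_i for a_i, b_i in zip(a, estimates))

x = 2000.0  # hypothetical home size, chosen only for illustration
estimated_mean = linear_function(mean_weights(x), beta_hat)
estimated_slope = linear_function(slope_weights(x), beta_hat)

print(round(estimated_mean, 2))    # 1781.72
print(round(estimated_slope, 4))   # 0.5989
```

At $x = 2$ the two helpers return the constants $(1, 2, 4)$ and $(0, 1, 4)$, matching the expressions $\beta_0 + 2\beta_1 + 4\beta_2$ and $\beta_1 + 4\beta_2$ derived for the productivity model.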
Then, for the assumptions on the random error $\varepsilon$ (stated in Section 4.2), the
sampling distribution for the estimator $\ell$ will be normal, with mean and standard
error as given in the first box on page 743. This indicates that $\ell$ is an unbiased
estimator of

$$E(\ell) = a_0\beta_0 + a_1\beta_1 + \cdots + a_k\beta_k$$

and that its sampling distribution would appear as shown in Figure B.5.
∗ If you have had calculus, you can see that the marginal productivity for $y$ given $x$ is the first derivative of
$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$ with respect to $x$.