8.4 Building the Logistic Regression Model: Stage 2, Estimation of Model Parameters and Standard Errors
Logistic Regression and Generalized Linear Models
package (Landis et al., 1976) and remains available as an option in programs such as SAS PROC CATMOD. Later, Binder (1981, 1983) presented a second general framework for fitting logistic regression and other generalized linear models to complex sample survey data. Binder proposed pseudo-maximum likelihood estimation (PMLE) as a technique for estimating the model parameters. The PMLE approach to parameter estimation was combined with a linearized estimator of the variance–covariance matrix for the parameter estimates, taking complex sample design features into account.
Further development and evaluation of the PMLE approach is presented in
Roberts, Rao, and Kumar (1987), Morel (1989), and Skinner, Holt, and Smith
(1989). The PMLE approach is now the standard method for logistic regression modeling in all of the major software systems that support analysis of
complex sample survey data.
In theory, the finite population regression parameters for a generalized
linear model of interest are those values that maximize a likelihood equation
for the i = 1, …, N elements in the survey population. For a binary dependent variable y (with possible values 0 or 1), the population likelihood can be
defined as
L(B \mid x) = \prod_{i=1}^{N} \pi(x_i)^{y_i} \, [1 - \pi(x_i)]^{1 - y_i} \quad (8.9)
where, under the logit link, \pi(x_i) is evaluated using the logistic CDF and the parameters in the specified logistic regression model: \pi(x_i) = \exp(x_i B) / [1 + \exp(x_i B)]. Note that, as in Chapter 7, the finite population model parameters are denoted by the standard alphabetic B to distinguish them from the superpopulation model parameters denoted by \beta. Here, as in linear regression, the distinction is primarily a theoretical one.
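As a quick numeric sketch (the parameter and predictor values below are made up for illustration, not from the text), the logistic CDF and a single case's contribution to the log of the likelihood in (8.9) can be evaluated directly:

```python
import numpy as np

def logistic_cdf(xB):
    """pi(x_i) = exp(x_i B) / [1 + exp(x_i B)], the logistic CDF."""
    return np.exp(xB) / (1.0 + np.exp(xB))

def loglik_term(y_i, x_i, B):
    """One case's contribution to the log of the population likelihood (8.9)."""
    p = logistic_cdf(x_i @ B)
    return y_i * np.log(p) + (1.0 - y_i) * np.log(1.0 - p)

# Hypothetical values for illustration only
B = np.array([-0.5, 1.0])      # intercept and one slope
x_i = np.array([1.0, 2.0])     # leading 1 for the intercept
pi_i = logistic_cdf(x_i @ B)   # x_i B = 1.5, so pi_i is about 0.82
```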
Estimates of these finite population regression parameters are obtained by
maximizing the following estimate of the population likelihood, which is a
weighted function of the observed sample data and the π( xi ) values:
PL(B \mid x) = \prod_{i=1}^{n} \left\{ \pi(x_i)^{y_i} \cdot [1 - \pi(x_i)]^{1 - y_i} \right\}^{w_i} \quad (8.10)

with: \pi(x_i) = \exp(x_i B) / [1 + \exp(x_i B)]
As with standard MLE, this weighted pseudo-likelihood function can be maximized using the iterative Newton–Raphson method or related algorithms (see Theory Box 8.2 for more detail).
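To make the estimation step concrete, here is a minimal sketch (in Python with NumPy; the toy data and weights are made up, and this is not a substitute for production survey software) of maximizing the weighted pseudo-likelihood in (8.10) by Newton–Raphson:

```python
import numpy as np

def fit_pml_logistic(X, y, w, n_iter=25, tol=1e-10):
    """Maximize the weighted pseudo-likelihood (8.10) by Newton-Raphson.

    X : (n, p+1) design matrix with a leading column of ones
    y : (n,) binary outcomes
    w : (n,) survey weights
    """
    B = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ B)))                # pi(x_i) under the logit link
        score = X.T @ (w * (y - pi))                       # weighted score vector
        info = X.T @ (X * (w * pi * (1.0 - pi))[:, None])  # weighted information matrix
        step = np.linalg.solve(info, score)
        B = B + step
        if np.max(np.abs(step)) < tol:
            break
    return B

# Toy data with unequal (made-up) selection weights
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))).astype(float)
w = rng.uniform(0.5, 2.0, size=n)
B_hat = fit_pml_logistic(X, y, w)
```

At convergence the weighted score vector is (numerically) zero, which is exactly the estimating-equation condition developed in Theory Box 8.2.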
The next hurdle in analyzing logistic regression models for complex sample survey data is to estimate the sampling variances and covariances of the
© 2010 by Taylor and Francis Group, LLC
Applied Survey Data Analysis
Theory Box 8.2 Pseudo-Maximum Likelihood Estimation for Complex Sample Survey Data
For a binary dependent variable and binomial data likelihood, the
pseudo-maximum likelihood approach to the estimation of the logistic
regression parameters and their variance–covariance matrix requires
the solution to the following vector of estimating equations:
S(B) = \sum_{h} \sum_{\alpha} \sum_{i} w_{h\alpha i} D'_{h\alpha i} \left[ \pi_{h\alpha i}(B) \cdot (1 - \pi_{h\alpha i}(B)) \right]^{-1} \left( y_{h\alpha i} - \pi_{h\alpha i}(B) \right) = 0 \quad (8.12)

where D_{h\alpha i} is the vector of partial derivatives, \frac{\delta(\pi_{h\alpha i}(B))}{\delta B_j}, \; j = 0, \ldots, p.
In Equation 8.12, h is a stratum index, \alpha is a cluster (or SECU) index within stratum h, and i is an index for individual observations within cluster \alpha. The term w_{h\alpha i} thus refers to the sampling weight for observation i. The term \pi_{h\alpha i}(B) refers to the probability that the outcome variable is equal to 1 as a function of the parameter estimates and the observed data according to the specified logistic regression model. For a logistic regression model of a binary variable, this reduces to a system of p + 1 estimating equations (where p is the number of predictor variables, and there is one additional parameter corresponding to the intercept in the model):
S(B)_{logistic} = \sum_{h} \sum_{\alpha} \sum_{i} w_{h\alpha i} \left( y_{h\alpha i} - \pi_{h\alpha i}(B) \right) x'_{h\alpha i} = 0 \quad (8.13)

where x'_{h\alpha i} = [1, x_{1,h\alpha i}, \ldots, x_{p,h\alpha i}]' is a column vector of the p + 1 design matrix elements for case i.
For the probit regression model, the estimating equations reduce to
S(B)_{probit} = \sum_{h} \sum_{\alpha} \sum_{i} w_{h\alpha i} \left( y_{h\alpha i} - \pi_{h\alpha i}(B) \right) \cdot \frac{\phi(x'_{h\alpha i} B)}{\pi_{h\alpha i}(B) \cdot (1 - \pi_{h\alpha i}(B))} \, x'_{h\alpha i} = 0 \quad (8.14)

where \phi(x'_{h\alpha i} B) is the standard normal probability density function evaluated at x'_{h\alpha i} B.
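A hedged sketch of the two sets of estimating equations, using NumPy and math.erf for the normal CDF (the data and trial parameter vector are illustrative, not from the text):

```python
import math
import numpy as np

def logistic_score(B, X, y, w):
    """Summed weighted scores for the logistic model, as in (8.13)."""
    p = 1.0 / (1.0 + np.exp(-(X @ B)))
    return X.T @ (w * (y - p))

def probit_score(B, X, y, w):
    """Summed weighted scores for the probit model, as in (8.14)."""
    eta = X @ B
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(eta / math.sqrt(2.0)))   # normal CDF
    phi = np.exp(-0.5 * eta ** 2) / math.sqrt(2.0 * math.pi)           # normal PDF
    return X.T @ (w * (y - Phi) * phi / (Phi * (1.0 - Phi)))

# Illustrative (made-up) data and a trial parameter vector
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 0.5).astype(float)
w = rng.uniform(1.0, 3.0, size=n)
B0 = np.zeros(2)
s_logit = logistic_score(B0, X, y, w)
s_probit = probit_score(B0, X, y, w)
```

At B = 0 both links give \pi = 1/2, so the probit score is a constant multiple (\phi(0)/0.25 = 4/\sqrt{2\pi} \approx 1.596) of the logistic score; away from zero the two models weight the residuals differently.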
The weighted parameter estimates are computed by using the
Newton–Raphson method to derive a solution for S(B) = 0 (Agresti,
2002). Binder showed that the vector of weighted parameter estimates
based on pseudo-maximum likelihood estimation is consistent for B even
when the sample design is complex—that is, the bias of this estimator
is of order 1/n, so that as the sample size gets larger (which is often the
case with complex samples), the bias of the estimator approaches 0.
parameter estimates. Binder (1983) proposed a solution to this problem that
applied a multivariate version of Taylor series linearization (TSL). The result
is a sandwich-type variance estimator of the form
var(\hat{B}) = (J^{-1}) \, var[S(\hat{B})] \, (J^{-1}) \quad (8.11)
where J is a matrix of second derivatives of the pseudo-log-likelihood for the data (derived by applying the natural log function to the likelihood defined in (8.10)) with respect to the \hat{B}_j, and var[S(\hat{B})] is the variance–covariance matrix for the sample totals of the weighted "score function" for the individual observations used to fit the model. Interested readers can find a more mathematical treatment of this approach in Theory Boxes 8.2 and 8.3. Binder's linearized variance estimator, var(\hat{B}), is the default variance estimator in most major software packages for survey data analysis. However, most systems also provide options to select a jackknife repeated replication (JRR) or balanced repeated replication (BRR) method to estimate the variance–covariance matrix, var(\hat{B})_{rep}, for the estimated model coefficients.
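As one illustration of the replication alternative, here is a minimal delete-one-cluster JRR sketch (Python/NumPy; the small Newton fit and the stratified toy data are hypothetical stand-ins, not the algorithms used by production survey packages):

```python
import numpy as np

def fit_wlogit(X, y, w, n_iter=25):
    """Minimal weighted (pseudo-ML) logistic fit by Newton-Raphson."""
    B = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ B)))
        H = X.T @ (X * (w * pi * (1.0 - pi))[:, None])
        B = B + np.linalg.solve(H, X.T @ (w * (y - pi)))
    return B

def jrr_variance(X, y, w, strata, clusters):
    """Delete-one-cluster jackknife (JRR) estimate of var(B_hat)_rep."""
    B_full = fit_wlogit(X, y, w)
    V = np.zeros((B_full.size, B_full.size))
    for h in np.unique(strata):
        in_h = strata == h
        secus = np.unique(clusters[in_h])
        a_h = len(secus)
        for c in secus:
            drop = in_h & (clusters == c)
            w_rep = w.copy()
            w_rep[drop] = 0.0                         # delete cluster (h, c)
            w_rep[in_h & ~drop] *= a_h / (a_h - 1.0)  # reweight rest of stratum h
            d = fit_wlogit(X, y, w_rep) - B_full
            V += (a_h - 1.0) / a_h * np.outer(d, d)
    return V

# Illustrative stratified, clustered data (2 strata x 4 SECUs x 25 cases)
rng = np.random.default_rng(1)
n = 200
strata = np.repeat([0, 1], 100)
clusters = np.tile(np.repeat(np.arange(4), 25), 2)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, 0.8]))))).astype(float)
w = rng.uniform(0.5, 2.0, size=n)
V_jrr = jrr_variance(X, y, w, strata, clusters)
```

Each replicate drops one SECU, reweights the remaining SECUs in that stratum, refits the model, and accumulates the squared deviations of the replicate coefficients from the full-sample estimates.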
8.5 Building the Logistic Regression Model: Stage 3, Evaluation of the Fitted Model
The next step in building the logistic regression model is to test the contribution of individual model parameters and effects and to evaluate the overall
goodness of fit (GOF) of the model.
8.5.1 Wald Tests of Model Parameters
When fitting logistic regression models to data collected from simple random samples, the statistical significance of one or more logistic regression
parameters can be evaluated using a likelihood ratio test. Under the null
hypotheses H0: βj = 0 (single parameter) or H0: βq = 0 (with q parameters), the
following test statistic G follows a chi-square distribution with either 1 (for a
single parameter) or q degrees of freedom:
Theory Box 8.3 Taylor Series Estimation of Var(\hat{B})
The computation of variance estimators for the pseudo-maximum
likelihood estimates of the finite population parameters in the logistic
regression model makes use of the J matrix of second derivatives:
J = -\left. \frac{\delta^2 \ln PL(B)}{\delta B^2} \right|_{B = \hat{B}} = \sum_{h} \sum_{\alpha} \sum_{i} x'_{h\alpha i} x_{h\alpha i} w_{h\alpha i} \hat{\pi}_{h\alpha i}(B) (1 - \hat{\pi}_{h\alpha i}(B)) \quad (8.15)
Due to the weighting, stratification, and clustering inherent to complex sample survey designs, J^{-1} is not equivalent to the variance–covariance matrix of the pseudo-maximum likelihood parameter estimates, as is the case in the simple random sample setting (see Section 8.4). Instead, a sandwich-type variance estimator is used, incorporating the matrix J and the estimated variance–covariance matrix of the weighted score equations from Equation 8.13:
\widehat{Var}(\hat{B}) = (J^{-1}) \, var[S(\hat{B})] \, (J^{-1}) \quad (8.16)
The symmetric matrix var[S(\hat{B})] is the variance–covariance matrix for the p + 1 estimating equations in Equation 8.13. Each of these p + 1 estimating equations is a summation over strata, clusters, and elements of the individual "scores" for the n survey respondents. Since each estimating equation is a sample total of respondents' scores, standard formulae for stratified sampling of ultimate clusters (Chapter 3) can be used to estimate the variances and covariances of the p + 1 sample totals. In vector notation,
var[S(\hat{B})] = \frac{n - 1}{n - (p + 1)} \sum_{h=1}^{H} \frac{a_h}{(a_h - 1)} \sum_{\alpha=1}^{a_h} (s_{h\alpha} - \bar{s}_h)'(s_{h\alpha} - \bar{s}_h)

which for n large is:

var[S(\hat{B})] \cong \sum_{h=1}^{H} \frac{a_h}{(a_h - 1)} \sum_{\alpha=1}^{a_h} (s_{h\alpha} - \bar{s}_h)'(s_{h\alpha} - \bar{s}_h) \quad (8.17)
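Putting (8.15) through (8.17) together, here is a minimal NumPy sketch of the linearized sandwich estimator (illustrative only; the data and the small Newton fit are made up, and real survey packages add refinements such as domain and degrees-of-freedom handling):

```python
import numpy as np

def fit_wlogit(X, y, w, n_iter=25):
    """Small Newton-Raphson pseudo-ML logistic fit (for the sketch only)."""
    B = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ B)))
        B = B + np.linalg.solve(X.T @ (X * (w * pi * (1.0 - pi))[:, None]),
                                X.T @ (w * (y - pi)))
    return B

def sandwich_variance(B_hat, X, y, w, strata, clusters):
    """var(B_hat) = J^{-1} var[S(B_hat)] J^{-1}, per eqs. (8.15)-(8.17)."""
    n, k = X.shape                                    # k = p + 1 parameters
    pi = 1.0 / (1.0 + np.exp(-(X @ B_hat)))
    U = (w * (y - pi))[:, None] * X                   # respondent-level weighted scores
    J = X.T @ (X * (w * pi * (1.0 - pi))[:, None])    # eq. (8.15)
    V = np.zeros((k, k))
    for h in np.unique(strata):
        in_h = strata == h
        secus = np.unique(clusters[in_h])
        a_h = len(secus)
        # cluster totals of scores within stratum h
        s = np.vstack([U[in_h & (clusters == c)].sum(axis=0) for c in secus])
        dev = s - s.mean(axis=0)                      # s_{h,alpha} minus stratum mean
        V += a_h / (a_h - 1.0) * dev.T @ dev
    V *= (n - 1.0) / (n - k)                          # finite-sample factor in (8.17)
    Jinv = np.linalg.inv(J)
    return Jinv @ V @ Jinv                            # eq. (8.16)

# Illustrative stratified, clustered data (3 strata x 4 SECUs x 20 cases)
rng = np.random.default_rng(2)
n = 240
strata = np.repeat([0, 1, 2], 80)
clusters = np.tile(np.repeat(np.arange(4), 20), 3)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ np.array([-0.3, 0.7]))))).astype(float)
w = rng.uniform(1.0, 3.0, size=n)
B_hat = fit_wlogit(X, y, w)
V_lin = sandwich_variance(B_hat, X, y, w, strata, clusters)
```

The "bread" J^{-1} comes from the weighted information matrix, while the "meat" is the between-cluster variability of the stratum-specific score totals, which is how the design's clustering enters the standard errors.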