8.5 Building the Logistic Regression Model: Stage 3, Evaluation of the Fitted Model

Applied Survey Data Analysis

Theory Box 8.3  Taylor Series Estimation of Var(B̂)
The computation of variance estimators for the pseudo-maximum
likelihood estimates of the finite population parameters in the logistic
regression model makes use of the J matrix of second derivatives:

J = -\left[ \frac{\partial^2 \ln PL(B)}{\partial B \, \partial B'} \right]_{B = \hat{B}} = \sum_{h=1}^{H} \sum_{\alpha=1}^{a_h} \sum_{i=1}^{n_{h\alpha}} w_{h\alpha i} \, x'_{h\alpha i} x_{h\alpha i} \, \hat{\pi}_{h\alpha i}(\hat{B}) \bigl(1 - \hat{\pi}_{h\alpha i}(\hat{B})\bigr)

Due to the weighting, stratification and clustering inherent to complex
sample survey designs, J-1 is not equivalent to the variance–covariance
matrix of the pseudo-maximum likelihood parameter estimates, as is
the case in the simple random sample setting (see Section 8.4). Instead,
a sandwich-type variance estimator is used, incorporating the matrix
J and the estimated variance–covariance matrix of the weighted score
equations from Equation 8.13:
\widehat{var}(\hat{B}) = (J^{-1}) \, var[S(\hat{B})] \, (J^{-1})


The symmetric matrix var[S( Bˆ )] is the variance–covariance matrix
for the p + 1 estimating equations in Equation 8.13. Each of these p + 1
estimating equations is a summation over strata, clusters, and elements
of the individual “scores” for the n survey respondents. Since each
estimating equation is a sample total of respondents’ scores, standard
formulae for stratified sampling of ultimate clusters (Chapter 3) can
be used to estimate the variances and covariances of the p + 1 sample
totals. In vector notation,
var[S(\hat{B})] = \frac{n-1}{n-(p+1)} \sum_{h=1}^{H} \frac{a_h}{a_h - 1} \sum_{\alpha=1}^{a_h} (s_{h\alpha} - \bar{s}_h)'(s_{h\alpha} - \bar{s}_h)

which for n large is:

var[S(\hat{B})] \cong \sum_{h=1}^{H} \frac{a_h}{a_h - 1} \sum_{\alpha=1}^{a_h} (s_{h\alpha} - \bar{s}_h)'(s_{h\alpha} - \bar{s}_h)

© 2010 by Taylor and Francis Group, LLC

Logistic Regression and Generalized Linear Models


where for the logistic link:

s_{h\alpha} = \sum_{i=1}^{n_{h\alpha}} s_{h\alpha i}; \quad s_{h\alpha i} = w_{h\alpha i} \, (y_{h\alpha i} - \hat{\pi}_{h\alpha i}(\hat{B})) \, x'_{h\alpha i}; \quad \text{and} \quad \bar{s}_h = \frac{1}{a_h} \sum_{\alpha=1}^{a_h} s_{h\alpha}

The estimator of var[S(B̂)] for the probit or CLL link is obtained by substituting the appropriate expressions for the individual score functions in the calculation of the s_{hα}. For more details, interested readers should refer to Binder (1983).
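The matrix algebra in Theory Box 8.3 can be sketched in a few lines of NumPy. This is an illustrative toy, not the book's software: the simulated design (4 strata with 5 clusters each), the sampling weights, and all function names are invented, and the large-n form of var[S(B̂)] is used.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logit(X, y, w, iters=50):
    """Pseudo-maximum likelihood estimate of B via weighted Newton-Raphson."""
    B = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ B)
        grad = X.T @ (w * (y - p))                   # weighted score vector S(B)
        J = (X * (w * p * (1 - p))[:, None]).T @ X   # J matrix of Theory Box 8.3
        B = B + np.linalg.solve(J, grad)
    return B

def linearized_vcov(X, y, w, strata, clusters, B):
    """Sandwich estimator (J^-1) var[S(B-hat)] (J^-1), large-n form."""
    p = expit(X @ B)
    J = (X * (w * p * (1 - p))[:, None]).T @ X
    s = (w * (y - p))[:, None] * X                   # respondent-level scores
    G = np.zeros((X.shape[1], X.shape[1]))
    for h in np.unique(strata):
        in_h = strata == h
        cl = np.unique(clusters[in_h])
        a_h = len(cl)
        # cluster totals s_halpha, then deviations from the stratum mean
        s_ha = np.array([s[in_h & (clusters == c)].sum(axis=0) for c in cl])
        d = s_ha - s_ha.mean(axis=0)
        G += (a_h / (a_h - 1)) * d.T @ d
    Jinv = np.linalg.inv(J)
    return Jinv @ G @ Jinv

rng = np.random.default_rng(8)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < expit(0.5 * X[:, 1])).astype(float)
w = rng.uniform(0.5, 2.0, size=n)          # invented sampling weights
strata = np.repeat(np.arange(4), 100)      # 4 strata ...
clusters = np.repeat(np.arange(20), 20)    # ... each containing 5 clusters
B_hat = fit_weighted_logit(X, y, w)
V = linearized_vcov(X, y, w, strata, clusters, B_hat)
se = np.sqrt(np.diag(V))
```

Note the division of labor: the point estimates come from the weighted score equations alone, while the strata and clusters enter only through the variance estimator, mirroring the pseudo-maximum likelihood framework.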

G = -2 \ln \left[ \frac{L(\hat{\beta}_{MLE})_{reduced}}{L(\hat{\beta}_{MLE})_{full}} \right]

where:

L(\hat{\beta}_{MLE}) = the likelihood under the model evaluated at the maximum likelihood estimates of \beta.

The reduced model in this case is the model excluding the q regression
parameters to be tested, while the full model is the model including the q
regression parameters. Both models should be fitted using exactly the same
set of observations for this type of test to be valid.
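For data that genuinely are a simple random sample, G can be computed directly from the two maximized log-likelihoods. A minimal sketch (the simulated data and function names are invented; as noted below, this test is not valid under a complex sample design):

```python
import numpy as np
from scipy.stats import chi2

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit_loglik(X, y, iters=50):
    """Fit an unweighted logistic model by Newton-Raphson; return the
    maximized log-likelihood."""
    B = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ B)
        B = B + np.linalg.solve((X * (p * (1 - p))[:, None]).T @ X,
                                X.T @ (y - p))
    p = expit(X @ B)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = (rng.random(n) < expit(0.8 * x1)).astype(float)  # x2 is pure noise
X_full = np.column_stack([np.ones(n), x1, x2])
X_reduced = X_full[:, :2]        # same observations, minus the q = 1 tested parameter
G = -2 * (logit_loglik(X_reduced, y) - logit_loglik(X_full, y))
p_value = chi2.sf(G, df=1)       # reference: chi-square with q degrees of freedom
```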
As described in Section 7.3.4 for the linear regression model, complex sample designs invalidate the key assumptions that underlie the F-tests or likelihood ratio tests used to compare alternative models. Instead, Wald-type tests
are used to test hypotheses concerning the parameters of a specified logistic
regression model. The default output from software procedures enabling
analysts to fit logistic regression models to complex sample survey data
provides a table of the estimated model coefficients, the estimated standard
errors, and the test statistic and a “p-value” for the simple hypothesis test,
H0: Bj = 0. (See, for example, Table 8.1.) Different software applications may
report the test statistic as a Student t-statistic, t = Bˆ j / se(Bˆ j ) , or as a Wald X2. If
the former of the two tests is output, the test statistic is referred to a Student
t distribution with nominal design-based degrees of freedom (df = #clusters
– #strata) to determine and report the p-value. If the output is in the form of


Table 8.1  Estimated Logistic Regression Model for Arthritis

[Columns: B̂, se(B̂), t, P(t56 > t). Predictors: GENDER, ED3CAT (<12 yrs; 12 yrs), and AGE (years); all reported p-values are < 0.01.]

Source: Analysis based on the 2006 HRS data.
Note: n = 18,374.
a Reference categories for categorical predictors are GENDER (female); ED3CAT (>12 yrs).

the Wald X2, the reference distribution for determining the p-value is χ₁², or
a central chi-square distribution with one degree of freedom. The two tests
are functionally equivalent. In fact, the absolute value of the Student t test
statistic is simply the square root of the Wald X2 test statistic for this single
parameter hypothesis test.
More generally, logistic regression software programs for complex sample
survey data provide convenient syntax to specify Wald tests for a variety of
hypotheses concerning the full vector of regression parameters. The general
form of the null hypothesis for a Wald test is defined by H0: CB = 0, where C
is a matrix of constants that defines the hypothesis being tested (see Section for examples). The Wald test statistic is computed as

X^2_{Wald} = (C\hat{B})' \, [C \, var(\hat{B}) \, C']^{-1} \, (C\hat{B}),


where var( Bˆ ) is a design-consistent estimate of the variance–covariance
matrix of the estimated logistic regression coefficients (see Equation 8.11 for
an example based on Taylor series linearization). Under the null hypothesis,
this Wald test statistic follows a chi-square distribution with q degrees of
freedom, where q is the rank, or number of independent rows, of the matrix
C. This test statistic can also be converted into an approximate F-statistic by
dividing the Wald X2 test statistic by the degrees of freedom.
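The Wald computation just described is straightforward matrix algebra. In this hypothetical sketch the coefficient vector and covariance matrix are invented, and the C matrices mirror the joint and single-parameter tests discussed above:

```python
import numpy as np
from scipy.stats import chi2

def wald_test(B_hat, V, C):
    """Wald X^2 = (C B)' [C V C']^-1 (C B); df = rank of C."""
    CB = C @ B_hat
    X2 = float(CB @ np.linalg.solve(C @ V @ C.T, CB))
    q = np.linalg.matrix_rank(C)
    return X2, q, chi2.sf(X2, q)

# Invented estimates for a model with an intercept and two race indicators
B_hat = np.array([-0.4, 0.25, 0.10])
V = np.diag([0.01, 0.02, 0.03])

# Joint test H0: {B_Black = 0, B_Other = 0} -- q = 2 independent rows in C
C_joint = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
X2, q, p = wald_test(B_hat, V, C_joint)
F_stat = X2 / q                     # approximate F conversion: X^2 / df

# Single-parameter case: |t| is the square root of the Wald X^2
C1 = np.array([[0.0, 1.0, 0.0]])
X2_1, _, _ = wald_test(B_hat, V, C1)
t = B_hat[1] / np.sqrt(V[1, 1])
```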
In Stata, the test postestimation command is used to specify a multiparameter hypothesis test after fitting a model. Section 7.5 has already illustrated the application of the test command for constructing hypothesis
tests for parameters in the linear regression model. The syntax is identical for
constructing hypothesis tests for the parameters of the logistic or other generalized linear models estimated in Stata. For example, after fitting a logistic
regression model that includes two indicator variables to represent the effect
of three National Health and Nutrition Examination Survey (NHANES)
race/ethnicity categories (1 = White, 2 = Black, 3 = Other), the command
test _Irace_2 _Irace_3

tests the combined effects of race, i.e. H0: {BBlack = 0, BOther = 0}, while the
test _Irace_2 -_Irace_3

tests the hypothesis H0: {BBlack - BOther = 0} or equivalently H0: {BBlack = BOther}.
It is important to emphasize here again that survey analysts must be
cautious about interpreting single parameter hypothesis tests when the
estimated coefficient applies to an indicator variable for a multicategory
predictor (e.g., levels of education) or when the model also includes an
interaction term that incorporates the predictor of interest. In the former
case, a significant result indicates that the category expectation (for the outcome) is significantly different from that of the reference category, but not
necessarily other levels of the multicategory predictor. Tests of parameters
for main effects in a model with interactions involving that same variable
are confounded and not easily interpreted (see Section 8.6 for an example
of interpretation of interaction terms).
8.5.2  Goodness of Fit and Logistic Regression Diagnostics
Summary statistics to measure overall goodness of fit and methods for diagnosing the fit of a logistic regression model for individual cases or specific
patterns of covariates have been developed for simple random samples of
data. These goodness-of-fit statistics and diagnostic tools have been included
in the standard logistic regression programs in most major statistical software packages. Hosmer and Lemeshow (2000, Chapter 5) review these summary statistics and diagnostic methods in detail.
Included in the summary techniques are (1) two test statistics based on
the Pearson and deviance residuals for the fitted model; (2) the Hosmer–
Lemeshow goodness-of-fit test; (3) classification tables comparing observed
values of y with discrete classifications formed from the model’s predicted
values, πˆ ( x ) ; and (4) the area under the receiver operating characteristic
(ROC) curve. Several pseudo-R2 measures have also been proposed as summary measures of the fit of a logistic regression model. However, since these measures are often incorrectly equated with the R2 (explained variability) values of linear regression, we agree with Hosmer and Lemeshow (2000) that while they may be used by the analyst to compare the fits of alternative models, they should not be cited as measures of fit in scientific papers or reports.


Archer and Lemeshow (A-L; 2006) and Archer, Lemeshow, and Hosmer
(2007) have extended the standard Hosmer and Lemeshow goodness-of-fit
test for application to complex sample survey data. The A-L procedure is a
modification of the standard Hosmer–Lemeshow test for goodness of fit that
takes the sampling weights and the stratification and clustering features of
the complex sample design into account when assessing the residuals (y_i − π̂(x_i)) based on the fitted model. Archer and Lemeshow's paper should be
consulted for more details, but Stata users can download the .ado file implementing the svylogitgof postestimation command by submitting this
command: findit svylogitgof.
At the time of this writing, survey analysis software is still in a phase
where summary measures of goodness of fit are being translated or newly
developed for application to logistic regression modeling of complex sample
survey data. Furthermore, it may be some time before the major software
systems routinely incorporate robust goodness-of-fit evaluation procedures
in the software procedures for complex sample survey data analysis. In lieu
of simply bypassing the goodness-of-fit evaluation entirely, we recommend
the following:

1. Apply the Archer and Lemeshow goodness-of-fit test and other summary goodness-of-fit measures when they are available in the chosen software system.

2. If the logistic regression program for complex sample survey data in the chosen software system does not provide capabilities to generate summary goodness-of-fit measures, reestimate the model using the sampling weights in the system's standard logistic regression program. The weighted estimates of parameters and predicted probabilities will be identical to those from the survey procedure, and serious lack of fit should be quantifiable even though the standard program does not correctly reflect the variances and covariances of the parameter estimates under the complex sample design.

Summary measures such as the Archer and Lemeshow test statistic have
the advantage that they yield a single test of the overall suitability of the
fitted model. But even when a summary goodness-of-fit measure suggests
that the model provides an acceptable fit to the data, a thorough evaluation of the fit of the model may also include examination of the fit for specific
patterns of covariates. Does the model fit well for some patterns of covariates xj but not for others? The number and statistical complexity of diagnostic tools that have been suggested in the literature preclude a detailed
discussion here. We encourage survey analysts to consult Hosmer and
Lemeshow (2000) for a description of the diagnostic options and guidance
on how these computational and graphic methods may be applied using
Stata, SAS, and other software systems. Regarding regression diagnostics
for logistic regression models fit to complex sample survey data, we can
offer the following recommendations at this writing:

1. Use one or more of the techniques described in Chapter 5 of
Hosmer and Lemeshow (2000) to evaluate the fit of the model for
individual patterns of covariates. If the complex sample logistic
regression modeling program in your chosen software system
(e.g., SAS PROC SURVEYLOGISTIC) does not include the full set of
diagnostic capabilities of the standard programs, use the standard
programs (e.g., SAS PROC LOGISTIC) with a weight specification.
As mentioned before, the weighted estimates of parameters and
predicted probabilities will be identical and serious breakdowns
in the model for specific covariate patterns should be identifiable
even though the standard program does not correctly reflect the
variances and covariances of the parameter estimates given the
complex sample design.

2. Remember that regression diagnostics serve to inform the analyst
of specific cases where the model provides a poor fit or cases that
exert extreme influence on the overall fit of the model. They are
useful in identifying improvements in the model or anomalies that
warrant future investigation, but small numbers of “predictive failures” occur in almost all regression modeling and do not necessarily
invalidate the entire model.

8.6  Building the Logistic Regression Model: Stage 4, Interpretation and Inference
In logistic regression modeling, one can make inferences concerning the
significance and importance of individual predictor variables in several
ways. As described in Section 8.5.1, Wald X2 tests may be employed to test
the null hypothesis that a single coefficient is equal to zero, H0: Bj = 0, or
more complex hypotheses concerning multiple parameters in the fitted
model. Confidence intervals (CIs) for individual model coefficients may also
be used to draw inferences concerning the significance of model predictors
and to provide information on the potential magnitude and uncertainty
associated with the estimated effects of individual predictor variables.
Recall from Section 6.4.5 that an estimated logistic regression coefficient
is the natural logarithm of the odds ratio comparing the odds that y = 1
for a predictor with value x + 1 to the odds that y = 1 when that predictor
has a value x. In addition, the estimated coefficient for an indicator variable
associated with a level of a categorical predictor is the natural logarithm of
the odds ratio comparing the odds that y = 1 for the level represented by the
indicator to the odds that y = 1 for the reference level of the categorical variable. Consequently, the estimated coefficients are often labeled the log-odds
for the corresponding predictor in the model. A design-based confidence
interval for the logistic regression parameter is computed as
CI_{1-\alpha}(B_j) = \hat{B}_j \pm t_{df, 1-\alpha/2} \cdot se(\hat{B}_j)

Typically, α = 0.05 is used (along with the design-based degrees of freedom, df), and the result is a 95% confidence interval for the parameter. In
theory, the correct inference to make is that over repeated sampling, 95 of
100 confidence intervals computed in this way are expected to include the
true population value of Bj. If the estimated CI includes ln(1) = 0, analysts
may choose to infer that H0: Bj = 0 is accepted with a Type I error rate of α
= 0.05.
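As a hypothetical numeric illustration of this interval (the coefficient, standard error, and design counts below are invented, chosen so that df = 56, matching the t reference distribution used for Table 8.1):

```python
import numpy as np
from scipy.stats import t

def design_ci(B_j, se_Bj, n_clusters, n_strata, alpha=0.05):
    """CI_{1-alpha}(B_j) = B_j +/- t_{df,1-alpha/2} * se(B_j),
    with design-based df = #clusters - #strata."""
    df = n_clusters - n_strata
    half = t.ppf(1 - alpha / 2, df) * se_Bj
    return B_j - half, B_j + half

# Invented values: 108 clusters and 52 strata give df = 56
lo, hi = design_ci(B_j=0.30, se_Bj=0.08, n_clusters=108, n_strata=52)
```

Here the interval excludes ln(1) = 0, so under the convention described above the analyst would reject H0: Bj = 0 at α = 0.05.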
Inference concerning the significance/importance of predictors can be
performed directly for the Bˆ j s (on the log-odds scale). However, to quantify the magnitude of the effect of an individual predictor, it is more useful
to transform the inference to a scale that is easily interpreted by scientists,
policy makers, and the general public. As discussed in Section 6.4.5, in a
logistic regression model with a single predictor, x1, an estimate of the odds
ratio corresponding to a one unit increase in the value of x1 can be obtained
by exponentiating the estimated logistic regression coefficient:
\hat{\psi} = \exp(\hat{B}_1)


If the model contains only a single predictor, the result is an estimate of
the unadjusted odds ratio. If the fitted logistic regression model includes
multiple predictors, that is,

logit(\hat{\pi}(x)) = \ln\left[\frac{\hat{\pi}(x)}{1 - \hat{\pi}(x)}\right] = \hat{B}_0 + \hat{B}_1 x_1 + \cdots + \hat{B}_p x_p


the result \hat{\psi}_{j|\hat{B}_{k \neq j}} = \exp(\hat{B}_j) is an adjusted odds ratio. In general, the adjusted
odds ratio represents the multiplicative impact of a one-unit increase in the
predictor variable xj on the odds of the outcome variable being equal to 1,
holding all other predictor variables constant. Confidence limits can also be
computed for adjusted odds ratios:
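These confidence limits are obtained by exponentiating the endpoints of the design-based CI for B_j. A sketch with invented values (the coefficient, standard error, and df below are illustrative only):

```python
import numpy as np
from scipy.stats import t

def adjusted_or_ci(B_j, se_Bj, df, alpha=0.05):
    """psi-hat = exp(B_j); confidence limits for the adjusted odds ratio
    come from exponentiating the endpoints of the CI for B_j."""
    half = t.ppf(1 - alpha / 2, df) * se_Bj
    return np.exp(B_j), np.exp(B_j - half), np.exp(B_j + half)

psi, lo_or, hi_or = adjusted_or_ci(B_j=0.30, se_Bj=0.08, df=56)
```

Because the CI for B_j excludes 0, the corresponding odds ratio interval lies entirely above 1.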
