11.10 MCFADDEN’S PSEUDO R-SQUARE FOR STRENGTH OF ASSOCIATION
Chapter 11
McFadden’s (1974) pseudo R-square, denoted R_L², is based on the improvement in
model fit as predictors are added to a model. An expression that can be used for R_L² is

R_L² = χ² / (−2LLbaseline),  (11)

where the numerator is the χ² test for the difference in fit between a reduced and full
model and the denominator is the measure of fit for the model that contains only the
intercept, or the baseline model with no predictors. The numerator then reflects the
amount that the model fit, as measured by the difference in the quantity −2LL for a
reduced model and its full model counterpart, is reduced or improved due to a set of
predictors, analogous to the amount of variation reduced by a set of predictors in
traditional regression. When this amount (i.e., χ²) is divided by −2LLbaseline, the
resulting proportion can be interpreted as the proportional reduction in the lack of fit
associated with the baseline model due to the inclusion of the predictors, or the
proportional improvement in model fit, analogous to R² in traditional regression. In
addition to the close correspondence to R², R_L² also has lower and upper bounds of
0 and 1, a property not shared by other pseudo R² measures. Further, R_L² can be used
when the dependent variable has more than two categories (i.e., for multinomial
logistic regression).
We first illustrate use of R_L² to assess the contribution of treatment and motivation in
predicting health status. Recall that the fit of the model with no predictors, or
−2LLbaseline, is 272.117. After adding treatment and motivation, the fit is 253.145,
which is a reduction or improvement in fit of 18.972 (which is the χ² test statistic) and
the numerator of Equation 11. Thus, R_L² is 18.972/272.117 or .07, indicating a 7%
improvement in model fit due to treatment and motivation. Note that R_L² indicates the
degree to which fit improves when the predictors are added, while the χ² test statistic is
used to determine whether an improvement in fit is present, or different from zero in
the population.
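The computation just described can be sketched in a few lines, using the −2LL values reported above:

```python
# McFadden's pseudo R-square from the -2LL fit statistics reported above.
neg2ll_baseline = 272.117  # -2LL for the intercept-only (baseline) model
neg2ll_full = 253.145      # -2LL after adding treatment and motivation

chi_square = neg2ll_baseline - neg2ll_full   # improvement in fit (Equation 11 numerator)
r_l_squared = chi_square / neg2ll_baseline   # proportional improvement in fit

print(round(chi_square, 3))    # 18.972
print(round(r_l_squared, 2))   # 0.07
```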
The R_L² statistic can also be used to assess the contribution of subsets of variables
while controlling for other predictors. In section 11.9, we tested for the improvement
in fit that is obtained by adding an interaction between treatment and motivation to
a model that assumed this interaction was not present. Relative to the main effects
model, the amount that the model fit improved after including the interaction is 0.013,
and the proportional improvement in model fit due to adding the interaction to the
model, or the strength of association between the interaction and outcome, is then
0.013 / 272.117, which is near zero.
McFadden (1979) cautioned that values for R_L² are typically smaller than R-square
values observed in standard regression analysis. As a result, researchers cannot rely on
values, for example, as given in Cohen (1988) to indicate weak, moderate, or strong
associations. McFadden (1979) noted that, for the entire model, values of .2 to .4
represent a strong improvement in fit, but these values of course cannot reasonably be
applied in every situation, as they may represent a weak association in some contexts
Binary Logistic Regression
and may be unobtainably high in others. Note also that, at present, neither SAS nor
SPSS provides this measure of association for binary outcomes.
11.11 SIGNIFICANCE TESTS AND CONFIDENCE INTERVALS FOR SINGLE VARIABLES
When you are interested in testing the association between an individual predictor
and outcome, controlling for other predictors, several options are available. Of those
introduced here, the most powerful approach is the likelihood ratio test described in
section 11.9. The reduced model would exclude the variable of interest, and the full
model would include that variable. The main disadvantage of this approach is practical, in that multiple analyses would need to be done in order to test each predictor. In
this example, with a limited number of predictors, the likelihood ratio test would be
easy to implement.
A more convenient and commonly used approach to test the effects of individual predictors is to use a z test, which provides results equivalent to the Wald test that is often
reported by software programs. The z test of the null hypothesis that a given regression
coefficient is zero (i.e., βj = 0) is

z = βj / Sβj,  (12)

where Sβj is the standard error for the regression coefficient. To test for significance,
you compare this test statistic to a critical value from the standard normal distribution.
So, if alpha were .05, the corresponding critical value for a two-tailed test would be
±1.96. The Wald test, which is the square of the z test, follows a chi-square distribution
with 1 degree of freedom. The main disadvantage associated with this procedure
is that when βj becomes large, the standard error in Equation 12 becomes inflated,
which makes this test less powerful than the likelihood ratio test (Hauck & Donner,
1977).
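The relationship between the z and Wald tests can be sketched numerically (the coefficient and standard error below are illustrative placeholders, not chapter values):

```python
from math import erfc, sqrt

# Illustrative coefficient and standard error (placeholders, not chapter data).
beta, se = 0.8, 0.25

z = beta / se    # z test of H0: beta = 0 (Equation 12)
wald = z ** 2    # Wald statistic: the square of z, chi-square with 1 df

# The two-tailed p value from the standard normal for z equals the upper-tail
# chi-square(1) p value for the Wald statistic; both are erfc(|z| / sqrt(2)).
p = erfc(abs(z) / sqrt(2))

print(round(z, 2), round(wald, 2))  # 3.2 10.24
print(round(p, 4))                  # 0.0014
```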
A third option to test the effect of a predictor is to obtain a confidence interval for the
odds ratio. A general expression for the confidence interval for the odds ratio, denoted
CI(OR), is given by

CI(OR) = e^( cβj ± z(a) · cSβj ),  (13)

where c is the increment of interest in the predictor (relevant only for a continuous
variable) and z(a) represents the z value from the standard normal distribution for the
associated confidence level of interest (often 95%). If a value of 1 is not contained in
the interval, then the null hypothesis of no effect is rejected. In addition, the use of
confidence intervals allows for a specific statement about the population value of the
odds ratio, which may be of interest.
11.11.1 Impact of the Treatment
We illustrate the use of these procedures to assess the impact of the treatment on health.
When Equation 9 is estimated, the coefficient reflecting the impact of the treatment,
β1, is 1.014 (SE = .302). The z test for the null hypothesis that β1 = 0 is then 1.014 /
.302 = 3.36 (p = .001), indicating that the treatment effect is statistically significant.
The odds ratio of about 3 (e^1.014 = 2.76) means that the odds of good health for adults
in the educator group are about 3 times the odds of those in the control group,
controlling for motivation. The 95% confidence interval is computed as e^(1.014 ± 1.96 × .302)
and is 1.53 to 4.98. The interval suggests that the odds of being diabetes free for those in
the educator group may be as small as 1.5 times and as large as about 5 times the odds
of those in the control group.
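Using the reported coefficient and standard error, these quantities can be reproduced in a few lines:

```python
from math import erfc, exp, sqrt

# Treatment coefficient and standard error reported for Equation 9.
b1, se = 1.014, 0.302

z = b1 / se                 # z test statistic
p = erfc(abs(z) / sqrt(2))  # two-tailed p from the standard normal
odds_ratio = exp(b1)

lo = exp(b1 - 1.96 * se)    # lower bound of the 95% CI for the odds ratio
hi = exp(b1 + 1.96 * se)    # upper bound

print(round(z, 2), round(p, 3))    # 3.36 0.001
print(round(odds_ratio, 2))        # 2.76
print(round(lo, 2), round(hi, 2))  # 1.53 4.98
```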
11.12 PRELIMINARY ANALYSIS
In the next few sections, measures of residuals and influence are presented along with
the statistical assumptions associated with logistic regression. In addition, other problems that may arise with data for logistic regression are discussed. Note that formulas
presented for the following residuals assume that continuous predictors are used in
the logistic regression model. When categorical predictors only are used, where many
cases are present for each possible combination of the levels of these variables (sometimes referred to as aggregate data), different formulas are used to calculate residuals
(see Menard, 2010). We present formulas here for individual (and not aggregate) data
because this situation is more common in social science research. As throughout the
text, the goal of preliminary analysis is to help ensure that the results obtained by the
primary analysis are valid.
11.13 RESIDUALS AND INFLUENCE
Observations that are not fit well by the model may be detected by the Pearson residual.
The Pearson residual is given by

ri = (Yi − p̂i) / √( p̂i (1 − p̂i) ),  (14)
where p̂i is the probability (of Y = 1) as predicted by the logistic regression equation
for a given individual i. The numerator is the difference (i.e., the raw residual) between
an observed Y score and the probability predicted by the equation, and the denominator
is the standard deviation of the Y scores according to the binomial distribution. In large
samples, this residual may approximate a normal distribution with a mean of 0 and a
standard deviation of 1. Thus, a case with a residual value that is quite distinct from
the others and that has a value of ri greater than 2.5 or 3.0 suggests a case that is not
fit well by the model. It would be important to check any such cases to see if data are
entered correctly and, if so, to learn more about the kind of cases that are not fit well
by the model.
An alternative or supplemental index for outliers is the deviance residual. The deviance
residual reflects the contribution an individual observation makes to the model deviance, with larger absolute values reflecting more poorly fit observations. This residual
may be computed for a given case by calculating the log likelihood in Equation 8 (the
expression to the right of the summation symbol), multiplying this value by −2, and
then taking the square root of this value. The sign of this residual (i.e., positive or
negative) is determined by whether the numerator in Equation 14 is positive or negative.
Some have expressed a preference for the deviance residual over the Pearson
residual because the Pearson residual is relatively unstable when the predicted probability
of Y = 1 is close to 0 or 1. However, Menard (2010) notes that an advantage of the Pearson
residual is that it has larger values, and so outlying cases are more greatly emphasized
with this residual. As such, we limit our discussion here to the Pearson residual.
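Both residuals can be computed directly from their definitions; a minimal sketch follows (the predicted probabilities here are illustrative, not chapter data):

```python
from math import copysign, log, sqrt

def pearson_residual(y, p):
    """Pearson residual (Equation 14): raw residual over the binomial SD."""
    return (y - p) / sqrt(p * (1 - p))

def deviance_residual(y, p):
    """Deviance residual: signed square root of the case's -2 log-likelihood."""
    ll = y * log(p) + (1 - y) * log(1 - p)
    return copysign(sqrt(-2 * ll), y - p)

# A well-fit case (Y = 1 with a high predicted probability) ...
print(round(pearson_residual(1, 0.9), 2))   # 0.33
# ... versus a poorly fit case (Y = 1 predicted to be unlikely). Note the
# Pearson residual is the larger of the two, as mentioned above.
print(round(pearson_residual(1, 0.1), 2))   # 3.0
print(round(deviance_residual(1, 0.1), 2))  # 2.15
```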
In addition to identifying outlying cases, a related concern is to determine if any cases
are influential or unduly impact key analysis results. There are several measures of
influence that are analogous to those used in traditional regression, including, for
example, leverage, a Cook’s influence measure, and delta beta. Here, we focus on delta
beta because it is directed at the influence a given observation has on the effect of a
specific explanatory variable, which is often of interest and is here, as the impact of
the intervention is a primary concern in this example. As with traditional regression,
delta beta indicates the change in a given logistic regression coefficient if a case were
deleted. Note that the sign of the index (+ or −) refers to whether the slope increases
or decreases when the case is included in the data set. Thus, the sign of the delta beta
needs to be reversed if you wish to interpret the index as the impact of a specific case on
a given regression coefficient when that case is deleted. For SAS users, note that raw
delta beta values are not provided by the program. Instead, SAS provides standardized
delta beta values, obtained by dividing a delta beta value by its standard error. There is
some agreement that standardized values larger than 1 in magnitude may exert undue
influence on the analysis. To be on the safe side, though, you can further examine any
cases having outlying values that are smaller than this magnitude.
We now illustrate examining residuals and delta beta values to identify unusual and
influential cases with the chapter data. We estimated Equation 9 and found no cases
had a Pearson residual value greater than 2 in magnitude. We then inspected histograms
of the delta betas for β1 and β2. Two outlying delta beta values appear to be
present for motivation (β2), the histogram for which is shown in Figure 11.3. The value
of delta beta for each of these cases is about −.004. Given the negative value, the value
for β2 would increase if these cases were removed from the analysis. We can assess the
impact of both of these cases on analysis results by temporarily removing the
observations and reestimating Equation 9. With all 200 cases, the value for β2 is 0.040 and
e^0.040 = 1.041, and with the two cases removed β2 is 0.048 and e^0.048 = 1.049. The change,
then, obtained by removing these two cases seems small both for the coefficient and
Figure 11.3: Histogram of delta beta values for coefficient β2 (x-axis: DFBETA for Motiv; mean = 1.37E-7, SD = 0.00113, N = 200).
the odds ratio. We also note that with the removal of these two cases, all of the conclusions associated with the statistical tests are unchanged. Thus, these two discrepant
cases are not judged to exert excessive influence on key study results.
11.14 ASSUMPTIONS
Three formal assumptions are associated with logistic regression. First, the logistic
regression model is assumed to be correctly specified. Second, cases are assumed to
be independent. Third, each explanatory variable is assumed to be measured without
error. You can also consider there to be a fourth assumption for logistic regression.
That is, the statistical inference procedures discussed earlier (based on asymptotic theory) assume that a large sample size is used. These assumptions are described in more
detail later. Note that while many of these assumptions are analogous to those used in
traditional regression, logistic regression does not assume that the residuals follow a
normal distribution or that the residuals have constant variance across the range of
predicted values. Also, other practical data-related issues are discussed in section 11.15.
11.14.1 Correct Specification
Correct specification is a critical assumption. For logistic regression, correct specification means that (1) the correct link function (e.g., the logistic link function) is used,
and (2) that the model includes explanatory variables that are nontrivially related to the
outcome and excludes irrelevant predictors. For the link function, there appears to be
consensus that choice of link function (e.g., use of a logistic vs. probit link function)
has no real consequence on analysis results. Also, including predictors in the model
that are trivially related to the outcome (i.e., irrelevant predictors) is known to increase
the standard errors of the coefficients (thus reducing statistical power) but does not
result in biased regression coefficient estimates. On the other hand, excluding important determinants introduces bias into the estimation of the regression coefficients and
their standard errors, which can cast doubt on the validity of the results. You should
rely on theory, previous empirical work, and common sense to identify important
explanatory variables. If there is little direction to guide variable selection, you could
use exploratory methods as used in traditional regression (i.e., the sequential methods
discussed in section 3.8) to begin the theory development process. The conclusions
drawn from the use of such methods are generally much more tentative than studies
where a specific theory guides model specification.
The need to include important predictors in order to avoid biased estimates also extends
to the inclusion of important nonlinear terms and interactions in the statistical model,
similar to traditional regression. Although the probabilities of YÂ€=Â€1 are nonlinearly
related to explanatory variables in logistic regression, the log of the odds or the logit,
given no transformation of the predictors, is assumed to be linearly related to the predictors, as in EquationÂ€7. Of course, this functional form may not be correct.
The Box–Tidwell procedure can be used to test the linear aspect of this assumption.
To implement this procedure, you create new variables in the data set, which are the
natural logs of each continuous predictor. Then, you multiply this transformed variable by the original predictor, essentially creating a product variable that is the original
continuous variable times its natural log. Any such product variables are then added
to the logistic regression equation. If any are statistically significant, this suggests that
the logit has a nonlinear association with the given continuous predictor. You could
then search for an appropriate transformation of the continuous explanatory variable,
as suggested in Menard (2010).
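The construction of the Box–Tidwell product term can be sketched as follows (the motivation scores here are hypothetical placeholders; the resulting variable would then be added as a predictor in your logistic regression software):

```python
from math import log

# Hypothetical motivation scores (any positive continuous predictor works;
# the x*ln(x) term is undefined for values <= 0).
motiv = [4.0, 7.5, 10.0, 12.5]

# Product of each score with its natural log: the Box-Tidwell x*ln(x) term.
xlnx = [x * log(x) for x in motiv]

# xlnx would be entered alongside motiv in the model; a statistically
# significant coefficient for it suggests the logit is nonlinear in motiv.
print([round(v, 2) for v in xlnx])  # [5.55, 15.11, 23.03, 31.57]
```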
The Box–Tidwell procedure to test for nonlinearity in the logit is illustrated here with
the chapter data. For these data, only one predictor, motivation, is continuous. Thus,
we computed the natural log of the scores for this variable and multiplied them by
motivation. This new product variable is named xlnx. When this predictor is added to
those included in Equation 9, the p value associated with the coefficient of xlnx is .909,
suggesting no violation of the linearity assumption. Section 11.18 provides the SAS
and SPSS commands needed to implement this procedure as well as selected output.
In addition to linearity, the correct specification assumption also implies that important
interactions have been included in the model. In principle, you could include all possible
interaction terms in the model in an attempt to determine if important interaction terms
have been omitted. However, as more explanatory variables appear in the model, the
Chapter 11
â†œæ¸€å±®
â†œæ¸€å±®
number of interaction terms increases sharply with perhaps many of these interactions
being essentially uninterpretable (e.g., four- and five-way interactions). As with traditional regression models, the best advice may be to include interactions as suggested by
theory or that are of interest. For the chapter data, recall that in section 11.9 we tested the
interaction between treatment and motivation and found no support for the interaction.
11.14.2 Hosmer–Lemeshow Goodness-of-Fit Test
In addition to these procedures, the Hosmer–Lemeshow (HL) test offers a global
goodness-of-fit test that compares the estimated model to one that has perfect fit. Note
that this test does not assess, as was the case with the likelihood ratio test in
section 11.9, whether model fit is improved when a set of predictors is added to a reduced
model. Instead, the HL test assesses whether the fit of a given model deviates from
the perfect fitting model, given all relevant explanatory variables are included. Alternatively, as Allison (2012) points out, the HL test can be interpreted as a test of the
null hypothesis that no additional interaction or nonlinear terms are needed in the
model. Note, however, that the HL test does not assess whether other predictors that
are entirely excluded from the estimated model could improve model fit.
Before highlighting some limitations associated with the procedure, we discuss how it
works. The procedure compares the observed frequencies of Y = 1 to the frequencies
predicted by the logistic regression equation. To obtain these values, the sample is
divided, by convention, into 10 groups referred to as the deciles of risk. Each group is
formed based on the probabilities of Y = 1, with individuals in the first group consisting
of those cases that have the lowest predicted probabilities, those in the second group
being the cases with the next lowest predicted probabilities, and so on. The predicted, or
expected, frequencies are then obtained by summing these probabilities over the cases
in each of the 10 groups. The observed frequencies are obtained by summing the
number of cases actually having Y = 1 in each of the 10 groups.
The probabilities obtained from estimating Equation 9 are now used to illustrate this
procedure. Table 11.3 shows the observed and expected frequencies for each of the
10 deciles. When the probabilities of Y = 1 are summed for the 20 cases in group 1,
this sum or expected frequency is 3.995. Note that under the Observed column of
Table 11.3, 4 of these 20 cases actually exhibited good health. For this first decile,
then, there is a very small difference between the observed and expected frequencies,
suggesting that the probabilities produced by the logistic regression equation, for
this group, approximate reality quite well. Note that Hosmer and Lemeshow (2013)
suggest computing the quantity (Observed − Expected) / √Expected for a given decile,
with values larger than 2 in magnitude indicating a problem in fit for that decile. For
decile 2, for example, this value is (7 − 4.958) / √4.958 = 0.92, and no decile here
exceeds 2 in magnitude. This suggests that there are only small differences between the
observed and expected frequencies, supporting the goodness-of-fit of the estimated model.
Table 11.3: Deciles of Risk Table Associated With the Hosmer–Lemeshow Goodness-of-Fit Test

                Health = 1
Decile    Observed    Expected    Number of cases
 1            4         3.995          20
 2            7         4.958          20
 3            6         5.680          20
 4            3         6.606          20
 5            9         7.866          20
 6            8         8.848          20
 7           10         9.739          20
 8            9        10.777          20
 9           15        12.156          20
10           13        13.375          20
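Using the observed and expected frequencies from Table 11.3, the per-decile diagnostic can be sketched as:

```python
from math import sqrt

# Observed and expected Health = 1 frequencies from Table 11.3.
observed = [4, 7, 6, 3, 9, 8, 10, 9, 15, 13]
expected = [3.995, 4.958, 5.680, 6.606, 7.866,
            8.848, 9.739, 10.777, 12.156, 13.375]

# Hosmer-Lemeshow per-decile diagnostic: (Observed - Expected) / sqrt(Expected);
# magnitudes larger than 2 would flag a decile that the model fits poorly.
diag = [(o - e) / sqrt(e) for o, e in zip(observed, expected)]

print(round(diag[1], 2))              # decile 2: 0.92
print(all(abs(d) < 2 for d in diag))  # True: no decile signals misfit
```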
In addition to this information, this procedure offers an overall goodness-of-fit
statistical test for the differences between the observed and expected frequencies. The
null hypothesis is that these differences reflect sampling error, or that the model has
perfect fit. A decision to retain the null hypothesis (i.e., p > α) supports the adequacy
of the model, whereas a reject decision signals that the model is misspecified (i.e.,
has omitted nonlinear and/or interaction terms). The HL test statistic approximates a
chi-square distribution with degrees of freedom equal to the number of groups formed
(10, here) minus 2. Here, we simply report that the χ² test value is 6.88 (df = 8), and the
corresponding p value is .55. As such, the goodness-of-fit of the model is supported
(suggesting that adding nonlinear and interaction terms to the model will not improve
its fit).
There are some limitations associated with the Hosmer–Lemeshow goodness-of-fit
test. Allison (2012) and Menard (2010) note that this test may be underpowered and
tends to return a result of correct fit of the model, especially when fewer than six
groups are formed and when sample size is not large (i.e., less than 500). Further,
Allison (2012) notes that even when more than six groups are formed, test results
are sensitive to the number of groups formed in the procedure. He further discusses
erratic behavior with the performance of the test, for example, that including a statistically significant interaction in the model can produce HL test results that indicate
worse model fit (the opposite of what is intended). Research continues on ways to
improve the HL test (Prabasaj, Pennell, & Lemeshow, 2012). In the meantime, a
sensible approach may be to examine the observed and expected frequencies produced
by this procedure to identify possible areas of misfit (as suggested by Hosmer &
Lemeshow, 2013), use the Box–Tidwell procedure to assess the assumption of linearity,
and include interactions in the model that are based on theory or those that are
of interest.
11.14.3 Independence
Another important assumption is that the observations are obtained from independent cases. Dependency in observations may arise from repeatedly measuring the
outcome and in study designs where observations are clustered in settings (e.g., students in schools) or cases are paired or matched on some variable(s), as in a matched
case-control study. Note that when this assumption is violated and standard analysis
is used, type I error rates associated with tests of the regression coefficients may
be inflated. In addition, dependence can introduce other problems, such as over- and
underdispersion (i.e., where the assumed binomial variance of the outcome does not
hold for the data). Extensions of the standard logistic regression procedure have been
developed for these situations. Interested readers may consult texts by Allison (2012),
Hosmer and Lemeshow (2013), or Menard (2010), which cover these and other
extensions of the standard logistic regression model.
11.14.4 No Measurement Error for the Predictors
As with traditional regression, the predictors are assumed to be measured with perfect
reliability. Increasing degrees of violation of this assumption lead to greater bias in the
estimates of the logistic regression coefficients and their standard errors. Good advice
here obviously is to select measures of constructs that are known to have the greatest
reliability. Options you may consider when reliability is lower than desired are to
exclude such explanatory variables from the model, when it makes sense to do so, or to
use structural equation modeling to obtain parameter estimates that take measurement
error into account.
11.14.5 Sufficiently Large Sample Size
Also, as mentioned, use of inferential procedures in logistic regression assumes that
large sample sizes are being used. How large a sample size needs to be for these properties
to hold for a given model is unknown. Long (1997) reluctantly offers some advice
and suggests that samples smaller than 100 are likely problematic, but that samples
larger than 500 should mostly be adequate. He also advises that there be at least 10
observations per predictor. Note also that the sample sizes mentioned here do not,
of course, guarantee sufficient statistical power. The software program NCSS PASS
(Hintze, 2002) may be useful to help you obtain an estimate of the sample size needed
to achieve reasonable power, although it requires you to make a priori selections about
certain summary measures, which may require a good deal of speculation.
11.15 OTHER DATA ISSUES
There are other issues associated with the data that may present problems for logistic regression analysis. First, as with traditional regression analysis, excessive multicollinearity may be present. If so, standard errors of regression coefficients may be
inflated or the estimation process may not converge. Section 3.7 presented methods to
detect multicollinearity and suggested possible remedies, which also apply to logistic
regression.
Another issue that may arise in logistic regression is known as perfect or complete
separation. Such separation occurs when the outcome is perfectly predicted by an
explanatory variable. For example, for the chapter data, if all adults in the educator
group exhibited good health status (Y = 1) and all in the control group did not (Y = 0),
perfect separation would be present. A similar problem is known as quasi-complete or
nearly complete separation. In this case, the separation is nearly complete (e.g., Y = 1
for nearly all cases in a given group and Y = 0 for nearly all cases in another group).
If complete or quasi-complete separation is present, maximum likelihood estimation
may not converge or, if it does, the estimated coefficient for the explanatory variable associated with the separation and its standard error may be extremely large. In
practice, these separation issues may be due to having nearly as many variables in the
analysis as there are cases. Remedies here include increasing sample size or removing
predictors from the model.
A related issue and another possible cause of quasi-complete separation is known as
zero cell count. This situation occurs when a level of a categorical variable has only
one outcome score (i.e., Y = 1 or Y = 0). Zero cell count can be detected during the
initial data screening. There are several options for dealing with zero cell count. Potential
remedies include collapsing the levels of the categorical variable to eliminate the zero
count problem, dropping the categorical variable entirely from the analysis, or dropping cases associated with the level of the “offending” categorical variable. You may
also decide to retain the categorical variable as is, as other parameters in the model
should not be affected, other than those involving the contrasts among the levels of the
categorical variable with that specific level. Allison (2012) also discusses alternative
estimation options that may be useful.
11.16 CLASSIFICATION
Often in logistic regression, as in the earlier example, investigators are interested in
quantifying the degree to which an explanatory variable, or a set of such variables,
is related to the probability of some event, that is, the probability of Y = 1. Given
that the residual term in logistic regression is defined as the difference between observed
group membership and the predicted probability of Y = 1, a common analysis goal
is to determine if this error is reduced after including one or more predictors in the
model. McFadden’s R_L² is an effect size measure that reflects improved prediction
(i.e., a smaller error term), and the likelihood ratio test is used to assess whether this
improvement is due to sampling error or reflects real improvement in the population.
Menard (2010) labels this type of prediction as quantitative prediction, reflecting the
degree to which the predicted probabilities of Y = 1 more closely approximate observed
group membership after predictors are included.
In addition to the goal of assessing the improvement in quantitative prediction, investigators may be interested or primarily interested in using logistic regression results
to classify participants into groups. Using the outcome from this chapter, you may be
interested in classifying adults as having a diabetes-free diagnosis or of being at risk
of being diagnosed with type 2 diabetes. Accurately classifying adults as being at risk
of developing type 2 diabetes may be helpful because adults can then change their
lifestyle to prevent its onset. In assessing how well the results of logistic regression can
effectively classify individuals, a key measure used is the number of errors made by
the classification. That is, for cases that are predicted to be of good health, how many
actually are and how many errors are there? Similarly, for those cases predicted to be
of poor health, how many actually are?
When results from a logistic regression equation are used for classification purposes,
the interest turns to minimizing the number of classification errors. In this context, the
interest is to find out if a set of variables reduces the number of classification errors, or
improves qualitative prediction (Menard, 2010). When classification is a study goal, a
new set of statistics then is needed to describe the reduction in the number of classification errors. This section presents statistics that can be used to address the accuracy
of classifications made by use of a logistic regression equation.
11.16.1 Percent Correctly Classified
A measure that is often used to assess the accuracy of prediction is the percent of cases
correctly classified by the model. To classify cases into one of two groups, the probabilities of Y = 1 are obtained from a logistic regression equation. With these probabilities,
you can classify a given individual after selecting a cut point. A cut point is a probability
of Y = 1 that you select, with a commonly used value being .50, at or above which a
case is classified into one of two groups (e.g., success) and below which a case is
classified into the other group (e.g., failure). Of course, to assess the accuracy of
classification in this way, the outcome data must already be collected. Given that
actual group membership is already known, it is a simple matter to count the number of
cases correctly and incorrectly classified. The percent of cases classified correctly can be
readily determined, with of course higher values reflecting greater accuracy. Note that if
the logistic regression equation is judged to be useful in classifying cases, the equation
could then be applied to future samples without having the outcome data collected for
these samples. Cross-validation of the results with an independent sample would provide
additional support for using the classification procedure in this way.
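The classification logic can be sketched as follows (the probabilities and outcomes here are toy values, not the chapter data; the actual probabilities would come from estimating Equation 9):

```python
# Percent correctly classified at a .50 cut point (toy illustrative values).
actual = [1, 1, 0, 0, 1, 0]              # observed group membership
prob = [0.8, 0.4, 0.3, 0.6, 0.7, 0.1]    # predicted probabilities of Y = 1

# Classify as 1 when the predicted probability is at or above the cut point.
predicted = [1 if p >= 0.5 else 0 for p in prob]
correct = sum(a == y for a, y in zip(actual, predicted))
pct = 100 * correct / len(actual)

print(predicted)        # [1, 0, 0, 1, 1, 0]
print(correct, len(actual))  # 4 of 6 cases classified correctly
```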
We use the chapter data to obtain the percent of cases correctly classified by the full
model. Table 11.4 uses probabilities obtained from estimating Equation 9 to classify
cases into one of two groups: (1) of good health, a classification made when the
probability of being of good health is estimated by the equation to be 0.5 or greater; or (2)
of poor health, a classification made when this probability is estimated at values less
than 0.5. In the Total column, Table 11.4 shows that the number of observed cases that
did not exhibit good health was 116, whereas 84 cases exhibited good health. Of the