Chapter 11: Binary Logistic Regression

11.10 McFADDEN'S PSEUDO R-SQUARE FOR STRENGTH OF ASSOCIATION

McFadden's (1974) pseudo R-square, denoted R_L², is based on the improvement in model fit as predictors are added to a model. An expression that can be used for R_L² is

R_L² = χ² / (−2LL_baseline),   (11)

where the numerator is the χ² test for the difference in fit between a reduced and full model and the denominator is the measure of fit for the model that contains only the intercept, or the baseline model with no predictors. The numerator then reflects the amount that the model fit, as measured by the difference in the quantity −2LL for a reduced model and its full model counterpart, is reduced or improved due to a set of predictors, analogous to the amount of variation reduced by a set of predictors in traditional regression. When this amount (i.e., χ²) is divided by −2LL_baseline, the resulting proportion can be interpreted as the proportional reduction in the lack of fit associated with the baseline model due to the inclusion of the predictors, or the proportional improvement in model fit, analogous to R² in traditional regression. In addition to the close correspondence to R², R_L² also has lower and upper bounds of 0 and 1, which is not shared by other pseudo R² measures. Further, R_L² can be used when the dependent variable has more than two categories (i.e., for multinomial logistic regression).

We first illustrate use of R_L² to assess the contribution of treatment and motivation in predicting health status. Recall that the fit of the model with no predictors, or −2LL_baseline, is 272.117. After adding treatment and motivation, the fit is 253.145, a reduction or improvement in fit of 18.972, which is the χ² test statistic and the numerator of Equation 11. Thus, R_L² is 18.972/272.117 or .07, indicating a 7% improvement in model fit due to treatment and motivation. Note that R_L² indicates the degree to which fit improves when the predictors are added, while the χ² test statistic is used to determine whether an improvement in fit is present, or different from zero, in the population.
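The computation in Equation 11 can be sketched in a few lines of Python, using the −2LL values just reported (the function name is ours, not part of any package):

```python
def mcfadden_r2(neg2ll_baseline, neg2ll_full):
    """McFadden's pseudo R-square (Equation 11): the chi-square
    improvement in fit divided by the baseline lack of fit."""
    chi_square = neg2ll_baseline - neg2ll_full  # improvement in fit
    return chi_square / neg2ll_baseline

# -2LL for the intercept-only model and for the model with
# treatment and motivation added (values from the chapter data)
r2 = mcfadden_r2(272.117, 253.145)
print(round(r2, 2))  # 0.07, i.e., a 7% improvement in model fit
```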

The R_L² statistic can also be used to assess the contribution of subsets of variables while controlling for other predictors. In section 11.9, we tested for the improvement in fit that is obtained by adding an interaction between treatment and motivation to a model that assumed this interaction was not present. Relative to the main effects model, the amount that the model fit improved after including the interaction is 0.013, and the proportional improvement in model fit due to adding the interaction to the model, or the strength of association between the interaction and outcome, is then 0.013 / 272.117, which is near zero.

McFadden (1979) cautioned that values for R_L² are typically smaller than R-square values observed in standard regression analysis. As a result, researchers cannot rely on values, for example, as given in Cohen (1988) to indicate weak, moderate, or strong associations. McFadden (1979) noted that for the entire model values of .2 to .4 represent a strong improvement in fit, but these values of course cannot reasonably be applied in every situation, as they may represent a weak association in some contexts and may be unobtainably high in others. Note also that, at present, neither SAS nor SPSS provides this measure of association for binary outcomes.

11.11 SIGNIFICANCE TESTS AND CONFIDENCE INTERVALS FOR SINGLE VARIABLES

When you are interested in testing the association between an individual predictor

and outcome, controlling for other predictors, several options are available. Of those

introduced here, the most powerful approach is the likelihood ratio test described in

section 11.9. The reduced model would exclude the variable of interest, and the full

model would include that variable. The main disadvantage of this approach is practical, in that multiple analyses would need to be done in order to test each predictor. In

this example, with a limited number of predictors, the likelihood ratio test would be

easy to implement.

A more convenient and commonly used approach to test the effects of individual predictors is to use a z test, which provides results equivalent to the Wald test that is often reported by software programs. The z test of the null hypothesis that a given regression coefficient is zero (i.e., β_j = 0) is

z = β_j / S_βj,   (12)

where S_βj is the standard error for the regression coefficient. To test for significance, you compare this test statistic to a critical value from the standard normal distribution. So, if alpha were .05, the corresponding critical value for a two-tailed test would be ±1.96. The Wald test, which is the square of the z test, follows a chi-square distribution with 1 degree of freedom. The main disadvantage associated with this procedure is that when β_j becomes large, the standard error in Equation 12 becomes inflated, which makes this test less powerful than the likelihood ratio test (Hauck & Donner, 1977).

A third option to test the effect of a predictor is to obtain a confidence interval for the odds ratio. A general expression for the confidence interval for the odds ratio, denoted CI(OR), is given by

CI(OR) = e^(cβ_j ± z(α)·c·S_βj),   (13)

where c is the increment of interest in the predictor (relevant only for a continuous variable) and z(α) represents the z value from the standard normal distribution for the associated confidence level of interest (often 95%). If a value of 1 is not contained in the interval, then the null hypothesis of no effect is rejected. In addition, the use of confidence intervals allows for a specific statement about the population value of the odds ratio, which may be of interest.


11.11.1 Impact of the Treatment

We illustrate the use of these procedures to assess the impact of the treatment on health. When Equation 9 is estimated, the coefficient reflecting the impact of the treatment, β1, is 1.014 (SE = .302). The z test for the null hypothesis that β1 = 0 is then 1.014 / .302 = 3.36 (p = .001), indicating that the treatment effect is statistically significant. The odds ratio of about 3 (e^1.014 = 2.76) means that the odds of good health for adults in the educator group are about 3 times the odds of those in the control group, controlling for motivation. The 95% confidence interval is computed as e^(1.014 ± 1.96 × .302) and is 1.53 to 4.98. The interval suggests that the odds of being diabetes free for those in the educator group may be as small as 1.5 times and as large as about 5 times the odds of those in the control group.
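The z test, odds ratio, and confidence interval for the treatment effect can be reproduced from the reported coefficient and standard error alone; the sketch below uses only the Python standard library (the variable names are ours):

```python
import math

b1, se = 1.014, 0.302  # coefficient and SE for the treatment effect

# z test (Equation 12) and a two-tailed p value from the
# standard normal CDF
z = b1 / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# odds ratio and its 95% confidence interval (Equation 13, c = 1)
odds_ratio = math.exp(b1)
lower = math.exp(b1 - 1.96 * se)
upper = math.exp(b1 + 1.96 * se)

print(round(z, 2), round(odds_ratio, 2))  # 3.36 2.76
print(round(lower, 2), round(upper, 2))   # 1.53 4.98
```

The p value rounds to .001, matching the reported result.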

11.12 PRELIMINARY ANALYSIS

In the next few sections, measures of residuals and influence are presented along with

the statistical assumptions associated with logistic regression. In addition, other problems that may arise with data for logistic regression are discussed. Note that formulas

presented for the following residuals assume that continuous predictors are used in

the logistic regression model. When categorical predictors only are used, where many

cases are present for each possible combination of the levels of these variables (sometimes referred to as aggregate data), different formulas are used to calculate residuals

(see Menard, 2010). We present formulas here for individual (and not aggregate) data

because this situation is more common in social science research. As throughout the

text, the goal of preliminary analysis is to help ensure that the results obtained by the

primary analysis are valid.

11.13 RESIDUALS AND INFLUENCE

Observations that are not fit well by the model may be detected by the Pearson residual. The Pearson residual is given by

r_i = (Y_i − p̂_i) / √(p̂_i(1 − p̂_i)),   (14)

where p̂_i is the probability (of Y = 1) as predicted by the logistic regression equation for a given individual i. The numerator is the difference (i.e., the raw residual) between an observed Y score and the probability predicted by the equation, and the denominator is the standard deviation of the Y scores according to the binomial distribution. In large samples, this residual may approximate a normal distribution with a mean of 0 and a standard deviation of 1. Thus, a case with a residual value that is quite distinct from the others and that has a value of r_i greater than 2.5 or 3.0 suggests a case that is not fit well by the model. It would be important to check any such cases to see if data are entered correctly and, if so, to learn more about the kind of cases that are not fit well by the model.

An alternative or supplemental index for outliers is the deviance residual. The deviance residual reflects the contribution an individual observation makes to the model deviance, with larger absolute values reflecting more poorly fit observations. This residual may be computed for a given case by calculating the log likelihood in Equation 8 (the expression to the right of the summation symbol), multiplying this value by −2, and then taking the square root of this value. The sign of this residual (i.e., positive or negative) is determined by whether the numerator in Equation 14 is positive or negative. Some have expressed a preference for use of the deviance residual over the Pearson residual because the Pearson residual is relatively unstable when the predicted probability of Y = 1 is close to 0 or 1. However, Menard (2010) notes an advantage of the Pearson residual is that it has larger values, and so outlying cases are more greatly emphasized with this residual. As such, we limit our discussion here to the Pearson residual.
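Both residuals can be computed directly from an observed outcome and its model-predicted probability. A minimal sketch (the case values are illustrative, not from the chapter data):

```python
import math

def pearson_residual(y, p_hat):
    """Equation 14: the raw residual divided by the binomial
    standard deviation."""
    return (y - p_hat) / math.sqrt(p_hat * (1 - p_hat))

def deviance_residual(y, p_hat):
    """Square root of -2 times the case's log likelihood, signed
    by the raw residual."""
    ll = y * math.log(p_hat) + (1 - y) * math.log(1 - p_hat)
    return math.copysign(math.sqrt(-2 * ll), y - p_hat)

# Illustrative case: observed Y = 1 with a predicted probability of .30
print(round(pearson_residual(1, 0.30), 2))   # 1.53
print(round(deviance_residual(1, 0.30), 2))  # 1.55
```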

In addition to identifying outlying cases, a related concern is to determine if any cases are influential or unduly impact key analysis results. There are several measures of influence that are analogous to those used in traditional regression, including, for example, leverage, a Cook's influence measure, and delta beta. Here, we focus on delta beta because it is directed at the influence a given observation has on the impact of a specific explanatory variable, which is often of interest and is here, with the impact of the intervention being a primary concern. As with traditional regression, delta beta indicates the change in a given logistic regression coefficient if a case were deleted. Note that the sign of the index (+ or −) refers to whether the slope increases or decreases when the case is included in the data set. Thus, the sign of the delta beta needs to be reversed if you wish to interpret the index as the impact of a specific case on a given regression coefficient when the case is deleted. For SAS users, note that raw delta beta values are not provided by the program. Instead, SAS provides standardized delta beta values, obtained by dividing a delta beta value by its standard error. There is some agreement that standardized values larger than a magnitude of 1 may exert influence on the analysis. To be on the safe side, though, you can examine further any cases having outlying values that are less than this magnitude.

We now illustrate examining residuals and delta beta values to identify unusual and influential cases with the chapter data. We estimated Equation 9 and found no cases had a Pearson residual value greater than 2 in magnitude. We then inspected histograms of the delta betas for β1 and β2. Two outlying delta beta values appear to be present for motivation (β2), the histogram for which is shown in Figure 11.3. The value of delta beta for each of these cases is about −.004. Given the negative value, the value for β2 would increase if these cases were removed from the analysis. We can assess the impact of both of these cases on analysis results by temporarily removing the observations and reestimating Equation 9. With all 200 cases, the value for β2 is 0.040 and e^.040 = 1.041, and with the two cases removed β2 is 0.048 and e^.048 = 1.049. The change, then, obtained by removing these two cases seems small both for the coefficient and the odds ratio. We also note that with the removal of these two cases, all of the conclusions associated with the statistical tests are unchanged. Thus, these two discrepant cases are not judged to exert excessive influence on key study results.

Figure 11.3: Histogram of delta beta values for coefficient β2 (DFBETA for Motiv; mean = 1.37E−7, std. dev. = 0.00113, N = 200).
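The logic of delta beta (refitting the model with a case temporarily removed and comparing coefficients) can be sketched with a small hand-rolled logistic fit. This is a toy data set of our own, not the chapter's 200 cases, and software such as SAS and SPSS computes these diagnostics without literally refitting:

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of logit(p) = b0 + b1*x (a minimal sketch)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            w = p * (1 - p)
            g0 += y - p           # gradient for the intercept
            g1 += (y - p) * x     # gradient for the slope
            h00 += w              # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Toy data: the outcome becomes more likely as x increases
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

_, b1_all = fit_logistic(xs, ys)

# Delta beta for case i: the slope with the case included minus
# the slope with the case deleted
i = 3
_, b1_without = fit_logistic(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
delta_beta = b1_all - b1_without
```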

11.14 ASSUMPTIONS

Three formal assumptions are associated with logistic regression. First, the logistic

regression model is assumed to be correctly specified. Second, cases are assumed to

be independent. Third, each explanatory variable is assumed to be measured without

error. You can also consider there to be a fourth assumption for logistic regression.

That is, the statistical inference procedures discussed earlier (based on asymptotic theory) assume that a large sample size is used. These assumptions are described in more

detail later. Note that while many of these assumptions are analogous to those used in

traditional regression, logistic regression does not assume that the residuals follow a

normal distribution or that the residuals have constant variance across the range of predicted values. Also, other practical data-related issues are discussed in sectionÂ€11.15.

11.14.1 Correct Specification

Correct specification is a critical assumption. For logistic regression, correct specification means that (1) the correct link function (e.g., the logistic link function) is used, and (2) the model includes explanatory variables that are nontrivially related to the

outcome and excludes irrelevant predictors. For the link function, there appears to be

consensus that choice of link function (e.g., use of a logistic vs. probit link function)

has no real consequence on analysis results. Also, including predictors in the model

that are trivially related to the outcome (i.e., irrelevant predictors) is known to increase

the standard errors of the coefficients (thus reducing statistical power) but does not

result in biased regression coefficient estimates. On the other hand, excluding important determinants introduces bias into the estimation of the regression coefficients and

their standard errors, which can cast doubt on the validity of the results. You should

rely on theory, previous empirical work, and common sense to identify important

explanatory variables. If there is little direction to guide variable selection, you could

use exploratory methods as used in traditional regression (i.e., the sequential methods discussed in section 3.8) to begin the theory development process. The conclusions

drawn from the use of such methods are generally much more tentative than studies

where a specific theory guides model specification.

The need to include important predictors in order to avoid biased estimates also extends

to the inclusion of important nonlinear terms and interactions in the statistical model,

similar to traditional regression. Although the probabilities of Y = 1 are nonlinearly related to explanatory variables in logistic regression, the log of the odds, or the logit, given no transformation of the predictors, is assumed to be linearly related to the predictors, as in Equation 7. Of course, this functional form may not be correct.

The Box–Tidwell procedure can be used to test the linear aspect of this assumption.

To implement this procedure, you create new variables in the data set, which are the

natural logs of each continuous predictor. Then, you multiply this transformed variable by the original predictor, essentially creating a product variable that is the original

continuous variable times its natural log. Any such product variables are then added

to the logistic regression equation. If any are statistically significant, this suggests that

the logit has a nonlinear association with the given continuous predictor. You could

then search for an appropriate transformation of the continuous explanatory variable,

as suggested in Menard (2010).

The Box–Tidwell procedure to test for nonlinearity in the logit is illustrated here with

the chapter data. For these data, only one predictor, motivation, is continuous. Thus,

we computed the natural log of the scores for this variable and multiplied them by

motivation. This new product variable is named xlnx. When this predictor is added to those included in Equation 9, the p value associated with the coefficient of xlnx is .909, suggesting no violation of the linearity assumption. Section 11.18 provides the SAS and SPSS commands needed to implement this procedure as well as selected output.
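Constructing the Box–Tidwell product term is a one-line transformation. A sketch with made-up motivation scores (the model refit itself would be done in SAS, SPSS, or another package):

```python
import math

# Hypothetical motivation scores (must be positive for the log)
motivation = [42.0, 55.5, 61.0, 48.5, 70.0]

# Box-Tidwell term: the continuous predictor times its natural log
xlnx = [x * math.log(x) for x in motivation]

# xlnx is then added as a predictor to the logistic regression
# equation; a significant coefficient for it suggests the logit is
# nonlinearly related to motivation.
```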

In addition to linearity, the correct specification assumption also implies that important

interactions have been included in the model. In principle, you could include all possible

interaction terms in the model in an attempt to determine if important interaction terms

have been omitted. However, as more explanatory variables appear in the model, the number of interaction terms increases sharply, with perhaps many of these interactions

being essentially uninterpretable (e.g., four- and five-way interactions). As with traditional regression models, the best advice may be to include interactions as suggested by

theory or that are of interest. For the chapter data, recall that in section 11.9 we tested the

interaction between treatment and motivation and found no support for the interaction.

11.14.2 Hosmer–Lemeshow Goodness-of-Fit Test

In addition to these procedures, the Hosmer–Lemeshow (HL) test offers a global goodness-of-fit test that compares the estimated model to one that has perfect fit. Note that this test does not assess, as was the case with the likelihood ratio test in section 11.9, whether model fit is improved when a set of predictors is added to a reduced model. Instead, the HL test assesses whether the fit of a given model deviates from the perfect fitting model, given all relevant explanatory variables are included. Alternatively, as Allison (2012) points out, the HL test can be interpreted as a test of the null hypothesis that no additional interaction or nonlinear terms are needed in the model. Note, however, that the HL test does not assess whether other predictors that are entirely excluded from the estimated model could improve model fit.

Before highlighting some limitations associated with the procedure, we discuss how it works. The procedure compares the observed frequencies of Y = 1 to the frequencies predicted by the logistic regression equation. To obtain these values, the sample is divided, by convention, into 10 groups referred to as the deciles of risk. Each group is formed based on the probabilities of Y = 1, with individuals in the first group consisting of those cases that have the lowest predicted probabilities, those in the second group being the cases that have the next lowest predicted probabilities, and so on. The predicted, or expected, frequencies are then obtained by summing these probabilities over the cases in each of the 10 groups. The observed frequencies are obtained by summing the number of cases actually having Y = 1 in each of the 10 groups.

The probabilities obtained from estimating Equation 9 are now used to illustrate this procedure. Table 11.3 shows the observed and expected frequencies for each of the 10 deciles. When the probabilities of Y = 1 are summed for the 20 cases in group 1, this sum or expected frequency is 3.995. Note that under the Observed column of Table 11.3, 4 of these 20 cases actually exhibited good health. For this first decile, then, there is a very small difference between the observed and expected frequencies, suggesting that the probabilities produced by the logistic regression equation, for this group, approximate reality quite well. Note that Hosmer and Lemeshow (2013) suggest computing the quantity

(Observed − Expected) / √Expected

for a given decile, with values larger than 2 in magnitude indicating a problem in fit for a particular decile. The largest such value here is 0.92 for decile 2, i.e., (7 − 4.958)/√4.958 = 0.92. This suggests that there are small differences between the observed and expected frequencies, supporting the goodness of fit of the estimated model.


Table 11.3: Deciles of Risk Table Associated With the Hosmer–Lemeshow Goodness-of-Fit Test

Group | Observed (Health = 1) | Expected (Health = 1) | Number of cases
    1 |  4 |  3.995 | 20
    2 |  7 |  4.958 | 20
    3 |  6 |  5.680 | 20
    4 |  3 |  6.606 | 20
    5 |  9 |  7.866 | 20
    6 |  8 |  8.848 | 20
    7 | 10 |  9.739 | 20
    8 |  9 | 10.777 | 20
    9 | 15 | 12.156 | 20
   10 | 13 | 13.375 | 20

In addition to this information, this procedure offers an overall goodness-of-fit statistical test for the differences between the observed and expected frequencies. The null hypothesis is that these differences reflect sampling error, or that the model has perfect fit. A decision to retain the null hypothesis (i.e., p > α) supports the adequacy of the model, whereas a reject decision signals that the model is misspecified (i.e., has omitted nonlinear and/or interaction terms). The HL test statistic approximates a chi-square distribution with degrees of freedom equal to the number of groups formed (10, here) − 2. Here, we simply report that the χ² test value is 6.88 (df = 8), and the corresponding p value is .55. As such, the goodness-of-fit of the model is supported (suggesting that adding nonlinear and interaction terms to the model will not improve its fit).
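Using the deciles in Table 11.3, the HL statistic can be reproduced with the common computing formula Σ (O − E)² / [E(1 − E/n)], summed over the deciles. This is a sketch, not the software's exact routine, and it assumes 20 cases per decile, as in the table:

```python
observed = [4, 7, 6, 3, 9, 8, 10, 9, 15, 13]
expected = [3.995, 4.958, 5.680, 6.606, 7.866,
            8.848, 9.739, 10.777, 12.156, 13.375]
n = 20  # cases per decile

# Hosmer-Lemeshow chi-square over the 10 deciles; this reproduces
# the reported value of about 6.88 (small differences reflect
# rounding in the table)
hl = sum((o - e) ** 2 / (e * (1 - e / n))
         for o, e in zip(observed, expected))
df = len(observed) - 2  # number of groups minus 2

# Per-decile diagnostic (O - E)/sqrt(E): no decile here exceeds
# the suggested threshold of 2 in magnitude
checks = [(o - e) / e ** 0.5 for o, e in zip(observed, expected)]
```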

There are some limitations associated with the Hosmer–Lemeshow goodness-of-fit

test. Allison (2012) and Menard (2010) note that this test may be underpowered and

tends to return a result of correct fit of the model, especially when fewer than six

groups are formed and when sample size is not large (i.e., less than 500). Further,

Allison (2012) notes that even when more than six groups are formed, test results

are sensitive to the number of groups formed in the procedure. He further discusses

erratic behavior with the performance of the test, for example, that including a statistically significant interaction in the model can produce HL test results that indicate

worse model fit (the opposite of what is intended). Research continues on ways to

improve the HL test (Prabasaj, Pennell, & Lemeshow, 2012). In the meantime, a sensible approach may be to examine the observed and expected frequencies produced by this procedure to identify possible areas of misfit (as suggested by Hosmer & Lemeshow, 2013), use the Box–Tidwell procedure to assess the assumption of linearity, and include interactions in the model that are based on theory or those that are of interest.


11.14.3 Independence

Another important assumption is that the observations are obtained from independent cases. Dependency in observations may arise from repeatedly measuring the

outcome and in study designs where observations are clustered in settings (e.g., students in schools) or cases are paired or matched on some variable(s), as in a matched

case-control study. Note that when this assumption is violated and standard analysis is used, type I error rates associated with tests of the regression coefficients may

be inflated. In addition, dependence can introduce other problems, such as over- and

underdispersion (i.e., where the assumed binomial variance of the outcome does not

hold for the data). Extensions of the standard logistic regression procedure have been

developed for these situations. Interested readers may consult texts by Allison (2012), Hosmer and Lemeshow (2013), or Menard (2010), which cover these and other extensions of the standard logistic regression model.

11.14.4 No Measurement Error for the Predictors

As with traditional regression, the predictors are assumed to be measured with perfect

reliability. Increasing degrees of violation of this assumption lead to greater bias in the

estimates of the logistic regression coefficients and their standard errors. Good advice

here obviously is to select measures of constructs that are known to have the greatest reliability. Options you may consider when reliability is lower than desired are to exclude such explanatory variables from the model, when it makes sense to do that, or to use structural equation modeling to obtain parameter estimates that take measurement

error into account.

11.14.5 Sufficiently Large Sample Size

Also, as mentioned, use of inferential procedures in logistic regression assumes large sample sizes are being used. How large a sample size needs to be for these properties

to hold for a given model is unknown. Long (1997) reluctantly offers some advice

and suggests that samples smaller than 100 are likely problematic, but that samples

larger than 500 should mostly be adequate. He also advises that there be at least 10

observations per predictor. Note also that the sample sizes mentioned here do not,

of course, guarantee sufficient statistical power. The software program NCSS PASS

(Hintze, 2002) may be useful to help you obtain an estimate of the sample size needed

to achieve reasonable power, although it requires you to make a priori selections about

certain summary measures, which may require a good deal of speculation.

11.15 OTHER DATA ISSUES

There are other issues associated with the data that may present problems for logistic regression analysis. First, as with traditional regression analysis, excessive multicollinearity may be present. If so, standard errors of regression coefficients may be


inflated or the estimation process may not converge. Section 3.7 presented methods to

detect multicollinearity and suggested possible remedies, which also apply to logistic

regression.

Another issue that may arise in logistic regression is known as perfect or complete

separation. Such separation occurs when the outcome is perfectly predicted by an

explanatory variable. For example, for the chapter data, if all adults in the educator group exhibited good health status (Y = 1) and all in the control group did not (Y = 0), perfect separation would be present. A similar problem is known as quasi-complete or nearly complete separation. In this case, the separation is nearly complete (e.g., Y = 1 for nearly all cases in a given group and Y = 0 for nearly all cases in another group).

If complete or quasi-complete separation is present, maximum likelihood estimation

may not converge or, if it does, the estimated coefficient for the explanatory variable associated with the separation and its standard error may be extremely large. In

practice, these separation issues may be due to having nearly as many variables in the

analysis as there are cases. Remedies here include increasing sample size or removing

predictors from the model.

A related issue and another possible cause of quasi-complete separation is known as

zero cell count. This situation occurs when a level of a categorical variable has only

one outcome score (i.e., Y = 1 or Y = 0). Zero cell count can be detected during the initial data screening. There are several options for dealing with zero cell count. Potential

remedies include collapsing the levels of the categorical variable to eliminate the zero

count problem, dropping the categorical variable entirely from the analysis, or dropping cases associated with the level of the “offending” categorical variable. You may

also decide to retain the categorical variable as is, as other parameters in the model

should not be affected, other than those involving the contrasts among the levels of the

categorical variable with that specific level. Allison (2012) also discusses alternative

estimation options that may be useful.

11.16 CLASSIFICATION

Often in logistic regression, as in the earlier example, investigators are interested in quantifying the degree to which an explanatory variable, or a set of such variables, is related to the probability of some event, that is, the probability of Y = 1. Given that the residual term in logistic regression is defined as the difference between observed group membership and the predicted probability of Y = 1, a common analysis goal is to determine if this error is reduced after including one or more predictors in the model. McFadden's R_L² is an effect size measure that reflects improved prediction (i.e., a smaller error term), and the likelihood ratio test is used to assess if this improvement is due to sampling error or reflects real improvement in the population. Menard (2010) labels this type of prediction as quantitative prediction, reflecting the degree to which the predicted probabilities of Y = 1 more closely approximate observed group membership after predictors are included.


In addition to the goal of assessing the improvement in quantitative prediction, investigators may be interested or primarily interested in using logistic regression results

to classify participants into groups. Using the outcome from this chapter, you may be

interested in classifying adults as having a diabetes-free diagnosis or of being at risk

of being diagnosed with type 2 diabetes. Accurately classifying adults as being at risk

of developing type 2 diabetes may be helpful because adults can then change their

lifestyle to prevent its onset. In assessing how well the results of logistic regression can

effectively classify individuals, a key measure used is the number of errors made by

the classification. That is, for cases that are predicted to be of good health, how many

actually are and how many errors are there? Similarly, for those cases predicted to be

of poor health, how many actually are?

When results from a logistic regression equation are used for classification purposes,

the interest turns to minimizing the number of classification errors. In this context, the

interest is to find out if a set of variables reduces the number of classification errors, or

improves qualitative prediction (Menard, 2010). When classification is a study goal, a

new set of statistics then is needed to describe the reduction in the number of classification errors. This section presents statistics that can be used to address the accuracy

of classifications made by use of a logistic regression equation.

11.16.1 Percent Correctly Classified

A measure that is often used to assess the accuracy of prediction is the percent of cases correctly classified by the model. To classify cases into one of two groups, the probabilities of Y = 1 are obtained from a logistic regression equation. With these probabilities, you can classify a given individual after selecting a cut point. A cut point is a probability of Y = 1 that you select, with a commonly used value being .50, at or above which results in a case being classified into one of two groups (e.g., success) and below which results in a case being classified into the other group (e.g., failure). Of course, to assess the accuracy of classification in this way, the outcome data must already be collected. Given that actual group membership is already known, it is a simple matter to count the number of cases correctly and incorrectly classified. The percent of cases classified correctly can be readily determined, with of course higher values reflecting greater accuracy. Note that if the logistic regression equation is judged to be useful in classifying cases, the equation could then be applied to future samples without having the outcome data collected for these samples. Cross-validation of the results with an independent sample would provide additional support for using the classification procedure in this way.

We use the chapter data to obtain the percent of cases correctly classified by the full model. Table 11.4 uses probabilities obtained from estimating Equation 9 to classify cases into one of two groups: (1) of good health, a classification made when the probability of being of good health is estimated by the equation to be 0.5 or greater; or (2) of poor health, a classification made when this probability is estimated at values less than 0.5. In the Total column, Table 11.4 shows that the number of observed cases that did not exhibit good health was 116, whereas 84 cases exhibited good health. Of the


Source: Pituch, K. A., & Stevens, J. P. (2016). Applied Multivariate Statistics for the Social Sciences: Analyses with SAS and IBM's SPSS. Routledge.
