
11.6 SPSS Procedures for Performing Binary Logistic Regression on Defaulter Prediction

Step 2: Move repayment behaviour into the Dependent variable box and age, gender, income, number of dependents, job, education and other loan into the Covariates box. Make sure Enter is the selected Method. (This enters all the variables in the Covariates box into the logistic regression equation simultaneously.) See Fig. 11.4.

Step 3: If the study has categorical independent variables, click on Categorical and move all of them from the Covariates panel on the left to the Categorical Covariates panel on the right, as in Fig. 11.5. Then click on Continue to return to the Logistic Regression window.

Step 4: Click on Save and select Probabilities and Group membership, as in Fig. 11.6. Then click on Continue to return to the Logistic Regression window.

Step 5: Click on Options to open the window in Fig. 11.7. Select Classification plots and Hosmer–Lemeshow goodness-of-fit. Then click on Continue to return to the Logistic Regression window, and click on OK to get the output window.

Table 11.3 Defaulter prediction data (first 20 samples)

Column headings: Account number, Repayment behaviour, Age, Gender, Income, Number of dependents, Job, Education, Other_loan

Values are listed column by column in the order they appear in the source (19 values per column). The six 0/1 columns are not individually labelled there.

Account number: 21, 31, 51, 71, 74, 91, 111, 131, 141, 191, 201, 241, 251, 261, 271, 283, 291, 311, 312
0/1 column: 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0
0/1 column: 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1
0/1 column: 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0
Age: 56, 43, 56, 64, 49, 46, 52, 63, 42, 55, 74, 53, 58, 56, 69, 51, 43, 64, 44
0/1 column: 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0
Income: 20, 25, 6, 40, 42, 8, 26, 6, 43, 23, 26, 10, 40, 30, 10, 32, 12, 41, 23
0/1 column: 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0
0/1 column: 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0


Fig. 11.3 IBM SPSS 20.0 binary logistic selection

Fig. 11.4 IBM SPSS logistic regression window

11 Binary Logistic Regression


Fig. 11.5 Defining of categorical independent variables

Fig. 11.6 Selection of probabilities and group membership


Fig. 11.7 Logistic regression option window

11.7 IBM SPSS 20.0 Syntax for Binary Logistic Regression

* Open the data file and name the active dataset.
GET
  FILE = 'G:\LIBRARY\I-BOOK DEVELOPMENT\Logistic Regression.sav'.
DATASET NAME DataSet1 WINDOW = FRONT.

* Binary logistic regression with all predictors entered simultaneously.
LOGISTIC REGRESSION VARIABLES RepaymentBehaviour
  /METHOD = ENTER age Gender Income No_Dep Job Education Other_Loan
  /CONTRAST (Gender) = Indicator
  /CONTRAST (No_Dep) = Indicator
  /CONTRAST (Job) = Indicator
  /CONTRAST (Education) = Indicator
  /CONTRAST (Other_Loan) = Indicator
  /SAVE = PRED PGROUP
  /CLASSPLOT
  /PRINT = GOODFIT
  /CRITERIA = PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

11.8 IBM SPSS 20.0 Output for Logistic Regression

Table 11.4 shows the number of observations Included in Analysis and the total number of observations in the data file (Total). The number of observations Included in Analysis may be less than the Total if there are missing values for any variables in the equation. By default, SPSS does a listwise deletion of incomplete cases. In the current example, both numbers are the same (609) because there are no missing values.

Table 11.4 Case processing summary

Unweighted cases (a)                        N      Percent (%)
Selected cases     Included in analysis     609    100.0
                   Missing cases            0      0.0
                   Total                    609    100.0
Unselected cases                            0      0.0
Total                                       609    100.0

(a) If weight is in effect, see classification table for the total number of cases

Table 11.5 Dependent variable encoding

Original value     Internal value
Defaulter          0
Non-defaulter      1

Table 11.6 Categorical variables codings

Variable and category                                   Frequency    Parameter coding (1)
Other loan possessed by the account holder
    Yes                                                 511          1.000
    No                                                  98           0.000
Number of people dependent on account holder
    2 or fewer dependents                               443          1.000
    More than 2 dependents                              166          0.000
Fixed job versus temporary
    Permanent job                                       222          1.000
    Temporary job                                       387          0.000
Education of the account holder (School vs. College)
    School education                                    354          1.000
    College education                                   255          0.000
Gender of the account holder
    Male                                                487          1.000
    Female                                              122          0.000
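The listwise deletion described above can be illustrated with a minimal Python sketch. The records and the helper name `case_processing_summary` are hypothetical and are not taken from the book's data file; the sketch only shows how a case with any missing value in the model variables drops out of the "Included in analysis" count.

```python
# Hypothetical records; None marks a missing value (not the book's data).
records = [
    {"RepaymentBehaviour": 0, "age": 56, "Income": 20},
    {"RepaymentBehaviour": 1, "age": None, "Income": 25},
    {"RepaymentBehaviour": 1, "age": 51, "Income": 6},
    {"RepaymentBehaviour": 0, "age": 64, "Income": None},
]


def case_processing_summary(rows, variables):
    """Count cases kept and dropped under listwise deletion."""
    total = len(rows)
    # A case is included only if every model variable is present.
    included = sum(
        all(row.get(name) is not None for name in variables) for row in rows
    )
    return {
        "included": included,
        "missing": total - included,
        "total": total,
        "included_pct": 100.0 * included / total,
    }


summary = case_processing_summary(records, ["RepaymentBehaviour", "age", "Income"])
print(summary)
```

With no missing values, as in the 609-case example, the included count equals the total.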

Table 11.5 shows the coding of the dichotomous dependent variable, Repayment Behaviour: 0 is used for the defaulter category and 1 for the non-defaulter category.

Table 11.6 shows the labelling used by the researcher to represent the categorical independent variables.

The Classification Table is shown in Table 11.7. As mentioned earlier, it is common practice to use 0.5 as the cut-off for predicting occurrence: that is, to predict non-occurrence of the event of interest whenever p < 0.5 and occurrence if p > 0.5. The classification table indicates how many correct and incorrect predictions the model makes at this cut-off. In this case, 88.3 per cent of the cases are correctly classified using the 0.50 cut-off point, a figure similar to the 'Hit Ratio' in discriminant analysis.

Table 11.7 Classification table (a, b)

                                                 Predicted repayment behaviour
Observed                                         Defaulter    Non-defaulter    Percentage correct
Step 0   Repayment behaviour   Defaulter         538          0                100.0
                               Non-defaulter     71           0                0.0
         Overall percentage                                                    88.3

(a) Constant is included in the model
(b) The cut value is 0.500

Table 11.8 Variables not in the equation

Step 0   Variables            Score     df    Sig.
         Age                  11.819    1     0.001
         Gender(1)            16.246    1     0.000
         Income               17.302    1     0.000
         No_Dep(1)            7.483     1     0.006
         Job(1)               5.429     1     0.020
         Education(1)         11.536    1     0.001
         Other_Loan(1)        6.775     1     0.009
         Overall statistics   49.121    7     0.000

Table 11.9 Omnibus tests of model coefficients

                  Chi square    df    Sig.
Step 1   Step     46.798        7     0.000
         Block    46.798        7     0.000
         Model    46.798        7     0.000
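The Step 0 hit ratio can be reproduced with a short Python sketch. It assumes, as in a constant-only model, that every case receives the same predicted probability of being a non-defaulter (71/609); the function name `classification_table` is illustrative and not part of SPSS.

```python
def classification_table(observed, probabilities, cut=0.5):
    """Cross-tabulate observed groups against groups predicted at a cut-off."""
    counts = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for obs, prob in zip(observed, probabilities):
        predicted = 1 if prob > cut else 0  # 1 = non-defaulter, 0 = defaulter
        counts[(obs, predicted)] += 1
    correct = counts[(0, 0)] + counts[(1, 1)]
    return counts, 100.0 * correct / len(observed)


# Observed groups from Table 11.7: 538 defaulters (0) and 71 non-defaulters (1).
observed = [0] * 538 + [1] * 71

# Assumption: the constant-only model predicts the sample proportion for everyone.
p_null = 71 / 609
counts, hit_ratio = classification_table(observed, [p_null] * len(observed))

print(counts)
print(round(hit_ratio, 1))
```

Because 71/609 is below the 0.5 cut-off, every case is predicted to be a defaulter, so the overall percentage correct is simply the share of observed defaulters.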

Table 11.8 shows how well each independent variable, taken individually, predicts the dependent variable. In this study, all the variables are found to be significant at the 5 per cent level (p < 0.05).

SPSS offers different ways to assess the contribution of the independent variables to predicting the dichotomous dependent variable. One of them is the Omnibus Tests of Model Coefficients in Table 11.9. This test indicates whether the specified model is significant when all the independent variables are considered together. In this example, the model with all the variables taken together is significant (χ² = 46.798, df = 7, N = 609, p < 0.001).
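As a rough check, the omnibus model chi-square can be recovered as the drop in -2 log likelihood from the constant-only model to the fitted model. The Python sketch below assumes the constant-only model predicts the sample proportion for every case and uses the fitted -2 log likelihood of 391.761 reported in the Model Summary table; the function name is illustrative.

```python
import math


def null_deviance(n_event, n_nonevent):
    """-2 log likelihood of a constant-only binary logistic model."""
    n = n_event + n_nonevent
    p = n_event / n  # the constant-only model predicts the sample proportion
    return -2.0 * (n_event * math.log(p) + n_nonevent * math.log(1.0 - p))


# 71 non-defaulters (event = 1) and 538 defaulters, from the classification table.
deviance_null = null_deviance(71, 538)
deviance_fitted = 391.761  # -2 log likelihood of the fitted model (Table 11.10)

model_chi_square = deviance_null - deviance_fitted
print(round(deviance_null, 3), round(model_chi_square, 3))
```

The result agrees with the reported chi-square of 46.798 up to rounding.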


11.9 Assessing a Model’s Fit and Predictive Ability

There are several statistics printed by SPSS that can be used to assess model fit. The most important among them are as follows:

(i) The Model Summary table (Table 11.10) reports the Cox and Snell R², a generalized coefficient of determination. The closer the value of R² is to 1, the better the fit of the model. Cox and Snell R² cannot reach a maximum value of 1, so the second statistic, Nagelkerke R², which is rescaled to range from 0 to 1, is a better one to use.

(ii) Observe the Hosmer and Lemeshow test shown in Table 11.11. SPSS computes a Chi square from the observed and expected frequencies in Table 11.12. Large Chi square values (and correspondingly small p-values) indicate a lack of fit for the model. In our example, the Hosmer and Lemeshow Chi square test for the final model yields a p value of 0.092 (Table 11.11), which exceeds 0.05, thus suggesting a model with satisfactory fit. Note that the Hosmer and Lemeshow Chi square test is not a test of the importance of specific model parameters.
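Both fit checks above can be reproduced from the reported output with a short Python sketch. It uses the standard definitions of the Cox and Snell and Nagelkerke R² and the Pearson-type Hosmer and Lemeshow statistic; the function names are illustrative, and the closed-form tail probability is valid only for an even number of degrees of freedom (here 10 groups minus 2, i.e. 8).

```python
import math

N = 609                   # cases in the analysis
MODEL_CHI_SQUARE = 46.798  # omnibus model chi-square
DEVIANCE_FITTED = 391.761  # -2 log likelihood of the fitted model


def cox_snell_r2(chi_square, n):
    return 1.0 - math.exp(-chi_square / n)


def nagelkerke_r2(chi_square, deviance_fitted, n):
    deviance_null = deviance_fitted + chi_square
    max_r2 = 1.0 - math.exp(-deviance_null / n)  # ceiling of Cox and Snell R2
    return cox_snell_r2(chi_square, n) / max_r2


def hosmer_lemeshow(observed, expected):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Observed and expected counts from the Hosmer and Lemeshow contingency table.
obs_def = [62, 59, 60, 57, 51, 55, 54, 57, 46, 37]
exp_def = [60.404, 59.536, 57.839, 57.967, 56.241,
           55.491, 54.302, 51.725, 47.489, 37.006]
obs_non = [0, 3, 1, 5, 10, 6, 7, 4, 15, 20]
exp_non = [1.596, 2.464, 3.161, 4.033, 4.759,
           5.509, 6.698, 9.275, 13.511, 19.994]

r2_cs = cox_snell_r2(MODEL_CHI_SQUARE, N)
r2_n = nagelkerke_r2(MODEL_CHI_SQUARE, DEVIANCE_FITTED, N)
hl = hosmer_lemeshow(obs_def + obs_non, exp_def + exp_non)
hl_p = chi_square_sf_even_df(hl, df=len(obs_def) - 2)

print(round(r2_cs, 3), round(r2_n, 3), round(hl, 3), round(hl_p, 3))
```

Small differences from the SPSS figures are expected because the inputs are rounded to three decimals.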

Table 11.10 Model summary

Step    -2 Log likelihood    Cox & Snell R²    Nagelkerke R²
1       391.761 (a)          0.074             0.144

(a) Estimation terminated at iteration number 6 because parameter estimates changed by less than 0.001

Table 11.11 Hosmer and Lemeshow test

Step    Chi square    df    Sig.
1       13.636        8     0.092

Table 11.12 Contingency table for Hosmer and Lemeshow test

                Repayment behaviour = defaulter    Repayment behaviour = non-defaulter
Step 1 group    Observed    Expected               Observed    Expected                  Total
1               62          60.404                 0           1.596                     62
2               59          59.536                 3           2.464                     62
3               60          57.839                 1           3.161                     61
4               57          57.967                 5           4.033                     62
5               51          56.241                 10          4.759                     61
6               55          55.491                 6           5.509                     61
7               54          54.302                 7           6.698                     61
8               57          51.725                 4           9.275                     61
9               46          47.489                 15          13.511                    61
10              37          37.006                 20          19.994                    57

Table 11.13 Variables in the equation

Step 1 (a)       B         S.E.     Wald      df    Sig.     Exp(B)
age              0.035     0.015    5.194     1     0.023    1.036
Gender(1)        -1.536    0.526    8.511     1     0.004    0.215
Income           0.010     0.004    7.510     1     0.006    1.010
No_Dep(1)        -0.244    0.298    0.672     1     0.412    0.784
Job(1)           -0.681    0.307    4.922     1     0.027    0.506
Education(1)     -0.861    0.331    6.786     1     0.009    0.423
Other_Loan(1)    1.749     0.616    8.053     1     0.005    5.749
Constant         -4.118    1.056    15.195    1     0.000    0.016

(a) Variable(s) entered on step 1: age, Gender, Income, No_Dep, Job, Education, Other_Loan

In Table 11.13, the B values are the binary logit regression estimates, or coefficients, for the parameters in the model. The logistic regression model expresses the log odds of a positive response (probability modelled for Non-Defaulter = 1) as a linear combination of the predictor variables. This is written as follows:

\[
\begin{aligned}
\mathrm{LOGIT}_i = \ln\!\left(\frac{\mathrm{Prob}_{\text{Non-defaulter}}}{1-\mathrm{Prob}_{\text{Non-defaulter}}}\right)
={}& -4.118 + 0.035 \times \text{Age} - 1.536 \times \text{Gender} + 0.010 \times \text{Income} \\
& - 0.244 \times \text{Number of Dependents} - 0.681 \times \text{Job} \\
& - 0.861 \times \text{Education} + 1.749 \times \text{OtherLoan}
\end{aligned}
\]
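To see how the fitted equation turns into a prediction, the Python sketch below evaluates the logit for one applicant and converts it to a probability with the inverse logit. The applicant's values are hypothetical (the units of Income in the data file are not stated here), and the dictionary keys simply mirror the row labels of the coefficient table.

```python
import math

# Coefficients (B column) of the fitted model.
COEFFICIENTS = {
    "Constant": -4.118,
    "age": 0.035,
    "Gender(1)": -1.536,
    "Income": 0.010,
    "No_Dep(1)": -0.244,
    "Job(1)": -0.681,
    "Education(1)": -0.861,
    "Other_Loan(1)": 1.749,
}


def logit(applicant):
    """Linear predictor: constant plus coefficient times value for each variable."""
    return COEFFICIENTS["Constant"] + sum(
        COEFFICIENTS[name] * value for name, value in applicant.items()
    )


def probability_non_defaulter(applicant):
    """Inverse logit: converts log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-logit(applicant)))


# Hypothetical applicant; indicator variables take the value 0 or 1.
applicant = {
    "age": 50,
    "Gender(1)": 1,
    "Income": 30,
    "No_Dep(1)": 1,
    "Job(1)": 1,
    "Education(1)": 1,
    "Other_Loan(1)": 1,
}

z = logit(applicant)
p = probability_non_defaulter(applicant)
predicted_group = 1 if p > 0.5 else 0  # same 0.5 cut-off as the CUT(0.5) criterion

print(round(z, 3), round(p, 4), predicted_group)
```

For this hypothetical applicant the probability of being a non-defaulter falls below the 0.5 cut-off, so the predicted group is defaulter (0).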

SPSS reports both the logistic coefficients and the exponentiated logistic coefficients. According to Hair et al. (2010), the original logistic coefficients are most appropriate for determining the direction of the relationship and less useful in determining its magnitude. Exponentiated coefficients directly reflect the magnitude of the change in the odds value. Because they are exponents, they are interpreted slightly differently: exponentiated coefficients less than 1.0 reflect negative relationships, while values above 1.0 denote positive relationships.

Age: This is the estimated logistic regression coefficient for the variable age, given that the other variables in the model are held constant. For a one-unit increase in age, the log-odds of being a non-defaulter rather than a defaulter are expected to be 0.035 units higher, while holding the other variables constant. The exponentiated coefficient for age is 1.036. For assessing magnitude, the easier approach is to convert this value into a percentage change in odds:

Percentage change in odds = (Exponentiated coefficient - 1.0) * 100
= (1.036 - 1) * 100 = 3.6 %

which means that, with an exponentiated coefficient of 1.036, a one-unit change in the independent variable will increase the odds by 3.6 %.
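The same conversion can be applied to every exponentiated coefficient at once. The Python sketch below takes the Exp(B) values reported for the fitted model and applies the percentage-change formula; the dictionary and function names are illustrative.

```python
# Exp(B) values reported for the fitted model.
EXP_B = {
    "age": 1.036,
    "Gender(1)": 0.215,
    "Income": 1.010,
    "No_Dep(1)": 0.784,
    "Job(1)": 0.506,
    "Education(1)": 0.423,
    "Other_Loan(1)": 5.749,
}


def pct_change_in_odds(exp_b):
    """Percentage change in odds for a one-unit change in the predictor."""
    return (exp_b - 1.0) * 100.0


pct = {name: pct_change_in_odds(value) for name, value in EXP_B.items()}

for name, change in pct.items():
    direction = "increase" if change > 0 else "decrease"
    print(f"{name}: {abs(change):.1f} % {direction} in odds")
```

Values of Exp(B) below 1.0 produce a negative percentage, that is, a decrease in the odds of being a non-defaulter.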

Gender (1): This is a dichotomous independent variable, coded male = 1 and female = 0 (Table 11.6), so the female group serves as the reference category. The value we estimated is the estimated logistic regression coefficient for a one-unit change in gender, given that the other variables in the model are held constant. The logit

## S. Sreejesh, Sanjay Mohapatra, M. R. Anusree, Business Research Methods: An Applied Orientation, Springer International Publishing (2014)
