
mechanism to exploit the power of SAS and publish dynamic results throughout the organization through a point-and-click interface. SAS Enterprise Miner (Figure 1.6) is a powerful data mining tool for applying advanced modeling techniques to large volumes of data in order to achieve a greater understanding of the underlying data. SAS Model Manager (Figure 1.7) is a tool that encompasses the steps of creating, managing, deploying, monitoring, and operationalizing analytic models, ensuring that the best model is in production at the right time.

Typically, analysts utilize a variety of tools to develop and refine models and to visualize data. Through a step-by-step approach, we can identify which tool from the SAS toolbox is best suited for each task a modeler will encounter.

Figure 1.5: Enterprise Guide Interface




Figure 1.6: Enterprise Miner Interface




Figure 1.7: Model Manager Interface






1.5 Chapter Summary

This introductory chapter has explored the key concepts that comprise credit risk modeling and how these concepts impact financial institutions through the regulatory environment. We have also looked at how regulations have evolved over time to better account for global risks and to fundamentally prevent financial institutions from overexposing themselves to adverse market factors. To summarize, Basel defines how financial institutions calculate:

• Expected Loss (EL) - the mean loss over 12 months
• Unexpected Loss (UL) - the difference between the Expected Loss and a 1 in 1000 chance level of loss
• Risk-Weighted Assets (RWA) - the assets of the financial institution (money lent out to customers and businesses) weighted according to their riskiness
• How much Capital financial institutions hold to cover these losses



Three key parameters underpin the calculation of expected loss and risk-weighted assets (a short worked example follows the list):

• Probability of Default (PD) - the likelihood that a loan will not be repaid and will therefore fall into default in the next 12 months
• Loss Given Default (LGD) - the estimated economic loss, expressed as a percentage of exposure, that will be incurred if an obligor goes into default; in other words, LGD = 1 minus the recovery rate
• Exposure At Default (EAD) - a measure of the monetary exposure should an obligor go into default
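These three parameters combine multiplicatively to give the expected loss on an exposure: EL = PD x LGD x EAD. As a minimal sketch (it is not one of the book's own examples, and the data set and variable names are invented), the following SAS DATA step computes EL for a few hypothetical accounts:

data work.el_example;
   /* hypothetical accounts: read PD, LGD, and EAD, then compute EL */
   input account_id $ pd lgd ead;
   el = pd * lgd * ead;   /* expected loss = PD x LGD x EAD */
   datalines;
A001 0.02 0.45 10000
A002 0.05 0.60 25000
A003 0.01 0.35  5000
;

proc print data=work.el_example noobs;
run;

For account A001, for example, this gives EL = 0.02 x 0.45 x 10,000 = 90 in monetary units.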



The purpose of these regulatory requirements is to strengthen the stability of the banking system by ensuring

adequate provisions for loss are made.

We have also outlined the SAS technology that will be used, through a step-by-step approach, to turn the theoretical information given here into practical examples.

In order for financial institutions to estimate the three key parameters that underpin the calculation of EL and RWA, they must begin by utilizing the correct data. Chapter 2 covers the area of sampling and data pre-processing. In it, issues such as variable selection, missing values, and outlier detection are defined and contextualized within the area of credit risk modeling. Practical applications of how these issues can be solved are also given.



1.6 References and Further Reading

Basel Committee on Banking Supervision. 2001a. The New Basel Capital Accord. January. Available at: http://www.bis.org/publ/bcbsca03.pdf.

Basel Committee on Banking Supervision. 2004. International Convergence of Capital Measurement and Capital Standards: A Revised Framework. Bank for International Settlements.

SAS Institute. 2002. "Comply and Exceed: Credit Risk Management for Basel II and Beyond." A SAS White Paper.

Schuermann, T. 2004. "What Do We Know About Loss Given Default?" Working Paper No. 04-01, Wharton Financial Institutions Center, February.






Chapter 2 Sampling and Data Pre-Processing

2.1 Introduction.........................................................................................................13

2.2 Sampling and Variable Selection .........................................................................16

2.2.1 Sampling............................................................................................................................. 17

2.2.2 Variable Selection.............................................................................................................. 18

2.3 Missing Values and Outlier Treatment.................................................................19

2.3.1 Missing Values ................................................................................................................... 19

2.3.2 Outlier Detection ............................................................................................................... 21

2.4 Data Segmentation ..............................................................................................22

2.4.1 Decision Trees for Segmentation .................................................................................... 23

2.4.2 K-Means Clustering .......................................................................................................... 24

2.5 Chapter Summary................................................................................................25

2.6 References and Further Reading .........................................................................25



2.1 Introduction

Data is the key that unlocks the creation of robust and accurate models, providing financial institutions with valuable insight to fully understand the risks they face. However, data is often inadequate on its own and needs to be cleaned, polished, and molded into a much richer form. To achieve this, sampling and data pre-processing techniques can be applied to give the most accurate and informative insight possible.

There is an often used expression that 80% of a modeler’s effort is spent in the data preparation phase, leaving

only 20% for the model development. We would tend to agree with this statement; however, developing a clear,

concise and logical data pre-processing strategy at the start of a project can drastically reduce this time for

subsequent projects. Once an analyst knows when and where techniques should be used and the pitfalls to be

aware of, their time can be spent on the development of better models that will be beneficial to the business.

This chapter aims to provide analysts with this knowledge to become more effective and efficient in the data

pre-processing phase by answering questions such as:

• Why are sampling and data pre-processing so important?
• What types of pre-processing are required for credit risk modeling?
• How are these techniques applied in practice?



The fundamental motivation behind the need for data cleansing and pre-processing is that data is not always in a clean, fit state for use. Often data is dirty or "noisy"; for example, a customer's age might be incorrectly recorded as 200 or their gender encoded as missing. This could purely be the result of the data collection process, where human input error can prevail, but it is important to understand these inaccuracies in order to accurately understand and profile customers. Other examples include:

• Inconsistent data where proxy missing values are used; for example, -999 is used to denote a missing value in one data feed, whereas 8888 is used in another data feed.
• Duplication of data; this often occurs where disparate data sources are collated and merged, giving an unclear picture of the current state of the data.
• Missing values and extreme outliers; these can be treated, removed, or used as they are in the modeling process. For example, some techniques, such as decision trees (Figure 2.1), can cope with missing values and extreme outliers more effectively than others. Logistic regression cannot handle missing values without excluding observations or applying imputation. (This concept will be discussed in more detail later in the chapter; a minimal pre-processing sketch follows this list.)
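To make these examples concrete, the following is a minimal sketch, using an invented data set (work.loans_raw) and invented variable names rather than the book's own code, of how proxy missing-value codes, duplicate records, and remaining missing values might be handled in SAS before fitting a logistic regression:

/* Hypothetical example: map proxy missing-value codes from different
   feeds to the SAS missing value, then drop duplicate records.        */
data work.loans_clean;
   set work.loans_raw;
   if income in (-999, 8888) then income = .;
run;

proc sort data=work.loans_clean nodupkey;
   by account_id;   /* keep one record per account */
run;

/* Replace remaining missing numeric values with the median; REPONLY
   leaves non-missing values untouched. This avoids logistic regression
   discarding observations with missing predictors.                     */
proc stdize data=work.loans_clean out=work.loans_imputed
            reponly method=median;
   var income current_balance;
run;

proc logistic data=work.loans_imputed;
   model default_flag(event='1') = income current_balance;
run;

Median imputation is only one of several options; the treatment of missing values and outliers is discussed in detail in Sections 2.3.1 and 2.3.2.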



Figure 2.1: Handling Missing Values in a Decision Tree



A well-worn term in the field of data modeling is "garbage in, garbage out": if the data coming into your model is incorrect, inconsistent, and dirty, then an inaccurate model will result no matter how much time is spent on the modeling phase. It is also worth noting that this is by no means an easy process; as mentioned above, data pre-processing tends to be time consuming. The rule of thumb is that 80% of your time will be spent preparing the data and only 20% actually building accurate models.

Data values can also come in a variety of forms. The types of variables typically utilized within a credit risk model build fall into two distinct categories: Interval and Discrete.

Interval variables (also termed continuous) are variables that can typically take any numeric value from −∞ to ∞. Examples of interval variables are any monetary amount, such as current balance, income, or amount outstanding. Discrete variables can be both numeric and non-numeric but contain distinct, separate values that are not continuous. Discrete variables can be further split into three categories: Nominal, Ordinal, and Binary. Nominal variables contain no order between the values, such as marital status (Married, Single, Divorced, etc.) or gender (Male, Female, and Unknown). Ordinal variables share the same properties as Nominal variables; however, there is a ranked ordering or hierarchy between the values, for example rating grades (AAA, AA, A, ...). Binary variables contain two distinct categories of data, for example, whether a customer has defaulted (bad category) or not defaulted (good category) on a loan.
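As a small, hypothetical illustration of these variable types (the data set and variable names below are invented for this purpose, not taken from the book), the DATA step creates one variable of each kind and then summarizes the interval variable with PROC MEANS and the discrete variables with PROC FREQ:

data work.var_types;
   /* income: interval; marital_status: nominal;
      rating_grade: ordinal; default_flag: binary */
   input income marital_status $ rating_grade $ default_flag;
   datalines;
32000 Single   AA  0
54000 Married  A   0
21000 Single   BBB 1
47000 Divorced AA  0
;

proc means data=work.var_types n mean min max;
   var income;            /* interval: summarize numerically */
run;

proc freq data=work.var_types;
   tables marital_status rating_grade default_flag;   /* discrete: count levels */
run;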




When preparing data for use with SAS Enterprise Miner, one must first identify how the data will be treated.

Figure 2.2 shows how the data is divided into categories.

Figure 2.2: Enterprise Miner Data Source Wizard



The Enterprise Miner Data Source Wizard automatically assigns estimated measurement levels to the data being brought into the workspace. These assignments should then be explored to determine whether the correct levels have been assigned to the variables of interest. Figure 2.3 shows how you can explore the variable distributions.
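A quick cross-check of the levels suggested by the wizard, sketched here with an invented data set name rather than the book's own example, is to count the distinct values of each variable in code; numeric variables with many distinct values are usually interval, while variables with only a handful of values are usually nominal, ordinal, or binary:

/* Count the distinct levels of every variable. NLEVELS prints the
   "Number of Variable Levels" table; NOPRINT suppresses the
   individual frequency tables.                                    */
proc freq data=work.loans nlevels;
   tables _all_ / noprint;
run;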


