Predictive modeling techniques are used in many scenarios. Commonly used statistical and machine learning techniques include logistic regression, decision trees, neural networks, and multiple regression.
Logistic regression can be used when the target variable is binary (dichotomous). Ordinal and nominal target variables can also be modeled with the relevant form of logistic regression, but this discussion focuses on the case where the target (dependent) variable is binary, i.e. takes two values. A common application in the credit card business is the approval or rejection of a credit card application.
When logistic regression is used to build a predictive model, the independent (explanatory) variables are commonly transformed using Weight of Evidence (WOE). WOE is a statistical concept that allows the values of an independent variable to be grouped based on the similarity of their target variable distribution.
Two important steps in transforming the independent variables are Fine Classing and Coarse Classing. Each independent variable is first discretized into a number of classes based on its values and their frequency; this is Fine Classing. WOE is then calculated for each of these fine classes, and classes with similar WOE are grouped together. This grouping of the fine classes into a limited number of classes is called Coarse Classing.
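Fine classing by frequency can be illustrated with a short Python sketch. Python is used purely for illustration here. The function name `fine_classes` and the sample balances are hypothetical, and keeping tied values in the same class is an assumption of this sketch rather than a rule stated above.

```python
def fine_classes(values, n_groups):
    """Assign each value to one of n_groups roughly equal-frequency classes.

    Returns a list of class indices (0 .. n_groups - 1) aligned with `values`.
    Tied values always land in the same class (an assumption of this sketch).
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    n = len(values)
    # Indices of the observations, ordered by their value.
    order = sorted(range(n), key=lambda idx: values[idx])
    groups = [0] * n
    pos = 0
    while pos < n:
        # Find the run of tied values that starts at this sorted position.
        end = pos
        while end + 1 < n and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # The whole run gets the class of its first sorted position.
        group = pos * n_groups // n
        for k in range(pos, end + 1):
            groups[order[k]] = group
        pos = end + 1
    return groups


# Hypothetical account balances, split into 5 fine classes of 2 values each.
balances = [120, 5400, 300, 980, 15000, 2500, 60, 7200, 410, 3300]
print(fine_classes(balances, 5))  # [0, 3, 1, 2, 4, 2, 0, 4, 1, 3]
```

With heavily tied data the classes can end up uneven or empty, which is one reason fine classes are usually reviewed before coarse classing.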
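$$\mathrm{WOE}_{ji} = \ln\left(\frac{\%\mathrm{Good}_i}{\%\mathrm{Bad}_i}\right)$$
where
$i$ is a fine class of independent variable $j$
$\%\mathrm{Good}_i$ is the percentage of all “Good” observations that fall in class $i$
$\%\mathrm{Bad}_i$ is the percentage of all “Bad” observations that fall in class $i$
The Information Value (IV) of a variable is the sum, over all its groups/fine classes, of the WOE of each class weighted by the difference between the two percentages (expressed as fractions):
$$\mathrm{IV}_j = \sum_i \left(\%\mathrm{Good}_i - \%\mathrm{Bad}_i\right)\,\mathrm{WOE}_{ji}$$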
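A small worked example may make the calculation concrete. The Python sketch below computes WOE per class as ln(%Good / %Bad) and IV with the standard definition, the sum over classes of (%Good − %Bad) × WOE, with percentages expressed as fractions. The function name `woe_iv` and the counts are hypothetical, and raising an error for a class with no Good or no Bad is a choice made for this sketch only.

```python
import math


def woe_iv(good_counts, bad_counts):
    """Return (WOE per class, IV) from per-class counts of Good and Bad."""
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    woes = []
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            # ln(0) or division by zero: WOE is undefined for this class.
            raise ValueError("each class needs at least one Good and one Bad")
        pct_good = good / total_good  # share of all Good in this class
        pct_bad = bad / total_bad     # share of all Bad in this class
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv


# Hypothetical counts for three fine classes of one variable.
good = [100, 300, 600]
bad = [60, 30, 10]
woes, iv = woe_iv(good, bad)
for i, w in enumerate(woes):
    print(f"class {i}: WOE = {w:.3f}")
print(f"IV = {iv:.3f}")
```

In this example the first class holds 10% of the Good and 60% of the Bad, so its WOE is ln(0.1/0.6), which is negative. The middle class holds equal shares of both, so its WOE is 0 and it adds nothing to the IV.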
The attached SAS macro creates a specified number of groups for each input numeric variable and calculates WOE for each group. It also calculates the Information Value (IV) of each input independent variable.
The macro helps analysts create fine classes and calculate WOE through the following steps:
- Check that the target variable is binary and takes the values 1 and 0
- For each input independent variable, create the specified number of groups (SAS PROC RANK)
- Summarize Good (target variable value = 1) and Bad (target variable value = 0) counts for each rank or group of each variable
- Calculate Weight of Evidence (WOE) and Information Value (IV)
- Consolidate WOE and IV for all the variables
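The steps above can be sketched end to end in Python. This is an illustrative sketch and not the SAS macro itself: the names `woe_table` and `woe_iv_report` and the sample data are hypothetical, the grouping is a simple split by sorted position rather than a reproduction of PROC RANK, and recording an undefined WOE as `None` is an assumption of the sketch.

```python
import math
from collections import defaultdict


def woe_table(rows, target, variable, n_groups):
    """WOE per fine class and IV for one numeric variable."""
    # Step 1: the target must be binary, coded 1 (Good) and 0 (Bad).
    if {row[target] for row in rows} != {0, 1}:
        raise ValueError("target must take exactly the values 0 and 1")
    # Step 2: equal-frequency groups by sorted position
    # (simplification: tied values may be split across groups).
    order = sorted(range(len(rows)), key=lambda i: rows[i][variable])
    # Step 3: count Good and Bad per group.
    counts = defaultdict(lambda: [0, 0])  # group -> [good, bad]
    for pos, i in enumerate(order):
        group = pos * n_groups // len(rows)
        counts[group][0 if rows[i][target] == 1 else 1] += 1
    total_good = sum(g for g, _ in counts.values())
    total_bad = sum(b for _, b in counts.values())
    # Step 4: WOE per group and IV for the variable.
    table, iv = [], 0.0
    for group in sorted(counts):
        good, bad = counts[group]
        if good == 0 or bad == 0:
            woe = None  # undefined; the group is left out of the IV here
        else:
            pct_good, pct_bad = good / total_good, bad / total_bad
            woe = math.log(pct_good / pct_bad)
            iv += (pct_good - pct_bad) * woe
        table.append({"variable": variable, "group": group,
                      "good": good, "bad": bad, "woe": woe})
    return table, iv


def woe_iv_report(rows, target, variables, n_groups):
    """Step 5: consolidate WOE rows and IV values for all variables."""
    all_rows, iv_by_var = [], {}
    for variable in variables:
        table, iv = woe_table(rows, target, variable, n_groups)
        all_rows.extend(table)
        iv_by_var[variable] = iv
    return all_rows, iv_by_var


# Hypothetical customer-level data: (income, age, good).
data = [(10, 25, 0), (20, 40, 0), (30, 33, 1), (40, 52, 0),
        (50, 29, 1), (60, 61, 0), (70, 45, 1), (80, 38, 0),
        (90, 57, 1), (100, 31, 1), (110, 48, 0), (120, 36, 1)]
rows = [{"income": inc, "age": age, "good": g} for inc, age, g in data]
report, iv_by_var = woe_iv_report(rows, "good", ["income", "age"], 3)
for line in report:
    print(line)
print(iv_by_var)
```

Each variable here gets three groups of four customers. A group with no Good or no Bad has no defined WOE, and how to treat it (for example by merging it with a neighbouring group) is left to the analyst.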
Assumptions and known limitations:
– The dependent (target) variable value 1 is “Good” and 0 is “Bad”
– Only numeric independent (explanatory) variables are considered
– Fine classes are created based on the frequency of the variable values
– The data is at the right level, e.g. if the model is built at customer level, the dataset is at customer level
– Input variable names longer than 25 characters may cause problems
– An appropriate SAS version is available