In many decision-making scenarios, decision makers look for ways to separate one set of items, customers or transactions from another. A few examples where a decision tree can be used to separate the intended objects, customers or transactions are:
- Cross Sell Manager: Identify the customers who are likely to take up a product
- Fraud Manager: Segregate the transactions which could be fraudulent
- Credit Card Risk Officer: Reject the credit card applications which are more likely to default
- Health Claim Approver: Approve the claims which are non-fraudulent
- Maintenance Manager: Identify products or spare parts which require preventive maintenance
A decision tree is an objective segmentation technique. Objective segmentation has a mechanism for defining the segments, typically a target variable against which the segments being created are evaluated. The segments are defined using input variables or attributes that capture information about the objects, customers or transactions.
For example, based on historical data on fraudulent transactions, a Fraud Manager wants to segment transactions into fraudulent and non-fraudulent. To do so, the Fraud Manager looks for transaction attributes to segment on, such as the source and location of a transaction, and for conditions on those attributes, such as an amount above or below $10,000. Once the Fraud Manager has rules that segment the transactions with the required confidence level, the rules are applied to future transactions to classify them as fraudulent or non-fraudulent.
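The kind of rule the Fraud Manager ends up with can be pictured as nested conditions on transaction attributes. The sketch below is purely illustrative: the field names (`amount`, `source`, `location`), their values and the way the conditions combine are hypothetical and not derived from any real data. Only the $10,000 threshold comes from the example above.

```python
# Hypothetical segmentation rule: every field name, value and branch here is an
# illustrative assumption, not a rule learned from real fraud data.

def classify_transaction(txn):
    """Return 'fraudulent' or 'non-fraudulent' for one transaction dict."""
    if txn["amount"] > 10_000:
        # High-value branch: look at where the transaction came from.
        if txn["source"] == "online" and txn["location"] == "overseas":
            return "fraudulent"
        return "non-fraudulent"
    # Low-value transactions fall into the non-fraudulent segment in this sketch.
    return "non-fraudulent"


# Applying the rule to (made-up) future transactions.
future_transactions = [
    {"amount": 12_500, "source": "online", "location": "overseas"},
    {"amount": 12_500, "source": "branch", "location": "domestic"},
    {"amount": 300, "source": "online", "location": "overseas"},
]
labels = [classify_transaction(t) for t in future_transactions]
print(labels)
```

A decision tree algorithm would derive the attributes, thresholds and nesting from historical data rather than have them written by hand as they are here.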
The decision tree technique helps in developing rules to segment items or objects, which could be transactions, accounts or customers. The tree-building process segments the data based on the available input variables, also called independent variables or predictors.
An illustrative example has as its target variable whether or not to play golf under certain environmental conditions. The predictor variables are Outlook, Temperature, Humidity and Windy (whether it is windy). A decision tree helps identify the rules or conditions under which to play golf.
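As a sketch of what such rules can look like, the block below hard-codes the 14-row play-golf table that is widely reproduced in textbook treatments of this example. That table is an assumption here and may differ from the data the author had in mind. It is paired with one hand-written rule set that is consistent with every row; the function name `play_golf` is hypothetical.

```python
# Assumed data: the widely reproduced 14-row play-golf table.
# Columns: outlook, temperature, humidity, windy, play
ROWS = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]


def play_golf(outlook, temperature, humidity, windy):
    """Hand-written rules; temperature is not needed to separate these rows."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # Remaining case: outlook == "Rainy"
    return "No" if windy else "Yes"


# Count how many rows the rule set reproduces.
correct = sum(play_golf(*row[:4]) == row[4] for row in ROWS)
print(f"{correct} of {len(ROWS)} rows classified correctly")
```

Each `if` corresponds to a branch of a tree, and each `return` to a leaf.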
The golf example illustrates only a classification decision tree. A decision tree can also be used for regression problems (for example, identifying rules to estimate a loan amount, such as which customers are interested in a higher loan amount) and for multi-class decision problems (for example, developing rules to identify who is interested in Silver, Platinum or Titanium credit cards).
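One way to see the difference between these problem types is at the leaves of the tree. A classification tree typically predicts the majority class of the training records that reach a leaf, while a regression tree typically predicts their mean. A minimal sketch follows; the leaf contents are made up and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most frequent class among the records in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the average target value of the records in the leaf."""
    return mean(values)


# Hypothetical records that ended up in one leaf of each kind of tree.
card_labels = ["Silver", "Platinum", "Silver", "Silver"]
loan_amounts = [20_000, 25_000, 30_000]

print(classification_leaf(card_labels))
print(regression_leaf(loan_amounts))
```

The same majority-vote leaf works whether there are two classes or several, which is why the credit card example needs no separate mechanism.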
Terminology Used in Decision Tree
Types of Decision Trees: There are two main types of decision tree, classification trees and regression trees.
Decision Tree Algorithms: Different algorithms have been developed to build decision trees. Some of the most commonly used are CART, CHAID, ID3 and MARS.
Similarity Measures: A number of similarity measures have been proposed. These measures are used to decide how to split the data so that similar items or objects are grouped together:
- Entropy: The ID3 algorithm uses entropy to measure the homogeneity of a sample. Entropy is zero when the sample is completely homogeneous.
- Chi-square Statistic: CHAID uses the chi-square p-value to select an attribute and split the data table
- Information Gain: Information gain is the decrease in entropy after a dataset is split on an attribute
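The three measures above can be sketched in a few lines of plain Python. The counts used below are an assumption: they are the [yes, no] tallies per segment from the commonly cited 14-row play-golf table (9 yes and 5 no overall), not data given in this article. The function names are hypothetical, and the chi-square function computes only the Pearson statistic, not the p-value that CHAID works with.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent_counts, child_counts):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - weighted


def chi_square(table):
    """Pearson chi-square statistic for a segments-by-classes count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Assumed [yes, no] counts, taken from the commonly cited play-golf table.
parent = [9, 5]
by_outlook = [[2, 3], [4, 0], [3, 2]]  # Sunny, Overcast, Rainy
by_windy = [[6, 2], [3, 3]]            # not windy, windy

gain_outlook = information_gain(parent, by_outlook)
gain_windy = information_gain(parent, by_windy)
chi2_outlook = chi_square(by_outlook)
print(round(entropy(parent), 3), round(gain_outlook, 3), round(gain_windy, 3))
print(round(chi2_outlook, 3))
```

With these counts, splitting on Outlook gives a larger information gain than splitting on Windy, so an entropy-based algorithm would prefer Outlook for the first split. Turning the chi-square statistic into a p-value requires the chi-square distribution, which is left out here; SciPy's `scipy.stats.chi2_contingency` is one existing routine that returns both.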