The aim is to illustrate the activities that typically happen in the data analysis life cycle. Of course, there will be real-life examples where some of the life cycle stages do not happen, but it is important to follow a structured approach to data analytics.
An analysis starts with a business objective or problem statement. For example, the business problem could be that the average banking product holding of customers is very low (a retail banking scenario), or that a retailer wants to launch a promotion campaign on television. In a few cases, the analytics team can also proactively form a list of hypotheses and develop insights for management to act on. Once the overall business problem is defined, it is converted into an analytical problem. In the retail promotion campaign example, one of the important questions could be which customer segments to target. Once the target segment is defined, the retailer can decide the TV channel and timing of the campaign. So, assume that building the customer segmentation is the analytical problem.
- Data Manipulation
Once the business problem is defined, the next stage is data manipulation. Data manipulation involves:
- Extraction: Pulling data from different systems
- Transformation: Aggregation of the transactions and activities at a particular level
- Descriptive Analysis and Visualization: Understanding the values and distribution of the variables is an important step in data analysis. The analyst looks at the minimum, maximum, average and variance of the continuous variables; a box plot could be used. A frequency plot is also relevant for categorical variables.
- Treatment: The input variables may have missing or outlier values. A scatter plot may help to see the distribution of variable values and to spot outliers.
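The descriptive checks listed above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed method: the variable names and sample values are hypothetical, and the 1.5 x IQR rule (the usual box-plot whisker convention) is an assumed choice for flagging outliers.

```python
import statistics
from collections import Counter

def describe(values):
    """Return min, max, mean and sample variance of a continuous variable."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed box-plot style rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical monthly spend per customer (illustrative values only).
monthly_spend = [120, 135, 150, 160, 110, 145, 155, 900]
# Hypothetical categorical variable: preferred channel per customer.
channel = ["branch", "online", "online", "mobile",
           "online", "branch", "mobile", "online"]

summary = describe(monthly_spend)      # min / max / mean / variance
outliers = iqr_outliers(monthly_spend)  # values beyond the whiskers
frequencies = Counter(channel)          # counts behind a frequency plot

print(summary)
print("outliers:", outliers)
print("frequencies:", dict(frequencies))
```

How a flagged value is then treated (capped, removed, or kept) depends on the business context and is left open here.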
For the above customer segmentation example, we may need to pull transactions, payments, channel interactions and customer demographic data. Since the segmentation is to be built at a customer level, the transaction or channel data needs to be aggregated at a customer level. Some customers may not have made any transactions; hence their variable values are missing. Such variables could be treated by imputing zero or another suitable value.
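As a rough sketch of the aggregation and missing-value treatment just described, the snippet below rolls transaction rows up to one row per customer and fills in zero for customers with no transactions. The record layout and field names (`customer_id`, `amount`, `txn_count`, `txn_amount`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems (illustrative records only).
customers = [
    {"customer_id": "C1", "age": 34},
    {"customer_id": "C2", "age": 51},
    {"customer_id": "C3", "age": 27},  # no transactions on record
]
transactions = [
    {"customer_id": "C1", "amount": 40.0},
    {"customer_id": "C1", "amount": 60.0},
    {"customer_id": "C2", "amount": 25.0},
]

def aggregate_by_customer(transactions):
    """Roll transaction rows up to one summary row per customer."""
    totals = defaultdict(lambda: {"txn_count": 0, "txn_amount": 0.0})
    for txn in transactions:
        row = totals[txn["customer_id"]]
        row["txn_count"] += 1
        row["txn_amount"] += txn["amount"]
    return totals

def build_customer_table(customers, transactions):
    """Join aggregates onto the customer list, treating missing values as zero."""
    totals = aggregate_by_customer(transactions)
    table = []
    for cust in customers:
        # Customers without transactions get zero-valued aggregates.
        agg = totals.get(cust["customer_id"], {"txn_count": 0, "txn_amount": 0.0})
        table.append({**cust, **agg})
    return table

customer_table = build_customer_table(customers, transactions)
for row in customer_table:
    print(row)
```

Zero is a natural fill value for counts and amounts; for other variables (for example, age) a different treatment would usually be more appropriate.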
- Data Analysis and Modeling
Data Analysis involves multiple steps. Typically, the first analysis is Exploratory Data Analysis (EDA). EDA helps in understanding the data trends and patterns.
After EDA, a relevant statistical technique is selected based on the business problem. Using the relevant technique, the statistical model is built or the analysis is completed. The model or analysis insights are then validated, typically on a hold-out data set that was not used to build the model. The exact steps and sequence may differ depending on the type of analysis and the techniques used.
In summary, the data analysis and modeling steps are:
- Exploratory Data Analysis (EDA)
- Statistical Technique Selection
- Model Building or Analysis
- Validation of Results
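To make the model-building and validation steps above concrete for the segmentation example, here is a minimal k-means sketch in plain Python. Note the assumptions: the text does not name a specific technique, so k-means is used only as a common stand-in; the two features (transaction count, average amount) and the sample points are hypothetical; and the within-cluster sum of squares is just one simple way to check how tight the resulting segments are.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    # Naive deterministic initialisation from the first k points (an assumption;
    # real projects usually use random or k-means++ initialisation).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )

# Hypothetical customer-level features: (transaction count, average amount).
points = [(2, 20), (10, 200), (3, 25), (12, 220), (1, 15), (11, 210)]

labels, centroids = kmeans(points, k=2)
wcss = within_cluster_ss(points, labels, centroids)

print("segments:", labels)
print("centroids:", centroids)
print("within-cluster SS:", wcss)
```

In practice the features would usually be scaled before clustering so that no single variable dominates the distance, and the number of segments would be chosen by comparing results for several values of k.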
For customer segmentation, the following approach can be followed.
[Figure: Customer Segmentation Approach]
- Action on Insights or Deployment
Analysis output is typically used in two ways: informing decision makers or deployment in a system. Sometimes an analysis is carried out so that decision makers understand and are aware of customer behaviors or business performance. For example, what is the profile of the customers who responded to a particular campaign? Or what is the risk profile of the customers acquired through online channels? In these cases, no mathematical or statistical relationship is modeled between the target objective and the input variables.
In another example, predicting customer churn, we may build a statistical model that can be deployed in the system, so that customers who are at risk of closing their relationship (attrition or churn) are identified at regular intervals.
In the customer segmentation example, the relevant customer segment is identified. Based on the channel or serial the target segment is likely to watch, the appropriate advertisement campaign is designed and the promotion schedule is selected.
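A deployed churn model of this kind might look roughly like the sketch below, which scores a batch of customers and flags those above a cut-off. Everything specific here is assumed for illustration: the logistic form, the coefficient values, the input variables (`months_inactive`, `complaints`) and the 0.5 threshold are hypothetical and do not come from a fitted model.

```python
import math

# Hypothetical coefficients of an already-fitted logistic regression
# (assumed values, for illustration only).
INTERCEPT = -2.0
COEFFICIENTS = {"months_inactive": 0.8, "complaints": 0.5}
THRESHOLD = 0.5  # assumed cut-off for "at risk"

def churn_probability(customer):
    """Logistic score: 1 / (1 + exp(-(intercept + sum(coef * value))))."""
    score = INTERCEPT + sum(
        coef * customer[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-score))

def flag_at_risk(customers, threshold=THRESHOLD):
    """Return the ids of customers whose churn probability meets the threshold."""
    return [
        c["customer_id"] for c in customers
        if churn_probability(c) >= threshold
    ]

# Hypothetical batch, as it might be scored at a regular interval.
batch = [
    {"customer_id": "C1", "months_inactive": 0, "complaints": 0},
    {"customer_id": "C2", "months_inactive": 4, "complaints": 1},
    {"customer_id": "C3", "months_inactive": 1, "complaints": 1},
]

at_risk = flag_at_risk(batch)
print("at risk:", at_risk)
```

Running the same scoring function on each new batch is what turns the model into a regular identification process rather than a one-off analysis.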
- Learning and Guiding
Once a statistical model or analysis output is deployed, its performance is monitored and analyzed regularly to understand and improve business performance.
In the customer segmentation and promotion scenario, if the impact of the promotion is not significant, an alternate channel or slot could be considered instead of wasting the marketing budget.
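One way to judge whether the impact of a promotion is significant is to compare the purchase rate of customers exposed to the campaign with that of a comparable control group. The sketch below uses a two-proportion z-test for this; the choice of test, the group sizes, the purchase counts and the 5% significance level are all assumptions made for illustration.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical campaign results (illustrative counts only).
exposed_buyers, exposed_total = 120, 1000   # target segment shown the promotion
control_buyers, control_total = 100, 1000   # comparable customers not shown it

z, p_value = two_proportion_z_test(
    exposed_buyers, exposed_total, control_buyers, control_total
)
significant = p_value < 0.05  # assumed significance level

print(f"z = {z:.3f}, p-value = {p_value:.3f}, significant = {significant}")
```

With these hypothetical numbers the uplift is not significant at the assumed level, which is the situation in which an alternate channel or slot would be worth considering.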