Customer Analytics in Financial Services

Customer analytics is the application of statistical, machine learning and data mining techniques to understand customers and grow the business.

Customer analytics is used across industries to acquire the right customers cost-effectively, engage and service existing customers, and retain the profitable customers.

Financial services organizations leverage customer analytics to understand user preferences, build the right products, sell or cross-sell products to prospective and existing customers, and retain the profitable customers.



Customer Analytics

Customer analytics is used by organizations across industry verticals. It can help in acquiring the right customers, growing engagement with existing customers, and retaining the most profitable customers.

Banks and financial institutions use analytics to understand customer behaviour, identify customer needs and preferences, and engage customers.


Health Triggers and Preventive Medication

Our bodies send signals related to some serious diseases well before the situation becomes critical, and quite often we ignore them.

As mentioned in the article “These 4 Things Happen Right Before a Heart Attack”1, the body sends signals, and some of these signals are related to a heart attack.

Data science and analytics can help in linking the triggers that are strongly related to diseases, and educating patients and customers about these signals can save lives. The first key activity is defining the events or triggers. Once the events or triggers are defined, the next important step is to link them with the different medical complications. Statistical techniques and analytical frameworks can help in quantifying the strength of these relationships. Event- or trigger-based insights can help medical practitioners recommend the required investigations or actions to patients and reduce their health risk.

Another angle would be to estimate the time between an event and serious complications.

If insignificant events are used for recommendations, patients may not take the next (possibly serious) recommendation seriously; inaccurate triggers also waste money and time for medical practitioners.

The advantages of identifying the triggers are:

  • Understanding the factors which are linked to a disease, and why
  • Taking preventive medication measures for the clients/patients
  • Recommending lifestyle and other changes for the clients
  • Reducing medical expenses and complications for the clients
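As a rough illustration of quantifying the strength of a trigger-disease relationship, a 2x2 contingency table and an odds ratio can be used. All counts below are invented for the sketch:

```python
# Hypothetical sketch: measuring how strongly a trigger event (e.g. recurring
# chest discomfort) is associated with a later complication, using the odds
# ratio of a 2x2 contingency table. All counts are invented.

def odds_ratio(a, b, c, d):
    """a: trigger & complication, b: trigger & no complication,
    c: no trigger & complication, d: no trigger & no complication."""
    return (a * d) / (b * c)

# Invented counts from a hypothetical patient-history table
a, b, c, d = 40, 160, 20, 780
strength = odds_ratio(a, b, c, d)
print(round(strength, 2))  # an odds ratio well above 1 suggests a strong link
```

In practice, a test such as chi-square would accompany the odds ratio to check whether the association could be due to chance.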



Relevance of Credit Scoring and Analytics

Banks and financial institutions moved away from manual credit authorization and credit approval processes long ago. The reasons for this migration are:

  • Standardisation: Adopting a standard process for all applicants, instead of a local branch manager taking decisions that can carry bias and discrimination.
  • Reduction of Risk: Measuring and reducing risk is one of the important competitive advantages for financial institutions. Centralised decisioning has helped in measuring and managing credit risk not only at the time of acquisition but across the customer life cycle.
  • Technology: Advances in technology and the adoption of data-warehouse-based systems enabled banks and financial services firms to migrate to a centralised and automated credit approval process.
  • Regulations: Basel and other regulations have played a role in banks and financial institutions adopting a prudent approach to credit approval.
  • Data-Driven Intelligence: With regulations, technology and data availability in place, financial institutions have leveraged data-driven intelligence in the credit approval process. This intelligence is called credit scoring, and it is built using statistical and machine learning techniques.

A personal loan, credit card or mortgage application can be rejected for various reasons, but one of the main reasons is a bad credit score.

A person's credit score is calculated from the applicant's personal characteristics (demographic data), payment history and financial condition (such as debt level, number of credit cards, etc.).

An increasing number of banks and financial institutions are employing external data such as social media and web data.

These additional data sources help improve the accuracy of the assessed applicant risk level. For example, job losses at a company could increase the risk level of applicants from that company, and the financial institution can be more prudent in underwriting credit to them. The point to note is that not all applicants from the company will default; this would be only one factor in credit approval.
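As an illustrative sketch of credit scoring (not any institution's actual scorecard), a logistic model can combine such inputs into a default probability and a score. All weights and field names below are invented:

```python
import math

# Toy scorecard sketch: payment-history and debt-level inputs combined with
# assumed (invented) weights. Real scorecards are fit with logistic
# regression or machine learning on historical repayment data.

WEIGHTS = {"late_payments_12m": -0.8, "debt_to_income": -2.5, "years_on_file": 0.15}
INTERCEPT = -2.0  # invented baseline log-odds of default

def default_probability(applicant):
    z = INTERCEPT + sum(WEIGHTS[k] * applicant[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

def credit_score(applicant, base=600, scale=100):
    # Higher score means lower predicted risk of default
    return round(base + scale * (1.0 - default_probability(applicant)))

applicant = {"late_payments_12m": 1, "debt_to_income": 0.4, "years_on_file": 6}
print(default_probability(applicant), credit_score(applicant))
```

A lender would then set an approval cut-off on the score, trading off approval rate against expected losses.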

Trends in Fraud Analytics

Fraud: How big is the challenge?

A quarter of a million people are impacted by fraud each year in the UK alone, and overall losses run into billions.

Fraud Trend UK

Financial institutions have to be vigilant in managing fraud. Fraud has more than a monetary impact: it affects customers' trust in the financial system overall and in a specific financial institution. Considering the vulnerability to online identity theft, financial institutions are taking extra measures to keep customer confidence.

Card Fraud: How is it being tackled?

Two-factor authentication: Two-factor authentication is a two-step security process to verify personal identity. The first factor is a Personal Identification Number (PIN) or password, and the second is a physical device or token such as a card or RSA security token. Banks and credit card providers have implemented two-factor authentication, and it has helped financial organizations manage fraud [2].
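A toy sketch of the two checks (not a real banking implementation; the one-time code here is a stand-in, not an actual RSA SecurID algorithm):

```python
import hashlib
import hmac

# Factor 1: something the user knows (a PIN, stored as a salted slow hash).
# Factor 2: something the user has (a one-time code from a token).
# Both must pass for the login to succeed.

def hash_pin(pin, salt):
    # Salted, iterated hash so the PIN is never stored in clear text
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 100_000)

def verify(pin, token_code, stored_hash, salt, expected_code):
    knows = hmac.compare_digest(hash_pin(pin, salt), stored_hash)
    has = hmac.compare_digest(token_code, expected_code)
    return knows and has  # both factors must pass

salt = b"demo-salt"  # fixed for the sketch; use os.urandom() in practice
stored = hash_pin("4321", salt)
print(verify("4321", "915203", stored, salt, "915203"))  # True
print(verify("0000", "915203", stored, salt, "915203"))  # False
```

`hmac.compare_digest` is used for both comparisons to avoid timing side channels.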


There is also discussion of moving to a three-factor authentication process, which additionally requires biometric information in some form. Of course, the user has to carry an additional security device if the financial institution provides a security token.

Chip & PIN: Chip & PIN technology helps increase the security of card transactions, not only at the point of sale (POS) but also when the card is lost. Because of the chip, it is difficult to create a duplicate of the card compared to a magnetic-stripe card.

When a cardholder uses the card at a point of sale (POS), it asks for the PIN, making the transaction extra secure. Cardholders need to remember their PIN, and not all POS machines are Chip & PIN enabled.

Peer-to-peer fraud-reduction network: Similar to a credit bureau, where all banks share information about customers who have defaulted on an obligation (e.g. a credit card, personal loan or mortgage payment), in a peer-to-peer fraud-reduction network merchants and financial institutions share information such as the IP addresses of fraudsters. RBS and Ethoca signed a data-sharing deal to manage fraud [1].

Case management systems: HSBC has worked with SAS to develop a case management system to manage fraud [4]. Similarly, there are a number of case management systems which screen transactions and decide whether a transaction is fraudulent. Fraud Case Management from Pega, Intelligent Investigation Manager from IBM, Computer-Assisted Subject Examination and Investigation Tool (CASEit®) from PwC, FICO® Falcon® Fraud Manager and Aithent Case Management Solutions are a few examples. These systems require an analytics team to define the rules for flagging a transaction as potentially fraudulent. A fraud analytics and modeling team works on building predictive models using diverse data sources and statistical and machine learning techniques.
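A minimal sketch of the rule layer such systems rely on; the rule names and thresholds below are invented, and real systems combine many more rules with predictive model scores:

```python
# Hypothetical rule layer of a fraud case-management system: each rule can
# flag a transaction, and flagged transactions are queued for investigation.

RULES = [
    ("high_amount", lambda t: t["amount"] > 5000),
    ("foreign_ip", lambda t: t["ip_country"] != t["card_country"]),
    ("rapid_repeat", lambda t: t["txns_last_hour"] >= 5),
]

def screen(transaction):
    # Evaluate every rule; any hit flags the transaction for review
    hits = [name for name, rule in RULES if rule(transaction)]
    return {"flagged": bool(hits), "rules_hit": hits}

txn = {"amount": 7200, "ip_country": "RO", "card_country": "GB", "txns_last_hour": 1}
print(screen(txn))
```

Keeping the rules as data (a list of named predicates) lets the analytics team add or retire rules without changing the screening code.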



Marketing Strategy to grow Mortgage Business

Geo Focused Strategy

The demand for mortgage loans is closely linked to economic and development activity. Development activity varies across states and cities; hence a financial institution should align its mortgage growth in tandem with geo-development. The supply of new mortgages will also vary by geography. Depending on whether the majority of mortgage demand is for new homes or resale homes, the customer acquisition strategy may differ.

The graph below illustrates mortgage price rises across cities in India for the year 2012. Under the assumption that prices and demand are linked (higher mortgage demand leads to higher price appreciation), the demand for mortgages will differ across cities. One caveat: financial institutions will have to look at longer-term mortgage growth to build the right geo-focused strategy for mortgage growth.

 Mortgage Price Changes

Typically, financial institutions collaborate with builders to generate leads for new mortgages.

Mortgage Advisor Led

Over the years, customer channel behavior has undergone a huge change. Many customers are moving to direct channels such as the web and social media. But customers still trust and depend on a financial advisor or mortgage broker for financial product discussions, especially for mortgage loans.

Banks and financial institutions are moving, or will move, to having specialized financial advisors to help their customers make the right decisions. Mortgage brokers and financial advisors who are well equipped to help customers will be differentiators in growing the mortgage book.

Lenders will have to build the right financial advisor incentive programs to reward the right behaviors. Equally important is how effectively data-driven insights help their advisors understand customers and advise them appropriately.

For example, leads for a mortgage loan may be generated from a predictive model or from customer web activity, such as a customer enquiring about a mortgage. The leads are passed on to a financial advisor, who will be able to help the customer appropriately only if they know about the customer: What is the customer's relationship with the financial institution? What are the demographic details?

Digital Platform: Social Media & Web

The share of digital marketing spend is going up. Many digital channels have come into existence in the last few years. Social and web channels are expected to be a key focus for most organizations. Financial institutions have been slow and conservative in moving to digital channels, but they are gradually speeding up.

An increasing number of customers do research online and read reviews before walking into a branch or making decisions, and they also trust peer reviews for their decisions3.

Financial aggregator websites are also important in generating leads. Examples of aggregator websites are listed in the references.

Social media and the web can help financial institutions create leads, including for mortgage products2, 3. Based on the target segment, financial institutions can advertise to or engage prospective customers over social or web channels. For example:


  • Professionals aged between 30 and 40 are the target segment. They have built up funds for a down payment, are settled in their professional careers, and are looking to own their house.
  • Professionals aged between 30 and 40 are more likely to be active on the social media channel “Social Media” and to be members of the groups “Group A” and “Group B”.
  • Formulate a digital media strategy to engage the prospective customers from these groups:
    • Brand engagement: create awareness and positive perceptions of the financial institution
    • Education series around the mortgage market, risks, opportunities, etc.
    • Create advocates
    • Assign financial advisors to these groups to engage and share information


  4. Aggregator Websites

Data Analytics Life Cycle and Role of Big Data

Data analytics and big data are gaining importance and attention. They are expected to create customer value and competitive advantage for the business. We have depicted the data analytics life cycle in detail; considering the focus on big data, this analysis looks at the impact of big data on the data analytics life cycle. Typical analytics projects have the effort and time distribution shown in the column chart below. Of course, various factors influence the time taken across the life cycle stages, such as the complexity of the business problem, the messiness of the data (quality, variety and volume), the experience of the data analyst or scientist, the maturity of analytics in the organization, and the analytical tools and systems. But data manipulation is one of the biggest drains on analyst time1.

Effort Distribution across Data Analytics Life Cycle

What is the impact of big data across the data analytics life cycle?

  • Understanding Business Objective

Big data, or any other technology, plays little role in understanding the business objective and converting a business problem into an analytics problem. But the flexibility and versatility of the tools and technology guides what can and cannot be done. For example, a brick-and-mortar retailer may have to launch a survey to understand customer sensitivity toward prices, whereas an eCommerce retailer may carry out an analysis using customers' web visits, e.g. which other eCommerce websites customers visit before and after visiting the retailer.

  • Data Manipulation

Data manipulation requires significant effort from an analyst, and big data is expected to impact this stage the most. Big data helps an analyst get query results faster (the velocity of big data). It also facilitates accessing and using unstructured data (the variety of big data), which was a challenge with traditional technology. The handling of large data volumes (the volume of big data) is expected to help by removing processing constraints or improving speed. Statistical scientists devised sampling techniques to get around the constraint of processing high volumes of data. Big data can now process high volumes, so sampling may not be required from that perspective, but sampling is still relevant and required.
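The continued relevance of sampling can be sketched with simulated data: a modest random sample estimates the population mean closely, so full-volume processing is not always necessary:

```python
import random
import statistics

# Sketch: even where big-data platforms can process the full volume, a
# well-drawn random sample often estimates a population statistic closely
# at a fraction of the cost. The "population" here is simulated.

random.seed(42)
population = [random.gauss(100, 15) for _ in range(100_000)]
sample = random.sample(population, 1_000)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(round(abs(pop_mean - sample_mean), 3))  # the gap is small
```

The standard error shrinks with the square root of the sample size, which is why a 1% sample can already be adequate for many estimates.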

Speech analytics and big data example: In one of my previous engagements, CallMiner Eureka3 was used to understand customers' needs and concerns along with monitoring agent performance. Due to the call volume and storage requirements, only the latest two weeks of data were available for analysis, which constrained what hypotheses could be validated. With big data technology, this constraint may no longer apply, and many more hypotheses could be validated to add value to end customers and the business.

  • Data Analysis and Modeling

Most machine learning and statistical techniques are available on traditional technology platforms, so the value added by big data could be limited here. One argument in favour of machine learning on big data is that the more data is fed to a machine learning algorithm, the more it can learn and the higher the quality of insights2. Many practitioners do not believe that volume alone leads to quality insights.

Certainly, having different dimensions of data, such as customer web clicks and call data, will lead to better insights and improved accuracy of predictive models.

  • Action on Insights or Deployment

Big data has created a new wave in the industry, and there is a lot of pressure on organizations to think about big data. The technology is still maturing, but organizations are investing to tap big data for competitive advantage. A few organizations such as Facebook and Amazon have already adopted and are using big data. The real differentiator between successful and unsuccessful organizations will be the right insights and action on those insights.

Big data technology is expected to enable quicker deployment of insights and predictive models, but more importantly, the speed from analytics to action will be almost real time.

 Offer Recommendation on Web and Big Data

Generic offers are prevalent on the web, without much success. A personalized and relevant offer is what customers expect, and organizations are moving in this direction. One way to identify customer needs is to combine web click behavior and transactional behavior in real time and provide a personalized offer to the customer. This may become a reality with big data and big data analytics.

  • Learning and Guiding

With big data and big data analytics, the cycle time and cost of data analytics are expected to come down. The cost reduction and shorter cycle time will have a propitious impact on analytics adoption. Organizations will be more open to an experimentation and learning culture. Of course, this is not going to happen automatically.


Big data is an industry buzzword with a lot of focus, attention and investment. That investment will add value to customers and the business only if the right insights are developed and acted upon. Big data is going to impact each stage of the data analytics life cycle, but the main value added (until big data analytics tools mature) will be around data manipulation.




Data Analysis Life Cycle

The aim is to illustrate the activities that typically happen in the data analysis life cycle. Of course, there will be real-life cases where some of the life cycle stages do not happen, but it is important to follow a structured approach to data analytics.

Data Analysis Life Cycle

  • Business Objective

An analysis starts with a business objective or problem statement. For example, the business problem could be that the average banking product holding of customers is very low (a retail banking scenario), or that a retailer wants to launch a promotion campaign on television. In a few cases, the analytics team can also proactively form a list of hypotheses and develop insights for management to act on. Once the overall business problem is defined, it is converted into an analytical problem. In the retail promotion example, one important question could be to find the target customer segments; once the target segment is defined, the retailer could decide the TV channel and timing of the campaign. So assume that building the customer segmentation is the analytical problem.

  • Data Manipulation

Once the business problem is defined, the next stage is data manipulation. Data manipulation involves:

    • Extraction: Pulling data from different systems
    • Transformation: Aggregating transactions and activities at a particular level
    • Descriptive analysis and visualization: Understanding variable values and distributions is an important step in data analysis. The analyst looks at the minimum, maximum, average and variance of continuous variables; a box plot could be used. A frequency plot for categorical variables may also be required.
    • Treatment: The input variables may have missing or outlier values. A scatter plot may be helpful to see the distribution of variable values.

For the customer segmentation example above, we may need to pull transactions, payments, channel interactions and customer demographic data. Since we need to build the segmentation at a customer level, the transaction and channel data need to be aggregated at the customer level. Some customers may not have made any transactions, so their variable values are missing; the variables could be treated with zero or other values.
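A minimal sketch of this transformation and treatment step, using invented transaction records (a real pipeline would use SQL or a data-frame library):

```python
from collections import defaultdict

# Sketch of the transform step for the segmentation example: raw transactions
# are aggregated to one row per customer, and customers with no transactions
# are treated with zero. The records below are invented.

transactions = [
    {"customer": "C1", "amount": 120.0},
    {"customer": "C1", "amount": 80.0},
    {"customer": "C2", "amount": 40.0},
]
customers = ["C1", "C2", "C3"]  # C3 has no transactions

totals = defaultdict(float)
for t in transactions:
    totals[t["customer"]] += t["amount"]

# Missing-value treatment: customers without activity get 0 spend
customer_level = {c: totals.get(c, 0.0) for c in customers}
print(customer_level)  # {'C1': 200.0, 'C2': 40.0, 'C3': 0.0}
```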

  • Data Analysis and Modeling

Data Analysis involves multiple steps. Typically, the first analysis is Exploratory Data Analysis (EDA). EDA helps in understanding the data trends and patterns.

After EDA, a relevant statistical technique is selected based on the business problem. Using the relevant technique, the statistical model is built or the analysis is completed. The model or analysis insights are then validated on a holdout (validation) data set. The exact steps and sequence may differ for different types of analysis and techniques used.

In summary, the data analysis/modeling steps are:

    • Exploratory Data Analysis (EDA)
    • Statistical Technique Selection
    • Model Building or Analysis
    • Validation of Results

For the customer segmentation, the following approach can be followed.

Customer Segmentation Approach
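As a toy sketch of the model-building step, a one-dimensional k-means can group customers into two spend segments; a real segmentation would use several behavioural variables and a library implementation:

```python
import statistics

# Minimal 1-D k-means sketch (k=2): alternate between assigning each value
# to its nearest center and recomputing centers as cluster means.
# The spend values below are invented.

def kmeans_1d(values, iters=20):
    centers = [min(values), max(values)]  # simple k=2 initialisation
    clusters = [[], []]
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            idx = min((0, 1), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

spend = [20, 25, 30, 35, 400, 420, 450]
centers, clusters = kmeans_1d(spend)
print(centers)  # one low-spend center, one high-spend center
```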

  • Action on Insights or Deployment

Analysis output is typically used in two ways: informing decision makers, or deploying in a system. Sometimes an analysis is carried out for decision makers to understand and be aware of customer behaviors or business performance. For example, what is the profile of customers who responded to a particular campaign? Or what is the risk profile of customers acquired through online channels? In these cases, no mathematical or statistical model linking a target objective to input variables is deployed.

In another example, predicting customer churn, we may build a statistical model which can be deployed in the system, so that at regular intervals customers who are at risk of closing their relationship (attrition or churn) are identified.
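A toy sketch of that scheduled identification step, with invented model scores and threshold:

```python
# Sketch of the deployment step: a (pretend) churn model has already scored
# customers, and a scheduled job picks those above a risk threshold for
# retention action. Scores and threshold below are invented.

scores = {"C1": 0.82, "C2": 0.10, "C3": 0.55, "C4": 0.91}
THRESHOLD = 0.6  # assumed business cut-off for "at risk"

at_risk = sorted(c for c, p in scores.items() if p >= THRESHOLD)
print(at_risk)  # ['C1', 'C4']
```

In practice the threshold would be chosen by trading off retention-offer cost against the value of customers saved.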

In the customer segmentation example, the relevant customer segment is identified. Based on the channel/serial the target segment is likely to watch, the appropriate advertisement campaign is designed and the promotion schedule is selected.

  • Learning and Guiding

Once a statistical model or analysis output is deployed, its performance is monitored and analyzed regularly to understand and improve business performance.

In the customer segmentation and promotion scenario, if the impact of the promotion is not significant, an alternate channel or slot could be considered instead of wasting the marketing budget.

3 challenges in getting value from analytics investments

There are a lot of success stories of analytics applications. Organizations across industries, from banking to sports, have used analytics to create competitive advantage or find winning ideas.

Tesco, one of the biggest retailers; Capital One, a leading credit card provider; Netflix, a movie rental organization; and Marriott International, a hotelier, are some of the organizations which have employed analytics for sustainable competitive advantage.

Some of the common challenges with applying analytics to business decisions are:

    • Poor quality of data
    • Limited data or poorly structured data sample
    • Poor design of analytics deployment and over fitting the analytics

These three hindrances limit the value added by analytics deployment to business decisions.

Poor quality of data

Data analytics and insights are based on input data, and if the data has issues, the insights will be inaccurate: garbage in, garbage out. The recommendation in such a scenario is not to use the analytics or insights; instead, organizations should focus on improving the quality of the data.

For one of our clients, at the end of customer calls the customer service representatives entered comments to capture the important points. When we started looking at this unstructured data, we realized that the comments did not really make sense from a business perspective, or merely repeated the general category of the call, which was already available as a structured column. This is not an isolated example.

Another common issue with data is a large number of missing values, but this is the lesser evil. There are multiple approaches available for missing value treatment and analysis. One important point is to review the variables and find the rationale behind the missing values; there may be a business reason, and that reason could be helpful, e.g. in understanding customer behavior. A few years back we were building a customer churn model for a telecom client and found that a variable had around 80% missing values. Typically an analyst would exclude variables with over 30-40% missing. When we looked at the variable, we found it was "value of international calls". Of course, not all customers are expected to be international callers. We treated the variable and used it in the model.

Limited data or poorly structured data samples

In the age of big data, you might be wondering why I am raising this point. There is a difference between volume of data and diversity of data. We may have a huge volume of customer transactions for the recent period; we may have all customer interaction data but not the call or web interaction data.

For developing a good statistical model, we may not require a high volume of data; volume alone may not improve model effectiveness or quality. But we have to be very careful in creating the data sample for statistical modeling and analysis.

Example: If one wants to develop a mortgage customer attrition model, the sample data points used to build the model play an important role. Customer attrition behavior is influenced by economic conditions, e.g. whether interest rates are rising or falling. So a relevant sample of data points should be available and used appropriately to bring out the right insights and patterns.

Poor design of analytics deployment and over fitting the analytics

One of the crucial aspects of Capital One's analytics success story is running thousands of business experiments and learning from them. The successful experiments are deployed on a larger scale. If analytics is not deployed properly, only limited learning and performance can be derived.

A lot of organizations use analytics in an ad hoc way, and any unsuccessful result is taken as an excuse not to use analytics in the future. But the real cause is often a poor design of the analytics deployment.

What is measured can be managed, but not necessarily improved upon. To improve decisions, one has to synthesize and learn from historical decisions and their results: what works, what does not, and why. A proper implementation plan designed before deployment will ensure that insights can be generated on what works and why it works. Analytics deployment and learning is a systematic, adaptive improvement mechanism, which is key to getting value from analytics investment and creating competitive advantage.


Thomas H. Davenport and Jeanne G. Harris, Competing on Analytics: The New Science of Winning.

Tactical and practical approach for treating outliers and missing values

Missing Data

There are multiple types of missing data, e.g. Missing at Random (MAR) and Missing Completely at Random (MCAR). For details and classification examples, readers are referred to Little and Rubin1. Under MCAR, the missingness is unrelated to any of the data; under MAR, it may depend only on observed information, not on the missing value itself. For example, when a homemaker fills in a form, the income field may be missing, but for a reason related to an observed field (occupation).

There are multiple approaches available for missing value treatment and analysis. One important point is to review the variables and find the rationale behind the missing values; there may be a business reason, and that reason could be helpful, e.g. in understanding customer behavior. A few years back we were building a customer churn model for a telecom client and found that a variable had around 80% missing values. Typically an analyst would exclude variables with over 30-40% missing. When we looked at the variable, we found it was "value of international calls". Of course, not all customers are expected to be international callers.

Missing Data

A few approaches to missing value treatment:

  • Deletion of missing observations: This approach can be adopted under the assumption of Missing Completely at Random (MCAR); otherwise the sample could be biased.
  • Replacing with zero, mean or median values: This approach can also bias the mean or variance estimates.
  • Using multiple imputation techniques2,3

In the graph, some values appear to be outliers, but they are actually missing values, and analysts have to be careful about them. Sometimes missing values are denoted with sentinels such as 99999. In this case, a default date was populated for missing date of birth (DOB) and start date; hence, when age and years with the organization were calculated, the data showed patterns of exceptionally high values.
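A small sketch of handling such sentinel values, followed by median imputation (all values invented):

```python
import statistics

# Sketch of two treatments mentioned above: converting a 99999-style
# sentinel to a true missing value, then imputing with the median.
# The ages below are invented; a default DOB produced the 99999s.

ages = [34, 41, 99999, 29, 99999, 52, 38]
SENTINEL = 99999

cleaned = [a if a != SENTINEL else None for a in ages]
median_age = statistics.median(a for a in cleaned if a is not None)
imputed = [a if a is not None else median_age for a in cleaned]
print(imputed)  # sentinels replaced by the median of observed ages
```

Had the sentinels been left in place, the mean age would have been wildly inflated, exactly the pattern of "exceptionally high values" described above.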

Outlier Data

Outlier data points are observations whose values lie significantly beyond the typical values of a variable. For example, the income of a successful businessman or CEO may be significantly higher than typical values. Including such observations may bias estimates, including the mean and variance. The impact could be more pronounced in a sample, depending on whether these observations are selected into the sample.


In statistical or predictive modeling, outliers can be of two types: outlier values of the dependent variable, and outlier values of a predictor. Outliers in predictor variables are also called leverage points. Residual analysis for regression and graphical analysis are some of the ways to identify outliers.
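A simple way to flag outliers is a z-score rule; the same idea applies to regression residuals instead of raw values. The income figures below are invented:

```python
import statistics

# Sketch: flagging outliers with a simple z-score rule (|z| > 2). For a
# regression, the same rule would be applied to the residuals.
# Incomes are invented, in thousands.

incomes = [45, 52, 48, 61, 55, 50, 47, 900]

mean = statistics.mean(incomes)
sd = statistics.stdev(incomes)

outliers = [x for x in incomes if abs(x - mean) / sd > 2]
print(outliers)  # the CEO-style income stands out
```

Note that a single extreme value inflates both the mean and the standard deviation, which is why robust alternatives (median-based rules, robust regression) are often preferred.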

Why are outliers important? How are outliers different from influential points? How can outliers be detected? How can robust regression help?

WOE Variable Transformation for Tackling Missing and Outlier Observations

One practical approach adopted by many practitioners when building predictive models with binary logistic regression is transforming variables into Weight of Evidence (WOE) variables. WOE transformation is used to tackle both missing values and outliers: missing or outlier classes are grouped with other classes based on their Weight of Evidence, using fine and coarse classing.
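A small sketch of computing WOE for one coarse-classed variable, with "Missing" kept as its own class (the good/bad counts are invented):

```python
import math

# Sketch of Weight of Evidence (WOE) coding: for each class,
# WOE = ln( (% of goods in class) / (% of bads in class) ).
# Keeping "Missing" as a class gives missing values a usable WOE value.
# All counts below are invented.

# class -> (number of goods, number of bads)
classes = {"Low": (400, 20), "Medium": (350, 50),
           "High": (200, 80), "Missing": (50, 50)}

total_good = sum(g for g, b in classes.values())
total_bad = sum(b for g, b in classes.values())

woe = {c: math.log((g / total_good) / (b / total_bad))
       for c, (g, b) in classes.items()}
print({c: round(v, 2) for c, v in woe.items()})
```

Classes with similar WOE values would then be merged during coarse classing, which also absorbs outlier bins into neighbouring classes.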


1 Little, R.J.A. & Rubin, D.B. (1987). Statistical analysis with missing data. New York: Wiley.