Churn Analysis Using Information Value and Weight of Evidence

IV and WOE analysis in Python (Telco customer churn)

Klaudia Nazarko
Towards Data Science

Photo by Marek Szturc on Unsplash

Customer churn is one of the most important and challenging problems for businesses such as banks, SaaS providers and telecommunication companies. Churn is expensive because acquiring new customers costs more than retaining existing ones. Stakeholders invest a lot of time and effort in finding out how to accurately identify existing customers who are about to leave. The answer to this question allows teams to take action.

Most churn analysis approaches focus on predicting which customers are about to churn. To do that, one can use logistic regression, decision trees or even neural networks. However, developing and implementing such models is time-consuming and requires a lot of resources. Sometimes management is not looking for a state-of-the-art predictive solution, only for an answer to the question of which features suggest that a customer is dissatisfied with the product.

This is where attribute relevance analysis comes in. It has two important functions: identifying the variables with the greatest impact on the target variable, and understanding the relationships between the most important predictors and the target variable. Information Value and Weight of Evidence can be used to run this kind of analysis. These two concepts are simple yet powerful techniques for variable transformation and selection. What is more, in contrast to more sophisticated models, they are highly interpretable.

Let’s run attribute relevance analysis on the Telco dataset to understand customer churn. For a detailed analysis with code, check out this GitHub repository.

Table of contents

  1. Telco Data
  2. IV and WOE Methodology
  3. Telco Churn Analysis
  4. IV & WOE vs Statistical Significance
  5. See Also & References

Telco Data

The Telco dataset contains data about customer churn in a telephone service company. It includes information about customers who left within the last month, the services each customer has signed up for, customer account information and demographic information. It consists of 7043 records representing the company’s customers. The Churn column indicates who opted out of services within the last month. There are 20 explanatory variables, both categorical and continuous. The dataset is imbalanced: clients who churned make up only 27% of all customers.

Exploratory Analysis

This exploratory analysis gives us a quick overview of the customers’ characteristics. There is an approximately equal number of male and female clients, and around 50% of them have a partner. Senior customers are in the minority and constitute less than 20% of all customers. It’s important to note that almost 60% of all users have a month-to-month contract (the other options are one-year and two-year contracts). There are four payment methods: electronic check, mailed check, bank transfer and credit card. They are similarly popular, although electronic check has the biggest share.

Categorical variables in Telco dataset | Image by author

Analysis of the continuous variables shows that there aren’t any outliers in the data. It’s also worth noting that, as expected, tenure * monthly charges gives an approximate value of total charges.

Continuous variables in Telco dataset | Image by author

IV and WOE Methodology

Information value (IV) and weight of evidence (WOE) are simple and powerful techniques for conducting attribute relevance analysis. They provide a great framework for exploratory analysis and have been used extensively in the credit risk world for several decades.

At the heart of the IV & WOE methodology are groups (bins) of observations. For categorical variables, each category is usually a bin (although some smaller categories can be grouped together), while continuous variables need to be split into categories. Values are grouped according to the following rules [1] (source: listendata.com):

  • each bin should have at least 5% of the observations,
  • each bin should be non-zero for both non-events and events,
  • the WOE should be monotonic, i.e. either growing or decreasing with the groupings,
  • missing values should be binned separately.

Code for binning the values of variables
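
The binning rules above could be applied to a continuous variable roughly as follows. This is a minimal sketch rather than the author's original code: the function name `bin_continuous`, the use of quantile binning via `pd.qcut` and the default of five bins are all assumptions made for illustration. It enforces only the 5% rule and the separate bin for missing values; non-zero event counts and monotonic WOE still have to be checked once the target is known.

```python
import pandas as pd


def bin_continuous(values: pd.Series, n_bins: int = 5, min_share: float = 0.05) -> pd.Series:
    """Assign each value to a quantile bin; missing values go to a bin of their own."""
    # Equal-frequency bins; duplicates="drop" merges bins whose edges coincide.
    labels = pd.qcut(values, q=n_bins, duplicates="drop").astype(str)
    # qcut leaves missing values unbinned, so label them explicitly.
    labels = labels.where(values.notna(), "missing")
    # Rule of thumb: each regular bin should hold at least 5% of all observations.
    shares = labels.value_counts(normalize=True).drop("missing", errors="ignore")
    if (shares < min_share).any():
        raise ValueError("at least one bin is below the minimum share; use fewer bins")
    return labels


# Toy data invented for this example: 100 tenure-like values plus 3 missing ones.
tenure = pd.Series(list(range(100)) + [None] * 3, dtype="float64")
tenure_bins = bin_continuous(tenure)
print(tenure_bins.value_counts())
```

Quantile bins are only one possible starting point; adjacent bins can be merged afterwards if the resulting WOE values turn out not to be monotonic.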

Weight of Evidence

The weight of evidence measures the predictive power of an independent variable in relation to the dependent variable. It has its roots in the credit scoring world, where it describes how well a variable separates good customers from bad ones. “Good customers” are those who pay back their loan (non-events) and “bad customers” are those who fall behind with their payments (events) [1] (source: listendata.com).

Weight of Evidence formula
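
Written out in the convention used by [1], where positive values favour non-events, the weight of evidence of bin i can be expressed as below. The symbols are notation chosen for this sketch: n_i is the number of observations of a given class in bin i and N is the total number of observations of that class.

```latex
\mathrm{WOE}_i = \ln\left( \frac{n_i^{\text{non-event}} / N^{\text{non-event}}}{n_i^{\text{event}} / N^{\text{event}}} \right)
```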

A positive WOE implies a higher probability of paying back the loan (non-event, good customer), and a negative WOE implies the opposite.

Code for Weight of Evidence calculation
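
A minimal sketch of the WOE calculation, assuming the convention from [1] in which WOE is the natural logarithm of the share of non-events in a bin divided by the share of events in that bin. The function name `weight_of_evidence` and the counts are invented for illustration and are not taken from the Telco data.

```python
import math


def weight_of_evidence(non_events: dict, events: dict) -> dict:
    """Return WOE per bin from per-bin counts of non-events and events."""
    total_non_events = sum(non_events.values())
    total_events = sum(events.values())
    woe = {}
    for bin_label in non_events:
        if non_events[bin_label] == 0 or events[bin_label] == 0:
            # The logarithm is undefined when a bin has no events or no non-events.
            raise ValueError(f"bin {bin_label!r} needs both events and non-events")
        share_non_events = non_events[bin_label] / total_non_events
        share_events = events[bin_label] / total_events
        woe[bin_label] = math.log(share_non_events / share_events)
    return woe


# Counts invented for illustration (non-event = stayed, event = churned).
stayed = {"month-to-month": 60, "two year": 95}
churned = {"month-to-month": 40, "two year": 5}
contract_woe = weight_of_evidence(stayed, churned)
print(contract_woe)
```

With these toy counts the bin that holds most of the churners gets a negative WOE and the other bin a positive one, which is the sign convention described above.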

Information Value

Information Value measures how good a variable X is at distinguishing between the two classes of a binary response (e.g. “good” versus “bad”) in some target variable Y. A low Information Value means that X may not separate the classes of the target variable well enough and should be removed as an explanatory variable [2] (source: stackexchange.com).

Information Value formula
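
In the convention used by [1], Information Value sums, over all bins, the difference between the share of non-events and the share of events, weighted by the bin's WOE. The symbols are notation chosen for this sketch: n_i counts the observations of a class in bin i, N is the class total, and WOE_i is the natural logarithm of the ratio of the two shares.

```latex
\mathrm{IV} = \sum_{i} \left( \frac{n_i^{\text{non-event}}}{N^{\text{non-event}}} - \frac{n_i^{\text{event}}}{N^{\text{event}}} \right) \cdot \mathrm{WOE}_i
```
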
Code for Information Value calculation
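
A minimal pandas sketch of the IV calculation, assuming the convention from [1]: for each bin, the difference between the share of non-events and the share of events is multiplied by the bin's WOE, and the products are summed. The function name `information_value` and the toy rows are assumptions for illustration, and the target is assumed to be coded as 1 for an event (churn) and 0 for a non-event.

```python
import numpy as np
import pandas as pd


def information_value(bins, target) -> float:
    """IV of a binned variable; target is 1 for an event and 0 for a non-event."""
    frame = pd.DataFrame({"bin": np.asarray(bins), "event": np.asarray(target)})
    stats = frame.groupby("bin")["event"].agg(events="sum", total="count")
    stats["non_events"] = stats["total"] - stats["events"]
    # Share of all events / non-events that fall into each bin.
    share_events = stats["events"] / stats["events"].sum()
    share_non_events = stats["non_events"] / stats["non_events"].sum()
    woe = np.log(share_non_events / share_events)
    return float(((share_non_events - share_events) * woe).sum())


# Toy rows invented for this example: bin "a" churns far more often than bin "b".
bins = ["a"] * 100 + ["b"] * 100
churn = [1] * 40 + [0] * 60 + [1] * 5 + [0] * 95
iv_informative = information_value(bins, churn)

# A variable whose bins have identical churn rates carries no information.
flat_bins = ["a"] * 40 + ["b"] * 40
flat_churn = [1] * 10 + [0] * 30 + [1] * 10 + [0] * 30
iv_flat = information_value(flat_bins, flat_churn)
print(iv_informative, iv_flat)
```

Each term of the sum is non-negative, because the share difference and the WOE always have the same sign, so IV grows with every bin that separates the two classes.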

In order to interpret IV, refer to the table below:

Interpretation of Information Value
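
The interpretation can also be expressed as a small helper. The thresholds below are the rule of thumb commonly quoted in credit scoring and listed in [1]; they are an assumption here, since the table itself is an image, and the treatment of values that fall exactly on a boundary is a choice made for this sketch. The function name `interpret_iv` is hypothetical.

```python
def interpret_iv(iv: float) -> str:
    """Map an Information Value to the commonly quoted rule-of-thumb label."""
    if iv < 0.02:
        return "not useful for prediction"
    if iv < 0.1:
        return "weak predictor"
    if iv < 0.3:
        return "medium predictor"
    if iv < 0.5:
        return "strong predictor"
    # Very high values are treated with suspicion rather than as good news.
    return "suspicious predictor"


for value in (0.01, 0.25, 0.8):
    print(value, interpret_iv(value))
```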

Example:

Example of IV & WOE table

Telco Churn Analysis

In order to run the analysis, the variables of numerical type were recognized as continuous features while the others were processed as categorical features.
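
A split by data type like the one described above might look as follows in pandas. This is a sketch on a toy frame: the column names are assumptions modelled on the public Telco dataset, and the values are invented.

```python
import pandas as pd

# Toy frame invented for this example; column names are assumptions.
customers = pd.DataFrame(
    {
        "tenure": [1, 24, 60],
        "MonthlyCharges": [70.5, 55.0, 20.1],
        "Contract": ["Month-to-month", "One year", "Two year"],
        "Churn": ["Yes", "No", "No"],
    }
)

features = customers.drop(columns="Churn")
# Numeric columns are treated as continuous, everything else as categorical.
continuous = features.select_dtypes(include="number").columns.tolist()
categorical = features.select_dtypes(exclude="number").columns.tolist()
print(continuous, categorical)
```

A purely type-based split treats numeric flags (for example a 0/1 indicator) as continuous, so columns like that may need to be reassigned by hand.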

Attribute relevance analysis revealed that the group of features recognized as strong predictors is quite large. It includes, for example, payment method and total charges. These features show a strong relationship between their values and churn, so they can be used to predict customer churn.

At the same time, for some variables (such as contract, tenure and internet service) the relationship between the feature values and churn is very strong, so strong that it should be examined carefully. It may suggest that there is some error in the data or that the explanatory and explained variables are not independent (“suspicious” predictors according to the IV interpretation).

IV values of variables in Telco dataset | Image by author

A detailed examination of the weight of evidence provides interesting insights and suggests some possible explanations for the data.

Tenure vs Total Charges

Analysis of total charges shows that customers who have already spent more than $678 are less likely to churn. To fully understand this relationship, note that total charges are approximately the product of the monthly charge and the tenure. The monthly charges variable tells us that users who pay $18 to $50 monthly are less likely to churn, and that the risk of churn increases as the monthly fee increases. In this case, although the total charges feature has a higher IV, it seems more actionable to use the monthly charges and tenure variables separately.

Contract

Analysis of the ‘contract’ feature shows that WOE is negative for month-to-month contracts and very high for two-year contracts. This can be explained by the fact that customers with shorter contracts have many more ‘churn moments’ (they need to renew their service every month) than customers with long-term contracts, so their churn rate is much higher.

Internet service

Among the customers there are those who use a DSL network, those who use a fiber optic network and those who don’t use the internet service at all. Interestingly, customers using fiber optic are much more likely to churn, which may suggest some problems with the service.

Payment method

The company offers four payment methods: bank transfer, credit card, electronic check and mailed check. The analysis shows that users who pay by electronic check are more likely to churn. To understand this relationship, it’s worth checking which payment methods are recurring and what issues customers may have when paying by electronic check.

WOE values of variables in Telco dataset | Image by author

Churner Profile

IV & WOE analysis enables us to interpret data and define the churner profile:

  • Tenure: has been using the services for less than 3 months
  • Contract: has a month-to-month contract
  • Monthly charges: pays more than $50 monthly
  • Internet service: uses fiber optic
  • Payment method: pays by electronic check
  • Additional internet services: doesn’t use online security or tech support

IV & WOE vs Statistical Significance

While the IV & WOE method is very useful and provides clear insights into the data, one may ask: what is the statistical significance of these results?

To answer this question, the p-value (chi-square test of independence) and the effect size (Cramér’s V) were measured as an additional part of this analysis.

Code for p-value and Cramér’s V effect size calculation
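
A minimal sketch of both measurements for one binned feature, using `scipy.stats.chi2_contingency` on a contingency table built with `pd.crosstab`. The function name `chi2_and_cramers_v` and the toy rows are assumptions for illustration. Yates' continuity correction is switched off so that the chi-square statistic fed into the effect size is the plain Pearson statistic, which is a choice made for this sketch.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def chi2_and_cramers_v(feature, target):
    """Return (p_value, cramers_v) for a categorical feature and a binary target."""
    table = pd.crosstab(np.asarray(feature), np.asarray(target))
    chi2, p_value, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    # Cramér's V scales chi-square by the sample size and the smaller table dimension.
    smaller_dim = min(table.shape) - 1
    cramers_v = float(np.sqrt(chi2 / (n * smaller_dim)))
    return float(p_value), cramers_v


# Toy rows invented for this example: bin "a" churns far more often than bin "b".
feature = ["a"] * 100 + ["b"] * 100
churn = [1] * 40 + [0] * 60 + [1] * 5 + [0] * 95
p_value, cramers_v = chi2_and_cramers_v(feature, churn)
print(p_value, cramers_v)
```

Cramér's V lies between 0 (no association) and 1 (perfect association), which makes it comparable across features with different numbers of bins.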

The p-value is the probability of seeing a difference between the groups at least as large as the observed one if the variables were in fact independent. It tells us whether there is a statistically significant relationship between variables, but not how strong or important that relationship is. This is where effect size comes into play. Effect size quantifies the size of the difference between two groups: the larger the effect size, the stronger the relationship between the two variables.

Analyzing the features with both IV & WOE and the chi-square test & Cramér’s V reveals some interesting relationships between the results of the two methods.

P-value vs information value

The p-value for almost all features is very small (below 0.01, i.e. significant at the 1% level). Only for two features (both with IV = 0) are the differences in distribution not statistically significant. This leads to the conclusion that for features recognized as at least a medium predictor (the most interesting ones from the analysis perspective), the differences in the distribution of ‘goods’ and ‘bads’ are statistically significant. However, it’s worth noting that a low p-value gives no information about the strength of the relationship.

Information value vs effect size

There is a strong, almost linear relationship between information value and effect size. Features with a high information value have a high effect size as well. The correlation coefficients for these values are 0.94 (Pearson) and 0.98 (Spearman).
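
Coefficients of this kind can be computed with `scipy.stats.pearsonr` and `scipy.stats.spearmanr` once the IV and Cramér's V of every feature are collected in two lists. The numbers below are made-up placeholders used only to show the calls; they are not the values from the Telco analysis and do not reproduce the coefficients quoted above.

```python
from scipy.stats import pearsonr, spearmanr

# Placeholder values invented for this example, one pair per feature.
iv_values = [0.0, 0.05, 0.2, 0.6, 1.2]
cramers_v_values = [0.01, 0.08, 0.15, 0.30, 0.40]

# Pearson measures linear association, Spearman measures rank agreement.
pearson_r, _ = pearsonr(iv_values, cramers_v_values)
spearman_rho, _ = spearmanr(iv_values, cramers_v_values)
print(pearson_r, spearman_rho)
```

Spearman only compares rankings, so it reaches 1 whenever both measures order the features identically, even if the relationship is not perfectly linear.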

Relation between Information Value and Cramér’s V effect size | Image by author

To conclude, the differences in the distribution of ‘goods’ and ‘bads’ for features recognized as at least a ‘medium predictor’ are usually statistically significant. What is more, features with a high IV show a high effect size as well.

See also (GitHub)

References

  1. https://www.listendata.com/2015/03/weight-of-evidence-woe-and-information.html
  2. https://stats.stackexchange.com/questions/93170/why-do-we-calculate-information-value
  3. https://towardsdatascience.com/attribute-relevance-analysis-in-python-iv-and-woe-b5651443fc04
  4. https://www.kaggle.com/pavansanagapati/weight-of-evidence-woe-information-value-iv
  5. https://medium.com/@sundarstyles89/weight-of-evidence-and-information-value-using-python-6f05072e83eb
  6. https://multithreaded.stitchfix.com/blog/2015/08/13/weight-of-evidence/
  7. https://en.wikipedia.org/wiki/Chi-squared_test
  8. https://en.wikipedia.org/wiki/Cram%C3%A9r%27s_V

Data Scientist @ Allegro.pl — Machine Learning Research Team — 💙— Passionate about data analysis, statistics and machine learning.