
Predict Customer Churn With Precision

Balancing precision and recall for actionable retention tactics

© Olivier Le Moal using Adobe Stock license

Accuracy, please take a back seat. We’ll be promoting Precision and Recall today.

"Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers" – Wikipedia

For this post, let’s agree on the universal assumption that "customer churn is bad", and "customer retention is good".

Abstract: This article takes a different slant on a well-traveled churn dataset: one whose features are not strong enough to justify blindly deploying even the best predictive model we can generate. Below I’ll share the problem statement, data preparation steps, feature analysis, visualizations, and select Python code from the best of the scikit-learn classification models I tried for predicting customer churn. Most importantly, I’ll show that by moving the decision threshold probability along the precision-recall curve, you may find tranches of churn cases where you feel confident enough to deploy real retention actions.

In the interest of brevity, I’m not including the full code used on this project. For full details, please refer to my Jupyter notebook on GitHub.


Business Objective

A telecommunications company (hereafter "TelCo"), which sells residential Voice and Internet services, is experiencing a massive customer churn rate of nearly 27%. This level of churn could be enough to bankrupt a real-life company. Given the lack of publicly available customer data, we are using the IBM Cognos Telco Customer Churn simulated data set, which contains labeled churn for 7,043 customers.

TelCo wants to deploy customer retention strategies to reduce customer churn. The company has asked us to:

  • Develop a predictive model to classify customer churn risk
  • Explain the relative influence of each predictor on the model’s predictions
  • Suggest potential approaches to reduce customer churn

Here we have a binary classification problem to solve, so we’ll set our target dependent variable to 1 (churn) or 0 (retain). We will use ROC AUC as the metric to optimize our estimators. ROC AUC is the Area Under the Receiver Operating Characteristic Curve.

Image by Martin Thoma, CC0, via Wikimedia Commons

With ROC AUC, a random classifier scores 0.5 while a perfect classifier scores 1.0. I like ROC AUC because it favors models whose trade-off between true positive and false positive rates is significantly better than random chance, which is not assured when accuracy is the evaluation metric. This matters because we need confidence in our positive class predictions (churn) when taking retention actions.
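As a quick illustration of the metric (not code from this project), scikit-learn’s roc_auc_score compares true labels against predicted probabilities; the labels and probabilities below are made-up placeholders.

```python
# Toy illustration of ROC AUC scoring; the data below is invented, not from the project.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                      # actual outcomes (1 = churn)
y_proba = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]   # predicted churn probabilities

print(f"ROC AUC: {roc_auc_score(y_true, y_proba):.2f}")  # 1.0 = perfect, 0.5 = random
```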

For more details on ROC AUC, I recommend this article.


Customer Churn Raw Data

Let’s start by taking a look at the 33 columns we have to work with. After converting our source file to a pandas data frame, the top 5 records show:
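For orientation, a minimal sketch of that loading step, assuming a local CSV export of the data set (the file name here is a placeholder; see the GitHub notebook for the exact source):

```python
import pandas as pd

# Placeholder file name for the IBM Cognos Telco Customer Churn export
df = pd.read_csv("telco_customer_churn.csv")

print(df.shape)   # expecting (7043, 33)
df.head()         # the first 5 records shown in the screenshots below
```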

Screenshot by Author | First 19 columns…
Screenshot by Author | …last 14 columns

It’s helpful to group columns by domain when thinking about their potential usefulness as predictors. There’s a data dictionary in the GitHub notebook, but here’s how I’m viewing these:

Image by Author
  • Customer – A unique customerID indicates 7,043 unique customers in our sample with customer tenure ranging from 0 to 72 months
  • Demographic – Four useful customer demographic fields
  • Geographic – Customer location down to geographic coordinates
  • Service – Eight billable services (Internet splits to Fiber, DSL or None) and two streaming indicators
  • Billing – Includes contract type, billing configuration and billed charges
  • Churn – Churn value (our target), Churn Reason, and two internal TelCo metrics for Churn Score and Customer Lifetime Value.

One note here. This data set has been broadly modeled using just 23 columns from a different Kaggle competition. I chose this version with 33 columns hoping that additional geographic fields would enable deeper insights. In the end, they didn’t pay off. It seems that when IBM put this data together, they allocated only 4 or 5 customers to each of the 1,653 California zip codes. After some effort to make the geographical fields useful, I backed off sensing diminishing returns.


Data Preparation

I wrote six functions for data preparation and feature creation. I exposed these functions through scikit-learn’s FunctionTransformer in a Pipeline.
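As a minimal sketch of that pattern (the function name, column names, and values here are assumptions, not one of the six functions from the notebook), each plain function is wrapped in a FunctionTransformer and chained in a Pipeline:

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

def map_payment_method(df):
    """Collapse the four payment methods into a single automated-payment flag."""
    df = df.copy()
    automated = ["Bank transfer (automatic)", "Credit card (automatic)"]  # values assumed
    df["Payment Method Automated"] = df["Payment Method"].isin(automated).astype(int)
    return df

prep_pipeline = Pipeline(steps=[
    ("payment", FunctionTransformer(map_payment_method)),
    # ...the remaining data-prep and feature-creation functions would follow as further steps...
])

# Toy frame standing in for the raw data
toy_df = pd.DataFrame({"Payment Method": ["Electronic check", "Credit card (automatic)"]})
print(prep_pipeline.fit_transform(toy_df)["Payment Method Automated"].tolist())  # [0, 1]
```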

The three most interesting features I created were:

  • Mapping 4 payment methods into "Payment Method Automated" (1 = Yes).
  • Mapping 8 services into "Product Profile" columns that delineate the core service (phone, fiber/DSL, or bundled voice & internet) while also indicating layered "add-on" services like multi-line, tech support, etc.
  • Creating a "Customer Charge Index" that measures each customer’s pricing relative to standard prices. I used combinations of services across all training data and incremental average pricing to derive a standard, stand-alone price for each service. From there I indexed each customer’s monthly charges against the standard prices for their basket of services (a minimal sketch follows the figure below).
Image by Author
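Here is a hedged sketch of the Customer Charge Index idea. The stand-alone prices, column names, and customers below are invented for illustration; in the project, the standard prices were derived from service combinations in the training data.

```python
import pandas as pd

# Invented stand-alone prices per service (the real values were derived from training data)
standard_prices = {"Phone Service": 20.0, "Internet Fiber": 70.0, "Tech Support": 5.0}

# Toy customers; column names and values are assumptions for illustration
toy_df = pd.DataFrame({
    "Phone Service":   ["Yes", "Yes"],
    "Internet Fiber":  ["Yes", "No"],
    "Tech Support":    ["No", "Yes"],
    "Monthly Charges": [99.0, 22.5],
})

def charge_index(row):
    # Standard price of this customer's basket of services
    standard_total = sum(price for svc, price in standard_prices.items() if row[svc] == "Yes")
    # Ratio of actual monthly charge to the standard price (1.0 = priced at standard)
    return row["Monthly Charges"] / standard_total if standard_total else 1.0

toy_df["Customer Charge Index"] = toy_df.apply(charge_index, axis=1)
print(toy_df["Customer Charge Index"].tolist())  # [1.1, 0.9]
```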

Prior to data profiling and feature analysis, I split the data into 80% train and 20% test. All analysis was performed only on training data to avoid data leakage into any predictive model.
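In code, that split looks roughly like the following, assuming the DataFrame df loaded earlier and a binary target column named "Churn Value"; stratifying on the target (my assumption) keeps the ~27% churn rate consistent across both splits.

```python
from sklearn.model_selection import train_test_split

# 80/20 split before any profiling; all analysis below uses only train_df
train_df, test_df = train_test_split(
    df, test_size=0.20, stratify=df["Churn Value"], random_state=42
)
```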


Customer Churn Feature Analysis

Now on to some basic insights.

Tableau Chart by Author

From above, the churn rate is lower for "has partner", "has dependents", "is not senior citizen" and "streams content". Gender was not differentiated.

Tableau Chart by Author

There is a major issue with Month-To-Month contracts, which show a 43% churn rate versus customers on term-based contracts (11% and 3%). Higher customer tenure reduces the churn rate on M-T-M contracts, but not until 4–5 years of tenure does the churn rate reach the overall average of 26.7%. Clearly M-T-M contracts, and likely the associated pricing, are problematic.

Tableau Chart by Author

From a product perspective, Internet Fiber on M-T-M contracts drives churn rates north of 50%! Note that TelCo always requires bundling Phone with Internet Fiber, and the largest group of customers is on M-T-M contracts for Fiber. Customers who add on "Plus" services to their M-T-M contracts churn slightly less.

Tableau Chart by Author

Customer tenure positively correlates with higher levels of service and monthly charges (with statistically significant p-values). I’m assuming here that customers add services over time. We can also see from the above scatter plot that average monthly charges for M-T-M contracts with any Internet service are actually lower than for term contracts. This is not intuitive, given the churn rate is higher. It likely indicates that customers requiring M-T-M contracts are more price-sensitive than those willing to sign up for term contracts.

Running Pearson correlations in Python largely confirms associations we’ve seen in prior plots.
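A minimal sketch of that check, assuming the prepared training frame with numerically encoded features and a "Churn Value" target column:

```python
# Pearson correlation of each numeric/encoded feature against the churn target
corr_with_churn = (
    train_df.corr(method="pearson", numeric_only=True)["Churn Value"]
    .drop("Churn Value")
    .sort_values(ascending=False)
)
print(corr_with_churn)
```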

Screenshot by Author

Predicting Customer Churn

Tableau Chart by Author

I trained four classification models on the 80% training split: Decision Tree, Logistic Regression, Random Forest and XGBoost.

For the latter 3 models, I used GridSearchCV to iterate through relevant parameters and refit the best estimator based on highest mean ROC AUC from 5-fold cross validation. Performance for the latter 3 models was similar.

I chose the XGBoost ensemble model as it had the highest AUC of 0.79, a strong recall of 0.83 and higher precision of 0.54.

In the XGBoost model, I applied standard scaling to 16 features and set the objective to "binary:logistic" to predict the binary churn outcome.

A function instantiate_grid takes a model and grid parameters and establishes standard scoring metrics to instantiate the GridSearchCV object.

Through trial and error, I iterated through feature combinations and key parameters for XGBoost to optimize the model’s scoring. For all iterations, I included a positive class weight of 2.8 to correct the modest class imbalance, with churn (class 1) occurring 26.7% of the time.
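A hedged reconstruction of this setup is below. The grid values are illustrative rather than the full grid from the notebook, and X_train/y_train are assumed to be the 16 prepared training features and the churn target.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

def instantiate_grid(model, param_grid):
    """Wrap a model with standard scaling and a ROC-AUC-scored, 5-fold grid search."""
    pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
    return GridSearchCV(pipe, param_grid, scoring="roc_auc", refit=True, cv=5, n_jobs=-1)

xgb = XGBClassifier(
    objective="binary:logistic",   # binary churn outcome
    scale_pos_weight=2.8,          # positive class weight to offset the imbalance
    eval_metric="logloss",
)

param_grid = {                     # illustrative values only
    "model__max_depth": [3, 4, 5],
    "model__n_estimators": [100, 200],
    "model__reg_lambda": [1, 2],   # L2 regularization
}

grid = instantiate_grid(xgb, param_grid)
grid.fit(X_train, y_train)
print(grid.best_params_, round(grid.best_score_, 3))
```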

The best estimator’s final parameters settled on a max depth of only 4, 100 estimators, and a more conservative L2 regularization of 2.

Screenshot by Author

Using the model’s feature importance scores as a proxy for each predictor’s influence, we see that 1-yr and 2-yr contracts (over the baseline M-T-M) are by far the strongest influence. Internet Fiber products and customers with Dependents had moderate importance, with the remaining features having minor impact.
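Continuing from the grid-search sketch above, one way to surface those scores (the article visualizes them in Tableau below) is to pull feature_importances_ from the fitted XGBoost step:

```python
import pandas as pd

best_xgb = grid.best_estimator_.named_steps["model"]   # fitted XGBClassifier
importances = pd.Series(best_xgb.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))
```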

Tableau Chart by Author

The cross-validation and other parameters helped the model avoid overfitting as the scores on Test were in-line with Train.

Screenshot by Author

And so we come to the confusion matrix for our Test predictions.

Screenshot by Author

I was pleased with the Test predictive recall of 83% (311/374). Our XGBoost model can correctly predict 5 out of 6 true churn cases, which is great.

The best model’s precision, however, is not acceptable at 54% (311/571). Nearly 1 in 2 churn predictions from our model is incorrect (a false positive). To deploy the model with confidence, we need better precision.
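For reference, a minimal sketch of how those test-set figures are computed at the default 0.5 threshold, assuming the fitted grid and the held-out X_test/y_test from earlier:

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_pred = grid.predict(X_test)                      # default 0.5 decision threshold
print(confusion_matrix(y_test, y_pred))            # rows: actual, columns: predicted
print(f"Recall:    {recall_score(y_test, y_pred):.2f}")    # reported above as 0.83
print(f"Precision: {precision_score(y_test, y_pred):.2f}") # reported above as 0.54
```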


Moving the Decision Threshold

With this model, I can’t recommend that TelCo blindly apply default predictions using the decision threshold (DT) of 0.5 churn probability. By assuming all customers with greater than 50% probability will churn, we may experience excessive costs, customer confusion and possibly unneeded churn. We want to move the decision threshold higher to a churn probability that gives us precision above 80%.

Screenshot by Author

By plotting precision and recall as a function of the decision threshold, we can see the probability required to achieve the precision we want. While this limits the share of churn customers we can address (recall falls below 20%), we’ll be more confident taking action at the higher precision.
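A sketch of that view, assuming predicted churn probabilities for the test split from the fitted grid:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_proba = grid.predict_proba(X_test)[:, 1]     # probability of churn (class 1)
precision, recall, thresholds = precision_recall_curve(y_test, y_proba)

plt.plot(thresholds, precision[:-1], label="Precision")
plt.plot(thresholds, recall[:-1], label="Recall")
plt.xlabel("Decision threshold (churn probability)")
plt.legend()
plt.show()
```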

Screenshot by Author

Using the predict_proba method on our best estimator, I reassigned predicted classes at a variety of decision thresholds. I recommend a 3-tier approach covering these incremental churn cases (a minimal sketch follows the list below):

  1. High Precision: DT=0.9–1.0 / 84% Precision / 18% Recall (67 true pos, 13 false pos)
  2. Medium Precision: DT=0.8–0.9 / 71% Precision / 26% Recall (98 true pos, 40 false pos)
  3. Low Precision: DT=0.7–0.8 / 49% Precision / 19% Recall (70 true pos, 72 false pos)
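A hedged sketch of that bucketing, assuming the predicted probabilities and true test labels from above; the tier boundaries come from the list, while the counts depend on the actual model.

```python
import pandas as pd

y_proba = grid.predict_proba(X_test)[:, 1]

# Bucket test customers into the three decision-threshold tiers described above
tiers = pd.cut(
    y_proba,
    bins=[0.0, 0.7, 0.8, 0.9, 1.0],
    labels=["Below threshold", "Low precision", "Medium precision", "High precision"],
    include_lowest=True,
)

for tier in ["High precision", "Medium precision", "Low precision"]:
    in_tier = tiers == tier
    true_pos = int(((y_test == 1) & in_tier).sum())
    false_pos = int(((y_test == 0) & in_tier).sum())
    print(f"{tier}: {true_pos} true positives, {false_pos} false positives")
```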

TelCo can target their retention actions according to precision groups, covering 63% of all churn cases in these 3 scenarios:

  1. High Precision – target the most aggressive retention actions, like proactive outreach with discounts or adding free services, to retain the most likely-to-churn customers.
  2. Medium Precision – target more general, and lower cost, approaches like empowerment of customer service agents to save customers in certain contexts.
  3. Low Precision – add these customers to watch lists, or general retention communications efforts.

Expanding Data Collection

We’ve seen how some basic data may be enough to predict customer churn with enough precision to take action. While a good start, I strongly recommend TelCo invest in data collection in these domains to improve churn predictions.

  • Product – Service Utilization, Quality Levels & SLA, Devices, Competitor Coverage
  • Customer Service – Contacts/Call/Emails, Tickets, Portal/App Usage, Sentiment Analysis
  • Marketing – Relative Price Point, Value Received, Pending Contract Renewal, Marketing Engagement, Lifetime Customer Value
  • Demographics – Credit Score, Income, Home Ownership, Household Size, Persona/Lifecycle, Time at Address
  • Geographic – Block/Neighborhood Linkage

Thanks for reading this post, and please leave comments.

