Predicting Customer Churn using Logistic Regression

My first exposure to the Logistic Regression algorithm

Photo by Austin Distel on Unsplash

What is Customer Churn?

The churn rate, also known as the rate of attrition or customer churn, is the rate at which customers stop doing business with an entity. It is most commonly expressed as the percentage of service subscribers who discontinue their subscriptions within a given time period. A high customer churn rate indicates that the company is losing customers at an alarming rate. Customer churn can be attributed to myriad reasons and it is the company that needs to discover these reasons via patterns and trends that are present in customer data.
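To make the definition concrete, here is a minimal sketch of the usual calculation: customers lost during a period divided by customers at the start of that period. The function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn over a period as a percentage of the starting customer base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical example: 2,000 subscribers at the start of the month, 50 cancel.
monthly_churn = churn_rate(2000, 50)
print(f"{monthly_churn:.1f}%")  # 2.5%
```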

Modern businesses employ predictive algorithms to identify the customers who are most likely to churn, i.e. move away from the company. By using such algorithms, companies can know in advance which customers are most likely to give up their services and can therefore come up with retention strategies to mitigate the resulting losses.


Problem Statement

I described customer churn in the preceding paragraphs because the goal of my next Machine Learning project is to develop a model that can accurately predict which customers are most likely to churn.

As an added challenge, I tried to discover trends among customers who churn and to ascertain the factors that are prevalent when a customer decides to terminate their contract. With the help of the visualization libraries at my disposal, I was able to identify possible factors that govern a customer’s decision to churn. These factors are discussed in subsequent sections of this blog.


Work Flow

Customer Churn Prediction for the Telecommunication Industry is a Kaggle Machine Learning problem, so procuring the data for this project was straightforward. I downloaded the data set from Kaggle and loaded it into my Jupyter notebook, and imported the libraries needed to complete the project. The remaining tasks were to analyze the data, clean it, and train an ML model on the cleaned data set.

Data Exploration and Data Engineering

Step 1: Checking for missing values.

One of the most important steps while exploring a data-set is to search for missing values. Missing values can hinder the training process of an ML algorithm as well as affect the accuracy of the trained model. After applying pandas’ isnull() function to the data-set, I found that it contains no missing values. Therefore, I moved on to the data exploration step.

List of Features along with the count of missing values. (Image by author)
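A per-feature count of missing values can be produced roughly as follows. This is a sketch on a tiny hand-made DataFrame, not the notebook code; the column names and values are assumptions standing in for the Kaggle file. One caveat worth knowing: isnull() only flags NaN/None, so a numeric column that was read in as text and contains blank strings would still report zero missing values until it is converted.

```python
import pandas as pd

# Toy stand-in for the Kaggle file; column names and values are assumptions.
df = pd.DataFrame({
    "tenure": [1, 34, 2],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["29.85", " ", "108.15"],  # note the blank string
})

# Count of missing values per feature.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Blank strings are not NaN, so they only surface after a numeric conversion.
total_charges = pd.to_numeric(df["TotalCharges"], errors="coerce")
blank_count = int(total_charges.isnull().sum())
print(blank_count)
```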

Step 2: Data Exploration.

As I am now certain that there are no missing values in my data-set, I can turn my attention towards exploring the data on hand. Data Exploration is a crucial step since it allows me to familiarise myself with the different features present in a data-set as well as the type of values that each feature column holds. Key points about the data that I discovered during the data exploration phase are listed below.

  • In total, there are 17 categorical features and 4 continuous features.
  • Apart from "SeniorCitizen", "tenure", and "MonthlyCharges", all other features are of data type object.
  • Lastly, the data-set contains 1,869 records of customers who have churned and 5,163 records of customers who haven’t churned.
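The points above come from a few standard pandas calls. The sketch below uses a small hand-made DataFrame (an assumption, not the real data) to show the calls, and then works out the class balance implied by the reported counts.

```python
import pandas as pd

# Toy stand-in for the data-set; values are made up for illustration.
df = pd.DataFrame({
    "SeniorCitizen": [0, 1, 0, 0],
    "tenure": [1, 34, 2, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "Churn": ["No", "No", "Yes", "No"],
})

# Data type of every feature column.
print(df.dtypes)

# Columns that pandas stores as numbers.
numeric_cols = list(df.select_dtypes(include="number").columns)
print(numeric_cols)

# Number of records per class.
churn_counts = df["Churn"].value_counts()
print(churn_counts)

# Class balance implied by the counts reported in the list above.
churned, retained = 1869, 5163
churn_share = churned / (churned + retained)
print(f"{churn_share:.1%}")  # 26.6%
```

Roughly a quarter of the records belong to the churned class, so the two classes are noticeably imbalanced.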

Step 3: Data Visualization

Data visualization is a key aspect of any Machine Learning or Data Science project. Visualizations often provide a bird’s-eye view of the data that allows a Data Scientist or an ML Engineer to discern trends and patterns in the data on hand. I used the seaborn library’s countplot function to plot the categorical features and then looked for trends that are prevalent among customers who churn. The output of this visualization task can be seen in the image below.

(Image by author)
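A grid of count plots like this one can be drawn along the following lines. This is a sketch, not the original notebook code: the DataFrame is a toy stand-in, and the choice of columns and figure size is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for the data-set; values are made up for illustration.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Contract": ["Month-to-month", "One year", "Month-to-month",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "Yes", "No", "No", "No"],
})

categorical = ["gender", "Contract"]

# One panel per categorical feature, bars split by churn label.
fig, axes = plt.subplots(1, len(categorical), figsize=(10, 4))
for ax, col in zip(axes, categorical):
    sns.countplot(data=df, x=col, hue="Churn", ax=ax)
    ax.set_title(col)
fig.tight_layout()
```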

From the above image, I was able to pick out a few interesting points about customers who churn. They are listed below:

  • Churn is roughly comparable between males and females.
  • Churn is low among senior citizens.
  • Churn rate is higher for customers who have phone services.
  • Churn rate among customers with partners & dependents is lower than among customers who don’t have partners & dependents.
  • Customers with an electronic payment method have a higher churn rate compared to other payment methods.
  • Customers with no internet service have a lower churn rate.
  • Churn rate is much higher in the case of Fiber Optic Internet Services.
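Bar heights in a count plot depend on how many customers fall into each group, so a churn rate per group can be a useful cross-check on observations like these. A minimal sketch with pandas, using made-up rows (the values are assumptions, not the real data):

```python
import pandas as pd

# Toy stand-in for the data-set; values are made up for illustration.
df = pd.DataFrame({
    "InternetService": ["Fiber optic", "Fiber optic", "Fiber optic",
                        "DSL", "DSL", "DSL", "DSL", "No", "No"],
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "No"],
})

# normalize="index" turns each row of counts into shares that sum to 1,
# so the "Yes" column is the churn rate within each group.
rates = pd.crosstab(df["InternetService"], df["Churn"], normalize="index")["Yes"]
print(rates.sort_values(ascending=False))
```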

Such visualizations often help companies discover probable causes of customer churn. For example, customers who make their payments electronically are more likely to churn, possibly because of some inconvenience they face while making electronic payments, which the company can look into. Similarly, customers who have opted for Fiber Optic Internet Services contribute significantly to the company’s attrition rate. This may hint at customer dissatisfaction with the Fiber Optic Internet Services offered by the company, which the company can investigate and resolve at the earliest. Data Visualization is an important step since it provides actionable insights and helps a company make informed decisions.

Step 4: Data Engineering

With the data exploration stage complete and a good understanding of the data-set in hand, I moved on to the data engineering part of the project. First, I dropped the "Customer Id" column, since a customer identifier does not contribute to the predictive capability of the ML model. Next, I converted all the categorical features into one-hot encoded features. Lastly, I split the data-set into training and testing data so that the ML model can be trained on one part and evaluated on the other.
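The three steps can be sketched as follows. The DataFrame is a toy stand-in, the identifier column is assumed to be named "customerID", and the 20% test share, the stratified split, and the random seed are assumptions rather than settings taken from the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the data-set; values are made up for illustration.
df = pd.DataFrame({
    "customerID": [f"C{i:03d}" for i in range(10)],
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10, 28, 62],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70,
                       99.65, 89.10, 29.75, 104.80, 56.15],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Month-to-month", "Month-to-month",
                 "Two year", "Month-to-month", "Two year"],
    "Churn": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"],
})

# 1. Drop the identifier column.
df = df.drop(columns=["customerID"])

# 2. Encode the target as 0/1 and one-hot encode the categorical features.
y = (df["Churn"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns=["Churn"]))

# 3. Hold out part of the data for testing (assumed 20%, stratified on the label).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X.columns.tolist())
print(X_train.shape, X_test.shape)
```

Numeric columns pass through get_dummies unchanged, while each category of "Contract" becomes its own indicator column.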

Machine Learning Model Creation and Assessment

Step 1: Train a Machine Learning model.

With the data exploration and engineering stages complete, I moved on to training my Machine Learning model. Customer churn prediction is a classification problem; therefore, I used the Logistic Regression algorithm to train my model. In my opinion, Logistic Regression is fairly easy to implement and interpret, and very efficient to train. Moreover, it works well on numerical data. Lastly, the "l1" and "l2" regularization techniques that can be used with Logistic Regression help keep the model from over-fitting.

I imported the Logistic Regression model from the sklearn library and called the "fit" function on the training data. As soon as the fit operation was complete, I moved on to the next step, which was to assess the performance of my ML model.
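In code, the training step looks roughly like this. The sketch fits on synthetic data generated with NumPy (an assumption, so that it runs on its own) and uses scikit-learn's default L2 regularization; the max_iter and C values are illustrative choices, not the project's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features: 200 rows, 3 numeric columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic label that depends mostly on the first feature.
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# L2 regularization is applied by default; C is the inverse of its strength.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# A smaller C means stronger regularization and smaller coefficients.
strong_reg = LogisticRegression(C=0.01, max_iter=1000)
strong_reg.fit(X_train, y_train)

print(model.coef_)
print(strong_reg.coef_)
```

Comparing the two sets of coefficients shows how regularization shrinks the weights, which is the mechanism that limits over-fitting.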

Step 2: Assessing Model Performance.

This is the final step of my Machine Learning project: testing the performance of my ML model. This step is crucial since it lets me gauge the accuracy of the model on unseen customer data. To determine the performance, I used the test data and calculated the accuracy score as well as the confusion matrix for the predicted labels. The accuracy score for the Logistic Regression model was about 81.6%: of the 1,407 labels in the test data, 1,148 were predicted correctly and 259 were predicted incorrectly.
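Both metrics are available in sklearn.metrics. The sketch below applies them to ten hand-written labels (made up for illustration) and then redoes the arithmetic on the reported counts: 1,148 correct out of 1,148 + 259 = 1,407 test records is about 0.816, which matches the reported score up to rounding.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hand-written labels for illustration: 1 = churned, 0 = did not churn.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
print(acc)  # 0.8
print(cm)   # [[5 1]
            #  [1 3]]

# Accuracy implied by the counts reported in the article.
correct, incorrect = 1148, 259
reported_accuracy = correct / (correct + incorrect)
print(round(reported_accuracy, 3))  # 0.816
```

With imbalanced classes, the confusion matrix is worth reading alongside accuracy, since it separates missed churners from false alarms.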


Conclusion

This Machine Learning project on Customer Churn Prediction was a new learning experience for me. It allowed me to work with various visualization libraries and made me realize the importance of Data Visualization. It also gave me the opportunity to work with a new ML algorithm, Logistic Regression. The project was a prime example of Machine Learning in action in the real world, and working on it helped me hone my skills in the domain of Machine Learning.


The workflow I followed for this project can be found on my _GitHub_ profile. I hope you enjoyed reading my blog.

