Data Driven Growth with Python

Customer Segmentation

Segmentation by RFM clustering

Barış Karaman

Published in

Towards Data Science

6 min read · May 4, 2019

Introduction

This series of articles is designed to explain, in a simple way, how to use Python to fuel your company’s growth by applying a predictive approach to all your actions. It will be a combination of programming, data analysis, and machine learning.

I will cover all the topics in the following nine articles:

1- Know Your Metrics

2- Customer Segmentation

3- Customer Lifetime Value Prediction

4- Churn Prediction

5- Predicting Next Purchase Day

6- Predicting Sales

7- Market Response Models

8- Uplift Modeling

9- A/B Testing Design and Execution

Each article has its own code snippets so that you can apply them easily. If you are brand new to programming, you can find a good introduction to Python and Pandas (a popular library that we will use throughout) here. Even without a coding background, you can still learn the concepts, how to use your data, and how to start generating value from it:

Sometimes you gotta run before you can walk — Tony Stark

As a pre-requisite, be sure Jupyter Notebook and Python are installed on your computer. The code snippets will run on Jupyter Notebook only.

Alright, let’s start.

Part 2: Customer Segmentation

In the previous article, we analyzed the major metrics for our online retail business. Now we know what to track and how to track it using Python. It’s time to focus on customers and segment them.

But first off, why do we do segmentation?

Because you can’t treat every customer the same way, with the same content, the same channel, and the same importance. If you do, they will find another option that understands them better.

Customers who use your platform have different needs, and each has a different profile. You should adapt your actions accordingly.

You can do many different segmentations depending on what you are trying to achieve. If you want to increase the retention rate, you can segment based on churn probability and take action. But there are also some very common and useful segmentation methods. Now we are going to apply one of them to our business: RFM.

RFM stands for Recency - Frequency - Monetary Value. Theoretically we will have segments like below:

Low Value: Customers who are less active than others, are not very frequent buyers/visitors, and generate very low, zero, or maybe even negative revenue.
Mid Value: In the middle of everything. They use our platform often (but not as much as our High Values), are fairly frequent, and generate moderate revenue.
High Value: The group we don’t want to lose. High revenue, high frequency, and low inactivity.

As the methodology, we need to calculate Recency, Frequency and Monetary Value (we will call it Revenue from now on) and apply unsupervised machine learning to identify different groups (clusters) for each. Let’s jump into coding and see how to do RFM Clustering.

Recency

To calculate recency, we need to find the most recent purchase date of each customer and see how many days they have been inactive. Once we have the number of inactive days for each customer, we will apply K-means* clustering to assign each customer a recency score.

For this example, we will continue using the same dataset, which can be found here. Before jumping into the recency calculation, let’s recap the data work we’ve done before.
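As a rough sketch, the recap could look like the snippet below. A tiny inline sample stands in for the real CSV file so that the snippet runs on its own. The column names (CustomerID, InvoiceDate, Quantity, UnitPrice, Country) and the idea of keeping a single country are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for the real dataset.
# Column names are assumed, not taken from the article text.
tx_data = pd.DataFrame({
    "CustomerID": [17850, 17850, 13047, 12583, 13047],
    "InvoiceDate": ["2011-11-01 08:26", "2011-12-01 09:01", "2011-06-15 10:03",
                    "2011-12-05 11:45", "2011-12-08 14:20"],
    "Quantity": [6, 2, 8, 24, 3],
    "UnitPrice": [2.55, 3.39, 2.75, 3.75, 4.25],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom",
                "France", "United Kingdom"],
})

# Convert the date strings into real datetime values
tx_data["InvoiceDate"] = pd.to_datetime(tx_data["InvoiceDate"])

# Keep a single country (an assumption for this sketch)
tx_uk = tx_data.query("Country == 'United Kingdom'").reset_index(drop=True)
print(tx_uk.head())
```

With the real data, you would replace the inline sample with a pd.read_csv() call on your own file.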

Now we can calculate recency:
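A minimal sketch of the recency calculation, assuming a transactions dataframe tx_uk with CustomerID and InvoiceDate columns (hypothetical sample rows below). Recency is measured here against the latest purchase date in the data, which is one reasonable choice of reference date.

```python
import pandas as pd

# Hypothetical transactions; column names are assumptions
tx_uk = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-01", "2011-06-15",
                                   "2011-12-05", "2011-12-09"]),
})

# Most recent purchase date for each customer
tx_max_purchase = tx_uk.groupby("CustomerID")["InvoiceDate"].max().reset_index()
tx_max_purchase.columns = ["CustomerID", "MaxPurchaseDate"]

# Days between the latest purchase in the whole dataset and each customer's last purchase
snapshot_date = tx_max_purchase["MaxPurchaseDate"].max()
tx_max_purchase["Recency"] = (snapshot_date - tx_max_purchase["MaxPurchaseDate"]).dt.days

# One row per customer with the recency value
tx_user = tx_max_purchase[["CustomerID", "Recency"]]
print(tx_user)
```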

Our new dataframe tx_user contains recency data now:

To get a snapshot of what recency looks like, we can use pandas’ .describe() method. It shows the mean, min, max, count and percentiles of our data.

We see that even though the average recency is 90 days, the median is 49.

Our code snippet above has a histogram output that shows how recency is distributed across our customers.
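To see why the mean and the median can differ this much, here is a small sketch with made-up recency values (not the article's data). A long right tail of very inactive customers pulls the mean well above the median.

```python
import pandas as pd

# Made-up, right-skewed recency values in days
tx_user = pd.DataFrame({"Recency": [1, 3, 7, 14, 30, 49, 60, 120, 200, 370]})

# count, mean, std, min, percentiles and max in one call
summary = tx_user["Recency"].describe()
print(summary)

# The "50%" row is the median
print("mean:", summary["mean"], "median:", summary["50%"])
```

Calling tx_user["Recency"].hist() (which needs matplotlib) would draw the matching histogram.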

Now for the fun part. We are going to apply K-means clustering to assign a recency score. But we have to tell the K-means algorithm how many clusters we need. To find out, we will apply the Elbow Method. The Elbow Method plots inertia (the within-cluster sum of squared distances) against the number of clusters and points us to the cluster count after which adding more clusters stops reducing inertia significantly. The code snippet and inertia graph are as follows:
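A minimal sketch of the Elbow Method with scikit-learn, run on randomly generated recency values (an assumption, so that the snippet is self-contained). The parameters n_init=10 and random_state=42 are illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic recency values: three loose groups of customers
rng = np.random.default_rng(0)
recency = np.concatenate([
    rng.normal(20, 8, 120),
    rng.normal(150, 20, 60),
    rng.normal(320, 25, 30),
]).clip(min=0)
tx_user = pd.DataFrame({"Recency": recency})

# Fit K-means for k = 1..9 and record the inertia of each fit
sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(tx_user[["Recency"]])
    sse[k] = kmeans.inertia_

for k, inertia in sse.items():
    print(k, round(inertia, 1))
```

Plotting the keys of sse against its values gives the inertia graph. The "elbow" is the point where the curve flattens out.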

Inertia graph:

Here it looks like 3 is the optimal one. Based on business requirements, we can go ahead with fewer or more clusters. We will be selecting 4 for this example:
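Fitting four clusters and attaching the labels to each customer could look roughly like this. The recency values are random placeholders and the column name RecencyCluster is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Placeholder recency values for 200 hypothetical customers
rng = np.random.default_rng(0)
tx_user = pd.DataFrame({"Recency": rng.integers(0, 374, size=200)})

# Fit 4 clusters on recency and store each customer's cluster label
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
tx_user["RecencyCluster"] = kmeans.fit_predict(tx_user[["Recency"]])

# Compare the recency profile of each cluster
print(tx_user.groupby("RecencyCluster")["Recency"].describe())
```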

We have calculated clusters and assigned them to each Customer in our dataframe tx_user.

We can see how our recency clusters have different characteristics. The customers in Cluster 1 are very recent compared to Cluster 2.

We have added one function to our code, order_cluster(). K-means assigns clusters as numbers, but not in an ordered way. We can’t say cluster 0 is the worst and cluster 3 is the best. The order_cluster() function takes care of this for us, and our new dataframe looks much neater:
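One possible way to write such a helper is sketched below. The signature and the implementation are assumptions; the idea is simply to relabel the clusters by the mean of the target column.

```python
import pandas as pd

def order_cluster(cluster_field_name, target_field_name, df, ascending):
    """Relabel clusters as 0..k-1, ordered by the mean of the target column."""
    means = df.groupby(cluster_field_name)[target_field_name].mean()
    ordered_labels = means.sort_values(ascending=ascending).index
    # Map each old label to its rank in the sorted order
    mapping = {old: new for new, old in enumerate(ordered_labels)}
    df_final = df.copy()
    df_final[cluster_field_name] = df_final[cluster_field_name].map(mapping)
    return df_final

# Hypothetical customers with arbitrary cluster labels
tx_user = pd.DataFrame({
    "Recency": [2, 5, 300, 310, 100, 110, 40, 45],
    "RecencyCluster": [2, 2, 0, 0, 3, 3, 1, 1],
})

# ascending=False: the highest mean recency (most inactive) becomes cluster 0
tx_user = order_cluster("RecencyCluster", "Recency", tx_user, ascending=False)
print(tx_user)
```

For Frequency and Revenue, where higher values are better, the same helper would be called with ascending=True.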

Great! 3 covers most recent customers whereas 0 has the most inactive ones.

Let’s apply the same to Frequency and Revenue.

Frequency

To create frequency clusters, we need to find the total number of orders for each customer. First, calculate this and see what frequency looks like in our customer database:
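A minimal sketch of the frequency calculation, assuming one row per order in tx_uk (hypothetical sample rows). If your data has one row per line item instead, counting distinct invoice identifiers would be closer to the number of orders.

```python
import pandas as pd

# Hypothetical transactions, one row per order
tx_uk = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceDate": pd.to_datetime(["2011-11-01", "2011-12-01", "2011-06-15",
                                   "2011-10-05", "2011-11-20", "2011-12-09"]),
})
tx_user = pd.DataFrame({"CustomerID": [1, 2, 3], "Recency": [8, 177, 0]})

# Count the orders of each customer
tx_frequency = tx_uk.groupby("CustomerID")["InvoiceDate"].count().reset_index()
tx_frequency.columns = ["CustomerID", "Frequency"]

# Add the frequency to the per-customer dataframe
tx_user = pd.merge(tx_user, tx_frequency, on="CustomerID")
print(tx_user)
```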

Apply the same logic to create frequency clusters and assign them to each customer:

Characteristics of our frequency clusters look like below:

Following the same convention as the recency clusters, a higher frequency cluster number indicates better customers.

Revenue

Let’s see what our customer database looks like when we cluster customers based on revenue. We will calculate revenue for each customer, plot a histogram and apply the same clustering method.
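The revenue step could be sketched like this, assuming UnitPrice and Quantity columns (hypothetical sample rows). A negative quantity stands for a return, which is how a customer can end up with negative revenue.

```python
import pandas as pd

# Hypothetical transactions; a negative Quantity represents a return
tx_uk = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "Quantity": [6, 2, -4, 10],
    "UnitPrice": [2.5, 3.0, 5.0, 1.5],
})

# Revenue of each transaction row
tx_uk["Revenue"] = tx_uk["UnitPrice"] * tx_uk["Quantity"]

# Total revenue per customer
tx_revenue = tx_uk.groupby("CustomerID")["Revenue"].sum().reset_index()
print(tx_revenue)
```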

We have some customers with negative revenue as well. Let’s continue and apply k-means clustering:

Overall Score

Awesome! We have scores (cluster numbers) for recency, frequency & revenue. Let’s create an overall score out of them:
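A minimal sketch of the overall score, assuming the three ordered cluster columns are named RecencyCluster, FrequencyCluster and RevenueCluster (hypothetical values below). With four clusters per dimension the sum can range from 0 to 9 in principle; which scores actually show up depends on the data.

```python
import pandas as pd

# Hypothetical customers with their ordered cluster numbers (0 = worst, 3 = best)
tx_user = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Recency": [8, 177, 30],
    "Frequency": [40, 1, 12],
    "Revenue": [5200.0, -20.0, 900.0],
    "RecencyCluster": [3, 0, 2],
    "FrequencyCluster": [2, 0, 1],
    "RevenueCluster": [3, 0, 1],
})

# Overall score = sum of the three cluster numbers
cluster_cols = ["RecencyCluster", "FrequencyCluster", "RevenueCluster"]
tx_user["OverallScore"] = tx_user[cluster_cols].sum(axis=1)

# Average RFM values for each overall score
print(tx_user.groupby("OverallScore")[["Recency", "Frequency", "Revenue"]].mean())
```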

The scoring above clearly shows us that customers with a score of 8 are our best customers, whereas those with a score of 0 are the worst.

To keep things simple, it is better to name these scores:

0 to 2: Low Value
3 to 4: Mid Value
5+: High Value

We can easily apply this naming on our dataframe:
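The naming can be sketched as follows, assuming an OverallScore column and a new Segment column (both names are assumptions). Every customer starts as Low Value and is then upgraded according to the thresholds listed above.

```python
import pandas as pd

# Hypothetical overall scores
tx_user = pd.DataFrame({"OverallScore": [0, 2, 3, 4, 5, 8]})

# 0 to 2: Low Value, 3 to 4: Mid Value, 5+: High Value
tx_user["Segment"] = "Low Value"
tx_user.loc[tx_user["OverallScore"] > 2, "Segment"] = "Mid Value"
tx_user.loc[tx_user["OverallScore"] > 4, "Segment"] = "High Value"
print(tx_user)
```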

Now for the best part. Let’s see how our segments are distributed on a scatter plot:

You can see how the segments are clearly differentiated from each other in terms of RFM. You can find the code snippets for graphs below:
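As an illustration, a Frequency vs. Revenue scatter plot colored by segment could be drawn as below. This sketch uses matplotlib, which is a choice made here and not necessarily the library behind the graphs above. The data points and colors are made up.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up customers with their segment names
tx_graph = pd.DataFrame({
    "Frequency": [3, 8, 12, 60, 75, 140, 300, 420],
    "Revenue": [40, 120, 90, 900, 1100, 2500, 5200, 7000],
    "Segment": ["Low Value", "Low Value", "Low Value", "Mid Value",
                "Mid Value", "Mid Value", "High Value", "High Value"],
})
colors = {"Low Value": "tab:blue", "Mid Value": "tab:green", "High Value": "tab:red"}

# One scatter series per segment so that each gets its own color and legend entry
fig, ax = plt.subplots()
for segment, group in tx_graph.groupby("Segment"):
    ax.scatter(group["Frequency"], group["Revenue"],
               label=segment, color=colors[segment], alpha=0.8)

ax.set_xlabel("Frequency")
ax.set_ylabel("Revenue")
ax.set_title("Segments")
ax.legend()
```

Swapping the x and y columns for Recency gives the other two views (Recency vs. Revenue and Recency vs. Frequency). Outside Jupyter, call plt.show() to display the figure.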

We can start taking actions with this segmentation. The main strategies are quite clear:

High Value: Improve Retention
Mid Value: Improve Retention + Increase Frequency
Low Value: Increase Frequency

Getting more and more exciting! In Part 3, we will be calculating and predicting lifetime value of our customers.

You can find the jupyter notebook for this article here.

*Ideally, what we do here can be easily achieved by using quantiles or simple binning (or Jenks natural breaks optimization to make groups more accurate) but we are using k-means to get familiar with it.
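For comparison, here is a sketch of the quantile alternative mentioned in this note, using pandas’ qcut on made-up recency values. The column name RecencyScore is an assumption. Since a lower recency is better, the quartile numbers are flipped so that the most recent customers get 3.

```python
import pandas as pd

# Made-up recency values in days
tx_user = pd.DataFrame({"Recency": [1, 5, 12, 30, 49, 75, 120, 200]})

# qcut puts the lowest quartile in bin 0; flip it so recent customers score highest
quartile = pd.qcut(tx_user["Recency"], q=4, labels=False)
tx_user["RecencyScore"] = 3 - quartile
print(tx_user)
```

On real data with many repeated values, qcut may need duplicates="drop" to handle identical bin edges.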

I’ve started to write the more advanced and updated version of my articles here. Feel free to visit, learn more and support.