Using Machine Learning to Predict Customers’ Next Purchase Day
A machine learning model to predict whether customers will make their next purchase within a given period.
Introduction
If there is one major lesson that those in the retail business have learnt from the SARS-CoV-2 pandemic, it is the need to move business onto the Internet, i.e., e-commerce. E-commerce helps those in managerial positions make decisions for the progress of their companies. Undoubtedly, most of these decisions are informed by the results of studying the purchasing behaviour data of online customers, work carried out by experts in data analysis, data science, and machine learning.
Suppose the managerial team of an online retail shop approaches you, a data scientist, with their dataset, wanting to know whether customers will make their next purchase within 90 days of the day they made their last purchase. Your answer to their inquiry will help them identify which customers their marketing team should focus on in the next promotional offers they roll out.
In this article, my goal as a data scientist is to build a model that provides a suitable answer to the question posed by the firm's managers. More precisely, using the given dataset, I build a machine learning model that predicts whether an online customer of a retail shop will make their next purchase within 90 days of the day they made their last purchase.
It is worth mentioning that Barış Karaman¹ has done similar work, although it answers a different question and not the exact forecast we seek here.
Some Information about the Dataset
Before proceeding to answer the question of interest, I will first present some general and useful information about the dataset.
The dataset recorded 5942 online customers from 43 different countries. Among the online customers of the retail shop, 90.8% of them were living in the United Kingdom.
With such a huge customer base in the United Kingdom, it is not surprising that 83% of the company’s revenue came from the United Kingdom.
Figure 3 below gives a visual representation of the monthly revenue earned by the online retail company.
Here, one can observe that the company recorded its highest revenue in the month of November 2010, followed by November 2011. In addition, there is a rise in monthly revenue after August.
The analysis made in this section also suggests advice one can give to the managers for consideration. In the company's bid to increase its customer base in countries other than the United Kingdom, what could a data scientist suggest to the managerial team? My answer is the following.
Since the company has a solid customer base in the United Kingdom, it could capitalise on that and roll out a “win-win promotion”. Specifically, for any product that an existing customer buys, he/she gets the opportunity to invite a new customer outside the United Kingdom via a web link. If the new customer buys something from the online shop using that web link, the company gives both the existing customer and the new customer a cash voucher that can be used towards their next purchase. This way the company, the existing customer, and the new customer all gain from the transaction.
Predicting Customers’ Next Purchase
In this section, I focus on the methods that I deployed to solve the problem of interest: building a machine learning model that predicts whether an online customer of the retail shop will make their next purchase within 90 days of the day they made their last purchase.
The major steps included the following:
- Data Wrangling
- Feature Engineering
- Building Machine Learning Models
- Selecting Model
I begin by importing the necessary Python packages, downloading the dataset, and loading it into my Python environment. The code snippet below summarises this step.
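As a minimal sketch of this step: the real notebook downloads the full online retail file, so the column names and the tiny in-memory sample below are assumptions standing in for the actual dataset.

```python
from io import StringIO

import pandas as pd

def load_retail_data(source):
    """Read raw transactions and add a Revenue column (Quantity * UnitPrice)."""
    df = pd.read_csv(source, parse_dates=["InvoiceDate"])
    df["Revenue"] = df["Quantity"] * df["UnitPrice"]
    return df

# Tiny in-memory sample standing in for the real dataset file.
sample = StringIO(
    "InvoiceNo,CustomerID,InvoiceDate,Quantity,UnitPrice\n"
    "536365,17850,2010-12-01,6,2.55\n"
    "536366,17851,2010-12-01,2,3.39\n"
)
df_data = load_retail_data(sample)
```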
Data Wrangling
I then wrangle the dataset into good shape so that new features can be introduced. The CustomerID column of the given dataset has 243007 missing entries, which represents 22.77% of all transaction records. Moreover, the Description column has 4382 missing entries. How do I deal with these missing data? After talking to the company leaders, they suggested that any item with a missing CustomerID should be dropped.
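A sketch of the drop on a toy frame (the real df_data has many more columns; only CustomerID matters here):

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_data with some missing CustomerID values.
df_data = pd.DataFrame({
    "CustomerID": [17850.0, np.nan, 17851.0, np.nan],
    "Quantity": [6, 1, 2, 5],
})

n_before = len(df_data)
df_data = df_data.dropna(subset=["CustomerID"])  # drop rows with no CustomerID
pct_dropped = 100 * (n_before - len(df_data)) / n_before
```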
The dataframe df_data is split into two pandas dataframes:
- The first sub-dataframe, ctm_bhvr_dt, contains purchases made by customers from 01–12–2009 to 30–08–2011. From this dataframe, I get the last purchase date of each online customer.
- The second sub-dataframe, ctm_next_quarter, is used to get the first purchase date of the customers from 01–09–2011 to 30–11–2011.
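The split above can be sketched as follows; treating the 01–09–2011 boundary as exclusive for the behaviour window is an assumption about how the cut-off is handled.

```python
import pandas as pd

# Toy transactions spanning the two windows described above.
df_data = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2],
    "InvoiceDate": pd.to_datetime(
        ["2010-12-05", "2011-07-01", "2011-08-15", "2011-09-10"]
    ),
})

cutoff = pd.Timestamp("2011-09-01")
end = pd.Timestamp("2011-12-01")

# Behaviour window: purchases before 01-09-2011.
ctm_bhvr_dt = df_data[df_data["InvoiceDate"] < cutoff].reset_index(drop=True)
# Next quarter: purchases from 01-09-2011 up to 30-11-2011.
ctm_next_quarter = df_data[
    (df_data["InvoiceDate"] >= cutoff) & (df_data["InvoiceDate"] < end)
].reset_index(drop=True)
```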
Next, I create a pandas dataframe that contains a set of features for each customer, which I use to build the prediction model. I begin by creating a dataframe that contains the distinct customers in the dataframe ctm_bhvr_dt.
I then add a new label, NextPurchaseDay, to the dataframe ctm_dt. This new label is the number of days between a customer's last purchase date in the dataframe ctm_bhvr_dt and his/her first purchase date in the dataframe ctm_next_quarter.
Figure 5 below is the output of the code snippet above. It shows the first 5 entries of the dataframe ctm_dt.
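A sketch of the merge that produces NextPurchaseDay; filling customers who never return in the next quarter with a large placeholder (9999) is an assumption, not necessarily the notebook's exact choice.

```python
import pandas as pd

ctm_bhvr_dt = pd.DataFrame({
    "CustomerID": [1, 1, 2],
    "InvoiceDate": pd.to_datetime(["2011-06-01", "2011-08-01", "2011-07-15"]),
})
ctm_next_quarter = pd.DataFrame({
    "CustomerID": [1],
    "InvoiceDate": pd.to_datetime(["2011-09-20"]),
})

# Last purchase date per customer in the behaviour window.
last_purchase = ctm_bhvr_dt.groupby("CustomerID")["InvoiceDate"].max().reset_index()
last_purchase.columns = ["CustomerID", "LastPurchaseDate"]

# First purchase date per customer in the next quarter.
first_purchase = ctm_next_quarter.groupby("CustomerID")["InvoiceDate"].min().reset_index()
first_purchase.columns = ["CustomerID", "FirstPurchaseDate"]

ctm_dt = pd.merge(last_purchase, first_purchase, on="CustomerID", how="left")
ctm_dt["NextPurchaseDay"] = (
    ctm_dt["FirstPurchaseDate"] - ctm_dt["LastPurchaseDate"]
).dt.days
# Customers with no next-quarter purchase get a large placeholder (assumption).
ctm_dt["NextPurchaseDay"] = ctm_dt["NextPurchaseDay"].fillna(9999).astype(int)
```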
In the next section, I introduce some features and add them to the dataframe ctm_dt to build our machine learning model.
Feature Engineering
I introduce features into our dataframe ctm_dt that segment customers into groups based on their value to the company. To do this, I use the RFM segmentation method. RFM stands for
- Recency: how recently a customer made a purchase.
- Frequency: how often, or the number of times, a customer makes purchases.
- Monetary Value/Revenue: the amount of money a customer spends when making a purchase at a point in time.
Using these three features (recency, frequency, and monetary value/revenue), I create an RFM score system to group the customers. Essentially, the derived RFM score gives an insight into what a customer will probably do regarding future purchase decisions.
After calculating the RFM score, I apply unsupervised machine learning to identify different groups (clusters) for each score and add them to the dataframe ctm_dt. Finally, I apply the pandas dataframe method get_dummies to ctm_dt to deal with the categorical features in the dataframe. I now move into the code that computes the RFM scores and the clustering.
Recency
The recency feature helps identify who is likely to make a purchase soon. It captures how long a customer has been inactive since his or her last purchase, and I use it to gauge which customers will return for a transaction. It is also worth noting that a recently purchasing customer is typically worth far more than one who has not bought in a while.
Let us get into the coding here below.
Figure 6 below gives a visual presentation of the recency data of the online customers.
The code used to generate Figure 6 above can be accessed in the Jupyter notebook here.
Next, I need to assign a recency score to each recency value. This can be achieved by applying the K-means clustering algorithm. However, we need to know the number of clusters before using the algorithm. Applying the Elbow Method, one can determine the number of clusters needed for a given dataset. In our case, with the recency values as our data, the number of clusters computed is 4. The code used to compute the number of clusters is available in the Jupyter notebook here.
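A sketch of the elbow computation; the synthetic recency values and the range of k tried here are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic recency values with a few natural groupings.
recency = np.array([1, 2, 3, 50, 52, 55, 200, 210, 400, 410]).reshape(-1, 1)

sse = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(recency)
    sse[k] = km.inertia_  # sum of squared distances to cluster centres

# The "elbow" is the k where the inertia curve stops dropping sharply.
```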
I can now build 4 clusters using the Recency column in the dataframe ctm_dt and create a new column, RecencyCluster, in ctm_dt whose values are the cluster labels predicted by the unsupervised kmeans algorithm. Using the user-defined Python function order_cluster, accessible here, I sort the dataframe ctm_dt in decreasing order of the values in RecencyCluster. The code snippet below outputs Figure 7 below.
Let us group the dataframe ctm_dt by the cluster values in the column labelled RecencyCluster and fetch the statistical description of the Recency data of each of these clusters.
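The order_cluster helper lives in the notebook; the version below is a hedged reconstruction of what such a helper typically does, applied to toy recency data with three well-separated groups (3 clusters instead of 4, for brevity):

```python
import pandas as pd
from sklearn.cluster import KMeans

def order_cluster(cluster_field, target_field, df, ascending):
    """Relabel k-means cluster ids so they are ordered by the mean of target_field."""
    grouped = df.groupby(cluster_field)[target_field].mean().reset_index()
    grouped = grouped.sort_values(by=target_field, ascending=ascending).reset_index(drop=True)
    grouped["index"] = grouped.index
    out = pd.merge(df, grouped[[cluster_field, "index"]], on=cluster_field)
    out = out.drop(columns=[cluster_field]).rename(columns={"index": cluster_field})
    return out

# Toy customers with three well-separated recency groups.
ctm_dt = pd.DataFrame({
    "CustomerID": range(8),
    "Recency": [2, 5, 8, 90, 95, 100, 300, 310],
})

km = KMeans(n_clusters=3, n_init=10, random_state=42)
ctm_dt["RecencyCluster"] = km.fit_predict(ctm_dt[["Recency"]])
# Lower recency means a more recent (better) customer, so sort descending:
ctm_dt = order_cluster("RecencyCluster", "Recency", ctm_dt, False)

# Statistical description of Recency per cluster, as in Figure 8.
cluster_stats = ctm_dt.groupby("RecencyCluster")["Recency"].describe()
```

With this relabelling, the most recent customers end up with the highest cluster label, matching the ordering discussed above.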
From Figure 8 above, it can be observed that cluster value 3 covers the most recent customers whereas 0 has the most inactive customers.
In the next subsections, I apply the method discussed here to the Frequency and Revenue features.
Frequency
As mentioned earlier, frequency captures the number of times a customer has made purchases within a particular frame of time. This characteristic helps us gauge a customer's loyalty to a specific company or trading brand. It thus gives the company insight into which marketing strategies to deploy, and at what points in time, in order to reach such customers.
Here, I conduct a similar procedure of analysis as I did in the previous subsection (Recency).
Figure 10 below illustrates the histogram of customers whose purchase frequency is less than 1200.
The code snippet below assigns a cluster value for the purchase frequency of each customer and sorts the cluster values in decreasing order.
The code snippet below groups the dataframe ctm_dt by the cluster values recorded in the column labelled FrequencyCluster and fetches the statistical description of the Frequency data for each of these cluster values.
As with Recency, customers with a higher frequency cluster value are better customers. In other words, they patronise the products of the retail shop more often than those with a lower frequency cluster value.
Monetary Value/Revenue
Monetary value, or revenue, centres on the money a customer spends when making a purchase at any point in time. It helps to ascertain how much a customer is likely to spend per purchase. Even though this feature does not help predict when the customer's next purchase will occur, knowing how much revenue the customer is likely to bring in is still valuable.
I again follow a similar procedure to obtain a revenue score for each customer and assign cluster values for each customer based on their revenue score.
The figure below gives a visual representation of customers whose revenue is below £10,000.
The code snippet below assigns a cluster value for the revenue of each customer and sorts the cluster values in ascending order.
Overall Score
In the code snippet below, I add a new column OverallScore to the dataframe ctm_dt, with values given by the sum of the cluster values obtained for Recency, Frequency and Revenue.
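A minimal sketch of the sum, with illustrative cluster values for three customers:

```python
import pandas as pd

# Toy cluster values for three customers (illustrative).
ctm_dt = pd.DataFrame({
    "RecencyCluster": [3, 0, 2],
    "FrequencyCluster": [3, 1, 2],
    "RevenueCluster": [2, 2, 1],
})

ctm_dt["OverallScore"] = (
    ctm_dt["RecencyCluster"] + ctm_dt["FrequencyCluster"] + ctm_dt["RevenueCluster"]
)
```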
The scoring above shows that customers with an overall score of 8 are the outstanding customers who bring the most value to the company, whereas those with a score of 3 are the least reliable.
As a follow-up, I group the customers into segments based on their overall score as follows:
- 3 to 4: Low Value
- 5 to 6: Mid Value
- 7 to 8: High Value
The code snippet is as follows:
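A sketch of the mapping, implemented with simple threshold assignments (the exact notebook code may differ):

```python
import pandas as pd

ctm_dt = pd.DataFrame({"OverallScore": [3, 4, 5, 6, 7, 8]})

# Map the overall score onto the three segments described above.
ctm_dt["Segment"] = "Low-Value"
ctm_dt.loc[ctm_dt["OverallScore"] > 4, "Segment"] = "Mid-Value"
ctm_dt.loc[ctm_dt["OverallScore"] > 6, "Segment"] = "High-Value"
```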
I then create a copy of the dataframe ctm_dt and apply the method get_dummies to it, so as to convert the categorical column Segment to indicator variables.
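A sketch of the one-hot encoding step on a toy frame:

```python
import pandas as pd

ctm_dt = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Segment": ["Low-Value", "Mid-Value", "High-Value"],
})

# One-hot encode the categorical Segment column on a copy of the frame.
ctm_class = pd.get_dummies(ctm_dt.copy(), columns=["Segment"])
```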
In pursuance of my goal to estimate whether a customer will make a purchase in the next quarter, I create a new column NextPurchaseDayRange with values either 1 or 0, defined as follows:
- A value of 1 indicates that the customer will make their next purchase within 90 days of his or her last purchase.
- A value of 0 indicates that the customer will make their next purchase more than 90 days after his or her last purchase.
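The labelling can be sketched as below; treating exactly 90 days as "within" is an assumption about the boundary.

```python
import pandas as pd

ctm_class = pd.DataFrame({"NextPurchaseDay": [30, 90, 120, 9999]})

# 1 if the next purchase falls within 90 days of the last one, else 0.
# Treating exactly 90 days as "within" is an assumption about the boundary.
ctm_class["NextPurchaseDayRange"] = (ctm_class["NextPurchaseDay"] <= 90).astype(int)
```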
I conclude this section by computing the correlation between our features and the label. I achieve this by applying the corr method to the dataframe ctm_class.
From Figure 18 above, it can be seen that OverallScore has its highest positive correlation, 0.97, with RecencyCluster, and Segment_Low-Value has its highest negative correlation, -0.99, with Segment_Mid-Value.
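The correlation computation itself is a one-liner; the columns and synthetic data below are illustrative stand-ins for ctm_class.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a few columns of ctm_class.
rng = np.random.default_rng(7)
recency = rng.integers(1, 365, size=100)
ctm_class = pd.DataFrame({
    "Recency": recency,
    # A column deliberately anti-correlated with Recency, for illustration.
    "RecencyCluster": -recency + rng.integers(0, 10, size=100),
    "Revenue": rng.integers(10, 1000, size=100),
})

corr_matrix = ctm_class.corr()
```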
In Figure 19 below, I present a visualisation of the correlation matrix. The code snippet is below.
Building Machine Learning Models
In this section, I have all the necessary prerequisites to build the machine learning model. The code snippet below separates the dataframe ctm_class into the feature matrix X and the target variable y. Afterwards, I split X and y to get the training and test datasets, and then measure the accuracy, F₁-score, recall, and precision of the different models.
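A sketch of the split-and-evaluate loop; the synthetic X and y stand in for the engineered features of ctm_class, and only two of the compared models are shown here for brevity.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the engineered features and the label.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

results = {}
for name, model in [
    ("LogisticRegression", LogisticRegression()),
    ("DecisionTree", DecisionTreeClassifier(random_state=42)),
]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "precision": precision_score(y_test, pred),
    }
```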
From the results in Figure 20 above, we see that the LogisticRegression model is the best in terms of accuracy and F₁-score.
Let’s see whether the XGB classifier, which ranks fourth in Figure 20 above, can be improved by finding suitable parameters to control its learning process. This process is called hyperparameter tuning. I then use the computation below to verify whether the improved XGB classifier outperforms the LogisticRegression model.
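Since xgboost may not be installed in every environment, the sketch below tunes sklearn's GradientBoostingClassifier as a stand-in; with xgboost available, the same GridSearchCV call works with XGBClassifier and a grid over max_depth and min_child_weight. The data and grid values are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the training data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid over tree depth and leaf size (the min_child_weight analogue in sklearn).
param_grid = {"max_depth": [2, 3, 4], "min_samples_leaf": [1, 3, 5]}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.best_estimator_.score(X_test, y_test)
```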
Selecting Model
Comparing the accuracy of the LogisticRegression model in Figure 20 above with that of the refined XGB classifier in Figure 22 above, it is clear that the refined XGB classifier model is more accurate than the LogisticRegression model by a margin of 0.1. How about the other metrics?
It is clear from the output in Figure 23 above that for each metric (accuracy, F₁-score, recall, and precision), the refined XGB classifier model outperforms the LogisticRegression model.
In forecasting whether a customer will make another purchase within 90 days of their last purchase, accuracy matters. I am therefore interested in the model that gives the highest possible accuracy, and so the improved XGB classifier model is the better choice over the LogisticRegression model.
Conclusion
From the dataset, I highlighted the fact that the online shop's strong customer base in the United Kingdom is a major reason for the high revenue the company earns from that region.
I also gave a detailed demonstration of how to build a machine learning model to predict whether an online customer of the retail shop will make their next purchase within 90 days of the day they made their last purchase. Among the models I used, I had to further improve the XGB classifier through hyperparameter tuning before it could outperform the LogisticRegression model. The initial tuning of the XGB classifier, with max_depth and min_child_weight both set to 3, did not outperform the LogisticRegression model, so I tweaked these values heuristically until the XGB classifier did.
The above notwithstanding, it would be interesting to investigate in further work how the model's accuracy and F₁-score can be improved further. I suggest improving the dataset by introducing the “right” features so as to avoid the hyperparameter tuning process altogether. So my question now stands:
What features would be appropriate to introduce into the dataset so as to reach or exceed the model's accuracy and F₁-score metrics without hyperparameter tuning?
The Jupyter notebook used for this article is available here.
Reference
[1] Barış Karaman. (Accessed on April 28, 2021) Predicting Next Purchase Day