
Unlocking eCommerce growth with machine learning and behavioural psychology

Segmentation and review score prediction of 100,000 online shoppers – and what that means for the business

Photo by rupixen.com on Unsplash

The Problem

Marketers and researchers have long relied on demographics and self-reported psychographics for customer segmentation. There’s no denying that this information is essential to addressing customer needs. However, as insights practices evolve, we now know that segmentation based on purchase behaviour is far more effective, leading to better return on ad spend and higher conversions.

There is also a psychological basis for why behavioural segmentation might lead to better insights – Nobel prize-winning economist Daniel Kahneman found that shoppers tend to operate in a ‘System 1’ mindset (fast and intuitive) rather than ‘System 2’ (slow and deliberate).

After reading about Recency Frequency Monetary (RFM) segmentation, I wanted to experiment with this method on a large dataset and explore what insights it can uncover.


In a nutshell, here’s what I found –

  1. K-means clustering helps determine shopper segments based on behaviour – revealing immediate opportunities for the business.
  2. The highest-value customers leave the worst reviews, tend to be located in Maranhão, and shop most during Q2. In contrast, the lowest-value customers leave the best reviews, tend to be located in São Paulo, and shop most during Q1.
  3. I was able to predict customer segments with 93% accuracy and review scores with 94% accuracy.

The data

Olist is the largest e-commerce marketplace in Brazil, connecting small retailers from all over the country to sell directly to customers. The business has generously shared a large dataset containing around 100k orders placed on its site from 2016 to 2018.

The SQL-style relational database links customers to their orders on the site – around 100k unique orders across 73 product categories – and includes item prices, timestamps, reviews, and the geolocation associated with each order. This is real commercial data that has been anonymised.

The methodology

RFM is a data modelling method used to analyse customer behaviour. It stands for –

Recency measures the time (in days) between a customer’s last order and the observation date.

Frequency measures how many orders the customer placed in total.

Monetary is the average amount they spent on those orders.

By segmenting customers through these three lenses, we can pinpoint clusters of customers that behave a certain way and create personas – which in turn leads to better marketing results. RFM analysis can help a business figure out what the challenges and opportunities are, where to focus, and what to do.

I used K-means clustering (unsupervised machine learning) to determine the RFM clusters, and then assigned an overall score to each customer – splitting them into high-value, mid-value and low-value segments.

import pandas as pd

# create a generic user dataframe to hold customer_unique_id and the new segmentation scores
df_user = pd.DataFrame(df['customer_unique_id'].unique())
df_user.columns = ['customer_unique_id']
# get the max purchase date for each customer and create a dataframe with it
df_max_purchase = df.groupby('customer_unique_id').order_purchase_timestamp.max().reset_index()
df_max_purchase.columns = ['customer_unique_id', 'MaxPurchaseDate']
# take the max purchase date in the dataset as the observation point
df_max_purchase['Recency'] = (df_max_purchase['MaxPurchaseDate'].max() - df_max_purchase['MaxPurchaseDate']).dt.days
# merge this dataframe into the user dataframe
df_user = pd.merge(df_user, df_max_purchase[['customer_unique_id', 'Recency']], on='customer_unique_id')
df_user.head()
Image by Author
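Frequency and Monetary can be derived in much the same way. Here is a minimal sketch on a toy dataframe – the `order_id` and `payment_value` column names are assumptions standing in for the Olist schema:

```python
import pandas as pd

# hypothetical order-level data standing in for the Olist dataset
df = pd.DataFrame({
    'customer_unique_id': ['a', 'a', 'b', 'c'],
    'order_id': ['o1', 'o2', 'o3', 'o4'],
    'payment_value': [50.0, 150.0, 30.0, 200.0],
})

# Frequency: total number of orders per customer
df_frequency = df.groupby('customer_unique_id').order_id.nunique().reset_index()
df_frequency.columns = ['customer_unique_id', 'Frequency']

# Monetary: average spend per order for each customer
df_monetary = df.groupby('customer_unique_id').payment_value.mean().reset_index()
df_monetary.columns = ['customer_unique_id', 'Monetary']

# combine into one per-customer dataframe
df_user = pd.merge(df_frequency, df_monetary, on='customer_unique_id')
```

Each of these columns is then clustered separately with K-means, just like Recency above.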

To validate the number of clusters, I used the ‘Elbow Method’ with K-means clustering.

This method estimates the optimal value of K. In the visualisation below, the ‘elbow’ is the point where the rate of decline in distortion flattens out – in other words, if the plot looks like an arm, the elbow is where the forearm begins.
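The elbow computation itself is a short loop over candidate K values, recording the inertia (within-cluster sum of squared errors) for each fit. A minimal sketch, using synthetic one-dimensional recency data in place of the real column:

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic "Recency" values drawn from 5 separated groups (illustrative only)
rng = np.random.default_rng(42)
recency = np.concatenate(
    [rng.normal(loc, 5, 200) for loc in (10, 60, 120, 250, 400)]
).reshape(-1, 1)

# fit K-means for a range of K and record the inertia for each
sse = {}
for k in range(1, 10):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(recency)
    sse[k] = kmeans.inertia_

# plotting sse.keys() against sse.values() produces the elbow chart below
```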

[Image by Author] Using the elbow method, we can determine there are 5 ideal clusters for Recency

After I applied the same method to Frequency and Monetary, I segmented customers based on their overall cluster scores. Scores of 3 or below were classified as ‘Low-value’, 4–6 as ‘Mid-value’, and above 6 as ‘High-value’.

Another way of doing this would be to assign weights or split segments even further, but I decided to keep it simple for this analysis.
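The overall score itself can be a simple sum of the three cluster labels. A minimal sketch, assuming hypothetical `RecencyCluster`, `FrequencyCluster` and `MonetaryCluster` columns that have already been ordered so that a higher label means better behaviour on that dimension:

```python
import pandas as pd

# hypothetical per-customer cluster labels (0-4), ordered so higher = better
df_user = pd.DataFrame({
    'customer_unique_id': ['a', 'b', 'c'],
    'RecencyCluster': [4, 2, 0],
    'FrequencyCluster': [3, 2, 0],
    'MonetaryCluster': [4, 1, 0],
})

# the overall RFM score is the sum of the three cluster labels
df_user['OverallScore'] = df_user[
    ['RecencyCluster', 'FrequencyCluster', 'MonetaryCluster']
].sum(axis=1)
```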

#assigning value labels to segments
df_user['Segment'] = 'Low-Value'
df_user.loc[df_user['OverallScore']>3,'Segment'] = 'Mid-Value' 
df_user.loc[df_user['OverallScore']>6,'Segment'] = 'High-Value'
[Image by Author] Although only 10% of the customer base, high value customers account for 50% of the revenue!

From a machine learning perspective, this was a multi-class classification problem. Predictive modelling of the value segments turned out fairly accurate, achieving an F1 score of 93% with a Random Forest model.

[Image by Author] Final Model: Random Forest | F1 Score: 93%
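For reference, here is a minimal, self-contained sketch of the multi-class setup evaluated with a weighted F1 score. The synthetic data merely stands in for the real RFM features; it is not the article’s actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the RFM feature matrix and 3 value segments
X, y = make_classification(n_samples=2000, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# fit a Random Forest and score held-out predictions
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)

# weighted F1 accounts for the imbalance between segments
score = f1_score(y_test, pred, average='weighted')
```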

The insights

Exploring the 3 segments turned out to be very useful and the differences are clear. Each segment has distinct purchase patterns, locations and top product categories. This information can inform the marketing team on how to best strategize for these customers.

[Image by Author] High value customers have clear category preferences – e.g. driving computer accessory sales in February
[Image by Author] Macro-level personas of the 3 segments

Perhaps the most interesting difference is how the 3 segments leave reviews. The highest-value customers are responsible for the worst reviews, which is a HUGE red flag for the company. For any eCommerce ecosystem, reviews are critical to nurturing the seller-buyer relationship. Therefore, predicting customer review scores and understanding what drives positive reviews is the logical next step.

[Image by Author] High-value customers leave an average review score of only 3.2

This led me down the rabbit hole of predicting customer reviews. Reviews were binarised into 0 for negative (scores 1–3) and 1 for positive (scores 4–5). The overall distribution is skewed towards the positive class.
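The binarisation is a one-liner in pandas; a minimal sketch on toy review scores (the `review_score` column name follows the Olist dataset’s style):

```python
import pandas as pd

# hypothetical review scores on Olist's 1-5 scale
df_reviews = pd.DataFrame({'review_score': [1, 2, 3, 4, 5, 5, 4]})

# scores 1-3 -> negative (0); scores 4-5 -> positive (1)
df_reviews['satisfied'] = (df_reviews['review_score'] >= 4).astype(int)
```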

Image by Author

After trying a few different machine learning models, I was able to predict customer satisfaction with an F1 score of 94%, and found 2 key factors that determine positive or negative reviews:

[Image by Author] Final model: XGBoost | F1 Score: 94%

The biggest predictor of satisfaction is the difference between the actual and estimated date of delivery. Essentially, customers are more satisfied if the order arrives sooner than expected, and unhappy if it arrives later. From a cognitive psychology perspective, this can be addressed with the anchoring heuristic: Olist could build a buffer into the expected delivery window, increasing the chance of delighting shoppers with an earlier-than-expected delivery.
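This delivery-delta feature is straightforward to engineer from the order timestamps. A sketch using Olist-style column names (assumed here for illustration):

```python
import pandas as pd

# hypothetical order timestamps; column names follow the Olist dataset's style
df = pd.DataFrame({
    'order_estimated_delivery_date': pd.to_datetime(['2018-01-10', '2018-01-15']),
    'order_delivered_customer_date': pd.to_datetime(['2018-01-08', '2018-01-20']),
})

# positive = arrived earlier than promised; negative = arrived late
df['delivery_delta_days'] = (df['order_estimated_delivery_date']
                             - df['order_delivered_customer_date']).dt.days
```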

Product description length is the 2nd biggest predictor – products with longer descriptions tend to get more positive reviews. This implies that managing shopper expectations can help drive positive reviews. Once again we can turn to behavioural biases to ‘nudge’ shopper behaviour; this is a great chance to apply the framing effect. For example, Olist could work with sellers to provide better (and longer) descriptions, which would likely lead to better customer reviews.

The caveats

While this was a fascinating project with clear business implications, I can think of a few ways it could be improved further –

  1. Data is limited to a sample. 100k orders is only a small fraction of Olist’s actual business volume. It would be more insightful (and computationally demanding) if all of the data were available.
  2. An additional layer of demographic information (age, gender etc) would allow for deeper analysis on customer segmentation.

Final thoughts

If you’d like to have a look, the full project is available on my GitHub (where I also explored churn and customer lifetime value using the lifetimes library and the BG/NBD (Beta-Geometric/Negative Binomial Distribution) model, as well as building a product recommender engine).

I have also published a macro-view of the findings on a Tableau story.

