
Data Mining: Market Basket Analysis with Apriori Algorithm

Uncovering the secret behind why bread is always conveniently placed beside butter in grocery stores

Photo by Anne Preble on Unsplash

Introduction

Some of us go to the grocery store with a standard list, while others have a hard time sticking to one no matter how determined we are. Whichever type you are, retailers will always be experts at crafting temptations to inflate your budget.

Remember the time you had an "Ohh, I might need this as well" moment? Retailers boost their sales by relying on this one simple intuition.

People who buy this will most likely want to buy that as well.

People who buy bread have a higher chance of also buying butter, so an experienced assortment manager knows that a discount on bread pushes the sales of butter as well.

Data-driven strategies

Huge retailers rely on detailed market basket analysis to uncover associations between items. Using this valuable information, they can carry out various strategies to improve their revenue:

  • Associated products are placed close to each other, so that buyers of one item would be prompted to buy the other.
  • Discounts can be applied to only one of the associated products.
Photo by Artem Beliaikin on Unsplash

Association Rule Mining

But how exactly is a Market Basket Analysis carried out?

Data scientists are able to carry out Market Basket Analysis by implementing Association Rule Mining. Association Rule Mining is a rule-based machine learning method that helps to uncover meaningful correlations between different products according to their co-occurrence in a data set.

However, one of the major pitfalls is that the method involves various formulas and parameters that can be hard to interpret for people without expertise in data mining. Therefore, before sharing your results with stakeholders, make sure that the underlying definitions are well understood.

Photo by Priscilla Du Preez on Unsplash

Core concepts illustration

I will be illustrating three of the core concepts that are used in Association Rule Mining with some simple examples below. This will assist you in grasping the data mining process.

Let’s say you have now opened up your own cafeteria. How will you utilize your Data Science skills to understand which of the items on your menu are associated?

Photo by Aleksander Sadowski on Unsplash

There are six transactions in total with various different purchases that happened in your cafeteria.

Image by author

We can utilize three core measures that are used in Association Rule Learning, which are: Support, Confidence, and Lift.

1. Support.

Support is simply the basic probability that an event occurs. It is measured by the proportion of transactions in which an item set appears. To put it another way, Support(A) is the number of transactions that include A divided by the total number of transactions.

If we analyze the transaction table above, the support for cookie is 3 out of 6. That is, out of a total of 6 transactions, purchases containing cookies occurred 3 times, or 50%.

Support equation image by author

Support can also be computed for multiple items at the same time. The support for cookie and cake together is 2 out of 6.

Support equation image by author

2. Confidence.

The confidence of a consequent event given an antecedent event can be described using conditional probability. Simply put, it is the probability of the consequent event happening given that the antecedent event has already happened.

This can be used to describe the probability of an item being purchased when another item is already in the basket. For a rule {X -> Y}, it is measured by dividing the proportion of transactions containing both X and Y by the proportion of transactions containing X.

From the transactions table above, the confidence of {cookie -> cake} can be formulated as below:

Confidence equation image by author

The conditional probability can also be written as:

Confidence equation image by author

Finally, we arrive at a result of 2 out of 3. We can grasp the intuition behind confidence by looking only at Transactions 1 to 3: out of the 3 purchases containing cookies, 2 of them also include a cake!


3. Lift.

Lift is the observed-to-expected ratio (abbreviated o/e). Lift measures how likely an item is to be purchased when another item is purchased, while controlling for how popular both items are. It is calculated by dividing the probability of both items occurring together by the product of the probabilities of each item occurring individually, as if there were no association between them.

Lift equation image by author

A lift of 1 means that the two items are independent, without any association. Any value higher than 1 indicates an association, and the higher the value, the stronger the association. Conversely, a value below 1 suggests a negative association between the items.

Looking at the table again, the lift of {cookie -> cake} is 2, which implies that there is indeed an association between cookies and cakes.
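These three measures can be checked with a few lines of Python. The transaction list below is a hypothetical reconstruction that matches the numbers quoted above (support of 3/6 for cookie, 2/6 for cookie and cake together), since the original table is only shown as an image:

```python
# Hypothetical cafeteria transactions, consistent with the figures in the text
transactions = [
    {"cookie", "cake"},
    {"cookie", "cake", "chocolate"},
    {"cookie", "chocolate"},
    {"milk"},
    {"chocolate"},
    {"milk", "chocolate"},
]
n = len(transactions)

def support(*items):
    """Fraction of transactions that contain every item in `items`."""
    return sum(set(items) <= t for t in transactions) / n

def confidence(antecedent, consequent):
    """P(consequent in basket | antecedent in basket)."""
    return support(antecedent, consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Observed co-occurrence over what independence would predict."""
    return support(antecedent, consequent) / (support(antecedent) * support(consequent))

print(support("cookie"))             # 0.5
print(support("cookie", "cake"))     # ≈ 0.333
print(confidence("cookie", "cake"))  # ≈ 0.667
print(lift("cookie", "cake"))        # ≈ 2.0
```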

Now that we have mastered all the core concepts, we can look into an algorithm that is able to generate item sets from transactional data, which is used to calculate these association rules.


The Apriori Algorithm

Overview

The Apriori Algorithm is one of the most popular algorithms used in association rule learning over relational databases. It identifies the frequent individual items in a data set and extends them to larger and larger item sets.

However, the Apriori Algorithm only extends an item set if it is frequent, that is, if the support of the item set is beyond a certain predetermined threshold.

Photo by Shane Aldendorff on Unsplash

More formally,

The Apriori Algorithm proposes that an itemset is not frequent if:

  • P(I) < minimum support threshold, where I is any non-empty itemset; or
  • any subset of the itemset has support less than the minimum support.

The second characteristic is known as the Anti-monotone Property: if an itemset is infrequent, so is every superset of it. A good example would be: if the probability of purchasing a burger is already below the minimum support, the probability of purchasing a burger and fries together will definitely be below the minimum support as well.

Steps in the Apriori Algorithm

The diagram below illustrates how the Apriori Algorithm starts building from the smallest itemset and further extends forward.

  • The algorithm starts by generating itemsets through the Join Step, that is, generating (K+1)-itemsets from K-itemsets. For example, the algorithm generates Cookie, Chocolate and Cake in the first iteration.
  • Immediately after that, the algorithm proceeds with the Prune Step, that is, removing any candidate item set that does not meet the minimum support requirement. For example, the algorithm will remove Cake if Support(Cake) is below the predetermined minimum support.

It iterates over both steps until there are no possible further extensions left.

Note that this diagram is not the complete version of the transactions table above. It serves as an illustration to help paint the bigger picture of the flow.

Apriori Algorithm Rough Concept image by author
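The Join and Prune steps described above can be sketched in plain Python. This is a minimal illustration on toy data with a hypothetical minimum support of 50%, not a production implementation:

```python
from itertools import combinations

# Toy transactions and a hypothetical minimum support of 50%
transactions = [
    {"cookie", "cake"},
    {"cookie", "cake", "chocolate"},
    {"cookie", "chocolate"},
    {"cake", "chocolate"},
]
min_support = 0.5
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing the whole itemset."""
    return sum(itemset <= t for t in transactions) / n

# Start from the frequent 1-itemsets
items = {i for t in transactions for i in t}
frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= min_support}]

k = 1
while frequent[-1]:
    # Join Step: build candidate (k+1)-itemsets from frequent k-itemsets
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k + 1}
    # Anti-monotone Prune Step: every k-subset of a candidate must itself be frequent
    candidates = {c for c in candidates
                  if all(frozenset(s) in frequent[-1] for s in combinations(c, k))}
    # Keep only candidates that meet the minimum support
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

for level in frequent[:-1]:
    print(sorted(tuple(sorted(s)) for s in level))
```

On this toy data the algorithm stops at the 2-itemsets, because the only 3-itemset candidate falls below the 50% support threshold.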

Code Implementation

To perform a Market Basket Analysis with the Apriori Algorithm, we will be using the Groceries dataset from Kaggle. The data set was published by Heeral Dedhia in 2020 under the GNU General Public License, version 2.

The dataset has 38,765 rows of purchase orders from grocery stores.

Photo by Cookie the Pom on Unsplash

Import and read data

  • First of all, let’s import the necessary modules and read the dataset that we downloaded from Kaggle.

Code:
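The original snippet is only shown as an image, so here is a minimal sketch of this step. The file name Groceries_dataset.csv and the columns Member_number, Date and itemDescription are assumptions based on the Kaggle dataset; a tiny inline sample is used so the snippet runs on its own:

```python
import pandas as pd

# On the real data you would read the downloaded Kaggle file, e.g.:
# df = pd.read_csv("Groceries_dataset.csv")

# Tiny inline sample with the same (assumed) structure
df = pd.DataFrame({
    "Member_number": [1808, 1808, 2552, 2300, 2300],
    "Date": ["21-07-2015", "21-07-2015", "05-01-2015", "19-09-2015", "19-09-2015"],
    "itemDescription": ["tropical fruit", "whole milk", "whole milk",
                        "pip fruit", "yogurt"],
})
print(df.shape)
print(df.head())
```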

Output:

Code output by author

Grouping into transactions

  • The data set records one purchased item per row. We will have to group these purchases into baskets of items.
  • After that, we will use TransactionEncoder to encode the transactions into a format suitable for the Apriori function.

Code:

Output:

Code output by author

Note: The data frame records each row as a transaction, and the items that were purchased in the transaction will be recorded as True.


Apriori and Association Rules

  • The Apriori Algorithm will be used to generate frequent item sets. We will specify the minimum support as 6 out of the total number of transactions. The association rules are then generated, and we filter for rules with a Lift value above 1.5.

Code:

Output:

Code output by author

Visualizations

  • To visualize our association rules, we can plot them in a 3D scatter plot. Rules closer to the top right are the ones that are potentially the most meaningful to dive into further.

Code:
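A sketch of the 3D scatter plot with matplotlib. The small rules table here is a stand-in; on the real data the support, confidence and lift columns would come from association_rules:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in rules table with illustrative values
rules = pd.DataFrame({
    "support":    [0.33, 0.33, 0.17],
    "confidence": [0.67, 1.00, 0.50],
    "lift":       [2.0,  2.0,  1.6],
})

# One point per rule: the closer to the top right, the stronger the rule
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(rules["support"], rules["confidence"], rules["lift"])
ax.set_xlabel("support")
ax.set_ylabel("confidence")
ax.set_zlabel("lift")
fig.savefig("rules_3d.png")
```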

Output:

3D Scatterplot by author

  • Another way to visualize the relationships between products is via a network graph. Let’s define a function that draws a network graph and lets us specify how many rules to show.

Code:
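A sketch of such a function using networkx. The function name draw_graph and the stand-in rules table are illustrative; on the real data, antecedents and consequents are the frozenset columns produced by association_rules:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Stand-in rules table with illustrative values
rules = pd.DataFrame({
    "antecedents": [frozenset({"cookie"}), frozenset({"cake"})],
    "consequents": [frozenset({"cake"}), frozenset({"cookie"})],
    "lift": [2.0, 2.0],
})

def draw_graph(rules, n_rules):
    """Draw the first n_rules association rules as a directed graph."""
    g = nx.DiGraph()
    # One edge per antecedent/consequent pair, weighted by lift
    for _, rule in rules.head(n_rules).iterrows():
        for a in rule["antecedents"]:
            for c in rule["consequents"]:
                g.add_edge(a, c, weight=rule["lift"])
    plt.figure()
    pos = nx.spring_layout(g, seed=42)
    nx.draw_networkx(g, pos, node_color="lightblue", arrows=True)
    plt.axis("off")
    plt.savefig("rules_network.png")
    return g

g = draw_graph(rules, 2)
print(g.edges)
```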

Output:

Network graph by author

Business Application

Let’s say the grocery store has bought too much Whole Milk and is now worried that the stock will expire if it cannot be sold in time. To make matters worse, the profit margin of Whole Milk is so low that they cannot afford a promotional discount without killing too much of their profit.

Photo by Daria Volkova on Unsplash

One approach that can be proposed is to find out which products drive the sales of Whole Milk and offer discounts on those products instead.

Code:
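A sketch of this filtering step. The stand-in rules table below is hypothetical; on the real data it would come from association_rules, and the antecedents of the surviving rows are the products whose discounts should drive Whole Milk sales:

```python
import pandas as pd

# Stand-in rules table with hypothetical products and lift values
rules = pd.DataFrame({
    "antecedents": [frozenset({"brandy"}), frozenset({"softener"}),
                    frozenset({"syrup"}), frozenset({"sausage"})],
    "consequents": [frozenset({"whole milk"}), frozenset({"whole milk"}),
                    frozenset({"whole milk"}), frozenset({"rolls"})],
    "lift": [1.9, 1.8, 1.7, 1.6],
})

# Keep rules whose consequent contains Whole Milk, strongest lift first
milk_rules = (rules[rules["consequents"].apply(lambda s: "whole milk" in s)]
              .sort_values("lift", ascending=False))
print(milk_rules)
```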

Output:

Code output by author

For instance, we can apply a promotional discount on Brandy, Softener, Canned Fruit, Syrup and Artificial Sweetener. Some of these associations may seem counter-intuitive, but the rules indicate that these products do drive the sales of Whole Milk.


Takeaway

By implementing the Apriori Algorithm and analyzing the association measures, businesses can derive dozens of data-driven strategies to boost their revenue and profits. These association rules are critical in data mining for analyzing consumers’ purchasing behavior. Some of a retailer’s most important strategies, such as customer analytics, market basket analysis and product clustering, derive valuable insights through association rule mining.

Finally, thank you so much for reading to the end. I hope you enjoyed this piece of writing!

Photo by Hanny Naibaho on Unsplash

References

[1] M. Mohammed and B. Arkok. An Improved Apriori Algorithm For Association Rules. (2014). International Journal on Natural Language Computing, 3. doi:10.5121/ijnlc.2014.3103.

[2] D.H. Goh, R.P. Ang. An introduction to association rule mining: An application in counseling and help-seeking behavior of adolescents. (2007). Behavior Research Methods 39, 259–266

[3] S. Raschka. Machine Learning Extensions Documentation. (2021). Retrieved from: https://rasbt.github.io/mlxtend/

[4] A. Hagberg, D. Schult, P. Swart. NetworkX Reference Release 2.7.1. (2022). Retrieved from: https://networkx.org/

[5] H. Dedhia. Groceries Dataset licensed under GPL 2. (2020). Retrieved from: https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset

