
Introduction
Some of us go to the grocery store with a fixed list, while others have a hard time sticking to one, no matter how determined we are. Whichever type of person you are, retailers are experts at creating temptations that inflate your spending.
Remember the time when you had the "Ohh, I might need this as well." moment? Retailers boost their sales by relying on this one simple intuition.
People that buy this will most likely want to buy that as well.
People who buy bread have a higher chance of buying butter as well, so an experienced assortment manager knows that a discount on bread can push the sales of butter too.
Data-driven strategies
Large retailers rely on detailed market basket analysis to uncover associations between items. Using this valuable information, they are able to carry out various strategies to improve their revenue:
- Associated products are placed close to each other, so that buyers of one item would be prompted to buy the other.
- Discounts can be applied to only one of the associated products.

Association Rule Mining
But how exactly is a Market Basket Analysis carried out?
Data scientists are able to carry out Market Basket Analysis by implementing Association Rule Mining. Association Rule Mining is a rule-based machine learning method that helps to uncover meaningful correlations between different products according to their co-occurrence in a data set.
However, one major pitfall is that it involves various formulas and parameters that can be difficult for people without expertise in data mining. Therefore, before sharing your results with stakeholders, make sure that the underlying definitions are well understood.

Core concepts illustration
I will be illustrating three of the core concepts that are used in Association Rule Mining with some simple examples below. This will assist you in grasping the data mining process.
Let’s say you have now opened up your own cafeteria. How will you utilize your Data Science skills to understand which of the items on your menu are associated?

There are six transactions in total, each with different purchases, that happened in your cafeteria.

We can utilize three core measures that are used in Association Rule Learning, which are: Support, Confidence, and Lift.
1. Support.
Support is simply the probability of an event occurring. It is measured by the proportion of transactions in which an item set appears. In other words, Support(A) is the number of transactions that include A divided by the total number of transactions.
If we analyze the transaction table above, the support for cookie is 3 out of 6, or 50%. That is, out of a total of 6 transactions, 3 contain cookies.

Support can be computed for multiple items at the same time as well. The support for cookie and cake is 2 out of 6.
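
To make the definition concrete, here is a minimal sketch of the support calculation. The six baskets are hypothetical: they are not the cafeteria table from this article, only a stand-in chosen so that cookie appears in 3 of 6 baskets and cookie with cake in 2 of 6. The `support` helper is an illustrative name, not a library function.

```python
# Hypothetical baskets (an assumption, not the article's table), chosen so
# that cookie appears in 3 of 6 baskets and cookie + cake in 2 of 6.
transactions = [
    {"cookie", "cake", "chocolate"},
    {"cookie", "cake"},
    {"cookie", "chocolate"},
    {"chocolate", "tea"},
    {"tea"},
    {"chocolate"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


support_cookie = support({"cookie"}, transactions)
support_cookie_cake = support({"cookie", "cake"}, transactions)
print(support_cookie, support_cookie_cake)
```

The subset test `itemset <= basket` is what lets the same helper handle one item or several items at once.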

2. Confidence.
The confidence of a consequent event given an antecedent event can be described using conditional probability. Simply put, it is the probability of event A happening given that event B has already happened.
This can be used to describe the probability of an item being purchased when another item is already in the basket. It is measured by dividing the proportion of transactions containing both X and Y by the proportion of transactions containing Y, where Y is the item already in the basket.
From the transactions table above, the confidence of {cookie -> cake} can be formulated as:

Confidence(cookie -> cake) = P(cake | cookie)

The conditional probability can also be written as:

P(cake | cookie) = Support(cookie and cake) / Support(cookie) = (2/6) / (3/6) = 2/3

Finally, we arrive at a solution of 2 out of 3. We can understand the intuition of confidence if we look only at Transaction 1 to Transaction 3. Out of 3 purchases with cookies, 2 of them were bought together with a cake!
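
The same calculation can be sketched in a few lines of Python. The baskets below are hypothetical (an assumption, not this article's table), chosen only to reproduce the figures quoted in the text, and `support` and `confidence` are illustrative helper names. Exact fractions are used so that the result reads as 2/3 rather than a rounded decimal.

```python
from fractions import Fraction

# Hypothetical baskets (an assumption): cookie in 3 of 6, cookie + cake in 2 of 6.
transactions = [
    {"cookie", "cake", "chocolate"},
    {"cookie", "cake"},
    {"cookie", "chocolate"},
    {"chocolate", "tea"},
    {"tea"},
    {"chocolate"},
]


def support(itemset, transactions):
    """Exact fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for basket in transactions if set(itemset) <= basket)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


conf_cookie_cake = confidence({"cookie"}, {"cake"}, transactions)
print(conf_cookie_cake)  # 2/3
```

Note that confidence is not symmetric: in this toy data the reverse rule {cake -> cookie} has a confidence of 1, because every basket with a cake also contains a cookie.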
3. Lift.
Lift is the observed-to-expected ratio (abbreviated o/e). Lift measures how likely an item is to be purchased when another item is purchased, while controlling for how popular both items are. It is calculated by dividing the probability of both items occurring together by the product of the probabilities of the individual items, that is, the probability we would expect if there were no association between them.

Lift(X -> Y) = P(X and Y) / (P(X) × P(Y))

A lift of 1 means that the two items are independent, without any association. A value higher than 1 indicates a positive association, and the higher the value, the stronger the association. A value below 1 indicates that the items are bought together less often than expected.
Looking at the table again, the lift of {cookie -> cake} is 2, which implies that there is an association between cookies and cakes.
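
As a rough illustration, lift can be computed from the same building blocks. The baskets below are hypothetical (an assumption, not this article's table); they were picked so that the lift of {cookie -> cake} comes out at 2, matching the figure above, and `support` and `lift` are illustrative helper names.

```python
from fractions import Fraction

# Hypothetical baskets (an assumption), chosen so lift(cookie -> cake) is 2.
transactions = [
    {"cookie", "cake", "chocolate"},
    {"cookie", "cake"},
    {"cookie", "chocolate"},
    {"chocolate", "tea"},
    {"tea"},
    {"chocolate"},
]


def support(itemset, transactions):
    """Exact fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for basket in transactions if set(itemset) <= basket)
    return Fraction(hits, len(transactions))


def lift(x, y, transactions):
    """Observed joint support divided by the support expected under independence."""
    observed = support(set(x) | set(y), transactions)
    expected = support(x, transactions) * support(y, transactions)
    return observed / expected


lift_cookie_cake = lift({"cookie"}, {"cake"}, transactions)
lift_cookie_chocolate = lift({"cookie"}, {"chocolate"}, transactions)
print(lift_cookie_cake, lift_cookie_chocolate)  # 2 1
```

In this toy data cookie and chocolate have a lift of exactly 1: they appear together just as often as their individual popularity would predict, so the measure reports no association.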
Now that we have mastered all the core concepts, we can look into an algorithm that is able to generate item sets from transactional data, which is used to calculate these association rules.
The Apriori Algorithm
Overview
The Apriori Algorithm is one of the most popular algorithms used in association rule learning over relational databases. It identifies the items in a data set and further extends them to larger and larger item sets.
However, the Apriori Algorithm only extends an item set if it is frequent, that is, if the support of the item set is above a predetermined threshold.

More formally,
The Apriori Algorithm proposes that:
An itemset is not frequent if:
- P(I) < minimum support threshold, where I is any non-empty itemset
- Any subset of the itemset has a support below the minimum support.
The second characteristic is known as the anti-monotone property. For example, if the probability of purchasing a burger is already below the minimum support, the probability of purchasing a burger and fries together will be below the minimum support as well.
Steps in the Apriori Algorithm
The diagram below illustrates how the Apriori Algorithm starts building from the smallest itemset and further extends forward.
- The algorithm starts by generating candidate itemsets through the Join Step, that is, generating (K+1)-itemsets from K-itemsets. For example, the algorithm generates Cookie, Chocolate and Cake in the first iteration.
- Immediately after that, the algorithm proceeds with the Prune Step, which removes any candidate itemset that does not meet the minimum support requirement. For example, the algorithm will remove Cake if Support(Cake) is below the predetermined minimum support.
It iterates both steps until no further extensions are possible.
Note that this diagram is not the complete version of the transactions table above. It serves as an illustration to help paint the bigger picture of the flow.
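
The join and prune steps can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not an optimized implementation and not the code of any particular library: `apriori_sketch` is a hypothetical name, and the baskets are made-up toy data rather than this article's table.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset (toy version)."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for basket in transactions if itemset <= basket) / n

    # First pass: single items that meet the minimum support.
    items = {item for basket in transactions for item in basket}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune step: by the anti-monotone property every k-subset must be
        # frequent, and the candidate itself must reach the minimum support.
        current = {}
        for cand in candidates:
            if all(frozenset(sub) in frequent for sub in combinations(cand, k)):
                s = support(cand)
                if s >= min_support:
                    current[cand] = s
        k += 1
    return frequent


# Hypothetical baskets (an assumption, for illustration only).
transactions = [
    {"cookie", "cake", "chocolate"},
    {"cookie", "cake"},
    {"cookie", "chocolate"},
    {"chocolate", "tea"},
    {"tea"},
    {"chocolate"},
]
frequent = apriori_sketch(transactions, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With a threshold of 0.3, the three-item candidate {cookie, cake, chocolate} is discarded without its support ever being counted, because its subset {cake, chocolate} is not frequent. That shortcut is where the algorithm saves work on larger data.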

Code Implementation
To perform a Market Basket Analysis implementation with the Apriori Algorithm, we will be using the Groceries dataset from Kaggle. The data set was published by Heeral Dedhia in 2020 under the General Public License, version 2.
The dataset has 38,765 rows of purchase records from grocery stores.

Import and read data
- First of all, let’s import the necessary modules and read the dataset that we downloaded from Kaggle.
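
As a rough sketch of this step, the snippet below reads a few purchase rows with the standard library. Several things here are assumptions: the file name `Groceries_dataset.csv`, the column names `Member_number`, `Date` and `itemDescription`, and the four sample rows, which are invented so that the example runs without the download. A data frame library would work just as well for this step.

```python
import csv
import io

# Stand-in for the downloaded file. With the real data you would use
# open("Groceries_dataset.csv", newline="") instead (file name assumed).
# Column names and rows below are assumptions made for illustration.
sample_csv = io.StringIO(
    "Member_number,Date,itemDescription\n"
    "1001,01-01-2015,whole milk\n"
    "1001,01-01-2015,rolls/buns\n"
    "1002,01-01-2015,whole milk\n"
    "1002,02-01-2015,soda\n"
)

rows = list(csv.DictReader(sample_csv))
print(len(rows), "rows; columns:", list(rows[0].keys()))
```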

Grouping into transactions
- The data set records one item purchase per row. We will have to group these purchases into baskets of items.
- After that, we will use TransactionEncoder to encode the transactions into a format that is suitable for the Apriori function.
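
The two steps above can be sketched without any third-party library. The grouping key is an assumption: here one basket is taken to be everything a member bought on one date. The encoding loop is a plain-Python stand-in that produces the same kind of True/False table as a transaction encoder, and the sample rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical purchase rows (member, date, item), one item per row.
purchases = [
    ("1001", "01-01-2015", "whole milk"),
    ("1001", "01-01-2015", "rolls/buns"),
    ("1002", "01-01-2015", "whole milk"),
    ("1002", "02-01-2015", "soda"),
]

# Assumption: one basket = all items bought by one member on one date.
baskets = defaultdict(set)
for member, date, item in purchases:
    baskets[(member, date)].add(item)
transactions = list(baskets.values())

# One-hot encode: one row per transaction, one boolean column per item.
columns = sorted({item for basket in transactions for item in basket})
encoded = [[item in basket for item in columns] for basket in transactions]

print(columns)
for row in encoded:
    print(row)
```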

Note: The data frame records each row as a transaction, and the items that were purchased in the transaction will be recorded as True.
Apriori and Association Rules
- The Apriori Algorithm will be used to generate frequent item sets. We specify the minimum support as 6 transactions out of the total number of transactions. The association rules are then generated, and we keep only the rules with a Lift value above 1.5.
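
To show the shape of this step on something small, the sketch below mines rules from six hypothetical baskets and keeps those with a lift above 1.5. It is a plain-Python stand-in, not the library calls used for the real dataset: frequent item sets are found by brute force, which is only reasonable for toy data, and the minimum support is set to 2 transactions out of the total instead of 6 because there are only six baskets.

```python
from itertools import combinations

# Hypothetical baskets (an assumption, for illustration only).
transactions = [
    {"cookie", "cake", "chocolate"},
    {"cookie", "cake"},
    {"cookie", "chocolate"},
    {"chocolate", "tea"},
    {"tea"},
    {"chocolate"},
]
n = len(transactions)
min_support = 2 / n  # "2 out of the total"; the article uses 6 on real data


def support(itemset):
    return sum(1 for basket in transactions if itemset <= basket) / n


# Brute-force frequent item sets (Apriori exists to avoid this on big data).
items = sorted({item for basket in transactions for item in basket})
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(frozenset(combo))
        if s >= min_support:
            frequent[frozenset(combo)] = s

# Split each frequent item set into antecedent -> consequent and filter by lift.
rules = []
for itemset, s in frequent.items():
    for r in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), r):
            ante = frozenset(ante)
            cons = itemset - ante
            rule_lift = s / (frequent[ante] * frequent[cons])
            if rule_lift > 1.5:
                rules.append((set(ante), set(cons), s / frequent[ante], rule_lift))

for ante, cons, conf, rule_lift in rules:
    print(sorted(ante), "->", sorted(cons), "confidence:", round(conf, 2), "lift:", round(rule_lift, 2))
```

Looking up `frequent[ante]` and `frequent[cons]` is safe because every subset of a frequent item set is itself frequent.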

Visualizations
- To visualize our association rules, we can plot them in a 3D scatter plot. Rules closer to the top right are the most meaningful ones to dive into further.

- Another way to visualize the relationships between the products is a network graph. Let’s define a function that draws a network graph and lets us specify how many rules to show.

Business Application
Let’s say the grocery store has bought too much Whole Milk and is now worried that the stock will expire if it cannot be sold in time. To make matters worse, the profit margin on Whole Milk is so low that the store cannot afford a promotional discount without giving up too much of its profit.

One approach that can be proposed is to find out which products drive the sales of Whole Milk and offer discounts on those products instead.
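
A minimal sketch of that lookup is shown below, assuming the rules are available as (antecedents, consequents, lift) tuples. The item names and lift values are placeholders, not results from the Groceries dataset, and `drivers_of` is a hypothetical helper name.

```python
# Placeholder rules as (antecedents, consequents, lift). These names and
# numbers are invented for illustration, not taken from the real dataset.
rules = [
    ({"item_a"}, {"whole milk"}, 1.8),
    ({"item_b"}, {"yogurt"}, 2.4),
    ({"item_c", "item_d"}, {"whole milk"}, 2.1),
    ({"whole milk"}, {"item_e"}, 1.6),
]


def drivers_of(target, rules):
    """Rules whose consequent is exactly `target`, strongest lift first."""
    matches = [rule for rule in rules if rule[1] == {target}]
    return sorted(matches, key=lambda rule: rule[2], reverse=True)


milk_drivers = drivers_of("whole milk", rules)
for antecedents, _, rule_lift in milk_drivers:
    print(sorted(antecedents), "-> whole milk, lift:", rule_lift)
```

Filtering on the consequent matters here: a rule with Whole Milk as the antecedent describes what milk buyers add to their basket, which is a different question from what brings milk into the basket.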

For instance, we can apply a promotional discount on Brandy, Softener, Canned Fruit, Syrup and Artificial Sweetener. Some of the associations may seem counter-intuitive, but the rules state that these products do drive the sales of Whole Milk.
Takeaway
By implementing the Apriori Algorithm and analyzing the association measures, businesses can derive many data-driven strategies to boost their revenue and profits. These association rules are critical in data mining for analyzing consumers’ purchasing behavior. Some of the most important retail strategies, such as customer analytics, market basket analysis and product clustering, derive valuable insights from association rule mining.
Finally, thank you so much for reading to the end. I hope you enjoyed this piece of writing!

References
[1] M. Mohammed and B. Arkok. An Improved Apriori Algorithm For Association Rules. (2014). International Journal on Natural Language Computing, 3. doi:10.5121/ijnlc.2014.3103.
[2] D.H. Goh and R.P. Ang. An introduction to association rule mining: An application in counseling and help-seeking behavior of adolescents. (2007). Behavior Research Methods, 39, 259–266.
[3] S. Raschka. Machine Learning Extensions Documentation. (2021). Retrieved from: https://rasbt.github.io/mlxtend/
[4] A. Hagberg, D. Schult, P. Swart. NetworkX Reference Release 2.7.1. (2022). Retrieved from: https://networkx.org/
[5] H. Dedhia. Groceries Dataset licensed under GPL 2. (2020). Retrieved from: https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset