
Another Apriori Article but With a Small Twist…

Your boss wants to know the estimated financial impact of doing a product promotion and gives you a big file of transaction data. Now what?

Photo by Stephen Dawson on Unsplash

There are already a lot of good articles on the details of Apriori. I am going to mainly focus on a realistic business scenario that might come up around the application of Apriori and Market Basket Analysis.

But it doesn’t hurt to do a quick and high-level refresher on the Apriori algorithm.

Those that bought X also bought Y.

Apriori is an association rules mining algorithm that finds what items tend to go with what other items. Think of a supermarket where chips and dip are usually placed adjacent to each other in an aisle (or even across the store to make you walk through it more and potentially impulse buy more).

There are three key metrics to look at in the output of the Apriori algorithm when doing a market basket analysis – Support, Confidence, and Lift.

Support -> Fraction of transactions that have both A and B.

Confidence -> Of the transactions that contain A, the fraction that also contain B.

Lift -> Indicates the strength of a rule over random chance. For example, a lift of 3.2 means that a customer who buys A is 3.2x more likely to buy B than a customer picked at random.
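To make the three definitions concrete, here is a minimal sketch in plain Python. The baskets are invented purely for illustration (they are not the data used in this article), and the helper names `support`, `confidence`, and `lift` are my own rather than any library's API.

```python
# Toy baskets, invented for illustration only (not the article's data).
transactions = [
    {"Greeting Cards", "Candy Bar"},
    {"Greeting Cards", "Candy Bar", "Soda"},
    {"Greeting Cards"},
    {"Candy Bar"},
    {"Soda"},
    {"Soda", "Chips"},
    {"Chips"},
    {"Greeting Cards", "Chips"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)


def lift(lhs, rhs, baskets):
    """Confidence of lhs => rhs relative to the overall rate of `rhs`."""
    return confidence(lhs, rhs, baskets) / support(rhs, baskets)


lhs, rhs = {"Greeting Cards"}, {"Candy Bar"}
print(f"support:    {support(lhs | rhs, transactions):.2f}")
print(f"confidence: {confidence(lhs, rhs, transactions):.2f}")
print(f"lift:       {lift(lhs, rhs, transactions):.2f}")
```

A lift above 1 means the two items show up together more often than their individual purchase rates would suggest; a lift near 1 means there is no real association.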


Back to the scenario of your boss's request…

In this scenario, you’re given that big file with transaction data for the retail side of the business. Your boss says that marketing wants to run a promotion on Greeting Cards by giving away Candy Bars. You’re tasked with validating the assumption and estimating the financial impact.

I figure greeting cards have a significant margin and I enjoy candy bars. Anyways, back to the scenario…

Taking a peek, we can see there are 459,258 records that ultimately represent 200,000 transactions.
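More records than transactions suggests the file is in a "long" layout, with one row per item per transaction. Assuming that layout, a rough sketch of collapsing rows into baskets could look like the following. The field order and the sample rows are hypothetical, not taken from the actual file.

```python
from collections import defaultdict

# Hypothetical line-item records: (transaction_id, item, quantity).
records = [
    (1001, "Greeting Cards", 3),
    (1001, "Candy Bar", 1),
    (1002, "Soda", 2),
    (1003, "Greeting Cards", 1),
    (1003, "Chips", 1),
    (1003, "Candy Bar", 2),
]

# Collapse the rows into one set of items per transaction.
# Apriori only cares whether an item is present, so quantity is ignored here.
baskets = defaultdict(set)
for transaction_id, item, _quantity in records:
    baskets[transaction_id].add(item)

print(len(records), "records ->", len(baskets), "transactions")
```

The quantities are dropped for the rule mining step, but they matter later for the per-transaction averages, so it is worth keeping the raw records around.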

Item Frequency Plot (from the arulesViz library in R) (Image by author)

My main focus here is on "what’s the impact?" in a real business scenario. I am going to gloss over the EDA, and a lot of the Apriori part itself at this point as it’s well covered in other articles and not my main focus.

After going through the data, you see that it’s not too horrible of an idea. We see that almost 30% of the time, a candy bar purchase happens with a greeting card purchase (confidence). We also see that those that buy greeting cards are 1.74x more likely to buy a candy bar (lift).
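A quick sanity check on those two numbers: lift is just confidence divided by the overall purchase rate of the right-hand-side item, so the pair implies a baseline. The sketch below assumes the rule reads {Greeting Cards} => {Candy Bar} and uses 0.30 as a stand-in for "almost 30%", so treat the result as approximate. The function name is my own.

```python
def implied_rhs_support(rule_confidence, rule_lift):
    """Overall share of transactions containing the RHS item,
    backed out from lift = confidence / support(RHS)."""
    return rule_confidence / rule_lift


rule_confidence = 0.30  # "almost 30%", so this is an approximation
rule_lift = 1.74

baseline = implied_rhs_support(rule_confidence, rule_lift)
print(f"Implied share of all transactions with a candy bar: {baseline:.1%}")
```

In other words, under those assumptions candy bars show up in roughly one in six transactions overall, but in closer to one in three greeting card transactions.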

LHS and RHS are 'Left Hand Side' and 'Right Hand Side' respectively. The LHS is the product(s) that tend to lead to the RHS product(s) purchase. (Image by author)
Summary statistics by item in the transactions dataset. (Image by author)

Given the summary statistics by product and a glance at the raw data, it would be safe to say one of two things is likely happening here (possibly both).

  1. A fair amount of this company’s transactions are bulk sales.
  2. There is noise in the data from miscellaneous transactions.

    • e.g. the system was down for a day at a store and they entered a summary transaction to balance inventory.

The dataset is good enough for a quick example though, so let’s look at Candy Bars and Greeting Cards.

A subset of the item summary table looking at what we're interested in. (Image by author)

Assumptions

Let’s assume that:

  • Greeting Cards have a very healthy margin, 1 dollar to buy and sell for 5.
  • Candy Bars have a decent margin, buy for 0.67, sell for a dollar.
  • The data represents 1 year’s worth of transactions.

At those margins, it’s safe to say we want to push Greeting Card sales and can use a candy bar giveaway as the incentive.

We see in our summary table that the average greeting card transaction has 2.53 greeting cards.

Now the big question:

If we can boost the average card quantity from 2.53 to 3 with a promotion of "Buy at least 3 greeting cards and get a free candy bar." … what would be the impact?

A target mean of 3.0 less the current mean of 2.531 gives a net mean increase of 0.469 greeting cards per transaction.

We can take that delta, or the net mean increase, and multiply it by the profit per card to estimate the gross profit increase on greeting cards. Simple: 0.469 * a per-card profit of $4 (sell for $5, buy for $1) gives us $1.876.

Now we’re getting somewhere. If we boost the average to 3, we can expect to see another $1.876 per transaction. But hold on, the candy bars aren’t free to us (remember, they cost 0.67), so we need to factor that in. Subtracting one candy bar per transaction from $1.876 gives us a net profit increase per transaction of $1.206.

So if we apply that to the total number of greeting card transactions (40,292), we see an estimated net profit boost of $48,608!
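The whole estimate fits in a few lines of Python, which also makes it easy to rerun when someone questions an assumption. This is only a sketch of the arithmetic above: the constants come from the assumptions and the summary table, the function name `promo_impact` is my own, and it charges one candy bar to every greeting card transaction, exactly as the walkthrough does.

```python
# Figures from the article's assumptions and summary table.
CARD_PRICE = 5.00            # assumed selling price per greeting card
CARD_COST = 1.00             # assumed cost per greeting card
CANDY_COST = 0.67            # assumed cost of the free candy bar
CURRENT_MEAN_CARDS = 2.531   # current mean cards per card transaction
TARGET_MEAN_CARDS = 3.0      # mean we hope the promotion achieves
CARD_TRANSACTIONS = 40_292   # greeting card transactions in the year


def promo_impact(current_mean, target_mean, card_margin, giveaway_cost, n_transactions):
    """Return (net profit increase per transaction, total net profit increase)."""
    extra_cards = target_mean - current_mean          # net mean increase
    gross_per_txn = extra_cards * card_margin         # extra card profit
    net_per_txn = gross_per_txn - giveaway_cost       # less one candy bar
    return net_per_txn, net_per_txn * n_transactions


net_per_txn, total = promo_impact(
    CURRENT_MEAN_CARDS,
    TARGET_MEAN_CARDS,
    CARD_PRICE - CARD_COST,
    CANDY_COST,
    CARD_TRANSACTIONS,
)
print(f"Net profit increase per transaction: ${net_per_txn:.3f}")
print(f"Estimated net profit boost: ${total:,.0f}")
```

With these rounded inputs the total lands a few dollars below the $48,608 quoted above, presumably because that figure was computed from the unrounded mean. Either way, the estimate is only as good as the assumption that the promotion actually moves the average to 3.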

Side Note:

When doing these kinds of analyses where someone in the business is looking for a number, it is important to set expectations. Any time I am asked to figure out the impact of some change, I always frame it as an estimate and list out the assumptions that led to that estimate. Most of the time it is understood that there is no crystal ball, but there is a chance someone thinks there is. Or, more likely, they know it’s an estimate but don’t know which assumptions you made.


So at first, the scenario might seem daunting, especially to a junior analyst (and that’s OK!). But, if we break down the problem into smaller chunks it becomes a lot easier.

Conclusion

Thanks for taking the time to read this. I hope you got something useful out of it that you can apply to a real-world scenario. If you have any questions or feedback, please feel free to comment.

If you enjoyed this article, you might also enjoy my article on using PuLP for a simple optimization problem.

Python + Pulp Optimization: A Simple Logistics Example

Keep learning!

