Small Bakery, Data-driven Management

Using transaction data to evaluate and empower a bakery business

I don’t know if any of you have ever had the same dream as me. When I was a child, I always dreamed of owning a small local business, like a cafe, a flower shop, a grocery or a bakery. I wouldn’t want to turn it into a big franchise like Starbucks or Whole Foods; I would want to make the most of being local and bring happiness to the whole neighborhood.

After I grew up, I moved to big cities, learning new things, meeting different people and experiencing all kinds of exciting life. Still, that dream stays in my heart, and it motivated me to write my third blog post using the bakery transaction data from Kaggle. I believe even a small business can use data to uncover business and customer insights.


Background and Defined Questions

Variables Description: This transaction dataset is really straightforward. It provides four pieces of information:

Date: The date of each transaction, from 2016-10-30 to 2017-04-09, roughly six months of activity
Time: The time at which the customer bought the items
Transaction: Each transaction has a unique ID; altogether there were about 9k transactions over the six months
Items: The name of the item sold; there are about 94 different products sold in the bakery

Problems: As usual, we defined three key questions we would like to answer with our analysis:

Business Performance: Which core products bring us the most revenue? How are our core products growing?
Traffic Insight: Is there a peak pattern in which certain products are heavily purchased around certain times?
Market Basket: What kinds of products are more likely to be purchased together? What is our growth opportunity?

Business Performance

Since the dataset records which items were sold in each transaction but not their prices, we cannot compute revenue. We can only look at how many times each product was purchased during the six months.

Overall, coffee sold the most over the six months: 5.4k cups were purchased, which accounts for 26.7% of all items sold. Bread comes in second place (3.3k, 16.2%) and tea was the third most purchased (1.4k, 7%). The rest of the products form a long tail.
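As a rough illustration of how these counts and shares can be computed, here is a minimal Python sketch (the full analysis linked at the end of this post is in R). The rows are made-up toy data in the same four-column shape as the dataset, and `item_shares` is a hypothetical helper name, not something from the original code.

```python
from collections import Counter

# Toy rows in the dataset's shape: (date, time, transaction id, item).
# These values are made up for illustration, not taken from the Kaggle file.
rows = [
    ("2016-10-30", "09:58:11", 1, "Bread"),
    ("2016-10-30", "10:05:34", 2, "Coffee"),
    ("2016-10-30", "10:05:34", 2, "Pastry"),
    ("2016-10-30", "10:07:57", 3, "Coffee"),
    ("2016-10-30", "10:08:41", 4, "Tea"),
]


def item_shares(rows):
    """Return (item, count, share of all item rows), most sold first."""
    counts = Counter(item for _, _, _, item in rows)
    total = sum(counts.values())
    return [(item, n, n / total) for item, n in counts.most_common()]


for item, n, share in item_shares(rows):
    print(f"{item:<8} {n:>3} {share:6.1%}")
```

Note that the share here is taken over item rows, not over transaction IDs, since one transaction can contain several items.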

When we look at the sales of our core products by month, we find that most of them stay relatively flat. Most dipped a little between December and January and then climbed back up slightly. It could be that the bakery cut down its opening hours around Christmas and New Year, or that lots of people stayed at home instead of going to work, so less coffee was sold.
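A monthly view like this boils down to counting item rows per product and calendar month. The sketch below shows one way to do it in Python, assuming dates are `YYYY-MM-DD` strings; the rows are made-up toy data and `monthly_counts` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy (date, item) rows, made up for illustration.
rows = [
    ("2016-11-02", "Coffee"), ("2016-11-15", "Coffee"), ("2016-11-20", "Bread"),
    ("2016-12-03", "Coffee"), ("2016-12-24", "Bread"),
    ("2017-01-05", "Coffee"), ("2017-01-09", "Coffee"), ("2017-01-10", "Bread"),
]


def monthly_counts(rows, products):
    """Map each product of interest to {"YYYY-MM": items sold that month}."""
    table = {p: defaultdict(int) for p in products}
    for date, item in rows:
        if item in table:
            table[item][date[:7]] += 1  # "YYYY-MM-DD"[:7] is the month key
    return {p: dict(sorted(months.items())) for p, months in table.items()}


print(monthly_counts(rows, ["Coffee", "Bread"]))
```

One caveat when reading monthly totals from this dataset: it starts on 2016-10-30 and ends on 2017-04-09, so October and April are partial months and are not directly comparable to the full months in between.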

However, when we look at our second tier of core products (top 7–12), the performance of some products is quite alarming. For example, Farm House and Medialuna declined for five consecutive months. We need more context to figure out whether there was some change in our supply or some product issue.

Traffic Insight

Secondly, I calculated the average sales per hour for each top product to see whether there are products that customers tend to purchase at a specific time, so that we could set up targeted in-store campaigns to boost our sales.
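The exact averaging is not spelled out here, so the following Python sketch makes an assumption: for each product and hour of day, it divides the number of items sold in that hour by the number of distinct days in the data. The rows are made-up toy data and `avg_hourly_sales` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy (date, time, item) rows, made up for illustration.
rows = [
    ("2016-10-30", "10:05:34", "Coffee"),
    ("2016-10-30", "10:40:02", "Coffee"),
    ("2016-10-30", "13:10:45", "Sandwich"),
    ("2016-10-31", "10:15:00", "Coffee"),
    ("2016-10-31", "14:02:19", "Tea"),
]


def avg_hourly_sales(rows):
    """Return {item: {hour: items sold in that hour per day in the data}}."""
    days = {date for date, _, _ in rows}  # assumption: average over all days seen
    totals = defaultdict(lambda: defaultdict(int))
    for _, time, item in rows:
        totals[item][int(time[:2])] += 1  # "HH:MM:SS"[:2] is the hour
    return {
        item: {hour: n / len(days) for hour, n in sorted(hours.items())}
        for item, hours in totals.items()
    }


print(avg_hourly_sales(rows))
```

Dividing by all days treats a day with no sales in a given hour as a zero. Averaging only over days on which the product sold in that hour would give higher numbers, so the choice matters when comparing peaks.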

Based on our visualization, we find that:

  • Bread and Coffee seem to share a similar pattern. Both peak around 10 am (before work) and have a smaller peak around 2 pm (after the lunch break)
  • Pastry also peaks at 10 am. What’s more, there’s a small peak around 5 pm (after work, before dinner)
  • Sandwich seems to be a lunch choice for many people. It peaks around 1 pm and then falls off gradually
  • Tea was chosen more often in the afternoon, after lunch

As we look through our second-tier products, we find that the morning peak (around 10 am) and the afternoon peak (2–3 pm) are quite prevalent across different products. Besides that, there is also an interesting point worth calling out:

  • Hot chocolate showed a peak around 6 pm, which is quite different from other products. One hypothesis is that people would like something to warm them up but don’t want a drink with caffeine.

Market Basket Analysis

We have analyzed how many items were sold for each product and when they were sold. Finally, we come to the question of which items are more likely to be purchased together.

I won’t cover the specific methodology of market basket analysis. For those who are interested in this technique, feel free to check out the Kaggle kernel created by Xavier. Basically, there are three metrics for evaluating a market basket:

  • Support: how frequently the item set appears in the data set
  • Confidence: the percentage of transactions containing X in which Y is also bought
  • Lift: how much more likely X and Y are to be bought together than if they were independent of each other
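
Written out as formulas, the three metrics above take their standard form for a rule X ⇒ Y, where support is measured as a fraction of all transactions:

```latex
\begin{align*}
\operatorname{support}(X) &= \frac{\text{number of transactions containing } X}{\text{total number of transactions}} \\
\operatorname{confidence}(X \Rightarrow Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)\,\operatorname{support}(Y)}
\end{align*}
```

A lift of 1 means X and Y show up together exactly as often as independence would predict. A lift above 1 means they are bought together more often than that, and a lift below 1 means less often.
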

Top 10 rules (With Coffee, min 0.01 support, 0.4 confidence)

After calculating these three metrics for all the different combinations, we selected the top 10 rules by lift, subject to minimum thresholds for support and confidence. We find that in all of these association rules, the items are connected with coffee. The redder the circle, the more likely the two items are to be purchased together, which indicates that toast and coffee are the most likely to be bought together (lift = 1.47). The bigger the circle, the more frequently the item set occurred. Here, cake and coffee were most frequently bought together (support = 0.054).

Coffee sitting at the center of the association network is what we expected, since it accounts for 26.7% of the items sold. But I wonder: if we exclude coffee from our analysis, will we find any interesting co-consumption patterns between other products? Even though there aren’t many such transactions right now, could we turn them into a growth opportunity?
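To make the selection step concrete, here is a simplified Python sketch that scores every ordered pair of items, keeps the rules that pass the support and confidence thresholds, and sorts them by lift. It is an illustration under assumptions, not the R code behind the figure: it only handles single-item rules (X ⇒ Y), the baskets are made-up toy data, and `pair_rules` is a hypothetical helper name.

```python
from collections import Counter
from itertools import permutations

# Toy baskets (the set of items in each transaction), made up for illustration.
baskets = [
    {"Coffee", "Toast"},
    {"Coffee", "Cake"},
    {"Coffee", "Cake"},
    {"Coffee"},
    {"Bread"},
    {"Bread", "Tea"},
    {"Toast", "Coffee"},
    {"Tea"},
]


def pair_rules(baskets, min_support=0.01, min_confidence=0.4):
    """Single-item rules X -> Y that pass both thresholds, highest lift first."""
    n = len(baskets)
    item_count = Counter(item for basket in baskets for item in basket)
    # Count every ordered pair (x, y) that appears together in a basket.
    pair_count = Counter(
        pair for basket in baskets for pair in permutations(sorted(basket), 2)
    )
    rules = []
    for (x, y), both in pair_count.items():
        support = both / n                       # share of baskets with x and y
        confidence = both / item_count[x]        # share of x-baskets that also have y
        lift = confidence / (item_count[y] / n)  # confidence relative to y's base rate
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


for x, y, support, confidence, lift in pair_rules(baskets)[:10]:
    print(f"{x} -> {y}: support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

The default thresholds mirror the ones in the caption (0.01 support, 0.4 confidence). Because a very popular item has a high base rate, rules pointing to it tend to pass the confidence threshold easily while their lift stays close to 1.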

Top 10 rules (Without Coffee, min 0.001 support, 0.4 confidence)

After I excluded all of the coffee records, the association rule network looks more diversified, even though we have to lower the support threshold (from 0.01 to 0.001). I found some interesting connections that may be worth researching further:

  • Salad + Extra Salami or Feta: People like to personalize their salad. By offering add-ons, we could potentially increase the average price of a salad.
  • Cookies + Alfajores + Juice: People who eat cookies or alfajores would choose juice as their second choice after coffee.
  • Coke + Juice + Sandwiches: People who eat sandwiches often choose juice or coke. Here we see a strong connection between food and beverages.

Final Thoughts

There are obviously more topics we could explore with this dataset. And if we look at this bakery business from a higher level, I think there is more information that needs to be included:

  • Price: the price of each SKU. By joining the transaction data with price data, we could analyze the revenue of each product and think about expanding our premium products.
  • Cost: from an operational and financial perspective, we also need to analyze the inventory and profitability of our product list. There are probably some products we could cut and some new SKUs we could add.
  • Customer: to understand who our target and most profitable customers are, it would be ideal to know who bought which products.

More questions can be worked out from this dataset. For the full analysis, please check out my Github for the R code.

This is my third post. Feel free to leave any comments or feedback. I hope this is the starting point of my Medium blog and my data science life.