What Type of Starbucks Customer Are You?

Payton Soicher

Published in

Towards Data Science

15 min read, Nov 15, 2018

Cluster analysis to show Starbucks customers fall into 4 distinct categories.

Everyone has preferences. Some people believe that nothing can ever beat Mexican food, while others are in constant search of the next best Italian restaurant. You have your favorite burger joint. You have your favorite ice cream parlor. Most preferences about what people do and do not like change depending on who you’re talking to…unless you’re talking about coffee. With all the coffee brands in the world, almost everyone agrees that Starbucks is the spot that you will never turn down. You might not go to McDonald’s to get a burger anymore, but you better believe you will be standing in line for your espresso any day of the week at your nearest Starbucks location.

Trying not to sit too comfortably in my sports world, I became interested when I came across Starbucks data on customers, their transactions, and the promotional deals they have received. Promotional data is interesting because people have very specific feelings about what they will do to get a discounted purchase. Promotions are extremely important to any company, both for getting customers to try its products and for customer retention. My favorite place to eat is Chipotle, and I will go there at least 3 times a week no matter what obstacles are in my way. I participate in their buy one get one (BOGO) deals and their discounted meals when they throw in chips and guac, and if they didn’t offer them, I would still go every single day. Noodles and Company is another food chain that I like, but I wouldn’t go out of my way to get a meal there; however, if they gave me a BOGO deal, you would see me at the front of the line.

Two different restaurants, two different feelings towards my purchases. So what are the different types of customers that Starbucks has? Do people react differently to different promotions? Can Starbucks make more money on promotions if they cluster their customers appropriately? My analysis of Starbucks customers will answer what types of people make purchases at their stores and which promotional offers are most appropriate for the cluster they belong to.

As always, you can see my code at https://github.com/anchorP34/Starbucks-Customer-Clusters. Please leave any comments or questions on topics that you would like to know more about or ideas that I didn’t explore.

Data Cleanup

The Starbucks dataset is split up into three different files: profile, which has low level information about customers, portfolio, which has information about different promotional offers that can be received, and transcript, which has all purchase history and information on when the customer received, viewed, and completed their promotional offers.

PROFILE: For cleaning up the profile data, I needed to figure out how I wanted to handle genders and incomes that were NULL, and change the became_member_on field to an actual date and not just a string. Looking at the missing genders, those same records had NULL incomes as well. I decided to give missing genders the value “U” for unknown instead of deleting those records (they made up about 15% of the data). To choose an appropriate fill value for income, I looked at the distribution of income for the whole population:

Income distribution for the whole population, with the mean marked by a black line.

The black line shows where the mean of the distribution is, and that seemed like a good fill value since it doesn’t lean too far one way or the other on our income distribution. After creating the member_date field, which was derived from the became_member_on field, I also split that date into member_year, member_month, and member_day to see if that would give any additional information. Here is what the final output looked like:
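The profile steps above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the code from the repository: the toy rows are made up, and the `YYYYMMDD` string format for became_member_on is an assumption.

```python
from datetime import datetime
from statistics import mean

# Toy rows shaped like the profile file; all values are made up.
profile = [
    {"id": "a1", "gender": "F", "income": 70000.0, "became_member_on": "20170715"},
    {"id": "b2", "gender": None, "income": None, "became_member_on": "20180102"},
    {"id": "c3", "gender": "M", "income": 50000.0, "became_member_on": "20160520"},
]

def clean_profile(rows):
    """Fill missing gender and income, then expand the membership date."""
    known_incomes = [r["income"] for r in rows if r["income"] is not None]
    mean_income = mean(known_incomes)  # population mean used as the fill value
    cleaned = []
    for r in rows:
        # Assumption: the date is stored as a YYYYMMDD string.
        member_date = datetime.strptime(r["became_member_on"], "%Y%m%d").date()
        cleaned.append({
            **r,
            "gender": "U" if r["gender"] is None else r["gender"],
            "income": mean_income if r["income"] is None else r["income"],
            "member_date": member_date,
            "member_year": member_date.year,
            "member_month": member_date.month,
            "member_day": member_date.day,
        })
    return cleaned

cleaned_profile = clean_profile(profile)
for row in cleaned_profile:
    print(row["id"], row["gender"], row["income"], row["member_date"])
```

Computing the mean only from the known incomes keeps the fill value from being distorted by the missing records themselves.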

PORTFOLIO: Each promotion has an array that shows the different ways someone could receive the promotion. It’s good practice to transform this array into bit flags, with one 0/1 column for each way of receiving the promotion.
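A rough sketch of that transformation, assuming the array lives in a field called `channels` (the field name, offer ids, and channel names here are illustrative, not taken from the dataset):

```python
# Toy rows shaped like the portfolio file; ids and channel names are made up.
portfolio = [
    {"id": "offer_a", "channels": ["email", "mobile", "social"]},
    {"id": "offer_b", "channels": ["web", "email"]},
]

def add_channel_flags(rows):
    """Replace the channels list with one 0/1 column per channel."""
    all_channels = sorted({ch for r in rows for ch in r["channels"]})
    flagged = []
    for r in rows:
        row = {k: v for k, v in r.items() if k != "channels"}
        for ch in all_channels:
            row[f"channel_{ch}"] = int(ch in r["channels"])
        flagged.append(row)
    return flagged

portfolio_flags = add_channel_flags(portfolio)
for row in portfolio_flags:
    print(row)
```

Every row ends up with the same set of numeric columns, which is what a later numeric-only step such as clustering needs.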

TRANSCRIPT: The value column in the transcript dataframe is a dictionary that has a key and value associated with it. The different key values are offer id and amount. Offer id is associated with the receiving, viewing, and completing of an offer, while amount is just the transactional amount. I changed the value field to show the value of the promotion id or transaction amount, and the value_type field then became the key of the dictionary.
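As a sketch of that reshaping, assuming every dictionary holds exactly one key (the toy rows and the `event` labels below are made up for illustration):

```python
# Toy rows shaped like the transcript file; all values are made up.
transcript = [
    {"person": "p1", "event": "offer received", "time": 0, "value": {"offer id": "offer_a"}},
    {"person": "p1", "event": "transaction", "time": 0, "value": {"amount": 12.5}},
]

def flatten_value(rows):
    """Split the one-entry value dictionary into value_type and value."""
    flat = []
    for r in rows:
        # Assumption: exactly one key/value pair per dictionary.
        value_type, value = next(iter(r["value"].items()))
        flat.append({**r, "value_type": value_type, "value": value})
    return flat

flat_transcript = flatten_value(transcript)
for row in flat_transcript:
    print(row["event"], row["value_type"], row["value"])
```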

Exploratory Analysis

The first thing to look at with customer data is what type of people make up the customer base, without going too in depth. Starbucks customers are roughly 50% male and 35% female, with the rest being other or unknown.

M — Male, F — Female, U — Unknown, O — Other

When looking at the income distribution, females have a wider distribution of incomes and also have higher incomes than males and other genders. Since we replaced the income of every unknown-gender record with the average income of the whole population, there is no distribution to analyze for that group.

Females pull the overall mean income above both the mean male income and the mean other income.

Another interesting piece of information is the trend of when customers became Starbucks members. Looking at the count of customers who signed up over the last few years, there are two large jumps in sign ups. I would be curious to see why these dates are so relevant, whether that be app upgrades or company-wide promotions (download the app and we will give you a free coffee mug, etc.).

Number of daily customer sign ups over time

To make things more clear, I changed the graph to show customer sign ups at a monthly level, with year being the line color. I starred the two major jumps that we saw in the graph above, and they look like they took off at exactly the same time. That has to be due to something done by the marketing department. Starbucks also has a downward trend in monthly subscriptions this year, almost wiping out any progress that they had made from the large surge in customer sign ups the previous year.

Customer subscriptions per month with year being the color code

Customers seem to have similar patterns of when they sign up for being a Starbucks member and there are different income distributions based on their gender input, but that still doesn’t give a good representation of the type of buyer they are and what characteristics they have in common with other buyers. For that, we need to take a deeper look into their purchase history and how responsive they are to different promotions.

Transactional and Promotional Analysis

Looking through the transactional data, it is easy to see when a person receives a promotion, when they view it, and when they complete it. This example is a random person viewing a discount offer on the day they received it. The discount is spend $10, receive a $2 credit. You can see that the time is 0 for all of the different values, meaning they acted on this offer on the same day they received it. From this data, we can try to figure out the following information:

1. How many promotions have they received (BOGO, discount, informational)
2. What was the completion percentage of each type of promotion (BOGO and discount, every informational promotion is completed the same day it is issued)
3. How many total transactions have they made since becoming a Starbucks member
4. What is the average transaction amount spent
5. What is the average and median days between purchases for each customer
6. What is the average number of days it takes to complete an offer

Random customer receiving, viewing, and completing a promotional offer.
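Items 3 to 5 in the list above come down to grouping transactions by customer and differencing consecutive purchase days. The following is a minimal plain-Python sketch, not the author's implementation; the `(person, day, amount)` tuples are made-up sample data.

```python
from collections import defaultdict
from statistics import mean, median

# (person, day, amount); toy transactions, not real data.
transactions = [
    ("p1", 0, 4.50), ("p1", 3, 6.00), ("p1", 10, 5.25),
    ("p2", 2, 20.00),
]

def purchase_stats(rows):
    """Per-customer transaction counts, spend, and gaps between purchases."""
    by_person = defaultdict(list)
    for person, day, amount in rows:
        by_person[person].append((day, amount))

    stats = {}
    for person, events in by_person.items():
        events.sort()  # order by day so consecutive rows are consecutive purchases
        days = [d for d, _ in events]
        amounts = [a for _, a in events]
        gaps = [later - earlier for earlier, later in zip(days, days[1:])]
        stats[person] = {
            "total_transactions": len(events),
            "total_transaction_amount": sum(amounts),
            "avg_transaction": mean(amounts),
            # A customer with a single purchase has no gap to measure.
            "avg_days_between_purchases": mean(gaps) if gaps else None,
            "median_days_between_purchases": median(gaps) if gaps else None,
        }
    return stats

stats = purchase_stats(transactions)
print(stats["p1"])
print(stats["p2"])
```

Customers with a single purchase get `None` for the gap statistics, which a later step can replace with 0.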

When trying to see people’s success rates in completing promotional offers, I have to join the transaction dataframe to itself, joining on person and value where the value_type says “offer id”. We also need to make sure that the offer is completed after or on the same day as the received time, and that it fits in a window of the duration of the offer. This can be tricky since a customer can receive the same promotion more than once (I found that out the hard way and it was a nightmare to go back and figure that out). This also raised other questions because there were instances where people completed their offer at a date later than the promotion expiration date. An example:

This promotion was offered to both of these people at day 576, and both completed the offer 18 days later. This doesn’t make sense, because the duration was only 5 days. So maybe I am misunderstanding what duration means, or duration doesn’t really matter. I ended up ignoring the duration column.
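One way to handle the repeated-offer problem is to pair each completion with the most recent still-open receipt of the same offer for the same person. The sketch below is plain Python rather than the dataframe self-join from the repository. The pairing rule is my assumption about a reasonable tie-break, the duration is ignored as described above, and the events are made up.

```python
from collections import defaultdict

# (person, event, offer_id, day); toy rows, not taken from the real transcript.
events = [
    ("p1", "offer received", "offer_a", 0),
    ("p1", "offer completed", "offer_a", 2),
    ("p1", "offer received", "offer_b", 7),
    ("p1", "offer received", "offer_a", 14),
    ("p1", "offer completed", "offer_a", 18),
]

def match_completions(rows):
    """Pair each completion with the latest open receipt of the same offer."""
    received = defaultdict(list)
    completed = defaultdict(list)
    for person, event, offer_id, day in rows:
        if event == "offer received":
            received[(person, offer_id)].append(day)
        elif event == "offer completed":
            completed[(person, offer_id)].append(day)

    matches = []
    for (person, offer_id), receipt_days in received.items():
        open_receipts = sorted(receipt_days)
        pairs = []
        for done_day in sorted(completed.get((person, offer_id), [])):
            # Only receipts on or before the completion day are candidates.
            candidates = [d for d in open_receipts if d <= done_day]
            if candidates:
                latest = candidates[-1]
                open_receipts.remove(latest)  # each receipt is used at most once
                pairs.append((latest, done_day))
        # Receipts left over were never completed.
        pairs.extend((d, None) for d in open_receipts)
        for received_day, done_day in sorted(pairs, key=lambda p: p[0]):
            matches.append({
                "person": person,
                "offer_id": offer_id,
                "received_day": received_day,
                "completed_day": done_day,
                "days_to_complete": None if done_day is None else done_day - received_day,
            })
    return matches

matches = match_completions(events)
completion_rate = sum(m["completed_day"] is not None for m in matches) / len(matches)
for m in matches:
    print(m)
print(completion_rate)
```

Removing a receipt once it is matched is what stops a single completion from being counted against two deliveries of the same promotion.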

From here, we can aggregate the data to see at a high level the different promotions, their success rate, their net reward to the consumer, and their net worth to Starbucks. From an overall perspective, the higher the difficulty (the more money you have to spend), the less likely people are to complete the offer. Even though the $5 for $5 BOGO and the $10 for $10 BOGO have the same net reward for the consumer, people are more inclined to complete the $5 for $5, probably due to convenience. The most successful offer is the $3 for $7 discount, which actually makes Starbucks money in the end. We also see that the $5 for $20 discount is a complete waste of time, being completed only about 9% of the time. As for the average time it takes to complete an offer, people complete the $5 for $5 BOGO quicker than any other offer, while the $5 for $20 discount takes the longest to complete, probably due to the larger amount that needs to be spent.

Completed Offers is the success percentage of that offer being completed, Total Completions is the number of times that promotion was completed, Avg Days To Complete is the average number of days it takes for that promotion to be completed, Net Reward is the reward minus the difficulty, and Net Worth is Net Reward * Completed Offers, which shows how much money Starbucks can expect to make on that promotion.

Net worth should be the most important measure to Starbucks. With BOGOs, Starbucks doesn’t make any money off of the deal. They’re just biting the bullet to get people to claim their reward and then hopefully spend more money in the future. So, for both BOGO deals, the net worth will always be $0 any time they send out that promotion; on the other hand, discounts do give some monetary value to Starbucks. The net worth of a discount deal is the total value that can be made off of the discount, multiplied by the probability that the promotion is completed. The $5 for $20 promotion might favor Starbucks by $15, but if only about 9% of people complete the promotion, they should only expect to receive $1.35 * the number of people who receive the promotion. So in terms of net worth, the $2 for $10 discount is the best promotion to give their customers.
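The net worth arithmetic is an expected value: the amount Starbucks keeps when the offer is completed, multiplied by the completion rate. Here is a small sketch; the function name is mine, the 9% rate is the population completion rate quoted later in the article, and the 0.5 BOGO rate is a placeholder.

```python
def expected_net_worth(difficulty, reward, completion_rate):
    """Expected dollars Starbucks keeps per offer sent.

    difficulty: amount the customer must spend to complete the offer
    reward: amount credited back on completion
    completion_rate: share of sent offers that get completed (0 to 1)
    """
    return (difficulty - reward) * completion_rate

# $5 for $20 discount at the roughly 9% population completion rate.
population_value = expected_net_worth(difficulty=20, reward=5, completion_rate=0.09)

# A BOGO gives back the full spend, so nothing is kept whatever the rate is
# (0.5 here is a placeholder, not a figure from the article).
bogo_value = expected_net_worth(difficulty=5, reward=5, completion_rate=0.5)

print(round(population_value, 2), bogo_value)
```

Multiplying the per-offer value by the number of customers who receive the promotion gives the total Starbucks should expect from one send.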

Discount and BOGO promotions should be structured in a way that is financially smart for Starbucks. Discount promotions should go to customers who come back on a consistent basis and do not need to be highly incentivized to return. If they’re going to come back anyway, giving them a BOGO doesn’t make sense because you’re not making money on the deal, whereas a discount still brings money in. It’s a win-win for both Starbucks and the customer. People who are not guaranteed to come back consistently should be given BOGOs to reengage their interest and make them more likely to spend money at the stores. The next step is to break the customers into more appropriate clusters based on their purchasing habits and make better decisions about who gets which promotional deals.

Clustering Segments

Clustering is one of the more interesting analyses that data scientists encounter because there is no correct “answer”. Predictions of stock prices or fraudulent bank transactions can be checked against historical values or yes / no labels, but clustering has no label associated with it. It just shows which records are most similar to each other and groups them together.

You can technically pick any number of clusters to represent your data, but one way to evaluate whether your choice is a good one is the elbow method. In the elbow method, we look for significant drops in the sum of squared errors (SSE) from each point to its assigned centroid. Since each increase in K (the number of clusters) creates more clusters with fewer points in each, SSE trends downward as K increases. Since a lower number of clusters is easier to decipher and analyze, we want K to be at the last significant drop in SSE before the curve starts to flatten out, hence looking for the “elbow” in the graph.
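To make the SSE-versus-K idea concrete, here is a dependency-free sketch. It runs a small k-means with a deterministic farthest-point start on made-up 2D points that form three obvious groups. It is only an illustration, not the clustering code used for the analysis. In practice a library implementation would be used, such as scikit-learn's `KMeans`, which reports the same quantity as `inertia_`.

```python
import math

def kmeans_sse(points, k, iterations=20):
    """Run a basic k-means and return the sum of squared errors (SSE)."""
    # Deterministic start: first point, then repeatedly the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]

    return sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)

# Toy data: three tight groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On this toy data the SSE falls sharply up to K = 3 and barely moves afterwards, which is the flattening that marks the elbow.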

Elbow method chart looking at the number of clusters that should be analyzed.

From the chart, 4 clusters is an appropriate number of clusters to analyze on the data. That fits well with our problem since the 4 clusters can be broken up into these types of customers:
1. Won’t react to any promotional deals
2. Will favor BOGOs over discounts
3. Will favor discounts over BOGOs
4. Will respond to both BOGOs and discounts

The input data would be based off a customer matrix that included the following features:

discount_total_offers, discount_completion_pct,
discount_min_completion_days, discount_max_completion_days, discount_completed_offers,
discount_avg_completion_days, discount_avg_net_reward,
bogo_total_offers, bogo_completion_pct, bogo_completed_offers, bogo_min_completion_days,
bogo_max_completion_days, bogo_avg_completion_days,
bogo_avg_net_reward, informational_promotions, age, gender,
income, total_transactions, min_transaction_day,
max_transaction_day, avg_transaction, total_transaction_amount,
median_days_between_purchases, avg_days_between_purchases

This is a combination of personal attributes (age, gender, income), BOGO and discount attributes (percent completed, average net reward, average number of days to complete, the fastest time it took to complete, the largest time it took to complete), informational promotions (how many they received), and overall transaction trends (number of transactions made, total spent, average and median days between transactions). If any of the values were NULL, they were replaced with a 0 since the customer didn’t participate in that specific sector of interest (never made a transaction, never completed a promotional offer, etc). I also had to turn the gender field from a categorical variable to 4 dummy variables since the clustering algorithm only takes numeric values.
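The two preparation steps in the paragraph above, replacing NULLs with 0 and expanding gender into four dummy variables, might look like this for a single customer. It is a plain-Python sketch with a made-up record and only a handful of the feature names; the `gender_` prefix is my own naming.

```python
GENDERS = ("F", "M", "O", "U")  # the four gender codes used in the article

def to_feature_row(customer, numeric_fields):
    """Return an all-numeric dict: NULLs become 0, gender becomes four 0/1 flags."""
    row = {}
    for field in numeric_fields:
        value = customer.get(field)
        row[field] = 0 if value is None else value
    for code in GENDERS:
        row[f"gender_{code}"] = int(customer.get("gender") == code)
    return row

# Made-up customer who never completed a BOGO offer.
customer = {"age": 42, "gender": "F", "income": 64000.0,
            "total_transactions": 5, "bogo_completion_pct": None}
fields = ["age", "income", "total_transactions", "bogo_completion_pct"]

feature_row = to_feature_row(customer, fields)
print(feature_row)
```

Because features such as income and completion percentages sit on very different scales, distance-based clustering is usually run on standardized columns. The article does not say whether that was done here.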

There is no testing and training set with clustering due to there being no right or wrong answers and no before and after values. Once the clusters were appended to the customers matrix, I wanted to visualize different plots of distributions that can help identify what types of customers were clustered together.

Cluster 2 was the largest group at about 33% of the total population (roughly 5,694 of the 17,000 customers), while cluster 3 was the smallest segment at roughly 15%. I am most interested in cluster 3 because it seems to be a very small niche market compared to the other clusters.

I first started with looking at information that didn’t have to do with promotional success values to get an idea of what type of person fits into these clusters. I looked for plots that seemed to be very color segmented with little to no overlap.

Seaborn pair plot that cross analyzes different distributions and color codes them based on the associated cluster.

Looking at the different plots above, the most clearly segmented ones are those that involve income. It looks like cluster 3 is people who have a small income, while cluster 0 is people who make a much larger income. This explains why cluster 3 was the smallest of all the clusters: the lower quartile of income across Starbucks customers sits right around $45K, and this group’s income tops out at around that amount.

Income distributions separated out by Cluster

Now looking at transactional data, it’s much harder to separate out the clusters, but we can still take away some information. If we look at the completion percentages of BOGOs and discounts, cluster 3 ranges all over the place for both promotional offers, while cluster 0 stays at a success rate above 0.5 for both types of promotions.

Seaborn pair plots on transactional information

Looking at total transactions vs total transaction amounts, it looks like cluster 0 is people who spend more money per transaction and don’t go as often, while cluster 3 is the opposite. Clusters 1 and 2 are both in the middle, which makes sense: they make up a majority of the population, so they don’t lean strongly one way or the other.

Total transactions vs total transaction amount shows how many times customers make transactions and how much they spend total.

But so what…

These all are interesting facts about the similarities and differences of these clusters, but how would Starbucks make more money than just blindly giving people different promotions throughout the year?

We can establish if our clustering algorithm is helpful in our customer segmentation by the expected return of our promotions. The expected return takes into account the total possible value Starbucks makes from the discount multiplied by the probability of the event (in this case someone completing their promotional deal). We want to evaluate the net worth of the overall population compared to our cluster to determine if our cluster is more fit for one type of promotion or the other.

Comparing how each cluster did on each promotion against the overall population average helps decide which promotion we should focus on. Remember, BOGOs should excite customers into coming back to Starbucks, while discounts should focus on customers who are consistently coming in, so that Starbucks retains some of the value of their purchases. If we look at cluster 0, they responded to the $5 for $20 promotional deal at a 34% success rate! The whole population success rate was just 9%, so why in the world would you offer a BOGO deal to them? You can make $5.10 on each customer when you give them that deal, as opposed to the $1.35 that you would make on average for the whole population. Clusters 2 and 3 didn’t respond as well to the discount promotions as clusters 0 and 1, so it would be smarter to give them BOGO promotions to reengage them with “free” purchases at Starbucks, as opposed to keeping them away by sending them discounted deals.

Breakdown of how each cluster responded to each of the promotional deals. The BOGO_or_Discount field is based off a function that looks at the cluster completion percentage and the overall population completion percentage, and if the cluster value is greater than the overall population completion percentage (with a 5% buffer), then you should offer them a discount deal. If not, you should offer them a BOGO deal to get them reengaged.
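The rule in that caption can be written as a one-line comparison. In this sketch I read the "5% buffer" as five percentage points added to the population rate, which is an assumption. The 0.34 and 0.09 rates come from the article, while the cluster 3 rate is a made-up placeholder.

```python
def bogo_or_discount(cluster_rate, population_rate, buffer=0.05):
    """Recommend 'discount' when the cluster clearly beats the population rate."""
    return "discount" if cluster_rate > population_rate + buffer else "BOGO"

population_rate = 0.09  # $5 for $20 discount, whole population
cluster_rates = {
    0: 0.34,  # quoted in the article
    3: 0.04,  # made-up value, for illustration only
}

recommendations = {
    cluster: bogo_or_discount(rate, population_rate)
    for cluster, rate in cluster_rates.items()
}
print(recommendations)
```

The buffer keeps a cluster that beats the population by only a hair from being moved to discounts on what could just be noise.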

By treating clusters 0 and 1 as the “discount promotional clusters”, you will have much higher expected returns on your discounts, which will help make up for offering clusters 2 and 3 BOGO deals. You’re retaining your everyday customers, who will continue to come back no matter what, with small rewards that they will appreciate, while you focus on getting clusters 2 and 3 in the door and enjoying your product. Starbucks makes customers happier with their promotions, and they also retain more money in the process.

Reflection and Conclusion

This is a classic problem companies have to deal with, and I thought that 4 clusters was the best approach to solving it. With the different datasets that I got to work with, I wanted to keep as much of the information untouched as possible. This raises problems, since there were people whose listed age was 118 years old and people who completed their offer after the deal expired. These are things I wish I had more clarity on, so that I could have cleaner data and better results.

One improvement that would have worked in my favor on this project is a better tool, like a SQL database, for some of the analysis. Using SQL to compute the rolling differences between people’s transactions all at once, instead of doing it individually, would have made the customer matrix run much faster. Since I didn’t have that ability in Python, it took roughly 15 minutes to run that section of the analysis alone. So if I messed up, it was going to take another 15 minutes to see the results again. I also wish I had more factual information to work with, like what specific product(s) were purchased or roughly where these customers live. Does where people live affect their spending habits? Do people use their BOGO or discount deals for specific products? That information would help with the clustering results and possibly lead to better promotional decisions.

Utilizing Starbucks’ personal, transactional, and promotional data can make huge waves in customers’ approval and spending habits at their stores. My clustering shows that there are low-income, low-spending customers who should receive BOGO promotions, people who respond decently to both types of promotional deals, and high-income, frequent spenders who should be offered more discount promotions. By grouping these like-minded people together correctly, Starbucks can make sure the right people are receiving the promotion that will elevate their status with Starbucks, while continuing to reap the benefits of smart business decisions.