Yelp Reviews Analysis for Bubble Tea Shops

7 min read · Nov 14, 2020

(I) Where should I open a bubble tea shop? (II) Listen to your customers with Natural Language Processing.

Photo by Rosalind Chang on Unsplash

Introduction

Bubble Tea (also known as pearl milk tea, boba milk tea, or simply boba) is a Taiwanese drink invented in Taichung in the 1980s. The tea is mixed with milk or fruits and topped off with tapioca pearls and sugar.

The bubble tea market in the U.S. has grown significantly over the years. According to Google Trends, the popularity of bubble tea in the U.S. has been increasing exponentially since 2010.

That makes bubble tea shops an intriguing subject for a data science study. Specifically, I wonder whether it is still a good time to open a new bubble tea shop, and how to keep my customers happy if I decide to run one.

This post explores several aspects of bubble tea shops by analyzing data from the Yelp Fusion API.

In particular,

- Can we find a quantitative model to predict the number of bubble tea shops?

- Which factors are essential in determining the number of bubble tea shops in a city?

- What are the most popular flavors and drinks at bubble tea shops?

- What prompts the customer to give a 1-star review on Yelp for a bubble tea shop?

To answer the first two questions, I’ll analyze the bubble tea shops’ locations along with demographic data from the U.S. Census Bureau.

For the rest of the questions, I will explore the Yelp reviews and ratings for the bubble tea shops in New York City.

Exploratory Data Analysis for bubble tea shops’ locations

Let’s first take a look at where the bubble tea shops are.

From the Yelp Fusion API, we can get a lot of information about each bubble tea shop, such as its name, address, price, and category (for more details, see the documentation).

Since we want to analyze the shops’ locations, we keep only each shop’s address in the dataset for simplicity.
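As a rough illustration of this step, here is a minimal sketch of how one page of shops could be requested from the Yelp Fusion business search endpoint and reduced to address fields. The search term, the helper names, and the choice of fields to keep are my assumptions for the example, not the exact code behind this analysis, and you need your own API key to run `fetch_shops`.

```python
import json
import urllib.parse
import urllib.request

# Yelp Fusion business search endpoint; an API key is required.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, location, offset=0, limit=50):
    """Build an authenticated request for one page of bubble tea shops."""
    params = {
        "term": "bubble tea",  # assumed search term
        "location": location,
        "limit": limit,        # Yelp returns at most 50 results per page
        "offset": offset,      # used to page through further results
    }
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def extract_address(business):
    """Keep only the fields needed for the location analysis."""
    loc = business.get("location", {})
    return {
        "name": business.get("name"),
        "address": ", ".join(loc.get("display_address", [])),
        "city": loc.get("city"),
        "state": loc.get("state"),
    }


def fetch_shops(api_key, location):
    """Fetch one page of results and reduce each business to its address."""
    request = build_search_request(api_key, location)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [extract_address(b) for b in payload.get("businesses", [])]
```

Calling `fetch_shops("YOUR_API_KEY", "New York, NY")` would return a list of small dictionaries, one per shop; looping over `offset` collects more than one page.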

We can visualize the location of the bubble tea shops on the map:

Most of the bubble tea shops are located in the major cities of each state. Los Angeles and San Francisco in particular have a very large number of bubble tea shops.

— Population

If we compare the number of bubble tea shops with each state’s population, we see a positive correlation between the two.

Given that bubble tea is so prevalent in East Asian countries, we should also consider the East Asian population in each state.

Using the U.S. Census Bureau’s demographic data, I plot the number of bubble tea shops vs. the East Asian population. As expected, we get a beautiful linear relationship!

— Age Group

In addition to the population, we can ask whether the number of bubble tea shops is related to a particular age group.

Interestingly, the number of bubble tea shops is positively correlated with the percentage of residents aged 25–44 but appears uncorrelated with the 14–24 age group. This is surprising, since I would naively expect most customers to be teenagers and young adults.

To summarize this section, I present the correlation matrix between the number of bubble tea shops, the percentage of different age groups, and the population.
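For readers who want to reproduce this kind of summary, a correlation matrix can be computed in a few lines. The sketch below uses NumPy and a handful of made-up state-level numbers purely for illustration; the column names and values are assumptions and are not the data used in this post.

```python
import numpy as np

# Made-up state-level figures, for illustration only (not the article's data).
features = {
    "n_shops": [1200, 450, 300, 40, 520],
    "population_millions": [39.5, 20.2, 12.8, 3.1, 29.1],
    "east_asian_thousands": [3100, 900, 600, 50, 800],
    "pct_age_25_44": [28.5, 26.0, 26.5, 24.8, 27.9],
}

names = list(features)
# np.corrcoef treats each row as one variable and returns Pearson correlations.
matrix = np.corrcoef(np.array([features[n] for n in names], dtype=float))

for name, row in zip(names, matrix):
    print(f"{name:>22}", " ".join(f"{value:6.2f}" for value in row))
```

Each entry is a Pearson correlation coefficient between two columns, so the diagonal is 1 and the matrix is symmetric.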

Modeling

Several features correlate positively with the number of bubble tea shops, so they look like crucial factors in determining how many shops a city has. However, I want to remind readers that it is dangerous to draw conclusions from correlations alone, because correlation does not imply causation. This is especially true when two of the features are themselves causally related, such as the total population and the East Asian population. (For more details, see Spurious relationship.)

To understand each feature’s importance and to predict the number of bubble tea shops, I build a support vector regression (SVR) model and tune its hyperparameters with a grid search.

The dataset is split into two parts: a training set and a test set. I train the model on the training set and evaluate it on the test set.
The figure on the left compares the number of bubble tea shops in the test set with the model’s predictions.

Overall, it seems that we can make a reasonable prediction by using the trained model.
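The workflow above (train/test split, SVR, grid search) can be sketched with scikit-learn as follows. This is only an illustration under my own assumptions: the data is synthetic, and the feature scaling, parameter grid, and 5-fold cross-validation are choices made for the example rather than the settings used for the results in this post.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the demographic features and shop counts.
rng = np.random.default_rng(0)
n_samples = 200
X = rng.uniform(0, 1, size=(n_samples, 3))
y = 50 * X[:, 0] + 20 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 1, n_samples)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SVR is sensitive to feature scale, so standardize inside the pipeline.
pipeline = make_pipeline(StandardScaler(), SVR())

# Assumed hyperparameter grid; adjust to the problem at hand.
param_grid = {
    "svr__kernel": ["linear", "rbf"],
    "svr__C": [1, 10, 100],
    "svr__epsilon": [0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("R^2 on the test set:", round(test_r2, 3))
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the validation folds leaks into training.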

To gain further insight from the model, I compute the permutation importance of each feature. Permutation importance quantifies how important a feature is by measuring how much the model’s score drops when that feature’s values are randomly shuffled, which breaks its relationship with the target. We see that the East Asian population, the total population of a state, and the percentage of adults aged 25–44 make significant contributions.

A model that predicts the number of bubble tea shops from demographic data is a useful tool, especially when we want to know whether the bubble tea market in a given place is oversupplied or undersupplied. So, don’t forget to check it before starting your own bubble tea business.
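To make the idea concrete, here is a small sketch using scikit-learn’s `permutation_importance`. The data is synthetic and built so that the first feature matters most and the last one not at all; the feature names are placeholders I chose for the example, and the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Placeholder feature names; the third feature is pure noise by construction.
feature_names = ["feature_strong", "feature_weak", "feature_noise"]

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
y = 50 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

model = SVR(kernel="linear", C=10).fit(X_train, y_train)

# Shuffle one column at a time on held-out data and record the drop in R^2.
result = permutation_importance(
    model, X_test, y_test, n_repeats=20, random_state=1
)

for name, mean, std in zip(
    feature_names, result.importances_mean, result.importances_std
):
    print(f"{name:>15}: {mean:.3f} +/- {std:.3f}")
```

Computing the importance on the test set rather than the training set shows which features the model relies on to generalize, not just to fit.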

Having examined the spatial distribution of bubble tea shops, let’s now turn our attention to the Yelp reviews in New York City.

Exploratory Data Analysis for bubble tea shops’ reviews

The dataset contains 10,516 Yelp reviews and ratings, dating back to 2004, for 97 bubble tea shops in NYC.

— The Distribution of Ratings

Around 40% of customers give a 5-star rating. This means either that 40% of customers are happy with the drinks or service, or that, for some reason, customers’ ratings are biased toward 5 stars.

There is still much debate over how objective the star rating systems of review websites are. But of course, that subject is beyond our scope.

— Number of Reviews over Time

The left panel above shows the number of reviews per year. The exponential growth coincides with the Google Trends data mentioned earlier and shows that bubble tea has been getting more and more popular in recent years.
In the right panel above, we can observe a substantial seasonal variation in the number of reviews, which peaks during the summer. Who doesn’t love a refreshing iced bubble tea on a hot summer day, after all?
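The two panels boil down to counting reviews by year and by calendar month. A minimal sketch with the standard library is shown below; the review dates are invented for the example.

```python
from collections import Counter
from datetime import date

# Invented review dates, for illustration only.
review_dates = [
    date(2018, 7, 4),
    date(2019, 1, 15),
    date(2019, 7, 20),
    date(2019, 8, 2),
    date(2020, 7, 11),
]

# Yearly totals show the long-term trend.
reviews_per_year = Counter(d.year for d in review_dates)

# Pooling all years by calendar month exposes the seasonal pattern.
reviews_per_month = Counter(d.month for d in review_dates)

print(sorted(reviews_per_year.items()))
print(sorted(reviews_per_month.items()))
```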

What are the most popular flavors and drinks at bubble tea shops?

Inspired by Yelp’s post, Discovering Popular Dishes with Deep Learning, I would like to know which flavors and drinks are popular in NYC’s bubble tea shops.

Instead of performing named entity recognition on menus, photos, and reviews with deep learning, I prepare a bubble tea-related lexicon and extract the items mentioned in the 5-star reviews.

We see that bubble tea is one of the most popular choices. People also like adding pudding or cheese foam to their drinks. Taro milk tea and black milk tea are popular as well.

We can get more information by inspecting the word clouds. The left panel is the word cloud for the 5-star reviews, and the right panel is for the 1-star reviews.

In the positive reviews, people are probably satisfied with the place, the excellent drinks, the delicious food, and the shops’ location.
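A lexicon-based extraction of this kind can be sketched with a regular expression and a counter. The lexicon and the reviews below are tiny, made-up examples; the actual lexicon used for this post is larger and is not reproduced here.

```python
import re
from collections import Counter

# A tiny, hypothetical lexicon of bubble tea items.
LEXICON = [
    "taro milk tea",
    "black milk tea",
    "milk tea",
    "cheese foam",
    "pudding",
    "bubble tea",
]

# Try longer phrases first so "taro milk tea" is not counted as plain "milk tea".
PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(LEXICON, key=len, reverse=True))
    + r")\b"
)


def count_items(reviews):
    """Count in how many reviews each lexicon item is mentioned."""
    counts = Counter()
    for text in reviews:
        # A set ensures an item is counted once per review.
        counts.update(set(PATTERN.findall(text.lower())))
    return counts


# Made-up 5-star reviews.
five_star_reviews = [
    "The taro milk tea with pudding was amazing!",
    "Best bubble tea in town, love the cheese foam.",
    "Black milk tea with pudding is my go-to.",
]

counts = count_items(five_star_reviews)
for item, n in counts.most_common():
    print(item, n)
```

Counting each item once per review keeps a single enthusiastic reviewer who repeats a drink name from skewing the ranking.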

The negative reviews mentioned words like “order” and “time.” People were probably complaining about waiting too long for their drinks.

What prompts the customer to give a 1-star review on Yelp for a bubble tea shop?

By just looking at the word cloud, it’s difficult to digest the information in the negative reviews. To find out which aspects a bubble tea shop should focus on improving, I turn to topic modeling with Latent Dirichlet Allocation (LDA).

An LDA model allows us to find the topics of the reviews and the most important words for each topic.

By modeling the 1-star reviews with LDA, I found three different topics that people care about:

For topic 1, customers mentioned order, time, minute, line, wait, long. From these keywords, we can easily see that they complain about waiting too long in the queue.

In topic 2, we see taste, sweet, worst, and water. This probably means that the drinks were terrible.

As for topic 3, we see service, customer service, rude, attitude, and bad. These customers must have been disappointed with the staff.

Putting all the pieces together, there are three tips for winning customers’ hearts. First, make sure customers get their drinks as quickly as possible. Second, keep the quality of the drinks high. And finally, be friendly to customers!

To learn more about the analysis, see the link to my GitHub.
