
Where should I eat after the pandemic? (Part 2/2)

Decision Making with Aspect-Based Sentiment Analysis using Transformers.

In the last article, I trained a model on the ABSA task from the SemEval-2014 dataset and analyzed its performance, speed, and behaviors. This article details how I use this model to choose a restaurant to dine at from the Yelp dataset. Without further delay, let’s get started!

Yelp Dataset

We can download the Yelp dataset in one of two ways: directly from Yelp or through Kaggle.

The Yelp method requires signing an agreement with Yelp, after which the dataset can be downloaded as a zip file. The Kaggle method requires an account, with a username and API key stored locally at ~/.kaggle/kaggle.json. Kaggle’s API documentation explains how to set this up.

If you use Yelp, you have to download the zip file and then upload it to Google Colab. If you use Kaggle, you can skip that extra step and pull the data into Google Colab directly. I’ll use the Kaggle method for this tutorial. The following code downloads the dataset from Kaggle:
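Here is a minimal sketch of such a download using the Kaggle command-line tool. It assumes the kaggle package is installed, that the API key is already in place, and that the dataset is published under the slug yelp-dataset/yelp-dataset (an assumption worth checking on the dataset page). The helper names and the target folder are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed Kaggle dataset slug; verify it on the dataset page before use.
YELP_SLUG = "yelp-dataset/yelp-dataset"


def build_download_command(slug, target_dir):
    """Build the Kaggle CLI call that downloads and unzips a dataset."""
    return ["kaggle", "datasets", "download", "-d", slug, "-p", target_dir, "--unzip"]


def download_dataset(slug=YELP_SLUG, target_dir="data/yelp"):
    """Run the download. Needs the kaggle CLI and ~/.kaggle/kaggle.json."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(slug, target_dir), check=True)
```

Calling download_dataset() in a Colab cell should leave the unzipped JSON files in data/yelp, provided the credentials are valid.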

Polarity

To score each review, we need to map each label to a polarity in the range [-1, 1].

Polarity Mapping

Instead of taking the classifier’s hard classification, it is more informative to work with its soft classification, which tells us how positive or negative an aspect of a review might be. We can find an aspect’s expected polarity over a review by combining the probability vector produced by the model with the polarity map defined above. For example, let’s say we input a review to the model, evaluate it along the aspect of food, and get the following output:

Example Output

Instead of assigning food a polarity of -1, we take the expected polarity over these labels: each label’s polarity is weighted by its predicted probability, and the results are summed.
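The calculation can be sketched in a few lines of Python. Both the polarity map and the probabilities below are assumptions for illustration: the labels are the five SemEval-2014 classes, conflict and none are assumed to map to 0, and the numbers are not the model’s actual output.

```python
# Assumed label-to-polarity map (conflict and none are treated as neutral here).
POLARITY = {
    "positive": 1.0,
    "neutral": 0.0,
    "negative": -1.0,
    "conflict": 0.0,
    "none": 0.0,
}


def expected_polarity(probs):
    """Probability-weighted average polarity over the classifier's labels."""
    total = sum(probs.values())  # normalize in case the vector does not sum to 1
    return sum(POLARITY[label] * p for label, p in probs.items()) / total


# Hypothetical soft output for the aspect "food" (illustrative numbers only).
food_probs = {
    "positive": 0.30,
    "neutral": 0.05,
    "negative": 0.55,
    "conflict": 0.05,
    "none": 0.05,
}

print(round(expected_polarity(food_probs), 2))
```

With these made-up numbers the hard label would be negative (polarity -1), while the expected polarity comes out to roughly -0.25, which reflects that the model also saw a fair amount of positive signal.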

This method is more practical because it gives us a gradient between positive and negative polarities, rendering a more representative sentiment for each aspect.

Tips

There is one important thing to note before we run our algorithm. The dataset contains two kinds of user-written text: reviews and tips. Tips are more compact than reviews; they’re generally only one sentence. This is helpful because one of our model’s shortcomings is classifying larger bodies of text, as we saw in the previous article. Here is a comparison of the cumulative distributions of word counts for each type of text:

Distribution Comparison

Clearly, tips contain far fewer words than reviews, so they fit well within the model’s limits. With this in mind, I’m also going to keep only restaurants that are still open and have at least 100 associated tips. This increases the likelihood that there is sufficient information to generate a rating for each aspect. The following code will run batch predictions for all of our filtered tips and write them to a JSON file to be processed in the next step.
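A rough sketch of that filtering and batching step, using only the standard library, might look like this. The field names (business_id, is_open, categories, text) follow the Yelp dataset’s JSON-lines files. Identifying restaurants through the "Restaurants" category is my assumption, and predict_batch is a hypothetical stand-in for the trained ABSA model.

```python
import json
from collections import Counter

MIN_TIPS = 100  # threshold stated in the article


def read_json_lines(path):
    """Yield one dict per non-empty line of a JSON-lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def select_restaurants(businesses, tips, min_tips=MIN_TIPS):
    """Return IDs of open restaurants that have at least `min_tips` tips."""
    tip_counts = Counter(tip["business_id"] for tip in tips)
    return {
        b["business_id"]
        for b in businesses
        if b.get("is_open") == 1
        and "Restaurants" in (b.get("categories") or "")  # categories may be null
        and tip_counts[b["business_id"]] >= min_tips
    }


def predict_tips(tips, keep_ids, predict_batch, batch_size=32):
    """Yield one prediction record per tip of a kept business, batch by batch."""
    kept = [t for t in tips if t["business_id"] in keep_ids]
    for start in range(0, len(kept), batch_size):
        batch = kept[start:start + batch_size]
        outputs = predict_batch([t["text"] for t in batch])  # hypothetical model call
        for tip, output in zip(batch, outputs):
            yield {"business_id": tip["business_id"], "prediction": output}


def write_json_lines(records, path):
    """Write records to disk, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
```

Here tips is expected to be a list (for example list(read_json_lines(...))), since it is traversed once for counting and once for prediction.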

Additionally, I’m going to append one more aspect to our set: the restaurant’s overall star rating. Instead of using the stars directly, I’ll make an adjusted star system considering each user’s average stars.

Looking at the reviews, the distribution of this bias is skewed left, meaning that people tend to give more stars, on average, than expected in a proper 1–5 star rating system.
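One possible way to express such a correction is sketched below. This is an assumed formulation, not necessarily the one used for the results in this article: a user’s bias is taken as the gap between their average stars and the midpoint of the 1–5 scale, that bias is subtracted from each rating, and the adjusted stars are rescaled to the same [-1, 1] range as the other aspects.

```python
STAR_MIDPOINT = 3.0  # center of a 1-5 star scale


def adjusted_stars(stars, user_average):
    """Shift a rating by how far the user's average sits from the midpoint."""
    bias = user_average - STAR_MIDPOINT  # positive for generous raters
    return min(5.0, max(1.0, stars - bias))  # keep the result on the 1-5 scale


def stars_to_polarity(stars):
    """Rescale 1-5 stars linearly onto the [-1, 1] polarity range."""
    return (stars - STAR_MIDPOINT) / 2.0


# A 4-star rating from a user who averages 4.5 stars is below their usual level.
print(stars_to_polarity(adjusted_stars(4.0, 4.5)))
```

Under this formulation, a generous rater’s 4 stars counts for less than a strict rater’s 4 stars.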

Star Ratings Bias Correction

After making this correction, we are ready to apply my personal importance weights to the aspect polarities of every tip and score the restaurants according to my preferences.
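As a sketch of how such a score could be computed: average each aspect’s polarity over a restaurant’s tips, then take the weighted average of those aspect means. Averaging over tips is an assumption about the aggregation step, and the aspect names and weights below are hypothetical.

```python
from statistics import mean


def score_restaurant(tip_polarities, weights):
    """Weighted average of per-aspect mean polarities.

    tip_polarities: list of dicts mapping aspect -> polarity in [-1, 1]
    weights: dict mapping aspect -> non-negative importance weight
    """
    total_weight = sum(weights.values())
    aspect_means = {
        aspect: mean(tip[aspect] for tip in tip_polarities) for aspect in weights
    }
    return sum(weights[a] * aspect_means[a] for a in weights) / total_weight


# Hypothetical importance weights and two hypothetical tips.
my_weights = {"food": 3.0, "service": 1.0}
tips = [
    {"food": 1.0, "service": 0.0},
    {"food": 0.0, "service": -1.0},
]
print(score_restaurant(tips, my_weights))
```

Dividing by the total weight keeps the score in [-1, 1] regardless of how the weights are scaled, so restaurants can be ranked directly by this value.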

Results

After running the above code, I’ve finally arrived at my restaurant list 🎉. It looks like I’ll be dining at the Singing Pandas Asian Restaurant & Bar!

Results

After looking up the place on Google, I’m pretty excited to try it out when the time is right. Taking a glance at the reviews, I’ve seen only positive things said about this place. The following is a screenshot from the results of my Google search:

Source: Google Search for "Singing Pandas Asian Restaurant & Bar"

Now that I have my result, I want to play around with some different weights. Trying a few different configurations of importance weights, I got very similar answers, which made me question how much variation there is in the results. To test this, I generated 100,000 random importance-weight vectors from a Dirichlet distribution and ran the analysis. Here is what I came up with:

From these findings, it looks like a lot of you will likely be joining me at the Singing Pandas Asian Restaurant & Bar! Clearly, the caveat of using aspects that most people weight about the same is a potential lack of diversity in the results. That being said, there are many ways to get more personalized results. For example, you can add more aspects or filter on the type of food you want to eat by using the categories attribute in the dataset’s business table. As for me, I’m pretty happy with my result, so I’ll stop my analysis here before it gets past my dinner time 😉.
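The weight-sensitivity test described above can be sketched with the standard library alone. A Dirichlet sample is obtained by normalizing independent Gamma draws. The symmetric concentration alpha = 1 (uniform over all weight vectors) is an assumption, and the restaurant names and scores are hypothetical.

```python
import random
from collections import Counter


def sample_dirichlet(rng, dim, alpha=1.0):
    """Draw one weight vector from a symmetric Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(dim)]
    total = sum(draws)
    return [d / total for d in draws]


def count_winners(aspect_scores, n_samples, seed=0):
    """Count how often each restaurant ranks first under random weights.

    aspect_scores: dict mapping restaurant -> list of per-aspect polarities,
    with the aspects in the same order for every restaurant.
    """
    rng = random.Random(seed)
    dim = len(next(iter(aspect_scores.values())))
    winners = Counter()
    for _ in range(n_samples):
        weights = sample_dirichlet(rng, dim)
        best = max(
            aspect_scores,
            key=lambda name: sum(w * s for w, s in zip(weights, aspect_scores[name])),
        )
        winners[best] += 1
    return winners


# Hypothetical per-aspect polarities for three restaurants.
scores = {
    "Restaurant A": [0.9, 0.8, 0.7],
    "Restaurant B": [0.6, 0.5, 0.4],
    "Restaurant C": [0.2, 0.3, 0.1],
}
print(count_winners(scores, n_samples=1000).most_common())
```

In this toy example one restaurant is better on every aspect, so it wins for any choice of weights. That is the same effect seen in the analysis: when one place leads on most aspects, changing the weights barely changes the ranking.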

Conclusion

In summary, this article’s purpose was to show how we can use large numbers of reviews to our advantage by letting machine learning methods do the work for us.

One improvement to this kind of opinion mining would be to develop models that are not domain-specific. In other words, a framework that derives any product’s features while predicting each feature’s polarity from its corresponding reviews would be a clear advancement.

Whether it be what earbuds you should buy, what shoes you should wear, or what restaurant you should dine at, Aspect-Based Sentiment Analysis can be a great tool in decision making by saving us the time and energy that ordinary decision making costs us. That’ll be all for this article, and I hope everyone stays safe and healthy throughout this pandemic. When all is over, hopefully, you’ll catch me stuffing my face with Chinese food at Singing Pandas Asian Restaurant & Bar in Chandler, Arizona!

See you next time!


References

[1] Chi Sun, Luyao Huang, and Xipeng Qiu, Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (2019), arXiv preprint arXiv:1903.09588

