
The Most Popular Pre-trained Sentiment Classifiers in Python

Inspecting the performance of Vader, Happy Transformer, TextBlob, and Google NL API, discussing their limitations and tips for selecting…

Photo by Mark Daynes on Unsplash

Sentiment analysis is a large field in natural language processing (NLP) that uses techniques to identify, extract, and quantify emotions from textual data. In companies, sentiment analysis methods help automatically understand customer feedback, evaluate social media conversations, and prioritize communication in customer care departments.

NLP and sentiment studies have also made their way into economic research in A-class journals over the last two decades. Antweiler and Frank (2005), for example, quantified the content of messages posted on internet stock message boards. Algaba et al. (2020) surveyed many applications of sentiment methods in a Journal of Economic Surveys paper.

In terms of methodology, data scientists generally have two options for building a sentiment classifier, each with pros and cons. Training a model from scratch usually involves one or more of these components: adopting a widely accepted sentiment lexicon, having human experts score sentiment, labeling data through agency contractors or research assistants, and tuning a model until it performs well on a held-out part of the dataset. This process can be costly and time-consuming.

On the other hand, using pre-trained classifiers saves a lot of time. These models are easy to use with a couple of lines of code, but the specificity of their training datasets might constrain them. This article will focus on the latter option and show the possibilities of four pre-trained sentiment classifiers implemented in Vader, Happy Transformer, TextBlob, and the Google Cloud NL API.

You’ll learn more about:

  • implementation in Python
  • training data and model architectures
  • pros and cons of each classifier.

To contrast the performance of all the classifiers, I will use a dataset of news headlines, which differs from the original training datasets of all the tested algorithms.

Data

A Million News Headlines contains news headlines published over eighteen years by the Australian Broadcasting Corporation (ABC). The dataset is available from Kaggle under a public license.

Each of the more than 1.2 million records contains the publication date and the headline text. Here are a couple of examples:

Image 1 - A Million News Headlines data (image by author)
Photo by Possessed Photography on Unsplash

Vader

VADER (Valence Aware Dictionary and Sentiment Reasoner) is a lexicon and rule-based Sentiment Analysis tool that is specifically designed to detect sentiments expressed in social media.

The LIWC, ANEW, and General Inquirer lexicons, supplemented with Twitter data, were used to tune the model’s accuracy. In the original paper, Hutto and Gilbert (2014) use an 8-step methodology to construct and validate the classifier:

Image 2 - Methodology of VADER construction (source: Hutto and Gilbert, 2014, draw.io)

These scores are the VADER’s output:

  • pos, neu, and neg scores are ratios for the proportions of text that fall into each category (so they should add up to 1, up to floating-point rounding).
  • The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and 1 (most extreme positive). This is the most useful metric if you want a single measure of sentiment for a given sentence.
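Under the hood, the summed valence is mapped onto the [-1, 1] interval by a simple normalization. A minimal sketch of that step (the `alpha` smoothing constant of 15 matches the reference implementation):

```python
import math

def normalize(score: float, alpha: float = 15.0) -> float:
    """Map a raw summed valence score onto the interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# the larger the absolute raw sum, the closer the compound score gets to +/-1
print(normalize(2.5))   # a moderately positive sum
print(normalize(-6.0))  # a strongly negative sum
```

This explains why even a single strongly charged word can push the compound score close to the extremes while the score never quite reaches -1 or 1.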

Here is the Python implementation:

To the best of my knowledge, VADER is the best pre-trained sentiment classifier for social media conversations from networks such as Facebook or Twitter. It is free and easy to use, and its methodology is clearly described in the original paper, so everyone can cite it and see how it works. On the other hand, social media language is quite specific, and using the classifier on a different kind of dataset might bias the results.

Photo by Ralph Hutter on Unsplash

TextBlob

TextBlob is a simple Python library for processing textual data and performing tasks such as sentiment analysis, text pre-processing, etc.

The sentiment property returns a named tuple with polarity and subjectivity scores. The polarity score is a float within the range [-1.0, 1.0], while the subjectivity is a float within the range [0.0, 1.0], where 0 is very objective and 1 is very subjective.

  • Polarity, in simple terms, means emotions expressed in a sentence – negative vs. positive
  • Subjectivity expresses some personal feelings, views, or beliefs – objective vs. subjective

To understand the methodology, we need to dig into the source code. Aaron Schumacher (2015) looked deeper into the code behind each sentiment indicator. TextBlob heavily relies on the NLTK and Pattern libraries and implements many of their functionalities.

A user can choose from two built-in classifiers:

  • [Pattern](https://github.com/clips/pattern/) Analyzer – default classifier that is built on the Pattern library
  • Naive Bayes Analyzer – NLTK classifier trained on a movie reviews corpus

Here is the implementation in Python using the default classifier:

Let’s mention the pros and cons now. TextBlob is a simple method of sentiment analysis that anyone with limited programming skills can use. Considerable limitations arise if we want to use it for more advanced or scientific projects. There is no paper we can cite for the methodology, and it takes some time to learn about the internal architecture if we don’t want to use it as a black-box model.

Photo by Tony Hand on Unsplash

Happy Transformer

Happy Transformer is a package built on top of Hugging Face’s transformer library. The official documentation is clear and useful, including many tutorials and code examples. It was first presented in the proceedings from the 2020 CUCAI conference, where it received the best paper award.

The text classifier outputs these scores:

  • label – "positive", "negative", or "neutral", describing the polarity of the sentiment
  • score – a float in the range [0, 1] reflecting the intensity of the sentiment

With Hugging Face’s transformers, we can import many pre-trained models that are tuned for specific purposes. For example, FINBert is designed to tackle the NLP tasks in the financial domain. On the Hugging Face project website, you’ll find detailed information about model architectures, including links to original papers, which is a great thing.

A good video is worth a thousand words. Check this one for an overview of implementation in Python.

Happy Transformer is a technically advanced yet very easy-to-implement NLP library. On the side of possible constraints, the text classifier does not support multi-class probabilities and presents only a single value for label and score. Some other classifiers (VADER, for example) also separately show the sentiment scores for the other two categories. This is important for analyses where we look not only at the overall sentiment but specifically at negative or neutral sentiment.

Photo by David Pisnoy on Unsplash

Google Cloud Natural Language API

Google provides sentiment analysis and entity sentiment analysis as part of its cloud services. Google Cloud NL API is a highly developed infrastructure that draws from the work of many talented engineers and research scientists.

To access the cloud technology, we can use the Google Cloud client libraries from a local machine, or the Cloud Shell, an online development environment accessible anywhere with a browser.

Sentiment analysis identifies the overall attitude with numerical score and magnitude values.

  • score of the sentiment ranges between -1.0 and 1.0 and corresponds to the overall emotional leaning of the text
  • magnitude indicates the overall strength of emotion (both positive and negative) within the given text and ranges from 0 to +infinity.

This sentiment analysis tutorial contains a complete Python implementation. Here is a summary of the positives and negatives as I see them:

Image 3 - Summary of positives and negatives of Google NLP API (source: draw.io)
Photo by Brett Jordan on Unsplash

How to select a suitable classifier for your project?

The methodological choices, training datasets, and domains the models were designed for predetermine their effectiveness at classifying sentiment in other projects. Before deciding on a sentiment method, it is always important to validate that it works on the data we work with.

A good technique is to run the text classification on a sizeable sample of our data and look at the tails, both positive and negative. I took the first 1,000 records of the ABC dataset and ran text classification with Vader, TextBlob, and Happy Transformer. Next, I sorted the data and checked whether the sentiment results of the 10 most negative and 10 most positive records made sense.

Here is what the results look like:

Image 4 - Validation of TextBlob, Vader and Happy Transformer

Vader’s compound scores don’t markedly differ from how a human rater would likely label the data. Both positive and negative scores reflect the prevalent emotion in the headlines ("wins", "dreams", "ambitious" vs. "kill", "terrorist", "war"). Happy Transformer also shows scores (HT label and HT score) firmly in line with my own scoring.

As a human rater, I would not give the sentence "families confront korean president elect over" a score of 0.8 (= very positive), as TextBlob does. I cannot see any indication of positive emotion in this short text. This suggests that TextBlob might not fit this particular dataset well and that Vader and Happy Transformer might be better options for the ABC data.

Conclusion

This article focuses on models that classify sentiment on the scale [-1,1]. Other sentiment classifiers that deal with specific emotions, such as fear, anger, or happiness (Text2emotion, NRCLex, among others), and another popular text classifier, Flair, go beyond the scope of this article.

To choose the suitable classifier, always consider:

  • data/domain the model was initially developed for
  • availability of citable resources, in case you need them
  • validation of the results as a human rater

Jupyter notebook for the performance comparison is available for download on my GitHub.

Did you like the article? You can invite me for coffee and support my writing. You can also subscribe to my email list to get notified about my new articles. Thanks!

References:

Antweiler, W., Frank, M. Z. (2005). Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards. The Journal of Finance, Vol. 59(3), pp. 1259–1294.

Algaba, A., Ardia, D., Bluteau, K., Borms, S. (2020). Econometrics meets sentiment: an overview of methodology and applications. Journal of Economic Surveys, Vol. 34(3), pp. 512–547.

Schumacher, A. (2015). TextBlob Sentiment: Calculating Polarity and Subjectivity. Retrieved from https://planspace.org/20150607-textblob_sentiment/.

Hutto, C., Gilbert, E. (2014). VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. Retrieved from https://ojs.aaai.org/index.php/ICWSM/article/view/14550.

