
Comparing VADER and Text Blob to Human Sentiment

NLP Tools vs Humans… go!


I recently worked on a text classification project that used tweets with human-provided sentiment labels. I was curious how those labels would compare with the output of popular sentiment detection tools. I chose two tools, VADER and Text Blob, and ran a little experiment. You can find the code for the complete experiment here, but I’ll include a few snippets as I go along.


VADER Sentiment

Let’s start by loading the labeled tweets and creating a new column for the VADER sentiment.

import pandas as pd
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

tweets = pd.read_csv('../data/prepped_sxsw_tweets.csv')

# Get the VADER sentiments
def get_vader_sentiment(analyzer, tweet):
    tweet = tweet.replace('#', '')  # drop '#' so hashtag text is scored
    vader_scores = analyzer.polarity_scores(tweet)
    compound_score = vader_scores['compound']
    # using thresholds from the VADER developers/researchers
    if compound_score >= 0.05:
        vader_sentiment = 'positive'
    elif compound_score <= -0.05:
        vader_sentiment = 'negative'
    else:
        vader_sentiment = 'neutral'
    return vader_sentiment

analyzer = SentimentIntensityAnalyzer()
tweets['vader_sentiment'] = tweets.apply(
    lambda row: get_vader_sentiment(analyzer, row['tweet_text']), axis=1)
tweets.head(3)

Cool! We just got the sentiment from VADER. I’m using the compound score to label the tweet as Positive/Neutral/Negative per the VADER documentation.

  • positive: compound score >= 0.05
  • neutral: (compound score > -0.05) and (compound score < 0.05)
  • negative: compound score <= -0.05

Text Blob Sentiment

Now, let’s get the sentiment from Text Blob using the code below.

from textblob import TextBlob

def get_text_blob_sentiment(tweet):
    # The polarity score is a float within the range [-1.0, 1.0].
    polarity = TextBlob(tweet).sentiment.polarity
    if polarity > 0:
        textblob_sentiment = 'positive'
    elif polarity < 0:
        textblob_sentiment = 'negative'
    else:
        textblob_sentiment = 'neutral'
    return textblob_sentiment

tweets['text_blob_sentiment'] = tweets.apply(
    lambda row: get_text_blob_sentiment(row['tweet_text']), axis=1)
tweets.head(3)

Great! We just got the sentiment from Text Blob. I’m using the sentiment polarity score to label the tweet as Positive/Neutral/Negative. The polarity score is a float within the range [-1.0, 1.0].

A Deeper Look

Now let’s compare the human-labeled sentiment to the tool-labeled sentiment. I used bar plots to visualize the comparison, with totals at the top of each bar for clarity; if you only compared bar heights and ignored the "Number of Tweets" scale on each subplot, you could be misled. A sketch of the plotting code is below.
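
Here is a minimal sketch of how such a comparison plot could be built with matplotlib. The human-label column name ('sentiment') is an assumption for illustration; adjust it to match your dataset.

import matplotlib.pyplot as plt

order = ['negative', 'neutral', 'positive']
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, col, title in zip(axes,
                          ['sentiment', 'vader_sentiment', 'text_blob_sentiment'],
                          ['Human', 'VADER', 'Text Blob']):
    # 'sentiment' (the human labels) is an assumed column name
    counts = tweets[col].value_counts().reindex(order).fillna(0).astype(int)
    bars = ax.bar(order, counts)
    ax.bar_label(bars)  # totals at the top of each bar
    ax.set_title(title)
    ax.set_ylabel('Number of Tweets')
plt.tight_layout()
plt.show()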

The two tools do seem to be pretty comparable, with similar breakdowns for Negative, Neutral, and Positive tweets.

  • 12.90% VADER Negative, 13.70% Text Blob Negative
  • 41.70% VADER Neutral, 37.00% Text Blob Neutral
  • 45.40% VADER Positive, 49.30% Text Blob Positive

Unsurprisingly, there is a notable difference between the human and tool sentiment labels: 47.10% of tweets had differing human and VADER sentiments, and 51.50% had differing human and Text Blob sentiments.
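
Disagreement rates like these reduce to a simple comparison. Again, the human-label column name ('sentiment') is an assumption:

# 'sentiment' is the assumed column holding the human labels
vader_diff = (tweets['sentiment'] != tweets['vader_sentiment']).mean()
text_blob_diff = (tweets['sentiment'] != tweets['text_blob_sentiment']).mean()
print(f'{vader_diff:.2%} differ from VADER, {text_blob_diff:.2%} from Text Blob')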

The dataset provider mentions that the human-labeled sentiment is the sentiment "directed at" a brand or product rather than the overall tone of the tweet, which may account for the difference. Let’s try another, similar dataset.

# Load the second dataset (the file name here is an assumption)
apple_tweets = pd.read_csv('../data/apple_tweets.csv')

# Get the VADER sentiments
analyzer = SentimentIntensityAnalyzer()
apple_tweets['vader_sentiment'] = apple_tweets.apply(
    lambda row: get_vader_sentiment(analyzer, row['text']), axis=1)

# Get the Text Blob sentiments
apple_tweets['text_blob_sentiment'] = apple_tweets.apply(
    lambda row: get_text_blob_sentiment(row['text']), axis=1)
apple_tweets.head(3)

In this case, we notice some larger differences between the two tools. VADER and Text Blob have similar numbers of Positive Tweets but differ quite a bit on Negative and Neutral Tweets.

  • 33.00% VADER Negative, 24.80% Text Blob Negative
  • 34.90% VADER Neutral, 42.80% Text Blob Neutral
  • 32.10% VADER Positive, 32.40% Text Blob Positive

Again, no big surprise here. We see another notable difference between the human and tool sentiment labels. 39.70% of tweets had differing human and VADER sentiments. 42.70% of tweets had differing human and Text Blob sentiments.

Second opinion on Neutral?

While we will probably trust the human sentiment labels over the tool sentiment labels, we could use the tools to get a "second opinion".

Human opinions on Neutral text can vary from person to person, and even a single annotator’s judgment can drift over the course of a labeling session. We might also want to build a binary Positive/Negative classifier. Both are compelling scenarios for using the tools to get a second opinion.

How might we do that? One way is to get the sentiment from both tools and, if they agree, use that shared sentiment in place of the human Neutral label. I used this approach (sketched after the output below) to get the following guidance on re-labeling Neutrals:

---- Apple and Google @ SXSW Tweets ----
28.60% of Neutral Tweets could be re-labeled.
24.50% of Neutral Tweets could be re-labeled to Positive.
4.10% of Neutral Tweets could be re-labeled to Negative.

---- Apple Tweets ----
21.00% of Neutral Tweets could be re-labeled.
18.50% of Neutral Tweets could be re-labeled to Positive.
2.50% of Neutral Tweets could be re-labeled to Negative.
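
Here is a minimal sketch of that agreement check, again assuming the human labels live in a 'sentiment' column:

# Tweets the human annotator labeled Neutral ('sentiment' is assumed)
neutrals = tweets[tweets['sentiment'] == 'neutral']

# Candidates for re-labeling: both tools agree on a non-neutral sentiment
agree = neutrals['vader_sentiment'] == neutrals['text_blob_sentiment']
candidates = neutrals[agree & (neutrals['vader_sentiment'] != 'neutral')]

print(f"{len(candidates) / len(neutrals):.2%} of Neutral Tweets could be re-labeled.")
for label in ['positive', 'negative']:
    n = (candidates['vader_sentiment'] == label).sum()
    print(f"{n / len(neutrals):.2%} of Neutral Tweets could be re-labeled to {label.capitalize()}.")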

Conclusions

Humans are the ideal text labeling solution but sentiment tools like VADER and Text Blob are certainly useful. Here are two specific cases.

CASE 1: You need sentiment but don’t have the resources to have humans label your Test/Train data.

I would certainly consider using VADER and Text Blob together to help me label text data by hand, especially as a solo effort: use the tool sentiment wherever both tools agree, then hand-label only the cases where they differ. A sketch of that pre-labeling pass is below.
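
This could look something like the following, where the 'label' column is hypothetical:

# Where the two tools agree, accept their shared sentiment as the label;
# everything else is left for hand labeling ('label' is a hypothetical column)
agree = tweets['vader_sentiment'] == tweets['text_blob_sentiment']
tweets.loc[agree, 'label'] = tweets.loc[agree, 'vader_sentiment']
to_review = tweets[~agree]  # hand-label these
print(f"{agree.mean():.2%} pre-labeled, {len(to_review)} tweets left to review")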

CASE 2: You want a second opinion on Neutral text.

As noted above, human opinions on Neutral text vary. A "second opinion" from the tools can help standardize the Neutral label, or recover usable labels if you want to build a binary Positive/Negative classifier without simply dropping the Neutral texts.

