SENTIMENT ANALYSIS USING VADER

interpretation and classification of emotions

Aditya Beri
Towards Data Science


Sentiment analysis is a text analysis method that detects polarity (e.g. a positive or negative opinion) within the text, whether a whole document, paragraph, sentence, or clause.

Sentiment analysis aims to measure the attitudes, sentiments, evaluations, and emotions of a speaker or writer based on the computational treatment of subjectivity in a text.

Source: MonkeyLearn. Image link: https://bit.ly/2X806dW

Why is Sentiment Analysis difficult to perform?

Though it may seem easy on paper, Sentiment Analysis is a tricky subject. A text may contain multiple sentiments all at once. For instance,

“The acting was good, but the movie could have been better.”

The sentence above carries two polarities: a positive one about the acting and a negative one about the movie as a whole.

VADER

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based model for text sentiment analysis that is sensitive to both polarity (positive/negative) and intensity (strength) of emotion. It is available in the NLTK package and can be applied directly to unlabeled text data.

VADER sentiment analysis relies on a dictionary that maps lexical features to emotion intensities, known as sentiment scores. The sentiment score of a text is obtained by summing the intensity of each word in the text.

For example, words like ‘love’, ‘enjoy’, ‘happy’, and ‘like’ all convey a positive sentiment. VADER is also intelligent enough to understand the basic context of these words, such as treating “did not love” as a negative statement. It also understands the emphasis added by capitalization and punctuation, such as “ENJOY”.
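To make the idea concrete, here is a minimal toy sketch of lexicon-based scoring. The small lexicon, the negation list, and the two adjustment constants are all made up for illustration; VADER's real lexicon and rules are much richer, so this should be read as a sketch of the general approach and not as VADER's implementation.

```python
# Toy lexicon-based scorer. All words, weights and constants below are
# hypothetical and only illustrate the idea; they are NOT VADER's real values.
TOY_LEXICON = {"love": 3.0, "enjoy": 2.0, "happy": 2.5, "like": 1.5, "hate": -3.0}
NEGATIONS = {"not", "never", "no"}
NEGATION_FACTOR = -0.75   # flip and dampen a word that follows a negation
CAPS_BOOST = 0.5          # extra intensity for an ALL-CAPS sentiment word


def toy_score(text):
    """Sum the (adjusted) intensity of every known word in the text."""
    tokens = text.replace(".", " ").replace("!", " ").split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token.lower())
        if valence is None:
            continue  # word carries no sentiment in our toy lexicon
        # Emphasis: an all-caps word pushes the score further from zero.
        if token.isupper() and len(token) > 1:
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        # Basic context: a negation among the two preceding tokens flips the sign.
        previous = {t.lower() for t in tokens[max(0, i - 2):i]}
        if previous & NEGATIONS:
            valence *= NEGATION_FACTOR
        total += valence
    return total


print(toy_score("I love this movie"))          # positive
print(toy_score("I did not love this movie"))  # negative
print(toy_score("I LOVE this movie"))          # stronger than the lowercase version
```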

Polarity classification

We won’t try to determine if a sentence is objective or subjective, fact or opinion. Rather, we care only if the text expresses a positive, negative or neutral opinion.

Document-level scope

We’ll also try to aggregate all of the sentences in a document or paragraph, to arrive at an overall opinion.
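One simple way to picture document-level aggregation is to average the compound scores of the individual sentences. The sketch below assumes the per-sentence scores are already available and uses made-up values; plain averaging is an illustrative choice here, not something VADER prescribes.

```python
from statistics import mean


def document_score(sentence_scores):
    """Average per-sentence compound scores into one document-level score."""
    if not sentence_scores:
        return 0.0  # treat an empty document as neutral
    return mean(sentence_scores)


# Hypothetical compound scores for the three sentences of one review.
scores = [0.6, -0.2, 0.4]
overall = document_score(scores)
print(round(overall, 3))
```

In the walkthrough that follows, each review is passed to VADER as a single string, so no manual aggregation is required.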

Coarse analysis

We won’t try to perform a fine-grained analysis that would determine the degree of positivity/negativity. That is, we’re not trying to guess how many stars a reviewer awarded, just whether the review was positive or negative.

Broad Steps:

  • First, consider the text being analyzed. A model trained on paragraph-long reviews might not be effective on much shorter text. Make sure to use an appropriate model for the task at hand.
  • Next, decide the type of analysis to perform. Some rudimentary sentiment analysis models go one step beyond single words and consider two-word combinations, or bigrams. We will work on complete sentences, and for this we’re going to import a trained NLTK lexicon called VADER.
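For readers unfamiliar with bigrams, the short sketch below shows the difference between single words and two-word combinations. The helper name ngrams and the sample sentence are hypothetical and exist only for this illustration.

```python
def ngrams(tokens, n):
    """Return the list of n-word combinations found in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "the movie was not good".split()
print(ngrams(tokens, 1))  # single words (unigrams)
print(ngrams(tokens, 2))  # two-word combinations (bigrams)
```

The bigram ("not", "good") keeps the negation next to the word it modifies, which a model looking at single words would miss.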

DATASETS TO USE

For this model you can use a variety of datasets, such as Amazon reviews, movie reviews, or reviews of any other product.

import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

The polarity_scores() method of VADER’s SentimentIntensityAnalyzer takes in a string and returns a dictionary of scores in each of four categories:

  • negative
  • neutral
  • positive
  • compound (computed by summing the valence scores of the words and normalizing the result to lie between -1 and +1)
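The normalization behind the compound score can be sketched as follows. To the best of my knowledge, the open-source VADER code divides the summed valence x by sqrt(x² + alpha) with alpha = 15; the function below mirrors that idea. The valence of 1.9 for "good" is an assumption about the VADER lexicon, and the sketch ignores VADER's other rules.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded summed valence into the range [-1, 1]."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp to guard against floating-point overshoot.
    return max(-1.0, min(1.0, norm_score))


# Assumed valence of "good" in the VADER lexicon; the other words in
# 'This was a good movie.' are assumed to carry no sentiment.
print(round(normalize(1.9), 4))  # → 0.4404
```

Under these assumptions the result agrees with the compound score reported for 'This was a good movie.' in the example that follows.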

Let us analyze some random statements with our sentiment analyzer.

a = 'This was a good movie.'
sid.polarity_scores(a)
OUTPUT-{'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
a = 'This was the best, most awesome movie EVER MADE!!!'
sid.polarity_scores(a)
OUTPUT-{'neg': 0.0, 'neu': 0.425, 'pos': 0.575, 'compound': 0.8877}

Use VADER to analyze Reviews

import numpy as np
import pandas as pd

df = pd.read_csv('../TextFiles/reviews.tsv', sep='\t')
df.head()
df['label'].value_counts()
OUTPUT-neg 5097
pos 4903
Name: label, dtype: int64

Clean the data (optional)

This step removes reviews that are missing or that consist only of whitespace.

# REMOVE NaN VALUES AND EMPTY STRINGS:
df.dropna(inplace=True)

blanks = [] # start with an empty list

for i, lb, rv in df.itertuples():  # index, label, review
    if isinstance(rv, str):
        if rv.isspace():  # review contains only whitespace
            blanks.append(i)

df.drop(blanks, inplace=True)

Adding Scores and Labels to the DataFrame

Now we’ll add columns to the original DataFrame to store polarity_scores dictionaries, extracted compound scores, and new “pos/neg” labels derived from the compound score. We’ll use this last column to perform an accuracy test. For each review, polarity_scores reports the negative, neutral, and positive proportions of the text along with the compound score.

df['scores'] = df['review'].apply(lambda review: sid.polarity_scores(review))

df.head()

Now we will pull out the compound score as a separate column. Every value greater than or equal to zero will be considered a positive review, and every value less than zero will be considered a negative review.

df['compound']  = df['scores'].apply(lambda score_dict: score_dict['compound'])

df.head()
df['comp_score'] = df['compound'].apply(lambda c: 'pos' if c >=0 else 'neg')

df.head()
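Splitting at zero means a review with a compound score of exactly 0 counts as positive. As an alternative, the VADER project's own documentation suggests, as far as I recall, treating scores between -0.05 and +0.05 as neutral. The sketch below shows that variant; the function name label_from_compound is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score to 'pos', 'neg' or 'neu' using a neutral band."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


for c in (0.4404, 0.0, -0.8849):
    print(c, label_from_compound(c))
```

Because the review dataset used here carries only pos and neg labels, the two-way split remains the more convenient choice for comparing predictions with the original labels.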

Every review is now labeled as either positive or negative.
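The accuracy test mentioned earlier boils down to comparing the original labels with the predicted ones. Here is a minimal, library-free sketch of that comparison; the six labels are invented for illustration and are not results from the review dataset.

```python
from collections import Counter

# Hypothetical true labels and predicted labels for six reviews.
true_labels = ["pos", "neg", "pos", "neg", "neg", "pos"]
predicted = ["pos", "pos", "pos", "neg", "pos", "pos"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Count each (true, predicted) pair to see where the errors fall.
confusion = Counter(zip(true_labels, predicted))

print(accuracy(true_labels, predicted))
print(confusion)
```

On the DataFrame built above, the same check could be written as (df['label'] == df['comp_score']).mean().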

Now let us pass some new reviews to test how our model performs!

# Write a review as one continuous string (multiple sentences are ok)
review = 'The shoes I brought were amazing.'
# Obtain the sid scores for your review
sid.polarity_scores(review)
OUTPUT-
{'neg': 0.0, 'neu': 0.513, 'pos': 0.487, 'compound': 0.5859}
review = 'The mobile phone I bought was the WORST and very BAD'
# Obtain the sid scores for your review
sid.polarity_scores(review)

OUTPUT-
{'neg': 0.539, 'neu': 0.461, 'pos': 0.0, 'compound': -0.8849}

Conclusion

The results of the VADER analysis are not only remarkable but also very encouraging. They show the advantage of using VADER on websites where the text data is a complex mixture of many kinds of text.

ADDITIONAL RESOURCES

I have published two other articles in Towards Data Science on topics related to this blog. Do have a read of those for a better understanding of Natural Language Processing.

Stemming vs Lemmatization — https://link.medium.com/JWpURpQjt6

Word vectors and Semantics — https://link.medium.com/tuVCswhYu6

This was just a small sneak peek into what sentiment analysis is and how VADER works.
Feel free to respond to this blog below for any doubts and clarifications!
