
Sentiment Analysis: Evaluating the Public’s Perception of the COVID19 Vaccine

A quick, easy, and effective guide on utilizing NLP to run a sentiment analysis in Python on the COVID19 vaccine

Photo by author, Kyle Hum

Introduction

With the recent rollout of the COVID19 vaccines, many of us are wondering what the general opinion of the vaccine is, since it is a widely debated topic across news outlets and social media. This article aims to show how sentiment analysis can be applied to answer present-day questions. Using the Twitter API, natural language processing (NLP), and data visualization, we will run a series of sentiment analyses in Python to determine the percentage of tweets about the vaccine that are positive, neutral, or negative.

Required packages

This analysis requires the following Python packages: pandas, tweepy, textblob, and matplotlib, along with re from the Python standard library.

With the recent changes to Twitter’s API, I found it easiest to create a developer account to extract the data rather than using twint, twitter-scraper, or any other Twitter scraping package.

Step 1: Log in to the Twitter API

import pandas as pd
import tweepy
import re
from textblob import TextBlob
import matplotlib.pyplot as plt
#login credentials for twitter API
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# Setting your access token and secret
auth.set_access_token(access_token, access_token_secret)
# Creating the API object while passing in auth information
api = tweepy.API(auth)

For security purposes, I did not include my Twitter API access keys.
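One way to keep the keys out of the script entirely is to read them from environment variables. The sketch below is an assumption about how that could be set up, not the setup used for this article; the variable names and the `load_twitter_credentials` helper are hypothetical.

```python
import os

# Hypothetical environment variable names; use whatever names suit your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of the empty strings.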

Step 2: Create a query to extract tweets from the Twitter API

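#extracts the most recent tweets containing the keyword vaccine from the Twitter API
#the search endpoint returns at most 100 tweets per request, so tweepy.Cursor pages through the results
#note: api.search is the tweepy 3.x name; tweepy 4.x renamed it to api.search_tweets
results = tweepy.Cursor(api.search, q='vaccine', count=100).items(500)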
#converts the json output to a data frame saved as a CSV file
json_data = [r._json for r in results]
df = pd.json_normalize(json_data)
df.to_csv('vaccine_tweets.csv')

During this process, I ran into the API’s rate limits, which only allow a free developer account to extract a certain number of tweets every 3 hours. To work around this and obtain a larger sample, I ran the same query on 3 separate days.

Once the tweets are extracted from the API, the JSON output is converted into a data frame and saved as a CSV file.
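Running the same query on several days can return some of the same tweets more than once. If the batches were ever pooled rather than analysed separately, duplicates could be dropped by tweet ID first. This is a minimal sketch under that assumption; `merge_unique_tweets` is a hypothetical helper, and the sample dictionaries are invented stand-ins for the `r._json` records.

```python
def merge_unique_tweets(*batches):
    """Merge lists of tweet dicts, keeping the first copy of each tweet ID."""
    seen = {}
    for batch in batches:
        for tweet in batch:
            # id_str is the string form of the tweet ID in the JSON payload
            seen.setdefault(tweet["id_str"], tweet)
    return list(seen.values())


# Invented records for illustration only
day_one = [{"id_str": "1", "text": "first"}, {"id_str": "2", "text": "second"}]
day_two = [{"id_str": "2", "text": "second"}, {"id_str": "3", "text": "third"}]

merged = merge_unique_tweets(day_one, day_two)
print(len(merged))  # 3
```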

Step 3: Cleaning and Manipulating the Data

#loads the CSV file and selects only the column relevant to this analysis
df = pd.read_csv("vaccine_tweets.csv", usecols = [4])
#creates a function to remove all @'s, hashtags, and links
#Then applies it to the dataframe
def cleanUpTweet(txt):
    # Remove mentions
    txt = re.sub(r'@[A-Za-z0-9_]+', '', txt)
    # Remove hashtags
    txt = re.sub(r'#', '', txt)
    # Remove the "RT : " prefix left on retweets once the mention is stripped
    txt = re.sub(r'RT : ', '', txt)
    # Remove urls
    txt = re.sub(r'https?://[A-Za-z0-9./]+', '', txt)
    # Remove stop words; \b limits matches to whole words so that, e.g., "other" is left intact
    txt = re.sub(r'\b(the|and|to)\b', '', txt, flags=re.IGNORECASE)
    return txt
df['text'] = df['text'].apply(cleanUpTweet)

For this analysis, information such as the date, user ID, and other metadata was not relevant. Therefore, when loading the CSV file into Python, I chose to load only column 4, which contains the tweet text.

With the data frame reduced to the tweet text, the data needs to be cleaned, as it contains some unnecessary characters. A function was created and applied to remove all the @ mentions, hashtag symbols, links, and a few stop words from the data frame.
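When stop words are removed with regular expressions, word boundaries matter: a bare pattern such as `the` also matches inside longer words. The sketch below contrasts the two approaches on an invented sentence; both function names are hypothetical.

```python
import re


def strip_substrings(txt):
    # Removes 'the', 'and', 'to' wherever they occur, even inside other words
    for word in ("the", "and", "to"):
        txt = re.sub(word, "", txt)
    return txt


def strip_whole_words(txt):
    # \b anchors the match to word boundaries, so only standalone words are removed
    txt = re.sub(r"\b(?:the|and|to)\b", "", txt, flags=re.IGNORECASE)
    # Collapse the extra whitespace left behind by the removals
    return re.sub(r"\s+", " ", txt).strip()


sample = "Other hands took the vaccine today"
print(strip_substrings(sample))   # Or hs ok  vaccine day
print(strip_whole_words(sample))  # Other hands took vaccine today
```

The first version damages "Other", "hands", "took", and "today", which can change the words TextBlob sees and therefore the polarity it reports.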

Step 4: Determining Sentiment

#creates a function that determines subjectivity and polarity from the textblob package
def getTextSubjectivity(txt):
    return TextBlob(txt).sentiment.subjectivity
def getTextPolarity(txt):
    return TextBlob(txt).sentiment.polarity
#applies these functions to the dataframe
df['Subjectivity'] = df['text'].apply(getTextSubjectivity)
df['Polarity'] = df['text'].apply(getTextPolarity)
#builds a function to calculate and categorize each tweet as Negative, Neutral, and Positive
def getTextAnalysis(a):
    if a < 0:
        return "Negative"
    elif a == 0:
        return "Neutral"
    else:
        return "Positive"
#creates another column called Score and applies the function to the dataframe
df['Score'] = df['Polarity'].apply(getTextAnalysis)

Once the data was cleaned, two functions built on the TextBlob package were used to determine the subjectivity and polarity of each tweet. Polarity is essential, as it assigns a numeric value between -1 and 1 to the sentiment expressed in the tweet. A polarity greater than 0 indicates a positive sentiment, a polarity of exactly 0 indicates a neutral sentiment, and a polarity less than 0 indicates a negative sentiment. With the polarity and subjectivity calculated, a new column called Score was created to place each tweet into one of these 3 categories.
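The labelling rule can be tried on a few hand-picked polarity values without calling TextBlob at all. The sketch below also adds an optional `neutral_band` parameter, which is a hypothetical extension rather than part of this analysis: it treats scores very close to zero as neutral. With the default of 0.0 it behaves like the rule described above.

```python
def label_polarity(polarity, neutral_band=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label.

    neutral_band is a hypothetical tolerance: scores within +/- neutral_band
    of zero count as Neutral. The default of 0.0 reproduces the strict rule.
    """
    if polarity < -neutral_band:
        return "Negative"
    if polarity > neutral_band:
        return "Positive"
    return "Neutral"


# Hand-picked scores for illustration, not TextBlob output
for score in (-0.6, 0.0, 0.05, 0.8):
    print(score, label_polarity(score), label_polarity(score, neutral_band=0.1))
```

Under the strict rule, a barely positive score such as 0.05 counts as Positive; with a band of 0.1 it would be labelled Neutral instead, which would shift the reported percentages.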

Step 5: Summarizing and Visualizing the data

#visualizes the data through a bar chart
labels = df.groupby('Score').count().index.values
values = df.groupby('Score').size().values
plt.bar(labels, values, color = ['red', 'blue', 'lime'])
plt.title(label = "Vaccine Sentiment Analysis - 12/17/2020", 
                  fontsize = '15')
#calculates percentage of positive, neutral, and negative tweets
for label in ['Positive', 'Neutral', 'Negative']:
    #the mean of a boolean column is the fraction of rows that match
    share = (df['Score'] == label).mean() * 100
    print(f"{share:.1f} % of {label.lower()} tweets")
Photos by author, Kyle Hum

Lastly, the data was visualized using a bar chart, and the exact percentage of positive, neutral, and negative tweets was calculated. As previously mentioned, to obtain a larger sample, this process was repeated 2 more times during December.
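The percentage step amounts to counting each label and dividing by the total. A minimal sketch using only the standard library is shown here; `sentiment_shares` is a hypothetical helper, and the labels are made up for illustration rather than taken from the collected tweets.

```python
from collections import Counter


def sentiment_shares(labels):
    """Return the percentage of each label, rounded to one decimal place."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No labels, so there is nothing to report
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up labels for illustration only; not the article's data
example = ["Positive"] * 4 + ["Neutral"] * 5 + ["Negative"] * 1
print(sentiment_shares(example))
```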

Across all 3 analyses, the results indicated roughly 40% to 42% positive sentiment, 41% to 48% neutral sentiment, and 12% to 16% negative sentiment.

Conclusion

Although the sample size was relatively small, it seems that Twitter users’ perception of the COVID vaccine is mostly positive or neutral. With the beginning of 2021 a few days away, I remain hopeful that these vaccines will be the solution to end the COVID19 pandemic.

I greatly appreciate everyone who reads this article. If you enjoyed it and would like to connect, my LinkedIn is found here.

Full Python code can be found on my GitHub here.

