
In my module "Developing Meaningful Indicators", a classmate (Nicole) used Azure Machine Learning in Excel to conduct sentiment analysis on over 4000 Pfizer-related tweets in order to determine how most people felt about the vaccine.
She discovered that the sentiments of the tweets were quite evenly spread out, with "negative" having a slight edge over "positive". This means that opinion amongst Twitter users is relatively mixed, with a healthy balance of both positive and negative views of the Pfizer vaccine. This can be seen in the graph she posted below.

Coincidentally, I had previously used AWS Comprehend to do sentiment analysis on a set of Reddit comments; in fact, I documented that process here. That got me asking, "Which of the AI models is more accurate?" As a computing major, I had some faith in the world of technology, and my initial hypothesis was that there would be some general trend where the two would agree with each other. In other words, tweets that score low on the sentiment scale for one AI would also score relatively low for the other, and vice versa.
The Technical Bits
My first step to answering this question was to run my own sentiment analysis using AWS Comprehend. For those who have not seen my previous post, I used an existing AWS Educate account with free credits for AWS services. The first function was copied from my previous project: it connects to the AWS server, sends a piece of text for sentiment analysis, and then processes and saves the response. This function was guided by the AWS documentation here.
## script for sentiment analysis
import boto3

def get_sentiment(text):
    # connect to the Comprehend service and request sentiment analysis for the text
    comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')
    response = comprehend.detect_sentiment(Text=text, LanguageCode='en')
    response = dict(response)

    # keep only the fields we care about
    result = {}
    result["sentiment"] = response["Sentiment"]
    result["positive"] = response["SentimentScore"]["Positive"]
    result["negative"] = response["SentimentScore"]["Negative"]
    result["neutral"] = response["SentimentScore"]["Neutral"]
    result["mixed"] = response["SentimentScore"]["Mixed"]
    # single score combining the positive and negative confidences
    result["sentiment_score"] = (result["positive"] - result["negative"]) / 2
    return result
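As a quick sanity check, the function can be run on a single string before looping over the entire data set. The tweet below is just a made-up example, not one from the actual data:

## quick sanity check (the example tweet is made up)
sample = get_sentiment("Just got my first Pfizer dose, feeling great!")
print(sample["sentiment"], sample["sentiment_score"])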
Secondly, code was written to process the data in the CSV file and write the results into another CSV file.
## get sentiment results
try:
    result = get_sentiment(body)
except botocore.exceptions.ClientError as e:
    # AWS rejected the request (usually because of characters it cannot handle);
    # pause and ask whether to retry or skip this row
    print(e)
    result = 'no-go'
    while (result != "ok" and result != 'skip'):
        print("type 'ok' or 'skip' to continue")
        result = input()
    if (result == 'skip'):
        skipped += 1
        print("error occurred with sentiment analysis, skipping row")
        continue
    result = get_sentiment(body)
except:
    # any other error: skip this row and move on
    skipped += 1
    print("error occurred with sentiment analysis, skipping row")
    continue

## write to csv
row = ['0','1','2','3','4','5','6', result["sentiment"], result["positive"], result["negative"], result["neutral"], result["mixed"], result["sentiment_score"]]
print(row)
writer.writerow(row)
print("scanned and accessed data index", count)
Along the way, a huge stumbling block I faced was the presence of "unclean data". The CSV data set, I believe, was scraped directly off Twitter, which meant there was an abundance of junk characters that neither AWS nor Python could properly process. As seen from the screenshot below, plenty of these characters tripped up the Python script.

After some searching, I learnt how to check for legal characters from this link, and implemented a function to help clean the data before sending it for processing; the rough idea is sketched below.
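The exact function isn't reproduced here, but the core idea was to keep only characters that Python and AWS can safely handle and drop everything else. A minimal sketch (restricting to printable ASCII is one possible approach, not necessarily the exact check from the link):

## drop characters that the script cannot handle (one possible approach)
def clean_text(text):
    # keep only printable ASCII characters and discard the rest
    return ''.join(ch for ch in text if 32 <= ord(ch) <= 126)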

Finally, the code worked and successfully analysed all the tweets. I copied the results into one consolidated CSV file and proceeded to analyse them.
The Visualisation Bits
In total, I created two graphs which I believe were meaningful. For the first, I recreated the graph that Nicole had made, but now based on the results from AWS Comprehend. The two graphs are shown side by side below, after a short sketch of how the AWS version can be plotted.
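The plot itself is just a count of each sentiment label. A minimal sketch using pandas and matplotlib, assuming the consolidated results file has a header with a column named sentiment (the file and column names are assumptions):

## bar chart of AWS sentiment counts (file and column names are assumptions)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('consolidated.csv')
df['sentiment'].value_counts().plot(kind='bar', title='AWS Comprehend sentiment counts')
plt.xlabel('Sentiment')
plt.ylabel('Number of tweets')
plt.show()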

There are a lot of differences to pick out here. The first is that AWS produced a "mixed" sentiment whilst Azure did not. In AWS's terms, a neutral sentiment means the comment contains neither positive nor negative words, while a mixed sentiment means the comment contains both positive and negative expressions (for example, praising the vaccine's efficacy while complaining about its side effects). This is purely an algorithmic difference, in the sense that it is simply how the two AIs are designed. While it may offer some insight into how well the algorithms were built, I don't believe it is worth dwelling on, since the "mixed" score is relatively minimal.
The second, and more obvious, difference is the significantly higher neutral sentiment from AWS compared to Azure. This is also down to the algorithms in question. My theory is that the Excel sentiment analysis is merely a plugin, so its accuracy may not be as high as that of AWS Comprehend, which runs on Amazon servers with the capacity to handle a massively larger and more complex machine learning model.
This can also be seen when comparing individual comments. For example, in the picture below there are a few comments which I believe are more accurately scored by AWS than by Azure. The comment in row 11, being just a hashtag, was marked as negative by Azure (cell F11), but was (I believe) better categorised as neutral by AWS, with 99% confidence (cell K11).

The second graph I created attempted to prove or disprove my earlier hypothesis that there is some sort of agreement between the two AIs. I first mapped both sets of sentiment scores onto a scale of -1 to 1, where -1 indicates negative sentiment, 0 indicates neutral sentiment, and 1 indicates positive sentiment. I then plotted the AWS sentiment score against the Azure sentiment score on a scatter graph, hoping to observe some sort of trend. The graph is shown below.

As seen, there is absolutely no trend whatsoever between the two, which surprised me because it went completely against my hypothesis. In fact, the huge dispersion of the points indicates that the two AIs disagree on most of the tweets.
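For anyone who wants to reproduce the scatter plot, or to put a number on the disagreement, something along these lines would do; the column names aws_score and azure_score are assumptions about the consolidated CSV, not the exact names I used:

## scatter of AWS vs Azure scores and their correlation (column names are assumptions)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('consolidated.csv')
plt.scatter(df['azure_score'], df['aws_score'], s=5, alpha=0.5)
plt.xlabel('Azure sentiment score (-1 to 1)')
plt.ylabel('AWS sentiment score (-1 to 1)')
plt.title('AWS vs Azure sentiment per tweet')
plt.show()

# a correlation close to 0 would back up the "no trend" observation
print(df['azure_score'].corr(df['aws_score']))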
Reflecting on this, the cause might be that natural language processing is still very new in the tech world, and even the best sentiment analysis algorithms are unable to reach 95% accuracy. Ultimately, this behaviour might be expected and not too unusual after all.
Conclusion
Thinking a bit more about how this is meaningful, I believe such information is interesting and unique for AI enthusiasts comparing the technologies out there and looking to improve them. The first part of these findings is the acknowledgement that we are still very far from perfect in the AI game. This is not to say the field is easy, but most laymen would believe that sentiment analysis is one of the most accurate kinds of algorithm out there. This simple data graphing exercise shows that we still have a long way to go.
However, I am hoping that this will help to inspire more people (and myself) to realise that there is so much room for innovation and improvement in AI. We must look to the future and improve this technology so that it can achieve its fullest potential. In a modern world where technology is so rapidly improving, I suspect it’s only a matter of time before another graph like this will look extremely different from what I have now.