Getting Started
A View into Why Sentiment Analysis Needs to Understand How We Think Before It Can Tell Us How We Feel
So you’ve been pouring hours into developing hot marketing content or writing your next big article (kind of like this one) and want to convey a certain emotion to your audience. You want to know whether your content will resonate with readers and evoke a particular feeling, whether that be joy, anger, or sadness, so you can understand how different people react to it.
Text analytics, and more specifically sentiment analysis, isn’t a new concept by any means; however, it has gone through several iterations of models that have improved over time. We started with a bag-of-words approach to understand whether certain words convey a certain emotion. We then moved to RNNs/LSTMs, far more sophisticated models that help us understand emotion, though they require significant training and lack parallelization, which makes them slow and resource intensive. In 2017, researchers at Google introduced the transformer model (fig 1), which is far more efficient than its predecessors. First, the input embedding is multi-dimensional in the sense that it can process complete sentences rather than a series of words one by one. Second, it has a powerful multi-headed attention mechanism that enables sentences to maintain context and relationships between words within a sentence. It performs this attention analysis for each word several times to ensure adequate sampling. Finally, it uses a feed-forward neural network to normalize the results and provide a sentiment (or polarity) prediction. To learn more about the transformer architecture, be sure to visit the Hugging Face website.
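To make that concrete, here is a minimal sketch of running a single sentence through a pre-trained transformer via the Hugging Face pipeline API (the example sentence is purely illustrative; the exact score depends on the model version that gets downloaded):
from transformers import pipeline

# Load a pre-trained sentiment model behind a simple pipeline interface
nlpSA = pipeline("sentiment-analysis")

# The pipeline returns one dict per input with a label and a confidence score,
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
print(nlpSA("I really enjoyed reading this article."))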
Now that we understand the transformer model, let’s zero in on the crux of this article: performing sentiment analysis on a document rather than a single sentence. The input embeddings consumed by the transformer model are sentence embeddings, not whole paragraphs or documents. To analyze a document, we’ll first need to break it down into sentences. To do this, I use spaCy and define a function that takes some raw text and breaks it down into individual sentences.
Take, for example, the text below. We would run it through a spaCy model that analyzes the text and breaks it into grammatical sentences as a list. Here is a function to accomplish this task, along with its output.
from spacy.lang.en import English

class Sentiment:
    # Constructor with raw text passed to the init function
    def __init__(self, raw_text):
        self.raw_text = raw_text.lower()

    # Break a block of text into a list of individual sentences using spaCy's sentencizer
    def breakSentence(self, text_content):
        self.text_content = text_content
        nlp = English()
        nlp.add_pipe('sentencizer')  # in spaCy v2 this was nlp.add_pipe(nlp.create_pipe('sentencizer'))
        doc = nlp(self.text_content)
        sentences = [sent.text.strip() for sent in doc.sents]
        return sentences
Once you have a list of sentences, we loop through it with the transformer model to predict whether each sentence is positive or negative, and with what score. You would end up with a result similar to the one below (fig 3).
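As a rough sketch of that loop (reusing the breakSentence function above; the labels and scores printed will of course depend on your text and the model):
from transformers import pipeline

nlpSA = pipeline("sentiment-analysis")
doc = Sentiment(raw_text)  # raw_text is assumed to hold your document
# Score every sentence; each prediction is a dict like {'label': 'NEGATIVE', 'score': 0.97}
for sentence in doc.breakSentence(doc.raw_text):
    prediction = nlpSA(sentence)[0]
    print(prediction['label'], round(prediction['score'], 3), sentence)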
Now, once we have these sentences, one might assume we can just average out the positives and negatives to arrive at a final polarity score. There are a few challenges with this assumption. First, we assume each sentence holds the same weight, which isn’t always the case (more on that later), and second, we are including sentences the model had relatively low confidence in classifying as negative (say, 60% negative, 40% positive). For my research I wanted to filter out any sentence that didn’t score at least 90% either as negative or positive. So here is some code I developed to do just that, along with the result. In this code I also print a before and after count, which helps me understand how many sentences I started with and how many were filtered out. Finally, it returns the filtered sentences and a matrix indicating how each one was categorized: 1 for positive and -1 for negative.
from transformers import pipeline

def findRawTextPolarity(self):
    confidence_level = 0.9
    nlpSA = pipeline("sentiment-analysis")
    sentences = self.breakSentence(self.raw_text)
    print('Before: ', len(sentences))
    # Score each sentence once, then keep only high-confidence predictions
    scored = [(s, nlpSA(s)[0]) for s in sentences]
    result = [{'sentence': s, 'label': pred['label']}
              for s, pred in scored if pred['score'] > confidence_level]
    print('After: ', len(result))
    sentences = [r['sentence'].lower() for r in result]
    labels = [r['label'] for r in result]
    # Map the model's labels to numeric polarities: 1 for positive, -1 for negative
    map_polarity = {'NEGATIVE': -1, 'POSITIVE': 1}
    matrix_result = [map_polarity[k] for k in labels]
    return sentences, matrix_result
OK, so at this point we should have a list of filtered sentences with at least 90% prediction confidence either way, and a matrix of polarities. Now comes the interesting part: reading psychology.
When readers read a document, they tend to remember more of what they read towards the end and less towards the beginning. They also tend to remember the peak or climax of the document: what did the writer want the reader to remember? Both observations follow from the peak-end rule, the theory that the overall rating of an experience is determined by its peak intensity and its ending, rather than by the average across the whole experience.
Linking the peak-end rule to our use case: when we give the model a large corpus of text, we want to identify the peak of the article and give it slightly more weight, and we need a mechanism to give more weight to sentences that come later in the document. How do we do this?
To identify the peak of the article, my hypothesis is that we need a way for a machine to classify the climax, and one such way is text summarization. Text summarization extracts the key concepts from a document to pull out the key points, which is the best proxy for what the author wants you to remember. Second, we need to define a decay factor such that, as you move further down the document, each preceding sentence loses some weight. OK, so let’s define a function for each of these tasks.
First, let’s take a corpus of text and use the pre-trained transformer model to perform text summarization. This function returns the peak sentences.
# Use a pre-trained summarization pipeline to extract the "peak" (summary) of the document
def findPeak(self):
    summarizer = pipeline("summarization")
    peak = summarizer(self.raw_text)[0]['summary_text']
    peak_sentences = self.breakSentence(peak)
    return peak_sentences
Next, we’re going to find the position of these peak sentences in the article’s list of sentences defined earlier. If a sentence is part of the peak, it retains a weight of 1; if it’s not a peak sentence, we drop its weight down. I’ve used 0.9, but you can test whatever works for your use case. The following function accomplishes this task.
import operator
import numpy as np

def getPeakposition(self):
    peak_weight_red = 0.9  # weight given to sentences that are not part of the peak
    peak = self.findPeak()
    sentences = self.findRawTextPolarity()[0]
    # For each peak sentence, flag which of the filtered sentences contain it
    matches = [[1 if operator.contains(s.replace(' .', ''), p.replace(' .', '')) else 0
                for s in sentences] for p in peak]
    match_filter = [m for m in matches if sum(m) > 0]
    sum_matrix = np.sum(np.array(match_filter), 0)
    # Peak sentences keep a weight of 1; all others are reduced
    # (a sentence may match more than one peak sentence, so any positive count marks a peak)
    matrix_result = [1 if k > 0 else peak_weight_red for k in sum_matrix]
    return matrix_result
OK, now we need a mechanism that introduces a decay factor, removing some weight from a sentence the longer ago the reader encountered it within the article (i.e., the further it sits from the end). I’ve created a function that applies a linear decay factor, but I’ve also used exponential decay, which works well (see the sketch after the function below).
# Linear decay: the last sentence keeps full weight and each earlier sentence loses `decay` per step
def textWeights(self):
    decay = 0.01
    matrix = self.findRawTextPolarity()
    matrix_size = len(matrix[1])
    decay_matrix = [1 - decay * (matrix_size - 1 - i) for i in range(matrix_size)]
    return decay_matrix
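The exponential version isn’t shown here, but since getFinalScore below calls textWeightsexp, here is a minimal sketch of what it might look like, assuming the same idea (full weight for the final sentence, multiplying by a constant factor as you move back towards the start) and an assumed rate of 0.01:
# Hypothetical sketch of the exponential variant: the last sentence keeps full weight,
# and each earlier sentence is worth (1 - decay) times the sentence after it
def textWeightsexp(self):
    decay = 0.01
    matrix = self.findRawTextPolarity()
    matrix_size = len(matrix[1])
    decay_matrix = [(1 - decay) ** (matrix_size - 1 - i) for i in range(matrix_size)]
    return decay_matrix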
Alright, we should now have three matrices:
- one providing a decay weight factor for each sentence
- one providing extra weight to the peak sentences
- one with the polarity of each filtered sentence
Now it gets easy. We multiply the three together, which gives us a weighted result for each sentence in the document. With these weighted scores in hand, we can take their mean to get a final score for the entire document. I’ve gone ahead and defined my own categorization scale, but you can define whatever makes sense for your own use case.
To get the final score, here is the code I developed, followed by the result I received.
# Combine peak weighting, decay weighting, and sentence polarity into one document-level score
def getFinalScore(self):
    peakposition = self.getPeakposition()
    decay = self.textWeightsexp()
    sent_polarity = self.findRawTextPolarity()[1]
    # Weighted score per sentence, then averaged across the document
    fin_score = [a * b * c for a, b, c in zip(peakposition, decay, sent_polarity)]
    fin_sent_fct = lambda x: 'POSITIVE' if x > 0.5 else ('NEGATIVE' if x < -0.5 else 'NEUTRAL')
    fin_sent = fin_sent_fct(np.mean(fin_score))
    print('This document is categorized as {} with a final score of {}'.format(fin_sent, np.mean(fin_score)))
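To tie everything together, usage would look roughly like this (my_article.txt is a hypothetical file standing in for whichever document you want to score):
# Hypothetical end-to-end usage of the Sentiment class defined above
with open('my_article.txt') as f:
    raw_text = f.read()

doc_sentiment = Sentiment(raw_text)
doc_sentiment.getFinalScore()
# prints e.g. "This document is categorized as ... with a final score of ..."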
So is this the end? No. Sentiment analysis is actually a very tricky subject that needs proper consideration. First, sentiment can be subjective, and interpretation differs from person to person. For example, I may consider one sentence the peak of a particular article while someone else views a different sentence as the peak, which introduces a lot of subjectivity. Second, we leveraged a pre-trained model, but ideally the model should be trained on your own data and particular use case. There are various models you can leverage, a popular one being BERT, but you can use several others depending on your use case. Done right, sentiment analysis is a great way to analyze text and can unlock a plethora of insights to help you make better, data-driven decisions.
To see a video example of this, please visit the following link on YouTube:
https://youtu.be/sZOV5pD4ELg
Sources:
- The Transformer architecture as presented in the "Attention Is All You Need" paper by researchers at Google (Vaswani et al., 2017)