Millions of tweets are posted every day, and they tell us how the public is responding to a particular event. To get the sentiment of these tweets, we can use the Naive Bayes classification algorithm, which is simply an application of Bayes rule.
Bayes Rule
Bayes rule describes the probability of an event based on prior knowledge of the occurrence of another, related event.

The probability of occurrence of event A, given that event B has already occurred, is
P(A | B) = P(A ∩ B) / P(B)

And the probability of occurrence of event B, given that event A has already occurred, is
P(B | A) = P(A ∩ B) / P(A)

Using both these equations, we can rewrite them collectively as
P(A | B) = P(B | A) × P(A) / P(B)
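For intuition, here is a quick example with made-up numbers: suppose half of all tweets are positive (P(positive) = 0.5), the word ‘happy’ appears in 20% of positive tweets (P(‘happy’ | positive) = 0.2), and ‘happy’ appears in 12.5% of all tweets (P(‘happy’) = 0.125). Then P(positive | ‘happy’) = 0.2 × 0.5 / 0.125 = 0.8, so seeing ‘happy’ makes the tweet much more likely to be positive.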

Let’s take a look at the tweets and how we are going to extract features from them. We will work with two corpora of tweets: positive tweets and negative tweets.
Positive tweets: ‘I am happy because I am learning NLP,’ ‘I am happy, not sad.’
Negative tweets: ‘I am sad, I am not learning NLP,’ ‘I am sad, not happy.’
Preprocessing
We need to preprocess our data so that we can save memory and reduce the computational cost.
- Lowercase: We convert all the text to lower case so that words like ‘Learning’ and ‘learning’ are treated as the same word.
- Removing punctuation, URLs, names: We remove punctuation, URLs, and names or hashtags because they don’t contribute to the sentiment of a tweet.
- Removing stopwords: Stopwords like ‘the’ and ‘is’ don’t contribute to the sentiment, so these words are removed.
- Stemming: Words like ‘took’ and ‘taking’ are treated as the same word by converting them to their base form, here ‘take’. This saves a lot of memory and time. (See the preprocessing sketch below.)
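Below is a minimal preprocessing sketch in Python. It assumes the NLTK library is installed and its stopwords corpus has been downloaded; the function name clean_tweet is only an illustrative choice, not part of any library.

```python
import re
import string

from nltk.corpus import stopwords          # needs a one-time nltk.download('stopwords')
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)
stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def clean_tweet(tweet):
    """Lowercase, strip URLs/handles/punctuation, drop stopwords, and stem."""
    tweet = re.sub(r'https?://\S+', '', tweet)   # remove URLs
    tweet = tweet.replace('#', '')               # keep the hashtag text, drop the '#'
    tokens = tokenizer.tokenize(tweet)           # lowercases and strips @handles
    return [stemmer.stem(t) for t in tokens
            if t not in stop_words and t not in string.punctuation]

print(clean_tweet('I am happy because I am learning #NLP'))   # ['happi', 'learn', 'nlp']
```

For the toy corpus below, the word counts keep every word (even ones a stopword list would normally drop) so that the arithmetic stays easy to follow.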
Probabilistic approach:
In order to get probability estimates for the words, we create a dictionary of these words and count the occurrences of each word in the positive and the negative tweets. For our tiny corpus, the counts come out as:

word        positive   negative
i               3          3
am              3          3
happy           2          1
because         1          0
learning        1          1
nlp             1          1
sad             1          2
not             1          2
total          13         13
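A frequency dictionary like this can be built in a few lines of Python. The sketch below skips the preprocessing step and works on already-cleaned text; build_freqs is just an illustrative name.

```python
from collections import defaultdict

def build_freqs(tweets, labels):
    """Count how often each word occurs with each label (1 = positive, 0 = negative)."""
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in tweet.lower().split():
            freqs[(word, label)] += 1
    return freqs

pos_tweets = ['i am happy because i am learning nlp', 'i am happy not sad']
neg_tweets = ['i am sad i am not learning nlp', 'i am sad not happy']
freqs = build_freqs(pos_tweets + neg_tweets, [1, 1, 0, 0])
print(freqs[('happy', 1)], freqs[('happy', 0)])   # 2 1
```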

Let’s see how these word counts help us find the probability of a word for both classes. Here the word ‘i’ occurred three times, and the total number of words in the positive corpus is 13. Therefore, the probability of occurrence of the word ‘i’ given that the tweet is positive will be
P(‘i’ | positive) = 3 / 13 ≈ 0.23


Doing this for all the words in our vocabulary, we get a table like this:

word        P(word | positive)   P(word | negative)
i                 0.23                 0.23
am                0.23                 0.23
happy             0.15                 0.08
because           0.08                 0.00
learning          0.08                 0.08
nlp               0.08                 0.08
sad               0.08                 0.15
not               0.08                 0.15
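Continuing from the freqs dictionary built above, the conditional probabilities are simply the counts divided by the total number of words in each class; this is a sketch with my own variable names.

```python
vocab = {word for (word, _) in freqs}
n_pos = sum(count for (word, label), count in freqs.items() if label == 1)   # 13 words in positive tweets
n_neg = sum(count for (word, label), count in freqs.items() if label == 0)   # 13 words in negative tweets

raw_p_pos = {w: freqs[(w, 1)] / n_pos for w in vocab}
raw_p_neg = {w: freqs[(w, 0)] / n_neg for w in vocab}

print(round(raw_p_pos['sad'], 2), round(raw_p_neg['sad'], 2))   # 0.08 0.15
print(raw_p_neg['because'])   # 0.0 -- this zero is exactly what Laplace smoothing fixes below
```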

In Naive Bayes, we look at how each word contributes to the sentiment, which is measured by the ratio of the probability of the word occurring in the positive class to the probability of it occurring in the negative class. For example, the probability of occurrence of the word ‘sad’ is higher for the negative class than for the positive class. We compute this ratio for every word with the formula:
ratio(word) = P(word | positive) / P(word | negative)


This ratio is known as the likelihood, and its value lies in (0, ∞). A value close to zero indicates that the word is much more likely to occur in a negative tweet than in a positive one, while a value tending towards infinity indicates that the word is much more likely to occur in a positive tweet than in a negative one. In other words, a high ratio implies positivity, and a ratio of 1 means the word is neutral.
Laplace Smoothing
Some words may occur in only one of the classes. A word that never occurs in the negative class gets probability 0 there, which makes the ratio undefined. So, we use the Laplace smoothing technique to handle this situation. Let’s see how the equation changes when Laplace smoothing is applied:
P(word | class) = (freq(word, class) + 1) / (N_class + V)
where freq(word, class) is the count of the word in that class, N_class is the total number of words in the class, and V is the number of unique words in the vocabulary.

Adding 1 to the numerator makes the probability non-zero. This added constant is called the alpha factor and lies in (0, 1]; when it is set exactly to 1, the smoothing is termed Laplace smoothing. Adding V to the denominator keeps the sum of the probabilities at 1.
Here in our example, the number of unique words is eight, which gives us V = 8.
After Laplace smoothing, the table of probabilities looks like this:

word        P(word | positive)   P(word | negative)
i                 0.19                 0.19
am                0.19                 0.19
happy             0.14                 0.10
because           0.10                 0.05
learning          0.10                 0.10
nlp               0.10                 0.10
sad               0.10                 0.14
not               0.10                 0.14
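Continuing the same sketch, Laplace smoothing adds 1 to every numerator and the vocabulary size V to every denominator:

```python
V = len(vocab)                                                # 8 unique words
p_pos = {w: (freqs[(w, 1)] + 1) / (n_pos + V) for w in vocab}
p_neg = {w: (freqs[(w, 0)] + 1) / (n_neg + V) for w in vocab}

print(f"{p_pos['because']:.2f} {p_neg['because']:.2f}")      # 0.10 0.05 -- no zeros left
```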

Naive Bayes:
To estimate the sentiment of a tweet, we take the product of the probability ratios of the words that occur in the tweet. Note that words which are not present in our vocabulary do not contribute and are treated as neutral. The equation for Naive Bayes in our application looks like this:
∏ᵢ P(wᵢ | positive) / P(wᵢ | negative),   i = 1, …, m
where w₁, …, wₘ are the words of the tweet. A product greater than 1 classifies the tweet as positive.

Since the data can be imbalanced, which would bias the results towards one class, we multiply the above equation by a prior factor: the ratio of the probability of positive tweets to the probability of negative tweets.
[P(positive) / P(negative)] × ∏ᵢ P(wᵢ | positive) / P(wᵢ | negative)

Since we are taking the product of many such ratios, we can end up with a number too small or too large to be represented accurately on our device, and this is where the log-likelihood comes in. We take the log of our Naive Bayes equation:
log(P(positive) / P(negative)) + Σᵢ log(P(wᵢ | positive) / P(wᵢ | negative))
The first term is the log prior and the sum is the log-likelihood.

After taking the log of the likelihood equation, the scale changes as follows: the ratio, which lived in (0, ∞) with 1 as the neutral point, becomes a log ratio in (−∞, ∞) with 0 as the neutral point, so positive values indicate positive sentiment and negative values indicate negative sentiment.

Let’s see an example. Tweet: ‘I am happy because I am learning.’
With two positive and two negative tweets, the log prior is log(P(positive) / P(negative)) = log(1) = 0. From the smoothed table, ‘i’, ‘am’, and ‘learning’ have the same probability in both classes, so their log ratios are 0, while ‘happy’ contributes log(3/2) ≈ 0.41 and ‘because’ contributes log(2/1) ≈ 0.69 (natural log). The overall score is therefore
0 (log prior) + 0.41 (‘happy’) + 0.69 (‘because’) ≈ 1.10, with every other word contributing 0.

Hence, the value of the overall log-likelihood of the tweet is greater than zero, which implies that the tweet is positive.
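Putting it all together, a small scoring function (again a sketch, and predict_sentiment is just my own name) reproduces the example above from the smoothed dictionaries p_pos and p_neg:

```python
import math

def predict_sentiment(tweet, p_pos, p_neg, n_pos_tweets, n_neg_tweets):
    """Log prior plus the sum of log likelihood ratios; a score > 0 means positive."""
    score = math.log(n_pos_tweets / n_neg_tweets)             # log prior
    for word in tweet.lower().split():
        if word in p_pos:                                      # out-of-vocabulary words stay neutral
            score += math.log(p_pos[word] / p_neg[word])
    return score

score = predict_sentiment('i am happy because i am learning', p_pos, p_neg, 2, 2)
print(score)   # ≈ 1.10, greater than zero, so the tweet is classified as positive
```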
Drawbacks:
- Independence assumption: The Naive Bayes algorithm assumes that the words are independent of each other, which is rarely true in natural language.
- Relative frequencies in the corpus: Sometimes certain types of tweets (offensive ones, for example) get blocked or filtered out, which leads to an imbalance in the data and skews the relative frequencies.
- Word order: Changing the order of the words can change the sentiment, but Naive Bayes cannot capture that.
- Removal of punctuation: Remember that in data preprocessing we removed punctuation, which might change the sentiment of the tweet. Here is an example: ‘My beloved grandmother 🙁’. Stripping out the sad face removes the very cue that makes the tweet negative.
Conclusion:
Naive Bayes is a straightforward yet powerful algorithm; knowing the data well, one can preprocess it accordingly. The Naive Bayes algorithm is also used in many other applications, such as spam classification and loan approval.