NYT Sentiment Analysis with TensorFlow

Anne Bode
Towards Data Science
5 min read · Jan 9, 2022


Photo by Jon Tyson on Unsplash

Now that we are one year into the Biden Administration, I started to wonder how positive news coverage was during his first year in office vs. previous presidents’ first years. To find the answer, I decided to perform sentiment analysis on NYT article abstracts for each month of the past four presidents’ first years in office. Below, I discuss/display code for the following steps:

  1. Using the NYT API via pynytimes
  2. Sentiment Analysis with TextBlob
  3. Training my own model for sentiment analysis with TensorFlow
  4. Comparing the models
  5. Visualizing the results

For the full code, download the Jupyter Notebook here.

Step 1. Using the NYT API via pynytimes

We’ll import the required packages, connect to the API, create a dictionary to save our results, pull the relevant data, save our dictionary to a JSON file so we don’t have to pull the data again, and close our connection to the API.
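A rough sketch of that flow is below, assuming we pull a full month of article metadata at a time with pynytimes’ archive_metadata call. The API key, the set of presidents/years, and the exact fields kept are placeholders, not my literal code.

```python
import datetime
import json

from pynytimes import NYTAPI

# Connect to the NYT API (replace with your own key)
nyt = NYTAPI("YOUR_API_KEY", parse_dates=True)

# First year in office for each president we want to compare
first_years = {"Bush": 2001, "Obama": 2009, "Trump": 2017, "Biden": 2021}

abstracts = {}
for president, year in first_years.items():
    abstracts[president] = {}
    for month in range(1, 13):
        # archive_metadata returns metadata for every article published that month
        articles = nyt.archive_metadata(date=datetime.datetime(year, month, 1))
        abstracts[president][month] = [
            a["abstract"] for a in articles if a.get("abstract")
        ]

# Save the results so we don't have to hit the API again
with open("nyt_abstracts.json", "w") as f:
    json.dump(abstracts, f)

# Close the session (supported in recent pynytimes versions)
nyt.close()
```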

Step 2. Sentiment Analysis with TextBlob

Once again, we’ll import the necessary packages. We’ll then test the model on a few randomly selected abstracts to sanity-check it.
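The spot-check is roughly the sketch below; the sample abstracts are placeholders. TextBlob’s polarity score ranges from -1 (negative) to +1 (positive).

```python
from textblob import TextBlob

# A couple of hypothetical abstracts standing in for the ones pulled in Step 1
sample_abstracts = [
    "A fire at a high-end Bangkok nightclub killed at least 59 people, the police said.",
    "The new exhibit drew record crowds over the holiday weekend.",
]

for abstract in sample_abstracts:
    # polarity > 0 -> positive, < 0 -> negative, == 0 -> neutral
    polarity = TextBlob(abstract).sentiment.polarity
    label = "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    print(f"{label:>8} ({polarity:+.2f}): {abstract[:60]}")
```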

When I ran this spot-check, I noticed TextBlob was pretty inaccurate. For example, the following abstract was labeled “positive”: A fire at a high-end Bangkok nightclub killed at least 59 people and injured more than 200 shortly after midnight as revelers were celebrating the new year, the police said.

Step 3. Training my own model for sentiment analysis with TensorFlow

Because TextBlob seemed to be doing a not-so-great job, I decided to practice my ML skills and build a sentiment analysis model using TensorFlow (this tutorial was very helpful). First, we’ll import the required packages and load the dataset we’ll use for training and testing. I trained my model on a dataset of 1.6 million tweets labeled positive or negative. Then, we’ll randomly split the data into train and test sets (80/20) and reformat the tweets and their labels as numpy arrays so we can feed them in when we train our model.
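A sketch of the loading and splitting step, assuming a Sentiment140-style CSV; the file name and column layout are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Sentiment140-style file: 1.6MM tweets, target 0 = negative, 4 = positive
cols = ["target", "id", "date", "flag", "user", "text"]
tweets = pd.read_csv("training.1600000.processed.noemoticon.csv",
                     encoding="latin-1", names=cols)

# Map labels to 0/1 for binary classification
tweets["target"] = (tweets["target"] == 4).astype(int)

# 80/20 train/test split, then convert to numpy arrays for model training
train_df, test_df = train_test_split(tweets, test_size=0.2, random_state=42)
train_examples = train_df["text"].to_numpy()
train_labels = train_df["target"].to_numpy()
test_examples = test_df["text"].to_numpy()
test_labels = test_df["target"].to_numpy()
```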

Next, we’ll create a Sequential model with Keras. The first layer of our model takes sentences as inputs and converts them into vectors of numerical values (this is called “word embedding”). Fortunately, someone has already created a model that does this, which can be downloaded from TensorFlow Hub. The model I used for this layer can be found here. Then we’ll add two hidden layers and an output layer. The hidden layers have 16 and 8 nodes, respectively, and both use the ReLU activation function. The output layer has 1 node because this is a binary classification problem, and it uses the sigmoid activation function. Finally, we’ll compile the model with the Adam optimizer, calculate loss using BinaryCrossentropy, and calculate accuracy using BinaryAccuracy with a 0.5 threshold (if the model predicts the likelihood that a sentence is positive is ≥0.5, we classify the sentence as positive).
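The model definition looks roughly like the sketch below. The specific TF Hub embedding module shown is an assumption; any pretrained sentence-embedding layer slots in the same way.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Pretrained word-embedding layer from TensorFlow Hub (module URL is an example)
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[], dtype=tf.string, trainable=True)

model = tf.keras.Sequential([
    hub_layer,                                      # sentence -> embedding vector
    tf.keras.layers.Dense(16, activation="relu"),   # hidden layer, 16 nodes
    tf.keras.layers.Dense(8, activation="relu"),    # hidden layer, 8 nodes
    tf.keras.layers.Dense(1, activation="sigmoid"), # P(sentence is positive)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy(threshold=0.5)],
)
```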

Next, we’ll set aside some of our training data to be used for validation during the training process. We’ll then train the model, evaluate the results, and visualize how well our model performs on test data with a confusion matrix.
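A sketch of training and evaluation; the validation-set size, epoch count, and batch size here are placeholder choices rather than the exact ones I used.

```python
from sklearn.metrics import confusion_matrix

# Hold out part of the training data for validation during training
x_val, partial_x_train = train_examples[:160000], train_examples[160000:]
y_val, partial_y_train = train_labels[:160000], train_labels[160000:]

history = model.fit(
    partial_x_train, partial_y_train,
    epochs=10, batch_size=512,
    validation_data=(x_val, y_val),
)

# Evaluate on the held-out test set
loss, accuracy = model.evaluate(test_examples, test_labels)

# Confusion matrix: rows = true labels, columns = predicted labels
predictions = (model.predict(test_examples) >= 0.5).astype(int).flatten()
print(confusion_matrix(test_labels, predictions))
```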

On our test dataset, the model is 79% accurate. From the confusion matrix, we can see that most of our mistakes occur when a tweet is positive but we predict it is negative, so the model has a bit of a negative bias. Still, let’s see whether this 79% beats what TextBlob can do.

Step 4. Comparing the Models

If we use TextBlob to classify the same test dataset, we achieve an accuracy of only 62%. Note: TextBlob predicts “neutral” sentiment in addition to positive and negative, so this isn’t a direct comparison, but it is helpful nonetheless. In the test below, we randomly reclassify TextBlob’s “neutral” predictions as either “positive” or “negative.” Even so, 79% accuracy with TensorFlow is significantly better than 62% with TextBlob.
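A sketch of that comparison, reusing the test_examples and test_labels arrays from Step 3; the random reassignment of “neutral” predictions is shown explicitly.

```python
import random

import numpy as np
from textblob import TextBlob

random.seed(42)

def textblob_label(text):
    # Randomly assign TextBlob's "neutral" predictions to positive or negative
    # so the comparison with the binary TensorFlow model is apples-to-apples
    polarity = TextBlob(text).sentiment.polarity
    if polarity == 0:
        return random.choice([0, 1])
    return int(polarity > 0)

tb_preds = np.array([textblob_label(t) for t in test_examples])
tb_accuracy = (tb_preds == test_labels).mean()
print(f"TextBlob accuracy: {tb_accuracy:.0%}")  # ~62%, vs. ~79% for TensorFlow
```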

A direct comparison of the two models’ performance on a sample of abstracts can be found below. Luckily, our TensorFlow model now correctly classifies the abstract about the nightclub fire, deaths, and injuries as “negative.” In general, it does seem to be more accurate in classifying our abstracts, although it is still imperfect.

Step 5. Visualizing the results

First, we’ll use our model to predict sentiment for all of the abstracts we pulled in Step 1. We’ll then calculate the percentage of positive/negative sentiment for each month and add that to our dictionary.
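A sketch of that scoring loop, reusing the abstracts dictionary from Step 1 and the trained model from Step 3.

```python
import numpy as np

# Score every abstract and roll up positive/negative share by month
results = {}
for president, months in abstracts.items():
    results[president] = {}
    for month, texts in months.items():
        probs = model.predict(np.array(texts)).flatten()
        positive = (probs >= 0.5).mean()
        results[president][month] = {
            "pct_positive": float(positive),
            "pct_negative": float(1 - positive),
        }
```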

Then, we’ll reformat our data into dataframes containing only the key stats we want to visualize. We’ll then create a few charts to better understand the results! (Download the notebook to see how I created these charts with seaborn)
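As one example, a flattened dataframe and a simple seaborn line plot might look like the sketch below; the charts in the notebook are more polished.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Flatten the nested results dict into a tidy dataframe for plotting
rows = [
    {"president": p, "month": int(m), "pct_positive": v["pct_positive"]}
    for p, months in results.items()
    for m, v in months.items()
]
sentiment_df = pd.DataFrame(rows)

# One line per president: share of positive abstracts over the first 12 months
sns.lineplot(data=sentiment_df, x="month", y="pct_positive", hue="president")
plt.title("Share of positive NYT abstracts, first year in office")
plt.show()
```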

Conclusion

This project was a really helpful way for me to get more familiar with building models in TensorFlow. TextBlob just didn’t cut it! The model I built in TensorFlow was significantly more accurate, although it clearly has a bit of a negative bias (as we learned from the confusion matrix).

The results I got using my TensorFlow model are pretty interesting. Basically, the news was most negative during Bush’s first year in office. This is true in terms of ALL news (only 25% positive) and news abstracts that directly mention “Bush” (only 28% positive). On the other end of the spectrum, the news was generally most positive during Trump’s first year (34% positive), and direct news coverage was most positive for Obama (63% positive). Interestingly, abstracts with direct mentions of Biden are more negative (57%) than abstracts with direct mentions of Trump (52%). This is a bit surprising, since the NYT was pretty publicly opposed to Trump.

I intend to perform additional analyses on the data I pulled and classified, including looking at the most commonly used words, to better understand these unexpected results. For instance, general news coverage may have been most positive during Trump’s first year simply because there were fewer crises in 2017 than in 2001 (dot-com bubble burst, 9/11 attacks), 2009 (Great Recession), and 2021 (lingering COVID-19 pandemic). I’m looking forward to exploring this data some more!
