😭The Saddest Day on Twitter: Sentiment Analysis & Engagement Trends in Company’s Tweets

Data analysis & visualisation of weekly trends in engagement metrics and sentiment values in tweets from 6 months.

Marta
Towards Data Science

Photo by Marcos Paulo Prado on Unsplash

If a tweet is sent and no one is around to check analytics, does it make an impact? Ok, this is a rather loose paraphrase of the question about a tree falling in the woods. 🌳 But, you know what I’m getting at.

I’ve been tweeting as JAM (Twitter makingjam) for more than a year. The time has come to see in detail what insights could be drawn from this exercise in copywriting and communication. Apart from mentions, hashtags, emoji, and text, I was interested to see how our tweets perform depending on the day of the week.

Because why bother tweeting on Sundays if everyone is on a phone-free brunch?🤷‍♀️

Time for datetime

Here is what the first few rows of the data frame looked like before we started.

40 columns of various degrees of usefulness…

We had 529 rows and 40 columns, but in an initial stage of the analysis we removed some of them (you can see the full data analysis in this notebook).

The first step is to handle our favourite variable, which is datetime! Anyone else still confuse strftime with strptime? 🙋‍♀️ It’s like confusing left and right; maybe we’ll never cure ourselves of it.

We’ll split the column into date and hour and extract the name of the day of the week.
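Here is a minimal sketch of that split. The column name `time`, the timestamp format, and the two sample rows are assumptions made for illustration; the real export may differ. Parsing with a format string is the strptime direction (text in, datetime out), while strftime goes the other way.

```python
import pandas as pd

# Hypothetical sample rows; the real data frame has 529 rows.
df = pd.DataFrame({
    "time": ["2019-10-07 09:15 +0000", "2019-10-12 17:40 +0000"],
    "engagements": [12, 3],
})

# Parse the text into timezone-aware datetimes (the format is an assumption).
df["time"] = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M %z")

# Split into date and hour, then extract the weekday twice:
# once as a number (for sorting) and once as a human-friendly name.
df["date"] = df["time"].dt.date
df["hour"] = df["time"].dt.hour
df["day"] = df["time"].dt.dayofweek        # Monday=0 ... Sunday=6
df["day_name"] = df["time"].dt.day_name()  # "Monday", "Saturday", ...

print(df[["date", "hour", "day", "day_name"]])
```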

What this allows us to do is to group the data frame by day and extract means for each day of the week.

Notice how I’ve so cleverly grouped by day_name (the human friendly name), but sorted by day which is a numerical representation. If you sorted by day_name you’d get days sorted alphabetically, and that’s not helpful. ☝️
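The grouping could look roughly like this. It is a sketch on made-up numbers, and the name `week_mean_df` is my own label for the result rather than something taken from the notebook.

```python
import pandas as pd

# Hypothetical rows with the weekday columns already extracted.
df = pd.DataFrame({
    "day": [0, 0, 2, 5],
    "day_name": ["Monday", "Monday", "Wednesday", "Saturday"],
    "engagements": [10, 20, 9, 2],
    "likes": [4, 6, 3, 1],
})

# Group by the readable name, but sort by the numeric weekday.
# Within one day_name the value of "day" is constant, so its mean is the day number.
week_mean_df = (
    df.groupby("day_name")
      .mean(numeric_only=True)
      .sort_values("day")
)

print(week_mean_df)
```

Sorting the index alphabetically instead would put Saturday before Wednesday in this sample.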

Output:

This is enough progress for us to plot the values. We can create a nice function that will let us plot chosen variables.💁‍♀️
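A sketch of what such a function might look like follows. The signature (a data frame first, then two required variables and an optional third) is inferred from the later call in this article; the body, labels, and figure size are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_means_by_weekday(df, variable_1, variable_2, variable_3=None):
    """Line-plot two or three columns of a weekday-indexed data frame."""
    columns = [variable_1, variable_2]
    if variable_3 is not None:
        columns.append(variable_3)

    fig, ax = plt.subplots(figsize=(10, 5))
    for column in columns:
        # The index holds the weekday names, already in calendar order.
        ax.plot(df.index, df[column], marker="o", label=column)

    ax.set_xlabel("Day of the week")
    ax.set_ylabel("Mean value")
    ax.set_title("Mean values by weekday")
    ax.legend()
    return ax
```

Returning the axes rather than calling `plt.show()` inside the function leaves the caller free to tweak the chart before displaying it.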

Of course, we have to be a little careful which ones we plot on the same chart. Different values have different scales (compare, for example, impressions with engagement rate), and with a stretched-out y-axis the chart might end up not being very informative.

Let’s plot something, say mean engagements, retweets, and likes.

plot_means_by_weekday('engagements', 'retweets', variable_3='likes')

Output:

What do we have here? A big weekend dip! 🤔

To better see the trends, we can normalize these values.
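One common way to do this is a z-score: subtract each column's mean and divide by its standard deviation. That this is the normalisation used here is an assumption (min-max scaling would be another option), and the numbers below are invented.

```python
import pandas as pd

# Hypothetical weekday means on very different scales.
week_mean_df = pd.DataFrame(
    {"engagements": [30.0, 28.0, 6.0], "likes": [5.0, 4.0, 0.0]},
    index=["Monday", "Tuesday", "Sunday"],
)

# Z-score each column: afterwards every column has mean 0 and standard deviation 1,
# so the lines can share one y-axis.
week_mean_norm_df = (week_mean_df - week_mean_df.mean()) / week_mean_df.std()

print(week_mean_norm_df.round(3))
```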

Now, let’s plot again.

plot_means_by_weekday(week_mean_norm_df, 'engagements', 'retweets', variable_3='likes')

Output:

Still a dip. But, now it’s clear that all values follow the same trend.

The big dip might be partially caused by having very little data from the weekend, since we simply don’t tweet much at the weekend, on the assumption that people aren’t on Twitter as much.

🎺 Impressive Mondays?

Let’s get some insights about average daily impressions. We can easily get the mean values of impressions and the count of all tweets sent on specific weekdays by using an aggregation function.
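A sketch of that aggregation with pandas named aggregation follows. The output column names `mean_impressions` and `tweet_count` are hypothetical, and so is the sample data.

```python
import pandas as pd

# Hypothetical tweets with weekday columns and impressions.
df = pd.DataFrame({
    "day": [0, 0, 1, 5],
    "day_name": ["Monday", "Monday", "Tuesday", "Saturday"],
    "impressions": [900, 1100, 700, 200],
})

# One pass over the groups gives both the mean and the number of tweets per weekday.
week_impressions = (
    df.groupby(["day", "day_name"])["impressions"]
      .agg(mean_impressions="mean", tweet_count="count")
      .reset_index()
      .sort_values("day")
)

print(week_impressions)
```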

The `week_impressions` data frame looks like this:

Which we can plot.

Output:

If we just look at mean impressions, it seems like Monday, Tuesday, and Wednesday are the best days to tweet. But what makes this chart unclear is that both the number of tweets sent and the mean impressions differ from day to day. It would be clearer to calculate and plot, for each day, the ratio of the number of tweets sent to the number of impressions received per tweet.
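Read literally, that ratio is the tweet count divided by the mean impressions for each weekday. The sketch below follows that literal reading, which may not match the notebook exactly; the column names and numbers are made up.

```python
import pandas as pd

# Hypothetical per-weekday summary (mean impressions and number of tweets).
week_impressions = pd.DataFrame({
    "day_name": ["Monday", "Wednesday", "Saturday"],
    "mean_impressions": [1000.0, 800.0, 200.0],
    "tweet_count": [100, 120, 5],
})

# Ratio of tweets sent to impressions received per tweet, one value per weekday.
week_impressions["tweets_to_impressions"] = (
    week_impressions["tweet_count"] / week_impressions["mean_impressions"]
)

print(week_impressions)
```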

Output:

Now we can clearly see that the ratio of tweets to impressions is indeed the highest in the middle of the week.

😠 Get sentimental & see what negativity looks like

One more thing we can do with these tweets is calculate text sentiment. To do this, we can use VADER (Valence Aware Dictionary and sEntiment Reasoner), a lexicon and rule-based sentiment analysis tool specifically designed to analyse sentiments expressed on social media.

After downloading the vader lexicon we apply the Sentiment Intensity Analyser to the text column. It will return a dictionary with four values.
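With NLTK, this step could be sketched as below. The helper names `make_vader_analyzer` and `add_sentiment`, and the column names `text` and `sentiment`, are my own assumptions; the NLTK calls themselves (`nltk.download("vader_lexicon")`, `SentimentIntensityAnalyzer`, `polarity_scores`) are the library's.

```python
import pandas as pd


def make_vader_analyzer():
    """Download the VADER lexicon if needed and return NLTK's analyser."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


def add_sentiment(df, analyzer, text_column="text"):
    """Return a copy of df with a 'sentiment' column of score dictionaries."""
    out = df.copy()
    # polarity_scores returns a dict with the keys 'neg', 'neu', 'pos' and 'compound'.
    out["sentiment"] = out[text_column].apply(analyzer.polarity_scores)
    return out
```

Usage would be `df = add_sentiment(df, make_vader_analyzer())`, after which each row of `sentiment` holds the four scores for that tweet.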

Note: You’ll notice that the sentiment columns were already present in one of the gists above. In the chronology of this data analysis project, this step 👆 (calculating sentiment) was completed earlier.

Output:

Needless to say, the column will need some cleaning.🧹 First let’s see how many positive, negative, and neutral tweets we have overall.
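One way to get such a count is to label each tweet by its compound score. The cut-offs of 0.05 and -0.05 are the ones conventionally suggested for VADER; whether the notebook used exactly this rule is an assumption, and the scores below are invented.

```python
import pandas as pd


def label_sentiment(scores, threshold=0.05):
    """Map a VADER score dictionary to 'positive', 'negative' or 'neutral'."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical 'sentiment' column holding VADER-style dictionaries.
df = pd.DataFrame({
    "sentiment": [
        {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.68},
        {"neg": 0.0, "neu": 0.7, "pos": 0.3, "compound": 0.42},
        {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.62},
        {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
    ]
})

counts = df["sentiment"].apply(label_sentiment).value_counts().to_dict()
print(counts)
```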

Output:

{'positive': 380, 'negative': 69, 'neutral': 80}

69 negative tweets?! And I thought I was such a hippie-positive tweet fairy. We’ll examine them in a second.

First let’s create four separate columns, for each type of sentiment value represented as a number. This will be helpful for later calculations.
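A sketch of the column split, assuming the `sentiment` column holds one dictionary per tweet. The names `sentiment_positive`, `sentiment_negative`, and `sentiment_compound` appear later in this article; `sentiment_neutral` is my guess for the fourth.

```python
import pandas as pd

# Hypothetical tweets with VADER-style score dictionaries.
df = pd.DataFrame({
    "text": ["Great news!", "Be your worst critic"],
    "sentiment": [
        {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.68},
        {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.62},
    ],
})

column_names = {
    "neg": "sentiment_negative",
    "neu": "sentiment_neutral",
    "pos": "sentiment_positive",
    "compound": "sentiment_compound",
}

# Expand the dictionaries into one numeric column per score, then attach them.
scores = pd.DataFrame(df["sentiment"].tolist(), index=df.index).rename(columns=column_names)
df = df.join(scores)

print(df.drop(columns="sentiment"))
```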

Now, we can filter for negative tweets. Let’s select the ones with the highest negativity scores to see what “negativity” looks like according to VADER.
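With a numeric negativity column in place, the filter can be as short as one `nlargest` call. This is a sketch on made-up rows, and the column name `text` is an assumption.

```python
import pandas as pd

# Hypothetical tweets with their negativity scores.
df = pd.DataFrame({
    "text": ["Lovely day", "Worst critic", "Anyone missing?"],
    "sentiment_negative": [0.0, 0.5, 0.3],
})

# Keep the two rows with the highest negativity score, highest first.
most_negative = df.nlargest(2, "sentiment_negative")

for text in most_negative["text"]:
    print(text)
```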

The output is the text of two tweets with the highest negativity scores:

#1

👌 Be your product's worst critic, said @susanavlopes from @onfido at #JAMLondon 2019.

What's the latest criticism you've given to your #product? https://t.co/SxJbmrtQQu

#2

Connect with #product designers from Barcelona! 🇪🇸🎨

How?

Follow this list (Come on, who doesn't like a good list! ) 👉 https://t.co/gy2bZjyXmy

Anyone missing? Tweet at us so we can add them! #pmot #design #prodmgmt

My assumption is that it’s the presence of negation, negative superlatives, and words like “criticism” or “missing” that contribute to the sentiment analyser classifying the tweets as negative.

Which days are the sources of positive and negative tweets?

The top value of positive sentiment was observed on Saturday (2.108), and the top value of negative sentiment on Tuesday (1.04). The lowest value of positive sentiment was observed on Tuesday (-1.217), and the lowest value of negative sentiment on Sunday (-1.976).

Let’s plot this for more clarity, using the function we wrote before:

#plot sentiment for week day
plot_means_by_weekday(week_mean_norm_df, 'sentiment_positive', 'sentiment_negative', 'sentiment_compound')

Output:

Clearly, Tuesday needs some cheering up! 😐

It has the lowest values of positive sentiment and the highest values of negative sentiment. It might be that the same “negative” tweet from our evergreen library gets sent consistently on the same day.

That the weekend is cheerful is probably down to a combination of two things: few tweets are sent on the weekend overall, and those that are tend to be lighter, less “brainy”, and less transactional.

🤔 Correlations: Can we violate the laws of math?

Do any engagement metrics and variables correlate with each other? There is only one way to check. Calculate correlations.

As the standard interpretation of these values has it: 0.7 would be a strong positive correlation, 0.5 moderate, and 0.3 weak (add a minus sign and you get the corresponding negative correlations).
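The calculation and the rule of thumb above can be sketched like this. The data is invented, the helper `describe` is hypothetical, and treating 0.7, 0.5, and 0.3 as lower bounds of each band is my assumption.

```python
import pandas as pd

# Hypothetical engagement metrics for five tweets.
df = pd.DataFrame({
    "impressions": [100, 200, 300, 400, 500],
    "engagements": [3, 5, 8, 9, 14],
    "likes": [1, 0, 2, 1, 3],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()


def describe(r):
    """Turn a correlation coefficient into a rough verbal label."""
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "weak"
    else:
        return "negligible"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Print each pair once, skipping a column's correlation with itself.
for (a, b), r in corr.stack().items():
    if a < b:
        print(f"{a} vs {b}: {r:.2f} ({describe(r)})")
```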

We can plot the correlation values, although with this many values it might create less rather than more clarity. It would make a nice bathroom tile pattern though. 🤷‍♀️

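#plot correlations
import matplotlib.pyplot as plt
import seaborn as sns

# corr is the correlation matrix calculated earlier
fig, ax = plt.subplots(figsize=(15, 15))
_ = sns.heatmap(corr, annot=True, ax=ax)
plt.title("Correlation matrix of engagement metrics", fontsize=16)
plt.show()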

Output:

The printed results from the previous gist are easier to interpret:

You can read these results as:

  • The more impressions the more engagements, the more engagements the more profile clicks.
  • The more likes the more impressions, the more impressions the more likes, etc.

These results are not very surprising. What would surprise me is if word count correlated negatively with character count, which would violate the laws of math.

🤠 Next steps!

I’ll be collecting more data from Twitter Analytics, and perhaps eventually the sample size will be big enough to get statistically significant results.

If I had more patience, I’d attempt creating larger, bootstrapped samples. Just because (I think?) I can, and because it would increase the sample size and help get more significant results without needing to wait for more data.
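For the curious, here is a minimal sketch of what bootstrapping could look like: resample the existing observations with replacement many times and see how much the mean moves between resamples. The function name, the percentile interval, and the numbers are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_mean_ci(values, n_resamples=10_000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if rng is None:
        rng = np.random.default_rng()
    values = np.asarray(values, dtype=float)

    # Each row of `idx` picks len(values) observations with replacement.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    resampled_means = values[idx].mean(axis=1)

    lower, upper = np.quantile(resampled_means, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Hypothetical engagement counts for tweets sent on one weekday.
engagements = np.array([12, 3, 8, 15, 7, 9, 4, 11])
lower, upper = bootstrap_mean_ci(engagements, rng=np.random.default_rng(0))
print(f"mean = {engagements.mean():.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

Comparing such intervals between, say, Tuesday and Saturday would give a rough sense of whether a difference in means is larger than the resampling noise.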

📈 Aspiring data scientist. Rationality fan. EA. Vegan. Working to improve global mental health at MindEase.io