
Using Reddit to explore the mental health effects of the COVID-19 era.

And using Tableau to tell a story with data

Photo by Andraz Lazic on Unsplash

Preliminary research by the CDC has indicated that the COVID-19 pandemic has affected our health, both physically and mentally. Anxiety, depression, suicidal thoughts, and substance abuse disorders appear to be on the rise.

Because I have mined and analyzed Reddit data before (to study emoji use; check it out here if you’re interested!), I thought it would be interesting to look at subreddits specific to mental health issues, to see which topics people are discussing the most and whether engagement with these communities has increased now that more people are distanced from physical sources of support and information.

My goal in this project was to get more experience with Tableau, since we had started using it at Upptic. I was also really interested in the lived experience of people with mental health challenges during COVID-19.

I think that a good dashboard is worth a thousand words, so I’d invite you to check out what I made here! In the meantime, here are some insights I learned from this exploration (and a bit about how to visualize those insights in Tableau).

/r/depression is a weird outlier

The visualization you see below didn’t always tell the truth, which is that in general people are posting more on this subset of 11 subreddits than they did in 2019.

All Subreddits: Slow, stable increase from year to year, but overall higher post volume in 2020.

Even this isn’t the complete truth. Depending on which subreddit you look at, you might get a completely different story. /r/SuicideWatch for example has an extremely strong positive trend for this year, much stronger than 2019.

Posts per week for /r/suicidewatch

This is certainly a very different, and grimmer, picture than the previous trendlines indicate! The issue here is that engagement with /r/depression has been flatlining in 2020 rather than increasing, even when compared to 2019.

Posts per week on /r/depression

There are a couple of insights we can derive here. First, user behavior on /r/depression and /r/SuicideWatch is very different. This is a call to segment the data by subreddit, which is why every visualization in the dashboard can be filtered that way. Second, we have an unbalanced dataset: /r/depression simply has more activity, so it has an outsized effect on the result when we view everything in aggregate.
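The segmentation and imbalance checks described above can be sketched in pandas. This is only a minimal sketch, not the code behind the dashboard: the column names `subreddit` and `created`, the helper names, and the toy rows are all assumptions made for illustration.

```python
import pandas as pd

def weekly_post_counts(posts, date_col="created", group_col="subreddit"):
    """Count posts per subreddit per week (column names are assumptions)."""
    # Map each timestamp to the start of the calendar week it falls in.
    weeks = posts[date_col].dt.to_period("W").dt.start_time
    return (
        posts.groupby([group_col, weeks])
        .size()
        .rename("posts")
        .reset_index()
    )

def share_of_total(posts, group_col="subreddit"):
    """Fraction of all posts contributed by each subreddit."""
    return posts[group_col].value_counts(normalize=True)

# Toy data, made up purely for illustration.
posts = pd.DataFrame({
    "subreddit": ["depression"] * 3 + ["SuicideWatch"],
    "created": pd.to_datetime(
        ["2020-03-02", "2020-03-03", "2020-03-10", "2020-03-04"]
    ),
})
weekly = weekly_post_counts(posts)   # one row per (subreddit, week)
shares = share_of_total(posts)       # how lopsided the dataset is
```

Looking at `shares` before aggregating is a quick way to spot when one community is large enough to dominate a combined trendline.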

Count of posts in each subreddit

I learned how not to check for anonymous accounts

In Reddit parlance, using a "throwaway" account means creating an anonymous account just to make a one-time post. So, one thing I thought might be interesting was to check for usernames that contain the string ‘throwaway’. Since all accounts on Reddit are pretty anonymous already (well, as anonymous as you want them to be), this turned out to be not especially useful.

Percent of ‘throwaway’ posts by subreddit.

What I would suggest for this kind of analysis is to go directly to each individual user (which involves a different API call) and see how many posts they have made. This information would be especially interesting because certain subreddits tend to contain more throwaway users (Domestic Violence and Suicide Watch stand out in particular).
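For reference, the username check itself could look something like the sketch below. It covers only the name-matching step, not the per-user lookup suggested above. The column names `author` and `subreddit`, the helper name, and the sample rows are assumptions for illustration.

```python
import pandas as pd

def throwaway_share(posts, author_col="author", group_col="subreddit"):
    """Percent of posts per subreddit whose author name contains 'throwaway'."""
    # Case-insensitive substring match; missing authors count as non-matches.
    is_throwaway = posts[author_col].str.contains(
        "throwaway", case=False, na=False
    )
    return is_throwaway.groupby(posts[group_col]).mean() * 100

# Made-up rows for illustration only.
posts = pd.DataFrame({
    "subreddit": ["SuicideWatch", "SuicideWatch", "Anxiety", "Anxiety"],
    "author": ["throwaway12345", "some_user", "another_user", None],
})
pct = throwaway_share(posts)
```

A name match like this only catches accounts that advertise themselves as throwaways, which is exactly why it undercounts.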

People aren’t really talking about COVID that much

The insight for this section is pretty light. When comparing discussion topics, it seems people are talking about pretty much the same issues in 2019 and 2020 (though /r/HealthAnxiety is a notable outlier in multiple ways).

The red section is COVID-19 related posts

The interesting part here is how I was able to assign these topics. My first impulse as a data scientist was to build a language model and try to cluster the posts. I did try this initially, but the results were generally pretty bad, and it didn’t really address whether people were talking about the pandemic, which was one of the key questions this project was asking. Instead, I took an approach that used keywords and the substring matching function in pandas to assign topics in a quick and dirty way.

import re

covid = ['virus', 'COVID', 'pandemic', 'quarantine']
health = ['hospital', 'insurance', 'illness', 'medication']
events = ['death', 'dying', 'divorce', 'breakup', 'moving']
family = ['sibling', 'brother', 'sister', 'mom', 'dad', 'parent', 'daughter', 'son', 'spouse', 'husband', 'wife']
social_life = ['friend', 'concert', 'parties', 'bar', 'restaurant', 'date', 'boyfriend', 'girlfriend']
career = ['work', 'fired', 'interview', 'co-worker', 'manager', 'boss', 'career']
money = ['bills', 'debt', 'money', 'broke', 'poor']

def generate_topic_col(keyword_list, dataframe, posts_column, new_column_name='topic_present'):
    """Write a boolean column indicating whether any keyword is present in each post.

    Takes a list of keywords and the dataframe where the post data is, and
    adds new_column_name to that dataframe in place.
    """
    # Join the keywords into one alternation pattern, e.g. 'virus|COVID|pandemic'
    regstr = '|'.join(keyword_list)
    # Case-insensitive substring match against the post text
    dataframe[new_column_name] = dataframe[posts_column].str.contains(regstr, flags=re.IGNORECASE, regex=True)
    print('New column {} written to dataframe'.format(new_column_name))

In my opinion, this approach ended up fitting the situation better and gave me some pretty good results. I am very interested in hearing what others have done in terms of quick and dirty text analytics!
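To show how keyword flags like these might feed a year-over-year comparison, here is a hedged sketch. The helper `topic_mask`, the `year` and `text` column names, and the sample posts are hypothetical; unlike the function above, it returns the boolean column instead of writing it in place and escapes each keyword before building the pattern.

```python
import re
import pandas as pd

# Two of the keyword lists from above, repeated so the example runs on its own.
topics = {
    "covid": ["virus", "COVID", "pandemic", "quarantine"],
    "money": ["bills", "debt", "money", "broke", "poor"],
}

def topic_mask(keyword_list, texts):
    """Boolean Series: True where any keyword occurs as a substring."""
    # re.escape keeps keywords with special characters literal.
    pattern = "|".join(re.escape(k) for k in keyword_list)
    return texts.str.contains(pattern, flags=re.IGNORECASE, regex=True, na=False)

# Made-up posts for illustration only.
posts = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "text": [
        "I can't keep up with my bills",
        "Feeling low again today",
        "Quarantine is making everything harder",
        "Lost my job and I'm broke because of the pandemic",
    ],
})

# One boolean column per topic.
for name, keywords in topics.items():
    posts[name] = topic_mask(keywords, posts["text"])

# Percent of posts per year that mention each topic.
share_by_year = posts.groupby("year")[list(topics)].mean() * 100
```

One thing to keep in mind with plain substring matching is that short keywords can match inside longer words (for example, 'son' inside 'person'), so the flags are best read as rough signals.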

In general, people are understandably less positive in 2020

One of the theories I had upon learning that /r/depression had reduced user activity was that perhaps people were actually in better moods. This was something I remember being told often during the initial phases of quarantine: people had more free time, fewer social obligations, and perhaps less stress if they were able to work from home. I was doubtful this was the case, but the only way to check was to look at sentiment analysis. I used NLTK’s VADER for this, but with a twist. Since most posts in these subreddits tend to be from individuals who are at a minimum struggling and, in the worst case, in a crisis situation, the overall sentiment tends to be very negative. Therefore, I elected to look at the positive sentiment score instead of the compound score and see if there was a significant difference between 2019 and 2020.
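A rough sketch of that idea is below. It is not the exact code used for the dashboard: the column names `subreddit`, `year`, and `text` and both helper names are assumptions. VADER's `polarity_scores` returns `neg`, `neu`, `pos`, and `compound` values, and only `pos` is kept here.

```python
import pandas as pd

def make_vader_scorer():
    """Build a VADER scorer; needs nltk and its 'vader_lexicon' resource."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer().polarity_scores

def mean_positive_by_year(posts, scorer, text_col="text"):
    """Mean 'pos' score per subreddit and year (column names are assumptions)."""
    # scorer is any callable returning a dict with a 'pos' key, such as VADER's.
    scored = posts.assign(pos=posts[text_col].map(lambda t: scorer(t)["pos"]))
    return scored.groupby(["subreddit", "year"])["pos"].mean().unstack("year")

# Usage, assuming `posts` holds the scraped data:
# table = mean_positive_by_year(posts, make_vader_scorer())
```

Passing the scorer in as an argument keeps the aggregation step separate from NLTK, so it can be swapped out or tried on a small sample first.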

Bar graph of mean positive sentiment in 2019 (blue) and 2020 (orange)

I also found that people generally aren’t using more positive language to describe their experiences with mental illness. However, /r/DomesticViolence is a notable exception. I wanted to look at the posts in question and see if they generally were more positive.

Positive sentiment for 2019’s top posts (top) and 2020’s top posts (bottom) in r/DomesticViolence

Looking at the posts in 2019 and 2020 with the most upvotes in this particular subreddit, we can see that the most upvoted posts tended to be more positive in 2019. My guess is that looking at the median may be more appropriate for analysis here as some very positive posts in 2020 might be shouting over the posts that are less positive.
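To make the mean-versus-median point concrete, here is a tiny sketch with made-up numbers (they are not taken from the dataset). A single very positive post is enough to lift the 2020 mean above 2019 even though the typical 2020 post is less positive.

```python
import pandas as pd

# Made-up 'pos' scores: one very positive 2020 post among mostly low ones.
scores = pd.DataFrame({
    "year": [2019, 2019, 2019, 2020, 2020, 2020],
    "pos": [0.10, 0.12, 0.14, 0.02, 0.04, 0.60],
})

# Compare both summaries side by side for each year.
summary = scores.groupby("year")["pos"].agg(["mean", "median"])
```

In this toy case the mean ranks 2020 higher while the median ranks it lower, which is the kind of disagreement worth checking for in the real data.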

Please reach out for help if you need it

I do want to finish up this piece by saying that it’s completely understandable to be struggling with your mental health right now. Personally, my spirits haven’t always been the best this year, so it was very comforting to see that I wasn’t alone.

If you or someone you know is having suicidal thoughts, you can find a list of hotlines maintained on /r/SuicideWatch:

hotlines – SuicideWatch

There’s also a bunch of other resources on the subreddit, so please do have a look! It’s more important than ever to prioritize your mental health in 2020, and we will only get through this by helping each other.

