B.C. Politics: Sentiment Analysis Predicts Horgan Win

Serena McDonnell
Towards Data Science
5 min read · May 8, 2017

The candidates: the NDP’s John Horgan, the Liberals’ Christy Clark and the Green Party’s Andrew Weaver. Adapted from the Vancouver Courier. Photo by Dan Toulgoet.

Sentiment analysis of Reddit posts suggests that NDP leader John Horgan will win the B.C. election this Tuesday, May 9th. The study suggests that while Reddit users in the r/Vancouver and r/BritishColumbia subreddits post more about Christy Clark and the B.C. Liberal Party, they hold a significantly more positive sentiment towards Horgan and the B.C. New Democratic Party.

What is sentiment analysis?

Sentiment analysis, also known as opinion mining, aims to identify the feeling and attitude of a speaker or writer in a given text. The technique can determine whether a text is positive, negative, or neutral based on the words used by the writer. Words with positive sentiment push the text towards a positive score; negative words push it towards a negative one. A sentiment analysis program can therefore quickly and efficiently scan vast numbers of comments, posts, or pages and classify the sentiment each one expresses towards a topic.
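To make the word-scoring idea concrete, here is a minimal sketch in Python. The lexicon, its weights, and the function names are hypothetical and chosen only for illustration; real tools rely on much larger, carefully rated word lists and also handle negation, punctuation, and emphasis.

```python
import re

# Hypothetical mini-lexicon: positive words get positive weights,
# negative words get negative weights. Real lexicons are far larger.
LEXICON = {
    "brilliant": 2.0,
    "good": 1.5,
    "arrogance": -2.0,
    "unclear": -1.0,
}


def score_text(text: str) -> float:
    """Sum the lexicon weights of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def label(score: float) -> str:
    """Map a raw score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for post in [
        "Say what you will about Christy Clark, she's still a brilliant campaigner",
        "Will money and arrogance cost Christy Clark the B.C. election?",
        "The election is on Tuesday",
    ]:
        print(label(score_text(post)), "|", post)
```

A post containing no lexicon words scores zero and comes out neutral, which is one reason short headlines are often hard to classify.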

Sentiment analysis stands to grow in importance in the coming years for two reasons. First, traditional polling methods are rapidly losing credibility and effectiveness as landlines are abandoned and ever fewer people answer pollsters’ phone calls. Second, the exponential rise in conversations on sites like Twitter, Facebook, Instagram, and Reddit means that millions of individual opinions are suddenly available for review by these techniques.

In the 2016 American election, some social media analyses anticipated Trump’s win better than many polls did.

The field is in its infancy, but is likely to grow rapidly and become a standard component of public-opinion research in the very near future, perhaps even supplanting polling itself as the standard means of evaluating political contests.

The Opinion Of Reddit

Sentiment analysis of the most recent one thousand posts from the two most politically active British Columbia-related subreddits, r/Vancouver and r/BritishColumbia, suggests that Horgan will come out victorious this Tuesday.

Sentiment towards the B.C. Liberal Party. Net Positive Sentiment: 2.7%
Sentiment towards the B.C. NDP. Net Positive Sentiment: 13%

The charts show the favourability of the parties, according to Reddit. According to the study’s results, the Net Positive Sentiment for the NDP is 13%, and the net positive sentiment for the Liberal Party is 2.7%.
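The post does not spell out how Net Positive Sentiment is calculated. One plausible reading, shown here purely as an assumption, is the number of positive posts minus the number of negative posts, divided by either the non-neutral posts or all posts about the subject. The function name and the `include_neutral` switch are hypothetical, and the sample counts below are made up for illustration rather than taken from the study.

```python
from collections import Counter
from typing import Iterable


def net_positive_sentiment(labels: Iterable[str], include_neutral: bool = False) -> float:
    """Return (positive - negative) / denominator as a percentage.

    ASSUMPTION: this formula is a guess at the metric, not the study's
    confirmed definition. With include_neutral=False the denominator is
    positive + negative posts; with True it is every labelled post.
    """
    counts = Counter(labels)
    positive = counts["positive"]
    negative = counts["negative"]
    total = sum(counts.values())
    denominator = total if include_neutral else positive + negative
    if denominator == 0:
        return 0.0  # no usable posts, so report no net sentiment
    return 100.0 * (positive - negative) / denominator


if __name__ == "__main__":
    # Made-up label counts, for illustration only.
    sample = ["positive"] * 5 + ["negative"] * 3 + ["neutral"] * 8
    print(net_positive_sentiment(sample))                        # 25.0
    print(net_positive_sentiment(sample, include_neutral=True))  # 12.5
```

The two denominators can give quite different figures when many posts are neutral, so any comparison across parties needs to use the same choice throughout.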

As any observer of B.C. politics would attest, this sounds about right. You’ll find a pretty close match at any given dinner party between advocates and opponents of either party. There’s a bit more of a gap when it comes to the leaders:

Sentiment towards Christy Clark. Net Positive Sentiment: 6%
Sentiment towards John Horgan. Net Positive Sentiment: 17%

According to this test, Horgan leads Clark personally in the Net Positive Sentiment race, 17% to 6%. This is something that political pundits can sink their teeth into: Horgan has a clear lead, and we can declare him the prospective winner.

But that would of course be somewhat premature.

A Few Caveats

Sentiment analysis is able to identify positive posts related to the candidates, such as “Say what you will about Christy Clark, she’s still a brilliant campaigner,” and “Yes, BC NDP Leader John Horgan is angry — but in a good way.” Negative posts are also well identified, such as “Will money and arrogance cost Christy Clark the B.C. election?” and “BC NDP Leader Horgan unclear on some issues, unspecific on funding.” But few posts are so clear. A great many posts are ambiguous, resulting in the large numbers of posts that have to be labelled “neutral”.
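One way the “neutral” bucket arises is through score thresholds. VADER, the package used for this study, reports a compound score between -1 and 1 for each text, and its documentation suggests treating scores of 0.05 and above as positive and -0.05 and below as negative. Whether this study used exactly those cut-offs is an assumption; the function below is a sketch with hypothetical names and made-up scores.

```python
def classify(compound: float, threshold: float = 0.05) -> str:
    """Turn a compound score in [-1, 1] into a sentiment label.

    ASSUMPTION: the 0.05 cut-off follows the convention suggested in
    VADER's documentation; the study's actual thresholds are not stated.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Made-up compound scores, for illustration only. In practice each
    # score would come from a sentiment analyzer run on one post title.
    for score in (0.62, -0.34, 0.01, 0.0):
        print(score, "->", classify(score))
```

Anything that lands between the two cut-offs is labelled neutral, so widening the threshold moves more ambiguous posts into that bucket.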

Further, a question arises: can we trust the sentiment of users on Reddit? The sample size is small, and people who post on Reddit are not necessarily representative of the overall population. Are their feelings towards the candidates an accurate representation of the overall population? Do Reddit users even vote?

In addition, the study does not include non-English-language sentiment, quite a significant absence given the make-up of certain ridings.

Even more important, the relationship between sentiment and voting intentions is not clear. It’s quite possible that a voter could be sympathetic to a party, and still not vote for them. Those net positives for Horgan, that show him far ahead of Clark? Well, Weaver’s B.C. Green Party has a net positive too. And that is 100%.

Sentiment towards the B.C. Green Party. Net Positive Sentiment: 100%
Sentiment towards Andrew Weaver. Net Positive Sentiment: 37.5%

While it’s pretty obvious the Green Party is not going to win the election outright, they seem to have Reddit’s vote! Those are some stellar net positives.

Pointing The Way

The crucial questions, then, are sample size, sample composition, and the relationship of sentiment to voting intention. The first two are likely to be less of a problem in future as the number of items available for analysis increases exponentially. The relationship of sentiment to intention, however, will always be something of a black box. It’s possible that this particular gap could be filled by polling on a small scale to determine the exact relationship in each individual case.

For the time being, all that is safe to say is that according to Reddit, we will have a new government in this province by this time next week.

To read more about sentiment analysis and the Python package that I used, called VADER, check out my GitHub.

Always curious about math. Senior Data Scientist @ Delphia - views are my own. Check out my personal website: serena.mcdonnell.ca.