How frustrated are Premier League fans with VAR? A look at their reactions using Sentiment Analysis in R

Published in Towards Data Science · 7 min read · Nov 20, 2019

What is VAR?

The Video Assistant Referee (VAR) system was introduced by FIFA to support referees’ decision-making. Several contentious incidents led to its introduction. One came in the knockout round of the 2010 World Cup, with England trailing Germany 2–1: Frank Lampard’s shot crashed off the underside of the bar and bounced a yard or so over the line, but the referee didn’t see it. Germany went on to win the game 4–1, and England were knocked out. Another is Thierry Henry’s infamous 2009 handball against the Republic of Ireland. VAR is meant to be used only in game-changing situations such as goals, penalty decisions and red cards, so that it interferes with the game as little as possible.

Why is VAR in the news for the wrong reasons?

VAR was introduced in the Premier League this season (2019–20). Since the start of the season, it has brought more controversy than solutions. Two incidents during the Liverpool vs Manchester City game on November 10th (a handball by Trent Alexander-Arnold and the offside decision for Mo Salah’s goal) have been the talk of the town recently. Similar incidents this season have made fans question the use of VAR in the Premier League.

Using sentiment analysis on Premier League fans’ tweets about VAR, I tried to understand:

1. Do Premier League fans have positive or negative sentiments about the use of VAR in general?

2. What kinds of emotions do they exhibit in their tweets about VAR?

3. What are the common words used by Premier League fans when tweeting about VAR?

4. Do fans’ reactions to VAR differ based on the club they support?

Why use Twitter for Sentiment Analysis?

Social media has evolved from a platform where people talk to one another into a medium where people share views, express dissatisfaction, and praise or criticize institutions and public figures. Among social media platforms, Twitter is one of the main places where people review or complain about products and events and talk about personalities. It is easy to get the pulse of a topic on Twitter because the character limit keeps users focused on their key message, unlike other platforms where people write long stories.

How to do Sentiment Analysis in R?

There are many packages available for Twitter sentiment analysis in R. The step-by-step process for one widely used approach is given below:

Create Twitter App

Twitter provides an API that can be used to analyze tweets posted by users. The API lets us extract data in a structured format that is easy to analyze. The process to create a Twitter app is given here. Creating a Twitter app generates the following four keys, which are required to extract tweets during the analysis:

· Consumer key (API key)

· Consumer secret (API Secret)

· Access Token

· Access Token Secret
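Since these four keys are secrets, one option is to keep them out of the script and read them from environment variables. The following is a minimal sketch of that idea using only base R. The function name `read_twitter_keys` and the environment variable names are hypothetical and not part of the original analysis.

```r
# Read the four Twitter credentials from environment variables.
# The variable names below are hypothetical; use whatever names you set.
read_twitter_keys <- function(
    vars = c(consumer_key    = "TWITTER_CONSUMER_KEY",
             consumer_secret = "TWITTER_CONSUMER_SECRET",
             access_token    = "TWITTER_ACCESS_TOKEN",
             access_secret   = "TWITTER_ACCESS_SECRET")) {
  keys <- Sys.getenv(vars)          # unset variables come back as ""
  names(keys) <- names(vars)
  absent <- names(keys)[!nzchar(keys)]
  if (length(absent) > 0) {
    stop("Missing Twitter credentials: ", paste(absent, collapse = ", "))
  }
  keys
}

# Possible usage once the variables are set (not run here):
# keys <- read_twitter_keys()
# setup_twitter_oauth(keys[["consumer_key"]], keys[["consumer_secret"]],
#                     keys[["access_token"]], keys[["access_secret"]])
```

The function stops with an error naming the missing keys, which makes a misconfigured session fail early rather than at the first API call.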

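Install packages and extract tweets

Install the packages required for sentiment analysis, then load them.

# Install packages
install.packages("twitteR")
install.packages("RCurl")
install.packages("httr")
install.packages("tm")
install.packages("SnowballC")  # needed by tm's stemDocument()
install.packages("wordcloud")
install.packages("syuzhet")
install.packages("ggplot2")

# Load the required packages
library(twitteR)
library(RCurl)
library(httr)
library(tm)
library(wordcloud)
library(syuzhet)
library(ggplot2)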

The twitteR package provides access to the Twitter API. The RCurl and httr packages provide functions for composing HTTP requests and processing the results returned by the web server. The syuzhet package extracts sentiment and sentiment-based plot arcs from text. Its NRC lexicon lets users count the presence of eight different emotions in a text, in addition to the two sentiments (positive and negative). The tm and wordcloud packages are used later for text processing and for the word cloud.
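To make concrete what lexicon-based scoring means, here is a minimal base R sketch. It uses a made-up four-word lexicon, not the NRC lexicon, and the function name `score_emotions` is hypothetical. The idea is the same as in syuzhet: each text is split into words, and every word found in the lexicon adds one to the count of its emotion.

```r
# Toy lexicon: each word maps to one emotion. These entries are invented
# for illustration and are NOT taken from the NRC lexicon.
toy_lexicon <- c(terrible = "anger", robbed = "anger",
                 brilliant = "joy", worried = "fear")

# Count, for every text, how many of its words fall under each emotion.
score_emotions <- function(texts, lexicon) {
  emotions <- sort(unique(unname(lexicon)))
  counts <- t(vapply(texts, function(txt) {
    words <- strsplit(tolower(txt), "[^a-z]+")[[1]]
    hits <- lexicon[words[words %in% names(lexicon)]]
    as.numeric(table(factor(hits, levels = emotions)))
  }, numeric(length(emotions)), USE.NAMES = FALSE))
  colnames(counts) <- emotions
  as.data.frame(counts)
}

example_tweets <- c("Terrible decision, we were robbed",
                    "Brilliant call by VAR")
score_emotions(example_tweets, toy_lexicon)
```

The result has one row per text and one column per emotion, which mirrors the shape of the data frame returned by get_nrc_sentiment (one row per text, with eight emotion columns plus negative and positive).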

Since the free Twitter app only allows extraction of tweets from the last 7 days, I analyzed Premier League fans’ reactions to VAR for the Gameweek 12 matches played on 9th and 10th November 2019. The following are the results of Gameweek 12:

The next step after installing the necessary packages is to set up the Twitter API credentials, invoke the Twitter app and extract data using keywords. The keywords used for extracting tweets are “VAR” and “epl”.

# Authorisation keys
consumer_key = "XXXXXXXXXXXXXXXX"
consumer_secret = "XXXXXXXXXXXXXXX"
access_token = "XXXXXXXXXXXXXXX"
access_secret = "XXXXXXXXXXXXXXX"

# Set up authentication
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Search for tweets in the English language
tweetVAR = searchTwitter("VAR epl", n = 10000, lang = "en")

# Store the tweets in a data frame
tweetsVAR.df = twListToDF(tweetVAR)

Data cleaning
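The gsub function removes unwanted content such as retweet markers, user mentions, punctuation (including the # of hashtags), digits and URLs from the tweets so that they are ready for analysis. Stopwords are removed later with the tm package.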

# Cleaning tweets: HTML-encoded ampersands, retweet markers and user mentions
tweetsVAR.df$text = gsub("&amp", "", tweetsVAR.df$text)
tweetsVAR.df$text = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweetsVAR.df$text)
tweetsVAR.df$text = gsub("@\\w+", "", tweetsVAR.df$text)

# Punctuation, digits and URLs (URLs have lost their punctuation by now,
# so "http\\w+" matches what is left of them)
tweetsVAR.df$text = gsub("[[:punct:]]", "", tweetsVAR.df$text)
tweetsVAR.df$text = gsub("[[:digit:]]", "", tweetsVAR.df$text)
tweetsVAR.df$text = gsub("http\\w+", "", tweetsVAR.df$text)

# Domain words that the NRC lexicon misclassifies in a football context
tweetsVAR.df$text = gsub("penalty", "", tweetsVAR.df$text)
tweetsVAR.df$text = gsub("football", "", tweetsVAR.df$text)

# Collapse repeated whitespace to a single space, then trim both ends
tweetsVAR.df$text = gsub("[ \t]{2,}", " ", tweetsVAR.df$text)
tweetsVAR.df$text = gsub("^\\s+|\\s+$", "", tweetsVAR.df$text)

# Drop non-ASCII characters such as emoji
tweetsVAR.df$text <- iconv(tweetsVAR.df$text, "UTF-8", "ASCII", sub = "")
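The cleaning steps can also be packaged as one reusable function, which makes them easy to try on a single tweet before applying them to the whole data frame. This is a sketch under a few assumptions rather than the exact code used for the analysis: the name `clean_tweet` is hypothetical, URLs are removed before punctuation, removed pieces are replaced by a space so that neighbouring words do not merge, and the domain words are matched case-insensitively as whole words.

```r
# Clean one or more tweets; returns a character vector of the same length.
clean_tweet <- function(text, domain_words = c("penalty", "football")) {
  text <- iconv(text, from = "UTF-8", to = "ASCII", sub = "")  # drop emoji etc.
  text <- gsub("&amp;?", " ", text)                    # HTML-encoded ampersands
  text <- gsub("\\b(RT|via)\\s*@\\w+:?", " ", text)    # retweet markers
  text <- gsub("@\\w+", " ", text)                     # user mentions
  text <- gsub("http\\S+", " ", text)                  # URLs
  text <- gsub("[[:punct:]]|[[:digit:]]", " ", text)   # punctuation and digits
  if (length(domain_words) > 0) {
    pattern <- paste0("\\b(", paste(domain_words, collapse = "|"), ")\\b")
    text <- gsub(pattern, " ", text, ignore.case = TRUE)
  }
  trimws(gsub("\\s+", " ", text))                      # tidy whitespace
}

# Invented example tweet
clean_tweet("RT @fan: VAR ruined the #EPL again!! 2 penalty calls &amp; 1 goal https://t.co/abc123")
```

With the data frame from the previous step, this could be applied as tweetsVAR.df$text <- clean_tweet(tweetsVAR.df$text).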

Analyzing sentiments and visualization

Syuzhet scores each tweet on 2 sentiments and 8 emotions. In the next step, I used a bar graph to visualize which emotions are dominant in the tweets.

# Emotions for each tweet using the NRC dictionary
emotions <- get_nrc_sentiment(tweetsVAR.df$text)
emo_bar = colSums(emotions)
emo_sum = data.frame(count = emo_bar, emotion = names(emo_bar))

# Order the bars from the most to the least frequent category
emo_sum$emotion = factor(emo_sum$emotion,
                         levels = emo_sum$emotion[order(emo_sum$count, decreasing = TRUE)])

# Visualize the emotions from NRC sentiments
library(ggplot2)
var_plot <- ggplot(emo_sum, aes(x = emotion, y = count, fill = emotion)) +
  geom_bar(stat = "identity") +
  ggtitle("Sentiments and Emotions about VAR in EPL")
var_plot

Findings from sentiment analysis

What are the sentiments about VAR and what kinds of emotions are exhibited?

Premier League fans have an overall negative sentiment about how VAR is being used: the negative sentiment count is higher than the positive sentiment count. In terms of emotions, negative ones like sadness, anger and fear dominate over emotions like joy and surprise. The negative emotions may be particularly high this Gameweek because there were many controversial decisions in a few of the matches, particularly Liverpool vs Manchester City and Tottenham Hotspur vs Sheffield United.
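The comparison of negative and positive counts can be made explicit with a small summary. The sketch below is only an illustration: `emotions_example` is an invented stand-in for the data frame returned by get_nrc_sentiment, and `summarise_polarity` is a hypothetical helper. It reports the two totals and the share of tweets whose negative score exceeds their positive score (and vice versa).

```r
# Invented stand-in for the output of get_nrc_sentiment(); only the two
# sentiment columns are needed here.
emotions_example <- data.frame(negative = c(2, 0, 1, 3),
                               positive = c(0, 1, 1, 0))

# Totals per sentiment and share of tweets leaning each way.
summarise_polarity <- function(emotions) {
  net <- emotions$positive - emotions$negative
  c(negative_total     = sum(emotions$negative),
    positive_total     = sum(emotions$positive),
    share_net_negative = mean(net < 0),
    share_net_positive = mean(net > 0))
}

summarise_polarity(emotions_example)
```

Looking at shares as well as totals helps check that the overall result is not driven by a handful of tweets with very high scores.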

What are the common words used by Premier League fans when tweeting about VAR?

I used the wordcloud package to understand which words contributed to the different types of emotions. The comparison.cloud function compares how frequently words are used across categories. In this case, I compared word frequencies across the different emotions.

# Create comparison word cloud visualization:
# one combined document per emotion, built from the tweets that score on it
wordcloud_tweet = c(
  paste(tweetsVAR.df$text[emotions$anger > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$anticipation > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$disgust > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$fear > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$joy > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$sadness > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$surprise > 0], collapse = " "),
  paste(tweetsVAR.df$text[emotions$trust > 0], collapse = " ")
)

# Create corpus
corpus = Corpus(VectorSource(wordcloud_tweet))

# Convert to lower case, remove punctuation and stop words, then stem
corpus = tm_map(corpus, content_transformer(tolower))
corpus = tm_map(corpus, removePunctuation)
corpus = tm_map(corpus, removeWords, c(stopwords("english")))
corpus = tm_map(corpus, stemDocument)  # requires the SnowballC package

# Create the term-document matrix and convert it to a plain matrix
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)

# Keep only terms shorter than 11 characters
tdmnew <- tdm[nchar(rownames(tdm)) < 11, ]

# Name the columns after the emotions
colnames(tdmnew) <- c('anger', 'anticipation', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'trust')
par(mar = rep(0, 4))
comparison.cloud(tdmnew, random.order = FALSE,
                 colors = c("#00B2FF", "red", "#FF0099", "#6600CC", "green", "orange", "blue", "brown"),
                 title.size = 1, max.words = 250, scale = c(2.5, 0.4), rot.per = 0.4)

When I first ran the code, I found that ‘penalty’ was classified under the fear emotion and ‘football’ under joy. For sentiment analysis of general text this makes sense: a penalty usually means a fine or punishment, and football is a sport, which can be associated with joy. But for this specific analysis these classifications don’t make sense, so I removed these words from the tweets using the gsub function.
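A quick way to inspect how a single word is categorised, before deciding whether to remove it, is to score the word on its own. The sketch below assumes the syuzhet package is installed. The helper name `word_emotions` is hypothetical, and it simply lists the NRC columns with a non-zero score for the given word.

```r
library(syuzhet)

# Return the names of the NRC categories that a single word scores on.
word_emotions <- function(word) {
  scores <- get_nrc_sentiment(word)       # one row, ten columns
  names(scores)[unlist(scores[1, ]) > 0]
}

word_emotions("penalty")
word_emotions("football")
```

Running this for other football terms that appear prominently in the word cloud is one way to spot further domain words that might distort the emotion counts.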

Is there any difference in fans’ reactions to VAR based on the club they support?

I hypothesized that Premier League fans’ reactions to VAR depend on their club’s performance in that particular Gameweek and on whether the use of VAR produced a positive or negative outcome for their team. Hence, I created separate data frames for four clubs (Liverpool, Manchester United, Arsenal and Tottenham Hotspur), extracting tweets using ‘VAR’ and the respective club’s official Twitter ID as keywords. Then I ran the sentiment analysis for each of the four clubs.
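The per-club code is not shown in the post, so the following is only a sketch of how the queries and the comparison could be organised. The club handles are listed as assumptions and should be verified, the helper names are hypothetical, and the commented-out search call reuses n = 10000 from the earlier search purely as a placeholder.

```r
# Official club handles, listed here as assumptions; verify before use.
club_handles <- c(Liverpool           = "@LFC",
                  `Manchester United` = "@ManUtd",
                  Arsenal             = "@Arsenal",
                  `Tottenham Hotspur` = "@SpursOfficial")

# One search string per club, e.g. "VAR @LFC".
build_club_queries <- function(handles, keyword = "VAR") {
  queries <- paste(keyword, handles)
  names(queries) <- names(handles)
  queries
}
club_queries <- build_club_queries(club_handles)

# With an authenticated session, each query could be fetched like this:
# club_tweets <- lapply(club_queries, function(q)
#   twListToDF(searchTwitter(q, n = 10000, lang = "en")))

# Net sentiment (positive minus negative) per club, given a named list of
# data frames shaped like the output of get_nrc_sentiment().
net_sentiment_by_club <- function(emotions_by_club) {
  vapply(emotions_by_club,
         function(e) sum(e$positive) - sum(e$negative),
         numeric(1))
}
```

A positive value from net_sentiment_by_club means the positive count is higher than the negative count for that club's tweets, and a negative value means the opposite.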

As hypothesized, the overall sentiment about VAR was positive among Liverpool and Manchester United fans, whose teams won their matches in this particular Gameweek, while it was negative among Arsenal and Tottenham fans, whose teams failed to win theirs.

Before being introduced in the Premier League this season, VAR was used at the 2018 FIFA World Cup and in European club competitions. It proved crucial on many occasions, helping referees make correct decisions. In the Champions League quarter-final in April 2019, VAR ruled out a last-minute goal from Raheem Sterling that would have put Manchester City through. The goal was disallowed for offside and Tottenham progressed instead. Surprisingly, Premier League referees have not once used the pitchside monitors for assistance. At the recent shareholders’ meeting, the Premier League and PGMOL committed to improving the implementation of VAR and to regular use of pitchside monitors by referees, which is expected to reduce the controversies around VAR and change the sentiment around it.