
Analysis of Uganda’s Social Media Data Regarding the 2021 General Presidential Elections

Understanding Uganda's Presidential Nomination Week through the eyes of Social Media

Photo by Merakist on Unsplash

The first week of November 2020 was an eventful one for Uganda as the country held its presidential nominations. A record 11 candidates were nominated, including incumbent President Yoweri Kaguta Museveni, 25-year-old John Katumba, and pop star Robert Kyagulanyi (aka Bobi Wine). As expected, the nominations generated a lot of buzz on social media platforms such as Twitter and Facebook. We extracted this data and applied text analytics to answer questions such as the following:

  • What is the sentiment attached to the social media posts about a particular candidate?
  • How has that score changed over time?
  • What emotions are attached to social media posts about a candidate?
  • What are the most popular words appearing in posts about some of the candidates?
  • Who is publishing these posts/tweets and where are they located?
  • Which accounts have published the most tweets about a particular candidate?
  • Are any of these accounts (influencers) bots?

This report is part of a series published by [The Finder’s Lab](https://twitter.com/TheFindersLab), an initiative that uses data science techniques to understand citizens' feedback on social media about areas such as service delivery and policy decisions. At The Finder’s Lab, we know that the internet, through platforms like social media, promotes interaction among politicians, bureaucrats, and citizens, and that social media users' comments can be used to extract meaningful information that supports policymakers throughout the policy cycle.

To answer these questions, the Twitter API was used to extract English tweets mentioning popular presidential hashtags, party names, candidates' names, and other popular phrases. A total of 83,055 tweets were collected for the nomination week, and each was labeled by candidate. To make sure relevant tweets were captured, trending topics and keywords in the East African region were also included in the search queries.

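library(rtweet)
# Get trending topics for the Twitter trend location closest to these coordinates (central Uganda)
whatstrending <- get_trends(lat = 1.3707295, lng = 32.3032414)
View(whatstrending)  # interactive viewer; use head(whatstrending) in a script
# Sample basic search query: English-language original tweets (retweets excluded).
# n defaults to 100, so a larger n and retryonratelimit = TRUE are needed
# to collect tens of thousands of tweets across rate-limit windows
CandidateA <- search_tweets("Museveni OR M7", n = 18000, include_rts = FALSE,
                            retryonratelimit = TRUE, lang = "en", since = "2020-11-01")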

The data was then cleaned to remove tweets that matched a keyword or letter combination in the search query but were not directly linked to any of the candidates.
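A minimal sketch of this cleaning step, assuming one hand-written regular expression per candidate: tweets that match no candidate pattern are dropped, and the rest are labeled with every candidate they mention. The patterns, the `tweets` data frame, and its contents are hypothetical illustrations, not the queries or data used in this analysis.

```r
# Hypothetical candidate patterns; word boundaries (\\b) stop "M7" from
# matching inside longer tokens such as "BM7X"
patterns <- c(
  Museveni   = "\\b(museveni|m7)\\b",
  Kyagulanyi = "\\b(kyagulanyi|bobi ?wine|hebobiwine)\\b"
)

# Hypothetical raw search results
tweets <- data.frame(
  text = c("Museveni nominated today",
           "HEBobiWine addresses supporters",
           "New phone model BM7X released",
           "M7 and Bobi Wine both nominated"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per tweet, one column per candidate
matches <- sapply(patterns, grepl, x = tolower(tweets$text), perl = TRUE)

# Keep only tweets that mention at least one candidate
keep <- rowSums(matches) > 0
clean <- tweets[keep, , drop = FALSE]

# Label each kept tweet with every candidate it mentions
clean$candidates <- apply(matches[keep, , drop = FALSE], 1,
                          function(row) paste(names(patterns)[row], collapse = ";"))
print(clean)
```

The phone-model tweet is dropped because "m7" only appears inside a longer token, which is the kind of false positive this cleaning step targets.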

In terms of tweet volume, Robert Kyagulanyi and Yoweri Kaguta Museveni dominated, accounting for 43.7% and 42.8% of the tweets respectively, while the other nine candidates together accounted for the remaining 13.5%.
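These percentages are simple shares of the labeled tweets. A sketch of the calculation in base R, assuming a hypothetical vector with one candidate label per tweet (the labels and counts are placeholders, not the article's data):

```r
# Hypothetical candidate labels, one per tweet
candidate <- c("Candidate A", "Candidate B", "Candidate A", "Candidate B",
               "Candidate C", "Candidate A", "Candidate B", "Candidate D")

# Count tweets per candidate and convert the counts to percentages
shares <- round(100 * prop.table(table(candidate)), 1)
shares <- sort(shares, decreasing = TRUE)
print(shares)

# Combined share of all candidates outside the top two
other_share <- 100 - sum(shares[1:2])
other_share
```

The last step mirrors how the remaining share for the other candidates follows from the top two shares.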

The collected tweets were then analyzed with a lexicon-based approach to gauge public sentiment, a technique known as sentiment analysis. Sentiment analysis is the process of extracting the opinion or attitude expressed in a piece of text, traditionally toward a product, service, or brand, and here toward a candidate. In a lexicon-based approach, words are looked up in a dictionary of scored words: a sentence such as "I love @CandidateA" yields a positive sentiment for that candidate, while "I hate @CandidateA" yields a negative one. The sentiment scores of all tweets about a particular candidate were then averaged over a given period, such as a day. On average, every candidate received positive sentiment, and the overall average sentiment score was 0.08.

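library(sentimentr)
# All_candidates: data frame of cleaned tweets with a text column
# Split tweets into sentences, score each sentence against sentimentr's lexicon,
# and aggregate the sentence scores per tweet (column ave_sentiment)
Average_sentiment <- sentiment_by(get_sentences(All_candidates$text))
# Overall average sentiment across all tweets
mean(Average_sentiment$ave_sentiment)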
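`sentiment_by()` performs the lexicon lookup internally. To show the idea behind a lexicon-based score, here is a toy version with a hypothetical six-word lexicon; the weights are made up for illustration and are not sentimentr's.

```r
# Toy sentiment lexicon (hypothetical weights, for illustration only)
lexicon <- c(love = 1, great = 0.75, support = 0.5,
             hate = -1, bad = -0.75, corrupt = -0.5)

score_tweet <- function(text) {
  # Lower-case, drop punctuation (keeping @handles and #tags), split on whitespace
  cleaned <- gsub("[^a-z0-9@# ]", "", tolower(text))
  words <- strsplit(cleaned, "\\s+")[[1]]
  # Sum the weights of words found in the lexicon; unknown words contribute 0
  sum(lexicon[words[words %in% names(lexicon)]])
}

score_tweet("I love @CandidateA")  # returns 1
score_tweet("I hate @CandidateA")  # returns -1
```

Unlike this toy, sentimentr also accounts for valence shifters such as negators ("not good") and amplifiers ("very good"), so its scores will differ.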

Daily sentiment scores fluctuated for several candidates, turning negative on some days and positive on others. On day 1 of the nominations, Joseph Kabuleta had the highest sentiment score and Norbert Mao the lowest. On day 2, Nancy Kalembe had the highest score and Patrick Amuriat the lowest. The graph below shows the sentiment scores for each day.
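The daily comparison can be sketched in base R, assuming per-tweet scores are already available (for example, the `ave_sentiment` column returned by `sentiment_by()`) together with a candidate label and a nomination day. The candidates and scores below are hypothetical placeholders.

```r
# Hypothetical per-tweet sentiment scores
scores <- data.frame(
  candidate = c("Candidate A", "Candidate A", "Candidate B", "Candidate B",
                "Candidate C", "Candidate D"),
  day       = c(1, 1, 1, 1, 2, 2),
  sentiment = c(0.40, 0.20, -0.10, 0.05, 0.30, -0.20),
  stringsAsFactors = FALSE
)

# Average sentiment per candidate per day
daily <- aggregate(sentiment ~ candidate + day, data = scores, FUN = mean)

# For each day, pick the candidates with the highest and lowest average
extremes <- do.call(rbind, lapply(split(daily, daily$day), function(d) {
  data.frame(day     = d$day[1],
             highest = as.character(d$candidate[which.max(d$sentiment)]),
             lowest  = as.character(d$candidate[which.min(d$sentiment)]),
             stringsAsFactors = FALSE)
}))
print(extremes)
```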

By tweet volume, the most popular candidates were Yoweri Kaguta Museveni and Robert Kyagulanyi. Exploring the content of the tweets about each of them revealed the following most frequent words:

Note: To explore the words in the tweets, the tweets were compiled into a grouped corpus and cleaned to remove stop words, numbers, and other unnecessary words, and the resulting word frequencies were plotted.
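A minimal sketch of the cleaning and counting described in the note, using plain base R in place of a corpus object and assuming a hand-picked stop-word list (packages such as tm ship fuller lists, e.g. `tm::stopwords("english")`). The tweets are hypothetical examples.

```r
# Hypothetical tweets about one candidate
tweets <- c("Kaguta Museveni nominated for 2021 elections",
            "Museveni and Bobi Wine nominated today",
            "The nomination of Museveni is complete")

# Small hand-picked stop-word list (an assumption for this sketch)
stop_words <- c("the", "and", "for", "of", "is", "a", "to")

# Tokenize on whitespace, then strip digits and punctuation
words <- unlist(strsplit(tolower(tweets), "\\s+"))
words <- gsub("[^a-z@#]", "", words)
# Drop tokens that became empty (e.g. "2021") and stop words
words <- words[nchar(words) > 0 & !words %in% stop_words]

# Word frequencies, most common first, and a bar chart of the top five
freq <- sort(table(words), decreasing = TRUE)
head(freq, 5)
barplot(head(freq, 5), las = 2)
```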

Image by Author

Image by Author

In both cases, the candidate's own name or nickname was the most frequent term: "Kaguta Museveni" (counted as one combined term) for posts about Yoweri Kaguta Museveni, and "HEBobiWine", which is also his Twitter handle, for posts mentioning Robert Kyagulanyi. The charts also show that many tweets mentioning one of these candidates also mentioned the other, which helps explain why the two candidates had similar tweet volumes and sentiment scores.
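The overlap between the two candidates can be quantified directly. A sketch, assuming the hypothetical regular expressions below identify mentions of each candidate; it reports the share of tweets about either candidate that also mention the other. The tweets are hypothetical examples.

```r
# Hypothetical tweets
tweets <- c("Museveni and Bobi Wine both nominated",
            "HEBobiWine addresses supporters",
            "M7 nominated, Kyagulanyi next",
            "Museveni campaign launch")

# Assumed mention patterns for each candidate
mentions_m <- grepl("\\b(museveni|m7)\\b", tolower(tweets), perl = TRUE)
mentions_k <- grepl("\\b(kyagulanyi|bobi ?wine|hebobiwine)\\b", tolower(tweets), perl = TRUE)

# Share of tweets about either candidate that mention both
both   <- mentions_m & mentions_k
either <- mentions_m | mentions_k
co_mention_share <- sum(both) / sum(either)
co_mention_share  # 0.5
```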

To find the top locations of tweet authors, base R's table() function was applied to the location column. Most accounts had no location listed. Among those that did, most tweets came from Uganda, Kenya, Tanzania, Rwanda, the United States, England, and the United Arab Emirates. Within Uganda, most tweets came from Kampala, Wakiso, Jinja, and Mukono.
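Because the location field is free text, empty strings and missing values both mean "no location". A minimal sketch of the tabulation, assuming a hypothetical `location` vector, that counts both as missing:

```r
# Hypothetical location strings as users typed them
location <- c("", "Kampala, Uganda", "Nairobi, Kenya", NA, "",
              "Kampala, Uganda", "Uganda")

# Treat empty strings as missing so they are counted together with NA
location[which(location == "")] <- NA

# useNA = "ifany" keeps the missing count visible in the table
loc_counts <- sort(table(location, useNA = "ifany"), decreasing = TRUE)
print(loc_counts)
```

Entries such as "Kampala, Uganda" and "Uganda" are counted separately, so some normalization (for example, mapping towns to countries) is needed before tallying by country.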

To check for bot activity, the most active accounts (those with the largest number of tweets) were run through Botometer, a machine learning algorithm that compares an account's activity with tens of thousands of previously labeled accounts and returns a bot score. A low score indicates that the account is more likely human, while a high score indicates bot-like behavior. The Botometer documentation describes in more detail how it works.
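Selecting the most active accounts amounts to counting tweets per author. A sketch, assuming a hypothetical vector of author handles with one entry per tweet (in rtweet output this would typically come from the `screen_name` column, though column layouts differ between rtweet versions):

```r
# Hypothetical author handles, one entry per tweet
screen_name <- c("acct_a", "acct_b", "acct_a", "acct_c", "acct_a",
                 "acct_b", "acct_d", "acct_e", "acct_f", "acct_g")

# Count tweets per account and keep the six most active
top_accounts <- head(sort(table(screen_name), decreasing = TRUE), 6)
names(top_accounts)
```

The resulting handles are what would then be submitted to Botometer; that API call is not shown here.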

The most active accounts posting about Yoweri Kaguta Museveni were "@ugnews24", "@begumasz", "@jordanshirumat2", "@Jesssie_M7", "@nbstv", and "@kamukamafredie". Botometer returned the following scores for them:

Image by Author

Of these, "@_JessieM7" showed the most bot-like characteristics, scoring 4.1 out of 5 (82%) on the bot scale.

Among the accounts posting most about Robert Kyagulanyi, the top six (the "influencers") were "@begumasz", "@ugnews24", "@LeCrownedPrince", "@ssebunyashaf", "@PromoterTymz", and "@ghettoradioug". Botometer returned the following scores for them:

Image by Author

One of these accounts, "@ghettoradioug", scored 5 out of 5 (100%) on the bot scale.

Another bot-detection machine learning algorithm that can be used directly in R is tweetbotornot. To use it, the "tweetbotornot" package is installed and loaded in R.

Further steps

This data will be continuously extracted and compiled into a large dataset of all tweets and posts made during the election period. Further analysis will use other techniques, such as network analysis, to understand how the various accounts are connected.
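As a first step toward that kind of network analysis, mentions can be turned into an edge list that links each author to the accounts they mention. A sketch with hypothetical tweets; the resulting data frame could then be passed to a graph package such as igraph (for example via `igraph::graph_from_data_frame()`).

```r
# Hypothetical tweets: author handle and text
tweets <- data.frame(
  author = c("acct_a", "acct_b", "acct_c"),
  text   = c("Agree with @acct_b and @acct_c", "Thanks @acct_a", "No mentions here"),
  stringsAsFactors = FALSE
)

# Extract every @handle from each tweet (character(0) when there are none)
mentioned <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text, perl = TRUE))

# One row per (author -> mentioned account) pair: the mention network's edge list
edges <- data.frame(
  from = rep(tweets$author, lengths(mentioned)),
  to   = sub("^@", "", unlist(mentioned)),
  stringsAsFactors = FALSE
)
print(edges)
```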

Conclusion

Many tweets were left out of this analysis because of their language: only English tweets were compiled, mainly because the currently available sentiment-scoring tools and algorithms are built for English. Moreover, although sentiment analysis is a useful tool for extracting insights from text, opinions extracted from social media platforms may not represent those of the general population, and the technique does not recognize sarcasm.

