Using Twitter Data to Determine Top African Influencers to Drive a Marketing Strategy.

Introduction
Twitter is one of the most popular social media platforms. It is a constantly updating stream of information in which trends, likes, hobbies, communities, and news can be studied. Among its millions of users, a small percentage are influencers, both political and social. These small groups can be identified, studied, and used for campaigns and marketing strategies. The aim of this research is to identify and rank the top influencers and government officials in Africa using Python.
To achieve this, we use three metric scores:
- Popularity score – likes and retweets
- Reach score – in-degree influence
- Relevance score – mentions and reply counts
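As a rough illustration of how the three scores might be computed for one account, the sketch below assumes simple definitions that this article does not spell out: popularity as likes plus retweets, reach as the follower count (in-degree), and relevance as mentions plus replies. The `Account` class, its field names, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Raw counts for one Twitter account (hypothetical field names)."""
    screen_name: str
    likes: int
    retweets: int
    followers: int
    mentions: int
    replies: int


def popularity_score(acc: Account) -> int:
    # Assumed definition: total likes plus total retweets.
    return acc.likes + acc.retweets


def reach_score(acc: Account) -> int:
    # Assumed definition: in-degree, i.e. the number of followers.
    return acc.followers


def relevance_score(acc: Account) -> int:
    # Assumed definition: mentions received plus replies received.
    return acc.mentions + acc.replies


# Made-up numbers, for illustration only.
sample = Account("example_user", likes=1200, retweets=300,
                 followers=50000, mentions=80, replies=45)
scores = {
    "popularity": popularity_score(sample),
    "reach": reach_score(sample),
    "relevance": relevance_score(sample),
}
print(scores)
```

Keeping each score in its own function makes it easy to swap in a different definition, such as a per-tweet average, without touching the rest of the analysis.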
Data Collection
We start by collecting the Twitter handles from:
- https://africafreak.com/100-most-influential-twitter-users-in-africa
- https://enitiate.solutions/top-18-african-heads-of-states-on-twitter/.
A web scraping technique is used to acquire the needed Twitter handles. Tweepy is then used to extract the user information needed for the analysis.
Python Libraries
from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
import pandas as pd
import re
import numpy as np
import os, sys
import fire
import tweepy
Web Scraping
We scrape the listed websites to acquire the Twitter handles of top African personalities and government officials.
def simple_get(url):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    If the content-type of response is some kind of HTML/XML, return the
    raw content, otherwise return None.
    """
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            else:
                return None
    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None
def is_good_response(resp):
    """
    Returns True if the response seems to be HTML, False otherwise.
    """
    # .get() avoids a KeyError when the Content-Type header is missing
    content_type = resp.headers.get('Content-Type', '').lower()
    return resp.status_code == 200 and 'html' in content_type
def log_error(e):
    """
    It is always a good idea to log errors.
    This function just prints them, but you can
    make it do anything.
    """
    print(e)
def get_elements(url, tag='', search=None, fname=None):
    """
    Downloads a page specified by the url parameter
    and returns a list of strings, one per tag element
    """
    if isinstance(url, str):
        response = simple_get(url)
    else:
        # if it is already a loaded html page
        response = url
    res = []
    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        if tag:
            for li in html.select(tag):
                # one entry per non-empty line of the element's text
                for name in li.text.split('\n'):
                    name = name.strip()
                    if name:
                        res.append(name)
        if search:
            soup = html
            r = ''
            if 'find' in search:
                print('finding', search['find'])
                soup = soup.find(**search['find'])
                r = soup
    return res

if __name__ == '__main__':
    # expose get_elements as a command-line tool
    fire.Fire(get_elements)
Below is a sample output after passing one of the listed websites:
['@EswatiniGovern1',
'@MalawiGovt',
'@hagegeingob',
'@FinanceSC',
'@PresidencyZA',
'@mohzambia',
'@edmnangagwa',
'@MinSantedj',
'@hawelti',
'@StateHouseKenya',
'@PaulKagame',
'@M_Farmaajo',
'@SouthSudanGov',
'@SudanPMHamdok',
'@TZSpokesperson',
'@KagutaMuseveni']
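Scraped list items often carry extra text around the handle. As a sketch, a regular expression can keep only the handle-like tokens. The pattern below assumes Twitter's usual rule of 1 to 15 letters, digits, or underscores after the `@`, and `extract_handles` is a hypothetical helper, not part of the original code.

```python
import re

# Assumed handle format: '@' followed by 1-15 word characters.
# The lookbehind skips e-mail addresses such as press@example.org.
HANDLE_RE = re.compile(r"(?<!\w)@(\w{1,15})\b")


def extract_handles(lines):
    """Return unique '@handle' strings in first-seen order."""
    seen = set()
    handles = []
    for line in lines:
        for match in HANDLE_RE.finditer(line):
            handle = "@" + match.group(1)
            key = handle.lower()  # treat handles as case-insensitive
            if key not in seen:
                seen.add(key)
                handles.append(handle)
    return handles


# Made-up scraped lines, for illustration only.
scraped = [
    "1. Paul Kagame (@PaulKagame), Rwanda",
    "State House Kenya @StateHouseKenya",
    "contact: press@example.org",
    "@paulkagame again",
]
handles = extract_handles(scraped)
print(handles)
```

Deduplicating at this stage avoids requesting the same account twice from the Twitter API later on.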
Using the Tweepy API, we pass the list of handles acquired from the scraped websites to retrieve each user's information from Twitter. For each account we need the screen_name, number_of_tweets, following, followers, likes, retweets, hashtags, and mentions.
Below is sample code:
from datetime import datetime, timedelta
from tweepy import Cursor

# `api` is assumed to be an authenticated tweepy.API instance
df = pd.DataFrame(columns=['screen_name', 'description', 'number_of_tweets', 'following', 'followers',
                           'likes', 'retweets', 'hashtags', 'mentions'])

def get_data(account_list):
    for target in account_list:
        # scraped handles carry a leading '@', screen names do not
        handle = target.lstrip('@')
        item = api.get_user(screen_name=handle)
        name = item.name
        screen_name = item.screen_name
        description = item.description
        number_of_tweets = item.statuses_count
        following = item.friends_count
        followers = item.followers_count
        # age of account (tzinfo dropped so naive and aware datetimes both work)
        account_created_date = item.created_at.replace(tzinfo=None)
        delta = datetime.utcnow() - account_created_date
        account_age_days = delta.days
        if account_age_days > 0:
            avg_tweets_per_day = float(number_of_tweets) / float(account_age_days)
        # tweets (hashtags and mentions)
        global hashtags, mentions, replies, comments  # global so they can be integrated into the df later
        hashtags = []
        mentions = []
        comments = []
        retweet_count = []
        likes_count = []
        replies = []
        tweet_count = 0
        end_date = datetime.utcnow() - timedelta(days=180)
        for status in Cursor(api.user_timeline, screen_name=handle, include_rts=False).items():
            tweet_count += 1
            if not hasattr(status, "entities"):
                continue
            entities = status.entities
            process_status(status)  # helper defined in the full code (not shown here)
            # hashtags
            for ent in entities.get("hashtags", []):
                if ent and ent.get("text") is not None:
                    hashtags.append(ent["text"])
            # mentions (will fetch other users but will later use to do mention counts between the involved users)
            for ent in entities.get("user_mentions", []):
                mentioned = ent.get("screen_name") if ent else None
                if mentioned is not None and mentioned == handle:
                    mentions.append(mentioned)
A data frame is returned as the output. Below is a sample of the output.

Analysis
The analysis follows the three metrics to determine the top influencers in Africa. Matplotlib is used to plot graphs and visualize the analyzed data. Below is a bar graph of the popularity scores of the top African government officials.

A bar graph of top African personalities based on the in-degree influence.
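Before plotting, the accounts have to be ranked by the chosen score. A minimal sketch of that step, using made-up scores and a hypothetical `top_n` helper that is not part of the original code:

```python
def top_n(scores, n=10):
    """Return the n highest-scoring (name, score) pairs, best first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Made-up reach scores, for illustration only.
reach = {"user_a": 1200, "user_b": 54000, "user_c": 830, "user_d": 9100}
ranking = top_n(reach, n=3)
for rank, (name, score) in enumerate(ranking, start=1):
    print(rank, name, score)
```

The resulting pairs can then be passed to a Matplotlib bar chart, with account names on one axis and scores on the other.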

Conclusions
Based on the analysis shown in the plots above, the most influential African personalities are:
- Trevor Noah
- Julius S Malema
From the analysis, the most influential African government officials are:
- M Buhari
- Kaguta Museveni
Digital marketing is the new norm for marketing strategies. Partnering with the right influencers bridges the gap between a product and its target market. Data science can be used in businesses to drive key marketing decisions.
References
[1] Cha, M., Haddadi, H., Benevenuto, F., & Gummadi, K. (2010). Measuring User Influence in Twitter: The Million Follower Fallacy [Ebook]. Retrieved from http://twitter.mpi-sws.org/icwsm2010_fallacy.pdf
Full code available at the GitHub repository.