African Influencers: Twitter Users’ segmentation

Using Twitter Data to Determine Top African Influencers to Drive a Marketing Strategy.

Photo by Luke Chesser on Unsplash

Introduction

Twitter is one of the most popular social media platforms. It is a pool of constantly updating information streams in which trends, interests, hobbies, communities, and news can be studied. Among its millions of users is a small percentage of politically and socially influential accounts. These small groups can be identified, studied, and engaged for campaigns and marketing strategies. The aim of this research is to identify and rank the top influencers and government officials in Africa using Python.

To achieve this we’ll use three metric scores [1]:

  • Popularity score – likes and retweets
  • Reach score – in-degree influence (followers)
  • Relevance score – mentions and reply counts
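As a rough sketch of how these scores could be derived from the collected columns (the exact formulas used in the analysis may differ, and the sample figures below are hypothetical):

```python
import pandas as pd

# Hypothetical sample of collected account metrics
df = pd.DataFrame({
    'screen_name': ['@PaulKagame', '@KagutaMuseveni', '@PresidencyZA'],
    'followers':   [2_000_000, 900_000, 1_500_000],
    'likes':       [50_000, 20_000, 30_000],
    'retweets':    [40_000, 15_000, 25_000],
    'mentions':    [12_000, 8_000, 10_000],
    'replies':     [3_000, 2_000, 2_500],
})

# Popularity: engagement through likes and retweets
df['popularity_score'] = df['likes'] + df['retweets']
# Reach: in-degree influence, i.e. follower count
df['reach_score'] = df['followers']
# Relevance: how often others mention or reply to the account
df['relevance_score'] = df['mentions'] + df['replies']

ranked = df.sort_values('popularity_score', ascending=False)
print(ranked[['screen_name', 'popularity_score']])
```

Each score can then be normalised or weighted before producing a combined ranking.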

Data Collection

We start by collecting the Twitter handles from the listed websites.

A web-scraping technique is used to acquire the needed Twitter handles. Tweepy is then used to extract the user information needed for the analysis.

Python Libraries

from requests import get
from requests.exceptions import RequestException
from contextlib import closing
from bs4 import BeautifulSoup
import pandas as pd
import re
import numpy as np
import os, sys
import fire
import tweepy

Web Scraping

We scrape the listed websites to acquire the Twitter handles of top African personalities and government officials.

def simple_get(url):
    """
    Attempts to get the content at `url` by making an HTTP GET request.
    If the content-type of response is some kind of HTML/XML, return the
    text content, otherwise return None.
    """
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content  #.encode(BeautifulSoup.original_encoding)
            else:
                return None

    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None

def is_good_response(resp):
    """
    Returns True if the response seems to be HTML, False otherwise.
    """
    content_type = resp.headers['Content-Type'].lower()
    return (resp.status_code == 200 
            and content_type is not None 
            and content_type.find('html') > -1)

def log_error(e):
    """
    It is always a good idea to log errors. 
    This function just prints them, but you can
    make it do anything.
    """
    print(e)

def get_elements(url, tag='', search={}, fname=None):
    """
    Downloads a page specified by the url parameter
    and returns a list of strings, one per tag element
    """

    if isinstance(url, str):
        response = simple_get(url)
    else:
        # if it is already a loaded html page
        response = url

    if response is not None:
        html = BeautifulSoup(response, 'html.parser')

        res = []
        if tag:
            for li in html.select(tag):
                for name in li.text.split('\n'):
                    if len(name) > 0:
                        res.append(name.strip())

        if search:
            soup = html

            r = ''
            if 'find' in search.keys():
                print('finding', search['find'])
                soup = soup.find(**search['find'])
                r = soup

        return res

if __name__ == '__main__':
    fire.Fire(get_elements)

Below is a sample output after passing one of the listed websites:

['@EswatiniGovern1',
 '@MalawiGovt',
 '@hagegeingob',
 '@FinanceSC',
 '@PresidencyZA',
 '@mohzambia',
 '@edmnangagwa',
 '@MinSantedj',
 '@hawelti',
 '@StateHouseKenya',
 '@PaulKagame',
 '@M_Farmaajo',
 '@SouthSudanGov',
 '@SudanPMHamdok',
 '@TZSpokesperson',
 '@KagutaMuseveni']
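Before passing these handles to Tweepy, the leading `@` should be stripped, since `api.get_user` expects a bare screen name. A small hypothetical helper:

```python
def clean_handles(handles):
    # Tweepy's get_user expects a bare screen name without the '@';
    # keep only entries that actually look like handles.
    return [h.lstrip('@').strip() for h in handles if h.startswith('@')]

sample = ['@EswatiniGovern1', '@MalawiGovt', '@PaulKagame']
print(clean_handles(sample))  # → ['EswatiniGovern1', 'MalawiGovt', 'PaulKagame']
```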

Using the Tweepy API, we can pass the list acquired from the scraped websites and collect the user information from Twitter. For our case, we need each account’s screen_name, number_of_tweets, following, followers, likes, retweets, hashtags, and mentions.

Below is a sample code:

from datetime import datetime, timedelta
from tweepy import Cursor

df = pd.DataFrame(columns=['screen_name', 'description', 'number_of_tweets', 'following',
                           'followers', 'likes', 'retweets', 'hashtags', 'mentions'])

def get_data(account_list):
    for target in account_list:
        item = api.get_user(target)
        name = item.name
        screen_name = item.screen_name
        description = item.description
        number_of_tweets = item.statuses_count
        following = item.friends_count
        followers = item.followers_count
        # age of the account
        account_created_date = item.created_at
        delta = datetime.utcnow() - account_created_date
        account_age_days = delta.days
        if account_age_days > 0:
            avg_tweets_per_day = float(number_of_tweets) / float(account_age_days)
        # tweets (hashtags and mentions)
        global hashtags, mentions, replies, comments  # made global to integrate into the df later
        hashtags = []
        mentions = []
        comments = []
        retweet_count = []
        likes_count = []
        replies = []
        tweet_count = 0
        end_date = datetime.utcnow() - timedelta(days=180)
        for status in Cursor(api.user_timeline, id=target, include_rts=False).items():
            tweet_count += 1
            if not hasattr(status, "entities"):
                continue
            entities = status.entities
            # hashtags
            if "hashtags" in entities:
                for ent in entities["hashtags"]:
                    if ent is not None and "text" in ent:
                        hashtag = ent["text"]
                        if hashtag is not None:
                            hashtags.append(hashtag)
            # mentions (other users are fetched here, but only mentions of the
            # target account are kept for the mention counts)
            if "user_mentions" in entities:
                for ent in entities["user_mentions"]:
                    if ent is not None and "screen_name" in ent:
                        name = ent["screen_name"]
                        if name == target:
                            mentions.append(name)

The function returns a data frame. Below is a sample of the output.

Analysis

The analysis is done in line with the metrics to determine the top influencers in Africa. Matplotlib is used to plot graphs and visualize the analyzed data. Below is a bar graph of the top African government officials’ popularity scores.

A bar graph of top African personalities based on in-degree influence.

Conclusions

Based on the analysis as indicated by the above plots, the top African influential personalities are:

  • Trevor Noah
  • Julius S Malema

From the analysis, the top African influential government officials are:

  • M Buhari
  • Kaguta Museveni

Digital marketing is the new norm for marketing strategies. Partnering with the right influencers bridges the gap between commodities and the right market. Data science can be used in business to drive key marketing decisions.

References

[1] Cha, M., Haddadi, H., Benevenuto, F., & Gummadi, K. (2010). Measuring User Influence in Twitter: The Million Follower Fallacy [Ebook]. Retrieved from http://twitter.mpi-sws.org/icwsm2010_fallacy.pdf

Full code available at the GitHub repository.
