
Faced with growing public and political pressure, social media giants like Facebook and Twitter have been forced to step up their crackdown on state-backed disinformation campaigns on their platforms.
Facebook has taken action against offending accounts originating in Iran and Israel, while Twitter has released millions of tweets identified as the work of state-backed operators in Russia, Iran, Venezuela and Bangladesh.
What remains unclear, however, is the process by which these accounts and social media posts are identified. How might one go about unmasking such operators on social media, say, in Singapore?
I tackled this question for my final project at General Assembly (Singapore), where I was enrolled in a 12-week Data Science bootcamp. Using a combination of data science techniques, I believe the following three steps can be used to help uncover the work of state operators on social media:
- Step 1: Establish the trolls’ "digital fingerprint" – Exploratory data analysis and visualisation can be used effectively to identify the state operator’s digital trail and modus operandi.
- Step 2: Build a machine learning model and web app to classify the suspicious tweets/posts more efficiently.
- Step 3: Use SHAP to gain granular insights into where the model’s predictions have gone right, or wrong. Insights from this step can be fed back to Step 1, creating a virtuous cycle in the investigation.
UPDATE – SEPT 2020
I’ve uploaded a new repo on how you can use a fine-tuned transformer model to detect state troll tweets. Detailed notebooks and data can be found here. A broad overview of the steps and tests involved can be found in this Medium post.
FIRST THINGS FIRST
My project is focused on Twitter, though I think the approach can be broadly applied to Facebook or Instagram posts. This post is a broad summary of the results, and I won’t delve into the code.
My repo for the project can be found here: http://bit.ly/StateTrolls (best to clone or download the notebooks; their huge file sizes make them a pain to load online).
I have also built a simple web-app for detecting Russian state trolls that you can try here: http://bit.ly/TrollDetector
The state-backed tweets (officially released by Twitter) used in this project can be downloaded here.
BACKGROUND
So, who are these state-backed trolls and what are their objectives? The best-known perpetrator to date is Russia’s Internet Research Agency (IRA) and its multi-year/platform effort to influence the 2016 US election.
A detailed discussion of their activities can take up several Medium posts, so the prudent thing to do here is to highlight two of the more extensive reports on this subject:
- Online intelligence company New Knowledge’s in-depth report on the tactics and tropes of the IRA.
- Bloomberg’s long-form report on state-trolling operations around the world.
IRA’s success has spawned copycat acts around the world, including similar operations by Iran, Venezuela and Bangladesh. The problem is only going to get worse before a comprehensive solution is in place.
DATA USED IN THIS PROJECT

The chart above summarises how I collected and filtered the data for this project.
The key challenge in such projects has always been the difficulty of establishing the "ground truth" – that is, how do you know which are the state-troll tweets? In my case, I relied on the trove of state troll tweets officially released by Twitter (which has not disclosed its internal process for identifying these tweets/accounts).
I filtered out the non-English tweets, as well as retweets, so that the classification model won’t end up differentiating tweets on the basis of language. This is clearly one limitation of my project, as language and retweets are a key aspect of state troll behaviour on Twitter.
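As a rough illustration, this filtering takes only a few lines of pandas. The column names below follow Twitter’s released election-integrity data dumps, and the file name is a placeholder; treat both as assumptions if you are working with a different export.

```python
import pandas as pd

# Load one of the CSVs released by Twitter (file name is illustrative;
# column names assumed to match Twitter's election-integrity dumps).
trolls = pd.read_csv("ira_tweets.csv", low_memory=False)

# Keep only English tweets and drop retweets, so the classifier
# doesn't simply learn to separate languages or RT boilerplate.
mask = (trolls["tweet_language"] == "en") & (trolls["is_retweet"] == False)
trolls_en = trolls.loc[mask, ["tweet_text", "account_creation_date",
                              "follower_count", "following_count"]].copy()
trolls_en["bot_or_not"] = 1  # label troll tweets as 1
```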
For real tweets, I used Tweepy to scrape over 81,500 tweets from 35 verified accounts – including news outlets, politicians like Trump and Hillary Clinton, as well as active users in the US.
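The real tweets were pulled with Tweepy. A minimal sketch of the kind of scraping loop involved is below; it uses the older v3-style Tweepy API, and the credentials and handles are placeholders, not the exact accounts in my dataset.

```python
import tweepy
import pandas as pd

# Placeholder credentials - replace with your own API keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def scrape_user(screen_name, limit=3200):
    """Pull up to `limit` recent tweets from one verified account."""
    rows = []
    for status in tweepy.Cursor(api.user_timeline,
                                screen_name=screen_name,
                                tweet_mode="extended").items(limit):
        rows.append({"user": screen_name, "tweet_text": status.full_text})
    return pd.DataFrame(rows)

# Example handles only - the project used 35 verified accounts.
real = pd.concat([scrape_user(u) for u in ["nytimes", "BBCWorld"]])
real["bot_or_not"] = 0  # label real tweets as 0
```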
The cleaned-up data was used to build two classification models (more on this later), one to detect Russian state troll tweets and another to pick out Iranian ones. Each model was trained on 50,000 tweets – with an even split of real/troll tweets. The models were also tested on unseen test sets of tweets that contained varying proportions of state troll tweets from Russia, Iran and Venezuela.
STEP 1: ESTABLISHING A ‘DIGITAL FINGERPRINT’

State trolls may operate deep behind the scenes, but their end-goals force them to leave a clear digital trace on social media, as the t-SNE plots (above) of tweets from 10 troll and real accounts show.
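A plot like the one above can be produced by vectorising a sample of troll and real tweets and projecting them into two dimensions. The sketch below uses scikit-learn’s t-SNE on a TF-IDF matrix; the settings are illustrative rather than the exact ones in my notebooks, and `sample` is assumed to be a DataFrame holding both troll and real tweets.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# `sample` is assumed to have 'tweet_text' and 'bot_or_not' columns.
tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X = tfidf.fit_transform(sample["tweet_text"])

# Reduce the sparse TF-IDF matrix first - t-SNE is slow in high dimensions.
X_svd = TruncatedSVD(n_components=50, random_state=42).fit_transform(X)
X_2d = TSNE(n_components=2, random_state=42).fit_transform(X_svd)

plt.scatter(X_2d[:, 0], X_2d[:, 1], c=sample["bot_or_not"], cmap="coolwarm", s=5)
plt.title("t-SNE projection of real vs troll tweets")
plt.show()
```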
From my exploratory data analysis of the Russian IRA tweets, it is evident that there are clear structural patterns in terms of time, activity, and mode-of-disguise. While some traits are unique to the IRA’s goals during the 2016 election, I would argue that state trolls operating on an open platform like Twitter would leave behind broadly similar trails.
On their own, none of these trails would qualify as "smoking gun" evidence. But taken together, they paint a picture of fairly consistent behaviour. Let’s look at some of these key characteristics:
1.1 ACCOUNT CREATION DATE

By now, Twitter is a mature platform where user-growth is slowing. Its popularity peaked in 2009, meaning sudden sharp spikes in new user sign-ups in mature markets like the US should set off alarm bells.
Parsing the account creation dates for the Russian troll accounts, it is clear that a suspicious majority was created in 2014. An industrious state-operator can hide this aspect of their trail by slowly spreading out their account creation dates. But one should never underestimate the human capacity for sloppiness.
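A quick way to surface such a spike is to count account creation dates by year. The sketch below assumes a `troll_accounts` DataFrame with one row per troll account and an `account_creation_date` column, as found in Twitter’s data dumps.

```python
import pandas as pd
import matplotlib.pyplot as plt

# `troll_accounts` is assumed to hold one row per troll account.
created = pd.to_datetime(troll_accounts["account_creation_date"])
created.dt.year.value_counts().sort_index().plot(kind="bar")
plt.ylabel("Accounts created")
plt.title("Troll account creation dates by year")
plt.show()
```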
1.2 STRUCTURAL CHARACTERISTICS OF REAL VS STATE-BACKED TWEETS/ACCOUNTS

If you understand the user experience on a social platform, discerning suspicious behaviour becomes easier. In the context of Twitter (and the firehose nature of content on the platform), no real user would follow thousands, much less tens of thousands, of other users, as that would completely ruin the user experience.
The chart above shows the number of followers that the top 10 Russian troll accounts in my dataset had (left), versus the suspiciously high number of accounts they were following (right). The follower-to-following ratio is thus a key tell-tale sign of troll accounts.
The pattern stands in sharp contrast with that for real users, as the chart below shows:

Tweets from the state trolls are also predominantly shorter, consisting of about 11–14 words, and fewer than 70 characters. See my notebook for a more detailed breakdown of the structural characteristics of the troll tweets compared to the real ones.
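These structural features are simple to compute. A sketch along the following lines, reusing the troll DataFrame from the earlier filtering step (column names assumed from Twitter’s dumps):

```python
# Follower-to-following ratio and simple length features, computed on the
# `trolls_en` DataFrame from earlier (column names are assumptions).
trolls_en["ff_ratio"] = trolls_en["follower_count"] / (trolls_en["following_count"] + 1)
trolls_en["word_count"] = trolls_en["tweet_text"].str.split().str.len()
trolls_en["char_count"] = trolls_en["tweet_text"].str.len()

print(trolls_en[["ff_ratio", "word_count", "char_count"]].describe())
```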
1.3: HIDING IN PLAIN SIGHT
Perhaps the most interesting aspect of the Russian state trolls’ modus operandi is the way they disguise themselves – in this case by trying to pass themselves off as local news outlets in the US. Look at the similarities between the results for the topic modelling of real versus troll tweets.


At first glance, it seems odd that the topic modelling for real and troll tweets would look so similar. And why would innocuous terms like "showbiz", "politics" and "newyork" figure highly in the list of 30 most relevant terms in the topic model for troll tweets?
But it makes perfect sense when you think about it from a user’s perspective. Breaking news and informational updates form a large part of how Americans use Twitter.
As such, one of the best ways for Russian trolls to slip in well-targeted propaganda is by fooling real users into thinking a troll account is a legitimate source of news and useful information (which can be sports or entertainment updates). Ultimately, their goal is not to overwhelm regular users with a barrage of troll content, but to mix the fake with the real so that, over a prolonged period, users can’t tell the two apart.
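For reference, the kind of topic model used here can be fitted along the following lines. This is a sketch using scikit-learn’s LatentDirichletAllocation, not necessarily the exact library or settings behind the charts above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Fit a small topic model on the troll tweets (settings are illustrative only).
cv = CountVectorizer(stop_words="english", max_features=5000)
dtm = cv.fit_transform(trolls_en["tweet_text"])

lda = LatentDirichletAllocation(n_components=5, random_state=42)
lda.fit(dtm)

# Print the top terms for each topic.
terms = cv.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-10:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```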
The New Knowledge report on the Russian Disinformation campaigns dubbed this the "media mirage". Look at the set of tweets below. Unsuspecting users might not immediately realise these are troll tweets by Russian operators.


A considerable amount of in-market research will obviously be required to complement these EDA techniques and charts. The insight into an unusual spike in account creation in a particular year or month, for instance, would only be meaningful if paired with knowledge of the political calendar.
But the bottom line is clear: there’s a method to the madness of these state trolls. Their goals and the open nature of social platforms force them to leave behind a digital trail and a distinctive pattern of behaviour.
Piecing together their "digital fingerprint" is an essential first step ahead of any attempt to use a machine learning model to separate suspicious-looking tweets from the real ones.
STEP 2: BUILD + TRAIN-TEST CLASSIFIER

How fast can you sort the six tweets above into troll and real tweets? What about 600 or 6,000 tweets? Spotting and sorting troll tweets manually won’t be practical at scale. This is where a classifier comes into the picture.
You can build a really complex model to discern the troll/real tweets using a combination of features gleaned from Step 1 as well as the contents of the tweet text. But that would really complicate the building of the companion web app (more on this later). It would also make it harder to analyse the individual predictions in Step 3.
For these reasons, I’ve kept the design of my models for this project as simple as possible. I’m using just one predictor – a "clean tweet" column where the original tweet text has been cleaned of punctuation and other noise – against a target column of "bot_or_not" (0 for real tweets, 1 for troll tweets).
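The cleaning itself is ordinary regex work. Here is a minimal sketch of the kind of steps that go into the "clean tweet" column; the exact preprocessing used is in the notebooks.

```python
import re

def clean_tweet(text: str) -> str:
    """Strip URLs, @-mentions, punctuation and extra whitespace, then lowercase."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove links
    text = re.sub(r"@\w+", " ", text)               # remove @-mentions
    text = re.sub(r"[^a-zA-Z\s]", " ", text)        # keep letters only (drops the # in hashtags)
    return re.sub(r"\s+", " ", text).strip().lower()

df["clean_tweet"] = df["tweet_text"].apply(clean_tweet)
```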

The above chart summarises the workflow for the Russian troll detector model in notebook 3.0, as well as the Iranian troll detector model in notebook 3.2. It’s a straightforward process involving the standardised cleaning of the raw English tweets before running them through a pipeline that includes a CountVectorizer, a TFIDF transformer, and a classifier.
In this project, I opted for three common classifiers: Naive Bayes, Logistic Regression and Random Forest. You can pick more complex classifiers, but the time needed to complete the pipeline run could grow considerably given the size of the training set (50,000 rows).
On balance, the Logistic Regression model emerged as the best of the three models that I tried. It had the best f1 score – the balance of precision and recall – and the fastest mean fit time.
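A minimal version of the pipeline described above, with Logistic Regression as the final estimator, looks like this. The hyperparameters here are placeholders rather than the tuned values in my notebooks.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split

# `df` is assumed to hold the combined real/troll tweets with the two columns used.
X_train, X_val, y_train, y_val = train_test_split(
    df["clean_tweet"], df["bot_or_not"], test_size=0.2,
    random_state=42, stratify=df["bot_or_not"])

pipe = Pipeline([
    ("cvec", CountVectorizer(stop_words="english", ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(solver="liblinear", C=1.0)),
])

pipe.fit(X_train, y_train)
print("Validation accuracy:", pipe.score(X_val, y_val))
```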
Let’s start with Model 1, which was trained on Russian troll tweets, and look at how it performed against three different unseen test sets of 100 tweets each, where the proportion of troll tweets was gradually reduced from 50% to about 10%:



Model 1 was surprisingly good at picking out new Russian troll tweets amid the real ones, even as the proportion of troll tweets was progressively reduced. Its f1 score stayed at 0.8–0.9 throughout the tests. The model correctly picked out the vast majority of unseen troll tweets, even scoring a perfect recall of 1.0 in the 90–10 test set (extreme right, above).
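A rough sketch of how such a mixed test set can be assembled and scored is shown below. The DataFrames `unseen_trolls` and `unseen_real` are placeholders for held-out tweets the model has never seen, and `pipe` is the trained pipeline from the previous step.

```python
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

def mixed_test_set(troll_df, real_df, n=100, troll_frac=0.1, seed=42):
    """Build an n-tweet test set with the given proportion of troll tweets."""
    n_troll = int(n * troll_frac)
    sample = pd.concat([
        troll_df.sample(n_troll, random_state=seed),
        real_df.sample(n - n_troll, random_state=seed),
    ])
    return sample["clean_tweet"], sample["bot_or_not"]

X_test, y_test = mixed_test_set(unseen_trolls, unseen_real, troll_frac=0.1)
preds = pipe.predict(X_test)
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))
```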
But is it any good against troll tweets by other state operators? I tested Model 1 against unseen test sets with troll tweets from Iran and Venezuela, with predictably terrible results:


Model 1’s recall scores tanked in the tests against unseen test sets with Iranian and Venezuelan troll tweets. While it continued to pick out the real tweets by American users really well, it failed to catch the majority of the Iranian or Venezuelan troll tweets.
These results seem to suggest that the work of each state-backed operator is quite specific. Instead of trying to build a massive "global" model that might catch all state-operators, it seems to make more sense to build smaller, more nimble models that can better identify specific state operators.
I built a second model to test this argument, this time training the model on Iranian troll tweets in combination with real tweets from verified American and international users. Here are the results under similar test conditions, with the proportion of Iranian troll tweets gradually reduced from 50% to about 14%:



Model 2 turned in a sterling performance as well, in terms of its ability to pick out new Iranian troll tweets which it had not seen before. Its f1 score was above 0.9 for all 3 tests.
And as the three confusion matrices above show, the model was exceedingly good at picking out the troll tweets, scoring perfect recall in the final set, where it picked out all 14 troll tweets and misclassified just 1 out of 100 tweets.
The conclusion is obvious (though not entirely apparent at the outset): A model trained on a particular state-backed operator’s tweets won’t generalise well. To catch the state trolls, you’ll need highly tailored solutions for each market where they are found.
STEP 2.1: USING A WEB APP FOR QUICK CHECK-INS

Catching these state-backed operators requires a team effort, and not everyone involved will have the skills to run a large number of suspicious tweets through a machine learning model. Nor is it efficient to fire up a model each time you want to check on a few potentially suspicious tweets.
To this end, I built a simple web app – http://chuachinhon.pythonanywhere.com/ – where the user only needs to key in the text of a suspicious tweet to get a quick check on whether it could be a Russian troll tweet or not. The app is simple and you can easily build 10 different versions if you need to put them in the hands of teams in 10 different countries or markets.
It won’t be as accurate as the latest model on the data scientist’s computer, but it serves as a quick diagnostic tool that would complement other tools used in Step 1 to identify a state troll’s digital fingerprint.
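The app itself is a thin Flask wrapper around the pickled pipeline. Below is a stripped-down sketch; the file name, form fields and template are illustrative, not the exact code behind the deployed app.

```python
import joblib
from flask import Flask, request, render_template_string

app = Flask(__name__)
pipe = joblib.load("troll_detector_pipeline.joblib")  # trained sklearn pipeline (placeholder name)

FORM = """
<form method="post">
  <textarea name="tweet"></textarea>
  <button type="submit">Check tweet</button>
</form>
{% if verdict %}<p>{{ verdict }}</p>{% endif %}
"""

@app.route("/", methods=["GET", "POST"])
def check_tweet():
    verdict = None
    if request.method == "POST":
        text = request.form["tweet"]  # ideally apply the same clean_tweet() preprocessing here
        proba = pipe.predict_proba([text])[0][1]  # probability of the 'troll' class
        verdict = f"Troll probability: {proba:.2f}"
    return render_template_string(FORM, verdict=verdict)

if __name__ == "__main__":
    app.run(debug=True)
```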
STEP 3: ANALYSING PREDICTIONS WITH SHAP
There are at least two ways to further analyse the model’s predictions, such as by examining the predicted probabilities or plotting the frequencies of key words which appear most often.
But neither method offers the level of granularity that SHAP can provide, in terms of shedding light on what features prompted the model to predict whether a tweet is a real or troll tweet. SHAP can also be used to gain insights into where the model’s predictions went wrong, and what could have caused the incorrect classifications – an essential insight for updates to the model as the trolls update their tactics.
A detailed explanation of SHAP, or SHapley Additive exPlanations, is beyond the scope of this post. In the context of this project, it is perhaps easier to illustrate how SHAP works with a few examples.
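Before the examples, here is a rough sketch of how SHAP values can be computed for a linear model over TF-IDF features, using shap’s LinearExplainer on the fitted pipeline components from Step 2. These are not necessarily the exact calls in my notebooks, and depending on the shap version the outputs may be in log-odds (margin) space rather than probability space.

```python
import shap

# Pull the fitted steps out of the pipeline from Step 2.
cvec = pipe.named_steps["cvec"]
tfidf = pipe.named_steps["tfidf"]
logreg = pipe.named_steps["clf"]

# Transform the text the same way the pipeline does.
X_train_vec = tfidf.transform(cvec.transform(X_train))
X_test_vec = tfidf.transform(cvec.transform(X_test))

# LinearExplainer works directly off the logistic regression coefficients.
explainer = shap.LinearExplainer(logreg, X_train_vec)

# Explain a handful of tweets (densified here for simplicity).
X_sample = X_test_vec[:5].toarray()
shap_values = explainer.shap_values(X_sample)

# Force plot for one tweet: features pushing the output above the base value
# push the prediction towards 'troll', below towards 'real'.
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0],
                feature_names=cvec.get_feature_names_out())
```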
SHAP EXAMPLE 1
Here’s a tweet that Model 1 accurately classified as a real tweet: "An announcement of a summit date is likely to come when Trump meets with Chinese vice premier Liu He at the White House."

Each model has a unique base value, based on the average model output over the training dataset passed to the explainer. In this case, Model 1’s base value is 0.4327.
Different vectorized features will push the model’s predictions in different directions. If the eventual output is below the base value, it is classified as a real tweet. If the output is above the base value, it is considered a troll tweet (in the context of how I’ve labelled the output in this project).
In the example above, we can see that factual words like "chinese", "summit" and "premier" pushed the model towards classifying the tweet as a real one, while, interestingly, the words "trump meets" pushed the model in the opposite direction.
SHAP EXAMPLES 2 AND 3
Let’s look at two more tweets, which Model 1 correctly classified as Russian troll tweets: "@HillaryClinton #HillaryForPrison" and "@HillaryClinton Fuck off".


In the first tweet above, the hashtag "hillaryforprison" was the strongest feature in pushing the model to classify this as a troll tweet. In the second tweet, the swear word and its combination with Hillary Clinton’s name were the strongest factors in pushing the model towards classifying it as a troll tweet.
While the model has no innate understanding of American politics or news, it has been fed with enough examples of real and troll tweets to be able to make a distinction between the two sets.
The model can be defeated by troll tweets, of course. Let’s look at some examples where Model 1 got its predictions wrong.
SHAP EXAMPLES 4 AND 5
Model 1, the Russian troll detector, classified this tweet wrongly, predicting it as a troll tweet (above base value) when it is in fact a real tweet: "When Clinton got caught with her private email server, most of the folks i knew in the NatSec community were pissed…"

The words "pissed", "private email", and "caught" pushed Model 1 towards classifying this as a troll tweet – when it was in fact written by Dan Drezner, a Professor at The Fletcher School and a columnist for the Washington Post.
Model 1 also failed on numerous occasions when exposed to Iranian troll tweets, which it was not trained on. It classified this tweet as real, when it was in fact a troll tweet: "Spain, Italy warn against investing in Israeli settlements."

Short and factual tweets written like a news headline seem to trip up the machine learning model. Likewise, the model seems to struggle with slightly more complex tweets like the one involving the email server.
My takeaway from this is a simple one: Effective identification of state-backed disinformation campaigns on social media requires a good combination of human input/analysis with the smart use of machine learning and data analysis tools.
What seems obvious to a human may not be so for a machine with no geopolitical knowledge, while a machine can be far more efficient in spotting patterns which would take a human a long time to sort through manually.
LIMITATIONS

The chart above sums up some of the limitations of my approach to unmasking state-backed trolls on Twitter. Language is perhaps the trickiest issue to deal with, from the perspective of model building.
On the one hand, you don’t want your troll detector to become a glorified language classifier. On the other, you are missing out on a key trait of the state-backed trolls by training the model only on English tweets.
Data scientists familiar with deep learning techniques will perhaps have better solutions in this area.
The biggest limitation is the fact that this process is entirely reactive and diagnostic. There is no way to pre-empt the work of the state trolls, at least not with the tools available to the public.
This is my first attempt at applying my nascent data science skills to a complex problem like online disinformation. Mistakes here and in the notebooks are all mine, and I would welcome feedback from experts in the field, or any corrections in general.
Finally, a big thank you to Benjamin Singleton for his help with this long-suffering project. Special shoutout also goes out to Susan Li for her excellent NLP and Flask tutorials, which helped me tremendously.
Here again are links to key resources for this project:
Github Repo: https://github.com/chuachinhon