
With the 2020 US election around the corner, concerns about electoral interference by state actors via social media and other online means are back in the spotlight in a big way.
Twitter was a major platform that Russia used to interfere with the 2016 US election, and few have doubts that Moscow, Beijing and others will turn to the platform yet again with new disinformation campaigns.
This post outlines how you can build a state troll tweets detector by fine tuning a transformer model (Distilbert) on a custom dataset. This builds on my earlier project using "classic" Machine Learning models and a simple bag-of-words approach to detect state troll tweets.
I’ll also compare the results from the fine tuned Distilbert model with those from a Logistic Regression and an XGBoost model, to see if a transformer model can indeed perform better in a practical use case.
Spoiler alert: The fine tuned transformer model performed significantly better than the Log-Reg and XGBoost models (which weren’t slouches either, to be sure), and held up much better when exposed to state troll tweets from a third country. Jump ahead to Section 4 for the results.
1. GROUND TRUTH, DATA SOURCE, MODEL AND REPO
First things first: How did I ascertain whose tweets are deemed the work of state influence campaigns? In this case, the ground truth is established by Twitter’s election integrity team.
The state troll tweets used in this project are those which have been identified by Twitter and progressively released to the public since 2018. I chose six sets of state troll tweets from Twitter – three each from China and Russia – which I cleaned up, combined and downsampled to 50,000 rows.
I created an equivalent set of 50,000 rows of real tweets by using Tweepy to scrape 175 accounts, which comprise a mixture of verified users and those which I’ve personally checked for authenticity. The resulting combined dataset of 100,000 rows of state troll-plus-real tweets was further split into the usual train-test-validation sets in the standard proportion of 70:20:10 respectively.
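For reference, here's a minimal sketch of the scraping and splitting steps, assuming Tweepy v3-style credentials and hypothetical variable names (the actual account list and cleaning steps are in the notebooks):

```python
import tweepy
from sklearn.model_selection import train_test_split

# Authenticate with Twitter API credentials (placeholders)
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def scrape_user(screen_name, n=300):
    """Pull up to n recent tweets from one account."""
    tweets = tweepy.Cursor(
        api.user_timeline, screen_name=screen_name, tweet_mode="extended"
    ).items(n)
    return [(screen_name, t.full_text, t.created_at) for t in tweets]

# df: the combined 100,000-row dataframe with a binary 'troll' label (1 = state troll)
# 70:20:10 split done in two passes of train_test_split
train, rest = train_test_split(df, test_size=0.3, stratify=df["troll"], random_state=42)
test, val = train_test_split(rest, test_size=1/3, stratify=rest["troll"], random_state=42)
```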
Full details are in my notebooks, and I won’t repeat them here for brevity’s sake. The full repo is available here. Fine tuning was done on a Colab Pro account, and took about five-and-a-half hours.
The fine tuned Distilbert model is too big for Github, but I’ve uploaded a copy to Dropbox for those who just want to experiment with the model. The six raw CSV files containing state troll tweets have to be downloaded from Twitter if you wish to create a larger training set.
2. DATA PREPARATION
It ought to be pointed out that key assumptions in the data cleaning and preparation process will affect the outcomes. The assumptions are necessary to keep the scope of this project practical, but if you disagree with them, feel free to slice a different version of the data based on your own preferred cleaning rules.
My main data cleaning rules for this project, with a rough filtering sketch after the list:
- Exclude non-English tweets, since the working assumption is that the target audience is English-speaking. I also wanted to prevent the model from making predictions based on language.
- Exclude retweets.
- Exclude tweets which have fewer than three words after text cleaning.
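A minimal sketch of how these rules might be applied with pandas, using the langdetect package for the language check (the library choice and column names here are assumptions, not necessarily what the notebooks use):

```python
from langdetect import detect

def is_english(text):
    """Best-effort language check; treat detection failures as non-English."""
    try:
        return detect(text) == "en"
    except Exception:
        return False

def apply_cleaning_rules(df):
    # Rule 2: drop retweets (raw text starting with "RT @")
    df = df[~df["tweet_text"].str.startswith("RT @")]
    # Rule 1: keep English tweets only
    df = df[df["tweet_text"].apply(is_english)]
    # Rule 3: drop tweets with fewer than three words after text cleaning
    df = df[df["clean_text"].str.split().str.len() >= 3]
    return df.reset_index(drop=True)
```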
I also kept the eventual combined dataset to 100,000 rows for practical reasons. My initial attempts to fine tune Distilbert with a 600,000-row dataset resulted in repeated crashes of the Colab notebook and/or extremely long, impractical run times. A more ambitious version of this project just won’t be practical without access to more compute/hardware.
3. FINE TUNE THE DISTILBERT MODEL WITH CUSTOM DATASET
For this project, I picked the distilbert-base-uncased model (smaller and more manageable) and used Hugging Face’s trainer for the task. I abandoned attempts to do a hyperparameter search after several trials ran for too long, and hope to return to this topic at a future date (check out the discussion here).
The steps involved are fairly straightforward, as outlined in notebook2.0 of my repo. The code is mostly based on the excellent examples and documentation on Hugging Face [here](https://huggingface.co/transformers/master/main_classes/trainer.html) and here.
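Here's a condensed sketch of the fine tuning setup, closely following the Hugging Face custom-dataset examples linked above (file names and training arguments are illustrative, not the exact values used in notebook2.0):

```python
import torch
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (DistilBertTokenizerFast,
                          DistilBertForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

class TrollDataset(torch.utils.data.Dataset):
    """Wraps tokenized tweets and labels for the Trainer."""
    def __init__(self, texts, labels):
        self.encodings = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="binary"
    )
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1, "precision": precision, "recall": recall}

train_dataset = TrollDataset(train_texts, train_labels)  # 70% split
test_dataset = TrollDataset(test_texts, test_labels)     # 20% split

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    logging_dir="./logs",
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_dataset, eval_dataset=test_dataset,
                  compute_metrics=compute_metrics)
trainer.train()
trainer.save_model("./troll_distilbert")
```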
The immediate results of the fine tuning certainly look impressive:
'eval_accuracy': 0.9158421345191773,
'eval_f1': 0.9163813100629625,
'eval_precision': 0.9098486510199605,
'eval_recall': 0.9230084557187361
I ran a quick test against the validation set – 10,000 rows kept aside, which the model had not seen at all – and the excellent classification metrics barely budged:
'eval_accuracy': 0.9179,
'eval_f1': 0.9189935865811544,
'eval_precision': 0.9178163184864012,
'eval_recall': 0.9201738786801027
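The validation check itself is essentially a one-liner against the 10,000 held-out rows, assuming a val_dataset built the same way as the datasets in the sketch above:

```python
# Evaluate the fine tuned model on the unseen validation split
val_dataset = TrollDataset(val_texts, val_labels)  # 10% split, never seen in training
val_metrics = trainer.evaluate(eval_dataset=val_dataset)
print(val_metrics)  # eval_accuracy, eval_f1, eval_precision, eval_recall
```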
For a more thorough comparison and analysis, I trained two separate Logistic Regression and XGB classifier models using the same dataset used to fine tune the Distilbert model. Let’s see how they do in the same tests.
4. FINE TUNED DISTILBERT VS LOG-REG VS XGB
The pickled Log-Reg and XGB models are available in my repo’s "pkl" folder. My notebooks detailing their optimisation process are available [here](https://github.com/chuachinhon/transformers_state_trolls_cch/blob/master/notebooks/3.1_compare_xgb_cch.ipynb) and here. I won’t go into the details here, except to highlight that both models scored above 0.8 during training and grid search. While clearly lower than the fine tuned Distilbert model’s score, I think these two models did well enough to provide an adequate comparison.
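For reference, a minimal sketch of what such bag-of-words baselines could look like (the actual hyperparameters came out of the grid searches in the notebooks linked above; the values below are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

# TF-IDF bag-of-words + Logistic Regression
logreg_pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# TF-IDF bag-of-words + XGBoost
xgb_pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=5)),
    ("clf", XGBClassifier(n_estimators=300, learning_rate=0.1)),
])

for name, pipe in [("Log-Reg", logreg_pipe), ("XGB", xgb_pipe)]:
    pipe.fit(train_texts, train_labels)               # same 70% training split
    print(name, pipe.score(test_texts, test_labels))  # accuracy on the 20% test split
```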
The chart below shows the detailed breakdown, via confusion matrices, of how all three models performed against the validation set (10,000 rows comprising 5,061 state troll tweets and 4,939 real tweets).

At a glance, it’s clear that the fine tuned Distilbert model (far left) is the strongest performer, and accurately picked out more state troll and real tweets than the Log-Reg or XGB models. More significantly, the number of false positives and false negatives for the Distilbert model is about half that produced by the Log-Reg and XGB models.
So while the three models’ performance metrics might seem close, the gap in their classification capabilities becomes very stark when seen via confusion matrices, which give us a better sense of how they performed in classifying thousands of tweets.
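The confusion matrices themselves can be generated with a few lines of scikit-learn, assuming each model's predictions on the 10,000-row validation set are already in hand (the variable names below are hypothetical):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# val_labels: true labels; *_preds: one array of predictions per model
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (name, preds) in zip(axes, [("DistilBERT", distilbert_preds),
                                    ("Log-Reg", logreg_preds),
                                    ("XGB", xgb_preds)]):
    cm = confusion_matrix(val_labels, preds)
    ConfusionMatrixDisplay(cm, display_labels=["Real", "Troll"]).plot(ax=ax)
    ax.set_title(name)
plt.tight_layout()
plt.show()
```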
4.1 ADDITIONAL TEST FOR THE 3 MODELS
All three models were trained on a 50–50 split of real tweets and state troll tweets from China and Russia. In reality, we don’t know the actual proportion of real-versus-troll tweets on Twitter. More importantly, state troll tweets can come from any country and changes in tone, language and topics could significantly affect the classifier’s ability to separate real tweets from those originating from state-backed campaigns.
Which of the three models would hold up better if exposed to state troll tweets from a third country? To find out, I ran a new dataset – comprising 1,000 troll tweets from Iran and 993 real tweets from American users – through the three models. This new dataset was created from an earlier project I did on the same subject.
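Scoring the new dataset against the saved model takes only a few lines with the transformers pipeline API (paths and column names here are illustrative):

```python
from transformers import pipeline

# Load the fine tuned model and tokenizer saved earlier
clf = pipeline("text-classification",
               model="./troll_distilbert",
               tokenizer="./troll_distilbert")

# iran_df holds the 1,993 unseen tweets (1,000 Iranian troll + 993 real)
preds = clf(iran_df["clean_text"].tolist())
iran_df["predicted"] = [p["label"] for p in preds]
```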

As expected, all three classifiers saw their performance drop significantly when exposed to Iranian state troll tweets which they’ve not seen before.
But the fine tuned Distilbert model still held up decently, in my view. Not only did it correctly pick out more troll and real tweets than the Log-Reg or XGB models, but the number of false negatives (troll tweets that the model thought were real) was also not excessively large.
This was particularly problematic with the Log-Reg model, which classified half (501 out of 1,000) of the troll tweets as real when they were in fact the work of state-backed operators. The XGB model did slightly better in this area, but not by much, with its number of false negatives (468) still significantly higher than its number of false positives.
This problem was particularly pronounced with the models I trained in my earlier project: classifiers trained on one particular state actor’s troll tweets were quite good at spotting new, unseen tweets from that same actor, but once troll tweets from another state operator were injected into the mix, their performance faltered significantly.
The fine tuned Distilbert model doesn’t overcome this problem completely, but holds up well enough to offer hope for a model that can "generalise" better. If you have enough computing resources to train a transformer model on a bigger dataset comprising state troll tweets from all the countries identified by Twitter to date, it stands to reason that said model might do better in the sort of tests we’ve tried in this post.
Unfortunately that’s a hypothesis I’ll have to test another time.
5. CONCLUSION
Detection of state influence campaigns on Twitter involves more than just the examination of the tweet text, of course. State trolls often leave bigger tell-tale signs like the (coordinated) dates of account creation, or timing of their tweets. The increasing use of photos and memes also makes the detection process trickier.
But spotting trends and hidden structures in their tweet text will continue to be one major area of focus. And it would appear that a fine tuned transformer model can perform significantly better in this task compared to the more popular or traditional classifiers out there.
There are trade-offs, of course, in terms of resources and time. The hardware needed to fine tune a transformer model on, say, a million rows of tweets, is not readily available to most users, to say nothing of whether that’s the most efficient way to tackle the task.
6. BONUS SECTION: WEB APP
I tried deploying the fine tuned Distilbert model as part of a simple web app but quickly found out that free hosting accounts don’t have enough disk space for a PyTorch installation on top of hosting the model.
But I’ve uploaded the necessary files to the repo for anyone who wants to try it out on their local machine. Just make sure to download the fine tuned model from Dropbox and move it to the "app" folder.
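The actual app files are in the repo, but purely as an illustration of the general idea, a minimal local demo could wrap the same pipeline in a small Flask endpoint. The framework, route and field names below are my own assumptions, not necessarily what the repo's app uses:

```python
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
# Point this at the fine tuned model downloaded from Dropbox
clf = pipeline("text-classification",
               model="./app/troll_distilbert",
               tokenizer="./app/troll_distilbert")

@app.route("/predict", methods=["POST"])
def predict():
    text = request.json.get("text", "")
    result = clf(text)[0]  # e.g. {'label': 'LABEL_1', 'score': 0.98}
    return jsonify(result)

if __name__ == "__main__":
    app.run(debug=True)
```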

As always, if you spot mistakes in this or any of my earlier posts, ping me at:
- Twitter: Chua Chin Hon
- LinkedIn: www.linkedin.com/in/chuachinhon