FAILED SURGE: Analyzing Beijing’s Disinformation Campaign Surge On Twitter

China stepped up its disinformation campaign on social media as tensions rose sharply in Hong Kong from May 2019. Ironically, that likely set the stage for the operation’s eventual takedown by Twitter.

Published in Towards Data Science · 11 min read · Sep 24, 2019

Bar chart race showing the surge in Chinese state Twitter troll account creation. Online version here.

When Twitter first exposed China’s disinformation campaign against the protest movement in Hong Kong, it wasn’t entirely clear how Beijing’s operation on the platform was uncovered in the first place.

The social media company had hinted that unusual IP activity could have given Beijing’s game away. Twitter is banned in China but troll accounts from the Chinese operation reportedly accessed the service using VPNs and unblocked IP addresses, according to Twitter’s press release.

But there was no obvious connection when looking through that first tranche of about 3.6 million tweets released by Twitter on August 19 2019. Most of the 890 troll accounts in this tranche were created in 2017 or earlier, and their activity peaked in 2017, especially in November and December that year (charts and analysis of this first tranche in my earlier Medium post on the subject).

Now, a major piece of the puzzle has finally emerged. On September 20, a month after its initial announcement, Twitter released a second tranche of tweets related to China’s state disinformation campaign against the HK protest movement.

In this bigger set of about 10.2 million tweets, I found a clear spike in activity among the Chinese state trolls between May and July 2019, a period when political tensions in Hong Kong escalated sharply. This “campaign surge”, I would argue, could ironically be one of the key reasons why the network set off alarm bells at Twitter and got taken down.

Some top line figures from my analysis:

54% (2,320) of the 4,301 troll accounts in this second tranche were created between May and July 2019. Bear in mind that Twitter is by now a legacy social media platform that no longer experiences exponential user growth. There is no way a sharp spike in account creation like this — and in China, of all places, where the service is blocked — would evade Twitter’s attention. An interactive version of the chart below is available here.

It wasn’t clear from the first set of Chinese troll tweets how the clandestine network came to Twitter’s attention only in 2019. Details from a second tranche, on the right, finally gave us a major clue.

The peak dates for new troll account creation fell on or near key milestones in Hong Kong, when major political announcements or violent clashes took place in parts of the city. The largest single-day spike in troll account creation was on June 14, one day before HK chief executive Carrie Lam suspended the extradition bill that sparked the protests. Coincidence or coordination? No way for me to tell, but the timing is notable.

Nominally, the Chinese state troll accounts sent out 87,369 tweets and retweets during this “surge” period. But there were in fact just 27,192 unique tweets, suggesting that many of the accounts were retweeting each other, or tweets from the lead trolls.

The tweets/retweets during this period were primarily in Chinese, with English tweets making up just 6%, or 3,723, of the 69,402 state troll tweets in a filtered subset where I took out tweets involving a running war of words with fugitive Chinese billionaire Guo Wengui. An interactive version of the chart below is available here.

The difference in the volume of English vs Chinese troll tweets during this surge period is stark. Did someone in the disinformation campaign decide that they had already lost the battle for the hearts and minds of those who read in English?

China’s decision to ban Twitter on the mainland has some unintended consequences here. Due to the ban, some of the state trolls tasked with manipulation on Twitter are not as familiar with the platform as they should be, resulting in some comically “noobish” behaviour.

For instance, upon creation, 43 new Chinese state troll accounts tweeted the default first tweet suggested by Twitter, “Just setting up my Twitter. #myfirstTweet”, before going on a retweet spree to spread messages by the lead trolls (@RuthMiller916, @feituji1994 and @bindarsou feature quite prominently in this second tranche of Chinese state troll tweets).

Here’s a sample of what one troll account, “Qduaf1s3ZCBagXgQdzLa3+CXVR4TP5SsCiEBdRqiNws=” (its user name was hashed by Twitter), tweeted:

'Just setting up my Twitter. #myfirstTweet',
       '打扮好。恋爱去~🤣🤣🤣 https://t.co/jR3VqsJvtG',
       'RT @RuthMiller916: //醒一醒！唔好再用「是你教我和平示威是沒用」、去為自己暴力衝擊的違法行為合理化！//\n\n【短片】【堅守底線】馬逢國提醒示威者和平表達是公民責任、籲傳媒不要模糊社會道德規範：仲有幾多警員手指可以被咬斷？\n#HongKong #HK #香港…',
       'RT @feituji1994: 冲击警方，扔砖块，扔铁棍，用铁棍殴打警察，这哪还是普通市民，明明就是一群暴徒！此时此刻，那些沉默的大多数难道还要继续沉默？！所有热爱香港热爱家园的人都应该站出来，谴责暴徒，驱逐暴力，支持港警，守护家园！ #HK #HongKong #香港 #逃…',
       'RT @bindarsou: 「反修例」游行中頻頻發生的暴力事件，顯然不是偶然事件，而是有組織有預謀的事件。參與「反修例」的港人應當擦亮眼睛，看清披着「民主」外衣下丑陋而險惡的用心。一腔愛港熱血固然重要，但是是非必須明辨，正邪必須分清，應應當徹底與禍亂香港的暴動分子分清界限。#…'

Another troll account “YuAz3l8sU2izEg7D7mcdPoXc2Vavji5CrbPuDsg8s08=” tweeted:

'Just setting up my Twitter. #myfirstTweet',
       'RT @feituji1994: 那些游行示威的人要求香港民主、法治，殊不知自己的暴力行为，才真正破坏了香港法治的核心价值！😡😡😡😡😡#HongKong #HK #antiELAB #香港 #反修例 https://t.co/4SIu5Upsr9',
       'RT @feituji1994: 说好的理性、和平示威！暴徒却以伞为剑狂殴警察！哪个国家的警察可以被如此对待？！香港的混乱、荒诞让人震惊！#HongKong #HK #香港 #逃犯条例 #沙田示威 #游行 https://t.co/sZXGhfAYfu',
       'RT @feituji1994: 总是把自己包装的很“正义”，是“自由”、“民主”的代言人！看看香港人是怎么说的？！那一小部分人不能代表香港，他们只是最卑微“暴徒”！#HongKong #HK #香港 #逃犯条例 https://t.co/rg4bOvDzh1',
       'RT @feituji1994: 回顧 #香港 修例以來，頻繁 #遊行 到修例壽終正寢，到處處充斥著暴力破壞，到無辜民眾被毆打，到立法會、中聯辦被衝擊，國徽被玷污！反修例只是藉口，不顧民眾安危，搞亂香港，推動顏色革命呼之欲出！境外的黑手，#反對派 的醜惡，暴露無遺！熱愛香港的人…',
       'RT @feituji1994: 穿黑衣、戴头盔的年轻人，竟是反对派会议召集人毛孟静儿子？\n请全城市民认清他们的真面目！有法律界人士指，示威者已犯暴动、刑事毁坏两项罪名，每罪的最高刑罚均是10年监禁。支持警方严正执法，尽快将暴徒绳之于法！\n#搗亂 #立法會 #暴動刑毀 # 毛孟…',
       'RT @bindarsou: 當學校開始給學生洗腦，煽動他們荒廢學業轉而上街遊行時，這個城市還有未來嗎？\n這不僅是學校的問題，家長也是很關鍵的一環。為人父母，放任子女自甘墮落，這難道就是所謂的自由嗎？長此以往，香港如何進步？\n#香港清醒點吧 https://t.co/eS0Zk…',

That default “Just setting up my Twitter” message is a variation of the famous “first tweet” sent out by Twitter’s co-founder and CEO Jack Dorsey on Mar 21 2006. The trolls would send their final tweets soon enough, after their operation was uncovered by Twitter.

This is my third post in a series analyzing China’s state disinformation campaign against the Hong Kong protests. I’ll explain my approach in greater detail in the sections below. My earlier posts on this subject are here and here.

DATA AND REPO

Here’s the repo for the project, and the notebook for this post. The notebook is unlikely to display properly on GitHub due to its file size, so you are better off downloading or cloning the repo and running it on your local machine. Be warned, though, that the notebook is computationally expensive to run due to the heavy use of interactive graphics.

The CSV files are too big to be uploaded to GitHub. Download them directly from Twitter instead.

BROAD COMPARISON OF 2 TRANCHES OF CHINESE STATE TROLL TWEETS

Before filtering the second tranche of troll tweets to focus on the “surge” period, I first compared the two sets at the broad level:

The second tranche of Chinese state troll tweets released by Twitter was larger by far, comprising 10.2 million rows of data compared to 3.6 million for the first tranche.

The far higher number of unique users in the second tranche of tweets offers the first major clue as to where to start digging: the account creation date. The state trolls behind Beijing’s campaign could have done one of two things: acquire more existing accounts, or create new ones in large numbers. The latter is a major tell-tale sign of state-sponsored disinformation campaigns, such as the one we’ve seen the Russians wage in the US ahead of the 2016 Presidential Election.

COMPARISON OF TROLL ACCOUNT CREATION DATES

The second tranche of Chinese troll tweets was not just bigger in all aspects, but as the 2nd chart on the right shows, a suspiciously large number of new accounts were created in 2019. The spike from May to July far outstripped what we saw in the first tranche of Chinese troll tweets (left):

The 2,320 accounts created between May and August 2019 accounted for nearly 54% of the total number of accounts in the second tranche of 4,301 troll accounts. Beijing likely felt that it had to step up its campaign in the face of escalating tension and protests in Hong Kong.

But this likely backfired as there’s no way Twitter’s monitoring team wouldn’t have noticed a suspiciously high number of accounts in a country that has banned Twitter (Twitter said the accounts were “based in the PRC”).
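The account-creation share discussed above can be sketched in a few lines. This is a minimal standard-library illustration, not the notebook's actual code: the column names `userid` and `account_creation_date`, the helper `creation_share`, and the sample rows are all assumptions made up for the example. The window bounds are parameters, so adjust them to the period you want to test.

```python
import csv
import io
from datetime import date

# Tiny made-up sample standing in for Twitter's released CSV.
# The column names `userid` and `account_creation_date` are assumptions.
SAMPLE = """userid,account_creation_date
a1,2017-11-02
a1,2017-11-02
b2,2019-06-14
c3,2019-06-14
d4,2019-07-09
"""


def creation_share(rows, start, end):
    """Return (accounts created in [start, end], total accounts, share)."""
    created = {}
    for row in rows:
        # The data has one row per tweet, so keep one creation date per account.
        created[row["userid"]] = date.fromisoformat(row["account_creation_date"])
    in_window = sum(1 for d in created.values() if start <= d <= end)
    return in_window, len(created), in_window / len(created)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
surge, total, share = creation_share(rows, date(2019, 5, 1), date(2019, 7, 31))
print(f"{surge} of {total} accounts ({share:.0%}) were created in the window")
```

Deduplicating by account before counting matters here, because an account that tweeted often would otherwise be counted once per tweet. With the figures reported above, 2,320 / 4,301 works out to roughly 53.9%.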

ANALYSIS OF “SURGE” ACCOUNTS

Due to time and resource constraints, I chose to focus my analysis on the Chinese state troll accounts created between May and Aug 2019. A number of the top trolls in this second tranche, however, were created much earlier.

A deeper dive into the entire second tranche will have to be left to better resourced research institutes. Below are some of the interesting characteristics of the second tranche that I found:

A. LANGUAGE SETTING, TWEET LANGUAGE USED

The full set of tweets in both tranches comprised multiple languages, such as Bahasa Indonesia, and were stuffed full of spam content on sports, porn, global politics etc.

But the Chinese agency behind the disinformation campaign appeared to be more focused during the “surge” phase, creating troll accounts that were predominantly set to English and Chinese. These new troll accounts also tweeted primarily in Chinese and English, as the breakdown below shows:
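A language breakdown of this kind could be approximated as follows. This is a rough standard-library sketch under assumptions: the column name `tweet_language`, the `"und"` label for rows without a language code, the helper `language_share`, and the sample rows are all hypothetical and not taken from the notebook.

```python
from collections import Counter

# Made-up rows; `tweet_language` is an assumed column name holding a
# language code such as "zh" or "en". An empty value means "undetermined".
tweets = [{"tweet_language": "zh"}] * 8 + [
    {"tweet_language": "en"},
    {"tweet_language": ""},
]


def language_share(rows, column="tweet_language"):
    """Map each language code to (tweet count, share of all tweets)."""
    counts = Counter(row[column] or "und" for row in rows)
    total = sum(counts.values())
    # most_common() orders languages from most to least frequent.
    return {lang: (n, n / total) for lang, n in counts.most_common()}


shares = language_share(tweets)
for lang, (n, share) in shares.items():
    print(f"{lang}: {n} tweets ({share:.0%})")
```

The same function can be pointed at the account-level language setting instead by passing a different `column` value, provided you first reduce the data to one row per account.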

B. TWEET ACTIVITY

As expected, the troll accounts created during this “surge” period were tweeting/retweeting most frequently in July, when the protests and clashes on the streets escalated to a level hitherto unseen. Let’s break it down in greater detail:

On about a third of the days in July, at least 400 troll accounts were tweeting. The peak was on July 9, when 420 troll accounts tweeted or retweeted on the same day.

Worth noting what took place on July 9: HK CE Carrie Lam announced that the extradition bill was dead and that her government’s work on the legislation had been a “total failure”.
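Counting how many distinct accounts were active on each day can be sketched like this. It is a minimal standard-library illustration, assuming columns named `userid` and `tweet_time` (formatted `YYYY-MM-DD HH:MM`); the helper `active_accounts_per_day`, the threshold, and the rows are made up for the example.

```python
from collections import defaultdict

# Made-up rows; `userid` and `tweet_time` are assumed column names.
rows = [
    {"userid": "a", "tweet_time": "2019-07-08 09:00"},
    {"userid": "a", "tweet_time": "2019-07-08 10:30"},
    {"userid": "b", "tweet_time": "2019-07-08 11:15"},
    {"userid": "a", "tweet_time": "2019-07-09 08:00"},
    {"userid": "b", "tweet_time": "2019-07-09 08:05"},
    {"userid": "c", "tweet_time": "2019-07-09 21:40"},
    {"userid": "c", "tweet_time": "2019-07-10 07:00"},
]


def active_accounts_per_day(rows):
    """Map each day (YYYY-MM-DD) to the number of distinct accounts tweeting."""
    seen = defaultdict(set)
    for row in rows:
        day = row["tweet_time"][:10]  # keep only the date part
        seen[day].add(row["userid"])  # a set ignores repeat tweets per account
    return {day: len(users) for day, users in sorted(seen.items())}


THRESHOLD = 2  # illustrative cut-off; the article uses 400 on the real data
daily = active_accounts_per_day(rows)
peak_day = max(daily, key=daily.get)
busy_days = [day for day, n in daily.items() if n >= THRESHOLD]
print(daily, peak_day, busy_days)
```

Using a set per day counts accounts rather than tweets, which is the distinction the figures above rely on.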

I broke the distribution of the tweets down further into specific day-hour-minute timeframes. You can explore the detailed breakdown by downloading the interactive version of the chart below here:

Medium doesn’t make it easy to embed interactive charts directly, so it is best to download the repo and open this chart in your browser. Hover over the specific data points to see the detailed timings for when the troll tweets peaked.

ANALYSIS OF ‘SURGE’ TWEETS

A detailed and rigorous content analysis of the full corpus of tweets is outside the scope of my abilities. I would also recommend a separate look at the text of the retweets, as they often reflect the key messages that the trolls and their bosses are trying to spread.

A simple term frequency chart also re-confirmed the influential role of one of the lead troll accounts, @ctcc507, aka Dream News. Note how prominently the account’s screen name appeared in this list of the top 50 most used terms in the English retweets during the “surge” period:

Here’s a quick sample of the @ctcc507 tweets that the other troll accounts retweeted:

'RT @ctcc507: Governing Hong Kong by law is the core value of Hong Kong. We don’t allow anyone to run roughshod over the law. https://t.co/P…',
       'RT @ctcc507: The legislative council belongs to the people of Hong Kong.Those people with ulterior motives indicated by forces hide behind…',
       'RT @ctcc507: The legislative council belongs to the people of Hong Kong.Those people with ulterior motives indicated by forces hide behind…',
       'RT @ctcc507: Governing Hong Kong by law is the core value of Hong Kong. We don’t allow anyone to run roughshod over the law. https://t.co/P…',
       'RT @ctcc507: Governing Hong Kong by law is the core value of Hong Kong. We don’t allow anyone to run roughshod over the law. https://t.co/P…'

@ctcc507 was retweeted 465 times in this small subset alone. The sample above leaves no doubt as to the messages the troll accounts were trying to push. Elsewhere in the retweets, we see the trolls continue to promote the conspiracy theory about foreign intervention in the HK protests (several variations of ‘ulterior motives’ ranked near the top).
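One way to tally which accounts get retweeted most, and how many tweets are repeats, is to parse the `RT @handle:` prefix. The sketch below is a standard-library illustration and not the notebook's code; the pattern `RT_PATTERN` and the short sample list are assumptions, with the sample texts abbreviated from the tweets quoted in this post.

```python
import re
from collections import Counter

# Matches the "RT @handle:" prefix that retweets carry in the tweet text.
RT_PATTERN = re.compile(r"^RT @(\w+):")

# Abbreviated sample texts for illustration only.
texts = [
    "RT @ctcc507: Governing Hong Kong by law is the core value of Hong Kong.",
    "RT @ctcc507: The legislative council belongs to the people of Hong Kong.",
    "RT @ctcc507: Governing Hong Kong by law is the core value of Hong Kong.",
    "RT @bindarsou: (text omitted)",
    "Just setting up my Twitter. #myfirstTweet",
]

handle_counts = Counter()
for text in texts:
    match = RT_PATTERN.match(text)
    if match:  # original tweets have no RT prefix and are skipped
        handle_counts[match.group(1)] += 1

unique_texts = len(set(texts))
print(handle_counts.most_common())
print(f"{len(texts)} tweets, {unique_texts} unique")
```

Comparing the nominal count with the number of unique texts is a quick way to see how much of the volume is the same message being amplified.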

To deal with the far larger body of text in the Chinese tweets, I focused only on the tweets in July. The Chinese tweets that month were, however, the largest chunk by far during the surge period, and consisted of about 35,000 tweets. As with the English troll content, retweets made up the bulk of Chinese-language troll tweets.

Top terms in Chinese “original” tweets:

反 對       314
社 會       277
對 派       248
反 對 派     248
瘟 龟       215
行 為       191
瘟 鬼       183
破 壞       178
香港 嘅      160
穩 定       158
香港 法治     151
佢 哋       148
严惩 暴徒     131
香港 警察     126
遊 行       120
嚴 懲       117
一定 要      116
應 該       114
维护 香港     108
勢 力       105
嘅 行       104
香港 和平     103
中 國       102

Top terms in Chinese retweets:

逃犯 条例            8182
香港 逃犯            7951
香港 逃犯 条例        7299
逃犯 条例 游行        6132
条例 游行            6132
香港 逃犯 条例 游行    5695
護 香港              4751
遊 行                4535
社 會                4257
反 對                4131
游行 暴徒             3885
守 護                3738
逃犯 条例 游行 暴徒    3354
条例 游行 暴徒        3354
反 對 派             3153
對 派                3153
守 護 香港            3011

What I’ve noticed from the term frequency counts above and a weekend of looking through the tweets is the higher usage of terms condemning the protestors who turned violent. The term 暴徒 (violent thugs) and the call to punish the violent protestors (严惩暴徒) came up more prominently in the second tranche of tweets, compared to the first.

But a fuller content analysis is needed to confirm if this indeed reflected a change in “narrative” focus in the disinformation campaign.
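Term lists like the ones above mix two-, three- and four-token phrases, which is what an n-gram count over tokenized text produces. Here is a minimal standard-library sketch of that idea. It assumes the tweets have already been split into tokens by a Chinese word segmenter (not shown), and the helper `ngram_counts`, the n-gram range, and the sample tokens are hypothetical choices for illustration, not the notebook's settings.

```python
from collections import Counter


def ngram_counts(token_lists, n_min=2, n_max=4):
    """Count every run of n_min..n_max consecutive tokens across all tweets."""
    counts = Counter()
    for tokens in token_lists:
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Made-up, pre-tokenized tweets (segmentation is assumed to happen elsewhere).
tokenized = [
    ["香港", "逃犯", "条例", "游行"],
    ["香港", "逃犯", "条例"],
    ["严惩", "暴徒"],
]

counts = ngram_counts(tokenized)
for term, n in counts.most_common(5):
    print(term, n)
```

Because overlapping n-grams are counted separately, a long phrase that is retweeted often drags all of its sub-phrases up the ranking with it. That is consistent with entries such as 条例 游行 and 逃犯 条例 游行 sharing identical counts in the retweet list.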

SCATTERTEXT PLOTS

I had previously used Scattertext plots to visualise the first tranche of Chinese state troll tweets. The visual value of such plots isn’t that high a second time round, I feel, but I kept the plots in the notebook for the useful search function at the bottom left corner of the plots.

If you download the html files, you can use the search box to easily find tweets and retweets associated with the desired search terms, such as “ulterior motives”. This could be valuable for future research.

However, when you attempt to create Scattertext plots from a large corpus of Chinese text, the resulting files are huge and practically unusable on consumer-level computers.

So for practical reasons, I opted to focus on just two top trolls in the second tranche of state troll tweets: @bindarsou and @hk_observer. Both were retweeted regularly by the less active troll accounts, and account for about 45,000 tweets between them. @bindarsou is by far the larger of the two accounts.

A quick re-cap on interpreting the Scattertext plot:

Colour

The words in the chart are colored by their association. Those in blue are more associated with original tweets, while those in red are more associated with retweets. Each dot corresponds to a word or phrase mentioned.

Positioning:

Words that appear frequently in both tweets and retweets, like “香港” (Hong Kong) and “暴力” (violence), appear in the upper-right-hand corner, reflecting Beijing’s main line of criticism against the protests.
Words nearer the top of the plot represent the most frequently used words in the “original” tweets.

Key Areas:

Upper-left corner: These words appear frequently in the tweets but not the retweets.
Lower-right corner: Likewise, words which appear frequently in retweets but not the tweets appear in the lower right corner.
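The positioning rules above can be illustrated with a toy classifier that places a term in a corner based on how often it appears in original tweets (vertical axis) versus retweets (horizontal axis). This is only a sketch of the reading guide, not how Scattertext itself scales or positions terms; the function `corner`, the raw-count cutoff, and all the counts are made up for the example.

```python
from collections import Counter

# Made-up term counts for illustration only.
tweet_counts = Counter({"香港": 50, "暴力": 40, "瘟龟": 30, "逃犯条例": 2})
retweet_counts = Counter({"香港": 60, "暴力": 45, "逃犯条例": 80, "瘟龟": 1})


def corner(term, tweet_counts, retweet_counts, cutoff=10):
    """Return the plot corner for a term, using a simple raw-count cutoff."""
    vertical = "upper" if tweet_counts[term] >= cutoff else "lower"
    horizontal = "right" if retweet_counts[term] >= cutoff else "left"
    return f"{vertical}-{horizontal}"


for term in ["香港", "瘟龟", "逃犯条例", "天气"]:
    print(term, corner(term, tweet_counts, retweet_counts))
```

A raw-count cutoff is a crude stand-in when the two categories differ a lot in size, so treat the output as a reading aid rather than a reproduction of the plot.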

Well, this is about as much time as a “citizen data analyst” like me can devote to this analysis. As always, if you spot any errors, ping me @

Twitter: @chinhon

LinkedIn: www.linkedin.com/in/chuachinhon

My earlier project on state disinformation campaigns on Twitter: https://github.com/chuachinhon/twitter_state_trolls_cch