This is a quick dive into the trove of Chinese state troll tweets released by Twitter on Aug 19. More to come in the coming days and weeks.

On August 19, Twitter dropped a new trove of state troll tweets that the company said were from "a significant state-backed information operation focused on the situation in Hong Kong, specifically the protest movement and their calls for political change."
The tweets deserve a deeper examination, and clearly more can be done with the material. I had previously worked on a project on uncovering such disinformation campaigns on Twitter.
Due to time constraints, this is a quick and dirty first exploratory look at the data.
1. DATA, NOTEBOOK AND ASSUMPTIONS
My rough notebook is here, and the repo will be updated as I find more time to work on this project.
The CSV files are too large to upload to GitHub. Download them directly from Twitter instead.
To contain the complexity of the project at this stage, I filtered out the retweets, an interesting area that deserves a separate look. I also focused only on the English and Chinese-language tweets. The tweets in this dataset came in 59 languages, believe it or not.
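The filtering step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from my notebook: the column names `is_retweet`, `tweet_language` and `tweet_text`, the language codes `en` and `zh`, and the sample rows are all assumptions made for the example.

```python
# Hypothetical sample rows. The column names (is_retweet, tweet_language,
# tweet_text) and the language codes are assumptions about the CSV layout.
SAMPLE_ROWS = [
    {"tweet_text": "original English tweet", "is_retweet": "False", "tweet_language": "en"},
    {"tweet_text": "RT @someone: retweeted text", "is_retweet": "True", "tweet_language": "en"},
    {"tweet_text": "原创中文推文", "is_retweet": "False", "tweet_language": "zh"},
    {"tweet_text": "tweet in another language", "is_retweet": "False", "tweet_language": "id"},
]

KEEP_LANGUAGES = {"en", "zh"}


def is_original(row):
    """Treat a row as a retweet only if its flag reads 'true' (case-insensitive)."""
    return row["is_retweet"].strip().lower() != "true"


def filter_tweets(rows, languages=KEEP_LANGUAGES):
    """Keep original (non-retweet) tweets written in one of the given languages."""
    return [r for r in rows if is_original(r) and r["tweet_language"] in languages]


filtered = filter_tweets(SAMPLE_ROWS)
print(len(filtered), "of", len(SAMPLE_ROWS), "rows kept")
```

With the real files, the rows could come from `csv.DictReader` instead of the in-memory list, which avoids loading an entire CSV at once.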
2. OVERALL LOOK AT THE CHINESE STATE TROLL TWEETS
Twitter said the tweets it released came from "936 accounts originating from within the People’s Republic of China (PRC). Overall, these accounts were deliberately and specifically attempting to sow political discord in Hong Kong, including undermining the legitimacy and political positions of the protest movement on the ground".
The accounts, already suspended, "represent the most active portions of this campaign; a larger, spammy network of approximately 200,000 accounts", Twitter added in its press release.
Here are the key figures I found from a quick overview:
Unique userids: 890
Unique user display names: 883
Unique user screen names: 890
Unique user reported locations: 178
Unique user creation dates: 427
Unique account languages: 9
Unique tweet languages: 59
Unique tweet text: 3236991
Unique tweet time: 1412732
Unique hashtags: 110957
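Figures like these come from counting the distinct values in each column. Here is a rough sketch of that idea using only the standard library; the column names and sample rows are assumptions for illustration, and the helper `count_unique` is hypothetical rather than taken from my notebook.

```python
from collections import defaultdict

# Hypothetical rows standing in for the real CSV; column names are assumed.
SAMPLE_ROWS = [
    {"userid": "u1", "user_screen_name": "alpha", "tweet_language": "en"},
    {"userid": "u1", "user_screen_name": "alpha", "tweet_language": "zh"},
    {"userid": "u2", "user_screen_name": "beta", "tweet_language": "en"},
]


def count_unique(rows, columns):
    """Return {column: number of distinct values seen in that column}."""
    seen = defaultdict(set)
    for row in rows:
        for col in columns:
            seen[col].add(row[col])
    return {col: len(seen[col]) for col in columns}


unique_counts = count_unique(SAMPLE_ROWS, ["userid", "user_screen_name", "tweet_language"])
for col, n in unique_counts.items():
    print(f"Unique {col}: {n}")
```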
The number of troll tweets was whittled down from the initial 3.6 million to 581,070 after I filtered out retweets and kept only the English and Chinese-language tweets. Too aggressive? Perhaps, but that’s still a lot to work with.
3. QUINTESSENTIAL CHINESE STATE TWEETS
First, let’s have a quick look at what these Chinese state tweets targeting Hong Kong look like, both in English and Chinese:



The phrasing of some tweets would be immediately familiar to those who follow official Chinese rhetoric and its state-driven Internet commentary. Phrases like 外國勢力 (foreign forces) are not commonly used elsewhere, if at all.
4. KEY CHINESE TROLL ACCOUNTS
These are the top 10 troll accounts in my filtered dataset:
曲剑明 54455
阿丽木琴 46600
Klausv 34451
春天里 17989
gwalcki4 17707
emiliya naum 16230
derrickmc 16163
Lily Mann 14673
炫彩 14604
mauricerowleyx 11749
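A ranking like the one above is a tally of tweets per account, sorted in descending order. A minimal sketch with `collections.Counter` follows; the column name `user_display_name` and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows: one dict per tweet; the column name is assumed.
SAMPLE_ROWS = (
    [{"user_display_name": "account_a"}] * 3
    + [{"user_display_name": "account_b"}] * 2
    + [{"user_display_name": "account_c"}]
)


def top_accounts(rows, n=10, column="user_display_name"):
    """Return the n accounts with the most tweets as (name, count) pairs."""
    return Counter(row[column] for row in rows).most_common(n)


top = top_accounts(SAMPLE_ROWS, n=2)
for name, count in top:
    print(name, count)
```

Display names are not guaranteed to be unique, so grouping on a user ID column instead would be the safer choice if two accounts happen to share a name.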

It is far more interesting to look at specific accounts, in my view. Let’s start with the one at the top of the list: qujianming, or 曲剑明.

This troll account sent out 19,614 unique tweets in my filtered dataset. The account was created on June 28, 2012, and has 28,405 followers and follows 24,079 users, a suspiciously high number and a telltale sign of a troll account.
The account tweeted 77 times in 2019. Some samples:
- 2019–07–05: ‘香港作为一个法治社会的典范,这种暴徒行径一定要严惩,换民众一个公道’
- 2019–07–04: ‘反对势力想尽办法想搞乱香港,想等混乱的时候谋取利益,他们根本就无暇顾及市民的利益,可怜的是还…’
- 2019–07–02: ‘#HongKongProtest #香港 #七一 #游行 #民阵 7月1日反对派宣扬暴力冲击…’
Twitter also highlighted two accounts in its press release, one of which is HKpoliticalnew, or HK時政直擊:

This troll account sent out just 1,059 unique tweets in my filtered dataset. The account was created on January 22, 2015, and has 22,551 followers and follows 66 users. The move to co-opt the identity of a news outfit is straight out of the playbook of Russia’s Internet Research Agency.
The account sent 462 "original tweets" in 2019, many of them aimed at the HK protests. Some samples:
- 2019–06–21: ‘千名黑衣人圍堵立會,當中不乏重裝上陣的人,警方一定要嚴防,大家小心!👍💪 #香港 #大專學界 #升級行動 #包圍立會 #重裝上陣 #警察 原圖:星島日報 https://t.co/DIoiFWWkBo’
- 2019–06–20: ‘#香港 反對派一再散布失實資訊抹黑修例、誤導市民,更不斷造謠煽動情緒,企圖以網民壓力杯葛支持修例甚至只是沉默的商戶。 政府決定停止修例工作後,反對派又繼續播謠,稱換領智能身份證會「失去選民資格」,企圖撕裂市民對社會不同界別、團體、組織的信任。 https://t.co/9B3xCI9MWv’
- 2019–06–20: ‘反修例暴動後,#香港 仿佛又回到占中時期嘅黑暗日子,最令人痛心嘅系,執法者又再次成為黃營發泄嘅對象,黑警,「警你老x」等言詞唔絕於意,立志以生命守護人民,卻落得過街老鼠嘅下場… 立法會議員何君堯律師呼吁各位市民站出嚟山席撐警集會,為 #警察 打氣,讓佢哋知,黃營所言非香港嘅主流聲音! https://t.co/yDpKt0mSAM’
5. SELF-PROCLAIMED LOCATION OF STATE TROLL ACCOUNTS
Like the Russian state trolls, the Chinese accounts primarily claimed to be in the US:

Twitter is blocked in China. But the company said "many of these accounts accessed Twitter using VPNs. However, some accounts accessed Twitter from specific unblocked IP addresses originating in mainland China".
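The location breakdown at the start of this section counts accounts, not tweets, so each account should contribute its self-reported location only once. Below is a minimal sketch of that tally; the column names `userid` and `user_reported_location` and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rows: one dict per tweet, so prolific accounts repeat.
SAMPLE_ROWS = [
    {"userid": "u1", "user_reported_location": "United States"},
    {"userid": "u1", "user_reported_location": "United States"},
    {"userid": "u2", "user_reported_location": "United States"},
    {"userid": "u3", "user_reported_location": "Hong Kong"},
    {"userid": "u4", "user_reported_location": ""},
]


def locations_by_account(rows):
    """Count self-reported locations once per account, skipping blank ones."""
    location_of = {}
    for row in rows:
        # setdefault keeps the first location seen for each account.
        location_of.setdefault(row["userid"], row["user_reported_location"].strip())
    return Counter(loc for loc in location_of.values() if loc)


location_counts = locations_by_account(SAMPLE_ROWS)
for location, n in location_counts.most_common():
    print(location, n)
```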
6. TROLL ACCOUNT CREATION PEAKED IN AUGUST 2017
As expected, the troll accounts were created far earlier – peaking in August 2017:

The number of tweets similarly peaked in 2017:

Temporal analysis of such data is tricky, however. First, it is unclear when Twitter began taking action against the troll accounts. Second, it is unclear if the Chinese state agency has some capability to alter the account creation/tweet date in order to mask its activities. Sounds far-fetched, but such are the times.
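With those caveats in mind, the monthly tally behind the account-creation peak can be sketched as follows. The column names `userid` and `account_creation_date`, the `YYYY-MM-DD` date format and the sample rows are all assumptions for illustration, not a description of the actual files.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: one dict per tweet; column names and date format are assumed.
SAMPLE_ROWS = [
    {"userid": "u1", "account_creation_date": "2017-08-03"},
    {"userid": "u1", "account_creation_date": "2017-08-03"},
    {"userid": "u2", "account_creation_date": "2017-08-19"},
    {"userid": "u3", "account_creation_date": "2012-06-28"},
]


def accounts_created_per_month(rows, date_format="%Y-%m-%d"):
    """Count accounts (not tweets) by the year-month in which they were created."""
    created = {}
    for row in rows:
        # Each account is counted once, however many tweets it sent.
        created.setdefault(row["userid"], row["account_creation_date"])
    months = Counter()
    for date_text in created.values():
        months[datetime.strptime(date_text, date_format).strftime("%Y-%m")] += 1
    return months


per_month = accounts_created_per_month(SAMPLE_ROWS)
peak_month, peak_count = per_month.most_common(1)[0]
print("Peak month:", peak_month, "with", peak_count, "accounts")
```

The same grouping applied to a tweet timestamp column, without the per-account deduplication, would give tweet volume over time.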
END-NOTE, FOR NOW:
The analysis would, of course, benefit from extensive use of NLP tools. But an early look at the tweet text suggests that more filtering is needed to weed out the noise. The tweets in Chinese will also have to be dealt with separately from the English ones. I’ll continue this in a future notebook.
This will have to suffice for now, in the interest of time. If you spot any errors, or have comments, ping me at:
Twitter: @chinhon
LinkedIn: www.linkedin.com/in/chuachinhon
My earlier project on disinformation campaigns on Twitter: https://github.com/chuachinhon/twitter_state_trolls_cch