DisInfoVis: How to Understand Networks of Disinformation Through Visualization

Data visualization has great potential for communicating complexity with clarity, but this requires iteration, empathy and tailoring to…

Image by Author.

Network visualizations of disinformation operations have the power to enlighten. When effectively designed, they can express the nuanced strategies of those spreading disinformation. These dense forms of data visualization benefit from informed design choices throughout their development, and from evaluations that improve future iterations.

Last year, I graduated from my MSc in Data Science at City, University of London. This exceptional one-year intensive course taught me the fundamentals of Machine Learning, Neural Computation, Computer Vision, and Visual Analytics. For my three-month dissertation project, I chose network analysis as the domain in which to conduct my research. In particular, I focused on the design and evaluation of network visualizations that evolve over time, applied to the field of disinformation.

Since graduating, I’ve continued studying temporal networks and their applications in disinformation research. In early 2020 I published a Medium article titled "Watch six decade-long disinformation operations unfold in six minutes" and it went a little bit viral. I’m currently studying disinformation aimed at France with Institut Montaigne, and studying gendered and sexualized disinformation through network analysis with the Wilson Center. All my research has had the common thread of using network visualizations to understand how the tactics of people who want to influence others online evolve over time.

Today I’d like to step back and tell you the story of how the design choices behind my temporal network visualizations evolved during my MSc dissertation. This work was generously supported by the Data Science Institute at City, University of London, and guided by my academic advisor at City, Professor Jason Dykes of the giCentre, one of the world’s leading visualization research groups. We presented this dissertation research as a poster at EuroVis 2020 under the fitting title of DisInfoVis.

If you’re into data visualization design, hopefully you’ll find my approach interesting. If you’re interested in disinformation networks, hopefully you’ll find the visualizations informative and the visualization approach useful (as documented in the various other research reports). If you’re considering studying data science at MSc level, you can use this article as an example of what an MSc dissertation can look like.

On to my dissertation, which was titled…

Evaluating Temporal Network Visualization Representations of Information Operations

The key moving parts are these: in this study, I evaluated two different representations (static slideshows and dynamic videos) of three temporal network visualizations. The datasets were of Chinese, Russian, and Venezuelan information operations that Twitter has been identifying and releasing since 2018.

The goal of this dissertation was to learn which representation type disinformation practitioners found most useful, so that the designs of future temporal network visualizations could be tailored to their needs. I achieved this goal by answering two related research questions, the first from data science and the second from social science:

Research Question 1: What are the benefits and challenges associated with static and dynamic representations of temporal network visualizations?

Research Question 2: What are the unique and shared aspects between Russian, Chinese, and Venezuelan state-backed information operations on Twitter?

These two questions were the result of months of thinking, editing, reading, and pivoting. The overarching curiosity was whether temporal network visualization could help establish the extent to which countries like Russia, China, and Venezuela leave distinct online ‘fingerprints’ in their campaigns. The decision to create both a dynamic and a static representation developed over time, as I thought about how I could best contribute to domain knowledge about online disinformation through temporal network analysis.

Example of Venezuelan information operation evolving over time. Image by Author.

The Methods

My project consisted of two stages. First, I created the temporal network visualizations of Russian, Chinese, and Venezuelan information operations on Twitter, informed by my interpretation of established best practices in visualization design. Then, I evaluated the networks by showing them to disinformation practitioners during structured interviews, in which participants described how they interacted with the networks and what they saw in them.

The data for the network visualizations themselves came from tweets posted by inauthentic accounts (sometimes known as bots, cyborgs, or trolls) which were "potentially part of state-backed information operations" on Twitter. Since 2018, Twitter has released datasets of user profiles and tweets that they believed to be part of broader networks of state-backed actors who were engaging in "inauthentic behavior" on their platform. The data were released so "researchers can investigate, learn, and build media literacy capacities for the future".

Twitter has released data on 16 countries (and counting), but for this study I chose to only analyze the English language tweets from the Chinese, Venezuelan, and Russian information operations, which all contained a substantial amount of English-language content. This decision was made in order to capture foreign interference in English-language online conversations within a three-month project timeframe, and was a subjective choice that framed my entire study and its findings. In subsequent research, I expanded to visualizing all languages of six countries.

To narrow the data further, I chose to visualize only the hashtags that inauthentic accounts used. In other words, all I needed for my visualizations was the name of each inauthentic account (the source) and all the hashtags it used (the targets). These sources and targets (which always remained separated by country) were connected in the network visualizations. The connecting lines (edges) were colour-coded by the year the inauthentic account was created, to retain more context in the visualizations.
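As a minimal sketch of this extraction step (not the exact pipeline used in the dissertation), the Python below turns tweets into account-to-hashtag edges tagged with the account's creation year, keeping only English-language tweets. It assumes each tweet is a dict with the hypothetical keys `userid`, `tweet_language`, `hashtags` (a list of strings) and `account_creation_date` (an ISO date string); the column names and formats in Twitter's released files may differ, and the hashtag field may need parsing first.

```python
from collections import Counter
from datetime import date


def build_edges(tweets):
    """Count (account, hashtag, creation_year) edges in English-language tweets.

    Each tweet is assumed to be a dict with the hypothetical keys 'userid',
    'tweet_language', 'hashtags' (a list of strings, with or without '#')
    and 'account_creation_date' (an ISO 'YYYY-MM-DD' string).
    """
    edges = Counter()
    for tweet in tweets:
        # Keep only English-language tweets, mirroring the study's scope.
        if tweet.get("tweet_language") != "en":
            continue
        # The account's creation year is the attribute used to colour the edge.
        year = date.fromisoformat(tweet["account_creation_date"]).year
        for tag in tweet.get("hashtags", []):
            # Source = inauthentic account, target = normalized hashtag.
            target = "#" + tag.lstrip("#").lower()
            edges[(tweet["userid"], target, year)] += 1
    return edges


sample = [
    {"userid": "acct_1", "tweet_language": "en",
     "hashtags": ["FollowMe", "news"], "account_creation_date": "2009-05-01"},
    {"userid": "acct_1", "tweet_language": "en",
     "hashtags": ["#followme"], "account_creation_date": "2009-05-01"},
    {"userid": "acct_2", "tweet_language": "es",
     "hashtags": ["noticias"], "account_creation_date": "2017-02-10"},
]
edges = build_edges(sample)
```

Counting repeated account-hashtag pairs gives a natural edge weight, and normalizing case keeps variants like `FollowMe` and `#followme` from becoming separate hashtag nodes.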

Simplified graphic displaying sources (inauthentic accounts) connected to targets (hashtags). Image by Author.

In order to make the temporal network visualizations more understandable to my participants (disinformation practitioners), I created this "Introduction to Network Visualizations" video (and also a slideshow of the same information) for my participants to view:

After extracting the English-language inauthentic-account-to-hashtag relationships for Venezuela, Russia, and China, I uploaded each dataset separately to Gephi. Gephi is open-source network visualization software that can break visualizations down by their temporal elements. Because of memory restrictions in Gephi, I visualized a random subset of 300,000 tweets for each country.
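The subsampling and hand-off to Gephi could be sketched as follows. This is an illustrative assumption rather than the study's code: `sample_tweets` and `write_edge_table` are hypothetical helpers, the seed value is arbitrary, and only the `Source` and `Target` column names reflect what Gephi's spreadsheet import looks for; the other columns are simply carried along as edge attributes.

```python
import csv
import io
import random


def sample_tweets(tweets, k=300_000, seed=2020):
    """Return a reproducible random subset of at most k tweets."""
    if len(tweets) <= k:
        return list(tweets)
    rng = random.Random(seed)  # fixed seed so the same subset can be rebuilt
    return rng.sample(tweets, k)


def write_edge_table(edges, handle):
    """Write (source, target, creation_year, date) edges as a CSV edge table."""
    writer = csv.writer(handle)
    # 'Source' and 'Target' name the edge endpoints for Gephi's spreadsheet
    # import; the remaining columns come through as edge attributes.
    writer.writerow(["Source", "Target", "CreationYear", "Date"])
    for source, target, year, when in edges:
        writer.writerow([source, target, year, when])


tweets = [{"tweetid": i} for i in range(10)]
subset = sample_tweets(tweets, k=3, seed=1)

buffer = io.StringIO()
write_edge_table([("acct_1", "#followme", 2009, "2016-03-01")], buffer)
```

For a real export, `buffer` would be replaced by a file opened with `newline=""` and `encoding="utf-8"`, which the `csv` module recommends. Fixing the seed matters here because a different random subset would produce a visibly different layout.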

There are many different ways to arrange the nodes in a network visualization; these are called layout algorithms. I chose ForceAtlas2 because of the way it attracts closely related nodes, repels unrelated nodes, and compactly lays out ‘bursts’ in the networks.
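To make "attracts" and "repels" concrete, ForceAtlas2's published formulation (in its default linear-attraction mode, without the optional LinLog or edge-weight settings) defines these forces between nodes $n_1$ and $n_2$ at distance $d(n_1, n_2)$, where $k_r$ and $k_g$ are tunable repulsion and gravity constants and $\deg(n)$ is a node's degree:

```latex
\begin{aligned}
F_a(n_1, n_2) &= d(n_1, n_2) \\
F_r(n_1, n_2) &= k_r \, \frac{(\deg(n_1) + 1)\,(\deg(n_2) + 1)}{d(n_1, n_2)} \\
F_g(n) &= k_g \, (\deg(n) + 1)
\end{aligned}
```

Attraction acts only along edges, so accounts sharing hashtags are pulled together, while degree-weighted repulsion pushes high-degree hubs (such as heavily used hashtags) apart. That combination plausibly explains why densely connected activity shows up as compact bursts.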

After uploading a country dataset into Gephi, I ran the ForceAtlas2 layout algorithm, colour-coded the edges by account creation year, and finally converted the network to a temporal network visualization in order to see when different regions of the network were active. Doing so added a toggle bar at the bottom of the software window, which let me single out connections between inauthentic users and hashtags that occurred during particular time periods. This affordance was an important reason for using Gephi in this project. Below you can see the entire Chinese network visualization (left), and a one-year slice of the network (right):

Entire Chinese network visualization (left), and 2017 hashtag use in same network visualization (right) displayed in Gephi software. The image on the right is a 'time slice' of the network which made up a slide in the static slideshow. Image by Author.
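The time slices were produced interactively in Gephi, but the underlying idea, keeping only the edges active in a window such as the 2017 slice in the figure, can be sketched outside it. The helpers `time_slice` and `yearly_windows` below are hypothetical, and the edges are placeholder `(account, hashtag, tweet_date)` tuples; one-year windows are an assumption based on the 2017 example, not a rule the study applied.

```python
from collections import Counter
from datetime import date


def time_slice(edges, start, end):
    """Return the edges whose tweet date falls in the half-open window [start, end)."""
    return [edge for edge in edges if start <= edge[2] < end]


def yearly_windows(edges, min_edges=1):
    """Return (start, end, n_edges) for each calendar year with enough activity."""
    counts = Counter(edge[2].year for edge in edges)
    return [
        (date(year, 1, 1), date(year + 1, 1, 1), n)
        for year, n in sorted(counts.items())
        if n >= min_edges
    ]


# Hypothetical (account, hashtag, tweet_date) edges, for illustration only.
edges = [
    ("acct_1", "#tag_a", date(2016, 11, 3)),
    ("acct_2", "#tag_a", date(2017, 6, 9)),
    ("acct_3", "#tag_b", date(2017, 12, 31)),
    ("acct_1", "#tag_b", date(2018, 1, 1)),
]

windows = yearly_windows(edges)
slices = {start.year: time_slice(edges, start, end) for start, end, _ in windows}
```

Half-open windows mean an edge dated 1 January (like `acct_1`'s 2018 edge) lands in exactly one slice, and raising `min_edges` is one way to skip quiet years when choosing which slices to show.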

Creating the Static Representation

The slideshows first showed the entire network, and then 5–8 time slices of the same network. Here is the final slideshow for the Chinese dataset:

Creating the Dynamic Representation

To have two comparable representations for this study, I also recorded the network evolving in Gephi and created voice-over explanations identical to the text in the slideshows. The result was this video on YouTube for the Chinese visualization:

Evaluating the Temporal Network Visualization Representations

Given the need for domain expertise, I conducted a narrow and rich qualitative study, in line with Sheelagh Carpendale’s guidance on information visualization evaluation under such circumstances:

"Running evaluations with full datasets, domain specific tasks, and domain experts as participants will help develop much more concrete and realistic evidence of the effectiveness of a given information visualization."

A small number of domain experts who work to understand, combat, or inform the public about online disinformation were recruited through email invitations, using a snowball sampling method. This resulted in five sessions in which participants undertook training, explored the visualizations, and then provided qualitative feedback through structured interviews.

The data were analysed by transcribing responses and conducting a thematic analysis (as described by Fereday and Muir-Cochrane, 2006) that distilled participant responses into the benefits and challenges they faced while interacting with the visualizations, their recommendations for future visualizations, and their analyses of the networks themselves.

By asking domain experts about their impressions of and experiences with the visualizations, I was able to glean findings that were informed by knowledge of disinformation networks and centered on possible end-users of the visualizations. This seemed apt for a data science dissertation, as the final stage of any data analysis process (communicating results to the stakeholder who needs the analysis) depends on the valid and effective use of data visualization.

The Results

The results of this study can be broken down according to the research question they answer. For the first research question, I wanted to identify the benefits and challenges my participants experienced, to inform future network visualization design choices in the context of disinformation research. For the second research question, I was hoping to get into the heads of my participants and understand what they actually saw in the networks.

Research Question 1: What are the benefits and challenges associated with static and dynamic representations of temporal network visualizations?

This research question was approached in the interview by asking participants which representation felt easier to observe or gave them enough information, and which parts of each representation were more helpful when trying to understand what they were looking at. The responses to each question were tallied by number of mentions and organized into tables.
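Tallying coded responses by number of mentions is simple to reproduce. The sketch below is illustrative only: `tally` is a hypothetical helper, and the participants, representations and codes are invented placeholders, not the study's interview data.

```python
from collections import Counter

# Placeholder coded interview excerpts: (participant, representation, code).
# These are invented for illustration and are not the study's responses.
coded = [
    ("P1", "dynamic", "code_a"),
    ("P2", "dynamic", "code_a"),
    ("P3", "dynamic", "code_a"),
    ("P2", "static", "code_b"),
    ("P3", "static", "code_b"),
    ("P1", "static", "code_c"),
]


def tally(codes):
    """Count mentions per (representation, code), most frequent first."""
    return Counter((rep, code) for _, rep, code in codes).most_common()


table = tally(coded)
for (rep, code), n in table:
    print(f"{rep:<8} {code:<8} {n}")
```

Keeping the participant ID in each record means the same data can later be counted by distinct participants as well as by total mentions.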

The Findings: I found that participants preferred dynamic representations, or videos, of complex temporal network visualizations of information operations. This was largely because every participant felt that the videos gave them a better understanding of how the network evolved over time, commenting that the videos made it much easier to understand "the huge difference between various periods of activity". Although participants enjoyed moving through the static slideshows at their own pace, four out of five participants preferred either the dynamic representation, or a combination of the dynamic and static representations, for analyzing the networks or explaining them to others.

The Next Steps: These findings allow us to develop recommendations for future designs. We suggest that an interactive dynamic representation with pauses between key moments and greater contextual information be used. We have produced a wire-frame of such a tool – a concept design that demonstrates our recommendations… through visualization.

Proposed wire-frame for a temporal network visualization that has both static and dynamic elements (Presented at EuroVis 2020). Image by Author.

A key takeaway I’ve learned about visualization design from this study is one I was also taught while studying professional communication: tailor your message to your medium and audience. The beauty of using visualization as a medium is that it can be interactive, informative, and nearly limitless in its design options. In my research for Mozilla (which I conducted after completing this study), I decided against implementing some of the more detailed recommendations above in favour of a simpler format that I felt was more fitting for a Medium article, because my goal was to explain the networks and inform the public rather than to give readers visualizations to explore or investigate. In future iterations where the purpose of the visualizations is to allow exploration and investigation, a more sophisticated design is likely to be appropriate.

Research Question 2: What are the unique and shared aspects between Russian, Chinese, and Venezuelan state-backed information operations on Twitter?

This research question was approached in the interview by asking participants which aspects of the networks were unique to each of them, or shared between multiple networks. The unique and shared aspects were aggregated and analyzed according to whether they required an understanding of network structure, hashtag content, and temporal components. A visualization of the findings can be seen in the figure below.

The Findings: By analyzing temporal network visualizations, participants noted that information operations are cyclical, dynamic, and evolve over time. They also found that some operations appeared more organized and political than others, and that the Russian activity uniquely exhibited polarization in its network structure. Both Russia and China were found to be using popularity-increasing hashtags such as #followme, and to be resurrecting, in 2016 and 2019, old accounts created as far back as 2009. These behaviours suggest that such actions are part of a long-term strategy. Venezuela, on the other hand, was not found to exhibit either behaviour, and may have had a less well-developed strategy for its use of account personas.

In the figure below, the key aspects of the information operations that participants noted are outlined. Some were unique to individual countries, and some were shared between two, or all three countries.

Similarities and differences between Russian, Venezuelan, and Chinese information operations on Twitter. Results are ordered by the number of participants who made the claim ("p"), and the number of times it was mentioned overall ("x"). Image by Author.

The red letters on the right of each aspect indicate the information attributes required to identify it: the content ("C") of the hashtags, the structure ("S") of interactions, and the temporal ("T") evolution of the operation. These three information attributes were developed after the aspects were derived in the thematic analysis, and were assigned according to which types of data would have been required to identify each aspect. Aspects identified through two or three information attributes can be considered more robust, such as the finding that all information operation tactics evolved over time. Hashtag content and temporal evolution can also be visualized with other methods, such as word frequency tables or bar charts, so the robustness of content and temporal findings alongside network structure findings may be explored in further research.
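The ordering used in the figure (by number of participants "p", then by overall mentions "x") and the idea that aspects backed by two or more information attributes are more robust can be expressed in a short sketch. `rank_aspects` is a hypothetical helper, and the aspect names, mentions and attribute tags are placeholders rather than the study's coded data.

```python
from collections import defaultdict

# Placeholder data: (participant, aspect) mentions and the information
# attributes (C = content, S = structure, T = temporal) tagged to each aspect.
mentions = [
    ("P1", "aspect_a"), ("P2", "aspect_a"), ("P2", "aspect_a"),
    ("P1", "aspect_b"), ("P3", "aspect_b"),
    ("P4", "aspect_c"),
]
attributes = {"aspect_a": {"C", "T"}, "aspect_b": {"S"}, "aspect_c": {"C", "S", "T"}}


def rank_aspects(mentions, attributes):
    """Return (aspect, p, x, attrs, robust) rows ordered by p, then x."""
    participants = defaultdict(set)  # aspect -> distinct participants (p)
    counts = defaultdict(int)        # aspect -> total mentions (x)
    for participant, aspect in mentions:
        participants[aspect].add(participant)
        counts[aspect] += 1
    rows = []
    for aspect, x in counts.items():
        attrs = "".join(sorted(attributes.get(aspect, set())))
        # Aspects backed by two or more attributes are flagged as more robust.
        rows.append((aspect, len(participants[aspect]), x, attrs, len(attrs) >= 2))
    # Most participants first; ties broken by most total mentions.
    rows.sort(key=lambda row: (-row[1], -row[2]))
    return rows


ranked = rank_aspects(mentions, attributes)
```

Counting distinct participants separately from total mentions stops one talkative participant from pushing an aspect to the top of the list.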

So, what’s next?

The experience of working on this MSc dissertation has taught me the value of informing design choices through evaluation and iteration. It also solidified the benefits of tailoring data visualization design to a specific domain, such as online disinformation.

As I move on to pursue a DPhil in Social Data Science at the University of Oxford studying networks of disinformation, I will carry with me the design values I have absorbed through this project, and through the completion of the EuroVis poster. By moving forward cautiously and systematically with this kind of study, we can establish further insights into disinformation, network visualization, and the beneficial relationships between the two.

