Animate Dynamic Graphs with Gephi

Data visualization tutorial on animating time-dynamic behaviour in social network graphs.

Haaya Naushan
Towards Data Science

Gephi visualization of retweeting behaviour of Twitter influencers over time. Image by Author.

When it comes to analyzing social networks, my previous articles have primarily been about natural language processing (NLP), or more specifically Arabic NLP. Tweets, however, are more than just text data; they also represent network connections between Twitter users. Adding network analysis allows for a synthesis of the content and the actions in social media data, so combining network and text data creates a far more nuanced understanding of a social media network.

My Python-learning journey began out of necessity: my goal was to animate a Twitter network graph, and coding appeared to be the solution. Hence, my first-ever script was a tearful fight with pandas, all in an effort to create a simple csv that would be accepted by Gephi, a popular open-source graph visualization tool. Months later, reading my earliest code was a reminder of my perseverance, and improving that first script was a lesson in humility.

Recently, I used that improved script while working on a Lebanon investigation in collaboration with researchers at the World Bank. Guided by regional knowledge and expertise, I was analyzing a specific Twitter hashtag, “#لبنان_ينتفض” (roughly, “Lebanon rises up”). The data had been collected over a long period of time: tracking started with the first protests of the 2019 “October Revolution”, continued through the August 2020 Beirut explosion, and ended in November 2020.

As part of the investigation, I was observing the Twitter influencer retweet network, and I had been advised to focus on selected periods of interest, so my initial approach was to create static snapshots. The temporal nature of the data, however, motivated me to visualize the time-dynamics of the retweeting behaviour. Thanks to my first-ever script, I was able to create several animated graphs like the example above, using only the most basic Python and Gephi. Following that, I made my animations of the time-dynamic Twitter network accessible by creating screen-capture gifs using CloudApp.

In this short data visualization tutorial, I will outline the steps and minimal code necessary to create animated graphs of network data. I will be using Twitter for my example, but the same process can be used for other social media networks. Since this tutorial focuses on data visualization, I will skip an explanation of the data collection process. Instead, I have shared a gist with my beginner Twitter scraping script, which is sufficient to collect data for the purposes of this tutorial.

Gephi has several options for loading network data from a database or as graph file types such as .graphml or .gexf. For dynamic graphs, however, the simplest option is to load data into Gephi from correctly labeled and formatted spreadsheets. In network graph terminology, “nodes” represent individual Twitter users and “edges” represent the retweet connections between users. I start with nodes and edges csv files, created with networkx in Python from raw unprocessed Twitter data. This excellent Medium post explains how to get started with visualizing a Twitter network, including how to create nodes and edges with networkx.
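To make the nodes and edges terminology concrete, here is a minimal sketch of building a directed retweet graph with networkx. The user names and the edge direction (retweeter pointing to the original author) are assumptions for illustration, not details taken from the investigation data.

```python
import networkx as nx

# Hypothetical (retweeter, original author) pairs pulled from raw tweets.
retweets = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("alice", "carol"),
    ("dave", "bob"),
]

# A directed graph keeps track of who retweeted whom.
graph = nx.DiGraph()
graph.add_edges_from(retweets)

nodes = sorted(graph.nodes())          # one entry per Twitter user
edges = sorted(graph.edges())          # one entry per retweet connection
in_degree = dict(graph.in_degree())    # how many users retweeted each account
```

In this toy graph, the in-degree of a node counts how many distinct users retweeted that account, which is one simple way to spot influencers.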

The code snippet below shows how to load the Twitter data from the nodes and edges csv files, so that they can be properly labeled and formatted in Python.

Python code snippet to load nodes and edges csv files.
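A minimal sketch of this loading step might look like the following. The function name `load_network` is an assumption for illustration and is not taken from the original gist.

```python
import pandas as pd


def load_network(nodes_path, edges_path):
    """Read the raw nodes and edges csv files into two dataframes."""
    # Reading every column as text keeps long Twitter ids intact;
    # numeric columns can be converted later as needed.
    nodes = pd.read_csv(nodes_path, dtype=str, encoding="utf-8")
    edges = pd.read_csv(edges_path, dtype=str, encoding="utf-8")
    return nodes, edges
```

It would be called with your own file paths, for example `nodes, edges = load_network("nodes_raw.csv", "edges_raw.csv")`, where both file names are hypothetical.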

Once the raw Twitter data has been loaded, I process the nodes first and then the edges, saving each as a csv file.

Gephi requires a nodes spreadsheet whose first column is named “Id” and contains the Twitter user ids; the second column should be named “Label” and contain the Twitter user screen names. All other columns represent node attributes and are optional. In the code example below, I include a column for the Louvain cluster (as determined by the Louvain community detection algorithm, implemented in networkx) and the Twitter user follower count. The final step for processing nodes is to save the nodes dataframe as a csv, so that it can later be imported into Gephi.

Python code snippet to process nodes for Gephi.
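As a rough sketch of the node formatting described above: the raw column names (`user_id`, `screen_name`, `louvain_cluster`, `followers_count`), the attribute names “Cluster” and “Followers”, and the sample rows are all assumptions for illustration; only “Id” and “Label” are fixed by Gephi.

```python
import pandas as pd


def format_nodes(nodes):
    """Return a nodes table with the column names and order Gephi expects."""
    # Hypothetical raw column names; adjust the mapping to your own data.
    renamed = nodes.rename(
        columns={
            "user_id": "Id",
            "screen_name": "Label",
            "louvain_cluster": "Cluster",
            "followers_count": "Followers",
        }
    )
    # "Id" and "Label" come first; the remaining columns are optional attributes.
    formatted = renamed[["Id", "Label", "Cluster", "Followers"]].copy()
    formatted["Id"] = formatted["Id"].astype(str)
    # Each user should appear only once in the nodes table.
    return formatted.drop_duplicates(subset="Id")


# Made-up sample rows standing in for the raw Twitter data.
raw_nodes = pd.DataFrame(
    {
        "user_id": ["101", "102", "103"],
        "screen_name": ["alice", "bob", "carol"],
        "louvain_cluster": [0, 0, 1],
        "followers_count": [1500, 230, 48000],
    }
)
gephi_nodes = format_nodes(raw_nodes)

# With no path, to_csv returns the csv as a string; pass a path such as
# "gephi_nodes.csv" as the first argument to write a file instead.
csv_text = gephi_nodes.to_csv(index=False)
```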

Next comes the edges spreadsheet. As with the nodes, Gephi expects specifically labeled and ordered columns during import. The first two required columns are “Source” and “Target”, representing the Twitter user pair engaged in retweeting. The third column should be “Type”, which for this Twitter example is “directed” since we are dealing with retweets. The fourth column should be “Label”, which in this case is a simple index. The fifth column is the most important: it should be named “Timeset” and contain the creation time of the retweet, specifically in ISO format. The “Timeset” column is the time variable and will be used to animate the network graph in Gephi. The last column, “Weight”, is optional; Gephi assumes a value of “1” by default. Finally, the edges dataframe can be saved as a csv for import into Gephi.

Python code snippet to process edges for Gephi.
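A minimal sketch of the edge formatting could look like this. The raw column names (`retweeter_id`, `original_author_id`, `created_at`), the choice of the retweeter as “Source”, and the sample rows are assumptions for illustration; the six output columns follow the order described above.

```python
import pandas as pd


def format_edges(edges):
    """Return an edges table with the columns Gephi expects, in order."""
    edges = edges.reset_index(drop=True)
    # Parse the creation time, then write it back out in ISO format.
    created = pd.to_datetime(edges["created_at"], utc=True)
    return pd.DataFrame(
        {
            # Assumed direction: the retweeter points to the original author.
            "Source": edges["retweeter_id"].astype(str),
            "Target": edges["original_author_id"].astype(str),
            "Type": "directed",
            "Label": list(range(len(edges))),  # a simple running index
            "Timeset": created.dt.strftime("%Y-%m-%dT%H:%M:%S"),
            "Weight": 1,
        }
    )


# Made-up sample rows standing in for the raw retweet data.
raw_edges = pd.DataFrame(
    {
        "retweeter_id": ["101", "103"],
        "original_author_id": ["102", "102"],
        "created_at": ["2019-10-17 18:05:00", "2020-08-04 15:10:30"],
    }
)
gephi_edges = format_edges(raw_edges)

# With no path, to_csv returns the csv as a string; pass a path such as
# "gephi_edges.csv" as the first argument to write a file instead.
csv_text = gephi_edges.to_csv(index=False)
```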

Now that the nodes and edges spreadsheets have been formatted and labeled, they are ready to import into Gephi with the “Import spreadsheet” menu option. Starting with nodes, the screenshot below shows the first import menu, where the separator should be “Comma”, the import option should be “Nodes table” and the encoding should be “UTF-8”.

Gephi spreadsheet import screen for nodes. Image by Author.

The next step is selecting the columns to import and assigning data types for the optional attribute columns. Make sure to select “Timestamps” in the drop-down menu for Time representation; this is important later, when we import the timestamped edges.

The nodes import process finishes by adding the nodes to a new Gephi workspace.

Next, add the edges from the edges spreadsheet. The edges import process seen in the screenshot below is similar to the nodes import process; the only difference is selecting the import option of “Edges table”.

Gephi spreadsheet import screen for edges. Image by Author.

The following screenshot shows the second import screen where we specify the time representation as “Timestamps” and assign the data type of the “Timeset” column as “TimestampSet”.

The final step of the edges import process is to append the edges to the existing workspace created when we imported the nodes. Within Gephi’s Data Laboratory, the edges table should be visible as seen in the screenshot below, and the “Timestamp” will appear in ISO format.

Gephi screenshot of edges table. Image by Author.

As is my usual procedure, I worked on the imported network graph by applying a force-directed algorithm to lay out the nodes, and I picked which attributes to use for colouring and sizing the nodes. In the “Overview” window seen below, the designed network graph is shown with a wide bar underneath displaying the option “Enable timeline”.

Gephi screenshot of Twitter retweet network graph. Image by Author.

Simply select “Enable timeline”, which reveals a ticked numeric time bar, as seen below. The settings wheel in the bottom left corner allows for setting a time format, from which I select “Datetime”.

Gephi screenshot of adjusting the time format settings. Image by Author.

All that is left is to use the cursor to select the size of the time interval window, and press play.

Use the cursor to adjust the size of the time window and drag to the desired start position. Press play to start the animation.

As I previously mentioned, I used CloudApp to create screen-capture gifs of the animated networks, which I then shared with my co-authors. That’s it! With a little Python code, it is easy to modify network data so that it can be animated in Gephi.

I hope that this tutorial was helpful. If it was, consider leaving a comment below so I know there is interest in this topic and/or this style of post. All questions and comments are welcome, and feel free to connect with me on LinkedIn.

Data Scientist enthusiastic about machine learning, social justice, video games and philosophy.