NetworkX: Code Demo for Manipulating Subgraphs

Jinhang Jiang
Towards Data Science
4 min read · Jun 6, 2021

Introduction

NetworkX is a Python library for studying graphs and networks. This code demo shows how we used NetworkX to compare subgraphs and how we adjusted the parameters for drawing the graphs. If you are new to NetworkX, it will give you an idea of how to convert a weighted edge list to a NetworkX graph and what you can do when you want to study a specific node in a very complex network, especially when edge weights are available.

Data

We use an Electronic Health Records (EHR) dataset that includes the diagnoses for patients admitted to hospitals in Arizona from the second half of 2019 to the second half of 2020. The dataset contains nearly 9,000 unique diagnoses, each labeled with an ICD-10 code. A patient may receive more than one diagnosis during a hospital stay. By studying the EHR, we can capture the co-occurrence relationships between diseases or diagnoses.

ICD-10 (https://icdcodelookup.com/icd-10/codes) is the 10th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD), a medical classification list by the World Health Organization (WHO) that covers diseases, signs and symptoms, abnormal findings, complaints, social circumstances, and external causes of injury or disease.

We manually split the dataset into three time periods (the second half of 2019, the first half of 2020, and the second half of 2020) to study how the disease network changed across the different stages of the pandemic. To use NetworkX with the data, we first converted it into a weighted edge list with three columns: Source, Target, and Weights.

Figure 1: Sample Data
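
The preprocessing code is not part of this post. As a rough sketch of how such an edge list could be built, the snippet below counts how often two diagnoses appear for the same patient. The column names patient_id and icd10, the helper name build_edge_list, and the use of raw co-occurrence counts as weights are all assumptions made for illustration; the toy records are invented.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def build_edge_list(records: pd.DataFrame) -> pd.DataFrame:
    """Count how often each pair of diagnoses is recorded for the same patient.

    Hypothetical helper: assumes one row per (patient_id, icd10) record.
    """
    pair_counts = Counter()
    for _, codes in records.groupby("patient_id")["icd10"]:
        # Sort the unique codes so (a, b) and (b, a) are counted as one pair.
        for source, target in combinations(sorted(set(codes)), 2):
            pair_counts[(source, target)] += 1
    rows = [(s, t, w) for (s, t), w in pair_counts.items()]
    return pd.DataFrame(rows, columns=["Source", "Target", "Weights"])


# Toy records, invented for illustration only.
toy_records = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "icd10": ["u071", "i4821", "j1289", "u071", "j1289", "i4821"],
})
edge_list = build_edge_list(toy_records)
print(edge_list)
```

Raw counts are only one possible weighting. If your weights should fall between 0 and 1, you would normalize the counts before building the graph.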

Code

Convert the edge list to a NetworkX graph:
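
The original listing is not reproduced in this text, so here is a minimal sketch of what a convert_graph() function could look like. It assumes the edge list is a pandas DataFrame with the Source, Target, and Weights columns described above, that the graph is undirected, and that the weights are stored under NetworkX's conventional "weight" key. The toy edge list is invented.

```python
import networkx as nx
import pandas as pd


def convert_graph(edge_list: pd.DataFrame) -> nx.Graph:
    """Turn a Source/Target/Weights table into an undirected weighted graph."""
    # Rename to "weight" so NetworkX functions pick the attribute up by default.
    renamed = edge_list.rename(columns={"Weights": "weight"})
    return nx.from_pandas_edgelist(
        renamed, source="Source", target="Target", edge_attr="weight"
    )


# Toy edge list, invented for illustration only.
toy_edges = pd.DataFrame({
    "Source": ["u071", "u071", "i4821"],
    "Target": ["j1289", "i4821", "j1289"],
    "Weights": [0.8, 0.3, 0.05],
})
graph = convert_graph(toy_edges)
print(graph.number_of_nodes(), graph.number_of_edges())
print(graph["u071"]["j1289"]["weight"])
```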

Figure 2: Sample outputs of convert_graph()

Draw the whole graph with the edge weights
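
The drawing code is also missing from this text. One way to do it, sketched under the assumption that each edge carries a numeric "weight" attribute, is to scale the line width of every edge by its weight. The function name draw_graph and the width_scale and seed parameters are hypothetical.

```python
import matplotlib.pyplot as plt
import networkx as nx


def draw_graph(graph: nx.Graph, width_scale: float = 5.0, seed: int = 42):
    """Draw every node and edge, with line width proportional to edge weight."""
    # A fixed seed makes the spring layout repeatable between runs.
    pos = nx.spring_layout(graph, seed=seed)
    widths = [width_scale * data["weight"] for _, _, data in graph.edges(data=True)]
    fig, ax = plt.subplots(figsize=(8, 8))
    nx.draw_networkx(graph, pos=pos, ax=ax, width=widths, node_size=300, font_size=8)
    ax.set_axis_off()
    return fig, ax


# Toy graph, invented for illustration only.
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from(
    [("u071", "j1289", 0.8), ("u071", "i4821", 0.3), ("i4821", "j1289", 0.05)]
)
fig, ax = draw_graph(toy_graph)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the figure.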

Here is a sample output:

Figure 3: Sample graph

As Figure 3 shows, the graph is very complex and hard to read because of its large number of nodes and edges. If your network is small and simple, the plot should already be easy to read at this point.

Study a certain node in the graph

So, if we want to look specifically at what is going on with the coronavirus in the first half of 2020 (the code u071 was created to represent COVID-19), all we need to type is:

drawnodegraph(graph2, "u071", weightbar=0.1)

Figure 4: The disease network for u071
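
The definition of drawnodegraph() is not included in this text. The sketch below shows one way such a helper could work. It assumes that weightbar is a minimum edge weight, that only the edges touching the chosen node are drawn, and that weights live under the "weight" key. The toy graph is invented.

```python
import matplotlib.pyplot as plt
import networkx as nx


def drawnodegraph(graph: nx.Graph, nodename: str, weightbar: float = 0.0):
    """Draw one node with only the neighbors whose edge weight is >= weightbar."""
    # Build a star-shaped subgraph around the node of interest.
    star = nx.Graph()
    star.add_node(nodename)
    for neighbor, data in graph[nodename].items():
        if data["weight"] >= weightbar:
            star.add_edge(nodename, neighbor, weight=data["weight"])

    pos = nx.spring_layout(star, seed=42)
    widths = [5 * d["weight"] for _, _, d in star.edges(data=True)]
    fig, ax = plt.subplots(figsize=(6, 6))
    nx.draw_networkx(star, pos=pos, ax=ax, width=widths, node_color="lightblue")
    ax.set_title(f"Neighbors of {nodename} (weight >= {weightbar})")
    ax.set_axis_off()
    return star, ax


# Toy graph, invented for illustration only.
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from([
    ("u071", "j1289", 0.8),
    ("u071", "i4821", 0.3),
    ("u071", "e119", 0.05),   # falls below weightbar=0.1, so it is dropped
    ("i4821", "j1289", 0.2),  # does not touch u071, so it is not drawn
])
star, ax = drawnodegraph(toy_graph, "u071", weightbar=0.1)
print(sorted(star.nodes))
```

Raising weightbar keeps only the strongest associations, which is what makes a single node readable inside a dense graph.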

Usage Example (time-evolving graph comparison):

Let’s say we want to study how the disease network for “Permanent atrial fibrillation” (i4821) changed across the three time periods of the pandemic. Using the code above, we get:

Figure 5: Network for i4821 before the pandemic
Figure 6: Network for i4821 at the beginning of the pandemic
Figure 7: Network for i4821 during the outbreak of the pandemic

It is evident that, before the pandemic, only a few diseases were significantly associated with permanent atrial fibrillation (i4821). However, as time went on in 2020, the network became more and more complex.
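
The same comparison can be made numerically instead of visually. As a small sketch, assuming one graph per time period with weights under the "weight" key, we can count the neighbors of a node that clear a weight threshold in each period. The helper name strong_neighbors is hypothetical, and the toy graphs and weights below are invented, not taken from the EHR data.

```python
import networkx as nx


def strong_neighbors(graph: nx.Graph, nodename: str, weightbar: float = 0.0) -> set:
    """Return the neighbors of nodename whose edge weight is >= weightbar."""
    if nodename not in graph:
        return set()
    return {n for n, d in graph[nodename].items() if d["weight"] >= weightbar}


def make_graph(edges):
    graph = nx.Graph()
    graph.add_weighted_edges_from(edges)
    return graph


# Toy graphs for three periods, invented for illustration only.
periods = {
    "2019 H2": make_graph([("i4821", "i10", 0.4), ("i4821", "e119", 0.05)]),
    "2020 H1": make_graph([("i4821", "i10", 0.4), ("i4821", "e119", 0.2)]),
    "2020 H2": make_graph(
        [("i4821", "i10", 0.5), ("i4821", "e119", 0.2), ("i4821", "u071", 0.3)]
    ),
}

neighbor_sets = {
    label: strong_neighbors(graph, "i4821", weightbar=0.1)
    for label, graph in periods.items()
}
sizes = {label: len(nodes) for label, nodes in neighbor_sets.items()}
gained = neighbor_sets["2020 H2"] - neighbor_sets["2019 H2"]
print(sizes)
print(sorted(gained))
```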

Conclusion

In this code demo, we showed how to use NetworkX to manipulate subgraphs. You can use the edge weights to change the width of the edges in the drawing. You can also make other transformations based on them, for example using the weights to change the size of the nodes. Finally, we presented one use of this code: comparing time-evolving graphs.
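
As an example of the node-size idea, here is a minimal sketch that sizes each node by the sum of the weights on its edges (its weighted degree). The function name node_sizes_from_weights and the base and scale parameters are hypothetical, and the toy graph is invented.

```python
import networkx as nx


def node_sizes_from_weights(graph: nx.Graph, base: float = 100.0, scale: float = 1000.0):
    """Return one marker size per node, growing with the node's weighted degree."""
    strength = dict(graph.degree(weight="weight"))
    # Keep the same order as graph.nodes so the sizes line up when drawing.
    return [base + scale * strength[node] for node in graph.nodes]


# Toy graph, invented for illustration only.
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from([("u071", "j1289", 0.5), ("u071", "i4821", 0.25)])
sizes = node_sizes_from_weights(toy_graph)
print(list(zip(toy_graph.nodes, sizes)))
```

The resulting list can be passed to nx.draw_networkx(toy_graph, node_size=sizes).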

Please feel free to connect with me on LinkedIn.

Related Reading:

Analyzing Disease Co-occurrence Using NetworkX, Gephi, and Node2Vec

I completed the blog under the guidance of Dr. Karthik Srinivasan, Assistant professor — Business Analytics, School of Business, University of Kansas.
