
Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.
First, I want to say that I consider it enormously important that online media give the scientific and technological communities the opportunity to show the world the efforts they are making in times as urgent as these. The information being generated must be disseminated at a speed that can compete with the spread of the pandemic caused by the SARS-CoV-2 virus, which causes the disease known as COVID-19. The virus has become a personal enemy of scientists, and for many of us (anyone who is not an epidemiologist or working in a related field) it is an unknown enemy that we have to invent ways of fighting. Of course, the best tools are physical distancing and the hygiene measures that have been repeated so often. But as scientists we have tools that can be used in one way or another in the fight against the epidemic, and we must try to make our contributions, however small they may be. That is what I believe, and that is why I present this small work, which is more of an invitation to explore the usefulness of the so-called Graph Convolutional Neural Networks (GCNNs) in studying the behavior of COVID-19. I must clarify that the results I present should not be taken as predictions: this is a very preliminary scheme, and factors other than those used here will surely have to be taken into account. That said, let's get to work!
- Install complementary libraries.
A relatively simple way to enter the world of GCNNs is the DGL library, which offers several implementations and good documentation (I encourage you to review the tutorial on the "Zachary's karate club" problem [1], which I follow for our case), so I will build on this library in what follows. I rely heavily on Google Colaboratory, or "Colab", because it allows rapid testing of prototype Python projects that are extremely portable, so the code I present works in a Colab notebook (see the end of this post for a link to the notebook).
First we need to install dgl with pip. We will also use pandas and numpy, so we import them as well.
Optionally, we can also install chart_studio and use Plotly to make interactive charts, as I showed in a previous post [2].
- Build the Graph.
A graph is made up of a set of nodes and the edges that define their connectivity. In a molecule, for example, the nodes would be the atoms and the edges would be the bonds between them [3]. If we want to transfer this idea to a geographic map, however, we must decide what the nodes and the edges are. One way is to take the states of a country as the nodes, but the edges can be several things. Here, I have decided to define an edge as a common border between two states. To do this, I have taken the map of the Mexican Republic, with 31 states plus Mexico City, and listed the connections between them. The following figure shows the map of Mexico with the numbering used in this work:

At this point, I wish there were an adjacency matrix associated with each country, which would make the process I just described easier. In any case, the connections that I determined ultimately form that adjacency matrix, encoded as source points (src) and destination points (dst). These can be passed to DGL as source and destination lists, making sure that every connection is included in both directions. This is accomplished with the following function:
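The function itself is not reproduced in this text, so here is a minimal sketch of the idea in plain NumPy rather than DGL. It uses only an illustrative handful of borders (the two peninsulas mentioned later in this post, plus Jalisco and Nayarit), not the full list, and the names BORDERS, bidirectional_edges and adjacency_matrix are placeholders of my own.

```python
import numpy as np

# Illustrative subset of border pairs only, NOT the full list used in this post.
# Node numbers follow the post: 1 and 2 form the Baja California peninsula,
# 30, 22 and 3 form the Yucatan peninsula, 13 is Jalisco and 17 is Nayarit.
BORDERS = [(1, 2), (30, 22), (30, 3), (22, 3), (13, 17)]
NUM_NODES = 32


def bidirectional_edges(pairs):
    """Return (src, dst) arrays that contain every border in both directions."""
    u = np.array([a for a, _ in pairs])
    v = np.array([b for _, b in pairs])
    src = np.concatenate([u, v])
    dst = np.concatenate([v, u])
    return src, dst


def adjacency_matrix(src, dst, num_nodes):
    """Dense 0/1 adjacency matrix equivalent to the (src, dst) edge list."""
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    adj[src, dst] = 1
    return adj


src, dst = bidirectional_edges(BORDERS)
A = adjacency_matrix(src, dst, NUM_NODES)
print(len(src), "directed edges; symmetric:", bool((A == A.T).all()))
```

Each border appears twice, once per direction, which is why the number of edges is double the number of borders and the adjacency matrix is symmetric. Arrays like src and dst are the kind of input a graph library such as DGL expects when edges are added.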
The graph G with this definition consists of 32 nodes and 264 edges and we can visualize it using networkx:
Giving this graph:

The resulting graph can be matched visually (with a little imagination) to the map of Mexico: nodes 1 and 2 form the Baja California peninsula, while nodes 30, 22 and 3 form the Yucatan peninsula. So for now everything seems to be going well.
- Featurization.
To use a GCNN we have to featurize the nodes. For that, I use a one-hot encoding scheme.
Each one-hot vector is multiplied by the population density of its state, taken from [4]. The following figure shows the population density of each state in 2015.
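The density-scaled one-hot featurization can be sketched as follows. This is a NumPy illustration under stated assumptions: the density values are random placeholders standing in for the 2015 figures from [4], and the variable names are my own.

```python
import numpy as np

NUM_NODES = 32

# Placeholder densities (inhabitants per km^2): random values stand in for
# the real 2015 figures from [4], which are not reproduced here.
rng = np.random.default_rng(0)
density = rng.uniform(10.0, 6000.0, size=NUM_NODES)

one_hot = np.eye(NUM_NODES)            # row i is the one-hot vector of node i
features = one_hot * density[:, None]  # scale row i by the density of node i

# The result is a 32 x 32 diagonal matrix with the densities on the diagonal.
print(features.shape)
```

With this scheme every node has its own feature dimension, so the feature matrix is square (N = D = 32), and the only information it carries beyond node identity is the density.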

For a semi-supervised approach, I take two nodes (states) for each of three levels (Low, Medium, High), following the official information in [5]. This information is stored in two tensors: labeled_nodes, which identifies the states, and labels, which gives the level of incidence (defined according to the situation as of today, April 17, 2020), with "0" for states with a low level, "1" for those with a medium level and "2" for those with a high level.
Here is the correspondence between the names of the states, their number and the associated level:
3 → CAMPECHE (0)
17 → NAYARIT (0)
13 → JALISCO (1)
18 → NUEVO LEÓN (1)
4 → CIUDAD DE MÉXICO (2)
1 → BAJA CALIFORNIA (2)
The GCNN learns from these labeled nodes and then tries to predict the levels of the remaining unlabeled nodes.
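The correspondence listed above translates directly into two index arrays. The sketch below uses NumPy arrays in place of the tensors mentioned in the text, and the train_mask helper is an addition of my own to make the labeled/unlabeled split explicit.

```python
import numpy as np

NUM_NODES = 32

# Node numbers and levels exactly as listed above:
# Campeche, Nayarit (0), Jalisco, Nuevo Leon (1), Ciudad de Mexico, Baja California (2).
labeled_nodes = np.array([3, 17, 13, 18, 4, 1])
labels = np.array([0, 0, 1, 1, 2, 2])

# Boolean mask that is True only for the six labeled states (hypothetical helper).
train_mask = np.zeros(NUM_NODES, dtype=bool)
train_mask[labeled_nodes] = True

unlabeled_nodes = np.flatnonzero(~train_mask)
print(len(labeled_nodes), "labeled nodes,", len(unlabeled_nodes), "to predict")
```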
- Graph Convolutional Neural Network.
According to Kipf [6], a multi-layer GCNN learns from a graph 𝒢=(𝒱,ℰ), taking as input 1) an N×D feature matrix X, where N is the number of nodes and D is the number of features per node; and 2) a matrix representation of the graph structure, such as the adjacency matrix A. The general form of a GCNN has the following propagation rule for a given neural network layer:

$$H^{(l+1)} = f\left(H^{(l)}, A\right)$$

where f is a non-linear function, $H^{(0)} = X$, and after L layers we obtain an N×F output feature matrix $Z = H^{(L)}$, where F is the number of output features per node. More details on GCNNs can be found in [3,6]. In our case, it is enough to keep in mind that the input has N = 32 nodes and that we want to obtain three categories (Low, Medium and High), thus F=3. Below is the code for the GCNN structure:
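As a rough illustration of the propagation rule, here is a two-layer forward pass written in NumPy. It uses the symmetric normalization described in [6], which is not necessarily what the DGL model in the notebook does; the hidden size, the random weights and the ring-shaped placeholder graph are all assumptions made for the example.

```python
import numpy as np


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the normalization described in [6]."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1)             # degree >= 1 thanks to the self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj, x, w0, w1):
    """Two-layer GCN: Z = A_norm @ relu(A_norm @ X @ W0) @ W1."""
    a_norm = normalize_adjacency(adj)
    h1 = np.maximum(a_norm @ x @ w0, 0.0)  # first layer with ReLU
    return a_norm @ h1 @ w1                # second layer returns raw logits


N, HIDDEN, F = 32, 8, 3  # HIDDEN = 8 is a hypothetical choice
rng = np.random.default_rng(0)

# Placeholder graph: a ring of 32 nodes, NOT the real border graph of Mexico.
adj = np.zeros((N, N))
idx = np.arange(N)
adj[idx, (idx + 1) % N] = 1
adj[(idx + 1) % N, idx] = 1

x = np.eye(N)  # plain one-hot features for the sketch
w0 = rng.normal(scale=0.1, size=(N, HIDDEN))
w1 = rng.normal(scale=0.1, size=(HIDDEN, F))

logits = gcn_forward(adj, x, w0, w1)
print(logits.shape)  # (32, 3)
```

Each layer mixes the features of a node with those of its neighbors before applying the weights, so after two layers the logits of a state depend on states up to two borders away.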
- Training.
The last piece of code is the training loop, where we use the Adam optimizer as implemented in PyTorch with a learning rate of 0.01. The GCNN is trained with the labels of the six labeled nodes; at each iteration, log_softmax and the loss function are applied, followed by backpropagation, just as in any neural network:
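The key point of the semi-supervised setup is that the loss is computed only on the six labeled nodes. The NumPy sketch below shows just that step (log_softmax followed by a negative log-likelihood); it leaves out the Adam update and backpropagation, and the logits are random placeholders rather than the output of a trained network.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def masked_nll_loss(logits, labeled_nodes, labels):
    """Mean negative log-likelihood, using only the rows of the labeled nodes."""
    logp = log_softmax(logits)
    picked = logp[labeled_nodes, labels]  # log-probability of each true class
    return -picked.mean()


labeled_nodes = np.array([3, 17, 13, 18, 4, 1])
labels = np.array([0, 0, 1, 1, 2, 2])

rng = np.random.default_rng(0)
logits = rng.normal(size=(32, 3))  # placeholder for the network output

loss = masked_nll_loss(logits, labeled_nodes, labels)
print(float(loss))
```

The 26 unlabeled states still influence the result through the graph convolution, but they contribute nothing to the loss itself.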
- Visualization.
We can make a DataFrame from the all_logits array for the last epoch. Then, for each node, we take the index of the maximum of the three logits. That index is the predicted level for the node.
I have found that Flourish (https://flourish.studio/) makes it very simple to plot maps with very good results, so this time I have chosen to use this tool. States in blue have a low level (0), those in red a medium level (1) and those in yellow a high level (2):
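Taking the index of the maximum logit per node is a one-line argmax. The sketch below uses NumPy directly instead of a DataFrame, and the final_logits array is a random placeholder, not the result of the trained network.

```python
import numpy as np

LEVELS = ["Low", "Medium", "High"]

rng = np.random.default_rng(0)
final_logits = rng.normal(size=(32, 3))  # placeholder for the last-epoch logits

predicted = final_logits.argmax(axis=1)  # index of the largest logit per node
predicted_names = [LEVELS[i] for i in predicted]

# Number of states assigned to each level.
counts = np.bincount(predicted, minlength=3)
print(dict(zip(LEVELS, counts.tolist())))
```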
What’s Next? 1) Identify more representative descriptors for the spread of COVID-19. For instance, it could be interesting to use the incidence rate per 100,000 inhabitants, which indicates how effectively the disease spreads, or the mobility in transport stations in the municipalities of each state. 2) Define the graph not by the borders between states but by the roads that connect the cities, weighting each edge by its traffic. 3) Review the predictions for other countries where the peak of contagion has already passed, such as China or Italy.
Conclusion: Taking the borders between states as links assumes that there is communication between the nodes, which may represent the fact that the flow of people between states has not been stopped. Clearly, stopping the movement of people breaks the links, and that would have to change the prediction of a neural network like the one presented here. The likely result is that a state with high rates would have a smaller effect on its neighbors.
Acknowledgement: I want to thank Jaime Raúl Adame Gallegos, PhD, from the Autonomous University of Chihuahua for his kind review and suggestions in the writing of this document.
The code shown above can be opened in Google Colaboratory at this link: https://github.com/napoles-uach/covid19mx/blob/master/dgl_map.ipynb
References:
[1] https://docs.dgl.ai/en/0.4.x/tutorials/basics/1_first.html
[2] https://medium.com/@jnapoles/tracking-covid19-with-pandas-and-plotly-open-source-graphing-library-maps-d24baf31aac4
[3] https://towardsdatascience.com/practical-graph-neural-networks-for-molecular-machine-learning-5e6dee7dc003
[4] https://www.inegi.org.mx/app/tabulados/interactivos/?px=Poblacion_07&bd=Poblacion
[5] https://covid19.sinave.gob.mx/
[6] http://tkipf.github.io/graph-convolutional-networks/