
The pandemic of Dataville: 5 insights that will surprise you

How data scientists use simulations to reveal unintuitive and interesting insights

Simulating pandemics across complex networks

Photo by Edwin Hooper on Unsplash

Introduction

One of the more exciting and rewarding exercises (in my opinion) is exploration. It often brings out unintuitive insights that can "wow" and inform your audience – something you’ll definitely want to do as a data scientist.

In this article, we'll be exploring a pandemic in a fictional town using a simulation. Keep reading to reveal the not-so-obvious insights.

But before going any further, it’s important for the reader to remember that models are simplifications of reality loaded with assumptions.

"All models are wrong, but some are useful" – George Box


Welcome to Dataville

Dataville is a small town of 10,000 people. It's a close-knit community with relatively few degrees of separation between neighbours. Recently, some of the residents have come down with a mysterious virus.

The virus appears to be passed from person to person through close contact. The people of Dataville are concerned and news outlets are warning of a potential pandemic.

Dataville’s top scientists analysed the virus and saw that it is resistant to existing antibodies; nobody has immunity. So far, 5% of Dataville’s residents have been infected.

The government of Dataville have hired data scientists to model the spread to help in containing the virus.


Tools

The simulation is built end to end in Python using a Jupyter notebook. I’ll link you to my GitHub page where you can grab the code and play around with it yourself (see end of article).

I've built the simulation on the Network Diffusion Library (NDlib), a framework for simulating, describing and studying diffusion processes on complex networks. It's very intuitive and easy to set up. I would highly recommend it.


Modelling Methodology

Modelling methodology isn't the most exciting thing to read or write about, but it's important. I want you to challenge the model and see where you could improve it. Keep this in mind while you are reading.

I’ve made several simplifying assumptions about the virus and the network across which it is transmitted.

Social Network Model: I have used a mathematical graph to represent Dataville, its citizens and their relationships. A citizen of Dataville is represented by a node in the graph, and their connections/relationships to other citizens are represented by edges.

Image by Author: a basic mathematical graph

The spread of the virus is simulated on a complex network of 10,000 people modelled as a Barabási-Albert graph. A Barabási-Albert graph is a random graph that is grown by preferential attachment.

Preferential attachment means that nodes in the network are proportionally more likely to have edges to nodes with more edges. This is a phenomenon frequently observed in social networks. Put simply, popular people have more friends but not many people are popular.
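To make preferential attachment concrete, here is a minimal pure-Python sketch of a Barabási-Albert-style growth process. It is an illustration only, not the code behind the article's network: the function name `preferential_attachment_graph`, the starting star graph and the small `n` used in the example are all assumptions made for this sketch. In practice a library generator such as NetworkX's `barabasi_albert_graph` would normally be used instead.

```python
import random


def preferential_attachment_graph(n, m, seed=None):
    """Grow a graph of n nodes in which each new node links to m existing
    nodes, chosen with probability proportional to their current degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    adjacency = {node: set() for node in range(n)}
    # `endpoints` lists every node once per edge it touches, so a uniform
    # draw from it is a draw proportional to degree.
    endpoints = []
    # Assumed starting point: a small star, node 0 joined to nodes 1..m.
    for leaf in range(1, m + 1):
        adjacency[0].add(leaf)
        adjacency[leaf].add(0)
        endpoints += [0, leaf]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            adjacency[new_node].add(target)
            adjacency[target].add(new_node)
            endpoints += [new_node, target]
    return adjacency


# A smaller town than Dataville, to keep the example quick to run.
graph = preferential_attachment_graph(n=2_000, m=10, seed=42)
degrees = [len(neighbours) for neighbours in graph.values()]
print("median connections:", sorted(degrees)[len(degrees) // 2])
print("most connected node:", max(degrees))
```

The gap between the median and the maximum is the "popular people have more friends, but not many people are popular" effect: most nodes stay close to `m` connections while a handful of early nodes become hubs.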

To visualise the network, here's the distribution of the number of connections per node. Notice that it roughly follows a power law.

Image by Author: distribution of node edges

Note: If you’re not familiar with graph theory here’s a nice and friendly article on graphs.

I've assumed that the network is static, meaning that there are no changes to its topology. In reality, social networks are dynamic.

Finally, I'll assume that in the course of daily life each person in the network interacts with at least 10 other people.

In other words, Dataville is modelled as a small but busy town.

States and Behaviours: People can exist in one of three states: Susceptible, Infected or Deceased. The following state changes are permitted:

1. Susceptible -> Infected

2. Infected -> Susceptible

3. Infected -> Deceased

A node must have an edge connecting it to an infected node to trigger state change 1. This is a sensible rule, since we know that viruses spread via close contact.
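The three permitted state changes and the infected-neighbour rule can be written down as a small check. This is a sketch under assumed names (`State`, `ALLOWED`, `can_transition`); it only encodes which changes are possible, not how likely they are, and it is not how NDlib represents them internally.

```python
from enum import Enum


class State(Enum):
    SUSCEPTIBLE = "Susceptible"
    INFECTED = "Infected"
    DECEASED = "Deceased"


# The only state changes the model permits.
ALLOWED = {
    (State.SUSCEPTIBLE, State.INFECTED),
    (State.INFECTED, State.SUSCEPTIBLE),
    (State.INFECTED, State.DECEASED),
}


def can_transition(node, new_state, states, adjacency):
    """Return True if `node` is allowed to move to `new_state` right now."""
    if (states[node], new_state) not in ALLOWED:
        return False
    if new_state is State.INFECTED:
        # Infection needs at least one infected neighbour.
        return any(states[nb] is State.INFECTED for nb in adjacency[node])
    return True


# A three-person chain: 0 - 1 - 2, with only person 0 infected.
adjacency = {0: {1}, 1: {0, 2}, 2: {1}}
states = {0: State.INFECTED, 1: State.SUSCEPTIBLE, 2: State.SUSCEPTIBLE}

print(can_transition(1, State.INFECTED, states, adjacency))  # True
print(can_transition(2, State.INFECTED, states, adjacency))  # False
print(can_transition(1, State.DECEASED, states, adjacency))  # False
```

Person 2 cannot be infected yet because their only contact is still healthy, and nobody can go straight from Susceptible to Deceased.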

Virus Assumptions: The virus does not mutate, so it does not become more or less infectious over time, and people do not develop an immune response.

Initial condition: The initial rate of infections is 5%.

Mortality rate: There is a 2.86% chance of a person succumbing to the virus once infected.

Recovery rate: Assumed to be 1 - mortality rate, i.e. 97.14%.

Transmission rate: The probability of passing on the virus is 6.4% once infected.
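For reference, the baseline assumptions above can be gathered in one place. The class and field names below are hypothetical and chosen for this sketch only; the numbers are the ones stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirusParameters:
    """Baseline assumptions (field names are illustrative)."""
    population: int = 10_000
    initial_infected_fraction: float = 0.05
    mortality_rate: float = 0.0286
    transmission_rate: float = 0.064

    @property
    def recovery_rate(self):
        # Baseline assumption: recovery rate = 1 - mortality rate.
        return 1.0 - self.mortality_rate

    @property
    def initial_infected(self):
        return round(self.population * self.initial_infected_fraction)


baseline = VirusParameters()
print(f"recovery rate: {baseline.recovery_rate:.4f}")    # 0.9714
print("initially infected:", baseline.initial_infected)  # 500
```

Tying the recovery rate to the mortality rate only holds for the baseline; a scenario that sets the recovery rate directly would need it as an independent field.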


Simulations

Now that all the assumptions have been laid out, we can have some fun simulating some scenarios for the virus.

Important: all simulations are run for 1,000 iterations. An iteration is an arbitrary unit of time and should not be assumed to be days, months, seconds, etc.
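To show what "running for a number of iterations" means mechanically, here is a self-contained, standard-library sketch of an iteration loop for a Susceptible/Infected/Deceased process. It is not the NDlib code used for the figures in this article. Several details are assumptions made for this sketch: the stand-in random network (`random_contact_graph`), the rule that each infected neighbour transmits independently, and the way recovery and death are drawn each iteration. Its numbers should not be expected to match the article's results.

```python
import random

S, I, D = "Susceptible", "Infected", "Deceased"


def random_contact_graph(n, contacts, rng):
    """Hypothetical stand-in network: every node picks some random contacts."""
    adjacency = {node: set() for node in range(n)}
    for node in range(n):
        for other in rng.sample(range(n), contacts):
            if other != node:
                adjacency[node].add(other)
                adjacency[other].add(node)
    return adjacency


def step(adjacency, states, transmission, recovery, mortality, rng):
    """Apply one synchronous update and return the new state of every node."""
    new_states = dict(states)
    for node, state in states.items():
        if state == S:
            infected = sum(states[nb] == I for nb in adjacency[node])
            # Assumption: each infected neighbour transmits independently.
            if rng.random() < 1 - (1 - transmission) ** infected:
                new_states[node] = I
        elif state == I:
            draw = rng.random()
            if draw < mortality:
                new_states[node] = D
            elif draw < mortality + recovery:
                new_states[node] = S  # no immunity: back to Susceptible
    return new_states


def simulate(adjacency, iterations, transmission, recovery, mortality,
             initial_infected_fraction, seed=None):
    """Run the process and return the share of each state per iteration."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    seeds = set(rng.sample(nodes, round(len(nodes) * initial_infected_fraction)))
    states = {node: (I if node in seeds else S) for node in nodes}
    history = []
    for _ in range(iterations):
        states = step(adjacency, states, transmission, recovery, mortality, rng)
        counts = {label: 0 for label in (S, I, D)}
        for state in states.values():
            counts[state] += 1
        history.append({label: c / len(nodes) for label, c in counts.items()})
    return history


town = random_contact_graph(n=500, contacts=10, rng=random.Random(0))
history = simulate(town, iterations=100, transmission=0.064,
                   recovery=1 - 0.0286, mortality=0.0286,
                   initial_infected_fraction=0.05, seed=1)
print({label: round(share, 3) for label, share in history[-1].items()})
```

Because Deceased is the only state nobody leaves, its share can only stay flat or rise from one iteration to the next, which is the shape of the deceased curve in each scenario.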

Scenario 1: What if we let the virus run its course?

Many people in Dataville have argued that we should simply let the virus run its course. After all, the mortality rate is relatively low compared to, say, Ebola.

They point out that social restrictions damage the economy and adversely affect mental and physical health, and argue that these effects will be more devastating than the virus itself.

Let’s simulate this and see what insights we can gather.

Scenario 1: Image by Author

After 1,000 iterations, the proportion of people infected with the virus falls to zero. However, the virus has devastated our population, leaving around 35% deceased.

So unrestricted spreading of a virus amongst a population where there is no immunity doesn’t appear to be an optimal solution, even when the risk of mortality appears to be low.

Scenario 2: What if the virus was as deadly as Ebola?

Dataville’s scientists have warned of a new strain of the virus with a mortality rate similar to the Ebola virus.

It’s estimated that the Ebola virus fatality rate is around 50% (WHO). Let’s adjust the mortality rate to 50% and re-run the simulation.

Scenario 2: Image by Author

As you probably expected, this more deadly form kills off a significantly higher proportion of our population (around 50%). Interestingly, the number of infected people flattens out to zero very quickly (after fewer than 50 iterations). Because of the high mortality rate, the virus isn't able to spread as far. Don't forget our simulation does not allow the deceased to pass the virus on.

This deadly virus would be devastating but short-lived, likely wiping itself out before it could wipe out our population.

Scenario 3: What happens if the virus is more transmissible?

Dataville’s scientists speculate that the virus becomes more transmissible in summer because hot air currents carry it further.

Let’s model this by increasing the transmission rate to 20%. We will move the mortality rate back down to its baseline value.

Scenario 3: Image by Author

The increased transmission rate causes an early spike in cases, which then flattens out to zero at around 600 iterations. The impact is devastating, with over 70% of the population succumbing to the virus.

Scenario 4: What would happen if people recovered more slowly from the virus?

Dataville's scientists discover that people are recovering at a much slower rate than initially thought. Let's decrease the recovery rate to 10%. As usual, we will reset the other parameters to the baseline.

Scenario 4: Image by Author

This turns out to be our most devastating scenario so far, leaving 90% of our population deceased by the end of the simulation.

Decreasing the recovery rate causes a build-up of infections, peaking at around 30% of the population. This build-up is only subdued by the high number of deaths, which brings the number of infections down to zero by the 400th iteration. Not good!

Scenario 5: How effective would a vaccine be?

Dataville’s government is planning on rolling out a vaccine. Its effectiveness is said to be around 95%.

Vaccines work by granting the receiver some immunity to the virus (more on vaccines here). People who are immune are less likely to transmit the virus. Therefore, we can account for vaccines in our simulation by scaling down the transmission rate. We will do this for scenario 4, our worst case, to see what the impact would be.
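One simple way to scale the transmission rate is to multiply it by the share of transmission the vaccine fails to block. The exact scaling used for the figure is not spelled out here, so the formula below, the function name `vaccinated_transmission_rate` and the assumption of full vaccine coverage are all illustrative.

```python
def vaccinated_transmission_rate(transmission_rate, effectiveness, coverage=1.0):
    """Scale a transmission rate down for a vaccinated population.

    Assumes the vaccine blocks a fraction `effectiveness` of transmission
    among the fraction `coverage` of people who receive it.
    """
    if not (0 <= effectiveness <= 1 and 0 <= coverage <= 1):
        raise ValueError("effectiveness and coverage must be between 0 and 1")
    return transmission_rate * (1 - effectiveness * coverage)


baseline_transmission = 0.064  # the 6.4% baseline transmission rate
scaled = vaccinated_transmission_rate(baseline_transmission, effectiveness=0.95)
print(f"scaled transmission rate: {scaled:.4f}")  # 0.0032
```

Under these assumptions, a 95% effective vaccine turns a 6.4% transmission rate into roughly 0.32%, a twentyfold reduction.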

Scenario 5: Image by Author

The results are incredible. We have gone from 90% of our population deceased to less than 3%. The virus is effectively a non-issue with our population vaccinated.


Final Thoughts

This wasn't a plug for vaccinations, and the results of this exercise should not be taken too seriously. After all, these models are simplistic and Dataville doesn't exist.

However, I think you’ll agree that the simulations have revealed some rather unintuitive outcomes. For example, which one of us would have guessed that lower recovery rates could have such a devastating impact – even more so than mortality rates?

Ultimately, simulating complex phenomena can help guide our decision making.

I hope more than anything you’ll see the usefulness of simulations for exploring these phenomena.

Feel free to play around with the code yourself in the GitHub link below.

john-adeojo/Pandemic_Simulation

John Ade-Ojo – Data Science | Tech | Banking & Finance | LinkedIn

