Infection Modeling — Part 1

Estimating the Impact of a Pathogen via Monte Carlo Simulation

Published in Towards Data Science · 6 min read · Jan 2, 2019

A word of warning: some of the ideas presented throughout this series may not be ethical. Debate about privacy, individual autonomy, equality, the ‘greater good’, and iatrogenics would certainly arise. But setting all of that aside makes for an interesting example of system modeling and optimization. So we will sidestep the ethics conversation for the sake of our own entertainment (and because these concepts can be readily applied to other domains that lack such moral gray areas).

Social networks are commonly perceived as synonymous with online social media, but they exist in the real world as well: our friend groups, co-workers, family, fellow grocery shoppers, etc. Analyzing these real-world social networks is valuable for election strategy, retail customer segmentation, or as we will see here, epidemiology.

In this work, we will look at a frequently used, simplified epidemic model and show how it can be combined with Monte Carlo simulations on a specific social network to estimate the impact of a pathogen. Later parts will go into optimization, system dynamics, and (importantly) a review of all over-simplifications, gross generalizations, and things to be taken with a grain of salt.

The SIR Epidemic Model

Quantitative models for epidemics exist in several forms, but they all track the fractions of the population belonging to some or all of these groups: susceptible (S), not immune and capable of contracting the pathogen; exposed (E), having come into contact with the pathogen; infected (I), currently infected with the pathogen; and removed (R), immune to the pathogen (via vaccine or post-exposure) or dead. In this work we will focus on the SIR model, whereby members of the population are either susceptible, infected, or removed.

The SIR model is governed by the differential equations in (1). Beta is the infection rate of the pathogen, and gamma is the recovery rate. Together, these two values give the basic reproduction number R0 = beta / gamma: the average number of secondary infections caused by an infected host.

Equation 1: SIR model differential equations. x(t) is the fraction of the population infected, s(t) is the fraction of the population susceptible, r(t) is the fraction of the population removed (recovered and/or immune, or worse, dead). Beta is the infection rate, and gamma is the recovery rate.
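The equation image itself is not reproduced in this text. Assuming Equation 1 is the standard SIR system written with the symbols defined in the caption above, it would read as follows (a reconstruction under that assumption, not the original figure):

```latex
\begin{aligned}
\frac{ds}{dt} &= -\beta\, s(t)\, x(t) \\
\frac{dx}{dt} &= \beta\, s(t)\, x(t) - \gamma\, x(t) \\
\frac{dr}{dt} &= \gamma\, x(t)
\end{aligned}
\qquad\qquad R_0 = \frac{\beta}{\gamma}
```

In this form the three derivatives sum to zero, so s(t) + x(t) + r(t) stays equal to one at all times.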

If the R0 value is greater than one, the infection rate is greater than the recovery rate, and thus the infection will grow throughout the population (as seen in Figure 1). If R0 is less than one, the infection will quickly die out, since people are healing faster than they are spreading it.

Figure 1: Typical SIR model applied to populations. gamma=0.17, beta=0.23, initial infected fraction=0.01
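Curves like those in Figure 1 can be approximated with a few lines of code. The sketch below assumes the standard SIR equations and integrates them with a simple forward-Euler step, using the parameter values from the caption. The function name `integrate_sir`, the step size, and the 150-unit time span are illustrative choices, not taken from the original.

```python
def integrate_sir(beta, gamma, x0, t_end, dt=0.1):
    """Forward-Euler integration of the standard SIR equations.

    Returns lists of time points and of the susceptible (s),
    infected (x), and removed (r) fractions.
    """
    s, x, r = 1.0 - x0, x0, 0.0
    ts, ss, xs, rs = [0.0], [s], [x], [r]
    steps = int(round(t_end / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * x * dt  # flow from S to I
        new_removals = gamma * x * dt       # flow from I to R
        s -= new_infections
        x += new_infections - new_removals
        r += new_removals
        ts.append(k * dt)
        ss.append(s)
        xs.append(x)
        rs.append(r)
    return ts, ss, xs, rs


beta, gamma = 0.23, 0.17  # values from the Figure 1 caption
r0 = beta / gamma         # basic reproduction number
t, s, x, r = integrate_sir(beta, gamma, x0=0.01, t_end=150)

print(f"R0 = {r0:.2f}")
print(f"peak infected fraction = {max(x):.3f}")
print(f"final removed fraction = {r[-1]:.3f}")
```

Because R0 is above one for these values, the infected fraction rises before it falls. A production model would more likely use an adaptive ODE solver, but Euler keeps the sketch dependency-free.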

The Network Approach to the SIR Model

The above SIR model is a powerful tool, but it addresses only an abstract ‘population’. What if you have more information about your community’s interactions? The proliferation of cellphones and wearable tech makes tracking movement and interactions more feasible. It is possible to establish a social network of the local population, and model processes based on this network. When concern about an epidemic is high, public health officials can model the spread of the infection through the network to get a better idea of what outcomes to expect and prepare for. The methodology presented here is similar to that of Liu et al. in their 2018 paper.

The infection simulation begins with n nodes initialized as infected to introduce the pathogen into the network. At each time-step, infected nodes have a p percent chance of infecting their neighbors, with p decaying each time-step to account for individuals becoming more careful as they realize they are sick. An infected node is removed from the network after r time-steps, signifying either recovery and post-infection immunity, or death, depending on the pathogen being modeled and/or how optimistic you, the modeler, are feeling.

Unlike the population-based SIR model, this is a stochastic process to be modeled via Monte Carlo simulation: repeated simulations with varying input parameters to generate a distribution of possible outcomes, rather than generating a single deterministic outcome. That is, p and r are generated from a probability distribution for each node, resulting in different outcomes for each simulation.

Construct the Infection Simulation

As an example of this, we will use a 62-node social network constructed from interactions within a pod of dolphins. Let’s take the example of an infection that has an average recovery time r of 30 days (standard deviation of 8 days), and an average transmission probability p of 6% (standard deviation of 1%) that decays by 10% each day of a node’s infection. Let’s further take the scenario that two nodes (n=2), chosen randomly, are the source of the infection in the network.
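The procedure described so far can be sketched in plain Python. Several things here are assumptions rather than details from the original: the dolphin network data is not included, so a random graph with 62 nodes stands in for it (`random_graph` and its `edge_prob` are hypothetical); p and r are drawn from clipped normal distributions; and the 10% daily decay is treated as multiplicative. Results from this stand-in graph will not match the figures reported in this article.

```python
import random
import statistics


def random_graph(num_nodes, edge_prob, rng):
    """Hypothetical stand-in for the dolphin network: a random graph."""
    neighbors = {v: set() for v in range(num_nodes)}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < edge_prob:
                neighbors[u].add(v)
                neighbors[v].add(u)
    return neighbors


def simulate_outbreak(neighbors, rng, n_seeds=2, p_mean=0.06, p_sd=0.01,
                      r_mean=30, r_sd=8, decay=0.10, days=150):
    """Run one stochastic outbreak; return how many nodes were ever infected."""
    nodes = list(neighbors)
    # Per-node transmission probability and recovery time
    # (assumed normal, clipped to valid ranges).
    p0 = {v: min(1.0, max(0.0, rng.gauss(p_mean, p_sd))) for v in nodes}
    recovery = {v: max(1, round(rng.gauss(r_mean, r_sd))) for v in nodes}

    state = {v: "S" for v in nodes}
    days_sick = {}  # infected node -> days since infection
    for v in rng.sample(nodes, n_seeds):
        state[v] = "I"
        days_sick[v] = 0

    for _ in range(days):
        newly_infected = set()
        for v, d in days_sick.items():
            p = p0[v] * (1.0 - decay) ** d  # assumed multiplicative decay
            for u in neighbors[v]:
                if state[u] == "S" and rng.random() < p:
                    newly_infected.add(u)
        # Age existing infections and remove nodes whose time is up.
        for v in list(days_sick):
            days_sick[v] += 1
            if days_sick[v] >= recovery[v]:
                state[v] = "R"
                del days_sick[v]
        for u in newly_infected:
            state[u] = "I"
            days_sick[u] = 0
        if not days_sick:
            break  # outbreak is over
    return sum(1 for v in nodes if state[v] != "S")


rng = random.Random(42)
graph = random_graph(62, edge_prob=0.08, rng=rng)
totals = [simulate_outbreak(graph, rng) for _ in range(1000)]

print(f"mean total infected: {statistics.mean(totals):.1f}")
print(f"std dev:             {statistics.stdev(totals):.1f}")
print(f"worst case:          {max(totals)} of {len(graph)} nodes")
```

Collecting the per-day S, I, and R counts inside the loop, rather than only the final total, would give the averaged curves as well.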

Running the modeling procedure above 1,000 times results in the average SIR response shown in the top graph of Figure 2. The probability density function of the total number of infected nodes during the epidemic is shown in the lower graph.

Figure 2: (above) averaged predictions of fraction of the population susceptible, infected, and removed. (below) PDF of the total number of nodes infected during the outbreak. The time span of the simulation is 150 days.

The averaged SIR response from the Monte Carlo simulations is quite dramatic, with roughly 80 percent of the population becoming infected at some point. The PDF of the final “removed” count shows how that total is distributed across the 1,000 simulations. As can be seen in Figure 3, it roughly fits a skewed normal distribution, with an average total of 50 infected nodes (standard deviation of 3.5).

Figure 3: PDF of total infected nodes roughly fits a skewed normal distribution

The worst case scenario was the infection reaching 59 out of the 62 nodes. But as far as tail events go, the lower-impact outliers seem to be more prevalent in the simulated outcomes than the high-impact outliers.

Solving the SIR differential equations in (1) for beta and gamma leads to the equations shown below in (2). As t increases, beta and gamma converge to their respective values.

Equation 2: Solving SIR equations from (1) for beta and gamma
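The image for Equation 2 is not reproduced in this text. As a hedged sketch, assuming the standard SIR form with s(t), x(t), and r(t) as defined for Equation 1, each rate can be isolated pointwise, and integrating both sides gives estimates that settle as t grows. The original may use a different but equivalent form.

```latex
\begin{aligned}
\beta &= -\frac{ds/dt}{s(t)\,x(t)}, &
\gamma &= \frac{dr/dt}{x(t)} \\[6pt]
\beta &= \frac{s(0) - s(t)}{\int_0^t s(\tau)\,x(\tau)\,d\tau}, &
\gamma &= \frac{r(t) - r(0)}{\int_0^t x(\tau)\,d\tau}
\end{aligned}
```

The integrated versions are exact when beta and gamma are constant. Applied to noisy simulation output, they act as running averages, which is one reading of why the estimates stabilize over time.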

With beta and gamma calculated from the Monte Carlo simulation results, we can then calculate the R0 value for each simulation to get an idea of how infectious this pathogen may be in the local population. The PDF of the calculated R0 values for the simulations is shown below in Figure 4.
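A minimal sketch of how beta, gamma, and R0 might be backed out of one run's S, I, R curves. It assumes the standard SIR form and uses integrated estimates: the total drop in s divided by the accumulated product s·x, and the total rise in r divided by the accumulated x. The function names `estimate_rates` and `euler_sir` and the synthetic test curve are illustrative; the calculation used in the original may differ in form.

```python
def estimate_rates(s, x, r, dt=1.0):
    """Estimate beta, gamma, and R0 from sampled SIR fractions.

    Uses left Riemann sums for the integrals of s*x and x.
    """
    sx_area = dt * sum(si * xi for si, xi in zip(s[:-1], x[:-1]))
    x_area = dt * sum(x[:-1])
    beta = (s[0] - s[-1]) / sx_area
    gamma = (r[-1] - r[0]) / x_area
    return beta, gamma, beta / gamma


def euler_sir(beta, gamma, x0, steps, dt=1.0):
    """Generate a synthetic SIR curve with known rates (for checking only)."""
    s, x, r = [1.0 - x0], [x0], [0.0]
    for _ in range(steps):
        infections = beta * s[-1] * x[-1] * dt
        removals = gamma * x[-1] * dt
        s.append(s[-1] - infections)
        x.append(x[-1] + infections - removals)
        r.append(r[-1] + removals)
    return s, x, r


# Check the estimator on a curve whose true rates are known.
s, x, r = euler_sir(beta=0.23, gamma=0.17, x0=0.01, steps=150)
beta_hat, gamma_hat, r0_hat = estimate_rates(s, x, r)
print(f"beta ~ {beta_hat:.3f}, gamma ~ {gamma_hat:.3f}, R0 ~ {r0_hat:.2f}")
```

On a network simulation the mass-action term s·x is only an approximation of how contacts occur, so estimates obtained this way should be read as effective rates for the network rather than exact parameters.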

Figure 4: Probability distribution of R0 calculated from Monte Carlo simulations

The Monte Carlo simulations show that the R0 value for this pathogen in the network is most likely to be between 2 and 3, comparable to that of the 1918 H1N1 flu. While R0 values closer to 5 were observed in some of the simulations, R0 values in the left tail were more prevalent, suggesting that if public health officials assume the “most likely” R0 values when creating a response plan, they are more likely (though not guaranteed) to be over-prepared than under-prepared.

In the next article, we will look at how this modeling methodology can be used to account for the effects of vaccination. Moreover, we will attempt to optimize the vaccination strategy to minimize the impact of the pathogen in this network.

Written by Mark Ditsworth