
Using Bayesian Hierarchical Models in PyMC3 to Infer the Disease Parameters of COVID-19


In this post, we look at how to use PyMC3 to infer the disease parameters of COVID-19. PyMC3 is a popular probabilistic programming framework used for Bayesian modeling; two popular methods for performing inference in such frameworks are Markov Chain Monte Carlo (MCMC) and Variational Inference. The work here treats the currently available data on infected cases in the United States as a time series and attempts to model it with a compartmental probabilistic model. We want to infer the disease parameters and eventually estimate R0 using MCMC sampling. We then explore how to do the same with Bayesian hierarchical models and the benefits they offer compared to a pooled or an unpooled model. We conclude with the limitations of this model and outline steps for improving the inference process.

The work presented here is for illustration purposes only and real-life Bayesian modeling requires far more sophisticated tools than what is shown here. Various assumptions regarding population dynamics are made here, which may not be valid for large non-homogeneous populations. Also, interventions such as social distancing and vaccinations are not considered here.

This post will cover the following:

  1. Compartmental models for Epidemics
  2. Where the data comes from and how it is ingested
  3. The SIR/SIRS models for disease dynamics
  4. Bayesian Inference for ODEs with PyMC3
  5. Extension of the work with hierarchical models
  6. Guidelines and debugging tips for probabilistic programming

I have also launched a series of courses on Coursera covering Bayesian modeling and inference; courses 2 and 3 are particularly relevant to this post. Check them out at https://www.coursera.org/specializations/compstats.

Compartmental Models for Epidemics

For an overview of compartmental models and their behavior, please refer to this notebook in Julia https://github.com/sjster/Epidemic.

Compartmental models are a set of Ordinary Differential Equations (ODEs) for closed populations, meaning that no one enters or leaves the population being modeled. They aim to model disease propagation through compartments of a homogeneous population. As you can imagine, these assumptions may not be valid for large populations. It is also important to point out that vital statistics, such as the number of births and deaths in the population, may not be included in such a model. The following list names some of the compartmental models along with the compartments of disease propagation they track; it is by no means exhaustive.

  • Susceptible Infected (SI)
  • Susceptible Infected Recovered (SIR)
  • Susceptible Infected Susceptible (SIS)
  • Susceptible Infected Recovered Susceptible (SIRS)
  • Susceptible Infected Recovered Dead (SIRD)
  • Susceptible Exposed Infected Recovered (SEIR)
  • Susceptible Exposed Infected Recovered Susceptible (SEIRS)
  • Susceptible Exposed Infected Recovered Dead (SEIRD)
  • Maternally-derived Immunity Susceptible Infectious Recovered (MSIR)
  • SIDARTHE (https://www.nature.com/articles/s41591-020-0883-7)

The last one listed above is more recent, specifically targets COVID-19, and may be worth a read for those interested. Real-world disease modeling often involves more than just the temporal evolution of disease stages, since many of the assumptions associated with compartments are violated. To understand how the disease propagates, we would also want to look at the spatial discretization and evolution of the disease's progression through the population. An example of a framework that models this spatio-temporal evolution is GLEAM (Fig. 1).

Fig. 1 – Real-world epidemic modeling (spatio-temporal dynamics).

Tools such as GLEAM use population census data and mobility patterns to understand how people move geographically. GLEAM divides the globe into spatial grids of roughly 25 km x 25 km. There are broadly two types of mobility: global or long-range mobility and local or short-range mobility. Long-range mobility mostly involves air travel, and as such airports are considered central hubs for disease transmission. Travel by sea is another significant factor, and naval ports are therefore another type of access point. Combined with the mathematical models listed above, this provides a stochastic framework that can be used to run millions of simulations to draw inferences about parameters and make forecasts.

COVID-19 data

The data used here is obtained from the Johns Hopkins CSSE GitHub page, where case counts are regularly updated:

  • CSSE GitHub
  • Confirmed cases
  • Number of deaths

The data is available as CSV files which can be read in through Python pandas.
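As a minimal sketch of the ingestion step (the file path below reflects the repository layout at the time of writing and is an assumption), the confirmed-cases CSV can be read directly with pandas:

import pandas as pd

# Cumulative confirmed cases per region, one column per date
url = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
       'csse_covid_19_data/csse_covid_19_time_series/'
       'time_series_covid19_confirmed_global.csv')
confirmed_cases = pd.read_csv(url)
print(confirmed_cases.head())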

The SIR and SIRS models

SIR Model

The SIR model is given by the set of three Ordinary Differential Equations (ODEs) shown below. There are three compartments in this model.
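In the notation used throughout this post, and consistent with the right-hand side implemented in the code later, the equations are:

dS/dt = −λ S I / N
dI/dt = λ S I / N − μ I
dR/dt = f μ I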

Here ‘S’, ‘I’ and ‘R’ refer to the susceptible, infected and recovered portions of the population of size ‘N’, such that S(t) + I(t) + R(t) = N at all times.

The assumption here is that once an individual has recovered from the disease, lifetime immunity is conferred. This is not the case for a lot of diseases, and hence the model may not be valid for them.

λ is the rate of infection and μ is the rate of recovery from the disease. The fraction of people who recover from the infection is given by ‘f’ but for the purpose of this work, ‘f’ is set to 1 here. We end up with an Initial Value Problem (IVP) for our set of ODEs where I(0) is assumed to be known from the case counts at the beginning of the pandemic and S(0) can be estimated as N – I(0). Here we make the assumption that the entire population is susceptible. Our goal is to accomplish the following:

  • Use Bayesian Inference to make estimates about λ and μ
  • Use the above parameters to estimate I(t) for any time ‘t’
  • Compute R0

As already pointed out, λ is the disease transmission coefficient. This depends on the number of interactions, in unit time, with infectious people. This in turn depends on the number of infectious people in the population.

The force of infection, or the risk of infection at any time ‘t’, is defined as λ I(t) / N, i.e., the transmission coefficient multiplied by the infectious fraction of the population.

Also, μ is the fraction of infected people who recover in unit time, so the inverse of μ is the mean recovery time. The ‘basic reproduction number’ R0 is the average number of secondary cases produced by a single primary case (examples: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6002118/). R0 is also defined in terms of λ and μ as the ratio R0 = λ / μ.

The above assumes that S(0) is close to 1, i.e., almost the entire population starts out susceptible. When R0 > 1, the disease proliferates and we have a pandemic. With the recent efforts to vaccinate the vulnerable, this has become even more relevant to understand: if we vaccinate a fraction ‘p’ of the population so that (1−p)R0 < 1, we can halt the spread of the disease. For example, with R0 = 2, vaccinating more than half the population (p > 0.5) is sufficient.

SIRS Model

The SIRS model, shown below, makes no assumption of lifetime immunity once an infected person has recovered; instead, individuals move from the recovered compartment back to the susceptible compartment. As such, this is probably a better low-fidelity baseline model for COVID-19, where it is suggested that acquired immunity is short-term. The only additional parameter is γ, the rate at which immunity is lost and a recovered individual moves back to the susceptible pool.
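With the same notation as the SIR model, the SIRS equations take the form:

dS/dt = −λ S I / N + γ R
dI/dt = λ S I / N − μ I
dR/dt = μ I − γ R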

For this work, only the SIR model is implemented, and the SIRS model and its variants are left for future work.

Using PyMC3 to Infer the Disease Parameters

We can discretize the SIR model using a first-order or a second-order temporal differentiation scheme, which can then be passed to PyMC3 to march the solution forward in time using these discretized equations. The parameters λ and μ can then be fitted using the Monte Carlo sampling procedure.

First-order scheme
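For example, a forward Euler discretization (first-order accurate) with time step Δt reads:

Sₙ₊₁ = Sₙ − Δt λ Sₙ Iₙ / N
Iₙ₊₁ = Iₙ + Δt (λ Sₙ Iₙ / N − μ Iₙ)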

Second-order scheme
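One possibility among several second-order schemes is the explicit midpoint method, which first takes a half step and then evaluates the right-hand side there:

S* = Sₙ − (Δt/2) λ Sₙ Iₙ / N
I* = Iₙ + (Δt/2) (λ Sₙ Iₙ / N − μ Iₙ)
Sₙ₊₁ = Sₙ − Δt λ S* I* / N
Iₙ₊₁ = Iₙ + Δt (λ S* I* / N − μ I*)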

The DifferentialEquation Method in PyMC3

While we can code the discretization manually with our choice of a higher-order scheme, this quickly becomes cumbersome and error-prone, not to mention computationally inefficient in a language like Python. Fortunately, PyMC3 has an ODE module to accomplish this. We can use the DifferentialEquation method from the ODE module, which takes as input a function that returns the value of the set of ODEs as a vector, the time steps where the solution is desired, the number of states (n_states, corresponding to the number of equations) and the number of parameters to be inferred (n_theta). Even though this is faster than manual discretization, this method scales poorly with problem size. The recommended best practice is to use the ‘sunode’ module (see below) instead. For example, the same problem took 5.4 min using DifferentialEquation vs. 16 s with sunode, for 100 samples, 100 tuning samples and 20 time points.

def SIR_non_normalized(self, y, t, p):
    # y[0] = S, y[1] = I; p[0] = lambda, p[1] = mu
    ds = -p[0] * y[0] * y[1] / self.covid_data.N
    di = p[0] * y[0] * y[1] / self.covid_data.N - p[1] * y[1]
    return [ds, di]

self.sir_model_non_normalized = DifferentialEquation(
    func=self.SIR_non_normalized,
    times=self.time_range[1:],
    n_states=2,
    n_theta=2,
    t0=0)

The syntax for using the sunode module is shown below. While there are some syntactic differences, the general structure is the same as that of DifferentialEquation.

import numpy as np
import sunode
import sunode.wrappers.as_theano

def SIR_sunode(t, y, p):
    # S and I are normalized by the population size here
    return {'S': -p.lam * y.S * y.I,
            'I': p.lam * y.S * y.I - p.mu * y.I}
...
...
sir_curves, _, problem, solver, _, _ = sunode.wrappers.as_theano.solve_ivp(
    y0={  # Initial conditions of the ODE
        'S': (S_init, ()),
        'I': (I_init, ())},
    params={  # Parameters of the ODE, specify shape
        'lam': (lam, ()),
        'mu': (mu, ()),
        '_dummy': (np.array(1.), ())},  # currently sunode throws an error without this
    rhs=SIR_sunode,  # RHS of the ODE
    tvals=times,  # Time points of the solution
    t0=times[0])

The Inference Process for an SIR Model

In order to perform inference on the parameters we seek, we start by selecting reasonable priors for the disease parameters. Based on our understanding of how the disease behaves, a lognormal distribution is a reasonable prior. Ideally, we want the mean of this lognormal to be in the neighborhood of where we expect the desired parameters to lie. For good convergence and solutions, it is also essential that the data likelihood is appropriate (domain expertise!). It is common to pick one of the following as the likelihood; a quick way to sanity-check the priors is sketched after this list.

  • Normal distribution
  • Lognormal distribution
  • Student’s t-distribution
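One quick way to sanity-check the priors before running the full model is to draw from the prior predictive distribution; a minimal sketch, with placeholder values for the prior parameters:

import pymc3 as pm

with pm.Model():
    # Lognormal priors for the disease parameters (1.5 is a placeholder)
    lam = pm.Lognormal('lambda', mu=1.5, sigma=1.5)
    mu = pm.Lognormal('mu', mu=1.5, sigma=1.5)
    prior_samples = pm.sample_prior_predictive(500)
# Most of the mass of prior_samples['lambda'] should lie in a plausible range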

We obtain the Susceptible (S(t)) and Infectious (I(t)) numbers from the ODE solver and then sample for values of λ and μ as shown below.

with pm.Model() as model4:
    sigma = pm.HalfCauchy('sigma', self.likelihood['sigma'], shape=1)
    lam = pm.Lognormal('lambda', self.prior['lam'], self.prior['lambda_std'])  # e.g. 1.5, 1.5
    mu = pm.Lognormal('mu', self.prior['mu'], self.prior['mu_std'])
    res, _, problem, solver, _, _ = sunode.wrappers.as_theano.solve_ivp(
        y0={'S': (self.S_init, ()), 'I': (self.I_init, ())},
        params={'lam': (lam, ()), 'mu': (mu, ()), '_dummy': (np.array(1.), ())},
        rhs=self.SIR_sunode,
        tvals=self.time_range,
        t0=self.time_range[0])
    # The ODE solution res['I'] serves as the mean of the likelihood distribution
    if self.likelihood['distribution'] == 'lognormal':
        I = pm.Lognormal('I', mu=res['I'], sigma=sigma, observed=self.cases_obs_scaled)
    elif self.likelihood['distribution'] == 'normal':
        I = pm.Normal('I', mu=res['I'], sigma=sigma, observed=self.cases_obs_scaled)
    elif self.likelihood['distribution'] == 'students-t':
        I = pm.StudentT('I', nu=self.likelihood['nu'],
                        mu=res['I'],
                        sigma=sigma,
                        observed=self.cases_obs_scaled)
    R0 = pm.Deterministic('R0', lam / mu)
    trace = pm.sample(self.n_samples, tune=self.n_tune, chains=4, cores=4)
    data = az.from_pymc3(trace=trace)

The Inference Workflow with PyMC3

Developing a model such as this for estimating disease parameters via Bayesian inference is an iterative process, so we would like to automate as much of it as possible. It is probably a good idea to instantiate a class of model objects with various parameters and run them automatically. It is also a good idea to save the trace, inference metrics such as R̂ (R-hat), and other metadata for each run. A file format such as NetCDF can be used for this, although it could be as simple as using Python's built-in ‘shelve’ module. The classes for data extraction are not shown here, but their invocations are, so that you get a sense of the data and the model parameters used here.

covid_obj = COVID_data('US', Population=328.2e6)
covid_obj.get_dates(data_begin='10/1/20', data_end='10/28/20')
sir_model = SIR_model_sunode(covid_obj)
likelihood = {'distribution': 'normal', 'sigma': 2}
prior = {'lam': 1.5,'mu': 1.5, 'lambda_std': 1.5, 'mu_std': 1.5 }
sir_model.run_SIR_model(n_samples=500, n_tune=500, likelihood=likelihood, prior=prior)
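As a sketch of this bookkeeping (the file names and dictionary keys below are illustrative), the ArviZ InferenceData produced by a run can be written to NetCDF and the run configuration saved alongside it with shelve:

import shelve
import arviz as az

# 'data' is the az.InferenceData created inside run_SIR_model
az.to_netcdf(data, 'sir_run_001.nc')  # posterior, sampler stats and observed data
with shelve.open('run_metadata') as db:
    db['run_001'] = {'likelihood': likelihood, 'prior': prior,
                     'n_samples': 500, 'n_tune': 500}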

These results are purely for illustration purposes and extensive experimentation is needed before meaningful results can be expected from this simulation. The case count for the United States from January to October is shown below (Fig 2).

Fig. 2 – Example COVID-19 case count visualization for the US

Fig. 3 shows the results of an inference run where the posterior distributions of λ, μ and R0 are displayed. One of the advantages of performing Bayesian inference is that the distributions show the mean value estimate along with the Highest Density Interval (HDI) for quantifying uncertainty. It is a good idea to check the trace (at the very least!) to ensure sampling was done properly.

Fig. 3 – Example results of an inference run displaying the Highest Density Interval (HDI) using PyMC3.

Pooled, unpooled and hierarchical models

Suppose you have information regarding the number of infections from various states in the United States. One way to use this data to infer the disease parameters of COVID-19 (e.g. R0) is to sum it all up and estimate a single set of parameters. This is called a pooled model. The problem with this approach is that fine-grained information contained in the individual states or groups is lost. The other extreme is to estimate an individual R0 per state, which results in an unpooled model. However, considering that we are trying to estimate parameters of the same virus, there ought to be a way to perform this estimation collectively, which brings us to the hierarchical model. This is particularly useful when there isn't sufficient information in certain states to produce accurate estimates; hierarchical models allow us to share information across states through a shared ‘hyperprior’. Let us look at this formulation in more detail, using λ as an example:

For a pooled model, we draw a single λ for all states from one distribution with fixed parameters λ_μ, λ_σ:

λ ∼ Lognormal(λ_μ, λ_σ)

For an unpooled model, we draw a λᵢ for each state ‘i’ from separate distributions with fixed parameters λ_μᵢ, λ_σᵢ:

λᵢ ∼ Lognormal(λ_μᵢ, λ_σᵢ)

For a hierarchical model, the prior on each λᵢ is parameterized by non-constant parameters that are themselves drawn from other distributions. Here, we draw a λ value for each state, but the states are connected through shared hyperprior distributions (with constant parameters), matching the structure of the code below:

λ_μ ∼ Lognormal(μ₀, σ₀)
λ_σ ∼ HalfNormal(σ₁)
λᵢ ∼ Lognormal(λ_μ, λ_σ)

where μ₀, σ₀ and σ₁ are fixed constants.

Check out course 3 Introduction to PyMC3 for Bayesian Modeling and Inference (https://www.coursera.org/learn/introduction-to-pymc3?specialization=compstats) in the recently-launched Coursera specialization for more details on hierarchical models.

COVID-19 data for hierarchical models

Here we plot and use the daily case counts for two countries, the United States and Brazil. However, there is no limitation on either the choice or the number of countries that can be used in a hierarchical model. The cases below run from March 1, 2020 to January 1, 2021. The curves follow a similar trajectory, even though the y-axis scales differ between the two countries. Considering that these cases arise from the same COVID-19 virus, this is reasonable. However, in a realistic scenario there are differences to account for, such as different variants, different geographical structures, social distancing rules, healthcare infrastructure and so on.

Fig. 4 – Plot of the number of COVID-19 cases for two countries

Inference of parameters

For a hierarchical model, the code snippet to perform the inference of the disease parameters is given below.

with pm.Model() as model4:
    # A narrow standard deviation is roughly equivalent to a constant prior parameter.
    # If there are issues with sampling from the prior distribution, make the variance
    # of the mean smaller; the variance of the variance parameter matters less here.
    nsamples = 8000
    ntune = 4000
    Hyperprior = {"Lambda mean": 0.75, "Lambda std": 2, "Mu mean": 0.75, "Mu std": 2}
    Prior = {"Lambda std": 1.0, "Mu std": 1.0}
    Likelihood = {"Name": "Normal", "Parameters": {"std": 0.01}}
    prior_lam = pm.Lognormal('prior_lam', Hyperprior['Lambda mean'], Hyperprior['Lambda std'])
    prior_mu = pm.Lognormal('prior_mu', Hyperprior['Mu mean'], Hyperprior['Mu std'])
    prior_lam_std = pm.HalfNormal('prior_lam_std', Prior['Lambda std'])
    prior_mu_std = pm.HalfNormal('prior_mu_std', Prior['Mu std'])
    # One lambda and mu per country (shape=2), tied together by the shared hyperpriors
    lam = pm.Lognormal('lambda', prior_lam, prior_lam_std, shape=2)
    mu = pm.Lognormal('mu', prior_mu, prior_mu_std, shape=2)
    # - - - - - - - - - - ODE model - - - - - - - - #
    res, _, problem, solver, _, _ = sunode.wrappers.as_theano.solve_ivp(
        y0={'S': (S_init, (2,)), 'I': (I_init, (2,))},
        params={'lam': (lam, (2,)), 'mu': (mu, (2,)), '_dummy': (np.array(1.), ())},
        rhs=SIR_sunode,
        tvals=time_range[1:],
        t0=time_range[0])
    I = pm.Normal('I', mu=res['I'], sigma=Likelihood['Parameters']['std'], observed=cases_obs_scaled[1:])
    R0 = pm.Deterministic('R0', lam / mu)
    # If increasing the variance makes the posteriors look choppy, increase the tuning
    # sample size and the total number of samples to explore the space more effectively
    trace = pm.sample(nsamples, tune=ntune, chains=8, cores=8)
    data = az.from_pymc3(trace=trace)

The sampled posterior distributions are shown below, along with their 94% Highest Density Interval (HDI).

Fig. 5 – The sampled posterior distributions along with their 94% Highest Density Interval (HDI).

We can also inspect the traceplots for convergence. They show good mixing in all the variables, a good sign that the sampler has explored the space well, and there is good agreement among the chains. This is consistent with the fairly narrow HDI intervals in the plots above.

Fig. 6 – The traceplots and density plots for R0 and other variables

The table below summarizes the distributions of the various inferred variables and parameters, along with the sampler statistics. While the variable estimates are essential, this table is particularly useful for informing us about the quality and efficiency of the sampler. For example, the R-hat values are all close to or equal to 1.0, indicating good agreement between the chains. The effective sample size is another critical metric: if it is small compared to the total number of samples drawn, that is a sure sign of trouble with the sampler. Even if the R-hat values look good, be sure to inspect the effective sample size! Both diagnostics can be pulled up programmatically, as sketched below the table.

Fig. 7 – Table of the inferred variable distributions along with the sampling statistics
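A minimal sketch of generating these diagnostics with ArviZ (column names may vary slightly across ArviZ versions):

import arviz as az

az.plot_trace(data)  # traceplots and density plots, as in Fig. 6
summary = az.summary(data)  # means, HDIs and sampler statistics, as in Fig. 7
print(summary[['mean', 'ess_bulk', 'r_hat']])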

Notes and Guidelines

Some general guidelines for modeling and inference:

  • Use at least 5000 samples for inference and at least 1000 samples for tuning.
  • For the results shown above, I have used: mean λ = 1.5, mean μ = 1.5, and a standard deviation of 2.0 for both parameters. A domain expert's knowledge is invaluable in setting these parameters.
  • Sample from at least 3 chains.
  • Set target_accept to > 0.85 (depends on the sampling algorithm).
  • If possible, sample in parallel with cores=n, where ‘n’ is the number of cores available.
  • Inspect the trace for convergence.
  • A limited number of time samples hurts inference accuracy; it is always better to have more good-quality data.
  • Normalize your data; large values are generally not good for convergence.

Debugging your model

  • Since the backend for PyMC3 is Theano, the Python print statement cannot be used to inspect the value of a variable. Use theano.printing.Print(DESCRIPTIVE_STRING)(VAR) instead (see the sketch after this list).
  • Initialize stochastic variables by passing a ‘testval’. This is very helpful for tracking down those pesky ‘Bad Energy’ errors, which are usually due to a poor choice of likelihood or priors. Use Model.check_test_point() to verify this.
  • Use step = pm.Metropolis() for quick debugging; it runs much faster but results in a rougher posterior.
  • If the sampling is slow, check your prior and likelihood distributions.
  • A narrow sigma value for a prior distribution can be used to simulate a constant prior and can help debug issues with sampling from the prior distribution.
  • If increasing the variance results in a choppy posterior distribution, increase both the tuning sample size and the number of samples to sample the space more effectively.
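A minimal sketch of these debugging aids (the model and values are placeholders):

import theano.printing
import pymc3 as pm

with pm.Model() as debug_model:
    # testval gives the sampler a sensible starting point and helps
    # diagnose 'Bad Energy' errors
    lam = pm.Lognormal('lambda', mu=1.5, sigma=1.5, testval=1.0)
    # Wrap a variable with Print to see its value during evaluation
    lam_printed = theano.printing.Print('lambda')(lam)
    # Log-probability of each variable at the test point;
    # -inf or nan points to a problematic prior or likelihood
    print(debug_model.check_test_point())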

Future work

Although this model yielded satisfactory estimates for our parameters, we often run into the issue of the sampler not performing effectively. For future work, there are a few ways to diagnose and improve the modeling process, listed below in increasing order of difficulty:

  1. Increase the tuning size and the number of samples drawn.
  2. Decrease the target_accept parameter for the sampler so as to reduce the autocorrelation among the samples. Use the autocorrelation plot to confirm this.
  3. Add more samples to the observed data, i.e. increase the sample frequency.
  4. Use better priors and hyperpriors for the parameters.
  5. Use an alternative parameterization of the model.
  6. Incorporate changes such as social-distancing measures into the model.

LEARN MORE

You can learn more about these topics at my Coursera specialization (https://www.coursera.org/specializations/compstats) that consists of the following courses:

  1. Introduction to Bayesian Statistics (https://www.coursera.org/learn/compstatsintro?specialization=compstats)
  2. Bayesian Inference with MCMC (https://www.coursera.org/learn/mcmc?specialization=compstats)
  3. Introduction to PyMC3 for Bayesian Modeling and Inference (https://www.coursera.org/learn/introduction-to-pymc3?specialization=compstats)

References

  1. The work by the Priesemann Group: GitHub – Priesemann-Group/covid_bayesian_mcmc (Bayesian Markov Chain Monte Carlo forecasting)
  2. Work by Demetri Pananos on the PyMC3 website: GSoC 2019 – Introduction of the pymc3.ode API
  3. Sunode: GitHub – aseyboldt/sunode (Solve ODEs fast, with support for PyMC3)
  4. GLEAM: http://www.gleamviz.org

