A Bayesian Model for Estimating the Effects of Covid-19 Vaccines

PYMC3, Bernoulli Distribution, Pfizer, Moderna, AstraZeneca

Published in

Towards Data Science

4 min readNov 30, 2020

Because of the trials are still ongoing, researchers caution against making head-to-head comparisons of vaccines based on incomplete data. But for the sake of learning, we will do it anyway, just not making any meaningful conclusions.

Recently, the announcements went out that the potential effectiveness of SARS-CoV-2 vaccine candidates developed by Pfizer-Biontech, Moderna, AstraZeneca regimen 1 and AstraZeneca regimen 2 to be 95%, 94.5%, 90% and 62% respectively. We know that some of the data are incomplete. The following analysis will be based on whatever information we currently have.

The Data

Based on the information from the announcements and the other scientist’s blog posts, we know that:

Pfizer: Over 43,000 participants got the two doses of the vaccine or placebo. The same number of subjects was assigned to the treatment and control group. An efficacy of 95% implies that among 170 confirmed cases of COVID-19, 8 of them observed in the vaccine group.

Moderna: The vaccine is being tested in 30,000 people. Half received two doses of the vaccine, and half received a placebo. Of the 95 cases of covid-19, 90 were in the group that received the placebo.

AstraZeneca regimen 1: Regimen 1(first a half dose and at least a month later a full dose) with 2741 participants showed 90% efficacy implies that among 37 confirmed cases of COVID-19, 3 of them observed in the vaccine group.

AstraZeneca regimen 2: Regimen 2 (two full doses at least one month apart) with 8896 participants showed 62% efficacy implies that among 94 confirmed cases of COVID-19, 26 of them observed in the vaccine group.

Therefore, we could set up the data like this:

setup_data.py

The Model

Because our variables should be between 0 and 1,we’ll use a beta distribution for the priors and a Bernoulli distribution for the likelihood. To start, we assume that we do not know anything about how many participants caught the COVID in each group. So we choose the parameters of the beta distribution a uniform distribution. After we see the observed data, the distribution will change accordingly.

vaccine_bernoulli.py

At this point, you will be able to evaluate each vaccine trial result individually:

az.plot_trace(trace);

Or together:

sns.distplot(trace['p_Pfizer'], hist=False, rug=True, label='Pfizer')
sns.distplot(trace['p_Moderna'], hist=False, rug=True, label='Moderna')
sns.distplot(trace['p_regimen1'], hist=False, rug=True, label='AstraZeneca regimen 1')
sns.distplot(trace['p_regimen2'], hist=False, rug=True, label='AstraZeneca regimen 2')
plt.title("Posterior distributions of efficacy of different vaccines")
plt.legend()
plt.show();

We can also use Forest plot (95% HPD, IQR and median) of the posterior distribution on the same axes, allowing us to directly compare the vaccines.

pm.forestplot(trace, var_names=['p_Pfizer', 'p_Moderna', 'p_regimen1', 'p_regimen2']);

According to the available data, it looks like Pfizer’s vaccine has the highest mean effectiveness, and AstraZeneca regimen 2 has the lowest mean effectiveness.

Comparison Between Vaccines

Is Pfizer’s vaccine always better than Moderna’s? We will create two variables that will contain the “difference” and the “relation” between the variations.

Pfizer vs. Moderna.py

_ = pm.plot_posterior(trace_1, varnames=['difference', 'relation'], ref_val=0);

diffs_1 = trace_1.get_values('difference', burn=1000)
100*len(diffs_1[diffs_1>0])*1.0/len(diffs_1)

We can say that the difference in effectiveness of these two vaccines, with 61.4% of the posterior probability are greater than zero. We can not rule out a difference of zero, which suggests these two vaccines are not credibly different.

Is AstraZeneca regimen 1 always better than AstraZeneca regimen 2?

regimen 1 vs. regimen 2.py

_ = pm.plot_posterior(trace_2, varnames=['difference', 'relation'], ref_val=0);

diffs_2 = trace_2.get_values('difference', burn=1000)
100*len(diffs_2[diffs_2>0])*1.0/len(diffs_2)

According to the available data, we have 99% confidence that regimen 1 is more effective than regimen 2.

100*len(diffs_2[diffs_2>0.06])*1.0/len(diffs_2)

There is 95.5% probability that regimen 1 is more effective than regimen 2 by 6%.

plt.figure(figsize=(12,4))
plt.hist(diffs_2, bins=100, density=True)
plt.vlines(0.06, 0, 6, linestyle='--', color='red')
plt.title('Posterior distribution of the difference of the two regimens')
plt.show();

Please note, scientist communities have described several problematic issues with AstraZeneca vaccine study results. So AstraZeneca is likely to run an additional global trial. My above analysis will be updated once new data becomes available.

Jupyter notebook can be found on Github. Have a nice week!

A Bayesian Model for Estimating the Effects of Covid-19 Vaccines

PYMC3, Bernoulli Distribution, Pfizer, Moderna, AstraZeneca

The Data

The Model

Comparison Between Vaccines

Written by Susan Li