BCN Casual ALGO

Bayesian Statistics of Efficacy of Pfizer-BioNTech COVID-19 Vaccine — part II

Reproducing original results

Bartek Skorulski

Published in Towards Data Science · Apr 10, 2021

(By permission of Maria Jose Pelaez Montalvo)

This post continues Part I of Bayesian Statistics of Efficacy of Pfizer-BioNTech COVID-19 Vaccine. In Part I, I gave a minimal summary of Bayesian Inference, which is needed to understand how to calculate statistics for the Vaccine Efficacy. Then I showed a simple method that gets results close to those in the original article. If that is not enough for you, read this part too. It is slightly more difficult, but here I will show how to reproduce the Credible Interval to one decimal place. We will then validate the method against another Credible Interval that can be found in the article.

Part I (previous post)
Introduction
What is Vaccine Efficacy
Credibility of results
Bayesian Inference
Beta-binomial model
Statistics of Vaccine Efficacy using simulations
Vaccine and Placebo Incidence Rates
Monte Carlo methods
Posterior probabilities and 95% Credible Interval

Part II (this post)
Reproducing statistics from the article
Additional parameter θ
Prior distribution of θ
Posterior distribution of θ and an adjustment of occurrences
Credible Interval for posterior Vaccine Efficacy
COVID-19 occurrence in participants with and those without prior evidence of infection
Final note
References

Reproducing statistics from the article

To understand how the statistics in the article were calculated, we need to look at the study protocol. The important thing about the protocol is that it should be registered before the start of the experiment, to prevent post-hoc cherry-picking of whichever calculations turn out to be the most convenient.

However, as Sebastian Kranz wrote in this blog post, we needed to “make an educated guess”, since not all the details were available to us. In addition to his educated guesses, I needed to add one more, namely an adjustment of the number of occurrences of COVID-19. With this additional “guess”, the numbers are exactly the same as in the following table.

Table 1: Vaccine Efficacy against Covid (reproduced from https://www.nejm.org/doi/full/10.1056/NEJMoa2034577)

At the end of this section we verify our calculations by reproducing the results for COVID-19 occurrence in participants with and those without evidence of infection, which is the second row of Table 1.

Additional parameter θ

So let us look at pages 102–103 of the protocol. The authors decided to estimate the uncertainty of the Vaccine Efficacy (VE) with the help of an additional parameter, θ, given by the following formula:

$$\theta = \frac{1 - \mathrm{VE}}{2 - \mathrm{VE}}$$

This formula relates θ to Vaccine Efficacy. What is the advantage of this extra parameter, which at first sight seems to be an additional complication? The reason is that we can model θ with a single Beta-Bernoulli model, which makes Monte Carlo methods unnecessary. Let me show that.

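First, note that VE = 1 − IRR, where IRR is the ratio of the incidence rate in the vaccine group to the incidence rate in the placebo group. Writing c_v and c_p for the numbers of COVID-19 cases, and n_v and n_p for the numbers of participants, in the vaccine and placebo groups, we can rewrite the formula θ = (1 − VE)/(2 − VE) as

$$\theta = \frac{\mathrm{IRR}}{1 + \mathrm{IRR}} = \frac{c_v / n_v}{c_v / n_v + c_p / n_p}$$

Then, if we assume that there is an equal number of participants in each group (vaccine and placebo), the formula becomes

$$\theta = \frac{c_v}{c_v + c_p}$$

And here we are. In this form, θ is the probability that a COVID-19 case comes from the vaccine group, so we can model θ with a Beta-Bernoulli model. And once we know θ, we can calculate Vaccine Efficacy using the formula

$$\mathrm{VE} = \frac{1 - 2\theta}{1 - \theta}$$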
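The relations between θ and Vaccine Efficacy can be checked numerically. The sketch below assumes VE = 1 − IRR and equal group sizes; the function names are my own, chosen for illustration.

```python
def theta_from_ve(ve):
    """theta = (1 - VE) / (2 - VE), the parameter used in the protocol."""
    return (1 - ve) / (2 - ve)


def ve_from_theta(theta):
    """Inverse relation: VE = (1 - 2*theta) / (1 - theta)."""
    return (1 - 2 * theta) / (1 - theta)


def theta_from_cases(cases_vaccine, cases_placebo):
    """theta as the share of all cases that fall in the vaccine group.

    Only valid under the assumption that both groups have the same size.
    """
    return cases_vaccine / (cases_vaccine + cases_placebo)


# Round trip: converting VE to theta and back returns the same VE.
ve = 0.30
theta = theta_from_ve(ve)
print(f"theta for VE = 30%: {theta:.4f}")
print(f"VE recovered from theta: {ve_from_theta(theta):.4f}")

# With equal groups, theta from raw case counts gives VE = 1 - c_v / c_p.
theta_raw = theta_from_cases(8, 162)
print(f"VE from 8 vs 162 cases: {ve_from_theta(theta_raw):.4f}")
```

Because VE is a decreasing function of θ, a small θ (few cases in the vaccine group) corresponds to a high efficacy.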

Prior distribution of θ

On pages 102–103 of the protocol, it is assumed that the prior distribution of θ is the Beta distribution Beta(0.700102, 1). Now let me try to follow the protocol and find out where α = 0.700102 and β = 1 come from.

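First they assume that the prior value for the Vaccine Efficacy is 30%. It follows that the prior value for θ should be equal to:

$$\theta = \frac{1 - 0.3}{2 - 0.3} \approx 0.4118$$

Since θ follows a Beta distribution, the natural choice, as we did before, is to take parameters with β = 1 and α ≤ 1 such that the mean of the distribution equals this value:

$$\frac{\alpha}{\alpha + \beta} = 0.4118$$

Then, putting β = 1, we get

$$\alpha = \frac{0.4118}{1 - 0.4118} \approx 0.700102$$

as in the protocol.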
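The arithmetic behind α can be reproduced in a few lines. This is a sketch under two assumptions of mine: that the prior is centred by matching the Beta mean α/(α+β) to the prior θ, and that θ is rounded to four decimals before solving for α (which is what seems to produce 0.700102 rather than exactly 0.7).

```python
# Prior belief about Vaccine Efficacy, as stated in the protocol.
ve_prior = 0.30

# Corresponding prior value of theta = (1 - VE) / (2 - VE).
theta_prior = (1 - ve_prior) / (2 - ve_prior)

# Assumption: theta is rounded to four decimals (0.4118) before solving.
theta_rounded = round(theta_prior, 4)

# Solve alpha / (alpha + beta) = theta for alpha, with beta fixed at 1.
beta_prior = 1.0
alpha_prior = beta_prior * theta_rounded / (1 - theta_rounded)

# Without the rounding step, alpha reduces to 1 - VE = 0.7.
alpha_unrounded = beta_prior * theta_prior / (1 - theta_prior)

print(f"theta (rounded): {theta_rounded}")
print(f"alpha from rounded theta: {alpha_prior:.6f}")
print(f"alpha from unrounded theta: {alpha_unrounded:.6f}")
```

The difference between the two values of α is tiny and has no visible effect on the Credible Intervals computed later.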

Prior distribution of θ: Beta(0.700102, 1)

Posterior distribution of θ and an adjustment of occurrences

Now let us get the posterior distribution of θ. Since the vaccine group had 8 cases of COVID-19 and the placebo group 162, it would follow that the posterior distribution is Beta(0.700102 + 8, 1 + 162). However, since the size of the vaccine group is not equal to the size of the placebo group, we need to adjust those numbers. Let me explain how we need to do this.

After adjusting for surveillance time, we have 17411 participants in the vaccine group and 17511 in the placebo group, 34922 in total. Hence, if the groups were equal, we would have 17461 in each group. Since we have 8 and 162 occurrences of COVID-19 in the vaccine and placebo groups respectively, proportionality gives

$$8 \cdot \frac{17461}{17411} \approx 8.02297, \qquad 162 \cdot \frac{17461}{17511} \approx 161.53743$$

Then the posterior distribution of θ is Beta(0.700102+8.02297, 1+161.53743).
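The adjustment and the resulting posterior parameters can be sketched as follows. The helper `adjust_cases` is a hypothetical name of mine; it simply rescales each case count to what it would be if both groups had the average group size.

```python
def adjust_cases(cases, group_size, other_group_size):
    """Rescale a case count to a hypothetical group of the average size."""
    equal_size = (group_size + other_group_size) / 2
    return cases * equal_size / group_size


# Group sizes and case counts quoted in the text.
n_vaccine, n_placebo = 17411, 17511
cases_vaccine, cases_placebo = 8, 162

adj_vaccine = adjust_cases(cases_vaccine, n_vaccine, n_placebo)
adj_placebo = adjust_cases(cases_placebo, n_placebo, n_vaccine)

# Conjugate update of the Beta(0.700102, 1) prior with the adjusted counts.
alpha_prior, beta_prior = 0.700102, 1.0
alpha_post = alpha_prior + adj_vaccine
beta_post = beta_prior + adj_placebo

print(f"adjusted cases: vaccine {adj_vaccine:.5f}, placebo {adj_placebo:.5f}")
print(f"posterior: Beta({alpha_post:.5f}, {beta_post:.5f})")
```

The adjustment moves the counts only slightly because the two groups are almost the same size.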

Posterior distribution of θ: Beta(0.700102+8.02297, 1+161.53743)

Credible Interval for posterior Vaccine Efficacy

Knowing the distribution of θ, we can finally calculate the Credible Interval of Vaccine Efficacy. Using Python, it can be done as follows.
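A minimal sketch of this step, assuming SciPy is installed and that the interval is equal-tailed (2.5% in each tail, which is my assumption). The function names are illustrative and not taken from the original listing.

```python
from scipy.stats import beta


def ve_from_theta(theta):
    """Vaccine Efficacy implied by theta: VE = (1 - 2*theta) / (1 - theta)."""
    return (1 - 2 * theta) / (1 - theta)


def ve_credible_interval(a, b, level=0.95):
    """Equal-tailed Credible Interval for VE when theta ~ Beta(a, b)."""
    tail = (1 - level) / 2
    theta_low, theta_high = beta.ppf([tail, 1 - tail], a, b)
    # VE decreases as theta grows, so the upper theta quantile
    # gives the lower VE bound and vice versa.
    return ve_from_theta(theta_high), ve_from_theta(theta_low)


# Posterior parameters of theta quoted in the text.
a_post = 0.700102 + 8.02297
b_post = 1 + 161.53743

ve_low, ve_high = ve_credible_interval(a_post, b_post)
print(f"95% Credible Interval for VE: ({100 * ve_low:.1f}, {100 * ve_high:.1f})")
```

No sampling is needed here: the quantiles of θ come directly from the Beta distribution and are then mapped to Vaccine Efficacy.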

After rounding, we get a 95% Credible Interval of (90.3, 97.6), exactly as in the article.

The posterior distribution of Vaccine Efficacy with shaded 95% Credible Interval

COVID-19 occurrence in participants with and those without prior evidence of infection

Finally, let me apply all of the above to reproduce the Vaccine Efficacy and 95% Credible Interval for COVID-19 occurrence at least 7 days after the second dose in participants with and those without evidence of infection (see the second row of Table 1). This will validate our educated guesses.

First, we once again need to adjust the number of occurrences of COVID-19 in both groups, using the same proportionality argument as before.

Hence, in this case the posterior distribution of θ is Beta(0.700102 + 9.03613, 1 + 168.327), and again we can calculate the 95% Credible Interval with Python as follows.
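As an independent cross-check that needs only the standard library, the interval can also be approximated by sampling θ from the posterior and converting each draw to Vaccine Efficacy. Note that this swaps the closed-form quantile calculation for a Monte Carlo estimate, so the last digit can vary with the seed and the sample size; the function names and the default of 100,000 draws are my own choices.

```python
import random


def ve_from_theta(theta):
    """Vaccine Efficacy implied by theta: VE = (1 - 2*theta) / (1 - theta)."""
    return (1 - 2 * theta) / (1 - theta)


def sampled_ve_interval(a, b, level=0.95, n_samples=100_000, seed=0):
    """Approximate equal-tailed interval for VE from Beta(a, b) draws of theta."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ve_samples = sorted(
        ve_from_theta(rng.betavariate(a, b)) for _ in range(n_samples)
    )
    tail = (1 - level) / 2
    low = ve_samples[int(tail * n_samples)]
    high = ve_samples[int((1 - tail) * n_samples) - 1]
    return low, high


# Posterior parameters of theta for the second row, quoted in the text.
a_post = 0.700102 + 9.03613
b_post = 1 + 168.327

ve_low, ve_high = sampled_ve_interval(a_post, b_post)
print(f"95% Credible Interval for VE: ({100 * ve_low:.1f}, {100 * ve_high:.1f})")
```

For the exact quantiles, the Beta distribution's inverse CDF (for example `scipy.stats.beta.ppf`) gives the same interval without sampling noise.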

After rounding, we get (89.9, 97.3), exactly as in the article.

Final note

If you have got to this point without getting too lost, congratulations. We have managed to reproduce Table 1 to one decimal place. It took me quite some time to figure out all the details. And still, I cannot say with 100% certainty that those are the exact steps that the researchers took.

I have to admit that not only did I have fun trying to figure out how the statistics were done, but I also learnt quite a lot. I hope you did too. I also hope that, if you are relatively new to Bayesian statistics, I was able to show that this way of doing inference is very powerful. It gives me a sense that I understand quite well what I am doing.

I have only scratched the surface of Bayesian Inference. Let me leave you here with a list of references where you can find more information about this topic.

References

For readers who are not that familiar with Bayesian Statistics, I would recommend one of the following two books. Both are great, although the first is definitely lighter on maths.

Then I would recommend taking a look at the original article, the preregistered protocol, and Sebastian Kranz's note.

[1] Richard McElreath: Statistical Rethinking, Second Edition.

[2] Peter Lee: Bayesian Statistics, An Introduction, Fourth Edition.

[3] Polack FP, Thomas SJ, Kitchin N, et al. Safety and efficacy of the BNT162b2 mRNA Covid-19 vaccine. N Engl J Med 2020;383:2603–2615.

[4] BioNTech-Pfizer: Protocol: A Phase 1/2/3 Study to Evaluate the Safety, Tolerability, Immunogenicity, and Efficacy of RNA Vaccine Candidates Against COVID-19 in Healthy Individuals (starts from page 326).

[5] Sebastian Kranz: A look at BioNTech/Pfizer’s Bayesian Analysis of their COVID-19 Vaccine Trial.

All visualisations, unless otherwise noted, are by the author.