Using Bayes’ to interpret COVID-19 rapid home test results: Probability Simulation Part 1–2

Have you taken a COVID-19 home test and are trying to interpret the results? Let’s use Bayes’ theorem to answer this question, and verify the answer by simulating the scenario in Python.

Published in

Towards Data Science

9 min read · May 31, 2022

This post is a continuation of my previous probability post, Part 1. (This is an interim post; I will publish Part 2 of the series later.)

Probability Simulations: Solve complex probability problems using randomness: Part 1


Background

Let’s assume there is a disease, rarestDisease, that affects 1% of the total human population. And say there is a test for it, RapitTest, that is 95% accurate for both positive and negative results: it correctly flags 95% of the people who have the disease and correctly clears 95% of the people who don’t.

Now if you test positive by this RapitTest, what is the probability that you actually have the disease?
The answer is ~16%. Why is that? Let’s take a look at the following COVID-19 example for a better understanding.
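The ~16% figure can be checked with a few lines of Python. This is a minimal sketch that reads "95% accurate for both positive and negative tests" as a 95% true-positive rate and a 95% true-negative rate; the function name is my own and is not taken from the original post.

```python
def prob_disease_given_positive(true_pos_rate, true_neg_rate, prevalence):
    """Bayes' theorem for a positive result on a binary test."""
    # Chance that a random person has the disease AND tests positive
    true_positives = true_pos_rate * prevalence
    # Chance that a random person is healthy AND still tests positive
    false_positives = (1 - true_neg_rate) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# rarestDisease: 1% prevalence; RapitTest: 95% accurate both ways
answer = prob_disease_given_positive(0.95, 0.95, 0.01)
print(f"P(disease | positive test) = {answer:.1%}")  # -> 16.1%
```

The result is low because healthy people outnumber sick people 99 to 1, so even a 5% false-positive rate produces far more false positives than there are true positives.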

Introduction

Just when we had all started to think that COVID-19 was under control, the numbers began rising again. In the US alone, more than 100,000 positive cases are being reported per day.

Source: cdc.gov, Daily Update for the United States (May 25th, 2022).

Last week, someone I had been in close contact with tested positive for COVID-19. I decided to take a COVID-19 rapid home test so that I could self-isolate if I was positive [reasons to get tested]. After taking the test, I wondered how to interpret the results, and I assume many of you have the same question. So, let’s try to answer it using probability and verify that answer with a Monte Carlo simulation.

Disclaimer: I am an engineer and data scientist by profession, and have no medical expertise. This post is about data and probability, and by no means intends to provide any kind of medical advice. Please refer to the latest CDC guidelines or consult your doctor for medical advice.

Note: This post assumes that the reader is somewhat familiar with probability and Bayes’ theorem. If not, I recommend reading my last post.

COVID-19 Tests

Self-tests are rapid tests that can be taken at home or anywhere, are easy to use, and produce rapid results. In this post we will focus on how I tried to interpret my rapid home test results.

Please refer to the CDC’s Self-Testing At Home or Anywhere page for the official guidance.

Let’s dive into it

I took the InteliSwab COVID-19 Rapid Test.

According to the information included in the test kit and on the manufacturer’s website, the test correctly identified ~85% of the positive samples and ~98% of the negative samples in their clinical study.

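Let’s try to understand what that means. The ~85% figure is the test’s Sensitivity: the fraction of truly positive samples that the test reports as positive. The ~98% figure is its Specificity: the fraction of truly negative samples that the test reports as negative.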

A higher Sensitivity means fewer False Negatives (FNs), and a higher Specificity means fewer False Positives (FPs).
Depending on the disease and the control measures in place, the government or the CDC may set guidelines on the minimum allowable sensitivity and specificity for a test to be approved for mass use, and they may also give higher priority to one number over the other (e.g. FPs may increase the burden on the healthcare system, whereas FNs may increase the spread).

Coming back to the main question now: “Given a certain test result, what is the probability that I have the infection?”.

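The answer is given by the positive predictive value, PPV = P(you have Covid19 | positive test), and the negative predictive value, NPV = P(you don’t have Covid19 | negative test). These numbers may look similar to the sensitivity and specificity numbers, but they are actually different (look at the denominators): sensitivity and specificity are computed over people whose true infection status is known, whereas PPV and NPV are computed over people who received a given test result.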

*Note: In the DS community, we also know Sensitivity as Recall and the PPV as Precision.
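To make the difference in denominators concrete, here is a small sketch that computes all four numbers from the same confusion-matrix counts. The cohort is hypothetical: 10,000 people, 10% of them infected, tested with a kit that has 85% sensitivity and 98% specificity. The counts are illustrative and do not come from any clinical study.

```python
# Hypothetical cohort: 1,000 infected and 9,000 not infected
tp = 850    # infected, test positive      (true positives)
fn = 150    # infected, test negative      (false negatives)
tn = 8_820  # not infected, test negative  (true negatives)
fp = 180    # not infected, test positive  (false positives)

# Denominator = people with a known TRUE STATUS
sensitivity = tp / (tp + fn)  # a.k.a. recall
specificity = tn / (tn + fp)

# Denominator = people with a given TEST RESULT
ppv = tp / (tp + fp)          # a.k.a. precision
npv = tn / (tn + fn)

print(f"Sensitivity: {sensitivity:.1%}")  # -> 85.0%
print(f"Specificity: {specificity:.1%}")  # -> 98.0%
print(f"PPV:         {ppv:.1%}")          # -> 82.5%
print(f"NPV:         {npv:.1%}")          # -> 98.3%
```

Sensitivity and specificity are properties of the test alone, while PPV and NPV also depend on how many people in the cohort are infected.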

Let’s assume there is a rare disease X that affects only 1% of the population. You can begin with this number as your prior (or pre-test) probability of having it.
You can update this probability with new information too: for example, if you were in close contact with someone who had this disease, if it runs in your family, or if your current health condition makes you more prone to it.
Then, say, you decide to take a test. Based on the test result, you can further update your probability of having this disease. The Sensitivity and Specificity numbers are what determine the likelihood factor by which you should increase or decrease your prior probability after getting the test result.

You get the picture, right?

** We will assume that we have already calculated a good prior before taking the rapid test and will only focus on updating the probability after getting the test results.

Bayes’ theorem can help us calculate this likelihood factor and update our prior.

Post test probability = Likelihood based on test results ∗ Pre test probability

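which, in more technical terms, translates into:

P(you have Covid19 | positive test) = [P(positive test | you have Covid19) / P(positive test)] ∗ P(you have Covid19)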

We are interested in the value on the left side, the PPV. The first part of the right-hand side is the likelihood part. It is the ratio of the chance of getting a positive test result if you have Covid19 to the total chance of getting a positive test result. (**Remember that since the test can produce false positives, this denominator includes both scenarios → getting a positive when you have Covid19 as well as getting a positive when you don’t have Covid19.)

Can you see how the Specificity and Sensitivity numbers will affect the above likelihood ratio and hence the updating factor?
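One way to see the effect is to compute the updating factor directly for a few specificity values. The sketch below is my own illustration, not code from the original post: it assumes a fixed 85% sensitivity and a 10% prior, and the function name is hypothetical.

```python
def positive_update_factor(sensitivity, specificity, prior):
    """P(positive | infected) / P(positive): the factor that multiplies the prior."""
    # Total chance of a positive result = true positives + false positives
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity / p_positive


prior = 0.10  # assumed pre-test probability
for specificity in (0.90, 0.95, 0.98, 0.999):
    factor = positive_update_factor(0.85, specificity, prior)
    print(f"specificity={specificity:.3f}  factor={factor:5.2f}  "
          f"post-test probability={factor * prior:.1%}")
```

As specificity rises, false positives shrink, the denominator gets smaller, and a positive result multiplies the prior by a larger factor.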

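Breaking down the components of the above equation: P(positive test | you have Covid19) is the Sensitivity, and P(positive test) = Sensitivity ∗ P(you have Covid19) + (1 − Specificity) ∗ P(you don’t have Covid19).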

The number that is hardest to obtain, and the most debated, in the above equations is P(you have Covid19) → the pre-test probability (also known as the Prior or Prevalence). It depends on various factors and is pretty difficult to nail down, as we will discuss later.

For the sake of an example calculation and based on my conditions mentioned above, let’s use P(I have Covid19) = 10% as a conservative guesstimate for now. And thus P(I don’t have Covid19) = 90%.
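Putting all the numbers together, this time for a negative test result:

P(I have Covid19 | negative test) = (1 − 0.85) ∗ 0.10 / [(1 − 0.85) ∗ 0.10 + 0.98 ∗ 0.90] = 0.015 / 0.897 ≈ 1.67%

I tested negative, and hence my chance of having Covid19, **given my pre-test estimate is correct**, is only about 1.67%.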
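The same calculation can be wrapped in two small functions, one per test outcome. This is a minimal sketch using the numbers from this post (85% sensitivity, 98% specificity, 10% prior); the function names are my own.

```python
def prob_infected_given_positive(sensitivity, specificity, prior):
    """PPV: P(infected | positive test)."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


def prob_infected_given_negative(sensitivity, specificity, prior):
    """1 - NPV: P(infected | negative test)."""
    false_neg = (1 - sensitivity) * prior
    true_neg = specificity * (1 - prior)
    return false_neg / (false_neg + true_neg)


sens, spec, prior = 0.85, 0.98, 0.10
p_pos = prob_infected_given_positive(sens, spec, prior)
p_neg = prob_infected_given_negative(sens, spec, prior)
print(f"P(infected | positive) = {p_pos:.1%}")  # -> 82.5%
print(f"P(infected | negative) = {p_neg:.2%}")  # -> 1.67%
```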

Other assumptions that I have made here are:

I correctly self-administered the test per the instructions on the kit.
I took the test 5 to 7 days after close contact (as recommended by CDC).
The sensitivity and specificity numbers reported on the test kit are correct and valid for everyone, and remain the same over time.
Note that if you have symptoms, you should take the test within 7 days of the symptoms onset. [Source]

Coming to the main topic of this post — Simulation

Maybe the above explanation and calculations make complete sense without any confusion. Or maybe you did calculate the numbers but aren’t sure if they are correct?

This is where simulating the scenario will help. We have already built the foundation of how to simulate a certain probability problem in our last post. Let’s use the same framework here.
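The original simulation code is not reproduced in this text, so the following is only a sketch of how such a Monte Carlo experiment could look, not the author's implementation. It uses Python's standard `random` module. The function name, the population size, and the seed are all my own assumptions.

```python
import random


def simulate(sensitivity, specificity, prior, n_people=200_000, seed=42):
    """Simulate one rapid test per person and return confusion-matrix counts."""
    rng = random.Random(seed)
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for _ in range(n_people):
        # Step 1: is this person infected? (drawn from the prior)
        infected = rng.random() < prior
        # Step 2: what does the test say, given the true status?
        if infected:
            positive = rng.random() < sensitivity
        else:
            positive = rng.random() < (1 - specificity)
        # "t"/"f" = result agrees/disagrees with the truth; "p"/"n" = test result
        key = ("t" if positive == infected else "f") + ("p" if positive else "n")
        counts[key] += 1
    return counts


counts = simulate(sensitivity=0.85, specificity=0.98, prior=0.10)

# Condition on the test result: look only at people who tested negative/positive
p_infected_given_negative = counts["fn"] / (counts["fn"] + counts["tn"])
p_infected_given_positive = counts["tp"] / (counts["tp"] + counts["fp"])
print(f"P(infected | negative) ~ {p_infected_given_negative:.2%}")
print(f"P(infected | positive) ~ {p_infected_given_positive:.1%}")
```

The estimates fluctuate slightly with the seed and the number of simulated people, so they should be compared with the analytical values only up to that sampling noise.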

The answers from the simulation match our previous calculations.

To be more certain about my test result, I actually took the test twice (serial antigen testing), with a 36-hour gap between the tests [CDC guidelines, technical note 2].
Both results were negative, so the chance of me actually having Covid19 is only 0.29% (**again, given our assumptions are True).
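Serial testing can be treated as applying Bayes’ theorem twice, with the posterior after the first test becoming the prior for the second. The sketch below adds one assumption of my own: the two test results are conditionally independent given the true infection status. That is a simplification, since two tests taken 36 hours apart on the same person may be correlated.

```python
def update_after_negative(sensitivity, specificity, prior):
    """P(infected | negative test), used as the prior for the next test."""
    false_neg = (1 - sensitivity) * prior
    true_neg = specificity * (1 - prior)
    return false_neg / (false_neg + true_neg)


probability = 0.10  # pre-test probability before the first test
for test_number in (1, 2):
    probability = update_after_negative(0.85, 0.98, probability)
    print(f"After negative test #{test_number}: {probability:.2%}")
```

Under these assumptions the exact value after two negatives comes out at roughly 0.26%, in the same ballpark as the figure quoted above. A simulation-based estimate will fluctuate around it.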

Let’s verify the sensitivity and specificity numbers from the results. Notice how the denominator changes when we calculate these numbers compared to the above numbers.
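As a sketch of this sanity check (again, not the author's original code), the snippet below re-runs a small simulation and recovers sensitivity and specificity from the simulated outcomes. Here the denominators are the truly infected and the truly non-infected people, rather than the people with a given test result. The names, sample size, and seed are my own assumptions.

```python
import random


def simulate_confusion_counts(sensitivity, specificity, prior, n_people, seed):
    """Return (tp, fp, tn, fn) counts from a simulated tested population."""
    rng = random.Random(seed)
    tp = fp = tn = fn = 0
    for _ in range(n_people):
        infected = rng.random() < prior
        hit_rate = sensitivity if infected else (1 - specificity)
        positive = rng.random() < hit_rate
        if infected and positive:
            tp += 1
        elif infected:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


tp, fp, tn, fn = simulate_confusion_counts(0.85, 0.98, 0.10, n_people=200_000, seed=7)

# Denominator: everyone who is truly infected
sensitivity_est = tp / (tp + fn)
# Denominator: everyone who is truly not infected
specificity_est = tn / (tn + fp)
print(f"Recovered sensitivity ~ {sensitivity_est:.1%}")
print(f"Recovered specificity ~ {specificity_est:.1%}")
```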

They match the input numbers that we provided, which is expected.

** Please refer to this site and algorithm set by CDC on what action you should take based on the positive or negative antigen test results. You can also refer to Interpreting-Results-of-Diagnostic-Tests.

** Also, if you tested positive and continue to test positive for a longer period of time, check out this article.

The Prior (Pre-test probability)

As we saw earlier, while evaluating results of an antigen test for Covid19 the following factors need to be considered → test sensitivity and specificity, and Pretest probability (which depends on the prevalence of Covid19 infection in the community and clinical context of the individual). [Source]

If the prevalence of infection in the community is high, the person being tested is symptomatic, and the likelihood of alternative diagnoses is low, then the pretest probability is generally considered high.
If the prevalence of infection in the community is low, and the person being tested is asymptomatic and has not had close contact to a person with COVID-19, then the pretest probability is generally considered low.

One way to estimate the prior for Covid19 is to check the number of reported cases in your area. The ratio of positive tests to total tests is another option to consider. For example, you can find the numbers for NYC here and for the entire US here or here.
So, if you are in NYC, you could use ~9% as a prior estimate of you having Covid to begin with.
The above number can be updated based on other factors such as your health conditions or exposure. For example, this research states that there is around a 2.6% chance that you will be diagnosed with Covid given that you were exposed to it (seems very low, right?).
** Note that above values are still estimates. At least I couldn’t find reliable sources which say with a certain confidence that if you are from XYZ region and meet ABC parameters, this is your prior probability of having Covid19. CDC does have Community level information, but that is in categories, instead of numbers.

So, taking this uncertainty into account, let’s calculate the post test probabilities (PPV and 1 - NPV) of one having Covid19 over the entire range of possible prior probabilities [0 to 1], given certain sensitivity and specificity numbers for a test.

For example, in my case, considering my health conditions, close contact and other factors, if I assume that my pre-test probability was somewhere between 10% and 30%, my single negative test result would put the probability of me having Covid19 between 1.7% and 6% (0.3% to 1% for 2 negative tests).
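The interactive graph itself is not reproduced here, but the numbers behind such a curve can be sketched as a sweep over priors. This is my own illustration: it assumes 85% sensitivity and 98% specificity, and it treats two negative tests as conditionally independent given infection status. The function names and the chosen grid of priors are hypothetical.

```python
def after_positive(sensitivity, specificity, prior):
    """PPV: P(infected | one positive test)."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive


def after_negatives(sensitivity, specificity, prior, n_tests=1):
    """1 - NPV after n_tests negative results, updating the prior each time."""
    probability = prior
    for _ in range(n_tests):
        p_negative = (1 - sensitivity) * probability + specificity * (1 - probability)
        probability = (1 - sensitivity) * probability / p_negative
    return probability


SENS, SPEC = 0.85, 0.98
print("prior   P(inf|+)   P(inf|-)   P(inf|-,-)")
for prior in (0.01, 0.05, 0.10, 0.20, 0.30, 0.50):
    print(f"{prior:5.0%}   {after_positive(SENS, SPEC, prior):8.1%}   "
          f"{after_negatives(SENS, SPEC, prior, 1):8.2%}   "
          f"{after_negatives(SENS, SPEC, prior, 2):10.2%}")
```

Reading the 10% and 30% rows gives the range discussed above for one and two negative tests.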

Check out the full interactive graph on Streamlit (*see in Dark mode).

Play with the interactive graph a bit and try to understand how these 3 main factors (the Sensitivity and Specificity of the test, and the Pre-test probability) affect the probability of you actually having the disease given a positive (or negative) test result.

Conclusion

Estimating the Prior probability is of crucial importance. A higher Prior can reduce your confidence in even highly accurate negative tests, whereas a very low prior can reduce your confidence in even highly accurate positive tests.
A highly sensitive test has fewer False Negatives.
Similarly, a highly specific test has fewer False Positives.
The two values above decide the factor by which you will update your prior belief of having the infection after the test.
Getting negative results twice gives you more confidence of not having the infection, especially when the Sensitivity number is high. **If you continue to experience symptoms you should seek follow up care with your healthcare provider.
If you get a positive result, there is a considerable chance that you have the infection, especially with a highly specific test, and you should follow the CDC guidelines to decide what’s best for you.

References

Covid Data Tracker Weekly review

Covid testing overview

How To Interpret Self-Test Results

Interpreting Results of Diagnostic Tests

Antigen test guidelines

Why Pretest and Posttest Probability Matter in the Time of COVID-19