Addressing criticisms of COVID-19 reporting through data

A light-touch stab at adjusting doomsday numbers

Marcell Ferencz
Towards Data Science


The six-month anniversary of the introduction of the first lockdown measures in the UK is fast approaching, and with it the promise of tightened restrictions once more, following a period of cautious “return to normal” over the past few months. Most of us have anticipated a second lockdown in some form after the relative frivolities of the summer, so few will be surprised by Boris Johnson’s announcement to re-introduce certain limitations in an effort to curb the spread of the virus. Looking at the alarming rate at which reported positive cases are increasing, this seems like an obvious decision, too:

Fig 1 UK Total Cases — source

With the ensuing public concern came a rising sense of skepticism about the reported numbers. Several outlets, including the BBC, have called into question the reliability of these figures, driven by the observation that neither hospital admissions nor COVID-related mortality have increased at anywhere near the rate of reported positive cases:

Fig 2 UK Total Hospital Admissions — source
Fig 3 UK Total Deaths — source

While not all of the challenges laid out by these outlets may be valid, they do highlight a number of glaring issues with the way these figures are reported and, more importantly, interpreted by the public and, seemingly, the government. In the next few minutes I attempt to highlight a few of these and offer ways to put the numbers into a different perspective.

Disclaimer

Before I begin, I want to highlight that I’m not a doctor, an epidemiologist, a medical biologist, a health minister, a policy advisor, an economist, or a utilitarian philosopher. These calculations should be taken with the biggest pinch of salt and certainly not used to support any agenda. I do believe, however, that if policy decisions are made based on data, the data should be scrutinised wherever possible.

The Issues With COVID Reporting In the UK

I’m going to explore three areas of concern regarding the reporting of positive cases, but before we dive into that, it’s worth providing a bit of context on how testing is conducted in the UK.

Testing is currently carried out in 4 ways (pillars). These are:

  • Pillar 1: NHS and PHE Testing — PCR swab testing in Public Health England (PHE) labs and NHS hospitals for those with a clinical need, and health and care workers
  • Pillar 2: Commercial partner testing — PCR swab testing for the wider population, as set out in government guidance
  • Pillar 3: Antibody testing — antibody serology testing to show if people have antibodies from having had COVID-19, reported from 1st June onwards
  • Pillar 4: Surveillance testing — antibody serology and PCR swab testing for national surveillance supported by PHE, ONS, Biobank, universities and other partners to learn more about the prevalence and spread of the virus and for other testing research purposes, for example on the accuracy and ease of use of home testing

Pillars 1, 2 and 4 are all PCR swab tests carried out under different circumstances; pillar 3 is the odd one out, being a serological antibody test. Pillar 2 tests are carried out through a commercial contractor, at mobile testing sites or through kits delivered to people’s homes. Incidentally, Pillar 2 testing has seen a significant increase in recent months as the government is ramping up its testing programme (rightly so). This leads us to the first issue.

Cases Are Not Adjusted For Testing Numbers

Keen observers will have clicked away from the daily case counts on gov.uk and wandered onto the charts showing the number of tests carried out by pillar:

Fig 4 Tests carried out in the UK by pillar — source

While most pillars have remained fairly consistent, pillar 2 testing (darkest blue) has seen a significant jump in activity since August, particularly pronounced in September.

The issue with this should be obvious: the more you test, the more cases you will find. The science behind this is pretty intuitive; at least some of the increase in positive cases will be explained by the simple fact that more of these home kits were made available and used.

The Tests Are Assumed To Be Perfect

A more subtle complication of these figures comes not from how many people are tested, but from how much we can trust those tests. Medical tests are not perfect, and how sure you can be of your diagnosis depends heavily on their predictive power. Much like classification algorithms, diagnostic tools in medicine have sensitivity and specificity. To understand them in the context of COVID, it’s useful to look at each in isolation first.

Sensitivity gives us the probability that an individual who definitely has Coronavirus will be diagnosed as such. If we were to test 10 people who definitely have the virus with a test with 60% sensitivity, the results (on average) would return 6 positive cases and 4 negative cases:

Fig 5 Diagram by author, icon made by bqlqn

Specificity, on the other hand, is the probability that a healthy person is identified correctly with a negative diagnosis. If 10 healthy people were tested with a kit with 60% specificity, the results would return 6 negatives and 4 positives, on average:

Fig 6 Diagram by author, icon made by bqlqn

We can illustrate the effect of specificity in the context of mass testing with slightly higher numbers. Suppose we have 100 people, 10 of whom definitely have the virus, and 90 of whom don’t. What would a 100% sensitivity, 90% specificity test give in this case?

Fig 7 Diagram by author, icon made by bqlqn

The test has 100% sensitivity, therefore it correctly identifies all 10 of the truly infected people. However, the 90% specificity means that only 90% of the remaining 90 healthy people are correctly diagnosed, leaving 9 people testing positive when they did not in fact have the virus.
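The arithmetic behind this example takes only a few lines (these are the illustrative numbers from above, not real data):

```python
# Illustrative example: 100 people, 10 truly infected, 90 healthy
n_infected, n_healthy = 10, 90
sensitivity, specificity = 1.00, 0.90

true_positives = n_infected * sensitivity        # all 10 infected are flagged
false_positives = n_healthy * (1 - specificity)  # 9 healthy people wrongly flagged
total_positives = true_positives + false_positives

# Nearly half of the positive results are false, even with a decent test
print(total_positives, false_positives)
```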

This is another key page in the sceptics’ book: if the false positive rate (1 − specificity) is significantly higher than the actual prevalence of the virus, more testing will only muddy the waters and inherently inflate the number of positive test results. The actual sensitivity and specificity of RT-PCR tests are somewhere between 60–85% and 94–99%, respectively, depending on which (small-sample) study you consult.

Regardless, specificity and sensitivity should (and do) play a part in estimates of true prevalence, which we’ll explore in the next section.

Testing Pillars Are Treated Equally

But first, I want to point out something a bit more nuanced; the circumstances under which testing is conducted matter a lot for how much we can rely on its results. Surely, tests carried out by professionals in hospitals (where repeat testing is available to reduce uncertainty, for example) should have a greater significance than take-home kits, where people do their own swabbing and the risk of contamination is far higher.

Interestingly, despite these factors suggesting a higher positive rate in hospital-based Pillar 1 tests, the opposite is true in reality:

Fig 8 Ratio of pillar 1 & 2 tests returning positive in England — Data source

Since June, the chance of testing positive has decreased in hospitals and stayed relatively constant for take-home kits, until both started increasing again in September, by which time you were 4–5 times more likely to test positive at home than in a hospital.

There could be a number of explanations for this, but I can only guess (not an expert, remember?) — it could be a combination of contamination and a difference in demographics, for example. Pillar 2 tests are used in care homes a lot, which have been known for localised mass infections, but they’re also used by the general public, presumably before travelling or working. Pillar 1 tests on the other hand are used for, well, patients and those in medical need. Another key difference is that Pillar 2 tests are processed by a commercial contractor, so there could be a difference in methodologies involved.

In any case, such systematic discrepancies should be highlighted especially when these positive cases are reported together.

Can We Do Better?

In the final section of the article I will demonstrate some methods to address the issues I’ve outlined above. In the interest of brevity I’ll keep in-line code to illustrative sketches, but I will make my full notebook available for anyone to peruse.

The Data

I will be using 15 weeks’ worth of Pillar 1&2 test results in England between 28 May and 9 September, published on gov.uk. The data provides weekly totals of tests carried out and positive test cases, by pillar. Taking a quick look at the positive case totals shows us the familiar upward curve:

Fig 9 Weekly positive pillar 1&2 test cases — source

My job here will be to:

  1. Estimate the true prevalence of active infections in the test population, given estimates of the specificity and sensitivity of PCR testing
  2. Do this separately for Pillars 1 & 2
  3. Adjust for differences in sample sizes over each week

Estimating True Prevalence

The idea that your medical tests are not 100% accurate is not exactly a novel one. Owing to this, a pretty simple formula has been derived as presented by Lewis & Torgerson (2012):

P_t = (P_a + S_p − 1) / (S_e + S_p − 1)

Where P_t is the true prevalence of the disease in the population, P_a is the apparent prevalence, and S_e and S_p are the sensitivity and specificity of the diagnostic test, respectively. Note that we have 3 unknowns here; the data only provides us with the apparent prevalence. Given that P_a is the probability of obtaining a positive test result, the number of positive results k out of n tests will follow the binomial distribution:

k ~ Binomial(n, P_a), where P_a = P_t·S_e + (1 − P_t)(1 − S_p)

P_a is parametrised by 3 unknowns, P_t, S_e, S_p — none of which we know with certainty. To solve this, I will use Bayesian inference with Markov Chain Monte Carlo (MCMC) sampling to draw random values of the parameters given prior distributions for them, comparing the results to the probability above given by the binomial PDF. My posterior for P_t should then converge around the real prevalence, given the test results.
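Before going fully Bayesian, the point-estimate formula on its own already gives a feel for the size of the correction (the values below are assumed purely for illustration, not figures from the data):

```python
def true_prevalence(p_apparent, sensitivity, specificity):
    """Point estimate of true prevalence from apparent prevalence
    and the test's sensitivity/specificity."""
    return (p_apparent + specificity - 1) / (sensitivity + specificity - 1)

# Assumed illustrative values: 4% of tests positive, Se = 0.82, Sp = 0.98
p_t = true_prevalence(0.04, 0.82, 0.98)
print(f"{p_t:.4f}")  # -> 0.0250
```

Note that the point estimate can go negative when the apparent prevalence falls below the false positive rate (1 − S_p); in practice it would be clipped at zero.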

Sensitivity & Specificity Priors

We briefly talked about the approximate sensitivities and specificities of PCR swab tests; the reported values vary wildly depending on a number of factors, such as demographics, geography, disease stage and time of study. Zhang & Du (2020) have pooled a number of these results together and presented estimated probability distributions for both sensitivity and specificity, from which I’ve taken inspiration to construct my Beta priors:

These give means of 0.821 and 0.983, respectively, for the sensitivity and specificity of both Pillar 1 and 2 swab tests, but double the variance of the Pillar 2 sensitivity prior to represent higher uncertainty in the home kits’ ability to correctly detect the virus.
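The exact Beta parameters aren’t shown here, but a sketch of how one might derive them from a target mean and variance (the variances below are assumed for illustration; the notebook’s values may differ):

```python
def beta_params(mean, variance):
    """Shape parameters (alpha, beta) of a Beta distribution
    with the given mean and variance."""
    common = mean * (1 - mean) / variance - 1
    return mean * common, (1 - mean) * common

# Target means from the text; variances are assumed for illustration
sens_p1 = beta_params(0.821, 0.005)   # Pillar 1 sensitivity
sens_p2 = beta_params(0.821, 0.010)   # Pillar 2 sensitivity: doubled variance
spec = beta_params(0.983, 0.0001)     # specificity, both pillars
```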

True Prevalence Priors

This is where things get more difficult. COVID-19 is a pandemic; its prevalence will change week by week, community to community, and demographic to demographic, so research from overseas won’t be very useful here. The ideal data set would come from Pillar 4 testing, which was set up for this exact purpose. Sadly, the UK government does not publish the results (and it recently refused a Freedom of Information request to do so).

For lack of good data, my best bet is to initially believe what the results tell me and construct my Beta priors with means corresponding to the apparent prevalence on that week (scaling my parameters somewhat arbitrarily to represent my confidence in the results):
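A simple way to encode “believe the data, with adjustable confidence” (the pseudo-sample size below is a made-up illustration, not the notebook’s actual scaling):

```python
def prevalence_prior(apparent_prevalence, pseudo_n=50):
    """Beta prior centred on the week's apparent prevalence.
    A larger pseudo_n (pseudo-sample size) gives a tighter,
    more confident prior around the reported figure."""
    alpha = apparent_prevalence * pseudo_n
    beta = (1 - apparent_prevalence) * pseudo_n
    return alpha, beta

# e.g. a week where 4% of tests came back positive (assumed value)
alpha, beta = prevalence_prior(0.04)
# prior mean = alpha / (alpha + beta) = the apparent prevalence itself
```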

I can plot my priors for the true prevalence for the week commencing September 3:

Fig 10 Beta priors for sensitivity, specificity and prevalence

Markov Chain Monte Carlo Sampling

We can draw random samples from our 3 prior distributions (for both Pillars 1 and 2) to estimate our apparent prevalence as per our equation from before. We can then compare that to our observed data and assess how likely those samples are to arise given the binomial distribution of our apparent prevalence, and construct posterior distributions.
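The notebook presumably does this with a probabilistic programming library; purely to illustrate the mechanics, here is a hand-rolled random-walk Metropolis sampler over the three unknowns, using made-up weekly numbers and illustrative prior parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed illustrative data for one week: 10,000 tests, 400 positives
# (NOT the published figures; just a stand-in to show the machinery)
n_tests, n_positive = 10_000, 400

# Beta priors; parameters are illustrative only
prior_pt = stats.beta(2.0, 48.0)     # true prevalence, mean 0.04
prior_se = stats.beta(23.3, 5.1)     # sensitivity, mean ~0.82
prior_sp = stats.beta(163.3, 2.8)    # specificity, mean ~0.98

def log_posterior(pt, se, sp):
    # Apparent prevalence: true positives plus false positives
    p_a = pt * se + (1 - pt) * (1 - sp)
    return (stats.binom.logpmf(n_positive, n_tests, p_a)
            + prior_pt.logpdf(pt)
            + prior_se.logpdf(se)
            + prior_sp.logpdf(sp))

# Random-walk Metropolis over (pt, se, sp)
theta = np.array([0.04, 0.82, 0.98])
current = log_posterior(*theta)
prevalence_samples = []
for _ in range(10_000):
    proposal = theta + rng.normal(0, 0.002, size=3)
    if np.all((proposal > 0) & (proposal < 1)):
        candidate = log_posterior(*proposal)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
    prevalence_samples.append(theta[0])

# Discard burn-in, then summarise the posterior for true prevalence
posterior_mean = np.mean(prevalence_samples[2000:])
print(f"posterior mean prevalence: {posterior_mean:.3f}")
```

A library like PyMC adds adaptive step sizes and convergence diagnostics for free; the toy version above only demonstrates the accept/reject logic.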

After drawing 10,000 samples, our posteriors for Pillar 1 look like this:

Fig 11 Posteriors for Pillar 1

Sensitivity and specificity have remained roughly constant, while our estimate of the true prevalence for Pillar 1 testing has converged around a lower mean (0.006, down from 0.010).

Similarly for Pillar 2:

Fig 12 Posteriors for Pillar 2

Again, test accuracy metrics remain relatively consistent, whilst prevalence drops from 0.040 to 0.031.

Weekly Estimates

We can repeat the above process for our individual weekly observations, recreating the prior construction exercise based on our data. This allows us to plot the estimated prevalence against the reported one; I’ve included the 94% highest posterior density values as pseudo confidence intervals on the plots:

Fig 13 Estimated prevalence for Pillars 1 & 2

We observe that the estimated mean prevalence is consistently lower than the reported prevalence for both Pillars; however, the reported prevalence does fall within the 94% HPD range.

We can also construct estimates of total cases by pillar by scaling up the prevalence estimates by the total tests carried out, essentially giving us how many individuals we believe were truly infected from those tested:
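The scaling itself is a single element-wise multiplication (the numbers below are placeholders, not the published weekly figures):

```python
import numpy as np

# Hypothetical weekly test volumes and estimated true prevalences
weekly_tests = np.array([80_000, 95_000, 110_000])
est_prevalence = np.array([0.006, 0.008, 0.012])

# Estimated truly infected individuals among those tested, per week
est_cases = weekly_tests * est_prevalence
print(est_cases)
```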

Fig 14 Estimated cases for Pillars 1 & 2

Finally, we can combine estimates of Pillar 1 & 2 cases to compare it with the reported total cases:

Fig 15 Estimated total cases across both Pillars

What Did We Learn From This?

I’ve outlined a number of challenges raised with regards to COVID-19 reporting in the UK and offered some ways to address them, demonstrating their effects on the publicly available data: accounting for the estimated specificity and sensitivity of PCR tests, estimating prevalence instead of total cases (removing testing volumes from consideration), and treating the two Pillars separately. What did this tell us?

There May Be Fewer Positives Than Reported…

In the population that was tested, that is. Assuming our knowledge of PCR test accuracy is about right and our priors are fairly representative, the actual number of positive cases may be lower than what’s been reported for Pillars 1 & 2. Having said that, even the reduced estimated prevalence would imply a much higher actual number of positive cases across the country if we were to scale up to the UK’s population, so it’s probably no reason to start relaxing.

…But Cases Are Probably Still Increasing

In an ideal world we could arrive at better estimates of true prevalence with Pillar 4 test results; however, the estimated (and reported) prevalence shows that even if we adjust for the number of tests carried out, we still observe a relatively sharp rise in positive cases starting in September. While the availability of pillar-specific data only allowed us to model up to September 9, it is reasonable to assume that the trend continues thereafter, looking at the total reported cases.

So What Now?

We’ve been able to address some of the challenges raised by sceptics and offer data-driven solutions to them. While our analysis showed that a lower overall number of cases was likely to have been truly detected, we can’t disprove a sudden increase in prevalence. Hospital admissions may still be low relative to positive cases, but it is probably prudent to remain cautious in our new world, lest we find ourselves in a situation where we simply can’t treat our ill for lack of capacity.

Afterthoughts

Did I do something wrong? Could I have done something better? Did I do something well?

Please don’t hesitate to reach out to me on LinkedIn; I’m always happy to be challenged or just have a chat if you’re interested in my work.

If you want to play with the code yourself, please follow the link to my Google Colab notebook:
