Note from Towards Data Science’s editors: While we allow independent authors to publish articles in accordance with our rules and guidelines, we do not endorse each author’s contribution. You should not rely on an author’s works without seeking professional advice. See our Reader Terms for details.

As COVID-19 deaths reach a new daily high in the United States, it feels like an appropriate time to use basic Python concepts to explore the stats around the pandemic. We are at the start of what may be a very difficult winter, but not all of us agree on whether COVID-19 even poses a credible threat. The goal of this article is to put COVID-19 stats into perspective and directly address some common misinformation. Download the Jupyter Notebook here.
The Misinformation
- "COVID-19 is all hype…the scary models in the beginning were totally wrong."
- "The cure is worse than the disease"
- "COVID-19 is just like the flu"
The Tools to Bring Perspective
- Worst Case Scenario Calculator: plug in your own population, R0, and mortality rate to estimate total deaths in a worst case scenario
- Charting Causes of Death: simple bar chart showing annual mortality from leading causes
- Flu vs. COVID-19 Charts: simple bar charts showing flu stats vs. COVID-19 stats
- Animated Disease Spread: visualize the spread of a disease with different numbers of infectious days and different mortality rates using matplotlib
1. "COVID-19 is all hype…the scary models in the beginning were totally wrong."
On March 16th 2020, Imperial College’s Professor Neil Ferguson released a COVID-19 mortality model estimating that under the WORST CASE scenario, with ZERO precautions taken, the UK would lose 550k people to the virus and the USA 2.2 million. This shocked the world into action.
However, in this same publication, the Imperial team provides tables showing the expected number of deaths in scenarios where precautions ARE in fact taken. These include "case isolation, home quarantine, social distancing, and school/university closure." With these precautions in place, the estimated deaths in the UK drop DRASTICALLY to a range of 5.6k – 48k over two years. As of 12/3/20, ~60k people in the UK have died from COVID-19. So, like any decent (though by definition imperfect) model, the reality fell somewhere between the best- and worst-case scenarios.
As the month passed, Professor Ferguson and his team made edits to the model as new facts came to light (note: this was before we even had meaningful testing!). Certain elements of the media were swift to jump on any modifications to the model. They claimed that these changes meant the whole model was wrong and should be dismissed, and therefore that COVID-19 no longer posed a credible threat. The WSJ’s Editorial Board (with whom I am known to have my differences!) wrote an op-ed that accurately argued:
"Critics are bashing him for the revisions, but not so fast. Mr. Ferguson didn’t change his model so much as adjust for new circumstances. In particular he believes that Covid-19 is more transmissible than he previously had thought-but because strong measures had been implemented, deaths would be far lower than his worst-case scenario. There’s a warning here about science and journalism. Surely if we hope to neutralize a pandemic we don’t fully understand, we need to encourage a culture in which scientists feel able to adapt and clarify with new evidence."
So yes, scientists and mathematicians will tweak the assumptions they feed into their models as new information comes to light. Some allege that the initial worst-case scenario was wrong by "orders of magnitude." That claim overlooks the fact that the PUBLICATION of the model was itself partly responsible for the (multiple) lockdowns the UK put in place. Those lockdowns in turn made an outcome resembling the model’s worst-case 550k figure less likely.

Tool #1: Worst Case Scenario Calculator
We can even demonstrate that the worst-case scenario is not THAT far off base by using an overly simplified (and highly theoretical) estimate of a worst-case death total. How? By plugging in an estimated population, R0 value, and estimated mortality rate. With these three inputs we can calculate the estimated herd immunity rate, the estimated number of people infected, and the estimated number of deaths.
R0 ("r-naught"): The number of people (on average) an infected person will infect. It represents how contagious a disease is. The CDC’s current best estimate is 2.5.
Mortality Rate: The mortality rate of COVID-19 is difficult to gauge, especially given advances in treatment since the start of the pandemic; one recent estimate is 0.6%.
Population: Current estimate of the US population is 328,200,000.
Herd Immunity Rate: Per the Mayo Clinic, "Herd immunity occurs when a large portion of a community (the herd) becomes immune to a disease, making the spread of disease from person to person unlikely. As a result, the whole community becomes protected – not just those who are immune." To estimate the herd immunity rate: 1 – (1/R0).
Our Python function will calculate the herd immunity rate, multiply this rate by the total population to calculate total infections, and then multiply total infections by the mortality rate to derive total deaths.
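A minimal sketch of such a function follows. The name `worst_case_estimate` and its dictionary return value are hypothetical, and the notebook's own function may differ. Like the calculation described above, it assumes infections stop exactly at the herd immunity threshold.

```python
def worst_case_estimate(population: int, r0: float, mortality_rate: float) -> dict:
    """Rough worst-case totals, assuming the outbreak runs until herd immunity.

    Deliberately oversimplified: no vaccination, no behaviour change,
    no age structure, and no overshoot beyond the threshold.
    """
    if r0 <= 1:
        raise ValueError("R0 must exceed 1 for an epidemic to take off")
    herd_immunity_rate = 1 - 1 / r0               # share that must be immune
    infections = herd_immunity_rate * population  # people infected on the way there
    deaths = infections * mortality_rate          # expected deaths among the infected
    return {
        "herd_immunity_rate": herd_immunity_rate,
        "infections": infections,
        "deaths": deaths,
    }


# Current best estimates quoted in this article: US population, CDC R0 of 2.5, 0.6% mortality
estimate = worst_case_estimate(population=328_200_000, r0=2.5, mortality_rate=0.006)
print(f"Herd immunity rate: {estimate['herd_immunity_rate']:.0%}")
print(f"Infections:         {estimate['infections']:,.0f}")
print(f"Deaths:             {estimate['deaths']:,.0f}")
```

Swapping in your own population, R0, or mortality rate is just a matter of changing the three keyword arguments.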
Calculation using current best estimates
- Population = USA population of 328,200,000
- R0 = CDC’s best estimate of 2.5
- Mortality Rate = recent estimate of 0.6%
Calculation using a mortality rate based on confirmed cases and deaths in the USA as of 12/3/20
Mortality Rate = 273,518 / 13,999,300 = 1.95%
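Carrying this observed rate through the same three steps (herd immunity rate, infections, deaths), with figures rounded as in the text:

```latex
\begin{aligned}
m_{\text{obs}} &= \frac{273{,}518}{13{,}999{,}300} \approx 0.0195 \\
H &= 1 - \frac{1}{R_0} = 1 - \frac{1}{2.5} = 0.6 \\
I &= H \cdot N = 0.6 \times 328{,}200{,}000 = 196{,}920{,}000 \\
D &= m_{\text{obs}} \cdot I \approx 0.0195 \times 196{,}920{,}000 \approx 3.84 \text{ million}
\end{aligned}
```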
Note that this mortality rate is clearly too high, as we do not know about ALL COVID-19 cases (particularly asymptomatic ones). I include it as a rough estimate of what little we knew in March, when the initial models were created.
If you download the notebook, you can try it out for yourself with your own inputs!
2. "The cure is worse than the disease"
As of 12/3/20, >270k Americans have died from COVID-19. Despite this fact, in the United States debates rage over whether COVID-19 even poses a real threat (see part 1, above). If 2018 trends (per the CDC) hold relatively stable in 2020, COVID-19 is already the third most common cause of death in 2020, after Heart Disease and Cancer.
A common talking point against "lockdowns" is that we "can’t let the cure be worse than the disease." The point here is that the increase in poverty and drug overdoses (among other issues) as a result of prolonged economic shutdowns outweighs the benefit of slowing the spread of COVID-19. To be clear: an increase in poverty and/or overdoses is devastating for the nation and specifically for the communities impacted. Measures must be taken to address the awful side effects of the pandemic and its associated policy responses. However, I hope to illustrate that the cure is NOT worse than the disease, because the disease is pretty dang bad.
On poverty: It seems reasonable to suggest that nations were faced with two options to avoid a spike in poverty in 2020. 1) Nations could keep their economies open, let the virus rip through their populations, and hope citizens continue to venture out of the house despite mounting deaths. 2) Nations could shut their economies down (in targeted locations, when possible) and use government resources to support the unemployed. With proper government stimulus (akin to the CARES Act) we might reasonably hope to mitigate a severe increase in poverty. What if a nation chose neither path? We will see what happens in the United States in the absence of a second stimulus bill. Already, government inaction paired with spotty economic lockdowns has led to a stark rise in poverty.
Although deaths from poverty can be difficult to ascertain, one study of deaths back in 2000 concluded that 133,000 deaths could be attributed to "individual-level poverty" and 39,000 to "area-level" poverty. Taken together, these numbers clearly indicate that deaths as a result of poverty are a massive issue (at pandemic levels in their own right) facing our nation. Fortunately (or perhaps, in some nations mired in gridlock, unfortunately), a nation’s government has the power to directly address the pandemic of poverty (anyone remember those checks back in the day?).
On overdoses: Overdoses, the majority of which are due to opioid use, are quite appropriately referred to as an epidemic in the U.S. According to the CDC, between 1999 and 2018 (a 20-year period), approximately 450k Americans died from an opioid overdose. In 2018, the total number of drug overdose deaths was 67,367 (same source). These numbers are jarringly high and demand our attention. However, a rise in overdoses caused by economic shutdowns is statistically highly unlikely to outnumber the deaths that would occur from the unfettered spread of COVID-19. Even with periodic lockdowns and other safety measures in place, >270k Americans have already died from COVID-19 in ~9 months alone. That is ~4x the number of overdose deaths in 2018.
In summary, the stats clearly illustrate that the threat of COVID-19 is meaningful and warrants policy intervention to attempt to slow the spread of the #3 killer in the USA in 2020. Negative side effects from policy intervention (i.e. lockdowns) include spikes in both poverty and drug overdoses. While these effects demand our attention, a spike in poverty can be directly mitigated by government action and overdoses are greatly outnumbered by COVID-19 deaths. To sum it up: the cure is NOT worse than the disease.
Tool #2: Charting Causes of Death
We’ll add the USA mortality stats to a dictionary, which we will convert to a pandas dataframe and chart using seaborn. This will help us put the COVID-19 deaths into better perspective. Note: all mortality data is as of 2018, except for COVID-19 which is of course YTD 2020.
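Below is a sketch of the dictionary → DataFrame → seaborn pipeline, assuming pandas and seaborn are installed. Only the two figures quoted in this article are filled in. The remaining 2018 leading causes (heart disease, cancer, and so on) would come from the CDC's tables and are deliberately left out rather than guessed. The function names `mortality_frame` and `plot_mortality` are hypothetical. Seaborn is imported inside the plotting function so the data preparation also runs without it.

```python
import pandas as pd

# Only the figures quoted in this article; add the other 2018 leading causes
# from the CDC's tables before charting.
deaths_by_cause = {
    "COVID-19 (YTD 2020, as of 12/3)": 273_518,
    "Drug overdoses (2018)": 67_367,
}


def mortality_frame(deaths: dict) -> pd.DataFrame:
    """Convert a {cause: deaths} dict into a DataFrame sorted largest-first."""
    df = pd.DataFrame(list(deaths.items()), columns=["cause", "deaths"])
    return df.sort_values("deaths", ascending=False).reset_index(drop=True)


def plot_mortality(df: pd.DataFrame):
    """Horizontal bar chart of deaths per cause."""
    import matplotlib.pyplot as plt  # imported here so mortality_frame works without them
    import seaborn as sns

    ax = sns.barplot(data=df, x="deaths", y="cause", color="steelblue")
    ax.set_xlabel("Deaths")
    ax.set_ylabel("")
    plt.tight_layout()
    return ax


df = mortality_frame(deaths_by_cause)
print(df)
# In a notebook: plot_mortality(df)
```

Sorting before plotting keeps the bars in descending order, which makes the ranking of causes easy to read at a glance.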
3. "COVID-19 is just like the flu"
Mortality Rates
Just as more and more people are questioning whether COVID-19 is a credible threat (see part 1), or whether the costs of containing it outweigh the benefits (see part 2), there is a growing population that asserts that COVID-19 is "just like the flu."
The chart above already illustrates that COVID-19 deaths year-to-date vastly exceed the combined number of deaths caused by both Influenza AND Pneumonia in the full year of 2018. But let’s dive deeper, focusing specifically on the flu. We’ll use data from the 2019–2020 flu season for this exercise.
Per the CDC, during the 2019–2020 flu season in the U.S., about 38 million people had the flu and about 22,000 people died of the flu. We’ll compare that stat against the latest year-to-date COVID-19 numbers (13,999,300 people infected; 273,518 people dead as of 12/3/20).
Tool #3: Mortality Rates – Flu vs. COVID-19 Charts
Charts: Known Infections; Known Deaths; Observed Mortality Rate (Known Deaths / Known Infections)
From the charts above, we can see that although 2.7x more people were infected with the flu, 12.4x more people have died from COVID-19 YTD 2020 than died from the flu in the 2019–2020 season. Why? Because the current observed mortality rate (number of people who have died / number of people who have a confirmed case) is 32x greater for COVID-19 than the 2019–2020 flu mortality rate in the US.
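The ratios quoted above can be reproduced from the raw figures in a few lines. For the mortality multiple, this sketch divides by the article's rounded 0.06% flu rate, so the printed values are approximations of the same kind as those in the text. The variable names are illustrative.

```python
# 2019–2020 US flu season (CDC) and YTD 2020 COVID-19 figures quoted in this article
flu = {"infections": 38_000_000, "deaths": 22_000}
covid = {"infections": 13_999_300, "deaths": 273_518}

FLU_MORTALITY_ROUNDED = 0.0006  # the article's rounded 2019–2020 flu rate (0.06%)

# Observed mortality rate = known deaths / known infections
covid_mortality = covid["deaths"] / covid["infections"]
flu_mortality = flu["deaths"] / flu["infections"]

print(f"Flu infections vs. COVID-19 infections: {flu['infections'] / covid['infections']:.1f}x")
print(f"COVID-19 deaths vs. flu deaths:         {covid['deaths'] / flu['deaths']:.1f}x")
print(f"COVID-19 observed mortality:            {covid_mortality:.2%}")
print(f"Flu observed mortality:                 {flu_mortality:.3%}")
print(f"Mortality multiple (vs. 0.06% flu):     {covid_mortality / FLU_MORTALITY_ROUNDED:.1f}x")
```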
Now of course the mortality rate was at its worst at the beginning of the pandemic, before major strides in treatment were made by our incredible healthcare workers and scientific community. AND we weren’t (and still aren’t) catching every case. Still, with testing increasing and daily cases of ~180k preceding ~1.5–2k daily deaths three weeks later, the mortality rate for COVID-19 still appears to be meaningfully higher than the flu.
Perhaps you fall in the camp of people who believe the mortality rate is actually MUCH lower than it seems, because we’re missing a LOT of asymptomatic cases. In the final chart below, we display the COVID-19 infections needed to produce the number of deaths we have observed, at various mortality rates. We express these mortality rates as multiples of the mortality rate of the 2019–2020 US flu season (0.06%). So, for instance, if COVID-19 is "just like the flu" with a 0.06% mortality rate, we would need 455,863,333 COVID-19 cases in the US to produce the 12/3/20 death total of 273,518. Clearly that isn’t possible, as the U.S. population is approx. 328,200,000. You can continue this thought experiment up to 20x the mortality rate of the flu.
Hypothetical COVID-19 Total Cases at Various Mortality Rates

From this table and chart, we can see that even if we have missed 50% of all cases, and the true number is ~28M cases, COVID-19 would still be 16x deadlier than the flu during the 2019–2020 influenza season.
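The table behind this thought experiment needs only one division per row: cases needed = deaths / mortality rate. Below is a minimal sketch that assumes the rounded 0.06% flu rate and the 12/3/20 death total used above. The helper name `cases_needed` is hypothetical.

```python
COVID_DEATHS = 273_518      # US COVID-19 deaths as of 12/3/20
FLU_MORTALITY = 0.0006      # 2019–2020 US flu season, rounded (0.06%)
US_POPULATION = 328_200_000


def cases_needed(deaths: int, mortality_rate: float) -> float:
    """Infections required to produce `deaths` at the given mortality rate."""
    return deaths / mortality_rate


print(f"{'x flu rate':>10} {'mortality':>10} {'cases needed':>15}  exceeds US pop?")
for multiple in range(1, 21):
    rate = FLU_MORTALITY * multiple
    cases = cases_needed(COVID_DEATHS, rate)
    print(f"{multiple:>10} {rate:>10.2%} {cases:>15,.0f}  {cases > US_POPULATION}")
```

The 16x row is the one behind the ~28M-case figure: 273,518 / 0.0096 ≈ 28.5M.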
Disease Spread
Although there have been fewer confirmed COVID-19 cases YTD 2020 than flu cases during the 2019–2020 influenza season, there is reason to believe that COVID-19 is more infectious than the flu when left unchecked. (Note that a good portion of the population also receives an annual flu shot, which reduces the spread of the flu right off the bat.)
So COVID-19 seems to be deadlier and more easily spread than the flu. It also seems that people are contagious for longer with COVID-19 than with the flu (per the same source), even if they are asymptomatic. While people with the flu are contagious for up to 7 days, the scientific evidence currently suggests that people with COVID-19 are contagious for at least 10 days.
How do these two factors (days infectious and mortality rate) contribute to the spread (or relative containment) of the two diseases?
Tool #4: Animated Disease Spread
We will use matplotlib animations to look at the theoretical spread of a disease within a population. Key features:
- 1,000 People moving in "random walks" within a grid of a specified size
- 1 Person will be randomly infected with the disease (Patient Zero)
- At each time step, the disease will be spread to any healthy person within 1 coordinate of an infected person who is actively contagious
- At the end of each infected Person’s contagious period, they die at random, with probability equal to the specified mortality rate
- We will run the simulation for 365 time steps
NOTE: This is for demonstration purposes only. It is not intended to produce accurate scientific projections.
To create this simulation, we need to first create a new Python class: "Person." A Person will have: a location (x and y coordinates), infection status, alive status, number of people they have infected, number of infectious days remaining, disease mortality rate, and range of possible x and y coordinates (based on size of graph).
This new class, "Person," will also have a method called "time_step()." During one time_step(), a Person takes a random step (within the confines of the graph), and if they are infected, their remaining infectious days are reduced by one. When their number of infectious days remaining reaches exactly zero, we randomly decide whether the Person dies of the disease, using the specified mortality rate.
"Person" also has a method enabling each Person to infect another instance of Person. You can infect someone if you are infected, alive, and still infectious. The other Person must not yet be infected and must be located within 1 coordinate point of you in both x and y. If you infect the other Person, the number of people you have infected increases by 1.
Now that we have created this new class, we can create a few helpful functions that will aid in our simulation. You can see the details of these functions in the Jupyter Notebook. They create a list of Person objects, pull the position and healthy/infected/dead stats from a list of Person objects, cycle all Person objects through one time_step, and pull healthy/infected/dead stats from a scatterplot.
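Below is a minimal sketch of the Person class and helpers described above. It was written from the prose rather than copied from the notebook, so treat it as an approximation. Several choices are assumptions and not from the article: the 100×100 grid, the ±1 step clamped at the edges, the fixed seed, the rule that dead Person objects stop moving, and counting recovered people as "infected." The helper names (`create_population`, `positions`, `population_stats`, `step_all`, `run_simulation`) are hypothetical. The animation layer is left out, since it only redraws `positions(people)` each frame.

```python
import random


class Person:
    """One agent: a position, infection state, and disease parameters."""

    def __init__(self, x_max, y_max, days_infectious, mortality_rate):
        self.x_max, self.y_max = x_max, y_max      # grid bounds (inclusive)
        self.x = random.randint(0, x_max)
        self.y = random.randint(0, y_max)
        self.infected = False                      # ever infected; recovered stay immune
        self.alive = True
        self.num_infected = 0                      # people this Person has infected
        self.days_infectious = days_infectious
        self.days_remaining = 0                    # contagious days left
        self.mortality_rate = mortality_rate

    def become_infected(self):
        self.infected = True
        self.days_remaining = self.days_infectious

    def is_contagious(self):
        return self.infected and self.alive and self.days_remaining > 0

    def time_step(self):
        """Take one random step, then count down the contagious period."""
        if not self.alive:
            return                                 # assumption: the dead stay put
        self.x = min(max(self.x + random.choice((-1, 0, 1)), 0), self.x_max)
        self.y = min(max(self.y + random.choice((-1, 0, 1)), 0), self.y_max)
        if self.infected and self.days_remaining > 0:
            self.days_remaining -= 1
            # Roll once for death exactly when the contagious period ends.
            if self.days_remaining == 0 and random.random() < self.mortality_rate:
                self.alive = False

    def infect(self, other):
        """Infect `other` if we are contagious and they are uninfected and within 1 point in x and y."""
        if (self.is_contagious() and not other.infected
                and abs(self.x - other.x) <= 1 and abs(self.y - other.y) <= 1):
            other.become_infected()
            self.num_infected += 1
            return True
        return False


def create_population(n, size, days_infectious, mortality_rate):
    people = [Person(size, size, days_infectious, mortality_rate) for _ in range(n)]
    random.choice(people).become_infected()        # Patient Zero
    return people


def positions(people):
    """x and y lists, e.g. for a matplotlib scatter plot."""
    return [p.x for p in people], [p.y for p in people]


def population_stats(people):
    """'infected' counts everyone ever infected who is still alive (recovered included)."""
    healthy = sum(not p.infected for p in people)
    dead = sum(not p.alive for p in people)
    return {"healthy": healthy, "infected": len(people) - healthy - dead, "dead": dead}


def step_all(people):
    """Move everyone one step, then let each contagious Person infect neighbours."""
    for p in people:
        p.time_step()
    # Index uninfected people by cell so only the 3x3 neighbourhood is checked.
    susceptible = {}
    for p in people:
        if not p.infected:
            susceptible.setdefault((p.x, p.y), []).append(p)
    for carrier in [p for p in people if p.is_contagious()]:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in susceptible.get((carrier.x + dx, carrier.y + dy), []):
                    carrier.infect(other)


def run_simulation(n=1000, steps=365, size=100, days_infectious=10,
                   mortality_rate=0.006, seed=None):
    random.seed(seed)
    people = create_population(n, size, days_infectious, mortality_rate)
    history = [population_stats(people)]
    for _ in range(steps):
        step_all(people)
        history.append(population_stats(people))
    return history


flu_like = run_simulation(days_infectious=7, mortality_rate=0.0006, seed=1)
covid_like = run_simulation(days_infectious=10, mortality_rate=0.006, seed=1)
print("Flu-like, final day:  ", flu_like[-1])
print("COVID-like, final day:", covid_like[-1])
```

People infected during a step are collected only after the carrier list is built, so they start spreading the disease on the following step. Without that rule, a single step could chain through an entire cluster.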
Finally, we can animate two different scenarios. In one, a disease exhibits flu-like characteristics. In the other, a disease exhibits COVID-19-like characteristics. Note that each time the simulation is run, the results will differ. The results will depend on the location of Patient Zero, the location of all other Person objects, the random walk of the Person objects, the random chance of death based on mortality rate, etc. Think of it this way: sometimes we have bad flu seasons, and sometimes we have light ones. Therefore, the recordings below are just two versions of how the simulation might run. Feel free to run the simulations as many times as you’d like, by downloading the notebook!
Spread of Flu-like Disease
Days Infectious = 7; Mortality Rate = 0.06%
Spread of COVID-19-like Disease
Days Infectious = 10; Mortality Rate = 0.6%
Taking stock
So, in our animations we can see the impact of changing the number of infectious days and the mortality rate. With three additional infectious days, 47.4% of people were infected in our COVID-19 simulation vs. 15.0% in our flu simulation. Similarly, the 0.6% COVID-19 mortality rate resulted in 4 deaths (0.4% of the total population) vs. 0 deaths in our flu simulation with a 0.06% mortality rate. Now, you might say to yourself "0.4% is nothing!" Well… if 0.4% of the US population died from COVID-19, that would be 1.3 million deaths. Now we’re back in the millions, like those original models…
Conclusion
Let’s see what we figured out using Python and some good old critical thinking. Did any of the misinformation hold its own against the sheer power of my data visualizations?!?!?!
The Misinformation
"COVID-19 is all hype…the scary models in the beginning were totally wrong."
Not so fast! The worst case scenarios depended on ZERO precautions being taken. That clearly didn’t happen – hello masks! goodbye indoor dining! Plus, a simple Python function shows us that the unfettered spread of COVID-19 could very well result in >1 million deaths, even with all of the better information we know now.
"The cure is worse than the disease"
Without a doubt, the side effects of economic shutdowns will be devastating if we take no action. Spikes in poverty and drug overdoses demand our attention. However, government intervention CAN and SHOULD address the rise in poverty, while a simple chart showing the leading causes of death demonstrates that COVID-19 is already the #3 leading cause of death in 2020 (if 2018 trends hold). The cure is NOT worse than the unfettered spread of the disease.
"COVID-19 is just like the flu"
We’ve debunked this on two levels: mortality rate and disease spread.
On mortality rate, the number of COVID-19 deaths (>270k at the time of writing) and confirmed cases (~14M at the time of writing) demonstrate that the disease has a significantly higher mortality rate than the flu. Even if we have missed 50% of all cases, and the true number is ~28M cases, COVID-19 would still be 16x deadlier than the flu during the 2019–2020 influenza season.
As far as disease spread, a simple matplotlib animation helps us see how an additional 3 days of being contagious, plus a 10x greater mortality rate (0.6% vs. 0.06%), can impact both the spread of a disease in a population and total deaths.