Data Visualization of COVID-19 in the US

Have we turned the corner?

Daniel Reiff
Towards Data Science

Photo by Yassine Khalfalli on Unsplash

On January 19, 2020, the first known case of novel coronavirus (COVID-19) in the United States appeared when a 35-year-old man in Washington state walked into a clinic with what we now understand to be common symptoms of the virus: respiratory inflammation and a fever. By April 30, 2020, there were more than 1 million cases of COVID-19 in the U.S. Most Americans were truly shocked by the spread of the virus.

COVID-19 Growth Dynamics

As is typical of epidemics, COVID-19’s growth was exponential at first. Imagine a person with COVID-19 going to a party and spreading the disease to three people. The next day, those three people each spread the virus to three other people. Within two days, one case of coronavirus has turned into thirteen (1 + 3 + 9).
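The party example can be tallied with a few lines of Python. This is only a sketch under the example's own simplification: each newly infected person passes the virus to exactly three others the next day and then stops spreading. The function name `cumulative_cases` and its parameters are made up for illustration.

```python
def cumulative_cases(days, spread_per_case=3):
    """Cumulative case counts, starting from a single case on day 0.

    Assumes only the newest cases spread the virus, each to
    `spread_per_case` new people per day (the party example).
    """
    new_cases = 1
    total = 1
    totals = [total]
    for _ in range(days):
        new_cases *= spread_per_case  # each new case infects three more
        total += new_cases
        totals.append(total)
    return totals


print(cumulative_cases(2))  # [1, 4, 13]
print(cumulative_cases(10)[-1])
```

Running the tally for ten days instead of two gives a feel for how quickly the count escalates when nothing slows the spread.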

The number of additional cases stemming from one can be expressed by a growth metric or the variable commonly known as R0. R0 is determined by the ability of the virus to transmit and the average number of people a carrier of the virus is exposed to. R0 is variable throughout the life of an epidemic.

Epidemics like COVID-19 do not grow at exponential rates forever. Over time, an epidemic’s growth comes to resemble an S-shaped logistic curve. At the start of an epidemic, R0 is greater than one, indicating exponential growth. As the virus either progresses throughout an entire population or is blocked by a vaccine, case isolation techniques, and/or social distancing strategies, the growth rate reaches an inflection point.

Exponential vs. Logistic Growth

Exponential growth eventually turns to decay as R0 drops below one. The transition from exponential growth to decay is often described in the media as “bending the curve.” As decay gathers momentum and R0 approaches zero, growth rates can actually decline exponentially.
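To make the difference between the two growth patterns concrete, here is a minimal sketch comparing an exponential curve with a logistic curve that starts out the same way. The parameters (a starting value of 1, a growth rate of 0.2 per day, and a ceiling of 10,000 total cases) are arbitrary assumptions chosen for illustration, not values fitted to COVID-19 data.

```python
import math


def exponential(t, n0=1.0, r=0.2):
    """Unchecked exponential growth from n0 at rate r per day."""
    return n0 * math.exp(r * t)


def logistic(t, n0=1.0, r=0.2, k=10_000.0):
    """Logistic growth: starts like the exponential, levels off at k."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


def inflection_time(n0=1.0, r=0.2, k=10_000.0):
    """Day on which the logistic curve reaches k / 2 and daily growth peaks."""
    return math.log(k / n0 - 1.0) / r


for day in (5, 30, 46, 60, 100):
    print(day, round(exponential(day)), round(logistic(day)))
print("inflection near day", round(inflection_time(), 1))
```

Early on, the two columns are nearly identical. After the inflection point, the exponential column keeps exploding while the logistic column flattens toward its ceiling.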

A vaccine for COVID-19 is unlikely to be ready anytime soon. Case isolation techniques (including widespread, rapid testing and contact tracing) are only in their infancy in the U.S. Therefore, we have been heavily reliant on social distancing to slow the rate of growth. Obviously, social distancing imposes huge economic and social costs. We don’t want to continue social distancing longer than absolutely necessary.

But one of the scariest aspects of COVID-19 is the uncertainty and abstract nature of exponential growth. There are no rules to tell us how long exponential growth can continue. It is very hard to tell if we are beyond the inflection point while the epidemic is ongoing. Simply comparing numbers of new cases over time is not sufficient. Slower growth for a brief period of time may not be evidence that the growth rate has truly begun to decay. R0 can decline for a few weeks and then spike upwards again. On the other hand, new cases will continue to appear, and total cases to grow, even during the decay phase. If we focus only on the raw numbers, miss the inflection point, and underestimate the velocity of the decay phase, we could unnecessarily prolong social distancing and inflict unneeded pain on the most economically vulnerable Americans.

We need a better way to analyze and visualize the COVID-19 growth rate data. Ideally, we should present the data so that laymen can understand at a glance where we stand at any moment in time. Specifically, we should be able to present the data so that laymen can quickly answer these key questions:

Have we passed the inflection point? Have we “bent the curve?”

Where is each state in its quest to slow the spread of the virus? Are there large differences by state and by region?

What happens after growth has turned to decay? How long does the decay phase last?

The Method

Using a process of data preparation, exploration, and visualization in Python, I am going to attempt to present the COVID-19 growth rate data in a more useful way.

On a logistic curve, the moment exponential growth turns to decay is commonly called the inflection point. This is the moment the growth metric, or R0, drops below one (a case of infection would lead to, on average, fewer than one other case of infection). Clearly, the inflection point is a significant landmark on the way to eradicating the virus. How do we know if we have reached the inflection point? By looking at a logistic curve differently, using a logarithmic scale, we can answer this question in real time.

To gain another perspective on an S-curve, it is helpful to reconsider what exponential growth implies: the number of new cases per unit of time is proportional to the number of existing cases. Let’s look at the relationship between existing cases and new cases on a logarithmic scale, where units on the axis increment by factors of ten. Doing this allows us to clearly see exponential growth as a positive linear relationship between existing cases and new cases. At the inflection point, the relationship between existing and new cases is no longer linear. Instead, the relationship becomes increasingly negative as fewer new cases occur.

Left (logistic), Right (existing vs. new)

A New Perspective on Logistic Curves

Each graph is a different expression of the same data. On the left is a logistic curve showing how existing cases change over time. On the right is the log of existing cases on the x-axis and the log of new cases on the y-axis. Each point is a point in time. As time progresses, points migrate from left to right.
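The right-hand view can be reproduced from any series of daily totals. The sketch below builds a synthetic logistic series (the same arbitrary parameters as before: rate 0.2 per day, ceiling 10,000) and converts it into (log existing, log new) pairs. It illustrates the idea and is not the code behind the charts in this article; all names are hypothetical.

```python
import math


def logistic(t, n0=1.0, r=0.2, k=10_000.0):
    """Synthetic cumulative case count on day t (assumed parameters)."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


def existing_vs_new(days=120):
    """Return one (log10 existing cases, log10 new cases) pair per day."""
    totals = [logistic(t) for t in range(days + 1)]
    points = []
    for prev, curr in zip(totals, totals[1:]):
        new = curr - prev  # cases added since the previous day
        points.append((math.log10(prev), math.log10(new)))
    return points


points = existing_vs_new()

# During exponential growth the points fall on a straight line of slope ~1.
(x0, y0), (x5, y5) = points[0], points[5]
early_slope = (y5 - y0) / (x5 - x0)

# The day with the most new cases marks where the curve starts to drop off.
peak_day = max(range(len(points)), key=lambda i: points[i][1])

print("early slope:", round(early_slope, 3))
print("new cases peak on day", peak_day)
```

Plotting `points` as a scatter would show the straight climb followed by the drop once new cases start to shrink.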

Not all logistic curves are the same, especially in the context of an epidemic. Here is an example of two logistic curves that tell different stories. On top, we see a curve that rises steeply and quickly but also decays quickly. Essentially, growth falls off a “cliff.” After the cliff comes a steep drop, and total cases settle at 6,000. On the bottom, we see a curve that never had quite as steep growth as the curve on top but has a far more gradual decay after the inflection point. The total cases eventually settle at 10,000. Like these differing logistic curves, COVID-19 has seen growth decay faster in some countries than in others.

Sharp Decay Logistic Curve
Gradual Decay Logistic Curve
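One rough way to put a number on “cliff” versus “hillside” is to count the days between the peak in new cases and the day new cases fall below some small threshold. The sketch below does this for two synthetic logistic curves that settle at 6,000 and 10,000 total cases. The growth rates (0.4 and 0.1 per day) and the threshold of 10 new cases per day are assumptions picked for illustration and are not the parameters behind the charts above.

```python
import math


def logistic(t, r, k, n0=1.0):
    """Synthetic cumulative cases with growth rate r and ceiling k."""
    return k / (1.0 + (k / n0 - 1.0) * math.exp(-r * t))


def daily_new(r, k, days=400):
    """New cases per day, taken as differences of the cumulative curve."""
    totals = [logistic(t, r, k) for t in range(days + 1)]
    return [curr - prev for prev, curr in zip(totals, totals[1:])]


def days_from_peak_to_below(new_cases, threshold=10.0):
    """Days between peak new cases and the first day under the threshold."""
    peak = max(range(len(new_cases)), key=new_cases.__getitem__)
    for day in range(peak, len(new_cases)):
        if new_cases[day] < threshold:
            return day - peak
    return None  # never dropped below the threshold in the window


sharp = daily_new(r=0.4, k=6_000)      # steep rise, steep fall
gradual = daily_new(r=0.1, k=10_000)   # slower rise, slower fall

sharp_days = days_from_peak_to_below(sharp)
gradual_days = days_from_peak_to_below(gradual)
print("sharp decay:", sharp_days, "days; gradual decay:", gradual_days, "days")
```

With these assumed parameters, the gradual curve takes several times longer to wind down even though its peak is lower.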

How do logistic curve growth dynamics play out in an epidemic? Applying these concepts to COVID-19, here are cases in South Korea (widely regarded as the most successful case of containing the epidemic), Italy (a country that has passed the inflection point but is not out of the woods yet), and the United States. On the left, we are looking at total cases over time for each country. On the right, we are looking at the situation from a different perspective: Existing Cases vs. New Cases.

For South Korea, new cases have been in relatively sharp decay since the inflection point occurred on March 4th. Looking at Existing Cases vs. New Cases, we can observe that the drop after the inflection point is quite steep. It really did “fall off a cliff.” The data paint a picture of effective South Korean policy in dealing with COVID-19. South Korea is now in the process of reopening its economy and society, including schools and non-essential businesses.

In Italy, the story since reaching the inflection point has been much different. In Italy, growth is decaying at a much slower rate. In general, the other side of the inflection point has been more like a “hillside” than a “cliff.” Italy remains much more vulnerable to a second wave of exponential growth than South Korea. Italy has yet to begin reopening most of its economy and expects to begin a gradual process in early May.

When looking at the US curve, it appears we have turned a corner in terms of reaching the inflection point. But the question remains: will COVID-19 growth in the U.S. “fall off a cliff” like South Korea or decline more slowly like Italy? So far, the data suggest the Italian scenario is more likely, with potentially an even more gradual decay in growth. While the U.S. has passed the inflection point, we may be in for many more months of significant numbers of new cases. We may have great difficulty reopening our economy and school system.

Looking at individual states, we get a more detailed picture of the current situation in the US.

In the US, Every State Tells a Different Story

Updated May 9th

There is a lot to digest in this visual. Let’s start by looking at the scatter plot on the right. Here we are plotting Log(Existing Cases) vs. Log(New Cases), as we did when comparing the US and South Korea. Constant growth along the identity line (Log(Existing Cases) = Log(New Cases)) indicates exponential growth. Dropping off this line indicates hitting the inflection point. The rate at which a state drops off the line indicates how fast the state’s cases are decaying.

On the left-hand side, we have states in the US colored by a 10-day rolling average of their case growth. The rolling average smooths out inconsistent reporting over time across 50 different states.

  • Red states are in a period of exponential growth.
  • Lighter red states are nearing the inflection point.
  • White states are at the inflection point.
  • Lighter blue states have a case rate that is decaying slowly.
  • Blue states have a case rate that is decaying fast.
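The smoothing step behind the map colors can be sketched as follows. The exact definition of “case growth” used for the map is not spelled out here, so this example assumes it means the day-over-day growth rate (new cases divided by the previous day’s total). The function names and the sample totals are hypothetical.

```python
def daily_growth(totals):
    """Day-over-day growth rate, assuming every total is positive."""
    return [(curr - prev) / prev for prev, curr in zip(totals, totals[1:])]


def rolling_mean(values, window=10):
    """Trailing average over the most recent `window` values.

    Early entries average over however many values are available so far.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical cumulative totals; the flat day mimics a reporting gap
# that is followed by a catch-up spike.
totals = [100, 120, 120, 168, 185, 204]
growth = daily_growth(totals)
smoothed = rolling_mean(growth, window=3)

for raw, avg in zip(growth, smoothed):
    print(f"raw {raw:.3f}  smoothed {avg:.3f}")
```

The raw rates swing between zero and 40 percent because of the reporting gap, while the smoothed series stays in a much narrower band, which is why an average is a better basis for coloring a state.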

Takeaways

  • From early-mid March through early-mid April, COVID-19 grew exponentially in all states
  • Since mid April, we have seen a drop in growth rates across all states
  • While the US as a whole has appeared to reach the inflection point, there are states at every stage of growth.
  • There are intra-region similarities and discrepancies.
  • A handful of states have fallen off the cliff like South Korea and are experiencing rapid exponential decay. These are less populous, rural states in the far northwest and northeast (Wyoming, Alaska, Idaho, Montana, Maine, and Vermont).
  • There are states that remain close to exponential growth. These states are in the upper Midwest (Minnesota, Iowa, Nebraska, and Kansas)
  • For the states with the most cases (New York, New Jersey, and Massachusetts), the other side of the inflection point appears to be very gradual. Essentially, these states are stuck at the top of the cliff; the growth metric is below one but moving very slowly towards zero.
  • Intra-region discrepancies in terms of logistic growth dynamics are ever-present. Examples include Virginia/West Virginia, Louisiana/Mississippi, Oregon/California, Montana/North Dakota, and Vermont/New Hampshire. This is especially significant as the governors of California, Oregon, and Washington are all announcing similar reopening plans based purely on region.
  • Perhaps the scariest finding: we have seen some states experience a drop in growth followed by a second wave of increased growth (Massachusetts, Colorado, Tennessee, and South Carolina).

A Second Wave?

  • It is important to note that testing and reporting throughout the COVID-19 epidemic have been problematic. Sudden increases in growth could be due to a rapid increase in testing and reporting that captures cases that existed all along. At the same time, Colorado, Tennessee, and South Carolina have all been in the news recently for reopening parts of their economies.

Final Thoughts

While hitting the inflection point is a milestone for the US in eradicating COVID-19, the data show that there are months of virus ahead of us. We can clearly see from the data that many states are not falling off the cliff into fast case decay like South Korea. In some states, we have even witnessed increasing growth metrics after the inflection point. The data have made a couple of points crystal clear:

  • COVID-19 policy should be determined state by state because of discrepancies in logistic growth (even within regions). States are moving at different rates through different stages of the epidemic.
  • Despite “turning the corner” and passing the inflection point in every state, this is no time to encourage a return to normalcy. The majority of states are not falling off a cliff into exponential decay and have months and thousands of cases of COVID-19 ahead of them. Returning to normalcy will only lead us back to exponential growth.

I will continue updating the data for each visual weekly until the time has come to return to normalcy.

Machine Learning Engineer at Forsight | Projects @ https://github.com/reiffd7 | Interested in Computer Vision and Data Visualization