The Graphs You Need to Understand the Covid-19 Pandemic

The right graphs can be tremendously revealing — if you know what to look for

Steve McConnell
Towards Data Science
9 min read · Sep 30, 2020

As one of the contributors to the CDC’s Covid-19 “Ensemble” forecast model, I update a set of state and national graphs several times a week on my Covid-19 Spin Free Data Center. I include the charts that I personally find useful in understanding the status and trends of the pandemic.

National Graphs

The most foundational graph is the one that shows the raw data on daily positive tests and deaths, as shown below. The blue lines represent positive tests, and the red lines represent deaths. The axis is scaled so that the positive test scale is 10 times the deaths scale.

The main thing I look for in this graph is unexpected noise in the data. We typically see a weekly reporting pattern of underreporting on Sunday and Monday and overreporting on Tuesday and Wednesday. You can see that in the “hills and valleys” of this graph. Sometimes there are exceptions, such as severe underreporting around holiday weekends. If you look at the blue lines around Labor Day, you can see a conspicuous dip followed by a spike.

The next graph I review is the smoothed tests and deaths data graph. This applies a 7-day smoothing period, which averages out the weekly cycle of underreporting and overreporting. Usually I consider raw data to be less filtered and more meaningful than smoothed data. But in this instance, the smoothed data actually presents a truer picture of the course of the pandemic by reducing the effect of reporting irregularities.
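As a rough illustration of why a 7-day window removes the weekly reporting cycle, here is a minimal sketch of a trailing moving average. The article does not say whether the smoothing is trailing or centered, so the trailing window, the function name, and the daily counts below are all assumptions made for the example.

```python
def smooth_7day(daily_counts, window=7):
    """Trailing moving average (assumed form of the smoothing).

    Early entries average over however many days are available so far.
    """
    smoothed = []
    for i in range(len(daily_counts)):
        start = max(0, i - window + 1)
        chunk = daily_counts[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up counts: low on two days, high on the next two, 700 per week overall.
daily = [70, 60, 130, 140, 100, 100, 100] * 3
smoothed = smooth_7day(daily)

# Once a full week is in the window, each weekday appears exactly once,
# so the weekly "hills and valleys" cancel out.
print(smoothed[6:])
```

Because every full window contains one of each weekday, the smoothed series in this toy example is flat even though the raw series swings from 60 to 140.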

It’s easier to see trends in the smoothed data. In the example below from September 28, it’s easy to see that the blue lines (positive tests) are leveling off after about two weeks of increasing.

This chart also makes it easy to see that the relationship between positive tests and deaths is not what it once was. If you look at the left third of the chart, you can see the red curve follows the blue curve, lagging slightly behind. But after about mid-May, the blue curve shoots up. The red curve still follows it, but the ratios are completely different. This is because vastly more testing was conducted after mid-May, and a higher percentage of infections were caught. Thus the ratio of deaths to positive tests decreased.
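The arithmetic behind that shift can be sketched with a toy calculation. The infection count, death count, and detection percentages below are invented purely for illustration; they are not estimates from the data.

```python
def deaths_per_positive_test(deaths, positive_tests):
    """Ratio of deaths to positive tests for the same group of infections."""
    return deaths / positive_tests


# Hypothetical numbers: the same underlying outbreak, observed with
# two different levels of testing coverage.
true_infections = 100_000
deaths = 1_000

early_positives = true_infections * 10 // 100   # testing catches 10% of infections
later_positives = true_infections * 40 // 100   # testing catches 40% of infections

early_ratio = deaths_per_positive_test(deaths, early_positives)
later_ratio = deaths_per_positive_test(deaths, later_positives)

print(early_ratio, later_ratio)
```

With deaths held constant, catching four times as many infections cuts the deaths-to-positive-tests ratio to a quarter of its earlier value, without the virus itself becoming any less deadly.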

The final graph I review for national trending is the 7-day delta graph. This graph presents the same information as the smoothed testing graph, but in a form that makes it easier to see the rate at which positive tests are trending up or down. Red means positive tests are trending up. Green means positive tests are trending down. High red means positive tests are trending up quickly. Low red means positive tests are trending up slowly. For the math folks, you could think of this graph as showing the first derivative of daily positive tests, which is the second derivative of cumulative positive tests.
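One plausible way to compute such a delta is to subtract the smoothed value from 7 days earlier from today's smoothed value. The article does not give the exact formula, so this definition, the function names, and the sample series are assumptions.

```python
def seven_day_delta(smoothed, lag=7):
    """Change in the smoothed series versus `lag` days earlier (assumed definition)."""
    return [smoothed[i] - smoothed[i - lag] for i in range(lag, len(smoothed))]


def bar_color(delta):
    """Red when the series is rising, green otherwise."""
    return "red" if delta > 0 else "green"


# Made-up smoothed positive tests: still rising, but more slowly over time.
smoothed = [100, 110, 120, 130, 140, 150, 160,
            170, 175, 180, 185, 190, 195, 200, 190]

deltas = seven_day_delta(smoothed)
colors = [bar_color(d) for d in deltas]

print(deltas)
print(colors)
```

In this example every bar is red, but the bars get shorter, which corresponds to positive tests still trending up, just not as quickly as before.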

In the example below, you can see on the far right that positive tests aren’t trending up quite as quickly as they were a week earlier. But they’re still trending up, not down.

The series of green dips on the right show the history of attempts to contain the pandemic. We almost contain it (large green bars), then we fall back (short green bars, or short red bars). Then we almost contain it again, then we fall back again. You can see we went through that cycle three times.

On a weekly basis it is interesting to look at the logarithmic graphs of tests and deaths, below. These graphs are unusual in that they don’t plot time on the x-axis (the horizontal axis). The x-axis is the cumulative number of deaths or positive tests, and the y-axis is the incremental weekly number.
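To make the axes concrete, here is a small sketch of how the plotted points could be built from weekly counts. It assumes both axes are logarithmic (the article says only that the graphs are logarithmic), and the weekly numbers are invented.

```python
import math


def trajectory_points(weekly_counts):
    """Pair each week's running total (x) with that week's new count (y)."""
    points = []
    cumulative = 0
    for weekly in weekly_counts:
        cumulative += weekly
        points.append((cumulative, weekly))
    return points


# Made-up weekly counts: rapid growth, a plateau, then a decline.
weekly = [10, 100, 1000, 1000, 500]
points = trajectory_points(weekly)

# Log-scale coordinates; zero counts are skipped because log10(0) is undefined.
log_points = [(math.log10(x), math.log10(y)) for x, y in points if x > 0 and y > 0]

print(points)
```

Because time is not on either axis, the curve only moves to the right. While growth continues the points climb, and when weekly counts shrink the curve bends downward.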

These graphs also tell the story of the pandemic in a different way. You can see that the pandemic peaked, started to fall, then peaked again a bit lower, then started to fall again. At least that’s the story on the deaths graph, below.

The story on the positive tests graph looks like there’s a slight third rise in positive tests starting. If that’s true, we would expect a third rise in deaths to follow about two weeks after the rise in positive tests.

State Graphs

With the basic national picture in hand, I turn to the state-level graphs. This first graph shows the per capita trend in positive tests. The position of the marker shows the number of positive tests per 1000 people over the preceding 7 days. The solid line shows the trend over the 7 days before that, and the hollow line shows the trend over the 7 days before those.
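The per capita figure for each 7-day block can be sketched as below. The population, the daily counts, and the way the three blocks are indexed are assumptions made for illustration.

```python
def positives_per_1000(daily_positives, population, end, days=7):
    """Positive tests per 1,000 residents over the `days` days before index `end`."""
    window = daily_positives[end - days:end]
    return 1000 * sum(window) / population


# Hypothetical state: 1,000,000 residents, three weeks of made-up daily counts.
population = 1_000_000
daily = [100] * 7 + [150] * 7 + [200] * 7

latest = positives_per_1000(daily, population, end=21)    # most recent 7 days
previous = positives_per_1000(daily, population, end=14)  # the 7 days before that
earliest = positives_per_1000(daily, population, end=7)   # the 7 days before those

print(earliest, previous, latest)
```

Dividing by population is what lets a small state and a large state share the same vertical scale.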

The interesting thing about this graph is that the states are sorted west to east, from left to right. This makes it easy to see geographic clustering of virus activity.

Right now, you can see that the eastern part of the country is pretty quiet. The south is not quite as quiet as the east, but it’s still relatively quiet, except for SC and NC. The far west is somewhere between the east and the south.

The middle of the country is the most active in terms of virus activity. The midwestern states are all pretty high, and most of them have been moving upward for the past couple of weeks. In fact, the only state in the whole picture that’s moved downward is GA (shown with the green line).

The picture above shows virus activity on a per capita basis. The graph below shows state virus activity on an absolute basis. The horizontal line shows where each state was a week earlier. If the bar for the state is above the line, the trend is upward. If the bar is below the line, the trend is downward. (States are listed alphabetically on this graph rather than west to east.)

ND might be the most active on a per capita basis, but TX has 35 times as many people, so it easily has the most activity on an absolute basis, and that’s easy to see from this figure. Similarly, CA is pretty quiet on a per capita basis, but with a population of 40 million people, it’s still #2 overall on an absolute basis.

The relationship between the lines and the bars makes it easy to see the trend in each state. FL and GA are trending down. CA is basically stable. NC is trending strongly up. TX is still trending significantly up. Most other states are stable or trending slightly up.

I also look at the summary of open readiness scores from the state dashboards I provide on my website. The open readiness scores are a combination of absolute numbers of positive tests, recent trends in testing, the history of the pandemic in each state, and other factors. Sometimes the open readiness score provides a hint that the other graphs do not. It also provides a different view of geographic clustering.

In this case, the graphic shows pretty clearly that the bulk of virus activity is more or less bounded by the Rocky Mountains to the west and the Mississippi River to the east.

Through the middle of May, the pandemic was concentrated in the northeast. Throughout the summer, it was concentrated in the southernmost states. Now the pandemic has moved mostly into the midwest.

Most of the national media coverage of the pandemic has focused on the politics, but my reading of the data says there’s been an interesting and important geographic aspect to the pandemic that hasn’t received as much attention as it deserves.

Advanced Graph Reading for the Really Serious

The most complicated graph, but also potentially the most revealing, is the Long Term Ratio graph. This is kind of a weird format that plots several factors against their long term averages. That means every line is centered around 1.0, and you can study how the lines rise and fall relative to one another.
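A simple way to get lines centered around 1.0 is to divide each series by its own average. The article does not spell out how the long term average is computed, so the plain mean over the whole series used here is an assumption, as are the numbers.

```python
def long_term_ratio(series):
    """Divide each value by the series' own mean (assumed long term average)."""
    average = sum(series) / len(series)
    return [value / average for value in series]


# Made-up series on very different scales.
cases = [50, 100, 150, 100]   # mean is 100
deaths = [1, 2, 3, 2]         # mean is 2

case_ratios = long_term_ratio(cases)
death_ratios = long_term_ratio(deaths)

print(case_ratios)
print(death_ratios)
```

After normalizing, cases and deaths land on the same scale even though the raw numbers differ by a factor of 50, which is what makes it possible to compare how the lines rise and fall relative to one another.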

If you study the left side of the graph, you can see the solid blue estimated cases line rise and fall, with the red deaths line rising and falling after it on a delayed basis. This pattern repeats on the right side of the graph, but at a lower level. Deaths lag cases, which is what you would expect.

You can see the relationship between positive tests (the dashed blue line) and cases (the solid blue line). The lines track with each other (which they should), but you can see how the ratio of tests to cases flips over the course of the pandemic, with positive tests capturing a higher percentage of cases on the right side of the graph than on the left.

You can also see the total tests line (gray) increasing steadily from left to right through the end of July. The rise of the gray line is part of what explains why the solid blue line and the dashed blue line reverse positions.

You can also see that the gray line plateaus for several weeks in late summer. Meanwhile, the positive tests line trends down for several weeks, after which the total tests line trends down. That suggests to me that testing wasn’t capturing as many actual cases, so eventually it decreased. If the total tests line had headed down, and the positive tests line had lagged it, then I would think that the positive test numbers were being artificially deflated. But that isn’t what the graph shows.

The orange hospitalization line is interesting because on the left side of the graph it lags deaths. That doesn’t make any sense, and I believe that’s a data quality issue. States were not very consistent in reporting hospitalization data early in the pandemic, but they have improved over time.

On the right side of the graph you can see that the orange line closely tracks the solid blue line, which makes sense. The red line tracks but lags the orange line, which also makes sense.

That all brings us up to the present day. What we have right now is that total tests are increasing sharply. Positive tests are also increasing rapidly, but not as sharply as total tests. That suggests that total virus activity is flat or even declining. The fact that the hospitalization line is also declining suggests the same thing.
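One way to make that reasoning concrete is to look at the share of tests that come back positive. The weekly figures below are invented, and using test positivity as the indicator is an assumption about how to read the comparison, not something taken from the graph.

```python
def positivity(positive_tests, total_tests):
    """Share of all tests that came back positive."""
    return positive_tests / total_tests


# Hypothetical consecutive weeks: positives rise 20%, total tests rise 50%.
week1 = positivity(5_000, 100_000)
week2 = positivity(6_000, 150_000)

print(week1, week2)
```

Positive tests went up in this example, yet the share of tests that were positive fell from 5% to 4%. When testing grows faster than positives, at least part of the rise in positives can be explained by looking harder rather than by more virus activity.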

Summary

I like visuals, and I’ve always enjoyed studying graphs to try to extract as much information from them as possible. If you stop after you answer “Is it going up or down?”, you often miss the most important information the graph is trying to convey. This set of graphs, collectively, conveys quite a comprehensive story about the past, present, and potential future of the pandemic in the US.

More Details on the Covid-19 Information Website

I lead the team that contributes the CovidComplete forecasts into the CDC’s Ensemble model. For updates to these graphs, more graphs, forecasts at the US and state-level, and forecast evaluations, check out my Covid-19 Information website.

My Background

For the past 20 years, I have focused on understanding the data analytics of software development, including quality, productivity, and estimation. The techniques I’ve learned from working with noisy data, bad data, uncertainty, and forecasting all apply to COVID-19.
