Ranking recovery of US states after COVID-19 first wave, using Google mobility data

Visualizing mobility data trends with saxophone-shaped graphs

Dror Berel
Towards Data Science

Rumor has it that COVID-19 is a great opportunity to show your data-science / analytics skills, and you don’t have to be an epidemiologist in order to share your 2 cents with the world. Well… here is my attempt.

Google has shared a public data set of daily mobility across the world, covering the period before, during, and after the first wave of COVID-19, starting February 15, 2020. The version I have contains 6 types of mobility measures:

mobility retail and recreation

mobility grocery and pharmacy

mobility parks

mobility transit stations

mobility workplaces

mobility residential

In this post I focus on an overall state-level analysis of 51 regions: the 50 US states plus DC. However, the data source also has information for several regions within states, and for other countries.

Analysis goal: Measure and rank each state’s recovery from the first drop in mobility to the back-to-normal phase (as of the time of analysis).

Disclaimer: The purpose of the following analysis is to demonstrate data visualization techniques and analytic reasoning, and mostly to satisfy my own curiosity. Some quick ad-hoc heuristics (assumptions) were adopted to keep the focus on high-level insight. There are of course multiple possible biases that might better explain the data; these are discussed at the end.

Data: I don’t really know how Google measures mobility, nor do I know their sample size, experimental unit, data collection and aggregation procedures, or validation. However, the state-level mobility time series show clear and consistent trends.

Figure 1: Daily mobility data trends (smoothed). credit: self.
Figure 2: Daily mobility data trends (smoothed). credit: self.
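The figure captions say the daily series were smoothed, but the post does not say which smoother was used. As one possible illustration, here is a minimal Python sketch of a trailing 7-day moving average, which also damps the weekday/weekend pattern. The function name, the 7-day window, and the toy values are assumptions for illustration only; the original analysis was done in R.

```python
def rolling_mean(values, window=7):
    """Trailing moving average over a list of daily values.

    None entries (missing days) are skipped, and the first few
    positions use a shorter window instead of being dropped.
    The 7-day default is an assumption, not the post's setting.
    """
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = [v for v in values[start:i + 1] if v is not None]
        smoothed.append(sum(chunk) / len(chunk) if chunk else None)
    return smoothed


# Made-up daily values with a weekly pattern (five weekdays, two weekend days).
toy_daily = [-30, -30, -30, -30, -30, -10, -10] * 2
toy_smoothed = rolling_mean(toy_daily)
print(toy_smoothed[6], toy_smoothed[13])
```

Once a full week is inside the window, the smoothed value no longer depends on the day of the week, which makes the longer-term trend easier to read.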

One can easily observe the drop in mobility for retail and recreation, parks, grocery and pharmacy, transit stations, and workplaces. Residential mobility, however, shows the opposite trend. Well… if people stopped driving to places and sat at home most of the time, they eventually started doing something else… most likely walking around near their homes and getting to know neighbors they had rarely talked to before. (At least in my experience, with a sample size of 1.)

A correlogram of the five mobility measures above supports the positive correlation among them suggested by the plots, as well as their negative correlation with residential mobility. States that deviate from these correlation patterns may reflect different behaviors, perhaps strong weather seasonality or a larger share of the population living in rural areas.
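For readers who want to see the numbers behind a correlogram, here is a rough Python sketch of a pairwise Pearson correlation matrix for one state. The original plot was made in R (GGally is among the packages listed at the end); the column names and values below are made up for illustration and are not taken from the Google data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # a constant series has no defined correlation
    return cov / (sd_x * sd_y)


def correlation_matrix(columns):
    """columns: dict mapping measure name -> list of daily values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up toy values: residential moves in the opposite direction to workplaces.
toy_columns = {
    "workplaces": [0, -10, -40, -30, -25],
    "transit_stations": [0, -12, -45, -35, -28],
    "residential": [0, 3, 12, 9, 7.5],
}
toy_matrix = correlation_matrix(toy_columns)
for name, row in toy_matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

In this toy example the workplace and residential series are perfectly negatively correlated by construction; real data would be noisier.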

Figure 3: Correlogram for WA. credit: self.
Figure 4: workplace mobility for each state. credit: self.

Metric definitions:

1st major drop date: the first date with a mobility value below -10 (vertical dashed line).

Average mobility during ‘pre-drop’: the average from February 15, 2020 until the above date (solid horizontal green line on the left side). This average is also used as the assumed ‘back-to-normal’ level after the first drop, assuming there are no seasonal effects or other trends (dashed green line).

Average mobility during ‘back-to-normal’: the average signal between July 1, 2020 and August 16, 2020, the last day of data available when the analysis was done (dashed blue line on the right side).

The gap between the blue and green lines indicates whether the state was able to recover to its mobility level before the drop.
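The metric definitions above can be sketched in a few lines of Python. This is a minimal sketch, not the original R code: the function and variable names are hypothetical, the toy series is made up, and it assumes the drop date itself is excluded from the ‘pre-drop’ average (the post does not say either way). Feed it the raw or the smoothed series, depending on which one you want the threshold applied to.

```python
from datetime import date, timedelta
from statistics import mean

DROP_THRESHOLD = -10            # from the post: first value below -10
PRE_START = date(2020, 2, 15)   # first day of the data set
POST_START = date(2020, 7, 1)   # start of the 'back-to-normal' window
POST_END = date(2020, 8, 16)    # last collected day in the post


def recovery_metrics(series):
    """series: dict mapping date -> mobility value (one state, one measure)."""
    days = sorted(series)
    # 1st major drop date: first day whose value falls below the threshold.
    drop_date = next((d for d in days if series[d] < DROP_THRESHOLD), None)
    if drop_date is None:
        return None
    # Assumption: the drop date itself is not part of the pre-drop window.
    pre = [series[d] for d in days if PRE_START <= d < drop_date]
    post = [series[d] for d in days if POST_START <= d <= POST_END]
    if not pre or not post:
        return None
    pre_avg, post_avg = mean(pre), mean(post)
    # Negative gap: the state has not returned to its pre-drop level.
    return {"drop_date": drop_date, "pre_avg": pre_avg,
            "post_avg": post_avg, "gap": post_avg - pre_avg}


def toy_series():
    """Made-up values: flat, then a sharp drop, then a partial recovery."""
    series, day = {}, PRE_START
    while day <= POST_END:
        if day < date(2020, 3, 15):
            series[day] = 2
        elif day < POST_START:
            series[day] = -40
        else:
            series[day] = -20
        day += timedelta(days=1)
    return series


metrics = recovery_metrics(toy_series())
print(metrics)
```

With this sign convention, a gap near zero means the state is back at its pre-drop level, and a positive gap means it has exceeded it.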

Figure 5: WA workplace mobility data, with pre and post average levels. The first dip is a holiday (Presidents’ Day). credit: self.
Figure 6: WA retail and recreation mobility data, with pre and post average levels. credit: self.

Cross-state recovery comparison:

After calculating the above for each state, the states are ranked below by their recovery gap for workplace mobility.
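As a rough Python illustration of this ranking step, assuming each region already has a recovery gap computed as the ‘back-to-normal’ average minus the ‘pre-drop’ average: the region names and gap values below are placeholders, not results from the post.

```python
def rank_by_gap(gaps):
    """gaps: dict mapping region -> recovery gap (post average minus pre average).

    Returns (rank, region, gap) tuples, most affected first,
    i.e. the most negative gap gets rank 1.
    """
    ordered = sorted(gaps.items(), key=lambda item: item[1])
    return [(rank, region, gap) for rank, (region, gap) in enumerate(ordered, start=1)]


# Placeholder regions and made-up gaps, for illustration only.
toy_gaps = {"region_a": -12.5, "region_b": -41.0, "region_c": 3.2}
ranking = rank_by_gap(toy_gaps)
for rank, region, gap in ranking:
    print(rank, region, gap)
```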

Any of the other mobility measures could be used as well. However, to avoid possible biases from regional weather or other socio-demographic characteristics, I assume workplace mobility is the least biased.

Another option is to average the 6 mobility measures, after flipping the sign of residential mobility to account for its negative correlation with the others.
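The averaging option could look roughly like the Python sketch below. It is only one way to do it: the measure keys are hypothetical names, the measures are weighted equally, and missing values are simply skipped, none of which is specified in the post.

```python
from statistics import mean

# Hypothetical keys for the six measures listed at the top of the post.
MEASURES = [
    "retail_and_recreation",
    "grocery_and_pharmacy",
    "parks",
    "transit_stations",
    "workplaces",
    "residential",
]


def composite_mobility(row):
    """row: dict mapping measure name -> value for one region on one day.

    Residential mobility is sign-flipped so that all six measures point
    the same way; missing measures (None or absent) are skipped.
    """
    values = []
    for name in MEASURES:
        value = row.get(name)
        if value is None:
            continue
        values.append(-value if name == "residential" else value)
    return mean(values) if values else None


# Made-up single-day values for illustration.
toy_row = {name: -20 for name in MEASURES}
toy_row["residential"] = 20
print(composite_mobility(toy_row))
```

The resulting daily composite could then be passed through the same drop-date and gap calculation as any single measure.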

Figure 7: state recovery rank based on workplace mobility. credit: self.

The most affected is DC, followed by MD, CA, AZ, TX and NY. The least affected states, though still with some gap to fill, are ME, SD, WY and AK.

Figure 8: state recovery rank based on recreation mobility. credit: self.

Recreation mobility shows patterns similar to workplace mobility. However, at the bottom of the ranking there are a couple of states that closed the gap and even reached a higher level than before the drop (WY, SD, AK, ME, MT).

Possible biases:

On top of the issues already mentioned, the following could be handled better.

Time series: account for seasonality, and remove holidays that are not observed equally across the states.

Possible confounders: state-specific socio-demographic characteristics such as rural vs urban areas, unemployment level, population age, pandemic/contamination level, population density, political affiliation, virus mutation, health system, and many, many others.

The open data set also has additional, well-annotated information that could definitely shed more light on the above, but that is outside the scope of this post.

What’s next?

Well, other than a simple, rough assumption used to estimate the date of the first major drop, this analysis is descriptive only, i.e. there is no fancy inference or prediction.

Future posts in this series will attempt to demonstrate some statistical modeling such as time-series (forecasting), machine-learning, and causal-inference. Stay tuned!

Where is the code?

I will be glad to share it upon request. The R packages used are: tidyverse, tidyverts, jsonlite, geofacet, gt, GGally, urbnmapr, sf, and rnaturalearthdata.

Inspired by my friend’s blog post.

Check my other blog posts, my GitHub page, and LinkedIn for additional fun reading.
