A Problem with that Restaurant Air-Conditioning Study

Robert (Munro) Monarch
Towards Data Science
May 12, 2020


You might have seen an image recently that showed potential transmissions of SARS-CoV-2 in a restaurant in Guangzhou. It is taken from a paper that is not yet finalized in the journal Emerging Infectious Diseases. It is concerning that so many media outlets have been sharing this article and image as fact.

The study reports how people at three adjacent tables later tested positive for SARS-CoV-2, but people at the other tables in the restaurant did not. The authors conclude “droplet transmission was prompted by air-conditioned ventilation.” The authors report the average distances between tables but do not give exact distances. When you look at their graph on the left below, you can see that they draw 8.3 meters and 17.5 meters as exactly the same length, turning a long narrow restaurant into a square one. This makes the tables without infections seem much closer than they really were, as you can see when we space the tables out according to the actual dimensions:

The restaurant layout from the Guangzhou study, rearranged here to show the actual dimensions of the restaurant.

As you can see in the image, the tables without people who were infected seem to be much further away when we plot the distances more accurately. For reference, the paper is here:

And this is the original graphic in that paper:

Source: https://wwwnc.cdc.gov/eid/article/26/7/20-0764-f1

The error does not seem to be in the “17.5m” label itself, because the authors report the total area of the space as 145 square meters, which is consistent with the 8.3m x 17.5m dimensions. The horizontal “6.0m” and “8.3m” measurements seem to be drawn in proportion, so it is only the vertical dimension that is problematic.
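The consistency check on the labels is simple arithmetic. Here is a minimal sketch, assuming (as described above) that the diagram draws the 8.3m and 17.5m sides at the same length; the function and variable names are my own and are not from the paper:

```python
# Dimensions as labeled in the paper's figure (metres).
WIDTH_M = 8.3
LENGTH_M = 17.5
REPORTED_AREA_M2 = 145  # total area reported by the authors


def labeled_area(width_m: float, length_m: float) -> float:
    """Area implied by the two labeled dimensions."""
    return width_m * length_m


def compression_if_drawn_square(width_m: float, length_m: float) -> float:
    """Fraction by which the long side shrinks if both sides are drawn equal.

    Assumption: the 8.3 m and 17.5 m sides are drawn at the same length.
    """
    return 1 - width_m / length_m


area = labeled_area(WIDTH_M, LENGTH_M)
compression = compression_if_drawn_square(WIDTH_M, LENGTH_M)

print(f"Area from labels: {area:.2f} m^2 (reported: {REPORTED_AREA_M2} m^2)")
print(f"Long side compressed by about {compression:.1%}")
```

The labels multiply out to roughly the reported area, which suggests the labels are right and the drawing is what is off. Under the drawn-square assumption, the long side is shown at less than half its true relative length.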

I’ve worked in epidemic tracking and disaster response for many years, including in spatial data modeling, and I have never seen someone compress a dimension like this (although I’ve never worked in spatial data modeling for disaster response inside of rooms). It is especially concerning that the compression on the vertical dimension makes the image support their conclusion but the compression and the reasons for it are not mentioned in the paper.

The time overlap is also confusing. According to the paper, Table A overlapped with Table B for 53 minutes and with Table C for 73 minutes. How long did Table A overlap with E or F? That is not reported.

The world’s largest media organizations have reproduced the distorted image without commenting on it. For example, The New York Times reported this graphic under the headline “How Coronavirus Infected Some, but Not All, in a Restaurant”. The Hindustan Times reported it under the headline “AC air flow spread Covid-19 in China restaurant”, and went further by carefully creating a new version of the graphic, which they introduce as “WHAT HAPPENED”:

The Hindustan Times carefully reproduced the entire image and created calendars for every infected table, but failed to notice that their “5 metre” dimension was really about “3.5 metre” if on the same scale as the “6 metre” dimension. They also dangerously report the ambiguous findings as “what happened”. Source: https://www.hindustantimes.com/world-news/ac-air-flow-spread-covid-19-in-china-restaurant/story-4XU9YVC3bAyAkHahV7bMOO.html

Can we trust this study?

No, this study cannot be trusted. We already know that distance is one of the main factors in transmission and this study offers no evidence to change this. The paper cannot be trusted until it reports the exact distance between the source person at Table A and the people at Tables B, C, E and F, and the lengths of time they overlapped. At the moment, the paper only reports the distance between tables as “about 1 meter”, which is highly unlikely for Tables E and F when we look at the space available in the non-distorted graph.

Even if the distances and time overlap between people are the same, as the paper implies, the results are still weak. For example, there are 21 people who are not at Table A but are presented as sitting near Table A, of whom 5 later tested positive:

5/21 = 23.8% chance of being infected

Tables E and F each have 5 people at them. So, what is the chance that at least one person at a table of 5 would get infected, given a 23.8% transmission rate?

1-(1-(5/21))⁵ = 74.3% chance that at least one person at a table of 5 is infected.
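For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation. It assumes each person is infected independently with the same probability, which is the simplification behind the formula above; the function name is my own:

```python
from fractions import Fraction


def prob_at_least_one(rate, group_size):
    """P(at least one infection in a group).

    Assumption: each person is infected independently with probability `rate`.
    """
    return 1 - (1 - rate) ** group_size


# 5 of the 21 people presented as sitting near Table A later tested positive.
rate = Fraction(5, 21)
p_table = prob_at_least_one(rate, 5)

print(f"Per-person rate: {float(rate):.1%}")
print(f"At least one infection at a table of 5: {float(p_table):.1%}")
```

Using `Fraction` keeps the intermediate values exact, so the only rounding happens when the percentages are printed.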

This isn’t very strong evidence. To put it another way, there’s roughly a 1 in 4 chance (25.7%) that a table of 5 near Table A would have no infections purely by chance. There is no scientific field that should accept conclusions with only a 74.3% chance of being correct.

74.3% is the best-case scenario for this paper. The authors indicate that transmission might have been to just one person at each of the adjacent tables, who then in turn spread it to others. In that case, there is only a 39.4% chance that at least one person at an adjacent table gets infected:

1-(1-(2/21))⁵ = 39.4% chance that at least one person at a table of 5 is infected.

So, if the pattern was a person at Table A infecting one person each at Tables B and C, and those people in turn infecting other people in their groups, then it is more likely that Table E and F would not get infections, even if they are the same distance from Table A and people stay there for the same time.
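The “more likely” claim can be checked by looking at the complement: the chance that a table of 5 has no infections at all. This is a minimal sketch under the same independence assumption as the formula above, with names of my own choosing:

```python
def prob_none(rate: float, group_size: int) -> float:
    """P(no infections in a group), assuming independent, equal risk per person."""
    return (1 - rate) ** group_size


# Scenario: Table A directly infects one person each at Tables B and C,
# so 2 of the 21 nearby people are direct transmissions.
primary_rate = 2 / 21

p_none = prob_none(primary_rate, 5)
p_some = 1 - p_none

print(f"Table of 5 with no infections: {p_none:.1%}")
print(f"Table of 5 with at least one:  {p_some:.1%}")
```

With these inputs the no-infection outcome comes out above 50%, which is what “more likely” means here for a single table of 5.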

I would also expect a more sophisticated mathematical model that included greater dispersion over longer distance and I was surprised that this was not in the original paper. The airflow arrows seem to be conjecture and not based on observations. There are many other factors, too: whether the source person at Table A walked near Table B and C (they would pass B or C on the way to the bathroom); whether being near the air-conditioning increased the chances of transmission at Table C because of dried nasal passages; and the chance that the infections came from somewhere else entirely. With all these factors, and correct modeling of the distance and time overlap, the 74.3% and 39.4% probabilities at the nearby tables would almost certainly go down.

With such a small sample size, the confidence intervals will also be very wide. For example, if we take away just one infected person and conclude that only 4 people sat at Table E for a long enough time, then 74.3% becomes just 61.2%. It simply isn’t possible to take away precise information from so few data points. With accurate details about the distances and amount of overlapping time between all people, this data could be incorporated into larger studies that include potential transmissions recorded elsewhere. But without that information, the article can’t help us reach actionable conclusions.
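To make the sensitivity concrete, here is a minimal sketch. The exact counts behind the 61.2% figure are not spelled out above, so the code uses one reading that reproduces it: 4 infected out of 19 people, with a table of 4. The Wilson score interval is my own choice for illustrating how wide the uncertainty is on 5 out of 21; it is not a method from the paper:

```python
import math


def prob_at_least_one(infected: int, exposed: int, table_size: int) -> float:
    """P(at least one infection at a table), assuming independent, equal risk."""
    rate = infected / exposed
    return 1 - (1 - rate) ** table_size


def wilson_interval(infected: int, exposed: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p = infected / exposed
    denom = 1 + z**2 / exposed
    center = (p + z**2 / (2 * exposed)) / denom
    half = z * math.sqrt(p * (1 - p) / exposed + z**2 / (4 * exposed**2)) / denom
    return center - half, center + half


baseline = prob_at_least_one(5, 21, 5)
# Assumed reading: one infected person removed, one fewer person at Table E.
reduced = prob_at_least_one(4, 19, 4)
low, high = wilson_interval(5, 21)

print(f"Baseline: {baseline:.1%}, after removing one person: {reduced:.1%}")
print(f"Interval for the 5/21 rate: {low:.1%} to {high:.1%}")
```

The interval on the per-person rate spans tens of percentage points, so any probability built on top of it inherits that uncertainty.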

While I have worked in disaster response for many years, including epidemic tracking, I am not an epidemiologist or specialist in the transmission of infectious diseases. So, even if I could draw strong conclusions from their study, I would not share them. Neither should you if you are not experienced in this field: there are already too many armchair epidemiologists. However, it is right for you to apply critical analysis to the scientific methods behind any media report, especially inconsistencies like a dimension compressed by more than 50%.

Please note that it is fine to reject a conclusion based on simple best-case-scenario models, like I do above. However, a simple best-case-scenario model cannot be used to support some other conclusion. If you have data science skills and want to contribute to the COVID-19 response, then there are many ways you can help that don’t require prior knowledge about infection. I recommend my article: 5 Ways Data Scientists Can Help Respond to COVID-19 and 5 Actions to Avoid:

You’ll see in the article that the third way you can help is “Prepare data that might be directly related to the response”. That’s what I was trying to do. I was working in food safety for outbreaks before COVID-19 and I was trying to structure the data to help epidemiologists better model the transmission risk.

We know that airflow can be a factor in the transmission of viruses and so this is a valid problem to study. But the paper in the current form simply offers no evidence to support a conclusion that anything other than proximity was the main factor for transmission at this restaurant. I had read the paper once and seen that image a dozen or more times in my social media feeds before I realized that they compressed one of the dimensions in the diagram.

This is highly problematic because media organizations like the New York Times and the Hindustan Times are reporting this study as fact and most people (like me) don’t notice the problem with the image when they first see it. I’m currently helping with messaging for the COVID-19 response for tens of millions of people, and when a study like this gets reported as fact, it is bad for everyone. If people are changing their behavior for reasons not based on sound science, it is likely to make things worse.

Robert Munro

May, 2020

Note from the author: This article is not medical advice. The advice from other people in this article does not necessarily apply to you. Ask your trusted healthcare providers for any medical advice for you in your situation.

Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.

