Utilising novel data streams in the fight against COVID-19

Published in Towards Data Science · 3 min read · Aug 28, 2020

Gulf of Mexico, United States; Photograph by NASA.

With social distancing measures in place, a large amount of discourse relating to COVID-19 now takes place on social media platforms such as Twitter. These platforms contain a treasure trove of information that can help us answer questions such as: how many people are exhibiting coronavirus symptoms today? However, not all information is created equal: these platforms also contain a lot of misinformation, which could potentially cause harm to members of the public.

We developed a system to track and analyse tweets that mention symptoms of COVID-19. The system ‘listens’ for such tweets and feeds each one through a machine learning classifier, which identifies whether the tweet relates to the user’s own symptoms or someone else’s symptoms, or whether it contains misinformation.
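
The ‘listen, then classify’ flow described above can be sketched roughly as follows. This is a minimal illustration rather than the system's actual code: the symptom keyword list, the `Label` names and the rule-based `toy_classifier` are all assumptions made for the example. The real system uses a trained machine learning classifier whose details are not given here, so the classifier is passed in as a replaceable function.

```python
import re
from enum import Enum

# Hypothetical keyword list; the real system's symptom vocabulary is not given.
SYMPTOM_TERMS = ("cough", "fever", "loss of taste", "loss of smell", "short of breath")
SYMPTOM_RE = re.compile("|".join(re.escape(t) for t in SYMPTOM_TERMS), re.IGNORECASE)


class Label(Enum):
    PERSONAL = "personal symptoms"
    OTHER_PERSON = "someone else's symptoms"
    MISINFORMATION = "misinformation"


def mentions_symptom(text):
    """'Listening' step: keep only tweets that mention a symptom term."""
    return SYMPTOM_RE.search(text) is not None


def toy_classifier(text):
    """Crude rule-based stand-in for the trained classifier (illustration only)."""
    lowered = text.lower()
    if "cure" in lowered or "hoax" in lowered:
        return Label.MISINFORMATION
    if re.search(r"\bi\b", lowered):  # first-person pronoun suggests own symptoms
        return Label.PERSONAL
    return Label.OTHER_PERSON


def process(tweets, classifier=toy_classifier):
    """Filter symptom tweets, then attach a label to each one."""
    return [(text, classifier(text)) for text in tweets if mentions_symptom(text)]
```

Calling `process` on a list of tweet texts returns only the symptom-related tweets, each paired with one of the three labels. Swapping `toy_classifier` for a trained model would leave the rest of the flow unchanged.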

We can also use geolocation data to calculate the number of users who tweet about symptoms in each region of a given country (where geolocation is permitted by the user). From this data, it is possible to determine the number of users who travel between different regions of a given country. This information could potentially help to identify new outbreak clusters within a country and provide insight into how members of the public responded to lockdown measures.
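
As a rough sketch of the two geolocation calculations described above, the following assumes each geotagged tweet has already been reduced to a `(user_id, timestamp, region)` tuple with an ISO-format timestamp. That record shape and the function names are assumptions for illustration, not the system's actual data model.

```python
from collections import defaultdict


def users_per_region(tweets):
    """Count distinct users who tweeted about symptoms in each region."""
    users = defaultdict(set)
    for user_id, _timestamp, region in tweets:
        users[region].add(user_id)
    return {region: len(ids) for region, ids in users.items()}


def region_transitions(tweets):
    """Count distinct users whose consecutive tweets come from different regions."""
    by_user = defaultdict(list)
    for user_id, timestamp, region in tweets:
        by_user[user_id].append((timestamp, region))

    moves = defaultdict(set)
    for user_id, visits in by_user.items():
        visits.sort()  # ISO timestamps sort chronologically as strings
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            if src != dst:
                moves[(src, dst)].add(user_id)
    return {pair: len(ids) for pair, ids in moves.items()}
```

Counting distinct users rather than tweets stops a single prolific account from inflating a region's total. The transition counts give a simple origin-destination view of movement between regions.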

Dashboard showing the number of Tweets relating to COVID-19 symptoms over time

To make this information easily accessible, we developed a ‘Symptom Watch’ dashboard, which reports a daily count of the number of tweets that mention symptoms. These counts are currently provided per state in the USA and at various levels (local and upper tier authority, NHS region and national) in the UK. This functionality will be extended to other countries in the near future.
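
The daily counts behind a dashboard like this could be produced by a simple aggregation. The sketch below assumes each symptom tweet is available as a `(timestamp, region)` pair with an ISO-format timestamp; the function name and record shape are hypothetical, and the actual dashboard pipeline is not described here.

```python
from collections import Counter
from datetime import datetime


def daily_counts(tweets):
    """Return a Counter mapping (date, region) to the number of symptom tweets."""
    counts = Counter()
    for timestamp, region in tweets:
        # Reduce the full timestamp to its calendar date, e.g. "2020-04-01".
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[(day, region)] += 1
    return counts
```

The same function works at any geographic level (state, NHS region, national), provided the `region` value is set at that level before counting.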

We have also been working with Evergreen Life to analyse data from their health and wellness app. In response to COVID-19, Evergreen Life have been asking app users questions to gain insight into the pandemic. Users are asked to report, for example, whether they are isolating or whether they or someone in their household has symptoms. The depth and breadth of the data collected are impressive, and the data could help answer a wide range of questions.

The team has developed solutions to answer some of these questions, for example the average length of time for which an individual experiences symptoms of COVID-19. User reports to the Evergreen Life app are sporadic, so we do not see a complete timeline of reports for the full duration of an individual’s symptoms. To deal with this, we defined and fitted a Bayesian model in the ‘Stan’ programming language, which enabled us to determine that users were most likely to experience symptoms for 3.06 days.
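
To illustrate why sporadic reports call for a model, note that they only bound each symptom duration. For example, a user who reports symptoms on day 0 and day 2 but none on day 4 was ill for somewhere between 2 and 4 days. The sketch below is not the Stan model used in this work, whose form is not given here. It is a deliberately simple stand-in that assumes exponentially distributed durations and a flat prior over a grid of candidate mean durations, with each user summarised by a hypothetical `(lower, upper)` bound in days.

```python
import math


def interval_log_likelihood(mean_days, intervals):
    """Log-likelihood of interval-censored durations under an exponential model."""
    total = 0.0
    for lower, upper in intervals:
        # P(lower <= duration <= upper) for an exponential with the given mean.
        p = math.exp(-lower / mean_days) - math.exp(-upper / mean_days)
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


def grid_posterior(intervals, grid):
    """Normalised posterior over candidate mean durations (flat prior on the grid)."""
    log_post = [interval_log_likelihood(m, intervals) for m in grid]
    peak = max(log_post)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    norm = sum(weights)
    return [w / norm for w in weights]
```

Given a grid such as `[0.1 + 0.01 * i for i in range(2000)]`, the most probable mean duration is the grid point with the largest posterior weight. An `upper` of `math.inf` can represent a user whose recovery was never observed.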

Where users report a household member exhibiting symptoms, we can gain insight into how COVID-19 spreads within households by determining the time between two household members falling ill. We also know whether a user was isolating and whether they subsequently developed symptoms. From these reports, we can quantify whether isolating reduces a person’s chances of developing coronavirus symptoms. We analysed data collected between March and June 2020 and determined that individuals who did not isolate were 35% more likely than those who did to report symptoms within 7 days of reporting that they were not isolating.
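
A figure such as ‘35% more likely’ is typically a relative risk: the symptom rate in one group divided by the rate in the other. The sketch below shows that arithmetic only. How the 35% figure was actually calculated is not stated here, and the function and parameter names are hypothetical.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Ratio of the symptom rate among non-isolators to that among isolators."""
    exposed_rate = exposed_cases / exposed_total  # e.g. users who did not isolate
    control_rate = control_cases / control_total  # e.g. users who did isolate
    return exposed_rate / control_rate
```

A return value of 1.35 would correspond to the non-isolating group being 35% more likely to report symptoms. With made-up counts of 27 symptomatic users out of 200 non-isolators and 20 out of 200 isolators, the rates are 13.5% and 10%.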

The work we have done so far demonstrates how novel data streams can be utilised to gain a deeper understanding of the COVID-19 pandemic. When combined with more conventional data streams, these novel data streams could aid governments in making more informed decisions to combat the virus.

Matthew Carter is a PhD student who is part of the EPSRC CDT in Distributed Algorithms. This blog was originally posted on the University of Liverpool COVID-19 Hub.

Written by Matthew Carter