
Understanding the pandemic through the lens of data

Exploratory data analysis of current Covid-19 cases in Toronto

Photo by CDC on Unsplash

Introduction

In December 2019, the first case of a new virus was found in Wuhan, China. The WHO, along with Chinese authorities, confirmed human-to-human transmission, and soon after, the outbreak was declared a global pandemic. This, of course, is Covid-19, a highly contagious disease caused by SARS-CoV-2. We have come a long way since, with over 105 million people infected as of February 2021. Global cases are still on the rise, and several countries have some form of restrictions in place.

To better understand the ongoing pandemic, many researchers have been analyzing the available data to extract valuable insights. In this article, I analyze the publicly available data on Covid cases in Toronto. The objective is to understand what this data has to offer alongside what has been reported in the news.


Data Source

The City of Toronto’s open data portal is a great initiative that provides a large collection of datasets for exploring how the city works. For this project, I used the Covid-19 cases dataset, which contains demographic and geographic information on all the cases reported to Toronto Public Health. The dataset is updated weekly, and the current analysis covers data up to February 1st, 2021.


Exploratory Analysis

The dataset consists of 83,474 records and 18 features, a sample of which is shown below. As of February 1st, 2021, it contains 80,775 confirmed cases and 2,699 probable cases of Covid.
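A minimal sketch of how these counts might be obtained with pandas. The column names `Classification` and `Outcome` are assumptions about the portal's CSV, and the four rows are made-up stand-ins for the real file, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Made-up stand-in for the downloaded CSV; column names are assumptions.
cases = pd.DataFrame({
    "Classification": ["CONFIRMED", "CONFIRMED", "PROBABLE", "CONFIRMED"],
    "Outcome": ["RESOLVED", "FATAL", "RESOLVED", "ACTIVE"],
})

# Number of records and number of features.
n_rows, n_features = cases.shape

# Confirmed vs. probable case counts.
by_classification = cases["Classification"].value_counts()

# Sanity check on the figures quoted in the text:
# confirmed + probable should equal the total number of records.
total_reported = 80_775 + 2_699

print(n_rows, n_features)
print(by_classification)
print(total_reported)
```

On the real data, `cases.shape` would be expected to match the 83,474 records and 18 features quoted above.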

The first five points from the data with some of the features
Number of cases reported in each month from Jan 2020
Number of fatal cases reported in each month from Jan 2020

The plots above show the total number of cases and the number of fatal cases reported every month since the start of the pandemic. We can clearly spot two waves, the second of which is still ongoing. The first wave, which ended in August, peaked in April. The second wave appears to have infected a significantly larger population while resulting in fewer fatal cases. This suggests that we have learned to handle the pandemic better since the first wave. The daily cases since the start of the pandemic are shown below.

Daily cases since the start of the pandemic
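The monthly and daily counts behind plots like these can be sketched with a pandas `groupby`. This is an illustration only: the column names `Episode Date` and `Outcome` are assumptions, and the six rows are invented.

```python
import pandas as pd

# Invented sample rows; "Episode Date" and "Outcome" are assumed column names.
cases = pd.DataFrame({
    "Episode Date": pd.to_datetime([
        "2020-04-03", "2020-04-20", "2020-08-15",
        "2020-12-01", "2020-12-24", "2021-01-05",
    ]),
    "Outcome": ["FATAL", "RESOLVED", "RESOLVED", "RESOLVED", "FATAL", "ACTIVE"],
})

# Label each case with its year-month, e.g. "2020-04".
cases["Month"] = cases["Episode Date"].dt.strftime("%Y-%m")

# Cases per month, fatal cases per month, and cases per day.
monthly_cases = cases.groupby("Month").size()
monthly_fatal = cases.loc[cases["Outcome"] == "FATAL"].groupby("Month").size()
daily_cases = cases.groupby("Episode Date").size()

print(monthly_cases)
print(monthly_fatal)
```

Each resulting Series can be passed straight to a bar or line plot.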

Let us now understand the distribution of the cases over various age groups.

Age Group: The age groups under consideration are 0–19, 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89, and ≥90.

Covid-19 cases over various age groups

Covid cases (in thousands) are shown for various age groups above. The number of cases is significantly higher towards the younger side, which could simply be caused by the larger population of younger people in Toronto. Therefore, it is important to know the population of each of these age groups. This demographic data is taken from StatCan's 2016 census; I assumed 5% growth from 2016 to 2021, slightly higher than the 4.3% growth from 2011 to 2016.
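The 5% adjustment amounts to scaling every census count by 1.05. A small sketch, where the 2016 counts are placeholders rather than actual StatCan figures:

```python
# Assumed growth from the 2016 census to 2021, as stated in the text.
GROWTH_2016_TO_2021 = 0.05

# Placeholder 2016 counts per age group (NOT real StatCan figures).
population_2016 = {"0-19": 1000, "20-29": 800, "90+": 20}

# Projected 2021 population for each age group.
population_2021 = {
    group: round(count * (1 + GROWTH_2016_TO_2021))
    for group, count in population_2016.items()
}

print(population_2021)
```

Applying one uniform factor assumes every age group grew at the same rate, which is a simplification.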

Population and Covid cases vs Age Group in Toronto

As expected, the number of cases in the younger age groups is significantly higher, in line with their larger population. In absolute numbers, the 20 to 29 age group has the highest number of cases. This makes sense, as people in this age group tend to have the most contact with one another and also have high mobility [source]. But was this also the worst-affected age group relative to its size? To find out, let’s plot the ratio of the number of cases to the population for each age group.

Number of cases per population

Surprisingly, it turns out to be the age group of 90 and older! Its ratio of cases to population (~10) is significantly higher than that of the other age groups (~3). Although the 90+ population is the smallest, many of its members reside in long-term care homes, where severe outbreaks were observed. More on this later. For now, let’s look at the outcome of each of these cases.
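The difference between ranking by absolute cases and ranking by cases relative to population can be sketched as follows. The numbers are placeholders chosen only to mirror the pattern described above, and expressing the ratio per 100 residents is my choice of scale for the illustration.

```python
import pandas as pd

# Placeholder counts, not the real Toronto figures.
by_age = pd.DataFrame(
    {"cases": [300, 100], "population": [10_000, 1_000]},
    index=["20-29", "90+"],
)

# Cases per 100 residents in each age group.
by_age["cases_per_100"] = 100 * by_age["cases"] / by_age["population"]

# The group with the most cases need not be the one with the highest ratio.
most_cases = by_age["cases"].idxmax()
highest_ratio = by_age["cases_per_100"].idxmax()

print(by_age)
print(most_cases, highest_ratio)
```

Here the younger group leads in absolute cases while the oldest group leads once the counts are divided by population.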

Outcome: Active (cases that are still ongoing), Resolved (cases in which the patient recovered), Fatal (cases resulting in death)

Out of all the cases in Toronto, about 70,000 have been resolved and about 3% (2,265) were fatal.

Case count based on outcome
Outcome of the cases for various age groups

The table above shows the outcome for various age groups. We can see that the number of fatal cases increases with age.

Fatality and number of cases for various age groups

This plot is scary! We can see that as age increases, the number of infections decreases significantly while the number of fatal cases increases dramatically. The fatality percentage for each age group is shown below.

Percentage of fatality for various age groups

The fatality percentage increases dramatically with age, reaching about 30% for ages 90 and older. The more neutralizing antibodies we have, the better our immunity against Covid. Unfortunately, these antibodies decrease over time, and hence older age groups are more susceptible to death due to Covid. More discussion on this can be found here.
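An outcome-by-age table and the per-group fatality percentage can be sketched with `pd.crosstab`. The column names `Age Group` and `Outcome` are assumptions and the rows are invented; only the overall figure at the end uses numbers quoted in the text.

```python
import pandas as pd

# Invented sample rows; column names are assumptions.
cases = pd.DataFrame({
    "Age Group": ["20-29", "20-29", "20-29", "20-29", "90+", "90+", "90+"],
    "Outcome": ["RESOLVED", "RESOLVED", "ACTIVE", "RESOLVED",
                "FATAL", "RESOLVED", "RESOLVED"],
})

# Rows: age groups, columns: outcomes, values: case counts.
outcome_table = pd.crosstab(cases["Age Group"], cases["Outcome"])

# Share of each age group's cases that were fatal, in percent.
fatality_pct = 100 * outcome_table["FATAL"] / outcome_table.sum(axis=1)

# Overall fatality share from the figures quoted in the text (2,265 of 83,474).
overall_pct = 100 * 2265 / 83474

print(outcome_table)
print(fatality_pct)
print(round(overall_pct, 1))
```

The overall share works out to roughly 2.7%, consistent with the "about 3%" stated earlier.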

Source of Infection:

  • Travel (travel outside Ontario)
  • Close contact (close contact with a confirmed or probable case)
  • Institutional (long-term care homes, special care hospitals, acute care hospitals, etc.)
  • Healthcare (family physicians, dentists, etc.)
  • Community (grocery stores, gyms, etc.)
  • Unknown (unknown source of infection)
  • Outbreak (cases related to outbreaks in long-term care homes, homeless shelters, etc.)

Cases vs source of infection

It is a bit concerning and even scary that the source of infection for most of the cases is unknown. This shows how difficult contact tracing is. I recommend people start using the Covid Alert app to help trace the source of contact.

Close contact, outbreak and community are the next most common sources of infection.

Most of the cases in the younger age groups correspond to close contact and community transmission, while outbreak-related infections are the leading source of infection in ages 80 and older. This corresponds to the outbreaks in long-term care units that we have been hearing about in the news since the beginning of the pandemic, which is deeply disheartening.

Sources of infection corresponding to fatal cases

Outbreaks, particularly in long-term care units, can be seen as the primary contributor to the fatal cases. The number of deaths could have been significantly reduced had Ontario handled the long-term care units better! A detailed analysis of Canada’s care home crisis can be found here.
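The source-of-infection comparison, for all cases and for fatal cases only, can be sketched as below. The column names `Source of Infection` and `Outcome` are assumptions, and the seven rows are invented to mimic the pattern described above.

```python
import pandas as pd

# Invented sample rows; column names are assumptions.
cases = pd.DataFrame({
    "Source of Infection": ["Unknown", "Unknown", "Unknown", "Close contact",
                            "Outbreak", "Outbreak", "Community"],
    "Outcome": ["RESOLVED", "ACTIVE", "RESOLVED", "RESOLVED",
                "FATAL", "FATAL", "RESOLVED"],
})

# Percentage of all cases attributed to each source.
source_share = cases["Source of Infection"].value_counts(normalize=True) * 100

# Count of fatal cases by source.
fatal_sources = (
    cases.loc[cases["Outcome"] == "FATAL", "Source of Infection"].value_counts()
)

print(source_share.round(1))
print(fatal_sources)
```

Filtering on the outcome before counting is what separates the "all cases" view from the "fatal cases" view.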

Lastly, let us look at the distribution of the cases over various regions in Toronto. The city is divided into 96 forward sortation areas (FSAs). Each FSA corresponds to the first three characters of a postal code, and every Toronto FSA starts with the letter M.

Map of Toronto with the number of fatal cases in each FSA
Map of Toronto with the number of active cases in each FSA

While rolling out vaccines in Toronto, it would be best to target the locations with the highest number of active cases shown above. These are Etobicoke (M9V), North York (M3N, M2R), Scarborough (M1B), and York (M6M). The ideal scenario is to begin by vaccinating people in long-term care homes while simultaneously running mass vaccination in the aforementioned locations.
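Ranking FSAs by active cases, as a basis for such a prioritization, might look like the sketch below. The column names `FSA` and `Outcome` are assumptions, and the rows are invented; only the FSA codes are taken from the text.

```python
import pandas as pd

# Invented sample rows; column names are assumptions.
cases = pd.DataFrame({
    "FSA": ["M9V", "M9V", "M9V", "M3N", "M3N", "M1B", "M6M"],
    "Outcome": ["ACTIVE", "ACTIVE", "ACTIVE", "ACTIVE", "ACTIVE",
                "RESOLVED", "ACTIVE"],
})

# Check that every code has the Toronto FSA shape: M, a digit, a letter.
all_valid = cases["FSA"].str.match(r"^M\d[A-Z]$").all()

# Active cases per FSA, highest first.
active_by_fsa = (
    cases.loc[cases["Outcome"] == "ACTIVE"]
    .groupby("FSA")
    .size()
    .sort_values(ascending=False)
)

# The most affected areas by active case count.
top_fsas = active_by_fsa.head(5).index.tolist()

print(all_valid)
print(top_fsas)
```

The same grouped counts could be joined to FSA boundary shapes to draw the maps.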

Conclusion

  • The second wave that started in August infected significantly more people compared to the first wave but resulted in relatively fewer fatal cases. This goes to show how being prepared to handle a pandemic helps save a lot of lives and is a particularly important lesson to carry forward to the future.
  • The highest number of infections was observed in the 20 to 29 age group due to high mobility and contact with one another. On the other hand, most of the fatal cases were observed in ages 80 and older. This is due to the decrease in neutralizing antibodies over time.
  • The source of infection is unknown in over 40% of the cases, which shows the difficulty of contact tracing. Among the cases with a known source, close contact was the number one source of infection. This is the reason the premier of Ontario has imposed strict stay-at-home and isolation orders.
  • Among all the Covid-19 cases in Toronto, most of the fatal cases were caused by outbreaks, particularly in long-term care homes and retirement homes. As discussed here, the conditions of these care homes must be dramatically improved, their overcrowding must be addressed, and more qualified staff should be hired.
  • There is no vaccination schedule available in Toronto yet, but based on the available data, the ideal strategy would be to vaccinate people in long-term care homes and retirement homes as quickly as possible and to initiate mass vaccination in the most affected zones.
  • The fatality percentage for ages 50 and below is almost zero, and the age group 20 to 29 has the highest number of infections. After vaccinating people of ages 50 and older, and vulnerable people of all age groups, it looks like vaccinating people in age groups 20 to 29 could contain the spread of the virus.

We can see through the data that the major known cause of the spread of infection is close contact. Isolating and staying at home are effective strategies to curtail the spread of the virus. I understand that not a lot of people have the privilege that I do of isolating and staying at home. But I urge you to try to follow these orders as closely as possible. Although it is not mandatory, I also strongly urge you to get vaccinated so that we can build herd immunity and fight this pandemic together!

All the analysis was done in Python, and the relevant code can be found in my GitHub repo. Please feel free to share your thoughts here or connect with me on LinkedIn.

Thanks for reading!


Note from the editors: towardsdatascience.com is a Medium publication primarily based on the study of Data Science and machine learning. We aren’t health professionals or epidemiologists. To learn more about the coronavirus pandemic, you can click here.


References

  1. https://open.toronto.ca/dataset/covid-19-cases-in-toronto/
  2. https://www12.statcan.gc.ca
  3. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00083-0/fulltext
  4. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7288963/
