Analysis of Toronto Neighbourhoods using Machine Learning

A New Immigrant’s Guide to settling in the City of Toronto

Published in Towards Data Science, Nov 27, 2020

Introduction

When I began this project, I came across a news article headlined “Canada to welcome 1.2 million immigrants by 2023” [1]. Having recently relocated here myself, I was excited for the millions of people looking for a pathway to Canada. A 2020 U.S. News ranking placed Canada as the 2nd best country in the world, so it is no surprise that thousands of people choose to migrate here every year [2]. Aside from a stable economy and many growth opportunities, Canada has offered many immigrants a new home. In 2019, Canada welcomed 341,000 new immigrants, 35% of whom settled in the City of Toronto [3]. It is therefore safe to say that the City of Toronto is a top destination for new immigrants.

Problem Statement

The City of Toronto has 140 neighbourhoods spanning 6 districts. For a new immigrant, a vital question is: “Which neighbourhood should I settle in?” This project aims to group Toronto neighbourhoods by desirability using machine learning and data visualization techniques.

Basis

There are several factors to consider when settling down in any location. For this project, I performed my analysis using the following criteria:

Total number of Essential Venues in each neighbourhood
Primary and Secondary Benchmarks: The primary benchmarks were the unemployment rate, crime rate and COVID-19 rate of each neighbourhood, while the secondary benchmark was the rental price of a one-bedroom apartment in each neighbourhood.

Data Description

Most of the datasets were obtained from the City of Toronto Open Data Portal. Other datasets were scraped from the web. They include:

Neighbourhood Boundaries Map (GeoJSON): This file contains standard geospatial data and was critical for map visualizations
COVID-19 dataset for Toronto: Total cases as of October 22nd, 2020
Crime rates dataset for Toronto neighbourhoods: Covers the years 2014 to 2019
Neighbourhood Profiles/Census dataset: Based on data collected by Statistics Canada in the 2016 Census
Housing rental prices: Contains median rental prices per neighbourhood

Methodology

The Python libraries used in this project were NumPy, pandas, GeoPandas, Plotly, scikit-learn, Requests and Geopy. All visualizations were created with the Plotly library because its charts are interactive and require relatively few lines of code.

The GitHub repo for this project can be found here while the Jupyter notebook can be viewed here.

The main steps for this project are summarized in the flowchart below:

Data Exploration

The interactive charts and maps in the rest of this article are best viewed on a computer or tablet.

Exploring Venues in City of Toronto

First, I retrieved the top venues (up to 100) for each neighbourhood by sending requests to the Foursquare API. In total, 2,118 venues across 291 unique venue categories were returned.
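A minimal sketch of such a request, assuming the legacy Foursquare v2 `venues/explore` endpoint and its `response → groups → items → venue` JSON layout as commonly used in 2020 (Foursquare has since moved to a different Places API, so check the current documentation). The article used Requests; this sketch uses the standard library's `urllib` to stay self-contained. The function names, the 500 m radius and the version date are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: legacy Foursquare v2 endpoint; newer APIs use other URLs and auth.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20201127"):
    """Build a request URL for up to `limit` venues around (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (hypothetical value)
        "ll": f"{lat},{lng}",  # neighbourhood centre, e.g. from Geopy
        "radius": radius,      # search radius in metres (hypothetical choice)
        "limit": limit,        # maximum number of venues returned
    }
    return EXPLORE_URL + "?" + urllib.parse.urlencode(params)


def parse_venues(payload, neighbourhood):
    """Flatten an explore response into (neighbourhood, venue, category) rows."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            category = categories[0]["name"] if categories else "Unknown"
            rows.append((neighbourhood, venue["name"], category))
    return rows


def fetch_venues(client_id, client_secret, neighbourhood, lat, lng):
    """Send the request and parse the reply (needs network access and credentials)."""
    url = build_explore_url(client_id, client_secret, lat, lng)
    with urllib.request.urlopen(url) as resp:
        return parse_venues(json.load(resp), neighbourhood)
```

Keeping parsing separate from fetching means the JSON handling can be checked against a saved response without calling the API.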

Using one-hot encoding, I converted the venue categories into numerical values for each neighbourhood for further analysis. I then computed the total number of essential venues, such as restaurants, schools, train stations and malls, for each neighbourhood. The Sunburst chart below shows all 6 Toronto districts and their neighbourhoods, with each segment sized by its share of the total number of essential venues. Click/Tap on the chart to explore further.
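The one-hot encoding and essential-venue count could look like the following pandas sketch. The sample rows and the `ESSENTIAL` category list are hypothetical placeholders; the article does not give its exact list of essential categories.

```python
import pandas as pd

# Hypothetical (neighbourhood, category) rows, e.g. the output of the API step.
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B"],
    "Category": ["Coffee Shop", "Bank", "Park", "Bank", "Grocery Store"],
})

# One-hot encode: one 0/1 indicator column per unique venue category.
onehot = pd.get_dummies(venues["Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Summing the indicators per neighbourhood gives a count for each category.
counts = onehot.groupby("Neighbourhood").sum()

# Assumption: an illustrative list of "essential" categories.
ESSENTIAL = ["Bank", "Grocery Store", "Restaurant", "School", "Train Station"]
present = [c for c in ESSENTIAL if c in counts.columns]  # skip unseen categories
counts["EssentialVenues"] = counts[present].sum(axis=1)
```

Filtering the list to categories that actually appear avoids a `KeyError` when an essential category was never returned for any neighbourhood.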

Sunburst Chart for all Toronto districts

Quick Facts Check: Toronto has more coffee shops and restaurants than neighbourhoods, with over 900 restaurants spread across the city.

Exploring Toronto Neighbourhoods using Primary benchmarks

After a clean-up of the individual datasets for the primary benchmarks, I merged them into one Pandas dataframe as shown below.
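The merge step could look like this pandas sketch. The table names, column names and all values are placeholders, not the article's dataframe; `validate="one_to_one"` is a standard `DataFrame.merge` option that raises an error if a neighbourhood name is duplicated in either table.

```python
import pandas as pd

# Placeholder cleaned tables, one per primary benchmark, keyed by neighbourhood.
unemployment = pd.DataFrame({"Neighbourhood": ["A", "B", "C"],
                             "UnemploymentRate": [6.1, 8.0, 11.2]})
crime = pd.DataFrame({"Neighbourhood": ["A", "B", "C"],
                      "CrimePer100k": [1500.0, 700.0, 1900.0]})
covid = pd.DataFrame({"Neighbourhood": ["A", "C", "B"],  # row order may differ
                      "CovidPer100k": [600.0, 2100.0, 900.0]})

# Inner joins on the shared key: rows are aligned by name, not by position.
primary = (
    unemployment
    .merge(crime, on="Neighbourhood", how="inner", validate="one_to_one")
    .merge(covid, on="Neighbourhood", how="inner", validate="one_to_one")
)
```

With `how="inner"`, a neighbourhood spelled differently in two sources (for example a hyphen versus a space) silently drops out, so comparing row counts before and after the merge is a cheap sanity check.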

The dataframe was then converted into the interactive bubble chart below, with crime rates represented by bubble size. Click/Tap on the legend of the Bubble chart to isolate a district and explore further.

Relationship between Unemployment, COVID-19 and Crime Rates in Toronto neighbourhoods

Quick Stats Check: The average unemployment rate is 8.3%, the average number of crimes committed per 100,000 people is 1,378, and 1 in 100 persons had contracted COVID-19 as of October 2020.
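These per-capita figures follow a simple formula, rate per 100,000 = events / population × 100,000, so “1 in 100 persons” corresponds to 1,000 cases per 100,000. The article does not say whether its averages are unweighted means of neighbourhood rates or pooled city-wide rates; the sketch below computes both because they generally differ. The function names and numbers are hypothetical.

```python
def per_100k(count, population):
    """Events per 100,000 residents."""
    return count / population * 100_000


def mean_of_rates(counts, populations):
    """Unweighted average of per-neighbourhood rates (each neighbourhood counts once)."""
    rates = [per_100k(c, p) for c, p in zip(counts, populations)]
    return sum(rates) / len(rates)


def pooled_rate(counts, populations):
    """City-wide rate: total events over total population (population-weighted)."""
    return per_100k(sum(counts), sum(populations))


# Placeholder numbers: a small and a large neighbourhood.
counts = [100, 200]
populations = [10_000, 40_000]
print(mean_of_rates(counts, populations))  # 750.0
print(pooled_rate(counts, populations))    # 600.0
```

The gap arises because the unweighted mean gives the small neighbourhood's high rate as much weight as the large neighbourhood's low rate.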

Machine Learning

Clustering Toronto Neighbourhoods

The k-means clustering algorithm was used to group the neighbourhoods by desirability for new immigrants. k-means is an unsupervised machine learning algorithm that partitions data points into k clusters, assigning each point to the cluster with the nearest centroid (mean), so that neighbourhoods with similar feature values end up in the same cluster.
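A minimal scikit-learn sketch of this idea, on a placeholder feature matrix (one row per neighbourhood: unemployment %, crimes per 100,000 and COVID-19 cases per 100,000). Two choices here are assumptions, not necessarily the author's: standardising the features so the large crime counts do not dominate the Euclidean distance, and naming clusters Low/Mid/High by their average standardised centroid, since k-means itself returns arbitrary label numbers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Placeholder features: unemployment %, crimes per 100k, COVID-19 cases per 100k.
X = np.array([
    [5.0, 600.0, 500.0],
    [5.5, 700.0, 550.0],
    [8.0, 1300.0, 1000.0],
    [8.5, 1400.0, 1100.0],
    [12.0, 2200.0, 2000.0],
    [12.5, 2300.0, 2100.0],
])

# Standardise each column to mean 0, std 1 so no benchmark dominates distances.
X_scaled = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)

# Label numbers are arbitrary: rank clusters by mean centroid value instead.
order = np.argsort(km.cluster_centers_.mean(axis=1))
names = {int(c): name for c, name in zip(order, ["Low", "Mid", "High"])}
labels = [names[int(c)] for c in km.labels_]
```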

Steps for Clustering Toronto Neighbourhoods

The steps below were used to segment the neighbourhoods:

1. Determine the optimal number of clusters using the “Elbow” method
2. Group neighbourhoods by total number of essential venues, such as schools, train stations, restaurants, banks, shopping malls and bus stations. This resulted in 3 distinct neighbourhood clusters, represented in the final Choropleth map as “Venue Density”
3. Group neighbourhoods using the primary benchmarks: unemployment, crime and COVID-19 rates. The result of this clustering attempt is shown below
4. Group the neighbourhoods in the “Low” cluster from Step 3 (low unemployment, crime and COVID-19 rates) using the secondary benchmark, i.e. housing prices

Distribution of neighbourhoods after clustering using primary (left) and secondary (right) benchmarks
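The elbow method from the steps above fits k-means for a range of k and looks for the point where the within-cluster sum of squares (scikit-learn's `inertia_`) stops dropping sharply. The author presumably read the elbow off a plot; the `pick_elbow` rule below (largest second difference) is a hypothetical automated stand-in, and the data are placeholders with three obvious groups.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_max=5, random_state=0):
    """Within-cluster sum of squares (inertia) for k = 1..k_max."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in range(1, k_max + 1)
    ]


def pick_elbow(inertias):
    """Hypothetical heuristic: the k where the improvement drops off the most."""
    best_k, best_score = None, float("-inf")
    for i in range(1, len(inertias) - 1):  # inertias[i] belongs to k = i + 1
        gain_before = inertias[i - 1] - inertias[i]
        gain_after = inertias[i] - inertias[i + 1]
        if gain_before - gain_after > best_score:
            best_k, best_score = i + 1, gain_before - gain_after
    return best_k


# Placeholder 2-D data with three well-separated groups of three points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [20, 0], [20, 1], [21, 0]], dtype=float)

inertias = elbow_inertias(X)
print(pick_elbow(inertias))  # 3
```

In practice, plotting `inertias` against k and checking the bend visually is a useful cross-check on any automated rule.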

Results

The outcome of the clustering steps above was used to rank the neighbourhoods into four categories. Neighbourhoods in the Mid and High clusters from Step 3 were labelled Least Desirable, while those in the Low, Mid and High housing-price clusters from Step 4 were labelled Most Desirable, Desirable and Semi-Desirable, respectively. The final neighbourhood desirability index was plotted as the choropleth map below using the Plotly library.
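The ranking rule described above can be written as a small lookup. The function name and string labels are hypothetical; the sketch assumes Step 3 produced “Low”/“Mid”/“High” primary-benchmark clusters and Step 4 produced “Low”/“Mid”/“High” housing-price clusters for the Step 3 “Low” neighbourhoods only.

```python
PRICE_TO_CATEGORY = {
    "Low": "Most Desirable",
    "Mid": "Desirable",
    "High": "Semi-Desirable",
}


def desirability(primary_cluster, price_cluster=None):
    """Combine the Step 3 and Step 4 cluster labels into one of four categories."""
    if primary_cluster in ("Mid", "High"):
        # Higher unemployment, crime and COVID-19 rates: price is not considered.
        return "Least Desirable"
    if primary_cluster != "Low" or price_cluster not in PRICE_TO_CATEGORY:
        raise ValueError(f"unexpected labels: {primary_cluster!r}, {price_cluster!r}")
    # Low primary-benchmark cluster: rank by the Step 4 housing-price cluster.
    return PRICE_TO_CATEGORY[price_cluster]


print(desirability("High"))        # Least Desirable
print(desirability("Low", "Mid"))  # Desirable
```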

Combined Choropleth Maps for City of Toronto neighbourhoods

Conclusion

From the results, we can make the following deductions:

Only 10% of Toronto neighbourhoods have high venue density, led by Mount Pleasant West, Church-Yonge Corridor, Yonge-St. Clair and Bay Street Corridor
Most Desirable Neighbourhoods: Consider neighbourhoods in the Scarborough district if you are looking for less pricey apartments. Other neighbourhoods to consider are Banbury-Don Mills and Annex in the North York and York districts, respectively
Looking for Entertainment: Look no further than Downtown Toronto, home to the Entertainment District. This area was classified as Semi-Desirable owing to its higher housing prices. However, if you’re looking for fun and have the $$$, it is a great place to settle in
Presence of Essential Venues: If proximity to essential venues matters to you, consider Mount Pleasant West, Yonge-St. Clair and Greenwood-Coxwell, which are also in the Desirable category
Avoid if you Can: Most neighbourhoods in north-western Toronto, notably the Etobicoke district, were classified as Least Desirable due to their high crime and COVID-19 rates. Interestingly, this region is also home to Jane and Finch, which is a “red” neighbourhood.

References

All references used for this project are hyperlinked within the write-up. For the complete Python code (Jupyter Notebook), the GitHub repo with the datasets, and my social media pages, please use the links below: