Hands-on Tutorials

Introduction
Montreal is the second-most populous city in Canada and the most populous city in the province of Quebec.
The city is a tourist destination with many recreational areas, historic monuments, bars, restaurants and cafes. Drinking coffee is popular with tourists and residents alike: it can be enjoyed alone or with friends, at any time of day, anywhere in the city.
Among the 20 countries with the highest per-capita coffee consumption, Canada consumes about 6.5 kg of coffee per person per year, more than the US and the UK.
About 72% of Canadians aged 18 to 79 drink coffee daily, and regular coffee drinkers consume an average of 2.8 cups per day (Coffee Association of Canada)¹.
Business Problem
The objective of this project is to analyse and explore candidate locations in Montreal, Canada, and identify an optimal one for a coffee shop. This report is aimed at stakeholders who want to open a coffee shop from scratch or buy an existing business, as well as anybody interested in good coffee in Montreal.
Since Montreal already has many coffee shops, we will try to detect locations that are not already crowded with them. We are particularly interested in areas with no coffee shops in the vicinity. Assuming these first two conditions are met, we also prefer locations as close as possible to the city center (the McGill University neighbourhood).
Data
For the Montreal neighbourhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighbourhoods in the city.
To obtain the neighbourhoods and postal codes of Montreal, the Wikipedia page was scraped with the BeautifulSoup library. The scraping yielded 124 postal codes along with their districts and neighbourhoods. The latitude and longitude of each postal code still had to be added, for which I used the pgeocode library.
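The scraping step can be sketched with BeautifulSoup as follows. This is a minimal sketch under stated assumptions: the HTML is a simplified, hypothetical stand-in for the Wikipedia table (the real page's markup differs and would need its own selectors), and the postal codes and names in the sample rows are placeholders, not checked against the real list.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the Wikipedia table; the real markup differs.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Postal code</th><th>District</th><th>Neighbourhood</th></tr>
  <tr><td>H3A</td><td>Downtown Montreal</td><td>McGill University</td></tr>
  <tr><td>H2X</td><td>Plateau Mont-Royal</td><td>Plateau Mont-Royal Southeast</td></tr>
  <tr><td>H0P</td><td></td><td></td></tr>
</table>
"""


def parse_postal_table(html):
    """Return one (postal_code, district, neighbourhood) tuple per data row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:  # the header row has only <th> cells, so it is skipped
            rows.append(tuple(cells))
    return rows


rows = parse_postal_table(SAMPLE_HTML)
print(rows)
```

Coordinates could then be attached with pgeocode, for example via `pgeocode.Nominatim("ca").query_postal_code(...)`, which returns latitude and longitude fields; because it downloads GeoNames data on first use, it is left out of the runnable sketch.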
During data cleaning, one postal code (H0P) turned out to have no neighbourhood or district information and, consequently, no latitude or longitude. Researching Montreal's postal codes, I found that the H0 prefix is an anomaly: a zero in the second position indicates a rural delivery area, so this code is practically empty (see postal codes in Canada). This single postal code was therefore removed.
A boxplot of the postal-code coordinates revealed an outlier in latitude and longitude. Since those coordinates did not fall within Canada, the entry was removed from the dataset.

After cleaning, the data frame contained 122 neighbourhoods and was stored in a structured format that can be read directly into a pandas data frame.

Methodology
This section describes the methodology used in the project.
First, we identified the candidate neighbourhoods and created the latitude and longitude coordinates of their centroids, including that of the McGill University neighbourhood.
The visualisation library Folium was used to create a map of Montreal with the neighbourhoods superimposed on top, along with a grid of cells covering the area of interest: approximately 20 × 20 km centered on the McGill University neighbourhood.
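The 20 × 20 km grid can be generated before plotting, as in the minimal sketch below. Assumptions: square cells of 1 km (the article does not give the cell size), approximate McGill University coordinates, and a flat local approximation that converts kilometres to degrees, which is adequate at this scale.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180  # about 111.19 km per degree of latitude

# Approximate coordinates of McGill University (assumed, not from the article).
MCGILL = (45.5048, -73.5772)


def grid_cell_centres(center_lat, center_lon, size_km=20.0, cell_km=1.0):
    """Centres of square cells tiling a size_km x size_km box around a point."""
    n = int(round(size_km / cell_km))  # cells per side
    # A degree of longitude shrinks with the cosine of the latitude.
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(center_lat))
    half = size_km / 2
    cells = []
    for i in range(n):
        for j in range(n):
            dy = -half + (i + 0.5) * cell_km  # north-south offset of the cell centre, km
            dx = -half + (j + 0.5) * cell_km  # east-west offset of the cell centre, km
            cells.append((center_lat + dy / KM_PER_DEG_LAT,
                          center_lon + dx / km_per_deg_lon))
    return cells


cells = grid_cell_centres(*MCGILL)
print(len(cells), cells[0])
```

Each centre could then be drawn on the Folium map, for instance as a `folium.CircleMarker`, to overlay the grid on the neighbourhoods.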

The Foursquare API was used to explore the neighbourhoods of Montreal. It provides the most common venue categories in each neighbourhood (cafes, restaurants, ice cream parlors, art galleries, etc.), and this feature was then used to group the neighbourhoods. Using the API requires creating a Foursquare account to obtain client credentials.
First, the API was used to retrieve the top 100 venues within a 2.5 km radius of the McGill University neighbourhood.
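A venue request of this kind can be sketched as below, assuming the legacy Foursquare v2 `venues/explore` endpoint that was commonly used at the time (Foursquare has since deprecated it). The credentials are placeholders, the version date is an arbitrary assumption, and the response is a small hand-made example, so the sketch builds the URL and parses a payload without touching the network.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 "explore" endpoint (since deprecated by Foursquare).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lon,
                      radius_m=2500, limit=100, version="20210101"):
    """Build a venues/explore request URL; radius and limit follow the text above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD
        "ll": f"{lat},{lon}",
        "radius": radius_m,    # metres
        "limit": limit,        # the API may return fewer venues than requested
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload):
    """Extract (name, category, lat, lng) tuples from a v2 explore JSON response."""
    venues = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        cats = venue.get("categories") or []
        category = cats[0]["name"] if cats else None
        venues.append((venue["name"], category,
                       venue["location"]["lat"], venue["location"]["lng"]))
    return venues


# Placeholder credentials and a trimmed, hand-made response for illustration.
url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", 45.5048, -73.5772)
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "categories": [{"name": "Coffee Shop"}],
               "location": {"lat": 45.505, "lng": -73.576}}},
]}]}}
print(url)
print(parse_venues(fake_payload))
```

In a real run, the URL would be fetched (for example with `urllib.request` or `requests`) and the decoded JSON passed to `parse_venues`.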
Second, the frequency of each of the 280 venue categories found across the Montreal area was computed, and the Coffee Shop category was then isolated.
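Category frequencies per neighbourhood are typically computed by one-hot encoding the venue categories and averaging each column within a neighbourhood, which gives the share of that neighbourhood's venues in each category. The sketch below shows that with pandas; the neighbourhood labels and categories are placeholders.

```python
import pandas as pd


def category_frequencies(venues):
    """Share of each venue category among the venues found in each neighbourhood."""
    onehot = pd.get_dummies(venues["Category"], dtype=int)  # one 0/1 column per category
    onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])
    return onehot.groupby("Neighbourhood").mean()


venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Category": ["Coffee Shop", "Coffee Shop", "Bakery", "Park", "Park", "Bakery"],
})
freq = category_frequencies(venues)
coffee = freq[["Coffee Shop"]]  # keep only the coffee-shop frequency
print(coffee)
```

Here neighbourhood A has a coffee-shop frequency of 0.5 (2 of its 4 venues) and B has 0.0.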


Then, the k-means clustering algorithm was applied to group the neighbourhoods into 4 clusters. Again, the Folium library was used to visualise the distribution of the clusters over the city. Each cluster was examined and characterised by the frequency of coffee shops in its neighbourhoods, and then checked against the project requirement: areas with few or no coffee shops in the vicinity.
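To make the clustering step concrete, here is Lloyd's algorithm (the standard k-means iteration) in plain Python. The article does not say which implementation it used; a library such as scikit-learn's `KMeans` is the usual choice, and this sketch only spells out the two alternating steps. The coffee-shop frequencies are placeholders, and k=2 is used so the toy result is easy to read.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Placeholder coffee-shop frequencies for six neighbourhoods (one feature each).
coffee_freq = [(0.00,), (0.01,), (0.02,), (0.30,), (0.31,), (0.32,)]
labels, centroids = kmeans(coffee_freq, k=2)
print(labels)
```

In the project, k would be 4 and each point a neighbourhood's category-frequency row. Library implementations add refinements this sketch omits, such as k-means++ initialisation and several restarts (`n_init` in scikit-learn), which make the result less sensitive to the starting centroids.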

Finally, the distance from each neighbourhood with few or no coffee shops to the McGill University neighbourhood was calculated and added to the analysis.
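The article does not state which distance formula was used. A common choice for points given as latitude and longitude is the great-circle (haversine) distance, sketched below together with a helper that ranks candidate neighbourhoods from nearest to farthest. The Earth radius of 6371 km, the McGill coordinates and the candidate names and coordinates are assumptions or placeholders.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def rank_by_distance(candidates, centre):
    """Sort (name, lat, lon) candidates by distance to centre, nearest first."""
    scored = [(name, haversine_km(lat, lon, *centre)) for name, lat, lon in candidates]
    return sorted(scored, key=lambda pair: pair[1])


MCGILL = (45.5048, -73.5772)  # approximate, assumed coordinates
candidates = [("far", 45.60, -73.57), ("near", 45.51, -73.57)]  # placeholders
ranking = rank_by_distance(candidates, MCGILL)
print(ranking)
```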

Results
This work analysed 122 neighbourhoods in Montreal with the objective of detecting places that are not already crowded with coffee shops.
The areas with the lowest density of coffee shops are concentrated on the periphery of the analysed area. This region corresponds to cluster 0 (red dots on the map), which contains 25 low-density neighbourhoods.
Verdun South is the neighbourhood closest to McGill University among those with the lowest density of coffee shops; it is 1.97 km from the central coordinate.
Some neighbourhoods are even closer and still not crowded with coffee shops: Plateau Mont-Royal Southeast (0.75 km), Verdun North (1.36 km), and Place Desjardins and Downtown Montreal Northeast (both 1.38 km).
Conclusions and Recommendations
The project requirements were to detect places that are not already crowded with coffee shops, or areas with a low density of coffee shops nearby, as close as possible to the city center (McGill University). Five options that meet these requirements are suggested: Verdun South, the neighbourhood closest to McGill University among those with the lowest density of coffee shops (1.97 km from the central coordinate); Plateau Mont-Royal Southeast (0.75 km); Verdun North (1.36 km); Place Desjardins (1.38 km); and Downtown Montreal Northeast (1.38 km).
This project only considers the frequency of coffee shops in each neighbourhood. Other factors matter for a final decision on the location of a coffee shop, such as nearby businesses (other venue categories that would complement it), the cost of living in the neighbourhood, and safety, among others. In addition, all analyses were performed with free APIs or their free tiers; paid resources could broaden the analysis and yield more options.
The notebook of this analysis can be seen here.