Building a Simple Web-Scraped COVID-19 Visualization by Region with Bokeh

Every data problem starts with good visualization

Kanjo Melo
Towards Data Science


Photo by Jochem Raat on Unsplash

As any data enthusiast would tell you, an insightful data visualization builds intuition about a problem and is always a good starting point for problem-solving of any kind, not just for data-heavy work.

For this project, I built a visualization tool that shows which counties and municipalities in my region (Ontario, Canada) have been hit hardest by the current COVID-19 pandemic.

What you will need:

  1. A shapefile for the region you want to visualize. I used the .shp format, which most GIS (Geographic Information System) software can produce. Here is a link to the source. (The source is almost two decades old, so it is not the most up-to-date.)
  2. The Bokeh and GoogleMaps packages, along with pandas, NumPy, GeoPandas, Shapely, requests and seaborn, which the code below also imports. Bokeh and GoogleMaps can be installed from a terminal or command prompt:
pip install bokeh
pip install googlemaps

Next, we import the libraries we need.

import requests
import numpy as np
import pandas as pd
import googlemaps
import geopandas as gpd
import seaborn
from shapely.geometry import Point, Polygon

The following snippet automates downloading the dataset and writing (or overwriting) it to a destination of your choice.

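url = 'data_source_url'  # placeholder: the dataset's download URL
myfile = requests.get(url, timeout=30)
myfile.raise_for_status()  # stop on a failed download instead of writing an error page to disk
with open('data/conposcovidloc.csv', 'wb') as f:  # 'wb' overwrites any existing copy
    f.write(myfile.content)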

A large part of why this could be a simple weekend project is that the folks at data.ontario.ca keep the data clean. We do, however, need to reshape the dataset to suit this visualization.

covid_19 = pd.read_csv('covid_19_data_path.csv')
# one 0/1 dummy column per category; dtype=int because recent pandas versions default to True/False
cv0 = pd.get_dummies(covid_19, columns=['OUTCOME1', 'CLIENT_GENDER', 'Age_Group'], dtype=int)
cv0 = cv0.drop(cv0.columns[0], axis=1)
# turn the 0s into NaN so that .count() below only counts the 1s
cv0 = cv0.replace(0, np.nan)
cv1 = cv0.groupby(['Reporting_PHU', 'Reporting_PHU_Latitude', 'Reporting_PHU_Longitude', 'Reporting_PHU_City']).count().reset_index()

The block above does a few things. First, pd.get_dummies turns the OUTCOME1, CLIENT_GENDER and Age_Group columns into 0/1 indicator columns, one per category. After dropping the first column, it replaces every 0 with NaN. This is deliberate: .count() tallies only non-null values, so once we group by the key columns for the visualization (the health unit's name, coordinates and city), each indicator column holds the number of cases in that category for that health unit.
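To see why the NaN step matters, here is a toy sketch of the same get_dummies, replace and count pattern. The health-unit names and rows are made up, and the outcome labels ("Fatal", "Resolved", "Not Resolved") are assumed; only the Reporting_PHU and OUTCOME1 column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical case list: one row per case
cases = pd.DataFrame({
    "Reporting_PHU": ["Unit A", "Unit A", "Unit A", "Unit B"],
    "OUTCOME1": ["Fatal", "Resolved", "Resolved", "Not Resolved"],
})

# One 0/1 indicator column per outcome category
dummies = pd.get_dummies(cases, columns=["OUTCOME1"], dtype=int)

# Zeros become NaN so that count() only counts the 1s
dummies = dummies.replace(0, np.nan)

# count() skips NaN, so each indicator column becomes a per-unit tally
per_unit = dummies.groupby("Reporting_PHU").count().reset_index()
print(per_unit)
```

Summing the indicator columns with groupby(...).sum(), skipping the NaN step, would give the same per-category counts; the count() route additionally turns any other column without missing values into a per-group case total.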

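Counties = gpd.read_file('geo_data_path.shp')
Counties = Counties.sort_values('OFF_NAME')
# city_order (its construction is not shown): for each county in sorted order,
# the cv1 row index of the health unit city that serves it
county_city = []
for v in city_order:
    city = cv1['Reporting_PHU_City'][v]
    county_city.append(city)

Counties['CITY'] = county_city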

Next, we load the geographical data. The geodata source used in this project does not record which public health unit city reports the COVID-19 cases for each county. city_order fills that gap: it is a list of indexes into the Reporting_PHU_City column of our grouped COVID-19 data (cv1), one per county in sorted order. The resulting CITY column gives us a key for merging the two datasets.
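A minimal sketch of how the CITY key can then join the two tables. The county names, cities, case counts and city_order values are all hypothetical; the sketch builds the key with a vectorized .loc lookup instead of the loop and finishes with a left merge.

```python
import pandas as pd

# Hypothetical stand-in for the shapefile attributes, sorted by name as in the post
counties = pd.DataFrame({"OFF_NAME": ["County B", "County A", "County C"]})
counties = counties.sort_values("OFF_NAME").reset_index(drop=True)

# Hypothetical stand-in for the grouped COVID-19 table cv1
cv1 = pd.DataFrame({
    "Reporting_PHU_City": ["City X", "City Y", "City Z"],
    "TOTAL_CASES": [5, 7, 9],
})

# Hypothetical city_order: for each sorted county, the cv1 row index of its health unit
city_order = [1, 2, 0]  # County A -> City Y, County B -> City Z, County C -> City X

# .to_numpy() drops cv1's index, so values are assigned by position, not aligned by label
counties["CITY"] = cv1["Reporting_PHU_City"].loc[city_order].to_numpy()

# A left merge keeps every county, in sorted order, and attaches its city's figures
merged = counties.merge(cv1, left_on="CITY", right_on="Reporting_PHU_City", how="left")
print(merged[["OFF_NAME", "CITY", "TOTAL_CASES"]])
```

In the real data several counties may be served by the same health unit; a left merge handles that naturally, since each of those counties simply receives the same health-unit figures.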

# shapely's Point takes (x, y), so longitude must always come before latitude
geometry = [Point(xy) for xy in zip(cv1['Reporting_PHU_Longitude'], cv1['Reporting_PHU_Latitude'])]
geo_cv = gpd.GeoDataFrame(cv1, geometry=geometry)

This short block builds a shapely Point for each health unit from its longitude and latitude, then creates a new GeoDataFrame, geo_cv, from cv1 and those points.
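A quick way to convince yourself of the longitude-first rule, using hypothetical coordinates near Toronto: shapely stores the first argument as x and the second as y, and the point's GeoJSON form (GeoJSON positions are [longitude, latitude]) keeps the same order.

```python
from shapely.geometry import Point, mapping

# Hypothetical coordinates for a health unit office near Toronto
lon, lat = -79.38, 43.65

p = Point(lon, lat)  # x = longitude, y = latitude
print(p.x, p.y)

# The GeoJSON mapping keeps the (longitude, latitude) order
geojson_point = mapping(p)
print(geojson_point)

# Swapping the order puts the point at x = 43.65, far outside the
# x range of -83.5 to -74.0 that the plot uses later on
swapped = Point(lat, lon)
print(-83.5 <= swapped.x <= -74.0)  # False
```

GeoPandas also provides gpd.points_from_xy(x, y), which builds the same points directly from the two columns in the same x-then-y order.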

Next, I applied some basic feature scaling (mean normalization) so that the municipalities hit hardest by the pandemic to date stand out more clearly on a common scale. The new columns are then merged into a single DataFrame, which feeds the Bokeh plot.
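The post does not show the scaling code, so here is a minimal sketch of mean normalization in its textbook form, x' = (x - mean(x)) / (max(x) - min(x)), applied to one column. The toy TOTAL_CASES values and the TOTAL_CASES_NORM column name are hypothetical, and the columns used in the plot (such as PERCENTAGE_TOTAL_log) may have been derived differently.

```python
import pandas as pd

def mean_normalize(s: pd.Series) -> pd.Series:
    """Mean normalization: (x - mean) / (max - min); results have mean 0 and lie in [-1, 1]."""
    spread = s.max() - s.min()
    if spread == 0:
        # every value is identical, so there is nothing to spread out
        return pd.Series(0.0, index=s.index)
    return (s - s.mean()) / spread

# Hypothetical per-health-unit totals
df = pd.DataFrame({"Reporting_PHU": ["A", "B", "C", "D"],
                   "TOTAL_CASES": [10, 20, 30, 40]})
df["TOTAL_CASES_NORM"] = mean_normalize(df["TOTAL_CASES"])
print(df)
```

Unlike min-max scaling to [0, 1], mean normalization centers each column on zero, so above-average and below-average municipalities fall on opposite sides of the color scale.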

Here is a snapshot of what the DataFrame looks like:

The final two blocks of code generate the Bokeh plot. The first also sets up a client from the GoogleMaps library, although the plotting code shown here does not call it.

import json
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource,LinearColorMapper, ColorBar, Label, HoverTool
from bokeh.models import Range1d, ColorMapper
from bokeh.models.glyphs import MultiLine
from bokeh.palettes import brewer, mpl
api_key = 'YOUR_API_KEY'  # placeholder: your Google Maps API key
gmaps = googlemaps.Client(key=api_key)
#Read data to json and convert to string-like object
Counties_json = json.loads(Counties.to_json())
json_data = json.dumps(Counties_json)
#Provide GeoJSONData source for plotting.
geosource = GeoJSONDataSource(geojson = json_data)
#Repeat for geo_cv GeoPandas DataFrame
geo_cv_json = json.loads(geo_cv.to_json())
json_data2 = json.dumps(geo_cv_json)
geosource2 = GeoJSONDataSource(geojson=json_data2)

And for the plot itself.

# width/height replace the plot_width/plot_height arguments removed in Bokeh 3.0
fig = figure(title='Ontario COVID-19 Cases by Municipality', height=750, width=900,
             active_scroll='wheel_zoom')
fig.xgrid.grid_line_color = None
fig.ygrid.grid_line_color = None
# start the view on southern and eastern Ontario
left, right, bottom, top = -83.5, -74.0, 41.5, 47.0
fig.x_range=Range1d(left, right)
fig.y_range=Range1d(bottom, top)
# Color mappers: cmap1 for the county fill, cmap2 for total cases per health unit
palette1 = brewer['RdYlGn'][11]
cmap1 = LinearColorMapper(palette=palette1,
                          low=Counties['PERCENTAGE_TOTAL_log'].min(),
                          high=Counties['PERCENTAGE_TOTAL_log'].max())
cmap2 = LinearColorMapper(palette=palette1,
                          low=geo_cv['TOTAL_CASES'].min(),
                          high=geo_cv['TOTAL_CASES'].max())
# Add the county patches, filled by PERCENTAGE_TOTAL_log
f2 = fig.patches('xs', 'ys', source=geosource, line_color='black', line_width=0.25, line_alpha=0.9, fill_alpha=0.6,
                 fill_color={'field': 'PERCENTAGE_TOTAL_log', 'transform': cmap1})
# the color bar uses the same mapper as the patch fill, so its scale describes the colors shown
fig.add_layout(ColorBar(color_mapper=cmap1, location='bottom_right', label_standoff=10))
hover2 = HoverTool(renderers=[f2], tooltips=[('Municipality Name', '@OFF_NAME')])
fig.add_tools(hover2)
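# one marker per public health unit (scatter with marker='circle' avoids the deprecated size argument of circle)
f1 = fig.scatter(x='Reporting_PHU_Longitude', y='Reporting_PHU_Latitude', size=10, marker='circle',
                 color='black', line_color='white', source=geosource2, fill_alpha=0.55)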
hover1 = HoverTool(renderers=[f1], tooltips=[('Health Unit', '@Reporting_PHU'),
                                             ('Health Unit City', '@Reporting_PHU_City'),
                                             ('Fatalities', '@OUTCOME1_Fatal'),  # name produced by get_dummies
                                             ('Percentage Recovered', '@PERCENTAGE_Recovered'),
                                             ('Total Cases', '@TOTAL_CASES'),
                                             ('Male to Female Ratio', '@GENDER_RATIO1')])
fig.add_tools(hover1)
output_notebook()
show(fig)
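To test the wiring before pulling in real shapefiles, a self-contained miniature of the same plot may help. It uses two hypothetical square "counties" and two hypothetical health-unit points in a plain ColumnDataSource, assumes Bokeh 3.x (width/height arguments, scatter with marker='circle'), and attaches the ColorBar to the same LinearColorMapper that fills the patches so the legend matches the fill.

```python
from bokeh.models import ColorBar, ColumnDataSource, HoverTool, LinearColorMapper
from bokeh.palettes import brewer
from bokeh.plotting import figure

# Two hypothetical square "counties" with a made-up score to color by
counties = ColumnDataSource(data=dict(
    xs=[[0, 1, 1, 0], [1, 2, 2, 1]],
    ys=[[0, 0, 1, 1], [0, 0, 1, 1]],
    name=["County A", "County B"],
    score=[0.2, 0.8],
))
# Two hypothetical health-unit locations
units = ColumnDataSource(data=dict(x=[0.5, 1.5], y=[0.5, 0.5],
                                   unit=["Unit A", "Unit B"], cases=[12, 30]))

mapper = LinearColorMapper(palette=brewer["RdYlGn"][11], low=0.0, high=1.0)

fig = figure(title="Miniature test plot", width=500, height=300,
             x_range=(-0.5, 2.5), y_range=(-0.5, 1.5))

patches = fig.patches("xs", "ys", source=counties, line_color="black",
                      fill_alpha=0.6, fill_color={"field": "score", "transform": mapper})
# The color bar shares the patch mapper, so its scale describes the fill colors
fig.add_layout(ColorBar(color_mapper=mapper, label_standoff=10), "right")
fig.add_tools(HoverTool(renderers=[patches], tooltips=[("County", "@name")]))

points = fig.scatter(x="x", y="y", source=units, size=10, marker="circle",
                     color="black", line_color="white", fill_alpha=0.55)
fig.add_tools(HoverTool(renderers=[points],
                        tooltips=[("Health Unit", "@unit"), ("Total Cases", "@cases")]))
```

In a notebook, call output_notebook() and then show(fig) to display it.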

Thanks for following along; I hope you found this valuable!

Feel free to reach out to me on any of the following platforms:

Twitter, LinkedIn, Github or follow me right here on Medium!
