Coloring India Red, Orange and Green: Covid19 Choropleth Map

A Bokeh-based map of India's districts, showing real-time data and the outbreak severity of the novel coronavirus

YATHARTH AGGARWAL
Towards Data Science


Image created using Freepik; see the credits at the end.

Note: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.

Covid-19, the disease caused by the novel coronavirus, has forced governments around the world to impose nationwide lockdowns to fight the deadly virus. The virus has now infected more than 4 million people worldwide and left some 300,000 dead.

India began its third lockdown on May 4th for two weeks and is probably planning a fourth. Some believe it is the biggest lockdown anywhere in the world, with close to 1.3 billion people confined to their homes. India has also started colour-coding its 733 districts into Red, Orange, and Green Zones depending on the severity of the disease spread.

Red: one of the highest-caseload districts that together account for more than 80% of a state's cases, or a district whose cases double in fewer than 4 days.

Green: a district that has not reported any case in the last 28 days.

Orange: a district that falls into neither the Red nor the Green category.
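To make the three rules concrete, here is a minimal Python sketch that assigns zones to the districts of one state. It is an illustration under stated assumptions, not the official MoHFW procedure: the `District` fields and the `classify_state` function are hypothetical, the 80% rule is read as "take districts from the highest caseload downward until together they exceed 80% of the state's cases", and the Green check is applied before the Red one.

```python
# A minimal sketch of the zone rules above. The District fields, the
# cumulative-share reading of the 80% rule, and the rule order (Green checked
# first) are illustrative assumptions, not the official MoHFW procedure.
from dataclasses import dataclass


@dataclass
class District:
    name: str
    cases: int                 # cumulative confirmed cases
    days_since_last_case: int  # days since the district's most recent case
    doubling_days: float       # current doubling time of cases, in days


def classify_state(districts, share=0.80, doubling_limit=4.0, quiet_days=28):
    """Return a {district name: zone} dict for the districts of one state."""
    total = sum(d.cases for d in districts)

    # Take districts from the highest caseload down until together they
    # account for more than `share` of the state's cases.
    high_load = set()
    running = 0
    for d in sorted(districts, key=lambda d: d.cases, reverse=True):
        if total == 0 or running > share * total:
            break
        high_load.add(d.name)
        running += d.cases

    zones = {}
    for d in districts:
        if d.days_since_last_case >= quiet_days:
            zones[d.name] = "Green"
        elif d.name in high_load or d.doubling_days < doubling_limit:
            zones[d.name] = "Red"
        else:
            zones[d.name] = "Orange"
    return zones


# Made-up example data for one state.
state = [
    District("A", 60, 0, 10.0),
    District("B", 25, 2, 9.0),
    District("C", 10, 1, 3.5),
    District("D", 4, 5, 20.0),
    District("E", 1, 40, float("inf")),
]
print(classify_state(state))
# → {'A': 'Red', 'B': 'Red', 'C': 'Red', 'D': 'Orange', 'E': 'Green'}
```

Here A and B are Red because together they exceed 80% of the cases, C is Red only because of its short doubling time, and E is Green after 40 quiet days.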

The government has tried to use the country's federal structure to map and isolate the pandemic, create levels of restrictions, allocate resources efficiently, and establish accountability.

Read more at Economic Times and India Today

As a geek sitting at home during the lockdown, I tried plotting these districts in Python, and this blog describes how.

Aim: This blog lays out the steps for creating a choropleth map of India's districts and marking each district with its zone colour. The map is interactive and also shows the number of Covid19 cases in each district.

The code was written in a Jupyter Notebook and can be found in my GitHub repo: https://github.com/yatharthaggarwal/CovidIndia. I use the Anaconda distribution for Python-based projects and would recommend it to others.

Packages:

  1. GeoPandas: an open-source package that helps users work with geospatial data. The one dependency we will focus on is Shapely, which GeoPandas relies on to perform geometric operations. In our case, the shape of each district of India is encoded as a Shapely Polygon or MultiPolygon, which GeoPandas then uses to work with the geospatial data.
  2. Bokeh: a multi-functional, open-source package for creating beautiful, interactive visualizations, including fairly complex plots.

Importing Datasets

1. Covid-19 patient data

First, we need to load the Covid-19 data at the district level. The Government of India and the state governments do NOT publish district-wise data, so we have to rely on crowdsourced portals like https://www.covid19india.org/.

Even their data is incomplete to a large extent: much of it has been pushed into an 'unknown' section. We will therefore use only the rows where the district is mentioned. The code snippet below loads the dataset; we are interested in the Confirmed, Active, Recovered, and Deceased columns, where Confirmed cases are the total of the other three.

import pandas as pd

districtdata = pd.read_csv('https://api.covid19india.org/csv/latest/district_wise.csv')
districtdata.head()
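Since only rows with a named district are used, and Confirmed should equal the sum of the other three columns, a quick sanity check is worth running after loading. Below is a minimal sketch on a small made-up DataFrame (so it runs offline); with the real data you would apply the same two steps to `districtdata`. The exact "Unknown" label for unassigned cases is an assumption about the data, so the comparison is case-insensitive.

```python
import pandas as pd

# Made-up rows mimicking the columns used from district_wise.csv.
# The "Unknown" label for unassigned cases is an assumption about the data.
sample = pd.DataFrame({
    "State": ["State P", "State P", "State Q", "State Q"],
    "District": ["District A", "Unknown", "District B", "District C"],
    "Confirmed": [120, 40, 118, 3],
    "Active": [70, 40, 30, 0],
    "Recovered": [48, 0, 88, 3],
    "Deceased": [2, 0, 0, 0],
})

# Keep only rows where the district is actually named (case-insensitive check).
known = sample[sample["District"].str.lower() != "unknown"].copy()

# Flag any row where Confirmed differs from Active + Recovered + Deceased.
parts = known[["Active", "Recovered", "Deceased"]].sum(axis=1)
known["Consistent"] = known["Confirmed"] == parts

print(known[["State", "District", "Confirmed", "Consistent"]])
```

Rows flagged as inconsistent would be worth a closer look before merging.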

2. India District Shapefile

In order to create a map, you will need a shapefile (.shp). The file tells GeoPandas what shapes we are mapping out; in our case, we need a file that outlines each district. GeoPandas stores these shapes in a geometry column, which tells it where to find the shape of each district.

An excellent, up-to-date shapefile is available here: https://hub.arcgis.com/datasets/esriindia1::india-districts-boundary. It does contain many spelling mistakes, which I have corrected in the data cleaning section. See the code snippet below:

import geopandas as gpd

df1 = gpd.read_file('India_Districts/c44c9b96-f570-4ee3-97f1-ebad64efa4c2202044-1-1rb4x6s.8xx6.shp')
df1.head()

We can use the plot function to see what the shapefile contains. If you know of an official shapefile, please mention it in the comments. This one does not include the districts created in 2020, but the map comes out beautifully.

# Plot the India map from the shapefile to check the district boundaries
df1.plot(figsize = (8,8))
India districts map from shapefile

3. District-wise zone data

The zone data has been published by the Ministry of Health and Family Welfare. I downloaded it directly from https://www.kaggle.com/xordux/india-corona-severity-zones and placed it in the working directory. See the code snippet below:

zones = pd.read_csv('Zones in india.csv')
zones.head()

Data Cleaning

Data cleaning is an essential part of any data-based project: it is where we correct or remove bad values, handle NaN values, and format the dataset as a whole. I am not including the code snippet here, as it mostly corrects a huge number of spelling mistakes; see the code in the repository linked above.
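As a rough illustration of one way to find such spelling mismatches (not necessarily how the repository does it), Python's standard `difflib.get_close_matches` can suggest the closest shapefile name for each case-data name that has no exact match. The two name lists below are hypothetical examples, not extracts from the actual datasets.

```python
import difflib

# Hypothetical district-name lists: one from the case data, one from the
# shapefile, with a few spelling differences between them.
case_names = ["Ahmedabad", "Kanpur Nagar", "Bengaluru Urban", "Nilgiris"]
shape_names = ["Ahmadabad", "Kanpur Nagar", "Bangalore Urban", "The Nilgiris"]

# For each name without an exact match, suggest the closest shapefile name.
suggestions = {}
for name in case_names:
    if name in shape_names:
        continue
    match = difflib.get_close_matches(name, shape_names, n=1, cutoff=0.6)
    suggestions[name] = match[0] if match else None

print(suggestions)
```

Suggestions like these should be checked by hand; once confirmed, the mapping can be applied to the case data's District column with pandas `Series.replace(mapping)`.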

Data Pre-Processing

Creating a common dataframe with all the data shown above.
Into df1 (the shapefile), we merge the Covid19 patient data on District and State, and then merge the zones dataframe the same way. A few unnecessary columns are also dropped. In India, different states often have districts with the same name, so the merge has to match on both ['District', 'State']. The code snippet below produces a dataframe with the geometry, zones, and patient data along with district and state names:

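# Merge on both District and State, since district names repeat across states.
# Note: merge() defaults to an inner join, so unmatched districts are dropped.
newdf = df1.merge(districtdata[['State', 'District', 'Confirmed', 'Active', 'Recovered', 'Deceased']], on = ['District', 'State'])
newdf = newdf.merge(zones[['State', 'District', 'Zone']], on = ['District', 'State'])
# Drop columns we do not need for the plot.
newdf = newdf.drop(columns = ['statecode', 'state_ut', 'distcode', 'countrynam'])
newdf.head()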
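To see why both keys matter, and why it is worth checking what an inner merge silently drops, here is a small pandas sketch on toy data. Aurangabad is a district name used in both Bihar and Maharashtra; the case numbers are made up. A left join with `indicator=True` is one way to list shapefile districts that found no match.

```python
import pandas as pd

# Toy data: "Aurangabad" is a district in both Bihar and Maharashtra.
shapes = pd.DataFrame({
    "State": ["Bihar", "Maharashtra", "Bihar"],
    "District": ["Aurangabad", "Aurangabad", "Gaya"],
})
cases = pd.DataFrame({
    "State": ["Bihar", "Maharashtra"],
    "District": ["Aurangabad", "Aurangabad"],
    "Confirmed": [10, 20],  # made-up numbers
})

# Joining on District alone pairs each Aurangabad with both case rows.
wrong = shapes.merge(cases[["District", "Confirmed"]], on="District")

# Joining on both keys matches each district only within its own state.
right = shapes.merge(cases, on=["District", "State"])

# A left join with indicator=True shows which shapefile districts found no match.
check = shapes.merge(cases, on=["District", "State"], how="left", indicator=True)
unmatched = check[check["_merge"] == "left_only"]

print(len(wrong), len(right))          # → 4 2
print(unmatched["District"].tolist())  # → ['Gaya']
```

Gaya disappears from both inner merges without any warning, which is exactly the kind of loss the indicator column makes visible.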

Now that the data has been converted to the appropriate format, we can plot the choropleth map of districts by following these steps:

  1. Creating a figure object
  2. Adding a patch renderer to the figure
  3. Creating a hover tool which shows the patients data
  4. Show the figure

Importing the Bokeh library and preparing the data source

# Only the names used in this post; widgetbox (unused here) was removed in Bokeh 3.
from bokeh.io import output_file, output_notebook, save, show
from bokeh.models import CategoricalColorMapper, GeoJSONDataSource, HoverTool
from bokeh.plotting import figure

Bokeh consumes GeoJSON format which represents geographical features with JSON. GeoJSON describes points, lines, and polygons (called Patches in Bokeh) as a collection of features. We, therefore, convert the merged file to the GeoJSON format.

We also create a categorical color mapper that maps the Red zone to the colour red, and likewise for the Orange and Green zones.

a) Create GeoJSONDataSource object with our initial data

geosource = GeoJSONDataSource(geojson = newdf.to_json())
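The patch renderer later refers to columns named 'xs' and 'ys', which GeoJSONDataSource derives from each feature's geometry. As I understand it, Bokeh uses only each polygon's exterior ring (holes are ignored) and joins the parts of a MultiPolygon with NaN separators, while feature properties become ordinary columns. The pure-Python sketch below approximates that flattening on a hand-written FeatureCollection with hypothetical coordinates; `to_columns` is a made-up helper, not Bokeh's actual implementation.

```python
import json
import math

# Hand-written GeoJSON with hypothetical coordinates: one Polygon district and
# one MultiPolygon district (e.g. a mainland part plus an island).
geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"District": "P", "Zone": "Red"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}},
        {"type": "Feature",
         "properties": {"District": "Q", "Zone": "Green"},
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [[[[2, 0], [3, 0], [3, 1], [2, 0]]],
                                      [[[4, 0], [5, 0], [5, 1], [4, 0]]]]}},
    ],
})


def to_columns(text):
    """Approximate the flattening: one xs/ys list per feature, exterior rings only."""
    data = {"xs": [], "ys": []}
    for feature in json.loads(text)["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            polygons = [geom["coordinates"]]
        elif geom["type"] == "MultiPolygon":
            polygons = geom["coordinates"]
        else:
            raise ValueError("only Polygon/MultiPolygon handled in this sketch")
        xs, ys = [], []
        for i, polygon in enumerate(polygons):
            if i > 0:  # NaN separates the parts of a MultiPolygon
                xs.append(math.nan)
                ys.append(math.nan)
            exterior = polygon[0]  # holes (polygon[1:]) are dropped
            xs.extend(x for x, _ in exterior)
            ys.extend(y for _, y in exterior)
        data["xs"].append(xs)
        data["ys"].append(ys)
        # Feature properties become ordinary columns (e.g. @Zone in tooltips).
        for key, value in feature["properties"].items():
            data.setdefault(key, []).append(value)
    return data


cols = to_columns(geojson)
print(cols["Zone"], cols["xs"][1])
```

This is also why the hover tool can refer to '@Zone' or '@Confirmed': every column of the merged dataframe travels along as a feature property.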

b) Define the color palette to use and mapping to categorical values of Zones

palette = ['red', 'orange', 'green']
color_mapper = CategoricalColorMapper(palette = palette, factors = ['Red', 'Orange', 'Green'])
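CategoricalColorMapper pairs each factor with the palette entry at the same position, so 'Red' maps to 'red', 'Orange' to 'orange', and 'Green' to 'green'. Values are matched exactly, so a zone spelled 'RED ' or 'orange' in the CSV would match no factor and be drawn in the mapper's nan_color (gray by default). The sketch below emulates that lookup in plain Python; `zone_color` and the clean-up step `normalise_zone` are hypothetical helpers for illustration, not Bokeh code.

```python
# Plain-Python emulation of the factor -> palette lookup (illustrative only).
palette = ["red", "orange", "green"]
factors = ["Red", "Orange", "Green"]
NAN_COLOR = "gray"  # Bokeh's default nan_color for colour mappers


def zone_color(zone):
    """Return the palette colour for an exactly matching factor, else NAN_COLOR."""
    lookup = dict(zip(factors, palette))
    return lookup.get(zone, NAN_COLOR)


def normalise_zone(zone):
    """Hypothetical clean-up: trim whitespace and use title case ('RED ' -> 'Red')."""
    return zone.strip().title()


raw = ["Red", "RED ", "orange", "Green"]
print([zone_color(z) for z in raw])                  # → ['red', 'gray', 'gray', 'green']
print([zone_color(normalise_zone(z)) for z in raw])  # → ['red', 'red', 'orange', 'green']
```

On the real dataframe the same normalisation could be done with `newdf['Zone'].str.strip().str.title()` before building the GeoJSON source.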

Choropleth Map

The following code creates a figure object (with zoom and pan tools), adds a patch renderer (fed by geosource, with the Zone column mapped through the color mapper), and creates a hover tool that displays the Confirmed, Active, Recovered, and Deceased cases in a particular district.

Finally, it saves the plot in HTML format. Let's go through the steps mentioned above:

  1. While creating a figure object, we can set the title, height, width, and interactive tools for the plot.
p = figure(title = 'Red, Green and Orange Districts of Covid19: 19 May 2020',
           height = 700,   # called plot_height in older Bokeh releases
           width = 650,    # called plot_width in older Bokeh releases
           toolbar_location = 'right',
           tools = "pan, wheel_zoom, box_zoom, reset")

2. The patch renderer is where we pass in the data source and the color mapper.

states = p.patches('xs','ys', source = geosource,
fill_color = {'field' :'Zone',
'transform' : color_mapper},
line_color = 'black',
line_width = 0.25,
fill_alpha = 1)

3. The hover tool lets us choose which data to show when hovering over a district.

p.add_tools(HoverTool(renderers = [states],
tooltips = [('District','@District'),
('State','@State'),
('Zone','@Zone'),
('Confirmed cases','@Confirmed'),
('Active cases','@Active'),
('Recovered cases','@Recovered'),
('Deaths','@Deceased')
]))

4. Finally, save the plot in HTML format. The file is a whopping 150 MB, and I still need to work out how to reduce its size. The save function writes it to the current directory.

output_file('plot.html', mode='inline')
save(p)
show(p, notebook_handle = True)
Choropleth map showing all district zones in India with tools on the right side
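One possible way to shrink the 150 MB file, not applied in the notebook, is to simplify the district boundaries before converting to GeoJSON, since the HTML embeds every vertex and most of the size is likely coordinates. GeoPandas offers `GeoSeries.simplify(tolerance, preserve_topology=True)`, built on Shapely. To show what such simplification does, below is a pure-Python Ramer-Douglas-Peucker sketch on one hypothetical boundary segment; Shapely's implementation (especially the topology-preserving variant) differs in details. Using `output_file(..., mode='cdn')` instead of `'inline'` would also avoid embedding BokehJS, though that saves far less than thinning the geometry.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the infinite line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # degenerate chord (e.g. closed ring): point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def rdp(points, tolerance):
    """Ramer-Douglas-Peucker: drop vertices within `tolerance` of the chord."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the line joining the two endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Everything in between is within tolerance: keep only the endpoints.
        return [points[0], points[-1]]
    # Otherwise split at the farthest vertex and simplify both halves.
    left = rdp(points[: index + 1], tolerance)
    right = rdp(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split vertex


# A hypothetical, slightly wiggly boundary segment with one sharp peak.
line = [(0, 0), (1, 1.01), (2, 1.98), (3, 3), (4, 2.01), (5, 0.99), (6, 0)]
print(rdp(line, tolerance=0.1))  # → [(0, 0), (3, 3), (6, 0)]
```

The tolerance is in the units of the coordinates (degrees for a typical India shapefile), so it needs tuning to keep district shapes recognisable.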

Congrats!! We have successfully created our choropleth map. You can simply open the plot.html file in any browser to view it, or use the show(p) call above to open the plot in a new tab.

I have created a small video showing the interactive map.

https://www.youtube.com/watch?v=JuF4cNeWKzc

Choropleth map working

The full code is available in the GitHub repo linked earlier.

If you have any comments or suggestions on how we can further enhance the plot or code, I would love to hear them!

Connect with me: noobdatascientist@gmail.com

Thanks for reading!

The first picture (which also serves as the thumbnail) was created using Freepik's India Map and Corona Infection images.


Academically a Computer Science graduate and currently a freelancer. Worked with startups in varied fields of Data Science, Robotics, and AI.