“Powering-up” health facilities in SSA

An exploratory approach using open access data & spatial analysis in Python

Alexandros Korkovelos
Towards Data Science

--

Photo by Martin Sanchez on Unsplash

Table of contents

· Motivation
· A step-by-step guide
· Final Remarks
· Link to supporting material

Motivation

Globally, it is estimated that ~789 million people live without access to electricity. The great majority (about 75%) is located in Sub-Saharan Africa (SSA). This is a critical service gap that affects socioeconomic development and human well-being, especially in poor, rural populations in least developed areas.

At the same time, COVID-19 has brought into sharp focus the importance of electricity in the health sector. Access to electricity, in health clinics and posts, keeps people connected, allows for information management, the refrigeration of medicines and other services to protect vulnerable populations. Therefore, the lack of reliable power is undermining the quality of health care for millions of people.

This creates an urgent need to “power-up” health facilities in least developed areas so that there is a timely response to the COVID-19 crisis.

This blog presents an exploratory, GIS based method of estimating electricity requirements for health facilities, in areas where detailed data of this kind is scarce. It leverages existing open access datasets (and models) in order to provide a high level picture of annual electricity requirements in those facilities and later on indicate how these can be met as part of a least-cost electrification plan.

The following example focuses on the district of Mecanhelas in Mozambique (chosen arbitrarily). However, the code can be used to scaled up the analysis at any administrative level (e.g. national and/or regional).

A step-by-step guide

Step 1. Setting up python environment & Importing datasets

Import necessary modules

As part of any modeling exercise in Jupyter, the first step requires that the necessary python modules are imported. You may refer to the requirements.txt to check dependencies for this example.

# Import python modules
import geopandas as gpd
import pandas as pd
import pyproj
import numpy as np

from functools import reduce
from shapely.geometry import Point, Polygon, MultiPoint
from shapely.ops import nearest_points
import datapane as dp
# !datapane login --token="yourpersonaltoken"
import folium
from folium.features import GeoJsonTooltip
import branca.colormap as cm
import os
from IPython.display import display, Markdown, HTML

import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

%matplotlib inline

Importing datasets

Here we import the datasets that we are going to work with. These include the following three:

1. Administrative boundaries (vector polygon): This defines the boundary box over our area of interest. In this example we use the administrative boundaries of Mecanhelas, a province in north-western Mozambique. You may retrieve this from GADM.

2. Locations of health facilities (vector points): In this case we use a spatial database of health facilities provided by Maina et al. The data set consists of 98,745 public health facilities in SSA, all of which are geolocated.

3. Population clusters (vector polygons): This refers to a vectorized version of built-up areas as described in Korkovelos et al. Population clusters for all countries in Sub-Saharan Africa are open access and available at PopClusters. (In this exercise we use Version 6 released 01–09–2020).

Note 1. For the purposes of this exercise both (2) and (3) have been clipped based on (1) before being imported in the notebook. You may used geopandas or Qgis to do this.

Note 2. All datasets are in WGS84 coordinate system prior importing. Please make sure that this is the case and correct if otherwise.

Importing admin boundary as geo-dataframe

# Define path and name of the file
admin_path = r"input_data\Testing_Sample"
admin_name = "mecanhelas_admin.gpkg"
# Create a new geo-dataframe
admin_gdf = gpd.read_file(admin_path + "\\" + admin_name)
# Create the axis first
fig, ax = plt.subplots(figsize=(10, 10))
admin_gdf.plot(ax=ax, edgecolor='black', alpha=0.2)
ax.set_aspect('equal', 'box')
txt = ax.set_title('Mecanhelas Administrative Boundary'.format(""))
png
Administrative area of Mecanhelas district in Mozambique — Image by Author

Importing health facilities

Note that for the code below to work properly this should be a “Point” layer; in case the geometry is characterized as “Multipoint” you should convert it into a “Point” to proceed. You may use Qgis to do this.

# Define path and name of the file
health_path = r"input_data\Testing_Sample"
health_name = "mec_health_index.gpkg"
# Create a new geo-dataframe
health_gdf = gpd.read_file(health_path + "\\" + health_name)
health_gdf.head(3)
png

… and visualizing

fig, ax = plt.subplots(figsize=(10, 10))

admin_gdf.plot(ax=ax, edgecolor='brown', alpha=0.2)
health_gdf.plot(ax=ax, legend=True, markersize=3, figsize=(15, 15), alpha=0.5)

ax.set_aspect('equal', 'box')
txt = ax.set_title('Location of health facilities in Mecanhelas'.format(""))
png
Location of health facilities in Mecanhelas, Mozambique — Image by Author

Importing population clusters

# Define path and name of the file
clusters_path = r"input_data\Testing_Sample"
clusters_name = "mecanhelas_clusters.gpkg"
clusters_gdf = gpd.read_file(clusters_path + "\\" + clusters_name)clusters_gdf.head(2)
png

… and creating a new geodataframe with the centroids of each cluster

clusters_centroid = gpd.GeoDataFrame(clusters_gdf, geometry=clusters_gdf.centroid)fig, ax = plt.subplots(figsize=(10, 10))

admin_gdf.plot(ax=ax, edgecolor='brown', alpha=0.2)
clusters_centroid.plot(ax=ax, legend=True, markersize=3, figsize=(15, 15), alpha=0.5)

ax.set_aspect('equal', 'box')
txt = ax.set_title('Population cluster centroids in Mecanhelas'.format(""))
png
“Vectorized” built-up are over the district of Mecanhelas, Mozambique — Image by Author

The imported data indicate that there are 10,703 population clusters (or settlements) in Mecanhelas, served by 19 health facilities.

Step 2. Spatial Processing of datasets

Once the datasets have been successfully imported we can calculate simple spatial statistics for the health facilities. That is, we use their location and extract values from nearby population clusters. There are two options that we may follow here.

Option 1 — Nearest hub analysis

Deploy a nearest hub analysis and calculate statistics associated with the “catchment” area of each health facility.

Note that NNA code below was retrieved from shakasom.

# Simple function getting the nearest hub for a given set of points
def calculate_nearest(row, destination, val, col="geometry"):
dest_unary = destination["geometry"].unary_union
nearest_geom = nearest_points(row[col], dest_unary)
match_geom = destination.loc[destination.geometry == nearest_geom[1]]
match_value = match_geom[val].to_numpy()[0]
return match_value

Assign the nearest Health facility to each cluster

clusters_centroid["index"] = clusters_centroid.apply(calculate_nearest, destination=health_gdf, val="index", axis=1)

Dissolve clusters based on health facility id & Calculate basic statistics for each health facility

stat_list = ['sum', 'mean', 'count']

HF_hub_stats = clusters_centroid.dissolve(by='index', aggfunc=stat_list)

Note! This results in a multipoint vector layer; each multipoint contains the location of clusters (centroids) served by the same health facility.

png

Merging result with initial health facility layer

This is to append the newly retrieved stats in the original health facility dataframe.

HF_hub = health_gdf.merge(HF_hub_stats, on="index")

After tidying up the results, option 1 yields the following geodataframe, in which basic stats are calculated per health facility based on cluster’s nearest distance.

png

Option 2 — Stats based on buffer zone

Apply a buffer zone to each health facility and calculate statistics for cluster falling within it. This approach is less computationally intensive.

First we need to project the layers into the proper CRS

In this case we use “epsg:32737” identified for Mozambique from epsg.io.

admin_gdf_prj = admin_gdf.to_crs({'init': 'epsg:32737'})
health_gdf_prj = health_gdf.to_crs({'init': 'epsg:32737'})
clusters_gdf_prj = clusters_gdf.to_crs({'init': 'epsg:32737'})
# Drop selected columns
clusters_gdf_prj.drop(["index"], axis=1, inplace=True)
clusters_gdf_prj.head(2)
png

… and we re-generate the cluster centroid layer, only this time using the projected clusters

clusters_centroid_proj = gpd.GeoDataFrame(clusters_gdf_prj, geometry=clusters_gdf_prj.centroid)

Adding a buffer zone to each health facility

health_gdf_prj['geometry'] = health_gdf_prj.geometry.buffer(1000)   # in meters

Apply spatial join

#Spatial join
health_gdf_prj_joined = gpd.sjoin(clusters_centroid_proj, health_gdf_prj, op='within', how='right')
health_gdf_prj_joined.head(2)
png

Group features based on index

# Basic statistics to calculate
stat_list = ['sum', 'mean', 'count']

# Apply groupby function
HF_buf_stats = health_gdf_prj_joined.dissolve(by='index', aggfunc=stat_list)

Merge with initial health facilities dataset

HF_buf = health_gdf_prj.merge(HF_buf_stats, on="index")

Convert result to geodataframe in original CRS

## Creating a geo-dataframe - appointing geometry attribute
HF_buf['geometry'] = list(zip(HF_buf['Lon'], HF_buf['Lat']))
HF_buf['geometry'] = HF_buf['geometry'].apply(Point)
HF_buf_gdf = gpd.GeoDataFrame(HF_buf, geometry='geometry', crs={'init': 'epsg:4326'})

After tidying up the results, option 2 yields the following geodataframe, in which basic stats are calculated per health facility, based on the buffer zone.

png

Descriptive analysis

Regardless of the selected approach (option 1 or option 2) the resulted geodataframe is of similar format. Next, we want to run a descriptive analysis over the result in order to get an idea of the type of data and potential classification techniques.

Result of Option 1

HF_hub_gdf.pop_sum.describe()count       19.000000
mean 26882.061818
std 10771.348988
min 5685.042735
25% 21419.189453
50% 25469.963919
75% 30545.254864
max 59730.107002
Name: pop_sum, dtype: float64
bin_values = np.arange(start=0, stop=HF_hub_gdf.pop_sum.sum())

HF_hub_gdf['pop_sum'].hist(figsize=[14,6])
plt.title("Population served by each health facility")

plt.show()
png
Graph by Author
# remove inconsistencies hospital type names    
HF_hub_gdf["fctype"].replace("\?", 'u', regex=True, inplace=True)

# remove inconsistencies hospital type names
HF_hub_gdf["fcname"].replace("\?", 'u',regex=True, inplace=True)
ls_of_hf = HF_hub_gdf.fctype.unique()
print('\nThis administrative area has {} different types of hospitals:\n'.format(len(ls_of_hf)))
print(*ls_of_hf, sep = '\n')
This administrative area has 2 different types of hospitals:

Centro de Saude Rural II
Centro de Saude Rural I

Result of Option 2

HF_buf_gdf.pop_sum.describe()count       19.000000
mean 4121.879924
std 6517.540750
min 253.035754
25% 819.507531
50% 991.796760
75% 2666.860946
max 20250.234550
Name: pop_sum, dtype: float64
bin_values = np.arange(start=0, stop=HF_buf_gdf.pop_sum.sum())

HF_buf_gdf['pop_sum'].hist(figsize=[14,6])
plt.title("Population served by each health facility")

plt.show()
png
Graph by Author
ls_of_hf = HF_buf_gdf.fctype.unique()
print('\nThis administrative area has {} different types of hospitals:\n'.format(len(ls_of_hf)))
print(*ls_of_hf, sep = '\n')
This administrative area has 2 different types of hospitals:

Centro de Saude Rural II
Centro de Saude Rural I

The descriptive statistics provide an idea of the number of people (potentially) served by each health facility. Other characteristics may be retrieved as well, based on the attributes of the original data (e.g. type of health facilities we identified in the area). These information can be used to help classify the health clinics into different “energy use” groups (see Step 3 below).

Step 3. Characterizing type of health facilities & Predicting electricity requirements

You may select any of the above approaches to proceed.

HF = HF_hub_gdf      # you may change the input gdf here# Initialize a Health Facility Category column
HF['HF_Cat'] = 0

Classification of health facilities

The classification process presented below is based on the idea that health facilities are grouped into four main categories namely:

  • Type 4: District/Referral Hospital (> 145 beds)
  • Type 3: Rural Hospital (~55 beds)
  • Type 2: Small Impatient Clinic (~14 beds)
  • Type 1: Rural Dispensary — No Inpatient (~4 emergency beds)

This categorization is consistent with similar approaches in literature (see for example Franco et al). It has also been adopted and used in the HOMER Powering Health Tool.

In this example, we use three values (we have calculated in the previous step) to help classify the health facilities accordingly. These include:

  1. “urban_mean”: This is the average urban status {urban:1 or rural:0} for all clusters connected with each health facility. Value > than 0.5 indicates that health facility serves a more urban settlements and thus is more likely to be of higher type.
  2. “pop_sum”: This is the sum of the population in all clusters associated with each health facility (based on the nearest hub or buffer analysis from the previous step). Higher potentially served population is likely to indicate higher type.
  3. “elecpop_sum”: This is the sum of electrified population in all clusters associated to each health facility. This indicates — to some extent — the electrification status on the vicinity thus can potentially be connected with the type of health facility.

In the absence of more elaborate (e.g. survey) data, cross-validation of this approach is very difficult at the moment. Therefore, this part of the methodology is open to user input. The idea is that the modeler can work in close collaboration with local stakeholders in order to calibrate the process or add “predictors” if more information become available.

Note! In our example, health facilities do contain some attributes such as for example their type (e.g. Centro de Saude Rural I or II). This — or any similar characteristics — can be used if available and help estimate electricity requirements. Nevertheless, in the following step we only work with the retrieved population cluster data stats, so that the methodology is replicable for other regions.

# District/Referral hospital with more than 145 beds
HF.loc[(HF['urban_mean'] >= 0.5) &
(HF['pop_sum'] >= 20000) &
(HF['elecpop_sum'] >= 0), 'HF_Cat'] = 4

# Rural hospital - 50 beds
HF.loc[(HF['urban_mean'] <= 0.5) &
(HF['pop_sum'] >= 20000) &
(HF['elecpop_sum'] >= 0), 'HF_Cat'] = 3

# Small impatient clinic - 14 beds
HF.loc[(HF['urban_mean'] <= 0.5) &
(HF['pop_sum'] < 20000) &
(HF['elecpop_sum'] > 0), 'HF_Cat'] = 2

# Rural dispensary - no inpatient - 4 emergency beds
HF.loc[(HF['urban_mean'] <= 0.5) &
(HF['pop_sum'] < 20000) &
(HF['elecpop_sum'] == 0), 'HF_Cat'] = 1

Plotting result of classification

ax = HF['HF_Cat'].value_counts().plot(kind='bar',figsize=[12,7])
ax.yaxis.set_major_locator(MaxNLocator(integer=True))
plt.title("Categorization of Health facilities in the selected admin area")
Text(0.5, 1.0, 'Categorization of Health facilities in the selected admin area')
png
Number (y-axis) and type (x-axis) of health facilities in Mecanhelas, Mozambique — Graph by Author

Assign electricity requirements based on Category type

Similar to classification (see above) estimating electricity requirement per health facility is not an easy task considering the lack of quantitative, ground data. In this example, we use the HOMER Powering Health Tool interface to quantify potential electricity requirement (kWh/year) per type of health facility. The modeler can select the appropriate equipment and customize electricity targets accordingly.

HF.loc[(HF['HF_Cat'] == 1), 'HF_kWh'] = 5.7*365
HF.loc[(HF['HF_Cat'] == 2), 'HF_kWh'] = 13.9*365
HF.loc[(HF['HF_Cat'] == 3), 'HF_kWh'] = 37*365
HF.loc[(HF['HF_Cat'] == 4), 'HF_kWh'] = 361.1*365

The estimated electricity demand for health facilities in Mecanhelas, based on the above assumptions, is estimated at ~225,314.5 kWh/year.

Print location of health facilities with estimated demand on a map

# Vizualize result on an interactive map exported as html 

#Define limits for map rendering
x_ave = HF["Lon"].mean()
y_ave = HF["Lat"].mean()

# Create the map using folium module
map_dem = folium.Map(location=[y_ave,x_ave], zoom_start=6, control_scale=True)

# Definition of a function that returns different color names based on lcoe result categorization
# Colors are in Hexa-code e.g. #RRGGBB
def colorvalue(x):
if x <= 0.5:
return "#ADFF2F"
elif x >= 1 and x < 5000:
return "#32CD32"
elif x >= 5000 and x < 10000:
return "#228B22"
elif x >= 10000 and x < 15000:
return "#008000"
elif x >= 15000 and x < 20000:
return "#006400"
else:
return "#000000"

# Then we create a marker for each cluster;
# We pass coordinates as attributes to appear on the rendered map
for index, row in HF.iterrows():
el_demand = row["HF_kWh"]
color_code = colorvalue(el_demand)
folium.CircleMarker([row["Lat"], row["Lon"]],
radius=2,
color=color_code,
popup="<b>Name:</b> {}, <br> <br> <b>Type:</b> {}, <br> <br> <b>Demand:</b> {} kWh/year, <br> <br> <b>".format(row["fcname"], row["fctype"], row["HF_kWh"]),
fill = True,
fill_opacity=0).add_to(map_dem)

# We define the limits of the legend and fix its printout format
# We use branca module to create a colormap legend and then add legend to the map
min_dem = HF["HF_kWh"].min()
max_dem = HF["HF_kWh"].max()
min_dem = float("{0:.2f}".format(min_dem))
max_dem = float("{0:.2f}".format(max_dem))
legend = cm.LinearColormap(['#ADFF2F','#32CD32','#228B22','#008000','#006400','#000000'],
index=None, vmin=min_dem, vmax=max_dem)
legend.add_to(map_dem)

expl_text = '#ADFF2F:1, #32CD32:2, #228B22:3, #008000:4, #006400:5'
iframe = folium.IFrame(expl_text, width=700, height=450)
popup = folium.Popup(iframe, max_width=3000)

Text = folium.Marker(location=[x_ave,y_ave], popup=popup,
icon=folium.Icon(icon_color='green'))
map_dem.add_child(Text)

# Create a new directory where the map(s) can be saved
try:
os.makedirs('maps')
except FileExistsError:
pass

map_dem_output = 'maps/map_{}_{}.html'.format("Health_Facilities", "Mecanhelas")
map_dem.save(map_dem_output)
# Publish map on datapane for easier rendering in websites
dp.Report(dp.Plot(map_dem)).publish(name='HF_elec_demand_Mecanhelas', visibility='PUBLIC')
# Add the link that leads to the final map output
display(Markdown('<a href="{}" target="_blank">Click here to render the map of electrification of health facilities </a>'.format(map_dem_output)))
Location and estimated electricity demand of health facilities in Mecanhelas, Mozambique. Map by Author — rendered via datapane — view full report here.

Step 4. Assign health facility targeted electricity demand back to population clusters

Electricity demand of each Health Facility is assigned to the closest cluster.

Find nearest cluster

HF["id"] = HF.apply(calculate_nearest, destination=clusters_centroid, val="id", axis=1)

Merge result to initial clusters

clusters_gdf = gpd.read_file(clusters_path + "\\" + clusters_name)final_clusters_HF_kWh = HF.merge(clusters_gdf, how="outer", on='id')
png

Step 5. Map results and export the updated population cluster dataset

final_clusters_HF_kWh_gdf = gpd.GeoDataFrame(final_clusters_HF_kWh, geometry='geometry', crs={'init': 'epsg:4326'})# Create the axis first
fig, ax = plt.subplots(figsize=(10, 10))

admin_gdf.plot(ax=ax, edgecolor='brown', alpha=0.2)

final_clusters_HF_kWh_gdf.sort_values('HF_kWh', ascending=True).plot(ax=ax, column='HF_kWh',
legend=True, markersize=3, cmap="viridis",
figsize=(10, 10), alpha=0.5)

ax.set_aspect('equal', 'box')
txt = ax.set_title('Electricity requirements (kWh/year) for HF in Mecanhelas'.format(""))
png
Image by Author

Export layer as geopackage

final_clusters_HF_kWh_gdf.to_file("Mecanhelas_HF_kWh.gpkg", driver="GPKG")

Note here that both method and results presented in this example are only illustrative; showing only how spatial analysis can be used to support similar planning activities.

Final Remarks

This is an exploratory approach, based only on existing, open access datasets. Thus, it can only provide a rough “first go” over the attempt to quantify electricity demand in health facilities in SSA especially in remote, rural areas where not much data is available.

The analysis can certainly be improved by:

  • Working with better attributed data of geolocated health facilities; such datasets might already exist within national agencies such as the Census Bureaus/Departments, Public Statistics, National surveys and mapping agencies as well as other Departments/Ministries.
  • Working in close collaboration with local stakeholders from the health and energy practice in the targeted area, in order to better calibrate and/or improve the method presented here.
  • Using detailed survey data (where available) and Machine Learning to improve classification and prediction of the type and the electricity requirements of health facilities, accordingly.

Upon satisfactory results, the final product (population clusters including electricity demand for Health Facilities) can be used in a geospatial electrification model for the development of a least-cost investment plan. An example is the GEP-OnSSET model, which has been recently modified to allow inclusion of productive uses of electricity.

For reference, you may check out the following maps indicating the least-cost electrification option for health facilities and/or the population clusters associated them in Mozambique.

Note!! The results below are only illustrative.

Map 1: Location of Health Facilities in Mozambique including their least-cost electrification technology (result available as pop-up).

Map by Author — rendered via datapane — view full report here.

Map 2: Least-cost electrification plan for population clusters associated with one or more health facilities in Mozambique (result available as pop-up for health facilities and as tooltip for clusters).

Map by Author — rendered via datapane — view full report here.

Link to supporting material

For a quick overview of the methodology and code supporting this blog, you may look at the nbviewer.

Or you can access (and contribute) the fully-fledged version of the code on GitHub.

This work is part of a multiyear research effort supporting the development of the Global Electrification Platform.

Thanks for reading!

Special thanks to Benjamin Stewart, Ashish Shrestha, Rhonda Lenai Jordan, Nicolina Erica Maria Lindblad, Babak Khavari, Andreas Sahlberg and Mark Howells who have all supported this work.

--

--