Hands-on Tutorials

Data Preparation for Geospatial Analysis & ML with Laguerre-Voronoi in Python

A weighted Voronoi tessellation of demographic and health survey data to be used in the prediction of social and economic well-being.

Sunayana Ghosh
Towards Data Science
6 min read · Apr 12, 2021


Laguerre Voronoi Tessellation of DHS Data for GADM Boundary of India

This article explores the application of Laguerre-Voronoi tessellation to Demographic and Health Survey (DHS) data. A pipeline for cleaning and transforming the DHS data is proposed, along with the associated Python code.

Demographic and Health Survey data

DHS surveys contain confidential information that could be used to identify an individual, either through unique information or through personally identifiable information (PII). To avoid this, the DHS Program degrades the accuracy of the GPS coordinates so that the true place of residence cannot be derived². In all DHS surveys the GPS coordinate of the center of the populated place in a cluster is recorded, and a separate degradation error is applied depending on whether the cluster is urban or rural: a random error of at most 5 km in rural areas and at most 2 km in urban areas. This reduces the likelihood of household identification tenfold. Each displaced coordinate can therefore be thought of as the center of a circular error buffer (of radius 5 km or 2 km) within which the true location lies. This degradation poses a challenge for further data analysis and machine learning tasks on the data.

Comprehensive and accurate measurements of economic well-being are fundamental inputs to both research and policy making. The final goal of the World Resources Institute project is to predict Demographic and Health Survey based estimates from remote sensing and OpenStreetMap data for the finest spatial micro-regions in India.

Laguerre Voronoi Diagrams

Introduced in 1985 by Imai, Iri and Murota¹, Laguerre Voronoi diagrams extend the concept of Voronoi diagrams for n points in the plane to Laguerre geometry for n circles in the plane. Such a diagram is a partition of the Euclidean plane into polygonal cells defined from a set of circles; it is also known as a power diagram. The diagrams used in this article were generated with the GitHub Gist by Devert Alexandre⁴.

Example of power diagram of 32 points in the plane where each point has a different radius.
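
To make the definition concrete: a power diagram replaces the Euclidean distance by the power distance, pow(p, C) = |p − c|² − r² for a circle C with center c and radius r, and assigns every point of the plane to the circle with the smallest power distance. The following is a minimal NumPy sketch of that rule. It is a brute-force illustration rather than the code of the Gist, and the function names are made up for this example.

```python
import numpy as np

def power_distance(points, centers, radii):
    """Power distance |p - c|^2 - r^2 from every point to every circle."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    diff = points[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2) - radii[None, :] ** 2

def power_cell_index(points, centers, radii):
    """Index of the circle whose power cell contains each point."""
    return power_distance(points, centers, radii).argmin(axis=1)

# Two circles on the x-axis: radius 1 at (0, 0) and radius 3 at (4, 0).
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
radii = np.array([1.0, 3.0])
query = np.array([[1.5, 0.0]])

weighted = power_cell_index(query, centers, radii)
unweighted = power_cell_index(query, centers, np.zeros(2))
```

With equal radii the cell boundary is the ordinary bisector x = 2, so the query point belongs to the first circle. With radii 1 and 3 the boundary moves to x = 1, and the same point falls into the cell of the larger circle.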

Laguerre Voronoi Tessellation of DHS Data

Because every displaced DHS coordinate comes with a circular error buffer, the data set is naturally a set of circles rather than a set of points. A Laguerre Voronoi tessellation of the DHS data set is therefore a viable model for creating a polygonal partition of the map of a country for further data analysis. India is taken as an example to introduce the pipeline.

Preprocessing DHS Data

  • Note that the intersection of 0 degrees latitude (Equator) and 0 degrees longitude (Prime Meridian) falls in the middle of the Atlantic Ocean, in the Gulf of Guinea off the coast of western Africa³.
Image showing the intersection of Equator and Prime Meridian
  • Hence, all entries of a country-specific DHS GeoDataFrame whose latitude and longitude are both 0.0 can be dropped.
The class DHSGeographicData handles the DHS data; its member method clean drops these entries.
  • Next, extract the columns needed to compute the Laguerre-Voronoi diagram using the method DHSGeographicData.extract_dhs(). The shapefile IAGE71FL.shp for India from the geographic data set IAGE71FL.zip is used for the extraction, and the following GeoDataFrame is obtained:
The GeoDataFrame after the extraction step showing the columns extracted from IAGE71FL.shp
Plot of the latitude, longitude points represented by the geometry column of the extracted GeoDataFrame.
  • Then assign the weights to the different sites depending on whether they are urban or rural and extract the sites and weights using the method DHSGeographicData.get_sites_and_radii().
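
The cleaning and weighting steps above can be sketched as follows. This is a minimal stand-in and not the actual DHSGeographicData implementation: it uses a plain pandas DataFrame instead of a GeoDataFrame, it assumes the usual DHS GPS column names LATNUM, LONGNUM, URBAN_RURA (with values "U" and "R") and DHSCLUST, and it assumes the radii are simply the maximum displacements quoted earlier.

```python
import pandas as pd

# Assumed maximum displacement per cluster type, taken from the text above.
RADIUS_KM = {"U": 2.0, "R": 5.0}

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop records whose latitude and longitude are both exactly 0.0."""
    at_origin = (df["LATNUM"] == 0.0) & (df["LONGNUM"] == 0.0)
    return df.loc[~at_origin].reset_index(drop=True)

def get_sites_and_radii(df: pd.DataFrame):
    """Return (x, y) sites and one radius per site, in kilometres."""
    sites = df[["LONGNUM", "LATNUM"]].to_numpy(dtype=float)
    radii = df["URBAN_RURA"].map(RADIUS_KM).to_numpy(dtype=float)
    return sites, radii

# Tiny made-up example: the second record sits at (0.0, 0.0) and is dropped.
records = pd.DataFrame({
    "DHSCLUST": [1, 2, 3],
    "URBAN_RURA": ["U", "R", "R"],
    "LATNUM": [28.6, 0.0, 12.9],
    "LONGNUM": [77.2, 0.0, 77.6],
})
sites, radii = get_sites_and_radii(clean(records))
```

Note that in this sketch the sites are in degrees while the radii are in kilometres. Before a power diagram is computed, both have to be expressed in the same units, for example by projecting the coordinates to a metric coordinate reference system.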

Generate the Weighted Voronoi

The weighted Voronoi tessellation is then computed with the Laguerre-Voronoi GitHub Gist⁴.

Computing the weighted Voronoi cell map for DHS data.
A plot of the weighted Voronoi tessellation of the DHS cluster for India
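
For readers who want to see what such a computation involves, one standard construction lifts every circle with center (x, y) and radius r to the 3D point (x, y, x² + y² − r²). The lower faces of the convex hull of the lifted points project down to the weighted Delaunay triangulation, and every lower face yields one vertex of the power diagram. The sketch below uses scipy.spatial.ConvexHull; the function name is illustrative and the code is not claimed to be identical to the Gist.

```python
import numpy as np
from scipy.spatial import ConvexHull

def power_triangulation(centers, radii):
    """Weighted Delaunay triangles and the matching power diagram vertices."""
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Lift each circle (x, y, r) to the point (x, y, x^2 + y^2 - r^2).
    height = (centers ** 2).sum(axis=1) - radii ** 2
    hull = ConvexHull(np.column_stack([centers, height]))
    # Facets whose outward normal points downwards form the lower hull.
    lower = hull.equations[:, 2] < 0
    triangles = hull.simplices[lower]
    # A lower facet lies on the plane a*x + b*y + c*z + d = 0; the point with
    # equal power distance to its three circles is (-a / 2c, -b / 2c).
    a, b, c = hull.equations[lower, :3].T
    vertices = np.column_stack([-a / (2 * c), -b / (2 * c)])
    return triangles, vertices

# Made-up input: 30 random circles in a 10 x 10 square.
rng = np.random.default_rng(0)
demo_centers = rng.uniform(0.0, 10.0, size=(30, 2))
demo_radii = rng.uniform(0.1, 0.5, size=30)
triangles, vertices = power_triangulation(demo_centers, demo_radii)
```

Each row of triangles holds the indices of three sites, and the row of vertices at the same position is the power diagram vertex shared by their three cells. Collecting, for every site, the vertices of the triangles around it gives the polygon of that site. Sites whose circle is dominated by its neighbours may not appear in any triangle and then have an empty cell.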

Combine DHS data with Voronoi cells

Next, the DHS GeoDataFrame is combined with the Voronoi cells such that every DHS cluster point is assigned exactly one Voronoi cell. The aim is to create a new ESRI shapefile whose geometry consists of the Voronoi cells. The member method DHSGeographicData.combine_dhs_voronoi(poly_lst) is used for this.

Member method of class DHSGeographicData to combine DHS data with Voronoi Cell map.
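
One way to picture this step is a point-in-polygon lookup: for every cluster point, find the cell that covers it. The sketch below uses shapely and a hypothetical helper assign_cells; it is only an illustration of the idea, and the actual combine_dhs_voronoi method may match points and cells differently (for instance by site index).

```python
from shapely.geometry import Point, Polygon

def assign_cells(points, polygons):
    """For each (x, y) point, return the index of the first polygon that
    touches or contains it, or None if no polygon does."""
    assignment = []
    for x, y in points:
        pt = Point(x, y)
        match = next(
            (i for i, poly in enumerate(polygons) if poly.intersects(pt)),
            None,
        )
        assignment.append(match)
    return assignment

# Two made-up square cells side by side, and three query points.
cells = [
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
]
cell_of_point = assign_cells([(0.5, 0.5), (1.5, 0.5), (5.0, 5.0)], cells)
```

The linear scan is fine for a small example. For the full DHS data set a spatial index or a spatial join would be the more practical choice.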

Clip the combined GeoDataFrame with GADM country outline

The GADM website is used for downloading country-specific maps and spatial data. In the final step, the combined GeoDataFrame of DHS data and Voronoi cells is clipped with the country boundary shapefile downloaded from GADM. The following steps were used for clipping the GeoDataFrame and storing the final output in a shapefile for further use.
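
As a rough sketch of the clipping step, geopandas provides a clip function that cuts the geometries of one GeoDataFrame to the outline of another. The helper name clip_to_boundary and the toy geometries below are assumptions made for illustration; in the real pipeline the two inputs would be the combined DHS and Voronoi GeoDataFrame and the GADM boundary of India.

```python
import geopandas as gpd
from shapely.geometry import box

def clip_to_boundary(cells: gpd.GeoDataFrame,
                     boundary: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Clip the cell polygons to the boundary polygon(s)."""
    # Both layers must share a coordinate reference system before clipping.
    boundary = boundary.to_crs(cells.crs)
    return gpd.clip(cells, boundary)

# Toy data: two 2 x 2 cells and a 2 x 1 "country" overlapping both of them.
toy_cells = gpd.GeoDataFrame(
    {"cell_id": [0, 1]},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
    crs="EPSG:3857",
)
toy_boundary = gpd.GeoDataFrame(geometry=[box(1, 0, 3, 1)], crs="EPSG:3857")
clipped = clip_to_boundary(toy_cells, toy_boundary)
```

The attribute columns of the cells are kept, so the clipped result can be written out with GeoDataFrame.to_file for later use.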

The full pipeline

All the steps of the process can be found in dhs_data_voronoi.ipynb. The images below show the DHS and Voronoi combined GeoDataFrame clipped at the country boundary.

Combined GeoDataFrame clipped with administrative boundary of India as obtained from GADM

Conclusion

Spatial partitioning is the process of dividing a geographic area into a finite number of non-overlapping areas based on a given set of constraints, such as spatial attributes (e.g., physical or human geographic factors); for an overview of existing methods we refer to⁵. Weighted Voronoi diagrams are one such spatial partitioning method. Voronoi diagrams are widely used for human geographic problems, with applications in public facilities optimization, urban planning and zone design. In ecology, they are used to study the growth patterns of forests and forest canopies and may also help in developing predictive models for forest fires.

Weighted Voronoi diagrams were chosen for this project because some social and economic variables are spatially structured: high values tend to concentrate near other high values, and low values appear in geographical proximity to each other. Future work will show whether this approach is indeed useful in predicting social and economic well-being.

About Me

My expertise lies in the areas of Computational Geometry, Geometry Processing and Software Development in C++ and Python. Currently I am developing my skills in the areas of Machine Learning related to Geospatial Computing and exploring the applications of geometry in either of these areas. I look forward to potential collaborations in this field related to socially relevant projects. You can connect with me on LinkedIn and Medium.

Acknowledgements

  • This work was done as part of the Solve For Good project: Creating a well-being data layer using machine learning, satellite imagery and ground-truth data.

I would like to thank :

References

[1] Imai, H., Iri, M. & Murota, K. (1985). Voronoi Diagram in the Laguerre Geometry and Its Applications, SIAM Journal on Computing, 14(1), 93–105. doi:10.1137/0214006

[2] Guidelines On The Use of DHS GPS Data

[3] What is at Zero Degrees Latitude and Zero Degrees Longitude?

[4] GitHub Gist by Devert Alexandre on Laguerre Voronoi Diagrams

[5] Wang, J., Kwan, M.-P., & Ma, L. (2014). Delimiting service area using adaptive crystal-growth Voronoi diagrams based on weighted planes: A case study in Haizhu District of Guangzhou in China. Applied Geography, 50, 108–119. doi:10.1016/j.apgeog.2014.03.001
