Inverse Distance Weighting (IDW) is a spatial interpolation method that estimates unknown values of a spatial variable at specific locations from known values at surrounding points. The idea behind IDW follows Tobler’s first law of geography: ‘Everything is related to everything else, but near things are more related than distant things’. In other words, the closer a spatial unit with a known value is to the spatial unit with an unknown value, the higher its influence on the interpolated value.
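To make the weighting concrete, here is a minimal numeric sketch. It assumes the common form of IDW, in which each known value gets a weight of 1 / distance**power and the weights are normalized to sum to one. The three distances and values are made up for illustration, and the helper name idw_estimate is hypothetical.

```python
import numpy as np

def idw_estimate(distances, values, power=2):
    """Weighted average of known values with weights 1 / distance**power."""
    distances = np.asarray(distances, dtype=float)
    values = np.asarray(values, dtype=float)
    weights = 1.0 / distances**power
    weights /= weights.sum()  # normalize so the weights sum to 1
    return float(np.sum(weights * values))

# Three known points at distances 1, 2 and 4 from the target location
# (made-up numbers, for illustration only)
distances = [1.0, 2.0, 4.0]
values = [10.0, 20.0, 40.0]

estimate = idw_estimate(distances, values, power=2)
plain_mean = float(np.mean(values))
print(round(estimate, 2), round(plain_mean, 2))  # 13.33 23.33
```

The nearest point dominates: the estimate lands much closer to its value of 10 than the unweighted mean does.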
In this article, we test the IDW method by inferring missing country-level population densities, using Africa as an example. For this, I use a world map curated by Natural Earth and enriched with population estimates (more on the public availability of the data here), then artificially erase several data points and infer them using IDW. Finally, I compare the original and the inferred values of the erased population densities.
All images were created by the author.
Data preparation
Here, I am going to rely on GeoPandas’ built-in map dataset, ‘naturalearth_lowres.’ This is a global map sourced from Natural Earth and enriched with country-level population estimates.
Once the global map is imported, let’s filter it down to Africa. Then, let’s compute the area of each country and divide the built-in population estimate by it to arrive at a population density value. Note that a proper area computation would first transform the geometries into an equal-area projected coordinate system. The areas below are computed directly on latitude/longitude coordinates, so they are in square degrees rather than square kilometers. For the sake of simplicity, we skip the reprojection and focus on the interpolation aspect.
# Library import
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
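# Load world countries dataset
# (gpd.datasets was removed in GeoPandas 1.0, so this call needs an older version)
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# Filter for Africa; .copy() avoids pandas' SettingWithCopyWarning below
gdf_africa = gdf[gdf['continent'] == 'Africa'].copy()
# Area is in square degrees here, since the geometries are not reprojected
gdf_africa['area'] = gdf_africa.geometry.area
gdf_africa['pop_density'] = gdf_africa['pop_est'] / gdf_africa['area']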
gdf_africa.head(5)
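The areas computed above are in square degrees. As a hedged sketch of the reprojection step the article skips, the snippet below measures a toy 1 x 1 degree cell before and after reprojecting it to an equal-area coordinate system. The choice of EPSG:6933 (a global cylindrical equal-area CRS with units in meters) is my assumption; a continent-specific equal-area projection would work as well.

```python
import geopandas as gpd
from shapely.geometry import box

# A toy 1 x 1 degree cell at the equator, in geographic coordinates (EPSG:4326)
cell = gpd.GeoSeries([box(0, 0, 1, 1)], crs="EPSG:4326")

# Area in a geographic CRS is in square degrees, which is not a physical unit
# (GeoPandas emits a warning about this)
area_deg2 = cell.area.iloc[0]

# Reproject to an equal-area CRS (assumed here: EPSG:6933) before measuring
area_km2 = cell.to_crs(epsg=6933).area.iloc[0] / 1e6

print(area_deg2, round(area_km2))
```

Applied to the article's data, the same idea would read gdf_africa.to_crs(epsg=6933).geometry.area / 1e6, giving densities in people per square kilometer.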

Next, let’s create a copy of the African map GeoDataFrame and pick a few random countries. Then, let’s replace the population density values of these countries with NaN to simulate missing data.
# get a copy of the original data
gdf_africa_missing = gdf_africa.copy()
# Simulate missing data for 4 randomly picked countries
# by replacing their population density with NaN
indices_to_replace = np.asarray([57, 78, 48, 65])
gdf_africa_missing.loc[indices_to_replace, 'pop_density'] = np.nan
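The index labels above are hard-coded. If you would rather draw them at random in a repeatable way, one option is NumPy's seeded random generator. This is only a sketch: candidate_index is a hypothetical stand-in for the index of the Africa GeoDataFrame, and the seed value is arbitrary.

```python
import numpy as np

# Hypothetical index labels standing in for the index of the Africa GeoDataFrame
candidate_index = np.arange(100, 151)

# A fixed seed makes the random pick repeatable across runs
rng = np.random.default_rng(seed=42)

# Draw 4 distinct labels without replacement
indices_to_replace = rng.choice(candidate_index, size=4, replace=False)

print(indices_to_replace)
```

With the article's data, candidate_index would be gdf_africa_missing.index.values.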
Now, let’s visualize the original and the artificially modified maps of Africa.
# Show missing data
vmin = gdf_africa_missing.pop_density.min()
vmax = gdf_africa_missing.pop_density.max()
f, ax = plt.subplots(1, 2, figsize=(10, 5))
gdf_africa.plot(column='pop_density', ax=ax[0], cmap='pink',
                edgecolor='k', vmin=vmin, vmax=vmax, legend=True)
gdf_africa_missing.plot(color='none', ax=ax[1], edgecolor='k')
gdf_africa_missing.plot(column='pop_density', ax=ax[1], cmap='pink',
                        edgecolor='k', vmin=vmin, vmax=vmax, legend=True)
ax[0].set_title("Population density - Original",
                fontsize=12, pad=12)
ax[1].set_title("Population density - Missing values",
                fontsize=12, pad=12)
for aax in ax:
    aax.axis('off')

Spatial interpolation
After data preparation, it is time for us to perform the IDW interpolation to estimate the missing population density values, as follows.
# Importing scipy
from scipy.spatial import cKDTree
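# Defining a function to perform the IDW
def idw_interpolation(xi, yi, zi, xi_interp, yi_interp, power=2):
    # Spatial index over the known points
    tree = cKDTree(np.c_[xi, yi])
    # Distances and indices of the k=8 nearest known points for each target
    distances, idx = tree.query(np.c_[xi_interp, yi_interp], k=8)
    # Inverse-distance weights, normalized to sum to 1 per target point
    # (assumes no target coincides with a known point, i.e. no zero distance)
    weights = 1 / distances**power
    weights /= weights.sum(axis=1)[:, None]
    # Weighted average of the neighbors' known values
    zi_interp = np.sum(weights * zi[idx], axis=1)
    return zi_interp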
# Prepare data for interpolation
gdf_africa_interpol = gdf_africa_missing.copy()
known = gdf_africa_interpol[gdf_africa_interpol['pop_density'].notna()]
unknown = gdf_africa_interpol[gdf_africa_interpol['pop_density'].isna()]
xi = known.geometry.centroid.x.values
yi = known.geometry.centroid.y.values
zi = known['pop_density'].values
xi_interp = unknown.geometry.centroid.x.values
yi_interp = unknown.geometry.centroid.y.values
# Perform IDW interpolation
zi_interp = idw_interpolation(xi, yi, zi, xi_interp, yi_interp)
# Assign interpolated values back to the GeoDataFrame
gdf_africa_interpol.loc[gdf_africa_interpol['pop_density'].isna(),
'pop_density'] = zi_interp
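The function above divides by the distance, so it assumes that no target point coincides with a known point. It also fixes the exponent at 2 by default. The sketch below, with a hypothetical helper idw_weights and made-up distances, shows one possible way to guard against zero distances and how the power parameter shifts weight toward the nearest neighbor. It is an illustration, not part of the original script.

```python
import numpy as np

def idw_weights(distances, power=2):
    """Normalized IDW weights for one target point, with an exact-match guard."""
    distances = np.asarray(distances, dtype=float)
    exact = distances == 0
    if exact.any():
        # Target coincides with known point(s): share the full weight among them
        return exact / exact.sum()
    weights = 1.0 / distances**power
    return weights / weights.sum()

# Made-up distances to three known points
d = [1.0, 2.0, 4.0]

# Higher powers concentrate the weight on the nearest point
for p in (1, 2, 4):
    print(p, np.round(idw_weights(d, power=p), 3))

# A zero distance no longer produces a division by zero
print(idw_weights([0.0, 2.0, 4.0]))
```

A low power gives a smoother, more averaged estimate, while a high power makes the result approach a nearest-neighbor lookup.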
After the computation, let’s visualize the original and the interpolated maps of Africa and compare the two by eye first.
# Plot the results
f, ax = plt.subplots(1, 2, figsize=(10, 5))
gdf_africa.plot(column='pop_density', ax=ax[0],
                cmap='pink', edgecolor='k',
                vmin=vmin, vmax=vmax, legend=True)
gdf_africa_interpol.plot(color='none',
                         ax=ax[1], edgecolor='k')
gdf_africa_interpol.plot(column='pop_density', ax=ax[1],
                         cmap='pink', edgecolor='k',
                         vmin=vmin, vmax=vmax, legend=True)
ax[0].set_title("Population density - Original",
                fontsize=12, pad=12)
ax[1].set_title("Population density - Interpolated",
                fontsize=12, pad=12)
for aax in ax:
    aax.axis('off')

At first glance, the interpolated values look very similar to the original ones, which is great news. To check this impression, let’s run a correlation analysis as well.
# Compare original and interpolated values for missing countries
missing_countries = set(
    gdf_africa_missing[gdf_africa_missing['pop_density'].isna()].name
)
missing_original = (
    gdf_africa[gdf_africa.name.isin(missing_countries)][['name', 'pop_density']]
    .set_index('name')
)
missing_interpol = (
    gdf_africa_interpol[gdf_africa_interpol.name.isin(missing_countries)]
    [['name', 'pop_density']]
    .rename(columns={'pop_density': 'pop_density_interpol'})
    .set_index('name')
)
interpol_comparison = missing_original.merge(missing_interpol,
                                             left_index=True,
                                             right_index=True)
print(interpol_comparison.corr())
plt.plot(interpol_comparison.pop_density,
interpol_comparison.pop_density_interpol, 'o')
plt.xlabel("Original Population Density")
plt.ylabel("Interpolated Population Density")
plt.title("Comparison of Original and Interpolated Values")
plt.show()

Supporting the side-by-side map comparison, the correlation analysis shows a high correlation between the four interpolated population densities and their ground-truth values. With only four data points, this is an illustration rather than a statistically robust validation, but it suggests that even such a short and simple script using the IDW method can give reasonable estimates of missing spatial data.
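Because four points are a thin basis for a correlation, a leave-one-out check over all known points is one way to get a broader picture. The sketch below runs on synthetic coordinates and values, not on the article's data. The brute-force helper idw_predict is hypothetical, and the mean absolute error is my choice of an additional metric.

```python
import numpy as np

def idw_predict(xy_known, z_known, xy_query, power=2, k=8):
    """Brute-force IDW for one query point using its k nearest known points."""
    d = np.linalg.norm(xy_known - xy_query, axis=1)
    nearest = np.argsort(d)[:k]
    w = 1.0 / d[nearest]**power
    return float(np.sum(w * z_known[nearest]) / w.sum())

# Synthetic stand-in for centroids and densities (NOT the article's data)
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(50, 2))
z = xy[:, 0] + 2 * xy[:, 1] + rng.normal(0, 0.5, size=50)

# Leave-one-out: hide each point in turn and predict it from the rest
predictions = np.empty_like(z)
for i in range(len(z)):
    mask = np.arange(len(z)) != i
    predictions[i] = idw_predict(xy[mask], z[mask], xy[i])

# Summarize the agreement between hidden and predicted values
mae = np.mean(np.abs(predictions - z))
r = np.corrcoef(z, predictions)[0, 1]
print(f"MAE: {mae:.2f}, Pearson r: {r:.2f}")
```

On the real data, xy would hold the country centroids and z the known population densities, so every country contributes one held-out prediction instead of only four.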
Conclusion
In this article, we reviewed how to use the Inverse Distance Weighting method to infer the values of missing spatial data records. As a sample, we used the population density values of African countries: after artificially deleting a few data points, we interpolated them and obtained values that correlate highly with the original ground truth. The method may be useful in many real-world scenarios where spatial information is truly missing.
In case you would like to advance your skills in spatial statistics and machine learning, check out my brand new book, Geospatial Data Science Essentials – 101 Practical Python Tips and Tricks!