
What is new in Geopandas 0.7.0?

Major changes and new improvements with examples and code illustrations.

Geopandas is the workhorse of geospatial data science in Python. It extends the data types of Pandas to support spatial operations. Geopandas 0.7.0 was released yesterday, 17 February, and with it come some significant changes and improvements.

Here I highlight some of the best new features available in Geopandas 0.7.0.

Native Clip functionality

Clipping is now easy thanks to a native function that clips a GeoDataFrame to the spatial extent of other shapes. Clipping is one of the most commonly used geospatial data processing operations; however, previous releases of Geopandas had no straightforward function for it. If you want a specific area of your geographic data, you have to clip the data to your area of interest.

Now, with the new geopandas.clip function, you can easily clip your data to the spatial extent you provide. Let us see an example.

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon, LineString
gpd.__version__
'0.7.0'

We use the datasets that ship with Geopandas: the capital cities and the world boundaries. We also subset Africa and create a polygon to mark the spatial extent we are interested in.

capital_cities = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"))
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
africa = world[world["continent"] == "Africa"]
poly = Polygon([(-35, 0), (-35, 60), (60, 60), (60, 0), (0, 0)])
polygon = gpd.GeoDataFrame([1], geometry=[poly], crs=world.crs)

Let us see all the data in one image.

fig, ax = plt.subplots(figsize=(12,10))
world.plot(ax=ax)
africa.plot(ax=ax, edgecolor="black", color = "brown", alpha=0.5)
capital_cities.plot(ax=ax, color="Red")
polygon.boundary.plot(ax=ax, color="Red")
Map of all data

The capital cities are marked as red dots, and Africa is shaded brown. The red rectangle shows the extent we are interested in.

Now, if we want to clip the world boundaries to the red rectangle, we can call geopandas.clip and provide the two GeoDataFrames: first the world boundaries, then the polygon extent.

clipped = gpd.clip(world, polygon)
fig, ax = plt.subplots(figsize=(12,10))
clipped.plot(ax=ax, color="gray");

And you have the clipped areas, parts of Africa, Europe and Asia, as shown below.

Clipped boundaries

You can use this function not only to clip polygons with polygons but also to clip other geometry types such as lines and points.
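To illustrate clipping points and lines, here is a minimal sketch. The toy geometries, the column name and the variable names are assumptions made for illustration; they are not part of the article's datasets.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# Hypothetical toy data (not from the article): a 10 x 10 square mask.
mask = gpd.GeoDataFrame(
    geometry=[Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])], crs="EPSG:4326"
)
points = gpd.GeoDataFrame(
    {"name": ["inside", "outside"]},
    geometry=[Point(5, 5), Point(20, 20)],
    crs="EPSG:4326",
)
lines = gpd.GeoDataFrame(
    {"name": ["crossing"]},
    geometry=[LineString([(5, 5), (15, 5)])],
    crs="EPSG:4326",
)

# Points outside the mask are dropped; lines are cut at the mask boundary.
clipped_points = gpd.clip(points, mask)
clipped_lines = gpd.clip(lines, mask)

print(clipped_points["name"].tolist())
print(clipped_lines.geometry.iloc[0].length)
```

Only the point inside the square survives, and the line is shortened to the part that lies within the mask.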

Filter Rows while reading files

With today's big datasets, the ability to filter rows while reading the data is essential. Imagine having a dataset with millions of rows and not being able to read it due to memory issues. With this new functionality, you can provide a number of rows or a slice to filter the data while reading it.

Let us see an example.

cities_filtered = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"), rows=30)

This reads only the first 30 rows of the data. If you want rows from the middle of the file, you can pass a slice instead.

cities_filtered_slice = gpd.read_file(gpd.datasets.get_path("naturalearth_cities"), rows=slice(10, 30))

Only rows 10 through 29 are returned by the above code, because the end of the slice is exclusive.

Plot Geometry Collection

Previously, it was not possible to plot GeometryCollection data, that is, data that mixes different geometry types (Points, Lines or Polygons) in a single object. With the current release, you can plot such collections of geometric objects. Let us see an example.

a = LineString([(0, 0), (1, 1), (1,2), (2,2)])
b = LineString([(0, 0), (1, 1), (2,1), (2,2)])
x = a.intersection(b)
gc = gpd.GeoSeries(x)
type(x)
shapely.geometry.collection.GeometryCollection

a and b are just plain LineStrings, but their intersection returns a collection of geometries (lines and points). If you create a GeoSeries out of the collection of geometries in x, you can now plot it with Geopandas. See the example shown below. Both a point and a line are plotted from the same GeoSeries.

Collection of Geometries plotted with Geopandas
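A plot like this can be produced along the following lines. This is a sketch: the figure size, the colour and the non-interactive Agg backend are assumptions chosen so the snippet also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt

import geopandas as gpd
from shapely.geometry import LineString

a = LineString([(0, 0), (1, 1), (1, 2), (2, 2)])
b = LineString([(0, 0), (1, 1), (2, 1), (2, 2)])
x = a.intersection(b)  # shared segment plus a shared end point

# Wrap the GeometryCollection in a GeoSeries and plot it on one axis.
collection = gpd.GeoSeries([x])
fig, ax = plt.subplots(figsize=(6, 6))
collection.plot(ax=ax, color="red")

# List the geometry types of the members of the collection.
print(sorted(part.geom_type for part in x.geoms))
```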

PROJ 6 replaces PROJ 4

The Geopandas 0.7.0 release starts using a new projection interface. PROJ 6 replaces PROJ 4 and brings with it a better interface and additional information.

Previously, the crs attribute of a GeoDataFrame returned only a dictionary like this.

{'init': 'epsg:4326'}

However, the current release provides a lot of metadata about the geographic projection of the data.

world.crs

returns the following useful information

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

You can also access each part individually. For example, if you want the datum of your projection, you can call:

world.crs.datum

And the result shows this part only

DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84",6378137,298.257223563, LENGTHUNIT["metre",1]], ID["EPSG",6326]]
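Other parts of the CRS object can be read in the same way. The sketch below uses a one-point GeoDataFrame created only for illustration; the attributes shown come from the pyproj CRS object that Geopandas now returns.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical one-row GeoDataFrame, used only to obtain a CRS object.
gdf = gpd.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:4326")
crs = gdf.crs

print(crs.name)           # human-readable name of the CRS
print(crs.to_epsg())      # EPSG code, if one can be matched
print(crs.is_geographic)  # True for latitude/longitude systems
print(crs.ellipsoid.name) # name of the reference ellipsoid
```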

There are also other improvements and bug fixes that are not covered in this article but are worth mentioning. Spatial join in Geopandas can now handle a MultiIndex correctly and preserves the index name of the left GeoDataFrame. When writing a file to disk, you can now keep the index if you want. Plotting choropleth maps with missing data is also available with this release.

Conclusion

Thanks to all the contributors to this release. Geopandas 0.7.0 brings a lot of improvements to geospatial data science workflows. Clipping, filtering data while reading, and plotting mixed-geometry data are now possible thanks to this release. PROJ 6 also brings a better user interface to one of the least understood topics in the geospatial world: geographic projections.

The code for this article is available in this Google Colab link.


