
7 Geospatial data processing tips in Python

Part 2: How to easily and effectively incorporate spatial features in Python using Geopandas.

Photo by Lucas Ludwig on Unsplash

I love using Geopandas, and over the last two months, I have been sharing some of the best tips and tricks on processing geospatial data with Geopandas. This article is the second part of the series, where I share another seven tips and tricks to make your life easier when dealing with geospatial data in Python.

The first part of this Geopandas tips series covers five pro tips, and you can read it at the link below.

5 Geospatial Tips and Tricks in Python

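This article covers seven tips: converting a CSV to a Geodataframe, the geometry filter, dissolve, bubble maps, finding the nearest distance, exporting to GeoJSON/Geopackage, and reading from PostGIS.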


Tip #1 – CSV to Geodataframe

A lot of datasets come in CSV format, and many of these datasets have coordinates (latitude and longitude). Converting these datasets to a Geodataframe opens up the geospatial processing functionality in Geopandas.

To convert a CSV to a Geodataframe, we first read the CSV file with Pandas, as shown in the snippet below. Then, we transform the data frame into a Geodataframe using the points_from_xy function in Geopandas.

Converting a data frame to a Geodataframe using Geopandas - Image by Author

Having converted our data frame to a Geodataframe, we can now perform geometry operations, including plotting geospatial data, calculating distances, and many more.
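
The conversion described above can be sketched roughly as follows. This is a minimal example rather than the author's original snippet: the CSV content, the column names (latitude, longitude) and the WGS84 CRS are assumptions made for illustration.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical CSV content; in practice you would pass a file path to read_csv.
csv_text = """name,latitude,longitude
Stockholm,59.3293,18.0686
Oslo,59.9139,10.7522
Copenhagen,55.6761,12.5683
"""

df = pd.read_csv(io.StringIO(csv_text))

# points_from_xy expects x (longitude) first, then y (latitude).
gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",  # assumption: the coordinates are WGS84 degrees
)

print(gdf.head())
```

Note the argument order: longitude is the x coordinate and latitude is the y coordinate, which is easy to swap by accident.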

Tip #2 – Geometry Filter

With pandas, we can filter data by rows or columns. With a Geodataframe, in addition to the pandas filtering, you can filter data by geometry. The following example reads the famous taxi data. Because this dataset is a relatively large file, we can avoid reading all of it by using a geometry filter that reads only the points within the Manhattan boundary.

Filtering data by the geometry - Image by Author

The mask parameter takes a geometry, so here we pass the boundary of the borough we selected by name, and Geopandas excludes the points in all other boroughs while reading the dataset.

Tip #3 – Dissolve

We often work with polygons at different levels of geographic boundaries, such as zip codes, neighbourhoods, and districts. How do you merge these subunits without losing the statistical counts from the original dataset?

With non-geographic data, all we need is the groupby function. However, with spatial data, we also need to aggregate the geometric features. In that case, we can use dissolve. The following example shows dissolving country boundaries into continents.

Dissolving boundaries - Image by Author

The following visualization shows the country boundaries (upper left) dissolved into continent boundaries (lower right) using the above code.

Dissolve Example (Country boundaries to Continents) - Image by Author

Note that dissolve needs a column to group by. You can also apply different aggregation statistics; in this example, we have shown how to calculate the sum.
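
A minimal sketch of dissolve with a sum aggregation might look like the following. The toy squares, the continent labels and the pop_est column are invented for illustration; with a real country boundaries dataset the call would be the same.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: four square "countries" on two "continents".
countries = gpd.GeoDataFrame(
    {
        "name": ["A1", "A2", "B1", "B2"],
        "continent": ["A", "A", "B", "B"],
        "pop_est": [10, 20, 30, 40],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 0, 6, 1), box(6, 0, 7, 1)],
)

# Keep only the grouping column, the geometry, and the numeric column to sum,
# so that text columns such as "name" are not aggregated.
continents = countries[["continent", "geometry", "pop_est"]].dissolve(
    by="continent", aggfunc="sum"
)

print(continents)
```

The result has one row per continent: the geometries are merged into a single shape and the population estimates are summed.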

Tip #4 – Create Bubble Maps

Bubble maps are an excellent alternative to choropleth maps when you have a point dataset and want to visualize a quantity by size. It is not that hard to create bubble maps in Geopandas, but how to make one is not obvious.

To create a bubble map, you only need to set the markersize parameter to any quantity column, as shown in the following code snippet.

Creating a bubble map with Geopandas - Image by Author

You also need to take care of overlapping circles, so I set alpha to a low value here to make the bubbles transparent. You can see the output visualization below.
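
The bubble map described above can be sketched as follows. The points, the cases column and the scaling factor are hypothetical; the only essential parts are markersize and alpha.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs headless

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

# Hypothetical point data with a quantity column.
gdf = gpd.GeoDataFrame(
    {"city": ["a", "b", "c"], "cases": [100, 400, 900]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 0.5)],
)

fig, ax = plt.subplots(figsize=(6, 4))

# markersize scales each point by its quantity (divided by 2 here purely to
# keep the bubbles a reasonable size); a low alpha keeps overlaps readable.
gdf.plot(ax=ax, markersize=gdf["cases"] / 2, alpha=0.3)
ax.set_axis_off()
```

In a notebook you can drop the matplotlib.use("Agg") line and the figure will render inline. Marker sizes are in points squared, so some rescaling of the quantity column is usually needed.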

Bubble Map - Image by Author

Tip #5 – Find Nearest Distance

Calculating distances and finding what is nearby are part and parcel of spatial analysis. In this tip, I show you how to calculate distances and find the nearest point effectively.

Let us say we have a dataset of cities. The nearest_city function takes a Point (latitude and longitude) and the cities Geodataframe and returns the nearest city to the provided coordinates.

Nearest neighbour calculation - Image by Author

You can use your own data as well. It does not have to be cities: with any point dataset, you can now find the nearest neighbour using this function.
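
One possible way to write such a nearest_city function is sketched below. The original implementation is not shown in the text, so this version, which takes the minimum of the Geopandas distance method, is an assumption, as are the example cities.

```python
import geopandas as gpd
from shapely.geometry import Point


def nearest_city(point, cities):
    """Return the row of `cities` whose geometry is closest to `point`."""
    distances = cities.geometry.distance(point)
    return cities.loc[distances.idxmin()]


# Hypothetical cities; shapely Points take (longitude, latitude).
cities = gpd.GeoDataFrame(
    {"name": ["Paris", "Berlin", "Madrid"]},
    geometry=[Point(2.35, 48.86), Point(13.40, 52.52), Point(-3.70, 40.42)],
)

closest = nearest_city(Point(4.84, 45.76), cities)  # a point near Lyon
print(closest["name"])
```

The distance here is a planar distance in the units of the coordinates (degrees in this toy example), which is fine for ranking nearby points but not for measuring. For distances in metres, reproject both the point and the Geodataframe to a projected CRS first.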

Tip #6 – To Geojson/Geopackage

Who does not like shapefiles? Besides shapefiles, we can also export geo-processed data to other formats and store it locally. In this example, I show how you can store Geodataframes as GeoJSON and Geopackage files.

Storing Geospatial data as Geojson/Geopackage. - Image by Author

With GeoJSON, you need to set the driver to GeoJSON. To save in Geopackage format, you need to set the layer name and set the driver to GPKG.

Tip #7 – Read from PostGIS

Become a power user and set up your PostGIS database. With Geopandas, you can read the data using SQL. This code snippet goes through an example of reading data from a PostgreSQL database using Geopandas and Sqlalchemy.

Connecting to a PostgreSQL/PostGIS database - Image by Author

SQL queries can be passed as strings, and Geopandas will execute them. This method is a great way to access your data through a spatial database. You can get started with installing and using PostgreSQL by following this article:

Spatial Data Science with PostgreSQL/PostGIS
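
For reference, reading from PostGIS as described in this tip might look roughly like the sketch below. The connection URL, the table name (neighbourhoods) and the geometry column name (geom) are all hypothetical, and the function is only defined here, not called, because it needs a running database and a PostgreSQL driver.

```python
import geopandas as gpd

# Hypothetical connection details and table/column names; replace with your own.
DB_URL = "postgresql://user:password@localhost:5432/gisdb"
SQL = "SELECT name, geom FROM neighbourhoods"


def read_spatial_table(sql=SQL, db_url=DB_URL, geom_col="geom"):
    """Run a SQL query against PostGIS and return the result as a Geodataframe."""
    # Imported here so this module can be loaded without SQLAlchemy installed.
    from sqlalchemy import create_engine

    engine = create_engine(db_url)
    # geom_col tells Geopandas which column in the result holds the geometry.
    return gpd.read_postgis(sql, engine, geom_col=geom_col)


# Usage, assuming the database above exists:
# neighbourhoods = read_spatial_table()
```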

Conclusion

I hope you enjoyed this round of Geospatial data processing tips in Python. In this article, we have seen seven tips to use Geopandas for geospatial data analysis effectively.

Some of these tips are direct geometric manipulations that are essential for processing geospatial data: converting a Dataframe to a Geodataframe, the geometry filter, and dissolving. For a geospatial data scientist, having Geopandas do the heavy lifting of these geometric operations is a lifesaver.

The remaining tips touch on geospatial data visualization, spatial database connections, finding the nearest neighbour, and geospatial vector data format output.

If you would like to follow these tips and tricks as I post them on Twitter, you can find them at @spatialML.

