
GeoPandas: A Practical Guide

Mapping earthquakes from 1965 to 2016

Getting Started

Photo by Andrew Buchanan on Unsplash

GeoPandas is a Python library designed to work with geospatial data. It makes it fairly easy to create visualizations based on geographical locations.

In this post, we will visualize the significant earthquakes that occurred between 1965 and 2016. The dataset is available on Kaggle.

GeoPandas has two main data structures: GeoSeries and GeoDataFrame. They are subclasses of the pandas Series and DataFrame, respectively.

Let’s start by installing and importing GeoPandas along with the other libraries we will use.

pip install geopandas
import geopandas
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

A GeoDataFrame can be created by modifying a pandas DataFrame. Thus, we will first read the dataset into a pandas DataFrame.

eq = pd.read_csv("/content/earthquakes.csv")
eq.shape
(23412, 21)

The dataset contains 23,412 events, most of which are earthquakes. We will filter the DataFrame so that it only contains earthquake data.

eq = eq[eq['Type'] == 'Earthquake']
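Before filtering, it can help to check which event types are present. Here is a minimal sketch on a few made-up rows; the sample values are illustrative and not taken from the Kaggle file, and only the "Type" column name comes from the dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the Kaggle file; values are made up.
sample = pd.DataFrame({
    "Type": ["Earthquake", "Nuclear Explosion", "Earthquake", "Explosion"],
    "Magnitude": [6.0, 5.6, 7.1, 5.5],
})

# Count how many rows belong to each event type.
type_counts = sample["Type"].value_counts()

# Keep only the earthquakes; .copy() gives an independent DataFrame
# so later column assignments do not trigger chained-assignment warnings.
quakes = sample[sample["Type"] == "Earthquake"].copy()

print(type_counts)
print(len(quakes))
```

Running value_counts on the real eq DataFrame in the same way shows how many rows the filter will drop.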

Some columns are not needed for our analysis, so I will keep only the relevant ones.

eq = eq[['Date', 'Time', 'Latitude', 'Longitude', 'Depth', 'Magnitude']]
eq.head()
(image by author)

We have a DataFrame that contains the date, location, depth, and magnitude of over 20 thousand earthquakes. In order to use GeoPandas, we need to convert this pandas DataFrame to a GeoDataFrame.

We will use the GeoDataFrame constructor as follows:

gdf = geopandas.GeoDataFrame(eq, geometry=geopandas.points_from_xy(eq.Longitude, eq.Latitude))
gdf.head()
(image by author)

What distinguishes a GeoDataFrame from a pandas DataFrame is a GeoSeries called "geometry". When a spatial method is applied to a GeoDataFrame, it acts on the geometry column.

Think of the "geometry" column as a re-formatted version of the latitude and longitude values.
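To make this concrete, here is a small sketch with two made-up coordinate pairs. The values are hypothetical, and setting crs="EPSG:4326" is my assumption that the coordinates are plain WGS84 degrees; the original conversion above does not set a CRS.

```python
import pandas as pd
import geopandas
from shapely.geometry import Point

# Hypothetical coordinates; values are made up for illustration.
df = pd.DataFrame({
    "Latitude": [35.0, -20.5],
    "Longitude": [139.0, -70.25],
})

# x is longitude and y is latitude, so Longitude is passed first.
# Assumption: the coordinates are WGS84 degrees, hence EPSG:4326.
gdf_small = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df["Longitude"], df["Latitude"]),
    crs="EPSG:4326",
)

# Each entry of the geometry column is a shapely Point.
first = gdf_small.geometry.iloc[0]
print(type(first).__name__, first.x, first.y)

# The GeoDataFrame is still a pandas DataFrame, and its geometry
# column is still a pandas Series.
print(isinstance(gdf_small, pd.DataFrame))
print(isinstance(gdf_small.geometry, pd.Series))
```

The x and y accessors on the geometry column return the original longitude and latitude values, which is why the column can be seen as a re-packaged form of those two columns.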

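We now have the earthquake data stored in a GeoDataFrame. The next step is to draw a map of the world, which can easily be done using the "naturalearth_lowres" dataset that ships with GeoPandas. Note that the geopandas.datasets module was deprecated in GeoPandas 0.14 and removed in 1.0, so the call below requires an older version; with newer versions, the Natural Earth data has to be downloaded separately and read with read_file.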

world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
world.columns
Index(['pop_est', 'continent', 'name', 'iso_a3', 'gdp_md_est', 'geometry'], dtype='object')

It contains basic information about countries and their locations. Let’s draw an empty world map now.

world.plot(color='white', edgecolor='black', figsize=(12,8))
(image by author)

In order to draw the earthquake map, we will create an Axes object of the world map and then draw the earthquakes based on the "geometry" column.

ax = world.plot(color='white', edgecolor='black', figsize=(16,12))
gdf.plot(ax=ax, color='red', markersize=2)
plt.show()
(image by author)

This map contains all the significant earthquakes that occurred between 1965 and 2016. If you do a quick Google search for earthquake fault lines, you will see that they line up with the clusters of earthquakes on the map above.

The markersize parameter adjusts the size of the markers that locate the earthquakes. You can also pass a column name and the size of the marker will be adjusted based on the value in that column. I thought about using the magnitude to resize the markers but the differences did not seem to be distinguishable.
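One possible way to make magnitude differences more visible is to scale the marker sizes non-linearly. The sketch below is my own assumption rather than part of the GeoPandas API: the helper name marker_sizes and its base and growth parameters are hypothetical, and the sample magnitudes are made up.

```python
import numpy as np

def marker_sizes(magnitudes, base=2.0, growth=4.0):
    """Return marker sizes that grow exponentially above the smallest magnitude.

    The smallest magnitude gets size `base`; each additional unit of
    magnitude multiplies the size by `growth`.
    """
    mags = np.asarray(magnitudes, dtype=float)
    return base * growth ** (mags - mags.min())

# Made-up magnitudes one unit apart.
sizes = marker_sizes([5.5, 6.5, 7.5])
print(sizes)
```

The result could then be passed to the plot, for example gdf.plot(ax=ax, color='red', markersize=marker_sizes(gdf['Magnitude'])). The growth factor is a matter of taste; larger values exaggerate the strongest earthquakes more.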

We can also draw a map of earthquakes in a specific location. For instance, Japan has had many earthquakes. It may even be the country with the most earthquakes in the world, although I have not verified this.

One way to focus on a specific country is to filter the earthquakes based on latitude and longitude values. The latitude and longitude of a central point in Japan are:

  • latitude = 36.204824
  • longitude = 138.252924

We can create a range around these values to be used as filtering ranges.

japan_lat = 36.204824
japan_long = 138.252924
japan_eq = eq[(eq.Latitude > 30) & (eq.Latitude < 42) & (eq.Longitude > 130) & (eq.Longitude < 145)]
japan_eq = japan_eq.reset_index(drop=True)

I adjusted the ranges so that the resulting bounding box covers Japan. Please note that these values are not the borders of Japan, so the box also includes earthquakes in the surrounding area.
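The same kind of bounding-box filter can also be written directly on a GeoDataFrame with the cx coordinate indexer. The sketch below uses three made-up points instead of the real data. One difference to keep in mind: cx keeps geometries that intersect the box, so points lying exactly on the boundary are included, whereas the comparison above uses strict inequalities.

```python
import pandas as pd
import geopandas

# Hypothetical points: two inside the box used above, one far outside it.
pts = pd.DataFrame({
    "Latitude": [35.7, 38.3, -33.4],
    "Longitude": [139.7, 142.4, -70.6],
})
pts_gdf = geopandas.GeoDataFrame(
    pts,
    geometry=geopandas.points_from_xy(pts["Longitude"], pts["Latitude"]),
)

# cx slices by coordinates in the order [xmin:xmax, ymin:ymax],
# which means longitude first and latitude second.
in_box = pts_gdf.cx[130:145, 30:42]
print(len(in_box))
```

Applied to the full GeoDataFrame from earlier, gdf.cx[130:145, 30:42] would return a GeoDataFrame directly, so no second conversion step is needed.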

Let’s create a GeoDataFrame that only contains the earthquakes that occurred in or around Japan.

japan_gdf = geopandas.GeoDataFrame(japan_eq, geometry=geopandas.points_from_xy(japan_eq.Longitude, japan_eq.Latitude))

We will plot the map of Japan and mark the earthquakes in japan_gdf.

ax = world[world.name == 'Japan'].plot(color='white', edgecolor='black', figsize=(12,8))
japan_gdf.plot(ax=ax, color='blue', markersize=japan_gdf['Magnitude']*4)
plt.title("Earthquakes, 1965-2016", fontsize=16)
plt.show()

In order to filter the map, we used the "name" column of the world GeoDataFrame.

Here is the resulting map of Japan’s earthquakes.

(image by author)

Many earthquakes have occurred in and around Japan. The density of the markers indicates that most of them are along the east coast.


Conclusion

GeoPandas is a functional library that expedites the process of creating geospatial visualizations. It provides many functions and methods to enrich the maps.

If you are working or plan to work with geospatial data, I highly recommend checking the GeoPandas documentation. It also provides examples that make it easier to adopt the functions and methods.

Thank you for reading. Please let me know if you have any feedback.

