Simple Plotly Tutorials

Part 1: Creating Beautiful Animated Maps

Gabrielgilling
Towards Data Science

Image by Author

Animated maps are an effective way to visualize and communicate data with geographic properties. In this tutorial, you will learn how to use the Plotly Express package in Python to quickly make beautiful maps with interactive features.

Plotly is one of the fastest growing visualization libraries available for data scientists, a testament to its ease of use and to the beautiful graphs it can produce. This is the first tutorial in a series of 3 showcasing Plotly capabilities with increasing complexity. This installment will show you how to use Plotly Express to quickly make animated maps. Later tutorials will in turn focus on further customizing Plotly graphs and visualizing them within Dash apps.

The Vancouver Crime Dataset

For this tutorial, we’ll make use of a Kaggle dataset, which provides data on different types of crime in the city of Vancouver, Canada.

Before starting to code, you’ll need to install the following packages in Python. Simply run the following pip installs in your terminal or in your Jupyter Notebook:

  • pip install plotly
  • pip install geopandas

Let’s load in the required packages and the dataset. We can create a Date column using the parse_dates argument as we read in the CSV file. Here, I choose to create the Date column using the MONTH and YEAR columns, but you can also include the HOUR and MINUTE columns if you’d like.
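As a rough sketch of this loading step, the snippet below reads two made-up rows that stand in for the Kaggle CSV and builds a monthly Date column from YEAR and MONTH. The column names and the crime.csv file name are assumptions about the dataset's layout. Recent pandas releases deprecate combining several columns through parse_dates, so this sketch assembles the same column with pd.to_datetime instead.

```python
import io

import pandas as pd

# Two made-up rows standing in for the Kaggle file. With the real data you
# would call pd.read_csv("crime.csv") instead (file name is an assumption).
sample_csv = io.StringIO(
    "TYPE,YEAR,MONTH,DAY,HOUR,MINUTE,NEIGHBOURHOOD,Latitude,Longitude\n"
    "Theft of Vehicle,2003,5,12,16,15,Strathcona,49.2698,-123.0837\n"
    "Mischief,2004,11,3,9,30,Kitsilano,49.2660,-123.1652\n"
)
df = pd.read_csv(sample_csv)

# pd.to_datetime can assemble dates from columns named year/month/day.
# The day is fixed to 1 because we only want monthly resolution here.
date_parts = df[["YEAR", "MONTH"]].rename(columns=str.lower).assign(day=1)
df["Date"] = pd.to_datetime(date_parts)

print(df[["TYPE", "YEAR", "MONTH", "Date"]])
```

To include HOUR and MINUTE as well, add them to date_parts as hour and minute columns before calling pd.to_datetime.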

First 5 rows of the Vancouver Crime Dataset. Image by Author

It’s good practice to plot data before starting an analysis in order to spot anything unusual. When looking at the distribution of the Latitude and Longitude features, I noticed a couple of 0 values that distorted the plot: the affected points showed up on the other side of the world, making it impossible to visualize the data in Vancouver!

Let’s get rid of them, and work with a sample of 2000 rows for the purpose of this tutorial.
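A minimal sketch of this cleanup step, run on randomly generated coordinates rather than the real dataset. The Latitude and Longitude column names are assumed to match the CSV, and the random_state value is an arbitrary choice that only makes the sample reproducible.

```python
import numpy as np
import pandas as pd

# Synthetic coordinates roughly covering Vancouver, for illustration only.
rng = np.random.default_rng(0)
n_rows = 5000
df = pd.DataFrame({
    "Latitude": rng.uniform(49.20, 49.30, n_rows),
    "Longitude": rng.uniform(-123.22, -123.02, n_rows),
})

# Plant ten rows with zero coordinates to mimic the bad records.
df.loc[:9, ["Latitude", "Longitude"]] = 0.0

# Keep only rows whose coordinates are both non-zero.
df = df[(df["Latitude"] != 0) & (df["Longitude"] != 0)]

# Work with a reproducible random sample of 2000 rows.
df = df.sample(n=2000, random_state=42).reset_index(drop=True)

print(len(df))
```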

Mapping Vancouver

1. Scatterplots

The “static” view

Before I do any “fancy” work with libraries like Plotly, I always like to look at a simple version of my maps in order to get a feel for how things should look. Typically, this involves finding a shapefile or a geojson, and the City of Vancouver’s Open Data Portal has them handy. Since we’ll need the geojson file for mapping choropleth polygons later in the tutorial, let’s go ahead and download it. We load in the geojson with the Geopandas library and plot all of the incidents in our dataset over it.
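One way this static view could be put together is sketched below. To keep the snippet self-contained, two rectangles stand in for the neighborhood polygons and four made-up incidents stand in for the crime data. With the real download you would load the boundaries with gpd.read_file on the geojson file instead. The TYPE, Latitude and Longitude column names are assumptions.

```python
import geopandas as gpd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from shapely.geometry import box

# Stand-in polygons. With the real file: neighborhoods = gpd.read_file(<path to geojson>)
neighborhoods = gpd.GeoDataFrame(
    {"name": ["West", "East"]},
    geometry=[
        box(-123.22, 49.20, -123.12, 49.30),
        box(-123.12, 49.20, -123.02, 49.30),
    ],
    crs="EPSG:4326",
)

# Made-up incidents, for illustration only.
incidents = pd.DataFrame({
    "TYPE": ["Mischief", "Theft of Vehicle", "Mischief", "Theft of Bicycle"],
    "Latitude": [49.25, 49.27, 49.22, 49.28],
    "Longitude": [-123.15, -123.08, -123.05, -123.19],
})

# Turn the coordinate columns into point geometries.
points = gpd.GeoDataFrame(
    incidents,
    geometry=gpd.points_from_xy(incidents["Longitude"], incidents["Latitude"]),
    crs="EPSG:4326",
)

# Draw the polygons first, then the incidents on top, colored by crime type.
fig, ax = plt.subplots(figsize=(8, 6))
neighborhoods.plot(ax=ax, color="white", edgecolor="black")
points.plot(ax=ax, column="TYPE", legend=True, markersize=10)
ax.set_title("Distribution of Crime Types Across Vancouver")
```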

Distribution of Crime Types Across Vancouver. Image by Author

This looks nice enough but the map looks crowded, making it difficult to derive insights. Let’s look at how Plotly Express can be used to show our audience how crime evolves with time.

The “dynamic” view

Mapbox is a service that Plotly uses to display scatter data on a map. Because it is a third-party service, you will need to generate an access token for yourself in order to display its maps. This can be easily done by following the instructions here. All you need to do is create an account with Mapbox and you will have access to your token. The entire process is free. Once you have your token, simply replace the “your_token” string in the code block below with yours.

px.set_mapbox_access_token("your_token")

You’re now ready to go!

Animations with Plotly Express functions can be quickly implemented by setting a feature as an animation_frame, which will use the feature’s values to subset and display your data. For instance, setting time_col as “YEAR” will allow you to visualize crime over all of the years in the dataset, “MONTH” for all months and so on.

We can specify the scatterplot’s color by setting a “color” parameter, the same way Seaborn’s “hue” parameter works. By default, the values passed to the “animation_frame” argument (which dictate the order in which maps are animated) aren’t ordered, so we add a “category_orders” argument: a sorted list with the values to iterate over.

Voila! We have ourselves a nice interactive map. If you want to show how the data evolves across months (or hours etc.) instead, simply change the time_col argument in the function above.

Animated Scatterplot of crime data in Vancouver. Gif by Author

Notes:

  • Because Plotly Scatter Maps rely on the Mapbox service, a nice advantage of using them is that they will automatically display the map based on the coordinates you provide, with no need to supply a geographic file! However, you will need to provide a geojson file when displaying choropleth maps, as the next section shows.
  • As the gif shows, you can click on the categories in the Type variable to filter the data points shown on the map! This provides great initial interactivity and can help you guide how you convey your insights.

2. Choropleth Maps

The “static” view

Another great way to plot the crime data is to visualize the number of incidents per neighborhood using a choropleth map, and then to show how those numbers evolve with time. When displaying a choropleth map, we color the polygons (each polygon corresponding to a neighborhood) according to the underlying value we want to visualize. In the following example, we’ll shade each polygon according to the number of crime incidents that occur within it, with darker shades representing higher numbers of incidents and lighter shades representing lower numbers.

Before we plot anything, the dataset needs some additional manipulating. First, I noticed a couple of naming discrepancies between the geojson and crime data, so I renamed a couple of the neighborhoods to make sure their names are consistent. I also drop observations for the Stanley Park neighborhood, which unfortunately is missing in the geojson file.
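The sketch below shows one way to find and fix such discrepancies. The neighborhood names, the rename mapping and the NEIGHBOURHOOD column name are illustrative assumptions, not the exact list of mismatches in the real files. Comparing the two sets of names first tells you what needs renaming or dropping.

```python
import pandas as pd

# Made-up crime records and geojson names, for illustration only.
crime = pd.DataFrame({
    "NEIGHBOURHOOD": ["Arbutus Ridge", "Stanley Park", "Kitsilano", "Arbutus Ridge"],
})
geojson_names = {"Arbutus-Ridge", "Kitsilano"}

# Names that appear in the crime data but have no polygon in the geojson.
mismatched = sorted(set(crime["NEIGHBOURHOOD"]) - geojson_names)
print(mismatched)

# Illustrative mapping from crime-data spelling to geojson spelling.
rename_map = {"Arbutus Ridge": "Arbutus-Ridge"}
crime["NEIGHBOURHOOD"] = crime["NEIGHBOURHOOD"].replace(rename_map)

# Drop the neighborhood that has no polygon at all.
crime = crime[crime["NEIGHBOURHOOD"] != "Stanley Park"].reset_index(drop=True)
```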

We can now plot the reformatted data.

Choropleth Map of Cumulative Crime Incidents in Vancouver Neighborhoods. Image by Author

As we can see, each neighborhood in Vancouver is represented by a polygon whose color intensity is proportional to the number of crime incidents that occurred within it. Neighborhoods in the north of the city tend to have higher crime counts than those in the south, with the CBD (Central Business District) neighborhood having the highest crime counts over the entire sample, and the neighborhood with white shading having the lowest crime count.

The “dynamic” view

Let’s animate the graph above. For each timestamp, we want to visualize the distribution of cumulative incidents across the different neighborhoods in Vancouver.

We need to go through a couple of additional steps before we can work with the finalized Dataframe, which will contain the cumulative sum of all incidents per neighborhood at each timestamp in the data.
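A minimal sketch of these steps on made-up records: count the incidents per neighborhood and date, then take a running total within each neighborhood. The NEIGHBOURHOOD, Date and Counts column names are assumptions, and the counts_rolling name simply mirrors the Dataframe discussed in this section.

```python
import pandas as pd

# Made-up incidents: one row per recorded crime.
df = pd.DataFrame({
    "NEIGHBOURHOOD": ["Kitsilano", "Kitsilano", "Kitsilano",
                      "Strathcona", "Strathcona", "Kitsilano"],
    "Date": pd.to_datetime(["2003-01-01", "2003-01-01", "2003-03-01",
                            "2003-02-01", "2003-02-01", "2003-03-01"]),
})

# Number of incidents per neighborhood and date.
counts = df.groupby(["NEIGHBOURHOOD", "Date"]).size().reset_index(name="Counts")

# Running total within each neighborhood, in chronological order.
counts_rolling = counts.sort_values(["NEIGHBOURHOOD", "Date"]).copy()
counts_rolling["Counts"] = counts_rolling.groupby("NEIGHBOURHOOD")["Counts"].cumsum()

print(counts_rolling)
```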

First 5 rows of the rolling counts Dataframe. Image by Author

Now here’s a slightly tricky part. If you look at the counts_rolling Dataframe we’ve just produced, you’ll notice that not all neighborhoods have values for each timestamp. That’s because some neighborhoods don’t have any recorded crimes on certain dates. Since the counts are cumulative, we can fill in those “missing” values by forward filling, which carries each neighborhood’s last known total forward to the dates where it has no row.
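One possible way to do the forward fill, sketched on a tiny made-up counts_rolling frame: pivot to one row per date and one column per neighborhood, forward-fill down each column, and reshape back to long format. Filling the remaining gaps with 0 is my assumption for dates that precede a neighborhood's first recorded incident. The column names and the crime_complete name are assumptions too.

```python
import pandas as pd

# Made-up cumulative counts with gaps: Kitsilano has no row for February,
# Strathcona has none for January or March.
counts_rolling = pd.DataFrame({
    "NEIGHBOURHOOD": ["Kitsilano", "Kitsilano", "Strathcona"],
    "Date": pd.to_datetime(["2003-01-01", "2003-03-01", "2003-02-01"]),
    "Counts": [2, 5, 1],
})

# One row per date, one column per neighborhood; missing pairs become NaN.
wide = counts_rolling.pivot(index="Date", columns="NEIGHBOURHOOD", values="Counts")

# Carry the last known cumulative total forward, then use 0 for dates
# before a neighborhood's first incident.
wide = wide.ffill().fillna(0).astype(int)

# Back to long format: one row per date/neighborhood combination.
crime_complete = wide.reset_index().melt(
    id_vars="Date", var_name="NEIGHBOURHOOD", value_name="Counts"
)

print(crime_complete)
```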

Preview of the finalized crime cumulative sums Dataframe. Image by Author

Here we go: we now have a complete Dataframe with values for every timestamp/neighborhood combination. We can finally plot the graph!

In order to use Plotly’s choropleth_mapbox function we must make sure the dataframe that contains the crime numbers and the geojson used to plot the map have the same identifier, so that the mapping function can properly associate each polygon with its crime counts. Here, I take advantage of the featureidkey argument to tell the mapping function that the polygon identifiers are in the “properties.name” location of the geojson file. These identifiers are the same neighborhood names contained in the crime dataframe.

Animated Choropleth map of crime counts in Vancouver. Gif by Author

Next steps

As we’ve just seen, making animated graphs with Plotly is a painless and quick affair. In the next tutorials, I will showcase how to further customize Plotly Express graphs. For instance, we might want the option of selecting a particular year and visualizing how crime evolves over the months of that year. Or we might want an option to filter the data by type of crime and visualize its evolution over time. Luckily, Plotly makes it easy to build on top of Plotly Express graphs, adding layers of customization one step at a time. In later tutorials we’ll see how everything can be wrapped into a nice Dash app!

I’d like to give a huge shoutout to my coworkers on the Data Science and AI Elite team for inspiring me to write this blog post. In particular, thank you to Andre Violante and Rakshith Dasenahalli Lingaraju for their advice and suggestions, as well as Robert Uleman for his extremely thorough proofreading and code improvements!
