HoloViz Is Simplifying Data Visualization in Python

Python has so many data types; can we use one tool to plot them all?

Photo by Luke Chesser on Unsplash

Python has rapidly become the default language for working with data over the past ten years. With a wide range of capabilities and a vibrant open source community, Python is a powerful tool for everything from machine learning to web development.

While I often attribute much of Python’s success to the excellent open-source libraries available, the downside of relying on a variety of third-party tools can be a lack of consistency and continuity. We do often have built-in plotting tools, like Pandas’ native plotting API, but they can’t replicate the interactive plots of more modern tools like Bokeh and Plotly. The bottom line is that we spend a lot of time thinking about data format when it comes to plotting, and it would be nice to have a simple way to produce interactive plots across a variety of data types.

This is the problem HoloViz aims to solve by providing high-level tools that make it easier to apply plotting libraries to data. HoloViz has a set of libraries that it maintains, but in this post, we are going to be focusing on hvPlot. hvPlot essentially adds a custom plotting API that can be accessed directly from our data with the .hvplot() method.

HoloViz also maintains other useful tools, including Panel, which is used to build apps and dashboards around your plots. We will explore creating dashboard apps in a future post, but for now, let’s dive into hvPlot to see how easily we can create interactive plots with a variety of data types.

hvPlot

hvPlot provides a high-level plotting API that can be used by many of the most popular Python data types including:

  • Pandas: DataFrame, Series
  • XArray: Dataset, DataArray
  • Streamz: DataFrame(s), Series(s)
  • GeoPandas: GeoDataFrame
  • NetworkX: Graph
  • Dask: DataFrame, Series
  • Rapids cuDF: GPU DataFrame, Series
  • Intake: DataSource

We can see that hvPlot works by adding a new high-level plotting API to some of Python’s most popular libraries in order to provide a consistent and powerful plotting tool for all of the data you may work with. I was impressed that they support all of these libraries, as they account for a majority of the data I personally work with in Python.

I was also really excited to see that hvPlot lets us choose between Bokeh, Plotly, and Matplotlib as the underlying plotting backend. I am generally a big proponent of Plotly; however, I will admit that Bokeh has improved a lot since I last used it. Because of this, I will be using Bokeh as the hvPlot backend for this post.

Pandas

Pandas is without a doubt one of the most used and important Python libraries. Being able to easily load data into tables is what allows Python to compete with other popular data tools like R and Excel.

While Pandas is an excellent library, one area where it falls short is plotting. In my last post, I discussed some of the best Python plotting libraries, and Pandas’ built-in plotting tool was not one of them. In Pandas’ defense, it isn’t a plotting library, and its built-in tool is meant to give users a quick and dirty view of a DataFrame. Let’s take a look at a plot made using Pandas.
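
A minimal sketch of this kind of quick Pandas plot is shown here. The data is a hypothetical set of random-walk series standing in for the DataFrame used in the post, and the column names `a`, `b`, and `c` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three random-walk series over 100 days.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    rng.standard_normal((100, 3)).cumsum(axis=0),
    index=pd.date_range("2022-01-01", periods=100, freq="D"),
    columns=["a", "b", "c"],
)

# Pandas' built-in plotting API draws a static Matplotlib figure
# and returns the Matplotlib Axes object it drew on.
ax = df.plot(title="Pandas built-in plot")
```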

Created By Author

Pandas uses Matplotlib under the hood, which means we could style this further using the Axes object it returns. But why spend time styling this plot when we could use hvPlot to create an interactive and appealing plot with minimal effort?

_Note: Medium doesn’t support embedding HTML directly, so I have published an online Jupyter notebook where you can interact with the plots._

Created By Author

Being able to produce a much nicer-looking plot with the same amount of code can be a major time saver for someone who generates a lot of quick visualizations when they work with data. This also allows anyone to access interactive plotting at basically any skill level, which will only continue to help Python serve a wide range of use cases.

More Complex Pandas Plots

Let’s take a look at a more complex plot that we can make using hvPlot. For this, I have imported the Seaborn penguin sample dataset that contains information about three different penguin species.

Created By Author

We can layer plots on top of each other and even produce tables of our data in the same window. Being able to produce these kinds of plots with so little effort is a major time saver.

Geographic Data

Geographic plots are by far my favorite kind of data visualization; however, they can be quite a pain to produce in Python. Much of this friction comes from dependency issues that can waste a massive amount of valuable time. Unfortunately, my experience using HoloViz to plot geographic data was not as smooth as I had hoped, but I did find a reliable way to get it set up.

Before we get into dependency issues, let’s take a look at a visualization of air temperature data from xarray’s example data. We will use matplotlib to create the plot and cartopy to handle the projection.

Created By Author

It’s honestly not bad! However, I’ve always felt like there is a lot of benefit to interactive plots when it comes to spatial data. Let’s remake this plot using hvPlot instead.

Created By Author

Now, this is a great-looking plot. It’s nice that we can provide a time slice of data and automatically get a slider to scrub through the different days. Out of all of the spatial plotting I’ve done in Python, this is probably the fastest tool I’ve come across for plotting data projected on Earth with a proper coastline.

Note: If you run into issues replicating this plot, I go through steps for avoiding dependency issues at the end of the post.

Stream Data

Stream data is particularly hard to visualize in Python using traditional tools. It’s possible, but not as easy as I would like. Fortunately, hvPlot handles this really well. We can use the streamz library to create a random data stream and then plot that stream with hvPlot. Keep in mind that we aren’t working with a plain Pandas DataFrame anymore, so we have to import the proper hvPlot API for streamz.

Created By Author

Graph/Network Data

Network data is another format that isn’t well suited to the static plots of native Python libraries. For this example, let’s use a network where the nodes are Star Wars characters and the edges represent a shared scene.

Created By Author

If you go to the interactive notebook, you’ll notice that hovering over a node highlights all of its edges. Little things like that are what make interactive plots so effective as visualization tools, and it’s fantastic how easy hvPlot has made creating them.

What are the Downsides?

While I really enjoy the kinds of plots that hvPlot enables us to make with so little effort, there are some minor downsides to using a higher-level API to replace the built-in plotting methods of these libraries. The first is that the more levels of abstraction you add, the harder it can be to trace errors when they occur.

The most annoying problem I ran into was dependency issues, which I spent quite a lot of time solving. The GeoViews package, which facilitates the creation of geographic plots with hvPlot, is very picky about its dependencies in Conda. It isn’t alone here, though: geographic libraries have long had dependency issues, and anyone who has had to work with GDAL as a dependency has probably run into a problem with their environment at one time or another.

Luckily, I found a few steps to get it to work reliably for me on two machines.

  1. Create a fresh conda environment
  2. Install only pyviz, geoviews, and hvplot using conda-forge
  3. Update nbconvert (To avoid invisible plots with no error message)

I know it’s annoying to create a separate environment just to try hvPlot with geographic data. However, feel free to install the other libraries you need from there, since there is a good chance everything will continue working.

Wrapping Up

Overall, I was impressed with hvPlot, and I will likely use it for some of my work going forward. For most of the data types, it really did feel like a strict upgrade from the quick plots I would make with Matplotlib while exploring a dataset. This was especially true for Pandas’ plotting API; I really like how easy it is to create interactive plots directly from DataFrames.

In the majority of my tests, I didn’t have to spend time fiddling with hvPlot; for the most part, it just worked. I will say that there are fewer examples on the internet to look at if you do run into issues; however, given how well the library works today, I expect that to improve as more people adopt it.

Citations

  • Star Wars Data – Open License, Fan Created

Gabasova, E. (2016). Star Wars social network. DOI: https://doi.org/10.5281/zenodo.1411479

National Centers for Environmental Prediction/National Weather Service/NOAA/U.S. Department of Commerce. 1994, updated monthly. NCEP/NCAR Global Reanalysis Products, 1948-continuing. Research Data Archive at NOAA/PSL: https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html.

Allison Marie Horst. (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. DOI: https://doi.org/10.5281/zenodo.3960218

