The world’s leading publication for data science, AI, and ML professionals.

Making Interactive Visualizations with Python Altair

A comprehensive practical guide

Photo by Mika Baumeister on Unsplash

Data visualization is a fundamental part of data science. In exploratory data analysis, visualizations are highly effective at unveiling the underlying structure of a dataset and discovering relationships among variables.

Another common use case of data visualizations is to deliver results or findings. A well-designed chart often conveys more than plain numbers, which is why data visualization is central to storytelling, a critical part of the data science pipeline.

We can enhance the capabilities of data visualizations by adding interactivity. The Altair library for Python makes creating interactive visualizations straightforward.

In this article, we will go over the basic components of interactivity in Altair and work through examples that put these components into action. Let’s start by importing the libraries.

import numpy as np
import pandas as pd
import altair as alt

We also need a dataset for the examples. We will use a small sample from the Melbourne housing dataset available on Kaggle.

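# Read only the five columns used in this article
df = pd.read_csv("/content/melb_data.csv", usecols=['Price', 'Landsize', 'Distance', 'Type', 'Regionname'])
# Drop price and land-size outliers, then draw a random sample of 1000 rows
df = df[(df.Price < 3_000_000) & (df.Landsize < 1200)].sample(n=1000).reset_index(drop=True)
df.head()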
(image by author)

I have read only a small part of the original dataset. The usecols parameter of the read_csv function reads only the given columns of the CSV file. I have also filtered out outliers with regard to price and land size. Finally, a random sample of 1000 observations (i.e. rows) is selected using the sample function, and reset_index(drop=True) gives the sample a fresh index starting from 0.
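The same read, filter, and sample steps can be tried without downloading the Kaggle file. The sketch below assumes a hypothetical in-memory CSV (csv_text) with made-up values and an extra Rooms column that usecols drops. Unlike the original code, it passes random_state to sample so the draw is reproducible; that parameter is an addition for illustration.

```python
import io

import pandas as pd

# Hypothetical, made-up rows standing in for melb_data.csv;
# the extra "Rooms" column exists only to show usecols dropping it.
csv_text = """Price,Landsize,Distance,Type,Regionname,Rooms
900000,300,4.0,h,Region A,3
650000,150,6.5,u,Region A,2
3200000,400,2.0,h,Region B,5
1100000,250,9.0,t,Region B,3
1500000,5000,15.0,h,Region C,4
800000,600,12.0,h,Region C,3
"""

# Read only the columns of interest
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Price", "Landsize", "Distance", "Type", "Regionname"],
)

# Each comparison needs its own parentheses because & binds tighter than <
mask = (df.Price < 3_000_000) & (df.Landsize < 1200)

# Keep the filtered rows, sample 3 of them reproducibly, and renumber from 0
df = df[mask].sample(n=3, random_state=42).reset_index(drop=True)
print(df)
```

Here two of the six toy rows fail the filter (one on price, one on land size), so the sample is drawn from the remaining four.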


Altair is powerful both for data transformations and for creating interactive plots. Interactivity in Altair is built from three components:

  • Selection: Captures interactions from the user, such as clicks or drags. In other words, it selects a part of the visualization.
  • Condition: Changes or customizes elements based on the selection. To see any effect, a selection must be attached to a condition (or used to filter the data).
  • Bind: A property of the selection that creates a two-way binding between the selection and an input, such as the chart legend.

These concepts will become clearer as we go through the examples.


Let’s first create a static scatter plot and then we will add interactive features to it.

alt.Chart(df).mark_circle(size=50).encode(
   x='Price',
   y='Distance',
   color='Type'
).properties(
   height=350, width=500
)
(image by author)

Before starting on the interactive plots, it is worth briefly reviewing the basic structure of Altair syntax. We start by passing the data to a top-level Chart object. The data can be a Pandas dataframe or a URL string pointing to a JSON or CSV file.

Then we specify the type of visualization with a mark method (e.g. mark_circle, mark_line, and so on). The encode function maps columns of the data to visual channels such as x, y, and color, so every field name we write in the encode function must be a column in the dataframe. Finally, we set chart-level attributes such as height and width using the properties function.

In some parts of the plot, the dots overlap heavily. It would help if we could also view only the data points that belong to a specific type.

We can achieve this in two steps. The first step is to create a selection on the Type column and bind it to the legend, so that clicking a legend entry selects that type.

selection = alt.selection_multi(fields=['Type'], bind='legend')

Adding a selection alone is not enough; the plot also has to respond to it. For instance, we can adjust the opacity of the data points according to the selected category by passing alt.condition to the opacity encoding: selected points are drawn at full opacity and the rest at 0.1. The add_selection method attaches the selection to the chart.

alt.Chart(df).mark_circle(size=50).encode(
   x='Price',
   y='Distance',
   color='Type',
   opacity=alt.condition(selection, alt.value(1), alt.value(0.1))
).properties(
   height=350, width=500
).add_selection(
   selection
)
(GIF by author)

For the second example, we will create a scatter plot of the distance and land size columns and a histogram of the price column. The histogram will be updated based on the selected area on the scatter plot.

Since we want to select an area on the plot, we need to add an interval selection to the scatter plot.

selection = alt.selection_interval()

This selection will be attached to the scatter plot with add_selection. For the histogram, we will pass the selection to transform_filter, so that only the data points inside the selected area are counted.

chart1 = alt.Chart(df).mark_circle(size=50).encode(
  x='Landsize',
  y='Distance',
  color='Type'
).properties(
  height=350, width=500
).add_selection(
  selection
)
chart2 = alt.Chart(df).mark_bar().encode(
  alt.X('Price:Q', bin=True), alt.Y('count()')
).transform_filter(
  selection
)

The chart1 and chart2 variables contain the scatter plot and the histogram, respectively. We can now combine and display them. Altair is quite flexible when it comes to combining multiple plots. We can even use operators: | places charts side by side, & stacks them vertically, and + layers them on top of each other.

chart1 | chart2
(image by author)

As we can see, the histogram is updated based on the data points selected on the scatter plot. Thus, we are able to see the price distribution of the selected subset.

To better understand how selections and filters work together, let’s switch the roles of the scatter plot and the histogram. We will add the selection to the histogram and use it as a transform filter on the scatter plot.

selection = alt.selection_interval()
chart1 = alt.Chart(df).mark_circle(size=50).encode(
   x='Landsize',
   y='Distance',
   color='Type'
).properties(
   height=350, width=500
).transform_filter(
   selection
)
chart2 = alt.Chart(df).mark_bar().encode(
   alt.X('Price:Q', bin=True), alt.Y('count()')
).add_selection(
   selection
)
chart1 | chart2
(image by author)



Conclusion

The sky is the limit! We can create many different kinds of interactive plots, and Altair is flexible in the ways interactive components can be added to a visualization.

Once you have a solid understanding of the elements of interactivity, namely selection, condition, and bind, you can enrich your visualizations.

As with any other subject, practice makes perfect. The syntax may look a little bit confusing at first. However, once you understand the logic and the connections between the elements we have mentioned, creating interactive plots will become fairly easy.

Thank you for reading. Please let me know if you have any feedback.

