Altair is a statistical visualization library for Python. Its syntax is clean and easy to understand, as the examples will show. It also makes creating interactive visualizations simple.
In the previous article, we created some basic plots to get familiar with the syntax and structure. It served as an introductory guide to Altair.
Note: Here is a list of the articles in the Altair series.
- Part 1: Introduction
- Part 2: Filtering and transforming data (This article)
- Part 3: Interactive plots and dynamic filtering
- Part 4: Customizing visualizations
Altair is highly flexible in terms of data transformations. We can apply many different kinds of transformations while creating a visualization. In this article, we will focus on filtering and transforming data within the visualizations. There will also be an example that shows how to use dynamic filtering by combining multiple plots.
We start by importing Altair, along with Pandas for reading the data. If you are using Google Colab, Altair is already installed and can be imported directly. If not, you can easily install it using pip.
import altair as alt
import pandas as pd
The examples use a marketing dataset that is available on Kaggle.
marketing = pd.read_csv("/content/DirectMarketing.csv")
marketing.head()

The dataset contains information about customers and the amount of money they spent during a marketing campaign.
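If you do not have the Kaggle file at hand, a small stand-in DataFrame is enough to try the examples. The sketch below reuses the column names that appear in this article. The values are made up for illustration, and the age categories are an assumption about how the dataset labels its groups.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the Kaggle file
sample = pd.DataFrame({
    "Age": ["Young", "Middle", "Old", "Middle", "Young", "Old"],
    "Salary": [35000, 82000, 61000, 125000, 47000, 98000],
    "Children": [0, 2, 1, 3, 0, 2],
    "AmountSpent": [450, 1600, 900, 3100, 600, 2100],
})

sample.head()
```

Passing sample instead of marketing to alt.Chart should be enough to follow along, although the plots will of course look much sparser.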
We can create a basic scatter plot to check the relationship between the salary and spent amount.
(alt.
Chart(marketing).
mark_circle(size=50).
encode(x='Salary', y='AmountSpent', color='Age'))

We pass the data to the top-level Chart object. The next line defines the type of plot. The encode function specifies what to plot: the columns for the x and y axes, as well as the column that determines the color of the points.
It is possible to filter data while creating a visualization. For instance, we can plot only the data points for which the salary is less than 120000.
(alt.
Chart(marketing).
mark_circle(size=50).
encode(x='Salary', y='AmountSpent', color='Age').
transform_filter(alt.FieldLTPredicate(field='Salary', lt=120000)).
properties(height=400, width=500))

We pass the filtering condition to the transform_filter function. The FieldLTPredicate indicates a less-than condition based on a given column.
We also add the properties function to adjust the size of the plot.
The same filtering operation can also be done by using the datum object of Altair, which refers to a column of the data inside an expression. It is simpler in terms of the syntax. The following code will create the same plot as above.
from altair import datum
(alt.
Chart(marketing).
mark_circle(size=50).
encode(x='Salary', y='AmountSpent', color='Age').
transform_filter(datum.Salary < 120000).
properties(height=400, width=500))
The syntax resembles the way Pandas filters dataframes.
We can also specify a condition for filtering based on a categorical column. For instance, the data points that belong to a set of discrete values can be filtered using the FieldOneOfPredicate class.
(alt.
Chart(marketing).
mark_circle(size=50).
encode(x='Salary', y='AmountSpent', color='Age').
transform_filter(alt.FieldOneOfPredicate(field='Children',
oneOf= [0,2,3])).
properties(height=400, width=500))
This code plots only the data points for which the Children column takes one of the listed values (0, 2, or 3).
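To sanity-check how many points such a filter should leave on the plot, the same condition can be applied in Pandas with isin. This is only a cross-check on a made-up sample, not part of the Altair chart itself.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the Kaggle file
sample = pd.DataFrame({
    "Children": [0, 2, 1, 3, 0, 2],
    "Salary": [35000, 82000, 61000, 125000, 47000, 98000],
    "AmountSpent": [450, 1600, 900, 3100, 600, 2100],
})

# Same membership test as FieldOneOfPredicate(field='Children', oneOf=[0, 2, 3])
kept = sample[sample["Children"].isin([0, 2, 3])]
print(len(kept), "of", len(sample), "rows kept")
```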

One cool feature of Altair is that we can create plots that allow for dynamic filtering. It works as follows.
- We create two plots and concatenate them vertically or horizontally.
- One plot is used to select a filtering condition.
- Based on the selected filtering condition, the data points on the other plot are updated.
The plot that is used to select the filtering condition is also informative in its own right. It is not merely a button or slider for selecting a condition. This will become clearer with an example.
We will create two plots. One will be a scatter plot that consists of the salary and amount spent columns. The other one will be a bar plot that shows the average salary for the categories in the age column. The second plot will also be used as a filter for the first plot. Thus, we will be able to select an age group and the scatter plot will be updated to show the data points that belong to the selected age group.
selection = alt.selection_multi(fields=['Age'])
first = (alt.
Chart().
mark_circle(size=50).
encode(x='Salary', y='AmountSpent').
transform_filter(selection).
properties(height=300, width=500))
second = (alt.
Chart().
mark_bar().
encode(
x='Age:O',y='mean(Salary):Q',
color=alt.condition(selection, alt.value('steelblue'),
alt.value('lightgray'))
).
properties(height=300, width=300).
add_selection(selection))
alt.hconcat(first, second, data=marketing)
We use the selection_multi function to define a selection predicate, which will be passed to the plots in the following steps.
The first plot is a scatter plot similar to the ones in the previous examples. The only difference is the filtering condition in the transform_filter function which is the selection predicate we have just created.
The second plot is a bar plot that shows the average salary for each group in the age column. The averages are calculated by the following aggregation in the encode function.
y='mean(Salary):Q'
Since the bar plot will be used to select conditions, we add the selection predicate to this plot by using the add_selection function.
The following figures illustrate how the scatter plot is updated based on the selected age group. It is an interactive plot, so we just click on the bars to select an age group.



Conclusion
The first article was an introduction to Altair. In this one, we have focused more on filtering methods and data transformations that can be done in the visualizations.
There is still much more this library offers. I will be writing more tutorials about Altair. Stay tuned for more advanced features of this library.
Thank you for reading. Please let me know if you have any feedback.