
Altair is a statistical data visualization library for Python. It provides a simple and easy-to-understand syntax for creating both static and interactive visualizations.
What I like most about Altair are its data transformation and filtering features. It provides flexible and versatile methods to transform and filter data while creating a data visualization.
In that sense, Altair can also be considered a data analysis tool. We will go over 3 examples that demonstrate how Altair expedites the exploratory data analysis process.
We will use a small sample from the Melbourne housing dataset available on Kaggle for the examples. We first import the libraries and read the dataset.
import numpy as np
import pandas as pd
import altair as alt

cols = ['Type', 'Price', 'Distance', 'Date', 'Landsize', 'Regionname']
melb = pd.read_csv(
    "/content/melb_data.csv", usecols=cols, parse_dates=['Date']
).sample(n=1000).reset_index(drop=True)
melb.head()

Example 1
We will create a bar plot that shows the average house price in each region. One option is to calculate the average values by using the functions of Pandas and then plot the results.
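As a rough sketch of that first option, the aggregation step in Pandas might look like the following. The toy dataframe and its values are made up for illustration and stand in for melb; the plotting step is only indicated in a comment.

```python
import pandas as pd

# Toy stand-in for the melb dataframe (values are made up for illustration).
toy = pd.DataFrame({
    "Regionname": ["Northern Metropolitan", "Northern Metropolitan",
                   "Southern Metropolitan", "Southern Metropolitan"],
    "Price": [800_000, 1_000_000, 1_200_000, 1_600_000],
})

# Step 1: compute the average price per region with Pandas.
avg_price = (
    toy.groupby("Regionname", as_index=False)
       .agg(avg_price=("Price", "mean"))
)
print(avg_price)

# Step 2 (not shown): pass avg_price to alt.Chart and encode
# x='Regionname', y='avg_price' to draw the bars.
```

This works, but it leaves an intermediate dataframe around that exists only to feed the plot.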
However, we can do it all at once with Altair.
alt.Chart(melb).mark_bar().encode(
    x='Regionname', y='avg_price:Q'
).transform_aggregate(
    avg_price='mean(Price)', groupby=['Regionname']
).properties(
    height=300, width=500
)
The syntax starts with a top-level Chart object followed by the plot type. The encode function is used to specify what to plot from the dataframe passed to the Chart object.
As you may have noticed, the y encoding is not a column in the dataframe. It is an aggregated column that is calculated in the next step using the transform_aggregate function. The letter "Q" in the y encoding stands for quantitative.
The properties function is used to adjust the properties of the visualization. Here is the plot generated by the code above.

Example 2
The Distance column indicates the distance of the house to the central business district. Let's say we want to create the plot from the previous example only for houses that are more than 3 kilometers away.
We can easily accomplish this task by adding the transform_filter function to our code.
alt.Chart(
    melb, height=300, width=500
).mark_bar().encode(
    x='Regionname', y='avg_price:Q'
).transform_filter(
    alt.FieldGTPredicate(field='Distance', gt=3)
).transform_aggregate(
    avg_price='mean(Price)', groupby=['Regionname']
)
The FieldGTPredicate handles "greater than" conditions. Altair also provides predicates for other conditions such as "equal", "less than", "range", and so on.
In the previous example, we used the properties function to adjust the size. In this example, the same is done by passing height and width directly to the Chart object.
Here is the bar plot of the filtered values.

Example 3
This example involves a lookup operation, which is similar to the merge function of Pandas.
Suppose we have another dataframe that contains some information about the owners of these houses.
melb['OwnerId'] = np.arange(1, 1001)
df = pd.DataFrame({
    'OwnerId': melb['OwnerId'],
    'Age': np.random.randint(20, 40, size=1000),
    'Salary': np.random.randint(5000, 10000, size=1000)
})
df.head()

We have added an id column to the original dataframe and created a new one that contains the id, age, and salary information of the owners.
We want to plot the average salary of owners for each house type. We can use Pandas functions to merge the dataframes and group the data points (i.e. rows) by house type and calculate the average values.
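A minimal sketch of that Pandas route is shown below. The houses and owners dataframes are hypothetical toy stand-ins for melb and df, with made-up values.

```python
import pandas as pd

# Toy stand-ins for melb (houses) and df (owners); values are made up.
houses = pd.DataFrame({"OwnerId": [1, 2, 3, 4], "Type": ["h", "u", "h", "u"]})
owners = pd.DataFrame({"OwnerId": [1, 2, 3, 4],
                       "Salary": [6000, 7000, 8000, 9000]})

# Bring the house type over to the owners, then average salary per type.
avg_salary = (
    owners.merge(houses[["OwnerId", "Type"]], on="OwnerId", how="left")
          .groupby("Type", as_index=False)
          .agg(avg_salary=("Salary", "mean"))
)
print(avg_salary)
```

The result would then be passed to a Chart object in a separate step.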
Another option is to use the lookup transformation of Altair as follows:
alt.Chart(
    df, height=300, width=500
).mark_bar().encode(
    x='mean(Salary):Q', y='Type:O'
).transform_lookup(
    lookup='OwnerId',
    from_=alt.LookupData(data=melb, key='OwnerId', fields=['Type'])
)
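The lookup parameter refers to the column in the primary dataframe (df) to be used for merging. Inside LookupData, key is the matching column in the secondary dataframe (melb) and fields lists the columns to bring over. Here is the generated plot: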

Conclusion
We have gone through 3 examples that demonstrate the transformation and filtering capabilities of Altair. With regard to these operations, Altair serves as a data analysis and manipulation tool as well.
Pandas is, of course, much more powerful for such operations. However, being able to perform basic data wrangling operations while creating a visualization adds significant value to Altair.
We have only used bar plots in the examples, but the same transformations can be applied to other plot types.
Thank you for reading. Please let me know if you have any feedback.