Altair is a statistical visualization library for Python. Its syntax is clean and easy to understand, as we will see in the examples. It is also very simple to create interactive visualizations with Altair.
Altair is highly flexible in terms of data transformations. We can apply many different kinds of transformations while creating a visualization, which makes the library well suited to exploratory data analysis.
What I think makes Altair special is that it leans more heavily on the statistical side than other popular Python data visualization libraries such as Matplotlib and Seaborn.
In this article, we will create some basic plots to get familiar with the syntax and structure of Altair. We will also see how data transformations are implemented in the process of creating the plots.
Note: Here is a list of the articles in the Altair series.
- Part 1: Introduction (This article)
- Part 2: Filtering and transforming data
- Part 3: Interactive plots and dynamic filtering
- Part 4: Customizing visualizations
We start by importing Altair. If you are using Google Colab, it is already installed and can be imported directly. If not, you can install it with pip (pip install altair).
import altair as alt
We will be using an insurance dataset that you can obtain from Kaggle. We will read the dataset into a Pandas dataframe.
import numpy as np
import pandas as pd
insurance = pd.read_csv("/content/insurance.csv")
insurance.head()

The dataset contains some measures (i.e. features) about the customers of an insurance company and the amount that is charged for the insurance.
Scatter plot
A scatter plot is mainly used to visualize the relationship between two numerical variables.
(alt.
Chart(insurance).
mark_circle(size=40).
encode(x='charges', y='bmi').
properties(height=400, width=500))
I put each step on a separate line to emphasize the chain-like operations. We start by passing the data to a top-level Chart object. The data can be a Pandas dataframe or a URL string pointing to a JSON or CSV file.
The next step, the mark method (e.g. mark_circle, mark_line, and so on), sets the type of visualization. The encode function maps columns of the data to visual channels such as x and y, so every field we name in encode must exist in the dataframe. Finally, we set chart-level attributes such as height and width using the properties function.
Here is the plot created with the code above.

We can make the plots more informative. For instance, we can use the color parameter in the encode function to separate data points based on a categorical variable. It is similar to the hue parameter of Seaborn.
We can also make the plot interactive just by adding the interactive function at the end.
(alt.
Chart(insurance).
mark_circle(size=50).
encode(x='charges', y='bmi', color='smoker').
properties(height=400, width=500).
interactive())

It is possible to add more functionality to this plot. We can use the tooltip parameter to display additional variables when we hover over points. Seaborn has no hover parameter; the closest analogue is the hover_data parameter of Plotly Express.
(alt.
Chart(insurance).
mark_circle(size=50).
encode(x='charges', y='bmi', color='smoker', tooltip=['age','sex']).
properties(height=400, width=500).
interactive())

Bar plot
Altair makes it simple and efficient to implement data transformations in the process of creating visualizations. For instance, we can create a bar plot that shows the average charges of each category in the region column.
(alt.
Chart(insurance).
mark_bar().
encode(x='region', y='mean(charges):Q').
properties(height=300, width=400))

We specify the transformation as a string ('mean(charges):Q'), where the :Q suffix marks the result as quantitative. It is equivalent to the following syntax:
y=alt.Y(field='charges', aggregate='mean', type='quantitative')
Let’s calculate the same averages using the groupby function of Pandas to confirm the results.
insurance[['region','charges']].groupby('region').mean()

As expected, the results match. Altair performed the same aggregation as part of building the visualization.
Histogram
Histograms are mainly used to visualize the distribution of continuous variables. A histogram divides the value range of a continuous variable into discrete bins and shows how many values fall in each bin.
The following code will create a histogram of the bmi variable.
(alt.
Chart(insurance).
mark_bar().
encode(alt.X('bmi:Q', bin=True), y='count()').
properties(height=300, width=500))

We use the mark_bar function with a data transformation step to create a histogram. Inside the encode function, we divide the value range of the bmi variable into discrete bins and count the number of data points in each bin.
Grid of plots
It is extremely simple to create multiple plots in the same visualization.
We first need to assign the plots to variables which will then be used to combine plots or create a grid of plots.
p1 = (alt.
Chart(insurance).
mark_bar().
encode(x='region', y='mean(charges):Q').
properties(height=200, width=300))
p2 = (alt.
Chart(insurance).
mark_bar().
encode(alt.X('bmi:Q', bin=True), y='count()').
properties(height=200, width=300))
Once we have the variables, we can use Python's bitwise operators to combine them: | places the plots side by side and & stacks them vertically.
p1 | p2

p1 & p2

As you can see, combining plots works much like an arithmetic expression. The p1 + p2 syntax layers the plots on the same axes, but it is not appropriate in our case because the two plots have different x axes. If we had a line plot and a bar plot over the same axes, it would be an option to consider.
We can create a grid of plots by combining several plots in this way. For instance, (p1 & p2) | (p3 & p4) creates a grid of 4 plots (2 rows and 2 columns).
Conclusion
This article can be considered as an introduction to Altair. There is much more this library offers.
What I like most about Altair is the ease and simplicity of data transformations. It facilitates the data analysis process as well.
In the second part of the Altair series, I focus on how filtering and data transformations are used in the visualizations.
I will be writing more tutorials about Altair. Stay tuned for more advanced features of this library.
Thank you for reading. Please let me know if you have any feedback.