Become more productive in visualizing data in Python
"Beauty will save the world."
- F. Dostoevsky

Introduction
Have you ever gotten frustrated after looking at your visualization in Python? Have you ever thought that it could be done better with less effort and time? If so, this post is for you: I would like to introduce the Altair library, which can boost your productivity and make your visualisations more appealing.
I suppose you already know how vital visualization is for any analysis and how it helps convey an idea to a wider audience. Visualizing data is also one of the first steps in exploring it and understanding where to dig deeper. In this post I will focus on the basic grammar of Altair using a scatter plot and then share some examples of other graphs. Before that, let us talk about Altair and see why it is so powerful.
Why Altair?

Altair is a declarative statistical visualization library built on the Vega and Vega-Lite grammars, which describe the visual appearance and interactive behaviour of a visualization in JSON format.
The key idea behind Altair is that you declare links between data columns and visual encoding channels (e.g., x and y axes, colour, size), and the rest of the visualization process is handled by the library. This gives you more time to focus on the data and the analysis rather than on spelling out how to draw the plot [1].
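To make the declarative idea concrete, here is a hand-written sketch of the kind of Vega-Lite style JSON specification that such a declaration boils down to. The three data rows are made up for illustration, and the spec Altair actually emits contains more fields than shown here.

```python
import json

# Made-up rows, for illustration only (not real gapminder values).
rows = [
    {"country": "A", "life_expect": 60.0, "fertility": 4.1},
    {"country": "B", "life_expect": 72.5, "fertility": 2.3},
    {"country": "C", "life_expect": 80.1, "fertility": 1.6},
]

# A minimal Vega-Lite style specification: it states WHAT to draw
# (which column is linked to which channel), not HOW to draw it.
spec = {
    "data": {"values": rows},
    "mark": "point",
    "encoding": {
        "x": {"field": "life_expect", "type": "quantitative"},
        "y": {"field": "fertility", "type": "quantitative"},
        "color": {"field": "country", "type": "nominal"},
    },
}

spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Nothing in the spec mentions axes ticks, legends or pixel positions; those are derived by the renderer from the declared links.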
Altair’s components
- Data: DataFrame used for visualization
- Mark: How would you like the data to be visualized (line, bar, tick, point)?
- Encoding: How the data will be represented (positions for x and y, colour, size)?
- Transform: How would you like to transform the data before applying visualization (aggregate, fold, filter, etc.)?
- Scale: Function that maps data values to visual values (positions, colours, sizes) when the chart is rendered
- Guide: Visual aids such as legend, ticks on the x and y axes.
As for the mark component, you can use the following basic mark properties:

Understanding Altair’s grammar with a scatter plot
Let us get our hands dirty and learn Altair’s grammar using a scatter plot.
Installation
$ pip install altair vega_datasets
The equivalent for conda is
$ conda install -c conda-forge altair vega_datasets
Data
I will be using the following Vega datasets:
- data.gapminder()
- data.stocks()
- data.movies()
Let’s import packages and look at the data
import pandas as pd
import altair as alt
from vega_datasets import data
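The DataFrames used in the examples that follow (df_gm, df_gm_2005, df_gm_ir, df_stocks and df_movies) are not defined in the text. Here is a minimal sketch of how they might be built, assuming the names mean "all gapminder rows", "gapminder rows for 2005" and "gapminder rows for Ireland". The helper names subset_gapminder and load_frames are hypothetical, introduced only for this sketch.

```python
import pandas as pd


def subset_gapminder(df_gm: pd.DataFrame):
    """Return (rows for 2005, rows for Ireland) from a gapminder-like frame."""
    df_gm_2005 = df_gm[df_gm["year"] == 2005]
    df_gm_ir = df_gm[df_gm["country"] == "Ireland"]
    return df_gm_2005, df_gm_ir


def load_frames():
    """Load the three datasets listed above and derive the subsets.

    Some vega_datasets entries may be downloaded on first use,
    so this can require a network connection.
    """
    from vega_datasets import data  # imported here so the helper above works without it

    df_gm = data.gapminder()
    df_gm_2005, df_gm_ir = subset_gapminder(df_gm)
    df_stocks = data.stocks()
    df_movies = data.movies()
    return df_gm, df_gm_2005, df_gm_ir, df_stocks, df_movies


# In a notebook:
# df_gm, df_gm_2005, df_gm_ir, df_stocks, df_movies = load_frames()
# df_gm.head()
```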



Step 1: Simple scatter plot
Chart() is the fundamental object in Altair; it takes the data, here a DataFrame, as its argument. Let us look at a simple scatter plot built with Chart() and its mark_point() and encode() methods.
alt.Chart(df_gm_2005).mark_point().encode(
    alt.X('life_expect'),
    alt.Y('fertility')
)

Step 2: Adding interactiveness
Chaining the interactive() method onto a scatter plot makes it interactive, so you can pan and zoom. Also, let us set the size of the bubbles with alt.Size() to add more information to the plot.
alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop')
).interactive()

Step 3: Adding colour
We can colour the bubbles by adding alt.Color() to encode(). Conveniently, we do not need to pick a colour for each country ourselves because Altair does it for us.
alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7)
).interactive()

Step 4: Adding more information
We can add information to each dot by specifying Tooltip() in encode().
alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).interactive()

Step 5: Making plot dynamic
It already looks great for the 2005 data. Let's add a slider to change the year and make the plot dynamic.
select_year = alt.selection_single(
    name='Select', fields=['year'], init={'year': 1955},
    bind=alt.binding_range(min=1955, max=2005, step=5)
)
alt.Chart(df_gm).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).add_selection(select_year).transform_filter(select_year).interactive()

Step 6: Changing the size and adding a title
Lastly, let us change the size of the plot and add a title.
select_year = alt.selection_single(
    name='Select', fields=['year'], init={'year': 1955},
    bind=alt.binding_range(min=1955, max=2005, step=5)
)
scatter_plot = alt.Chart(df_gm).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).properties(
    width=500,
    height=500,
    title="Relationship between fertility and life expectancy for various countries by year"
).add_selection(select_year).transform_filter(select_year).interactive()
scatter_plot.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)

The final output looks great and we can derive various insights from such a sophisticated visualization.
Other useful plots with Altair
Now, knowing the basics of Altair’s grammar, let us look at some other plots.
Box plot
box_plot = alt.Chart(df_gm_2005).mark_boxplot(size=100, extent=0.5).encode(
    y=alt.Y('life_expect', scale=alt.Scale(zero=False))
).properties(
    width=400,
    height=400,
    title="Distribution of life expectancy for various countries in 2005"
).configure_axis(
    labelFontSize=14,
    titleFontSize=14
).configure_mark(
    opacity=0.6,
    color='darkmagenta'
)
box_plot.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)

Histogram
histogram = alt.Chart(df_gm_2005).mark_bar().encode(
    alt.X("life_expect", bin=alt.Bin(extent=[0, 100], step=10)),
    y="count()"
).properties(
    width=400,
    height=300,
    # The x channel bins life_expect, so the title refers to life expectancy.
    title="Distribution of life expectancy for various countries in 2005"
).configure_axis(
    labelFontSize=14,
    titleFontSize=14
).configure_mark(
    opacity=0.5,
    color='royalblue'
)
histogram.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)

Bar chart
bar_chart = alt.Chart(df_gm_ir).mark_bar(
    color='seagreen',
    opacity=0.6
).encode(
    x='pop:Q',
    y="year:O"
).properties(
    width=400,
    height=400,
    title="Population of Ireland"
)
text = bar_chart.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text='pop:Q'
)
bar_chart + text

Line chart
line_chart = alt.Chart(df_stocks).mark_line().encode(
    x='date',
    y='price',
    color='symbol'
).properties(
    width=400,
    height=300,
    title="Daily closing stock prices"
)
line_chart.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)

Multiple scatter plots
mult_scatter_plots = alt.Chart(df_movies).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='Major_Genre:N'
).properties(
    width=150,
    height=150
).repeat(
    row=['US_Gross', 'Worldwide_Gross', 'IMDB_Rating'],
    column=['US_Gross', 'Worldwide_Gross', 'IMDB_Rating']
).interactive()
mult_scatter_plots

Final thoughts
Altair is a great tool to boost your productivity in visualizing data: you only need to specify links between data and visual encoding channels. This allows you to put your thoughts directly into a plot without worrying about the time-consuming "how" part.
For more details, please see:
- My Jupyter notebook used for the blog post
- Official documentation with an example gallery of various plots.
Thanks for reading and please do comment below about your ideas on visualizing data with Altair. To see more posts from me, please subscribe to Medium and LinkedIn.
Reference
[1] Overview. Altair 4.1.0 documentation. (n.d.). https://altair-viz.github.io/getting_started/overview.html