The world’s leading publication for data science, AI, and ML professionals.

Create Stunning Visualizations with Altair

Become more productive in visualizing data in Python

"Beauty will save the world."

F. Dostoevsky
Photo by corina ardeleanu on Unsplash

Introduction

Have you ever gotten frustrated after looking at your visualization in Python? Have you ever thought that it could be done better with less effort and time? If so, this post is for you, because I would like to introduce the Altair library, which will boost your productivity and make your visualizations more appealing.

You probably already know how vital visualization is for any analysis and how it helps convey an idea to a wider audience. Visualizing data is also one of the first steps in exploring it and deciding where to dig deeper. Therefore, I would like to focus on the basic grammar of Altair using a scatter plot and then share some examples of other graphs. Before that, let us talk about Altair and see why it is so powerful.

Why Altair?

Example gallery of Altair (sourced by author)

Altair is a declarative statistical visualization library for Python built on Vega and Vega-Lite, grammars that describe the visual appearance and interactive behaviour of a visualization in JSON format.

The key idea behind Altair is that you declare links between data columns and visual encoding channels (e.g., x and y axes, colour, size), and the library handles the rest of the visualization process. This gives you more time to focus on the data and the analysis rather than on explaining how to visualize the data [1].

Altair’s components

  1. Data: DataFrame used for visualization
  2. Mark: How would you like the data to be visualized (line, bar, tick, point)?
  3. Encoding: How the data will be represented (positions for x and y, colour, size)?
  4. Transform: How would you like to transform the data before applying visualization (aggregate, fold, filter, etc.)?
  5. Scale: Function that maps data values to visual values (e.g., pixel positions, colours, sizes)
  6. Guide: Visual aids such as legends and the ticks on the x and y axes

As for the mark component, you can use the following basic mark properties:

Screenshot of Altair’s documentation

Understanding Altair’s grammar with a scatter plot

Let us get our hands dirty and learn Altair’s grammar using a scatter plot.

Installation

$ pip install altair vega_datasets

The equivalent for conda is

$ conda install -c conda-forge altair vega_datasets

Data

I will be using the following Vega datasets:

  1. data.gapminder()
  2. data.stocks()
  3. data.movies()

Let’s import the packages and look at the data:

import pandas as pd
import altair as alt
from vega_datasets import data
Gapminder dataset
Stocks dataset
Movies dataset

Step 1: Simple scatter plot

Chart() is the fundamental object in Altair; it takes the data, here a DataFrame, as its first argument. Let us look at a simple scatter plot built with Chart(), mark_point() and encode().

alt.Chart(df_gm_2005).mark_point().encode(
    alt.X('life_expect'),
    alt.Y('fertility')
)
Simple scatter plot

Step 2: Adding interactivity

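Calling the interactive() method on a chart makes it interactive: you can zoom with the mouse wheel and pan by dragging. Let us also encode the population as the size of the bubbles with alt.Size() to add more information to the plot, and fill the points with mark_point(filled=True).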

alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop')
).interactive()
Simple interactive scatter plot

Step 3: Adding colour

We can colour the bubbles by country by adding alt.Color() to encode(). Conveniently, we do not need to pick a colour for each country: Altair assigns them automatically. We also set a fixed opacity with alt.OpacityValue(0.7) so that overlapping bubbles remain visible.

alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7)
).interactive()
Interactive scatter plot with colourful bubbles

Step 4: Adding more information

We can show details for each bubble on hover by passing a list of alt.Tooltip() channels to the tooltip argument of encode().

alt.Chart(df_gm_2005).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).interactive()
Information for each country is displayed now

Step 5: Making the plot dynamic

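The plot already looks great for 2005. Now let's add a slider to switch between years and make the plot dynamic. Note that this chart uses the full df_gm DataFrame: the selection bound to the slider picks a year, and transform_filter() keeps only the rows for that year.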

# A single-value selection on 'year', bound to a slider (Altair 4 API;
# Altair 5 deprecates selection_single/add_selection in favour of
# selection_point/add_params)
select_year = alt.selection_single(
    name='Select', fields=['year'], init={'year': 1955},
    bind=alt.binding_range(min=1955, max=2005, step=5)
)
alt.Chart(df_gm).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
# Register the selection, then keep only the rows for the selected year
).add_selection(select_year).transform_filter(select_year).interactive()
Dynamic visualization with selecting a year

Step 6: Changing the size and adding a title

Lastly, let us change the size of the plot with properties(), add a title, and style that title with configure_title().

# Slider-bound year selection (Altair 4 API, as in Step 5)
select_year = alt.selection_single(
    name='Select', fields=['year'], init={'year': 1955},
    bind=alt.binding_range(min=1955, max=2005, step=5)
)
scatter_plot = alt.Chart(df_gm).mark_point(filled=True).encode(
    alt.X('life_expect'),
    alt.Y('fertility'),
    alt.Size('pop'),
    alt.Color('country'),
    alt.OpacityValue(0.7),
    tooltip=[alt.Tooltip('country'),
             alt.Tooltip('fertility'),
             alt.Tooltip('life_expect'),
             alt.Tooltip('pop'),
             alt.Tooltip('year')]
).properties(
    # Chart size in pixels and the chart title
    width=500,
    height=500,
    title="Relationship between fertility and life expectancy for various countries by year"
).add_selection(select_year).transform_filter(select_year).interactive()
# Style the title at the top level of the chart
scatter_plot.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)
The final result for the scatter plot

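The final output packs a lot into one view: by moving the slider, we can watch how each country's fertility and life expectancy change between 1955 and 2005, and hovering over any bubble shows its exact values.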

Other useful plots with Altair

Now that we know the basics of Altair’s grammar, let us look at some other plots.

Box plot

box_plot = alt.Chart(df_gm_2005).mark_boxplot(size=100, extent=0.5).encode(
    y=alt.Y('life_expect', scale=alt.Scale(zero=False))
).properties(
    width=400,
    height=400,
    title="Distribution of life expectancy for various countries in 2005"
).configure_axis(
    labelFontSize=14,
    titleFontSize=14
).configure_mark(
    opacity=0.6,
    color='darkmagenta'
)
box_plot.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)
Box plot

Histogram

histogram = alt.Chart(df_gm_2005).mark_bar().encode(
    alt.X("life_expect", bin=alt.Bin(extent=[0, 100], step=10)),
    y="count()"
).properties(
    width=400,
    height=300,
    title="Distribution of life expectancy for various countries in 2005"
).configure_axis(
    labelFontSize=14,
    titleFontSize=14
).configure_mark(
    opacity=0.5,
    color='royalblue'
)
histogram.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)
Histogram
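To double-check what the histogram shows, you can reproduce its bins in pandas. `alt.Bin(extent=[0, 100], step=10)` asks for bins of width 10 between 0 and 100; assuming these are left-closed `[start, end)` intervals, `pd.cut(..., right=False)` should give matching counts. A minimal sketch on made-up values:

```python
import pandas as pd

# Made-up life expectancies (years)
life_expect = pd.Series([43.5, 58.2, 61.0, 69.9, 70.0, 74.3, 78.8, 81.2])

# Bins [0, 10), [10, 20), ..., [90, 100), like alt.Bin(extent=[0, 100], step=10)
bins = pd.cut(life_expect, bins=list(range(0, 101, 10)), right=False)
counts = bins.value_counts().sort_index()

# Only the non-empty bins, i.e. the bars the histogram would draw
print(counts[counts > 0])
```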

Bar chart

bar_chart = alt.Chart(df_gm_ir).mark_bar(
    color='seagreen',
    opacity=0.6
).encode(
    x='pop:Q',
    y='year:O'
).properties(
    width=400,
    height=400,
    title="Population of Ireland"
)
# Reuse the bar chart's encoding to place a text label at the end of each bar
text = bar_chart.mark_text(
    align='left',
    baseline='middle',
    dx=3
).encode(
    text='pop:Q'
)
bar_chart + text
Bar chart

Line chart

line_chart = alt.Chart(df_stocks).mark_line().encode(
    x='date',
    y='price',
    color='symbol'
).properties(
    width=400,
    height=300,
    title="Monthly stock prices"
)
line_chart.configure_title(
    fontSize=16,
    font="Arial",
    anchor="middle",
    color="gray"
)
Line chart

Multiple scatter plots

mult_scatter_plots = alt.Chart(df_movies).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='Major_Genre:N'
).properties(
    width=150,
    height=150
).repeat(
    row=['US_Gross', 'Worldwide_Gross', 'IMDB_Rating'],
    column=['US_Gross', 'Worldwide_Gross', 'IMDB_Rating']
).interactive()
mult_scatter_plots
Multiple scatter plots

Final thoughts

Altair is a great tool for boosting your productivity in visualizing data: you only need to specify the links between data and visual encoding channels. This lets you translate your ideas directly into a plot without worrying about the time-consuming "how" part.

For more details please find

Thanks for reading, and please share your ideas about visualizing data with Altair in the comments below. To see more posts from me, please follow me on Medium and LinkedIn.

Reference

  1. Overview. Altair 4.1.0 documentation. (n.d.). https://altair-viz.github.io/getting_started/overview.html
