
3 Visualization Layers for Information-Rich Charts with Altair and Python

How to Improve your Storytelling by Adding Useful Information on Your Chart

Layer Overview (Image by the Author)

Tips and Tricks, TUTORIAL – PYTHON – ALTAIR

1. Introduction

Creating an initial chart from your data is easy, regardless of whether you use R or Python. Creating a pleasing and informative chart that is reproducible and written in clear, maintainable code involves more effort.

In this article, I will share my workflow and structure to achieve just that. I will show you how to structure your code and what building blocks I rely on. In the following section, I explain what data and visualization package I am using for this article. In section 3, I will outline the chart layering structure, while in section 4, I will provide three examples of how you may apply the layer structure. Lastly, in section 5, I conclude this article.

Example Storyline based on Gapminder Data (Image by the Author)

2. Setup

To showcase the functions, I will make use of the Gapminder data set. It contains data about life expectancy, GDP per capita, and population for each country, spanning many decades. Furthermore, I use Jupyter Notebooks in combination with Pandas.

If you wonder how I structure my code, it is called method chaining or piping (in the R community). I explain the basics in one of my other articles.
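For readers who have not seen the pattern, a minimal sketch of method chaining in Pandas could look like this. The data frame holds made-up toy rows in the shape of the Gapminder data, and the names `gapminder` and `europe_2007` are assumptions chosen for illustration.

```python
import pandas as pd

# Toy rows shaped like the Gapminder data; the values are made up.
gapminder = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "continent": ["Africa", "Africa", "Europe", "Europe"],
        "year": [2002, 2007, 2007, 2007],
        "lifeExp": [50.0, 55.0, 78.0, 80.0],
    }
)

# Each method returns a new data frame, so the steps read top to bottom.
europe_2007 = (
    gapminder
    .query("year == 2007")
    .query("continent == 'Europe'")
    .assign(lifeExp_rounded=lambda df: df["lifeExp"].round(0))
    .sort_values("lifeExp", ascending=False)
    .reset_index(drop=True)
)
```

Wrapping the chain in parentheses allows one step per line, which keeps each transformation easy to comment out or reorder.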

As you might know, I started my Data Science career using R, and hence, I am very familiar with ggplot2. As I started using Python more and more often in my customer projects, I began to look for a similar visualization package in Python. I am happy to say that with Plotnine there is an excellent alternative.

However, since I like to try new things in general and visualization packages specifically, I wanted to try out the Altair package. Altair is a Python data visualization package that follows a declarative approach, just like ggplot2 and Plotnine.

Gapminder Dataset (Image by the Author)

3. Visualization Layers and Workflow

Before introducing the three visualization layers, I will briefly mention the two data frames that are essential for this setup.

Visualization Layers and Interdependencies (Image by the Author)

The Original Data Frame is the data frame you would like to visualize. In the running examples in section 4, I use the Gapminder data set. From the Original Data Frame, I generate the Summarized Data Frame. I refer to it as "summarized" because I often calculate summaries from the original data, e.g., averages, quartiles, and counts. Please see the figure below as an example.

The Chart Layer refers to the visualization you want to prepare, e.g., a histogram, a boxplot, or a scatter plot. In the figure below, it is a histogram. The Graphical Layer holds any elements that add more information to the Chart Layer, for example, statistical summary information such as quartiles or averages displayed as horizontal or vertical lines. I use the Textual Layer to display text, such as interpretations, highlights, and statistical information.

4. Applying the Visualization Layers (Running Examples)

4.1 Histogram of Life Expectancy

For the Original Data Frame, I use the Gapminder Data, but I will only use data from the year 2007.

For the Summarized Data Frame, I summarize the data by the quantiles 0.25, 0.5 (Median), 0.75, and the mean (average).
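As a rough sketch of these two steps, the filter and the summary might be written as follows. The toy values are made up and are not real Gapminder figures, and the names `original_df`, `summary_df`, and the `statistic` column are assumptions for illustration.

```python
import pandas as pd

# Toy data in the shape of the Gapminder data; the values are made up.
gapminder = pd.DataFrame(
    {
        "country": list("ABCDEF"),
        "year": [2002, 2007, 2007, 2007, 2007, 2007],
        "lifeExp": [45.0, 50.0, 60.0, 70.0, 80.0, 90.0],
    }
)

# Original Data Frame: keep only the rows for 2007.
original_df = gapminder.query("year == 2007")

# Summarized Data Frame: one row per statistic (long format),
# which is convenient for drawing one line or label per row later.
life_exp = original_df["lifeExp"]
summary_df = pd.DataFrame(
    {
        "statistic": ["q25", "median", "q75", "mean"],
        "lifeExp": [
            life_exp.quantile(0.25),
            life_exp.quantile(0.5),
            life_exp.quantile(0.75),
            life_exp.mean(),
        ],
    }
)
```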

Gapminder Dataset Summarized (Image by the Author)

To create the Chart Layer, I use the Altair mark_bar() to create a histogram. Please note that I also set a title, subtitle, and width & height of the chart.

Chart Layer Example (Image by the Author)

I base the Graphical Layer on the Summarized Data Frame and use Altair's mark_rule() to draw vertical lines at the statistical summaries.

Graphical Layer Example (Image by the Author)

The Textual Layer is a result of the Summarized Data Frame. In Altair, you may add textual annotations using the mark_text() function. The text labels describe the elements from the Graphical Layer.

Textual Layer Example (Image by the Author)

Finally, I put all layers on top of each other and configure the layout using the configure_title() function.

All Layer Example (Image by the Author)

4.2 Boxplot of Life Expectancy by Continent

There is no change for the Original Data Frame. For the Summarized Data Frame, I summarize the data by grouping by continent and using the describe() function, then selecting only the lifeExp variable, and finally, creating additional variables with rounded values and text labels.
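A sketch of this grouped summary is shown below. The toy values are made up, the `mean_rounded` and `label` column names are assumptions, and the sketch selects lifeExp before calling describe(), which gives the same statistics as selecting it afterwards.

```python
import pandas as pd

# Toy data in the shape of the Gapminder data; the values are made up.
original_df = pd.DataFrame(
    {
        "country": list("ABCDE"),
        "continent": ["Africa", "Africa", "Europe", "Europe", "Europe"],
        "lifeExp": [50.0, 60.0, 78.0, 80.0, 82.0],
    }
)

summary_df = (
    original_df
    .groupby("continent")["lifeExp"]
    .describe()          # count, mean, std, min, 25%, 50%, 75%, max per continent
    .reset_index()       # turn the continent index back into a column
    .assign(
        mean_rounded=lambda df: df["mean"].round(1),
        label=lambda df: "n = "
        + df["count"].astype(int).astype(str)
        + ", mean = "
        + df["mean_rounded"].astype(str),
    )
)
```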

Gapminder Dataset Summarized (Image by the Author)

To create the Chart Layer, I use Altair's mark_boxplot() to create a boxplot. I also set a title, subtitle, and the width and height of the chart. Please note that alt.EncodingSortField() did not work for me.

Chart Layer Example (Image by the Author)

I base the Graphical Layer on the Summarized Data Frame and use Altair's mark_tick() to draw a vertical line at the mean for each continent.

Graphical Layer Example (Image by the Author)

The Textual Layer is a result of the Summarized Data Frame. In Altair, you may add textual annotations using the mark_text() function. The text labels describe the count of countries by continent and the mean value from the Graphical Layer.

Textual Layer Example (Image by the Author)

Finally, I put all layers on top of each other and configure the layout using the configure_title() function.

All Layer Example (Image by the Author)

4.3 Scatter/Point Plot of Life Expectancy for each African Country

For this visualization, I enhance the Original Data Frame. First, I filter for African countries. Then I create a variable named Quartile Range that describes whether the life expectancy is at or below the first quartile, at or above the third quartile, or within the interquartile range. Finally, I create a sort vector that lists all African countries arranged by life expectancy.
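These three steps could be sketched as follows. The toy values are made up, the column name `quartile_range` and the category labels are assumptions, and the descending sort direction is a guess, since the text does not state it.

```python
import numpy as np
import pandas as pd

# Toy data in the shape of the Gapminder data; the values are made up.
gapminder = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E", "F"],
        "continent": ["Africa"] * 5 + ["Europe"],
        "lifeExp": [40.0, 50.0, 60.0, 70.0, 80.0, 79.0],
    }
)

# Step 1: keep only African countries.
africa_df = gapminder.query("continent == 'Africa'")

# Step 2: classify each country relative to the first and third quartile.
q1 = africa_df["lifeExp"].quantile(0.25)
q3 = africa_df["lifeExp"].quantile(0.75)
africa_df = africa_df.assign(
    quartile_range=lambda df: np.select(
        [df["lifeExp"] <= q1, df["lifeExp"] >= q3],
        ["At or below Q1", "At or above Q3"],
        default="Interquartile range",
    )
)

# Step 3: sort vector with the countries ordered by life expectancy.
sort_vector = (
    africa_df
    .sort_values("lifeExp", ascending=False)["country"]
    .tolist()
)
```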

For the Summarized Data Frame, I summarize the data by using the describe() function, then selecting only the lifeExp variable, and finally, creating additional variables with rounded values and text labels.

Summarized Gapminder Data (Image by the Author)

To create the Chart Layer, I use the Altair mark_point() to create a scatter plot. Please note that I also set a title, subtitle, and width & height of the chart.

Chart Layer Example (Image by the Author)

I base the Graphical Layer on the Summarized Data Frame and use Altair's mark_rule() to draw vertical lines at the statistical summaries.

Graphical Layer Example (Image by the Author)

The Textual Layer is a result of the Summarized Data Frame. In Altair, you may add textual annotations using the mark_text() function. The text labels describe the elements from the Graphical Layer.

Textual Layer Example (Image by the Author)

Finally, I put all layers on top of each other and configure the layout using the configure_title() function.

All Layer Example (Image by the Author)

5. Conclusion

Creating and maintaining reproducible data visualizations is not a simple task. In this article, I shared my workflow and structure to achieve just that. I showed you how to structure your code and what building blocks I rely on. In the previous sections, I explained what data and visualization package I used. I outlined the chart layering structure and provided three examples of how you may apply the layer structure.

Please feel free to contact me with any questions, comments, and feedback. Thank you.

