
Tips and Tricks, TUTORIAL – PYTHON – ALTAIR
1. Introduction
Creating an initial chart from your data is easy, regardless of whether you use R or Python. Creating a pleasing and informative chart that is reproducible and written in clear, maintainable code takes more effort.
In this article, I will share my workflow and structure to achieve just that. I will show you how to structure your code and what building blocks I rely on. In the following section, I explain what data and visualization package I am using for this article. In section 3, I will outline the chart layering structure, while in section 4, I will provide three examples of how you may apply the layer structure. Lastly, in section 5, I conclude this article.

2. Setup
To showcase the functions, I will make use of the Gapminder data set. The Gapminder data set contains data about life expectancy, GDP per capita, and population for each country, spanning many decades. Furthermore, I use Jupyter Notebooks in combination with Pandas.
If you wonder how I structure my code, it is called method chaining or piping (in the R community). I explain the basics in one of my other articles.
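As a brief illustration of method chaining in Pandas, here is a minimal sketch. The data frame name gapminder, its column names, and the few rows in it are assumptions for illustration only; the values are placeholders, not actual Gapminder figures.

```python
import pandas as pd

# Illustrative placeholder rows, not actual Gapminder figures.
gapminder = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "continent": ["Africa", "Africa", "Europe", "Europe"],
        "year": [2002, 2007, 2002, 2007],
        "lifeExp": [50.0, 52.0, 76.0, 78.0],
    }
)

# Each method returns a new data frame, so the steps read from top to bottom.
mean_by_continent = (
    gapminder
    .query("year == 2007")
    .groupby("continent", as_index=False)
    .agg(mean_life_exp=("lifeExp", "mean"))
    .sort_values("mean_life_exp", ascending=False)
    .reset_index(drop=True)
)
```

Wrapping the chain in parentheses lets every step sit on its own line, which makes it easy to comment out or reorder a single step.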
As you might know, I started my Data Science career using R, and hence, I am very familiar with ggplot2. As I started using Python more and more often in my customer projects, I began to look for a similar visualization package in Python. I am happy to say that with Plotnine there is an excellent alternative.
However, since I like to try new things in general and visualization packages specifically, I wanted to try out the Altair package. Altair is a Python data visualization package that follows a declarative approach, just like ggplot2 and Plotnine.

3. Visualization Layers and Workflow
Before introducing the three visualization layers, I will briefly mention the two data frames that are essential for this setup.

The Original Data Frame is the data frame you would like to visualize. In the running examples in section 4, I use the Gapminder data set. From the original data frame, I generate the Summarized Data Frame. I refer to it as "summarized" as I often calculate summaries from the original data, e.g., averages, quartiles, and counts. Please see the figure below as an example.
The Chart Layer refers to the visualization you want to prepare, e.g., a histogram, a boxplot, or a scatter plot. In the figure below, it is a histogram. The Graphical Layer holds any elements that add more information to the Chart Layer, for example, statistical summary information such as quartiles or averages displayed as horizontal or vertical lines. I use the Textual Layer to display text, such as interpretations, highlights, and statistical information.
4. Applying the Visualization Layers (Running Examples)
4.1 Histogram of Life Expectancy
For the Original Data Frame, I use the Gapminder Data, but I will only use data from the year 2007.
For the Summarized Data Frame, I compute the quantiles 0.25, 0.5 (median), and 0.75, as well as the mean (average), of the life expectancy.
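The two data frames could be prepared roughly as follows. This is a minimal sketch, assuming a data frame named gapminder with the columns year and lifeExp. The names original_df and summary_df and the label column are my own choices for illustration, and the rows are placeholders, not actual Gapminder figures.

```python
import pandas as pd

# Illustrative placeholder rows, not actual Gapminder figures.
gapminder = pd.DataFrame(
    {
        "country": ["A", "A", "B", "C", "D", "E"],
        "year": [2002, 2007, 2007, 2007, 2007, 2007],
        "lifeExp": [40.0, 44.0, 52.0, 60.0, 68.0, 81.0],
    }
)

# Original Data Frame: only the observations from 2007.
original_df = gapminder.query("year == 2007")

# Summarized Data Frame: one row per statistic, plus a text label
# that the Textual Layer can display later.
quartiles = original_df["lifeExp"].quantile([0.25, 0.5, 0.75])
summary_df = pd.DataFrame(
    {
        "statistic": ["Q1", "Median", "Q3", "Mean"],
        "value": [*quartiles.tolist(), original_df["lifeExp"].mean()],
    }
).assign(label=lambda d: d["statistic"] + ": " + d["value"].round(1).astype(str))
```

Keeping one row per statistic (a "long" layout) means a single Altair mark can draw all summary lines at once.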

To create the Chart Layer, I use the Altair mark_bar() to create a histogram. Please note that I also set a title, subtitle, and width & height of the chart.

I base the Graphical Layer on the Summarized Data Frame and use Altair's mark_rule() to draw vertical lines at the statistical summaries.

The Textual Layer is also built from the Summarized Data Frame. In Altair, you may add textual annotations using the mark_text() method. The text labels describe the elements from the Graphical Layer.

Finally, I put all layers on top of each other and configure the layout using the configure_title() function.

4.2 Boxplot of Life Expectancy by Continent
There is no change for the Original Data Frame. For the Summarized Data Frame, I summarize the data by grouping by continent and using the describe() function, then selecting only the lifeExp variable, and finally, creating additional variables with rounded values and text labels.
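The grouped summary could be computed roughly like this. It is a sketch under assumptions: the data frame original_df, its placeholder rows, and the names mean_rounded and label are mine, and I select lifeExp before calling describe(), which gives the same statistics for that variable.

```python
import pandas as pd

# Illustrative placeholder rows, not actual Gapminder figures.
original_df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "continent": ["Africa", "Africa", "Europe", "Europe"],
        "lifeExp": [50.0, 54.0, 76.0, 80.0],
    }
)

summary_df = (
    original_df
    .groupby("continent")["lifeExp"]
    .describe()          # count, mean, std, min, quartiles, max per continent
    .reset_index()
    .assign(
        mean_rounded=lambda d: d["mean"].round(1),
        label=lambda d: "n = "
        + d["count"].astype(int).astype(str)
        + ", mean = "
        + d["mean_rounded"].astype(str),
    )
)
```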

To create the Chart Layer, I use Altair's mark_boxplot() to create a boxplot. Again, I set a title, subtitle, and the width and height of the chart. Please note that sorting via alt.EncodingSortField() did not work for me.

I base the Graphical Layer on the Summarized Data Frame and use Altair's mark_tick() to draw a vertical line at the mean for each continent.

The Textual Layer is also built from the Summarized Data Frame. In Altair, you may add textual annotations using the mark_text() method. The text labels describe the count of countries by continent and the mean value from the Graphical Layer.

Finally, I put all layers on top of each other and configure the layout using the configure_title() function.

4.3 Scatter/Point Plot of Life Expectancy for each African Country
For this visualization, I enhance the Original Data Frame. First, I filter for African countries. Then I create a variable named Quartile Range that describes whether the life expectancy is less than or equal to the first quartile, greater than or equal to the third quartile, or within the interquartile range. Finally, I create a sort vector that lists all African countries arranged by life expectancy.
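These three steps could look roughly like the sketch below. The data frame gapminder, its placeholder rows, the column name quartile_range, the category labels, and the name country_order are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative placeholder rows, not actual Gapminder figures.
gapminder = pd.DataFrame(
    {
        "country": ["C", "A", "E", "B", "D", "F"],
        "continent": ["Africa", "Africa", "Africa", "Africa", "Africa", "Europe"],
        "year": [2007, 2007, 2007, 2007, 2007, 2007],
        "lifeExp": [60.0, 44.0, 81.0, 52.0, 68.0, 78.0],
    }
)

# Step 1: keep only the African countries in 2007.
africa_df = gapminder.query("continent == 'Africa' and year == 2007")

# Step 2: classify each country relative to the first and third quartile.
q1, q3 = africa_df["lifeExp"].quantile([0.25, 0.75])
africa_df = africa_df.assign(
    quartile_range=lambda d: np.select(
        [d["lifeExp"] <= q1, d["lifeExp"] >= q3],
        ["<= Q1", ">= Q3"],
        default="Within IQR",
    )
)

# Step 3: the sort vector, i.e. the countries ordered by life expectancy.
country_order = africa_df.sort_values("lifeExp")["country"].tolist()
```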
For the Summarized Data Frame, I summarize the data by using the describe() function, then selecting only the lifeExp variable, and finally, creating additional variables with rounded values and text labels.

To create the Chart Layer, I use the Altair mark_point() to create a scatter plot. Please note that I also set a title, subtitle, and width & height of the chart.

I base the Graphical Layer on the Summarized Data Frame and use Altair's mark_rule() to draw vertical lines at the statistical summaries.

The Textual Layer is also built from the Summarized Data Frame. In Altair, you may add textual annotations using the mark_text() method. The text labels describe the elements from the Graphical Layer.

Finally, I put all layers on top of each other and configure the layout using the configure_title() function.

5. Conclusion
Creating and maintaining reproducible data visualizations is not a simple task. In this article, I shared my workflow and structure to achieve just that. I showed you how to structure your code and what building blocks I rely on. In the previous sections, I explained what data and visualization package I used. I outlined the chart layering structure and provided three examples of how you may apply the layer structure.
Please feel free to contact me with any questions, comments, and feedback. Thank you.