Enhanced Tabular Data Visualization (Pandas)

Simple but efficient techniques to improve pandas dataframe representation

From Pixabay

In this article, we’ll discuss some useful options and functions to efficiently visualize dataframes as a set of tabular data in pandas. Let’s start with creating a dataframe for our further experiments:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(20, 40))
# Renaming columns
df.columns = list('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN')
# Adding some missing values
df.iloc[3,4] = np.nan
df.iloc[2,0] = np.nan
df.iloc[4,5] = np.nan
df.iloc[0,6] = np.nan
df.head()
Image by Author

Note: the code in this article was run with pandas 1.3.2. Some of the functions are quite new and will raise an error in older versions.

Customizing the number of displayed columns and float precision

Looking at the dataframe above, we might want to fix 2 things:

  1. Display all the columns of the dataframe. For now, the columns from k to D inclusive are hidden.
  2. Limit the precision of float values.

Let’s check the default values for the number of columns displayed and the float precision:

print(pd.options.display.max_columns)
print(pd.options.display.precision)
Output:
20
6

We want to display all the columns (len(df.columns)) and have a precision of 2 decimal places, so we have to re-assign both options:

pd.options.display.max_columns = len(df.columns)
pd.options.display.precision = 2
print(pd.options.display.max_columns)
print(pd.options.display.precision)
df.head()
Output:
40
2
Image by Author
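
Re-assigning pd.options changes the display settings for the whole session. When the change is needed only for a single output, pd.option_context() applies it temporarily. The snippet below is a minimal sketch of that idea; the frame wide and the other variable names are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame (not the article's df): 3 rows, 40 columns.
rng = np.random.default_rng(0)
wide = pd.DataFrame(rng.standard_normal((3, 40)))

default_max_columns = pd.get_option("display.max_columns")

# Inside the block: no column limit (None) and 2 decimal places.
with pd.option_context("display.max_columns", None, "display.precision", 2):
    inside = (
        pd.get_option("display.max_columns"),
        pd.get_option("display.precision"),
    )
    print(wide)

# Outside the block, the previous value is back in effect.
restored_max_columns = pd.get_option("display.max_columns")
```

To undo a global assignment instead, pd.reset_option("display.max_columns") restores the default value of a single option.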

Suppressing scientific notation

Now that we can see the values of all the columns in a more digestible form, another issue appears: some float numbers are displayed in scientific notation (e.g. 7.85e-1 instead of 0.785). To fix it, we explicitly assign the necessary format (in our case, 2 decimal places) to the float_format option:

pd.options.display.float_format = '{:.2f}'.format
df.head()
Image by Author
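
One side effect of a fixed-point format is worth keeping in mind: values smaller than the chosen precision are shown as 0.00. The sketch below illustrates this on a small, made-up Series (values is a hypothetical name), applying the format only temporarily through pd.option_context().

```python
import pandas as pd

# Hypothetical sample values: one tiny, one ordinary, one large.
values = pd.Series([7.85e-05, 0.781, 1234.5])

# Apply the fixed-point format only while building this one string.
with pd.option_context("display.float_format", "{:.2f}".format):
    fixed_text = repr(values)

print(fixed_text)
```

The tiny first value is rendered as 0.00, so a fixed-point format trades away magnitude information for readability. A global assignment can be undone with pd.reset_option("display.float_format").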

Hiding the index and header

For our next experiments, let’s slice out a smaller dataframe (df1) from the main one (df):

df1 = df.iloc[:5,:8]
df1
Image by Author

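We may want to hide the index or the header (or both) of the dataframe. For this purpose, we create an instance of the Styler class through the DataFrame.style attribute and call its hide_index() or hide_columns() method, respectively. We’ll use this attribute in all the following experiments. Note that both methods were deprecated in pandas 1.4 in favor of Styler.hide(axis='index') and Styler.hide(axis='columns'), and they are no longer available in pandas 2.0.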

df1.style.hide_index()
Image by Author
df1.style.hide_columns()
Image by Author

Chaining these methods (df1.style.hide_index().hide_columns()) will hide both the index and header. We can also notice that NaN values are displayed as nan when using DataFrame.style attribute.

Always displaying the index or header

In other cases, we might want just the opposite: to keep the index (or the header) always visible when scrolling through the dataframe. This is especially convenient for large dataframes, so let’s return to our original df. The method to use here is set_sticky(). Depending on what we want to stick, we pass in axis='index' or axis='columns':

df.style.set_sticky(axis='index')
Image by Author
df.style.set_sticky(axis='columns')
Image by Author

Highlighting particular values: null, minimum, maximum, values from a certain range

To highlight null values, we can use a built-in function highlight_null():

df1.style.highlight_null()
Image by Author

The default color is red, but we can change it with the optional null_color parameter (in pandas 2.0 and later, this parameter is called color). It’s also possible to highlight the null values of only one or several selected columns. We use the subset parameter for this, passing in the name of the column or a list of names:

df1.style.highlight_null(null_color='lime', subset=['e', 'g'])
Image by Author

To highlight minimum and maximum values in each column of the dataframe, we can apply the methods highlight_min() and highlight_max():

df1.style.highlight_min()
Image by Author
df1.style.highlight_max()
Image by Author

The default color can be changed with the optional color parameter. Here too, the subset parameter selects one or several columns in which to highlight the minimum or maximum values. And of course, we can chain both methods:

df1.style.highlight_min(color='cyan', subset='d').highlight_max(color='magenta', subset='d')
Image by Author

By default, the minimum and maximum values are displayed by column. If we need such information by row, we have to specify axis='columns':

df1.style.highlight_min(axis='columns')
Image by Author

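In this case, if we want to select only one or several rows instead of the whole dataframe, we pass the corresponding row label to subset. For several rows, a selector built with pd.IndexSlice (e.g. pd.IndexSlice[[1, 3], :]) keeps the selection unambiguous, since a plain list passed to subset is interpreted as column labels.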

Finally, it’s possible to highlight the values from a selected range using the highlight_between() method. Apart from the already familiar parameters color and subset, we have to assign the left and/or right parameters and, optionally, inclusive, which is 'both' by default (other possible values are 'neither', 'left', or 'right'):

df1.style.highlight_between(left=-0.1, right=0.1, inclusive='neither')
Image by Author

Displaying dataframe values as a heatmap

There are two interesting methods for coloring the cells, or the text inside them, in a gradient, heatmap-like style based on the numeric scale of their values: background_gradient() and text_gradient(). Both methods require matplotlib to be installed (we don’t need to import it ourselves).

df1.style.background_gradient()
Image by Author
df1.style.text_gradient()
Image by Author

Apart from subset, we can tune the following parameters:

  • cmap – a matplotlib colormap ('PuBu' by default),
  • axis – coloring the values column-wise (axis='index', the default), row-wise (axis='columns'), or across the whole dataframe (axis=None),
  • low, high – extending the range of the gradient at the low/high ends based on the corresponding fraction of the original data-based range,
  • vmin, vmax – defining a data value that corresponds to the colormap minimum/maximum value (by default, it will be the min/max data value),
  • text_color_threshold – only used in background_gradient(), a luminance threshold between 0 and 1 that determines whether the text is rendered light or dark, to enhance its visibility against the cell background color (by default, 0.408).

Let’s try to adjust some of these parameters:

df1.style.text_gradient(cmap='cool', subset=3, axis='columns', vmin=-2)
Image by Author

Conclusion

There are many other ways to flexibly customize table visualization in Python: applying more advanced text formatting, controlling data slicing, changing the text font, modifying cell borders, assigning hover effects, etc. In general, we can apply any function with custom logic and, through the DataFrame.style attribute, use a great variety of CSS styling elements. In this article, we considered some of the most common tasks, those frequent enough to have dedicated built-in functions with a simple and clear syntax yet highly customizable output.

Thanks for reading!


You may also find these articles interesting:

When a Python Gotcha Leads to Wrong Results

How To Read Your Horoscope in Python

Generating a Word Cloud in Python for "The Little Prince"
