In this article, we’ll discuss some useful options and functions to efficiently visualize dataframes as a set of tabular data in pandas. Let’s start with creating a dataframe for our further experiments:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(20, 40))
# Renaming columns
df.columns = [x for x in 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN']
# Adding some missing values
df.iloc[3,4] = np.nan
df.iloc[2,0] = np.nan
df.iloc[4,5] = np.nan
df.iloc[0,6] = np.nan
df.head()

Attention: the code from this article was run in pandas version 1.3.2. Some of the functions are quite new and will throw an error in the older versions.
Customizing the number of displayed columns and float precision
Looking at the dataframe above, we might want to fix 2 things:
- Display all the columns of the dataframe. For now, the columns from k to D inclusive are hidden.
- Limit the precision of float values.
Let’s check the default values for the number of columns displayed and the float precision:
print(pd.options.display.max_columns)
print(pd.options.display.precision)
Output:
20
6
We want to display all the columns (len(df.columns)) and have a precision of 2 decimal places, so we have to re-assign both options:
pd.options.display.max_columns = len(df.columns)
pd.options.display.precision = 2
print(pd.options.display.max_columns)
print(pd.options.display.precision)
df.head()
Output:
40
2
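Assigning to pd.options changes the settings for the rest of the session. When the change is needed for a single display only, pd.option_context can be used instead, since it restores the previous values on exit. A minimal sketch, where demo is a made-up stand-in for df:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the article's df: 3 rows, 40 columns.
demo = pd.DataFrame(np.random.default_rng(0).standard_normal((3, 40)))

before = pd.get_option("display.max_columns")

# The two options are changed only inside the with-block.
with pd.option_context("display.max_columns", len(demo.columns),
                       "display.precision", 2):
    inside = pd.get_option("display.max_columns")
    print(demo)

# On exit, the previous value is back in place.
after = pd.get_option("display.max_columns")
print(before, inside, after)
```

This keeps the global defaults untouched, which can be convenient in shared notebooks.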

Suppressing scientific notation
Now that we can see the values in all the columns and in a more digestible form, another issue has appeared: some float numbers are displayed in scientific notation (e.g., 7.85e-1 instead of 0.785). To fix this, we should explicitly assign the necessary format (in our case, 2 decimal places) to the float_format option:
pd.options.display.float_format = '{:.2f}'.format
df.head()
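To see the effect in isolation, here is a small sketch with a made-up frame (demo is not part of the original example) that applies the same format temporarily and captures the printed representation:

```python
import pandas as pd

# Hypothetical frame with one very small and one large value.
demo = pd.DataFrame({"x": [0.000012345, 1234.5]})

# Apply the fixed two-decimal format only inside the with-block.
with pd.option_context("display.float_format", "{:.2f}".format):
    fixed = repr(demo)
print(fixed)

# Restore the default float formatting if it was changed globally.
pd.reset_option("display.float_format")
```

Note that a fixed format rounds very small values to 0.00, so it hides their magnitude; whether that is acceptable depends on the data.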

Hiding the index and header
For our next experiments, let’s slice out a smaller dataframe (df1) from the main one (df):
df1 = df.iloc[:5,:8]
df1

We may want to hide the index or the header (or both) of the dataframe. For this purpose, we should create an instance of the Styler class by using the DataFrame.style attribute and apply to it the methods hide_index() or hide_columns(), respectively. We’ll use this attribute for all our next experiments.
df1.style.hide_index()

df1.style.hide_columns()

Chaining these methods (df1.style.hide_index().hide_columns()) will hide both the index and header. We can also notice that NaN values are displayed as nan when using the DataFrame.style attribute.
Always displaying the index or header
In other cases, we might want just the opposite: to always keep the index (or the header) visible when scrolling through the dataframe. It’s especially convenient for large dataframes, so let’s return to our original df. The method to use here is set_sticky(). Depending on what we want to stick, we should pass in axis='index' or axis='columns':
df.style.set_sticky(axis='index')

df.style.set_sticky(axis='columns')

Highlighting particular values: null, minimum, maximum, values from a certain range
To highlight null values, we can use the built-in method highlight_null():
df1.style.highlight_null()

The default color is red, but we can change it by passing in the optional parameter null_color. It’s also possible to highlight the null values of only one or several selected columns (or rows). We use the subset parameter for this, passing in the name of the column (or the row index) or a list of column names:
df1.style.highlight_null(null_color='lime', subset=['e', 'g'])

To highlight the minimum and maximum values in each column of the dataframe, we can apply the methods highlight_min() and highlight_max():
df1.style.highlight_min()

df1.style.highlight_max()

The default color can be changed by passing in the optional parameter color. Here, too, we can use the subset parameter to select only one or several columns in which to highlight the minimum or maximum values. And of course, we can chain both methods:
df1.style.highlight_min(color='cyan', subset='d').highlight_max(color='magenta', subset='d')

By default, the minimum and maximum values are displayed by column. If we need such information by row, we have to specify axis='columns':
df1.style.highlight_min(axis='columns')

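In this case, if we want to select only one or several rows instead of the whole dataframe, we should pass in the corresponding value for subset: a single row index or, for several rows, a two-dimensional selection such as pd.IndexSlice[[0, 2], :] (a plain list is interpreted as column labels).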
Finally, it’s possible to highlight the values from a selected range using the highlight_between() method. Apart from the already familiar parameters color and subset, we have to assign the left and/or right parameters and, optionally, inclusive, which is 'both' by default (other possible values are 'neither', 'left', or 'right'):
df1.style.highlight_between(left=-0.1, right=0.1, inclusive='neither')
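The meaning of inclusive can be illustrated with plain boolean masks. This is only a sketch of which values fall into the range for two of the settings, not the way the Styler implements it, and values is a made-up series:

```python
import pandas as pd

# Hypothetical values, two of them lying exactly on the bounds.
values = pd.Series([-0.1, -0.05, 0.0, 0.1, 0.3])
left, right = -0.1, 0.1

# inclusive='both': the bounds themselves belong to the range.
both = values[(values >= left) & (values <= right)].tolist()

# inclusive='neither': the bounds are excluded.
neither = values[(values > left) & (values < right)].tolist()

print(both)
print(neither)
```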

Displaying dataframe values as a heatmap
There are two curious methods for highlighting the cells or the text inside them in a gradient, heatmap-like style, based on a numeric scale of their values: background_gradient() and text_gradient(). Both methods require matplotlib to be installed (though not necessarily imported).
df1.style.background_gradient()

df1.style.text_gradient()

Apart from subset, we can tune the following parameters:
- cmap – a matplotlib colormap ('PuBu' by default),
- axis – coloring the values column-wise (axis='index', the default), row-wise (axis='columns'), or across the whole dataframe (axis=None),
- low, high – extending the range of the gradient at the low/high ends based on the corresponding fraction of the original data-based range,
- vmin, vmax – defining a data value that corresponds to the colormap minimum/maximum value (by default, it will be the min/max data value),
- text_color_threshold – only used in background_gradient(), determines the light/dark change of text color to enhance text visibility across cell background colors (by default, 0.408).
Let’s try to adjust some of these parameters:
df1.style.text_gradient(cmap='cool', subset=3, axis='columns', vmin=-2)
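To get a feeling for how low, high, vmin, and vmax shift the colors, the scaling can be approximated with a few lines of NumPy. This is a rough sketch of our understanding of the parameters, not the actual pandas code, and gradient_position is a hypothetical helper:

```python
import numpy as np

def gradient_position(values, low=0.0, high=0.0, vmin=None, vmax=None):
    """Map values to positions on a colormap, 0 = start, 1 = end."""
    values = np.asarray(values, dtype=float)
    smin = np.nanmin(values) if vmin is None else vmin
    smax = np.nanmax(values) if vmax is None else vmax
    span = smax - smin
    # low/high widen the range by a fraction of the original span.
    lower = smin - span * low
    upper = smax + span * high
    return (values - lower) / (upper - lower)

row = [-1.0, 0.0, 1.0]
default_pos = gradient_position(row)           # uses the data min/max
shifted_pos = gradient_position(row, vmin=-2)  # colormap starts at -2
print(default_pos, shifted_pos)
```

With vmin=-2, the smallest value in the row no longer maps to the start of the colormap, so the colors at the low end go unused.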

Conclusion
There are many other ways to flexibly customize table visualization in pandas: applying more advanced text formatting, controlling data slicing, changing the text font, modifying cell boundary properties, assigning hover effects, etc. In general, it’s possible to apply whatever function we need with any custom logic and, when using the DataFrame.style attribute, use a great variety of CSS styling elements. In this article, we considered some of the most common tasks, which are needed so often that built-in methods were created for them, with a simple and clear syntax yet highly customizable outputs.
Thanks for reading!
You may also find these articles interesting:
When a Python Gotcha Leads to Wrong Results