The world’s leading publication for data science, AI, and ML professionals.

A Quick and Easy Guide to Conditional Formatting in Pandas

Discover how you can apply conditional formatting on Pandas DataFrames

Photo by Small Business Computing
Photo by Small Business Computing

The Pandas library in Python has been predominantly used for data manipulation and analysis, but did you know that Pandas also allows for Conditional Formatting of DataFrames?

Conditional formatting is a feature that allows you to apply specific formatting to cells that fulfill certain conditions. It is common in spreadsheet applications like Microsoft Excel and it helps to draw viewer’s attention to important data points and values. It also allows you to visually dissect a dataset based on colors, making it easier to work with large datasets.

In this article, I will provide a simple guide on how you can apply conditional formatting on Pandas DataFrames. The codes presented in this article can be found as a notebook at this GitHub repo.

The dataset

In this article, we will be using the built-in Iris dataset from the seaborn package. For simplicity, we will take a random sample of 10 observations.

Image by Author
Image by Author

A primer on Pandas DataFrame Styler

Before we dive deeper, it is necessary to first introduce the concept of the Pandas DataFrame Styler. Pandas has a DataFrame.style **** property, which returns a Styler object. It is this Styler object that enables us to access and modify various styling properties of a Pandas DataFrame. Calling .style on our DataFrame df will display the DataFrame as-is, but it is displayed as a pandas.io.formats.style.Styleobject rather than the usual pandas.core.frame.DataFrame.

Image by Author
Image by Author

Observe that calling .style on df changes the precision of the floats. It doesn’t actually affect anything when we implement conditional formatting so we will leave it as it is. However, if you’re particular about this, feel free to use the code below to configure the precision accordingly.

Image by Author
Image by Author

Under-the-hood, the Styler object uses Cascading Style Sheets (CSS) to customise various parameters influencing the display of the DataFrame. This is done by passing style functions into the Styler object using .apply() or .applymap(). The output of style functions should therefore be strings containing CSS attribute-value pair, in the 'attr: value' format. If nothing is to be applied, the functions should return an empty string ''or None.

For more details on the DataFrame.style property and the Styler object, check out the following links from Pandas official documentation:

Conditional cell highlighting

One way to conditionally format your Pandas DataFrame is to highlight cells which meet certain conditions. To do so, we can write a simple function and pass that function into the Styler object using .apply() or .applymap():

  • .applymap(): applies a function to the DataFrame element-wise;
  • .apply(): applies a function to the DataFrame row-wise or column-wise.

Let’s now explore a few scenarios.

1. Highlighting any cells that meet a condition

Suppose we want to highlight all cells that have a value of 5.1 in our DataFrame df with a yellow background. We can write a simple function and use .applymap() in the following way:

Image by Author
Image by Author

2. Highlighting any cells that do not meet a condition

Notice from the previous color snippet that color was set to '' if the condition was not met. Instead of an empty string, we can certainly set a color if we also want to highlight cells that do not meet the condition. Here, we will keep cells with a value of 5.1 yellow, and highlight cells with a value other than 5.1 pastel blue. Do note that when setting a color, you can use any valid HTML/CSS color names or a hexadecimal color code. For purpose of demonstration, we will use a hexadecimal color code.

Image by Author
Image by Author

We can take one step further and generalise the style function. Doing so means that we can pass in as arguments to the function, the colors to highlight the cells with if the condition is true and if the condition is false. The code snippet below will display the same output as above.

3. Highlighting cells that meet a condition in selected columns only

What if we only want to apply conditional highlighting on certain columns, instead of the entire DataFrame? You can do so by passing a list of column names into the subset argument in .applymap().

In the code below, we are applying conditional formatting only on the "sepal_length" and "petal_length" columns.

Image by Author
Image by Author

4. Highlighting rows based on categorical values

We can also highlight rows of a DataFrame based on values in a categorical column. This would allow us to visually segment the dataset, which is especially helpful when working with large datasets. Here, if we want to highlight rows depending on the species of flower, we can do this:

Image by Author
Image by Author

Notice that there are a few key differences in the above code:

  • First, the style function, highlight_rows(), now takes in each row as an argument, as opposed to the previous highlight_cells() function which takes in each cell value as an argument.
  • Second, since we are applying a style function row-wise, we use .apply() with axis=1 instead of .applymap().
  • Third, because we are applying the function row-wise, the output of the style function is a list of CSS strings, instead of a single string.

5. Highlighting columns based on column names

We can also highlight cells based on column names. We can define the style function to take in each column as an argument instead, and then apply that function column-wise by specifying axis=0.

Suppose we want to highlight the "species" column in a different color as the other columns, we can do this:

Image by Author
Image by Author

Conditional text formatting

Another way in which we may want to format your DataFrame is to customise the text in each cell. The scenarios covered in the previous section can also be applicable to text formatting.

If you have realised the pattern, the CSS attribute-value pair for cell highlighting that we have configured takes the following format: background-color: <set color>. We can apply other types of conditional formatting by specifying different CSS attribute-value pair formats. Here are some common attributes we can customise on Styler objects and their corresponding CSS string formats:

  1. Font color: 'color: <set color>'
  2. Font type: 'font-family: <set font type>'
  3. Font size: 'font-size: <set font size>'
  4. Font weight: 'font-weight: <set font weight>'

We can even specify more than one type of attribute in a single style function by chaining them with a semicolon, like this: "background-color: 'yellow'; color: 'blue'; font-size: '15px'". Of course, this would only apply to a single condition. If we want to specify different types of formatting for different conditions, we would need to define separate style functions.

Exporting as an Excel file

Now, having done all those conditional formatting in Pandas, the next thing we might want to do is to export the DataFrame as an Excel file. Of course, we would want to retain all the formatting in the exported Excel file. We can do this using the .to_excel() method, specifying a .xlsx filename and the engine (either openpyxl or xlsxwriter).

Image by Author
Image by Author

Putting it all together

A lot has been covered so far. Let’s put these ideas into context through a concrete example. Suppose we want to format our DataFrame df in the following ways:

  • highlight rows based on the species of flower;
  • set font color as red and font weight as bold when sepal length or sepal width is between 3.5mm and 5.5mm;
  • set font size as 15px and font type as cursive when petal length or petal width is between 1.5mm and 3.5mm.

Let’s work through this example step by step:

Step 1: Highlight rows based on species of flowers

We have done this previously, with the highlight_rows() style function.

Image by Author
Image by Author

Step 2: Set font color and weight when sepal length or width is between 3.5mm and 5.5mm

Since we are interested in values of sepal length and sepal width, we can define a style function that takes in a cell value as an argument, and then use .applymap()while specifying the columns of interest using the subset argument.

Here, we are formatting font color and font weight, so the CSS string that the style function should return needs to be in the 'color: <set color>; font-weight: <set font weight>' format. Here’s the code and the output:

Image by Author
Image by Author

Step 3: Set font size and type when petal length or width is between 1.5mm and 3.5mm

Similarly, we define another style function to return a CSS string in the 'font-size: <set font size>; font-family: <set font type>' format. Notice that we can apply different formatting for different conditions by simply chaining methods together.

Image by Author
Image by Author

Step 4: Export as an Excel file

Okay, our DataFrame is starting to look odd, so we shall stop formatting it. I’m sure you get the drift. Let’s do one final thing – exporting it as an Excel file.

Image by Author
Image by Author

There appears to be some issues retaining the font type formatting, but the rest of the formatting looks fine.

Other resources

Since DataFrame.style uses CSS under-the-hood, we can conditionally format DataFrames in many other ways. Pandas also offers other interesting features. Do check out the following resources to learn more:

Conclusion

Congratulations! You have learned the basics of conditional formatting in Pandas. A quick recap – You have understood the role of the Pandas DataFrame Styler object in customising how a DataFrame should be displayed. You have learned how to use CSS in the 'attr: value' format to specify various settings, including color of cell highlights, font color, font size and font types.

You have also seen how to export a Stylerobject as an Excel file in a way that retains the formatting. There are plenty of other ways you can conditionally format your DataFrames. Nothing is stopping you from experimenting further, creating more complex formatting and having some fun along the way!


Before you go…

If this article has been of value and you wish to support me as a writer, do consider signing up for a Medium membership. It’s $5 a month, and the fees directly support me and other writers whose articles you read. As a member, you get unlimited access to stories published on Medium. If you sign up using this link, I’ll get to earn a small commission. Feel free to also join my email list if you wish to be notified whenever I publish.


Related Articles