
Customize Your Pandas Data Frame For Effective Data Analysis

Boost your analysis with Pandas Styler!

Photo by Daniel Kuruvilla on Unsplash

Introduction

Pandas is a powerful package for data manipulation and analytics, and I believe this library is not new to many of you. However, when I analyze data in tabular form, all I see is plain numbers, which makes it hard to notice patterns in the values. It would be nice to apply visual components or conditional formatting to my Pandas data frame.

Fortunately, I found the Styler object. This table visualization tool makes a data frame more appealing, more informative, and easier to scan in many cases.

What is the Styler Object?

We can style our table after the data has been loaded into a DataFrame. Under the hood, Styler builds an HTML table and applies CSS to control attributes such as colors, backgrounds, and fonts. The DataFrame.style property returns a Styler object, which renders automatically in Jupyter Notebook because it defines a _repr_html_ method.

Let’s see what Styler can do with our data table.

First, let's create a simple dataset so I can explain this tool in detail. My dataset contains product names, their prices, and the people who purchased them.

Figure 1: DataFrame - Image by Author
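The article does not show the code that builds this table, so the sketch below is only a stand-in with the same three columns. Every value is invented for illustration; the rows are chosen so that Rose's total is 669, Jen spends the least, and Mike's product name is missing, which is what the text describes.

```python
import pandas as pd

# Illustrative values only; these are NOT the author's actual rows
df = pd.DataFrame(
    {
        "product": ["Laptop", "Mouse", "Keyboard", None, "Monitor"],
        "price": [599, 70, 45, 120, 210],
        "owner": ["Rose", "Rose", "Jen", "Mike", "Tom"],
    }
)
print(df)
```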

Hiding column

If I do not need the "owner" column and want to temporarily hide it, I can use the hide_columns() method.

df.style.hide_columns('owner')

As you can see, the result below has the "owner" column hidden.

Output:

Figure 2: Owner Column Hidden - Image by Author

Highlighting null values

Noticing null values is important in analytics. To spot them more easily, we can highlight them with the highlight_null method. For instance, in my dataset, the product name for Mike's purchase is missing.

df.style.highlight_null(null_color="yellow")

Output:

Figure 3: Highlight null values - Image by Author

Highlighting min/max values

If I want to find out who purchases the most or least, I can highlight the minimum/maximum purchased amount in the list with highlight_min and highlight_max.

Let’s see who spends the most:

df.groupby('owner').agg({'price':'sum'}).reset_index().style.highlight_max(color = 'green') 

Output:

Figure 4: Highlighting maximum value - Image by Author.

With a total spend of 669, Rose is the person who purchased the most.

Similarly, by applying highlight_min, it is obvious that Jen is the one who spent the least.

df.groupby('owner').agg({'price':'sum'}).reset_index().style.highlight_min(color = 'red')

Output:

Figure 5: Highlighting minimum value - Image by Author.

Highlighting quantile

You can also highlight values defined by a quantile. For example, here I want to highlight the product prices at or above the 90th percentile. I can do this with highlight_quantile:

df[['price']].style.highlight_quantile(q_left=0.9, axis=None, color='yellow') 

Output:

Figure 6: Highlight 90% quantile values - Image by Author

Creating heatmap

Heatmaps help display values using color shades. The more intense the color shades, the greater the values. You can choose to customize your heatmap with background gradient or text gradient methods.

For example, to create a heatmap for a dataset including values in 5 different categories A, B, C, D, E:

Figure 7: Dataset df3 - Image by Author
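The article does not show how df3 was built. As a stand-in, the sketch below generates a frame with the same five columns; the row count, the random seed, and the normal distribution are all assumptions, chosen only so the table contains both positive and negative values.

```python
import numpy as np
import pandas as pd

# Assumed shape and distribution; not the author's actual data
rng = np.random.default_rng(seed=0)
df3 = pd.DataFrame(rng.normal(size=(10, 5)).round(2), columns=list("ABCDE"))
print(df3.head())
```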

First, we can apply style.background_gradient. We can even combine it with seaborn to get a nicer color palette.

import seaborn as sns
cm = sns.light_palette("pink", as_cmap=True)
df3.style.background_gradient(cmap=cm) 

Output:

Figure 8: Heatmap gradient background - Image by Author

If we do not want to customize our table with background colors, we may consider using the text_gradient method.

cm = sns.light_palette("green", as_cmap=True)
df3.style.text_gradient(cmap=cm)

Output:

Figure 9: Text gradient - Image by Author

Creating bar chart

You can center df.style.bar on zero and pass a list of [color_negative, color_positive] to set the bar colors.

I will display my values as negative/positive bar charts. This makes it easy to tell the positive and negative values apart.

df3.style.bar(subset=['A', 'B'], align="zero", color=['pink', 'lightgreen'])

Output:

Figure 10: Positive and Negative bar chart

Adding caption

In an analytic report, there are usually a lot of data frames. In order to know what each data frame describes and distinguish them from each other, captions can be added for more detailed information.

For example, I want to name my heatmap table "Chi Nguyen Medium Data". I can do this with the set_caption method of the Styler:

df3.style.set_caption("Chi Nguyen Medium Data").background_gradient()

Output:

Figure 11: Caption added - Image by Author

Styling with your own method

Even though there are many available methods to style your data frame, in some cases you will need your own customized function to fulfill your task. Let’s try with an example to see how you can create your function.

In my dataset, I want to flag every value that is greater than its column mean. I will highlight those values in light blue and the remaining values in pink. Here is how I define and apply my function.

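# Define function: receives one column (a Series) and returns one CSS string per cell
def mean_greater(m):
    greater = m > m.mean()
    return ['background-color: lightblue' if i else 'background-color: pink' for i in greater]
# Apply function column by column (axis=0 is the default)
df3.style.apply(mean_greater)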

Output:

Figure 12: Highlight values
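Styler.apply calls the function once per column by default and expects back one CSS string per cell. A quick way to check a custom function is to call it on a single Series, as in this sketch (the three-value column is hypothetical):

```python
import pandas as pd


def mean_greater(m):
    # True where a value is strictly greater than the column mean
    greater = m > m.mean()
    return ["background-color: lightblue" if i else "background-color: pink" for i in greater]


# Hypothetical column with mean 1.0
col = pd.Series([1.0, -2.0, 4.0], name="A")
styles = mean_greater(col)
print(styles)
```

Because the comparison is strict, a value exactly equal to the mean falls into the pink group.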

Conclusion

Although styling a data frame is just a small step in the analysis, I believe that flexibly combining different styling methods can boost our analytical speed and give us more comprehensive observations and insights.

It is never excessive to accumulate your skills little by little, starting from the easiest and seemingly least important ones.


Reference

Table Visualization – pandas 1.3.4 documentation

