The world’s leading publication for data science, AI, and ML professionals.

Make Your Tables Look Glorious

Simple formatting tricks to make your pandas DataFrames presentation-ready

Photo by Pierre Bamin on Unsplash
Photo by Pierre Bamin on Unsplash

With 2023 delivering everything we all hope for and dream of, the new year also brings with it something else: year end reports and presentations.

Whether we like it or not, visual impressions count. And while graphics are generally easier to understand and better convey a message than an array of numbers, we sometimes can’t get away from using a table of figures. But that doesn’t mean that the table can’t be pretty!

Now, fair is fair – Jupyter notebooks do make a decent-looking table, but using just Pandas, we can do a fair bit of customisation to really make the table our own, and – more importantly – get our message across.

In this article, we’ll see how to:

  1. Format dates
  2. Format absolute numbers
  3. Format currencies
  4. Format ratios
  5. Export formatted DataFrames

Let’s get to it, imagining for a moment that we’re at the Widget Company, presenting sales data for two types of widget to an internal sales team.

Aside: of course, you won’t only have to report results, projections and summaries in January, so the tips I’ll summarise here will hopefully be more perennial.


The data

We’ll start with creating a dummy data set. Nothing fancy here – just a bit of simulation using pandas and numpy .

import pandas as pd
import numpy as np

# simulated data for widget A
df_a = pd.DataFrame(
    {
        'Month':pd.date_range(
            start = '01-01-2012',
            end = '31-12-2022',
            freq = 'MS'
        ),
        'Quotes':np.random.randint(
            low = 1_000_000,
            high = 2_500_000,
            size = 132
        ),
        'Numbers':np.random.randint(
            low = 300_000,
            high = 500_000,
            size = 132
        ),
        'Amounts':np.random.randint(
            low = 750_000,
            high = 1_250_000,
            size = 132
        )
    }
)

df_a['Product'] = 'A'

# simulated data for widget B
df_b = pd.DataFrame(
    {
        'Month':pd.date_range(
            start = '01-01-2012',
            end = '31-12-2022',
            freq = 'MS'
        ),
        'Quotes':np.random.randint(
            low = 100_000,
            high = 800_000,
            size = 132
        ),
        'Numbers':np.random.randint(
            low = 10_000,
            high = 95_000,
            size = 132
        ),
        'Amounts':np.random.randint(
            low = 450_000,
            high = 750_000,
            size = 132
        )
    }
)

df_b['Product'] = 'B'

# put it together & sort
df = pd.concat([df_a,df_b],axis = 0)
df.sort_values(by = 'Month',inplace = True)
df.reset_index(drop = True,inplace = True)

So far, so simple. Let’s calculate a few "interesting" statistics – average sale amounts and product conversion:

# average sale
df['Average sale'] = df['Amounts'] / df['Numbers']

# conversion
df['Product conversion'] = df['Numbers'] / df['Quotes']

… which gives us the following (abbreviated) DataFrame:

Image by author
Image by author

Here we have quite a typical summary table that we might want to present to stakeholders: numbers, currency amounts, and a few ratios all expressed across time.

Let’s work our way through the data set, formatting elements as we go.

For brevity’s sake, forthcoming images of the DataFrame will only show the first six rows of the data, but any code snippets apply to the DataFrame in its entirety.


Formatting dates

First up, the date column.

There’s arguably nothing wrong with the formatting, but it could be better. For instance, since all of the monthly data is reflected as at the first of each month, there’s probably little sense in keeping the day element of each Month entry as it tells the reader very little.

Eagle-eyed readers will note that I am not using the date format common in the US; I would of course recommend using formats which are generally acceptable in your locale.

Let’s show only the year and the month number:

# remove day of month from month column
df.style.format({'Month':'{:%Y-%m}'})
Image by author
Image by author

A small change, but already much better!

Now, we can improve readability even further by using the name of each month rather than the month number and we can do this without having to alter the underlying data.

# use full name of month
df.style.format({'Month':'{:%B %Y}'})
Image by author
Image by author

Maybe a little too wordy now – let’s use abbreviations instead (e.g. "Jan" instead of "January").

# use abbreviated month name
df.style.format({'Month':'{:%b %Y}'})
Image by author
Image by author

Short and snappy. I’d like to take another stab at year and month number, aiming for something more readable than the YYYY-MM but stopping short of using names (abbreviated or not). So let’s re-format the data, but rather than use a YYYY-MM format, let’s switch to using the year and month number, separated by the letter "M".

# year and month number, separated by letter 'M'
df.style.format({'Month':'{:%Y M%m}'})
Image by author
Image by author

Not too shabby, but I prefer the abbreviated name, so we’ll proceed with that.

Aside: if you are interested in using the YYYY M-MM format I’ve used above, but don’t like the look of leading zeroes, the format string {:%Y M%#m} will remove that pesky zero.


Formatting numbers with a thousand separator

A fairly straightforward formatting experience here as we separate thousands of Quotes and Numbers using commas.

What is important to note however, is that if we also want to retain the formatting we applied to the Month column (we do), then we need to extend the formatting dictionary.

# thousands separator for absolute numbers
df.style.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}'
    }
)
Image by author
Image by author

That looks decent – much easier to get a sense of the scale of those absolute numbers.

Next up is currency amounts, where it is important to reflect both the size of the number and the currency it is denominated in.


Formatting currencies

The Widget Company just so happens to produce and sell its widgets in a country that uses a currency denoted by £ (I hope somewhere warmer and sunnier than the country where I earn my £).

Let’s reflect that in the table, reminding ourselves that:

  • At an overall level, using decimal points is probably a little much
  • At a lower level – say for instance, the average sale value – using decimals can be useful.

So we add currency formatting for Amounts and Average sale to our formatting dictionary:

# currency formatting
df.style.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}'
    }
)
Image by author
Image by author

There are different ways of displaying currency, and it’s quite easy to change the format around – for instance, if we wanted to show Average sale as a number followed by the currency symbol we could do the following:

# different currency representation
df.style.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'{:,.2f} (£)'
    }
)
Image by author
Image by author

Personally, I prefer the currency symbol -- number format but we could of course apply the number formatting and include the currency symbol in the column name – e.g. Average sale (£) .


Formatting percentages

Another fairly straightforward formatting step, it’s much easier to view ratios when they are expressed as percentages rather than floating point numbers.

I’d again recommend tailoring the exact format to the use case. If a high degree of accuracy is not required, using few (or no) decimals in your percentage can really clean the table up.

Anyway, our formatting code now becomes:

# percentage formatting
df.style.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    }
)
Image by author
Image by author

I think our data is now formatted to a decent degree. Let’s move in to changes which affect the overall look and feel of the table.


Hide the index

I think default DataFrame indexes are ugly. There, I said it.

They are obviously important but they can be an eyesore, or even worse, a distraction. Put yourself in the stakeholder’s shoes: you’re being shown a table that not only has unnecessary row numbers, but the numbering starts at zero! Pretty strange if you’re not used to Python.

Of course, there are various ways we could address this. We could set Month to be the DataFrame’s index (even better, set a Month x Productmulti-index). Or, for presentation’s sake, we could set the index to be an array of empty strings.

Or, we could just hide the index on display. It’s much cleaner and leaves the DataFrame’s index unperturbed.

# suppress the index
df.style.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    }
).hide_index()
Image by author
Image by author

Now we’re getting there.


Conditional formatting

Highlighting certain elements in our data is a great way of getting a message across, or drawing the audience’s attention to a certain aspect of the data.

We’ll start with highlighting rows if an element of the row meets a given condition — in this case, highlight all rows containing information relating to product A.

We do this in two steps:

  1. Define the function highlight_product which returns a string if the given condition is met (that is, if the row relates to the specified product). The string contains a format command that we will pass through to the Styler.
  2. The resulting format command is fed through using the apply command.
# function to conditionally highlight rows based on product
def highlight_product(s,product,colour = 'yellow'):
    r = pd.Series(data = False,index = s.index)
    r['Product'] = s.loc['Product'] == product

    return [f'background-color: {colour}' if r.any() else '' for v in r]

# apply the formatting
df.style
.apply(highlight_product,product = 'A',colour = '#DDEBF7', axis = 1)
.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    }
).hide_index()
Image by author
Image by author

And so rows relating to product A highlighted. Easy peasy.

As it so happens, this also makes the table more readable as it’s now easier to distinguish between the two product types.

Pro tip: we can provide colour hex codes to pandas, making bespoke formatting more accessible – this particular shade of blue is actually one of my favourite Microsoft Excel colours. Tailoring highlighting colours to match a company’s colour palette is a really neat touch.

We can of course use different conditional tests. Here’s an example of applying a conditional threshold to Average sale:

# function to highlight rows based on average sale
def highlight_average_sale(s,sale_threshold = 5):
    r = pd.Series(data = False,index = s.index)
    r['Product'] = s.loc['Average sale'] > sale_threshold

    return ['background-color: yellow' if r.any() else '' for v in r]

# apply the formatting
df.iloc[:6,:].style
.apply(highlight_average_sale,sale_threshold = 20, axis = 1)
.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    }
).hide_index()
Image by author
Image by author

… and we see rows with Average sale > £20 very starkly highlighted in yellow.

Instead of applying the highlighting to an entire row, we could limit format changes to a subset of the DataFrame. As an example, we’ll apply two threshold tests to Product conversion, changing the font colour and weight if the condition is met.

As usual, we need a to specify some formatting functions first:

# functions to change font colour based on a threshold
def colour_threshold_lessthan(value,threshold,colour = 'red'):
    if value < threshold:
        return f'color: {colour}'
    else:
        return ''

def colour_threshold_morethan(value,threshold,colour = 'green'):
    if value > threshold:
        return f'color: {colour}'
    else:
        return ''

# functions to change font weight based on a threshold    
def weight_threshold_lessthan(value,threshold):
    if value < threshold:
        return f'font-weight: bold'
    else:
        return ''

def weight_threshold_morethan(value,threshold):
    if value > threshold:
        return f'font-weight: bold'
    else:
        return ''

# apply the formatting
df.style
.apply(highlight_product,product = 'A',colour = '#DDEBF7', axis = 1)
.applymap(colour_threshold_lessthan,threshold = 0.05,subset = ['Product conversion'])
.applymap(weight_threshold_lessthan,threshold = 0.05,subset = ['Product conversion'])
.applymap(colour_threshold_morethan,threshold = 0.2,subset = ['Product conversion'])
.applymap(weight_threshold_morethan,threshold = 0.2,subset = ['Product conversion'])
.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    }
)
.hide_index()
Image by author
Image by author

Nice!

Notice how we use applymap here rather than apply, and use the subset argument to restrict the formatting to a subset of the DataFrame.

I think I’d make two changes here:

  • Code-wise, I’d use lambda functions rather than defining so many similar helper functions. That would make for much cleaner code.
  • I would hesitate to present a table where certain cells have different font formats, unless the formatting is applied to row or column totals.

Table level changes: text alignment and captions

Just a quick overall enhancement — we’ll align the text and add a caption to our DataFrame (clear signage is always good!).

# align the text
df.style
.set_properties(**{'text-align':'center'})
.apply(highlight_product,product = 'A',colour = '#DDEBF7', axis = 1)
.applymap(lambda u: 'color: red' if u < 0.15 else '',subset = ['Product conversion'])
.applymap(lambda u: 'font-weight: bold' if u < 0.15 else '',subset = ['Product conversion'])
.applymap(lambda u: 'color: green' if u > 0.2 else '',subset = ['Product conversion'])
.applymap(lambda u: 'font-weight: bold' if u > 0.2 else '',subset = ['Product conversion'])
.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    }
)
.set_caption('Sales data <br> Produced by Team X')
.hide_index()

… which yields:

Image by author
Image by author

Pro tip: when specifying multi-line captions, line breaks need to be represented with <br> rather than n .


Putting it all together

Let’s put these tips and tricks together now, including adding a column "total" along the bottom of the DataFrame.

The column total needs some extra leg work:

  • We’ll get raw totals by taking sums.
  • The average and conversion needs to be recalculated using the new totals.
  • There’s no meaningful "total" calculation for the Product column, so we’ll replace that element with an empty string.
  • We’ll replace the Month entry with a missing value, so that we can overwrite it without complicating things (sneaky!).
# create a total "row" - i.e. column total
total = df.sum()
total['Month'] = pd.NaT
total['Product'] = ''
total['Average sale'] = total['Amounts'] / total['Numbers']
total['Product conversion'] = total['Numbers'] / total['Quotes']
total = total.to_frame().transpose()
Image by author
Image by author

It’s then pretty straightforward to add the total to the DataFrame using pd.concat . We also write up a quick function to make the text in the total row bold.

# function to highlight the total row
def highlight_total(s):
    r = pd.Series(data = False,index = s.index)
    r['Month'] = pd.isnull(s.loc['Month'])

    return ['font-weight: bold' if r.any() else '' for v in r]

Putting it all together, using our newly-concatenated DataFrame d :

# stack and reset index
d = pd.concat([df,total],axis = 0)
d.reset_index(drop = True,inplace = True)

# apply formatting
d.style
.set_properties(**{'text-align':'center'})
.apply(highlight_product,product = 'A',colour = '#DDEBF7',axis = 1)
.apply(highlight_total,axis = 1)
.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    },
    na_rep = 'Total'
)
.set_caption('Sales data <br> Produced by Team X')
.hide_index()

… and voila!

Image by author
Image by author

Not too shabby!

Pro tip: notice how we used a missing value in Month , and the na_rep argument to fill in the missing value with a string. This is an easy way to use a string on a date-time column.


Export the magic

If you’re still using PowerPoint for your work Presentations, then you’ll probably be familiar with the pain of taking screenshots of Jupyter notebooks and pasting them into your slides.

This is obviously horribly inefficient (not to mention boring), but there is some good news – you can export your tables to an image using code. Even better news is that there is a handy Python package to do just that – enter dataframe_image²:

import dataframe_image as dfi

# style the table
d_styled = d.style
.set_properties(**{'text-align':'center'})
.apply(highlight_product,product = 'A',colour = '#DDEBF7',axis = 1)
.apply(highlight_total,axis = 1)
.format(
    {
        'Month':'{:%b %Y}',
        'Quotes':'{:,.0f}',
        'Numbers':'{:,.0f}',
        'Amounts':'£{:,.0f}',
        'Average sale':'£{:,.2f}',
        'Product conversion':'{:.2%}'
    },
    na_rep = 'Total'
)
.set_caption('Sales data <br> Produced by Team X')
.hide_index()

# export the table to PNG
export_destination = r'C:Users...Presentations'
dfi.export(
    d_styled,
    os.path.join(
        export_destination,
        'styled_dataframe.png'
    )
)
Image by author
Image by author

The resulting image is slightly different to what we see in the notebook, but it is oh-so-good-looking (if I do say so myself!).


Wrapping up

Let’s wrap it up.

This one has been a bit of an image-heavy article, but it’s been necessary to demonstrate how we can use pandas to format dates, absolute numbers, currencies, and ratios. We’ve also used dataframe_image to export our fancy tables to image format.

We’ve covered quite a bit of ground here, but this is by no means an exhaustive discussion. Using some more advanced functionality and a bit of HTML can deliver some awesome-looking tables; I unfortunately am a bit of an HTML-newbie, so have quite quickly reached the limit of my capabilities. I’d recommend checking out the official documentation³ to get a flavour of what’s actually possible; you’ll also find a better explanation of the Styler and the difference between apply and applymap there.

Lastly, whilst I’m new todataframe_image , I found it really simple to use and it does – indeed – do what it says on the tin. nbconvert users may never need to use it, but it’s a great tool to have in the back pocket.

Hopefully this article has set you up for a year of great-looking data tables. Let me know how you get your (tabular) points across— I’m always keen to hear of better ways to communicate ideas!


References and resources

  1. string – Common string operations – Python 3.11.1 documentation
  2. GitHub – dexplo/dataframe_image: A python package for embedding pandas DataFrames as images into pdf and markdown documents
  3. Table Visualization – pandas 1.5.2 documentation (pydata.org)

A useful summary of date formats: datetime – Basic date and time types – Python 3.11.1 documentation.


Related Articles