
With 2023 delivering everything we all hope for and dream of, the new year also brings with it something else: year end reports and presentations.
Whether we like it or not, visual impressions count. And while graphics are generally easier to understand and better convey a message than an array of numbers, we sometimes can’t get away from using a table of figures. But that doesn’t mean that the table can’t be pretty!
Now, fair is fair – Jupyter notebooks do make a decent-looking table, but using just Pandas
, we can do a fair bit of customisation to really make the table our own, and – more importantly – get our message across.
In this article, we’ll see how to:
- Format dates
- Format absolute numbers
- Format currencies
- Format ratios
- Export formatted DataFrames
Let’s get to it, imagining for a moment that we’re at the Widget Company, presenting sales data for two types of widget to an internal sales team.
Aside: of course, you won’t only have to report results, projections and summaries in January, so the tips I’ll summarise here will hopefully be more perennial.
The data
We’ll start with creating a dummy data set. Nothing fancy here – just a bit of simulation using pandas
and numpy
.
import pandas as pd
import numpy as np
# simulated data for widget A
df_a = pd.DataFrame(
{
'Month':pd.date_range(
start = '01-01-2012',
end = '31-12-2022',
freq = 'MS'
),
'Quotes':np.random.randint(
low = 1_000_000,
high = 2_500_000,
size = 132
),
'Numbers':np.random.randint(
low = 300_000,
high = 500_000,
size = 132
),
'Amounts':np.random.randint(
low = 750_000,
high = 1_250_000,
size = 132
)
}
)
df_a['Product'] = 'A'
# simulated data for widget B
df_b = pd.DataFrame(
{
'Month':pd.date_range(
start = '01-01-2012',
end = '31-12-2022',
freq = 'MS'
),
'Quotes':np.random.randint(
low = 100_000,
high = 800_000,
size = 132
),
'Numbers':np.random.randint(
low = 10_000,
high = 95_000,
size = 132
),
'Amounts':np.random.randint(
low = 450_000,
high = 750_000,
size = 132
)
}
)
df_b['Product'] = 'B'
# put it together & sort
df = pd.concat([df_a,df_b],axis = 0)
df.sort_values(by = 'Month',inplace = True)
df.reset_index(drop = True,inplace = True)
So far, so simple. Let’s calculate a few "interesting" statistics – average sale amounts and product conversion:
# average sale
df['Average sale'] = df['Amounts'] / df['Numbers']
# conversion
df['Product conversion'] = df['Numbers'] / df['Quotes']
… which gives us the following (abbreviated) DataFrame:

Here we have quite a typical summary table that we might want to present to stakeholders: numbers, currency amounts, and a few ratios all expressed across time.
Let’s work our way through the data set, formatting elements as we go.
For brevity’s sake, forthcoming images of the DataFrame will only show the first six rows of the data, but any code snippets apply to the DataFrame in its entirety.
Formatting dates
First up, the date column.
There’s arguably nothing wrong with the formatting, but it could be better. For instance, since all of the monthly data is reflected as at the first of each month, there’s probably little sense in keeping the day element of each Month
entry as it tells the reader very little.
Eagle-eyed readers will note that I am not using the date format common in the US; I would of course recommend using formats which are generally acceptable in your locale.
Let’s show only the year and the month number:
# remove day of month from month column
df.style.format({'Month':'{:%Y-%m}'})

A small change, but already much better!
Now, we can improve readability even further by using the name of each month rather than the month number and we can do this without having to alter the underlying data.
# use full name of month
df.style.format({'Month':'{:%B %Y}'})

Maybe a little too wordy now – let’s use abbreviations instead (e.g. "Jan" instead of "January").
# use abbreviated month name
df.style.format({'Month':'{:%b %Y}'})

Short and snappy. I’d like to take another stab at year and month number, aiming for something more readable than the YYYY-MM
but stopping short of using names (abbreviated or not). So let’s re-format the data, but rather than use a YYYY-MM
format, let’s switch to using the year and month number, separated by the letter "M".
# year and month number, separated by letter 'M'
df.style.format({'Month':'{:%Y M%m}'})

Not too shabby, but I prefer the abbreviated name, so we’ll proceed with that.
Aside: if you are interested in using the YYYY M-MM
format I’ve used above, but don’t like the look of leading zeroes, the format string {:%Y M%#m}
will remove that pesky zero.
Formatting numbers with a thousand separator
A fairly straightforward formatting experience here as we separate thousands of Quotes
and Numbers
using commas.
What is important to note however, is that if we also want to retain the formatting we applied to the Month
column (we do), then we need to extend the formatting dictionary.
# thousands separator for absolute numbers
df.style.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}'
}
)

That looks decent – much easier to get a sense of the scale of those absolute numbers.
Next up is currency amounts, where it is important to reflect both the size of the number and the currency it is denominated in.
Formatting currencies
The Widget Company just so happens to produce and sell its widgets in a country that uses a currency denoted by £ (I hope somewhere warmer and sunnier than the country where I earn my £).
Let’s reflect that in the table, reminding ourselves that:
- At an overall level, using decimal points is probably a little much
- At a lower level – say for instance, the average sale value – using decimals can be useful.
So we add currency formatting for Amounts
and Average sale
to our formatting dictionary:
# currency formatting
df.style.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}'
}
)

There are different ways of displaying currency, and it’s quite easy to change the format around – for instance, if we wanted to show Average sale
as a number followed by the currency symbol we could do the following:
# different currency representation
df.style.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'{:,.2f} (£)'
}
)

Personally, I prefer the currency symbol -- number
format but we could of course apply the number formatting and include the currency symbol in the column name – e.g. Average sale (£)
.
Formatting percentages
Another fairly straightforward formatting step, it’s much easier to view ratios when they are expressed as percentages rather than floating point numbers.
I’d again recommend tailoring the exact format to the use case. If a high degree of accuracy is not required, using few (or no) decimals in your percentage can really clean the table up.
Anyway, our formatting code now becomes:
# percentage formatting
df.style.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
}
)

I think our data is now formatted to a decent degree. Let’s move in to changes which affect the overall look and feel of the table.
Hide the index
I think default DataFrame indexes are ugly. There, I said it.
They are obviously important but they can be an eyesore, or even worse, a distraction. Put yourself in the stakeholder’s shoes: you’re being shown a table that not only has unnecessary row numbers, but the numbering starts at zero! Pretty strange if you’re not used to Python.
Of course, there are various ways we could address this. We could set Month
to be the DataFrame’s index (even better, set a Month x Product
multi-index). Or, for presentation’s sake, we could set the index to be an array of empty strings.
Or, we could just hide the index on display. It’s much cleaner and leaves the DataFrame’s index unperturbed.
# suppress the index
df.style.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
}
).hide_index()

Now we’re getting there.
Conditional formatting
Highlighting certain elements in our data is a great way of getting a message across, or drawing the audience’s attention to a certain aspect of the data.
We’ll start with highlighting rows if an element of the row meets a given condition — in this case, highlight all rows containing information relating to product A.
We do this in two steps:
- Define the function
highlight_product
which returns a string if the given condition is met (that is, if the row relates to the specified product). The string contains a format command that we will pass through to theStyler
. - The resulting format command is fed through using the
apply
command.
# function to conditionally highlight rows based on product
def highlight_product(s,product,colour = 'yellow'):
r = pd.Series(data = False,index = s.index)
r['Product'] = s.loc['Product'] == product
return [f'background-color: {colour}' if r.any() else '' for v in r]
# apply the formatting
df.style
.apply(highlight_product,product = 'A',colour = '#DDEBF7', axis = 1)
.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
}
).hide_index()

And so rows relating to product A highlighted. Easy peasy.
As it so happens, this also makes the table more readable as it’s now easier to distinguish between the two product types.
Pro tip: we can provide colour hex codes to pandas
, making bespoke formatting more accessible – this particular shade of blue is actually one of my favourite Microsoft Excel colours. Tailoring highlighting colours to match a company’s colour palette is a really neat touch.
We can of course use different conditional tests. Here’s an example of applying a conditional threshold to Average sale
:
# function to highlight rows based on average sale
def highlight_average_sale(s,sale_threshold = 5):
r = pd.Series(data = False,index = s.index)
r['Product'] = s.loc['Average sale'] > sale_threshold
return ['background-color: yellow' if r.any() else '' for v in r]
# apply the formatting
df.iloc[:6,:].style
.apply(highlight_average_sale,sale_threshold = 20, axis = 1)
.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
}
).hide_index()

… and we see rows with Average sale > £20
very starkly highlighted in yellow.
Instead of applying the highlighting to an entire row, we could limit format changes to a subset of the DataFrame. As an example, we’ll apply two threshold tests to Product conversion
, changing the font colour and weight if the condition is met.
As usual, we need a to specify some formatting functions first:
# functions to change font colour based on a threshold
def colour_threshold_lessthan(value,threshold,colour = 'red'):
if value < threshold:
return f'color: {colour}'
else:
return ''
def colour_threshold_morethan(value,threshold,colour = 'green'):
if value > threshold:
return f'color: {colour}'
else:
return ''
# functions to change font weight based on a threshold
def weight_threshold_lessthan(value,threshold):
if value < threshold:
return f'font-weight: bold'
else:
return ''
def weight_threshold_morethan(value,threshold):
if value > threshold:
return f'font-weight: bold'
else:
return ''
# apply the formatting
df.style
.apply(highlight_product,product = 'A',colour = '#DDEBF7', axis = 1)
.applymap(colour_threshold_lessthan,threshold = 0.05,subset = ['Product conversion'])
.applymap(weight_threshold_lessthan,threshold = 0.05,subset = ['Product conversion'])
.applymap(colour_threshold_morethan,threshold = 0.2,subset = ['Product conversion'])
.applymap(weight_threshold_morethan,threshold = 0.2,subset = ['Product conversion'])
.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
}
)
.hide_index()

Nice!
Notice how we use applymap
here rather than apply
, and use the subset
argument to restrict the formatting to a subset of the DataFrame.
I think I’d make two changes here:
- Code-wise, I’d use lambda functions rather than defining so many similar helper functions. That would make for much cleaner code.
- I would hesitate to present a table where certain cells have different font formats, unless the formatting is applied to row or column totals.
Table level changes: text alignment and captions
Just a quick overall enhancement — we’ll align the text and add a caption to our DataFrame (clear signage is always good!).
# align the text
df.style
.set_properties(**{'text-align':'center'})
.apply(highlight_product,product = 'A',colour = '#DDEBF7', axis = 1)
.applymap(lambda u: 'color: red' if u < 0.15 else '',subset = ['Product conversion'])
.applymap(lambda u: 'font-weight: bold' if u < 0.15 else '',subset = ['Product conversion'])
.applymap(lambda u: 'color: green' if u > 0.2 else '',subset = ['Product conversion'])
.applymap(lambda u: 'font-weight: bold' if u > 0.2 else '',subset = ['Product conversion'])
.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
}
)
.set_caption('Sales data <br> Produced by Team X')
.hide_index()
… which yields:

Pro tip: when specifying multi-line captions, line breaks need to be represented with <br>
rather than n
.
Putting it all together
Let’s put these tips and tricks together now, including adding a column "total" along the bottom of the DataFrame.
The column total needs some extra leg work:
- We’ll get raw totals by taking sums.
- The average and conversion needs to be recalculated using the new totals.
- There’s no meaningful "total" calculation for the
Product
column, so we’ll replace that element with an empty string. - We’ll replace the
Month
entry with a missing value, so that we can overwrite it without complicating things (sneaky!).
# create a total "row" - i.e. column total
total = df.sum()
total['Month'] = pd.NaT
total['Product'] = ''
total['Average sale'] = total['Amounts'] / total['Numbers']
total['Product conversion'] = total['Numbers'] / total['Quotes']
total = total.to_frame().transpose()

It’s then pretty straightforward to add the total to the DataFrame using pd.concat
. We also write up a quick function to make the text in the total row bold.
# function to highlight the total row
def highlight_total(s):
r = pd.Series(data = False,index = s.index)
r['Month'] = pd.isnull(s.loc['Month'])
return ['font-weight: bold' if r.any() else '' for v in r]
Putting it all together, using our newly-concatenated DataFrame d
:
# stack and reset index
d = pd.concat([df,total],axis = 0)
d.reset_index(drop = True,inplace = True)
# apply formatting
d.style
.set_properties(**{'text-align':'center'})
.apply(highlight_product,product = 'A',colour = '#DDEBF7',axis = 1)
.apply(highlight_total,axis = 1)
.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
},
na_rep = 'Total'
)
.set_caption('Sales data <br> Produced by Team X')
.hide_index()
… and voila!

Not too shabby!
Pro tip: notice how we used a missing value in Month
, and the na_rep
argument to fill in the missing value with a string. This is an easy way to use a string on a date-time column.
Export the magic
If you’re still using PowerPoint for your work Presentations, then you’ll probably be familiar with the pain of taking screenshots of Jupyter notebooks and pasting them into your slides.
This is obviously horribly inefficient (not to mention boring), but there is some good news – you can export your tables to an image using code. Even better news is that there is a handy Python package to do just that – enter dataframe_image
²:
import dataframe_image as dfi
# style the table
d_styled = d.style
.set_properties(**{'text-align':'center'})
.apply(highlight_product,product = 'A',colour = '#DDEBF7',axis = 1)
.apply(highlight_total,axis = 1)
.format(
{
'Month':'{:%b %Y}',
'Quotes':'{:,.0f}',
'Numbers':'{:,.0f}',
'Amounts':'£{:,.0f}',
'Average sale':'£{:,.2f}',
'Product conversion':'{:.2%}'
},
na_rep = 'Total'
)
.set_caption('Sales data <br> Produced by Team X')
.hide_index()
# export the table to PNG
export_destination = r'C:Users...Presentations'
dfi.export(
d_styled,
os.path.join(
export_destination,
'styled_dataframe.png'
)
)

The resulting image is slightly different to what we see in the notebook, but it is oh-so-good-looking (if I do say so myself!).
Wrapping up
Let’s wrap it up.
This one has been a bit of an image-heavy article, but it’s been necessary to demonstrate how we can use pandas
to format dates, absolute numbers, currencies, and ratios. We’ve also used dataframe_image
to export our fancy tables to image format.
We’ve covered quite a bit of ground here, but this is by no means an exhaustive discussion. Using some more advanced functionality and a bit of HTML can deliver some awesome-looking tables; I unfortunately am a bit of an HTML-newbie, so have quite quickly reached the limit of my capabilities. I’d recommend checking out the official documentation³ to get a flavour of what’s actually possible; you’ll also find a better explanation of the Styler
and the difference between apply
and applymap
there.
Lastly, whilst I’m new todataframe_image
, I found it really simple to use and it does – indeed – do what it says on the tin. nbconvert
users may never need to use it, but it’s a great tool to have in the back pocket.
Hopefully this article has set you up for a year of great-looking data tables. Let me know how you get your (tabular) points across— I’m always keen to hear of better ways to communicate ideas!
References and resources
- string – Common string operations – Python 3.11.1 documentation
- GitHub – dexplo/dataframe_image: A python package for embedding pandas DataFrames as images into pdf and markdown documents
- Table Visualization – pandas 1.5.2 documentation (pydata.org)
A useful summary of date formats: datetime – Basic date and time types – Python 3.11.1 documentation.