The world’s leading publication for data science, AI, and ML professionals.

Automate Your Excel Report By Creating A Summary Sheet Using Python

Harmonizing Python and Excel to summarize your data with presentable and professional formatting.

Python + Excel

How to harmonize Python and Excel to summarize your data with a presentable and professional formatting. A way to improve your Excel reporting.

Photo by Luca Bravo on Unsplash
Photo by Luca Bravo on Unsplash

Introduction

Creating a unique and presentable summary page is the final step to any Excel based reporting work. Without this, you are left with sheets of numbers and formulas with no easy to draw conclusions from your underlying data. By creating a summary sheet, viewers of your workbook will instantly be able to understand the data in an aesthetically pleasing format.

Previously in this Python + Excel series, you learnt how to format an Excel spreadsheet using Python. This time we will look at how to use the formatting code to make a summary sheet/dashboard within Excel.

In my last article I wrote some reasoning into why you would want to use Excel with python, how python with pandas and xlwings could greatly improve your Excel heavy workflow, and included a short tutorial with examples to get you started on stylizing your Excel spreadsheet.

You can find all this information in the link below; it might be a good starting point as the same topics will not be covered in as much detail in this article.

Stylize and Automate Your Excel Files with Python

My goal for this article is to help you summarise your data while using the Excel/python formatting skills learnt in my last article. We will be pivoting, grouping and sorting data. Followed by charting, stylizing and adding a logo to a summary page. These skills combined will enable you to summarize and format your data in one simple python script. Hopefully this article is as helpful as the last!


Summarizing Data

If your workflow is Excel heavy, I am sure you have workbooks stretching over multiple tabs, many sheets of PivotTables and even more formulas cross referencing every sheet. This is great … if your goal is to confuse everyone who tries to understand your spreadsheets.

A benefit of using Python to summarize your data is the ability to store your PivotTables and other data, as a DataFrame within a single variable. Referencing in this way is easier to debug rather than attempting to trace numbers through multiple tabs in a spreadsheet.

pandas – this module is essential to successfully summarising your data.

According to the pandas site:

"pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool"

In short, pandas contains functions which will do all the data analysis you usually do in Excel. Here are a few of the functions you will find interesting coming from an Excel-based background and documentation to each are hyperlinked if you would like more information:

These are a few examples from the multitude of pandas functions available to manipulate or quickly summarise your data.


Steps to creating your summary sheet

As previously mentioned, the code to format your spreadsheet in this tutorial has been explained in detail in my previous post "Stylize and Automate your Excel Files with Python". This tutorial will build upon the previous with a few new functions to create a quick, presentable and comprehensive summary sheet.

I have created a test dataset of fruit and veg sales in csv format to use for both this and the previous tutorial.

Data can be downloaded from Github: link

  1. The first step is to import the modules we will be using. Almost all of the tutorial will be done using pandas and xlwings.
import pandas as pd
import numpy as np
import xlwings as xw
import matplotlib.pyplot as plt
  1. Import the csv data into a DataFrame using Pandas. There are two ways of doing this, directly reading from Github or downloading to your local drive and referencing. Both are mentioned in the full script. The snippet below will read directly from Github.
df = pd.read_csv(r"https://raw.githubusercontent.com/Nishan-Pradhan/xlwings_dashboard/master/fruit_and_veg_sales.csv")
  1. The following steps initialize an Excel Workbook, rename Sheet1 and copy our DataFrame to Excel.
wb = xw.Book()
sht = wb.sheets["Sheet1"]
sht.name = "fruit_and_veg_sales"
sht.range("A1").options(index=False).value = df
  1. Following on from this, we want to create a new sheet named ‘Dashboard’ and reference the sheet.
wb.sheets.add('Dashboard')
sht_dashboard = wb.sheets('Dashboard')

We now have an Excel Workbook which contains two sheets. ‘fruit_and_veg_sales’ has our data and ‘Dashboard’ is blank.

  1. We will now start using pandas to manipulate our data and produce summaries of our data to include on our blank sheet. The first step to take is to check the names of the columns we have in our dataset. A quick shortcut to see all of our column names in Python is to run the following line: print(df.columns)
df.columns | Image by Author
df.columns | Image by Author

From here we can see that many of the column names are awkward containing spaces, brackets, US Dollar symbols, capital letters and brackets.

To prevent us from making trivial errors and to save a great deal of time debugging, it would be wise to copy and paste these column names when referring to them with pandas. (renaming the columns might be preferable in a larger scale project)

  1. The first summary we will create for our Dashboard sheet will be a pivot of our data showing the Total Profit per Item sold. To do this we will take advantage of pandas’ pd.pivot_table() function.
pv_total_profit = pd.pivot_table(df, index='Item', values='Total Profit ($)', aggfunc='sum')

Here we create a new DataFrame called pv_total_profit. This DataFrame has an index containing one of each value in our Item column. The values shown are from the ‘Total Profit ($)’ column in our data and the final input into our function we specified was aggfunc='sum', this tells Pandas that we would like to aggregate our data by summing our valuescolumn, ‘Total Profit ($)’ in this case.

pv_total_profit DataFrame Pivot | Image by Author
pv_total_profit DataFrame Pivot | Image by Author
  1. We now do the same step again, only this time we want to pivot our data to show us Quantity of each Item sold.
pv_quantity_sold = pd.pivot_table(df,index='Item',values='Quantity Sold',aggfunc='sum')
  1. You could make an entire dashboard or report of simple pivots and that will look great, however, to make this tutorial more interesting we will use a grouping function: df.groupby() will group your data however you specify.

Since our data pertains to sales figures over the past year, it might be useful to see our data grouped by months rather than days. df.groupby()will achieve this, but first we need to make sure our "Date Sold" column is actually being read by pandas as a date.

To check this run print(df.dtypes)

You should be able to see from this, our "Date Sold" column is being read as an object and not in datetime format. This means we will not be able to group our DataFrame easily by month yet.

We can change the data type of this column to datetime like so:

df["Date Sold"] = pd.to_datetime(df["Date Sold"], format='%d/%m/%Y')

Here, pd.to_datetime() formats our column and we specify the format our raw data is in to ensure it is correctly converted from an object to datetime. If you run df.dtypes once again, you will now see that "Date Sold" is in datetime64[ns] format, this is what we need for our grouping to work.

Data Types: Before (left), After Conversion (Right) | Image by Author
Data Types: Before (left), After Conversion (Right) | Image by Author
  1. With our data now in the correct format, we can use the following line of code to sum and aggregate our data, as well as display the relevant columns.
gb_date_sold = df.groupby(df["Date Sold"].dt.to_period('m')).sum()[["Quantity Sold",'Total Revenue ($)',  'Total Cost ($)',"Total Profit ($)"]]

Our data grouped by month now looks like this:

gb_date_sold | Image by Author
gb_date_sold | Image by Author
  1. One final groupby will give us a fourth dataset to use for our dashboard.
gb_top_revenue = (df.groupby(df["Date Sold"]).sum().sort_values('Total Revenue ($)',ascending=False).head(8))[["Quantity Sold",'Total Revenue ($)','Total Cost ($)',"Total Profit ($)"]]

This groupby is showing us our Top 8 days by Total Revenue. We get this by sorting our DataFrame by ‘Total Revenue’ in descending order (Highest revenue at the top), then we use head(8) to give us the top 8 lines of our sorted data.

  1. Now that we have our 4 summaries of data, we can move on to the creation of our report. First we start off with some static formatting.
# Background
sht_dashboard.range('A1:Z1000').color = (198,224,180)
# A:B column width
sht_dashboard.range('A:B').column_width = 2.22
# Title
sht_dashboard.range('B2').value = 'Sales Dashboard'
sht_dashboard.range('B2').api.Font.Name = 'Arial'
sht_dashboard.range('B2').api.Font.Size = 48
sht_dashboard.range('B2').api.Font.Bold = True
sht_dashboard.range('B2').api.Font.Color = 0x000000
sht_dashboard.range('B2').row_height = 61.2
# Underline Title
sht_dashboard.range('B2:W2').api.Borders(9).Weight = 4
sht_dashboard.range('B2:W2').api.Borders(9).Color = 0x00B050
# Subtitle
sht_dashboard.range('M2').value = 'Total Profit Per Item Chart'
sht_dashboard.range('M2').api.Font.Name = 'Arial'
sht_dashboard.range('M2').api.Font.Size = 20
sht_dashboard.range('M2').api.Font.Bold = True
sht_dashboard.range('M2').api.Font.Color = 0x000000
# Line dividing Title and Subtitle
sht_dashboard.range('L2').api.Borders(7).Weight = 3
sht_dashboard.range('L2').api.Borders(7).Color = 0x00B050
sht_dashboard.range('L2').api.Borders(7).LineStyle = -4115

The subheadings in the code snippet above should explain what each part of the code is doing.

  1. I have created the following function to manually create a specific table format. It is quite long so I will break down what it is doing in this step.

    The function above takes 4 inputs, header_cell, title, df_summary, color.

  • header_cell is a String referring to the top right cell where you would like to place your DataFrame summary (e.g. "B5").
  • title is a String of what you would like to title your summary (e.g. "Top 8 Days of Revenue).
  • df_summary is the Pandas DataFrame you wish to format and place on your Excel Dashboard page.
  • color is a String referring to the predefined color in the function (e.g. "blue").

In the function, we first define a dictionary of colors.

colors = {"purple":[(112,48,160),(161,98,208)],
"blue":[(0,112,192),(155,194,230)],
"green":[(0,176,80),(169,208,142)],
"yellow":[(255,192,0),(255,217,102)]}

Here we have named 4 colors, purple, blue, green and yellow. For each color, there are two shades, a darker shade and a lighter shade both in RGB format within tuples. Any colors can be added here if you would like to expand on this range!

The rest of the formatting is dynamically referencing the header_cell we have specified in the input of the function and automatically formatting the remainder of the table for you.

  1. Next we call the function 4 times, once for each of our DataFrame summaries we created.
create_formatted_summary('B5','Total Profit per Item', pv_total_profit, 'green')
create_formatted_summary('B17','Total Iteams Sold', pv_quantity_sold, 'purple')
create_formatted_summary('F17','Sales by Month', gb_date_sold, 'blue')
create_formatted_summary('F5','Top 5 Days by Revenue ', gb_top_revenue, 'yellow')
  1. Finally, we make a chart using Matplotlib and pandas .plot() function (which calls on Matplotlib)
# Makes a chart using Matplotlib
fig, ax = plt.subplots(figsize=(6,3))
pv_total_profit.plot(color='g',kind='bar',ax=ax)
# Add Chart to Dashboard Sheet
sht_dashboard.pictures.add(fig,name='ItemsChart',
    left=sht_dashboard.range("M5").left,
    top=sht_dashboard.range("M5").top,
    update = True)

The first section above creates a bar chart, with green bar colors.

Then sht_dashboard.pictures.add() xlwings function, allows us to place an image of this chart on our Excel Dashboard. We specify where we want to place it with the left and top arguments. name will also give our Image a name in Excel.

Bonus

As an extra to make your Excel Dashboard look more professional, we can even add a logo. The code below will download a small example logo in png format from the Github repository made for this tutorial, save it, add it to our Excel Dashboard and then resize it. Make sure you have specified a desired location to save the png image in the FOLDER_PATH variable. (At the top of the script on Github)

import requests
FOLDER_PATH = r"path_to_save_folder" # r"C:UsersNameDownloads"
image_url = r"https://github.com/Nishan-Pradhan/xlwings_dashboard/blob/master/pie_logo.png?raw=true"
r = requests.get(image_url, stream = True)
image_path = rf"{FOLDER_PATH}logo.png"
# Saves image to image_path above
file = open(image_path, "wb")
file.write(r.content)
file.close()
# Adds image to Excel Dashboard
logo = sht_dashboard.pictures.add(image=image_path,
    name='PC_3',
    left=sht_dashboard.range("J2").left,
    top=sht_dashboard.range("J2").top+5,
    update=True)
# Resizes image
logo.width = 54
logo.height = 54

We have now created our dashboard summary for the dataset! Don’t forget to save it by running:

wb.save(rf"{FOLDER_PATH}fruit_and_veg_dashboard.xlsx")
Formatted Summary Sheet using xlwings | GIF by Author
Formatted Summary Sheet using xlwings | GIF by Author

Conclusion

With not much code, we have created an aesthetically pleasing dashboard which quickly summarizes our data. The advantage of doing this in Python is the easy-to-read syntax, the relatively few lines of code and the use of a combination of third-party modules which can save us vast amounts of time in summarising our data.

Using Python to format your Excel reports can be an intermediary step between manually creating an Excel report (or using VBA) and completely replacing Excel with an alternative reporting software. Being able to interact with Excel at this level of granularity allows you to speed up your workflow, automate your reports and improve your Python Coding skills, all while keeping a familiar output for end users.

This code should be enough to get you started; you can customize it in any way you want. Try changing the colors, borders, titles, background color, logo, almost anything mentioned here can be altered to match your desired style.

Bonus points to anyone who creates a function to arrange these formatted summaries programmatically!


Code used in this tutorial is available on Github _here_

If you are stuck, check out the xlwings docs here

If you have any further questions, please leave them in the comments section below.

Note: This article represents my personal views and experiences only.


One Useful Way to Track the End of Lockdown Using Python


Related Articles