The world’s leading publication for data science, AI, and ML professionals.

Spice up your excel with python

Get familiar with python and bring your Excel repor to the next level.

1/2: upload, manipulate and save data

image from Lukas Blazek
image from Lukas Blazek

Are you using Excel to manipulate raw data and generate regularly (daily, weekly…) reports? Maybe you already step up using VBA macros in order to automatise them. Then why use python? If the volume of data you manage is big (hundred thousands of lines) or if you extract your data from different sources (html, csv, json, sql database) and want to share the final results in one single excel report. There are several reasons to spice up your excel report using python, and in this article, I will share with you a couple of my favorite tips.

This article is the first part where I will share a sample of sales report refreshed using a python script. In the next one, I will use the same sample to create a full interactive report in Excel without using Pivot table neither Excel formula.


Let’s work with a concrete example. I propose you to upload a dataset from Kaggle (https://www.kaggle.com/carrie1/ecommerce-data). You can also upload the file from my github (https://github.com/Aymen-lng/sales_report). It is a 540000 line csv file from an e-commerce site.

Sales report dashboard

I created a sample of monthly sales report (you can find it on my github). It shows the total sales of the e-commerce company distributed by country. The average billing by customer, the top 10 countries and top 10 customers. And with a set of slicers we can also navigate by periods (years, quarters and months).

This Dashboard is feed by a sheet with the manipulated data source, another sheet pivoting it in order to get the slicers and world map.

Sheet of manipulated data
Sheet of manipulated data
Calculation sheet with pivot table
Calculation sheet with pivot table

The data source and Excel refresh.

The CSV file uploaded from Kaggle looks like this: 8 columns presenting the Invoice number, Stock Code, Stock description, Quantity sold, Invoice date, Unit price, Customer ID and country.

We could upload the data in excel using data assistant. Once uploaded, the problem will be applying a set of formulas and pivot tables to these half million of lines without crashing excel. The issue can be expecting by using a VBA macro.

This is why we will use python, in order to mine the data, manipulate it, write the result into the Excel report and refresh it.


Python libraries

Import data with the powerful pandas:

Pandas is one of the most popular data analysis python library. Its name comes from the name "Panel Data" (multidimensional data analysis term used in statistics and econometric).

We already imported the library

Then you can import your data and create your data frame. Our uploaded data is coming from data.csv .

df = pd.read_csv(data.csv)

But what if the excel file is not on the default path used by Python. In this case, you should import "os", another python module that will allow you to navigate into your os path and set the one where your excel file is saved.

Another simple way is to use variables:

Now let’s check the shape (number of rows and columns) of the dataset.

my_dataframe.shape

(541909, 8)

541 909 lines. This is the main reason to not manipulate the data directly in Excel. If you start making Vlookup, PivotTable, Filter, SumIf, Copy, Past etc. directly in Excel you will need or a computer with a serious set up, or a high level of patience.

Manipulate the data frame

First of all, we will calculate the Revenue by multiplying the quantity by the unit price and directly store the result in a new column called "revenue":

Then we need to isolate the month from the invoice date. We create a variable "Month" to store all the invoices date. Then On the same variable I select the first value of the slash split "/", for all invoices date (I use a for…in loop). Finally, I create a new column "Month" in the data frame where I store variable’s values.

The excel formula equivalent would be "MONTH(InvoiceDate"). It looks more simple in Excel, but there are two elements to take in consideration here. 1) In python there are several ways to isolate the month, I decided to use one without importing another library just for this action. But you will quickly learn that for each action, you have an appropriate library with a fast way to do it. 2) In Excel even a simple formula as "MONTH" can become a hell if you have to copy and paste it in 500k lines.

Let’s do the same with the year:

Year = sales_report_df["InvoiceDate"].values
Year = [my_str.split("/")[2] for my_str in Year]
Sales_report_DB["Year"] = Year

hmm…. by isolating the year, we included the hour of the invoice. As I said earlier there are tons of way to do it. So here or we can split "InvoiceDate" into 2 columns, one with the date one with the hour; or we can use "pandas" that we already imported. Let’s use a pandas’ tool to isolate the year.

Better here
Better here

By moving to the quarter calculation, we will do a step forward into the advantage of python method. If we have a formula "MONTH" or "YEAR" in Excel, there is no formula "QUARTER". So even in excel, we will have to calculate the quarter, and this is the ending point of the sales report if it has to be done in Excel. Here below a sample of how we can calculate the quarter. Without explaining it in details the "why" of this formula we can easily understand that a copy past of it in 500k lines might be long.

LOOKUP(J26,{1,1;2,1;3,1;4,2;5,2;6,2;7,3;8,3;9,3;10,4;11,4;12,4})

In python, we will use the module date time imported

And here is the final data frame with the summary of code:

Interact with Excel using xlwings

With python there are several ways to interact with Excel (pandas, openpyxl, xlsxwriter etc.) and the recent join of Guido van Rossum (creator of python) into Developer division of Microsoft is sign of a good future for this language.

In this article, I will use xlwings which is my favorite one. An amazing tool well connected to Excel enable as an add ins and able to communicate with Vba!

import xlwings as xw

The first step is creating a variable for the Dashboard workbook and another one for the specific sheet of the workbook where we want to pass the data frame

Now depending on the version of OS, python and libraries you are using, you will be able to past the data frame with 1 line of code or passing through a function. By default, some python tools have a time out security, meaning that if a part of the code is running for too many time, the code break automatically with a message error "timeout error". As we are pasting a big amount of data you might face the issue, but I will also show you a way to deal with it.

If you don’t have a timeout error, you can just:

sheet.range("A1").value=df

If you face a timeout error then I propose you below function that will split the data frame into pieces of 50000 lines and past them one by one. The function print a message to inform you on which line we are.


And here we are. All data uploaded into the Dashboard. Remains just a simple refresh of both pivot table in sheet "Calculation".

Manual refresh
Manual refresh

hm….. But why refresh manually the pivot table when we did everything automatically since the beginning. As said before there are several ways to interact with Excel. As an introduction to the next article where we will totally integrate the dashboard construction into the python script, I will use here a way to code in VBA the refresh and then run the macro with python.

Let’s open the Visual Basic Developer and create a module "refresh".

Sub resfresh()
ThisWorkbook.Sheets("Calculation").PivotTables("PivotTable1").RefreshTable
ThisWorkbook.Sheets("Calculation").PivotTables("PivotTable2").RefreshTable
End Sub

And now we can add the call of this macro in our python script.

And here we are. The dashboard is refreshed, all amounts changed.

Let finish by saving and close the dashboard


To conclude this first part, I highly recommend you start using python if you deal with more 100k lines in your reports (before you can still use VBA) and/or if you have to combine data from different sources or format.

Instead of uploading data into Excel and apply formulas, Pandas remains a great tool to perform most of the calculation steps. There are tons of articles that show how to do with pandas what you use to do with Excel! Just google it.

Finally, a lot of tools permit manipulation of Excel with python, but my favorite one is by far xlwings. In the next part of this article, I will show you why by totally integrating the python script into the Dashboard.


Full code:


Related Articles