
Why use a percentage of the total?
Let’s say you have an e-commerce website that sells coffee, and you have three different categories of products: Arabica, Robusta, and Liberica.
On a report, you see that for June 2022, you sold for 345k€ of Arabica, 112k€ of Robusta, and 13k€ of Liberica. In absolute terms, you notice that Arabica is your top revenue maker, but as a percentage of the total, it represents 73% of your sales for this month.
The idea is to look at the contribution of a row to the overall sum of all rows
To avoid manually computing this percentage by dividing one value by the total sum of all values, we will learn how to use SQL to assist us, and help answer the following questions:
- What products make the greatest contribution to our company’s revenue?
- How has our country’s market share evolved over time?
In this article, our data comes from the Google Merchandise e-commerce website. For our analysis, our table contains products sold, product names and categories, and the revenue generated.

Contribution of product categories to the overall revenue
For our first business question, we want to know how much each of our product categories contributes to our total revenue.
To begin, we need to calculate the total revenue for each of our product categories.
As a result, we get a table with one row for each product category and revenue generated.

Our online shop has a total of 21 categories, and if we make the sum of the revenue (the sum of the revenue_per_category
field), we get 1.74 million (which is our total webshop revenue).
Now, to obtain a percentage of the total, we need to divide the revenue of each category by the total revenue.
In order to achieve that, without using a JOIN
clause, you can use an OVER()
clause. Let’s dive into the details:
At the top, in our WITH
clause, we have a first query that computes the revenue per category. In our main statement, we have two lines after the *
:
- The first line computes the overall revenue (you actually don’t need it, it is only here to highlight the syntax and show how to use the window function)
- The second line computes the share of revenue per category.
You can notice that we use a SAFE_DIVIDE
function in case we would make a division by 0 (meaning we would have 0 as a total revenue). This function will return a NULL
value for the share of the revenue instead of throwing a "divide by 0" error.

This is how you can plot it in Data Studio with a pie chart:

The Apparel category is the top contributing category, alongside Other and Office.
We can observe that we have two distinct categories Other and others.
- Other is the category that is labeled in our data as "Other".
- others is because Data studio truncates the results when there are too many data points. Here we only display 10 slices, but we have 21 categories. By doing so, Data Studio groups all remaining categories into a single one called others.
Contribution of product categories to revenue over time
For our second business question, we would like to know how the revenue share of our product categories is changing over time.
For that, we will compute the revenue for each product category per month, and then, compute a percentage of the total for each product category within that month.
First, we need to calculate the total revenue of each of our categories over time (in our case every month). We add a second GROUP BY
per month.
As a result, we get one row for each product category, each month, and the revenue generated.

To look at the change over time, instead of looking at 21 categories, let’s zoom in on the top 4 categories, which we’ve seen in our first business question: Apparel, Other, Office, and Drinkware.
You can notice that we created a sub-query that encapsulates our ratio
field, to filter the desired categories after we calculate the percentage of total per month. If we would have filtered before, the total revenue would have only represented the four categories.
Our window function in this query is quite similar to the one used in the previous part of this article, except that our OVER()
clause now divides each month into partitions.
This leads to the following results:

We can observe the Other category was certainly a tracking error or was mislabeled because it doesn’t come back over the month and was the only category represented in August-September 2016.
We do see the three categories being over time 60% of our shop revenue, whereas, in January 2017, the "Office" category contributed more than usual.
Conclusion
If we learn this method in SQL, it’s to save manual calculation time and be able to scale this calculation for many categories/dimensions.
It’s also because not all data warehouse systems offer an effective way to compute a ratio or a share of the total. It is not the case for Bigquery.
Other systems, such as Amazon Redshift or Snowflake have a window function called ratio_to_report()
that computes what we’ve seen in this article. Data visualization systems, such as Tableau or QlikSense, also make calculating a percentage of the total a bit easier.
Thank you for reading; I hope this article gave you the key to computing and understanding the method behind it.