The world’s leading publication for data science, AI, and ML professionals.

How to calculate a percentage of total in BigQuery using SQL

Compare the contribution of a row to the overall sum of all rows

Photo by Karim MANJRA on Unsplash
Photo by Karim MANJRA on Unsplash

Why use a percentage of the total?

Let’s say you have an e-commerce website that sells coffee, and you have three different categories of products: Arabica, Robusta, and Liberica.

On a report, you see that for June 2022, you sold for 345k€ of Arabica, 112k€ of Robusta, and 13k€ of Liberica. In absolute terms, you notice that Arabica is your top revenue maker, but as a percentage of the total, it represents 73% of your sales for this month.

The idea is to look at the contribution of a row to the overall sum of all rows

To avoid manually computing this percentage by dividing one value by the total sum of all values, we will learn how to use SQL to assist us, and help answer the following questions:

  • What products make the greatest contribution to our company’s revenue?
  • How has our country’s market share evolved over time?

In this article, our data comes from the Google Merchandise e-commerce website. For our analysis, our table contains products sold, product names and categories, and the revenue generated.

Our base data table. (Image by Author)
Our base data table. (Image by Author)

Contribution of product categories to the overall revenue

For our first business question, we want to know how much each of our product categories contributes to our total revenue.

To begin, we need to calculate the total revenue for each of our product categories.

As a result, we get a table with one row for each product category and revenue generated.

Total revenue for each product category. (Image by Author)
Total revenue for each product category. (Image by Author)

Our online shop has a total of 21 categories, and if we make the sum of the revenue (the sum of the revenue_per_category field), we get 1.74 million (which is our total webshop revenue).

Now, to obtain a percentage of the total, we need to divide the revenue of each category by the total revenue.

In order to achieve that, without using a JOIN clause, you can use an OVER() clause. Let’s dive into the details:

At the top, in our WITHclause, we have a first query that computes the revenue per category. In our main statement, we have two lines after the * :

  • The first line computes the overall revenue (you actually don’t need it, it is only here to highlight the syntax and show how to use the window function)
  • The second line computes the share of revenue per category.

You can notice that we use a SAFE_DIVIDE function in case we would make a division by 0 (meaning we would have 0 as a total revenue). This function will return a NULL value for the share of the revenue instead of throwing a "divide by 0" error.

Share of revenue for each category. (Image by Author)
Share of revenue for each category. (Image by Author)

This is how you can plot it in Data Studio with a pie chart:

The share of each product category to our total revenue. (Image by Author)
The share of each product category to our total revenue. (Image by Author)

The Apparel category is the top contributing category, alongside Other and Office.

We can observe that we have two distinct categories Other and others.

  • Other is the category that is labeled in our data as "Other".
  • others is because Data studio truncates the results when there are too many data points. Here we only display 10 slices, but we have 21 categories. By doing so, Data Studio groups all remaining categories into a single one called others.

Contribution of product categories to revenue over time

For our second business question, we would like to know how the revenue share of our product categories is changing over time.

For that, we will compute the revenue for each product category per month, and then, compute a percentage of the total for each product category within that month.

First, we need to calculate the total revenue of each of our categories over time (in our case every month). We add a second GROUP BY per month.

As a result, we get one row for each product category, each month, and the revenue generated.

Total revenue per month for each of our categories. (Image by Author)
Total revenue per month for each of our categories. (Image by Author)

To look at the change over time, instead of looking at 21 categories, let’s zoom in on the top 4 categories, which we’ve seen in our first business question: Apparel, Other, Office, and Drinkware.

You can notice that we created a sub-query that encapsulates our ratio field, to filter the desired categories after we calculate the percentage of total per month. If we would have filtered before, the total revenue would have only represented the four categories.

Our window function in this query is quite similar to the one used in the previous part of this article, except that our OVER() clause now divides each month into partitions.

This leads to the following results:

Percentage of the total revenue for our top 4 categories over months. (Image by Author)
Percentage of the total revenue for our top 4 categories over months. (Image by Author)

We can observe the Other category was certainly a tracking error or was mislabeled because it doesn’t come back over the month and was the only category represented in August-September 2016.

We do see the three categories being over time 60% of our shop revenue, whereas, in January 2017, the "Office" category contributed more than usual.

Conclusion

If we learn this method in SQL, it’s to save manual calculation time and be able to scale this calculation for many categories/dimensions.

It’s also because not all data warehouse systems offer an effective way to compute a ratio or a share of the total. It is not the case for Bigquery.

Other systems, such as Amazon Redshift or Snowflake have a window function called ratio_to_report() that computes what we’ve seen in this article. Data visualization systems, such as Tableau or QlikSense, also make calculating a percentage of the total a bit easier.

Thank you for reading; I hope this article gave you the key to computing and understanding the method behind it.


Related Articles