[Since writing this post, Google has released official PIVOT and UNPIVOT functions. However, you may still find the post interesting as an example of using scripting.]
A common requirement when analysing data is the need to take information stored in separate rows and flip it so that it is displayed in columns. This is often referred to as pivoting. In a spreadsheet, it’s really simple: you just choose Pivot Table from one of the menus, select the fields you want to see and a few other options, and the job is done. Things are much less straightforward when you are working with databases.
This post works through an example to provide a step-by-step guide to using the scripting features in Google’s BigQuery to pivot data.
The source data
We will use an example from the public datasets hosted by Google. The table bigquery-public-data.covid19_open_data_eu.covid19_open_data
contains statistics about COVID case numbers and deaths by date since the start of 2020. The table includes a raft of statistics for countries, and regions within countries, by day, including figures for the number of deaths, hospitalisations, and confirmed cases. Our aim will be to produce a time series chart that compares the daily death counts for the 5 countries with the highest cumulative total number of deaths.
The following query finds this top 5, and then selects the daily counts for these countries.
Note that the underlying table includes rows for regions and cities within countries. The filter aggregation_level = 0 ensures we only include rows containing data for each country as a whole.
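A query along these lines does the job. Treat it as a sketch: the column names new_deceased and cumulative country totals are assumptions based on the public table’s schema, not necessarily the exact query from the original analysis.

```sql
-- Find the 5 countries with the highest cumulative death counts...
WITH top_countries AS (
  SELECT country_name
  FROM `bigquery-public-data.covid19_open_data_eu.covid19_open_data`
  WHERE aggregation_level = 0   -- country-level rows only
  GROUP BY country_name
  ORDER BY SUM(new_deceased) DESC
  LIMIT 5
)
-- ...then select the daily death counts for just those countries.
SELECT date, country_name, new_deceased
FROM `bigquery-public-data.covid19_open_data_eu.covid19_open_data`
WHERE aggregation_level = 0
  AND country_name IN (SELECT country_name FROM top_countries)
ORDER BY date, country_name;
```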
The query produces a table which looks like this.

We want to pivot these results to produce a table with one row per date, and one column for each of the 5 countries.
In outline, the way you do this in SQL is as follows:
- Group by date and country name to ensure you have at most one row per country/date pair, and aggregate (e.g. SUM, AVG) the value you want to present.
- For the field we want to pivot on (i.e. country name), create a set of CASE WHEN statements which map the values for a given country to a new column. For Brazil, we would have something like CASE WHEN country_name='Brazil' THEN total_deceased END AS brazil.
- Aggregate each of these statements using either MAX or SUM whilst grouping by date to create a single row per date.
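Putting these steps together, a hard-coded version of the pivot looks something like the following sketch (the country list and column names are illustrative only):

```sql
SELECT
  date,
  -- One aggregated CASE WHEN per country; each becomes a column.
  SUM(CASE WHEN country_name = 'Brazil' THEN new_deceased ELSE 0 END) AS brazil,
  SUM(CASE WHEN country_name = 'India' THEN new_deceased ELSE 0 END) AS india
  -- ...and so on, one line for each remaining country...
FROM `bigquery-public-data.covid19_open_data_eu.covid19_open_data`
WHERE aggregation_level = 0
GROUP BY date
ORDER BY date;
```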
This approach will always work, but it has two potentially significant drawbacks. First, if your pivot field contains many different values, then you will need to write out lots of SQL code. This is both time-consuming and creates code which is difficult to read and maintain.
More fundamentally, in order to write the SQL code, you need to know all the distinct values from the pivot field in advance. In our example, the set of countries with the highest death count may change over time, and we want to avoid having to manually update the query code each time we want to run our analysis. Equally, we might want to look at the top 10 or 20 countries, and do so without having to rewrite the majority of the query.
Scripting to create Dynamic SQL
The solution to our problem in BigQuery is scripting, which allows us to generate the text of a SQL query dynamically. We can use a loop to create the repetitive CASE WHEN statements from step 2 above, and this allows us to be flexible about the number and naming of the columns produced by the pivot.
In conjunction with stored procedures and user defined functions, we have a very powerful set of tools to create reusable functions to improve the way we process and analyse data in BigQuery. The example below solves just this specific pivot problem, but it would be relatively easy to generalise it into a reusable pivot function.
We will build our script according to the following outline:
- Fetch the top 5 country names and store in an array;
- Use a loop to generate the CASE WHEN statements for these 5 countries;
- Add these statements to a template SQL query and run.
Fetch top 5 country names
We start by running a simple query to fetch the top 5 countries. However, we are going to do this in a script, and store the results of the query in a variable called countries. Variables need to be declared at the start of the script, and so we end up with something like this:
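The following is a sketch of that script (again assuming the new_deceased column for the ranking):

```sql
-- Variables must be declared at the top of the script.
DECLARE countries ARRAY<STRING>;

-- Fetch the top 5 countries by total deaths into the array.
SET countries = (
  SELECT ARRAY_AGG(country_name)
  FROM (
    SELECT country_name
    FROM `bigquery-public-data.covid19_open_data_eu.covid19_open_data`
    WHERE aggregation_level = 0
    GROUP BY country_name
    ORDER BY SUM(new_deceased) DESC
    LIMIT 5
  )
);

-- Inspect the contents of the variable.
SELECT countries;
```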
Note that the ‘processing location’ for this query needs to be set to EU. This produces the following results.

Generate CASE WHEN statements
The next part of our script creates a CASE WHEN statement for each of these country names. The loop is straightforward, but we need to be careful about escaping quote marks and generating valid field names. The only valid characters in a BigQuery field name are underscores, the letters a–z and A–Z, and the digits 0–9 (though a name cannot start with a digit).
The following script defines a few test examples of names that could lead to illegal field names to check the regex works OK.
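A sketch of such a script is below. It uses REGEXP_REPLACE to substitute underscores for illegal characters, and adds a piv_ prefix to names that start with a digit; the exact regex and prefix are assumptions chosen to be consistent with the output shown next.

```sql
-- Test names chosen to exercise the illegal-character handling.
DECLARE test_names ARRAY<STRING> DEFAULT ['U. S. A.', 'bad & name', '123 something else'];
DECLARE case_statements STRING DEFAULT '';
DECLARE field_name STRING;
DECLARE i INT64 DEFAULT 0;

WHILE i < ARRAY_LENGTH(test_names) DO
  -- Replace every character that is not legal in a field name with an underscore...
  SET field_name = REGEXP_REPLACE(test_names[OFFSET(i)], r'[^a-zA-Z0-9_]', '_');
  -- ...and prefix names that start with a digit, which is not allowed.
  IF REGEXP_CONTAINS(field_name, r'^[0-9]') THEN
    SET field_name = CONCAT('piv_', field_name);
  END IF;
  -- Append one aggregated CASE WHEN clause per name, each ending with a comma.
  SET case_statements = CONCAT(
    case_statements,
    FORMAT("SUM(CASE WHEN country_name='%s' THEN new_deceased ELSE 0 END) AS %s,\n",
           test_names[OFFSET(i)], field_name)
  );
  SET i = i + 1;
END WHILE;

SELECT case_statements;
```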
The script produces the following string (I’ve added line breaks to make it easier to read):
SUM(CASE WHEN country_name='U. S. A.' THEN new_deceased ELSE 0 END) AS U__S__A_,
SUM(CASE WHEN country_name='bad & name' THEN new_deceased ELSE 0 END) AS bad___name,
SUM(CASE WHEN country_name='123 something else' THEN new_deceased ELSE 0 END) AS piv_123_something_else,
We have created valid field names, and the strings are correctly enclosed in single quotes. Note that this doesn’t handle every edge case: for example, if the pivot field itself contains a quote character, our code will not work.
Add the dynamically generated text to template SQL
We can now combine the results of the two previous steps, and insert the resulting string into a query selecting data from the relevant source table:
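Using EXECUTE IMMEDIATE, the final step might look like this. This is a sketch that assumes the countries array and the generated case_statements string from the earlier steps:

```sql
-- case_statements holds the generated SUM(CASE WHEN ...) clauses,
-- each ending with a comma, so it drops straight into the column list.
EXECUTE IMMEDIATE FORMAT("""
  SELECT
    %s
    date
  FROM `bigquery-public-data.covid19_open_data_eu.covid19_open_data`
  WHERE aggregation_level = 0
    AND country_name IN UNNEST(@countries)
  GROUP BY date
  ORDER BY date
""", case_statements)
USING countries AS countries;  -- bind the array variable as a query parameter
```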
This gives us the results that we want – a table with a single row per date, and one column for each of the top 5 countries. It’s then simple to import this small set of results into Google Sheets to plot the time series.

Final thoughts
Although the code above is specific to this example, it will easily generalise to other data. Hopefully, this post also serves as an introduction to scripting if you haven’t come across it before.
Scripting is a hugely powerful extension to BigQuery. In my work as a data scientist, it means that more pipeline processing can be completed entirely within BigQuery – rather than having to extract data to Python before returning it to a BQ table. For example, I have recently been using the techniques discussed above to convert word embeddings stored in arrays into columns in order to use them as features in a BigQuery ML classification model. More on that in another post perhaps.