The world’s leading publication for data science, AI, and ML professionals.

Mastering Pivot Tables in dbt

This function will save you hours of pain writing SQL.

Photo by Nkululeko Jonas on Unsplash
Photo by Nkululeko Jonas on Unsplash

How many times have you had to put together an overly complex solution to solve a simple problem? If you’ve ever coded before, you can probably list at least a handful of instances. For me, a few of these times have revolved around pivot tables. They sound so straightforward yet are complicated to write in SQL.

Luckily, I recently discovered that Dbt has a function within dbt utils that makes creating pivot tables fast and simple. No complex aggregates or joins are needed!

What is a pivot table?

First, let’s start off with what a pivot table is. No, it’s not an Excel pivot table. It’s a function that turns unique values in a column into their own column, filling in the corresponding cell for that row with a boolean value.

For example, pivoting data that looks like this…

Image by author
Image by author

… would result in data that looks like this…

Image by author
Image by author

Pivot table use cases

This is a great way to aggregate and calculate whether certain characteristics exist in your data. I needed to pivot a column full of different wine characteristics when I discovered this trick. I wanted a column for each characteristic and a value telling me whether that characteristic was present or not for that wine.

Using pivot tables is also ideal for ensuring you have a unique primary key in your table. Often, when you have columns that contain different attributes that can exist for multiple ids, your primary key no longer becomes unique. You end up with multiple rows with the same id because of one column that creates duplicates.

Still not understanding?

Let’s look at this one wine. See how the values in each column are the same EXCEPT for the wine_features column? Because this wine is dry, organic, and biodynamic, a new row is created in order to represent each of these features.

Image by author
Image by author

This usually occurs due to the way you need to join tables in your database together. Unfortunately, pivot tables are your only way to avoid this issue and keep your data models clean.

How do you pivot tables using dbt?

dbt contains a function called dbt_utils.pivot that helps you pivot your tables using one simple line of code.

It looks like this:

{{ dbt_utils.pivot('<column name>', dbt_utils.get_column_values(ref('<table name>'), '<column name')) }}

You simply specify the column name that you want to pivot. Make sure you include it in single quotes.

You also specify the name of the table this column is in. Again, make sure you include this in single quotes.

Unfortunately, this needs to be a permanent table within your database. You can’t reference a table created by a CTE which is a bummer. I had to create a permanent table with the tables I needed to be joined together to then reference in this pivot function. Just make it its own intermediary model.

When using the pivot function in an actual query, it should look something like this:

select
   wine_name,
   {{ dbt_utils.pivot('wine_features', dbt_utils.get_column_values(ref('wine_features_joined'), 'wine_features')) }}
from {{ ref('wine_features_joined') }} 
group by wine_name

You can see that the function replaces where a column name would normally go in your Sql query. The columns that you are grouping by are the only other columns that should be selected.

Also note that I am using a {{ ref() }} function because I am selecting from a permanent table that already exists as a data model within my dbt environment. No selecting from CTEs!

If we start with a table of wines and their different features that look like this…

Image by author
Image by author

Running the query I showed above with the pivot function will cause the table to look like this:

Image by author
Image by author

Notice how the features that were once values in the wine_features column are now the column headers. The boolean values in these columns show whether that feature was present in the original column.

One other thing to keep in mind…

Often times the data types of pivoted columns will be hard to work with. Make sure you cast them to the appropriate data type and rename them as you see fit. Here you can see that the pivoted column’s data type is a number. Obviously, this is wrong.

Image by author
Image by author

I had to recast and rename my columns by doing this for each of my columns:

wine_features."organic" AS is_organic

The quotes allow you to select the column in Snowflake despite the data type being different than expected.

Conclusion

You are now ready to pivot tables using the dbt utils pivot function! It will make your life a whole lot easier and your code way cleaner. Stay tuned for more articles on the handy functions provided to you by dbt in their utils package.

Until then, learn more about dbt and the modern data stack by subscribing to my newsletter.


Related Articles