Jinja/DBT SQL Templating in JupyterLab/VSCode

Quickly prototype your SQL templates

Jean-Claude Cote
Towards Data Science


SQL does not lend itself well to reuse. To achieve reusability, SQL is often templated using libraries like Jinja. For example, Apache Superset leverages Jinja templating in its dataset definitions, and DBT is of course a tool built entirely around Jinja templates. Jupyterlab-sql-editor natively supports Jinja templating, making it possible to prototype Jinja-SQL templates for Superset and DBT right inside JupyterLab and VSCode.

Create reusable SQL using Jinja templates

The --jinja option enables Jinja templating support, making any variable or function declared in your notebook available for substitution inside Jinja tags {{ }}.

For example, you can use a variable to represent the actual table name used in your SQL statements. This can be useful when you have separate database schemas for development and production tables.
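As a sketch of this pattern (the variable name table_name is made up for illustration), you might define the variable in a Python cell and reference it from a %%sparksql cell:

```
# Cell 1 (Python): hypothetical variable holding the fully qualified table name
table_name = "dev.customers"   # switch to "prod.customers" for production

# Cell 2 (SQL via the magic):
%%sparksql --jinja --output html
select * from {{ table_name }} limit 10
```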

In our previous article we saw that the %%sparksql magic supports many output formats: text, interactive grid, interactive JSON tree, HTML and skip. The %%sparksql magic has one more trick up its sleeve! Forgive my pun.

The --output sql option renders the resulting SQL statement with syntax highlighting. It is a convenient way to validate your Jinja-templated code. In this example we use a Python list variable to automate the generation of a SELECT statement.

As we can see, the Jinja {% for item in list %} loop has produced the desired columns.
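The same expansion can be reproduced outside the notebook with plain Jinja2. This sketch (the column names are made up) shows how a Python list drives the generation of the SELECT clause, which is essentially what --jinja does with your notebook variables:

```python
from jinja2 import Template

# Hypothetical column list driving the SELECT clause
columns = ["id", "name", "email"]

tpl = Template(
    "select\n"
    "{% for col in columns %}"
    "    {{ col }}{{ ',' if not loop.last }}\n"
    "{% endfor %}"
    "from customers"
)
sql = tpl.render(columns=columns)
print(sql)
# select
#     id,
#     name,
#     email
# from customers
```

The loop.last test drops the trailing comma on the final column, a detail that is easy to get wrong when generating SQL by hand.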

Tapping into the power of DBT

Using Jinja templating to automate the generation of SQL statements is very powerful, and in fact an entire project is built around this very idea. DBT is a framework for developing modular SQL models with SELECT statements; it includes a ref() Jinja macro which handles dependency management (which models to build first).

DBT includes a plethora of Jinja macros aimed at simplifying the elaboration of SQL statements. DBT also has a package management system allowing third-party developers to contribute libraries of macros. There are lots of them on the DBT Hub.
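For instance, assuming the popular dbt_utils package is installed in your project, its star() macro can expand a column list for you. A sketch (not from the article):

```sql
%%sparksql --dbt --output sql
select
    {{ dbt_utils.star(from=ref('stg_payments'), except=['payment_id']) }}
from {{ ref('stg_payments') }}
```

Rendering with --output sql would show the macro expanded into the explicit column names of stg_payments, minus payment_id.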

The %%sparksql magic enables you to tap into this wealth of macros by simply replacing --jinja with the --dbt option and configuring your notebook with the location of your DBT project. We will illustrate the use of DBT using the example project jaffle_shop.

In jaffle_shop there is a model called orders.sql. We can place the contents of this file inside a Jupyter cell and use --dbt to process the template using DBT. We can use --output sql to render the statement and --output html to execute the query and display the results.

Notice the use of the DBT ref() macro. This macro refers to existing DBT models within your DBT project. %%sparksql actually uses the DBT framework to render the SQL template, so all the DBT models and macros are available to the %%sparksql magic.

We can render the SQL produced by DBT. The output is pretty long, so we only show part of it. Notice the payment_method loop producing the expected columns. Also notice select * from _dbt__cte__stg_orders, which is the way DBT handles the materialization of ephemeral models. More details can be found in the DBT documentation on ephemeral materialization.
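As a rough sketch of what this looks like (the column list is abbreviated from jaffle_shop and is illustrative only), an ephemeral model such as stg_orders is inlined into the compiled query as a CTE rather than built as its own table:

```sql
-- shape of the compiled SQL for an ephemeral model (illustrative only)
with _dbt__cte__stg_orders as (
    -- body of the ephemeral stg_orders model, inlined by DBT
    select id as order_id, user_id as customer_id, order_date, status
    from raw_orders
)
select * from _dbt__cte__stg_orders
```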

As seen before, we can execute the query and display the results. Notice the DBT log output before the actual results are displayed.

Using %%sparksql is a convenient way to prototype your DBT code. We have illustrated the use of DBT using a pre-existing model. To create a DBT model from scratch you might want to simply display the data of existing tables (DBT sources) or pre-existing DBT models. Suppose you want to create a new model based on stg_payments. You can start by showing the data in stg_payments:

%%sparksql --dbt --output html
select * from {{ ref('stg_payments') }}

Then start transforming this dataset using DBT’s Jinja macros, all the while viewing the resulting output or rendering the SQL:

%%sparksql --dbt
{% set payment_methods =
    ['credit_card', 'coupon', 'bank_transfer', 'gift_card'] %}
select
    order_id,
    {% for payment_method in payment_methods -%}
    sum(case when payment_method = '{{ payment_method }}'
        then amount else 0 end) as {{ payment_method }}_amount,
    {% endfor -%}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
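To preview what this template expands to without a Spark session, you can render it with plain Jinja2, stubbing out DBT’s ref() macro (the lambda stub is an assumption for illustration; DBT itself resolves ref() to a fully qualified relation name):

```python
from jinja2 import Template

payment_methods = ['credit_card', 'coupon', 'bank_transfer', 'gift_card']

tpl = Template("""\
select
    order_id,
    {% for payment_method in payment_methods -%}
    sum(case when payment_method = '{{ payment_method }}'
        then amount else 0 end) as {{ payment_method }}_amount,
    {% endfor -%}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
""")

# Stub: DBT would resolve ref() to a fully qualified relation name.
sql = tpl.render(payment_methods=payment_methods, ref=lambda model: model)
print(sql)
```

The rendered SQL contains one sum(case …) column per payment method plus the total_amount column, exactly what the --output sql rendering shows in the notebook.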

Once you have a great templated query, you can move it to production by simply copying the template into a DBT .sql model file and be confident that it will actually do what you expect it to do.

Visual Studio Code DBT Power User

DBT Power User is a popular VSCode extension for DBT projects. It supports many DBT autocompletion features for your .sql model files.

Visual Studio Code supports working with Jupyter notebooks natively, and thus the %%sparksql magic works inside VSCode. As shown before, all you need to do to leverage a DBT project in a notebook is to give the location of your DBT project to %%sparksql.

Once a notebook cell’s language is set to SQL, the autocompletion of DBT Power User kicks in and you get the same benefits as when editing a .sql file. Note that if you install the sql-language-server in VSCode, it will automatically change the language to SQL when it detects a cell with a %%sparksql magic.

Remember that autocompletion is triggered by <CTRL-SPACE> in VSCode rather than <TAB> as in JupyterLab.

For example, if we select the is_incremental suggestion shown above, we get the following code inserted into our notebook cell: the same behaviour as DBT Power User in a .sql model file.

The bottom line is that the %%sparksql magic works the same way in VSCode and JupyterLab notebooks. You can render DBT models into SQL, execute queries and view the results. VSCode notebooks are thus a great way to prototype DBT models. Once you are satisfied with your model you can copy it into a .sql model file.

Conclusion

In this article we showed how to leverage %%sparksql to easily prototype templated Spark SQL in JupyterLab and VSCode. We focused on Spark; however, jupyterlab-sql-editor also includes a %%trino magic! More SQL engines might be added in the future. Contributions are welcome! Here’s our Git repo: CybercentreCanada/jupyterlab-sql-editor.


Data democratization advocate and principal engineer at Canadian Centre for Cyber Security | jean-claude.cote@cyber.gc.ca | Twitter @cybercentre_ca