I recently came across a problem that stumped me. Aren’t those the best? They immediately humble us.
This problem led to much frustration around how the code I was writing was working. It was my first ally digging deep with dbt macros. I’ve used them before but never added custom logic, I only copied what I found in Dbt documentation.
Instead of dwelling, I used this architecture problem as an opportunity to learn more about dbt macros and Jinja functions. And, by writing this article, I hope to fill the knowledge gap that exists on how macros work.
What are dbt macros?
If you don’t know, dbt macros are basically functions embedded in your SQL code. The logic is very similar to a simple Python function. They utilize a language called Jinja to write these functions. While Jinja is pretty simple for those who have written functions in Python, you still need to read through the documentation to understand how it works.
Let’s go over some basic Jinja syntax:
Variables
If you want to set a variable, you would write that like this:
{%- set default = development -%}
default
is the name of the variable and development
is the object that is set equal to the variable. Remember to wrap this with the proper syntax: {%- -%}
.
If you want to reference a variable, you need to enclose it with double curky braces. Like this:
{{ default }}
If Statements
Writing if blocks is **** almost identical to Python, just with the proper Jinja syntax wrapping them:
-- if
{%- if default = development -%}
-- else if
{%- elif default = production -%}
-- else
{%- else -%}
-- closing the function
{%- endif -%}
Be sure to close off your if function with the {%- endif -%}
block.
Logging
Logging variables is an important part of debugging any function you write. In order to do this in Jinja simply plug in the variable you wish to print to your console using the following syntax:
{% do log(node, info=true) %}
Here you will be logging the node’s information to your console. Keep in mind that you need to set info=true
for the information to log on to your console and not just in the log file.
Writing a macro
Now, how do you actually put this all together to write a dbt macro? You define it similarly to how you would define a Python function.
First, you must open it with the word macro, and the proper Jinja syntax wrapping it. Like this:
{% macro -%}
Then, after the word macro
, you must specify a name for the macro, similar to how you would specify the name of a Python function.
{% macro generate_schema_name() -%}
If the function takes any variables as input, add those inside the parenthesis.
{% macro generate_schema_name(custom_schema_name, node) -%}
Again, similarly to Python (I sound like a broken record), you can set defaults for these variables in case no variables are passed in when the function is referenced.
{% macro generate_schema_name(custom_schema_name=none, node=none) -%}
And, last but not least, when you are done writing your logic within the macro, you must close it with the following block:
{%- endmacro %}
Not too bad, right?
dbt variables to reference
Too much documentation is always better than too little. But, sometimes, when there is so much good information, it is impossible to find exactly what you need. After struggling for a few days, I finally found dbt’s Jinja Function documentation.
This is super helpful when trying to create a custom macro. It will give you a good idea of how to accomplish your vision using what’s available to you.
Instead of going through everyone, which you can do on your own, let’s review the ones I’ve found most helpful.
Node
A node references all of the config settings of a particular dbt model. This is often referenced when pulling the name or path of a model. I highly recommend logging the node as we did above to see all of the different information you can pull from it to use in your macros.
{% do log(node, info=true) %}
For example, I often use node.path
in my macro code in order to get the file path of my models. If a model is in a certain folder, I write the code to do one thing. If it is in another folder, I write the code to do another thing.
Target
This is another variable commonly used in macros. If you’re familiar with dbt then you know targets are used to set your development environment. Most of your projects probably have a dev
target and a prod
target. The database and schema information varies under each one.
The target variable is how you will want to reference this different information. You can use the current target name by calling target.name
and the current target’s database by calling target.database
.
This is particularly helpful when setting different conditions based on the environment you are working in. You can check to see if you are in dev or prod by using target.name
in an if statement.
Let’s write a macro!
Now, it’s time to write your first macro using everything we just went over. While dbt has a few macros already built into each project, like create_schema_name
and drop_old_relations
, it is also important that you know how to customize these macros to fit your needs. Personally, I reconfigured create_schema_name
to match how I wanted to name my dbt models.
Let me share what I did.
First, I started with a basic shell of the macro. This includes the starting and ending lines, with the starting line containing the name of the function.
{% macro generate_schema_name() -%}
{%- endmacro %}
Next, I added two if statements and an else. I know I want to include two conditions and an else block to catch any cases that don’t meet either condition. Don’t forget your closing block!
{% macro generate_schema_name() -%}
{%- if -%}
{%- elif -%}
{%- else -%}
{%- endif -%}
{%- endmacro %}
Now, what conditions do I want to set? Let’s do each dependent on the environment set in the target. One will be for the scenario that you’re in development and the other for when you are in production.
{% macro generate_schema_name() -%}
{%- if target.name == 'dev'-%}
{%- elif target.name == 'prod' -%}
{%- else -%}
{%- endif -%}
{%- endmacro %}
Notice that our function name is generate_schema_name()
. This means we want to set different schemas depending on the environment. What would make sense for the schemas in dev and prod to be named?
This is all personal preference. This isn’t how my databases and schemas are actually set up, just how I am going to use them in this example.
I want to write a macro that sets the schema for all models in development to data_mart_dev
. However, I want all production models to be whatever is defined in my dbt_project.yml
.
Note that this macro generate_schema_name()
is one already written by dbt and takes a variable custom_schema_name
as input. Custom_schema_name
refers to what’s specified in your project file. In addition, we also want to pass in the model’s node.
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- if target.name == 'dev'-%}
{%- elif target.name == 'prod' -%}
{%- else -%}
{%- endif -%}
{%- endmacro %}
Now, we will want to set data_mart_dev
as the schema within the first if block in the macro, since we want all dev models to be built here. Strings are simply written as normal text without any quotes.
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- if target.name == 'dev'-%}
data_mart_dev
{%- elif target.name == 'prod' -%}
{%- else -%}
{%- endif -%}
{%- endmacro %}
Next, we need to set the schema to be custom_schema_name
in the second if block. Variables are referenced with {{ }}
. Adding trim
after a pipe will get rid of any whitespace when calling the variable.
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- if target.name == 'dev'-%}
data_mart_dev
{%- elif target.name == 'prod' -%}
{{ custom_schema_name | trim }}
{%- else -%}
{%- endif -%}
{%- endmacro %}
Lastly, we need to define what the schema should be when none of these conditions are met. Let’s create a default variable and set that equal to the target schema.
{% macro generate_schema_name(custom_schema_name, node) -%}
default_schema = target.schema
{%- if target.name == 'dev'-%}
data_mart_dev
{%- elif target.name == 'prod' -%}
{{ custom_schema_name | trim }}
{%- else -%}
{{ default_schema | trim }}
{%- endif -%}
{%- endmacro %}
Now our macro is complete! This one will be automatically called by each dbt model you choose to run. Your dev
models will be created in the schema data_mart_dev
and your prod
models will be created in the schemas defined in your dbt_project.yml
file.
Conclusion
Congrats! You have written your first dbt macro that sets different schema names for different development environments. dbt macros can be confusing to figure out, but once you do, there is a ton of cool stuff you can do with them. The possibilities of customizing your projects, making them more efficient, and testing your models are endless.
However, make sure you don’t overdo it. The benefit of using dbt is how modular it makes your code. Your code is meant to be stored so it can be repeated in multiple places. This is the same with macros. You don’t want to write a macro for one very small use case. You want the macro to apply to multiple places within your project. Keep this in mind when you’re deciding whether you should write a new macro or not.
If you’re interested in learning even more about dbt macros, be sure to subscribe to my new newsletter on Substack. Soon I will be sharing an exclusive article about a recent macro I wrote for a very specific use case. It will be full of valuable information that you may want to apply to your own dbt models! Happy coding.