Getting hands-on with DBT — Data Build Tool

Step by Step Guide to running your first project with DBT

Jyoti Dhiman
Towards Data Science

--

Photo by Volodymyr Hryshchenko on Unsplash

DBT is all the rage now! When I first read about it, I thought: okay, it’s a config-driven ETL tool, what else is new?

Then I read more and thought: hmm, multi-adapter support, that’s nice; we don’t have to write generic connector libraries and share them with others anymore.

Reading more and more, I was like wow! It’s a complete platform, bundled with a cloud offering and testing, that basically makes our lives easier.

Well, enough praise! Let’s get our hands dirty and run our first project with dbt.

Setup Guide

Step 1: Setup Postgres

DBT supports multiple adapters, such as Redshift, Snowflake, BigQuery, Spark, and Postgres. Since this is our first project, let’s start simple with Postgres; if you have connectivity to any of the other listed warehouses, feel free to make the corresponding changes for that adapter.

Coming back to Postgres: if you already have Postgres set up, you can skip this step; otherwise, you can follow this guide to set it up. Once done, verify the installation:

~ % postgres -V
postgres (PostgreSQL) 13.3
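If Postgres is running but you don’t yet have a role for dbt to connect as, you can create one in psql. The role name and password below are just placeholders, chosen to match the profiles.yml we configure in a later step:

```sql
-- Run in psql as a superuser (e.g. psql -U postgres).
-- Role name and password are examples; use your own values
-- and keep them in sync with your profiles.yml.
CREATE ROLE dbt WITH LOGIN PASSWORD 'dbtpass';

-- dbt creates its target schema on first run, so the role
-- needs CREATE rights on the target database:
GRANT CREATE ON DATABASE postgres TO dbt;
```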

Step 2: Create a new virtual env and activate it

DBT runs on the Python ecosystem, so if you don’t have a local Python setup, do that first (python setup guide)! (Come on, no Python, really?)

Create a new virtual environment for dbt and activate it:

~ % virtualenv dbt
~ % source dbt/bin/activate
(dbt) ~ %

Step 3: Setup Postgres dbt adaptor

The following command installs the necessary packages for using Postgres as an adapter, plus the dbt core packages:

pip install dbt-postgres

This installs dbt-core and the dbt-postgres plugin only; you can verify with dbt --version:

$ dbt --version
installed version: 1.0.0
latest version: 1.0.0
Up to date!
Plugins:
- postgres: 1.0.0

If you are not using Postgres as your adapter, install the package for dbt-<youradaptor> instead. For instance, to set up the Spark adapter:

pip install dbt-spark

Running your first project

Step 1: Initialize your first dbt project

Okay, everything is installed. Choose your favorite folder and run dbt init; it will ask for a project name and which adapter to use. Since we are using Postgres, we choose [1]:

(dbt) % dbt init
16:43:39 Running with dbt=1.1.1
Enter a name for your project (letters, digits, underscore): dbt_for_medium
Which database would you like to use?
[1] postgres
[2] spark
Enter a number: 1
16:43:46
Your new dbt project "dbt_for_medium" was created!
Happy modeling!

Step 2: Configure profiles for dbt

This is a nice feature, check it out! DBT maintains profiles that record the connection details for each environment: for this project, this target environment connects to this warehouse.

These profiles need to be updated at /Users/jyotidhiman/.dbt/profiles.yml (i.e. ~/.dbt/profiles.yml); configure your adapter details here:

dbt_for_medium:
  outputs:
    dev:
      type: postgres
      threads: 1
      host: localhost
      port: 5432
      user: dbt
      pass: dbtpass
      dbname: postgres
      schema: postgres
    prod:
      type: postgres
      threads: 1
      host: localhost
      port: 5432
      user: dbt
      pass: dbtpass
      dbname: postgres
      schema: postgres
  target: dev

Step 3: Verify profile setup is done correctly

Once the profile is configured, go to your project folder and run the following command to verify that the setup is correct. Any adapter or connection issue will be flagged here.

(dbt) dbt_for_medium % dbt debug
16:50:06 Running with dbt=1.1.1
All checks passed!

Step 4: Run your DBT project!!

dbt init ships with some dummy models and SQL that can be used to verify the setup; run them with:

(dbt) dbt_for_medium % dbt run
16:51:02 Completed successfully
16:51:02
16:51:02 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Step 5: Verify data is created in Postgres

Since we are using Postgres as our warehouse, the data created by the dummy model (our transformation) can be verified in Postgres. To see how it was created, take a look at my_first_dbt_model.sql in your project.
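For reference, the starter model that dbt init generates looks roughly like this (comments trimmed; the exact file may vary slightly across dbt versions):

```sql
-- models/example/my_first_dbt_model.sql (trimmed)

{{ config(materialized='table') }}

with source_data as (

    select 1 as id
    union all
    select null as id

)

select *
from source_data

-- Uncomment this line to filter out null ids:
-- where id is not null
```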

postgres=# select * from my_first_dbt_model;
 id
----
  1
(1 row)

Step 6: Run tests

DBT provides out-of-the-box test integration and ships with some dummy tests, which can be invoked with dbt test:

(dbt) dbt_for_medium % dbt test
16:52:00 Running with dbt=1.1.1
16:52:00 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
16:52:00
16:52:00 Concurrency: 1 threads (target='dev')
16:52:00 Finished running 4 tests in 0.16s.
16:52:00
16:52:00 Completed with 1 error and 0 warnings:
16:52:00
16:52:00 Failure in test not_null_my_first_dbt_model_id (models/example/schema.yml)
16:52:00 Got 1 result, configured to fail if != 0
16:52:00
16:52:00 compiled SQL at target/compiled/dbt_for_medium/models/example/schema.yml/not_null_my_first_dbt_model_id.sql
16:52:00
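These tests are declared in models/example/schema.yml. The starter project attaches unique and not_null tests to the id column, roughly like this (trimmed to the first model; the generated file may differ slightly between dbt versions):

```yaml
# models/example/schema.yml (roughly, as generated by dbt init)
version: 2

models:
  - name: my_first_dbt_model
    description: "A starter dbt model"
    columns:
      - name: id
        description: "The primary key for this table"
        tests:
          - unique
          - not_null
```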

Now you see that one of the tests failed; this is intentional on dbt’s part, to get your hands dirty. To fix the test, uncomment the where id is not null line at the bottom of my_first_dbt_model.sql.


Then run dbt run and dbt test again; this time all the tests pass.

Well, that’s it!! All successful and complete :)

Your basic DBT setup is ready and working! Yay!!

Now you can start building your DBT project on top of it. For more details about DBT, please check out this amazing article.

Do let me know if you face any issues with the setup or if you have any questions.

Hope this helps!
JD
