Getting hands-on with DBT — Data Build Tool
Step by Step Guide to running your first project with DBT
DBT is all the rage now! When I first read about it I was like okay it’s a config-driven ETL, what else is new?
Then I read more about it I was like umm.. multi-adapters support that’s nice, we don’t have to write generic libraries and share them with others anymore.
Reading more and more, I was like wow! It’s a complete platform bundled with cloud, tests basically making our life easier.
Well, enough praise! Let’s get our hands dirty and run our first project with dbt
Setup Guide
Step 1: Setup Postgres
DBT has support for multiple adaptors such as Redshift, Snowflake, BigQuery, Spark, Postgres, etc. Since this is our project, let’s start simple with postgres
but if you have connectors for any of the other listed adaptors feel free to make apt changes for that specific adaptor.
Coming back to postgres
, if you already have postgres
setup, you can skip this step, else you can follow this guide to set up the same. Once set up, verify it is set up.
~ % postgres -V
postgres (PostgreSQL) 13.3
Step 2: Create a new virtual env and activate it
DBT runs on a python ecosystem, if you don’t have a localpython
setup, do that first(python setup guide)! (Come on, no Python, really?)
Create a new virtual environment for dbt
~ % virtualenv dbt~ % source dbt/bin/activate
(dbt) ~ %
Step 3: Setup Postgres dbt adaptor
This will setup install the necessary packages for using postgres
as an adaptor + core packages for dbt
as well.
pip install dbt-postgres
This will install dbt-core
and dbt-postgres
only:
$ dbt --version
installed version: 1.0.0
latest version: 1.0.0
Up to date!
Plugins:
- postgres: 1.0.0
For some reason, if you are not using postgres
as an adaptor, install packages for dbt-<youradaptor>
For instance, you want to setup dbt
adaptor for spark, you can do
pip install dbt-spark
Running your first project
Step 1: Initialize your first dbt project
Okay, all things installed. Choose your favorite folder and run dbt init
, it will ask you for the project name, add the same and choose your adaptor. Since we are using postgres
, we can choose [1]
(dbt) % dbt init
16:43:39 Running with dbt=1.1.1
Enter a name for your project (letters, digits, underscore): dbt_for_medium
Which database would you like to use?
[1] postgres
[2] sparkEnter a number: 1
16:43:46
Your new dbt project "dbt_for_medium" was created!Happy modeling!
Step 2: Configure profiles for dbt
This is very nice, check this out! DBT maintains profiles for maintaining details of the environment, like this
project, this
environment connects to this.
These profiles need to be updated at/Users/jyotidhiman/.dbt/profiles.yml
, configure your adaptor details here:
dbt_for_medium:
outputs:dev:
type: postgres
threads: 1
host: localhost
port: 5432
user: dbt
pass: dbtpass
dbname: postgres
schema: postgres
prod:
type: postgres
threads: 1
host: localhost
port: 5432
user: dbt
pass: dbtpass
dbname: postgres
schema: postgrestarget: dev
Step 3: Verify profile setup is done correctly
Once the profile is configured, go to your dbt
folder and run the following command to verify the setup has been done correctly. In case of any adaptor issue, an error will be highlighted here.
(dbt) dbt_for_medium % dbt debug
16:50:06 Running with dbt=1.1.1All checks passed!
Step 4: Run your DBT project!!
DBT init comes with some dummy models and SQL which can be used to verify and run the setup, it can be invoked with the below command:
(dbt) dbt_for_medium % dbt run
16:51:02 Completed successfully
16:51:02
16:51:02 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
Step 5: Verify data is created in Postgres
Since we are using postgres
in our warehouse, the data created by the dummy model(our transformation), can be verified in postgres
. To see how it was created take a look at my_first_dbt_model.sql
in your project.
postgres=# select * from my_first_dbt_model;
id
— —
1
(1 row)
Step 5: Run tests
DBT provides out-of-the-box integration with tests and ships with some dummy tests which can be invoked using command dbt test
(dbt) dbt_for_medium % dbt test
16:52:00 Running with dbt=1.1.1
16:52:00 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
16:52:00
16:52:00 Concurrency: 1 threads (target='dev')
16:52:00 Finished running 4 tests in 0.16s.
16:52:00
16:52:00 Completed with 1 error and 0 warnings:
16:52:00
16:52:00 Failure in test not_null_my_first_dbt_model_id (models/example/schema.yml)
16:52:00 Got 1 result, configured to fail if != 0
16:52:00
16:52:00 compiled SQL at target/compiled/dbt_for_medium/models/example/schema.yml/not_null_my_first_dbt_model_id.sql
16:52:00
Now you see one of the tests failed, this is intentional by DBT to get your hands dirty. To fix the test, uncomment this line in my_first_dbt_model.sql
And run dbt run
and dbt test
again
Well, that’s it!! All successful and complete :)
Your basic DBT setup is ready and working! Yay!!
Now you can start building your DBT project on top of it. For more details about DBT, please check out this amazing article.
Do let me know if you face any issues with the setup or if you have any questions.
Hope this helps!
JD