
dbt (data build tool) is one of the hottest technologies in the data engineering and analytics space. Recently, I’ve been working on a task that performs some post-processing over dbt artefacts and wanted to write up some tests. In order to do so, I had to create an example project that could run locally (or in a Docker container), so that I wouldn’t have to interact with the actual data warehouse.
In this article we will go through a step-by-step process you can follow to create a dbt project and connect it to a containerized Postgres instance. You can use such a project either for testing purposes, or for experimenting with dbt itself in order to try out features or practise your skills.
Step 1: Create a dbt project
We will be populating some data in a Postgres database, so we first need to install the dbt Postgres adapter from PyPI:
pip install dbt-postgres==1.3.1
Note that the command will also install the dbt-core package, as well as other dependencies that are required for running dbt.
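If you would rather keep these packages isolated from your global Python installation, you can do the same inside a virtual environment (the environment name venv below is just an example):
# Create and activate a virtual environment, then install the adapter
python -m venv venv
source venv/bin/activate
pip install dbt-postgres==1.3.1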
Now let’s go ahead and create a dbt project. To do so, we can initialise a new dbt project by running the dbt init command in the terminal:
dbt init test_dbt_project
You will then be prompted to select which database you would like to use (depending on the adapters you have installed locally, you may see different options):
16:43:08 Running with dbt=1.3.1
Which database would you like to use?
[1] postgres
(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)
Enter a number: 1
Make sure to enter the number that corresponds to the Postgres adapter, as shown in the output list. The init command should now have created a basic project structure in the directory where you executed it.
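For reference, a freshly initialised dbt 1.3 project should look roughly like this (the exact files may vary slightly between dbt versions):
test_dbt_project
├── dbt_project.yml
├── README.md
├── analyses
├── macros
├── models
│   └── example
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
├── seeds
├── snapshots
└── tests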

Step 2: Create a Docker Compose file
Now let’s create a docker-compose.yml file (place the file at the same level as the test_dbt_project directory) in which we will specify two services: one corresponding to a ready-made Postgres image, and a second one for a dbt image that we will define in a Dockerfile in the next step:
version: "3.9"

services:
  postgres:
    container_name: postgres
    image: frantiseks/postgres-sakila
    ports:
      - '5432:5432'
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  dbt:
    container_name: dbt
    build: .
    image: dbt-dummy
    volumes:
      - ./:/usr/src/dbt
    depends_on:
      postgres:
        condition: service_healthy
As you can tell, for the Postgres container we will be using an image called frantiseks/postgres-sakila, which is publicly available on Docker Hub. This image populates the Sakila database on the Postgres instance. The database models a DVD rental store and consists of multiple normalised tables corresponding to entities such as films, actors, customers and payments. In the following sections we’ll make use of this data in order to build some example dbt data models.
The second service, called dbt, will be the one that creates an environment where we will build our data models. Note that we mount the current directory into the Docker container. This lets the container pick up any changes we make to the data models without having to re-build the image. Additionally, any metadata generated by dbt commands (such as manifest.json) will appear instantly on the host machine.
Step 3: Create a Dockerfile
Now let’s specify a Dockerfile that will be used to build an image, on top of which the running container will then build the models specified in our example dbt project.
FROM python:3.10-slim-buster

RUN apt-get update \
    && apt-get install -y --no-install-recommends

WORKDIR /usr/src/dbt/dbt_project

# Install the dbt Postgres adapter. This step will also install dbt-core
RUN pip install --upgrade pip
RUN pip install dbt-postgres==1.3.1

# Install dbt dependencies (as specified in packages.yml file)
# Build seeds, models and snapshots (and run tests wherever applicable)
CMD dbt deps && dbt build --profiles-dir profiles && sleep infinity
Note that in the last CMD command, we intentionally added an extra && sleep infinity so that the container won’t exit after running the steps specified in the Dockerfile, and we can then access the container and run additional dbt commands (if needed).
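Also note that the dbt deps step in the CMD instruction expects the project to contain a packages.yml file listing the dbt packages to install. Purely as an illustration (the actual file in the example repository may list different packages), a minimal packages.yml pulling in dbt_utils could look like this:
# packages.yml (illustrative example)
packages:
  - package: dbt-labs/dbt_utils
    version: 0.9.2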
Step 4: Create a dbt profile for the Postgres database
Now that we have created the required infrastructure to spin up a Postgres database on our host machine, populate it with some dummy data, and build an image for our dbt environment, let’s focus on the dbt side.
We will first have to create a dbt profile that will be used when interacting with the target Postgres database. Within the test_dbt_project directory, create another directory called profiles and then a file called profiles.yml with the following content:
test_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres
      user: postgres
      password: postgres
      port: 5432
      dbname: postgres
      schema: public
      threads: 1
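Two things are worth noting about this profile. First, host is set to postgres, i.e. the name of the Postgres service in docker-compose.yml, since the dbt container reaches the database over the Compose network rather than localhost. Second, the profile name test_profile needs to match the profile referenced in the project’s dbt_project.yml, which should contain a line along these lines (the file generated by dbt init may use a different default name, in which case adjust one of the two):
# dbt_project.yml (relevant line only)
profile: 'test_profile'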
Step 5: Define some data models
The next step is to create some data models based on the Sakila data populated by the Postgres container. If you are planning to use this project for testing purposes, I would advise creating at least one seed, one model and one snapshot (with tests if possible) so that you have full coverage of all dbt entities (excluding macros).
I have already created some data models, seeds and snapshots, which you can access in this repository.
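To give a flavour of what such a model can look like, here is a minimal sketch of a staging model over Sakila’s payment table, roughly in the spirit of the stg_payment model queried later in this article. The source name sakila, the file paths and the column list are illustrative assumptions; the actual models in the repository may differ:
# models/staging/sources.yml (assumed source definition for the model below)
version: 2
sources:
  - name: sakila
    schema: public
    tables:
      - name: payment

-- models/staging/stg_payment.sql (illustrative staging model)
select
    payment_id,
    customer_id,
    staff_id,
    rental_id,
    amount,
    payment_date
from {{ source('sakila', 'payment') }}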
Step 6: Run the Docker containers
We now have everything we need in order to spin up the two Docker containers we specified in the docker-compose.yml file earlier, and build the data models defined in our example dbt project.
First, let’s build the images:
docker-compose build
And now let’s spin up the running containers:
docker-compose up
This command should have initialised a Postgres database containing the Sakila data, and built the dbt models we specified. For now, let’s make sure you have two running containers:
docker ps
should give an output that includes one container named dbt and another one named postgres.
Step 7: Query the models on Postgres database
In order to access the Postgres container, you’ll first need to infer the container ID:
docker ps
And then run
docker exec -it <container-id> /bin/bash
We will then need to use psql, a command-line interface that gives us access to the Postgres instance:
psql -U postgres
If you have used the data models I’ve shared in the previous sections, you can now query each of the models created on Postgres using the queries below.
-- Query seed tables
SELECT * FROM customer_base;
-- Query staging views
SELECT * FROM stg_payment;
-- Query intermediate views
SELECT * FROM int_customers_per_store;
SELECT * FROM int_revenue_by_date;
-- Query mart tables
SELECT * FROM cumulative_revenue;
-- Query snapshot tables
SELECT * FROM int_stock_balances_daily_grouped_by_day_snapshot;
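If any of these queries fail because the relation names differ in your version of the project, you can list what was actually created in the public schema with psql’s \dt (tables) and \dv (views) meta-commands:
\dt
\dv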
Step 8: Creating additional or modifying existing models
As mentioned already, the Dockerfile and docker-compose.yml files were written in such a way that the dbt container remains up and running. Therefore, whenever you modify or create data models, you can still use that container to re-build seeds, models, snapshots and/or tests.
To do so, first infer the container ID of the dbt container:
docker ps
Then enter the running container by running
docker exec -it <container-id> /bin/bash
And finally run any dbt command you wish, depending on the modifications you’ve made to the example dbt project. Here’s a quick reference of the most commonly used commands for these purposes:
# Install dbt deps
dbt deps
# Build seeds
dbt seed --profiles-dir profiles
# Build data models
dbt run --profiles-dir profiles
# Build snapshots
dbt snapshot --profiles-dir profiles
# Run tests
dbt test --profiles-dir profiles
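If you only want to rebuild a specific model after editing it, dbt’s node selection syntax comes in handy. A quick illustration, using one of the models from the example project (int_revenue_by_date) together with the + graph operator to also rebuild everything downstream of it:
# Build int_revenue_by_date and all of its downstream dependencies
dbt run --select int_revenue_by_date+ --profiles-dir profiles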
How to get the full code of this tutorial
I’ve created a GitHub repository called dbt-dummy that contains all the pieces you need in order to quickly create a containerized dbt project that uses Postgres. You can access it in the link below.
GitHub – gmyrianthous/dbt-dummy: This is a dummy dbt (data build tool) project you can use for…
This project is also available in the example projects section of the official dbt documentation!
Final Thoughts
In today’s tutorial we went through a step-by-step process for creating a dbt project on a local machine using Docker. We built two images: one for the Postgres database, which also gets populated with the Sakila data, and another one for our dbt environment.
It’s important to be able to quickly build example projects with data build tool that can then be used as a testing environment, or even as a playground for experimenting and learning.