Causal Inference is a fascinating topic. Causal models seek to create a mechanistic understanding of how variables are related. Recently I’ve been reading Statistical Rethinking by Richard McElreath, whose eloquent and accessible writing has changed the way I think about not just regression and statistical analysis, but life.
This article seeks to explore causal modelling using Directed Acyclic Graphs (DAGs) and brms (Bürkner).

I found this wonderful article by Ziegler et al., Building a Multiple Linear Regression Model With LEGO Brick Data, which teaches multiple linear regression to school students. The dataset I'll be using is taken from it and is available under a CC BY 4.0 license.
Load Packages and Data
library(tidyverse)
library(tidybayes)
library(brms)
library(ggdag)
library(dagitty)

train <- read_csv('lego.population.csv')

train_parse <-
  train %>%
  drop_na(Weight, Pieces) %>%
  # strip units/currency symbols, and fall back to the Amazon price
  # where the RRP is missing
  mutate(Weight = as.numeric(str_sub(Weight, 1, 3)),
         Price = as.numeric(str_remove(Price, '[^[:alnum:]]')),
         Amazon_Price = as.numeric(str_remove(Amazon_Price, '[^[:alnum:]]')),
         Price = coalesce(Price, Amazon_Price)) %>%
  drop_na(Price, Weight) %>%
  # min-max scale so all three variables live on (0, 1]
  mutate(Weight_c = Weight / max(Weight),
         Price_c = Price / max(Price),
         Pieces_c = Pieces / max(Pieces))
We perform some basic data cleansing, then min-max scale the variables so they share a common scale, which makes interpreting the beta posterior means later much easier.
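As a quick sanity check on the scaling, each of the new columns should now top out at exactly 1:

# Each min-max scaled column should have a maximum of 1
train_parse %>%
  summarise(across(c(Weight_c, Pieces_c, Price_c), max))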
Exploratory Data Analysis
Using GGally::ggpairs, we generate the pair plot below for our variables of interest.
GGally::ggpairs(train_parse %>%
                  select(Weight_c, Pieces_c, Price_c),
                aes(alpha = 1/3)) +
  theme_minimal() +
  labs(title = 'Pairplot for Weight, Pieces and Price')

Nothing overtly interesting to note, other than the strong positive linear relationships between all three variables. But what is the correct causal model? Does increasing the price also increase the number of pieces? I don’t think so.
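For reference, the numeric correlations underlying the pair plot:

# Pearson correlations between the three scaled variables
train_parse %>%
  select(Weight_c, Pieces_c, Price_c) %>%
  cor() %>%
  round(2)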
Causal Inference
Using the loaded dataset, we are going to build a causal understanding of Price. Conceptually, the dataset is easy to reason about causally.
Thousands of LEGO sets exist, from basic children's Duplo to large sets with thousands of pieces, like the Millennium Falcon or Death Star from Star Wars. Prices range from $10–20 up to thousands of dollars for limited-edition scale-model sets. Can we create a reasonable causal model for the price given a set's features?
DAGs
In the below, we use the dagitty package to set up our proposed DAG with some plot coordinates, and then use ggdag to produce the figure below, which describes the different pathways through which Weight and the number of Pieces can influence the Price of a LEGO set.
dag_coords <-
  tibble(name = c('Pcs', 'W', 'Pr'),
         x = c(1, 2, 2),
         y = c(2, 2, 1))

dagify(Pr ~ Pcs,
       Pr ~ W,
       W ~ Pcs,
       coords = dag_coords) %>%
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point(color = 'darkred', alpha = 1/4, size = 20) +
  geom_dag_edges(edge_color = 'darkred') +
  geom_dag_text(color = 'darkred') +
  scale_x_continuous(NULL, breaks = NULL, expand = c(.1, .1)) +
  scale_y_continuous(NULL, breaks = NULL, expand = c(.1, .1)) +
  theme_bw() +
  theme(panel.grid = element_blank())
Our task is now to test the implications of these DAGs and then finalise a causal model that best describes the influence of these variables on Price based on our dataset. To begin with, we can reasonably say that the number of pieces is positively correlated with weight and price, i.e. the more pieces a set has, the heavier and more expensive it will be. Similarly, we can also reason that larger, heavier sets should cost more.
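Helpfully, since we've already loaded dagitty, we can ask it to list the testable implications of the candidate DAGs up front (a quick sketch; node names as in the figure, and the two sub-DAGs here anticipate the Collider and Pipe we'll meet below):

# The Pipe: Pcs -> W -> Pr
pipe <- dagitty('dag{ Pcs -> W -> Pr }')
impliedConditionalIndependencies(pipe)
# Pcs _||_ Pr | W

# The Collider: Pcs -> Pr <- W
collider <- dagitty('dag{ Pcs -> Pr <- W }')
impliedConditionalIndependencies(collider)
# Pcs _||_ W

The full DAG in the figure, with all three arrows, implies no conditional independencies at all, which is why we test these simpler candidates against the data.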

Regress Price on Pieces
A natural place to start is regressing Price on the number of Pieces, Pcs → Pr. The beauty of Bayesian analysis is being able to set prior distributions before looking at the data. We model the distribution of set prices as normal, with a mean given by a linear combination of an intercept and a gradient term for Pieces. The intercept gets a relatively broad Gaussian prior with a mean of $20 and a standard deviation of $6; in essence, this describes the distribution of base prices of a LEGO set without any pieces. The beta term describes the value added per piece. A Google search indicates the average price of a LEGO piece is about 11¢, so that is a good starting mean, and we'll make the standard deviation sufficiently broad.
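In Statistical Rethinking-style notation (matching the brm call below), the model is:

Price_i ~ Normal(μ_i, σ)
μ_i = α + β × Pieces_i
α ~ Normal(20, 6)
β ~ Normal(0.11, 0.04)
σ ~ Exponential(1)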

mP_Pr <- brm(Price ~ 1 + Pieces,
             family = gaussian,
             prior = c(prior(exponential(1), class = sigma),
                       prior(normal(20, 6), class = Intercept),
                       prior(normal(0.11, 0.04), class = b)),
             data = train_parse,
             warmup = 1000, iter = 2000, chains = 4, cores = 4, seed = 246) %>%
  add_criterion(criterion = 'loo')
summary(mP_Pr)

Our priors weren't too far off once exposed to the data: the posterior mean for the intercept represents the average price of a LEGO set with zero pieces, and each piece adds about 8¢ to the value of the set. The posterior distributions have fairly narrow error terms on their respective scales, so the model is fairly confident of these values given the data.
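As a quick sanity check on the fit, brms ships a posterior predictive check that overlays prices simulated from the posterior on the observed prices (the ndraws argument assumes a recent brms; older versions call it nsamples):

# Overlay densities of simulated price datasets on the observed prices
pp_check(mP_Pr, ndraws = 100)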
Regress Price on Weight
Similarly to the above, we set some reasonable yet broad priors that will be swamped by the data. We've assumed the same intercept prior as before; the beta term for Weight is again very broad, with a mean of $50/kg and a standard deviation of $15.
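The corresponding model, in the same notation (note the tighter Exponential(5) prior on σ, matching the code below):

Price_i ~ Normal(μ_i, σ)
μ_i = α + β × Weight_i
α ~ Normal(20, 6)
β ~ Normal(50, 15)
σ ~ Exponential(5)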

mW_Pr <- brm(Price ~ 1 + Weight,
             family = gaussian,
             prior = c(prior(exponential(5), class = sigma),
                       prior(normal(20, 6), class = Intercept),
                       prior(normal(50, 15), class = b)),
             data = train_parse,
             warmup = 1000, iter = 2000, chains = 4, cores = 4, seed = 246)
summary(mW_Pr)

This time the intercept term is lower than before, with a posterior mean of $5.77, and the gradient term for Weight has a posterior mean of $57.90/kg. Note that this model is more confident: the error terms are narrower than when regressing Price on Pieces.
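As an aside, we can also compare the two single-predictor models on out-of-sample predictive fit with PSIS-LOO (a quick sketch; mW_Pr needs the loo criterion added first, as we already did for mP_Pr):

# Compare out-of-sample predictive fit of the two single-predictor models
mW_Pr <- add_criterion(mW_Pr, criterion = 'loo')
loo_compare(mP_Pr, mW_Pr)

Bear in mind that predictive fit alone cannot settle the causal question; it only tells us which regression predicts better.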
Regress Price on Pieces and Weight
Now things get a little more interesting: we've created two Bayesian regression models for Price, using the number of Pieces and the Weight respectively. Both seem like reasonable models, but which is the better causal model?
Meet the Collider, whereby the number of Pieces and Weight are independent of each other (Pcs || W). We can already remove this from consideration given what we mechanistically know about the variables, but let's develop models to support this.
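Before fitting anything, the marginal independence the Collider implies (Pcs || W) is easy to check directly:

# If the Collider were right, Pieces and Weight would be uncorrelated
cor.test(train_parse$Pieces, train_parse$Weight)

The pair plot has already told us the answer: the correlation is strong and positive, so this is a formality, but it is the cleanest statement of the testable implication.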


# The Collider
mWP_Pr <- brm(Price ~ 1 + Pieces + Weight,
              family = gaussian,
              prior = c(prior(exponential(5), class = sigma),
                        prior(normal(20, 6), class = Intercept),
                        prior(normal(50, 15), class = b, coef = Weight),
                        prior(normal(0.11, 0.04), class = b, coef = Pieces)),
              data = train_parse,
              warmup = 1000, iter = 4000, chains = 4, cores = 4, seed = 246)

We notice immediately that the coefficient for the intercept has barely changed, while both predictor coefficients have decreased. This isn't surprising given how strongly correlated the predictors are; it's an example of multicollinearity. That alone is enough to rule out the Collider as an appropriate causal model, since it violates the assumed independence between the two variables. Below we visualise the posterior distributions of the Weight and Pieces coefficients together as a scatter plot to display the multicollinearity: there is a shared axis of variance. On the right-hand side, we also see the near-zero difference between the posterior distribution of the Weight term when regressing on Weight alone vs. on Weight and Pieces together.
# LHS: plot of posterior draws, Weight vs. Pieces
as_draws_df(mWP_Pr) %>%
  ggplot(aes(x = b_Pieces_c, y = b_Weight_c)) +
  geom_point(alpha = 0.3) +
  labs(x = 'Pieces', y = 'Weight',
       title = 'Weight and Pieces Display Multicollinearity and Are Not Independent',
       subtitle = 'How Can Price per Piece Increase as Price per Unit Weight Decreases?') +
  theme_minimal()

# RHS: comparison of posterior draws for Weight across W → Pr ← Pcs & W → Pr
tidy_draws(mWP_Pr) %>%
  select(b_Pieces_c, b_Weight_c) %>%
  transmute(Pieces_Weight = b_Pieces_c + b_Weight_c,
            Weight = tidy_draws(mW_Pr) %>% select(b_Weight_c) %>% as_vector()) %>%
  pivot_longer(1:2, names_to = 'Variable', values_to = 'posterior_samples') %>%
  ggplot(aes(posterior_samples, fill = Variable)) +
  geom_density(alpha = 1/3) +
  theme_minimal() +
  labs(title = 'Addition of Pieces and Weight is Nearly Equivalent to Weight Alone',
       subtitle = 'Variables Min-Max Scaled',
       x = 'Posterior Distribution',
       y = 'Density')

# NB: we've used models refit on the min-max scaled variables (with matching
# numbers of draws, so the two sets of posterior samples line up)
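As a numeric companion to the scatter plot, we can compute the correlation between the two coefficients' posterior draws (the _c names assume the min-max scaled refit noted above):

# Correlation between the posterior draws of the two coefficients
as_draws_df(mWP_Pr) %>%
  summarise(rho = cor(b_Pieces_c, b_Weight_c))

A strongly negative rho is the classic signature of multicollinearity: the model can trade one coefficient off against the other while fitting the data equally well.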


The next DAG is known as the Pipe, whereby the number of Pieces is independent of Price when conditioning on Weight (or Pcs || Pr | W). Alternatively, we can think of it as: once we know the Weight of a set, knowing the number of Pieces adds no further information to our understanding of Price.

Below we establish the Pipe as a Bayesian model, using the priors from our previous examples.
# The Pipe
pr_model <- bf(Price ~ 1 + Weight)
w_model <- bf(Weight ~ 1 + Pieces)

mWP_Pr2 <- brm(pr_model + w_model + set_rescor(FALSE),
               family = gaussian,
               prior = c(prior(exponential(5), class = sigma, resp = Price),
                         prior(exponential(5), class = sigma, resp = Weight),
                         prior(normal(50, 15), class = b, coef = Weight, resp = Price),
                         prior(normal(0.11, 0.04), class = b, coef = Pieces, resp = Weight)),
               data = train_parse,
               warmup = 1000, iter = 4000, chains = 4, cores = 4, seed = 246)

The Price intercept remains unchanged, as we are again regressing Price on Weight, and its posterior mean is effectively the same as in our first W → Pr model. The posterior mean of the Weight intercept could plausibly represent the weight of the packaging before any pieces are added. The regression coefficient for Pieces is small: roughly 0.0014 kg (1.4 g) per piece, so a 1,000-piece set adds about 1.4 kg of weight.
Let’s test the implied conditional independence with a counterfactual model. We establish new data, varying the number of pieces while holding the weight constant at 1kg.
# Counterfactual plot: Pieces on Price, holding Weight constant at 1 kg
nd <- tibble(Pieces = seq(from = 0, to = 5000, length.out = 50),
             Weight = 1)

predict(mWP_Pr2,
        resp = 'Price',
        newdata = nd) %>%
  as_tibble() %>%
  bind_cols(nd) %>%
  ggplot(aes(x = Pieces, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
  geom_smooth(stat = 'identity', alpha = 1/5, size = 1/4) +
  labs(x = 'Manipulated Pieces', y = 'Counterfactual Price')

As the impact of Pieces on Price is mediated entirely through Weight, it follows that Price remains flat when Weight is held constant, regardless of the number of Pieces, and that is exactly what the counterfactual plot shows. This supports the implied conditional independency that Price is independent of Pieces when conditioning on Weight (Pcs || Pr | W): once we know the Weight of a set, knowing the number of Pieces adds no further information about Price.
Concluding Remarks
In this article, we've given an accessible introduction to causal modelling using Bayesian regression. We developed a causal model for the impact of physical variables (Pieces and Weight) on Price, and found that regressing Price on Weight is a reasonable causal model.
I'd like to take the chance to thank Richard McElreath for his wonderful writing and lectures on causal modelling, which have inspired me to change the way I think and to take a more rigorous Bayesian approach to my work and life.
Thank you. I hope you enjoyed reading this as much as I enjoyed writing it.