The Power of Bayesian Causal Inference: A Comparative Analysis of Libraries to Reveal Hidden Causality in Your Dataset.

Understanding the causal effect of variables in systems or processes is very valuable. There are a number of Python libraries that can assist in determining causal relationships. I will compare five popular causal inference libraries on their functionality, ease of use, and flexibility, each accompanied by hands-on examples. The included libraries are Bnlearn, Pgmpy, CausalNex, DoWhy, and CausalImpact. By the end of this blog, you will have a better understanding of these five causal inference libraries and which one fits best for your use case.
Background
Causal inference aims to determine the cause-and-effect relationships between variables in a process or system. In general, we can separate variables into two distinct groups: driver and passenger variables. Driver variables are those that directly influence the outcome or dependent variable, while passenger variables do not have a direct effect but are correlated or associated with the outcome variable.
The identification of driver variables can be crucial in projects such as predictive maintenance or in the security domain. The driver variables can help explain the causal relationship between the predictor and outcome variables. In contrast, passenger variables do not have a direct effect on the outcome variable. However, they can still be useful as they provide additional variation and thus understanding of the context in which the data was collected. For example, if we find that engine failures are strongly correlated with location, we might suspect that there is an underlying driver variable that is causing the failure at a specific location. Causal inference helps to make better decisions by identifying which variables to manipulate and which variables to monitor.
In causal inference, we not only seek to determine whether an event would occur, but also to understand the mechanism by which it occurs.
However, causal inference analysis is a challenging task because it requires separating the effects of multiple variables, accounting for confounding variables, and dealing with uncertainty. Luckily, Python has several libraries that can help data scientists perform causal inference. In this article, I will compare the five most popular causal inference libraries in Python: Bnlearn, Pgmpy, DoWhy, CausalNex, and CausalImpact.
If you find this article about Bayesian causal learning helpful, you are welcome to follow me because I write more about Bayesian statistics. If you are thinking of taking a Medium membership, you can support my work a little bit by using my referral link. It is the same price as a coffee, and it lets you read unlimited articles every month.
Comparison of Five Causal Inference Libraries.
Let’s go through each of the five packages and examine their functionality in terms of problem-solving capacity. To compare the libraries most consistently, I will use the same data set (when possible) across the five libraries, namely the Census Income data set. This multivariate data set contains 14 nodes, with 48842 instances (or samples) [1]. Each node contains two or more possible states and can be used to estimate whether having a graduate degree is important for making more than 50k annually. We can load the data set using the Bnlearn library.
# Installation
pip install bnlearn
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import bnlearn as bn
# Import data set and drop continuous and sensitive features
df = bn.import_example(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
df.head()
# workclass education ... native-country salary
#0 State-gov Bachelors ... United-States <=50K
#1 Self-emp-not-inc Bachelors ... United-States <=50K
#2 Private HS-grad ... United-States <=50K
#3 Private 11th ... United-States <=50K
#4 Private Bachelors ... Cuba <=50K
#
#[5 rows x 7 columns]
Library 1: Bnlearn for Python.
Bnlearn is a Python package suited for creating and analyzing Bayesian networks, for discrete, mixed, and continuous data sets [2, 3]. It is designed to be easy to use and contains the most commonly needed Bayesian pipelines for causal learning in terms of structure learning, parameter learning, and making inferences. There is a range of statistical tests that can be used by simply specifying the parameters during initialization. Bnlearn also contains various helper functions to transform data sets, derive the topological ordering of the (entire) graph, compare two graphs, and make various insightful plots, among others. More details about structure learning with bnlearn can be found here:
A Step-by-Step Guide in detecting causal relationships using Bayesian Structure Learning in Python.
One of the great functionalities of Bnlearn is that it can learn the causal structure based on the data set alone. Six algorithms are implemented for this task: hillclimbsearch, exhaustivesearch, constraintsearch, chow-liu, naivebayes, and TAN, which can be combined with the scoring types BIC, K2, and BDEU. Some methods require setting a root node, such as Tree-augmented Naive Bayes (TAN), which is recommended in case you know the outcome (or target) variable. This also dramatically lowers the computational burden and is recommended in case you have many features. In addition, with the independence test, spurious edges can easily be pruned from the model. In the following example, I will use the hillclimbsearch method with scoring type BIC for structure learning. In this example, we will not define a target value but let Bnlearn decide the entire causal structure purely on the data itself.
# Load library
import bnlearn as bn
# Structure learning
model = bn.structure_learning.fit(df, methodtype='hillclimbsearch', scoretype='bic')
# Test edges significance and remove.
model = bn.independence_test(model, df, test="chi_square", alpha=0.05, prune=True)
# Make plot
G = bn.plot(model, interactive=False)
# Make plot interactive
G = bn.plot(model, interactive=True)
# Show edges
print(model['model_edges'])
# [('education', 'salary'),
# ('marital-status', 'relationship'),
# ('occupation', 'workclass'),
# ('occupation', 'education'),
# ('relationship', 'salary'),
# ('relationship', 'occupation')]
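As mentioned above, if the outcome variable is known (here: salary), the TAN method with a root node can be used instead to reduce the computational burden. A minimal sketch is shown below; the class_node argument name is my assumption based on the bnlearn documentation, so check the docs for the exact signature.
# Sketch: Tree-augmented Naive Bayes with 'salary' as the class/root node
model_tan = bn.structure_learning.fit(df, methodtype='tan', class_node='salary', scoretype='bic')
# Prune spurious edges with the chi-square independence test
model_tan = bn.independence_test(model_tan, df, test='chi_square', alpha=0.05, prune=True)
print(model_tan['model_edges'])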
To determine the Directed Acyclic Graph (DAG), we need to specify the input data frame, as shown in the hillclimbsearch code section above. After fitting a model, the results are stored in the model dictionary, which can be used for further analysis. An interactive plot of the causal structure is shown in Figure 1.

With the learned DAG (Figure 1), we can estimate the conditional probability distributions (CPDs, see code section below), and make inferences using do-calculus. Or in other words, we can start asking questions to our data.
# Learn the CPDs using the estimated edges.
# Note that we can also customize the edges or manually provide a DAG.
# model = bn.make_DAG(model['model_edges'])
# Learn the CPD by providing the model and dataframe
model = bn.parameter_learning.fit(model, df)
# Print the CPD
CPD = bn.print_CPD(model)
Question 1: What is the probability of having a salary > 50k given the education is Doctorate:
P(salary | education=Doctorate)
Intuitively, we may expect a high probability because the education is "Doctorate". Let’s find out the posterior probability from our Bayesian model. In the code section below we derive a probability of P=0.7093. This confirms that when education is Doctorate, there is a higher probability of having a salary of >50K compared to not having a doctorate education.
# Start making inferences
query = bn.inference.fit(model, variables=['salary'], evidence={'education':'Doctorate'})
print(query)
+---------------+---------------+
| salary | phi(salary) |
+===============+===============+
| salary(<=50K) | 0.2907 |
+---------------+---------------+
| salary(>50K) | 0.7093 |
+---------------+---------------+
Let’s now ask whether lower education also results in a lower probability of having a salary of >50K. We can easily change the education to HS-grad and ask the question again.
Question 2: What is the probability of having a salary > 50k given the education is HS-grad:
P(salary | education=HS-grad)
This results in a posterior probability of P=0.1615. Studying is thus very beneficial for a higher salary according to this data set. However, be aware that we did not provide any other evidence that could influence the outcome.
query = bn.inference.fit(model, variables=['salary'], evidence={'education':'HS-grad'})
print(query)
+---------------+---------------+
| salary | phi(salary) |
+===============+===============+
| salary(<=50K) | 0.8385 |
+---------------+---------------+
| salary(>50K) | 0.1615 |
+---------------+---------------+
So far we have used a single variable as evidence, but all variables in the DAG can be used as evidence. Let’s make another, more complex query.
Question 3: What is the probability of being in a certain workclass given that education is Doctorate and the marital status is never-married:
P(workclass | education=Doctorate, marital-status=never-married)
The code section below shows that this returns the probability for each workclass, with workclass Private having the highest probability: P=0.5639.
# Start making inferences
query = bn.inference.fit(model, variables=['workclass'], evidence={'education':'Doctorate', 'marital-status':'Never-married'})
print(query)
+-----------------------------+------------------+
| workclass | phi(workclass) |
+=============================+==================+
| workclass(?) | 0.0420 |
+-----------------------------+------------------+
| workclass(Federal-gov) | 0.0420 |
+-----------------------------+------------------+
| workclass(Local-gov) | 0.1326 |
+-----------------------------+------------------+
| workclass(Never-worked) | 0.0034 |
+-----------------------------+------------------+
| workclass(Private) | 0.5639 |
+-----------------------------+------------------+
| workclass(Self-emp-inc) | 0.0448 |
+-----------------------------+------------------+
| workclass(Self-emp-not-inc) | 0.0868 |
+-----------------------------+------------------+
| workclass(State-gov) | 0.0810 |
+-----------------------------+------------------+
| workclass(Without-pay) | 0.0034 |
+-----------------------------+------------------+
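Bnlearn also ships with the helper functions mentioned earlier, such as deriving the topological ordering, comparing two graphs, and one-hot encoding mixed data sets. The sketch below assumes the function names match the current bnlearn API; check the documentation for the exact signatures and return values.
# Topological ordering of the learned DAG (sketch)
order = bn.topological_sort(model)
print(order)
# Compare two learned networks, e.g., before and after pruning (sketch)
# scores, adjmat_diff = bn.compare_networks(model, model_unpruned)
# Convert a mixed data set into a one-hot/numeric representation (sketch)
# df_hot, df_num = bn.df2onehot(df)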
Summary
- Input data: The input data can be discrete, continuous, or mixed data sets.
- Advantages: Contains the most commonly needed Bayesian pipelines for structure learning, parameter learning, and making inferences using do-calculus. Plots can easily be created and CPDs explored. Great for starters and experts who do not want to build the pipeline themselves.
Library 2: Pgmpy.
Pgmpy is a library that provides tools for Probabilistic Graphical Models. It contains implementations of various statistical approaches for Structure Learning, Parameter Estimation, Approximate (Sampling-Based) Inference, and Exact Inference. An advantage is that the core functions are low-level statistical functions, which makes it flexible for building your own causal blocks. Although this is great, it requires good knowledge of Bayesian modeling and can therefore be more difficult than libraries such as Bnlearn when you are just starting with Bayesian modeling.
In terms of functionality, the same results can be derived as shown with Bnlearn because some of the core functionalities of Pgmpy are also utilized in Bnlearn. However, in Pgmpy you do need to build the entire pipeline yourself, including the required data transformation steps, collecting the results, and making the plots. In the code section below I will briefly demonstrate how to learn the causal structure with the hillclimbsearch estimator and scoring method BIC. Note that different estimators may require very different steps in the pipeline.
# Install pgmpy
pip install pgmpy
# Import functions from pgmpy
from pgmpy.estimators import HillClimbSearch, BicScore, BayesianEstimator
from pgmpy.models import BayesianNetwork, NaiveBayes
from pgmpy.inference import VariableElimination
# Import data set and drop continuous and sensitive features
df = bn.import_example(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Create estimator
est = HillClimbSearch(df)
# Create scoring method
scoring_method = BicScore(df)
# Create the model and print the edges
model = est.estimate(scoring_method=scoring_method)
# Show edges
print(model.edges())
# [('education', 'salary'),
# ('marital-status', 'relationship'),
# ('occupation', 'workclass'),
# ('occupation', 'education'),
# ('relationship', 'salary'),
# ('relationship', 'occupation')]
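To illustrate that different estimators require different pipelines, a constraint-based alternative with the PC algorithm could look roughly as follows. This is a sketch; the ci_test and significance_level argument names are my assumption and may differ between pgmpy versions.
from pgmpy.estimators import PC
# Sketch: constraint-based structure learning with the PC algorithm
est_pc = PC(df)
model_pc = est_pc.estimate(ci_test='chi_square', significance_level=0.05)
print(model_pc.edges())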
To make inferences on the data using do-calculus, we first need to estimate the CPDs as shown in the code section below.
# Define the DAG edges manually (source -> target)
vec = {
    'source': ['education', 'marital-status', 'occupation', 'relationship', 'relationship', 'salary'],
    'target': ['occupation', 'relationship', 'workclass', 'education', 'salary', 'education'],
    'weight': [True, True, True, True, True, True]
}
vec = pd.DataFrame(vec)
# Create Bayesian model
bayesianmodel = BayesianNetwork(vec)
# Fit the model
bayesianmodel.fit(df, estimator=BayesianEstimator, prior_type='bdeu', equivalent_sample_size=1000)
# Create model for variable elimination
model_infer = VariableElimination(bayesianmodel)
# Query
query = model_infer.query(variables=['salary'], evidence={'education':'Doctorate'})
print(query)
Summary
- Input data: The input data set must be discrete.
- Advantages: Contains fundamental building blocks that can be used to create your own Bayesian pipelines.
- Disadvantage: Requires good knowledge of Bayesian modeling.
Library 3: CausalNex.
CausalNex is a Python library for learning causal models from data, identifying causal pathways, and estimating causal effects [5]. CausalNex supports only discrete distributions, so continuous features, or features with a large number of categories, should be discretized prior to fitting the Bayesian network. The documentation notes that models typically fit poorly when many variables are used. However, helper functionalities are provided to reduce the cardinality of categorical features. Let’s examine the possibilities of CausalNex using the Census Income data set. First, we need to make the data numeric, since this is what the underlying NOTEARS [7] algorithm expects. We can do this by label encoding the non-numeric variables (see code section).
# Installation
pip install causalnex
# Load libraries
from causalnex.structure.notears import from_pandas
from causalnex.network import BayesianNetwork
import networkx as nx
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Import data set and drop continuous and sensitive features
df = bn.import_example(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Next, we want to make our data numeric, since this is what the NOTEARS expect.
df_num = df.copy()
for col in df_num.columns:
    df_num[col] = le.fit_transform(df_num[col])
We can now apply the NOTEARS algorithm to learn the structure and visualize the causal model using the plot function. We need to apply thresholding to remove weaker edges and prevent having a fully connected graph. In addition, to avoid spurious edges, constraints on the edges can be included.
# Structure learning
sm = from_pandas(df_num)
# Thresholding
sm.remove_edges_below_threshold(0.8)
# Use positions from Bnlearn
pos=G['pos']
# Make plot
plt.figure(figsize=(15,10));
edge_width = [ d['weight']*0.3 for (u,v,d) in sm.edges(data=True)]
nx.draw_networkx_labels(sm, pos, font_family="Yu Gothic", font_weight="bold")
nx.draw_networkx(sm, node_size=400, arrowsize=20, alpha=0.6, edge_color='b', width=edge_width)
# If required, remove spurious edges and relearn structure.
sm = from_pandas(df_num, tabu_edges=[("relationship", "native-country")], w_threshold=0.8)

With the derived structure we can learn the conditional probability distributions (CPDs) and start making queries. However, there are a few additional steps to take, i.e., reducing the cardinality of categorical features and discretizing numeric features. Note that we also need to specify the node states in the form of a dictionary so that numeric values do not have to be converted into categorical labels manually. Although each step is important, for the sake of simplicity I will skip these steps here to avoid many lines of code. For full details, I recommend reading the documentation manual here. It also demonstrates how to use a few helper methods to make discretization easier.
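As an illustration of such a helper, the sketch below buckets a small numeric array with the CausalNex Discretiser. The values and split points are purely hypothetical, and the exact arguments and boundary handling may differ per CausalNex version.
import numpy as np
from causalnex.discretiser import Discretiser
# Sketch: bucket numeric values into bins at the given split points
ages = np.array([23, 37, 48, 52, 61, 70])
discretiser = Discretiser(method="fixed", numeric_split_points=[30, 60])
print(discretiser.transform(ages))  # roughly [0 1 1 1 2 2]
Returning to the main pipeline, the remaining steps are outlined below.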
# Step 1: Create a new instance of BayesianNetwork using the learned structure model (sm)
bn = BayesianNetwork(sm)
# Step 2: Reduce the cardinality of categorical features
# Use domain knowledge or other manners to remove redundant features.
# Step 3: Create Labels for Numeric Features
# Create a dictionary that describes the relation between numeric value and label.
# Step 4: Specify all of the states that each node can take
bn = bn.fit_node_states(df)
# Step 5: Fit Conditional Probability Distributions
bn = bn.fit_cpds(df, method="BayesianEstimator", bayes_prior="K2")
# Return CPD for education
result = bn.cpds["education"]
# Extract any information and probabilities related to education.
print(result)
# marital-status Divorced ... Widowed
# salary <=50K ... >50K
# workclass ? Federal-gov ... State-gov Without-pay
# education ...
# 10th 0.077320 0.019231 ... 0.058824 0.0625
# 11th 0.061856 0.012821 ... 0.117647 0.0625
# 12th 0.020619 0.006410 ... 0.058824 0.0625
# 1st-4th 0.015464 0.006410 ... 0.058824 0.0625
# 5th-6th 0.010309 0.006410 ... 0.058824 0.0625
# 7th-8th 0.056701 0.006410 ... 0.058824 0.0625
# 9th 0.067010 0.006410 ... 0.058824 0.0625
# Assoc-acdm 0.025773 0.057692 ... 0.058824 0.0625
# Assoc-voc 0.046392 0.051282 ... 0.058824 0.0625
# Bachelors 0.097938 0.128205 ... 0.058824 0.0625
# Doctorate 0.005155 0.006410 ... 0.058824 0.0625
# HS-grad 0.278351 0.333333 ... 0.058824 0.0625
# Masters 0.015464 0.032051 ... 0.058824 0.0625
# Preschool 0.005155 0.006410 ... 0.058824 0.0625
# Prof-school 0.015464 0.006410 ... 0.058824 0.0625
# Some-college 0.201031 0.314103 ... 0.058824 0.0625
# [16 rows x 126 columns]
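With the fitted CPDs we can also query the network, analogous to the inference step in Bnlearn. A minimal sketch using the InferenceEngine, under the assumption that the query API below matches the CausalNex version you have installed:
from causalnex.inference import InferenceEngine
# Build the inference engine on the fitted Bayesian network
ie = InferenceEngine(bn)
# Sketch: marginal distributions given a Doctorate education
marginals = ie.query({"education": "Doctorate"})
print(marginals["salary"])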
Summary
- Input data: The input data set must be discrete. Continuous data is not supported.
- Advantages: Causal networks can be learned and inferences on the data can be performed.
- Disadvantage: Requires intensive pre-processing steps but helper functionalities are provided to reduce the cardinality of categorical features.
Library 4: DoWhy.
DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions [4, 5]. In contrast to Bnlearn and Pgmpy, DoWhy requires you to define both the outcome variable and the treatment variable. The treatment variable is the variable whose causal effect you want to investigate, and the outcome variable is the variable on which you want to measure the effect. In addition, it is recommended to provide a DAG, i.e., the causal relationships between the nodes. To create the causal graph for your data set, you either need domain knowledge or you can connect each variable to the target and treatment variable. The latter is performed automatically when no DAG is given. For this data set, I do not include a known structure and let DoWhy create edges from all variables to the outcome and treatment variable. The first code section shows the required pre-processing steps, and the second code section the creation of the causal model.
# Installation
pip install dowhy
# Import libraries
from dowhy import CausalModel
import dowhy.datasets
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Import data set and drop continuous and sensitive features
df = bn.import_example(data='census_income')
drop_cols = ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week', 'race', 'sex']
df.drop(labels=drop_cols, axis=1, inplace=True)
# Treatment variable must be binary
df['education'] = df['education']=='Doctorate'
# Next, we need to make our data numeric.
df_num = df.copy()
for col in df_num.columns:
    df_num[col] = le.fit_transform(df_num[col])
# Specify the treatment, outcome, and potential confounding variables
treatment = "education"
outcome = "salary"
# Step 1. Create a Causal Graph
model = CausalModel(
    data=df_num,
    treatment=treatment,
    outcome=outcome,
    common_causes=list(df.columns[~np.isin(df.columns, [treatment, outcome])]),
    graph_builder='ges',
    alpha=0.05,
)
# Display the model
model.view_model()

As you may have noticed in the code section above, the treatment variable must be binary (set to Doctorate), and all categorical variables need to be encoded into numeric values. In the code section below we use the properties of the causal graph to identify the causal effect. The result may not be surprising: it shows that having a Doctorate increases the chances of a good salary.
# Step 2: Identify causal effect and return target estimands
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
# Results
print(identified_estimand)
# Estimand type: EstimandType.NONPARAMETRIC_ATE
# ### Estimand : 1
# Estimand name: backdoor
# Estimand expression:
# d
# ────────────(E[salary|workclass,marital-status,native-country,relationship,occupation])
# d[education]
#
# Estimand assumption 1, Unconfoundedness: If U→{education} and U→salary then P(salary|education,workclass,marital-status,native-country,relationship,occupation,U) = P(salary|education,workclass,marital-status,native-country,relationship,occupation)
#
# ### Estimand : 2
# Estimand name: iv
# No such variable(s) found!
#
# ### Estimand : 3
# Estimand name: frontdoor
# No such variable(s) found!
# Step 3: Estimate the target estimand using a statistical method.
estimate = model.estimate_effect(identified_estimand, method_name="backdoor.propensity_score_stratification")
# Results
print(estimate)
#*** Causal Estimate ***
#
## Identified estimand
# Estimand type: EstimandType.NONPARAMETRIC_ATE
#
### Estimand : 1
# Estimand name: backdoor
# Estimand expression:
# d
# ────────────(E[salary|workclass,marital-status,native-country,relationship,occupation])
# d[education]
#
# Estimand assumption 1, Unconfoundedness: If U→{education} and U→salary then P(salary|education,workclass,marital-status,native-country,relationship,occupation,U) = P(salary|education,workclass,marital-status,native-country,relationship,occupation)
#
## Realized estimand
# b: salary~education+workclass+marital-status+native-country+relationship+occupation
# Target units: ate
#
## Estimate
# Mean value: 0.4697157228651772
# Step 4: Refute the obtained estimate using multiple robustness checks.
refute_results = model.refute_estimate(identified_estimand, estimate, method_name="random_common_cause")
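Besides the random common cause check, DoWhy provides several other refuters. Two commonly used ones are sketched below; the method names are taken from the DoWhy documentation as I recall it, so verify them against your installed version.
# Sketch: replace the treatment with a placebo; the estimated effect should drop towards zero
refute_placebo = model.refute_estimate(identified_estimand, estimate, method_name="placebo_treatment_refuter")
print(refute_placebo)
# Sketch: re-estimate the effect on a random subset of the data; it should remain stable
refute_subset = model.refute_estimate(identified_estimand, estimate, method_name="data_subset_refuter")
print(refute_subset)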
Summary
- Input data: Outcome variable, and Treatment variable. It is highly recommended to provide a causal DAG.
- Requirements: The treatment variable must be binary, and all categorical variables need to be encoded into numeric values.
- (Dis)advantage: The output contains many details, which can be beneficial but also makes it more complicated to understand. It cannot learn the causal DAG from the data.
Library 5: CausalImpact.
CausalImpact is a Python package for causal inference using Bayesian structural time-series models [6]. The main goal is to infer the expected effect of a given intervention by analyzing differences between expected and observed time-series data, as in program evaluation or treatment effect analysis. It assumes that the response variable can be precisely modeled by a linear regression on covariates that are themselves not affected by the intervention. For instance, if a company wants to infer the impact of a given marketing campaign on its "revenue", its daily "visits" cannot be used as a covariate, as the total visits might be affected by the campaign. In the following example, we will create 100 observations with a response variable y and a predictor x1. Note that in real use cases, we will have many more predictor variables. For demonstration, the intervention effect is created by increasing the response variable by 10 units after timepoint 71.
To estimate the causal effect, we first need to specify the pre-intervention period and the post-intervention period. With the run function, we fit a time-series model using MLE (the default) and estimate the causal effect.
# Installation
pip install causalimpact
# Import libraries
import numpy as np
import pandas as pd
from statsmodels.tsa.arima_process import arma_generate_sample
import matplotlib.pyplot as plt
from causalimpact import CausalImpact
# Generate samples
x1 = arma_generate_sample(ar=[0.999], ma=[0.9], nsample=100) + 100
y = 1.2 * x1 + np.random.randn(100)
y[71:] = y[71:] + 10
data = pd.DataFrame(np.array([y, x1]).T, columns=["y","x1"])
# Initialize
impact = CausalImpact(data, pre_period=[0,69], post_period=[71,99])
# Create inferences
impact.run()
# Plot
impact.plot()
# Results
impact.summary()
# Average Cumulative
# Actual 130 3773
# Predicted 120 3501
# 95% CI [118, 122] [3447, 3555]
# Absolute Effect 9 272
# 95% CI [11, 7] [326, 218]
# Relative Effect 7.8% 7.8%
# 95% CI [9.3%, 6.2%] [9.3%, 6.2%]
# P-value 0.0%
# Prob. of Causal Effect 100.0%
# Summary report
impact.summary(output="report")
The Average column describes the average (across time) of the response during the post-intervention period. The Cumulative column sums up the individual time points, which is a useful perspective if the response variable represents a flow quantity, such as queries, clicks, visits, installs, sales, or revenue. We see that in this example there is a 100% probability of a causal effect with a P-value of 0, which is expected because we injected this effect into the data.

In the plot (Figure 4), we see three panels that can be described as follows. The first panel shows the data along with a counterfactual prediction for the post-treatment period. The second panel shows the difference between the observed data and the counterfactual prediction; this difference represents the pointwise causal effect as estimated by the model. The bottom panel depicts the cumulative effect of the intervention by aggregating the pointwise contributions from the second panel. It is important to note that the validity of these inferences relies on the assumption that the covariates were not themselves influenced by the intervention. Additionally, the model assumes that the relationship between the covariates and the treated time series, as established during the pre-period, remains consistent throughout the post-period.
Summary
- Input data: Time series data.
- Requirements: The pre-intervention period and post-intervention period need to be specified.
- Advantage: Can determine the causal effects on time-series data.
Overall summary of the libraries.
We have seen five libraries for learning causality, each with its own (dis)advantages and its own focus on solving certain problems. A summary is as follows:
The Bnlearn library is suited to use cases where you have discrete, mixed, or continuous data sets and need to derive the causal DAG from the data (structure learning), do parameter learning (CPD estimation), or make inferences (do-calculus). The advantages are that many pre- and post-processing steps are handled within the pipelines, which makes it less error-prone. It allows categorical (string) labels as input and can remove spurious edges using statistical tests (independence test). In addition, edges or nodes can also be whitelisted or blacklisted, as sketched below. Note that it is not possible to entirely customize the modeling part as most processing steps are performed within the pipelines, which may give less control to the user.
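For example, constraining edges during structure learning could look roughly like this (a sketch; the white_list, black_list, and bw_list_method parameter names are my recollection of the bnlearn API, so check the documentation):
# Sketch: enforce or exclude specific edges during structure learning
model = bn.structure_learning.fit(df,
                                  methodtype='hillclimbsearch',
                                  scoretype='bic',
                                  white_list=[('education', 'salary')],
                                  black_list=[('native-country', 'salary')],
                                  bw_list_method='edges')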
The Pgmpy library contains low-level statistical functions to create and combine various methods for structure learning, parameter learning, and inferences. It is flexible in usage but requires good knowledge of Bayesian modeling. Note that the Bnlearn library utilizes some functions from Pgmpy and has therefore overlapping functionality. Or in other words, if you aim to create customized pipelines, it is recommended to use Pgmpy. Otherwise, I recommend using bnlearn.
The CausalNex library can be used to derive the causal DAG from the data and make inferences. Note that it only works on discrete data sets. After detecting a causal DAG, weaker edges can be removed by thresholding. Unfortunately, it cannot work with categorical labels directly; these need to be converted into numerical values. In addition, various intermediate pre-processing steps are required, for example reducing the cardinality of categorical features, discretizing numeric features, and specifying all of the states that each node can take in the form of a dictionary. Although this gives more control to the user in the modeling part, the extra modeling adds a layer of complexity.
The DoWhy approach is ideal in cases where you want to estimate the causal effect of a treatment variable on an outcome variable. It is well suited for use cases such as A/B testing, for example determining the causes of hotel booking cancellations: the outcome can be set as "cancellation" and the treatment as "different room assigned". Another use case could be estimating the effect of a member rewards program. Note that this library is not designed to learn the causal DAG from the data; instead, it takes a causal DAG as input, which you should ideally provide manually (otherwise it falls back to connecting all variables as common causes).
CausalImpact is a Python package for causal inference using Bayesian structural time-series models. Although it also computes causality, it is not comparable in functionality to the other libraries as it does not intend to create a causal DAG or make inferences on the data. The goal is to infer the expected effect of a given intervention by analyzing differences between expected and observed time series data, such as Program Evaluation, or Treatment Effect Analysis.
Final words.
Modeling your data for causality can be challenging for many reasons. It starts with choosing the right library for your specific use case, for which we have seen a comparison of five popular causal inference libraries: Bnlearn, Pgmpy, CausalNex, DoWhy, and CausalImpact. Each library has its own (dis)advantages and is best suited for certain use cases. The libraries are compared on their functionality, capabilities, and the interpretability of the results using the same data set. Only for CausalImpact is a different data set used, as it models continuous values and requires time-series data.
A grouping on functionality would cluster Bnlearn, Pgmpy, and CausalNex as they contain functionalities for structure learning, parameter learning, and making inferences. The second group is DoWhy and CausalImpact which are designed to measure the causal effect of treatment variable(s) on an outcome variable Y, controlling for a set of features X. Besides the five causal libraries described here, more libraries are worth looking at, such as EconML, Pyro, and CausalML, which fall in the second group (causal effect of treatment variable on an outcome variable).
Be Safe. Stay Frosty.
Cheers E.
Disclaimer: I am the developer of the Python bnlearn library. Whatever your decision for a causal library is, make sure you fully understand the modeling part and can interpret the results.
Related work with Bayesian Statistics
A Step-by-Step Guide in detecting causal relationships using Bayesian Structure Learning in Python.
A step-by-step guide in designing knowledge-driven models using Bayesian theorem.
A Guide to Find the Best Boosting Model using Bayesian Hyperparameter Tuning but without…
Hands-on Guide for Hyperparameter Tuning with Bayesian Optimization for Classification Models.
References
- Census Income. (1996). UCI Machine Learning Repository. https://doi.org/10.24432/C5S595.
- Taskesen E, A Step-by-Step Guide in detecting causal relationships using Bayesian Structure Learning in Python, Medium, 2021
- Taskesen, E. (2020). Learning Bayesian Networks with the bnlearn Python Package. (Version 0.3.22) [Computer software].
- Dowhy: https://www.pywhy.org/dowhy/v0.8/getting_started/intro.html
- Amit Sharma, Emre Kiciman. DoWhy: An End-to-End Library for Causal Inference. 2020. https://arxiv.org/abs/2011.04216
- Brodersen K.H. et al, Inferring causal impact using Bayesian structural time-series models, 2015, Annals of Applied Statistics, vol. 9, pp. 247-274
- Xun Zheng et al, DAGs with NO TEARS: Continuous Optimization for Structure Learning, 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
- AI4I 2020 Predictive Maintenance Dataset. (2020). UCI Machine Learning Repository.