Snakemake is one of the most popular workflow management packages for data science, but it lacks functionality to distribute the workflows (pipelines) you create. In this post, I will demonstrate how to add such functionality by using the Python package distribution system. Using this distribution system has the following benefits:
- Distribute workflows as an installable package to other data scientists who have no access to your infrastructure
- Add version control to your pipelines
- Reduce Snakemake boilerplate (e.g. the required argument --cores)
When following these guidelines, installing and running your Snakemake workflow takes just two commands: one pip install of the wheel, and one call to the packaged pipeline with the path to a config file.
Step 0: Prerequisites
To build a package, a (discoverable) Python 3 installation is required. The easiest way to achieve this is by creating a Conda environment with Python 3 included. I used Python 3.9 on a Unix system for this post. To follow along, some experience using Conda and Snakemake is required.
Step 1: Creating the packaging material

For this example, we will create a Python package (called snakepack) that uses a simple internal Snakemake pipeline that copies all .txt files from an input directory to an output directory.
To mimic a real-world situation, we will create two separate folders: one containing the package files, and one containing the input and configuration files that a user would run the package on.
Package folder structure
This structure will contain all files to create the actual package. The root folder contains the files that are required to build the package. The snakepack subfolder contains the actual modules. Inside the snakepack subfolder is the snakemake folder which contains the pipeline file.
This structure can be created by hand, e.g. with mkdir and touch on a Unix system.
The files stay empty for now; we will fill them in later.
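For readers who prefer to script this, here is a minimal Python sketch that creates the empty package layout described above. The root folder is passed in by the caller, and the `__init__.py` file is my assumption (it is what lets `find_packages()` pick up the snakepack folder later on); the other file names are the ones used in this post.

```python
from pathlib import Path


def create_package_skeleton(root: Path) -> list[Path]:
    """Create the (empty) package files under `root` and return their paths."""
    files = [
        root / "setup.py",                          # build instructions
        root / "requirements.txt",                  # package requirements
        root / "snakepack" / "__init__.py",         # assumption: marks snakepack as a package
        root / "snakepack" / "copyfiles.py",        # command line wrapper
        root / "snakepack" / "snakemake" / "Snakefile",  # the pipeline itself
    ]
    for path in files:
        # Create missing parent folders, then an empty file
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return files
```

Calling `create_package_skeleton(Path("my_package_root"))` (a hypothetical folder name) leaves you with the root folder, the snakepack subfolder and the snakemake folder inside it.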
User files
These folders contain the files that will be used to demonstrate that the pipeline works. They can be located anywhere, but in this example they reside in the home (~) folder, in a folder called snakepack_files.
This structure can be created by hand or with a short script.
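As a rough sketch, the user folder could be set up like this. The folder names `input` and `output` and the example file names are placeholders I chose for illustration; only snakepack_files and config.yaml come from this post.

```python
from pathlib import Path


def create_user_files(base: Path, n_files: int = 3) -> Path:
    """Create <base>/snakepack_files with example inputs and an empty config.yaml."""
    user_dir = base / "snakepack_files"
    input_dir = user_dir / "input"    # hypothetical folder name
    output_dir = user_dir / "output"  # hypothetical folder name
    input_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(exist_ok=True)

    # The config file stays empty until the pipeline is written
    (user_dir / "config.yaml").touch()

    # A few small .txt files for the pipeline to copy
    for i in range(1, n_files + 1):
        (input_dir / f"file{i}.txt").write_text(f"example content {i}\n")
    return user_dir
```

To mirror the layout in this post, call it as `create_user_files(Path.home())`.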
Step 2: Forging the pipe to pack

Let’s start by creating the Snakemake pipeline in the snakemake folder and see if it works without the package wrapper. As we want the user of our package to be able to configure the pipeline, we will use Snakemake’s config file to define the input and output directories and pass them to Snakemake.
Snakefile
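Below is a sketch of what such a Snakefile could contain, wrapped in a small Python script that writes it into the snakemake folder. The rule names and the config keys `input_dir` and `output_dir` are assumptions of this sketch, not fixed names; the Snakemake features used (`glob_wildcards`, `expand`, the `config` dictionary and the `:q` quoting in shell commands) are standard.

```python
from pathlib import Path

# Possible Snakefile content; key and rule names are assumptions of this sketch.
SNAKEFILE = '''\
import os

# Directories come from the config file passed with --configfile
INPUT_DIR = os.path.expanduser(config["input_dir"])
OUTPUT_DIR = os.path.expanduser(config["output_dir"])

# Collect the base names of all .txt files in the input directory
NAMES, = glob_wildcards(os.path.join(INPUT_DIR, "{name}.txt"))

# Target rule: request one copied file per input file
rule all:
    input:
        expand(os.path.join(OUTPUT_DIR, "{name}.txt"), name=NAMES)

# Copy a single .txt file from the input to the output directory
rule copy_file:
    input:
        os.path.join(INPUT_DIR, "{name}.txt")
    output:
        os.path.join(OUTPUT_DIR, "{name}.txt")
    shell:
        "cp {input:q} {output:q}"
'''


def write_snakefile(snakemake_dir: Path) -> Path:
    """Write the sketch above to <snakemake_dir>/Snakefile."""
    snakemake_dir.mkdir(parents=True, exist_ok=True)
    target = snakemake_dir / "Snakefile"
    target.write_text(SNAKEFILE)
    return target
```

You can of course paste the content of the string directly into the Snakefile instead; the script form is only there to keep the example self-contained.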
In our snakepack_files folder (see above) we can fill in the config file:
config.yaml
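The config file only needs to tell the pipeline where the input and output directories are. A minimal sketch that writes such a file with absolute paths is shown below; the key names `input_dir` and `output_dir` and the folder names `input` and `output` are assumptions, so use whatever keys your Snakefile actually reads.

```python
from pathlib import Path


def write_config(user_dir: Path) -> Path:
    """Write config.yaml in `user_dir`, pointing at its input/ and output/ folders."""
    # Absolute paths avoid surprises when the pipeline runs from another folder
    input_dir = (user_dir / "input").resolve().as_posix()
    output_dir = (user_dir / "output").resolve().as_posix()

    config_path = user_dir / "config.yaml"
    config_path.write_text(
        f'input_dir: "{input_dir}"\n'    # hypothetical key name
        f'output_dir: "{output_dir}"\n'  # hypothetical key name
    )
    return config_path
```

For the layout in this post, `write_config(Path.home() / "snakepack_files")` produces a two-line YAML file.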
Testing
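This pipeline can now be run from the command line, from inside the snakemake folder, by calling snakemake with a core count and the config file, e.g. snakemake --cores 1 --configfile ~/snakepack_files/config.yaml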
Step 3: Packing up

We will create a package wrapper around the existing Snakemake pipeline, which allows us to create one .whl file that contains the package and can be distributed. Use the above folder/file structure to fill in these files:
copyfiles.py
As we want to be able to invoke the pipeline from the command line after we install the package, this file acts as a wrapper that takes the command line arguments and uses them to start the snakemake pipeline. An advantage is that we can control which parameters the user sets and which parameters we fix or determine automatically.
In this case, we can distinguish the following parameters:
- User defined: the location of the config file
- Fixed: the location of the Snakefile inside the package
- Automatically determined: the number of available CPUs
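The three kinds of parameters above could be wired together roughly as follows. This is a sketch, not the only way to do it: it calls the Snakemake command line through `subprocess` rather than Snakemake's Python API, and the `--config` option name and the function names are my own choices. The Snakemake flags `--snakefile`, `--configfile` and `--cores` are the regular command line options.

```python
import argparse
import os
import subprocess
import sys
from pathlib import Path


def packaged_snakefile() -> Path:
    """Fixed: the Snakefile that ships inside the package, next to this module."""
    return Path(__file__).resolve().parent / "snakemake" / "Snakefile"


def available_cores() -> int:
    """Automatically determined: the number of CPUs (at least 1)."""
    return os.cpu_count() or 1


def build_command(config: Path, snakefile: Path, cores: int) -> list[str]:
    """Assemble the snakemake call; run via the current Python interpreter."""
    return [
        sys.executable, "-m", "snakemake",
        "--snakefile", str(snakefile),
        "--configfile", str(config),
        "--cores", str(cores),
    ]


def main(argv=None) -> int:
    """Parse the user-defined config location and start the pipeline."""
    parser = argparse.ArgumentParser(
        prog="snakepack", description="Copy .txt files with a packaged Snakemake pipeline."
    )
    # User defined: the location of the config file (option name is an assumption)
    parser.add_argument("--config", required=True, type=Path, help="path to config.yaml")
    args = parser.parse_args(argv)

    command = build_command(
        args.config.expanduser().resolve(), packaged_snakefile(), available_cores()
    )
    return subprocess.run(command).returncode
```

To make the module runnable with `python -m snakepack.copyfiles`, end the file with the usual `if __name__ == "__main__":` guard that calls `sys.exit(main())`. Keeping the command assembly in its own function makes it easy to check what will be executed without starting Snakemake.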
requirements.txt
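This file lists the requirements of the Python package, one per line. In this case, snakemake is a requirement. We also included mamba, as this optimises the usage of snakemake; note that the mamba package manager is normally installed through Conda, so check that the package pip resolves under this name is the one you intend.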
The requirements will be automatically downloaded and installed when you install the snakepack package later on.
setup.py
The setup.py file gives the Python packaging tool instructions on how to create the package.
The setup command tells the package builder that this is version 0.0.1 of the package snakepack. The find_packages() call ensures that the Python module will be included in the wheel. install_requires defines the requirements that we read in from requirements.txt. Lastly, we set include_package_data to True and point to the snakemake directory to be included in the package.
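Put together, a setup.py along these lines would cover the points above. Treat it as a sketch under a few assumptions: the `package_data` pattern is one way of pointing at the snakemake directory, the `console_scripts` entry assumes that copyfiles.py exposes a `main()` function, and the helper function names are mine.

```python
import sys
from pathlib import Path


def read_requirements(path: Path) -> list[str]:
    """Return the requirement lines, skipping blank lines and comments."""
    lines = (line.strip() for line in path.read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def setup_kwargs(root: Path) -> dict:
    """Collect the arguments for setup(); `root` is the folder holding requirements.txt."""
    return dict(
        name="snakepack",
        version="0.0.1",
        install_requires=read_requirements(root / "requirements.txt"),
        include_package_data=True,
        # Ship everything in snakepack/snakemake/ (the Snakefile) with the package
        package_data={"snakepack": ["snakemake/*"]},
        # Assumption: copyfiles.py defines main(); this creates a `snakepack` command
        entry_points={"console_scripts": ["snakepack=snakepack.copyfiles:main"]},
    )


# Only call setuptools when a build command (e.g. bdist_wheel) is given,
# so the file can also be loaded without triggering a build.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import find_packages, setup

    setup(packages=find_packages(), **setup_kwargs(Path(__file__).resolve().parent))
```

Splitting the arguments into a function is purely a convenience for checking them; a plain setup(...) call with the same keywords works just as well.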
Testing
Once you have created the above files, you can test whether the package installs in development mode (pip install -e . from the package root) and whether the module runs when given the path to your config file.
This should run the pipeline and copy the files. Delete the copied files afterwards.
Step 4: Take-off!

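If step 3 works, you are ready to build the actual package, for example with python setup.py bdist_wheel (or python -m build --wheel) run from the package root.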
This will create a wheel (.whl) file in a new subfolder called dist/. This wheel can be shared with others.
Testing
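Let’s check that it installs and runs: install the wheel from dist/ with pip, preferably in a fresh environment, and then run the pipeline with the path to your config file.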
Conclusion
This post uses a very simple Snakemake pipeline to show how a workflow can be made distributable by wrapping it in a custom Python package. If you are a frequent Snakemake user and would like to easily share and version your pipelines, you can expand the offered framework with more complex pipelines and tailor it to your needs.