
Conda Too Slow? Try Mamba!

Popular package managers compared

Retro package delivery. Photo by Charlie M on Unsplash

Sooner or later, every data scientist and machine learning engineer will encounter package managers and environments. Environments contain the libraries needed to run project code. Developers can create multiple environments on the same machine, making it possible to maintain different environments for different projects. Software is not installed system-wide, but is contained within an environment.

Package managers are used to distribute software libraries. Popular package managers include Conda, pip, and mamba.

Mamba is definitely worth checking out: in my test, it installed a large environment about 10 times faster than conda!

In this article, I will show you how to obtain this speedup. I will discuss:

  • How to set up an environment
  • The conda and mamba package and environment managers
  • How they compare in terms of speed
  • libmamba: mamba speedup within conda?

Software Environment

Maintaining a software environment file ensures that code remains reproducible and can be executed on different platforms. A machine learning project should always include a list of required packages, along with their version numbers. If you give your model to another developer or ship it to a customer, they can replicate the environment locally.

A sample environment file looks like this, taken from one of my git repositories at https://github.com/crlna16/ai4foodsecurity:

name: ai4foodsecurity
channels:
  - conda-forge
  - defaults
  - pytorch
  - nvidia
dependencies:
  - pandas==1.0.1
  - geopandas==0.8.2
  - rasterio==1.1.8
  - matplotlib==3.3.2
  - tensorboard==2.4.0
  - sentinelhub==3.3.2
  - pytorch==1.9.0
  - torchvision==0.10.0
  - numpy==1.19.5
  - sh==1.14.2
  - radiant-mlhub==0.3.0
  - ipykernel=5.3.4
  - 'cudatoolkit=11.1'

Package management systems can be used to create environments from files like this.

Package Management Systems

There are several ways to create environments and install packages in them. This article focuses on two package managers:

  • Conda
  • Mamba

While pip is also a popular choice for maintaining Python environments, conda and mamba have the advantage that they check dependencies across the whole environment, including the Python interpreter itself. For example, TensorFlow 2.13.0 requires Python 3.8–3.11. If the user tries to install it alongside a non-compliant Python version, conda will report the inconsistency and refuse to install the package. Pip, on the other hand, has historically been more permissive (versions before 20.3 did not strictly resolve dependency conflicts), so an installation can succeed while the code still fails to run.

Some packages are available through pip in a more recent version than through conda. In this case, it is possible to explicitly include pip packages in the environment definition.
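As a minimal sketch of what such a mixed definition can look like, the snippet below writes an environment file with a nested pip section. The file name env_with_pip.yml, the environment name mixedenv, and the choice of which package to take from pip are assumptions made for illustration only.

```shell
# Write a hypothetical environment file that mixes conda and pip packages.
# The names "env_with_pip.yml" and "mixedenv" are illustrative assumptions.
cat > env_with_pip.yml <<'EOF'
name: mixedenv
channels:
  - conda-forge
dependencies:
  - python=3.10
  - numpy
  - pip
  - pip:
      - radiant-mlhub==0.3.0
EOF

# Creating the environment requires conda (or mamba) to be installed:
# conda env create --file env_with_pip.yml
```

Listing pip itself as a conda dependency before the nested pip section makes sure that pip is available inside the environment when the pip packages are installed.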

It can be very time consuming to debug mistakes and inconsistencies in an environment. Errors are often not obvious, and it is difficult to determine the correct version of the required packages after the fact. Therefore, it is highly recommended that you maintain the environment description file with packages and version numbers.
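One way to keep an eye on this is a small check that lists dependencies without a version pin. The helper below is a rough sketch, not a full YAML parser: the function name unpinned_deps is hypothetical, and it assumes a simple environment file laid out like the one shown earlier.

```shell
# Hypothetical helper: print every dependency in an environment file
# that has no version constraint (no "=", "<" or ">").
# Assumes a simple layout with a top-level "dependencies:" list.
unpinned_deps() {
    tr -d "\"'" < "$1" | awk '
        /^dependencies:/ { in_deps = 1; next }   # start of the section
        /^[^ #-]/        { in_deps = 0 }         # next top-level key ends it
        in_deps && /^ *- / {
            dep = $0
            sub(/^ *- */, "", dep)               # strip the list marker
            # skip pinned entries and the nested "pip:" key
            if (dep !~ /[=<>]/ && dep !~ /:$/) print dep
        }
    '
}

# Usage (prints nothing if every package is pinned):
# unpinned_deps env.yml
```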


Conda

Conda [https://docs.conda.io/en/latest/] is a multi-platform package manager that runs on Windows, macOS, and Linux. It is both a package manager, hosting software packages on central servers ready for download, and an environment manager. While most commonly used in the context of Python development, conda also works with other programming languages.

The main distribution channel for conda packages is https://repo.anaconda.com/, which contains more than 7,500 verified packages. Beyond that, the community oriented conda-forge [https://anaconda.org/conda-forge] contains more than 22,500 additional packages.

As an example, to create a conda environment and install numpy, run

conda create -n mycondaenv
conda activate mycondaenv
conda install numpy

While conda is generally great, it tends to get slow over time. Especially with a large environment, it can take a long time to resolve the environment when installing additional packages. This is frustrating for developers: instead of getting on with software development and machine learning experiments, they have to wait half an hour for the environment to resolve.

Bored developer waiting for the conda environment to resolve. Photo by Juan Gomez on Unsplash

Mamba

Mamba [https://mamba.readthedocs.io/en/latest/index.html] is a conda-compatible package manager that supports most of conda’s packages. The mamba command can be used as a drop-in replacement for conda in all commands. Packages from conda-forge are also available through mamba.

To create an environment and install a package, use

mamba create -n mymambaenv
mamba activate mymambaenv
mamba install numpy
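Because the two tools accept the same commands, a script can pick whichever one is installed. The following is a small sketch under that assumption; the function name pick_manager is hypothetical.

```shell
# Hypothetical helper: prefer mamba when it is on the PATH, fall back to conda.
pick_manager() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    elif command -v conda >/dev/null 2>&1; then
        echo conda
    else
        echo "neither mamba nor conda found" >&2
        return 1
    fi
}

if manager=$(pick_manager); then
    echo "Using $manager"
    # The same arguments work for either tool, for example:
    # "$manager" create -y -n myenv numpy
fi
```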

Mamba itself can be installed using conda:

conda install -c conda-forge mamba
Photo by Cara Fuller on Unsplash

Install speed

Comparing the following commands on the same Linux system, I found different execution times for the conda and mamba package managers.

time conda create -y -n mycondaenv numpy

>> real 0m37,992s

time mamba create -y -n mymambaenv numpy

>> real 0m27,722s

The commands are shorthands that create the environment and install the numpy package in a single line of shell instructions. The -y flag is used so that packages are installed automatically without user confirmation.

Installing numpy via mamba took about 27% less time than installing it via conda!

Let’s try to create a large environment by saving the environment definition file from above to env.yml and installing directly from there.

time conda env update --file env.yml

real 10m51.233s
user 10m4.853s
sys  0m12.286s

time mamba env update --file env.yml

real 1m0.634s
user 0m45.550s
sys  0m4.051s

Astonishingly, mamba is more than 10 times faster at installing this environment!

Why is mamba faster than conda?

Each time we install a new package in an environment, the environment manager must perform the following steps:

  • Collect the package metadata
  • Resolve the environment (i.e. check which packages are already installed and check for consistency)
  • Download the packages
  • Install the downloaded packages

Conda typically spends a lot of its time resolving the environment. Mamba is faster at this step because it relies on efficient C++ code and parallel processing. The libsolv library that mamba uses to resolve dependencies is also used elsewhere, e.g. by package managers in Linux distributions. In addition, mamba downloads packages in parallel, whereas conda downloads them sequentially.

Libmamba: Mamba speedup within conda

The libmamba solver combines the speed of mamba with the established brand of conda. It is activated with the following commands (since conda 23.10, libmamba is the default solver, so this is only needed on older installations):

conda install -n base conda-libmamba-solver
conda config --set solver libmamba

We measure again the time it takes to install the complex environment from before:

time conda env update --file env.yml

real 4m26.127s
user 3m43.397s
sys 0m11.437s

In this case, conda with the libmamba solver was about 2.4 times faster than plain conda (roughly 59% less wall-clock time), but the speedup still falls well short of the one achieved with mamba.
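The speedup factors can be recomputed from the "real" times reported in this article. The helpers below are a small sketch; the function names to_seconds and speedup are hypothetical, and the factor is simply the slower wall-clock time divided by the faster one.

```shell
# Convert a `time`-style duration such as "10m51.233s" (or "0m37,992s",
# with a decimal comma) into seconds.
to_seconds() {
    echo "$1" | tr ',' '.' | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Speedup factor = slower wall-clock time / faster wall-clock time.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 0m37,992s 0m27,722s     # numpy only, conda vs. mamba      -> 1.4
speedup 10m51.233s 1m0.634s     # env.yml, conda vs. mamba         -> 10.7
speedup 10m51.233s 4m26.127s    # env.yml, conda vs. conda+libmamba -> 2.4
```

These are single measurements on one machine, so the factors should be read as rough indications rather than precise benchmarks.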

Summary

Environment managers are critical for maintaining reproducible software environments during development and deployment. Conda is a popular package manager, but it can become very slow with large environments. Mamba serves as a drop-in replacement for conda. By using efficient code and parallel processing, it installs new packages much faster than conda. The libmamba solver claims to reach mamba-like speed within conda.

Our test showed that mamba was more than 10 times faster than conda at creating a large environment from scratch. Conda with the libmamba solver was about 2.4 times faster than plain conda. So, next time you find yourself waiting a long time for a conda environment to resolve, consider using mamba instead of spending a boring afternoon.
