Conda: Essential Concepts and Tips

For beginners and experienced users

Giacomo Vianello
Towards Data Science

--

In this blog post, I will describe what conda is and how to use it effectively, whether this is the first time you have looked at it or you are a seasoned user. In the latter case many things will already be familiar, but you will still find some tricks you can use to improve your experience.


Table of contents

· Why conda
· Entry-level examples
· What is conda
Conda and pip
Conda Vs Anaconda Vs Miniconda
· How to install conda
Getting Miniconda
· Installing packages, and environments
The base environment
Other environments
The best way to use Jupyter and conda
Removing environments
Sharing an environment
· Channels
The conda-forge channel
· Conda and pip
· Conda is slow
· Free up some disk space
· Conda and docker
· In-depth: RPATH and conda
· Conclusions

Why conda

Conda is for you if one or more of the following are true:

  • You spend way too much time installing and configuring software instead of focusing on the projects you need to work on
  • You invariably end up with a giant mess that you have to clear up and then restart from scratch
  • You want to be able to move your environments between machines
  • You need to install conflicting requirements for different projects on the same machine
  • You want a repeatable, trackable installation process that you can share across platforms and distributions, which could also include non-python packages
  • You just got a new machine (maybe with a GPU) and you want to get started quickly
  • You want to always experiment with the latest version of everything, but don’t want to deal with incompatibilities and figuring out which version goes with which version
  • You have multiple machines with different Linux distributions or even a mix of Linux/macOS/Windows, and you want the same environment across them

If any of this is true, conda is going to be very useful for you.

Entry-level examples

Let’s say you just got a shiny new machine with a GPU and you want to install the NVIDIA CUDA toolkit. Do you want to follow the manual installation procedure and spend the next 5–10 hours on the task, or would you rather run a single command and be on your way?

> conda install -c conda-forge cudatoolkit=10.2

Do you want to install pytorch and jump on the Deep Learning bandwagon?

> conda install pytorch cudatoolkit=10.2 -c pytorch

Do you want to install Tensorflow and Pytorch in the same environment? It’s not madness, it’s just:

> conda install -c conda-forge -c pytorch python=3.8 cudatoolkit=10.2 pytorch tensorflow tensorboard

and much more. In general, most things are one conda install away. And, most importantly, if you mess up you just remove the environment and retry, no trace is left in your system.

I hope this gives you a hint of why conda is such an amazing tool. Let’s jump into it and learn (much) more about it.

What is conda

Conda is a cross-platform, open-source package manager.

You are probably already using a package manager on your system. In Linux, you might have one or more of apt, yum or snap. In macOS you have homebrew or others. Conda is similar, but it is platform-independent. It is indeed available for Linux, Mac, and Windows and it works in the same way on all 3 platforms.

Conda and pip

A common mistake is to compare conda with pip. They are similar, but also very different in one way: pip is a package manager centered around Python (after all, the repository of pip packages is called the Python Package Index), while conda is language-agnostic. While pip packages do sometimes contain compiled libraries (for example, numpy, scipy, pandas…), in general these only support Python interfaces. With conda, instead, you can install C/C++ libraries, compilers, graphic libraries, and full-fledged applications that have nothing to do with Python. You can even use conda to install GPU software like the NVIDIA CUDA toolkit, or the go language, or npm, julia and so on. In other words, the conda ecosystem goes way beyond Python. Therefore conda is much more similar, as we said, to yum or homebrew than to pip. In fact, a typical conda installation contains pip as just another package, and you can install packages in your conda environment using pip (but see the caveats below before you do so).

Conda Vs Anaconda Vs Miniconda

As said above, conda is the package manager.

Anaconda is a collection of conda packages that contains most of the things you are going to use in your daily job (including, of course, conda the package manager). Some people prefer it because it offers a nice graphical user interface to handle environments and packages instead of a command-line interface (CLI). I personally never use Anaconda, because I find it bloated and I don’t mind the CLI. I use Miniconda instead.

Miniconda is a bare-bones collection of packages: it contains python and the conda package manager, as well as a few other packages (including pip). It does not have any graphical user interface, only the CLI. It is your starting point for building an environment that contains what you need, and nothing more. It is also super-useful for building Docker containers (see below). I personally only use Miniconda.

How to install conda

For this blog post, let’s focus on Miniconda and the CLI. This is by far the most common use case, but if you want to use Anaconda and its GUI, most of the things discussed apply there as well.

Getting Miniconda

Go to the Miniconda download page and download the self-installing package appropriate for your system (Linux, Windows, or Mac). Save it somewhere.

Alternatively, run this command:

NOTE: all command-line examples here will assume Linux and the bash shell. If you are on Mac and on bash, things will be identical. If you are using another shell there might be minimal changes. For example, in the following command, substitute Linux with MacOSX if you are on Mac. I won’t cover Windows, but it should be easy to translate the commands to the shell in Windows (or you can use the WSL, the Windows Subsystem for Linux).

# Download Miniconda. You can also use wget or any other command 
# line download tool for this purpose
> curl https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh > Miniconda3-installer.sh

Now go to wherever you downloaded the file, and install Miniconda by running:

# Add run permissions for current user
> chmod u+x Miniconda3-installer.sh
> ./Miniconda3-installer.sh -b -p ~/miniconda3

You can substitute -p ~/miniconda3 with any other prefix, however, it is generally a good idea to keep that prefix. For simplicity, the rest of the blog post will assume that path for Miniconda.

Congrats, you have installed Miniconda! Now let’s put it to use by adding it to the PATH environment variable and then activate it:

If you find conda useful, you might want to add these lines to your .bashrc file (or whatever your shell uses as init script) so you won’t have to do it for every new terminal.

> export PATH=${PATH}:~/miniconda3/bin
# While you could use `conda activate` instead,
# this way of activating is more portable
> source ~/miniconda3/bin/activate

Note that we put the miniconda3 path at the end of the PATH env variable. This is a trick I use so that the presence of Miniconda is minimally invasive. For example, if I now run python without having activated conda, I will still get the python interpreter installed in my system, and NOT the one that comes with Miniconda. If I had put the path at the beginning of the PATH env variable, I would instead get the Miniconda python, whether I activated conda or not.

We are now ready to go, you should see something like:

(base) giacomo@fd4af0669a8:/#

The (base) part tells you that you are in the “base” environment and conda is ready for use.

WAIT! DO NOT install anything in the base environment. Read the next section first

Installing packages, and environments

Packages can be installed in two ways. One is obvious, i.e., using the command:

> conda install [name of the package]

In some cases, you need to add a channel specification (see the section about channels).

Another way is by creating environments and specifying packages during creation. Let’s look at that.

Environments are one of the most useful concepts in conda. They are self-contained installations: everything inside an environment only depends on things that are within that environment.

Well, not 100%, but 99.9%. You still depend on a few core libraries of your system, like libc on Linux, but these are advanced details (described at the bottom). For all intents and purposes, and for 99.9% of users, the approximation I just gave is a good way to think about environments.

Yes, they are conceptually similar to the virtual environments of python, but they can contain many more things than just python packages.

If you are interested in how this is achieved, look at the bottom of this post.

The documentation about environments contains all the details. Here I am summarizing the main points, as well as offering some tips and lessons learned.

The base environment

The base environment gets created when you install Miniconda or Anaconda. It contains conda and packages that are needed for basic operations.

A typical beginner mistake is to install packages into the base environment. DO NOT DO THAT. Doing so will create conflicts with the other environments you are going to create.

The base environment should only contain conda, its dependencies, as well as conda-related things like conda-build or mamba (see below for what mamba is). Instead, you should always rely on other environments for any other software.

Other environments

Since we don’t want to install anything in the base environment, the first thing is to create your go-to environment for day-to-day operations. For example, a standard data scientist environment could look like:

> conda create --name ds -c conda-forge python=3.8 matplotlib numpy scipy pandas jupyter jupyterlab seaborn

Conda will print a long list of packages that are needed to make everything work. You can review the list, and then enter y to confirm. If you want to avoid that prompt, just add -y to the command line.

This will create a ds environment and will install packages at the same time.

TIP: if you already know that you are going to need some more packages, install them this way instead of relying on conda install later, so that conda will figure out the best combination of dependencies that fulfills your requirements upfront.

As you might have guessed, you can add there whatever package you need, and chances are, you are going to find it in the conda-forge channel. We will get to what this means in a second.

You now need to activate this environment:

> conda activate ds

Since this is our go-to environment, you might want to add the line above to your .bashrc or whatever init script your shell runs, so that every time you open a shell you will find yourself in the ds environment.

From now on, if you run conda install [package] , the new package will be installed in the active environment.

You don’t need to stop here. Say you want to try the bleeding edge python 3.9 today, but you don’t want to disrupt your day-to-day operations. Easy enough, just create a different environment:

> conda deactivate
> conda create --name py39 python=3.9 numpy scipy matplotlib
> conda activate py39

The conda deactivate part deactivates the current environment and brings you back to the base environment, before creating and activating the next one. Here we also see how to specify explicit versions of packages, like in python=3.9.

TIP: always run conda deactivate before activating a new environment. This avoids weird problems for example with environment variables not being unset

Now, anything you install in the py39 environment is completely independent of what is inside the ds environment.

NOTE: after activating an environment, if you look at the $PATH environment variable (echo ${PATH}) you will see that conda activate prepends the path to the bin directory of the environment. That means that anything you install there has precedence over what is installed anywhere else on your system.

Similarly, you can use environments for installing say an older PyTorch version that you need to reproduce an old paper or to install the go language separately from your python environment, just to give a few examples.

At any moment, you can see a list of environments with:

> conda env list
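With the ds and py39 environments from above, the output would look something like this (the paths are illustrative, and the asterisk marks the currently active environment):

```shell
> conda env list
# conda environments:
#
base                     /home/user/miniconda3
ds                    *  /home/user/miniconda3/envs/ds
py39                     /home/user/miniconda3/envs/py39
```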

The best way to use Jupyter and conda

If you use Jupyter in your day-to-day work (and who doesn’t?) there is an extension that you cannot miss: nb_conda_kernels. This simple extension allows you to control which environment to use for which notebook, from the Jupyter (notebook or lab) interface. You just need to install it in your default environment (what we called ds above) and then start Jupyter from your default environment:

> conda activate ds
> conda install -c conda-forge jupyterlab nb_conda_kernels
> jupyter lab

Using nb_conda_kernels you can select which environment to use for every notebook, without leaving Jupyter.

NOTE: you DO NOT need to install Jupyter in each environment, but you MUST install at least one kernel (for example, ipykernel for Python) in each environment. An environment without a kernel will not show up in the list of available environments in Jupyter.
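For example, to make the py39 environment from the previous section show up in Jupyter, you could install the Python kernel into it like this (a sketch, assuming the environment already exists):

```shell
> conda deactivate
> conda activate py39
> conda install -c conda-forge -y ipykernel
# Now restart Jupyter from the ds environment: py39 will appear
# in the list of available kernels
```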

Removing environments

If you want to delete an environment to free up some disk, or because you want to start from scratch, you can simply do:

> conda deactivate
> conda uninstall --name py39 --all

This will completely remove the py39 environment and all its packages.

Sharing an environment

An environment can be replicated in multiple ways. It is important to understand the differences between them.

First, you need to activate your environment.

Then, if you are exporting to the same platform (say Linux to Linux), just do:

> conda env export > environment.yml

The environment.yml file is similar to a requirements.txt file for pip, if you know what I’m talking about. It contains every single package that is contained in your environment, including of course C/C++ libraries and the like. It can be used to recreate the environment on the same platform like this:

> conda env create -f environment.yml

A typical use case is to “develop” an environment on Linux and then export it this way to be recreated within a Linux Docker container. This gives you a guarantee of an exact replica, not only of the things you installed manually but also of all their dependencies. Some of these will be platform-dependent, which will make it difficult or impossible to replicate this environment on a different platform (say from Linux to macOS).
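For illustration, the beginning of a full export for the ds environment might look like this (the numpy build string and the prefix path are hypothetical; note the platform-specific build string after each version):

```yaml
name: ds
channels:
  - conda-forge
dependencies:
  - python=3.8.5=h1103e12_7_cpython
  - numpy=1.19.1=py38hbc27379_2
  # ...hundreds of similarly pinned dependencies...
prefix: /home/user/miniconda3/envs/ds
```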

If instead, you want your environment to be replicable across platforms, export this way:

> conda env export --from-history > environment.yml

This version of the environment.yml file will only contain the packages that you explicitly installed during conda create or with conda install . When recreating the environment on a different platform (say, from Linux to Mac), conda will solve the dependencies of the packages listed in the environment file using what is available on the destination platform. Of course, you can use this environment file in the same way as before on the destination platform:

> conda env create -f environment.yml
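For comparison, a --from-history export of the ds environment created earlier would contain roughly just what was on the conda create command line:

```yaml
name: ds
channels:
  - conda-forge
dependencies:
  - python=3.8
  - matplotlib
  - numpy
  - scipy
  - pandas
  - jupyter
  - jupyterlab
  - seaborn
```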

A third option is the nuclear one: there is a way to package your entire environment into one binary file and unpack it at the destination (this, of course, only works within the same platform). This is very useful in some circumstances, most notably when using conda with PySpark. This is achieved with the conda-pack tool.

Channels

Conda packages are hosted remotely in channels, which are the equivalent of the repositories of yum or apt, for example. If you do not specify any -c parameter during conda install or conda create, the default channels are used. Otherwise, conda will look for a given package not only in the default channels but also in the channels you specify. For example, a command like this:

> conda install -c pytorch pytorch

will look for the pytorch package in the pytorch channel as well as in the defaults channel. Then, the available versions will be compared following conda’s channel-priority rules.

TIP: always turn on the strict channel policy. It makes conda faster, and helps to avoid conflicts. You can do it with conda config --set channel_priority strict

The conda-forge channel

The conda-forge organization maintains a huge list of packages in the conda-forge channel. You can find the recipes to build these packages indexed on their website with links to GitHub. You can also easily contribute your own packages to conda-forge.

If you look around the net, there are lovers and haters of conda-forge. I am in the former category. However, you do need to know some important things to use it.

There are compatibility problems between conda-forge packages and the packages contained in the default conda channels. Therefore, you should always set channel_priority: strict as explained in the previous section, and give priority to the conda-forge channel over the default channels when installing things or creating environments. There are two ways of doing that. Either you always specify conda install -c conda-forge or conda create -c conda-forge, always using conda-forge as the first listed channel, OR you create a .condarc file in your home directory with this content:

channels:
- conda-forge
- defaults
channel_priority: strict

An environment should either use conda-forge or not, from creation to destruction. Do not mix and match. If you created it without using the conda-forge channel, then do not add it to the mix halfway. In practice, I always create an environment with conda-forge unless in very specific cases where I found incompatibilities.

A third option is to define a bash function in your .bashrc like:

condaforge() {
    # $1 is the command, ${@:2} is every other parameter
    # Print the command before executing it
    echo "==>" conda "$1" -c conda-forge "${@:2}"
    conda "$1" -c conda-forge "${@:2}"
}

and then use it to create environments and install packages:

> condaforge create --name test python=3.8

or

> condaforge install numpy

This function will add the -c conda-forge channel to your conda installation commands.

TIP: sometimes it is hard to remember if a given environment was created using conda-forge. A simple trick is to always prepend cf to the environment name, like conda create --name cf_py39 -c conda-forge python=3.9. Or, use conda list | grep python to see whether python was installed from the conda-forge channel or not. If it was, you will see something like:

python       3.8.5     h1103e12_7_cpython    conda-forge

Conda and pip

Conda and pip have been historically hard to combine. Things are much better these days, and in many cases, they just work out.

However, you should in general give precedence to one or the other. For example, normally you would give priority to conda, i.e., you try to install a package with conda first. If that’s not available, then use pip.

In some cases, you might want to do the opposite, by giving priority to pip. In that case, create an environment with just python+pip and nothing else, then use pip from that moment forward.

A compromise is to use conda to install the basics (numpy, scipy, matplotlib, jupyter…) as well as big dependencies with heavy compiled libraries like PyTorch, TensorFlow, the Cuda toolkit, OpenCV, or tricky and obscure packages like GDAL and rasterio. Install instead all python-only packages with pip. This is very unlikely to give you problems.

Note that if you are installing with pip a package that depends on say numpy, and you already installed numpy with conda, pip will recognize it and it will NOT install it again (unless there is a conflict of versions). This is why most of the time things just work.

Finally, notice that when you export an environment as explained above, the environment file does contain all the pip-installed packages as well as the conda-installed ones, so using pip in a conda environment does NOT prevent its sharing.
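For instance, a hypothetical export of an environment where some pure-Python packages were installed with pip would contain a pip: subsection, something like (the package name is made up for illustration):

```yaml
name: ds
channels:
  - conda-forge
dependencies:
  - python=3.8
  - numpy
  - pip
  - pip:
      - some-pure-python-package==1.2.3
```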

Conda is slow

One of the most common complaints about conda is that the package solver is slow. This is unfortunately true, although it is getting better with time. A very cool alternative recently came out: it is called mamba.

Mamba is a drop-in replacement for conda installation procedures, so you can use mamba install instead of conda install and mamba create instead of conda create. You will get the same results in a much, much faster way. Pretty cool!

You need to install mamba in your base environment:

> conda deactivate
# Make sure your prompt says (base)
> conda install -c conda-forge mamba

Free up some disk space

After a while, you will notice that your miniconda directory starts to be very big. The main reason is that conda keeps archives of all the packages that you ever downloaded. You can easily and safely clean these up with:

> conda clean --all

and free up some space.

Conda and docker

You can use Miniconda to install an environment when building a Docker container (i.e., within the Dockerfile). There are a few good reasons why this might be a good idea in some situations:

  1. You can use an environment file and replicate exactly your development environment. This is a quick and easy way to ensure that what you develop will find the same dependencies you used during development
  2. You can install things from conda-forge, for example. This allows you to install things even when the distribution you are using in the container (the base image) does not have the package you need
  3. There are readily-available images on Docker Hub that already contain conda pre-installed: https://hub.docker.com/r/continuumio/miniconda3, so using conda is very convenient and doesn’t add much complexity
  4. You can install and run packages without having to be root , which sometimes is better from a security perspective, especially if these packages start web services, for example

Note that in your Dockerfile you should run conda clean --all -y at the end of your installation command to remove useless archives that would waste image space.
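Putting these points together, a minimal Dockerfile might look like the following sketch (the environment name ds and the environment.yml file are assumptions carried over from the earlier sections):

```dockerfile
FROM continuumio/miniconda3

# Recreate the environment exported earlier with `conda env export`,
# then remove package archives to keep the image small
COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml && \
    conda clean --all -y

# Run subsequent commands inside the environment by default
SHELL ["conda", "run", "-n", "ds", "/bin/bash", "-c"]
```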

In-depth: RPATH and conda

You absolutely don’t need to know this, but if you are curious as to how conda achieves independent environments even in the case of compiled libraries, here it is.

When you compile a C/C++ or FORTRAN package, or some other compiled library/executable, you can link the libraries you depend on statically or dynamically. Most modern applications use dynamic linking. This means that at runtime the system needs to look into your executable or your library, understand which other libraries it depends on, and find them on your system.

The system has some predefined places where it looks for these libraries, and it maintains a cache of those paths. For example, on Linux, these are places like /lib or /usr/lib and so on. You can see a list of all the libraries known to a Linux system with:

> ldconfig --verbose

macOS has a similar system. Now, if we take an executable or a library, we can see which other libraries it depends on. For example, this prints the libraries that my system vim depends on:

> ldd /usr/bin/vim
linux-vdso.so.1
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
libpython3.8.so.1.0 => /lib/x86_64-linux-gnu/libpython3.8.so.1.0
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
/lib64/ld-linux-x86-64.so.2
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1

As expected, these are all system libraries. If I do the same on a conda vim:

> conda install -c conda-forge -y vim
> ldd ${CONDA_PREFIX}/bin/vim
linux-vdso.so.1
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
libtinfo.so.6 => /root/miniconda3/envs/py39/bin/../lib/libtinfo.so.6
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
libpython3.9.so.1.0 => /root/miniconda3/envs/py39/bin/../lib/libpython3.9.so.1.0
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
/lib64/ld-linux-x86-64.so.2
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1

we can see that now the linker found libtinfo and libpython in the conda environment instead, even though they are available in the system /lib. Why? Because vim in conda has been compiled using RPATH, i.e., the path is hardcoded in the executable and it points to the conda environment.

This means that the vim I installed in this particular environment only links to libraries in the same environment, so I can have a different vim in a different environment or in my system somewhere with incompatible versions and things work just fine, without any conflict.

You can also note however that we still depend on some basic system libraries like linux-vdso (an interface to some kernel functionalities), libc and so on. These cannot be shipped with conda. This is also why conda environments are not as independent from the system as Docker containers. However, these system libraries are guaranteed to be backward compatible, and they provide the same interface on any Linux distribution, so they are unlikely to cause any problem.

The same is true for macOS, even though the system libraries are different.

You can even see the RPATH setting that conda uses right there in your environment. Every time you do a conda activate and activate an environment, conda sets some environment variables:

  1. CONDA_PREFIX: the path to the root of the active conda environment
  2. LDFLAGS: this is a standard env variable that is read by the linker at compile time. If you look into it you will see the RPATH setting. For example, on my laptop I see:
(ds) giacomov@gr723gad9:~$ echo $LDFLAGS
-Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--disable-new-dtags -Wl,--gc-sections -Wl,-rpath,/home/user/miniconda3/envs/ds/lib -Wl,-rpath-link,/home/user/miniconda3/envs/ds/lib -L/home/user/miniconda3/envs/ds/lib

The -rpath flag instructs the linker to hardcode that path as the RPATH in any executable. This means that if you compile an executable or a library while an environment is active, it will have the RPATH setting in it. This is the basis of how conda-build works, but that’s for another day.

  3. CXXFLAGS, CFLAGS: specific flags for C compilers, including the path to the headers in the conda environment
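You can verify the RPATH mechanism yourself: compile a trivial program while an environment is active and inspect the result. This sketch assumes you install the conda compiler packages, which set CC and LDFLAGS when the environment is (re)activated:

```shell
> conda activate ds
> conda install -c conda-forge -y c-compiler binutils
# Re-activate so the compiler activation scripts set CC and LDFLAGS
> conda deactivate && conda activate ds
> echo 'int main(void){return 0;}' > hello.c
> ${CC} ${LDFLAGS} hello.c -o hello
# Look for the RPATH/RUNPATH entry pointing into the environment
> readelf -d hello | grep -i path
```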

Conclusions

While there is certainly more to say, this post is long enough as it is. I hope you got useful information out of this, whether you are a beginner or not. Feel free to leave a comment and to ask questions.


Principal Data Scientist at Cape Analytics. Formerly a Research Scientist @ Stanford working in high-energy Astrophysics. www.linkedin.com/in/giacomovianello