Data Science without the Computer Science

An easier way to run reusable and independent Jupyter environments without learning Docker

Dan Lester
Towards Data Science


We all have some war stories. A virtualenv that couldn’t host a particular conda package on Windows. An attempted Python upgrade that wiped out the operating system’s native binary. Or a carefully curated virtualenv that finally does everything we need, but we daren’t make the next configuration change in case it upsets all of our hard work so far…

Environment management is important to a data scientist, but really you want it all to just work. I’ve heard this comment from multiple data scientists: we should probably use Docker or something to standardise and isolate our environments — but we don’t want to learn yet more computer science.

ContainDS is new software for Windows or Mac providing a simple user interface for running Jupyter Lab or Notebooks in independent virtual environments provided by Docker — all without having to learn how to control Docker from the command line.

Jupyter Logo © 2019 Project Jupyter

You start from a selection of ready-built environments (e.g. Jupyter with TensorFlow installed, or SciPy). You only need to specify a workspace (that is, a folder on your computer where you want to store your notebook files and any data); then, with one click, ContainDS will start up a virtual Linux ‘container’ which you can access directly through Jupyter in your web browser. It even takes care of Jupyter’s password token so you don’t have to copy and paste it…

You can install any other conda or pip packages you need, and then clone your environment to be reused for other projects or shared with your colleagues.

Get Me Started

Although ContainDS tries to shield you from the details of Docker, you do of course need to have Docker running in the background on your computer.

So install ‘Docker Desktop’ (Community Edition) on your computer. More details, including system requirements, are on the Docker website, and you can download the installer there if you sign up for an account on Docker Hub. Alternatively, you can go straight to the installers using these direct links: Docker Desktop for Windows or Docker Desktop for Mac.

Next, install ContainDS for Windows or Mac from our Downloads page.

Choosing a base environment

Once ContainDS is up and running, and has detected Docker running correctly on your computer, you will see the ‘New Container’ screen:

ContainDS New Container Screen

Select one of the recommended Jupyter images, perhaps ‘datascience-notebook’, to get started with a Python installation that includes NumPy, Pandas, and more. Click SELECT and you’ll see the configuration screen:

New Container config screen

All you really need to enter is a Workspace Path. This is just the location of a folder where you want to store notebook files and any data. It can be an existing folder if you already know which files you want to work with, or you can enter a new folder path which ContainDS will create for you.

Optionally, you can change the Container Name for future reference, and you can choose to launch into Jupyter Notebook — instead of the latest Jupyter Lab web front end — by unchecking the ‘Jupyter Lab’ checkbox.

Click CREATE to start downloading the image and creating the container. It may take a while to download the first time you use a particular image, but it will be cached for future use.

Launch straight into Jupyter

When the container is ready, you’ll see the familiar Jupyter console logs.

Click the WEB button and your default web browser will open already connected into Jupyter:

Jupyter Lab in web browser

You’ll notice that the web port has been assigned automatically by ContainDS, and you didn’t have to copy and paste any tokens or passwords since ContainDS has taken care of that for you.
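For the curious, those two chores can be sketched in a few lines of Python. This is only an illustration of the general idea, not ContainDS's actual code: the helper names `find_free_port` and `jupyter_url` are made up for this example, and the random token is a stand-in for whatever token the Jupyter server was really started with. The `?token=` query parameter is the same form Jupyter prints in its own startup logs.

```python
import secrets
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the operating system for a TCP port that is currently unused."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "pick any free port for me"
        return sock.getsockname()[1]


def jupyter_url(port: int, token: str, lab: bool = True) -> str:
    """Build a browser URL that already carries the login token."""
    path = "lab" if lab else "tree"  # Jupyter Lab vs classic Notebook view
    return f"http://localhost:{port}/{path}?token={token}"


if __name__ == "__main__":
    port = find_free_port()
    token = secrets.token_hex(24)  # hypothetical stand-in for the real token
    print(jupyter_url(port, token))
```

Opening a URL of this shape is what saves you from pasting the token into Jupyter's login page by hand.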

Perfecting your environment

That’s about it. You now have your Jupyter environment and you can interact with it as you always used to.

To install other packages, you can open a Terminal window within Jupyter Lab (click the + icon and then select Terminal from the Launcher). Essentially, you’re just inside a Linux operating system at this point, which can make installing some packages much easier. The Jupyter images are all based on a conda virtual environment so you can use conda install or pip install.

Alternatively, install directly from inside your notebook using the %pip magic command (e.g. %pip install <package>) or a shell escape (e.g. !pip install <package>).
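If you would rather do this from plain Python in a notebook cell, a minimal sketch might look like the following. The helper names (`is_installed`, `pip_install_command`, `ensure_package`) are hypothetical and exist only for this example. The one design point worth noting is that calling pip through `sys.executable` makes sure the package lands in the same environment the notebook kernel is running in.

```python
import importlib.util
import subprocess
import sys
from typing import List, Optional


def is_installed(module_name: str) -> bool:
    """Return True if this interpreter can already import the module."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package: str) -> List[str]:
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def ensure_package(package: str, module_name: Optional[str] = None) -> None:
    """Install the package only if its module is not importable yet.

    module_name is for packages whose import name differs from the
    name used to install them.
    """
    if not is_installed(module_name or package):
        subprocess.run(pip_install_command(package), check=True)
```

For example, `ensure_package("seaborn")` would do nothing if seaborn is already present, and run pip otherwise.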

Don’t completely forget about Docker

So using ContainDS meant you didn’t have to learn any Docker commands! But of course it’s still important to be aware that your environment is running through Docker. You’ll need to keep Docker running in the background. If you stop the container using the STOP button in ContainDS and restart it later, it may be allocated a new port, so you may need to click the WEB button again to launch a new Jupyter session on the new port.

If you completely remove a container then it won’t affect your workspace folders, but of course you will lose any modifications you made to the conda environment. To preserve such environment changes, and to be able to reuse the environment in future, click on the Clone tab in ContainDS to take a snapshot before removing the container.
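In plain Docker terms, one way to snapshot a modified container is `docker commit`, which saves the container's current filesystem as a new image. The sketch below only illustrates that command and is not taken from ContainDS's source. The function names and the container and image names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def snapshot_command(container: str, image: str, tag: str = "latest") -> List[str]:
    """Build a `docker commit` call that saves a container as a new image."""
    return ["docker", "commit", container, f"{image}:{tag}"]


def take_snapshot(container: str, image: str, tag: str = "latest") -> None:
    """Run the snapshot, assuming the Docker CLI is on the PATH."""
    if shutil.which("docker") is None:
        raise RuntimeError("Docker CLI not found")
    subprocess.run(snapshot_command(container, image, tag), check=True)


# Hypothetical names, chosen only for this example.
print(" ".join(snapshot_command("my-notebook", "my-notebook-snapshot", "v1")))
```

Files in the bound workspace folder are not part of a committed image, since they live on your computer rather than inside the container.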

What has ContainDS really done for me?

At this stage, all you’ve really done is to start a Docker container based on a Jupyter image that would have been available through the Docker command line. ContainDS made this easier for you, especially in ‘binding’ the workspace folder, taking care of the Jupyter token, and locating the web port.
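To make that concrete, here is a rough sketch of the kind of `docker run` call involved, assembled in Python without actually executing it. The image name, the `/home/jovyan/work` mount point, the internal port 8888, and the `JUPYTER_ENABLE_LAB` variable follow the conventions of the Jupyter Docker Stacks images as I understand them, and newer image versions may differ. The function name and the example values are hypothetical, and the exact options ContainDS passes may not match these.

```python
from pathlib import Path
from typing import List


def docker_run_command(workspace: str, host_port: int,
                       image: str = "jupyter/datascience-notebook",
                       name: str = "my-notebook",
                       lab: bool = True) -> List[str]:
    """Assemble (but do not execute) a docker run command for Jupyter."""
    workspace_path = Path(workspace).expanduser().resolve()
    cmd = [
        "docker", "run", "--detach",
        "--name", name,
        # Map the chosen host port to Jupyter's port inside the container.
        "--publish", f"{host_port}:8888",
        # 'Bind' the workspace folder so notebooks live on your computer.
        "--volume", f"{workspace_path}:/home/jovyan/work",
    ]
    if lab:
        # Assumption: this variable switches the image to Jupyter Lab.
        cmd += ["--env", "JUPYTER_ENABLE_LAB=yes"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Hypothetical workspace path and port, for illustration only.
    print(" ".join(docker_run_command("~/projects/demo", 8890)))
```

Getting the port mapping and the volume path right by hand is the fiddly part, and that is the part the GUI fills in for you.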

ContainDS is actually built on top of Docker’s own bundled GUI application called Kitematic — updated and tailored for data scientists.

Going forward, there are many tasks and configurations that a data scientist might need Docker to do, and the intention is for ContainDS to continue to wrap up more of these for you. We need your feedback and thoughts to do this! So please get in touch to let us know more about what you want to achieve with your new containerized Jupyter environments…

Photo by frank mckenna on Unsplash

Having a GUI tailored to your needs makes things quicker and easier, and saves having to learn Docker in more detail. But if you do want to see some of the Docker commands that you would have needed to achieve this manually, see Dataquest’s tutorial: Running a Dockerized Jupyter Server for Data Science.
