Jupyter Data Science Stack + Docker in under 15 minutes

Tanbal تنبل
Towards Data Science
4 min read · Oct 1, 2017


Motivation: Say you want to play around with some cool data science libraries in Python or R, but what you don’t want to do is spend hours installing Python or R, working out which libraries you need, installing each and every one, and then messing around with the tedium of getting things to work just right on your version of Linux/Windows/OSX/OS9. This is where Docker comes to the rescue! With Docker we can get a Jupyter ‘Data Science’ notebook stack up and running in no time at all. Let’s get started!

Docker allows us to run a ‘ready to go’ Jupyter data science stack in what’s known as a container:

A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Available for both Linux and Windows based apps, containerized software will always run the same, regardless of the environment. Containers isolate software from its surroundings, for example differences between development and staging environments and help reduce conflicts between teams running different software on the same infrastructure.

So what’s the difference between a container and a virtual machine?

Containers and virtual machines have similar resource isolation and allocation benefits, but they function differently: containers virtualize the operating system instead of the hardware, which makes them more portable and efficient.

To get started with Docker you’ll need to install Docker Community Edition. Download the appropriate installer for your environment here. Once you’ve installed Docker, restart your machine and we can move on to getting our Jupyter container set up. When running a container you’ll need to tell Docker what the base image for that container is.

So what’s an image and how does it relate to a container?

A Docker image is built up from a series of layers. Each layer represents an instruction in the image’s Dockerfile. Each layer except the very last one is read-only.
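To make the layering concrete, here is a minimal, hypothetical Dockerfile; each instruction in it produces one layer of the resulting image (the base image and file name are illustrative, not part of the Jupyter stack we’ll use):

```dockerfile
# Base image layer (read-only).
FROM ubuntu:16.04

# Layer recording the result of this command (read-only).
RUN apt-get update

# Layer adding a file into the image (hypothetical file).
COPY notebook.ipynb /srv/

# Metadata layer recording the default command; the container's
# thin writable layer sits on top of all of these at runtime.
CMD ["bash"]
```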

A container is simply a running instance of an image. The real time-saver with using Docker is that someone else has already built an image that has everything we need to start using a fully kitted-out Jupyter data science stack! All we need to do is tell Docker to start up a container based on that pre-defined image. To do this we’re going to write a simple recipe in the form of a Docker compose file. Save the file below as docker-compose.yml to your working directory:
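The embedded file itself is short. A minimal sketch of such a compose file, assuming the jupyter/datascience-notebook image from the Jupyter Docker Stacks (the service name is illustrative; the container name and port match the output we’ll see later):

```yaml
version: "2"
services:
  datascience-notebook:
    image: jupyter/datascience-notebook
    container_name: datascience-notebook-container
    ports:
      # Expose Jupyter's default port 8888 on the host.
      - "8888:8888"
    volumes:
      # Map a local directory to the notebook user's work directory
      # inside the container so your notebooks persist on your machine.
      - /Absolute/Path/To/Where/Your/Notebook/Files/Will/Be/Saved:/home/jovyan/work
```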

After you’ve saved the file, open it with your favourite editor and change the section that says:

/Absolute/Path/To/Where/Your/Notebook/Files/Will/Be/Saved

to the path on your local machine where you want your work to be saved. Make sure that the path actually exists, that is, any directories have already been created. Failing to do so will cause errors when you attempt to start your container and worse yet, you won’t be able to save any work you do in the container!
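On macOS or Linux you can create the directory up front from a terminal; `/tmp/notebooks` below is just a stand-in for whichever path you chose:

```shell
# Create the notebook directory (and any missing parent directories).
mkdir -p /tmp/notebooks

# Confirm it exists before starting the container.
test -d /tmp/notebooks && echo "directory ready"
```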

Great, so we’ve got our Docker compose file ready to go, now we can use the docker-compose command to start up our container. Open a terminal or command prompt and change into your working directory and run the following command:

docker-compose up

You should see something like the following output in your terminal/command prompt (I’ve omitted some of the output for brevity):

$ docker-compose up
Creating network "jupyternotebook_default" with the default driver
Creating datascience-notebook-container ...
Creating datascience-notebook-container ... done
Attaching to datascience-notebook-container
datascience-notebook-container | Execute the command: jupyter notebook
.
.
.
datascience-notebook-container | [I 11:37:37.993 NotebookApp] The Jupyter Notebook is running at: http://[all ip addresses on your system]:8888/?token=123456789123456789123456789123456789
datascience-notebook-container | [I 11:37:37.993 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
datascience-notebook-container | [C 11:37:37.994 NotebookApp]
datascience-notebook-container |
datascience-notebook-container |     Copy/paste this URL into your browser when you connect for the first time,
datascience-notebook-container |     to login with a token:
datascience-notebook-container |         http://localhost:8888/?token=123456789123456789123456789123456789

The last line is a URL that we need to copy and paste into our browser to access our new Jupyter stack:

http://localhost:8888/?token=123456789123456789123456789123456789

Once you’ve done that you should be greeted by your very own containerised Jupyter service! You can learn more about what the Jupyter data science stack gives you by visiting this link.

To create your first notebook, drill into the work directory, then click on the ‘New’ button on the right hand side and choose ‘Python 3’ to create a new Python 3 based Notebook. Once you’ve done so, check the path where you chose to save your work locally and you should see your new Notebook’s .ipynb file saved!

To shut down the container once you’re done working, simply hit Ctrl-C in the terminal/command prompt. Your work will all be saved on your actual machine in the path we set in our Docker compose file. And there you have it — a quick and easy way to start using Jupyter notebooks with the magic of Docker.

I hope you found this article helpful, please leave any feedback or comments below. Happy hacking!
