
We have all been there:
"It worked on my machine!"
Who hasn't been on one end or the other of that statement?
As developers, data scientists, and software engineers, we work on complex code bases that depend on many components running in the background. When we want to share our code with colleagues or put it up on GitHub as an open-source project, we need to make sure it will work across different environments.
Sometimes – more often than we would like to admit – we try to run a friend's code, or code we found on the internet, only for the computer to yell "ImportError" at us. That error means the code depends on something it can't find on your machine.
The solution to this is Docker. Docker is a container management system that aims to make it easy to share projects and run them across different environments. In short, Docker lets you write code once and run it smoothly on machines with different operating systems by encapsulating the code and all of its dependencies in a container.
This container makes the code self-contained and independent of the operating system.

Why do we use Docker?
When we write code for data science or machine learning applications, we often have several concerns that make Docker the best option for our applications. These concerns are:
- Ensure that the application behaves the same way in every environment.
- Save those who will use or run your application the trouble of handling dependencies and installation problems.
- Avoid working with virtual machines.
- Focus on building the application instead of worrying about managing dependencies.
Docker Basics
So, how does Docker work?
To understand that, we first need to cover some Docker terminology, so let's get into it.
Images
Images are archives with all the data needed to run an app. If you're familiar with object-oriented programming, you can think of an image the same way you think of a class. Classes are blueprints that contain the data needed to create instances, while images are blueprints that contain the data needed to create containers.
Images are immutable, which means that whatever changes you make while using a particular image will not be saved unless you save them as a new image.
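For example, you can pull a ready-made image (a blueprint) from Docker Hub and list the images available on your machine; the python:3.8-slim tag below is just an illustration:
# Download an image (a blueprint) from Docker Hub
docker pull python:3.8-slim
# List the images available on your machine
docker images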
Containers
A container is an enclosed environment where your app runs. Containers only have access to the resources they are allowed to use (storage, CPU, memory), and they know nothing else about the machine they are running on. A container only has access to a Linux distribution with the information needed to run the application.
Containers leave no data behind by default. Any changes made inside a container are lost as soon as it is removed, unless you save them as a new image.
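You can see this ephemerality for yourself with a throwaway container (any small image works; python:3.8-slim is used here only as an example):
# Start an interactive container that is removed as soon as it exits
docker run -it --rm python:3.8-slim bash
# Any files created inside are gone once you exit, because nothing was
# saved as a new image and --rm removed the container
docker ps -a   # the exited container no longer shows up here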
Dockerfiles
Dockerfiles are the files that hold the instructions needed to build an image of your application. Every image is built from a Dockerfile. Dockerfiles can be divided into three main sections:
- The base image: The core of the application. For example, if the application needs Python 3 to run correctly, then Python 3 will be the base image, and additional libraries will be added in the instruction set.
- Instruction set: The instruction set includes the RUN commands; each one installs an additional library or binary, or runs a setup step.
- Entry command: This is the command that runs once all needed libraries are installed. For example, the entry command might open a Jupyter Notebook or start a command line.
You can think of an image like an onion: the base image is the heart of the onion, and each instruction adds a new layer to the image. That's why you need to pay attention to how you're layering your instructions.
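As a minimal sketch of those three sections (the packages here are only placeholders, not a recommendation):
# Base image: the heart of the onion
FROM python:3.8-slim
# Instruction set: each instruction adds a new layer on top of the base
RUN pip install --no-cache-dir numpy pandas
# Entry command: what runs when the container starts
CMD ["python"]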

Volumes
Since images are fixed, and containers have a short memory – similar to RAM – what happens if the application needs data to run?
Here's where volumes solve the problem. When the application needs data, we can go one of two ways: either access the data locally or from a volume. Accessing the data locally – through a local mount point – requires you to select a specific directory on the local machine where the data is stored.
Volumes are used for shared data when you don’t know anything about the host machine – where the application will run.
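On the command line, the two approaches look roughly like this (the paths, volume name, and image name are hypothetical):
# Local mount point: map a directory on the host into the container
docker run -v /path/on/host/data:/app/data my-image
# Volume: let Docker manage the storage, with no assumptions about the host
docker volume create app-data
docker run -v app-data:/app/data my-image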
Registries
Registries are the repository equivalent for Docker images. They allow you to pull and push container images. You can distribute your images directly from your Docker host, or push them to a hosted registry such as Docker Hub; orchestration tools like Kubernetes and Docker Swarm can then pull those images, giving you useful features such as automated deployment and scaling.
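As an example, pushing an image to Docker Hub looks like this (the username and image name are placeholders):
# Log in to Docker Hub
docker login
# Tag the local image with your Docker Hub username
docker tag my-image yourusername/my-image:latest
# Push it to the registry so others can pull it
docker push yourusername/my-image:latest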
How to start using Docker?
Step №1: Installing Docker
To install Docker on your device, head to the official Docker website and install the correct version for your machine. To make sure that your installation went correctly, try running the following command:
docker run hello-world
If you get a message that starts with "Hello from Docker!", then everything is up and running, and you are ready to get to work!

Step №2: Get to know the basic commands
Docker is quite a broad topic. However, you can get very far by knowing six basic commands: run, ps, rename, stop, start, and logs.
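Here is roughly what each one does (the image and container names below are placeholders):
docker run -d --name my-app my-image   # create and start a container from an image
docker ps                              # list running containers (add -a to see stopped ones)
docker rename my-app my-project        # give a container a friendlier name
docker stop my-project                 # stop a running container
docker start my-project                # start it again
docker logs my-project                 # show the container's output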

Step №3: Prepare the requirements file
You can add the binaries and libraries you need to the Dockerfile directly, but it's better to keep them in a separate file. This file is usually called requirements.txt. Here's an example of a requirements.txt file.
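A typical data science requirements.txt might look something like this (the libraries and versions are only an illustration):
numpy==1.18.1
pandas==1.0.1
scikit-learn==0.22.1
matplotlib==3.1.3
jupyter==1.0.0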

Step №4: Prepare the Dockerfile
Write a simple, efficient Dockerfile. It's time to assemble the onion from the inside out: we need to set the base image, the instruction set, and the entry command.
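One way such a Dockerfile might look for a notebook-based project (the Jupyter entry command and port are assumptions for illustration, not taken from a specific repo):
# Base image
FROM python:3.8-slim
WORKDIR /app
# Instruction set: install the dependencies listed in requirements.txt
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Entry command: launch Jupyter Notebook
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--no-browser", "--allow-root"]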

Step №5: Create a Docker image
Often our code is hosted on GitHub, which makes it very easy to create an image of the code. If you have the repository locally, you can use the command line and run the repo2docker command to create an image from it.
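With the repo2docker package installed, building an image from a local repository looks something like this (the path is a placeholder):
# Install repo2docker
pip install jupyter-repo2docker
# Build an image from the repository and launch it
jupyter-repo2docker /path/to/your/repo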
However, if the code you're trying to create an image for is on GitHub, you can use myBinder to generate and host your repo's image. You will be able to access the image using a link provided by myBinder. If you are using myBinder, the requirements file is called environment.yml, and it lists the same dependency information in conda's format.
To use myBinder, you need the link to your repo's master branch. For this article, I am using a repo I created for an event.

If you're using Anaconda Navigator, you can use the conda command line to generate the environment file for a specific environment with this command (just rename the output to environment.yml before committing it):
conda env export --name ENVNAME > envname.yml
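The exported file lists the environment's name, channels, and dependencies; a trimmed-down example might look like this (the packages are illustrative):
name: myenv
channels:
  - defaults
dependencies:
  - python=3.8
  - numpy
  - pandas
  - jupyter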
Step №6: Run a container
To run a container, you can use the run command we mentioned previously if the image is already downloaded to your machine. However, if you used myBinder and the image is hosted in the cloud, you can access it using the link generated with the image. The link can be added to your readme.md file as a badge that starts a container.
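In the readme.md, the Binder badge typically looks something like this (replace the user, repo, and branch with your own):
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/YOUR-USER/YOUR-REPO/master)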

Tips and tricks for best practice
- Always make sure you're using the most efficient base image. For example, in the case of Python 3, choose a slim variant such as slim-buster or slim-stretch. They are well supported and work with most data science and ML libraries.
- Use labels to provide important information such as usage tips and extra details about the application, the libraries it needs, and how they are used.
- Split the RUN commands to make them more readable. Put all the needed libraries in a requirements.txt file to keep things organized.
- Only install the necessary packages. It makes building and running the images more efficient.
- Explicitly ignore files to avoid security risks by adding them to the .dockerignore file (a sample is sketched after this list).
- Avoid adding data to the image. Either pull the data from a database or the cloud, or use bind mounts, but don't hard-code it into the image.
- If you are starting with Docker and want a standard project template, use the CookieCutter data science or CookieCutter docker science project templates.
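As mentioned in the ignore tip above, a minimal .dockerignore might look like this (the entries are just common examples):
# Keep secrets, VCS data, and local clutter out of the image
.env
*.pem
.git
__pycache__/
.ipynb_checkpoints/
data/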
Docker can get quite complicated and challenging to get a handle on, but the best thing to do is to keep practicing and try to make use of the powerful features Docker provides.
Even if you are not deeply familiar with Docker, its basic usage already gives you a great deal of control and power over your applications. Using only the basic commands we covered in this article, you can harness Docker's power and use it to share, deploy, and develop your applications.