
Background
But it works on my machine?
This is a classic meme in the tech community, especially for Data Scientists who want to ship their amazing machine-learning model, only to learn that the production machine has a different operating system. Far from ideal.
However…
There is a solution thanks to these wonderful things called containers and tools to control them such as Docker.
In this post, we will dive into what containers are and how you can build and run them using Docker. Containers and Docker have become an industry standard and common practice for data products, so learning these tools is an invaluable addition to any Data Scientist's arsenal.
What is Docker?
Docker is a service that helps you build, run, and manage code and applications inside containers.
Now you may be wondering, what is a container?
On the surface, a container is very similar to a virtual machine (VM). It is a small, isolated environment where everything is self-'contained' and can be run on any machine. The primary selling point of containers and VMs is their portability, allowing your application or model to run seamlessly on any on-premise server, local machine, or cloud platform such as AWS.
The main difference between containers and VMs is how they use the host computer's resources. Containers are far more lightweight, as they share the host's operating system kernel rather than partitioning the hardware resources of the host machine as VMs do. I will not delve into the full technical details here, but if you want to understand a bit more, I have linked a great article explaining their differences here.
Docker is then simply a tool we use to create, manage and run these containers with ease. It is one of the main reasons why containers have become very popular, as it enables developers to easily deploy applications and models that run anywhere.

Docker Technical Features
There are three main elements we need to run a container using Docker:
- Dockerfile: A text file that contains the instructions for how to build a Docker image.
- Docker Image: A blueprint or template to create a Docker container.
- Docker Container: An isolated environment that provides everything an application or machine learning model needs to run, including dependencies and OS versions.

There are also a few other key points to note:
- Docker Daemon: A background process ([daemon](https://en.wikipedia.org/wiki/Daemon_%28computing%29)) that handles incoming requests to Docker.
- Docker Client: A command-line interface that enables the user to talk to Docker through its daemon.
- Docker Hub: Similar to GitHub, a place where developers can share their Docker images.
Installing Docker
Homebrew
The first thing you should install is Homebrew (link here). It is dubbed the 'missing package manager for macOS' and is very useful for anyone coding on a Mac.
To install Homebrew, simply run the command given on their website:
```shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```
Verify Homebrew is installed by running `brew help`.
Docker
Now with Homebrew installed, you can install Docker by running `brew install docker`. Verify Docker is installed by running `which docker`; the output should not raise any errors and should look like this:

```
/opt/homebrew/bin/docker
```
Colima
The final part is to install Colima. Simply run `brew install colima` and verify it is installed with `which colima`. Again, the output should look like this:

```
/opt/homebrew/bin/colima
```
Now you might be wondering, what on earth is Colima?
Colima is a software package that enables container runtimes on macOS. In layman's terms, Colima creates the environment for containers to work on our system. To achieve this, it runs a Linux virtual machine with a daemon that Docker can communicate with using the client-server model.
Alternatively, you can install Docker Desktop instead of Colima. However, I prefer Colima for a few reasons: it's free, it's more lightweight, and I like working in the terminal! See this blog post here for more arguments in favour of Colima.
Deploying With Docker Example
Workflow
Below is an example of how Data Scientists and Machine Learning Engineers can deploy their model using Docker:

The first step is obviously to build their amazing model. Then you need to capture everything required to run the model, such as the Python version and package dependencies, in a requirements file. The final step is to use that requirements file inside the Dockerfile.
If this seems somewhat abstract at the moment, don't worry, we will go over this process step by step!
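To make the "capture the dependencies" step concrete, here is a rough stdlib-only sketch of what it does conceptually: pinning the installed packages and their versions into a requirements-style file. In practice you would simply run `pip freeze > requirements.txt` instead.

```python
from importlib.metadata import distributions

# Pin every installed package as "name==version", like `pip freeze` does
lines = sorted(
    f"{dist.metadata['Name']}=={dist.version}" for dist in distributions()
)

# Write the pins into a requirements-style file
with open("requirements.txt", "w") as f:
    f.write("\n".join(lines) + "\n")

print(lines[:3])
```

This captures the package side of the environment; the Python version itself is captured later by the Dockerfile's base image.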
Basic Model
Let’s start by building a basic model. The provided code snippet displays a simple implementation of the Random Forest classification model on the famous Iris dataset:
Dataset from Kaggle with a CC0 licence.
This file is called `basic_rf_model.py` for reference.
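Since the original snippet is embedded externally, here is a minimal sketch of what `basic_rf_model.py` might look like. It loads the Iris dataset bundled with scikit-learn rather than the Kaggle CSV so the script is self-contained, and the train/test split and random seed are assumptions, not the author's exact values:

```python
# Minimal sketch of basic_rf_model.py: a Random Forest classifier on Iris
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the features and labels
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data for evaluation (split/seed assumed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the Random Forest and report accuracy on the held-out set
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
preds = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, preds)}")
```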
Create Requirements File
Now that we have our model ready, we need to create a `requirements.txt` file to house all the dependencies that underpin the running of our model. In this simple example, we luckily only rely on the `scikit-learn` package. Therefore, our `requirements.txt` will simply look like this:

```
scikit-learn==1.2.2
```
You can check the version installed on your computer with the `pip show scikit-learn` command.
Create Dockerfile
Now we can finally create our Dockerfile!
So, in the same directory as `requirements.txt` and `basic_rf_model.py`, create a file named `Dockerfile`. Inside the `Dockerfile` we will have the following:
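Here is a minimal sketch of that Dockerfile, reconstructed from the line-by-line walkthrough below (the maintainer email is redacted in the source and kept as-is):

```dockerfile
# Base image providing Python 3.9
FROM python:3.9

# Who maintains this image (email redacted in the source)
MAINTAINER [email protected]

# Set the working directory inside the image
WORKDIR /src

# Copy the current directory's files into the image
COPY . .

# Install the Python dependencies
RUN pip install -r requirements.txt

# Run the model when the container starts
CMD ["python", "basic_rf_model.py"]
```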
Let's go over it line by line to see what it all means:
- `FROM python:3.9`: This is the base image for our image
- `MAINTAINER [email protected]`: This indicates who maintains this image
- `WORKDIR /src`: Sets the working directory of the image to `/src`
- `COPY . .`: Copies the current directory's files into the Docker directory
- `RUN pip install -r requirements.txt`: Installs the requirements from the `requirements.txt` file into the Docker environment
- `CMD ["python", "basic_rf_model.py"]`: Tells the container to execute the command `python basic_rf_model.py` and run the model
Initiate Colima & Docker
The next step is to set up the Docker environment. First, we need to boot up Colima:

```shell
colima start
```
After Colima has started up, check that the Docker commands are working by running:
```shell
docker ps
```

It should return something like this:

```
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
```
This is good and means both Colima and Docker are working as expected!
Note: the `docker ps` command lists all the currently running containers.
Build Image
Now it is time to build our first Docker image from the `Dockerfile` that we created above:

```shell
docker build . -t docker_medium_example
```
The `-t` flag sets the name (tag) of the image, and the `.` tells Docker to build from the current directory.
If we now run `docker images`, we should see something like this:

Congrats, the image has been built!
Run Container
After the image has been created, we can run it as a container using the `IMAGE ID` listed above:

```shell
docker run bb59f770eb07
```
Output:

```
Accuracy: 0.9736842105263158
```

This is because all the container has done is run the `basic_rf_model.py` script!
Extra Information
This tutorial just scratches the surface of what Docker can do and be used for. There are many more features and commands to learn to understand Docker fully. A great, detailed tutorial is given on the Docker website, which you can find here.
One cool feature is that you can run the container in interactive mode and go into its shell. For example, if we run:
```shell
docker run -it bb59f770eb07 /bin/bash
```
You will enter the Docker container and it should look something like this:

We also used the `ls` command to show all the files in the Docker working directory.
Summary & Further Thoughts
Docker and containers are fantastic tools for ensuring Data Scientists' models can run anywhere and anytime with no issues. They do this by creating a small, isolated compute environment that contains everything the model needs to run effectively: this is the container. Containers are easy to use and lightweight, making them common industry practice nowadays. In this article, we went over a basic example of how you can package your model into a container using Docker. The process was simple and seamless, so it is something Data Scientists can learn and pick up quickly.
Full code used in this article can be found at my GitHub here:
Medium-Articles/Software Engineering /docker-example at main · egorhowell/Medium-Articles
Another Thing!
I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.
Connect With Me!
References & Further Reading
- Docker website: https://www.docker.com/
- Docker tutorial: https://docker-curriculum.com/
- General tips for Docker: https://github.com/veggiemonk/awesome-docker