
Overview
We’re going to jog through the following topics in order:
- It works on my machine
- Virtual Machines (VMs) and containerisation
- Docker – A Hello World! example
- Next steps – Putting models into production, Kubernetes, Helm, MLOps, and DataOps
It works on my machine
Over my time working in Data Science, I’ve seen many people struggle with (or simply ignore!) the concept of environments.
With tools like [pip](https://pypi.org/project/pip/) and [conda](https://docs.conda.io/en/latest/) making it easy to install packages and libraries on the fly, it can be very tempting to just download everything you’ll ever need into the default environment and be done with it. Things will work fine for a while, but eventually you need to run your project on another machine or share your model with a colleague. You send them the code and even kindly write a requirements.txt for them.

However, like posting an envelope full of glitter, this isn’t harmful, but it’s not pleasant either. Your colleague probably doesn’t want to install buckets of irrelevant packages just to get your code to run, and they may even have conflicting versions of certain packages, which can lead to a much bigger headache. There are ways around this, such as [pipreqs](https://github.com/bndr/pipreqs), but it’s not ideal.
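If you do end up in that situation, pipreqs generates a requirements file from the imports your code actually uses, rather than dumping the whole environment. A minimal sketch (the path is a placeholder for your own project folder):

```
pip install pipreqs
# scan the project's imports and write a requirements.txt
# containing only the packages the code actually uses
pipreqs /path/to/your/project
```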
The thing is, you know this; you’re savvier than that. You were kind enough to build a project environment and save only what’s needed, perhaps using conda (like I have here) or something like [venv](https://docs.python.org/3/library/venv.html), [virtualenv](https://virtualenv.pypa.io/en/latest/), or whatever your favourite environment manager is.

Now that’s better. You have a concise list of what’s needed to run your project and you won’t annoy everyone you work with.
It’s worth pointing out at this stage – this isn’t a trivial problem. Bad practice with environments is often the first of many hurdles to getting data science teams collaborating effectively and delivering models into production.
There’s obviously a lot more to environments, but this leads us on to our next suite of problems. What if my colleague has conflicts? What if they can’t be trusted to install stuff properly? Is the package available in their operating system (OS)? Also, for large complex projects that do have a lot of requirements, installing them all could take time and it adds another layer of frustration into the mix. Sometimes, multiple projects you’re developing have strong interdependencies and keeping track of them all can be a nightmare.
What if we could just bundle up everything, exactly as it is on our machine so that it can be run in the same state everywhere?
Virtual Machines (VMs) and containerisation
One potential solution is to create an image of your whole system – OS, installed packages, code – everything! Then run this image as a virtual machine. This is a good option and, in fact, many people have already done it: there are plenty of virtual machine images packed with most of what you’ll need to get your project going. For the budding data scientist just getting into the cloud, I’d strongly advise checking out the available data science virtual machines on AWS, Azure, and GCP.
Now, VMs are great and, in many cases, they’re the right solution, but they can be bulky when all you want is to run a reasonably lightweight Python project. That’s because VMs capture the whole OS and require something called a hypervisor to keep them working. Here’s a great diagram from docker.com:

[Diagram: containerised applications vs. virtual machines, from docker.com]

Containers, by contrast, bundle just your application and its dependencies and share the host machine’s OS kernel, which makes them far lighter. The benefits include:
- Ensuring that applications work the same way in every environment
- If you’re in a larger team – perhaps your organisation serves models through a website – containers save infrastructure and other teams the trouble of handling dependencies and installation problems
- Once you have the hang of it, containers allow you to focus on building the project instead of worrying about managing dependencies
If you want more detail on what containers are, what Docker is, and how it works, the [official documentation](https://docs.docker.com/) is great.
Docker – A Hello World! example
Now, there are plenty of Docker tutorials out there and because many people come to Docker from the world of web development they tend to be geared towards getting a website up and running. This is great stuff and if you have the time, I suggest you run through some of them – this YouTube tutorial is a good start and comes with some great interactive labs. What I’m going to attempt here is a walkthrough of setting up a simple service for a prediction function.
Requirements
Before we get started, you’ll need to ensure you have these installed if you want to follow along:
- Windows or macOS: Install Docker Desktop
- Linux: Install Docker and then Docker Compose
- Anaconda: Install from [anaconda.com](https://www.anaconda.com/)
- Or just Python and another environment manager, if you’re not keen on Anaconda
A basic Python service
Let’s start by creating a Flask service called app.py. I’m working in the Anaconda PowerShell Prompt on Windows, using conda, but the process is very similar in other terminals and environment managers. Start by creating a fresh environment to work in, switching to it, and installing Flask:
```
conda create -n <environment_name> -y
conda activate <environment_name>
conda install flask -y
```

Now, with Flask ready to go, a simple "Hello World!" might look like this:

```
# import modules
from flask import Flask

# define the app
app = Flask(__name__)

# define the function to be called
@app.route("/")
def hello():
    return "Hello World!"

# define the main function
if __name__ == "__main__":
    app.run(host="0.0.0.0")
```
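If you’d like to sanity-check the app before we containerise it, you can run it directly (assuming you’re in the folder containing app.py and using Flask’s default port, 5000):

```
python app.py
# then, from another terminal:
curl http://localhost:5000
# Hello World!
```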
Now, in this case, our requirements are really simple. To generate our requirements file, use:

```
pip freeze > requirements.txt
```
You should get something like this:
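For a fresh environment with just Flask installed, that means Flask plus its direct dependencies – roughly the following, though exact versions will vary with when and how you install:

```
click==7.1.2
Flask==1.1.2
itsdangerous==1.1.0
Jinja2==2.11.3
MarkupSafe==1.1.1
Werkzeug==1.0.1
```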

So our package has the following structure:

```
app
├─── requirements.txt
└─── src
     └─── app.py
```
The structure is pretty standard – I’m a big believer in standardising project folders for ease of collaboration and reuse. Watch this space for a blog about how I use Cookiecutter to achieve this for more complex projects.
So, to run this app, we just need to install our requirements into an environment with a working Python interpreter and we should be good to go. As stated earlier, though, there are hidden issues we may not have hit yet – conflicts and the like. Let’s move on to containers and how they make things easier.
Dockerfiles
Docker revolves around the Dockerfile. This is a file that tells Docker how to put the container together. Note, this is a text file named Dockerfile, with no extension.
The basic process is:
- Create a `Dockerfile` that contains the instructions for building the image
- Pull the base image you want to use with `docker pull <image_name>`
- Build the image with `docker build . -t <image_name>`
- Now you can simply `docker run <image_name>` from anywhere that has Docker working, and your app will be up and running
Below is a simple `Dockerfile` for our app:
```
# 1 - set base image
FROM python:3.8
# 2 - set the working directory
WORKDIR /opt/app
# 3 - copy files to the working directory
COPY . .
# 4 - install dependencies
RUN pip install -r requirements.txt
# 5 - command that runs when container starts
CMD ["python", "/opt/app/src/app.py"]
```
Under the hood, a Docker image is built layer by layer. Docker Hub is the default registry that Docker will pull images from (but it is possible to host your own privately – great for sharing work internally!). Docker will pull your base image and start layering the other commands on top. In this case, we’re starting from the Python 3.8 image. The syntax here is straightforward – `<base_image>:<image_tag>` – where the tag lets you pick specific versions or flavours of a base image.
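For example, both of these are real tags on the official Python repository (slim being a smaller flavour, handy for keeping images lean):

```
docker pull python:3.8        # the full Python 3.8 image
docker pull python:3.8-slim   # a slimmed-down variant of the same version
```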
From there, we set the directory we want to work in, then copy all of the files in the current directory into it. Note that this Dockerfile uses relative paths, so where you run Docker from is important; it is possible to use absolute paths if that’s your preference. Once everything is copied over, we use pip to install the app’s requirements.
The final line defines the first command the container runs when it boots. If the container is then left with nothing to do, it will close immediately (by default – this can be tweaked).
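A side note on those layers: because Docker caches each instruction, a common (optional) refinement is to copy requirements.txt and install dependencies before copying the rest of the source, so that code edits don’t trigger a full reinstall on every rebuild. A sketch of that variant:

```
# 1 - set base image
FROM python:3.8
# 2 - set the working directory
WORKDIR /opt/app
# 3 - copy and install requirements first so this layer is cached
COPY requirements.txt .
RUN pip install -r requirements.txt
# 4 - copy the (frequently changing) source code last
COPY . .
# 5 - command that runs when container starts
CMD ["python", "/opt/app/src/app.py"]
```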
Building and running
Now that we have our Dockerfile defined, we can build the image using:

```
docker build . -t <image_name>
```
Here we’re simply pointing to the files we want to build from (in this case, the current directory) and using `-t` to give our image a name. That might look something like this:

[Screenshot: output of docker build]
Our final step is to run this app to show how it’ll work from Docker:

```
docker run -d -p <host_port>:<container_port> <image_name>
```
There are a couple of parameters to highlight here:
- `-d` runs the container in detached mode (keeping our terminal free).
- `-p <host_port>:<container_port>` maps the container’s internal port to one on the host – otherwise the app remains isolated and we won’t be able to see in. Networking can get complex quickly with Docker and is beyond the scope of this article. For our example, we’ll use `5000:5000`, as that’s the default for Flask – the full command is shown below.
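Putting that together for our example (hello-flask is just a stand-in for whatever name you gave the image with `-t`):

```
docker run -d -p 5000:5000 hello-flask
```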
Once we’re up and running, we can check the container is working using `docker ps`, which will show us things like its name and image. Having run these commands, you should have something that looks like this:

[Screenshot: output of docker run and docker ps]
As you can see, running a container returns an ID for it; Docker also generates a human-readable name at random – infallible_curie in this case. Either can be used to reference the container in other commands, which is handy. Then finally, just to prove the app is working, we can visit [http://localhost:5000](http://localhost:5000) in our browser.

[Screenshot: "Hello World!" served at localhost:5000]
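If you prefer the terminal, you can hit the endpoint with curl, and stop the container by name when you’re done (using the generated name from above):

```
curl http://localhost:5000
# Hello World!
docker stop infallible_curie
```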
Next steps
So, this might all seem pretty basic, but I hope the benefits are clear – we now have a scriptable, repeatable way to record and redeploy our environments so that they’ll run anywhere Docker does. This is great for collaboration, audit, and scaling our operations. In the next part of this series, we’ll:
- Walk through building a basic predictor service in Docker
- Discuss some of the benefits for projects and sharing with your stakeholders and customers
- Look at how Docker can be used to scale with ease and introduce Kubernetes and Helm
- Discuss MLOps, DataOps, and other parts of the Machine Learning Engineer’s workload
You can find that here:
I hope this was useful, please let me know if you have any feedback. You can catch me on
- Twitter at www.twitter.com/adzsroka
- LinkedIn at www.linkedin.com/in/aesroka
- Or at www.adamsroka.co.uk