Why use Docker containers for Machine Learning?

Resolving The “It works on my machine” Problem

Xavier Vasques
Towards Data Science


First Things First: Microservices

The first thing to understand before talking about containerization is the concept of microservices. When a large application is broken down into smaller, independent services, each of those services or small processes is called a microservice, and they communicate with each other over a network. The microservices approach is the opposite of the monolithic approach, which can be difficult to scale: if one particular feature has issues or crashes, all the other features are affected. Likewise, when demand for one particular feature increases sharply, a monolith forces us to add resources such as hardware for the entire application, not just for that feature, generating unnecessary costs. These costs can be minimized with a microservices approach, by breaking the application down into a group of smaller services. Each service or feature of the application is isolated so that it can be scaled or updated without impacting the other features. To put machine learning into production, let’s consider that the application needs to be broken down into smaller microservices such as ingestion, preparation, combination, separation, training, evaluation, inference, postprocessing and monitoring (a hypothetical layout of such services is sketched below).
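As a minimal sketch of that decomposition, the docker-compose.yml below declares a few of those steps as separate containers. The service names, image tags, volumes and port are illustrative placeholders, not taken from any particular project.

# docker-compose.yml — hypothetical decomposition of an ML application
# Service names, images, volumes and ports are placeholders.
services:
  ingestion:
    image: my-registry/ml-ingestion:0.1   # pulls raw data into shared storage
    volumes:
      - data:/data
  training:
    image: my-registry/ml-training:0.1    # trains the model, writes artifacts
    volumes:
      - data:/data
      - models:/models
  inference:
    image: my-registry/ml-inference:0.1   # serves predictions over HTTP
    volumes:
      - models:/models
    ports:
      - "8080:8080"
volumes:
  data:
  models:

Each of these services can then be scaled, updated or replaced independently of the others.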

Containerization

Micro-service architecture also has its drawbacks. If you develop your machine learning application on virtual machines (VMs), you need as many VMs as you have microservices, each bundling its own dependencies. Each VM needs its own OS, libraries and binaries, and consumes hardware resources such as processor, memory and disk space even when its microservice is not actually running. This is where Docker comes in. A container does not need its own OS, and when a container is not running, the resources it is not using remain shared and accessible to the other containers. Let’s consider an entire solution composed of applications 1 and 2 (resp. APP 1 and APP 2). If you want to scale out APP 1 or add other applications as shown in the scheme below, using VMs instead of containers can leave you limited by the resources available to you. If you decide to scale out only APP 1 and keep a single instance of APP 2, the resources APP 2 does not use remain shared among all the container processes (a one-line scaling command is sketched after the figure).

Virtual Machines versus Containerization (Image by Author)
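To make the scaling point concrete, Docker Compose can scale out one service without touching the others. The service name app1 below is a placeholder for whichever service needs more capacity:

# run three replicas of app1 while app2 keeps its single instance
docker compose up -d --scale app1=3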

Docker and Machine Learning: Resolving “It works on my machine”

Creating a machine learning model that works on our own computer is not that complicated. It is much more challenging when, for example, a customer wants to use the model at scale: a model that scales and runs on all types of servers across the globe. After development, your model might run perfectly well on your laptop or server but poorly on other systems, such as when you move it to production or to another server: performance degrades, the application crashes, or it is simply not well optimized. Another challenge is that our machine learning model will most likely be written in a single programming language such as Python, yet the application will need to interact with other applications written in other languages for data ingestion, data preparation, the front end, and so on. Docker makes these interactions easier to manage, since each microservice can be written in a different language, which enables scalability and the easy addition or removal of independent services. Docker brings reproducibility, portability, easy deployment, granular updates, lightness and simplicity.

When a model is ready, the data scientist’s anxiety is that it will not reproduce its results in real life, or when the work is shared with teammates. And sometimes it is not because of the model, but because the whole stack needs to be reproduced. Docker makes it easy to reproduce, anywhere, the working environment used to train and run the machine learning model. It lets you package the code and its dependencies into containers that can be ported to different servers, even with different hardware or operating systems. A training model can be developed on a local machine and easily ported to external clusters with additional resources such as GPUs, more memory or powerful CPUs. It is also easy to deploy a model and make it available to the world by wrapping it into an API in a container and deploying that container using a technology such as OpenShift, a Kubernetes distribution (a minimal sketch is shown below). Simplicity is another argument in favor of containerizing machine learning applications: containers can be created automatically from templates, and open-source registries give access to existing user-contributed containers. Docker also lets developers track the different versions of a container image, check who built a version and with what, and roll back to previous versions. Finally, your machine learning application can keep running even while one of its services is being updated, repaired or is down. For example, if you need to update an output message embedded in the overall solution, there is no need to update the entire application or to interfere with the other services.
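As a minimal sketch of what that packaging can look like, the Dockerfile below wraps a trained model behind a small HTTP API. The file names (requirements.txt, serve.py, model.pkl), the base image tag and the port are placeholder assumptions, not taken from the article.

# Dockerfile — hypothetical image that serves a trained model over HTTP
FROM python:3.10-slim

WORKDIR /app

# Install pinned dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving code and the trained model artifact
COPY serve.py model.pkl ./

# The API listens on port 8080 inside the container
EXPOSE 8080
CMD ["python", "serve.py"]

The resulting image can then be built, tagged and pushed to a registry (for example, docker build -t my-registry/ml-api:0.1 . followed by docker push), and deployed unchanged on a laptop, an on-premise server or a Kubernetes/OpenShift cluster.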


CTO and Distinguished Data Scientist, IBM Technology, France
Head of Clinical Neurosciences Research Laboratory, France