
MLOps on Kubernetes with Docker Desktop

How to manage an entire MLOps pipeline on Kubernetes running on a Mac

The purpose of this article is to provide you with design and implementation ideas in the field of Machine Learning Operations (MLOps), describing a specific use case: How to implement an entire MLOps pipeline on a MacBook Pro. I’ve done this with Dataiku Data Science Studio (DSS) running on Kubernetes (K8s) with Docker Desktop. Even if your use case is different, you might benefit from my experience.

Photo by torben on Unsplash

Behind the main narrative – the complete solution description – I also detail some general design concepts, which might come in handy to solve some architectural challenges. These are as follows:

  • Kustomize: How to manage similar K8s resources effectively and template-free.
  • Docker inside Docker: How to build Docker images and push them to a private registry from a Docker sidecar container running on a K8s pod.
  • Docker outside of Docker: How to commit a Docker container running on a K8s pod, snapshotting it to a new image.
  • Deployment: How to update and activate a K8s deployment configuration from a running pod.

The code for the solution in this post is available in my GitHub repository: https://github.com/tibfab/dss-at-k8s.git


Main concept

All tasks to complete the mission can be roughly grouped into two phases:

  • installation phase and
  • deployment phase.

During the installation phase, as the first step, a Docker base image is created to host the actual DSS installation of a specific node type; see details of the different DSS node types in my post, "MLOps w/ Dataiku DSS on Kubernetes". As the next step of the installation, a K8s pod is started with an installation container from the base image to automatically carry out all installation steps. Finally, a snapshot from the running installation container is pushed to a Docker registry, which persists the installation status ready to deploy.

I’m aware of the ephemeral nature of K8s pods and of the need to keep base Docker images easily reproducible from a versioned Dockerfile. Those concepts, however, don’t really apply here because some challenges require unique solutions.

My solution might be unusual, but I came up with it because I couldn’t ignore the obvious advantages of managing complex software solutions like Dataiku DSS with Docker images and K8s pods – for example, the flexibility and ease of rolling out or migrating complex installations to different environments with adaptable resources.

The deployment phase creates all K8s resources – ConfigMaps, Services, and Deployments – necessary to run and access the whole DSS node cluster building a production-ready MLOps architecture.

Installation phase

Prerequisites

  • Docker Desktop installed and configured with proper resources (versions and resources I used are pictured below)
Docker Desktop version information
Docker Desktop resource configuration
  • Kubernetes single-node cluster enabled (only the default service account is used in this example)

Installation step by step

Now I’d like to walk you through the main installation steps in detail:

  • Step 1. Building the base image
  • Step 2. Creating K8s secrets
  • Step 3. Installing DSS on the host container

Step 1. Building the base image

The Dockerfile – see in my GitHub repo – provides handy tools for installation/debugging and all necessary CLIs (docker and kubectl) needed to manage the DSS installations in K8s.

I used a terminal on my Mac and performed docker build in my repo’s main directory to create the base image.

% docker build -t dss-host-base:v.1.4 .
[+] Building 5.0s (5/8)                                                                                                            
 => [internal] load build definition from Dockerfile
...

After the image is successfully built, I need to push it to a Docker registry to make it available for the K8s installation pod. GitLab’s Container Registry (CR) seems to be a good choice. It provides 10 GB storage even for a free account.

% docker tag dss-host-base:v.1.4 registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-host-base:v1.4
% docker push registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-host-base:v1.4
The push refers to repository [registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-host-base]
...

To push the new image to the GitLab CR, you need to be logged in, of course. You can create so-called deploy tokens for that instead of using personal access tokens or credentials.

Note: The concept of having the image pushed to GitLab keeps this installation very close to a production-grade solution; for example, in AWS EKS, the same mechanism could also apply, though I’d recommend using ECR instead of GitLab CR in that case.

Step 2. Creating K8s secrets

The installation uses several K8s secrets on the pod for various tasks. Describing them all here would make this already lengthy post really long. Therefore, I outsourced this topic into a separate article: "Multiple Ways to Create Kubernetes Secrets".

Step 3. Installing DSS on the host container

The Dataiku MLOps pipeline consists of four different DSS node types: Design, Automation, API Deployer, and API node. The API node is managed by the API Deployer and therefore doesn’t need to be installed explicitly.

The installation of the DSS Design node is started by applying the customized K8s configuration with kubectl as follows.

% kubectl apply --kustomize dss-at-k8s/installation/design
configmap/dss-node-type-design-node-installation-49fmckt7m7 created
pod/dss-design-node-installation created

The K8s configuration used in this solution makes heavy use of the tool Kustomize. Rather than explaining it too briefly here or making this article too long, I detail the concept in a separate post: "Kustomize explained; an MLOps Use Case".

This post describes the installation of the DSS Design node, but all other DSS nodes can be installed similarly by specifying the corresponding configuration directory for Kustomize.

For example, for the Automation node, that would be:

% kubectl apply --kustomize dss-at-k8s/installation/automation
configmap/dss-node-type-automation-node-installation-cgkhc95cd4 created
pod/dss-automation-node-installation created

My article "Kustomize explained; an MLOps Use Case" demonstrates in great detail how the K8s configuration is constructed template-free from multiple parts. The example given there describes the deployment phase, but the mechanisms are exactly the same for the installation.

The directory listing below shows the Kustomize structure of the installation phase. All DSS nodes have their own directory containing node-specific configurations such as persistent volumes.

-- installation/
   |-- apideployer
   |   |-- kustomization.yaml
   |   `-- volume.yaml
   |-- automation
   |   |-- kustomization.yaml
   |   `-- volume.yaml
   |-- base
   |   |-- dss-installation-pod.yaml
   |   `-- kustomization.yaml
   |-- design
   |   |-- kustomization.yaml
   |   `-- volume.yaml
   |-- create-dood-commit-script.sh
   |-- create-dss-startup-script.sh
   |-- github-repo-cred-secret.yaml
   |-- gitlab-registry-cred-secret.yaml
   `-- installation-steps.sh
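
To illustrate the idea, a node overlay could look roughly like the sketch below. This is a hedged example under assumed file contents – the repo’s actual kustomization.yaml files differ in detail; only the generated ConfigMap name is modeled on the `kubectl apply` output shown above.

```yaml
# Illustrative sketch of installation/design/kustomization.yaml:
# reuse the shared base pod definition and patch in the node-specific volume.
resources:
  - ../base                          # pulls in dss-installation-pod.yaml
patchesStrategicMerge:
  - volume.yaml                      # node-specific persistent volume
configMapGenerator:
  - name: dss-node-type-design-node-installation
    literals:
      - DSS_NODE_TYPE=design         # assumed literal; consumed by the pod
```

Kustomize appends a content hash to generated ConfigMap names (as in `dss-node-type-design-node-installation-49fmckt7m7` above), so a changed configuration automatically produces a new name.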

The installation pod starts with three containers (see the configuration YAML in my GitHub repo):

  1. dss: This container starts with the image built in Step 1. above, performs the installation steps, and becomes the host of the actual DSS node type.
  2. dind: Docker in Docker (DinD) sidecar container provides the Docker daemon to build all necessary Docker images for DSS containerized execution.
  3. dood: Docker outside of Docker (DooD) sidecar container helps to commit the running dss container’s installation status creating a new DSS image used in the deployment phase.
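
A minimal sketch of how the three containers could be wired up is shown below. The image names of the sidecars, the DinD port, and the socket mount are illustrative assumptions, not the repo’s exact configuration.

```yaml
# Illustrative excerpt of an installation pod spec with dss, dind, and dood.
spec:
  containers:
    - name: dss
      image: registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-host-base:v1.4
      env:
        - name: DSS_NODE_TYPE
          value: design
        - name: DOCKER_HOST              # point the Docker CLI at the DinD sidecar
          value: tcp://localhost:2375
    - name: dind
      image: docker:dind                 # provides the Docker daemon for image builds
      securityContext:
        privileged: true                 # required by DinD
    - name: dood
      image: docker:latest               # Docker CLI only; talks to the host daemon
      volumeMounts:
        - name: docker-socket
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-socket
      hostPath:
        path: /var/run/docker.sock       # the host's Docker daemon socket
```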

For details on DinD and DooD sidecar concepts, please refer to my post, "MLOps w/ Dataiku DSS on Kubernetes".

The installation script in detail

Let’s take a look at the pod definition installation/base/dss-installation-pod.yaml, particularly into the configuration of the dss container, and go over the installation step by step.

Upon dss container startup, the Git repository is cloned, making the script [installation-steps.sh](https://github.com/tibfab/dss-at-k8s/blob/master/installation/installation-steps.sh) available inside the container (line 15). Then in the next line, the script is executed, which performs the following steps automatically:

  • Installs DSS dependencies for the host ⚑
  • Installs the DSS node type specified by the environment variable DSS_NODE_TYPE ☚
  • Activates the DSS feature user isolation ⚑
  • Installs DSS Spark integration ☚
  • Builds the Docker base image for DSS containerized execution
  • Builds the Docker base image for Spark running on K8s
  • Builds the Docker image for the DSS API Deployer to deploy DSS API nodes
  • Pushes the Docker images from the above steps to GitLab CR
  • Creates shell scripts to use in the deployment phase, start.sh, pre-stop.sh, and commit-dss.sh
  • Commits the running DSS host container (that is, dss itself) via DooD sidecar container
  • Pushes the new DSS host image to GitLab CR
  • Updates the K8s deployment configuration in GitHub to use the new DSS host image from the previous step
  • Starts the updated DSS deployment
  • Terminates the running installation pod, which means it terminates itself

The last two steps are performed only if the environment variable AUTO_DEPLOYMENT in the dss container configuration definition is set to true. The container sleeps infinitely otherwise.
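
The control flow can be sketched as follows. This is a runnable stand-in, not the script’s actual code: the kubectl calls and the infinite sleep are replaced by echo statements so the logic is visible anywhere.

```shell
# Hypothetical sketch of the tail of installation-steps.sh.
finish_installation() {
  if [ "${AUTO_DEPLOYMENT}" = "true" ]; then
    # Apply the updated deployment, then delete the installation pod (itself).
    echo "kubectl apply --kustomize dss-at-k8s/deployment/${DSS_NODE_TYPE}"
    echo "kubectl delete pod dss-${DSS_NODE_TYPE}-node-installation"
  else
    # Keep the pod alive for manual inspection.
    echo "sleep infinity"
  fi
}

AUTO_DEPLOYMENT=true
DSS_NODE_TYPE=design
finish_installation
```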

Important note: The installation is two-sided, so to speak: Steps flagged with ⚑ update the DSS host container – similar to updating a virtual machine with system dependencies. These changes need to be committed to a new Docker image to persist them for the deployment phase.

On the other hand, the steps marked with ☚ install or update the DSS software on a persistent volume mounted to the installation pod. The same volume is then used for deployment after the installation phase has been completed. Changes to DSS, even later ones – such as working on Data Science projects in the studio – reside on the persistent volume. If the deployment is (re)started, DSS resumes exactly where it was terminated.

It would be out of the scope of this article to go through the script [installation-steps.sh](https://github.com/tibfab/dss-at-k8s/blob/master/installation/installation-steps.sh) in every detail. Let me rather explain some interesting aspects of it.

  • For security reasons, the user ubuntu performs the installation steps instead of root – and the deployment pod also starts as ubuntu.
  • One prerequisite for the installation is a persistent volume with ACL support – one for each DSS node. Such a volume can be created by mounting a loop device with an Ext4 filesystem image directly from the Mac’s file system. I describe this topic in more detail in the post "Persistent Volume for Kubernetes w/ ACL Support on Mac".
  • The script [installation-steps.sh](https://github.com/tibfab/dss-at-k8s/blob/master/installation/installation-steps.sh) detects existing DSS installations on the mounted volume. Only the steps for updating the host image – marked w/ ⚑ – are performed if it finds one. The DSS installation itself on the volume remains unaffected.
  • The installation script uses Docker daemon to build, push, and pull images. The availability of Docker daemon is also a prerequisite to run DSS itself with containerized execution enabled. In my architecture, the dind sidecar container provides the solution. The container dss utilizes the Docker daemon running in the dind sidecar container via Docker CLI and port mapping. Please refer to my post, "MLOps w/ Dataiku DSS on Kubernetes" for more details on this topic.
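
The detection logic can be sketched like this. The marker file and the volume path are assumptions for illustration – the real script may test a different file on the mounted volume.

```shell
# Hypothetical sketch of the existing-installation check in installation-steps.sh.
detect_dss_installation() {
  # $1: the mounted persistent volume
  if [ -f "$1/bin/env-default.sh" ]; then
    echo "existing"    # only the host-image steps (⚑) would run
  else
    echo "fresh"       # the full installation would run
  fi
}

VOLUME=$(mktemp -d)    # stands in for the mounted persistent volume
detect_dss_installation "${VOLUME}"
```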

Pre-stop script

A simple yet interesting concept: During installation, the script installation-steps.sh creates another small script, pre-stop.sh.

The variables used in the excerpt of installation-steps.sh above contain the actual context of the installation written to pre-stop.sh. Upon K8s pod termination, the resulting script pre-stop.sh (see below) stops DSS and detaches the filesystem image. The image was mounted during startup, and it serves as a persistent volume for the DSS installation. As already mentioned above, my article "Persistent Volume for Kubernetes w/ ACL Support on Mac" describes in detail how the image is created.
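
The mechanics can be sketched with a heredoc that expands the installation context into the generated script. The paths and the stop command below are assumptions for illustration, not the repo’s exact values.

```shell
# Hypothetical sketch of how installation-steps.sh could generate pre-stop.sh.
DSS_INSTALL_DIR=/data/dss_design   # assumed DSS data directory
VOLUME_MOUNT_POINT=/data           # assumed loop-device mount point

# The variables expand NOW, baking the installation context into the script.
cat > /tmp/pre-stop.sh <<EOF
#!/bin/bash
# Generated during installation: stop DSS, then detach the filesystem image.
${DSS_INSTALL_DIR}/bin/dss stop
umount ${VOLUME_MOUNT_POINT}
EOF
chmod +x /tmp/pre-stop.sh
```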

Updating the deployment configuration

Changes to the DSS container made in the installation phase are conveyed to the deployment steps with a new Docker image. This happens in two steps.

First, a new image is created from the running DSS container when the host container has been successfully installed – explained in section "How to commit the running DSS container".

Next, the deployment configuration of the K8s pod is updated. Below is another excerpt from the script installation-steps.sh, which reveals how the configuration update is implemented.

In line 4, the stream editor sed patches the deployment configuration with the new image tag. Then the configuration update is committed with git. Additionally, the configuration update is tagged to track the change history or to roll back to a specific installation state, if necessary (lines 16/17).
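
The sed-and-git mechanics can be reproduced with a runnable sketch. The repository layout and the tag format below are assumptions modeled on the article’s examples, not the script’s actual code.

```shell
# Hypothetical sketch of the deployment-configuration update.
workdir=$(mktemp -d)
cat > "${workdir}/dss-deployment.yaml" <<'EOF'
containers:
  - name: dss
    image: registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-design-node:dss-8.0.3-2020.12.01-10.00.00
EOF

NEW_TAG="dss-8.0.3-$(date +%Y.%m.%d-%H.%M.%S)"

# Patch only the tag part of the image reference, keeping the repository path.
sed -i.bak "s|\(dss-design-node:\).*|\1${NEW_TAG}|" "${workdir}/dss-deployment.yaml"

# Commit and tag the change to track history or roll back later.
cd "${workdir}"
git init -q
git add dss-deployment.yaml
git -c user.name=installer -c user.email=installer@local commit -q -m "Deploy image ${NEW_TAG}"
git tag "install-${NEW_TAG}"
```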

As a result, upon deployment pod startup, K8s pulls the new Docker image from GitLab CR to create the new dss container.

Deployment phase

The DSS deployment configuration is applied either by the script installation-steps.sh automatically at the end of the installation (as shown in the section above) or manually via Kustomize with the following command.

% kubectl apply --kustomize dss-at-k8s/deployment/design
configmap/dss-node-type-design-node-982cct889d created
service/dss-design-node created
deployment.apps/dss-design-node created

The directory listing below shows a structure similar to that of the installation phase. All DSS nodes have their own directory containing node-specific configurations such as persistent volumes and – new compared to the installation – resource requests and limits.

-- deployment
   |-- apideployer
   |   |-- kustomization.yaml
   |   |-- resources.yaml
   |   `-- volumes.yaml
   |-- automation
   |   |-- kustomization.yaml
   |   |-- resources.yaml
   |   `-- volumes.yaml
   |-- base
   |   |-- dss-deployment.yaml
   |   |-- kustomization.yaml
   |   `-- service.yaml
   `-- design
       |-- kustomization.yaml
       |-- resources.yaml
       `-- volumes.yaml

DSS node types have very different requirements as far as resource reservation is concerned. An example resources.yaml for the Design node, sized for testing purposes, is shown below.
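
As an illustration, such a strategic-merge patch might look like the following. The concrete CPU and memory numbers are assumptions for a local test setup, not the repo’s values.

```yaml
# Illustrative deployment/design/resources.yaml: a strategic-merge patch
# that adds resource requests and limits to the dss container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dss-design-node
spec:
  template:
    spec:
      containers:
        - name: dss
          resources:
            requests:
              cpu: "2"         # assumed values for local testing
              memory: 8Gi
            limits:
              cpu: "4"
              memory: 12Gi
```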

In contrast, the DSS API Deployer requires fewer resources (see example in GitHub here).

How to commit the running container

Whenever an update is made to the host system, a new DSS host image needs to be created and configured for the next deployment. The DooD concept (Docker outside of Docker) described in this article provides a perfect solution for that.

The script installation-steps.sh creates some helper scripts for the deployment phase for starting and stopping the dss container. It also creates a script to make the image update process really easy: commit-dss.sh.

The script commit-dss.sh above resides in the DSS image, and each time the deployment pod is started, the start script copies commit-dss.sh to the mounted volume /share. Given the pod configuration YAML in my GitHub repo, the script becomes available from within the dood sidecar container because it also mounts the same shared volume shared-storage. The excerpt of the deployment configuration below shows how I achieve that.
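
A sketch of the relevant part could look like this. The volume type is an assumption; only the volume name shared-storage and the /share path follow the article.

```yaml
# Illustrative excerpt: both containers mount the same shared-storage volume,
# so commit-dss.sh copied to /share by dss is visible inside dood.
spec:
  containers:
    - name: dss
      volumeMounts:
        - name: shared-storage
          mountPath: /share
    - name: dood
      volumeMounts:
        - name: shared-storage
          mountPath: /share
  volumes:
    - name: shared-storage
      emptyDir: {}            # assumed volume type for illustration
```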

To commit the running dss container, the dood container is entered first.

% kubectl exec -it dss-design-node-55cb79f-rg5rh --container dood -- sh

Then, the script commit-dss.sh is executed. It creates a new image from the running DSS host container – tagged with the current date/time – and pushes it to GitLab CR.

/ # /share/commit-dss.sh 
Commiting DSS container to image: registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-${DSS_NODE_TYPE}-node:dss-8.0.3-2020.12.10-17.07.50
...
Login Succeeded
sha256:46f6fc35948557caff04215b3266ae81d9192a821ca108f503800eb0ce3c497e
The push refers to repository [registry.gitlab.com/tibor_fabian/dku-dss-k8s/dss-design-node]
01c8b42cd0e1: Pushed 
...
dss-8.0.3-2020.12.10-17.07.50: digest: sha256:f945a466182e7f0893903b0bcf455ea3b533969b83d5e5cdc2455e6df951b83e size: 2204

Conclusion

Besides a complete use case description with a special architecture, this article shows some interesting generic concepts in the fields of MLOps, Kubernetes, and Docker. The solution outlined above is useful if you want to run a complete MLOps pipeline on a Mac without the additional costs of provisioning a cloud solution, while staying close to a production-ready system in the cloud.

