
Machine Learning with Docker and Kubernetes: Install a Cluster from Scratch

Kubernetes, Docker, Python and Scikit-Learn for Machine and Deep Learning: How to Scale Data Scientists’ Efforts

Photo by Annamária Borsos

Kubernetes, the open-source container orchestration platform, is certainly one of the most important tools for scaling our machine and deep learning efforts. To understand its utility for data scientists, think about all the applications we have developed and containerized. How will we coordinate and schedule all these containers? How can we upgrade our machine learning models without interruptions of service? How do we scale the models and make them available to users over the internet? What happens if our model is used by many more people than we expected? If we didn’t think about the architecture beforehand, we will need to increase the compute resources, most likely creating new instances manually and redeploying the application. Kubernetes schedules, automates, and manages tasks of container-based architectures: it deploys containers, updates them, and provides service discovery, monitoring, storage provisioning, load balancing and more. If we Google "Kubernetes", we often see articles comparing Docker and Kubernetes. It’s like comparing an apple and an apple pie. The first thing to say is that Kubernetes is designed to run on a cluster, while Docker runs on a single node. Kubernetes and Docker are complementary for creating, deploying and scaling containerized applications. Kubernetes is also often compared with Docker Swarm, a tool for clustering and scheduling Docker containers. Kubernetes offers really important advantages such as high-availability policies, auto-scaling capabilities, and the ability to manage complex architectures with hundreds of thousands of containers running in public, hybrid, multi-cloud or on-premise environments.

You can find all the files used in this chapter on GitHub

Kubernetes Vocabulary

In Kubernetes, there are several concepts we need to know. The Kubernetes master is a set of three processes running on a single node of our cluster, which we call the master node: kube-apiserver, kube-controller-manager and kube-scheduler. Every other node in our cluster runs two processes: kubelet, which communicates with the Kubernetes master, and kube-proxy, a network proxy reflecting the Kubernetes network services on each node. The API server is used for communication between components, the controller manager reconciles the current state of the cluster against the desired state, and the scheduler decides which node each pod should run on. The kubelet is the primary "node agent" that runs on each node; kube-proxy is the Kubernetes network proxy running on each node.

In Kubernetes, the Control Plane (the master node) is the orchestrator, and the nodes are the machines (physical servers, virtual machines, etc.) that run your containerized applications. The nodes are controlled by the master node. Then we have the Pod, which is the basic building block of Kubernetes. Pods are the smallest deployable units of computing that we can create and manage in Kubernetes, and each is made up of one or more containers. We can, for example, create a Pod that trains our machine learning model (with the code in a single container). We also deal with Jobs, which create one or more Pods and make sure that their execution completes successfully. We can create a Job to train our models, to perform batch inference, or to store information such as training metadata or predictions to external storage. Kubernetes can really help put our machine and deep learning models into production by simplifying the process of exposing the models to others. We will see that we need to follow a few steps: create a Deployment specifying which container to run and how many replicas we want, then expose the Deployment using a Service, which lets us define the rules for exposing pods to each other and to the internet. A very important feature of Kubernetes is load balancing, which distributes traffic among the replicas and autoscales resources to meet increased demand.
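As a sketch of what those steps look like in practice (the name `model-api`, the image and the port are hypothetical, not taken from this article), a Deployment with three replicas plus a Service exposing it could be declared as:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-api                        # hypothetical name
spec:
  replicas: 3                            # Kubernetes keeps three pods running
  selector:
    matchLabels:
      app: model-api
  template:
    metadata:
      labels:
        app: model-api
    spec:
      containers:
      - name: model-api
        image: myrepo/model-api:latest   # hypothetical image serving the model
        ports:
        - containerPort: 5000
---
apiVersion: v1
kind: Service
metadata:
  name: model-api
spec:
  type: NodePort                         # exposes the pods outside the cluster
  selector:
    app: model-api
  ports:
  - port: 5000
    targetPort: 5000
```

Applied with `kubectl apply -f model-api.yaml`, the Service then load-balances requests across the three replicas.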

Typical Kubernetes Architecture

Kubernetes Quick Install

We can use different methods to install Kubernetes. All of them are well explained on the Kubernetes website:

We can use Homebrew by typing in our terminal:

brew install kubectl

Alternatively, we can type:

Ubuntu / Debian

sudo apt-get update && sudo apt-get install -y apt-transport-https
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

Red Hat / CentOS

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

# Set SELinux to permissive mode (effectively disabling it)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

sudo yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

sudo systemctl enable --now kubelet

Check the installation with the following command:

kubectl version --client

Install all the Kubernetes binaries (kubeadm, kubelet, kubectl):

sudo apt-get install -y kubelet kubeadm kubectl kubernetes-cni
sudo systemctl enable kubelet

Install a Kubernetes Cluster

In order to get familiar with Kubernetes, we will use Vagrant, a tool to build and manage virtual machines. Vagrant creates and configures lightweight environments in a single workflow by using a configuration file named Vagrantfile. Let’s install Vagrant.

Ubuntu / Debian

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install vagrant

Red Hat / CentOS

sudo yum install -y yum-utils
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
sudo yum -y install vagrant

If we want to install it on another platform, do not hesitate to check: https://www.vagrantup.com/downloads

Vagrant relies on providers such as VirtualBox, VMware or Hyper-V to supply the resources needed to run development environments, so installing one of them is required.

We can install VirtualBox:

sudo apt update
sudo apt install virtualbox
sudo apt install virtualbox-dkms

and create a "Vagrantfile":

sudo vagrant init bento/ubuntu-20.04

For our project, we will create or edit a specific Vagrantfile to build our own environments as follows:

As you can read in the Vagrantfile, we create a master node named "kubmaster" with Ubuntu 20.04, 2 GB of memory, 2 CPUs, an IP address (192.168.56.101) and Docker. Then we create two nodes (kubnode1 and kubnode2) with the same configuration as the master node.
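A minimal Vagrantfile matching that description might look like the following sketch (the worker-node IPs 192.168.56.102/103 and the Docker provisioning command are assumptions; adapt them to the actual file in the repository):

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "bento/ubuntu-20.04"

  # Master node: 2 GB RAM, 2 CPUs, fixed private IP
  config.vm.define "kubmaster" do |master|
    master.vm.hostname = "kubmaster"
    master.vm.network "private_network", ip: "192.168.56.101"
    master.vm.provider "virtualbox" do |vb|
      vb.memory = 2048
      vb.cpus = 2
    end
    # Assumed provisioning step installing Docker
    master.vm.provision "shell", inline: "curl -fsSL https://get.docker.com | sh"
  end

  # Two worker nodes with the same configuration (node IPs are assumptions)
  (1..2).each do |i|
    config.vm.define "kubnode#{i}" do |node|
      node.vm.hostname = "kubnode#{i}"
      node.vm.network "private_network", ip: "192.168.56.10#{i + 1}"
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 2048
        vb.cpus = 2
      end
      node.vm.provision "shell", inline: "curl -fsSL https://get.docker.com | sh"
    end
  end
end
```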

Once edited, change to root and type in the terminal (in the directory containing the Vagrantfile):

vagrant up

This command creates and configures the guest machines according to our edited Vagrantfile. When the process is finished, we can connect to the guest machines with the following commands:

vagrant ssh kubmaster
vagrant ssh kubnode1
vagrant ssh kubnode2

We have to deactivate swap on the master node (kubmaster) and on each node (kubnode1 and kubnode2). We connect to each guest machine and run the following commands:

swapoff -a
vim /etc/fstab

In the fstab file, we need to comment out the swap line so that swap stays off after a reboot.
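If we prefer not to edit the file by hand, a sed one-liner can comment out the swap entry. A sketch, run here against a sample copy of fstab (the sample content is invented for illustration; on the real nodes, target /etc/fstab):

```shell
# Sample fstab with a swap entry, for illustration only
cat > /tmp/fstab.sample <<'EOF'
UUID=abcd-1234 /    ext4 defaults 0 1
/swap.img      none swap sw       0 0
EOF

# Comment out every line that mentions swap
sed -i '/\bswap\b/ s/^/#/' /tmp/fstab.sample
grep '^#' /tmp/fstab.sample   # the swap line is now commented out
```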

Let’s also install some packages on each machine (kubmaster, kubnode1, kubnode2) such as curl and apt-transport-https:

apt-get update && apt-get install -y apt-transport-https curl

Then, we use curl to fetch the GPG key that allows us to install the Kubernetes binaries: kubectl, kubeadm, kubelet.

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -

We add the access to the Google repository (http://apt.kubernetes.io) that will allow us to download and install the binaries:

add-apt-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

To install binaries, we do the following:

apt-get install -y kubelet kubeadm kubectl kubernetes-cni
systemctl enable kubelet

Kubernetes: Initialization and internal network

Now that we have installed the necessary packages on all our nodes, we will initialize the cluster and set up the network that joins the different parts of the Kubernetes cluster.

To initialize the master node, connect to it and type:

root@kubmaster:~# kubeadm init --apiserver-advertise-address=192.168.56.101 --node-name $HOSTNAME --pod-network-cidr=10.244.0.0/16

192.168.56.101 is the IP address of the master node we defined previously, and 10.244.0.0/16 is the Kubernetes internal pod network, defining the range from which Kubernetes will assign IP addresses within its network.

We see the following output, including a token generated to join the other nodes:

As you can read in the output, to start using our cluster, we need to create the configuration file used by kubectl:

root@kubmaster:~# mkdir -p $HOME/.kube
root@kubmaster:~# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
root@kubmaster:~# chown $(id -u):$(id -g) $HOME/.kube/config

To put the internal network in place, we need to provide a network between the nodes of the cluster. For this, we will use Flannel, a very simple and easy way to configure a layer 3 network fabric designed for Kubernetes. Other solutions exist, such as WeaveNet, Contiv and Cilium. Flannel runs a binary agent called flanneld on each host and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space.

We allow bridged IPv4 traffic to pass through iptables chains, which the pod network requires (command to be run on all nodes):

sysctl net.bridge.bridge-nf-call-iptables=1

We then install our Flannel network with the help of a configuration file (kube-flannel.yml) available online by typing the following command in the master node:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

We check the status of the pods on the master node (the Flannel network, kube-scheduler, kube-apiserver, kube-controller-manager, kube-proxy, the pods managing internal DNS, the etcd pod that stores configuration, etc.):

kubectl get pods --all-namespaces

It is sometimes necessary to modify the Flannel configuration by editing the network (for instance from 10.244.0.0/16 to 10.10.0.0/16). To do that:

kubectl edit cm -n kube-system kube-flannel-cfg
# edit network 10.244.0.0/16 to 10.10.0.0/16

If we type the command below on the master node, we can see that our master is ready:

kubectl get nodes

Now it is time to join the nodes to the master. For this, we copy the previously generated join command, with its token, and run it on the two nodes (kubnode1 and kubnode2):

kubeadm join 192.168.56.101:6443 --token 0dgzxg.86lj9mirur1ele2f   --discovery-token-ca-cert-hash sha256:59ca1ef21c1c4304ff1558210cf78528a6750babc3fa69241e925c2d5f7d90c6

We will see the following output:

Come back to the master node, and type the following commands to check the status:

kubectl get pods --all-namespaces
kubectl get nodes

We should see the nodes with the status "Ready". We can also run docker ps to see all the launched containers (coredns, flannel, etc.) on the master node as well as on the others.

A last comment concerns cluster access. If we type on the master node:

kubectl get nodes

We can see that kubmaster and the nodes all report the same internal IP:

kubectl get nodes -o wide

We need to open and edit /etc/hosts and add our master node IP:

vim /etc/hosts

The /etc/hosts file also needs to be modified on each node (kubnode1 and kubnode2):
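The added entries look like the following sketch (the worker-node IPs are assumptions matching the Vagrant private network; use the addresses you actually configured):

```
192.168.56.101 kubmaster
192.168.56.102 kubnode1
192.168.56.103 kubnode2
```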

Then, we can delete the Flannel pods, first listing them on the master node:

kubectl get pods -n kube-system

Then,

kubectl delete pods kube-flannel-ds-56f9h -n kube-system
kubectl delete pods kube-flannel-ds-dx8tv -n kube-system
kubectl delete pods kube-flannel-ds-kgxvt -n kube-system

And the magic of Kubernetes is that it recreates the Flannel pods without losing the service:

We will see that we have the correct IP addresses:

kubectl get nodes -o wide

To be more comfortable, we can also enable autocompletion by typing:

apt-get install bash-completion
echo "source <(kubectl completion bash)" >> ~/.bashrc
source ~/.bashrc

Next steps

Now that we know how to install a Kubernetes cluster, we can create Kubernetes Jobs that will create pods (and hence containers) to, for instance, train our machine learning models, serialize them, load them into memory and perform batch or online inference. We explored these concepts here:
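As a sketch of that next step (the Job name, image and entry point below are hypothetical, not from this article), a Job that runs a training container to completion could be declared as:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model                      # hypothetical Job name
spec:
  backoffLimit: 2                        # retry a failed training pod up to twice
  template:
    spec:
      restartPolicy: Never               # do not restart the pod after it finishes
      containers:
      - name: train
        image: myrepo/train-model:latest # hypothetical training image
        command: ["python", "train.py"]  # assumed entry point of the container
```

Applied with `kubectl apply -f train-job.yaml`, the Job runs the pod to completion, and `kubectl logs job/train-model` shows the training output.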

Machine Learning with Docker and Kubernetes: Training models

Machine Learning with Docker and Kubernetes: Batch Inference

Machine Learning Prediction in Real Time using Docker, Python Rest APIs with Flask and Kubernetes…


Source

https://phoenixnap.com/blog/kubernetes-vs-docker-swarm

https://kubernetes.io/fr/docs/concepts/

https://kubernetes.io/fr/docs/tasks/tools/install-kubectl/

https://www.vagrantup.com/downloads

https://gitlab.com/xavki/presentations-kubernetes/-/tree/master

https://github.com/flannel-io/flannel

