Deploying Kubeflow to a Bare-Metal GPU Cluster from Scratch

My experience of deploying Google’s Kubernetes ML toolkit on physical servers with multiple GPUs

Vadim Markovtsev
Towards Data Science
14 min readMar 18, 2021

--

Attack on Kubeflow. Image by Anastasia Markovtseva, CC-BY-SA 4.0.

Hardware

I’ve got 3 standard Supermicro towers with 256GB RAM, an SSD, 5 HDDs, and 4 GPUs each. Ethernet connects them to the “controller” Dell server with access to the…

--

--

Machine learning and software engineer. Development teams manager. Public speaker. Google Developer Expert in Machine Learning (2018–2021).