
Cloud-based computing platforms such as Google Cloud Platform (GCP) have gained quite a lot of attention with the emergence of deep learning and the realization that "the deeper the model, the better the performance" (I’m not here to argue about the truth of this claim, but merely stating a widely-accepted belief). And the deeper the model, the better the computing resources need to be.
It is not an economically viable strategy to buy and install the latest GPU cards whenever they are released (especially for small and medium-scale technical institutes). Anyone adopting this strategy should be prepared for the following challenges, among others:
- Making large monetary sacrifices to buy the latest GPU cards and necessary infrastructure
- Maintaining infrastructure to ensure uninterrupted service
- Developing resource allocation schemes for multiple users
GCP, however, provides all of this for a small fee (which will, of course, add up in the long run). More importantly, you don’t have to worry about maintenance or the initial setup (e.g. installing an operating system (OS)), and GCP offers a wide variety of customizations to choose from (e.g. OS, disk storage, GPU type, number of GPUs, etc.). For example, you might choose an instance with Ubuntu 16.04, 50GB of storage and 2 Tesla P100 GPUs.
In this post, I’ll discuss how to set up a custom Docker image, create a container from that image, and get your Python + Tensorflow scripts running in it.
Initial Setup
I’m not going to go through the initial setup, as there are quite a few nice resources out there explaining how to create and set up a computing instance on GCP. One of my favorites is,
Jupyter + Tensorflow + Nvidia GPU + Docker + Google Compute Engine
I’ll summarize what is in there in plain English:
- First define firewall rules for allowing TCP communication through several ports (tensorboard:6006, ftp:21-22, jupyter:8888)
- Setup a Computing Instance on GCP with the firewall rules
- Install CUDA on the instance
- Install Docker and Nvidia-Docker on the instance
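The firewall step above can also be done from the gcloud CLI instead of the web console. A minimal sketch — the rule name and the network tag are illustrative, not something GCP defines for you:

```shell
# Open the TCP ports mentioned above (tensorboard:6006, ftp:21-22, jupyter:8888)
# for instances carrying the (hypothetical) network tag "deep-learning"
gcloud compute firewall-rules create allow-dl-ports \
    --direction INGRESS \
    --allow tcp:6006,tcp:21-22,tcp:8888 \
    --target-tags deep-learning
```

You would then attach the `deep-learning` tag to your instance when creating it, so the rule applies to it.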
So What Comes After the Initial Setup?
Here we are going to discuss where to go from here. First let me state the libraries and versions I’ll be using.
- Python: 3.4
- Tensorflow: 1.3
I’ll now briefly summarize what we’re going to do. We will first download an image that supports Tensorflow on Python 3.x. Then we will create a container from the downloaded image. Finally, we will run a few commands to make sure everything works.
PS: Which Python version you should use depends on the problem you’re trying to solve. There are differences between Python 3.x and Python 2.x. However, Python 2.x is set to retire in the near future, and Python 3.x will take its place. So it’s better to migrate from Python 2.x to Python 3.x.
SSH into the Instance
To SSH into the GCP instance, you can use either the
- gcloud Shell
- 3rd Party Shell (e.g. Ubuntu Shell or Putty on Windows)
Download a Compatible Docker Image
Unfortunately, gcr.io does not provide Python 3.x-compatible Tensorflow images, only Python 2.x-compatible ones. So we will have to download a Python 3.x-compatible Tensorflow image from DockerHub, where you can see all the available Tensorflow images.
I’m going to go with the 1.3.0-gpu-py3 image (you will see it if you scroll down the list of Tensorflow images on DockerHub).
To download it, first SSH into your instance using either the gcloud shell or a terminal. Once you have access, type,
sudo docker pull tensorflow/tensorflow:1.3.0-gpu-py3
When you type this, Docker will automatically look for the image with the tag specified after the colon (on DockerHub, of course). You should then see the image being downloaded and extracted in the shell. To make sure the image was downloaded, try
sudo docker images
and you should see the image listed:
REPOSITORY TAG IMAGE ID CREATED SIZE
nvidia/cuda latest af786b192aa5 10 days ago 2.16GB
tensorflow/tensorflow 1.3.0-gpu-py3 ae2207c21bc1 5 months ago 3.29GB
Create and Run a Docker Container with the Image
After this operation finishes, we can create a Docker container from the image. But before that, create a folder named docker_tensorflow_example. This folder will hold the data we create while running things within the container; we will discuss how that mapping works later. Then change the permissions of this folder as follows,
sudo chmod 755 ~/docker_tensorflow_example/
so the owner has full access. Then we create our docker container with,
nvidia-docker run -it -v ~/docker_tensorflow_example/:/root/docker_tensorflow_example --name docker-py3-tensorflow tensorflow/tensorflow:1.3.0-gpu-py3 bash
That was a mouthful, wasn’t it? Let’s break this baby into pieces then,
- nvidia-docker – a Docker Engine utility for running Docker containers on NVIDIA GPUs; essentially a wrapper on top of docker
- run – run a Docker container
- -it – runs the container interactively with a terminal attached, so we can type commands into the container’s shell
- -v src_dir:target_dir – maps a local directory to a directory in the container. Remember, a container, along with all the data saved in it, disappears when you remove it. So this option maps the data created in the container directory (target_dir) to an actual directory on the host’s storage (src_dir).
- --name docker-py3-tensorflow – the name of the container we are creating. Using a specific name also stops us from creating containers blindly (if you don’t specify a name, Docker will create the container anyway under some random name; if you do specify a name, Docker will complain when a container with that name already exists).
- tensorflow/tensorflow:1.3.0-gpu-py3 – tells Docker which image to use to create the container.
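A quick way to convince yourself that the -v mapping works — a sketch, assuming the container from above is running (the file name is just for illustration):

```shell
# Inside the container: write a file into the mapped directory
echo "hello from the container" > /root/docker_tensorflow_example/test.txt

# Back on the host (e.g. from another SSH session), the same file
# should appear in the mapped host folder:
cat ~/docker_tensorflow_example/test.txt
```

Anything written to the mapped directory survives even after the container is removed, because it actually lives on the host’s disk.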
Something to Keep in Mind with nvidia-docker
If you see the error,
nvidia-docker: command not found
Then you probably have to log in as root first with,
sudo -s
and try the command again.
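Assuming a standard Docker installation, an alternative to running everything as root is adding your user to the docker group (note that you need to log out and back in for the change to take effect):

```shell
# Add the current user to the "docker" group so docker/nvidia-docker
# commands no longer need sudo (takes effect on next login)
sudo usermod -aG docker $USER
```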
Starting Jupyter Notebook Server
If you need to run a Jupyter Notebook server, use the above command without bash.
nvidia-docker run -it -v ~/docker_tensorflow_example/:/root/docker_tensorflow_example --name docker-py3-tensorflow tensorflow/tensorflow:1.3.0-gpu-py3
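One caveat: to reach the notebook from your browser, the container’s port 8888 also has to be published on the instance. A sketch with the -p flag added (the flag is my addition here, matching the jupyter:8888 firewall rule from the initial setup):

```shell
# Publish the container's Jupyter port (8888) on the host so the
# notebook is reachable through the firewall rule opened earlier
nvidia-docker run -it -p 8888:8888 \
    -v ~/docker_tensorflow_example/:/root/docker_tensorflow_example \
    --name docker-py3-tensorflow \
    tensorflow/tensorflow:1.3.0-gpu-py3
```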
Testing the Container
Now you should have the Docker container running, and your shell should be inside the container; the terminal prompt should look like,
root@8301xxxxxxxx:~#
where 8301xxxxxxxx is the container ID. Now try,
python3
and try,
import tensorflow as tf
print(tf.__version__)
This should work properly, and you should get 1.3.0 as the output. If that happens, congratulations! You have your libraries installed. If not, make sure you followed the steps correctly for both the initial setup and the Docker container setup.
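To also verify that Tensorflow can actually see the GPU(s), you can list the local devices. A sketch — this uses an internal TF module (tensorflow.python.client.device_lib) that exists in 1.3 but is not part of the public API, and it requires being inside the GPU container:

```python
# List the compute devices Tensorflow can see;
# GPUs show up with device_type "GPU"
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)
```

If no device with type GPU is printed, Tensorflow is falling back to the CPU, and you should double-check the CUDA and nvidia-docker steps from the initial setup.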
Cheers!