How to set up a Deep Learning environment on an AWS GPU instance

Siavash Fahimi
Towards Data Science
4 min read · Jul 26, 2017


Setting up our goal

The goal is to learn how to set up a machine learning environment on an Amazon AWS GPU instance that can be easily replicated and reused for other problems, using Docker containers.

Setting up environment

The first step is to build a virtual machine on Amazon Web Services. To do so, we need to choose the right hardware and software packages for building Deep Learning models. Deep Learning models require massive compute power to perform matrix operations on very large matrices. There are two hardware options on AWS: CPU only, or GPU. GPU stands for Graphics Processing Unit. The major architectural difference is that GPUs are highly parallel but more specialized processors, whereas CPUs are general-purpose processors that are not as good at massively parallel computation. Amazon allows you to build a virtual machine with dedicated GPU cores for your heavy computation. Of course this adds a bit to your cost, but if you consider the amount of time you save, it's a good deal.

Alternatively, if you really want to get serious about this, I'd recommend building your own system at home with NVIDIA GPUs.

Below are the steps that we need to take to set up a GPU Instance on AWS:

  • Launch Instance
  • select Ubuntu 16.04
  • select g2.2xlarge (8 vCPUs, 15 GB RAM, 60 GB root SSD, 1 NVIDIA K520 GPU)
  • select availability zone
  • Protect against accidental termination
  • add storage — 120 GB
  • add tags such as name and env…
  • select security group
  • launch and choose key
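The console steps above can also be scripted with the AWS CLI. Below is a rough sketch; the AMI ID, key-pair name, and security-group ID are placeholders you would replace with your own values, and the command is only echoed here so you can inspect it before actually launching anything:

```shell
# Placeholders -- substitute your own values before running for real.
AMI_ID="ami-xxxxxxxx"      # an Ubuntu 16.04 AMI in your region
KEY_NAME="your_ssh_key"    # the key pair you choose at launch
SG_ID="sg-xxxxxxxx"        # your security group

# Build the launch command (extra storage and tags would be added with
# --block-device-mappings and --tag-specifications, omitted here).
LAUNCH_CMD="aws ec2 run-instances \
  --image-id $AMI_ID \
  --instance-type g2.2xlarge \
  --key-name $KEY_NAME \
  --security-group-ids $SG_ID \
  --disable-api-termination"

# Dry run: print the command instead of executing it.
echo "$LAUNCH_CMD"
```

Note that --disable-api-termination is the CLI equivalent of the "protect against accidental termination" checkbox in the console.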

Connect to the instance

Navigate to the directory where you have stored your SSH key and use the command below to connect to your instance from the terminal:

ssh -i "your_ssh_key.pem" ubuntu@[your instance public IP address]
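If you connect often, an entry in ~/.ssh/config saves retyping the key path and address each time. The host alias and key path below are examples, not requirements:

```
Host aws-dl
    HostName [your instance public IP address]
    User ubuntu
    IdentityFile ~/path/to/your_ssh_key.pem
```

After that, `ssh aws-dl` is enough.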

Installing NVIDIA drivers

ref: https://medium.com/towards-data-science/how-to-set-up-a-deep-learning-environment-on-aws-with-keras-theano-b0f39e3d861c

sudo apt-get update
sudo apt-get upgrade

Essentials

sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python3-venv swig python3-wheel libcurl3-dev
sudo apt-get install -y gcc g++ gfortran git linux-image-generic linux-headers-generic linux-source linux-image-extra-virtual libopenblas-dev

NVIDIA Drivers

sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt-get update
sudo apt-get install -y nvidia-375 nvidia-settings

Install CUDA 8 repository

wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
sudo apt-get update
sudo apt-get install cuda
nvidia-smi
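One step the commands above skip: making sure CUDA's binaries and libraries are on your paths. Assuming the default install location of /usr/local/cuda-8.0 (adjust if yours differs), add something like this to ~/.bashrc:

```shell
# Assumes CUDA 8.0 was installed to the default /usr/local/cuda-8.0
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Then run `source ~/.bashrc` (or reconnect) so tools like nvcc are found.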

Setting up docker engine for nvidia GPU machines (nvidia-docker)

ref: https://docs.docker.com/engine/installation/linux/ubuntu

Add docker engine repository

sudo apt-get update
sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

Install docker engine ce

sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world
sudo usermod -aG docker $USER  # log out and back in for this to take effect

Setup nvidia-docker

# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb

# Test if docker is using nvidia GPUs
nvidia-docker run --rm nvidia/cuda nvidia-smi

Setting up a docker container with jupyter notebook, tensorflow and machine learning libraries

Pull tensorflow docker image for gpu

docker pull tensorflow/tensorflow:latest-gpu-py3

Create a docker container with tensorflow for gpu

nvidia-docker run -it --name planet -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3 bash

Inside the docker container

Set Up the ground

apt-get update
apt-get install sudo
sudo apt-get update
sudo apt-get install git
sudo apt-get install nano  # or your choice of editor

Setting up the Python environment

At first let’s see what we need and what we have. We need:

Python 3.5, pip3 9.0.1, Jupyter Notebook with a Python 3.5 kernel, TensorFlow 1.1, Keras with the TensorFlow backend, and these libraries: cv2 (OpenCV), sys, os, gc, numpy, pandas, seaborn, matplotlib, scikit-learn (sklearn), scipy, itertools, subprocess, six, skimage, IPython.display, tqdm, multiprocessing, concurrent.futures

Run ipython and import the libraries below to make sure everything works. Most of these libraries are already installed in the TensorFlow docker image; however, some of them may not be included.

import sys
import os
import gc
import numpy
import pandas
import seaborn
import matplotlib
import sklearn
import scipy
import itertools
import subprocess
import six
import skimage
import IPython.display
import tensorflow
import keras
import tqdm
import multiprocessing
import concurrent.futures
import cv2
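Instead of importing each module by hand, a small helper (a hypothetical convenience, run from the container's shell) can report which modules fail to import and therefore still need installing:

```shell
# Print a "missing:" line for every Python module that fails to import.
check_imports() {
  for m in "$@"; do
    python3 -c "import $m" 2>/dev/null || echo "missing: $m"
  done
}

check_imports sys os gc numpy pandas seaborn matplotlib sklearn scipy \
  itertools subprocess six skimage tensorflow keras tqdm \
  multiprocessing concurrent.futures cv2
```

Anything it reports as missing maps onto the pip3 installs in the next step.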

Install missing libraries

pip3 install seaborn
pip3 install scikit-image
pip3 install keras
pip3 install tqdm
pip3 install opencv-python

Create a new image from this container and name it keras-tf

docker commit planet keras-tf:latest
