How to set up a Deep Learning environment on an AWS GPU instance
Setting up our goal
The goal is to learn how to set up a machine learning environment on an Amazon AWS GPU instance that can easily be replicated and reused for other problems using Docker containers.
Setting up environment
The first step is to build a virtual machine on Amazon Web Services. To do so, we need to choose the right hardware and software packages for building Deep Learning models. Deep Learning models demand massive compute power for matrix operations on very large matrices. There are two hardware options on AWS: CPU-only or GPU. GPU stands for Graphics Processing Unit. The major architectural difference is that a GPU is a highly parallel but more specialized processor, while a CPU is a general-purpose architecture that is less suited to massively parallel computation. Amazon lets you build a virtual machine with dedicated GPU cores for your heavy computation. Of course this adds a bit to your cost, but if you consider the amount of time you save, it's a good deal.
Alternatively, if you really want to get serious about this, I'd recommend building your own system at home with NVIDIA GPUs.
Below are the steps that we need to take to set up a GPU Instance on AWS:
- Launch Instance
- Select Ubuntu 16.04
- Select g2.2xlarge (8 vCPUs, 15 GB RAM, 60 GB root SSD, 1 NVIDIA K520 GPU)
- Select an availability zone
- Protect against accidental termination
- Add storage: 120 GB
- Add tags such as name and env…
- Select a security group
- Launch and choose your key pair
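The console steps above can also be scripted with the AWS SDK. Below is a minimal sketch of the `run_instances` parameters using boto3; the AMI ID, key name, and security-group ID are placeholders you must replace with your own values:

```python
# Sketch: the console launch steps expressed as boto3 run_instances arguments.
# ImageId, key_name, and sg_id are placeholders -- substitute your own values.
def launch_params(key_name="your_ssh_key", sg_id="sg-xxxxxxxx"):
    """Build run_instances arguments mirroring the console steps above."""
    return {
        "ImageId": "ami-xxxxxxxx",        # an Ubuntu 16.04 AMI for your region
        "InstanceType": "g2.2xlarge",     # 8 vCPUs, 15 GB RAM, 1 K520 GPU
        "KeyName": key_name,
        "SecurityGroupIds": [sg_id],
        "MinCount": 1,
        "MaxCount": 1,
        "DisableApiTermination": True,    # protect against accidental termination
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/sda1",
            "Ebs": {"VolumeSize": 120, "VolumeType": "gp2"},  # 120 GB storage
        }],
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "deep-learning"},
                     {"Key": "env", "Value": "dev"}],
        }],
    }

# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   boto3.client("ec2").run_instances(**launch_params())
```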
Connect to the instance
Navigate to the directory where you stored your SSH key and use the command below to connect to your instance from a terminal:
ssh -i "your_ssh_key.pem" ubuntu@[your instance public IP address]
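Typing the full command every time gets tedious. As an optional convenience, you can add an entry to `~/.ssh/config` (host alias, IP, and key path below are placeholders) and then connect with just `ssh aws-gpu`:

```
# ~/.ssh/config
Host aws-gpu
    HostName [your instance public IP address]
    User ubuntu
    IdentityFile ~/.ssh/your_ssh_key.pem
```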
Installing NVIDIA drivers
sudo apt-get update
sudo apt-get upgrade
Essentials
sudo apt-get install openjdk-8-jdk git python-dev python3-dev python-numpy python3-numpy build-essential python-pip python3-pip python3-venv swig python3-wheel libcurl3-dev
sudo apt-get install -y gcc g++ gfortran git linux-image-generic linux-headers-generic linux-source linux-image-extra-virtual libopenblas-dev
NVIDIA Drivers
sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt-get update
sudo apt-get install -y nvidia-375 nvidia-settings
Install CUDA 8 repository
wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64-deb
sudo apt-get update
sudo apt-get install cuda
nvidia-smi
Setting up docker engine for nvidia GPU machines (nvidia-docker)
ref: https://docs.docker.com/engine/installation/linux/ubuntu
Add docker engine repository
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
Install docker engine ce
sudo apt-get update
sudo apt-get install docker-ce
sudo docker run hello-world
sudo usermod -aG docker $USER
Note: log out and back in (or start a new shell session) so the docker group membership takes effect and you can run docker without sudo.
Setup nvidia-docker
# Install nvidia-docker and nvidia-docker-plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
# Test if docker is using the nvidia GPU
nvidia-docker run --rm nvidia/cuda nvidia-smi
Setting up a Docker container with Jupyter Notebook, TensorFlow, and machine learning libraries
Pull the TensorFlow Docker image for GPU
docker pull tensorflow/tensorflow:latest-gpu-py3
Create a Docker container with TensorFlow for GPU
nvidia-docker run -it --name planet -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3 bash
Inside the docker container
Set Up the ground
apt-get update
apt-get install sudo
sudo apt-get update
sudo apt-get install git
sudo apt-get install nano # or your choice of editor
Setting up the Python environment
First, let's see what we need and what we already have. We need:
Python 3.5, pip3 9.0.1, Jupyter Notebook with a Python 3.5 kernel, TensorFlow 1.1, Keras with the TensorFlow backend, and these libraries: cv2 (OpenCV), sys, os, gc, numpy, pandas, seaborn, matplotlib, scikit-learn (sklearn), scipy, itertools, subprocess, six, skimage, IPython.display, tqdm, multiprocessing, concurrent.futures
Run ipython and import the libraries below to make sure everything works. Most of these libraries are already installed in the TensorFlow Docker image. However, some of them may not be included.
import sys
import os
import gc
import numpy
import pandas
import seaborn
import matplotlib
import sklearn
import scipy
import itertools
import subprocess
import six
import skimage
import IPython.display
import tensorflow
import keras
import tqdm
import multiprocessing
import concurrent.futures
import cv2
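Instead of importing each module by hand, the check can be automated. A small sketch that reports which modules from the list cannot be imported (note that the import names differ from the pip package names in a few cases):

```python
import importlib.util

# Import names from the list above; the pip package names differ for some
# (e.g. cv2 -> opencv-python, skimage -> scikit-image, sklearn -> scikit-learn).
REQUIRED = ["sys", "os", "gc", "numpy", "pandas", "seaborn", "matplotlib",
            "sklearn", "scipy", "itertools", "subprocess", "six", "skimage",
            "tensorflow", "keras", "tqdm", "multiprocessing",
            "concurrent.futures", "cv2"]

def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print("Missing:", missing_modules(REQUIRED))
```

Anything printed as missing can then be installed with pip3 as shown in the next step.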
Install missing libraries
pip3 install seaborn
pip3 install scikit-image
pip3 install keras
pip3 install tqdm
pip3 install opencv-python
Create a new image from this container and name it keras-tf
docker commit planet keras-tf:latest