Training machine learning models online for free (GPU, TPU enabled)!!!

Maithreyan Surya
Towards Data Science
6 min read · Oct 12, 2018


The computation power needed to train machine learning and deep learning models on large datasets has always been a huge hindrance for machine learning enthusiasts. But with Jupyter notebooks that run on the cloud, anyone who has the passion to learn can train models and come up with great results.

In this post I will provide information about the various services that give us the computation power needed for training models.

  1. Google Colab
  2. Kaggle Kernels
  3. Jupyter Notebook on GCP
  4. Amazon SageMaker
  5. Azure Notebooks

1) Google Colab

Colaboratory is a Google research project created to help disseminate machine learning education and research. Colaboratory (Colab) provides a free Jupyter notebook environment that requires no setup and runs entirely in the cloud. It comes pre-installed with most of the machine learning libraries, so it acts as a perfect place where you can plug and play and try things out where dependencies and compute are not an issue.

The notebooks are connected to your Google Drive, so you can access them any time you want, and you can also upload or download notebooks from GitHub.

GPU and TPU enabling

First, you’ll need to enable GPU or TPU for the notebook.

Navigate to Edit→Notebook Settings, and select GPU or TPU from the Hardware Accelerator drop-down.

Code to check whether the TPU is enabled:

import os
import pprint
import tensorflow as tf

if 'COLAB_TPU_ADDR' not in os.environ:
    print('ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!')
else:
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    print('TPU address is', tpu_address)

    with tf.Session(tpu_address) as session:
        devices = session.list_devices()

    print('TPU devices:')
    pprint.pprint(devices)
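If you selected GPU instead, here is a minimal sketch to confirm a GPU is attached (tf.test.gpu_device_name() returns an empty string when none is available):

import tensorflow as tf

# Returns something like '/device:GPU:0' on a GPU runtime,
# or an empty string if no GPU is attached.
device_name = tf.test.gpu_device_name()
if device_name:
    print('Found GPU at:', device_name)
else:
    print('GPU device not found; check Notebook Settings.')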

Installing libraries

Colab comes with most of the ML libraries installed, but you can also easily add libraries that are not pre-installed.

Colab supports both the pip and apt package managers.

pip command

!pip install torch

apt command

!apt-get install graphviz -y

Both commands work in Colab; don't forget the ! (exclamation mark) before the command.

Uploading Datasets

There are many ways to upload datasets to the notebook:

  • One can upload files from the local machine.
  • One can upload files from Google Drive.
  • One can also directly download datasets from Kaggle.

Code to upload from the local machine:

from google.colab import files
uploaded = files.upload()

You can then browse and select the file.
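files.upload() returns a dictionary mapping each chosen filename to its contents as bytes, so as a quick sketch you can inspect what was uploaded:

# 'uploaded' maps filenames to their raw byte contents.
for name, data in uploaded.items():
    print('Uploaded "{0}" ({1} bytes)'.format(name, len(data)))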

Upload files from Google Drive

The PyDrive library is used to upload and download files from Google Drive.

!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# PyDrive reference:
# https://gsuitedevs.github.io/PyDrive/docs/build/html/index.html
# 2. Create & upload a text file.
uploaded = drive.CreateFile({'title': 'Sample upload.txt'})
uploaded.SetContentString('Sample upload file content')
uploaded.Upload()
print('Uploaded file with ID {}'.format(uploaded.get('id')))
# 3. Load a file by ID and print its contents.
downloaded = drive.CreateFile({'id': uploaded.get('id')})
print('Downloaded content "{}"'.format(downloaded.GetContentString()))

You can get the ID of the file you want to load, and use it with the code above.
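For a real dataset you would usually save the file to the Colab filesystem instead of printing its contents; a minimal sketch, where 'YOUR_FILE_ID' and 'dataset.csv' are placeholders:

# Download a Drive file by ID into the local working directory.
downloaded = drive.CreateFile({'id': 'YOUR_FILE_ID'})  # placeholder ID
downloaded.GetContentFile('dataset.csv')  # saved under this local name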

More resources on uploading files from other Google services are available in the official Colab example notebooks.

Uploading datasets from Kaggle

We need to install the Kaggle API and add the authentication JSON file (API token), which you can download from the Kaggle website.

!pip install kaggle

Upload the JSON file to the notebook by uploading a file from the local machine, as shown above.

Create a ~/.kaggle directory:

!mkdir -p ~/.kaggle

Copy the JSON file to the Kaggle directory and change the file permissions:

!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

Now you can use the following command to download any dataset from Kaggle:

!kaggle datasets download -d lazyjustin/pubgplayerstats

You can use the command below to download a competition dataset from Kaggle, but for that you have to participate in the competition.

!kaggle competitions download -c tgs-salt-identification-challenge
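Downloads usually arrive as a zip archive in the current directory; a minimal sketch for unpacking one (the archive name is an assumption, so check what the download actually produced):

!ls                            # see what was downloaded
!unzip -q pubgplayerstats.zip  # assumed archive name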

You can train and run fashion_mnist online without installing any dependencies here.
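As a minimal sketch of what such a notebook looks like, here is a small tf.keras classifier for Fashion-MNIST (the architecture and hyperparameters are illustrative choices, not taken from the linked notebook):

import tensorflow as tf

# Load Fashion-MNIST (downloads automatically) and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)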

Colab is a great tool for everyone who is interested in machine learning; all the educational resources and code snippets needed to use Colab are provided on the official website itself, with notebook examples.

2) Kaggle Kernels

Kaggle Kernels is a cloud computational environment that enables reproducible and collaborative analysis.

One can run both Python and R code in Kaggle Kernels.

Kaggle Kernel runs in a remote computational environment. They provide the hardware needed.

At the time of writing, each kernel editing session is provided with the following resources:

CPU Specifications

  • 4 CPU cores
  • 17 Gigabytes of RAM
  • 6 hours execution time
  • 5 Gigabytes of auto-saved disk space (/kaggle/working)
  • 16 Gigabytes of temporary, scratchpad disk space (outside /kaggle/working)

GPU Specifications

  • 2 CPU cores
  • 14 Gigabytes of RAM

Kernels in action

Once we create an account at kaggle.com, we can choose a dataset to play with and spin up a new kernel with just a few clicks.

Click on “New Kernel”.

You will have a Jupyter notebook up and running. At the bottom you will have a console which you can use, and on the right side you will have various options like:

VERSION

When you Commit & Run a kernel, you execute the kernel from top to bottom in a separate session from your interactive session. Once it finishes, you will have generated a new kernel version. A kernel version is a snapshot of your work including your compiled code, log files, output files, data sources, and more. The latest kernel version of your kernel is what is shown to users in the kernel viewer.

Data Environment

When you create a kernel for a dataset, the dataset will be preloaded into the notebook in the input directory:

../input/

You can also click on add data source to add other datasets.
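A minimal sketch to see which files the data environment has mounted:

import os

# Every attached data source appears under ../input/.
print(os.listdir('../input'))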

Settings

Sharing: You can keep your kernel private, or you can make it public so that others can learn from it.

Adding a GPU: You can add a single NVIDIA Tesla K80 to your kernel. One of the major benefits of using Kernels as opposed to a local machine or your own VM is that the Kernels environment is already pre-configured with GPU-ready software and packages, which can be time consuming and frustrating to set up. To add a GPU, navigate to the “Settings” pane from the kernel editor and click the “Enable GPU” option.
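Once the GPU is enabled, a quick sketch to confirm it is visible from a code cell, using the standard NVIDIA utility:

!nvidia-smi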

Custom packages: The kernel has the default packages; if you need any other package, you can easily add it in one of the following ways:

  • Just enter the library name, and Kaggle will download it for you.
  • Enter the user name/repo name of a GitHub repository.

Both methods work fine for adding custom packages.
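The GitHub option is roughly equivalent to installing from a repository in a code cell; a minimal sketch, with username/reponame as a placeholder:

!pip install git+https://github.com/username/reponame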

Kaggle acts as a perfect platform, providing both data and the compute to work with that data. It also hosts various competitions that one can experiment with to improve one’s skill set.

For more resources regarding Kaggle, follow the link here. If you are new to Kaggle, you should definitely try the Titanic dataset; it comes with awesome tutorials.

For other resources regarding Kaggle, Colab, and machine learning, follow Siraj Raval and Yufeng G.

Since I was not able to cover all the services for training ML models online in this post, there will be a part 2 to this post.

All the resources needed to learn and practice machine learning are open sourced and available online. Compute, datasets, algorithms, and various high-quality tutorials are all available online for free; all you need is an internet connection and the passion to learn.

Thank you for reading till the end. I hope this article is useful, as it addresses a major problem faced by people starting down the path of machine learning and data science. If you enjoyed this article, please let me know by clapping for it. Queries are most welcome; you can follow my posts on Medium at maithreyan surya, and you can also mail me here.

A video intro to using Colab effectively: https://www.youtube.com/playlist?list=PL9a4goxNJut3qyy56AY6Q9Zb2Nm4CQ3Tu

Machine learning has the potential to transform the world, and so do you.


Data Science and Machine Learning practitioner; connect with me on LinkedIn: https://www.linkedin.com/in/maithreyan-kesavan-707b50169/