PYTHON LIBRARY INTRO

Deep Learning on AWS for a Fraction of the Cost

Introducing the spot-connect module for python

Carlos Valcarcel
Towards Data Science
8 min read · Jun 5, 2020


Photo by Tim Mossholder on Unsplash

Spot-connect is a python module for any programmer looking for an easy way to use cloud computing.

Originally developed for machine learning projects, spot-connect makes it easy to work with Amazon Web Services and EC2 instances.

The services provided by AWS and the steps to setup an account are covered in the Getting to Know AWS and Setting Up Your AWS Account sections of this older article.

If you plan to try spot-connect and have not already installed and configured the AWS command line (pip install awscli, then aws configure), or if you have not created an access key for your AWS account, please read the referenced sections.
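For reference, the one-time setup from a command prompt looks roughly like this (assuming you have already created an access key in the AWS console):

pip install awscli
aws configure   # prompts for your access key ID, secret key, default region and output format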

Spot instances run on AWS’s spare capacity. There is never enough demand to rent out all of Amazon’s machines, so the excess capacity is rented out at a fraction of the on-demand price to recoup some of the cost.

The only catch is that you can be kicked off the instance if demand increases, which is more likely to happen with high-end instances (you get a 2-minute warning before you’re kicked off so you can save or transfer your work).
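If you want to react to that warning yourself, one option (not part of spot-connect) is to poll the instance metadata service from inside the instance. A minimal sketch, assuming the standard spot interruption endpoint, might look like this:

import time
import urllib.request, urllib.error

# The metadata service returns 200 on this path only once an
# interruption has been scheduled for the instance
NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_pending():
    try:
        with urllib.request.urlopen(NOTICE_URL, timeout=2) as resp:
            return resp.status == 200
    except urllib.error.URLError:   # includes the 404 raised while no notice exists
        return False

while not interruption_pending():
    time.sleep(10)                  # poll every 10 seconds
print("Interruption notice received: save or transfer your work now.")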

The spot-connect module lets you create and manage spot instances. It can be used from a command line, script or jupyter notebook. It focuses on managing EC2 resources (AWS’s cloud computing service) but provides additional functionality to interact with storage options like elastic file systems and S3.

What follows is a walk-through on how to use the module.

Installation

pip install spot-connect 

Works with python 3.*

Command-Line Use Case

Once you have installed the module, you can use it from your prompt:

spot_connect -n instance_1 -p t2.micro -a True

Using spot_connect from the command prompt also lets you connect the prompt directly to your instance.

Launch an instance with spot-connect, then run linux commands directly on the instance using a linked prompt (personal screen-capture).

The example above creates an instance called “instance_1” with the “t2.micro” profile and connects the prompt to the newly created instance; in the screen capture the instance is also linked to an elastic file system named “stockefs”.

The gif has been sped up. Launching an instance usually takes a couple of minutes or more depending on the specs. If you connect using the name of an existing instance then you will simply be reconnected to that instance immediately.

The most useful arguments for spot_connect from the command prompt are listed below (a combined example follows the list):

  • name (-n) : the name of the spot instance*
  • profile (-p) : the name of the profile (preset configs that specify instance type, max bid price, etc… more on these below). **
  • file system (-f) : elastic file system to connect to. If it does not exist, one will be created (does not have to be the same as the instance name).
  • script (-s) : script to run on the instance as soon as it’s ready (if you’re using a linux os on the instance this would have to be a bash script, …).
  • upload (-u) : upload any file you want to the instance (useful for small uploads, otherwise use S3 transfers, more below).
  • remote path (-r) : uploads will be uploaded to this path on the instance.
  • active prompt (-a) : if True, connect the prompt to the instance (can only be done from a command prompt).
  • instance profile (-ip) : access role for instances (different from profile). This grants the instance access to other AWS resources like S3. **

* = required
** = required only when creating an instance (not reconnecting)
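Putting a few of these together, a fuller launch might look like the following (the script, upload and file-system names here are just placeholders):

spot_connect -n instance_2 -p t2.micro -f data -s setup.sh -u train.py -r /home/ec2-user/ -a True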

Having the ability to connect your prompt directly to the instance is useful for troubleshooting scripts or running any other checks.

Picking an Instance Type

All you need to do to launch an instance with spot-connect is give the instance a name and a profile.

Profiles are preset instance specifications that define things like the instance type, maximum bid price, region, and other connection options. These can be found in the profiles.txt file that gets downloaded with the module when it is installed.

Available instance types, pricing, and image/AMI ids all change by region. Spot-connect comes with regional instance data and AMI id data that was scraped from the AWS website.

You can change the default profile settings using the spot-connect module:

from spot_connect import sutils 
sutils.reset_profiles()

The reset_profiles() method will show you a list of regions and then a list of AMIs, each time asking you to choose one. Your local copy of profiles.txt will then be updated to use that region and AMI.

Notebook Use Case

For notebooks and scripts, the SpotInstance class is equivalent to using spot_connect from the command prompt.

from spot_connect.spotted import SpotInstance
instance = SpotInstance('instance1', profile='t2.micro', price=0.01)

This command creates or connects to the “instance1” instance. A SpotInstance object can be instantiated with the same arguments as spot_connect; it also lets you specify individual profile parameters, so you can override any default setting in a profile, for example passing price to submit a custom maximum bid directly.

With SpotInstance you can download and upload files,

instance.upload('test_run.py') # script creates a test.txt file

run scripts,

instance.run('run_test.sh') # bash script is: "python test_run.py\n" 

run commands directly on the instance,

instance.run('ls', cmd=True) # use the cmd option to pass commands

and even terminate the instance.

instance.terminate()          # terminate the instance
instance.refresh_instance()   # update the instance status

You can run commands directly on the instance with the run(<command>, cmd=True) method, but be aware that each call starts in the home directory: changing directories in one call does not carry over to the next.
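For example, assuming the default Amazon Linux home directory, two separate run() calls do not share a working directory:

instance.run('cd efs', cmd=True)  # changes directory only for this call
instance.run('pwd', cmd=True)     # prints /home/ec2-user again, not /home/ec2-user/efs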

Fortunately, you can use \n to chain commands together in a single call

instance.run('cd\nrm test.txt', cmd=True) 

This can make it easy to store complex scripts as functions that can be used directly in python. For example:

def transfer_script(bucket, efs_path):
    '''Download S3 data and place it in the EFS folder'''
    # The first command is the download; "nohup" keeps it alive after logout
    # and the trailing "&" runs it in the background of the instance
    script = 'nohup aws s3 sync ' + bucket + ' ' + efs_path + ' &\n'
    # Use "curpid=$!" to get the process id of the sync job
    script += 'curpid=$!\n'
    # Three-part command in linux:
    # 1) in the background, wait for the sync job to finish
    #    ($curpid is passed to sh -c, which binds it to $0),
    # 2) when it finishes, shut down the instance,
    # 3) place all the output from this watcher in transfer.txt
    script += ("nohup sh -c 'while ps -p $0 &> /dev/null; do sleep 10; done "
               "&& sudo shutdown -h now' $curpid &> transfer.txt &\n")
    return script

script = transfer_script("s3://data", "/home/ec2-user/efs/")
instance.run(script, cmd=True)

Instance Manager

Launching an instance is nice, but the goal of spot-connect is to facilitate distributing workloads across a number of instances. That is where the InstanceManager class comes in.

The InstanceManager lets you launch and keep track of multiple instances at once. It also provides functionality to transfer files between instances and S3, run distributed workloads and more.

Below is an example of how one might organize the execution of tasks across several instances:

1. Instantiate an InstanceManager with efs='data' so instances created by the manager automatically link to the EFS named “data.”

from spot_connect.instance_manager import InstanceManager 
im = InstanceManager(efs='data')

2. Launch an instance using the launch_instance method. Instances created with im can be found in im.instances .

im.launch_instance('monitor', profile='t2.micro')

3. Submit the “monitor” instance to the clone_repo method to clone the Github repo for the project into the efs.

im.clone_repo(im.instances['monitor'],
              'https://github.com/FlorentF9/DeepTemporalClustering.git',
              directory='/home/ec2-user/efs/')  # default path

4. Run the commands cd efs followed by mkdir results to create a results folder in the EFS.

im.instances['monitor'].run('cd efs\nmkdir results', cmd=True)

5. Design a method that can take the arguments for each job and return a script that can be run on each instance.

def runDTC(n_clusters):
    '''Train a DTC model on the package data with n_clusters'''
    # Change the directory to the project directory
    script = 'cd /home/ec2-user/efs/DeepTemporalClustering/\n'
    # Train the DTC algo in the background of the instance
    script += ('nohup python DeepTemporalClustering.py ' + str(n_clusters) +
               ' --savedir /home/ec2-user/efs/results &\n')
    # Get the process id for the training job
    script += 'curpid=$!\n'
    # Shut down the instance once that job is done (same watcher pattern as transfer_script)
    script += ("nohup sh -c 'while ps -p $0 &> /dev/null; do sleep 10; done "
               "&& sudo shutdown -h now' $curpid &> transfer.txt &\n")
    return script

6. Train models for a series of “n_cluster” values. Train each model on a separate instance using the run_distributed_jobs method.

# Get a list of scripts to run, one for each instance 
scripts = [runDTC(i) for i in range(10, 61, 10)]
im.run_distributed_jobs(
    "DTC",          # prefix given to each instance name
    len(scripts),   # number of instances to launch
    scripts,        # the scripts to run on each instance
    'p2.xlarge'     # use this instance type for each instance
)

7. Check on the status of the instances and wait until all of them have terminated (finished their jobs).

In [7]: im.show_instances()
Out[7]: {'monitor': 'running',
'DTC_1': 'terminated',
'DTC_2': 'terminated',
'DTC_3': 'terminated',
'DTC_4': 'shutting-down',
'DTC_5': 'running',
'DTC_6': 'running'}

8. Use the instance_s3_transfer method to upload the results to S3. An instance profile must be provided to let the instance access S3.

im.instance_s3_transfer("/home/ec2-user/efs/results",
                        "s3://bucket_data",
                        "s3access")  # my instance profile

Once in S3 you may download the results directly through the console or using the awscli from a command prompt:

aws s3 sync "s3://bucket_data/results" "<local folder>"

Before closing out this section, it’s worth covering some useful bash commands (combined in a short example after the list):

  • nohup : when placed at the beginning of a command, keeps the command running even if the user logs out.
  • & : when placed after your command, will send the command to the background so you can continue to use the prompt.
  • > something.txt : placed at the end of your command, directs any output generated by the command to the specified file. Add another & after the .txt to send the whole line to the background and keep using the prompt.
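Putting those three together, a long job that keeps running after you disconnect and leaves the prompt free might look like this (the script name is just a placeholder):

nohup python long_job.py > output.txt &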

Wrap-up

That is basically it for the spot-connect module. Here is the Github repo for the project where you can find a walk-through notebook (still needs to be updated to include the DTC example in this article) as well as the notebook I used to scrape the pricing and image data from AWS.

I develop the module depending on my schedule, but the basic functionality presented in this article will be a stopping point for a while. Contributions are very welcome. Here are the features that I think are worth pursuing:

  • 2-minute shut-down warning handling
  • use of AWS spot-fleet functionality from boto3
  • expanding storage management functionality (S3, EFS, EBS)
  • providing the ability to run and coordinate data-sync agents
  • improving general elegance of the code

Thanks for reading!

:)
