AWS and Python: The Boto3 Package

Domo Arigato, AWS Boto

The Data Detective
5 min read · Jan 7, 2020

It's 2020, and cloud storage and computing will most likely be the direction most businesses take in the coming decades. The prospect of scalable storage and computing power without having to purchase physical equipment is very appealing. The three big dogs of the cloud are Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform.

Since Python is one of the most popular languages (at the time of this article), it makes sense to go over the packages that let the working Data Engineer, Data Scientist or Machine Learning Scientist who uses Python harness the power of the cloud, no matter which service they choose. Microsoft Azure has the azure package, Google has its google-cloud packages and AWS has boto3. This article will focus on boto3.

Setting It Up

First, we need to set things up. AWS offers a free tier, and you can sign up for free. You will need credentials (an access key ID and secret access key) for boto3 to authenticate behind the scenes, so go to https://aws.amazon.com and sign up for a free account. You will also need boto3 installed in your IDE, notebook, etc. That is simply done with a pip or conda install.
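Either of the following, run from a terminal, will install it (conda-forge is one common channel for the conda route):

pip install boto3
conda install -c conda-forge boto3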

Next, you will need to create a user through Amazon's Identity and Access Management (IAM) console. Here, you can add users, groups and anything access related. Create a user for yourself so that you have a permanent set of credentials for authentication purposes.

Step 1: set user name and access type
Step 2: set permissions (for this article, you will only need SNS, S3, Comprehend and Rekognition). Make sure to select FullAccess for each. If this is a personal account, you can give yourself FullAccess to all Amazon services: just enter FullAccess in the search box and check everything.
Step 3: add tags (optional); these are key: value pairs that serve as additional identifiers for finer-grained control

After the steps above, just confirm the settings and the user will receive an access key ID and secret access key. These will be needed to start an AWS session on your local computer.
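A quick aside: hardcoding keys in a script is risky. A safer pattern is to keep them in environment variables and read them at runtime; the two names below are the standard AWS ones, and boto3 will even pick them up automatically if you omit the explicit arguments later on:

import os

key = os.environ['AWS_ACCESS_KEY_ID']
password = os.environ['AWS_SECRET_ACCESS_KEY']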

Start a Session

This is where your time will be saved. Starting a session is as easy as opening up your IDE or notebook, and using the following:

import boto3

s3 = boto3.client('service_name',
                  region_name='region_name',
                  aws_access_key_id=key,
                  aws_secret_access_key=password)

For context: 'service_name' is the AWS service you are connecting to (S3, SNS, Comprehend, Rekognition, etc.) and region_name is the region of the computing service you are connecting to. The region is important because in some cases it determines costs. See the AWS website for a list of services and https://howto.lintel.in/list-of-aws-regions-and-availability-zones/ for a list of regions. Only those services the user has permission to use will be accessible. Sorry, as of now, it won't take a list of services, so you will need to create a client for each one, one at a time.
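So, for example, connecting to both S3 and SNS means two separate calls (the region here is a placeholder, and key and password are the credentials from above):

import boto3

s3 = boto3.client('s3', region_name='us-east-1',
                  aws_access_key_id=key,
                  aws_secret_access_key=password)
sns = boto3.client('sns', region_name='us-east-1',
                   aws_access_key_id=key,
                   aws_secret_access_key=password)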

Examples of Services and Functions

S3. AWS's Simple Storage Service. This is where folders and files are created and storage takes place. This is non-relational storage, so it will take many different types of files. The AWS term for the top-level containers is 'buckets' and files are called 'objects'. Here are a few functions for S3:

import boto3, log in to 's3' via boto3.client (as shown above)

#### create bucket
bucket = s3.create_bucket(Bucket='bucket_name')

#### list buckets
bucket_response = s3.list_buckets() # requests a list of buckets
buckets = bucket_response['Buckets'] # pulls the bucket list from the 'Buckets' key
print(buckets)
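Each entry in that list is a dictionary of bucket metadata; to see just the names, you can loop over it:

for bucket in buckets:
    print(bucket['Name'])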

#### delete bucket (it must be emptied of objects first)
response = s3.delete_bucket(Bucket='bucket_to_be_deleted')

#### upload object
s3.upload_file(Filename='local_file_path',
               Bucket='bucket_name',
               Key='object_name',
               ExtraArgs={'ACL': 'acl_type', # sets the access control list type
                          'ContentType': 'content_type'}) # specifies the type of content (html, jpg, etc.)
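As a concrete (hypothetical) example, uploading a report as a publicly readable web page might look like this; the bucket and file names are my own placeholders:

s3.upload_file(Filename='report.html',
               Bucket='my-report-bucket', # assumed bucket name
               Key='report.html',
               ExtraArgs={'ACL': 'public-read',
                          'ContentType': 'text/html'})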

#### list objects
# obtain many files
response = s3.list_objects(Bucket='bucket_name',
                           MaxKeys=2, # maximum number of files to list
                           Prefix='prefix_of_file_for_search')
print(response)
# obtain a single file
response = s3.head_object(Bucket='bucket_name', Key='file_name')
print(response)
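The list_objects response keeps its matches under the 'Contents' key (present only when at least one object matched), so a short sketch for pulling out each file name and size might be:

for obj in response['Contents']:
    print(obj['Key'], obj['Size'])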
#### download file
s3.download_file(Filename='local_path_to_save_file',
                 Bucket='bucket_name',
                 Key='file_name')

#### delete file

s3.delete_object(Bucket='bucket_name', Key='file_name')

SNS. AWS's Simple Notification Service. This service sends notifications to groups and users based on conditions set by the admin. Here are some functions for SNS:

import boto3, log in to 'sns' via boto3.client

#### create topic
response = sns.create_topic(Name='topic_name')['TopicArn'] # creates the topic and grabs its ARN from the response

#### list topics
response=sns.list_topics()

#### delete topic
sns.delete_topic(TopicArn='full_topic_arn_value')

#### create subscription
resp_sms = sns.subscribe(TopicArn=topic_arn, # the ARN returned by create_topic
                         Protocol='delivery_method', # e.g. sms or email
                         Endpoint='phone_email_etc')

#### list subscriptions
import pandas as pd

response = sns.list_subscriptions_by_topic(TopicArn=topic_arn)
subs = pd.DataFrame(response['Subscriptions']) # converts the list to a DataFrame

#### delete subscription
sns.unsubscribe(SubscriptionArn='full_sub_arn')

#### send messages
##### publish to a topic
response = sns.publish(TopicArn=topic_arn,
                       Message='body of message', # can use string formatting
                       Subject='Subject line')

##### sending single sms
response = sns.publish(PhoneNumber='phone_number',
                       Message='body of message') # can use string formatting

Comprehend. Comprehend is AWS's natural language processing service. It can determine what language a text is written in and perform sentiment analysis; translation is actually handled by the companion Translate service, which is used the same way. Here are those functions:


#### text translation
translate = 'translate' via boto3.client
response = translate.translate_text(Text='text_to_translate', # a string, or a variable holding one
                                    SourceLanguageCode='auto',
                                    TargetLanguageCode='language_to_translate_to')
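The translated text comes back under the response's 'TranslatedText' key. A small (hypothetical) example, assuming the client above:

response = translate.translate_text(Text='Hello, world',
                                    SourceLanguageCode='auto',
                                    TargetLanguageCode='es') # translate to Spanish
print(response['TranslatedText'])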

#### detecting language
comprehend = 'comprehend' via boto3.client
response = comprehend.detect_dominant_language(Text='text_to_analyze') # a string, or a variable holding one

#### sentiment analysis
comprehend = 'comprehend' via boto3.client
response = comprehend.detect_sentiment(Text='text_to_analyze',
                                       LanguageCode='language')
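The response carries an overall label plus per-class scores; a quick sketch of reading them, assuming the client above:

response = comprehend.detect_sentiment(Text='I love this package!',
                                       LanguageCode='en')
print(response['Sentiment']) # e.g. POSITIVE
print(response['SentimentScore']) # dict of Positive/Negative/Neutral/Mixed scores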

Rekognition. This is AWS’s image recognition service. It actually does a pretty good job of detecting objects and extracting text from images. Here are some functions:

#### object detection
import boto3, log in to 's3' via boto3.client
# upload the image file to a bucket first
rekog = 'rekognition' via boto3.client
response = rekog.detect_labels(Image={'S3Object': {'Bucket': 'bucket_name',
                                                   'Name': 'file_name'}},
                               MaxLabels=10, # maximum number of objects to detect
                               MinConfidence=80) # confidence threshold for classification
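Each detected label comes back with a name and a confidence score; printing them is a one-line loop:

for label in response['Labels']:
    print(label['Name'], label['Confidence'])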

#### text detection
import boto3, log in to 's3' via boto3.client
# upload the image file to a bucket first
rekog = 'rekognition' via boto3.client
response = rekog.detect_text(Image={'S3Object': {'Bucket': 'bucket_name',
                                                 'Name': 'file_name'}})
# confidence thresholds for detect_text go under its optional Filters argument
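The detections arrive as a list of word and line entries; pulling out the recognized strings might look like this, assuming the response above:

for detection in response['TextDetections']:
    print(detection['DetectedText'], detection['Confidence'])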

Obviously, there are many more services and functions than could be summarized here. This should give you a nice start and the opportunity to explore AWS services through Python. As always, the repos for these items are available through my GitHub at https://github.com/Jason-M-Richards/Data-Science-Toolkit.
