AWS and Python: The Boto3 Package
Domo Arigato, AWS Boto
It’s 2020, and cloud storage and computing will most likely be the direction most businesses take in the coming decades. The prospect of having scalable storage and computing power without purchasing physical equipment is very appealing. The three big dogs of the cloud are Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform.
Since Python is one of the most popular languages (at the time of this article), it makes sense to go over the packages that let working Data Engineers, Data Scientists and Machine Learning Scientists who use Python tap the power of the cloud, no matter which service they choose. Microsoft Azure has the azure package, Google has its google-cloud packages and AWS has boto3. This article will focus on boto3.
Setting It Up
First, we need to set things up. AWS offers a free tier, so go to https://aws.amazon.com and sign up for a free account; you will need a username and credentials to log in to boto3 through the backend. You will also need boto3 installed in your IDE, notebook, etc. That is simply done with pip install boto3 or conda install boto3.
Next, you will need to create a user through Amazon’s Identity and Access Management (IAM) console. Here, you can add users, groups and anything access-related. Create a user for yourself with programmatic access so that you have permanent credentials.
After those steps above, just confirm the settings and the user will receive an access key ID and secret access key. These are needed to start an AWS session on your local computer.
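Rather than hard-coding those keys in a script, one option is to read them from environment variables. A minimal sketch: the variable names below follow the common AWS convention, and the fallback strings are placeholders, not real credentials.

```python
import os

# Read the IAM user's keys from environment variables (conventional AWS names);
# the fallback values here are placeholders, not real credentials.
key = os.environ.get("AWS_ACCESS_KEY_ID", "your_access_key_id")
password = os.environ.get("AWS_SECRET_ACCESS_KEY", "your_secret_access_key")

print(type(key).__name__, type(password).__name__)
```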
Start a Session
This is where your time will be saved. Starting a session is as easy as opening up your IDE or notebook, and using the following:
import boto3

s3 = boto3.client('service_name',
                  region_name='region_name',
                  aws_access_key_id=key,
                  aws_secret_access_key=password)
For context: ‘service_name’ is the AWS service you are connecting to (S3, SNS, Comprehend, Rekognition, etc.) and ‘region_name’ is the region of the computing service you are connecting to. The region is important because in some cases it determines costs. See the AWS website for a list of services and https://howto.lintel.in/list-of-aws-regions-and-availability-zones/ for a list of regions. Only those services that the user has permission to use will be accessible. Sorry, as of now, a client won’t take a list of services, so you will need to create one client per service.
Examples of Services and Functions
S3. AWS’s Simple Storage Service. This is where folders and files are created and storage takes place. This is non-relational storage, so it will take many different types of files. The AWS terms are ‘buckets’ for the top-level containers (think folders) and ‘objects’ for files. Here are a few functions for S3:
import boto3
s3 = boto3.client('s3', ...)  # log in to 's3' via boto3.client, as above

#### create bucket
bucket = s3.create_bucket(Bucket='bucket_name')

#### list buckets
bucket_response = s3.list_buckets()   # requests a list of buckets
buckets = bucket_response['Buckets']  # bucket list lives under the 'Buckets' key
print(buckets)

#### delete bucket
response = s3.delete_bucket(Bucket='bucket_to_be_deleted')
#### upload object
s3.upload_file(Filename='local_file_path',
               Bucket='bucket_name',
               Key='object_name',
               ExtraArgs={'ACL': 'acl_type',               # sets access control list type
                          'ContentType': 'content_type'})  # specifies type of content (html, jpg, etc.)
#### list objects
# obtain many files
response = s3.list_objects(Bucket='bucket_name',
                           MaxKeys=2,  # maximum number of files to list
                           Prefix='prefix_of_file_for_search')
print(response)

# obtain single file
response = s3.head_object(Bucket='bucket_name', Key='file_name')
print(response)

#### download file
s3.download_file(Filename='local_path_to_save_file',
                 Bucket='bucket_name',
                 Key='file_name')
#### delete file
s3.delete_object(Bucket='bucket_name', Key='file_name')
SNS. AWS’s Simple Notification Service. This service sends notifications to groups and users based on conditions set by the admin. Here are some functions for SNS:
import boto3
import pandas as pd
sns = boto3.client('sns', ...)  # log in to 'sns' via boto3.client, as above

#### create topic
topic_arn = sns.create_topic(Name='topic_name')['TopicArn']  # creates topic and grabs its ARN

#### list topics
response = sns.list_topics()

#### delete topic
sns.delete_topic(TopicArn='full_topic_arn_value')

#### create subscription
resp_sms = sns.subscribe(TopicArn=topic_arn,
                         Protocol='delivery_method',  # e.g. 'sms' or 'email'
                         Endpoint='phone_email_etc')

#### list subscriptions
response = sns.list_subscriptions_by_topic(TopicArn=topic_arn)
subs = pd.DataFrame(response['Subscriptions'])  # converts the list to a DataFrame

#### delete subscription
sns.unsubscribe(SubscriptionArn='full_sub_arn')

#### send messages
##### publish to a topic
response = sns.publish(TopicArn=topic_arn,
                       Message='body of message',  # can use string formatting
                       Subject='Subject line')

##### send a single sms
response = sns.publish(PhoneNumber='phone_number',
                       Message='body of message')  # can use string formatting
Comprehend. Comprehend is AWS’s natural language processing service. It can detect what language a document is written in and perform sentiment analysis; translation is handled through the companion Translate service. Here are those functions:
#### text translate
translate = boto3.client('translate', ...)  # log in to 'translate' via boto3.client
response = translate.translate_text(Text=text,  # a string or string variable
                                    SourceLanguageCode='auto',
                                    TargetLanguageCode='language_to_translate_to')

#### detecting language
comprehend = boto3.client('comprehend', ...)  # log in to 'comprehend' via boto3.client
response = comprehend.detect_dominant_language(Text=text)

#### sentiment analysis
response = comprehend.detect_sentiment(Text=text,
                                       LanguageCode='language')
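detect_sentiment returns a dictionary. The sample below hand-writes a response in that shape (the scores are illustrative, not from a real API call) to show how to pull out the overall label and the highest-scoring class:

```python
# Hand-written sample in the shape detect_sentiment returns;
# the scores are illustrative, not from a real API call.
response = {"Sentiment": "POSITIVE",
            "SentimentScore": {"Positive": 0.95, "Negative": 0.01,
                               "Neutral": 0.03, "Mixed": 0.01}}

scores = response["SentimentScore"]
top = max(scores, key=scores.get)  # class with the highest score
print(response["Sentiment"], top)  # POSITIVE Positive
```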
Rekognition. This is AWS’s image recognition service. It does a pretty good job of detecting objects and extracting text from images. Here are some functions:
#### object detection
import boto3
# upload the image files to S3 first (see the S3 section above)
rekog = boto3.client('rekognition', ...)  # log in to 'rekognition' via boto3.client
response = rekog.detect_labels(Image={'S3Object': {'Bucket': 'bucket_name',
                                                   'Name': 'file_name'}},
                               MaxLabels=max_labels,      # maximum number of objects to detect
                               MinConfidence=confidence)  # minimum confidence level of classification

#### text detection
response = rekog.detect_text(Image={'S3Object': {'Bucket': 'bucket_name',
                                                 'Name': 'file_name'}},
                             Filters={'WordFilter': {'MinConfidence': confidence}})
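detect_labels likewise returns a dictionary of labels with confidence scores. This sketch filters a hand-written sample response (the labels and scores are invented for illustration) the way you might filter a real one:

```python
# Hand-written sample in the shape detect_labels returns;
# labels and confidences are invented for illustration.
response = {"Labels": [{"Name": "Dog", "Confidence": 98.2},
                       {"Name": "Pet", "Confidence": 96.5},
                       {"Name": "Blanket", "Confidence": 41.7}]}

# Keep only labels the model is at least 90% confident about.
confident = [lab["Name"] for lab in response["Labels"]
             if lab["Confidence"] >= 90]
print(confident)  # ['Dog', 'Pet']
```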
Obviously, there are many more services and functions than I could summarize here. This should give you a nice start and the opportunity to go explore AWS services through Python. As always, the repos for these items are available through my GitHub at https://github.com/Jason-M-Richards/Data-Science-Toolkit.