
Sentiment Analysis & Entity Extraction with AWS Comprehend

Quick overview of using AWS Lambda, Boto3, and Comprehend for high-level NLP tasks in Python.

Image from Unsplash

Introduction

Amazon Web Services (AWS) has been steadily expanding its Machine Learning services across many domains, and AWS Comprehend is its powerhouse for Natural Language Processing (NLP). Two of the most common NLP projects are Sentiment Analysis and Entity Extraction. Oftentimes we build custom models from scratch using libraries such as NLTK, spaCy, or Transformers. While custom models definitely have their place, and perform especially well when you have domain knowledge of the problem you are attacking, they are also very time-consuming to build from the ground up. This is where AWS Comprehend comes in, offering high-level services for Sentiment Analysis and other NLP tasks. In this article, we will use Comprehend for Sentiment Analysis and Entity Detection. To access these Comprehend services we use another AWS service, AWS Lambda: a serverless computing platform that lets you call them through Boto3, the AWS SDK for Python. I'll provide a list of the services we are going to be using, along with more in-depth definitions, following this blurb, but feel free to skip to the code demonstration of Sentiment Analysis & Entity Extraction if you are already familiar with these services.

Table of Contents

  1. AWS Services
  2. Sentiment Analysis with AWS Comprehend
  3. Entity Detection with AWS Comprehend
  4. Code & Conclusion

AWS Services

AWS Comprehend: AWS's NLP service, which uses ML to perform tasks such as Sentiment Analysis, Entity Extraction, Topic Modeling, and more. For this example we are exploring only two of these tasks.

Amazon Comprehend – Natural Language Processing (NLP) and Machine Learning (ML)

AWS Lambda: A serverless computing service that allows developers to run code without managing or provisioning servers.

AWS Lambda – Serverless Compute – Amazon Web Services

Boto3: The AWS Software Development Kit (SDK) for Python; you can use it in your Lambda functions to call the Comprehend API and its specific services.

Boto3 documentation – Boto3 Docs 1.16.6 documentation

AWS SageMaker: Allows you to build, train, and deploy custom ML models. It also includes various pre-trained AWS models that can be used for specific tasks.

Amazon SageMaker

Sentiment Analysis with AWS Comprehend

Data

You can access data in AWS in many ways. S3 is the most popular storage choice for most developers, and it is the most frequently used in real-world projects or for large datasets. However, since this is a basic example, we will use a few sentences of text in JSON format for our AWS Lambda function to access.
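For instance, a test event for the Lambda function could be as simple as the following sketch; the "text" key is just this demo's assumed convention, not anything Comprehend requires:

```python
# Hypothetical Lambda test event -- the "text" key is an assumption for this demo
event = {
    "text": "The trip to New York was fantastic, but the layover in Virginia was a letdown."
}
```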

Code & Explanation of Sentiment Analysis

Once you have pulled in the data, whether from a JSON payload, S3, or whatever storage you are using, it's time to see the magic of Comprehend. In roughly five lines of code, the dirty work of custom model building and tuning is taken care of. The only two parameters you have to pass are the text you are analyzing and the language that text is in. As you can see in the results below, an overall Sentiment of Positive, Negative, Neutral, or Mixed is returned, along with a confidence score for each of these respective sentiments.
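A minimal sketch of such a Lambda handler, assuming the hypothetical "text" event key from above (detect_sentiment with its Text and LanguageCode parameters is the actual Boto3 API):

```python
import boto3

# Client initialized outside the handler so Lambda can reuse it
# across warm invocations; the execution role needs comprehend:* permissions.
comprehend = boto3.client("comprehend")

def lambda_handler(event, context):
    text = event["text"]  # hypothetical event shape from the example above
    # The two required parameters: the text to analyze and its language code
    response = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    return {
        "Sentiment": response["Sentiment"],            # POSITIVE / NEGATIVE / NEUTRAL / MIXED
        "SentimentScore": response["SentimentScore"],  # confidence per sentiment class
    }
```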

Results from Sentiment Analysis call (Screenshot by Author).

One key point to note with the detect_sentiment call is that the text string cannot be larger than 5,000 bytes of UTF-8 encoded characters. When dealing with larger documents or strings, the solution is the batch_detect_sentiment call, which accepts up to 25 strings/documents, each capped at 5,000 bytes. The batch call works in much the same way; how you preprocess and split your data for it depends on your specific use case.
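As a rough sketch of one such split, assuming no single sentence exceeds the byte cap and reusing the comprehend client from above (long_document is a placeholder for your own text):

```python
MAX_BYTES = 5000  # Comprehend's per-document UTF-8 limit
MAX_DOCS = 25     # batch_detect_sentiment accepts at most 25 documents per call

def chunk_text(text, max_bytes=MAX_BYTES):
    """Greedily pack sentences into chunks that stay under the byte limit."""
    chunks, current = [], ""
    for sentence in text.split(". "):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# long_document is a placeholder for your own large string
chunks = chunk_text(long_document)
response = comprehend.batch_detect_sentiment(
    TextList=chunks[:MAX_DOCS], LanguageCode="en"
)
for result in response["ResultList"]:
    print(result["Index"], result["Sentiment"])
```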

Entity Extraction with AWS Comprehend

Similar to the Sentiment Analysis call, the detect_entities call takes two arguments: the input text and the language of that text. There is also an optional third argument, an endpoint ARN, which points Comprehend at a custom model you have trained for entity extraction rather than the default Comprehend model. As you can see in the results below, the call returns the confidence it has in each detection, the type of entity, the text that was identified as an entity, and the location of that text within the string. For example, New York and Virginia both come back with the LOCATION entity type.

Code & Explanation of Entity Extraction
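A minimal sketch of the handler, again assuming the hypothetical "text" event key (EndpointArn in the comment is the real optional parameter for custom models):

```python
import boto3

comprehend = boto3.client("comprehend")

def lambda_handler(event, context):
    text = event["text"]  # same hypothetical event shape as before
    # Pass EndpointArn="arn:aws:comprehend:..." here instead to route the
    # request to a custom entity recognizer rather than the default model.
    response = comprehend.detect_entities(Text=text, LanguageCode="en")
    return [
        {
            "Text": entity["Text"],    # the span identified as an entity
            "Type": entity["Type"],    # e.g. LOCATION, PERSON, DATE
            "Score": entity["Score"],  # detection confidence
            "Offsets": [entity["BeginOffset"], entity["EndOffset"]],
        }
        for entity in response["Entities"]
    ]
```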

Results from Entity Detection Call (Screenshot by Author).

Like detect_sentiment, detect_entities can't take a string larger than 5,000 bytes of UTF-8 encoded characters. Luckily there is also a batch_detect_entities call, which likewise accepts up to 25 strings/documents, each capped at 5,000 bytes.
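A quick sketch of the batch variant, reusing the chunk_text helper and comprehend client from earlier:

```python
response = comprehend.batch_detect_entities(
    TextList=chunk_text(long_document)[:25], LanguageCode="en"
)
for result in response["ResultList"]:
    for entity in result["Entities"]:
        print(result["Index"], entity["Type"], entity["Text"])
```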

Entire Code & Conclusion

RamVegiraju/ComprehendDemo

Boto3 paired with AWS Comprehend lets non-ML engineers and Data Scientists easily handle tasks that would normally take hours of work. Of course, a custom model built with domain knowledge and time to analyze the problem can perform much more accurately, but Comprehend eliminates much of the effort spent preprocessing, cleaning, building, and training your own model. For those interested in building custom models in NLP or any other ML domain, AWS SageMaker lets you build, train, and deploy your own models in a Jupyter Notebook environment with a much more conventional ML workflow.

I hope that this article has been useful for anyone trying to work with AWS services and NLP. Feel free to leave any feedback in the comments or connect with me on LinkedIn if you're interested in chatting about ML & Data Science. Thank you for reading!

