· Motivation
· Content Moderation in Audio Files
· Results
· Conclusion
Motivation
Online platforms screen and monitor user-generated content based on rules and policies relevant to that platform.
To put it another way, when a user submits content to a website, it typically goes through a screening process (also called moderation) to ensure that it adheres to the website’s rules and is not inappropriate, harassing, illegal, etc.
When texts are exchanged online or on social media, sensitive content can be detected using content moderation models that are typically AI-driven.
In addition to transcribing information from audio or video sources, some of the best Speech-to-Text APIs also offer content moderation.
Topics relating to drugs, alcohol, violence, sensitive social issues, hate speech, and more are frequently among the categories of sensitive content that content moderation APIs aim to detect.
Therefore, in this post, I will demonstrate how you can use the AssemblyAI API to detect sensitive content in an audio file.
Let’s begin 🚀 !
Content Moderation in Audio Files
With the help of AssemblyAI, you can detect mentions of drug abuse, hate speech, and more in a given audio/video file, along with the severity of each mention.
The image below depicts the transcription workflow of AssemblyAI.
Below is a step-by-step tutorial on content moderation of audio files using AssemblyAI.
The transcription API will perform speech-to-text conversion and detect the sensitive content (if any) in the given file. These include mentions of accidents, disasters, hate speech, gambling, etc.
Step 1: Get the Token
First, we need to get an AssemblyAI API token to access its services.
Now that we are ready with the API token, let’s define the headers.
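A minimal sketch of this step is shown below, assuming the token is read from an environment variable named ASSEMBLYAI_API_TOKEN (the variable name is my own choice for this sketch):

```python
import os
import requests

# AssemblyAI API token (read from an environment variable here -- the name is an assumption)
API_TOKEN = os.environ["ASSEMBLYAI_API_TOKEN"]

# Headers sent with the transcription requests
headers = {
    "authorization": API_TOKEN,
    "content-type": "application/json",
}
```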
Step 2: Upload the File
Next, we will upload the input audio file to the hosting service of AssemblyAI, which will return a URL that will be used for further requests.
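Here is a sketch of the upload request against AssemblyAI's upload endpoint; the local file name and chunk size are placeholders:

```python
upload_endpoint = "https://api.assemblyai.com/v2/upload"

# Stream the local audio file in chunks so large files are not held in memory at once
def read_audio(file_path, chunk_size=5_242_880):
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

# The upload request sends raw bytes, so only the authorization header is needed
upload_response = requests.post(
    upload_endpoint,
    headers={"authorization": API_TOKEN},
    data=read_audio("audio.mp3"),  # placeholder file name
)
audio_url = upload_response.json()["upload_url"]
```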
Step 3: Transcription
Once we receive the URL from AssemblyAI, we can proceed with the transcription that will also detect the sensitive content.
Here, we will specify the content_safety parameter as True. This will invoke the content moderation models.
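A minimal sketch of the transcription request, reusing the headers and audio_url from the previous steps:

```python
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"

# Request a transcription and enable the content moderation models
transcript_request = {
    "audio_url": audio_url,   # URL returned by the upload step
    "content_safety": True,   # invoke the content moderation models
}

transcript_response = requests.post(
    transcript_endpoint, json=transcript_request, headers=headers
)
transcript_id = transcript_response.json()["id"]
```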
Step 4: Fetch Results
A GET request using the id returned in the POST request is required as the last step. We will make repeated GET requests until the status of the response is marked as ‘completed’ or ‘error’.
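A sketch of the polling loop; the 5-second wait between requests is an arbitrary choice:

```python
import time

polling_endpoint = f"https://api.assemblyai.com/v2/transcript/{transcript_id}"

# Poll until the transcription either completes or fails
while True:
    result = requests.get(polling_endpoint, headers=headers).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(5)  # wait a few seconds between polls
```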
Step 5: Storing the Output
The response from the transcription services is then stored in a text file.
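A minimal sketch of this step; the output file name, derived from the transcript id, is my own choice:

```python
import json

# Persist the full JSON response to a text file for later inspection
with open(f"{transcript_id}.txt", "w") as f:
    json.dump(result, f, indent=2)
```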
Results
Now, let’s interpret the content moderation output.
The results for content moderation are available under the content_safety_labels key of the JSON response received from AssemblyAI.
The outer text field contains the audio file’s text transcription.
Moreover, as shown in the output above, the results of the content safety detection are added under the content_safety_labels key.
The keys within the content_safety_labels key are described below:
· results: A list of segments of the audio transcription that the model classified as sensitive content.
· results.text: The text transcription which triggered the content moderation model.
· results.labels: All the labels corresponding to sentences detected as sensitive content. The confidence and severity metrics are also included with each JSON object in this list.
· summary: The confidence score of each predicted label with respect to the entire audio file.
· severity_score_summary: The overall impact of each predicted label on the whole audio file.
Each predicted label will include its confidence and severity values, which measure two different things.
The severity value depicts how severe the flagged content is on a scale of 0–1. The confidence score, on the other hand, reveals how confident the model was when predicting the output label.
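Below is a minimal sketch of how these fields could be read from the saved response; the key names come from the response structure described above, while the loop and print statements are purely illustrative:

```python
content_safety = result["content_safety_labels"]

# Flagged transcription segments and their labels
for segment in content_safety["results"]:
    print("Flagged text:", segment["text"])
    for label in segment["labels"]:
        print("  label:", label["label"],
              "| confidence:", label["confidence"],
              "| severity:", label["severity"])

# Overall confidence and severity of each label across the whole file
print("Summary:", content_safety["summary"])
print("Severity summary:", content_safety["severity_score_summary"])
```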
Conclusion
To conclude, in this article, we discussed content moderation of audio files using the AssemblyAI API.
The API endpoint provides capabilities to help you recognize sensitive material in audio and video files.
Additionally, I demonstrated how to interpret the content moderation results and identify whether any sensitive content was detected in the audio input.
Thanks for reading!
🚀 Subscribe to the Daily Dose of Data Science. Here, I share elegant tips and tricks on Data Science, one tip a day. Receive these tips right in your inbox daily.
🧑💻 Become a Data Science PRO! Get the FREE Data Science Mastery Toolkit with 450+ Pandas, NumPy, and SQL questions.
✉️ Sign up to my email list to never miss another article on data science guides, tricks and tips, Machine Learning, SQL, Python, and more. Medium will deliver my next articles right to your inbox.