A Comprehensive Guide to Moderating Sensitive Audio Content

Content Moderation Made Simple

Avi Chawla
Towards Data Science

--

Photo by Oosman Exptal. on Unsplash
· Motivation
· Content Moderation in Audio Files
· Results
· Conclusion

Motivation

Content moderation is the process by which an online platform screens and monitors user-generated content based on rules and policies relevant to that platform.

To put it another way, when a user submits content to a website, it typically goes through a screening process (also called the moderation process) to ensure that it adheres to the website’s rules, and is not inappropriate, harassing, illegal, etc.

When text is exchanged online or on social media, sensitive content can be detected using content moderation models, which are typically AI-driven.

A high-level overview of the severity prediction model (Image by Author)

In addition to transcribing information from audio or video sources, some of the finest Speech-to-Text APIs include content moderation.

Topics relating to drugs, alcohol, violence, sensitive social issues, hate speech, and more are frequently among the sensitive content that content moderation APIs are designed to detect.

Therefore, in this post, I will demonstrate how you can use the AssemblyAI API to detect sensitive content in an audio file.

Let’s begin 🚀!

Content Moderation in Audio Files

With the help of AssemblyAI, you can detect mentions of drug abuse, hate speech, and more in a given audio/video file, along with the severity of each mention.

The image below depicts the transcription workflow of AssemblyAI.

Transcription workflow to use the AssemblyAI API (Image by author)

Below is a step-by-step tutorial on content moderation of audio files using AssemblyAI.

The transcription API will perform speech-to-text conversion and detect the sensitive content (if any) in the given file. These include mentions of accidents, disasters, hate speech, gambling, etc.

Step 1: Get the Token

First, we need to get AssemblyAI’s API token to access the services.

Now that we are ready with the API token, let’s define the headers.
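A minimal sketch of the headers, assuming the token is stored in a placeholder variable (substitute your own key). AssemblyAI expects the API key in the authorization header of every request:

```python
# Hypothetical placeholder -- replace with your own AssemblyAI API token.
API_TOKEN = "your-api-token-here"

# AssemblyAI reads the API key from the "authorization" header.
headers = {
    "authorization": API_TOKEN,
    "content-type": "application/json",
}
```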

Step 2: Upload the File

Next, we will upload the input audio file to the hosting service of AssemblyAI, which will return a URL that will be used for further requests.
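The upload step can be sketched as below. The function name, file path, and chunk size are illustrative choices; the endpoint is AssemblyAI's v2 upload service, which streams the raw file bytes and returns an `upload_url`:

```python
import requests

UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"

def upload_file(path, headers):
    """Stream a local audio file to AssemblyAI's hosting service
    and return the upload_url used in subsequent requests."""

    def read_file(path, chunk_size=5_242_880):
        # Yield the file in chunks so large files are not loaded into memory.
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk

    response = requests.post(UPLOAD_ENDPOINT, headers=headers, data=read_file(path))
    response.raise_for_status()
    return response.json()["upload_url"]
```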

Step 3: Transcription

Once we receive the URL from AssemblyAI, we can proceed with the transcription that will also detect the sensitive content.

Here, we will specify the content_safety parameter as True. This will invoke the content moderation models.
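This request can be sketched as follows, assuming the `upload_url` returned by the hosting service in the previous step. Setting `content_safety` to `True` in the POST body is what invokes the content moderation models alongside transcription:

```python
import requests

TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"

def request_transcription(upload_url, headers):
    """Submit the hosted audio for transcription with
    content moderation enabled, and return the transcript id."""
    payload = {
        "audio_url": upload_url,
        "content_safety": True,  # invoke the content moderation models
    }
    response = requests.post(TRANSCRIPT_ENDPOINT, json=payload, headers=headers)
    response.raise_for_status()
    return response.json()["id"]  # id used to fetch the results later
```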

Step 4: Fetch Results

A GET request using the id returned in the POST request is required as the last step. We will make repeated GET requests until the status of the response is marked as ‘completed’ or ‘error’.
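A polling loop along these lines would do it; the function name and the five-second polling interval are illustrative assumptions:

```python
import time

import requests

def poll_transcription(transcript_id, headers, interval=5):
    """Repeatedly GET the transcript until its status becomes
    'completed' or 'error', then return the full JSON response."""
    url = f"https://api.assemblyai.com/v2/transcript/{transcript_id}"
    while True:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        result = response.json()
        if result["status"] in ("completed", "error"):
            return result
        time.sleep(interval)  # wait before polling again
```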

Step 5: Storing the Output

The response from the transcription services is then stored in a text file.
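A minimal sketch of saving the output, assuming the JSON response has already been fetched as a Python dict; the function name and output file name are illustrative:

```python
import json

def save_transcription(result, path="transcription.txt"):
    """Write the transcript text and the content moderation
    results to a text file."""
    with open(path, "w") as f:
        if result["status"] == "error":
            f.write(result.get("error", "Transcription failed."))
        else:
            f.write(result.get("text") or "")
            f.write("\n\n")
            # content_safety_labels holds the moderation results.
            f.write(json.dumps(result.get("content_safety_labels", {}), indent=2))
```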

Results

Now, let’s interpret the content moderation output.

The results for content moderation are available under the content_safety_labels key of the JSON response received from AssemblyAI.

The outer text field contains the audio file’s text transcription.

Moreover, as shown in the output above, the results of the content safety detection will be added to the content_safety_labels key.

The keys within the content_safety_labels object are described below:

  • results: This represents a list of segments of the audio transcription that the model classified as sensitive content.
  • results.text: This field contains the text transcription that triggered the content moderation model.
  • results.labels: This field contains all the labels corresponding to sentences detected as sensitive content. The confidence and severity metrics are also included with each JSON object in this list.
  • summary: This field contains a confidence score for each predicted label over the entire audio file.
  • severity_score_summary: This field describes the overall severity of each predicted label across the whole audio file.

Each predicted label includes both a confidence value and a severity value, and the two measure different things.

The severity value indicates how severe the flagged content is on a scale of 0 to 1, while the confidence score indicates how confident the model was when predicting that label.
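To make this concrete, here is a trimmed, hypothetical example of the content_safety_labels section: the field names follow the structure described above, but the label, text, and numeric values are made up for illustration.

```python
# Hypothetical, trimmed content_safety_labels object -- values are invented.
content_safety_labels = {
    "status": "success",
    "results": [
        {
            "text": "...a sentence flagged by the model...",
            "labels": [
                # Each label carries its own confidence and severity.
                {"label": "crime_violence", "confidence": 0.91, "severity": 0.55}
            ],
        }
    ],
    # Confidence per label over the whole file.
    "summary": {"crime_violence": 0.91},
    # Severity distribution per label over the whole file.
    "severity_score_summary": {
        "crime_violence": {"low": 0.1, "medium": 0.7, "high": 0.2}
    },
}

# Print each flagged segment with its label, confidence, and severity.
for segment in content_safety_labels["results"]:
    for label in segment["labels"]:
        print(f"{label['label']}: confidence={label['confidence']}, severity={label['severity']}")
```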

Conclusion

To conclude, in this article, we discussed content moderation of audio files using the AssemblyAI API.

The API endpoint provides capabilities to help you recognize sensitive material in audio and video files.

Additionally, I demonstrated how to interpret the content moderation results and identify whether any sensitive content was detected in the audio input.

Thanks for reading!


👉 Get a Free Data Science PDF (550+ pages) with 320+ tips by subscribing to my daily newsletter today: https://bit.ly/DailyDS.