A Step-by-Step Guide to Detecting Topics in an Audio File

Topic Detection Made Easy

Published in Towards Data Science · 6 min read · Oct 12, 2022

· Introduction
· Introduction to Topic Detection
· Detecting Topics from an Audio File
· Insights
· Conclusion

Introduction

Topic Detection (also known as Topic Modeling) is a technique to identify the broad topics in a given piece of information.

Topic Modeling is sometimes confused with summarization in natural language processing. However, the two are different.

With summarization, the objective is to generate an interpretable and readable textual summary of the information at hand.

With topic modeling, by contrast, the aim is to predict the topics that correspond to the input.

From the perspective of building a data-driven intelligent machine learning system, the goal here is to automatically label the given information with a list of topic classes as defined in the problem.

A high-level workflow of a Topic Detection Pipeline (Image by Author)

From the applicability standpoint, Topic Modeling holds immense potential in driving crucial business decisions. This is because it allows businesses to generate critical insights from large amounts of data.

Moreover, automating the process with machine learning allows it to be applied at scale, increasing business output and efficiency.

While the machine learning models for topic modeling can be built using both supervised and unsupervised learning approaches, the former is much more prevalent in building accurate and custom topic-tagging models.

Therefore, in this post, I will demonstrate how you can extract topics from an audio file. To achieve this, we will use the AssemblyAI API and build a Topic Extractor in Python.

Let’s begin 🚀!

Introduction to Topic Detection

As elaborated above, topic detection encompasses a set of techniques in natural language processing to identify potential topics in a given piece of information.

From the technical perspective of building a machine-learning-based topic detection system, the techniques used to solve this problem fall into two broad categories:

Categorization of Topic Detection Techniques (Image by Author)

Supervised Topic Modeling

As the name suggests, supervised techniques require labeled data to train a topic detection model, which makes this more of a classification problem.

The topic tags, in this approach, have a pre-defined notation and are related to the problem at hand.

Supervised Topic Detection (Image by Author)

Therefore, the model’s output is entirely interpretable, and it is easy to evaluate the performance of a supervised topic modeling approach.

Unsupervised Topic Modeling

In contrast to supervised learning approaches, unsupervised techniques use unlabeled data to generate topics.

Generally speaking, the techniques used to extract topics in an unsupervised fashion fall under the umbrella of traditional clustering algorithms and their variations.

Unsupervised Topic Detection (Image by Author)

The topic tags, in this approach, do not have a pre-defined notation, and it is left to the model to identify potential topics. This makes it difficult to evaluate the performance.

Detecting Topics from an Audio File

Now that we have a brief understanding of the problem and how it is typically solved using machine learning, this section demonstrates how to use the AssemblyAI API to extract topics from an input audio file.

The AssemblyAI API takes a supervised approach to topic modeling and can identify 698 potential categorical topics from the input audio/video. (Read more: here)

For this tutorial, I will use this audio file.

The steps to extract topics from the audio file are demonstrated below:

Step 1: Import Dependencies

To run this project in Python, you should import the following libraries:
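A minimal set of imports for the steps below might look like this. It assumes the requests library for the HTTP calls (an assumption; the text only mentions posting a json object) and pandas for the analysis in the Insights section.

```python
import time          # pause between polling attempts in Step 5

import pandas as pd  # tabulate the detected topics in the Insights section
import requests      # assumed HTTP client for calling the AssemblyAI API
```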

Step 2: Get the API Token

Next, to access the AssemblyAI services, you should generate an API Access token.

Create an account on the AssemblyAI website and get the access token.

For this project, let’s define it as assemblyai_auth_key.
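One way to define the token without hard-coding it is to read it from an environment variable. This is only a sketch: the variable name ASSEMBLYAI_API_KEY and the placeholder fallback are hypothetical choices, not something the API requires.

```python
import os

# Assumption: the token was exported as ASSEMBLYAI_API_KEY (a hypothetical name).
# The placeholder string is used when the variable is not set.
assemblyai_auth_key = os.environ.get("ASSEMBLYAI_API_KEY", "YOUR_API_TOKEN")
```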

Step 3: Define the Transcription Endpoint and Input URL

The headers dictionary holds the API access token and the transcription_endpoint defines the API endpoint to be invoked for transcribing audio files.
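A sketch of these definitions is shown below. The endpoint is AssemblyAI's v2 transcript endpoint; the token and the audio_url value are placeholders to be replaced with your own.

```python
# Placeholder token; replace with the access token from Step 2.
assemblyai_auth_key = "YOUR_API_TOKEN"

# Headers sent with every request: the token plus the JSON content type.
headers = {
    "authorization": assemblyai_auth_key,
    "content-type": "application/json",
}

# Endpoint invoked to create (and later retrieve) transcripts.
transcription_endpoint = "https://api.assemblyai.com/v2/transcript"

# Hypothetical URL; point this at a publicly reachable audio file.
audio_url = "https://example.com/path/to/audio.mp3"
```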

Step 4: Post Transcription Request

We define a post_transcription_request() method that takes the input URL as its argument. In the json object of the request, we set iab_categories to True, which enables topic detection.

The method posts the transcription request to the transcription_endpoint of AssemblyAI and returns the request identifier (id), which we can use to retrieve the results later.
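The request could be sketched as follows, assuming the requests library. The http_post parameter is a hypothetical addition that makes the function easy to exercise without network access; it defaults to requests.post.

```python
import requests

assemblyai_auth_key = "YOUR_API_TOKEN"  # placeholder token
headers = {"authorization": assemblyai_auth_key, "content-type": "application/json"}
transcription_endpoint = "https://api.assemblyai.com/v2/transcript"


def post_transcription_request(audio_url, http_post=requests.post):
    """Submit an audio URL for transcription with topic detection enabled."""
    payload = {
        "audio_url": audio_url,   # file to transcribe
        "iab_categories": True,   # ask for topic detection results
    }
    response = http_post(transcription_endpoint, json=payload, headers=headers)
    response.raise_for_status()   # surface HTTP errors early
    return response.json()["id"]  # identifier used to fetch the result later
```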

Step 5: Get Transcription Result

The next step in the process is to retrieve the results of transcription.

The get_transcription_result() method takes the transcription_id returned by the post_transcription_request() method as an argument.

It returns the results once the status of the request changes to completed or error.
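A polling loop along these lines is one way to do this. It is a sketch that assumes the transcript status ends in either completed or error; the poll_seconds, http_get, and sleep parameters are hypothetical conveniences rather than part of the API.

```python
import time

import requests

assemblyai_auth_key = "YOUR_API_TOKEN"  # placeholder token
headers = {"authorization": assemblyai_auth_key, "content-type": "application/json"}
transcription_endpoint = "https://api.assemblyai.com/v2/transcript"


def get_transcription_result(transcription_id, poll_seconds=10,
                             http_get=requests.get, sleep=time.sleep):
    """Poll the transcript until it reaches a final status, then return it."""
    polling_endpoint = f"{transcription_endpoint}/{transcription_id}"
    while True:
        response = http_get(polling_endpoint, headers=headers)
        response.raise_for_status()
        result = response.json()
        # Assumed final states: "completed" or "error".
        if result["status"] in ("completed", "error"):
            return result
        sleep(poll_seconds)  # wait before asking again
```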

Step 6: Execute the Pipeline

Now that our functions and other variables have been defined, we can execute the topic detection pipeline.

The transcription result (results) returned by AssemblyAI is a JSON response that holds the topics detected in the input file.

We can refer to the iab_categories_result key to view the topic modeling results.
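Putting the two functions together, the pipeline might be sketched like this. The run_topic_detection() helper and its submit and fetch parameters are hypothetical; in practice you would pass post_transcription_request and get_transcription_result.

```python
def run_topic_detection(audio_url, submit, fetch):
    """Submit the audio, wait for the transcript, and pull out the topics."""
    transcription_id = submit(audio_url)  # Step 4: create the request
    results = fetch(transcription_id)     # Step 5: wait for the result
    if results.get("status") == "error":
        # Assumption: a failed request carries a message under "error".
        raise RuntimeError(results.get("error", "transcription failed"))
    return results, results["iab_categories_result"]
```

A call would then look like results, topics = run_topic_detection(audio_url, post_transcription_request, get_transcription_result).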

Below, let’s understand the information this key (iab_categories_result) includes.

results: A list of the topics the model predicted, along with the text that produced each topic.
results.text: Within the results key, the text key holds the transcribed text.
results.labels: Within the results key, the labels key holds the list of topics identified. Each label carries a relevance key, a confidence score (between 0 and 1) that estimates how relevant the label is to the text.
summary: This holds all the unique topics identified in the input file and their relevance to the entire input audio file.
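To make the structure concrete, here is a small hand-written example of what this key might contain. The text, label names, and scores are invented for illustration, and the sketch assumes summary maps each label to its relevance.

```python
# Invented sample shaped like the keys described above (not real API output).
iab_categories_result = {
    "results": [
        {
            "text": "Voters head to the polls next week.",
            "labels": [
                {"label": "NewsAndPolitics>Elections", "relevance": 0.92},
                {"label": "NewsAndPolitics>Politics", "relevance": 0.41},
            ],
        },
    ],
    # Assumed shape: label -> relevance for the whole audio file.
    "summary": {
        "NewsAndPolitics>Elections": 0.88,
        "NewsAndPolitics>Politics": 0.35,
    },
}

# Pick the most relevant label for the first piece of text.
first_segment = iab_categories_result["results"][0]
top_label = max(first_segment["labels"], key=lambda item: item["relevance"])
print(top_label["label"], top_label["relevance"])
```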

Insights

Once the results are ready, we can perform the following analysis to understand the quality and distribution of the topics detected.

For better readability, let’s gather the results from the iab_categories_result key and convert them to a Pandas DataFrame.
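One possible conversion flattens the results list into one row per text and label pair. The topics_to_dataframe() helper and the sample data are hypothetical and only follow the key layout described earlier.

```python
import pandas as pd


def topics_to_dataframe(iab_categories_result):
    """Return one row per (text, label) pair with its relevance score."""
    rows = [
        {"text": segment["text"], "label": item["label"], "relevance": item["relevance"]}
        for segment in iab_categories_result["results"]
        for item in segment["labels"]
    ]
    return pd.DataFrame(rows, columns=["text", "label", "relevance"])


# Invented sample data for illustration.
sample = {
    "results": [
        {"text": "First passage.", "labels": [
            {"label": "NewsAndPolitics>Elections", "relevance": 0.92},
            {"label": "NewsAndPolitics>Politics", "relevance": 0.41},
        ]},
        {"text": "Second passage.", "labels": [
            {"label": "NewsAndPolitics>War", "relevance": 0.77},
        ]},
    ],
}
topics_df = topics_to_dataframe(sample)
print(topics_df)
```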

Next, let’s generate a similar DataFrame for the summary labels in the audio results.
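Assuming summary is a mapping from label to relevance, the summary DataFrame can be built directly from its items. The helper name and the sample values below are made up.

```python
import pandas as pd


def summary_to_dataframe(summary):
    """Turn the label -> relevance mapping into a two-column DataFrame."""
    return pd.DataFrame(list(summary.items()), columns=["label", "relevance"])


# Invented sample data for illustration.
sample_summary = {
    "NewsAndPolitics>Elections": 0.88,
    "NewsAndPolitics>Politics": 0.35,
    "Travel>Europe": 0.12,
}
summary_df = summary_to_dataframe(sample_summary)
print(summary_df)
```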

Distribution of All Topic Labels

The audio was primarily centered around Political News, covering topics like War and Conflicts, Elections, etc.
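A distribution like this can be computed by counting how often each top-level category appears among the labels. The sketch below assumes labels use ">" to separate category levels and runs on invented data, so its counts do not reflect the audio file used here.

```python
import pandas as pd

# Invented labels for illustration only.
topics_df = pd.DataFrame({
    "label": [
        "NewsAndPolitics>War",
        "NewsAndPolitics>Elections",
        "NewsAndPolitics>Politics",
        "Travel>Europe",
    ],
    "relevance": [0.77, 0.92, 0.41, 0.12],
})

# Keep the part before the first ">" and count occurrences per category.
top_level_counts = topics_df["label"].str.split(">").str[0].value_counts()
print(top_level_counts)
```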

Most Relevant Topic to the Audio

We can find the three most relevant topics to the audio using the nlargest() method as follows:
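A short sketch of that call, using a made-up summary DataFrame with label and relevance columns:

```python
import pandas as pd

# Invented summary data for illustration only.
summary_df = pd.DataFrame({
    "label": ["Politics", "Elections", "Travel", "War and Conflicts"],
    "relevance": [0.35, 0.88, 0.12, 0.61],
})

# Keep the three rows with the highest relevance, sorted in descending order.
top_three = summary_df.nlargest(3, "relevance")
print(top_three)
```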

Conclusion

To conclude, in this post, we discussed a popular natural language processing use case of the AssemblyAI API.

Specifically, we saw how to extract topics from a pre-recorded audio file and interpret the results obtained.

Thanks for reading!
