What is Speech Recognition?
Speech Recognition, also known as Automatic Speech Recognition or Speech-to-Text, is a field at the intersection of Computer Science and Computational Linguistics that develops techniques enabling computer systems to process human speech and convert it into text. In other words, speech recognition methods and tools translate spoken language into written text.
The best-performing Speech Recognition algorithms draw on techniques and concepts from Artificial Intelligence and Machine Learning. Many of these algorithms improve over time, since they can refine their performance as they learn from additional interactions.
Applications of speech recognition can be found in numerous industries, where the technology helps users, consumers and businesses be more efficient. Virtual assistants such as Apple Siri, Amazon Alexa and Google Assistant use speech recognition to give access to certain functionality through voice commands. Other applications include voice-activated navigation systems in cars, document dictation in healthcare and even voice-based authentication in security settings.
Due to the increasing demand for speech recognition technologies, the field has developed rapidly, making it easy for developers and organisations to incorporate them into their code bases and products respectively. In the following sections, we will explore how to perform speech recognition with Python and the AssemblyAI API in just a few lines of code.
Speech-to-Text using AssemblyAI API
AssemblyAI offers a powerful Speech-To-Text API that is powered by advanced AI and enables users to accurately transcribe audio and video files. In today’s guide we are going to use this API to perform speech recognition on an mp3 audio file.
If you want to follow along with this tutorial, all you need is an API key, which you can get by signing up for a free AssemblyAI account. Once you do so, your key should be visible in your Account section.
For the purposes of this tutorial, I have prepared a short audio file that you can find below. Feel free to create your own or use the one I already created for you.
Going forward, we will use the `requests` library to call the AssemblyAI API, which requires the API key you obtained as well as the headers defined below.
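The setup described above might look roughly like the following. This is a minimal sketch, not the article's original Gist: the environment variable name `ASSEMBLYAI_API_KEY`, the helper `build_headers` and the exact header names are assumptions for illustration, so check them against the AssemblyAI documentation before relying on them.

```python
import os

# Assumption: the key is read from a (hypothetically named) environment
# variable instead of being hard-coded in the script.
API_KEY = os.environ.get("ASSEMBLYAI_API_KEY", "")


def build_headers(api_key: str) -> dict:
    """Return the HTTP headers sent along with each API request.

    Assumption: the API expects the key in an `authorization` header
    and JSON request bodies.
    """
    return {
        "authorization": api_key,
        "content-type": "application/json",
    }


headers = build_headers(API_KEY)
```

Reading the key from the environment keeps it out of version control, which matters once the script is shared.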
The next step is to read in the file and upload it to AssemblyAI’s hosting service in order to get back a link that we’ll then use to transcribe the actual audio.
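As a rough sketch of this upload step, the snippet below streams the file in chunks and returns the link from the response. The endpoint URL, the `upload_url` response field and the function names are assumptions made for illustration, not details confirmed by this article.

```python
import requests

# Assumption: URL of the upload endpoint in AssemblyAI's v2 API.
UPLOAD_ENDPOINT = "https://api.assemblyai.com/v2/upload"


def read_in_chunks(path, chunk_size=5_242_880):
    """Yield the file's bytes in chunks (5 MB by default)."""
    with open(path, "rb") as audio_file:
        while True:
            chunk = audio_file.read(chunk_size)
            if not chunk:
                break
            yield chunk


def upload_audio(path, api_key):
    """Upload a local audio file and return the link to the hosted copy."""
    response = requests.post(
        UPLOAD_ENDPOINT,
        headers={"authorization": api_key},
        data=read_in_chunks(path),  # a generator makes requests stream the body
        timeout=60,
    )
    response.raise_for_status()
    # Assumption: the response JSON carries the link under "upload_url".
    return response.json()["upload_url"]
```

Streaming the body avoids loading a large recording into memory all at once.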
Now that we have successfully uploaded our audio file to AssemblyAI’s hosting service, we can send the upload URL returned in the previous step to AssemblyAI’s transcription endpoint. An example response from the endpoint is shown in the comments at the end of the Gist.
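Submitting the uploaded file for transcription could be sketched as follows. The endpoint URL, the `audio_url` request field and the `id` response field are assumptions here, and `request_transcript` is a hypothetical helper name.

```python
import requests

# Assumption: URL of the transcript endpoint in AssemblyAI's v2 API.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def request_transcript(audio_url, api_key):
    """Ask the API to transcribe the audio at `audio_url`; return the job ID."""
    response = requests.post(
        TRANSCRIPT_ENDPOINT,
        headers={"authorization": api_key, "content-type": "application/json"},
        json={"audio_url": audio_url},  # assumed name of the request field
        timeout=30,
    )
    response.raise_for_status()
    # Assumption: the response JSON identifies the job under "id".
    return response.json()["id"]
```

The returned ID is what the next step uses to look up the transcription result.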
We can finally access the transcription result by providing the transcript ID we received in the response from the transcript endpoint in the previous step. Note that we will have to make repeated `GET` requests until the status in the response is either `completed`, or `error` in case the audio file failed to process.
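The polling loop described above can be sketched like this. The URL pattern, the `error` field in the response, the polling interval and the attempt limit are all assumptions chosen for illustration.

```python
import time

import requests

# Assumption: URL of the transcript endpoint in AssemblyAI's v2 API.
TRANSCRIPT_ENDPOINT = "https://api.assemblyai.com/v2/transcript"


def wait_for_transcript(transcript_id, api_key, poll_interval=3.0, max_attempts=200):
    """Poll the transcript endpoint until the job completes or fails."""
    url = f"{TRANSCRIPT_ENDPOINT}/{transcript_id}"
    for _ in range(max_attempts):
        response = requests.get(url, headers={"authorization": api_key}, timeout=30)
        response.raise_for_status()
        result = response.json()
        if result["status"] == "completed":
            return result
        if result["status"] == "error":
            # Assumption: failure details are reported under "error".
            raise RuntimeError(f"Transcription failed: {result.get('error')}")
        time.sleep(poll_interval)  # still in progress, wait before retrying
    raise TimeoutError("Transcript was not ready after the maximum number of attempts")
```

Capping the number of attempts keeps the script from looping forever if the job never reaches a final status.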
An example response from the transcript endpoint which is received upon successful completion is shared as a comment at the end of the Gist below.
Finally, assuming that the file was processed successfully, we can write the final response to a text file.
For our example audio file, the output we get from the AssemblyAI Speech-To-Text API, which will be written to the output text file, is
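Writing the result to disk needs only the standard library. In this sketch, the `text` and `status` fields of the response and the default file name `transcript.txt` are assumptions, and `save_transcript` is a hypothetical helper.

```python
from pathlib import Path


def save_transcript(result, output_path="transcript.txt"):
    """Write the transcribed text from a completed response to a text file."""
    if result.get("status") != "completed":
        raise ValueError("Transcript is not complete, nothing to write")
    destination = Path(output_path)
    # Assumption: the transcribed text is stored under the "text" key.
    destination.write_text(result.get("text") or "", encoding="utf-8")
    return destination
```

Calling `save_transcript(result)` with a completed response creates `transcript.txt` in the current working directory.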
You know, demons on TV like that. And for people to expose themselves to being rejected on TV or humiliated by fear factor or.
which is pretty accurate!
Full Code
The full code we used to call the AssemblyAI Speech-to-Text API can be found in the GitHub Gist below. To summarise, the code uploads an audio file from your local system to AssemblyAI’s hosting service and then submits it to the transcript service, which performs the speech-to-text task. Finally, we process the response and store it in a text file on our local filesystem.
Note that in this tutorial I’ve uploaded a local file to the AssemblyAI hosting service, but you can also submit an audio URL from any cloud service such as AWS. For more details and examples, you can refer to this section in the official AssemblyAI documentation.
Final Thoughts
Speech Recognition is a rapidly evolving field that has benefited greatly from major advances in Machine Learning, Artificial Intelligence and Natural Language Processing. Due to the increasing demand for speech-to-text applications, a wide variety of tools have become available that enable quick access to such technologies.
In today’s article we explored how you can quickly perform Speech Recognition in Python with only a few lines of code, using AssemblyAI, a powerful API used by thousands of organisations across the globe. The API offers a wide range of features we haven’t covered in this article, which you can explore here.