
Introduction
In one of my recent articles, we explored how to perform offline speech recognition with the AssemblyAI API and Python. We uploaded the desired audio file to a hosting service and then used the API's transcript endpoint to convert it to text.
In today's guide, we will show how to perform real-time speech-to-text using AssemblyAI's real-time transcription feature. This feature lets us transcribe audio streams as they are recorded, with high accuracy.
Let’s get started!
Installing PyAudio and Websockets
To build real-time speech recognition, we need a tool that lets us record audio. PyAudio is a Python library that provides bindings for PortAudio, the cross-platform audio I/O library. With it, we can play or record audio in real time on pretty much any platform, including macOS, Linux and Windows.
First, we need to install PortAudio. On macOS you can do so using Homebrew:
brew install portaudio
and then install PyAudio from PyPI:
pip install pyaudio
If you are on Windows, you can instead install PyAudio from a precompiled wheel file that matches your Python version.
Additionally, we'll need to install the websockets library:
pip install websockets
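Before going further, it is worth confirming that both packages can actually be imported. The sketch below uses only the standard library's importlib.util.find_spec. The helper name check_dependencies is our own. Note that PyAudio is imported under the module name pyaudio.

```python
import importlib.util


def check_dependencies(modules=("pyaudio", "websockets")):
    """Return a mapping of module name -> whether it can be imported."""
    # find_spec returns None when a top-level module cannot be found.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'installed' if ok else 'missing'}")
```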
To follow along with this tutorial, all you need is an API key, which you get by signing up for an AssemblyAI account. Once you do so, your key should be visible in the Account section. You'll also need to upgrade your account (from the Billing section) to access premium features.
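Rather than pasting the key into the source code, you can read it from an environment variable. This is a minimal sketch. The variable name ASSEMBLYAI_API_KEY and the helper get_api_key are our own conventions, not something AssemblyAI requires.

```python
import os


def get_api_key(env_var="ASSEMBLYAI_API_KEY"):
    """Read the AssemblyAI API key from an environment variable.

    The default variable name is a hypothetical convention for this tutorial.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of a confusing auth error later.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your AssemblyAI API key."
        )
    return key
```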
Real-time Speech-to-Text using AssemblyAI API
AssemblyAI offers a speech-to-text API, built with advanced artificial intelligence methods, that can transcribe both audio and video files. In today's guide, we are going to use this API to perform speech recognition in real time!
The first thing we need to do is open a stream with PyAudio. To do so, we specify a few parameters: the number of frames per buffer, the sample rate, the sample format and the number of channels.
Such a stream receives its input from our microphone.
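Here is a sketch of such a stream. The concrete values (3,200 frames per buffer, 16 kHz, 16-bit mono) are assumptions chosen for speech and may not match the author's exact settings. The helpers chunk_duration_ms and chunk_size_bytes are hypothetical additions that make the chunk arithmetic explicit. pyaudio is imported inside open_microphone_stream so the arithmetic can be checked without a microphone.

```python
# Assumed audio parameters: 16-bit mono PCM at 16 kHz.
FRAMES_PER_BUFFER = 3200   # frames read per chunk
RATE = 16000               # sample rate in Hz
CHANNELS = 1               # mono
SAMPLE_WIDTH_BYTES = 2     # 16-bit samples (pyaudio.paInt16)


def chunk_duration_ms(frames=FRAMES_PER_BUFFER, rate=RATE):
    """Length of audio contained in one chunk, in milliseconds."""
    return 1000 * frames / rate


def chunk_size_bytes(frames=FRAMES_PER_BUFFER, channels=CHANNELS,
                     sample_width=SAMPLE_WIDTH_BYTES):
    """Number of raw bytes one stream.read(frames) call returns."""
    return frames * channels * sample_width


def open_microphone_stream():
    """Open a PyAudio input stream using the parameters above."""
    import pyaudio  # imported lazily so the helpers above work without it

    pa = pyaudio.PyAudio()
    stream = pa.open(
        format=pyaudio.paInt16,
        channels=CHANNELS,
        rate=RATE,
        input=True,                        # record from the default input device
        frames_per_buffer=FRAMES_PER_BUFFER,
    )
    return pa, stream
```

With these values, each stream.read(FRAMES_PER_BUFFER) call returns 3,200 frames of 2 bytes each, i.e. 6,400 bytes or 200 ms of audio. This is small enough to keep latency low and large enough to avoid flooding the connection with tiny messages.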
Now that we've opened the stream, we need to pass its audio to AssemblyAI in real time over a websocket in order to perform speech recognition. To do so, we define an asynchronous function that opens a websocket so that we can send and receive data at the same time. Inside it, we define two inner asynchronous functions: one reads chunks of audio data and sends them, and the other receives data coming back.
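The following is a minimal, offline sketch of why two inner coroutines are needed. Both coroutines share one connection and run concurrently through asyncio.gather. EchoConnection is a hypothetical in-memory stand-in for the websocket that simply hands back whatever was sent, so the example runs without a network or an API key. run_session is likewise our own name.

```python
import asyncio


class EchoConnection:
    """Hypothetical stand-in for a websocket: recv() returns what send() got."""

    def __init__(self):
        self._queue = asyncio.Queue()

    async def send(self, message):
        await self._queue.put(message)

    async def recv(self):
        return await self._queue.get()


async def run_session(ws, chunks):
    """Send chunks and receive replies concurrently over one connection."""
    received = []

    async def send():
        # Producer: push every chunk, yielding control between sends.
        for chunk in chunks:
            await ws.send(chunk)
            await asyncio.sleep(0)

    async def receive():
        # Consumer: read exactly as many replies as chunks were sent.
        for _ in chunks:
            received.append(await ws.recv())

    # Both inner coroutines run at the same time on the same event loop.
    await asyncio.gather(send(), receive())
    return received
```

Create the connection inside a running coroutine, for example run_session(EchoConnection(), ["a", "b"]) awaited from the function passed to asyncio.run, so its queue belongs to the active event loop.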
The first inner function sends messages over the websocket connection.
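Below is a sketch of the sending side. It assumes the message format of AssemblyAI's v2 real-time API: each message is a JSON object carrying base64-encoded raw audio under the key audio_data. Check the current documentation, since the format may have changed. The names encode_audio_message and send_audio, and the read_chunk and max_chunks parameters, are hypothetical.

```python
import asyncio
import base64
import json


def encode_audio_message(chunk: bytes) -> str:
    """Wrap raw PCM bytes in JSON (assumed format: base64 under "audio_data")."""
    return json.dumps({"audio_data": base64.b64encode(chunk).decode("utf-8")})


async def send_audio(ws, read_chunk, max_chunks=None):
    """Read audio chunks and send them over the websocket.

    read_chunk: zero-argument callable returning bytes, for example
        lambda: stream.read(FRAMES_PER_BUFFER, exception_on_overflow=False)
    max_chunks: stop after this many chunks (None means run until cancelled).
    """
    sent = 0
    while max_chunks is None or sent < max_chunks:
        chunk = read_chunk()          # a real stream.read() blocks briefly here
        await ws.send(encode_audio_message(chunk))
        sent += 1
        await asyncio.sleep(0.01)     # yield so the receiving coroutine can run
    return sent
```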
The second inner function receives data over the websocket connection.
Note that these functions do not handle any specific exceptions. You may wish to handle different exceptions and error codes as your use case requires. For details on the error conditions, refer to the section of AssemblyAI's official documentation that defines the closing and status codes.
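Below is a sketch of the receiving side. It assumes each incoming message is JSON with a message_type field and, for transcripts, a text field, as in AssemblyAI's v2 real-time API. Session notifications without text are skipped. The names parse_transcript and receive_transcripts, and the max_messages and on_text parameters, are hypothetical.

```python
import asyncio
import json


def parse_transcript(message: str):
    """Extract (message_type, text) from a JSON message.

    Assumed format: {"message_type": "...", "text": "..."}; messages
    without text (e.g. session notifications) yield an empty string.
    """
    data = json.loads(message)
    return data.get("message_type", ""), data.get("text", "")


async def receive_transcripts(ws, max_messages=None, on_text=print):
    """Receive messages and pass non-empty transcript text to on_text."""
    received = 0
    while max_messages is None or received < max_messages:
        message_type, text = parse_transcript(await ws.recv())
        received += 1
        if text:
            on_text(f"[{message_type}] {text}")
    return received
```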
Finally, we define the main asynchronous function, which makes use of the two functions described above.
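The following sketch shows what such a main function can look like with everything in one place. It rests on several assumptions:

- the v2 real-time endpoint URL with a sample_rate query parameter;
- an Authorization header carrying the raw API key;
- the JSON message format described earlier;
- websockets 13 or newer, where the header argument of websockets.asyncio.client.connect is additional_headers (the legacy client called it extra_headers).

The environment variable ASSEMBLYAI_API_KEY is our own choice. AssemblyAI's streaming API may have changed, so check the current documentation before relying on this.

```python
import asyncio
import base64
import json
import os

SAMPLE_RATE = 16000
FRAMES_PER_BUFFER = 3200
# Assumed (v2) real-time endpoint; verify against AssemblyAI's current docs.
BASE_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def build_url(sample_rate=SAMPLE_RATE):
    """Endpoint URL with the audio sample rate passed as a query parameter."""
    return f"{BASE_URL}?sample_rate={sample_rate}"


async def speech_to_text():
    # Imported lazily so build_url() works even without these packages.
    import pyaudio
    from websockets.asyncio.client import connect  # websockets >= 13

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                     input=True, frames_per_buffer=FRAMES_PER_BUFFER)
    # Hypothetical environment variable holding the API key.
    headers = {"Authorization": os.environ["ASSEMBLYAI_API_KEY"]}

    try:
        async with connect(build_url(), additional_headers=headers,
                           ping_interval=5, ping_timeout=20) as ws:
            # Assumed: the server first sends a session-start message.
            print(await ws.recv())

            async def send():
                while True:
                    # Blocking read of one chunk (~200 ms); fine for a demo.
                    chunk = stream.read(FRAMES_PER_BUFFER,
                                        exception_on_overflow=False)
                    payload = base64.b64encode(chunk).decode("utf-8")
                    await ws.send(json.dumps({"audio_data": payload}))
                    await asyncio.sleep(0.01)  # let receive() run

            async def receive():
                while True:
                    message = json.loads(await ws.recv())
                    if message.get("text"):
                        print(message["text"])

            # Send and receive concurrently until cancelled or an error occurs.
            await asyncio.gather(send(), receive())
    finally:
        # Release the microphone even if the connection fails.
        stream.stop_stream()
        stream.close()
        pa.terminate()
```

Reading the stream synchronously keeps the sketch simple and avoids sharing the PyAudio stream across threads. The cost is that the event loop is blocked for about one chunk's duration on each read.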
Full Code
The full code for performing real-time speech-to-text with AssemblyAI's API combines the stream setup, the two inner functions and the main asynchronous function described above.
Demonstration
We now have complete code that opens an audio stream, sends the input from our microphone to the AssemblyAI API, and receives the responses asynchronously.
To run the code, all we need to do is pass our async function to asyncio.run():
asyncio.run(speech_to_text())
You should now be able to speak into your microphone and see the streamed audio transcribed.
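Because the send and receive loops run until interrupted, pressing Ctrl+C is the natural way to stop the program. The following small wrapper turns that KeyboardInterrupt into a clean exit instead of a traceback. run_until_interrupted is a hypothetical helper of our own. Any cleanup in the coroutine's finally blocks still runs, because asyncio.run cancels the task before the interrupt propagates.

```python
import asyncio


def run_until_interrupted(coro_factory):
    """Run an async entry point, treating Ctrl+C as a normal way to stop."""
    try:
        asyncio.run(coro_factory())
        return "finished"
    except KeyboardInterrupt:
        print("Stopped by user.")
        return "interrupted"


# Usage, with speech_to_text defined as in this article:
# run_until_interrupted(speech_to_text)
```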
For the purposes of this tutorial, I've uploaded the audio I streamed from my computer while performing real-time speech-to-text.
The output after running the program we’ve just written using the above audio stream is shown below:
Final Thoughts
In today's article, we explored how to perform speech recognition in real time by opening an audio stream and a websocket connection to interact with the AssemblyAI API. Note that we covered only a small subset of the features the AssemblyAI API provides. Make sure to check the full list in AssemblyAI's documentation.