Speech-to-Text with OpenAI’s Whisper

Easy speech to text

Dhilip Subramanian
Towards Data Science

Photo by Guillaume de Germain on Unsplash

OpenAI has recently released a new speech recognition model called Whisper. Unlike DALL-E 2 and GPT-3, Whisper is a free and open-source model.

Whisper is an automatic speech recognition model trained on 680,000 hours of multilingual data collected from the web. According to OpenAI, the model is robust to accents, background noise, and technical language. It also supports transcription in 99 languages, as well as translation from those languages into English.
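To make the difference between transcription and translation concrete, here is a minimal sketch in Python. It assumes the openai-whisper package and ffmpeg are installed. The helper names build_options and run_whisper are hypothetical and chosen only for illustration; load_model and transcribe (with its task and language options) come from the Whisper library itself.

```python
from typing import Optional


def build_options(translate: bool = False, language: Optional[str] = None) -> dict:
    """Build keyword arguments for Whisper's transcribe() call (hypothetical helper)."""
    # "transcribe" keeps the spoken language; "translate" produces English text.
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Language code such as "fr"; leave it out to let Whisper detect the language.
        options["language"] = language
    return options


def run_whisper(audio_path: str, model_name: str = "base", **options) -> str:
    """Run Whisper on an audio file and return the recognized text (hypothetical helper)."""
    # Imported lazily so the rest of the module works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **options)
    return result["text"]
```

For example, run_whisper("interview.mp3", **build_options(translate=True)) would return an English translation of a non-English recording, where interview.mp3 is a placeholder file name.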

This article explains how to convert speech into text using the Whisper model and Python. It does not cover how the model works or its architecture. You can read more about Whisper here.

Whisper comes in five model sizes, listed in the table below, which is taken from OpenAI’s GitHub page. According to OpenAI, four of them also have English-only versions, denoted by the .en suffix. The English-only versions perform better at the tiny.en and base.en sizes, but the difference becomes less significant for small.en and medium.en.

Ref: OpenAI’s GitHub Page
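The naming scheme for the models can be sketched in a few lines of Python. This is only an illustration: the function model_name is a hypothetical helper, and the size list assumes the five sizes published at the time of writing (tiny, base, small, medium, large), of which only the first four have an English-only .en variant.

```python
# Model sizes assumed from OpenAI's GitHub page at the time of writing.
SIZES = ("tiny", "base", "small", "medium", "large")
# Only these sizes have an English-only ".en" variant.
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def model_name(size: str, english_only: bool = False) -> str:
    """Return the name to pass to whisper.load_model() (hypothetical helper)."""
    if size not in SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only variant for size: {size!r}")
        return f"{size}.en"
    return size
```

The returned string is what you would hand to whisper.load_model, for example whisper.load_model(model_name("base", english_only=True)) for English-only audio.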

For this article, I am converting a YouTube video into audio and passing the audio into a whisper…
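One possible way to get audio out of a video is sketched below. It is an assumption on my part and not necessarily the tool used in this article: it shells out to the yt-dlp command-line program, which must be installed along with ffmpeg. The helper names build_download_command and download_audio, and the default file stem "audio", are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_download_command(url: str, stem: str = "audio") -> List[str]:
    """Build a yt-dlp command that extracts the audio track as MP3 (hypothetical helper)."""
    return [
        "yt-dlp",
        "-x",                       # extract audio only
        "--audio-format", "mp3",    # convert the audio to MP3 (needs ffmpeg)
        "-o", f"{stem}.%(ext)s",    # output template; yt-dlp fills in the extension
        url,
    ]


def download_audio(url: str, stem: str = "audio") -> Path:
    """Download the audio for a video URL and return the expected MP3 path."""
    # Assumes yt-dlp is on PATH; check=True raises if the command fails.
    subprocess.run(build_download_command(url, stem), check=True)
    return Path(f"{stem}.mp3")
```

The returned path can then be passed to a loaded Whisper model, for example model.transcribe(str(path)), and the recognized text read from the "text" key of the result.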

