Speech-to-Text with OpenAI’s Whisper

Easy speech to text

Dhilip Subramanian

Published in

Towards Data Science

4 min read · Oct 1, 2022

Photo by Guillaume de Germain on Unsplash

OpenAI recently released a new speech recognition model called Whisper. Unlike DALL·E 2 and GPT-3, Whisper is free and open source.

Whisper is an automatic speech recognition model trained on 680,000 hours of multilingual data collected from the web. According to OpenAI, the model is robust to accents, background noise and technical language. In addition, it supports transcription in 99 different languages, as well as translation from those languages into English.
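As a rough sketch of what transcription and translation look like in code, the snippet below uses the open-source `openai-whisper` Python package (installed with `pip install openai-whisper`, with `ffmpeg` available on the system). The helper names, the `base` model default and the file name `audio.mp3` are assumptions made for illustration, not details from this article.

```python
def decode_options(translate_to_english: bool = False) -> dict:
    """Build keyword arguments for Whisper's transcribe() call.

    task="transcribe" keeps the text in the spoken language;
    task="translate" asks the model for English output instead.
    """
    return {"task": "translate" if translate_to_english else "transcribe"}


def speech_to_text(audio_path: str, model_name: str = "base",
                   translate_to_english: bool = False) -> str:
    """Transcribe (or translate) an audio file and return the text."""
    # Imported lazily so decode_options() works without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path, **decode_options(translate_to_english))
    return result["text"]


# Example usage (hypothetical file name, replace with a real recording):
#   print(speech_to_text("audio.mp3"))
#   print(speech_to_text("audio.mp3", translate_to_english=True))
```

Keeping the option building separate from the model call makes it easy to switch between transcription and translation without reloading the model logic.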

This article explains how to convert speech into text using the Whisper model and Python. It does not cover how the model works or its architecture. You can read more about Whisper here.

Whisper comes in five model sizes, listed in the table below, which is taken from OpenAI’s GitHub page. According to OpenAI, four of them also have English-only versions, denoted by the .en suffix. The English-only models tend to perform better, especially tiny.en and base.en, but the difference becomes less significant for the small.en and medium.en models.

Ref: OpenAI’s GitHub Page
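The naming scheme described above can be sketched as a small helper. The five size names and the `.en` suffix follow OpenAI's published model list. The function name and the decision to fall back to the multilingual `large` model when an English-only `large` is requested are assumptions made for this example.

```python
# Model sizes published by OpenAI, smallest to largest.
SIZES = ("tiny", "base", "small", "medium", "large")
# Only these four sizes have an English-only ".en" variant.
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def whisper_model_name(size: str, english_only: bool = False) -> str:
    """Return the model name to load for a given size.

    Falls back to the multilingual model when no English-only
    variant exists for the requested size (an assumed design choice).
    """
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size
```

The returned string, for example `whisper_model_name("base", english_only=True)`, is what you would pass to `whisper.load_model(...)` from the `openai-whisper` package.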

For this article, I am converting a YouTube video into audio and passing the audio into a whisper…


Written by Dhilip Subramanian

Business Intelligence Consultant | Data Engineer
