OpenAI Whisper Tutorial: Unlocking Speech Recognition Capabilities


Introducing Whisper: OpenAI's Groundbreaking Speech Recognition System

Whisper is OpenAI's speech recognition system, trained on 680,000 hours of multilingual and multitask data collected from the web. That large, diverse dataset makes it robust to accents, background noise, and technical language, and it supports transcription in many languages as well as translation from those languages into English. OpenAI has open-sourced the models and inference code, so developers can build useful applications on top of them.

How to Use Whisper

The Whisper model is available on GitHub. You can install it with the following command, run directly in a Jupyter Notebook cell:

!pip install git+https://github.com/openai/whisper.git

Whisper needs ffmpeg installed on the machine it runs on. You may already have it; if not, install it before continuing.

OpenAI lists several ways to install ffmpeg. In this tutorial we will use the Scoop package manager on Windows, but you can also install it manually from the ffmpeg website.

Run the following command in a terminal (or prefix it with ! to run it from a Jupyter Notebook cell):

scoop install ffmpeg
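
If you are not on Windows, ffmpeg is available from the usual system package managers; for example (assuming Homebrew on macOS or apt on Ubuntu/Debian):

brew install ffmpeg
sudo apt update && sudo apt install ffmpeg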

After the installation, restart your terminal or notebook kernel so the ffmpeg command is picked up on your PATH. Now we can continue. Next, we import the library:

import whisper

Using GPU for Whisper

Using a GPU is the preferred way to run Whisper. If you are on a local machine, you can check whether one is available: torch.cuda.is_available() returns False if no CUDA-compatible Nvidia GPU is found and True if one is. The last line selects the device the model will run on, preferring the GPU whenever it is available.

import torch

# True if a CUDA-compatible Nvidia GPU is available, False otherwise
is_cuda = torch.cuda.is_available()

# Run the model on the GPU when available, otherwise fall back to the CPU
device = "cuda" if is_cuda else "cpu"

Loading the Whisper Model

Now we can load the Whisper model. The model is loaded with the following command:

# Load the base model onto the selected device
model = whisper.load_model("base", device=device)

Please keep in mind that there are multiple models available (tiny, base, small, medium, and large, plus English-only variants of the smaller ones). Each one trades accuracy against speed and the compute it needs. We will use the base model for this tutorial; the snippet below shows how to list them all from code.
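
If you want to check the list programmatically, the whisper package exposes a helper for it:

# Print every model name that whisper.load_model accepts
print(whisper.available_models())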

Transcribing Audio Files

Next, set the path to the audio file you want to transcribe:

audio_file = "path_to_your_audio_file.wav"
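
Whisper's lower-level API, used in the next two steps, works on a 30-second log-Mel spectrogram rather than on the raw file, so we first load the audio, pad or trim it to 30 seconds, and compute the spectrogram. This is a minimal sketch using the helpers the whisper package ships with:

# Load the audio and pad/trim it to exactly 30 seconds
audio = whisper.load_audio(audio_file)
audio = whisper.pad_or_trim(audio)

# Compute a log-Mel spectrogram and move it to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)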

Detecting Language

The detect_language method estimates the language of the audio from the spectrogram we just computed:

# Returns the detected-language tokens and the probability of each language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

Transcribing First 30 Seconds

We transcribe the first 30 seconds of the audio using the DecodingOptions and the decode command, again working on the spectrogram from above:

# Use FP16 only when running on the GPU; CPUs do not support it
options = whisper.DecodingOptions(fp16=is_cuda)
result = whisper.decode(model, mel, options)
print(result.text)

Transcribing the Whole Audio File

The transcribe method processes the entire file (internally it works through the audio in 30-second windows) and prints the full transcription once it has finished:

result_full = model.transcribe(audio_file)
print(result_full["text"])
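
The result dictionary also contains timestamped segments, which are useful for subtitles or for jumping to a specific part of the recording. A short sketch of how to print them, assuming the result_full from above:

# Each segment carries its start and end time in seconds plus the text spoken
for segment in result_full["segments"]:
    print(f"[{segment['start']:.1f}s -> {segment['end']:.1f}s] {segment['text'].strip()}")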

Creating Your Own Whisper Application

Now it's up to you to create your own Whisper application. Get creative and have fun! I'm sure you will find a lot of useful applications for Whisper. The best way is to identify a problem around you and craft a solution to it. Maybe during our AI Hackathons?

Conclusion

With the power of OpenAI's Whisper, the possibilities for innovative developments in speech recognition technology are endless. Whether it's for transcribing meetings, creating accessible content, or developing multilingual communication tools, Whisper is set to revolutionize how we interact with audio data.
