Discover Whisper: OpenAI's Premier Speech Recognition System
Whisper is a groundbreaking speech recognition system developed by OpenAI. With an expansive training dataset comprising 680,000 hours of web-sourced multilingual and multitask data, Whisper showcases extraordinary accuracy and adaptability.
This unique dataset equips Whisper to handle varied accents, background noise, and technical jargon. It also transcribes speech in many languages and can translate it into English. OpenAI provides open models and code for Whisper, giving developers a solid foundation for building the next generation of speech recognition applications.
Getting Started with Docker for Whisper
Before diving into running the Whisper API, you need to have Docker installed on your local machine. Follow the official installation instructions on the Docker website.
Step-by-Step Setup
Follow these steps to set up your Whisper API:
- Create a folder for your files (you can name it whisper-api).
- Create a file named requirements.txt and add the Flask dependency to it (a sample is shown after this list).
- Create a file named Dockerfile.
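For reference, here is a minimal requirements.txt sketch. Only Flask is listed, because Whisper itself is installed separately in the Dockerfile; pin a specific version if you want reproducible builds.
flask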
Understanding the Dockerfile
In your Dockerfile, add the following instructions:
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt ./
RUN apt-get update && apt-get install -y git ffmpeg
RUN pip install -r requirements.txt
RUN pip install git+https://github.com/openai/whisper.git
COPY . .
EXPOSE 5000
CMD ["flask", "run", "--host=0.0.0.0"]
Here’s a breakdown:
- FROM specifies the base image, using a slim Python image.
- WORKDIR creates a working directory called /app.
- COPY first brings in requirements.txt (so dependency installation can be cached) and later copies the rest of the application code, including app.py, into the working directory.
- RUN executes build commands: it updates the package index, installs git and ffmpeg, and installs the required Python packages, including Whisper itself.
- EXPOSE indicates the port your app will run on, and CMD runs the Flask server.
Creating the API Route
Next, you'll create the app.py file, where you will import required packages and initialize both Flask and Whisper.
from flask import Flask, request
from tempfile import NamedTemporaryFile
import whisper

app = Flask(__name__)
model = whisper.load_model('base')
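The base model is a reasonable balance of speed and accuracy on CPU. If you need better accuracy and can afford slower inference, Whisper also ships tiny, small, medium, and large checkpoints, for example:
model = whisper.load_model('small')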
Now, let’s set up a route that accepts a POST request with an audio file. Because Whisper's transcribe method expects a file path rather than an uploaded file object, the handler saves the upload to a temporary file before transcribing it:
@app.route('/whisper', methods=['POST'])
def transcribe():
    audio_file = request.files['file']
    # Whisper needs a real file path, so persist the upload to a temporary file first.
    with NamedTemporaryFile() as temp:
        audio_file.save(temp.name)
        result = model.transcribe(temp.name)
    return {'transcript': result['text']}
Running the Docker Container
To build and run your Docker container, follow these steps:
- Open a terminal window and navigate to the folder containing your files.
- Run the command to build the Docker image:
docker build -t whisper-api .
- Run the container with:
docker run -p 5000:5000 whisper-api
Testing the Whisper API
You can test the Whisper API by sending a POST request to http://localhost:5000/whisper with an audio file attached. Make sure the request body is set to form-data. Here’s a sample curl command:
curl -X POST http://localhost:5000/whisper -F "file=@path_to_your_audio_file"
If everything goes well, you should receive a JSON object containing the transcript:
{"transcript": ""}
Deploying the API
Your Whisper API can be deployed on any platform that supports Docker. Keep in mind that the current setup processes audio on the CPU. To leverage GPU acceleration, you'll need to modify the Dockerfile and run the container with GPU access; see Docker's documentation on GPU support for details.
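As a rough sketch, GPU passthrough at runtime looks like the command below. It assumes the host has NVIDIA drivers and the NVIDIA Container Toolkit installed, and the image would also need a CUDA-enabled PyTorch build to actually use the GPU.
docker run --gpus all -p 5000:5000 whisper-api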
Participate in AI Hackathons
Now that you've set up the Whisper API, why not apply your skills during our upcoming AI hackathons? Join a vibrant community focused on innovation and exploration!
Find the Complete Code Here
You can find the comprehensive code for this setup and more details on this project here.