OpenAI Whisper Tutorial: Create OpenAI Whisper API in Docker

Discover Whisper: OpenAI's Premier Speech Recognition System

Whisper is a speech recognition system developed by OpenAI. It was trained on 680,000 hours of multilingual and multitask audio, which makes it robust to diverse accents, background noise, and technical jargon. Whisper transcribes speech in many languages and can also translate that speech into English, which makes it useful across a wide range of domains.

Key Features of Whisper

Wide Language Support: Transcribes speech in many languages and can translate it into English.
High Resilience: Copes with diverse accents, background noise, and technical terminology.
Open Source Accessibility: OpenAI publishes the Whisper models and code, so anyone can build on them.
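
For a sense of what these features look like in code, here is a minimal sketch that calls the openai-whisper Python package directly, outside any web server. The function name transcribe_and_translate is a placeholder chosen for illustration, and the sketch assumes the package and ffmpeg are already installed. It uses the "base" model, the same one loaded later in this tutorial.

```python
def transcribe_and_translate(audio_path: str, model_name: str = "base") -> dict:
    """Return the transcript in the spoken language plus an English translation."""
    # Imported inside the function so this module can be loaded
    # even on machines where whisper is not installed.
    import whisper

    model = whisper.load_model(model_name)
    # Default task: transcribe in the language that was spoken
    original = model.transcribe(audio_path)
    # task="translate" asks Whisper to produce English text instead
    english = model.transcribe(audio_path, task="translate")
    return {
        "language": original["language"],
        "text": original["text"],
        "english": english["text"],
    }
```

Calling the function with the path to an audio file returns a dictionary with the detected language code, the transcript, and the English translation. Running the model twice keeps the sketch simple at the cost of extra processing time.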

How to Start with Docker

To run the Whisper container on your local machine, first install Docker. Follow the official Docker installation instructions for your operating system.

Step-by-Step Setup Instructions:

Create a folder for your project and name it whisper-api.
Create a requirements.txt file in this folder containing a single line: flask
Create a file named Dockerfile in the same folder. This file will contain the instructions needed to build the container.
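
The three steps above can also be scripted. The following is a minimal sketch in Python, assuming the project folder should be created under a directory you pass in (the current directory by default). The helper name scaffold_project is made up for illustration, and the Dockerfile is created empty, to be filled in with the instructions from the next section.

```python
from pathlib import Path


def scaffold_project(base_dir: str = ".") -> Path:
    """Create whisper-api/ with requirements.txt and an empty Dockerfile."""
    project = Path(base_dir) / "whisper-api"
    project.mkdir(parents=True, exist_ok=True)
    # requirements.txt lists the single dependency named in the steps above
    (project / "requirements.txt").write_text("flask\n")
    # Create the Dockerfile only if it is missing, so existing content is kept
    dockerfile = project / "Dockerfile"
    if not dockerfile.exists():
        dockerfile.touch()
    return project
```

Calling scaffold_project() from the directory where you keep your projects returns the path of the new whisper-api folder.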

Understanding the Dockerfile

The Dockerfile will contain the following essential lines:

FROM python:3.10-slim
WORKDIR /python-docker
COPY requirements.txt .
RUN apt-get update && apt-get install -y git ffmpeg
RUN pip install -r requirements.txt
RUN pip install git+https://github.com/openai/whisper.git
COPY . .
EXPOSE 5000
CMD ["flask", "run", "--host=0.0.0.0"]

Here's what happens in the Dockerfile:

The base image python:3.10-slim is selected for a lightweight environment.
The working directory is set to /python-docker; Docker creates it if it does not exist.
The requirements.txt file is copied into the working directory.
The package index is updated, then git (needed to install Whisper from GitHub) and ffmpeg (needed to decode audio files) are installed in the same step.
Dependencies listed in requirements.txt are installed.
The Whisper package is installed directly from GitHub.
The rest of the project, including the app.py file created in the next section, is copied into the image so that flask run can find the application.
Port 5000, Flask's default port, is exposed.
When the container starts, it runs the Flask server and listens on all interfaces so the API is reachable from outside the container.

Creating Your Route

Create an app.py file that imports the necessary packages and initializes the Flask app and the Whisper model. Start with these lines:

from flask import Flask, request
import whisper

app = Flask(__name__)
model = whisper.load_model("base")

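Next, add a route that accepts a POST request containing an audio file, saves the upload to a temporary file, and passes that file to Whisper. Extend your app.py file with the following lines:

from tempfile import NamedTemporaryFile  # place with the other imports at the top of app.py

@app.route('/whisper', methods=['POST'])
def transcribe():
    file = request.files['file']
    # Whisper reads audio from a path, so save the upload to a temporary file first
    with NamedTemporaryFile() as temp:
        file.save(temp.name)
        result = model.transcribe(temp.name)
    return {'transcript': result['text']}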
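
The route above accepts whatever file the client sends. One possible guard, sketched here on the assumption that uploads should be limited to a few common audio extensions, is a small filename check that runs before transcription. The helper name is_allowed_audio and the extension list are illustrative choices, not part of Flask or Whisper.

```python
from pathlib import PurePath

# Illustrative allow-list; ffmpeg can decode many more formats than these
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def is_allowed_audio(filename: str) -> bool:
    """Return True if the filename ends in one of the allowed audio extensions."""
    return PurePath(filename).suffix.lower() in ALLOWED_EXTENSIONS
```

Inside the route, a request whose file.filename fails this check could be rejected with a 400 response before the model is invoked. A check on the extension alone does not prove the content is valid audio, so it is a first filter rather than full validation.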

How to Run the Container?

To build and run your Docker container, navigate to your project folder in the terminal and execute the following commands:

docker build -t whisper-api .
docker run -p 5000:5000 whisper-api
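
Because app.py loads the Whisper model when it is imported, the server may take a while before it answers requests. A rough way to wait for it from a script, assuming the container publishes port 5000 on localhost as in the command above, is to poll the endpoint until any HTTP response comes back. The function name wait_for_server and its default values are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(
    url: str = "http://localhost:5000/whisper",
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll url until the server sends any HTTP response or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            urllib.request.urlopen(url, timeout=2.0)
            return True
        except urllib.error.HTTPError:
            # An error status (such as 405 for a GET on a POST-only route)
            # still means the web server answered.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet
            time.sleep(interval)
    return False
```

The check treats an error status as success on purpose: the /whisper route only accepts POST, so a plain GET is expected to be refused by Flask once the server is up.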

Testing the API

Once your API is running, you can test it by sending a POST request to http://localhost:5000/whisper with an audio file attached as multipart form-data under the field name file. Use the following curl command to test:

curl -X POST http://localhost:5000/whisper -F 'file=@path_to_your_audio_file'

Upon success, you should receive a JSON object containing the transcript of the audio file.
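
The same request can be sent from Python. The sketch below uses only the standard library and builds the multipart form-data body by hand, so nothing beyond Python itself needs to be installed. The function names build_multipart and transcribe_file are illustrative, and the default URL assumes the container is running locally on port 5000 as described above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, url: str = "http://localhost:5000/whisper") -> dict:
    """POST the audio file under the form field 'file' and return the parsed JSON."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling transcribe_file with the path to your audio file returns the JSON object from the API as a dictionary, with the transcript stored under the transcript key.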

Deploying the API

You can deploy the Whisper API to any platform that runs Docker containers. Note that the current configuration uses the CPU for audio processing. Using a GPU for faster transcription would require changes to the Dockerfile, which this basic guide does not cover.

Join the AI Revolution

What you’ve learned can be put to the test in upcoming AI Hackathons! Don't hesitate to engage with the developer community and innovate using the tools provided by OpenAI.

For the complete code and further resources, visit GitHub to explore and enhance your skills with Whisper.