
OpenAI Whisper Tutorial: Build a Speech Recognition API in Docker


Discover Whisper: OpenAI's Premier Speech Recognition System

Whisper is a groundbreaking speech recognition system developed by OpenAI, designed to revolutionize how we interact with technology using our voices. With a training dataset comprising 680,000 hours of multilingual and multitask data sourced from the web, Whisper stands out for its remarkable ability to adapt to various accents, background noises, and technical jargon.

Key Features of Whisper

  • Multilingual Support: Whisper can transcribe and translate spoken language into English, making it a highly versatile tool for users around the globe.
  • Robust Performance: The system excels in challenging audio conditions, ensuring high accuracy even in noisy environments.
  • Developer-Friendly: OpenAI offers access to Whisper's models and code, empowering developers to create innovative applications that leverage this advanced speech recognition technology.
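To give a sense of how little code direct use requires, here is a minimal sketch of transcribing a file with the open-source whisper package (assuming openai-whisper and FFmpeg are installed locally; audio.mp3 is a hypothetical file):

import whisper

# Available checkpoints include tiny, base, small, medium, and large; base balances speed and accuracy
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")  # hypothetical local audio file
print(result["text"])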

How to Get Started with Docker

If you're considering running Whisper on your local machine, the first step is to install Docker. This software allows you to create isolated environments for your applications.

Setting Up Your Project

  1. Create a folder for your files, naming it whisper-api.
  2. Within this folder, create a file called requirements.txt and add flask as a dependency (a minimal example follows this list).
  3. Create another file named Dockerfile to configure your Docker environment.
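For reference, requirements.txt only needs the Flask dependency for this tutorial; pinning a version is optional:

flask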

Building the Dockerfile

Your Dockerfile should contain the following instructions:

FROM python:3.10-slim
WORKDIR /python-docker
COPY requirements.txt .
RUN apt-get update && apt-get install -y git
RUN pip install -r requirements.txt
RUN pip install git+https://github.com/openai/whisper.git
RUN apt-get install -y ffmpeg
COPY . .
EXPOSE 5000
CMD ["flask", "run", "--host=0.0.0.0"]

Understanding the Dockerfile

Here’s a breakdown of what each line does:

  • FROM python:3.10-slim: Sets the base image for your container.
  • WORKDIR /python-docker: Creates and sets a working directory within the container.
  • COPY requirements.txt .: Copies your requirements file into the Docker environment.
  • RUN apt-get update && apt-get install -y git: Updates the package index and installs Git, which pip needs to clone the Whisper repository.
  • RUN pip install -r requirements.txt: Installs the dependencies listed in the requirements file.
  • RUN pip install git+https://github.com/openai/whisper.git: Installs the Whisper package directly from GitHub.
  • RUN apt-get install -y ffmpeg: Installs FFmpeg, the multimedia framework Whisper uses to decode audio files.
  • COPY . .: Copies the rest of your project, including app.py, into the container.
  • EXPOSE 5000: Documents that the Flask server listens on port 5000.
  • CMD ["flask", "run", "--host=0.0.0.0"]: Starts the Flask application and binds it to all interfaces so it is reachable from outside the container.

Creating Your API Route

Next, create a file named app.py where you will import the necessary packages and initialize both the Flask app and Whisper:

from flask import Flask, request
import tempfile
import whisper

app = Flask(__name__)
model = whisper.load_model("base")

Then, create a route that accepts POST requests with an audio file. Because Whisper reads audio from a file path (via FFmpeg), the upload is first written to a temporary file before being transcribed:

@app.route('/whisper', methods=['POST'])
def transcribe():
    file = request.files['file']
    # Whisper expects a file path, so save the upload to a temporary file first
    with tempfile.NamedTemporaryFile() as temp:
        file.save(temp.name)
        result = model.transcribe(temp.name)
    return {'transcript': result['text']}

Running the Docker Container

To build and run your container, open a terminal and navigate to your project folder. Execute the following commands:

# Build the container
$ docker build -t whisper-api .
# Run the container
$ docker run -p 5000:5000 whisper-api
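Note that the base model weights are downloaded when the container first starts, since whisper.load_model runs at application start-up. Optionally, as a sketch, you can mount Whisper's cache directory (located at /root/.cache/whisper inside this container) into a named volume so the download survives restarts:

# Optional: persist downloaded model weights across restarts (whisper-cache is a hypothetical volume name)
$ docker run -p 5000:5000 -v whisper-cache:/root/.cache/whisper whisper-api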

Testing Your API

You can test the API by sending a POST request to http://localhost:5000/whisper with the audio file attached. Ensure the body of the request is form-data, with the file under the key file. Use this curl command for testing:

curl -X POST -F "file=@path_to_your_file" http://localhost:5000/whisper

If everything is set up correctly, you should receive a JSON response containing the transcript of the audio file.
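If you would rather test from Python than curl, here is a minimal sketch using the requests library (audio.wav is a hypothetical local file; any format FFmpeg understands should work):

import requests

# Hypothetical local audio file to transcribe
with open("audio.wav", "rb") as f:
    response = requests.post("http://localhost:5000/whisper", files={"file": f})

print(response.json()["transcript"])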

Deploying the API

This API can be deployed on any platform that supports Docker. Remember, the current setup uses the CPU to process audio files. To leverage a GPU, you will need a CUDA-enabled base image and to pass the host GPU through to the container, typically via the NVIDIA Container Toolkit. For more details, refer to the official NVIDIA documentation.
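As a minimal sketch, assuming the NVIDIA Container Toolkit is installed on the host and the image has been rebuilt on a CUDA-enabled base, the GPU can be passed through at run time:

# Hypothetical example: requires the NVIDIA Container Toolkit on the host
$ docker run --gpus all -p 5000:5000 whisper-api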

Participate in Upcoming AI Hackathons

What better way to utilize your newfound skills than by joining an AI hackathon? Engage with the community and explore real-world applications of the technologies you’re learning!

Explore the Complete Code

You can find the full code repository here.

