This guide will walk you through deploying and invoking a transcription API using the Faster Whisper model on Beam. The API can be invoked with either a URL to an .mp3 file or a base64-encoded audio file.

View the Code

See the code for this example on GitHub.

Initial Setup

In a Python file named app.py, add the following code to define your endpoint and handle the transcription:

from beam import endpoint, Image, Volume, env
import base64
import requests
from tempfile import NamedTemporaryFile

BEAM_VOLUME_PATH = "./cached_models"

# faster-whisper is installed only in the remote container, so guard the import
if env.is_remote():
    from faster_whisper import WhisperModel, download_model

# This runs once when the container first starts
def load_models():
    model_path = download_model("large-v3", cache_dir=BEAM_VOLUME_PATH)
    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    return model

@endpoint(
    on_start=load_models,
    name="faster-whisper",
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    image=Image(
        base_image="nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04",
        python_version="python3.10",
        python_packages=["git+https://github.com/SYSTRAN/faster-whisper.git"],
    ),
    volumes=[
        Volume(
            name="cached_models",
            mount_path=BEAM_VOLUME_PATH,
        )
    ],
)
def transcribe(context, **inputs):
    # Retrieve cached model from on_start
    model = context.on_start_value

    # Inputs passed to API
    language = inputs.get("language")
    audio_base64 = inputs.get("audio_file")
    url = inputs.get("url")

    if audio_base64 and url:
        return {"error": "Only a base64 audio file OR a URL can be passed to the API."}
    if not audio_base64 and not url:
        return {
            "error": "Please provide either an audio file in base64 string format or a URL."
        }

    binary_data = None

    if audio_base64:
        binary_data = base64.b64decode(audio_base64.encode("utf-8"))
    elif url:
        resp = requests.get(url)
        # Fail early if the download was unsuccessful
        if not resp.ok:
            return {"error": f"Failed to download audio from URL ({resp.status_code})."}
        binary_data = resp.content

    text = ""

    with NamedTemporaryFile() as temp:
        try:
            # Write the audio data to the temporary file
            temp.write(binary_data)
            temp.flush()

            segments, _ = model.transcribe(temp.name, beam_size=5, language=language)

            for segment in segments:
                text += segment.text + " "

            print(text)
            return {"text": text}

        except Exception as e:
            return {"error": f"Something went wrong: {e}"}

Serving the API

In your shell, serve the API by running:

beam serve app.py:transcribe

This command will:

  • Spin up a container.
  • Run it with the specified CPU, memory, and GPU resources.
  • Sync your local files to the remote container.
  • Print a cURL request to invoke the API.
  • Stream logs to your shell.
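
The serve command is intended for development: it syncs your local changes and the session ends when you stop it. Once you're happy with the endpoint, you can typically create a persistent deployment of the same function with:

beam deploy app.py:transcribe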

Invoking the API

Once the API is running, you can invoke it with a URL to an .mp3 file using the following cURL command:

curl -X POST 'https://app.beam.cloud/endpoint/id/[YOUR-ENDPOINT-ID]' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR-AUTH-TOKEN]' \
-d '{"url":"http://commondatastorage.googleapis.com/codeskulptor-demos/DDR_assets/Kangaroo_MusiQue_-_The_Neverwritten_Role_Playing_Game.mp3"}'

Replace [YOUR-ENDPOINT-ID] with your actual endpoint ID and [YOUR-AUTH-TOKEN] with your authentication token.
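
You can also invoke the endpoint with a base64-encoded audio file instead of a URL. Below is a minimal Python sketch; the file name sample.mp3 is a placeholder, and the language field is optional (omit it to let the model auto-detect the language):

import base64
import requests

# Read a local audio file and encode it as a base64 string
with open("sample.mp3", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://app.beam.cloud/endpoint/id/[YOUR-ENDPOINT-ID]",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer [YOUR-AUTH-TOKEN]",
    },
    # "language" is optional; omit it to auto-detect
    json={"audio_file": audio_base64, "language": "en"},
)
print(response.json())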

Summary

You’ve successfully set up a GPU-backed serverless API for transcribing audio files using the Faster Whisper model on Beam. The API accepts either a URL to an audio file or a base64-encoded audio file. With the provided setup, you can easily serve, invoke, and iterate on your transcription API.