Faster Whisper
This guide walks you through deploying and invoking a transcription API on Beam using the Faster Whisper model. The API accepts either a URL to an .mp3 file or a base64-encoded audio file.
View the Code
See the code for this example on GitHub.
Initial Setup
Create a Python file (the serve command below assumes it is named app.py) and add the following code to define your endpoint and handle the transcription:
from beam import endpoint, Image, Volume, env
import base64
import requests
from tempfile import NamedTemporaryFile

BEAM_VOLUME_PATH = "./cached_models"

# These packages will be installed in the remote container
if env.is_remote():
    from faster_whisper import WhisperModel, download_model


# This runs once when the container first starts
def load_models():
    model_path = download_model("large-v3", cache_dir=BEAM_VOLUME_PATH)
    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    return model


@endpoint(
    on_start=load_models,
    name="faster-whisper",
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    image=Image(
        base_image="nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04",
        python_version="python3.10",
        python_packages=["git+https://github.com/SYSTRAN/faster-whisper.git"],
    ),
    volumes=[
        Volume(
            name="cached_models",
            mount_path=BEAM_VOLUME_PATH,
        )
    ],
)
def transcribe(context, **inputs):
    # Retrieve cached model from on_start
    model = context.on_start_value

    # Inputs passed to API
    language = inputs.get("language")
    audio_base64 = inputs.get("audio_file")
    url = inputs.get("url")

    if audio_base64 and url:
        return {"error": "Only a base64 audio file OR a URL can be passed to the API."}
    if not audio_base64 and not url:
        return {
            "error": "Please provide either an audio file in base64 string format or a URL."
        }

    binary_data = None

    if audio_base64:
        binary_data = base64.b64decode(audio_base64.encode("utf-8"))
    elif url:
        resp = requests.get(url)
        binary_data = resp.content

    text = ""

    with NamedTemporaryFile() as temp:
        try:
            # Write the audio data to the temporary file
            temp.write(binary_data)
            temp.flush()

            segments, _ = model.transcribe(temp.name, beam_size=5, language=language)

            for segment in segments:
                text += segment.text + " "

            print(text)
            return {"text": text}

        except Exception as e:
            return {"error": f"Something went wrong: {e}"}
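The input handling in transcribe (exactly one of audio_file or url must be present) can be checked locally without a GPU if it is pulled into a small pure function. The following is a minimal sketch of that idea, not part of the example's code: the name resolve_audio_bytes and the fetch hook are hypothetical, and the strict validate=True base64 check is an added assumption that the original handler does not make.

```python
import base64
import binascii


def resolve_audio_bytes(inputs, fetch=None):
    """Return (audio_bytes, error); exactly one of the two is None.

    `fetch` is a hypothetical hook: a callable that takes a URL and
    returns the downloaded bytes (for example a wrapper around requests.get).
    """
    audio_base64 = inputs.get("audio_file")
    url = inputs.get("url")

    # Same checks as the endpoint: exactly one input source is allowed.
    if audio_base64 and url:
        return None, "Only a base64 audio file OR a URL can be passed to the API."
    if not audio_base64 and not url:
        return None, "Please provide either an audio file in base64 string format or a URL."

    if audio_base64:
        try:
            # validate=True rejects non-base64 characters (an added assumption).
            return base64.b64decode(audio_base64, validate=True), None
        except (binascii.Error, ValueError) as exc:
            return None, f"Invalid base64 audio: {exc}"

    if fetch is None:
        return None, "No fetch function was provided for URL input."
    return fetch(url), None
```

Inside the endpoint, a function like this could be called first and its error string returned as {"error": ...} before any audio is written to the temporary file.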
Serving the API
In your shell, serve the API by running:
beam serve app.py:transcribe
This command will:
- Spin up a container.
- Run it with the specified CPU, memory, and GPU resources.
- Sync your local files to the remote container.
- Print a cURL request to invoke the API.
- Stream logs to your shell.
Invoking the API
Once the API is running, you can invoke it with a URL to an .mp3 file using the following cURL command:
curl -X POST 'https://app.beam.cloud/endpoint/id/[YOUR-ENDPOINT-ID]' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR-AUTH-TOKEN]' \
  -d '{"url":"http://commondatastorage.googleapis.com/codeskulptor-demos/DDR_assets/Kangaroo_MusiQue_-_The_Neverwritten_Role_Playing_Game.mp3"}'
Replace [YOUR-ENDPOINT-ID] with your actual endpoint ID and [YOUR-AUTH-TOKEN] with your authentication token.
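The endpoint also accepts base64-encoded audio through the audio_file field, which the cURL example does not show. Below is a minimal sketch of building such a request with Python's standard library. The helper name build_transcription_request is hypothetical, and the endpoint URL and token are placeholders you would fill in as described above; the audio_file and language keys match the inputs read by transcribe.

```python
import base64
import json
import urllib.request


def build_transcription_request(endpoint_url, auth_token, audio_bytes, language=None):
    """Build (but do not send) a POST request carrying base64-encoded audio."""
    # The endpoint reads the base64 string from the "audio_file" key.
    payload = {"audio_file": base64.b64encode(audio_bytes).decode("ascii")}
    if language is not None:
        # Optional language hint, forwarded to model.transcribe by the endpoint.
        payload["language"] = language

    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )
```

To send it, read the audio file in binary mode, pass the bytes to the helper, and open the result with urllib.request.urlopen; the response body is the JSON returned by transcribe, with either a text or an error key.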
Summary
You now have a serverless transcription API running the Faster Whisper large-v3 model on a GPU-backed Beam endpoint. The model is loaded once when the container starts and its weights are cached on a volume. The API accepts either a URL to an audio file or a base64-encoded audio file, plus an optional language parameter.