This guide will walk you through deploying and invoking a transcription API using the Faster Whisper model on Beam. The API can be invoked with either a URL to an .mp3 file or a base64-encoded audio file.
In your Python file, add the following code to define your endpoint and handle the transcription:
app.py
```python
from beam import endpoint, Image, Volume, env

import base64
import requests
from tempfile import NamedTemporaryFile

BEAM_VOLUME_PATH = "./cached_models"

# These packages will be installed in the remote container
if env.is_remote():
    from faster_whisper import WhisperModel, download_model


# This runs once when the container first starts
def load_models():
    model_path = download_model("large-v3", cache_dir=BEAM_VOLUME_PATH)
    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    return model


@endpoint(
    on_start=load_models,
    name="faster-whisper",
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    image=Image(
        base_image="nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04",
        python_version="python3.10",
    )
    .add_python_packages(
        [
            "git+https://github.com/SYSTRAN/faster-whisper.git",
            "huggingface_hub[hf-transfer]",
        ]
    )
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1"),
    volumes=[
        Volume(
            name="cached_models",
            mount_path=BEAM_VOLUME_PATH,
        )
    ],
)
def transcribe(context, **inputs):
    # Retrieve cached model from on_start
    model = context.on_start_value

    # Inputs passed to API
    language = inputs.get("language")
    audio_base64 = inputs.get("audio_file")
    url = inputs.get("url")

    if audio_base64 and url:
        return {"error": "Only a base64 audio file OR a URL can be passed to the API."}
    if not audio_base64 and not url:
        return {
            "error": "Please provide either an audio file in base64 string format or a URL."
        }

    binary_data = None

    if audio_base64:
        binary_data = base64.b64decode(audio_base64.encode("utf-8"))
    elif url:
        resp = requests.get(url)
        binary_data = resp.content

    text = ""

    with NamedTemporaryFile() as temp:
        try:
            # Write the audio data to the temporary file
            temp.write(binary_data)
            temp.flush()

            # Transcribe the audio on the GPU
            segments, _ = model.transcribe(temp.name, beam_size=5, language=language)

            for segment in segments:
                text += segment.text + " "

            print(text)
            return {"text": text}
        except Exception as e:
            return {"error": f"Something went wrong: {e}"}
```
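Once the endpoint is deployed (at the time of writing, with `beam deploy app.py:transcribe`, or `beam serve app.py:transcribe` for a temporary development session), Beam prints the endpoint URL and an auth token to use in requests. Below is a minimal client sketch showing both invocation modes. The endpoint URL, token, and `audio.mp3` file are placeholders; substitute the real values from your own deployment:

```python
import base64
import requests

# Placeholders: use the endpoint URL printed by `beam deploy`
# and the auth token from your Beam dashboard.
ENDPOINT_URL = "https://your-endpoint-url"  # hypothetical
AUTH_TOKEN = "YOUR_BEAM_AUTH_TOKEN"  # hypothetical

headers = {
    "Authorization": f"Bearer {AUTH_TOKEN}",
    "Content-Type": "application/json",
}

# Option 1: pass a URL to an .mp3 file
response = requests.post(
    ENDPOINT_URL,
    headers=headers,
    json={"url": "https://example.com/audio.mp3", "language": "en"},
)
print(response.json())

# Option 2: pass a local file as a base64-encoded string
with open("audio.mp3", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    ENDPOINT_URL,
    headers=headers,
    json={"audio_file": audio_base64, "language": "en"},
)
print(response.json())
```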
You’ve now set up a performant serverless API for transcribing audio files with the Faster Whisper model on Beam. The API accepts both URLs to audio files and base64-encoded audio, and with this setup you can serve, invoke, and continue to develop your transcription API.
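As one possible next step, a common variation is returning per-segment timestamps instead of a single concatenated string. Here's a hedged sketch of how the loop inside `transcribe` could change; faster-whisper's segments expose `start`, `end`, and `text` attributes, with times in seconds:

```python
# Inside transcribe(), after writing the audio to the temp file:
segments, _ = model.transcribe(temp.name, beam_size=5, language=language)

# Collect each segment with its start and end times (seconds)
results = [
    {"start": segment.start, "end": segment.end, "text": segment.text.strip()}
    for segment in segments
]
return {"segments": results}
```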