This guide will walk you through deploying and invoking a transcription API using the Faster Whisper model on Beam. The API can be invoked with either a URL to an .mp3 file or a base64-encoded audio file.
In your Python file, add the following code to define your endpoint and handle the transcription:
```python
from beam import endpoint, Image, Volume, env

import base64
import requests
from tempfile import NamedTemporaryFile

BEAM_VOLUME_PATH = "./cached_models"

# These packages will be installed in the remote container
if env.is_remote():
    from faster_whisper import WhisperModel, download_model


# This runs once when the container first starts
def load_models():
    model_path = download_model("large-v3", cache_dir=BEAM_VOLUME_PATH)
    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    return model


@endpoint(
    on_start=load_models,
    name="faster-whisper",
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    image=Image(
        base_image="nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04",
        python_version="python3.10",
        python_packages=["git+https://github.com/SYSTRAN/faster-whisper.git"],
    ),
    volumes=[
        Volume(
            name="cached_models",
            mount_path=BEAM_VOLUME_PATH,
        )
    ],
)
def transcribe(context, **inputs):
    # Retrieve cached model from on_start
    model = context.on_start_value

    # Inputs passed to API
    language = inputs.get("language")
    audio_base64 = inputs.get("audio_file")
    url = inputs.get("url")

    if audio_base64 and url:
        return {"error": "Only a base64 audio file OR a URL can be passed to the API."}
    if not audio_base64 and not url:
        return {"error": "Please provide either an audio file in base64 string format or a URL."}

    binary_data = None

    if audio_base64:
        binary_data = base64.b64decode(audio_base64.encode("utf-8"))
    elif url:
        resp = requests.get(url)
        binary_data = resp.content

    text = ""

    with NamedTemporaryFile() as temp:
        try:
            # Write the audio data to the temporary file
            temp.write(binary_data)
            temp.flush()

            segments, _ = model.transcribe(temp.name, beam_size=5, language=language)

            for segment in segments:
                text += segment.text + " "

            print(text)
            return {"text": text}
        except Exception as e:
            return {"error": f"Something went wrong: {e}"}
```
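To call the endpoint with a local audio file, you first need to base64-encode it so it can be sent in the `audio_file` field of the JSON body. A minimal client-side helper might look like this (`encode_audio_file` and `encode_audio_bytes` are hypothetical helper names, not part of the Beam SDK):

```python
import base64


def encode_audio_bytes(data: bytes) -> str:
    # Base64-encode raw audio bytes into the string format
    # expected by the endpoint's "audio_file" input
    return base64.b64encode(data).decode("utf-8")


def encode_audio_file(path: str) -> str:
    # Read an .mp3 (or other audio) file from disk and encode it
    with open(path, "rb") as f:
        return encode_audio_bytes(f.read())
```

Note that this mirrors the server side exactly: the endpoint decodes the string with `base64.b64decode`, so the round trip recovers the original bytes.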
You’ve successfully set up a highly performant serverless API for transcribing audio files using the Faster Whisper model on Beam. The API accepts either a URL to an audio file or a base64-encoded audio file, and with this setup you can easily serve, invoke, and iterate on your transcription API.
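As a sketch of how you might invoke the deployed endpoint, the snippet below builds the JSON body and sends it with `requests`. The endpoint URL and auth token shown are placeholders (assumptions, not values from this guide); substitute the ones Beam prints when you deploy:

```python
import requests

# Hypothetical values -- replace with your deployment's URL and token
ENDPOINT_URL = "https://your-endpoint-url.example.com"
AUTH_TOKEN = "YOUR_BEAM_AUTH_TOKEN"


def build_payload(url=None, audio_base64=None, language=None):
    # Build the request body; pass exactly one of url / audio_base64,
    # mirroring the validation the endpoint performs
    if bool(url) == bool(audio_base64):
        raise ValueError("Provide either a URL or a base64 audio file, not both.")
    payload = {}
    if url:
        payload["url"] = url
    if audio_base64:
        payload["audio_file"] = audio_base64
    if language:
        payload["language"] = language
    return payload


def transcribe_remote(**kwargs):
    # POST the payload to the deployed endpoint and return its JSON response
    resp = requests.post(
        ENDPOINT_URL,
        headers={
            "Authorization": f"Bearer {AUTH_TOKEN}",
            "Content-Type": "application/json",
        },
        json=build_payload(**kwargs),
    )
    return resp.json()
```

For example, `transcribe_remote(url="https://example.com/audio.mp3", language="en")` would return a dict containing either a `"text"` key with the transcript or an `"error"` key, matching the endpoint's response shape.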