In this example, we’ll build a simple app that transcribes YouTube videos using Whisper, a state-of-the-art speech recognition model.

Setting up the environment

First, you’ll set up your compute environment. You’ll specify:

  • Compute requirements, including a GPU
  • Python and system-level packages to install in the runtime
app = App(
                "pytube @ git+",
            commands=["apt-get update && apt-get install -y ffmpeg"],

Transcribing YouTube Videos

We’ll write a basic function that takes a YouTube video URL, uses the pytube library (installed in the environment above) to download the video as an Output, and runs the video through Whisper to generate a text transcript.
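
As a rough illustration of that flow, here is a minimal sketch in plain Python. It leaves out the Beam-specific wiring (how the function is registered on the app and how the Output is declared), and several details are assumptions rather than part of this example: the helper names `download_audio` and `run_whisper`, the `"small"` model size, the `audio.mp4` filename, and the choice to download only the audio track. The `pred` key mirrors the response shown at the end of this page.

```python
import tempfile
from typing import Callable, Dict


def download_audio(video_url: str, output_dir: str) -> str:
    """Download the audio track of a YouTube video and return the file path."""
    # Imported lazily so this module can be loaded without pytube installed.
    from pytube import YouTube

    # Assumption: the audio-only stream is enough for transcription.
    stream = YouTube(video_url).streams.filter(only_audio=True).first()
    return stream.download(output_path=output_dir, filename="audio.mp4")


def run_whisper(audio_path: str) -> str:
    """Transcribe an audio file with Whisper and return the text."""
    # Imported lazily for the same reason as pytube above.
    import whisper

    # Assumption: the "small" checkpoint; larger models trade speed for accuracy.
    model = whisper.load_model("small")
    result = model.transcribe(audio_path)
    return result["text"]


def transcribe(
    video_url: str,
    download: Callable[[str, str], str] = download_audio,
    speech_to_text: Callable[[str], str] = run_whisper,
) -> Dict[str, str]:
    """Download a video, transcribe it, and return the transcript under "pred"."""
    # The temporary directory is removed once transcription has finished.
    with tempfile.TemporaryDirectory() as workdir:
        audio_path = download(video_url, workdir)
        text = speech_to_text(audio_path)
    return {"pred": text}
```

The download and transcription steps are passed in as parameters so that either one can be swapped out, for example to try the function locally with stand-ins before deploying to a GPU runtime.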


You’ll deploy the app by running the following in your shell:

beam deploy

Your Beam Dashboard will open in a browser window, and you can monitor the deployment status in the web GUI.

Calling the API

You’ll call the API by pasting the cURL command displayed in the browser window into your shell.

  curl -X POST --compressed "" \
   -H 'Authorization: [YOUR_AUTH_TOKEN]' \
   -H 'Content-Type: application/json' \
   -d '{"video_url": ""}'
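
If you would rather call the API from a script, the same request can be sketched with Python's standard library. This is only an illustration: `build_request` and `call_api` are hypothetical helper names, and the endpoint URL, auth token, and video URL are values you supply yourself from your own deployment.

```python
import json
import urllib.request


def build_request(endpoint_url: str, auth_token: str, video_url: str) -> urllib.request.Request:
    """Build the POST request that the cURL command above sends."""
    payload = json.dumps({"video_url": video_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            # Passed through unchanged, as in the cURL command.
            "Authorization": auth_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_api(endpoint_url: str, auth_token: str, video_url: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(endpoint_url, auth_token, video_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `call_api(...)` with your deployment's endpoint URL and token should return the decoded response as a dictionary. Transcription can take a while for long videos, so you may want to pass a `timeout` to `urlopen`.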

The API will return a transcript of the video:

  "pred": " Welcome to the Pets Show. That is, Physics Explained in 10 Seconds. For the next month, in addition to Minute Physics, I'll be making one 10-second video every day. 10 Seconds of Physics explaining 5 seconds of titles on either end."