In this example, we’ll build a simple app which transcribes YouTube videos using Whisper, a state-of-the-art model for speech recognition.

Setting up the environment

First, you’ll set up your compute environment. You’ll specify:

  • Compute requirements, including a GPU
  • Python and system-level packages to install in the runtime
app.py
from beam import App, Image, Runtime

app = App(
    name="whisper-example",
    runtime=Runtime(
        cpu=2,
        memory="16Gi",
        gpu="A10G",
        image=Image(
            python_version="python3.8",
            python_packages=[
                "git+https://github.com/openai/whisper.git",
                "pytube @ git+https://github.com/felipeucelli/pytube@03d72641191ced9d92f31f94f38cfb18c76cfb05",
            ],
            commands=["apt-get update && apt-get install -y ffmpeg"],
        ),
    ),
)

Transcribing YouTube Videos

We’ll write a basic function that takes a YouTube video URL, uses the pytube library to download the video’s audio as an Output, and runs it through Whisper to generate a text transcript.
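A minimal sketch of that function is below. The function name `transcribe`, the download path, and the choice of the `base` Whisper model are assumptions for illustration, not the exact code from this guide; the heavy imports are deferred so they only load inside the runtime.

```python
# Sketch of the transcription function. `transcribe`, VIDEO_PATH, and the
# "base" model size are illustrative assumptions, not Beam-mandated names.
VIDEO_PATH = "/tmp/video.mp4"

def transcribe(video_url: str) -> dict:
    # Deferred imports: whisper and pytube are installed in the Beam runtime.
    import whisper
    from pytube import YouTube

    # Download only the audio stream of the video.
    stream = YouTube(video_url).streams.filter(only_audio=True).first()
    stream.download(filename=VIDEO_PATH)

    # Load Whisper and transcribe (this is why ffmpeg is installed above).
    model = whisper.load_model("base")
    result = model.transcribe(VIDEO_PATH)
    return {"pred": result["text"]}
```

In a real deployment you would attach a Beam trigger (for example, a REST API or task queue) to this function so it can be invoked over HTTP.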

Deployment

You’ll deploy the app by opening your shell and running:

beam deploy app.py

Your Beam Dashboard will open in a browser window, and you can monitor the deployment status in the web GUI.

Calling the API

You’ll call the API by pasting in the cURL command displayed in the browser window.

  curl -X POST --compressed "https://apps.beam.cloud/77n0o" \
   -H 'Authorization: [YOUR_AUTH_TOKEN]' \
   -H 'Content-Type: application/json' \
   -d '{"video_url": "https://www.youtube.com/watch?v=adJFT6_j9Uk"}'
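The same request can be made from Python. This sketch uses the standard-library `urllib`; the URL and token are the placeholders from the cURL example and must be replaced with your own deployment’s values.

```python
import json
import urllib.request

# Placeholder values from the cURL example -- substitute your own.
API_URL = "https://apps.beam.cloud/77n0o"
AUTH_TOKEN = "[YOUR_AUTH_TOKEN]"

payload = json.dumps(
    {"video_url": "https://www.youtube.com/watch?v=adJFT6_j9Uk"}
).encode()

req = urllib.request.Request(
    API_URL,
    data=payload,
    headers={"Authorization": AUTH_TOKEN, "Content-Type": "application/json"},
    method="POST",
)

# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["pred"])
```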

The API will return a transcript of the video:

{
  "pred": " Welcome to the Pets Show. That is, Physics Explained in 10 Seconds. For the next month, in addition to Minute Physics, I'll be making one 10-second video every day. 10 Seconds of Physics explaining 5 seconds of titles on either end."
}