Import Beam modules

You’ll start by importing Beam’s App and Runtime:

  • App is the namespace for a project. You’ll give it a unique name as an identifier.
  • Inside the App is a Runtime. The Runtime is a definition of the hardware your container will run on.
from beam import App, Runtime

app = App(name="hello-beam", runtime=Runtime())

Multiply some numbers

This function multiplies two numbers.

To run it on Beam, add the @app.run() decorator to the function.

@app.run()
def multiply_numbers():
    print("This is running remotely on Beam!")
    x = 43
    y = 177
    print(f"🔮 {x} * {y} is {x * y}")

Run it on Beam

To run the function, run this command, substituting the name of your file:

beam run your_file.py:multiply_numbers

When you run this command, the function runs in the cloud instead of on your laptop. It will be packaged into a container, shipped onto an instance with the compute requirements you’ve specified in Runtime(), and the logs will be streamed to your terminal.

In addition, your files in the working directory will be recursively synced to the remote environment, so you can access them while running your function.

Feel free to close your terminal window after running the command. The function will continue running asynchronously on Beam, and you can leave and return later to retrieve the task results.

Customize the container image

Now we’re going to build a more interesting example that scrapes Wikipedia, and saves the page links to a text file.

This example requires some extra Python libraries, so we’ll customize our container image with beautifulsoup4 and requests.

Inside your app’s Runtime() is where you’ll add an Image. An Image is used to customize the container image for your function.

We’ll use the Image to define the Python libraries we need, using the python_packages argument. You can also add shell commands, using the commands field, but we’ll get to that in another section.

Beam containers have two defaults to be aware of:

  • Default container OS: Ubuntu 20.04
  • Default CUDA version: CUDA 12.2

from beam import App, Runtime, Image, Output

app = App(
    name="web_scraper",
    runtime=Runtime(
        image=Image(
            python_packages=["requests", "beautifulsoup4"],
        ),
    ),
)

Save file outputs

This function will save the scraped Wikipedia links to a text file, so we need to save that file somewhere.

We’re now going to introduce another Beam concept: task Outputs. Outputs let you save any files created while running your function.

For this example, we’ll save our output file as results.txt. A new file will be created each time this function is run.

@app.run(
    outputs=[Output(path="results.txt")],
)

Scraping logic

Here’s the actual application code that will scrape Wikipedia:

"""
These packages don't necessarily need to be installed locally.
They will be added in the container image defined below.
"""
from beam import App, Runtime, Image, Output

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

app = App(
    name="web_scraper",
    runtime=Runtime(
        image=Image(
            python_packages=["requests", "beautifulsoup4"],
        ),
    ),
)


@app.run(outputs=[Output(path="results.txt")])
def scrape_wikipedia():
    url = "https://en.wikipedia.org/wiki/Main_Page"

    response = requests.get(url)
    soup = BeautifulSoup(response.text, "html.parser")

    # Open the output file once, then append each link as it's found
    with open("results.txt", "a") as file:
        for link in soup.find_all("a", href=True):
            absolute_link = urljoin(url, link["href"])
            print(f"Found link: {absolute_link}")
            file.write(absolute_link + "\n")

To run the function, run this command, replacing your_file.py with the name of your file:

beam run your_file.py:scrape_wikipedia

You’ll see the scraped page links printed to your terminal, but remember - you can close your laptop and return to this later. Beam has a web dashboard which you can use to view the logs and retrieve outputs from asynchronous functions.

Run it on a schedule

You might want to run this function on a schedule instead. Let’s replace the run() decorator with a schedule() decorator, which will run this function every hour.

@app.schedule(
    when="every 1h",
    outputs=[Output(path="results.txt")],
)

To deploy the scheduled job, enter your shell and run:

beam deploy your_file.py:scrape_wikipedia 

This function will now run once an hour.

Retrieve task outputs

This task produces Outputs which we’ll want to retrieve when the task has finished running.

For this example, we’ll grab the outputs using the /v1/task/{task_id}/status/ API below. Make sure to replace the TASK_ID variable in the request URL with the ID created by your task.

You can find the Task ID in the shell after running your task, or in the web dashboard by clicking App -> Runs -> a specific run.

This request returns a URL to the generated text file in the outputs object.
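As a sketch, here’s how you might call the status endpoint and pull the output file URL out of the response. The host, auth scheme, and exact response shape are assumptions — inspect a real response from your dashboard to confirm the field names.

```python
import json
import urllib.request


def extract_output_urls(status_payload: dict) -> list:
    """Pull output-file URLs out of a task status response.

    The `outputs` shape is an assumption -- check a real response to confirm.
    """
    return [output["url"] for output in status_payload.get("outputs", [])]


def get_task_status(task_id: str, token: str) -> dict:
    """Call the status endpoint for a task (host and auth header are placeholders)."""
    request = urllib.request.Request(
        f"https://api.beam.cloud/v1/task/{task_id}/status/",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Example payload, roughly the shape the endpoint might return:
sample = {
    "task_id": "abc123",
    "status": "COMPLETE",
    "outputs": [{"name": "results.txt", "url": "https://example.com/results.txt"}],
}
print(extract_output_urls(sample))  # ['https://example.com/results.txt']
```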

Setup task callbacks

You can also add a callback_url argument to receive notifications when your tasks finish running. Each time a task runs, a POST request will be fired to the URL provided.

@app.schedule(
    when="every 1h",
    outputs=[Output(path="results.txt")],
    callback_url="https://your-server.io/beam-task-complete"
)
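A minimal sketch of a server that could receive these callbacks, using Python’s standard library. The payload field names (task_id, status) are assumptions — log whatever arrives and inspect it before relying on specific keys.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(payload: dict) -> str:
    """Turn a callback payload into a one-line log message.

    Field names here are assumptions; inspect a real payload to confirm.
    """
    return f"Task {payload.get('task_id', '?')} finished with status {payload.get('status', '?')}"


class BeamCallbackHandler(BaseHTTPRequestHandler):
    """Accept the POST Beam fires when a task finishes."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print(summarize_callback(payload))
        self.send_response(200)
        self.end_headers()


# To serve on port 8000 and point callback_url at this host:
# HTTPServer(("", 8000), BeamCallbackHandler).serve_forever()
```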
