Basic Quickstart
Let’s jump into a simple example of a program running on Beam.
Import Beam modules

You’ll start by importing a Beam `App` and `Runtime`.

- `App` is the namespace for a project. You’ll give it a unique name as an identifier.
- Inside the `App` is a `Runtime`. The `Runtime` is a definition of the hardware your container will run on.
```python
from beam import App, Runtime

app = App(name="hello-beam", runtime=Runtime())
```
Multiply some numbers

This function multiplies two numbers. To run it on Beam, we add an `@app.run()` decorator to the function.

```python
@app.run()
def multiply_numbers():
    print("This is running remotely on Beam!")
    x = 43
    y = 177
    print(f"🔮 {x} * {y} is {x * y}")
```
Run it on Beam

To run the function, use this command, substituting the name of your file:

```shell
beam run your_file.py:multiply_numbers
```
When you run this command, the function executes in the cloud instead of on your laptop. The function will be packaged into a container and shipped to an instance with the compute requirements you’ve specified in `Runtime()`, and the logs will be streamed to your terminal.
In addition, your files in the working directory will be recursively synced to the remote environment, so you can access them while running your function.
Feel free to close your terminal window after running the command. The function will continue running asynchronously on Beam, and you can leave and return later to retrieve the task results.
Customize the container image

Now we’re going to build a more interesting example that scrapes Wikipedia and saves the page links to a text file.

This example requires some extra Python libraries, so we’ll customize our container image with `beautifulsoup4` and `requests`.
Inside your app’s `Runtime()` is where you’ll add an `Image`. An `Image` is used to customize the container image for your function.

We’ll use the `Image` to define the Python libraries we need, using the `python_packages` argument. You can also add shell commands using the `commands` field, but we’ll get to that in another section.
Beam containers have two defaults to be aware of:

- Default Container OS: Ubuntu 20.04
- Default CUDA: CUDA 12.2
```python
from beam import App, Runtime, Image, Output

app = App(
    name="web_scraper",
    runtime=Runtime(
        image=Image(
            python_packages=["requests", "beautifulsoup4"],
        ),
    ),
)
```
Save file outputs

This function saves the scraped Wikipedia links to a text file, so we need to store that file somewhere.

We’re now going to introduce another Beam concept: task `Outputs`. Outputs let you save any files created while running your function.

For this example, we’ll save our output file as `results.txt`. A new file will be created each time this function is run.
```python
@app.run(
    outputs=[Output(path="results.txt")],
)
```
Scraping logic
Here’s the actual application code that will scrape Wikipedia:
"""
These packages don't necessarily need to be installed locally.
They will be added in the container image defined below.
"""
from beam import App, Runtime, Image, Output
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
app = App(
name="web_scraper",
runtime=Runtime(
image=Image(
python_packages=["requests", "beautifulsoup4"],
),
),
)
@app.run(outputs=[Output(path="results.txt")])
def scrape_wikipedia():
url = "https://en.wikipedia.org/wiki/Main_Page"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for link in soup.find_all("a", href=True):
absolute_link = urljoin(url, link["href"])
with open("results.txt", "a") as file:
print(f"Found link: {absolute_link}")
file.write(absolute_link + "\n")
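A note on the `urljoin` call above: Wikipedia’s hrefs are mostly root-relative paths, and `urljoin` resolves them into absolute URLs. You can see how it behaves in a plain Python shell:

```python
from urllib.parse import urljoin

base = "https://en.wikipedia.org/wiki/Main_Page"

# Root-relative hrefs are resolved against the site's root
print(urljoin(base, "/wiki/Python_(programming_language)"))
# → https://en.wikipedia.org/wiki/Python_(programming_language)

# Already-absolute hrefs pass through unchanged
print(urljoin(base, "https://example.com/page"))
# → https://example.com/page
```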
To run the function, use this command, substituting whatever you named your file:

```shell
beam run your_file.py:scrape_wikipedia
```
You’ll see the scraped page links printed to your terminal, but remember: you can close your laptop and return to this later. Beam has a web dashboard you can use to view the logs and retrieve outputs from asynchronous functions.
Run it on a schedule

You might want to run this function on a schedule instead. Let’s replace the `run()` decorator with a `schedule()` decorator, which will run this function every hour.

```python
@app.schedule(
    when="every 1h",
    outputs=[Output(path="results.txt")],
)
```
To deploy the scheduled job, enter your shell and run:

```shell
beam deploy your_file.py:scrape_wikipedia
```

This function will now run once an hour.
Retrieve task outputs

This task produces `Outputs`, which we’ll want to retrieve when the task has finished running.

For this example, we’ll grab the outputs using the `/v1/task/{task_id}/status/` API. Make sure to replace the `TASK_ID` variable in the request URL with the ID created by your task.

You can find the Task ID in the shell after running your task, or in the web dashboard by clicking `App` → `Runs` → a specific run.

This request returns a `url` to the generated text file in the `outputs` object.
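As a rough sketch of what that request can look like from Python — the API host and auth header below are placeholders, not taken from this page, so check your Beam dashboard for the exact endpoint and credentials:

```python
import json
import urllib.request

# Assumed base URL -- confirm the real host in your Beam dashboard.
API_BASE = "https://api.beam.cloud"

def task_status_url(task_id: str) -> str:
    # Build the status endpoint for a given task ID
    return f"{API_BASE}/v1/task/{task_id}/status/"

def fetch_task_status(task_id: str, auth_header: str) -> dict:
    # auth_header is a placeholder; use whatever auth scheme your account requires
    req = urllib.request.Request(
        task_status_url(task_id),
        headers={"Authorization": auth_header},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Once the task has finished, the "outputs" object in the returned JSON
# contains a url pointing at the generated results.txt file.
```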
Set up task callbacks

You can also add a `callback_url` argument to receive notifications when your tasks finish running. Each time a task runs, a `POST` request will be fired to the URL provided.

```python
@app.schedule(
    when="every 1h",
    outputs=[Output(path="results.txt")],
    callback_url="https://your-server.io/beam-task-complete",
)
```
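If you want to experiment with callbacks locally, here’s a minimal sketch of a server that accepts such a `POST` request, using only the standard library. The payload fields Beam sends aren’t specified on this page, so the handler simply stores whatever JSON arrives:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # callbacks captured by the handler

class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the POST body sent when a task finishes
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        received.append(json.loads(body) if body else {})
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port: int = 8000) -> HTTPServer:
    # Start the callback server on a background thread
    server = HTTPServer(("127.0.0.1", port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In production you’d point `callback_url` at a publicly reachable HTTPS endpoint rather than a local server like this.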