# Building AI Agents
Source: https://docs.beam.cloud/v2/agents/introduction
Beam is launching a new type of agent framework that is stateful and has concurrency built-in.
* Multi-task
* Automatically synchronize state
* Run each task in an isolated environment
* Scale to 100s of GPUs
* Sandboxed compute environments
* Concurrency
* Task management and queuing
* Edge deployment and autoscaling
* Authentication
* Lots of GPUs
## Introduction
Today, most agent frameworks are based on directed acyclic graphs (DAGs). While useful for simple tasks, this limits you to performing one action at a time (i.e. using one tool at a time).
Beam uses a new agentic concurrency model, based on [*Petri nets*](https://en.wikipedia.org/wiki/Petri_net), which can model complex workflows where many tasks run concurrently.
By combining this agent framework with Beam's cloud compute, you can build powerful, parallelized applications.
## Core Concepts
Our agent framework has three important components: locations, transitions, and markers.
For this example, suppose we're modeling an eCommerce store.
* **Locations** -- these are specific *states* or *conditions* that hold markers (called tokens in Petri net terminology). For example, `in_shopping_cart` or `in_queue`.
* **Transitions** -- events or actions that cause state changes. For example, `place_order` or `accept_payment`. Each transition has:
  * **Inputs** -- the locations the data is consumed from.
  * **Outputs** -- the locations the data is sent to.
* **Markers** -- the typed data that moves between locations. For example, `order_12345_red_shoes`.
Think of a Petri net as a factory assembly line, where parts (markers) move
between workstations (locations), and tasks (transitions) are performed when
all required parts are in place.
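To make the firing rule concrete, here is a minimal sketch of a Petri net in plain Python. It does not use Beam at all: `can_fire`, `fire`, and the `marking` dictionary are illustrative names chosen for this sketch, and markers are represented as plain strings.

```python
from collections import defaultdict


def can_fire(marking, inputs):
    """A transition is enabled when every input location holds enough markers."""
    return all(len(marking[loc]) >= count for loc, count in inputs.items())


def fire(marking, inputs, outputs, action):
    """Consume markers from the input locations, run the action, and
    place the markers it returns into the output locations."""
    if not can_fire(marking, inputs):
        return False
    consumed = {
        loc: [marking[loc].pop(0) for _ in range(count)]
        for loc, count in inputs.items()
    }
    produced = action(consumed)
    for loc in outputs:
        marking[loc].extend(produced.get(loc, []))
    return True


# Locations hold markers; here each marker is just an order ID string.
marking = defaultdict(list)
marking["in_shopping_cart"].append("order_12345_red_shoes")


def place_order(consumed):
    # Move the order from the cart into the queue.
    return {"in_queue": consumed["in_shopping_cart"]}


fired = fire(marking, {"in_shopping_cart": 1}, ["in_queue"], place_order)
fired_again = fire(marking, {"in_shopping_cart": 1}, ["in_queue"], place_order)
```

The second call returns `False` because the cart is now empty: a transition can only run while its input locations hold the markers it needs.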
## Initial Setup
Let's set up a simple `hello world` chatbot. This chatbot will respond to messages from a user. It will ask the user for their name and attempt to update the status of their order.
### Pre-requisites
* A free Beam account, and [Beam installed on your computer](/v2/getting-started/installation)
* An [OpenAI API key](https://platform.openai.com/api-keys)
### Hello World
We'll start by creating a `bot`. The `transition` and the initial state `markers` are added in the sections that follow:
```python theme={null}
from beam import Bot

# Create the bot -- make sure to add your own OpenAI API key
bot = Bot(
    model="gpt-4o",
    api_key="YOUR_OPENAI_API_KEY",
    locations=[],
    description="A simple bot to cancel orders.",
)
```
## Managing State
Now we'll add `locations` and `markers`, which represent *state*.
```python app.py {2,6-11,19-20} theme={null}
from beam import Bot, BotContext, BotLocation
from pydantic import BaseModel


# Marker states for the bot
class UserName(BaseModel):
    name: str


class OrderStatus(BaseModel):
    message: str


# Create the bot -- make sure to add your own OpenAI API key
bot = Bot(
    model="gpt-4o",
    api_key="YOUR_OPENAI_API_KEY",
    locations=[
        BotLocation(marker=UserName),
        BotLocation(marker=OrderStatus),
    ],
    description="A simple bot to cancel orders.",
)
```
## Adding Transitions
Let's add our first **transition**. A transition is a state change. It consumes a marker from our `UserName` location and places a marker in the `OrderStatus` location.
```python app.py {27-33} theme={null}
from beam import Bot, BotContext, BotLocation
from pydantic import BaseModel


# Marker states for the bot
class UserName(BaseModel):
    name: str


class OrderStatus(BaseModel):
    message: str


# Create the bot -- make sure to add your own OpenAI API key
bot = Bot(
    model="gpt-4o",
    api_key="YOUR_OPENAI_API_KEY",
    locations=[
        BotLocation(marker=UserName),
        BotLocation(marker=OrderStatus),
    ],
    description="A simple bot to cancel orders.",
)


# This transition prompts the user for their name and cancels their orders
@bot.transition(
    cpu=1,
    memory=128,
    inputs={UserName: 1},
    outputs=[OrderStatus],
    description="Cancels a user's order.",
)
def cancel_order(context: BotContext, inputs):
    ...  # The body is added in the next section
```
## Interacting with User Input
Let's add basic logic in the transition. We'll accept a username, and update a dict with the user's order status.
### Adding Prompts
We'll introduce a new concept, called `context`, which is a class that provides various convenience methods for your bot.
```python app.py {14-15, 37-46} theme={null}
from beam import Bot, BotContext, BotLocation
from pydantic import BaseModel


# Marker states for the bot
class UserName(BaseModel):
    name: str


class OrderStatus(BaseModel):
    message: str


# Hardcoded user data (mock database)
USER_DATA = {"Alice": "processing", "Bob": "shipped"}

# Create the bot -- make sure to add your own OpenAI API key
bot = Bot(
    model="gpt-4o",
    api_key="YOUR_OPENAI_API_KEY",
    locations=[
        BotLocation(marker=UserName),
        BotLocation(marker=OrderStatus),
    ],
    description="A simple bot to cancel orders.",
)


# This transition prompts the user for their name and cancels their orders
@bot.transition(
    cpu=1,
    memory=128,
    inputs={UserName: 1},
    outputs=[OrderStatus],
    description="Cancels a user's order.",
)
def cancel_order(context: BotContext, inputs):
    # Get the name provided by the user
    user_name = inputs[UserName][0].name

    # Update the user's order status
    USER_DATA[user_name] = "cancelled"

    # Send a message in the chat
    context.say(f"Order cancelled for {user_name}")

    # Return a marker state
    return {OrderStatus: [OrderStatus(message="order_cancelled")]}
```
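Because the transition body is ordinary Python, its logic can be exercised locally before anything is deployed. The sketch below assumes you call the undecorated function directly. `FakeContext` is a hypothetical test double written for this example (it is not part of Beam), and dataclasses stand in for the pydantic markers so the snippet has no third-party dependencies.

```python
from dataclasses import dataclass


@dataclass
class UserName:  # stand-in for the pydantic marker in the tutorial
    name: str


@dataclass
class OrderStatus:  # stand-in for the pydantic marker in the tutorial
    message: str


class FakeContext:
    """Hypothetical test double: records messages instead of sending them."""

    def __init__(self):
        self.messages = []

    def say(self, text):
        self.messages.append(text)


USER_DATA = {"Alice": "processing", "Bob": "shipped"}


def cancel_order(context, inputs):
    # Same body as the transition above, without the Beam decorator.
    user_name = inputs[UserName][0].name
    USER_DATA[user_name] = "cancelled"
    context.say(f"Order cancelled for {user_name}")
    return {OrderStatus: [OrderStatus(message="order_cancelled")]}


ctx = FakeContext()
result = cancel_order(ctx, {UserName: [UserName(name="Alice")]})
```

After the call, `USER_DATA["Alice"]` is `"cancelled"`, `ctx.messages` holds the chat message, and `result` maps `OrderStatus` to a single marker, which mirrors the shape the transition returns.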
### Human-in-the-loop
We can also add a confirmation prompt, so that user input is required before the bot can proceed to the next step.
Let's add the `confirm` flag to the transition:
```python app.py {48-51} theme={null}
@bot.transition(
    cpu=1,
    memory=128,
    inputs={UserName: 1},
    outputs=[OrderStatus],
    description="Cancels a user's order.",
    confirm=True,
)
def cancel_order(context: BotContext, inputs):
    ...  # Body unchanged from the previous step
```
The bot will only proceed if the user confirms the request.
### Adding Transitions
Let's add a second transition, which will issue a refund to the user after they cancel their order.
This transition will fire when an `OrderStatus` marker is created.
```python theme={null}
class RefundStatus(BaseModel):
    message: str


@bot.transition(
    cpu=1,
    memory=128,
    inputs={OrderStatus: 1},
    outputs=[RefundStatus],
    description="Offers a refund to the user after cancelling their order.",
    expose=False,  # The bot won't take this into account when asking the user for input
)
def offer_refund(context: BotContext, inputs):
    # Process the refund (mock logic)
    refund_message = "Your refund for the cancelled order has been processed. You should see it in your account within 3-5 business days."

    # Send a message in the chat
    context.say(refund_message)
```
You'll notice that we're not returning a marker, because this transition marks the end of our workflow. After this transition runs, there's no state left to update.
## Advanced Usage
### Controlling Bot Awareness
Based on the [system prompt](https://github.com/beam-cloud/beta9/blob/main/pkg/abstractions/experimental/bot/prompt.yaml), the bot automatically knows about all the locations and transitions defined in the network. This means that the bot will understand its role based on the data you add to your transitions.
However, you might not want the bot to know about certain transitions or locations.
If you want certain things hidden from the bot's context, you can pass `expose=False` to locations and transitions.
Think of hidden transitions as 'backstage actions' -- users can still interact with them, but the bot won't take them into account in its reasoning.
```python {7} theme={null}
@bot.transition(
    cpu=1,
    memory=128,
    inputs={OrderStatus: 1},
    outputs=[RefundStatus],
    description="Offers a refund to the user after cancelling their order.",
    expose=False,  # This prevents the bot from using this transition in its reasoning
)
def offer_refund(context: BotContext, inputs):
    ...  # Body unchanged from the previous section
```
### Using Context Commands
We provide a number of helper commands through a class called `context`.
Context methods can be used for prompting the user for input, creating blocking requests to the bot, and sending messages to the user.
***Available Commands***
| Method | Description |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `context.confirm()` | Pause a transition until a user says yes or no. |
| `context.prompt()` | Send a blocking or non-blocking request to the model (e.g., "summarize these reviews"). You can pass an optional `wait_for_response=False` boolean to make this non-blocking. |
| `context.remember()` | Add an arbitrary JSON-serializable object to the conversation memory. |
| `context.say()` | Output text to the user's chat window. |
| `context.send_file()` | Send a file to the user created during a transition. |
| `context.get_file()` | Retrieve a file from the user during a transition. |
## Development Workflow
### Testing
We'll start by running the bot from our shell, as a temporary development server.
```sh theme={null}
$ beam serve app.py:bot
```
This command will spin up a container in the cloud for the bot `transition`, and create an interactive dialogue in your shell.
```sh theme={null}
=> Building image
=> Using cached image
=> Syncing files
=> Files synced
=> Invocation details
websocat 'wss://979563f1-b569-4f2c-8113-dc6ebca007d1.app.beam.cloud'
=> Session started: 062a4f
=> Chat with your bot below...
```
You can interact with your bot by typing into the shell. In your shell, you'll see responses from the bot, as well as event logs from each transition that fires.
```sh theme={null}
{
"type": "agent_message",
"value": "Hello! How can I assist you today? If you would like to cancel an order, please provide the user's name to get started.",
"metadata": {
"request_id": "b2f58de9-7c93-4eda-9a8d-fc42d4e40561",
"session_id": "062a4f"
}
}
# hi - please cancel alice's order
#
{
"type": "agent_message",
"value": "I've noted the request to cancel Alice's order. If there's anything else you need, just let me know!",
"metadata": {
"request_id": "93c0f077-b663-46a0-b110-d63376c8821f",
"session_id": "062a4f"
}
}
{
"type": "transition_fired",
"value": "cancel_order",
"metadata": {
"session_id": "062a4f",
"task_id": "bf45f661-daaa-4d70-83f1-0479587fafe9",
"transition_name": "cancel_order"
}
}
{
"type": "transition_started",
"value": "cancel_order",
"metadata": {
"session_id": "062a4f",
"task_id": "bf45f661-daaa-4d70-83f1-0479587fafe9",
"transition_name": "cancel_order"
}
}
{
"type": "agent_message",
"value": "\u2705 Order cancelled for alice",
"metadata": {
"session_id": "062a4f",
"task_id": "bf45f661-daaa-4d70-83f1-0479587fafe9",
"transition_name": "cancel_order"
}
}
{
"type": "transition_completed",
"value": "cancel_order",
"metadata": {
"session_id": "062a4f",
"task_id": "bf45f661-daaa-4d70-83f1-0479587fafe9",
"transition_name": "cancel_order"
}
}
```
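If you capture these events programmatically, a small helper can reconstruct the lifecycle of each task. The sketch below assumes every event has already been parsed into a dictionary with the `type` and `metadata` fields shown in the log above. `lifecycle_by_task` is an illustrative name, and how you receive the events is outside the scope of this sketch.

```python
from collections import defaultdict


def lifecycle_by_task(events):
    """Map each task_id to the ordered list of transition_* event types seen."""
    tasks = defaultdict(list)
    for event in events:
        task_id = event.get("metadata", {}).get("task_id")
        if task_id and event["type"].startswith("transition_"):
            tasks[task_id].append(event["type"])
    return dict(tasks)


# Events shaped like the ones in the session log above (values shortened).
events = [
    {"type": "agent_message", "value": "Hello!",
     "metadata": {"request_id": "b2f58de9", "session_id": "062a4f"}},
    {"type": "transition_fired", "value": "cancel_order",
     "metadata": {"session_id": "062a4f", "task_id": "bf45f661"}},
    {"type": "transition_started", "value": "cancel_order",
     "metadata": {"session_id": "062a4f", "task_id": "bf45f661"}},
    {"type": "transition_completed", "value": "cancel_order",
     "metadata": {"session_id": "062a4f", "task_id": "bf45f661"}},
]

lifecycle = lifecycle_by_task(events)
```

Messages from the model carry a `request_id` rather than a `task_id`, so they are skipped; each task ends up with its `transition_fired`, `transition_started`, and `transition_completed` events in order.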
### Deployment
```
$ beam deploy app.py:bot --name order-bot
```
You can log in to the Beam Dashboard and use the web UI to chat with your bot, view the network graph, and view the event logs for each task.
## Creating Public Chatbots
You can also create sharable pages for your chatbot by adding `authorized=False` to your `bot`:
```python app.py {11} theme={null}
from beam import Bot, BotContext, BotLocation

bot = Bot(
    model="gpt-4o",
    api_key="YOUR_OPENAI_API_KEY",
    locations=[
        BotLocation(marker=UserName),
        BotLocation(marker=OrderStatus),
    ],
    description="A simple bot to cancel orders.",
    authorized=False,
)
```
When deployed, you can access a public URL for your bot.
# Example: Research Assistant
Source: https://docs.beam.cloud/v2/agents/synchronization
Beam's agent framework is designed for concurrency and synchronization. In this example, we'll show how you can deploy an app that scrapes online product reviews.
## Why Beam?
Beam's Petri Net framework is ideal for workflows that require concurrency and scalability. This app uses Beam to:
* **Retrieve Google Shopping URLs** for a product name you provide to the bot.
* **Scrape review pages** for those products.
* **Summarize reviews** into a report.
## Pre-requisites
You'll need three API keys to run the example below:
* [Firecrawl API key](https://docs.firecrawl.dev/introduction) (free), used for scraping product pages
* [SerpApi API key](https://serpapi.com/) (free for 100 searches a month), used to retrieve Google Shopping URLs
* [OpenAI API Key](https://platform.openai.com/docs/quickstart)
Set up your environment variables by adding these keys to a `.env` file in your project directory.
```
OPEN_AI_API_KEY=your_openai_api_key
SERPAPI_API_KEY=your_serpapi_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
```
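A missing key otherwise only shows up later as a failed API call, so it can help to check for all three at startup. This is a minimal sketch using only the standard library: `require_env` is a hypothetical helper, not part of Beam or python-dotenv.

```python
import os


def require_env(*names):
    """Return the named environment variables, failing fast if any are unset."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

After loading your `.env` file, you could call `require_env("OPEN_AI_API_KEY", "SERPAPI_API_KEY", "FIRECRAWL_API_KEY")` once and read the keys from the returned dictionary.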
## Setup
### Defining Locations
Locations represent the states of data flowing through the network. In this app, we'll use three states:
* **ProductName**: The product to search for (e.g., "headphones")
* **URL**: URLs of product pages retrieved from Google Shopping
* **ReviewPage**: Online product pages with customer reviews
Define these locations in your code:
```python theme={null}
from pydantic import BaseModel


class ProductName(BaseModel):
    product_name: str


class URL(BaseModel):
    url: str


class ReviewPage(BaseModel):
    review_page: str
```
### Create the Bot
Let's set up the bot, which manages the workflow. Add your API keys and define the locations (states) it will manage.
```python theme={null}
from beam import Bot, BotLocation

bot = Bot(
    model="gpt-4o",
    api_key=OPEN_AI_API_KEY,
    locations=[
        BotLocation(marker=ProductName),
        BotLocation(marker=URL, expose=False),
        BotLocation(marker=ReviewPage, expose=False),
    ],
    description="This bot will take a product category as input, search for reviews, and summarize them.",
)
```
## Adding Transitions
Transitions are events or actions in your bot, triggered by changes to the locations (state).
### Retrieve Product URLs
The first transition takes a product category (e.g., "headphones") and uses SerpApi to retrieve Google Shopping URLs for the product.
```python theme={null}
from beam import Image
from serpapi import GoogleSearch


@bot.transition(
    inputs={ProductName: 1},
    outputs=[URL],
    description="Retrieve Google Shopping results for a product.",
    cpu=1,
    memory=128,
    # Same package list as the full example at the end of this page
    image=Image(python_packages=["serpapi", "google-search-results", "python-dotenv"]),
)
def get_product_urls(context, inputs):
    product_name = inputs[ProductName][0].product_name

    params = {
        "engine": "google_shopping",
        "q": product_name,
        "api_key": SERPAPI_API_KEY,
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    urls = results["shopping_results"][:3]

    # Emit one URL marker per product link
    return {URL: [URL(url=url["product_link"]) for url in urls]}
```
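The transition above indexes straight into `results["shopping_results"]`, which raises a `KeyError` if the search response has no such key. A more defensive extraction might look like the sketch below. `extract_product_links` is a hypothetical helper, and the response shape is assumed to match the fields used above (`shopping_results` entries with a `product_link`).

```python
def extract_product_links(results, limit=3):
    """Pull up to `limit` product links out of a search response dict."""
    links = []
    for item in results.get("shopping_results", []):
        link = item.get("product_link")
        if link:
            links.append(link)
        if len(links) == limit:
            break
    return links


# A response shaped like the one the transition reads (values are made up).
sample = {
    "shopping_results": [
        {"product_link": "https://example.com/a"},
        {"title": "entry without a link"},
        {"product_link": "https://example.com/b"},
    ]
}
links = extract_product_links(sample)
```

Entries without a link are skipped, and an empty or missing result list yields an empty list rather than an exception.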
### Scrape Review Pages
The second transition scrapes review pages from each product URL using Firecrawl.
```python theme={null}
from firecrawl import FirecrawlApp
import json


@bot.transition(
    inputs={URL: 1},
    outputs=[ReviewPage],
    description="Scrape review pages for product URLs.",
    cpu=1,
    memory=128,
    image=Image(python_packages=["firecrawl-py", "python-dotenv"]),
    expose=False,
)
def scrape_reviews(context, inputs):
    url = inputs[URL][0].url

    app = FirecrawlApp(api_key=FIRECRAWL_API_KEY)
    scrape_result = app.scrape_url(url, params={"formats": ["markdown"]})

    return {ReviewPage: [ReviewPage(review_page=json.dumps(scrape_result))]}
```
### Summarize Reviews
The final transition summarizes reviews from all the scraped pages into a markdown file.
Pay close attention to the `inputs` field below. **This transition will not begin running until 3 `ReviewPage` markers have been created from the previous transition.**
```python {2} theme={null}
@bot.transition(
    inputs={ReviewPage: 3},
    outputs=[],
    description="Summarize product reviews.",
    cpu=1,
    memory=128,
    image=Image(python_packages=["python-dotenv"]),
    expose=False,
)
def summarize_reviews(context, inputs):
    # Join the three scraped pages into a single block of text
    all_review_pages = "\n".join([page.review_page for page in inputs[ReviewPage]])

    prompt = f"""
    The following pages contain markdown reviews for products.
    Summarize the key takeaways, including 1-3 direct quotes from reviewers.
    Ensure the product name and URL are included:
    {all_review_pages}
    """

    event = context.prompt(msg=prompt, timeout_seconds=30)
    summary = event.value

    file_path = "/tmp/product-reviews.md"
    with open(file_path, "w") as f:
        f.write(summary)

    context.say("Product reviews summarized successfully!")
    if context.confirm(description="Do you want a sharable link to the summary?"):
        context.send_file(path=file_path, description="Product Review Summary")
```
Once deployed, you'll be able to see the tasks in the dashboard, with the transition waiting until all `ReviewPage` markers have been emitted.
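The fan-out and fan-in in this workflow can be traced with a small simulation. This is a plain-Python sketch under simplifying assumptions: it runs sequentially (Beam runs the scrapes concurrently), it does not call Beam or any external API, and `run_pipeline` is an illustrative name.

```python
def run_pipeline(product, links, required=3):
    """Trace which transitions would fire for a given list of product links."""
    log = []

    # get_product_urls: one ProductName marker becomes one URL marker per link
    urls = list(links)
    log.append(f"get_product_urls({product}) -> {len(urls)} URL markers")

    pages = []
    for url in urls:
        # scrape_reviews fires once per URL marker (inputs={URL: 1})
        pages.append(f"page for {url}")
        log.append(f"scrape_reviews({url}) -> {len(pages)} ReviewPage markers")

        # summarize_reviews needs `required` markers (inputs={ReviewPage: 3})
        if len(pages) == required:
            log.append(f"summarize_reviews fired with {required} pages")
            pages = []
    return log


full_run = run_pipeline("headphones", ["url-1", "url-2", "url-3"])
short_run = run_pipeline("headphones", ["url-1", "url-2"])
```

In `full_run` the last entry records the summary firing after the third page arrives. In `short_run` only two pages are produced, so the summary transition never becomes enabled; this is worth keeping in mind if the search returns fewer than three results.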
## Deploying the Bot
```sh theme={null}
$ beam deploy app.py:bot --name product-review-bot
```
Deploying the bot gives you access to a dashboard, where you can interact with the bot using a Chat UI.
## What's next?
With the bot deployed, there are a few things you can try:
### Create a Public Chat Page
You can create a public, sharable Chat Page for your bot by adding an `authorized=False` argument to the `bot`:
```python {6} theme={null}
from beam import Bot, BotLocation

bot = Bot(
    model="gpt-4o",
    api_key=OPEN_AI_API_KEY,
    authorized=False,
    locations=[
        BotLocation(marker=ProductName),
        BotLocation(marker=URL, expose=False),
        BotLocation(marker=ReviewPage, expose=False),
    ],
    description="This bot will take a product category as input, search for reviews, and summarize them.",
)
```
When deployed, this gives you a sharable Chat UI. You can retrieve the URL to the Chat UI by clicking next to the "lock" icon.
### Add Interactivity
We provide a number of helper commands through a class called `context`.
Context methods can be used for prompting the user for input, creating blocking requests to the bot, and sending messages to the user.
***Available Commands***
| Method | Description |
| --------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `context.confirm()` | Pause a transition until a user says yes or no. |
| `context.prompt()` | Send a blocking or non-blocking request to the model (e.g., "summarize these reviews"). You can pass an optional `wait_for_response=False` boolean to make this non-blocking. |
| `context.remember()` | Add an arbitrary JSON-serializable object to the conversation memory. |
| `context.say()` | Output text to the user's chat window. |
| `context.send_file()` | Send a file to the user from a transition. |
| `context.get_file()` | Retrieve a file from the user during a transition. |
## View The Code
You can see the full code for this example below.
```python theme={null}
from beam import Bot, BotContext, BotLocation, Image
from pydantic import BaseModel
from dotenv import load_dotenv
import os

load_dotenv()

OPEN_AI_API_KEY = os.getenv("OPEN_AI_API_KEY")
SERPAPI_API_KEY = os.getenv("SERPAPI_API_KEY")
FIRECRAWL_API_KEY = os.getenv("FIRECRAWL_API_KEY")
NUMBER_OF_PRODUCT_REVIEWS_TO_SUMMARIZE = 3


# Define Locations (States)
class ProductName(BaseModel):
    product_name: str


class URL(BaseModel):
    url: str


class ReviewPage(BaseModel):
    review_page: str


# Create the Bot
bot = Bot(
    model="gpt-4o",
    api_key=OPEN_AI_API_KEY,
    locations=[
        BotLocation(marker=ProductName),
        BotLocation(marker=URL, expose=False),
        BotLocation(marker=ReviewPage, expose=False),
    ],
    description="This bot will take a product category as input (e.g. 'headphones') and search Google Shopping for those products, look up reviews for each of them, and then summarize the reviews of all products in a summary.",
)


# Transition 1: Retrieve 3 Google Shopping URLs for each product
@bot.transition(
    inputs={ProductName: 1},
    outputs=[URL],
    description="Takes a product name and retrieves 3 Google Shopping results",
    cpu=1,
    memory=128,
    image=Image(python_packages=["serpapi", "google-search-results", "python-dotenv"]),
)
def get_product_urls(context: BotContext, inputs):
    product_name = inputs[ProductName][0].product_name

    from serpapi import GoogleSearch

    params = {
        "engine": "google_shopping",
        "q": product_name,
        "api_key": SERPAPI_API_KEY,
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    urls = results["shopping_results"][:NUMBER_OF_PRODUCT_REVIEWS_TO_SUMMARIZE]

    # Return one URL marker per product
    return {URL: [URL(url=url["product_link"]) for url in urls]}


# Transition 2: Scrape review page
@bot.transition(
    inputs={URL: 1},
    outputs=[ReviewPage],
    description="Scrapes the review page for each URL provided.",
    cpu=1,
    memory=128,
    image=Image(python_packages=["firecrawl-py", "python-dotenv"]),
    expose=False,
)
def scrape_reviews(context: BotContext, inputs):
    url = inputs[URL][0].url

    import json

    from firecrawl import FirecrawlApp

    app = FirecrawlApp(api_key=FIRECRAWL_API_KEY)

    # Scrape reviews from the product page
    scrape_result = app.scrape_url(url, params={"formats": ["markdown"]})
    print(scrape_result)

    return {ReviewPage: [ReviewPage(review_page=json.dumps(scrape_result))]}


# Transition 3: Summarize the product reviews
@bot.transition(
    inputs={ReviewPage: NUMBER_OF_PRODUCT_REVIEWS_TO_SUMMARIZE},
    outputs=[],
    description="Summarizes the reviews.",
    cpu=1,
    memory=128,
    image=Image().add_python_packages(["python-dotenv"]),
    expose=False,
)
def summarize_reviews(context: BotContext, inputs):
    try:
        all_review_pages = "\n".join(
            [page.review_page for page in inputs[ReviewPage]]
        )
        print(all_review_pages)

        prompt = f"""
        The following page contains markdown with a review for a product.
        Please highlight the key takeaways from all the reviews,
        and include 1-3 direct quotes from reviewers to support your points.
        In each quote, make sure to cite the name of the reviewer (if available).
        Make sure to include the name of the product, and a URL to buy it, in your response:
        {all_review_pages}
        """

        event = context.prompt(
            msg=prompt,
            timeout_seconds=30,
        )

        context.say("I've summarized product reviews like so: " + event.value)

        file_path = "/tmp/product-reviews.md"
        with open(file_path, "w") as f:
            f.write(event.value)

        if context.confirm(description="Do you want a sharable link to the summary?"):
            context.send_file(path=file_path, description="Summary of product reviews")
    except AttributeError:
        context.say("Review not found.")
```
# Mounting S3 Buckets
Source: https://docs.beam.cloud/v2/data/external-storage
Attach S3 buckets to your apps
Beam allows you to mount your own S3 buckets to your apps. Buckets are mounted using AWS's [mountpoint-s3](https://github.com/awslabs/mountpoint-s3). In general, any provider with an S3-compatible API should work. For instance, [AWS S3](https://aws.amazon.com/s3/), [Cloudflare R2](https://www.cloudflare.com/developer-platform/r2/), and [Tigris](https://www.tigrisdata.com/) all work out of the box.
Mountpoint is optimized for reading large files with high throughput and writing new files from a single client at a time. It does not provide full POSIX compliance. For instance, it does not support appending to files.
Cloud buckets allow you to expose your own S3-compatible storage as a file
system.
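Because appending is not supported, a common workaround is to write each chunk of data as a new file and concatenate the chunks when reading. The sketch below is a general pattern, not a Beam or Mountpoint API: `write_part` and `read_parts` are hypothetical helpers, and `directory` stands for any path inside your mounted bucket.

```python
import os
import uuid


def write_part(directory, sequence, data):
    """Write `data` as a brand-new file instead of appending to an existing one."""
    # Zero-padded sequence numbers keep the parts sortable by name;
    # the random suffix avoids collisions between writers.
    name = f"part-{sequence:06d}-{uuid.uuid4().hex}.txt"
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(data)
    return path


def read_parts(directory):
    """Concatenate all part files in sequence order."""
    chunks = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("part-"):
            with open(os.path.join(directory, name)) as f:
                chunks.append(f.read())
    return "".join(chunks)
```

For example, calling `write_part(path, 1, "hello ")` followed by `write_part(path, 2, "world")` produces two objects, and `read_parts(path)` returns `"hello world"`.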
### Mounting an S3 Bucket
External S3 buckets are special cases of the [volume abstraction](/v2/data/volume) with some extra configuration. To connect your bucket to your app, you need to provide the following:
1. The bucket name
2. An AWS access key
3. An AWS secret key
4. An S3 endpoint (if you're using a non-AWS S3-compatible provider)
The access key and secret key must have permission to read, write, and list objects in the bucket. Within AWS, you can use [IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html) to control these permissions. Other S3-compatible providers (like Cloudflare R2 and Tigris) often provide these keys when you sign up for their service.
You will need to store the access key and secret key in Beam's [secret manager](/v2/environment/secrets). You can do this using the Beam CLI:
```bash theme={null}
beam secret create S3_KEY "your-access-key"
beam secret create S3_SECRET "your-secret-key"
```
When a request is received to start the container, Beam looks up these secrets and uses them to mount the bucket. This means that you can use any names you like for the secrets.
The secrets' names must match the values you enter in the `CloudBucketConfig`.
The endpoint is optional. If you're using AWS S3, you can omit it, but if you're using a non-AWS S3-compatible provider, you will need to provide it.
```python theme={null}
from beam import CloudBucket, CloudBucketConfig, function

mount_path = "./weights"
weights = CloudBucket(
    name="weights",
    mount_path=mount_path,
    config=CloudBucketConfig(
        access_key="S3_KEY",
        secret_key="S3_SECRET",
    ),
)


@function(volumes=[weights])
def sandbox():
    import os
    import uuid

    # Write to the bucket.
    file_name = f"{uuid.uuid4()}.txt"
    file_path = os.path.join(weights.mount_path, file_name)
    try:
        with open(file_path, "w") as f:
            f.write("hello world")
    except Exception as e:
        print(e)

    # Read from the bucket.
    with open(file_path, "r") as f:
        print(f.read())


if __name__ == "__main__":
    sandbox.remote()
```
The `name` field in the `CloudBucket` constructor must be the name of the
bucket you created in the cloud provider.
### Read Only Buckets
You can mount your bucket as read only by setting the `read_only` flag to `True`. This will prevent any writes to the bucket.
```python theme={null}
weights = CloudBucket(
    name="weights",
    mount_path="./weights",
    config=CloudBucketConfig(
        access_key="S3_KEY",
        secret_key="S3_SECRET",
        read_only=True,
    ),
)
```
### Specifying a region
You can specify a region for your bucket by setting the `region` field in the `CloudBucketConfig`. This option can be important when mounting AWS buckets.
From mountpoint's [documentation](https://github.com/awslabs/mountpoint-s3/blob/main/doc/CONFIGURATION.md#region-detection):
> Amazon S3 buckets are associated with a single AWS Region. Mountpoint attempts to automatically detect the region for your S3 bucket at startup time and directs all S3 requests to that region. However, in some scenarios like cross-region mount with a directory bucket, this region detection may fail, preventing your bucket from being mounted and displaying Access Denied or No Such Bucket errors.
```python theme={null}
weights = CloudBucket(
    name="weights",
    mount_path="./weights",
    config=CloudBucketConfig(
        access_key="S3_KEY",
        secret_key="S3_SECRET",
        region="us-east-1",
    ),
)
```
### Egress Fees
If your bucket is in a different region than your Beam container, you might get charged egress fees by your cloud provider. You can read more about AWS S3 egress fees [here](https://aws.amazon.com/s3/pricing/).
Tigris and Cloudflare R2 do not charge egress fees.
# Ephemeral Files and Images
Source: https://docs.beam.cloud/v2/data/output
Storing ephemeral files for images, audio files, and more.
You may want to save data produced by your tasks. Beam provides an abstraction called `Output`, which allows you to save files or directories and generate public URLs to access them.
## Saving Files
To save an `Output`, you can write any filetype to Beam's `/tmp` directory.
Here's what your code might look like:
```python theme={null}
from beam import function, Output


@function()
def save_output():
    # File is saved to /tmp directory
    file_name = "/tmp/my_output.txt"

    # Write to new text file
    with open(file_name, "w") as f:
        f.write("This is an output, a glorious text file.")

    # Save output
    output_file = Output(path=file_name)
    output_file.save()

    # Generate and return a public URL
    public_url = output_file.public_url(expires=400)
    return {"output_url": public_url}
```
### Directories
You can also create public URLs for directories, by passing in a directory path:
```python theme={null}
# Generate a public URL for a directory
file_path = "./tmp/waveforms"
output = Output(path=file_path)
output.save()
# Returns https://app.beam.cloud/output/id/abe0c95a-2cd1-40b3-bace-9225f2c79c6d
output_url = output.public_url()
```
### PIL Images
If your app uses PIL, `Output` includes a wrapper around PIL to simplify the process of generating a public URL for the PIL image file:
```python theme={null}
# Save a PIL image
image = pipe(...)
# Persist the PIL image to an Output
output = Output.from_pil_image(image).save()
```
Here's a full example:
```python theme={null}
from beam import Image as BeamImage, Output, function


@function(
    image=BeamImage(
        python_packages=[
            "pillow",
        ],
    ),
)
def save_image():
    from PIL import Image as PILImage

    # Generate PIL image
    pil_image = PILImage.new(
        "RGB", (100, 100), color="white"
    )  # Creating a 100x100 white image

    # Save image file
    output = Output.from_pil_image(pil_image)
    output.save()

    # Retrieve pre-signed URL for output file
    url = output.public_url(expires=400)
    print(url)

    # Print other details about the output
    print(f"Output ID: {output.id}")
    print(f"Output Path: {output.path}")
    print(f"Output Stats: {output.stat()}")
    print(f"Output Exists: {output.exists()}")

    return {"image": url}


if __name__ == "__main__":
    save_image()
```
When you run this function, it will return a pre-signed URL to the image:
```bash theme={null}
https://app.beam.cloud/output/id/abe0c95a-2cd1-40b3-bace-9225f2c79c6d
```
## Generating Public URLs
Your app might return files from the API, such as images or MP3s. You can use `Output` to generate a public URL to access the content.
### Expiring Public URLs
You can pass an optional `expires` parameter to `output.public_url` to control how long to persist the file before it is deleted.
By default, public URLs are automatically deleted after 1 hour.
```python theme={null}
# Delete public URL after 5 minutes
output.public_url(expires=300)
```
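The `expires` value in the example above is given in seconds (300 seconds for 5 minutes). If you prefer to express lifetimes in minutes or hours, a small conversion helper keeps the intent readable. `to_expires_seconds` is a hypothetical helper built on the standard library, not part of Beam.

```python
from datetime import timedelta


def to_expires_seconds(**duration):
    """Convert a human-readable duration into whole seconds."""
    return int(timedelta(**duration).total_seconds())


five_minutes = to_expires_seconds(minutes=5)  # 300, as in the example above
one_hour = to_expires_seconds(hours=1)        # 3600, the default lifetime
```

You could then pass the result along, for example `output.public_url(expires=five_minutes)`.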
# Distributed Storage Volumes
Source: https://docs.beam.cloud/v2/data/volume
Attach distributed storage volumes to your apps
Beam allows you to create highly-available storage volumes that can be used across tasks. You might use volumes for things like storing model weights or large datasets.
Beam Volumes are mounted directly to the containers that run your code, so they are more performant than using cloud object storage.
We strongly recommend storing your data in Beam Volumes for any data you plan to access from your Beam functions.
## How to Write Files in Beam Containers
There are two use-cases for saving files: *persistent* files, that you want to access between tasks, and *temporary* files that will be deleted when your container spins down.
1. **Persisting Files**: write to a volume.
2. **Temporary Files**: write to the `/tmp` directory in your Beam container. For example, you could save an image to `/tmp/myimage.png`.
## Reading and Writing to Volumes
You can read and write to your Volume like any ordinary Python file:
```python theme={null}
from beam import function, Volume

VOLUME_PATH = "./model_weights"


@function(
    volumes=[Volume(name="model-weights", mount_path=VOLUME_PATH)],
)
def access_files():
    # Write files to a volume
    with open(f"{VOLUME_PATH}/somefile.txt", "w") as f:
        f.write("This is being written to a file in the volume")

    # Read files from a volume
    with open(f"{VOLUME_PATH}/somefile.txt", "r") as f:
        print(f.readlines())


if __name__ == "__main__":
    access_files()
```
It can take up to 60 seconds for any files written to a distributed volume to
become available to other containers.
To run this code, run `python [filename].py`. You'll see it print the text we just wrote to the file.
```
(.venv) $ python reading_and_writing_data.py
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
Collecting files from /Users/beta9/beam/examples/06_volume
=> Files synced
=> Running function:
['This is being written to a file in the volume']
=> Function complete
```
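Since files written to a distributed volume can take up to 60 seconds to appear in other containers, a reader that depends on a freshly written file may want to poll for it instead of failing on the first attempt. The sketch below uses only the standard library; `wait_for_file` is a hypothetical helper, and the default timeout simply mirrors the 60-second figure above.

```python
import os
import time


def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until `path` exists or `timeout` seconds pass; return True if found."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A consumer could call `wait_for_file("./model_weights/somefile.txt")` before opening the file, and handle a `False` result as a genuine missing-file error.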
## Creating a Volume
Volumes can be attached to anything you run on Beam.
By default, Volumes are shared across all apps in your Beam account.
```python theme={null}
from beam import function, Volume

VOLUME_PATH = "./model_weights"


@function(
    volumes=[Volume(name="model-weights", mount_path=VOLUME_PATH)],
)
def load_model():
    from transformers import AutoModel

    # Load model from cloud storage cache
    AutoModel.from_pretrained(VOLUME_PATH)
```
If you add a volume to your app, it will be created automatically. You can also create volumes manually in the CLI, by using:
```sh theme={null}
$ beam volume create my-volume
Name Created At Updated At Workspace Name
───────────────────────────────────────────────────────
my-volume just now just now f6fa28
```
## Uploading Data
You can upload files with the CLI using the `beam cp` command.
```sh theme={null}
beam cp [local-file] beam://[volume-name]
```
### Files
```bash theme={null}
beam cp file.txt beam://myvol/ # ./file.txt => beam://myvol/file.txt
beam cp file.txt beam://myvol/file.txt # ./file.txt => beam://myvol/file.txt
beam cp file.txt beam://myvol/file.new # ./file.txt => beam://myvol/file.new
beam cp file.txt beam://myvol/hello # ./file.txt => beam://myvol/hello.txt (keeps the extension)
```
### Directories
```bash theme={null}
beam cp mydir beam://myvol # ./mydir/file.txt => beam://myvol/file.txt
beam cp mydir beam://myvol/mydir # ./mydir/file.txt => beam://myvol/mydir/file.txt
beam cp mydir beam://myvol/newdir # ./mydir/file.txt => beam://myvol/newdir/file.txt
```
## Downloading Data
### Files
```bash theme={null}
beam cp beam://myvol/file.txt . # beam://myvol/file.txt => ./file.txt
beam cp beam://myvol/file.txt file.new # beam://myvol/file.txt => ./file.new
```
### Directories
```bash theme={null}
beam cp beam://myvol/mydir . # beam://myvol/mydir/file.txt => ./file.txt
```
## CLI Management Commands
### Create a Volume
```bash theme={null}
beam volume create [VOLUME-NAME]
```
```bash theme={null}
$ beam volume create weights
Name Created At Updated At Workspace Name
───────────────────────────────────────────────────────
weights May 07 2024 May 07 2024 cf2db0
```
### Delete a Volume
```bash theme={null}
beam volume delete [VOLUME-NAME]
```
```bash theme={null}
$ beam volume delete model-weights
Any apps (functions, endpoints, task queue, etc) that
refer to this volume should be updated before it is deleted.
Are you sure? (y/n) [n]: y
Deleted volume model-weights
```
### List Volumes
```bash theme={null}
beam volume list
```
```bash theme={null}
$ beam volume list
Name Size Created At Updated At Workspace Name
─────────────────────────────────────────────────────────────────────────────────────
weights 240.23 MiB 2 days ago 2 days ago cf2db0
1 volumes | 240.23 MiB used
```
### List Volume Contents
```bash theme={null}
beam ls [VOLUME-NAME]
```
```bash theme={null}
$ beam ls weights
Name Size Modified Time IsDir
──────────────────────────────────────────────────────────────────
.locks 0.00 B 29 minutes ago Yes
models--facebook--opt-125m 240.23 MiB 28 minutes ago Yes
2 items | 240.23 MiB used
```
### Copy Files to Volumes
```bash theme={null}
beam cp [LOCAL-PATH] beam://[VOLUME-NAME]
```
```bash theme={null}
$ beam cp my-file beam://my-volume
[beam://my-volume/my-file] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MiB 1.29 MiB/s 0:00:07
```
### Move Files in Volumes
```bash theme={null}
beam mv [SOURCE] [DEST]
```
```bash theme={null}
$ beam mv file.txt files/text-files
Moved file.txt to files/text-files/file.txt
```
### Remove Files from Volumes
```bash theme={null}
beam rm [FILE]
```
```bash theme={null}
=> weights/app.py (1 object deleted)
app.py
```
# Keeping Containers Warm
Source: https://docs.beam.cloud/v2/endpoint/keep-warm
Control how long your apps stay running before shutting down.
By default, Beam is serverless, which means your applications will shut off automatically when they're not being used.
## Configuring Keep Warm
You can control how long your containers are kept alive by using the `keep_warm_seconds` flag in your deployment trigger.
For example, by adding a `keep_warm_seconds=300` argument to an endpoint, your app will stay running for 5 minutes before shutting off:
```python theme={null}
from beam import endpoint


# Container stays alive for 5 min before shutting down automatically
@endpoint(keep_warm_seconds=300)
def handler():
    return {}
```
When `keep_warm_seconds` is set in your deployment, it will count as billable
usage.
## Setting Always-On Containers
Any running containers count towards billable usage. Take care to avoid
setting `min_containers` unless you're comfortable paying for usage 24/7.
You can configure the number of containers running at baseline using the `min_containers` field.
By setting `min_containers=1`, 1 container will *always* remain running until the deployment is stopped.
If you redeploy an app that has `min_containers` set, make sure to explicitly
stop the previous deployment versions in order to avoid running containers
that you are no longer using.
```python theme={null}
from beam import endpoint, QueueDepthAutoscaler


@endpoint(
    autoscaler=QueueDepthAutoscaler(
        min_containers=1, max_containers=3, tasks_per_container=1
    ),
)
def handler():
    return {"success": "true"}
```
## Pre-Warming Your Container
You can pre-warm your containers by adding `/warmup` to the end of your deployment URL:
```sh theme={null}
curl -X POST 'https://hello-world-a4bdc39-v1.app.beam.cloud/warmup' \
-H 'Authorization: Bearer [YOUR_TOKEN]'
```
When invoked, this endpoint will send a request to the container to warm-up.
You can add `/warmup` to the end of any of your deployment APIs to warm-up your container:
```
id/:stubId/warmup
/:deploymentName/warmup
/:deploymentName/latest/warmup
/:deploymentName/v:version/warmup
```
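The same warm-up call can be scripted from Python. The sketch below builds the request with the standard library and leaves the actual network call commented out; the helper names `warmup_url` and `build_warmup_request` are illustrative, and the URL and token are the placeholders from the cURL example above.

```python
import urllib.request


def warmup_url(deployment_url: str) -> str:
    """Append /warmup to a deployment URL, tolerating a trailing slash."""
    return deployment_url.rstrip("/") + "/warmup"


def build_warmup_request(deployment_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the warmup route."""
    return urllib.request.Request(
        warmup_url(deployment_url),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_warmup_request(
    "https://hello-world-a4bdc39-v1.app.beam.cloud", "[YOUR_TOKEN]"
)
print(request.get_method(), request.full_url)

# To actually send it:
# urllib.request.urlopen(request, timeout=30)
```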
## Default Container Spin-down Times
After handling a request, Beam keeps containers running ("warm") for a certain amount of time in order to quickly handle future requests. By default, these are the container "keep warm" times for each deployment type:
| Deployment Type | Container Keep Warm Duration |
| ----------------------- | ---------------------------- |
| Endpoints/ASGI/Realtime | 180s |
| Task Queues | 10s |
| Pods | 600s |
# Pre-Loading Models
Source: https://docs.beam.cloud/v2/endpoint/loaders
This guide shows how you can optimize performance by pre-loading models when your container first starts.
Beam includes an optional `on_start` lifecycle hook which you can add to your functions. The `on_start` function will be run exactly once when your container first starts.
```python app.py theme={null}
from beam import endpoint


def download_models():
    # Do something that only needs to happen once
    return {}


# The on_start function runs once when the container starts
@endpoint(on_start=download_models)
def handler():
    return {}
```
Anything returned from `on_start` can be retrieved in the `context` variable that is automatically passed to your handler:
```python theme={null}
from beam import endpoint


def download_models():
    # Do something that only needs to happen once
    x = 10
    return {"x": x}


# The on_start function runs once when the container starts
@endpoint(on_start=download_models)
def handler(context):
    # Retrieve cached values from on_start
    on_start_value = context.on_start_value
    return {}
```
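One way to exercise this pattern without deploying is to call undecorated versions of the functions locally with a stand-in context. The sketch below assumes only what is stated above (the return value of `on_start` is exposed as `context.on_start_value`); the `SimpleNamespace` object is a testing convenience and not Beam's actual context class.

```python
from types import SimpleNamespace


def download_models():
    # Stand-in for expensive one-time setup
    return {"x": 10}


def handler(context):
    # Read the value produced by the on_start function
    on_start_value = context.on_start_value
    return {"x_doubled": on_start_value["x"] * 2}


# Simulate what happens at container start, then invoke the handler
fake_context = SimpleNamespace(on_start_value=download_models())
print(handler(fake_context))
```

Keeping the handler logic callable as a plain function like this makes it easy to unit test before adding the `@endpoint` decorator.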
## Example: Downloading Model Weights
```python theme={null}
from beam import Image, endpoint, Volume

CACHE_PATH = "./weights"


def download_models():
    from transformers import AutoTokenizer, OPTForCausalLM

    model = OPTForCausalLM.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    return model, tokenizer


@endpoint(
    on_start=download_models,
    volumes=[Volume(name="weights", mount_path=CACHE_PATH)],
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(
        python_version="python3.8",
        python_packages=[
            "transformers",
            "torch",
        ],
    ),
)
def predict(context, prompt):
    # Retrieve cached model from on_start function
    model, tokenizer = context.on_start_value

    # Generate
    inputs = tokenizer(prompt, return_tensors="pt")
    generate_ids = model.generate(inputs.input_ids, max_length=30)
    result = tokenizer.batch_decode(
        generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]

    print(result)
    return {"prediction": result}
```
## Using Loaders with Multiple Workers
If you are scaling out vertically with
[workers](/v2/scaling/concurrency#increasing-throughput-in-a-single-container),
the loader function will run once for each worker that starts up.
# Creating a Web Endpoint
Source: https://docs.beam.cloud/v2/endpoint/overview
Deploying and invoking web endpoints on Beam
Beam allows you to deploy web endpoints that can be invoked via HTTP requests. These endpoints can be used to run arbitrary code. For instance, you could perform inference using one of our GPUs, or just run a simple function that multiplies two numbers.
```python theme={null}
from beam import endpoint


@endpoint(
    cpu=1.0,
    memory=128,
)
def multiply(**inputs):
    result = inputs["x"] * 2
    return {"result": result}
```
**Endpoints vs. Task Queues**
Endpoints are RESTful APIs, designed for synchronous tasks that can complete in 180 seconds or less. For longer running tasks, you'll want to use an asynchronous [`task_queue`](/v2/task-queue/running-tasks) instead.
#### Launch a Preview Environment (Optional)
[`beam serve`](/v2/reference/cli#serve) monitors changes in your local file system, live-reloads the remote environment as you work, and forwards remote container logs to your local shell.
Serve is great for prototyping. You can develop in a containerized cloud environment in real time, with adjustable CPU, memory, and GPU resources.
It's also great for testing an app before deploying it. Served functions are orchestrated identically to deployments, which means you can test your Beam workflow end-to-end before deploying.
To start an ephemeral `serve` session, you'll use the `serve` command:
```sh theme={null}
beam serve [FILE.PY]:[ENTRY-POINT]
```
For example, to start a session for the `multiply` function in `app.py`, run:
```sh theme={null}
beam serve app.py:multiply
```
To end the session, you can use `Ctrl + C` in the terminal where you started the session.
Serve sessions end automatically after 10 minutes of inactivity. The entire
duration of the session is counted towards billable usage, even if the session
is not receiving requests.
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
### Deploying the Endpoint
When you're finished with prototyping and want to make a persistent deployment of the endpoint, enter your shell and run this command from the working directory:
```bash theme={null}
beam deploy [FILE.PY]:[ENTRY-POINT]
```
After running this command, you'll see some logs in the console that show the progress of your deployment.
```bash theme={null}
$ beam deploy app.py:multiply
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
=> Files synced
=> Deploying endpoint
=> Deployed
=> Invocation details
curl -X POST 'https://multiply-712408b-v1.app.beam.cloud' \
-H 'Accept: */*' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
```
By default, the container handling the endpoint will spin down after 180 seconds of inactivity. You can customize this with the `keep_warm_seconds` parameter. The container will be billed for the time it is active and handling requests.
### Calling the Endpoint
After deploying the API, you'll be able to make a web request to hit the API with cURL or libraries of your choice.
Open another terminal window to invoke the API:
### Example Request
```sh theme={null}
curl -X POST 'https://multiply-712408b-v1.app.beam.cloud' \
-H 'Accept: */*' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"x": 10}'
```
### Example Response
```json theme={null}
{
"result": 20
}
```
In Python, you can use the `requests` library to make a POST request to the endpoint:
```python theme={null}
import requests

url = "https://multiply-712408b-v1.app.beam.cloud"
headers = {
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Authorization": "Bearer [YOUR_AUTH_TOKEN]",
}
data = {"x": 10}

response = requests.post(url, headers=headers, json=data)
print(response.json())
```
### Example Response
```json theme={null}
{ "result": 20 }
```
To send payloads other than JSON, you can encode the data as a base64 string and include it in the JSON payload, or upload the file to an S3 bucket and mount the bucket to the endpoint.
For more detailed examples, check out the [Sending File Payloads](/v2/endpoint/sending-file-payloads) documentation.
# Realtime and Streaming
Source: https://docs.beam.cloud/v2/endpoint/realtime
## Deploying a Realtime App
This is a simple example of a realtime streaming app. When deployed, this app will be exposed as a public websocket endpoint.
The `realtime` handler accepts a single parameter, called `event`, with the event payload.
The `realtime` decorator is an abstraction above `asgi`.
This means that additional `asgi` parameters, such as [`concurrent_requests`](/v2/endpoint/web-server#concurrent-requests), can be used too.
```python app.py theme={null}
from beam import realtime


@realtime(
    cpu=1,
    memory="1Gi",
    concurrent_requests=10,  # Process 10 requests at a time
    authorized=False,  # Don't require auth to invoke
)
def stream(event):
    # Echo back the event payload sent to the websocket
    return {"response": event}
```
This app can be deployed in traditional Beam fashion:
```sh theme={null}
beam deploy app.py:stream
```
## Streaming Responses from the Client
Realtime Endpoints can be connected to from any websocket client.
The code below uses the Beam JavaScript SDK to send requests to the realtime app.
Make sure to add an `.env` file to your project with your `BEAM_DEPLOYMENT_ID` and `BEAM_TOKEN`:
```javascript client.js theme={null}
import beam from "@beamcloud/beam-js";

const streamResponse = async () => {
  const client = await beam.init(process.env.BEAM_TOKEN);
  const deployment = await client.deployments.get({ id: process.env.BEAM_DEPLOYMENT_ID });
  const connection = await deployment.realtime();

  const payload = {
    "event": "Echo this back",
  };

  connection.onmessage = (message) => {
    console.log(`Response: ${message.data}`);
  };

  connection.send(JSON.stringify(payload));

  setTimeout(() => {
    connection.close();
  }, 1000);
};

streamResponse();
```
The code below uses the native WebSocket API to send requests to the realtime app.
```javascript client.js theme={null}
const socket = new WebSocket("wss://1c0f0cbe-e0d1-49ae-a556-5daffe23eb4c.app.beam.cloud");

// Connection opened
socket.addEventListener("open", (event) => {
  socket.send("Hello Server!");
});

// Listen for messages
socket.addEventListener("message", (event) => {
  console.log(event.data); // {"response":"Hello Server!"}
});
```
# Sending File Payloads
Source: https://docs.beam.cloud/v2/endpoint/sending-file-payloads
Sending file payloads to Endpoints and Web Servers
There are two easy ways to send files to your Beam endpoints and ASGI web servers.
## Sending Files to Endpoints Using Base64
The simplest way to send files to your Beam endpoint is to use Base64 encoding. In the example below, we will use this method to send an image to an endpoint. The first step is to define an endpoint that accepts an encoded string.
```python theme={null}
import base64
import io

from beam import endpoint
from beam import Image as BeamImage
from PIL import Image


@endpoint(name="image_endpoint", image=BeamImage().add_python_packages(["pillow"]))
def image_endpoint(image: str):
    image = base64.b64decode(image)
    image = Image.open(io.BytesIO(image))
    # do something with the image
    return {"message": "Image processed successfully"}
```
We can then deploy our endpoint with the command `beam deploy app.py:image_endpoint`. The simple script below can be used to send an image to the endpoint.
```python Python theme={null}
import base64

import requests

# Read the image and encode it as a Base64 string
with open("./cool-picture.png", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())

b64_image = encoded_string.decode("utf-8")

url = "https://image-endpoint-53b4230-v1.app.beam.cloud"
headers = {
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Authorization": "Bearer [YOUR_AUTH_TOKEN]",
}
data = {"image": b64_image}

response = requests.post(url, headers=headers, json=data)
```
```bash Curl theme={null}
export B64_FILE=$(base64 -i ./cool-picture.png)
# The JSON body is double-quoted so that the shell expands $B64_FILE
curl -X POST "https://image-endpoint-53b4230-v1.app.beam.cloud" \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d "{\"image\": \"$B64_FILE\"}"
```
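Base64 represents every 3 bytes of input as 4 ASCII characters, so an encoded payload is about a third larger than the original file. The sketch below works through that arithmetic with the standard library; the helper name `base64_size` is illustrative.

```python
import base64
import math


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded Base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)


payload = b"\x00" * 1000
encoded = base64.b64encode(payload)
print(len(encoded), base64_size(len(payload)))  # both are 1336

# A 20 MB file grows to roughly 26.7 MB once encoded
print(base64_size(20 * 1000 * 1000) / 1e6)
```

This overhead, on top of the cost of decoding inside the endpoint, is worth keeping in mind when payloads get large.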
## Using S3 to Send Files
With Beam, you can easily [mount S3 buckets](/v2/data/external-storage) to your endpoints and web servers. This allows you to upload files to S3 and access them in your endpoint or web server. This method is recommended if you are sending large payloads (20+ MB). Another benefit of using S3 is that you will not need to include decoding logic in your endpoint.
We can modify our previous example by accepting a filename and reading the image from a mounted S3 bucket. Our frontend will need to upload the image to the S3 bucket and then pass the filename to our endpoint.
```python theme={null}
import os

from beam import CloudBucket, CloudBucketConfig, endpoint
from beam import Image as BeamImage
from PIL import Image

mount_path = "./uploads"

uploads = CloudBucket(
    name="uploads",
    mount_path=mount_path,
    config=CloudBucketConfig(
        access_key="BEAM_S3_KEY",
        secret_key="BEAM_S3_SECRET",
    ),
)


@endpoint(name="image_endpoint", image=BeamImage().add_python_packages(["pillow"]), volumes=[uploads])
def image_endpoint(image_name: str):
    image_path = os.path.join(uploads.mount_path, image_name)
    image = Image.open(image_path)
    # do something with the image
    return {"message": "Image processed successfully"}
```
In order to correctly mount the S3 bucket, we need to make sure that our secrets are set. We can do this using the Beam CLI.
```bash theme={null}
beam secret create BEAM_S3_KEY "your-access-key"
beam secret create BEAM_S3_SECRET "your-secret-key"
```
Once again, we can deploy our endpoint with the command `beam deploy app.py:image_endpoint`.
To test this method, we can upload an image to the S3 bucket using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html) and then pass the filename to our endpoint.
```bash theme={null}
aws s3 cp ./test.png s3://uploads/
```
The image will be uploaded to the S3 bucket and the endpoint will be able to read it. We can verify this by invoking our endpoint with the filename.
```bash theme={null}
curl -X POST "https://image-endpoint-53b4230-v1.app.beam.cloud" \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-d '{"image_name": "test.png"}'
```
# Versioning
Source: https://docs.beam.cloud/v2/endpoint/versioning
Deployment URLs are versioned in this format:
`https://[APP-NAME]-[APP-ID]-[VERSION].app.beam.cloud`
### Accessing the Latest Version
The latest version of your app will always be available at the root URL. For example, removing the `-v1` suffix from the URL invokes the latest version:
```
https://multiply-712408b.app.beam.cloud
```
### Invoking Specific Versions
You can invoke specific versions of your apps by specifying the version in the app URL.
Here are some examples:
* To invoke latest: `https://multiply-712408b.app.beam.cloud`
* To invoke version `3`: `https://multiply-712408b-v3.app.beam.cloud`
* To invoke version `17`: `https://multiply-712408b-v17.app.beam.cloud`
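The URL scheme above is simple enough to build programmatically. The helper below is a small sketch of that format; `deployment_url` is a hypothetical name, not part of the Beam SDK, and the app name and ID are the ones used in the examples above.

```python
from typing import Optional


def deployment_url(app_name: str, app_id: str, version: Optional[int] = None) -> str:
    """Build a deployment URL; omit `version` to target the latest version."""
    suffix = f"-v{version}" if version is not None else ""
    return f"https://{app_name}-{app_id}{suffix}.app.beam.cloud"


print(deployment_url("multiply", "712408b"))      # latest version
print(deployment_url("multiply", "712408b", 3))   # version 3
print(deployment_url("multiply", "712408b", 17))  # version 17
```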
# Hosting a Web Server
Source: https://docs.beam.cloud/v2/endpoint/web-server
Deploying web servers on Beam
With Beam, you can deploy web servers that use the [ASGI](https://asgi.readthedocs.io/en/latest/introduction.html) protocol. This means that you can deploy applications built with popular frameworks like FastAPI and Django.
## Multiple Endpoints Per App
In the example below, we are deploying a FastAPI web server that uses the Hugging Face Transformers library to perform sentiment analysis and text generation.
We also included a warmup endpoint so that we can preemptively get our container ready for incoming requests.
This example uses Pydantic to serialize request inputs. [You can read more
about it here](https://fastapi.tiangolo.com/tutorial/body/).
```python app.py theme={null}
from beam import Image, asgi
from pydantic import BaseModel


# Request payload for API, declared with Pydantic
class GenerateInput(BaseModel):
    text: str
    max_length: int


class SentimentInput(BaseModel):
    text: str


def init_models():
    from transformers import pipeline

    model = "gpt2"

    # Initialize two simple models
    sentiment_analyzer = pipeline("sentiment-analysis")
    text_generator = pipeline("text-generation", model="gpt2")
    return sentiment_analyzer, text_generator, model


@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["transformers", "torch", "fastapi", "pydantic"]),
    on_start=init_models,
    memory=2048,
)
def handler(context):
    import asyncio

    from fastapi import FastAPI, Query

    app = FastAPI()
    sentiment_analyzer, text_generator, generate_model = context.on_start_value

    @app.post("/sentiment")
    async def analyze_sentiment(input: SentimentInput):
        # Unpack request input and send to ML model
        result = sentiment_analyzer(input.text)
        return result

    @app.post("/generate")
    async def generate_text(input: GenerateInput):
        result = text_generator(input.text, max_length=input.max_length)
        return result

    @app.post("/warmup")
    async def warmup():
        return {"status": "warm"}

    return app
```
As shown above, the handler function for the web server must return the ASGI
app object.
## Launch a Preview Environment (Optional)
Just like an endpoint, you can prototype your web server using [`beam serve`](/v2/reference/cli#serve). This command will monitor changes in your local file system, live-reload the remote environment as you work, and forward remote container logs to your local shell.
```sh theme={null}
beam serve app.py:handler
```
Serve sessions end automatically after 10 minutes of inactivity. The entire
duration of the session is counted towards billable usage, even if the session
is not receiving requests.
## Deploying the Web Server
When you are ready to deploy your web server, run the following command:
```bash theme={null}
beam deploy app.py:handler
```
You'll see some logs in the console that show the progress of your deployment.
```bash theme={null}
=> Building image
=> Syncing files
...
=> Invocation details
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-d '{}'
```
By default, the container handling the app will spin down after 180 seconds of
inactivity. You can customize this with the `keep_warm_seconds` parameter. The
container will be billed for the time it is active and handling requests.
## Sending Requests
If we wanted to generate text using our deployed example from above, we would send a POST request like this. Both `text` and `max_length` are required by the `GenerateInput` model:
```bash theme={null}
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d '{"text": "The meaning of life is ", "max_length": 50}'
```
## Concurrent Requests
When building an ASGI app, you can specify the number of concurrent requests your app can handle using the `concurrent_requests` parameter in the `@asgi` decorator.
```python theme={null}
@asgi(
name="sentiment-and-generation",
image=Image(python_packages=["transformers", "torch", "fastapi", "pydantic"]),
on_start=init_models,
memory=1024,
concurrent_requests=10
)
```
This allows you to increase the number of requests your app can handle at once, which can help you achieve higher throughput. For instance, if your app is doing I/O-bound work, additional requests can be handled while your I/O operations complete in the background.
We can simulate this by adding a `model` endpoint that pretends to do some expensive I/O to our example from above.
```python theme={null}
@app.get("/model")
async def model(model: str = Query(...)):
    # Pretend we're doing expensive I/O here to demonstrate the value of concurrent requests
    await asyncio.sleep(10)
    return {"model": model}
```
Now, if you send a request to `model` and then send another request to `generate`, you will see that the second request will complete before the first.
```bash Model Request theme={null}
curl -X GET 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/model?model=gpt2' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]'
```
```bash Generate Request theme={null}
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-d '{"text": "Bananas are ", "max_length": 50}'
```
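The ordering described above can be reproduced locally with plain `asyncio`, without deploying anything. This is an analogy and not Beam code: the coroutine `slow_io` and the short sleep durations are stand-ins for the `/model` and `/generate` requests.

```python
import asyncio
import time


async def slow_io(label: str, seconds: float, finished: list) -> None:
    # asyncio.sleep yields control, like awaiting a network or disk call
    await asyncio.sleep(seconds)
    finished.append(label)


async def main() -> list:
    finished = []
    # Start the slow "model" request first, then the fast "generate" request
    await asyncio.gather(
        slow_io("model", 0.2, finished),
        slow_io("generate", 0.05, finished),
    )
    return finished


start = time.perf_counter()
order = asyncio.run(main())
elapsed = time.perf_counter() - start

print(order)  # ['generate', 'model']
print(f"{elapsed:.2f}s total")
```

The total time is close to the longest single wait, not the sum of both, which is the throughput benefit concurrent requests provide for I/O-bound work.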
## Response Types
Beam supports various response types, including any FastAPI response type. [You can find a list of FastAPI response types here](https://fastapi.tiangolo.com/advanced/custom-response/).
## Uploading Local Files
If your web server needs access to local files like model weights or other resources, you can use [Beam volumes](/v2/data/volume).
To add files to a volume, you can use the `beam cp` command.
```bash theme={null}
beam cp [local-file] beam://[volume-name]
```
Then, you can define a volume and pass it into your `@asgi` decorator like this:
```python theme={null}
from beam import asgi, Volume, Image


@asgi(
    name="sentiment-analysis",
    image=Image(python_packages=["fastapi"]),
    volumes=[Volume(name="model-weights", mount_path="./model_weights")],
)
def web_server():
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/")
    async def root():
        with open("./model_weights/somefile.txt", "r") as f:
            return {"message": f.read()}

    return app
```
# Container Images
Source: https://docs.beam.cloud/v2/environment/custom-images
Applications on Beam run inside *containers*. A container is like a lightweight VM that bundles the software your application needs. The main benefit of using containers is portability: the required runtime environment is packaged alongside the application.
Containers are based on container *images*, which are instructions for how a container should be built.
Because you are building a custom application, it likely depends on some custom software to run. This could include custom Python packages, libraries, binaries, and drivers.
You can customize the container image used to run your Beam application with the [`Image`](/v2/reference/py-sdk#image) class. The options specified in the `Image` class will influence how the image is built.
## Exploring the Beam Image Class
Every application that runs on Beam instantiates the [`Image`](/v2/reference/py-sdk#image) class. This class provides a variety of methods for customizing the container image used to run your application.
It exposes options for:
* Installing a specific version of Python
* Adding custom shell commands that run during the build process
* Adding custom Python packages to install in the container
* Choosing a custom base image to build on top of
* Using a custom Dockerfile to build your own base image
* Setting up a custom conda environment using micromamba
The default Beam image uses `ubuntu:22.04` as its base and installs Python
3.10.
```python theme={null}
from beam import function, Image

image = Image()


# This function will use ubuntu:22.04 with Python 3.10
@function(image=image)
def hello_world():
    return "Hello, world!"


hello_world.remote()
```
## Adding Python Packages
The most common way to customize your image is to add the Python packages required by your application. This is done by calling the `add_python_packages` method on the `Image` object with a list of package names.
Pinning the version of the package is recommended. This ensures that when you
re-deploy your application, you won't accidentally pick up a new version that
breaks your application.
```python theme={null}
from beam import Image, endpoint

image = Image(python_version="python3.11").add_python_packages(["numpy==2.2.0"])


@endpoint(image=image)
def handler():
    return {}
```
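To follow the pinning recommendation consistently, it can help to flag unpinned entries before building an image. The check below is a minimal sketch that only looks for an exact `==` pin; `unpinned_packages` and the sample list are illustrative and not part of the Beam SDK.

```python
def unpinned_packages(packages: list[str]) -> list[str]:
    """Return the requirement strings that lack an exact `==` version pin."""
    return [pkg for pkg in packages if "==" not in pkg]


requirements = ["numpy==2.2.0", "transformers", "torch>=2.0"]
print(unpinned_packages(requirements))  # ['transformers', 'torch>=2.0']
```

Range specifiers such as `>=` are reported too, since they can still resolve to a newer release on the next build.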
### Importing `requirements.txt`
If you already have a `requirements.txt` file, you can pass its path directly to the `Image` constructor's `python_packages` parameter:
```python theme={null}
from beam import Image, endpoint

image = Image(python_version="python3.11", python_packages="requirements.txt")


@endpoint(image=image)
def handler():
    return {}
```
## Adding Shell Commands
Sometimes, it is necessary to run additional shell commands while building your image. This can be achieved by calling the `add_commands` method on the `Image` object with a list of commands.
For instance, you might need to install `libjpeg-dev` when using the `Pillow` library. In the example below, we'll install `libjpeg-dev` and then install `Pillow`.
```python theme={null}
from beam import Image, endpoint

image = (
    Image(python_version="python3.11")
    .add_commands(["apt-get update", "apt-get install libjpeg-dev -y"])
    .add_python_packages(["Pillow"])
)


@endpoint(image=image)
def handler():
    return {}
```
## Customizing the Base Image
Some applications and libraries require specific dependencies that are not available in the default Beam image. In these cases, you can use a custom base image.
Some of the most common custom base images are the CUDA development images from NVIDIA (e.g. `nvcr.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04`). These images come with additional libraries, debugging tools, and `nvcc` installed.
The image below will use a custom CUDA image as the base.
```python theme={null}
from beam import Image, function

image = Image(
    base_image="nvcr.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04"
)


@function(image=image)
def hello_world():
    return "Hello, world!"


hello_world.remote()
```
### CUDA Drivers & NVIDIA Kernel Drivers
When choosing a custom base image, it is important to understand the difference between the NVIDIA Kernel Driver and the CUDA Runtime & Libraries.
| **Component** | **Location** | **Role** |
| ---------------------------- | ---------------- | -------------------------------------------------------- |
| **NVIDIA Kernel Driver** | **Host Machine** | Low-level GPU management, talks directly to hardware. |
| **CUDA Runtime & Libraries** | **Container** | Provides high-level APIs and libraries for applications. |
The NVIDIA Kernel Driver on the host must support the CUDA version used by the container.
In general, if the CUDA version on the host is greater than or equal to the CUDA version in the container, then the NVIDIA Kernel Driver on the host will support the CUDA version used by the container.
For example, using a CUDA 12.2 image on a host with a CUDA 12.4 driver will
work. However, using a CUDA 12.8 image on a host with a CUDA 12.4 driver *will
not* work.
You can consult the table below to help you choose a compatible base image.
| GPU | Driver Version | CUDA Version |
| ------- | -------------- | ------------ |
| A10G | 550.90.12 | 12.4 |
| RTX4090 | 550.127.05 | 12.4 |
| H100 | 550.127.05 | 12.4 |
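The rule of thumb above (host CUDA version greater than or equal to the container's CUDA version) can be written as a small check. This is a sketch of that general rule only; the function names are illustrative, and the versions are the ones from the example above.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '12.4' into (12, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def image_is_supported(host_cuda: str, image_cuda: str) -> bool:
    """Apply the rule of thumb: host CUDA version >= container CUDA version."""
    return parse_version(host_cuda) >= parse_version(image_cuda)


print(image_is_supported("12.4", "12.2"))  # True
print(image_is_supported("12.4", "12.8"))  # False
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as treating `12.10` as older than `12.4`.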
## Using a Specific Python Version
To install a specific version of Python, you can use the `python_version` parameter:
```python theme={null}
from beam import function, Image


# This function will use ubuntu:22.04 with Python 3.11
@function(image=Image(python_version="python3.11"))
def hello_world():
    return "Hello, world!"


hello_world.remote()
```
This function will use the CUDA image as the base and install Python 3.10 because no `python_version` is specified and the CUDA image has no Python version installed.
```python theme={null}
from beam import Image, function


@function(
    image=Image(
        base_image="nvcr.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04",
    )
)
def custom_image_no_python():
    return "Hello, world!"
```
This function will use the CUDA image as the base and install Python 3.11 because a `python_version` *is* specified.
```python theme={null}
from beam import Image, function
@function(
    image=Image(
        base_image="nvcr.io/nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04",
        python_version="python3.11",
    )
)
def custom_image_python_requested():
    return "Hello, world!"
```
If your image comes with a pre-installed version of Python3, it will be used by default *as long as* you don't specify a `python_version` in your `Image` constructor. This function will use the PyTorch image as the base and will use the Python version that already exists in the PyTorch image.
```python theme={null}
from beam import Image, function
@function(
    image=Image(
        base_image="docker.io/pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel"
    )
)
def custom_image_pytorch():
    return "Hello, world!"
```
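The three cases above follow one precedence rule, which can be summarized in a short sketch. This is only a restatement of the behavior described in this section, not Beam's actual implementation; `resolve_python_version` and `DEFAULT_PYTHON` are hypothetical names.

```python
from typing import Optional

# Default described above for a base image that ships without Python.
DEFAULT_PYTHON = "python3.10"


def resolve_python_version(
    requested: Optional[str], base_image_python: Optional[str]
) -> str:
    """Pick the Python version using the precedence described in this section."""
    if requested is not None:
        # An explicit python_version always wins.
        return requested
    if base_image_python is not None:
        # Otherwise reuse the Python that ships with the base image.
        return base_image_python
    # No request and no pre-installed Python: fall back to the default.
    return DEFAULT_PYTHON


print(resolve_python_version(None, None))            # python3.10
print(resolve_python_version("python3.11", None))    # python3.11
print(resolve_python_version(None, "python3.12"))    # python3.12
```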
## Building on GPU
By default, Beam builds your images on CPU-only machines. However, sometimes you might need the build to occur on a machine with a GPU.
For instance, some libraries might compile CUDA kernels during installation. In these cases, you can use the `build_with_gpu()` command to run your build on the GPU of your choice.
```python theme={null}
from beam import Image
image = (
    Image()
    .add_commands(
        [
            "apt-get update -y",
            "apt-get install ffmpeg -y",
            "apt-get install nvidia-cuda-toolkit -y",  # Requires GPU to install
        ]
    )
    .build_with_gpu(gpu="T4")  # Install on a T4
)
```
## Building with Environment Variables
Often, shell commands require certain environment variables to be set. You can set these using the `with_envs` command:
```python theme={null}
from beam import Image
image = (
    Image()
    .add_python_packages(["huggingface_hub[cli]", "accelerate"])
    .with_envs(["HF_HUB_ENABLE_HF_TRANSFER=1", "HF_HOME=/models"])
    .add_commands(["huggingface-cli download meta-llama/Llama-3.2-3B"])
)
```
### Injecting Secrets
Sometimes, you might not want the environment variables to be set in plain text. In these cases, you can leverage Beam secrets and the `with_secrets` command.
You can create a secret with the CLI: `beam secret create HF_TOKEN [VALUE]`.
```python theme={null}
from beam import Image
image = (
    Image()
    .add_python_packages(["huggingface_hub[cli]", "accelerate"])
    .with_envs(["HF_HUB_ENABLE_HF_TRANSFER=1", "HF_HOME=/models"])
    .with_secrets(["HF_TOKEN"])  # Models with a user agreement often require a token
    .add_commands(["huggingface-cli download meta-llama/Llama-3.2-3B"])
)
```
**Note** Adding secrets and environment variables to the build environment *does not* make them available in the runtime environment.
Runtime environment variables and secrets must be specified in the function decorator directly:
```python theme={null}
from beam import function
@function(env_vars={"HF_HOME": "/models"}, secrets=["HF_TOKEN"])
def download_model():
    return "Hello, world!"
```
## Using a Dockerfile
You also have the option to build your own custom base image using a Dockerfile.
The `from_dockerfile()` command accepts a path to a valid Dockerfile as well as an optional path to a context directory:
```python theme={null}
from beam import Image, endpoint
image = Image().from_dockerfile("./Dockerfile").add_python_packages(["numpy"])
@endpoint(image=image, name="test_dockerfile")
def handler():
    return {}
```
The context directory serves as the root for any paths used in commands like `COPY` and `ADD`, meaning all relative paths are relative to this directory.
The image built from your Dockerfile will be used as the base image for a Beam application.
Ports *will not* be exposed in the runtime environment, and the entrypoint
will be overridden.
## Conda Environments
Beam supports using Anaconda environments via [micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html). To get started, you can chain the `micromamba` method to your `Image` definition and then specify packages and channels via the `add_micromamba_packages` method.
```python theme={null}
from beam import Image
image = (
    Image(python_version="python3.11")
    .micromamba()
    .add_micromamba_packages(packages=["pandas", "numpy"], channels=["conda-forge"])
    .add_python_packages(packages=["huggingface-hub[cli]"])
    .add_commands(commands=["micromamba run -n beta9 huggingface-cli download gpt2 config.json"])
)
```
You can still use `pip` to install additional packages in the `conda` environment and you can run shell commands too.
If you need to run a shell command inside the conda environment, you should
prepend the command with `micromamba run -n beta9` as shown above.
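If you have several shell commands to run inside the conda environment, a small helper can add the prefix for you. This is a sketch only: `in_conda_env` and `MICROMAMBA_PREFIX` are hypothetical names, and the environment name `beta9` is taken from the example above.

```python
# Prefix taken from the example above; the helper itself is illustrative.
MICROMAMBA_PREFIX = "micromamba run -n beta9"


def in_conda_env(commands):
    """Prepend the micromamba prefix to every shell command in the list."""
    return [f"{MICROMAMBA_PREFIX} {command}" for command in commands]


commands = in_conda_env(["huggingface-cli download gpt2 config.json"])
print(commands)
# ['micromamba run -n beta9 huggingface-cli download gpt2 config.json']
```

The resulting list can be passed to `add_commands` in place of hand-written strings.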
# Custom Registries
Source: https://docs.beam.cloud/v2/environment/custom-registries
Beam supports importing images from custom public and private registries.
## Public Docker Registries
You can import existing images from remote Docker registries, like [Docker Hub](https://hub.docker.com/search?q=), [Google Artifact Registry](https://cloud.google.com/artifact-registry), [ECR](https://aws.amazon.com/ecr/), [GitHub Container Registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry), [NVIDIA](https://catalog.ngc.nvidia.com/containers) and more.
Just supply a `base_image` argument to [`Image`](/v2/reference/py-sdk#image).
```python theme={null}
from beam import endpoint, Image
image = (
    Image(
        base_image="docker.io/nvidia/cuda:12.3.1-runtime-ubuntu20.04",
        python_version="python3.9",
    )
    .add_commands(["apt-get update -y", "apt-get install neovim -y"])
    .add_python_packages(["torch"])
)
@endpoint(image=image)
def handler():
    import torch
    return {"torch_version": torch.__version__}
```
Beam only supports Debian-based images. In addition, make sure your image is
built for the correct x86 architecture.
## Private Docker Registries
Beam supports importing images from the following private registries: [AWS ECR](https://aws.amazon.com/ecr/), [Google Artifact Registry](https://cloud.google.com/artifact-registry), [Docker Hub](https://hub.docker.com/), [GitHub Container Registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry), and [NVIDIA Container Registry](https://catalog.ngc.nvidia.com/containers).
Private registries require credentials. You can pass them to Beam in two ways: as a dictionary, or exported from your shell so that Beam can look up the values automatically.
**Passing Credentials as a Dictionary**
You can provide the values for the registry as a dictionary directly, like this:
```python theme={null}
from beam import Image
image = Image(
    base_image="111111111111.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
    base_image_creds={
        "AWS_ACCESS_KEY_ID": "xxxx",
        "AWS_SECRET_ACCESS_KEY": "xxxx",
        "AWS_REGION": "xxxx"
    },
)
```
**Passing Credentials from your Environment**
Alternatively, you can export your credentials in your shell and pass the environment variable names to `base_image_creds` as a list:
```python theme={null}
from beam import Image
image = Image(
    base_image="111111111111.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
    base_image_creds=[
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_SESSION_TOKEN",
        "AWS_REGION"
    ],
)
```
### AWS ECR
To use a private image from Amazon ECR, export your AWS environment variables. Then configure the Image object with those environment variables.
You can authenticate with either your static AWS credentials or an AWS STS
token. If you use the AWS STS token, your `AWS_SESSION_TOKEN` key must also be
set.
```python theme={null}
from beam import Image, endpoint
image = Image(
    python_version="python3.12",
    base_image="111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest",
    base_image_creds=["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"]
)
@endpoint(image=image)
def handler():
    pass
```
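Because the STS flow needs one extra variable, it can help to build the `base_image_creds` list from whatever is exported in your shell. The helper below is a sketch under that assumption; `aws_cred_names` is a hypothetical name and is not part of the Beam SDK.

```python
import os
from typing import List, Mapping


def aws_cred_names(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the AWS variable names to pass to base_image_creds.

    AWS_SESSION_TOKEN is included only when it is set (the STS case).
    Raises RuntimeError if a required variable is missing.
    """
    names = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"]
    if env.get("AWS_SESSION_TOKEN"):
        names.append("AWS_SESSION_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Export these variables first: {', '.join(missing)}")
    return names


# Usage (assumes the variables are exported in your shell):
# image = Image(base_image="...", base_image_creds=aws_cred_names())
```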
### GCP Artifact Registry
To use a private image from Google Artifact Registry, export your access token.
```sh theme={null}
export GCP_ACCESS_TOKEN=$(gcloud auth print-access-token --project=my-project)
```
Then configure the Image object to use the environment variable.
```python theme={null}
from beam import Image, endpoint
image = Image(
    python_version="python3.12",
    base_image="us-east4-docker.pkg.dev/my-project/my-repo/my-image:0.1.0",
    base_image_creds=["GCP_ACCESS_TOKEN"]
)
@endpoint(image=image)
def handler():
    pass
```
### NVIDIA GPU Cloud (NGC)
To use a private image from NVIDIA GPU Cloud, export your API key.
```sh theme={null}
export NGC_API_KEY=abc123
```
Then configure the Image object to use the environment variable.
```python theme={null}
from beam import Image, endpoint
image = Image(
    python_version="python3.12",
    base_image="nvcr.io/nvidia/tensorrt:24.10-py3",
    base_image_creds=["NGC_API_KEY"]
)
@endpoint(image=image)
def handler():
    pass
```
### Docker Hub
To use a private image from Docker Hub, export your Docker Hub credentials.
```sh theme={null}
export DOCKERHUB_USERNAME=user123
export DOCKERHUB_PASSWORD=pass123
```
Then configure the Image object with those environment variables.
```python theme={null}
from beam import Image, endpoint
image = Image(
    python_version="python3.12",
    base_image="docker.io/my-org/my-image:0.1.0",
    base_image_creds=["DOCKERHUB_USERNAME", "DOCKERHUB_PASSWORD"]
)
@endpoint(image=image)
def handler():
    pass
```
### GitHub Container Registry
To use a private image from GitHub Container Registry, export your GitHub credentials. You will need a [personal access token](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry).
```sh theme={null}
export GITHUB_USERNAME=user123
export GITHUB_TOKEN=token123
```
Then configure the Image object with those environment variables.
```python theme={null}
from beam import Image, endpoint
image = Image(
    python_version="python3.12",
    base_image="ghcr.io/my-username/my-image:0.1.0",
    base_image_creds=["GITHUB_USERNAME", "GITHUB_TOKEN"]
)
@endpoint(image=image)
def handler():
    pass
```
# GPU Acceleration
Source: https://docs.beam.cloud/v2/environment/gpu
## Running Tasks on GPU
You can run any code on a cloud GPU by passing a `gpu` argument in your function decorator.
```python theme={null}
from beam import endpoint
@endpoint(gpu="H100")
def handler():
    # Print the nvidia-smi output, which lists the driver version and attached GPUs
    import subprocess
    print(subprocess.check_output(["nvidia-smi"], text=True))
    return {"gpu": "true"}
```
### Available GPUs
Currently available GPU options are:
* `A10G` (24Gi)
* `RTX4090` (24Gi)
* `H100` (80Gi)
### Check GPU Availability
Run `beam machine list` to check whether a machine is available.
```bash theme={null}
$ beam machine list
GPU Type Available
──────────────────────
A10G Yes
RTX4090 Yes
```
## Prioritizing GPU Types
You can split traffic across multiple GPUs by passing a list to the `gpu` parameter.
The list is ordered by priority. You can choose which GPUs to prioritize by specifying them at the front of the list.
```python theme={null}
gpu=["T4", "A10G", "H100"]
```
In this example, the `T4` is prioritized over the `A10G`, followed by the `H100`.
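To make the ordering concrete, here is a sketch of an ordered fallback over a priority list. It illustrates the idea only and is not Beam's scheduler; `pick_gpu` and the availability map are hypothetical.

```python
from typing import Dict, List, Optional


def pick_gpu(priority: List[str], available: Dict[str, bool]) -> Optional[str]:
    """Return the first GPU type in priority order that is currently available."""
    for gpu in priority:
        if available.get(gpu, False):
            return gpu
    return None  # Nothing in the list is available right now


priority = ["T4", "A10G", "H100"]
print(pick_gpu(priority, {"T4": False, "A10G": True, "H100": True}))  # A10G
```

Here the `T4` is unavailable, so the next entry in the list is chosen.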
## Using Multiple GPUs
You can run workloads across multiple GPUs by using the `gpu_count` parameter.
This feature is available *by request only*. Please send us a message in
Slack, and we'll enable it on your account.
```python theme={null}
from beam import endpoint
@endpoint(gpu="A10G", gpu_count=2)
def handler():
    return {"hello": "world"}
```
## GPU Regions
Beam runs on servers distributed around the world, with primary locations in the United States, Europe, and Asia. If you would like your workloads to run in a specific region of the globe, [please reach out](https://join.slack.com/t/beam-cloud/shared_invite/zt-3enuvj3r7-OeAzVPYvyqQHy9avNrLL0w).
# Working in Jupyter Notebooks
Source: https://docs.beam.cloud/v2/environment/jupyter-notebook
You can run Beam functions from Jupyter Notebook cells, which is useful for outsourcing heavy computation to Beam's serverless cloud.
Beam works in local notebooks and cloud notebooks like Google Colab.
## Initial Setup
The first step is installing `beam-client` and adding your Beam credentials to the notebook:
```
# Colab Setup: Install beam-client
!pip install beam-client
# Import the Beam client
import beam
# Add your Beam API key
!beam configure default --token [YOUR-BEAM-TOKEN]
!beam config select default
```
## Running Functions
Your local notebook server has access to the Beam credentials on your computer, so you can run Beam functions in the notebook cells like you normally would.
You can run GPU-accelerated functions, mount storage volumes, and use the full functionality of Beam from the notebook.
## Launching a Local Notebook Server
You can spin up a local Jupyter notebook server using the `jupyter` CLI.
**If you already have a local Jupyter environment, you can skip this step.**
If you don't have it installed yet, you can do it with `pip`:
```sh theme={null}
pip3 install --upgrade pip && pip3 install jupyter
```
Launch the notebook server. This typically opens the server on `localhost:8888`.
```sh theme={null}
jupyter notebook
```
# Remote vs. Local Environment
Source: https://docs.beam.cloud/v2/environment/remote-versus-local
## Differences Between the Remote and Local Environments
Apps that run on Beam typically use packages that you don't have installed locally.
When that is the case, you need to make sure your local Python interpreter doesn't try to import those packages.
## Avoiding Import Errors
There are two ways to avoid import errors when using packages that aren't installed locally.
### Import Packages Inline
Importing packages inline is safe because the functions will only be invoked in the remote Beam environment that has these packages installed.
```python theme={null}
from beam import endpoint, Image
@endpoint(image=Image(python_packages=["torch", "pandas", "numpy"]))
def handler():
    import torch
    import pandas
    import numpy
```
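The reason inline imports are safe is that Python only executes an `import` statement when the enclosing function runs. The following sketch shows this with plain Python and no Beam decorator; `package_that_is_not_installed_locally` is a deliberately made-up module name.

```python
def handler():
    # This import runs only when handler() is called, not when the file is loaded.
    import package_that_is_not_installed_locally  # hypothetical module name

    return package_that_is_not_installed_locally.__name__


# Defining the function above did not trigger the import.
print("module loaded without importing the package")

try:
    handler()  # Calling it locally does trigger the import, which fails here.
except ImportError as exc:
    print(f"calling it locally fails: {exc}")
```

On Beam, the function body runs in the remote container where the package is installed, so the import succeeds there.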
### Use `env.is_remote()`
An alternative to using inline imports is to use a special check called `env.is_remote()` to conditionally import packages *only* when inside the remote environment.
```python theme={null}
from beam import env
if env.is_remote():
    import torch
    import pandas
    import numpy
```
This command checks whether the Python script is running remotely on Beam, and will only try to import the packages in its scope if it is.
While it might be tempting to use the `env.is_remote()` flag for other logic in your app, this command should only be used for package imports.
# CPU and RAM
Source: https://docs.beam.cloud/v2/environment/resources
## Configuring CPU and Memory
In addition to choosing a GPU, you can choose the amount of CPU and Memory to allocate:
```python theme={null}
from beam import function
@function(cpu=2, memory="2Gi")
def some_function():
    pass
```
*GPU graphics cards* have VRAM and run on *servers* with RAM.
### RAM vs. VRAM
VRAM is the amount of memory available on the GPU device. For example, if you are running inference on a 13B parameter LLM, you'll usually need at least 40Gi of VRAM in order for the model to be loaded onto the GPU.
In contrast, RAM is responsible for the *amount of data* that can be stored and accessed by the CPU on the server. For example, if you try downloading a 20Gi file, you'll need sufficient disk space and RAM.
In the context of LLMs, here are some approximate guidelines for resources to use in your apps:
| LLM Parameters | Recommended CPU | Recommended Memory (RAM) | Recommended GPU |
| -------------- | --------------- | ------------------------ | ---------------- |
| 0-7B | 2 | 32Gi | A10G (24Gi VRAM) |
| 7-14B+ | 4 | 32Gi | H100 (80Gi VRAM) |
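The table can also be expressed as a lookup, which is handy when the same script deploys models of different sizes. This is a sketch: `recommended_resources` is a hypothetical helper, and treating exactly 7B as the smaller tier is an assumption, since the table lists 7 in both rows.

```python
def recommended_resources(params_billions: float) -> dict:
    """Map a model size (in billions of parameters) to the tiers in the table."""
    if params_billions <= 7:  # Assumption: 7B itself falls in the smaller tier
        return {"cpu": 2, "memory": "32Gi", "gpu": "A10G"}
    return {"cpu": 4, "memory": "32Gi", "gpu": "H100"}


print(recommended_resources(3))
# {'cpu': 2, 'memory': '32Gi', 'gpu': 'A10G'}
print(recommended_resources(13))
# {'cpu': 4, 'memory': '32Gi', 'gpu': 'H100'}
```

The returned dictionary can be unpacked into a decorator, for example `@function(**recommended_resources(13))`.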
### Monitoring Resource Usage
In the web dashboard, you can monitor the amount of CPU, Memory, and GPU memory used for your tasks.
On a deployment, click the `Metrics` button.
On this page, you can see the resource usage over time. The graph also shows the periods when your resource usage exceeded the resource limits set on your app.
# Storing Secrets
Source: https://docs.beam.cloud/v2/environment/secrets
How to store secrets and environment variables in Beam
### Storing Secrets and Environment Variables
Secrets and environment variables can be injected into the containers that run your apps.
You can manage secrets through the CLI:
```bash theme={null}
$ beam secret create AWS_ACCESS_KEY ASIAY34FZKBOKMUTVV7A
=> Created secret with name: 'AWS_ACCESS_KEY'
```
### Using Secrets
Once created, you can access a secret like an environment variable:
```python theme={null}
from beam import function
@function(secrets=["AWS_ACCESS_KEY"])
def handler():
    import os
    my_secret = os.environ["AWS_ACCESS_KEY"]
    print(f"Secret: {my_secret}")
```
### Passing Secrets to `on_start`
If your app uses an `on_start` function, secrets can be passed to that function as well.
```python theme={null}
from beam import endpoint
# This function can read the secrets listed in the endpoint decorator below
def load_models():
    import os
    my_secret = os.environ["AWS_ACCESS_KEY"]
    print("The function can read secrets:", my_secret)
@endpoint(
    secrets=["AWS_ACCESS_KEY"],
    on_start=load_models,
)
def handler(context):
    return {}
```
## CLI Commands
### List Secrets
```bash theme={null}
beam secret list
```
```bash theme={null}
$ beam secret list
Name Last Updated Created
──────────────────────────────────────────────────
AWS_KEY 19 hours ago 19 hours ago
AWS_ACCESS_KEY 20 seconds ago 20 seconds ago
AWS_REGION 7 seconds ago 7 seconds ago
3 items
```
### Create a Secret
```bash theme={null}
beam secret create [KEY] [VALUE]
```
```bash theme={null}
$ beam secret create AWS_ACCESS_KEY ASIAY34FZKBOKMUTVV7A
=> Created secret with name: 'AWS_ACCESS_KEY'
```
If your secret contains special characters, you may need to escape them with a
backslash. For example, `a$b` would need to be `a\$b`.
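If you generate the command from a script, quoting the value is an alternative to escaping each character by hand. The sketch below uses Python's standard `shlex.quote` and assumes a POSIX shell; `MY_SECRET` is a placeholder name.

```python
import shlex

secret_value = "a$b"

# shlex.quote wraps the value in single quotes so the shell does not expand "$b".
command = f"beam secret create MY_SECRET {shlex.quote(secret_value)}"
print(command)  # beam secret create MY_SECRET 'a$b'
```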
### Show a Secret
```bash theme={null}
beam secret show [KEY]
```
```bash theme={null}
$ beam secret show AWS_ACCESS_KEY
=> Secret 'AWS_ACCESS_KEY': ASIAY34FZKBOKMUTVV7A
```
### Modify a Secret
```bash theme={null}
beam secret modify [KEY] [VALUE]
```
```bash theme={null}
$ beam secret modify AWS_ACCESS_KEY ASIAY34FZKBOKMUTVV7A
=> Modified secret 'AWS_ACCESS_KEY'
```
### Delete a Secret
```bash theme={null}
beam secret delete [KEY]
```
```bash theme={null}
$ beam secret delete AWS_ACCESS_KEY
=> Deleted secret 'AWS_ACCESS_KEY'
```
# Serverless ComfyUI
Source: https://docs.beam.cloud/v2/examples/comfy-ui
This guide shows how to deploy a ComfyUI server on Beam using [`Pod`](/v2/pod/web-service). We'll set up a server to generate images with [Flux1 Schnell](https://huggingface.co/Comfy-Org/flux1-schnell), but you can easily adapt it to use other models like Stable Diffusion v1.5.
See the code for this example on GitHub.
## Setting Up the ComfyUI Server
1. **Create the Deployment Script**
Create a file named `app.py` with the following code. This script sets up a Beam `Pod` with ComfyUI, installs dependencies, downloads the Flux1 Schnell model, and launches the server.
```python theme={null}
from beam import Image, Pod
ORG_NAME = "Comfy-Org"
REPO_NAME = "flux1-schnell"
WEIGHTS_FILE = "flux1-schnell-fp8.safetensors"
COMMIT = "f2808ab17fe9ff81dcf89ed0301cf644c281be0a"
image = (
    Image()
    .add_commands(["apt update && apt install git -y"])
    .add_python_packages(
        [
            "fastapi[standard]==0.115.4",
            "comfy-cli==1.3.5",
            "huggingface_hub[hf_transfer]==0.26.2",
        ]
    )
    .add_commands(
        [
            "comfy --skip-prompt install --nvidia --version 0.3.10",
            "comfy node install was-node-suite-comfyui@1.0.2",
            "mkdir -p /root/comfy/ComfyUI/models/checkpoints/",
            f"huggingface-cli download {ORG_NAME}/{REPO_NAME} {WEIGHTS_FILE} --cache-dir /comfy-cache",
            f"ln -s /comfy-cache/models--{ORG_NAME}--{REPO_NAME}/snapshots/{COMMIT}/{WEIGHTS_FILE} /root/comfy/ComfyUI/models/checkpoints/{WEIGHTS_FILE}",
        ]
    )
)
comfyui_server = Pod(
    image=image,
    ports=[8000],
    cpu=12,
    memory="32Gi",
    gpu="H100",
    entrypoint=["sh", "-c", "comfy launch -- --listen 0.0.0.0 --port 8000"],
)
res = comfyui_server.create()
print("ComfyUI hosted at:", res.url)
```
2. **Start ComfyUI**
```bash theme={null}
python app.py
```
This deploys the ComfyUI server to Beam. After deployment, you'll see a URL (e.g., `https://pod-12345.apps.beam.cloud`) where your server is hosted.
ComfyUI takes a minute or two to start after deploying it for the first
time.
3. **Accessing the Server**
* Open the URL from your terminal in a browser to access the ComfyUI interface.
* Use the web UI to load workflows or generate images.
## Using Different Models
You can swap the Flux1 Schnell model for another, such as Stable Diffusion v1.5, by updating the model variables in `app.py`. Here’s how:
1. **Update the Model Variables**
Define the organization, repository, weights file, and commit ID for your desired model. For example, to use Stable Diffusion v1.5:
```python theme={null}
ORG_NAME = "Comfy-Org"
REPO_NAME = "stable-diffusion-v1-5-archive"
WEIGHTS_FILE = "v1-5-pruned-emaonly-fp16.safetensors"
COMMIT = "21e044065c0b2d82dafd35397a553847c70c0445"
```
2. **Apply to the Image Commands**
The rest of the script uses these variables, so no further changes are needed to the `image` section:
```python theme={null}
image = (
    Image()
    .add_commands(["apt update && apt install git -y"])
    .add_python_packages(
        [
            "fastapi[standard]==0.115.4",
            "comfy-cli==1.3.5",
            "huggingface_hub[hf_transfer]==0.26.2",
        ]
    )
    .add_commands(
        [
            "comfy --skip-prompt install --nvidia --version 0.3.10",
            "comfy node install was-node-suite-comfyui@1.0.2",
            "mkdir -p /root/comfy/ComfyUI/models/checkpoints/",
            f"huggingface-cli download {ORG_NAME}/{REPO_NAME} {WEIGHTS_FILE} --cache-dir /comfy-cache",
            f"ln -s /comfy-cache/models--{ORG_NAME}--{REPO_NAME}/snapshots/{COMMIT}/{WEIGHTS_FILE} /root/comfy/ComfyUI/models/checkpoints/{WEIGHTS_FILE}",
        ]
    )
)
```
3. **Find Model Details**
To use any other model:
* Visit [Comfy-Org Hugging Face](https://huggingface.co/Comfy-Org) and find your desired model.
* Update `ORG_NAME`, `REPO_NAME`, `WEIGHTS_FILE`, and `COMMIT` with values from the model’s repository. Check the "Files and versions" tab for the weights file and commit hash.
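The four variables matter because the `ln -s` command in the image depends on the Hugging Face cache layout, where the commit hash is part of the path. The sketch below rebuilds that path so you can check your values before a build; `snapshot_path` is a hypothetical helper, and the layout is the one used by the commands in `app.py`.

```python
from pathlib import PurePosixPath


def snapshot_path(org, repo, commit, weights_file, cache_dir="/comfy-cache"):
    """Build the cache path that the ln -s command in app.py points at."""
    return (
        PurePosixPath(cache_dir)
        / f"models--{org}--{repo}"
        / "snapshots"
        / commit
        / weights_file
    )


print(
    snapshot_path(
        "Comfy-Org",
        "flux1-schnell",
        "f2808ab17fe9ff81dcf89ed0301cf644c281be0a",
        "flux1-schnell-fp8.safetensors",
    )
)
```

If the commit hash does not match the downloaded snapshot, the symlink would point at a path that does not exist, so ComfyUI would not find the checkpoint.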
## Running Workflows as APIs
You can also expose ComfyUI workflows as APIs using Beam’s ASGI support. This allows you to programmatically generate images by sending requests with prompts. Below is an example of how to set this up:
1. **Create the API Script**
```python theme={null}
from beam import Image, asgi, Output
image = (
    Image()
    .add_commands(["apt update && apt install git -y"])
    .add_python_packages(
        [
            "fastapi[standard]==0.115.4",
            "comfy-cli",
            "huggingface_hub[hf_transfer]==0.26.2",
        ]
    )
    .add_commands(
        [
            "yes | comfy install --nvidia --version 0.3.10",
            "comfy node install was-node-suite-comfyui@1.0.2",
            "mkdir -p /root/comfy/ComfyUI/models/checkpoints/",
            "huggingface-cli download Comfy-Org/flux1-schnell flux1-schnell-fp8.safetensors --cache-dir /comfy-cache",
            "ln -s /comfy-cache/models--Comfy-Org--flux1-schnell/snapshots/f2808ab17fe9ff81dcf89ed0301cf644c281be0a/flux1-schnell-fp8.safetensors /root/comfy/ComfyUI/models/checkpoints/flux1-schnell-fp8.safetensors",
        ]
    )
)
def init_models():
    import subprocess
    cmd = "comfy launch --background"
    subprocess.run(cmd, shell=True, check=True)
@asgi(
    name="comfy",
    image=image,
    on_start=init_models,
    cpu=8,
    memory="32Gi",
    gpu="H100",
    timeout=-1,
)
def handler():
    from fastapi import FastAPI, HTTPException
    import subprocess
    import json
    from pathlib import Path
    import uuid
    from typing import Dict
    app = FastAPI()
    # This is where you specify the path to your workflow file.
    # Make sure "workflow_api.json" exists in the same directory as this script.
    WORKFLOW_FILE = Path(__file__).parent / "workflow_api.json"
    OUTPUT_DIR = Path("/root/comfy/ComfyUI/output")
    @app.post("/generate")
    async def generate(item: Dict):
        if not WORKFLOW_FILE.exists():
            raise HTTPException(status_code=500, detail="Workflow file not found.")
        workflow_data = json.loads(WORKFLOW_FILE.read_text())
        workflow_data["6"]["inputs"]["text"] = item["prompt"]
        request_id = uuid.uuid4().hex
        workflow_data["9"]["inputs"]["filename_prefix"] = request_id
        new_workflow_file = Path(f"{request_id}.json")
        new_workflow_file.write_text(json.dumps(workflow_data, indent=4))
        # Run inference
        cmd = f"comfy run --workflow {new_workflow_file} --wait --timeout 1200 --verbose"
        subprocess.run(cmd, shell=True, check=True)
        image_files = list(OUTPUT_DIR.glob("*"))
        # Find the latest image
        latest_image = max(
            (f for f in image_files if f.suffix.lower() in {".png", ".jpg", ".jpeg"}),
            key=lambda f: f.stat().st_mtime,
            default=None
        )
        if not latest_image:
            raise HTTPException(status_code=404, detail="No output image found.")
        output_file = Output(path=latest_image)
        output_file.save()
        public_url = output_file.public_url(expires=-1)
        print(public_url)
        return {"output_url": public_url}
    return app
```
2. **Prepare a Workflow File**
* Create a `workflow_api.json` file in the same directory as your API script (`api.py` in the deploy command below). This file should contain your ComfyUI workflow, which you can export from the ComfyUI web interface.
* You can also store your `workflow_api.json` file in your Volume and use it like `WORKFLOW_FILE = Path("/your_volume/workflow_api.json")`
3. **Deploy the API**
```bash theme={null}
beam deploy api.py:handler
```
4. **Use the API**
Send a POST request to the `/generate` endpoint with a JSON payload containing a `prompt`:
```bash theme={null}
curl -X POST https://12345.apps.beam.cloud/generate \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_BEAM_API' \
-d '{"prompt": "A cat image"}'
```
The response will include a public URL to the generated image:
```json theme={null}
{
"output_url": "https://app.beam.cloud/output/id/9a003889-8345-4969-bdf8-2808eebc1c4b"
}
```
# Chat with DeepSeek R1
Source: https://docs.beam.cloud/v2/examples/deepseek-r1
In this example we are going to use [vLLM](https://github.com/vllm-project/vllm) to host an API for `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` on Beam.
This example requires our multi-GPU feature, which needs to be enabled on your
Beam account. Please send us a message in our [Slack
Community](https://join.slack.com/t/beam-cloud/shared_invite/zt-3enuvj3r7-OeAzVPYvyqQHy9avNrLL0w)
and we'll enable it for you!
See the code for this example on GitHub.
## Initial Setup
First, clone the vLLM example to your computer.
```sh theme={null}
$ beam example download vllm && cd vllm
```
We'll use our vLLM abstraction to host an OpenAI compatible DeepSeek API on Beam.
From inside the vLLM directory, run the following command to deploy the API:
```sh theme={null}
$ beam deploy models.py:deepseek_r1
=> Building image
=> Using cached image
=> Syncing files
=> Uploading
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 28.7/28.7 kB 0:00:00
=> Files synced
=> Deploying
=> Deployed
=> Invocation details
curl -X POST 'https://deepseek-r1-distill-qwen-7b-54b3408-v4.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-d '{}'
```
This code will deploy a DeepSeek R1 API on Beam, and print out the API URL.
## Running the API
We provide an interactive command line interface to run the API. You'll be prompted to enter the API URL from the deployment output above. If you select stream mode, the API will stream the response to the console.
```sh theme={null}
$ python chat.py
Welcome to the CLI Chat Application!
Type 'quit' to exit the conversation.
Enter the app URL: https://deepseek-r1-distill-qwen-7b-54b3408-v4.app.beam.cloud
Stream mode? (y/n): y
Model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B is ready
```
The first time you run the API, the model weights will be downloaded from
Hugging Face. This may take a few minutes, but will be cached for future runs.
## Interacting with DeepSeek R1
You can now interact with the DeepSeek R1 API. The API will stream the response to the console, and print out the tokens generated and the time taken.
```sh theme={null}
**Question:** What’s the meaning of life?
The question of the meaning of life is a deep and complex one, and different people and cultures may have different perspectives and answers. Some common themes across various philosophies and belief systems include:
1. **Philosophical Views**
- **Existentialism**: The belief that life is inherently meaningless, and individuals must create their own meaning.
- **Mysticism**: The idea that the meaning of life is found within oneself, through spiritual or divine connection.
- **Buddhism**: The concept of "suffering" (dukkha) and the goal of achieving liberation from suffering, often seen as the ultimate meaning of life.
2. **Cultural and Religious Views**
- **Religion**: Many religions—such as Christianity, Islam, and Buddhism—propose a higher power or purpose that gives life meaning.
- **Science and Empiricism**: A materialistic or scientific view often suggests that life’s meaning is derived from personal fulfillment, relationships, or contributing to the greater good.
3. **Personal Perspectives**
- **Purpose**: Many people find meaning in life by aligning their actions with their personal values, goals, and aspirations.
- **Relationships**: Building meaningful connections with others can provide a sense of purpose and fulfillment.
- **Creativity**: Engaging in creative activities, such as art, music, or writing, can bring meaning to life.
Ultimately, the meaning of life is often left open to interpretation, as it can vary greatly depending on individual experiences, beliefs, and contexts. Some find meaning through achieving their personal goals, while others find it in helping others or contributing to the world in some way.
Tokens Generated: 350
Time Taken: 12.66s
Tokens Per Second: 27.64
```
# Fine-tuning Gemma with LoRA
Source: https://docs.beam.cloud/v2/examples/gemma-fine-tune
In this example we are fine-tuning [Gemma 2B](https://huggingface.co/google/gemma-2b), an open source model from Google.
See the code for this example on GitHub.
## Fine-Tuning
In this example, we are using Low-Rank Adaptation (LoRA) to fine-tune the [Gemma language model](https://blog.google/technology/developers/gemma-open-models/) using the [Open Assistant dataset](https://huggingface.co/datasets/OpenAssistant/oasst1).
The goal is to use this dataset to improve Gemma's ability to engage in helpful conversations, making it more suitable for assistant-like apps.
### LoRA
You can read more about LoRA [here](https://arxiv.org/abs/2106.09685). However, let's briefly discuss what exactly it does and why we chose to use it here.
At a high level, LoRA adds a small set of new weights to the model, and only those weights are trained. By limiting training to these additional weights, we can fine-tune the model much more quickly. Additionally, since we are not touching the original weights, the model's initial knowledge base should remain intact.
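To get a feel for how small the added weights are, consider a single weight matrix. LoRA learns two low-rank factors in place of a full update, so the trainable parameter count drops from `d_out * d_in` to `rank * (d_out + d_in)`. The dimensions and rank below are illustrative assumptions and not Gemma's actual configuration.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_out * d_in            # Updating the whole matrix
    lora = rank * (d_out + d_in)   # Factors B (d_out x rank) and A (rank x d_in)
    return full, lora


full, lora = lora_param_counts(d_out=2048, d_in=2048, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 4194304 32768 0.78%
```

With these assumed sizes, the adapter trains under 1% of the parameters of the matrix it modifies.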
### Initial Setup
We are using an [H100](https://www.nvidia.com/en-us/data-center/h100/) GPU with mixed precision (FP16) to optimize for speed and memory usage. We only train for one epoch here. In practice, you can probably train longer and continue to see improved results.
No surprise here, but we are getting our compute via Beam. We are using the `function` decorator so that we can run our fine-tuning application as if it were on our local machine.
```python theme={null}
from beam import Volume, Image, function
# The mount path is the location on the beam volume with the model weights
MOUNT_PATH = "./gemma-ft"
@function(
    volumes=[Volume(name="gemma-ft", mount_path=MOUNT_PATH)],
    image=Image(
        python_packages=["transformers", "torch", "datasets", "peft", "bitsandbytes"]
    ),
    gpu="H100",
    cpu=4,
)
```
### Mounting Storage Volumes
We're using Beam's persistent [storage volumes](/v2/data/volume) to store model weights and training data. This allows us to download the necessary files directly to the volume, streamlining the setup process.
Here's a simple script to handle the downloads:
```python theme={null}
from beam import function, Volume, Image, env

if env.is_remote():
    from huggingface_hub import snapshot_download
    from datasets import load_dataset

VOLUME_PATH = "./gemma-ft"


@function(
    image=Image(python_version="python3.11")
    .add_python_packages(
        [
            "huggingface_hub",
            "datasets",
            "huggingface_hub[hf-transfer]",
        ]
    )
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1"),
    memory="32Gi",
    cpu=4,
    secrets=["HF_TOKEN"],
    volumes=[Volume(name="gemma-ft", mount_path=VOLUME_PATH)],
)
def upload():
    # Download the base model weights to the volume
    snapshot_download(
        repo_id="google/gemma-2b",
        local_dir=f"{VOLUME_PATH}/weights"
    )

    # Download the training split and save it next to the weights
    dataset = load_dataset("OpenAssistant/oasst1", split="train")
    dataset.save_to_disk(f"{VOLUME_PATH}/data")
    print("Files uploaded successfully")


if __name__ == "__main__":
    upload()
```
This script will download the Gemma 2B model weights and the Open Assistant dataset directly to your Beam volume.
First, let's create our volume:
```bash theme={null}
beam volume create gemma-ft
```
Next, we can run our script to populate it with the model and dataset:
```bash theme={null}
python upload.py
```
Once those uploads are complete, we can move on to training.
### Start Training
We can start our training by running `python finetune.py`. After beginning training, you should see something like the following in your terminal:
```bash theme={null}
=> Building image
=> Syncing files
...
=> Running function:
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
...
Generating train split: 12947 examples [00:00, 114393.80 examples/s]
...
Map: 93%|#########2| 12000/12947 [00:13<00:01, 921.12 examples/s]
...
1%| | 6/809 [00:08<16:35, 1.24s/it]
...
{'loss': 1.617, 'grad_norm': 0.4805833399295807, 'learning_rate': 0.00019752781211372064, 'epoch': 0.01}
...
```
Once it is finished, we can use the beam CLI to look at the resulting files. You should see something like this:
```bash theme={null}
$ beam ls gemma-ft/gemma-2b-finetuned
Name Size Modified Time IsDir
──────────────────────────────────────────────────────────────────────────────────
gemma-2b-finetuned/README.md 4.97 KiB Aug 10 2024 No
gemma-2b-finetuned/adapter_config.json 644.00 B Aug 10 2024 No
gemma-2b-finetuned/adapter_model.safetensors 12.20 MiB Aug 10 2024 No
gemma-2b-finetuned/checkpoint-700 36.70 MiB Aug 01 2024 Yes
gemma-2b-finetuned/checkpoint-800 36.70 MiB Aug 01 2024 Yes
gemma-2b-finetuned/checkpoint-809 36.70 MiB Aug 01 2024 Yes
gemma-2b-finetuned/special_tokens_map.json 555.00 B Aug 10 2024 No
gemma-2b-finetuned/tokenizer.json 16.71 MiB Aug 10 2024 No
gemma-2b-finetuned/tokenizer_config.json 45.21 KiB Aug 10 2024 No
9 items | 139.06 MiB used
```
## Inference
In `inference.py`, we are loading up our model with the additional fine-tuned weights and setting up an endpoint to send it requests.
Here, we make use of Beam's `on_start` functionality so that we only load the model when the container starts instead of every time we receive a request. Let's explore the `endpoint` decorator below.
```python theme={null}
from beam import Volume, Image, endpoint

# The mount path is the location on the beam volume with the model weights
MOUNT_PATH = "./gemma-ft"


@endpoint(
    name="gemma-inference",
    on_start=load_finetuned_model,
    volumes=[Volume(name="gemma-ft", mount_path=MOUNT_PATH)],
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(
        python_version="python3.9",
        python_packages=["transformers==4.42.0", "torch", "peft"],
    ),
)
```
Once again, we are mounting our storage volume named "gemma-ft". Since we have already run training, this volume will now contain our fine-tuned weights alongside the base weights we got from Hugging Face.
### Choosing a GPU For Inference
Now that we've trained the model, we can run it on a machine with a weaker GPU.
Training requires more memory than inference because it must store gradients and optimizer states for all parameters, in addition to activations, whereas inference only needs to maintain the current layer's activations during a forward pass. Be sure to keep this in mind as you work on your own applications.
You can use the [Beam dashboard](https://platform.beam.cloud/) to get a sense of GPU utilization in real-time. With this information, you can make a more informed choice about how much compute you require. For this example, we use a T4 GPU. It has 16GB of VRAM and is a good choice for inference with a model this small.
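As a rough back-of-the-envelope sketch of why a smaller GPU can be enough for inference, the functions below estimate memory for model state only. The 2-billion parameter count and the per-parameter byte sizes are assumptions for illustration (FP16 weights and gradients at 2 bytes each, two FP32 Adam moments at 4 bytes each); real usage also depends on activations, batch size, and framework overhead.

```python
# Rough estimate of memory for model state; all sizes are assumptions.
GB = 1e9  # decimal gigabytes


def inference_gb(n_params: int, weight_bytes: int = 2) -> float:
    """Memory for the weights alone, as needed for a forward pass."""
    return n_params * weight_bytes / GB


def full_finetune_gb(
    n_params: int,
    weight_bytes: int = 2,
    grad_bytes: int = 2,
    optimizer_bytes: int = 8,
) -> float:
    """Weights + gradients + optimizer state when every parameter is trained."""
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes) / GB


if __name__ == "__main__":
    n = 2_000_000_000  # hypothetical parameter count
    print(inference_gb(n))      # 4.0
    print(full_finetune_gb(n))  # 24.0
```

Under these assumptions the weights need about 4 GB for inference, well within a T4's 16 GB, while the state for full fine-tuning would not fit. LoRA narrows that gap, because gradients and optimizer state are kept only for the small set of adapter weights.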
### Using Signals to Reload Model Weights Automatically
We use a [`Signal`](/v2/topics/signal) abstraction to fire an event to the inference app when the model has finished training.
This allows us to communicate between apps on Beam. In this example, we have it set up to re-run our `on_start` method when a signal is received. This way, if we re-train our model, we can load the newest weights without restarting the container.
```python theme={null}
# Register a signal
s = experimental.Signal(
    name="reload-model",
    handler=load_finetuned_model,
)
```
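The reload pattern can be pictured with a small, self-contained sketch. This is not Beam's implementation of `Signal`; the `SignalRegistry` class, its methods, and the version counter are hypothetical stand-ins that only illustrate re-running a loader when a named event fires.

```python
# Hypothetical stand-in for a signal mechanism; not Beam's implementation.

class SignalRegistry:
    """Maps signal names to handler callables."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    def fire(self, name):
        # Run the handler registered under this name and return its result
        return self._handlers[name]()


# Shared state standing in for the model held by the running container
state = {"version": 0, "model": None}


def load_finetuned_model():
    """Pretend to load the newest weights from the volume."""
    state["version"] += 1
    state["model"] = f"weights-v{state['version']}"
    return state["model"]


registry = SignalRegistry()
registry.register("reload-model", load_finetuned_model)

load_finetuned_model()          # initial load when the container starts
registry.fire("reload-model")   # training finished: reload in place
print(state["model"])           # weights-v2
```

The point of the design is that the container keeps running while the loader is re-run, so the next request sees the new weights.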
### Deploying The Endpoint
Let's deploy our endpoint! We can do this with the `beam` CLI.
```bash theme={null}
beam deploy inference.py:predict --name gemma-ft
```
The output will look something like this:
```bash theme={null}
=> Building image
=> Syncing files
=> Deploying
=> Deployed
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/gemma-ft/v2' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
-d '{}'
```
When calling our inference endpoint, we'll need to include a prompt. For example, we can call the deployed endpoint with `-d '{"prompt": "hi"}'`. The response we get back will be in the following format:
```bash theme={null}
{"text":"Hello! How can I help you today?<|im_end|>"}
```
Note that the returned response includes the stop token `<|im_end|>`. You could strip this token in the endpoint logic if you would like, but it is worth keeping around if you will be appending this response to a longer-running conversation.
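If you do want to strip the token before showing text to a user, a minimal sketch could look like the following. The helper name `strip_stop_token` is hypothetical and is not part of the example's `inference.py`.

```python
STOP_TOKEN = "<|im_end|>"


def strip_stop_token(text: str, stop_token: str = STOP_TOKEN) -> str:
    """Remove a single trailing stop token, leaving other text untouched."""
    if text.endswith(stop_token):
        return text[: -len(stop_token)]
    return text


if __name__ == "__main__":
    raw = "Hello! How can I help you today?<|im_end|>"
    print(strip_stop_token(raw))  # Hello! How can I help you today?
```

Keeping the raw response alongside the stripped one lets you append the unmodified turn to the conversation history.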
# Hugging Face Models
Source: https://docs.beam.cloud/v2/examples/inference
A beginner's guide to running highly performant inference workloads on Beam.
This tutorial introduces several key concepts:
* Creating a container image
* Running a custom ML model
* Developing your app using Beam's live reloading workflow
* Pre-loading models and caching them in storage volumes
* Autoscaling and concurrency
See the code for this example on Github.
## Setup your app
You'll start by adding an `endpoint` decorator with an [`Image`](/v2/reference/py-sdk#image):
* `Endpoint` is the wrapper for your inference function.
* Inside the `endpoint` is an `Image`. The `Image` defines the image your container will run on.
If you'd like to make further customizations to your image -- such as adding
shell commands -- you can do so using the `commands` argument. [Read more
about custom images.](/v2/environment/custom-images)
```python theme={null}
from beam import Image, endpoint


@endpoint(
    name="inference-quickstart",
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(python_version="python3.9")
    .add_python_packages(["transformers", "torch", "huggingface_hub[hf-transfer]"])
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1"),
)
```
## Remote vs. Local Environment
Apps that run on Beam typically use packages that you don't have installed locally, like Transformers.
We'll use a special flag called `env.is_remote()` to conditionally import those packages only when inside the remote cloud environment.
```python theme={null}
from beam import env

if env.is_remote():
    import transformers
    import torch
```
This command checks whether the Python script is running remotely on Beam, and will only try to import the packages in its scope if it is.
## Running a custom ML model
We'll create a new function to run inference on `facebook/opt-125m` via Hugging Face Transformers.
Since we'll deploy this as a REST API, we add an `@endpoint` decorator above the inference function:
```python theme={null}
from beam import Image, endpoint, env

if env.is_remote():
    from transformers import AutoTokenizer, OPTForCausalLM
    import torch


@endpoint(
    name="inference-quickstart",
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(python_version="python3.9")
    .add_python_packages(["transformers", "torch", "huggingface_hub[hf-transfer]"])
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1"),
)
def predict(prompt):
    model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

    # Generate
    inputs = tokenizer(prompt, return_tensors="pt")
    generate_ids = model.generate(inputs.input_ids, max_length=30)
    result = tokenizer.batch_decode(
        generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )[0]

    print(result)

    return {"prediction": result}
```
## Developing your app on Beam
Beam includes a live-reloading feature that allows you to run your code on the same environment you'll be running in production.
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
In your shell, run `beam serve app.py:predict`. This will:
1. Spin up a container
2. Run it on a GPU
3. Print a cURL request to invoke the API
4. Stream the logs to your shell
You should keep this terminal window open while developing.
```sh theme={null}
(.venv) user@MacBook demo % beam serve app.py:predict
=> Building image
=> Using cached image
=> Syncing files
=> Invocation details
curl -X POST \
'https://app.beam.cloud/endpoint/id/bc55068e-b648-4dbc-9cb7-183e1789e011' \
-H 'Accept: */*' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
=> Watching ./inference-app for changes...
```
Now, head back to your IDE, and change a line of code. Hit save.
If you look closely at the shell running `beam serve`, you'll notice the server reloading with your code changes.
You'll use this workflow anytime you're developing an app on Beam. Trust us -- it makes the development process uniquely fast and painless.
## Performance Optimizations
If you called the API via the cURL command, you'll notice that your model was downloaded each time you invoked the API.
To improve performance, we'll set up a function to pre-load your models and store them on disk between API calls.
### Pre-loading
Beam includes an `on_start` method, which you can pass to your function decorators. `on_start` runs exactly once when the container first starts.
The value returned by the `on_start` function can be retrieved from `context.on_start_value`:
```python theme={null}
from beam import Image, endpoint, env

if env.is_remote():
    from transformers import AutoTokenizer, OPTForCausalLM
    import torch


def download_models():
    from transformers import AutoTokenizer, OPTForCausalLM

    model = OPTForCausalLM.from_pretrained("facebook/opt-125m")
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

    return model, tokenizer


@endpoint(
    name="inference-quickstart",
    on_start=download_models,
    image=Image(
        python_version="python3.9",
        python_packages=[
            "transformers",
            "torch",
        ],
    ),
)
def predict(context):
    # Retrieve cached model from on_start function
    model, tokenizer = context.on_start_value

    # Do something with the model and tokenizer...
```
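The load-once behavior can be sketched in plain Python. This is only an illustration of the pattern, not Beam's internals: `make_endpoint`, the `SimpleNamespace` context, and the placeholder return values are all assumptions made for the sketch.

```python
from types import SimpleNamespace

load_calls = 0  # counts how many times the loader actually runs


def download_models():
    """Stand-in for an expensive model load."""
    global load_calls
    load_calls += 1
    return "model", "tokenizer"  # placeholders for the real objects


def make_endpoint(on_start, handler):
    """Run on_start once, then reuse its value for every request."""
    context = SimpleNamespace(on_start_value=on_start())

    def call(**inputs):
        return handler(context, **inputs)

    return call


def predict(context, prompt=""):
    model, tokenizer = context.on_start_value
    return {"prediction": f"{model} saw: {prompt}"}


endpoint_fn = make_endpoint(download_models, predict)
for p in ["a", "b", "c"]:
    endpoint_fn(prompt=p)

print(load_calls)  # 1
```

Three requests are served, but the loader runs a single time, which is the saving `on_start` is meant to provide.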
### Cache in a storage volume
The `on_start` method saves us from having to download the model on every request, but each new container still downloads it once. We can avoid repeating that download by caching the model in a [Storage Volume](/v2/data/volume).
Beam allows you to create highly-available storage volumes that can be used across tasks. You might use volumes for things like storing model weights or large datasets.
```python theme={null}
from beam import Image, endpoint, Volume

# Model weights will be cached in this folder
CACHE_PATH = "./weights"


# This function runs once when the container first starts
def download_models():
    from transformers import AutoTokenizer, OPTForCausalLM

    model = OPTForCausalLM.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    return model, tokenizer


@endpoint(
    name="inference-quickstart",
    on_start=download_models,
    volumes=[Volume(name="weights", mount_path=CACHE_PATH)],
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(
        python_version="python3.9",
        python_packages=[
            "transformers",
            "torch",
        ],
    ),
)
```
Now, these models can be automatically downloaded to the volume by using the `cache_dir` argument in transformers:
```python theme={null}
model = OPTForCausalLM.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
```
These volumes are mounted directly to the container running your app, so you can read from and write to them like any normal directory on disk.
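The effect of pointing a cache at a persistent directory can be sketched with the standard library alone. The `ensure_cached` helper is hypothetical, a temporary directory stands in for the mounted volume, and the real cache layout used by Transformers is more involved than a single file.

```python
import tempfile
from pathlib import Path


def ensure_cached(path: Path, fetch) -> bytes:
    """Return the file's contents, calling fetch() only if it is not on disk yet."""
    if path.exists():
        return path.read_bytes()
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data


fetch_calls = []


def fake_download() -> bytes:
    """Stand-in for a slow network download."""
    fetch_calls.append(1)
    return b"weights"


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "weights" / "model.bin"
    first = ensure_cached(target, fake_download)   # downloads and writes
    second = ensure_cached(target, fake_download)  # served from disk

print(len(fetch_calls))  # 1
```

Because a volume outlives any single container, the "already on disk" branch is the one a freshly started container would take after the first run.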
## Configure Autoscaling (Optional)
You can control your autoscaling behavior with `QueueDepthAutoscaler`.
`QueueDepthAutoscaler` takes two parameters:
* `max_containers`
* `tasks_per_container`
```python theme={null}
from beam import endpoint, QueueDepthAutoscaler


@endpoint(autoscaler=QueueDepthAutoscaler(max_containers=5, tasks_per_container=1))
def function():
    pass
```
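This section only names the two parameters. One plausible reading, stated here as an assumption rather than Beam's confirmed algorithm, is that the autoscaler aims for about `tasks_per_container` queued tasks per container, capped at `max_containers`. The `desired_containers` helper below is hypothetical and only illustrates that arithmetic.

```python
import math


def desired_containers(
    queue_depth: int, tasks_per_container: int = 1, max_containers: int = 5
) -> int:
    """Containers needed for the queue, under the assumed scaling rule."""
    if queue_depth <= 0:
        return 0  # assumes nothing needs to run when the queue is empty
    needed = math.ceil(queue_depth / tasks_per_container)
    return min(max_containers, needed)


if __name__ == "__main__":
    print(desired_containers(3))   # 3
    print(desired_containers(12))  # 5
    print(desired_containers(7, tasks_per_container=2, max_containers=5))  # 4
```

Under this reading, raising `tasks_per_container` trades latency for cost, while `max_containers` puts a hard ceiling on spend.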
## Deployment
With these performance optimizations in place, it's time to deploy your API to create a persistent endpoint. In your shell, run this command to deploy your app:
```sh theme={null}
beam deploy app.py:predict
```
## Monitoring Logs and Task Status
In the dashboard, you can view the status of the task and the logs from the container.
## Summary
You've successfully created a highly performant serverless API for your ML model!
# LLaMA 3.1 8B
Source: https://docs.beam.cloud/v2/examples/llama3
This guide demonstrates how to run the Meta Llama 3.1 8B Instruct model on Beam.
You need an access token from Huggingface to run this example. You can sign up
for Huggingface and access your token on [the settings
page](https://huggingface.co/settings/tokens), and store it in the [Beam
Secrets Manager](/v2/environment/secrets).
See the code for this example on Github.
## Prerequisites
1. **Request Access**: Request access to the model [here](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
2. **Retrieve HF Token**: Get your Huggingface token from [this page](https://huggingface.co/settings/tokens).
3. **Save HF Token on Beam**: Use the command `beam secret create HF_TOKEN [TOKEN]` to save your token.
## Setup Remote Environment
The first thing we'll do is set up an `Image` with the Python packages required for this app.
We use the `if env.is_remote()` flag to conditionally import the Python packages only when the script is running remotely on Beam.
```python theme={null}
from beam import endpoint, Image, Volume, env

# This ensures that these packages are only loaded when the script is running remotely on Beam
if env.is_remote():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

# Model parameters
MODEL_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct"
MAX_LENGTH = 512
TEMPERATURE = 0.7
TOP_P = 0.9
TOP_K = 50
REPETITION_PENALTY = 1.05
NO_REPEAT_NGRAM_SIZE = 2
DO_SAMPLE = True
NUM_BEAMS = 1
EARLY_STOPPING = True

BEAM_VOLUME_PATH = "./cached_models"


# This runs once when the container first starts
def load_models():
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_NAME,
        cache_dir=BEAM_VOLUME_PATH,
        padding_side='left'
    )
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        device_map="auto",
        torch_dtype=torch.float16,
        cache_dir=BEAM_VOLUME_PATH,
        use_cache=True,
        low_cpu_mem_usage=True
    )
    model.eval()
    return model, tokenizer
```
## Inference Function
Here’s the inference function. By adding the `@endpoint` decorator to it, we can expose this function as a RESTful API.
Note the `secrets` argument which ensures the Huggingface token is loaded into the environment.
```python theme={null}
@endpoint(
    secrets=["HF_TOKEN"],
    on_start=load_models,
    name="meta-llama-3.1-8b-instruct",
    cpu=2,
    memory="16Gi",
    gpu="A10G",
    image=Image(python_version="python3.9")
    .add_python_packages(
        [
            "torch",
            "transformers",
            "accelerate",
            "huggingface_hub[hf-transfer]",
        ]
    )
    .with_envs({
        "HF_HUB_ENABLE_HF_TRANSFER": "1",
        "TOKENIZERS_PARALLELISM": "false",
        "CUDA_VISIBLE_DEVICES": "0",
    }),
    volumes=[
        Volume(
            name="cached_models",
            mount_path=BEAM_VOLUME_PATH,
        )
    ],
)
def generate_text(context, **inputs):
    # Retrieve model and tokenizer from on_start
    model, tokenizer = context.on_start_value

    # Inputs passed to API
    messages = inputs.pop("messages", None)
    if not messages:
        return {"error": "Please provide messages for text generation."}

    generate_args = {
        "max_new_tokens": inputs.get("max_tokens", MAX_LENGTH),
        "temperature": inputs.get("temperature", TEMPERATURE),
        "top_p": inputs.get("top_p", TOP_P),
        "top_k": inputs.get("top_k", TOP_K),
        "repetition_penalty": inputs.get("repetition_penalty", REPETITION_PENALTY),
        "no_repeat_ngram_size": inputs.get("no_repeat_ngram_size", NO_REPEAT_NGRAM_SIZE),
        "num_beams": inputs.get("num_beams", NUM_BEAMS),
        "early_stopping": inputs.get("early_stopping", EARLY_STOPPING),
        "do_sample": inputs.get("do_sample", DO_SAMPLE),
        "use_cache": True,
        "eos_token_id": tokenizer.eos_token_id,
        "pad_token_id": tokenizer.pad_token_id,
    }

    model_inputs_str = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # Tokenize inputs with truncation
    tokenized_inputs = tokenizer(
        model_inputs_str,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=2048
    )
    input_ids = tokenized_inputs["input_ids"].to("cuda")
    attention_mask = tokenized_inputs["attention_mask"].to("cuda")
    input_ids_length = input_ids.shape[-1]

    with torch.no_grad():
        outputs = model.generate(
            input_ids=input_ids, attention_mask=attention_mask, **generate_args
        )
        # Keep only the tokens generated after the prompt
        new_tokens = outputs[0][input_ids_length:]
        output_text = tokenizer.decode(new_tokens, skip_special_tokens=True)

    return {"output": output_text}
```
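The way request fields override the defaults can be isolated in a small sketch. `build_generate_args` and `KEY_MAP` are hypothetical names, and only a subset of the settings above is included to keep the example short.

```python
# Defaults mirror a subset of the constants defined earlier in this guide.
DEFAULTS = {
    "max_new_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50,
}

# Maps the field name in the request body to the generate() keyword argument
KEY_MAP = {
    "max_tokens": "max_new_tokens",
    "temperature": "temperature",
    "top_p": "top_p",
    "top_k": "top_k",
}


def build_generate_args(inputs: dict) -> dict:
    """Start from the defaults and apply any overrides present in the request."""
    args = dict(DEFAULTS)
    for request_key, arg_name in KEY_MAP.items():
        if request_key in inputs:
            args[arg_name] = inputs[request_key]
    return args


if __name__ == "__main__":
    print(build_generate_args({"temperature": 0.2, "max_tokens": 64}))
```

Note that the request uses `max_tokens` while `generate` expects `max_new_tokens`, so a caller who sends `max_new_tokens` directly would silently get the default.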
## Deploy to Production
The following command deploys our code to Beam, and hosts it as a REST API:
```sh theme={null}
beam deploy app.py:generate_text
```
## Invoking the API
Once the API is running, you can invoke it using the following cURL command:
```sh theme={null}
curl -X POST 'https://app.beam.cloud/endpoint/id/[ENDPOINT-ID]' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [AUTH-TOKEN]' \
-d '{
"messages": [
{"role": "system", "content": "You are a yoda chatbot who always responds in yoda speak!"},
{"role": "user", "content": "Who are you?"}
]
}'
```
Replace `[ENDPOINT-ID]` with your actual endpoint ID and `[AUTH-TOKEN]` with your authentication token. You'll see a response from the API, like this:
```json theme={null}
{
"output": "A Jedi I am. In the ways of the Force, trained I have been."
}
```
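The same request can be built from Python with the standard library. This is a minimal sketch: `build_request` is a hypothetical helper, and the endpoint ID and token below are placeholders you would replace with your own values. The request is constructed but not sent.

```python
import json
import urllib.request

# Placeholders: substitute your real endpoint ID and auth token
ENDPOINT_URL = "https://app.beam.cloud/endpoint/id/your-endpoint-id"
AUTH_TOKEN = "your-auth-token"


def build_request(endpoint_url: str, auth_token: str, messages: list):
    """Build a POST request carrying the chat messages as a JSON body."""
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {auth_token}",
        },
        method="POST",
    )


req = build_request(
    ENDPOINT_URL,
    AUTH_TOKEN,
    [
        {"role": "system", "content": "You are a yoda chatbot who always responds in yoda speak!"},
        {"role": "user", "content": "Who are you?"},
    ],
)

# To send it once the placeholders are filled in:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["output"])
```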
## Summary
You've successfully set up a highly performant serverless API for generating text using the Meta Llama 3.1 8B Instruct model on Beam.
# Stable Diffusion with LoRAs
Source: https://docs.beam.cloud/v2/examples/lora
## Introduction
This guide demonstrates how to run Stable Diffusion with custom LoRAs.
See the code for this example on Github.
## Setup Remote Environment
The first thing we'll do is set up an `Image` with the Python packages required for this app.
Because this script will run remotely, we need to make sure our local Python interpreter doesn't try to import packages that are only installed in the remote image.
We'll use the `if env.is_remote()` flag to conditionally import the Python packages only when the script is running remotely on Beam.
```python app.py theme={null}
from beam import Image, Volume, endpoint, Output, env

# This check ensures that the packages are only imported when running this script remotely on Beam
if env.is_remote():
    from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
    import torch
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    import os
    import uuid

# The container image for the remote runtime
image = (
    Image(python_version="python3.9")
    .add_python_packages(
        [
            "diffusers[torch]>=0.10",
            "transformers",
            "huggingface_hub",
            "huggingface_hub[hf-transfer]",
            "torch",
            "peft",
            "pillow",
            "accelerate",
            "safetensors",
            "xformers",
        ]
    )
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1")
)
```
## Pre-Load Models
Next, we'll set up a function to run once when the container first starts up. This allows us to cache the model in memory between requests and ensures we don't unnecessarily re-load the model.
```python app.py theme={null}
CACHE_PATH = "./models"
MODEL_URL = "https://huggingface.co/martyn/sdxl-turbo-mario-merge-top-rated/blob/main/topRatedTurboxlLCM_v10.safetensors"
LORA_WEIGHT_NAME = "raw.safetensors"
LORA_REPO = "ntc-ai/SDXL-LoRA-slider.raw"


# This function runs once when the container first boots
def load_models():
    # Download the LoRA weights into the volume-backed cache
    hf_hub_download(repo_id=LORA_REPO, filename=LORA_WEIGHT_NAME, cache_dir=CACHE_PATH)

    pipe = StableDiffusionXLPipeline.from_single_file(
        MODEL_URL,
        torch_dtype=torch.float16,
        safety_checker=None,
        cache_dir=CACHE_PATH,
    ).to("cuda")

    return pipe
```
## Inference Function
Here's our inference function. By adding the `@endpoint` decorator to it, we can expose this function as a RESTful API.
There are a few things to take note of:
* an `image` with the Python requirements we defined above
* an `on_start` function that runs once when the container first boots. The value from `on_start` (in this case, our `pipe` handler) is available in the inference function using the `context` value: `pipe = context.on_start_value`
* `volumes`, which are used to store the downloaded LoRAs and model weights on Beam
* `keep_warm_seconds`, which tells Beam how long to keep the container running between requests
```python app.py theme={null}
@endpoint(
    image=image,
    on_start=load_models,
    keep_warm_seconds=60,
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    volumes=[Volume(name="models", mount_path=CACHE_PATH)],
)
def generate(context, prompt="medieval rich kingpin sitting in a tavern, raw"):
    # Retrieve pre-loaded model from loader
    pipe = context.on_start_value

    pipe.enable_sequential_cpu_offload()
    pipe.enable_attention_slicing("max")
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    # Use a unique adapter name
    adapter_name = f"raw_{uuid.uuid4().hex}"

    # Load the LoRA weights from the Hugging Face repo under that adapter name
    pipe.load_lora_weights(
        LORA_REPO, weight_name=LORA_WEIGHT_NAME, adapter_name=adapter_name
    )

    # Activate the LoRA that was just loaded
    pipe.set_adapters([adapter_name], adapter_weights=[2.0])

    # Generate image
    image = pipe(
        prompt,
        negative_prompt="nsfw",
        width=512,
        height=512,
        guidance_scale=2,
        num_inference_steps=10,
    ).images[0]

    # Save image file
    output = Output.from_pil_image(image).save()

    # Retrieve pre-signed URL for output file
    url = output.public_url()

    return {"image": url}
```
## Saving Image Outputs
Notice the `Output.from_pil_image(image).save()` method below.
This will generate a sharable URL to access the images created from the inference function:
```python app.py theme={null}
from beam import Output
# Save image file
output = Output.from_pil_image(image).save()
# Retrieve pre-signed URL for output file
url = output.public_url()
```
## Create a Preview Deployment
You can spin up a temporary REST API to test this endpoint on Beam, using the `beam serve` command:
```bash theme={null}
beam serve app.py:generate
```
When you run this command, Beam will spin up a GPU-backed container to test your code on the cloud:
```bash theme={null}
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
=> Files synced
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/id/bcaa198b-2556-4c8c-9429-46d3202dbc95' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
=> Watching '/Users/beta9/beam/examples/07_image_generation' for changes...
```
You can paste the `curl` command in your shell to call the API.
The API will return a pre-signed URL with the image generated:
```bash theme={null}
{"image":"https://app.beam.cloud/output/id/09cb70bf-b5e8-4679-9da2-71611a1c3b57"}
```
## Deploy to Production
The `beam serve` command is used for temporary APIs. When you're ready to move to production, deploy a persistent endpoint:
```bash theme={null}
beam deploy app.py:generate
```
# Text-to-Video with Mochi
Source: https://docs.beam.cloud/v2/examples/mochi-1
This guide demonstrates how to run the Mochi-1 text-to-video model on Beam. Mochi-1 is a powerful model for generating high-quality videos based on text prompts.
See the code for this example on Github.
## Introduction
Mochi-1 is a state-of-the-art text-to-video model. This guide will help you deploy and use the model as a serverless API on Beam.
## Upload Model Weights
Before using the Mochi-1 model, you need to upload its weights to Beam. This is handled by the `upload.py` script:
```python theme={null}
from beam import function, Volume, Image, env

if env.is_remote():
    from huggingface_hub import snapshot_download

VOLUME_PATH = "./mochi-1-preview"


@function(
    image=Image(
        python_packages=["huggingface_hub", "huggingface_hub[hf_xet]"]
    ),
    memory="32Gi",
    cpu=4,
    secrets=["HF_TOKEN"],
    volumes=[Volume(name="mochi-1-preview", mount_path=VOLUME_PATH)],
)
def upload():
    snapshot_download(
        repo_id="genmo/mochi-1-preview",
        local_dir=f"{VOLUME_PATH}/weights"
    )

    print("Files uploaded successfully")


if __name__ == "__main__":
    upload()
```
### Steps to Run the Script
Run the script locally to upload the weights:
```bash theme={null}
python upload.py
```
Once the weights are uploaded, the `generate_video` endpoint can access them for inference.
## Setup Remote Environment
The endpoint's imports and model loader are defined first. `load_models` reads the weights from the volume and runs once when the container starts:
```python theme={null}
from beam import endpoint, env, Volume, Image, Output

VOLUME_PATH = "./mochi-1-preview"

if env.is_remote():
    import torch
    from diffusers import MochiPipeline
    from diffusers.utils import export_to_video
    import uuid


def load_models():
    pipe = MochiPipeline.from_pretrained(
        f"{VOLUME_PATH}/weights", variant="bf16", torch_dtype=torch.bfloat16)
    return pipe
```
The `mochi_image` includes all necessary Python packages and system dependencies:
```python theme={null}
mochi_image = (
    Image(
        python_version="python3.11",
        python_packages=["torch", "transformers", "accelerate",
                         "sentencepiece", "imageio-ffmpeg", "imageio", "ninja"]
    )
    .add_commands(["apt update && apt install git -y", "pip install git+https://github.com/huggingface/diffusers.git"])
)
```
## Inference Function
The `generate_video` function processes text prompts and generates a video:
```python theme={null}
@endpoint(
    name="mochi-1-preview",
    on_start=load_models,
    cpu=4,
    memory="32Gi",
    gpu="A10G",
    gpu_count=2,
    image=mochi_image,
    volumes=[Volume(name="mochi-1-preview", mount_path=VOLUME_PATH)],
    timeout=-1
)
def generate_video(context, **inputs):
    pipe = context.on_start_value

    prompt = inputs.pop("prompt", None)
    if not prompt:
        return {"error": "Please provide a prompt"}

    pipe.enable_model_cpu_offload()
    pipe.enable_vae_tiling()

    frames = pipe(prompt, num_frames=40).frames[0]

    # Write the frames to a uniquely named video file
    file_name = f"/tmp/mochi_out_{uuid.uuid4()}.mp4"
    export_to_video(frames, file_name, fps=15)

    output_file = Output(path=file_name)
    output_file.save()
    public_url = output_file.public_url(expires=-1)
    print(public_url)

    return {"output_url": public_url}
```
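The `num_frames` and `fps` values together determine how long the generated clip is. A quick arithmetic check, using a hypothetical helper name, shows what the settings above produce:

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Length of the exported video in seconds."""
    return num_frames / fps


if __name__ == "__main__":
    print(round(clip_seconds(40, 15), 2))  # 2.67
```

So 40 frames exported at 15 fps gives a clip of a little under three seconds. Raising `num_frames` lengthens the clip, at the cost of more generation time and memory.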
## Deployment
Deploy the API to Beam:
```bash theme={null}
beam deploy app.py:generate_video
```
## Invoking the API
To invoke the API, send a POST request with the following payload:
```json theme={null}
{
"prompt": "The camera follows behind a rugged green Jeep with a black snorkel as it speeds along a narrow dirt trail cutting through a dense jungle. Thick vines hang from towering trees with sprawling canopies, their leaves forming a vibrant green tunnel above the vehicle. Mud splashes up from the Jeep’s tires as it powers through a shallow stream crossing the path. Sunlight filters through gaps in the trees, casting dappled golden light over the scene. The dirt trail twists sharply into the distance, overgrown with wild ferns and tropical plants. The vehicle is seen from the rear, leaning into the curve as it maneuvers through the untamed terrain, emphasizing the adventure of the rugged journey. The surrounding jungle is alive with texture and color, with distant mountains barely visible through the mist and an overcast sky heavy with the promise of rain."
}
```
Here’s an example of a cURL request:
```bash theme={null}
curl -X POST 'https://app.beam.cloud/endpoint/id/[ENDPOINT-ID]' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [AUTH-TOKEN]' \
-d '{
"prompt": "Your text prompt for video generation."
}'
```
## Example Output
The API will return a generated video URL. Here’s an example:
```json theme={null}
{
"output_url": "https://app.beam.cloud/output/id/dc443a80-7fcc-42bc-928b-4605e41b0825"
}
```
## Example Video
Here is an example video generated by the Mochi-1 model:
## Summary
You’ve successfully deployed and tested a Mochi-1 text-to-video generation API using Beam.
# Examples
Source: https://docs.beam.cloud/v2/examples/overview
End-to-end examples for running real workloads on Beam
Browse complete, runnable examples grouped by what you're building. Each one shows a real workload, from container image to deployment, that you can copy and adapt.
New to Beam? Start with the [Quickstart](/v2/getting-started/quickstart) and [Core Concepts](/v2/getting-started/core-concepts) first.
## Large Language Models
Serve and run inference with open and custom LLMs.
A beginner's guide to running performant inference workloads on Beam.
Serve Meta's LLaMA 3.1 8B model on a GPU.
Host an OpenAI-compatible inference server with vLLM.
Run the DeepSeek R1 reasoning model.
Serve Qwen2.5-7B with the SGLang runtime.
## Image and Video
Generate and transform images and video on GPUs.
Host ComfyUI for image generation workflows.
Generate video from text with the Mochi model.
Run Stable Diffusion with custom LoRA adapters.
## Audio and Transcription
Transcribe and synthesize speech.
Transcribe audio with Faster Whisper.
Synthesize speech with Parler TTS.
Generate speech with the Zonos model.
## Web Apps
Host interactive apps and scrape the web.
Build a web scraper that runs on Beam functions.
Host a Streamlit app behind a public URL.
## Agents
Build and coordinate AI agents.
Build stateful agents with concurrency built in.
A research assistant that synchronizes state across tasks.
## Fine-Tuning
Fine-tune open models on GPUs.
Fine-tune Google's Gemma model with LoRA.
Fast fine-tuning of Llama 3.1 8B with Unsloth.
# Parler TTS
Source: https://docs.beam.cloud/v2/examples/parler-tts
This guide demonstrates how to set up and run the Parler TTS text-to-speech model as a serverless API on Beam.
See the code for this example on Github.
## Introduction
Parler-TTS Mini is a lightweight text-to-speech (TTS) model, trained on 45K hours of audio data, that can generate high-quality, natural-sounding speech with features that can be controlled using a simple text prompt. This guide explains how to deploy and use it on Beam.
## Deployment Setup
Define the model and its dependencies using the `parlertts_image`:
```python theme={null}
from beam import endpoint, env, Image, Output

if env.is_remote():
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer
    import soundfile as sf
    import uuid


def load_models():
    model = ParlerTTSForConditionalGeneration.from_pretrained(
        "parler-tts/parler-tts-mini-v1").to("cuda:0")
    tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-tts-mini-v1")
    return model, tokenizer


parlertts_image = (
    Image(
        python_version="python3.10",
        python_packages=[
            "torch",
            "transformers",
            "soundfile",
            "Pillow",
            "wheel",
            "packaging",
            "ninja",
            "huggingface_hub[hf-transfer]",
        ],
    )
    .add_commands(
        [
            "apt update && apt install git -y",
            "pip install git+https://github.com/huggingface/parler-tts.git",
        ]
    )
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1")
)
```
## Inference Function
The `generate_speech` function processes text and generates speech audio:
```python theme={null}
@endpoint(
    name="parler-tts",
    on_start=load_models,
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    gpu_count=2,
    image=parlertts_image
)
def generate_speech(context, **inputs):
    model, tokenizer = context.on_start_value

    prompt = inputs.pop("prompt", None)
    description = inputs.pop("description", None)

    if not prompt or not description:
        return {"error": "Please provide a prompt and description"}

    device = "cuda:0"

    input_ids = tokenizer(
        description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(
        prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(
        input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    audio_arr = generation.cpu().numpy().squeeze()

    file_name = f"/tmp/parler_tts_out_{uuid.uuid4()}.wav"
    sf.write(file_name, audio_arr, model.config.sampling_rate)

    output_file = Output(path=file_name)
    output_file.save()
    public_url = output_file.public_url(expires=1200000000)
    print(public_url)

    return {"output_url": public_url}
```
### Deployment
Deploy the API to Beam:
```bash theme={null}
beam deploy app.py:generate_speech
```
## API Usage
Send a `POST` request with the following JSON payload:
```json theme={null}
{
"prompt": "Your text to convert to speech",
"description": "Description of the voice/style"
}
```
### Example Request
```json theme={null}
{
"prompt": "On Beam run AI workloads anywhere with zero complexity. One line of Python, global GPUs, full control!!!",
"description": "A female speaker delivers a slightly expressive and animated speech with a moderate speed and pitch. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
}
```
### Example Response
A generated audio file will be returned:
```json theme={null}
{
"output_url": "https://app.beam.cloud/output/id/dc443a80-7fcc-42bc-928b-4605e41b0825"
}
```
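To call the endpoint from Python, the request can be assembled with the standard library. This is a minimal sketch under stated assumptions: the URL and token below are placeholders, the helper names are hypothetical, and the `Authorization: Bearer` header follows the pattern Beam prints in its invocation details for deployed endpoints.

```python
import json
import urllib.request

# Placeholders (assumptions): substitute the URL printed by `beam deploy`
# and your own Beam auth token.
ENDPOINT_URL = "https://example.invalid/parler-tts"
AUTH_TOKEN = "YOUR_AUTH_TOKEN"


def build_tts_request(prompt, description, url=ENDPOINT_URL, token=AUTH_TOKEN):
    """Build (but do not send) a POST request carrying the JSON payload."""
    body = json.dumps({"prompt": prompt, "description": description}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def synthesize(prompt, description):
    """Send the request and return the `output_url` field of the JSON response."""
    with urllib.request.urlopen(build_tts_request(prompt, description)) as resp:
        return json.loads(resp.read())["output_url"]
```

Calling `synthesize("Hello from Beam", "A calm female voice")` against a real deployment should return the URL of the generated audio file, which can then be downloaded like any other file.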
## Summary
You’ve successfully deployed a Parler TTS text-to-speech API using Beam.
# Qwen2.5-7B with SGLang
Source: https://docs.beam.cloud/v2/examples/sglang
This guide demonstrates how to deploy a high-performance language model server using [SGLang](https://github.com/sgl-project/sglang) with the [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) model from Qwen. The server runs on Beam, providing an OpenAI-compatible API endpoint for text generation.
See the full code for this example on GitHub.
## Overview
SGLang is a fast inference framework for large language models, optimized for low latency and high throughput. We use it to serve the Qwen2.5-7B-Instruct model, a 7-billion-parameter instruction-tuned model, on an H100 GPU via Beam’s `Pod` abstraction.
A test script demonstrates interaction with the server using the OpenAI Python client.
## Setup
First, create a file named `app.py`:
```python theme={null}
from beam import Image, Pod

# Image of SGLang and dependencies
image = (
    Image(python_version="python3.11")
    .add_python_packages([
        "transformers==4.47.1",
        "numpy<2",
        "fastapi[standard]==0.115.4",
        "pydantic==2.9.2",
        "starlette==0.41.2",
        "torch==2.4.0",
    ])
    .add_commands([
        'pip install "sglang[all]==0.4.1" --find-links https://flashinfer.ai/whl/cu124/torch2.4/flashinfer/'
    ])
)

# Define the SGLang server Beam Pod
sglang_server = Pod(
    image=image,
    ports=[8080],
    cpu=12,
    memory="32Gi",
    gpu="H100",
    secrets=["HF_TOKEN"],
    entrypoint=[
        "python",
        "-m",
        "sglang.launch_server",
        "--model-path",
        "Qwen/Qwen2.5-7B-Instruct",
        "--port",
        "8080",
        "--host",
        "0.0.0.0",
    ],
)

# Deploy the pod
res = sglang_server.create()
print("SGLang server hosted at:", res.url)
```
## Deployment
Deploy the server by running the script with Python:
```bash theme={null}
python app.py
```
Here's the expected output, with the URL of the deployed app:
```bash theme={null}
=> Files synced
=> Creating container
=> Container created successfully ===> pod-b451fa2f-3c4a-47e0-bb37-333434fds22b66-add2d058
=> This container will timeout after 600 seconds.
=> Invocation details
curl -X POST 'https://b451fa2f-3c4a-47e0-bb37-333434fds22b66-8080.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-d '{}'
SGLang server hosted at: https://b451fa2f-3c4a-47e0-bb37-333434fds22b66-8080.app.beam.cloud
```
## API Usage
The SGLang server exposes an OpenAI-compatible API at `/v1`. You can interact with it using the OpenAI Python client or any HTTP client.
### Test Script
Create a file named `test.py` to test the deployed server:
```python theme={null}
import openai

# Initialize OpenAI client with Beam endpoint and Beam API key
client = openai.Client(
    base_url="https://35b937b9-1a70-4343-89d9-1125b1290e4d-8080.app.beam.cloud/v1",
    api_key="BEAM_API_KEY",  # Replace with your actual Beam API key
)

# Send a chat completion request
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    max_tokens=64,
)

# Print the response
print(response.choices[0].message.content)
```
#### Running the Test
1. Replace `BEAM_API_KEY` with your actual Beam API key.
2. Update the `base_url` with your deployed pod’s URL.
3. Install the OpenAI client locally:
```bash theme={null}
pip install openai
```
4. Run the script:
```bash theme={null}
python test.py
```
Example output for the prompt *"List 3 countries and their capitals."* (the exact text may vary):
```
1. France - Paris
2. Japan - Tokyo
3. Brazil - Brasília
```
# Running Streamlit Apps
Source: https://docs.beam.cloud/v2/examples/streamlit
You can easily deploy Streamlit apps on Beam. In this guide, we'll show you how to deploy a Streamlit app that visualizes a simple dataset.
See the code for this example on GitHub.
### App Structure
There are two components to deploying a Streamlit app on Beam:
1. A `start_server.py` file with your Beam code. You can view the source code [here](https://github.com/beam-cloud/examples/blob/main/web_servers/streamlit_server/app.py).
2. An `app.py` file that contains the Streamlit app
Here's what the Beam-specific code looks like:
```python start_server.py theme={null}
from beam import Image, Pod

streamlit_server = Pod(
    image=Image().add_python_packages(["streamlit", "pandas", "altair", "requests"]),
    ports=[8501],  # Default port for streamlit
    cpu=1,
    memory=1024,
    entrypoint=["streamlit", "run", "app.py"],
)

res = streamlit_server.create()
print("Streamlit server hosted at:", res.url)
```
## Deployment
To run the app, run the script directly with Python:
```bash theme={null}
python start_server.py
```
Running this command will print the URL of the Streamlit app to the console.
```shell theme={null}
=> Creating container
=> Container created successfully ===> pod-15fba0c6-3fe8-408e-a0f8-c99cf166dcc9-97b6207e
=> Invocation details
curl -X GET 'https://15fba0c6-3fe8-408e-a0f8-c99cf166dcc9-8888.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-d '{}'
```
You can enter the URL in your browser to view the Streamlit app!
# Fine-Tuning Meta Llama 3.1 8B with Unsloth
Source: https://docs.beam.cloud/v2/examples/unsloth
In this guide, we fine-tune the [Meta-Llama-3.1-8B-bnb-4bit](https://huggingface.co/unsloth/Meta-Llama-3.1-8B-bnb-4bit) model, optimized by Unsloth, using Low-Rank Adaptation (LoRA) on the [Alpaca-cleaned dataset](https://huggingface.co/datasets/yahma/alpaca-cleaned). We leverage Beam's infrastructure for compute and storage, then deploy an inference endpoint. Throughout the process, we'll track and evaluate our fine-tuning performance using Weights & Biases (wandb).
See the full code for this example on GitHub.
## Setup
### Environment Configuration
We define a shared `Image` configuration for both fine-tuning and inference, ensuring consistency. The image includes necessary dependencies and installs Unsloth from its GitHub repository.
To use Weights & Biases (wandb) for tracking, you'll need your API key. You can find it in your [wandb dashboard](https://wandb.ai/settings#api) under the "API keys" section. Copy the key and use it in place of `YOUR_WANDB_KEY`, the value of `WANDB_API_KEY` that is passed to the `wandb login` command.
```python finetune.py theme={null}
from beam import Image

# Weights & Biases API Key (replace with your key)
WANDB_API_KEY = "YOUR_WANDB_KEY"

image = (
    Image(python_version="python3.11")
    .add_python_packages([
        "ninja",
        "packaging",
        "wheel",
        "torch",
        "xformers",
        "trl",
        "peft",
        "accelerate",
        "bitsandbytes",
        "wandb"
    ])
    .add_commands([
        "pip uninstall unsloth -y",
        'pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"',
        f"wandb login {WANDB_API_KEY}"
    ])
)

# Constants
MODEL_NAME = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
MAX_SEQ_LENGTH = 2048
VOLUME_PATH = "./model_storage"
```
## Fine-Tuning
The fine-tuning script (`finetune.py`) uses Unsloth to adapt the model to the Alpaca-cleaned dataset while tracking metrics with Weights & Biases.
```python theme={null}
from beam import endpoint, Image, Volume, env

# Weights & Biases API Key (replace with your key)
WANDB_API_KEY = "YOUR_WANDB_KEY"

if env.is_remote():
    import torch
    from unsloth import FastLanguageModel
    from transformers import TrainingArguments
    from trl import SFTTrainer
    from datasets import load_dataset
    import os
    import wandb

MODEL_NAME = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"
MAX_SEQ_LENGTH = 2048
VOLUME_PATH = "./model_storage"

TRAIN_CONFIG = {
    "batch_size": 2,
    "grad_accumulation": 4,
    "max_steps": 60,
    "learning_rate": 2e-4,
    "seed": 3407,
}

image = (
    Image(python_version="python3.11")
    .add_python_packages(
        [
            "ninja",
            "packaging",
            "wheel",
            "torch",
            "xformers",
            "trl",
            "peft",
            "accelerate",
            "bitsandbytes",
            "wandb"
        ]
    )
    .add_commands(
        [
            "pip uninstall unsloth -y",
            'pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"',
        ]
    )
    .add_commands(
        [
            'echo "127.0.0.1 localhost" >> /etc/hosts',
            f"wandb login {WANDB_API_KEY}"
        ]
    )
)


@endpoint(
    name="unsloth-fine-tune",
    cpu=12,
    memory="32Gi",
    gpu="H100",
    image=image,
    volumes=[Volume(name="model-storage", mount_path=VOLUME_PATH)],
    timeout=-1,
)
def fine_tune_model():
    import os
    import wandb

    os.environ["WANDB_PROJECT"] = "llama-3.1-finetuning"
    os.environ["WANDB_LOG_MODEL"] = "checkpoint"

    output_dir = os.path.join(VOLUME_PATH, "fine_tuned_model")
    os.makedirs(output_dir, exist_ok=True)

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_NAME, max_seq_length=MAX_SEQ_LENGTH, load_in_4bit=True
    )

    def format_alpaca_prompt(instruction, input_text, output):
        template = (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes the request.\n"
            "### Instruction:\n{}\n### Input:\n{}\n### Response:\n{}"
        )
        return template.format(instruction, input_text, output) + tokenizer.eos_token

    def format_dataset(examples):
        texts = [
            format_alpaca_prompt(instruction, input_text, output)
            for instruction, input_text, output in zip(
                examples["instruction"], examples["input"], examples["output"]
            )
        ]
        return {"text": texts}

    dataset = load_dataset("yahma/alpaca-cleaned", split="train")
    dataset = dataset.map(format_dataset, batched=True)

    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
        lora_alpha=16,
        lora_dropout=0,
        use_gradient_checkpointing="unsloth",
        random_state=TRAIN_CONFIG["seed"],
    )

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=MAX_SEQ_LENGTH,
        dataset_num_proc=2,
        packing=False,
        args=TrainingArguments(
            per_device_train_batch_size=TRAIN_CONFIG["batch_size"],
            gradient_accumulation_steps=TRAIN_CONFIG["grad_accumulation"],
            max_steps=TRAIN_CONFIG["max_steps"],
            learning_rate=TRAIN_CONFIG["learning_rate"],
            fp16=False,
            bf16=True,
            logging_steps=1,
            output_dir=output_dir,
            seed=TRAIN_CONFIG["seed"],
            report_to="wandb",
            save_steps=100,
        ),
    )

    with torch.autograd.set_detect_anomaly(True):
        trainer.train()

    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

    wandb.finish()

    return {
        "status": "success",
        "message": "Fine-tuning complete",
        "model_path": output_dir,
    }
```
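The prompt-formatting step in the script can be tried locally without a GPU. The sketch below reuses the same template and batched formatter on a single made-up example; `EOS_TOKEN` is a stand-in (an assumption) for `tokenizer.eos_token`, whose real value comes from the model's tokenizer.

```python
# Stand-in for tokenizer.eos_token (assumption); the real token comes from the tokenizer.
EOS_TOKEN = "<eos>"

TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n"
    "### Instruction:\n{}\n### Input:\n{}\n### Response:\n{}"
)


def format_alpaca_prompt(instruction, input_text, output):
    # Fill the three slots and terminate the sample with the end-of-sequence token
    return TEMPLATE.format(instruction, input_text, output) + EOS_TOKEN


def format_dataset(examples):
    """Batched formatter: takes a dict of columns and returns a new 'text' column."""
    return {
        "text": [
            format_alpaca_prompt(instruction, input_text, output)
            for instruction, input_text, output in zip(
                examples["instruction"], examples["input"], examples["output"]
            )
        ]
    }


# A made-up, single-row batch in the column layout of the Alpaca-cleaned dataset
batch = {
    "instruction": ["Add the two numbers."],
    "input": ["2 and 3"],
    "output": ["5"],
}
formatted = format_dataset(batch)
print(formatted["text"][0])
```

Appending the end-of-sequence token to every sample is what teaches the model where a response ends; without it, generation tends to run on until the token limit.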
### Running Fine-Tuning
Execute the script:
```bash theme={null}
python finetune.py
```
After completion, verify that the files are saved in your Beam Volume:
```bash theme={null}
beam ls model-storage/fine_tuned_model
```
Here's the expected output with the fine-tuned files:
```
Name Size Modified Time IsDir
─────────────────────────────────────────────────────────────
fine_tuned_model/README.md 4.99 KiB 1 hour ago No
fine_tuned_model/adapter_config.json 805.00 B 1 hour ago No
fine_tuned_model/adapter_model.safeten… 160.06 MiB 1 hour ago No
fine_tuned_model/checkpoint-60/ 1 hour ago Yes
fine_tuned_model/special_tokens_map.js… 459.00 B 1 hour ago No
fine_tuned_model/tokenizer.json 16.41 MiB 1 hour ago No
fine_tuned_model/tokenizer_config.json 49.46 KiB 1 hour ago No
...
```
### Training Performance Metrics
We tracked our fine-tuning process using Weights & Biases, which provided detailed metrics on training progress. The dashboard showed that the training loss started at approximately 1.85 and, despite significant fluctuations, exhibited a general downward trend, ending at around 0.95 by step 60. This suggests that the model was learning patterns from the Alpaca-cleaned dataset over the 60 training steps.
Overall, the dashboard shows a noisy but downward trend in training loss, which suggests that the model was learning from the Alpaca-cleaned dataset.
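To put the 60 training steps in perspective, the number of sequences the run actually processes follows from the values in `TRAIN_CONFIG`. The calculation below is a small sketch that assumes a single-GPU run (`NUM_GPUS = 1` is an assumption based on the endpoint's `gpu="H100"` setting).

```python
# Values copied from TRAIN_CONFIG in the fine-tuning script
TRAIN_CONFIG = {"batch_size": 2, "grad_accumulation": 4, "max_steps": 60}
NUM_GPUS = 1  # assumption: single-GPU run

# Sequences that contribute to one optimizer step
sequences_per_step = (
    TRAIN_CONFIG["batch_size"] * TRAIN_CONFIG["grad_accumulation"] * NUM_GPUS
)
# Sequences processed over the whole run
total_sequences = sequences_per_step * TRAIN_CONFIG["max_steps"]

print(f"{sequences_per_step} sequences per optimizer step")  # 8
print(f"{total_sequences} sequences in total")               # 480
```

With so few sequences per step, some step-to-step fluctuation in the loss curve is to be expected.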
## Evaluation
To understand the impact of fine-tuning the Meta Llama 3.1 8B model with Unsloth on the Alpaca-cleaned dataset, we evaluated both the base model and the fine-tuned model on two widely used benchmarks: **HellaSwag** (a commonsense reasoning task) and **MMLU** (Massive Multitask Language Understanding, covering a broad range of subjects). The fine-tuned model scores modestly higher than the base model on both.
### Overall Performance
The table below summarizes the overall performance on HellaSwag and MMLU. The fine-tuned model shows modest but consistent gains across both benchmarks.
| Benchmark             | Base Model | Fine-tuned Model | Improvement (percentage points) |
| --------------------- | ---------- | ---------------- | ------------------------------- |
| HellaSwag (acc)       | 59.09%     | 60.37%           | +1.28                           |
| HellaSwag (acc\_norm) | 77.93%     | 78.75%           | +0.82                           |
| MMLU (overall)        | 61.42%     | 62.33%           | +0.91                           |
* **HellaSwag**: The fine-tuned model improves accuracy (acc) by 1.28 percentage points and normalized accuracy (acc\_norm) by 0.82 percentage points, suggesting slightly better commonsense reasoning.
* **MMLU**: An overall improvement of 0.91 percentage points suggests a small gain in general knowledge and reasoning across diverse topics.
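The improvements in the table are absolute differences between two accuracy percentages, that is, percentage points, which are easy to confuse with relative gains. The short calculation below uses the scores from the table to show both; it is only an illustration of the arithmetic, not part of the evaluation pipeline.

```python
# (base, fine-tuned) accuracy in percent, copied from the table above
results = {
    "HellaSwag (acc)": (59.09, 60.37),
    "HellaSwag (acc_norm)": (77.93, 78.75),
    "MMLU (overall)": (61.42, 62.33),
}

gains = {}
for name, (base, tuned) in results.items():
    absolute = tuned - base           # difference in percentage points
    relative = absolute / base * 100  # gain relative to the base score, in percent
    gains[name] = (absolute, relative)
    print(f"{name}: +{absolute:.2f} pp absolute, +{relative:.2f}% relative")
```

Relative gains come out larger than the percentage-point differences because each is divided by a base score below 100%.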
### Analysis
The fine-tuned model demonstrates consistent improvements over the base model, particularly in tasks requiring logical reasoning, ethical judgment, and commonsense understanding. These gains align with the Alpaca-cleaned dataset's focus on instruction-following and coherent responses.
## Inference
### Inference Script
The inference script (`inference.py`) loads the fine-tuned model and exposes an endpoint for generating responses.
```python theme={null}
from beam import Image, endpoint, Volume, env

if env.is_remote():
    from unsloth import FastLanguageModel
    from unsloth.chat_templates import get_chat_template

image = (
    Image(python_version="python3.11")
    .add_python_packages(
        [
            "ninja",
            "packaging",
            "wheel",
            "torch",
            "xformers",
            "trl",
            "peft",
            "accelerate",
            "bitsandbytes",
        ]
    )
    .add_commands(
        [
            "pip uninstall unsloth -y",
            'pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"',
        ]
    )
)

MAX_SEQ_LENGTH = 2048
VOLUME_PATH = "./model_storage"


@endpoint(
    name="unsloth-inference",
    image=image,
    cpu=12,
    memory="32Gi",
    gpu="H100",
    timeout=-1,
    volumes=[Volume(name="model-storage", mount_path=VOLUME_PATH)],
)
def generate(**inputs):
    prompt = inputs.pop("prompt", None)
    if not prompt:
        return {"error": "Please provide a prompt"}

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=f"{VOLUME_PATH}/fine_tuned_model",
        max_seq_length=MAX_SEQ_LENGTH,
        load_in_4bit=True,
    )

    tokenizer = get_chat_template(
        tokenizer,
        chat_template="llama-3.1",
    )

    FastLanguageModel.for_inference(model)

    messages = [
        {
            "role": "user",
            "content": prompt,
        },
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to("cuda")

    outputs = model.generate(
        input_ids=inputs, max_new_tokens=64, use_cache=True, temperature=1.5, min_p=0.1
    )

    res = tokenizer.batch_decode(outputs)
    return {"output": res}
```
### Deploying the Endpoint
Run this command to deploy the inference endpoint:
```bash theme={null}
beam deploy inference.py:generate
```
You'll get back a URL with the endpoint:
```bash theme={null}
=> Building image
=> Syncing files
=> Deploying
=> Deployed
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/unsloth-inference/v1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
-d '{"prompt": "Your prompt"}'
```
# Run an OpenAI-Compatible vLLM Server
Source: https://docs.beam.cloud/v2/examples/vllm
In this example, we are going to use [vLLM](https://github.com/vllm-project/vllm) to host an OpenAI compatible InternVL3 8B API on Beam.
See the code for this example on GitHub.
## Introduction to vLLM
[vLLM](https://github.com/vllm-project/vllm) is a high-performance, easy-to-use library for LLM inference. Its authors report up to 24x higher throughput than Hugging Face Transformers, and it lets you easily set up an OpenAI-compatible API for your LLM. Additionally, vLLM can serve LoRA adapters for a number of LLMs (such as Llama 3.1), so you can follow our [LoRA guide](/v2/examples/gemma-fine-tune) and host the resulting model using vLLM.
The key to vLLM's performance is Paged Attention. In LLMs, input tokens produce attention keys and value tensors, which are typically stored in GPU memory. Paged Attention stores these continuous keys and values in non-contiguous memory by partitioning them into blocks that are fetched on a need-to-use basis.
> Because the blocks do not need to be contiguous in memory, we can manage the keys and values in a more flexible way as in OS’s virtual memory: one can think of blocks as pages, tokens as bytes, and sequences as processes. The contiguous logical blocks of a sequence are mapped to non-contiguous physical blocks via a block table. - [vLLM Explainer Doc](https://blog.vllm.ai/2023/06/20/vllm.html)
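The block-table idea from the quote can be illustrated with a toy model in plain Python. This is not vLLM's implementation: the class names, the block size, and the allocation order are all assumptions made for the sketch. It only shows how a sequence's contiguous logical blocks can map to scattered physical blocks.

```python
BLOCK_SIZE = 4  # tokens per block (illustrative value)


class BlockAllocator:
    """Hands out physical block ids from a shared free list."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        # Any free block will do; physical contiguity is not required
        return self.free.pop()


class Sequence:
    """Tracks the mapping from logical blocks to physical blocks."""

    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []  # index = logical block, value = physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the current one is full
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_index):
        # Translate a token position into (physical block, offset within block)
        block = self.block_table[token_index // BLOCK_SIZE]
        return block, token_index % BLOCK_SIZE


allocator = BlockAllocator(num_blocks=8)
seq_a, seq_b = Sequence(allocator), Sequence(allocator)

# Interleave two sequences so their blocks end up scattered in "memory"
for _ in range(5):
    seq_a.append_token()
    seq_b.append_token()

print(seq_a.block_table)       # [7, 5]
print(seq_b.block_table)       # [6, 4]
print(seq_a.physical_slot(4))  # (5, 0)
```

Because memory is claimed one block at a time, a sequence never reserves space for tokens it has not generated yet, which is where much of the memory saving comes from.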
# Hosting an OpenAI-Compatible Chat API with vLLM
With vLLM, we can host a fully functional chat API that works with existing OpenAI SDKs. You could build this functionality yourself, but vLLM provides a solid out-of-the-box solution.
## Initial Setup
To get started with vLLM on Beam, we can use the `VLLM` class from the Beam SDK. This class supports all of the flags and arguments of the vLLM command line tool as arguments.
### Setup Compute Environment
Let's take a look at the code required to deploy the `OpenGVLab/InternVL3-8B-AWQ` model with an efficient configuration. We start by defining the environment and the necessary arguments for our vLLM server.
```python models.py theme={null}
from beam.integrations import VLLM, VLLMArgs

MODEL_ID = "OpenGVLab/InternVL3-8B-AWQ"

vllm_server = VLLM(
    name=MODEL_ID.split("/")[-1],
    cpu=4,
    memory="16Gi",
    gpu="A10G",
    gpu_count=1,
    workers=1,
    vllm_args=VLLMArgs(
        model=MODEL_ID,
        served_model_name=[MODEL_ID],
        trust_remote_code=True,
        max_model_len=4096,
        gpu_memory_utilization=0.90,
        limit_mm_per_prompt={"image": 2},
        quantization="awq",
        max_num_batched_tokens=8192,
    )
)
```
**Key Configuration Parameters:**
* `name`: A descriptive name for your Beam application.
* `cpu`: Number of CPU cores allocated (e.g., 4).
* `memory`: Amount of memory allocated (e.g., "16Gi").
* `gpu`: Type of GPU to use (e.g., "A10G").
* `gpu_count`: Number of GPUs (e.g., 1).
* `workers`: Number of worker processes for vLLM (e.g., 1).
* `vllm_args`: Arguments passed directly to the vLLM engine:
  * `model`: The Hugging Face model identifier.
  * `served_model_name`: Name under which the model is served.
  * `trust_remote_code`: Allows the model to execute custom code if required.
  * `max_model_len`: Maximum token sequence length for the model.
  * `gpu_memory_utilization`: Target GPU memory utilization (e.g., 0.90 for 90%).
  * `limit_mm_per_prompt`: (If applicable) Limits for multi-modal inputs.
  * `quantization`: The quantization method of the model weights (e.g., "awq"). This checkpoint is already AWQ-quantized; setting the value makes the method explicit instead of relying on vLLM to detect it from the model's configuration.
  * `max_num_batched_tokens`: Sets the capacity for tokens in a batch for dynamic batching (e.g., 8192).
**Equivalent vLLM Command Line (for reference):**
The `VLLM` integration in Beam simplifies deployment. If you were to run a similar configuration using the `vllm serve` command-line tool directly, some of the corresponding arguments would be:
```bash theme={null}
vllm serve OpenGVLab/InternVL3-8B-AWQ \
--trust-remote-code \
--max-model-len 4096 \
--limit-mm-per-prompt image=2 \
--quantization awq \
--max-num-batched-tokens 8192 \
--gpu-memory-utilization 0.90
# Note: Parameters like cpu, memory, gpu_count, and workers are managed by Beam's infrastructure.
```
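The correspondence between the keyword arguments and the command-line flags is mostly mechanical: underscores become hyphens, and `True` becomes a bare flag. The helper below is a hypothetical illustration of that naming convention only; it is not part of Beam or vLLM, and the `key=value` rendering of dictionaries simply mirrors the `image=2` form shown in the command above.

```python
def to_cli_flags(args):
    """Render a dict of vLLM-style keyword arguments as command-line tokens."""
    flags = []
    for key, value in args.items():
        flag = "--" + key.replace("_", "-")  # snake_case -> --kebab-case
        if value is True:
            flags.append(flag)  # boolean switches take no value
        elif isinstance(value, dict):
            rendered = ",".join(f"{k}={v}" for k, v in value.items())
            flags += [flag, rendered]
        else:
            flags += [flag, str(value)]
    return flags


vllm_kwargs = {
    "trust_remote_code": True,
    "max_model_len": 4096,
    "limit_mm_per_prompt": {"image": 2},
    "quantization": "awq",
    "max_num_batched_tokens": 8192,
    "gpu_memory_utilization": 0.90,
}

print(" ".join(["vllm", "serve", "OpenGVLab/InternVL3-8B-AWQ"] + to_cli_flags(vllm_kwargs)))
```

The accepted flag syntax can differ between vLLM versions, so treat the printed command as a reading aid rather than something to paste into a terminal unchecked.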
## Deploying the API
To deploy our model, we can run the following command:
```bash theme={null}
beam deploy models.py:vllm_server
```
The output will look like this:
```bash theme={null}
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
Collecting files from /Users/minzi/Dev/beam/ex-repo/vllm
Added /Users/minzi/Dev/beam/ex-repo/vllm/models.py
Added /Users/minzi/Dev/beam/ex-repo/vllm/tool_chat_template_mistral.jinja
Added /Users/minzi/Dev/beam/ex-repo/vllm/README.md
Added /Users/minzi/Dev/beam/ex-repo/vllm/chat.py
Added /Users/minzi/Dev/beam/ex-repo/vllm/inference.py
Collected object is 14.46 KB
=> Files already synced
=> Deploying
=> Deployed
=> Invocation details
curl -X POST 'https://internvl-15c4487-v4.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_TOKEN' \
-d '{}'
```
## Using the API
### Pre-requisites
Once your function is deployed, you can interact with it using the OpenAI Python client.
To get started, you can clone the [example repository](https://github.com/beam-cloud/examples/tree/main/vllm) and run the `chat.py` script.
Make sure you have the `openai` library installed locally, since that is how
we interact with the deployed API.
```bash theme={null}
git clone https://github.com/beam-cloud/examples.git
cd examples/vllm
pip install openai
python chat.py
```
### Starting a Dialogue
You will be greeted with a prompt to enter the URL of your deployed function.
Once you enter the URL, the container will initialize on Beam and you will be able to interact with the model.
```bash theme={null}
Welcome to the CLI Chat Application!
Type 'quit' to exit the conversation.
Enter the app URL: https://internvl-instruct-15c4487-v3.app.beam.cloud
Model OpenGVLab/InternVL2_5-8B is ready
Question: What is in this image?
Image link (press enter to skip): https://upload.wikimedia.org/wikipedia/commons/7/74/White_domesticated_duck,_stretching.jpg
Assistant: The image you've shared is of a white duck standing on a grassy field. The duck, with its distinctive orange beak and feet, is facing to the left.
```
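A question with an attached image, like the one in the dialogue above, corresponds to a chat-completions request whose user message mixes a text part and an image part. The sketch below builds such a payload in the OpenAI chat format. The helper name and `max_tokens` value are hypothetical, and the `chat.py` script in the example repository may construct its requests differently.

```python
import json

IMAGE_URL = (
    "https://upload.wikimedia.org/wikipedia/commons/7/74/"
    "White_domesticated_duck,_stretching.jpg"
)


def build_vision_messages(question, image_url=None):
    """Return a single-turn message list; the image part is optional."""
    content = [{"type": "text", "text": question}]
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return [{"role": "user", "content": content}]


payload = {
    "model": "OpenGVLab/InternVL3-8B-AWQ",  # must match served_model_name
    "messages": build_vision_messages("What is in this image?", IMAGE_URL),
    "max_tokens": 128,  # illustrative value
}

print(json.dumps(payload, indent=2))
```

With the `openai` client, the same `model`, `messages`, and `max_tokens` values can be passed as keyword arguments to `client.chat.completions.create`.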
To host other models, you can simply change the arguments you pass into the `VLLM` class.
```python Yi Coder 9B Chat theme={null}
from beam.integrations import VLLM, VLLMArgs

YI_CODER_CHAT = "01-ai/Yi-Coder-9B-Chat"

yicoder_chat = VLLM(
    name=YI_CODER_CHAT.split("/")[-1],
    cpu=8,
    memory="16Gi",
    gpu="H100",
    vllm_args=VLLMArgs(
        model=YI_CODER_CHAT,
        served_model_name=[YI_CODER_CHAT],
        task="chat",
        trust_remote_code=True,
        max_model_len=8096,
    ),
)
```
```python Mistral 7B Instruct v0.3 theme={null}
from beam.integrations import VLLM, VLLMArgs

MISTRAL_INSTRUCT = "mistralai/Mistral-7B-Instruct-v0.3"

mistral_instruct = VLLM(
    name=MISTRAL_INSTRUCT.split("/")[-1],
    cpu=8,
    memory="16Gi",
    gpu="H100",
    secrets=["HF_TOKEN"],
    vllm_args=VLLMArgs(
        model=MISTRAL_INSTRUCT,
        served_model_name=[MISTRAL_INSTRUCT],
        chat_template="./tool_chat_template_mistral.jinja",
        enable_auto_tool_choice=True,
        tool_call_parser="mistral",
    ),
)
```
# Web Scraping with Beam Functions
Source: https://docs.beam.cloud/v2/examples/web-scraping
In this example, we'll demonstrate how to build a Wikipedia web scraper using Beam functions. While you could run this on a local computer, Beam provides access to more powerful computational resources, allowing you to add advanced features to your web scraper using large language models or OCR models.
See the code for this example on GitHub.
## Defining our Scraping Function
We will start by defining our scraping function. This is the Beam function that will be invoked remotely. We use the [`Image` class](/v2/environment/custom-images) from the `beam` SDK to install the `requests` and `beautifulsoup4` packages in the container running your code.
```python theme={null}
from beam import Image, function


@function(image=Image().add_python_packages(["requests", "beautifulsoup4"]))
def scrape_page(url):
    # Imported inside the function so they are only needed in the remote container
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url)
    if response.status_code != 200:
        return {"url": url, "title": "", "content": "", "links": []}

    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.find(id="firstHeading").text
    content = soup.find(id="mw-content-text").find(class_="mw-parser-output")
    if not content:
        return {"url": url, "title": title, "content": "", "links": []}

    paragraphs = [p.text for p in content.find_all("p", recursive=False)]
    # Resolve relative hrefs against the page URL
    links = [urljoin(url, link["href"]) for link in content.find_all("a", href=True)]

    return {
        "url": url,
        "title": title,
        "content": "\n\n".join(paragraphs),
        "links": links,
    }
```
Our function takes in a URL, fetches the page's HTML, and then uses [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) to extract the page's title, content, and links. It returns that content in a dictionary so that our crawler can invoke new functions with the links found on the page. If we wanted, we could add more functionality to this function to extract or process the content in a variety of ways. For example, we could add a language model to summarize the content or use an OCR model to extract text from an image.
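Most links on a Wikipedia page are relative, which is why the scraper resolves each `href` against the page URL before returning it. The snippet below shows how the standard library's `urljoin` treats a few typical `href` values; the sample values are made up for illustration.

```python
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Web_scraping"

# Made-up href values of the kinds found in article markup
hrefs = [
    "/wiki/Data_scraping",                     # site-relative path
    "#History",                                # fragment on the same page
    "//commons.wikimedia.org/wiki/Main_Page",  # scheme-relative URL
    "https://example.org/other",               # already absolute
]

resolved = [urljoin(page_url, href) for href in hrefs]
for href, absolute in zip(hrefs, resolved):
    print(f"{href!r:45} -> {absolute}")
```

Fragment links resolve back to the same article with `#...` appended, so a crawler that wants to avoid near-duplicate URLs may choose to strip fragments before queueing a link.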
## Building a Batch Crawler with Beam's Function Map
Next, we'll build a crawler that will use Beam's `map` method to invoke our `scrape_page` function on a list of URLs. Below is our `__init__` method for the crawler.
```python theme={null}
from urllib.parse import urlparse  # used by the is_wikipedia_url helper


class WikipediaCrawler:
    def __init__(self, start_url, max_pages=100, batch_size=5):
        self.start_url = start_url
        self.max_pages = max_pages
        self.batch_size = batch_size
        self.visited_pages = set()
        self.pages_to_visit = [start_url]
        self.scraped_data = {}
```
Our crawler takes in a starting URL, a maximum number of pages to scrape, and a batch size. The batch size determines how many remote function invocations we will make at a time.
Next, we'll define the actual `crawl` method along with a helper method to determine if a URL is a valid Wikipedia URL.
```python theme={null}
    # Methods of WikipediaCrawler. is_wikipedia_url relies on
    # `from urllib.parse import urlparse` at the top of the file.
    def is_wikipedia_url(self, url):
        parsed_url = urlparse(url)
        return parsed_url.netloc.endswith(
            "wikipedia.org"
        ) and parsed_url.path.startswith("/wiki/")

    def crawl(self):
        while len(self.scraped_data) < self.max_pages and self.pages_to_visit:
            # Create a batch of up to batch_size pages that we have not yet visited
            batch = []
            while len(batch) < self.batch_size and self.pages_to_visit:
                p = self.pages_to_visit.pop(0)
                if p not in self.visited_pages:
                    # Mark the page as visited so it is never queued again
                    self.visited_pages.add(p)
                    batch.append(p)

            for result in scrape_page.map(batch):
                # Save the result and collect new links
                self.scraped_data[result["url"]] = result
                if len(self.scraped_data) < self.max_pages:
                    new_links = [
                        link
                        for link in result["links"]
                        if self.is_wikipedia_url(link)
                        and link not in self.visited_pages
                        and link not in self.pages_to_visit
                    ]
                    self.pages_to_visit.extend(new_links)

        print(f"Crawling completed. Scraped {len(self.scraped_data)} pages.")

    def get_scraped_data(self):
        return self.scraped_data
```
The crawl method runs continuously until we have scraped the maximum number of pages or there are no more pages to visit. It creates a batch of URLs to scrape and then passes them to the `scrape_page` function's `map` method. This allows us to scrape multiple pages in parallel. After the pages are scraped, we collect any new links that we want to visit and add them to the `pages_to_visit` list.
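The batching logic can be exercised locally before spending any remote compute. The sketch below replaces `scrape_page.map` with a local stand-in over a tiny, made-up link graph; `FAKE_WEB`, `fake_map`, and the standalone `crawl` function are all hypothetical and exist only to make the control flow visible.

```python
from urllib.parse import urlparse

BASE = "https://en.wikipedia.org/wiki/"

# Made-up link graph: page URL -> outgoing links
FAKE_WEB = {
    BASE + "A": [BASE + "B", BASE + "C", "https://example.org/x"],
    BASE + "B": [BASE + "A", BASE + "D"],
    BASE + "C": [BASE + "D", BASE + "E"],
    BASE + "D": [],
    BASE + "E": [BASE + "A"],
}


def fake_map(urls):
    # Local stand-in for scrape_page.map: one result dict per URL
    return [{"url": url, "links": FAKE_WEB.get(url, [])} for url in urls]


def is_wikipedia_url(url):
    parsed = urlparse(url)
    return parsed.netloc.endswith("wikipedia.org") and parsed.path.startswith("/wiki/")


def crawl(start_url, max_pages, batch_size):
    visited, to_visit, scraped = set(), [start_url], {}
    batches = []  # recorded only so the batching can be inspected
    while len(scraped) < max_pages and to_visit:
        batch = []
        while len(batch) < batch_size and to_visit:
            page = to_visit.pop(0)
            if page not in visited:
                visited.add(page)
                batch.append(page)
        batches.append(batch)
        for result in fake_map(batch):
            scraped[result["url"]] = result
            for link in result["links"]:
                if is_wikipedia_url(link) and link not in visited and link not in to_visit:
                    to_visit.append(link)
    return scraped, batches


scraped, batches = crawl(BASE + "A", max_pages=10, batch_size=2)
print([len(batch) for batch in batches])  # [1, 2, 2]
print(len(scraped))                       # 5
```

The first batch contains only the start page, so the parallelism only kicks in from the second batch onward, once some links have been discovered.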
## Running the Batch Crawler
Finally, we can run our crawler. Below is the code for our `main` function which initializes the crawler and runs the crawl method.
```python theme={null}
import json

if __name__ == "__main__":
    start_url = "https://en.wikipedia.org/wiki/Web_scraping"
    crawler = WikipediaCrawler(start_url, max_pages=20)
    crawler.crawl()

    # Write the scraped data to a file
    with open("scraped_data.json", "w") as f:
        json.dump(crawler.get_scraped_data(), f)
```
This code initializes the crawler with a starting URL and a maximum number of pages to scrape. It then runs the crawl method and writes the scraped data to a file. You can run this code like any other Python script:
```bash theme={null}
python batch_crawl.py
```
When you run this code, you should see output that looks like the following:
```bash theme={null}
=> Building image
=> Using cached image
=> Syncing files
...
=> Uploading
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 727.9/727.9 kB 0:00:00
=> Files synced
=> Running function:
=> Function complete <21f88938-8b82-465c-8b16-8bc0259e1997>
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Function complete <0f384b7a-98da-400e-bcc5-abacf7f239ef>
=> Function complete <16da6df3-955d-4ad7-a8ec-c6456ab6de1e>
=> Function complete <2dd4c91b-a48f-4d7e-ada3-485633539ee5>
=> Function complete <04452bb5-f642-43e3-9d0f-9cb7532c0d4b>
=> Function complete <7ba94632-1907-415a-acad-c37a2cddd97e>
```
The output shows five function invocations in parallel. Once the scraping is complete, you can see the results in the `scraped_data.json` file. It will look something like this:
```json theme={null}
{
"https://en.wikipedia.org/wiki/Web_scraping": {
"url": "https://en.wikipedia.org/wiki/Web_scraping",
"title": "Web scraping",
"content": "Web scraping, web harvesting, or web data extraction ...",
"links": [
"https://en.wikipedia.org/wiki/Data_scraping",
...
]
},
"https://en.wikipedia.org/wiki/Wikipedia:Verifiability": {
"url": "https://en.wikipedia.org/wiki/Wikipedia:Verifiability",
"title": "Wikipedia:Verifiability",
"content": "\n\n\nIn the English Wikipedia, ...",
"links": [
...
]
},
...
}
```
## Building a Continuous Crawler with Beam Functions and Threads
The batched web crawler is a good starting point, but it requires waiting for a full batch to finish before starting any new jobs. If we want to keep all of our concurrent invocations busy at all times, we can use Beam functions in conjunction with Python threads.
To do this, we will use the same `scrape_page` function, but instead of using the `map` method, we will use a thread pool to invoke the function in parallel. Below is the code for our `WikipediaCrawler` class with a continuous crawl method.
```python theme={null}
    # Methods of WikipediaCrawler. Requires `import concurrent.futures`
    # at the top of the file.
    def process_scraped_page(self, result):
        if not result or len(self.scraped_data) >= self.max_pages:
            return

        self.scraped_data[result["url"]] = result
        if len(self.scraped_data) < self.max_pages:
            # Only queue Wikipedia links that are neither visited nor already queued
            new_links = [
                link
                for link in result["links"]
                if self.is_wikipedia_url(link)
                and link not in self.visited_pages
                and link not in self.pages_to_visit
            ]
            self.pages_to_visit.extend(new_links)

    def crawl(self):
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            futures = {}
            while len(self.scraped_data) < self.max_pages and (
                self.pages_to_visit or futures
            ):
                # Start new tasks if we have capacity and pages to visit
                while len(futures) < 5 and self.pages_to_visit:
                    url = self.pages_to_visit.pop(0)
                    self.visited_pages.add(url)
                    future = executor.submit(scrape_page.remote, url)
                    futures[future] = url

                # Wait for any task to complete
                if futures:
                    done, _ = concurrent.futures.wait(
                        futures, return_when=concurrent.futures.FIRST_COMPLETED
                    )
                    for future in done:
                        url = futures.pop(future)
                        try:
                            result = future.result()
                            self.process_scraped_page(result)
                        except Exception as e:
                            print(f"Error processing {url}: {str(e)}")

        print(f"Crawling completed. Scraped {len(self.scraped_data)} pages.")
```
This code is more complex than the batch crawler, but it allows us to better utilize our compute resources. Instead of having containers sitting idle while other containers complete their work, we immediately send a new function invocation as soon as another one completes. To do this, we track the futures returned by the `executor.submit` method and wait for any of them to complete using the `concurrent.futures.wait` method. We specify that we only want to wait for one of the futures to complete using the `concurrent.futures.FIRST_COMPLETED` constant. This means that as soon as any future completes, we will process the result and add new work to the pool.
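The scheduling pattern described above can be tried locally without Beam. The sketch below is a minimal, self-contained version in which `fake_scrape` is a hypothetical stand-in for `scrape_page.remote` and `run_continuously` is an illustrative name, not part of the crawler. It keeps up to five tasks in flight and tops up the pool each time `concurrent.futures.wait` returns.

```python
import concurrent.futures
import time

MAX_IN_FLIGHT = 5  # same concurrency limit as the crawler above


def fake_scrape(url: str) -> dict:
    """Hypothetical stand-in for scrape_page.remote: sleeps, then returns a result."""
    time.sleep(0.01)
    return {"url": url, "links": []}


def run_continuously(urls: list[str]) -> list[dict]:
    pending = list(urls)
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as executor:
        futures = {}
        while pending or futures:
            # Top up the pool so up to MAX_IN_FLIGHT tasks are always running
            while len(futures) < MAX_IN_FLIGHT and pending:
                url = pending.pop(0)
                futures[executor.submit(fake_scrape, url)] = url
            # Block until at least one task finishes, then collect it
            done, _ = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED
            )
            for future in done:
                futures.pop(future)
                results.append(future.result())
    return results


results = run_continuously([f"https://example.com/page/{i}" for i in range(12)])
print(f"Scraped {len(results)} pages")
```

Because finished futures are removed from the dictionary before the next top-up, the pool never holds more than `MAX_IN_FLIGHT` tasks, and no slot waits for the slowest task in a batch.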
## Running the Continuous Crawler
To run the continuous crawler, you can use the same `main` function as before. When you run this code, you should see output that looks like the following:
```bash theme={null}
=> Building image
=> Using cached image
=> Syncing files
...
=> Files already synced
=> Running function:
=> Function complete <5c059d2c-2570-4ac1-8c8b-11d96543d197>
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Function complete <4f0ff5f6-12db-448f-acad-66362485c988>
=> Running function:
=> Function complete
=> Running function:
=> Function complete <44ae4cbe-6de1-4442-a28c-9adb32937a03>
=> Running function:
=> Function complete <5788c95d-afb7-428d-b906-d0378f099d58>
=> Running function:
=> Function complete
```
As you can see, as soon as one function invocation completes, we immediately start a new one.
# Faster Whisper
Source: https://docs.beam.cloud/v2/examples/whisper
This guide will walk you through deploying and invoking a transcription API using the Faster Whisper model on Beam. The API can be invoked with either a URL to an `.mp3` file or a base64-encoded audio file.
See the code for this example on GitHub.
## Initial Setup
In your Python file, add the following code to define your endpoint and handle the transcription:
```python app.py theme={null}
from beam import endpoint, Image, Volume, env
import base64
import requests
from tempfile import NamedTemporaryFile

BEAM_VOLUME_PATH = "./cached_models"

# These packages will be installed in the remote container
if env.is_remote():
    from faster_whisper import WhisperModel, download_model


# This runs once when the container first starts
def load_models():
    model_path = download_model("large-v3", cache_dir=BEAM_VOLUME_PATH)
    model = WhisperModel(model_path, device="cuda", compute_type="float16")
    return model


@endpoint(
    on_start=load_models,
    name="faster-whisper",
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    image=Image(
        base_image="nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04",
        python_version="python3.10",
    )
    .add_python_packages(["git+https://github.com/SYSTRAN/faster-whisper.git", "huggingface_hub[hf-transfer]"])
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1"),
    volumes=[
        Volume(
            name="cached_models",
            mount_path=BEAM_VOLUME_PATH,
        )
    ],
)
def transcribe(context, **inputs):
    # Retrieve cached model from on_start
    model = context.on_start_value

    # Inputs passed to API
    language = inputs.get("language")
    audio_base64 = inputs.get("audio_file")
    url = inputs.get("url")

    if audio_base64 and url:
        return {"error": "Only a base64 audio file OR a URL can be passed to the API."}
    if not audio_base64 and not url:
        return {
            "error": "Please provide either an audio file in base64 string format or a URL."
        }

    binary_data = None
    if audio_base64:
        binary_data = base64.b64decode(audio_base64.encode("utf-8"))
    elif url:
        resp = requests.get(url)
        binary_data = resp.content

    text = ""
    with NamedTemporaryFile() as temp:
        try:
            # Write the audio data to the temporary file
            temp.write(binary_data)
            temp.flush()

            segments, _ = model.transcribe(temp.name, beam_size=5, language=language)
            for segment in segments:
                text += segment.text + " "
            print(text)
            return {"text": text}
        except Exception as e:
            return {"error": f"Something went wrong: {e}"}
```
## Deployment
To deploy the app, run the following command:
If you named your file something different than `app.py`, make sure to
customize the command with your correct file name.
```sh theme={null}
beam deploy app.py:transcribe
```
This command will deploy your app as a web endpoint. The endpoint URL will be printed out in the shell.
## Invoking the API
Once the API is running, you can invoke it with a URL to an `.mp3` file using the following cURL command:
If you want to test with sample `.mp3` files, you can find many samples on
[this website](https://audio-samples.github.io/).
```sh theme={null}
curl -X POST 'https://faster-whisper-7157fd0-v1.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR-AUTH-TOKEN]' \
-d '{"url":"https://audio-samples.github.io/samples/mp3/blizzard_unconditional/sample-0.mp3"}'
```
Replace the URL with the URL printed in your shell, and `[YOUR-AUTH-TOKEN]` with your authentication token.
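The endpoint also accepts a base64-encoded audio file through the `audio_file` input. As a rough sketch, the JSON body for that case could be built as shown below. The `build_payload` helper is a hypothetical name used only for this example; the `audio_file` and `language` keys are the ones read by `transcribe` above.

```python
import base64
import json
from typing import Optional


def build_payload(audio_bytes: bytes, language: Optional[str] = None) -> dict:
    """Build the request body for a base64 upload via the `audio_file` input."""
    payload = {"audio_file": base64.b64encode(audio_bytes).decode("utf-8")}
    if language:
        payload["language"] = language
    return payload


# In practice the bytes would come from a local file:
# with open("sample.mp3", "rb") as f:
#     body = json.dumps(build_payload(f.read()))
body = json.dumps(build_payload(b"abc", language="en"))
print(body)
```

The resulting string can be passed to the `-d` flag of the cURL command above in place of the `url` payload. Send either `audio_file` or `url`, not both, since the endpoint returns an error when both are present.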
## Summary
You've successfully set up a highly performant serverless API for transcribing audio files using the Faster Whisper model on Beam. The API can handle both URLs to audio files and base64-encoded audio files. With the provided setup, you can easily serve, invoke, and develop your transcription API.
# Zonos
Source: https://docs.beam.cloud/v2/examples/zonos
This guide demonstrates how to deploy a Text-to-Speech (TTS) API using the [Zonos model](https://github.com/Zyphra/Zonos) from Zyphra. The API converts input text into spoken audio, leveraging a pre-trained transformer model and speaker embeddings derived from an example audio file. We use Beam’s infrastructure for compute and file output handling.
See the full code for this example on GitHub.
## Setup
### Environment Configuration
First, create a file named `app.py`:
```python theme={null}
from beam import Image, endpoint, Output, env

if env.is_remote():
    import torchaudio
    from zonos.model import Zonos
    from zonos.conditioning import make_cond_dict
    from zonos.utils import DEFAULT_DEVICE as device
    import os
    import uuid

# Custom image configuration
image = (
    Image(
        base_image="nvidia/cuda:12.4.1-devel-ubuntu22.04",
        python_version="python3.11"
    )
    .add_commands(["apt update && apt install -y espeak-ng git"])
    .add_commands([
        "pip install -U uv",
        "git clone https://github.com/Zyphra/Zonos.git /tmp/Zonos",
        "cd /tmp/Zonos && pip install setuptools wheel && pip install -e .",
    ])
)


@endpoint(
    name="zonos-tts",
    image=image,
    cpu=12,
    memory="32Gi",
    gpu="H100",
    timeout=-1
)
def generate(**inputs):
    text = inputs.get("text")
    if not text:
        return {"error": "Please provide a text"}

    os.chdir("/tmp/Zonos")
    model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device)
    wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
    speaker = model.make_speaker_embedding(wav, sampling_rate)
    cond_dict = make_cond_dict(text=text, speaker=speaker, language="en-us")
    conditioning = model.prepare_conditioning(cond_dict)
    codes = model.generate(conditioning)

    # Save generated audio
    file_name = f"/tmp/zonos_out_{uuid.uuid4()}.wav"
    wavs = model.autoencoder.decode(codes).cpu()
    torchaudio.save(file_name, wavs[0], model.autoencoder.sampling_rate)

    # Upload and get public URL
    output_file = Output(path=file_name)
    output_file.save()
    public_url = output_file.public_url(expires=1200000000)
    return {"output_url": public_url}


if __name__ == "__main__":
    generate()
```
## Deployment
Run this command to deploy the endpoint:
```bash theme={null}
beam deploy app.py:generate
```
It will return a URL with the endpoint:
```bash theme={null}
=> Building image
=> Syncing files
=> Deploying
=> Deployed
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/zonos-tts/v1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
-d '{"text": "On Beam run AI workloads anywhere with zero complexity."}'
```
## API Usage
The deployed endpoint accepts POST requests with a JSON payload containing the text to convert to speech.
### Request Format
```json theme={null}
{
"text": "Your text to convert to speech"
}
```
### Example Request
```bash theme={null}
curl -X POST 'https://app.beam.cloud/endpoint/zonos-tts/v1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
-d '{"text": "On Beam run AI workloads anywhere with zero complexity. One line of Python, global GPUs, full control"}'
```
### Example Response
The API returns a JSON object with a URL to the generated audio file:
```json theme={null}
{
"output_url": "https://app.beam.cloud/output/id/704defd0-9370-4499-9124-677925e64961"
}
```
# Distributed Maps
Source: https://docs.beam.cloud/v2/function/maps
Using Beam's distributed Map
Beam includes a concurrency-safe distributed map, accessible both locally and within remote containers. Serialization is done using cloudpickle, so any pickleable object will work. The interface is that of a standard Python dictionary, but unlike a typical dictionary it will persist between runs.
## Example: Accessing a map locally and remotely
In the following example, we create a distributed map. Our first function is invoked remotely using `.remote()`, and it sets the value of a key in our map. The second function is invoked locally using `.local()`, and it sets another value. Finally, we print the result of our third, remotely invoked function, which retrieves the values we just set.
```python theme={null}
from beam import Map, function


@function()
def first():
    m = Map(name="m")
    m["beam"] = "me up"
    return


@function()
def second():
    m = Map(name="m")
    m["speed"] = "of light"
    return


@function()
def third():
    m = Map(name="m")
    return [m["beam"], m["speed"]]


if __name__ == '__main__':
    first.remote()
    second.local()
    print(third.remote())
```
You can run the example above with `python app.py`. The output will be:
```bash theme={null}
['me up', 'of light']
```
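To build intuition for a dictionary whose contents outlive the code that wrote them, here is a purely local analogy using the standard library's `shelve` module, which also pickles its values. This is an illustrative sketch only: it is not how Beam's `Map` is implemented, it is not distributed or concurrency-safe, and `store_path` is a hypothetical temporary location.

```python
import os
import shelve
import tempfile

# Local analogy only: a dict-like store whose values are pickled and
# remain available after each handle is closed.
store_path = os.path.join(tempfile.mkdtemp(), "m")


def first():
    with shelve.open(store_path) as m:
        m["beam"] = "me up"


def second():
    with shelve.open(store_path) as m:
        m["speed"] = "of light"


def third():
    # A fresh handle still sees the values written by first() and second()
    with shelve.open(store_path) as m:
        return [m["beam"], m["speed"]]


first()
second()
print(third())
```

Each function opens its own handle, just as each Beam function above constructs its own `Map(name="m")`, and the shared name is what ties the handles to the same data.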
# Queues
Source: https://docs.beam.cloud/v2/function/queues
Using Beam's distributed Queue to coordinate between tasks
Beam includes a concurrency-safe distributed queue, accessible both locally and within remote containers.
Serialization is done using cloudpickle, so any object that cloudpickle supports should work here. The interface is that of a standard Python queue.
Because this is backed by a distributed queue, it will persist between runs.
In the example below, we run one function remotely on Beam and another locally. The remote function puts a value in the queue, and the local function pops it out and prints it. The output will be `beam me up`.
```python Simple Queue theme={null}
from beam import Queue, function


@function()
def first():
    q = Queue(name="q")
    q.put("beam me up")
    return


@function()
def second():
    q = Queue(name="q")
    print(q.pop())
    return


if __name__ == '__main__':
    first.remote()
    second.local()
```
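The producer and consumer hand-off can be sketched locally with Python's in-process `queue.Queue`. This is an analogy under stated assumptions, not Beam's implementation: the standard library queue lives in a single process and retrieves items with `get()`, whereas the Beam example above uses `pop()`. The `producer` and `consumer` names are hypothetical.

```python
import queue
import threading

q = queue.Queue()  # in-process stand-in; Beam's Queue is distributed


def producer():
    for message in ["beam", "me", "up"]:
        q.put(message)


def consumer(count: int) -> list[str]:
    # get() blocks until an item is available, so the consumer may start first
    return [q.get(timeout=5) for _ in range(count)]


worker = threading.Thread(target=producer)
worker.start()
received = consumer(3)
worker.join()
print(" ".join(received))
```

With a single producer, items come out in the order they were put in, which is the first-in, first-out behavior you would rely on when coordinating tasks.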
# Running Functions Remotely
Source: https://docs.beam.cloud/v2/function/running-functions
A short guide on using Beam to run one-off functions in the cloud
You can add a decorator to any Python function to run it remotely on Beam:
```python app.py theme={null}
from beam import function


@function()
def handler():
    # Return a dict; {"hello world"} on its own would be a Python set
    return {"message": "hello world"}


if __name__ == "__main__":
    handler.remote()
```
Just run this like a normal Python file, and the code will run on Beam's cloud and stream the response back to your shell.
```sh theme={null}
$ python app.py
=> Building image
=> Using cached image
=> Syncing files
=> Uploading
=> Files synced
=> Running function:
Loading image ...
Loaded image , took: 3.131485ms
=> Function complete
```
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
## Passing Function Args
You can also pass arguments to your function just like normal Python functions:
```python app.py theme={null}
from beam import function


@function()
def greet(name: str):
    return f"Hello {name}"


if __name__ == "__main__":
    print(greet.remote("World"))  # "Hello World"
```
## Task Timeouts
You can set timeouts on tasks. Timeouts are set in seconds:
```python theme={null}
from beam import function


# Set a 24 hour timeout
@function(timeout=86400)
def long_timeout():
    return {"message": "hello world"}


# Disable timeouts completely
@function(timeout=-1)
def no_timeout():
    return {"message": "hello world"}
```
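Since timeouts are given in seconds, it can help to derive them from a readable duration instead of hard-coding numbers such as `86400`. The helper below is a small sketch using only the standard library; `timeout_seconds` is a hypothetical name, not part of the Beam SDK.

```python
from datetime import timedelta


def timeout_seconds(**duration) -> int:
    """Convert a duration (hours=, minutes=, ...) into whole seconds."""
    return int(timedelta(**duration).total_seconds())


ONE_DAY = timeout_seconds(hours=24)
NINETY_MINUTES = timeout_seconds(hours=1, minutes=30)
print(ONE_DAY, NINETY_MINUTES)

# Usage with the decorator shown above:
# @function(timeout=timeout_seconds(hours=24))
```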
## Running Tasks in the Background
By default, remote functions will stop when you close your local Python process or exit your shell.
You can override this behavior and keep the function running in the background by setting `headless=True` in
your function decorator.
```python theme={null}
import time
from beam import function


# Run the function in the background
@function(headless=True)
def handler():
    for i in range(100):
        print(i)
        time.sleep(1)
    return {"message": "This is running in the background"}


if __name__ == "__main__":
    handler.remote()
```
# Scheduled Jobs
Source: https://docs.beam.cloud/v2/function/scheduled-job
How to run workloads on a schedule.
## Run Scheduled Jobs
Use the `@schedule` decorator to define a scheduled job.
```python theme={null}
from beam import schedule


@schedule(when="@weekly", name="weekly-task")
def task():
    print("Hi, from your weekly scheduled task!")
```
To schedule it, run `beam deploy`:
```sh theme={null}
beam deploy app.py:task
```
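You'll see the upcoming jobs listed in the console. The sample output below shows a job deployed with an `@hourly` schedule, so it lists hourly runs; a `@weekly` job lists weekly run times instead.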
```sh theme={null}
=> Deployed
=> Schedule details
Schedule: @hourly
Upcoming:
1. 2024-08-30 18:00:00 UTC (2024-08-30 14:00:00 EDT)
2. 2024-08-30 19:00:00 UTC (2024-08-30 15:00:00 EDT)
3. 2024-08-30 20:00:00 UTC (2024-08-30 16:00:00 EDT)
```
## Scheduling Options
The following predefined schedules can be used in the `when` parameter:
| **Predefined Schedule** | **Description** | **Cron Expression** |
| -------------------------- | ---------------------------------------------------------- | ------------------- |
| `@yearly` (or `@annually`) | Run once a year at midnight on January 1st | `0 0 1 1 *` |
| `@monthly` | Run once a month at midnight on the first day of the month | `0 0 1 * *` |
| `@weekly` | Run once a week at midnight on Sunday | `0 0 * * 0` |
| `@daily` (or `@midnight`) | Run once a day at midnight | `0 0 * * *` |
| `@hourly` | Run once an hour at the beginning of the hour | `0 * * * *` |
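The table can be read as a simple lookup from predefined schedule to cron expression. The sketch below encodes that lookup and, for `@hourly`, computes upcoming run times in the same way the deploy output lists them. The names `PREDEFINED`, `to_cron`, and `next_hourly_runs` are hypothetical, and passing a raw cron expression through unchanged is an assumption about how `when` is used, not documented Beam behavior.

```python
from datetime import datetime, timedelta, timezone

# Predefined schedules and their cron equivalents, copied from the table above
PREDEFINED = {
    "@yearly": "0 0 1 1 *",
    "@annually": "0 0 1 1 *",
    "@monthly": "0 0 1 * *",
    "@weekly": "0 0 * * 0",
    "@daily": "0 0 * * *",
    "@midnight": "0 0 * * *",
    "@hourly": "0 * * * *",
}


def to_cron(when: str) -> str:
    """Expand a predefined schedule; return anything else unchanged."""
    return PREDEFINED.get(when, when)


def next_hourly_runs(now: datetime, count: int = 3) -> list[datetime]:
    """Upcoming run times for `0 * * * *`: the top of each following hour."""
    first = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    return [first + timedelta(hours=i) for i in range(count)]


now = datetime(2024, 8, 30, 17, 5, tzinfo=timezone.utc)
for run in next_hourly_runs(now):
    print(run.strftime("%Y-%m-%d %H:%M:%S UTC"))
```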
## Stopping Scheduled Jobs
You can stop a scheduled job from running by using the `beam deployment stop` CLI command.
First, list the upcoming jobs with `beam deployment list`:
```sh theme={null}
ID Name Active Version Created At Updated At Stub Name Workspace Name
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
10c192b6-6489-42c9-a3… schedule Yes 2 9 minutes ago 9 minutes ago schedule/deployment/ap… f6fa28
```
Then reference the **Deployment ID** to stop a job:
```sh theme={null}
$ beam deployment stop 10c192b6-6489-42c9-a3
Stopped 10c192b6-6489-42c9-a3bf-75c52ad1816b
```
## Gotchas
If you deploy a new version of your scheduled job, the previous schedule will be disabled.
# Add to Cursor or Claude
Source: https://docs.beam.cloud/v2/getting-started/add-to-cursor-claude
Connect the Beam docs to your AI tools with MCP
Beam's documentation is available as an [MCP](https://modelcontextprotocol.io) server, so your AI tools can search the docs and answer questions with accurate, up-to-date context while you build.
The server is hosted at:
```text theme={null}
https://docs.beam.cloud/mcp
```
## Add to Cursor
Click the link to install the Beam docs MCP server in Cursor:
[Add the Beam docs MCP server to Cursor](cursor://anysphere.cursor-deeplink/mcp/install?name=Beam\&config=eyJ1cmwiOiJodHRwczovL2RvY3MuYmVhbS5jbG91ZC9tY3AifQ==)
Or add it manually. In **Cursor Settings → MCP & Integrations → New MCP Server**, add:
```json theme={null}
{
  "mcpServers": {
    "beam": {
      "url": "https://docs.beam.cloud/mcp"
    }
  }
}
```
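If you want to check what the one-click install link will configure before opening it, you can decode its `config` query parameter, which is base64-encoded JSON. The sketch below uses only the standard library; `DEEPLINK` is the install link from this section with the Markdown escape before `&` removed.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

# Install link from this section, without the Markdown backslash escape
DEEPLINK = (
    "cursor://anysphere.cursor-deeplink/mcp/install"
    "?name=Beam&config=eyJ1cmwiOiJodHRwczovL2RvY3MuYmVhbS5jbG91ZC9tY3AifQ=="
)

query = parse_qs(urlparse(DEEPLINK).query)
config = json.loads(base64.b64decode(query["config"][0]))
print(query["name"][0], config)
```

The decoded object contains the same `url` as the manual configuration above.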
## Add to Claude
For Claude Code, add the server from your terminal:
```bash theme={null}
claude mcp add --transport http beam https://docs.beam.cloud/mcp
```
For Claude Desktop, go to **Settings → Connectors → Add custom connector**, then enter:
* **Name:** Beam
* **URL:** `https://docs.beam.cloud/mcp`
Restart Claude Desktop and the Beam docs tools will be available.
Looking for more ways to use the docs with AI tools, including `llms.txt` and `.md` pages? See [Using Beam Docs with AI Tools](/v2/resources/ai-tools).
# Core Concepts
Source: https://docs.beam.cloud/v2/getting-started/core-concepts
## How Beam Works
Beam is a new kind of cloud provider that makes using the cloud feel almost the same as using your local machine. You write plain Python, add a decorator, and run your file. Beam packages your code into a container, launches it in the cloud in under a second, runs it, and scales out automatically when traffic increases.
It's powered by an [open-source container orchestrator](https://github.com/beam-cloud/beta9) that launches containers in less than 1 second.
```mermaid theme={null}
flowchart LR
localCode["Your Python code + decorator"] --> beam["Beam"]
beam --> container["Container in the cloud"]
container --> autoscale["Autoscales to thousands of containers"]
container --> shutdown["Shuts down when idle"]
```
There's no infrastructure to provision and no YAML to write. You only pay for the compute you use, billed by the millisecond.
The examples below use the [Python SDK](/v2/reference/py-sdk), which provides Beam's decorator programming model. The [TypeScript SDK](/v2/reference/ts-sdk) offers programmatic access for creating sandboxes and calling deployed endpoints.
## Functions
You can run functions on the cloud, either once, or on a schedule. [Learn more about Functions](/v2/function/running-functions).
One-off Python functions, like training runs, scraping, or batch jobs.
```python theme={null}
from beam import function


@function()
def handler():
    return {}


if __name__ == "__main__":
    # Runs locally
    handler.local()
    # Runs on the cloud
    handler.remote()
```
Functions that run based on a schedule you specify.
```python theme={null}
from beam import schedule


# "@daily" is one of the predefined schedules accepted by `when`
@schedule(when="@daily")
def handler():
    return {}


if __name__ == "__main__":
    # Runs locally
    handler.local()
    # Runs on the cloud
    handler.remote()
```
You'll run your functions like a normal Python function: `python app.py`.
Even though it *feels* like the code is running locally, it's running on a
container in the cloud.
## Endpoints
You can also deploy synchronous and asynchronous web endpoints. Learn more about [Endpoints](/v2/endpoint/overview) and [Task Queues](/v2/task-queue/running-tasks).
Synchronous REST API endpoints, for tasks that run in 60s or less.
```python theme={null}
from beam import endpoint


@endpoint(name="quickstart")
def handler():
    return {}
```
Asynchronous REST API endpoints, for heavier tasks that take a long time to run.
```python theme={null}
from beam import task_queue


@task_queue(name="quickstart")
def handler():
    print(48393 * 39383)
```
Beam provides a temporary cloud environment to test your code.
These environments hot-reload with your code changes. You can test your workflow end-to-end before deploying to production.
```bash theme={null}
beam serve app.py:handler
```
When you're ready to deploy a persistent endpoint, you'll use `beam deploy`:
```bash theme={null}
beam deploy app.py:handler
```
## Web Services
You can also bring your own container and host web services, like Jupyter Notebooks, Node.js apps, and much more. [Learn more about Pods](/v2/pod/web-service).
Run any container behind an SSL-backed REST API.
```python theme={null}
from beam import Pod

pod = Pod(
    name="my-pod",
    cpu=2,
    memory="1Gi",
    ports=[8000],
    entrypoint=["python", "-m", "http.server", "--bind", "::", "8000"],
)

# Run the container as an API
pod.deploy()
```
# Installation
Source: https://docs.beam.cloud/v2/getting-started/installation
## Mac and Linux
Install the Beam SDK and CLI. The Python SDK provides the decorator programming model (functions, endpoints, task queues, and sandboxes); the TypeScript SDK provides programmatic access for creating sandboxes and calling deployed endpoints.
```bash Python theme={null}
uv tool install beam-client
```
```bash TypeScript theme={null}
npm install @beamcloud/beam-js@rc
```
Beam will create a credentials file in `~/.beam/config.ini`. When you run `beam config create`, your API keys will be saved to this file.
## Homebrew
You can install the CLI separately from the SDK using Homebrew:
```bash theme={null}
brew tap beam-cloud/beam
brew install beam
```
## Windows
You can install Beam on Windows using [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install) (WSL).
These steps assume you're starting fresh, but note that some systems (e.g. with Docker Desktop) may already have WSL distributions installed.
1. Install Ubuntu 22.04: `wsl --install Ubuntu-22.04`. After installation, you may be prompted to set up a new user for the Ubuntu environment.
2. (Optional) Switch the distribution to WSL 1: `wsl --set-version Ubuntu-22.04 1`. Only do this if you explicitly need WSL 1. Most users should stick with WSL 2.
3. Launch the distribution: `wsl -d Ubuntu-22.04`. This ensures you’re using the correct distribution (not docker-desktop or others).
4. Install pip: `sudo apt update && sudo apt install python3-pip -y`
5. Install Beam: `uv tool install beam-client`
## Upgrading
Once installed, you can upgrade the CLI by running:
```bash theme={null}
uv tool upgrade beam-client
```
## Uninstalling
If you installed the Python SDK with `pip`, uninstall it with `pip` (for a `uv tool install` setup, run `uv tool uninstall beam-client` instead):
```bash theme={null}
python3 -m pip uninstall beam-client
```
# Introduction
Source: https://docs.beam.cloud/v2/getting-started/introduction
The open-source serverless cloud for AI and ML workloads
Beam lets you run functions, REST APIs, task queues, and sandboxes on CPUs and GPUs, end to end. There's no infrastructure to manage and no YAML to write: you define everything in code, and Beam runs it in containers that launch in under a second.
## Get Started
Create a free account on [Beam](https://platform.beam.cloud) to get \$30 in credit, then pick a path to start building. Most people start with the SDK and add the docs MCP if their editor speaks it (Cursor, Claude).
* Install the SDK, run a function, and deploy a web endpoint in minutes.
* Connect the Beam docs MCP so your AI editor has live context.
## What You Can Build
* Deploy functions as autoscaling REST APIs on CPUs and GPUs.
* Run untrusted or LLM-generated code in secure, isolated environments.
* Serve ML and LLM inference on on-demand GPUs.
* Process heavy or long-running jobs asynchronously.
* Run one-off functions, batch jobs, and cron schedules with no timeouts.
* Deploy any existing Docker image as a web service.
## Keep Exploring
* Understand how functions, endpoints, and sandboxes work.
* Browse end-to-end examples for real workloads.
The [Python SDK](/v2/reference/py-sdk) provides Beam's decorator programming model (functions, endpoints, task queues, and sandboxes). The [TypeScript SDK](/v2/reference/ts-sdk) provides programmatic access for creating sandboxes and calling deployed endpoints.
## Community
Beam is completely open source. Star the repo, ask questions, and share what you build.
* Star the repo and contribute.
* Join the community and get help.
## Enterprise
Running Beam at scale or need self-hosting and dedicated support? [Get in touch](https://calendly.com/elimernit/30min) or reach us at [founders@beam.cloud](mailto:founders@beam.cloud).
# Quickstart
Source: https://docs.beam.cloud/v2/getting-started/quickstart
## Run a Function in the Cloud
The simplest way to run code on Beam is to add the `@function` decorator to any Python function. Save this to `app.py`:
```python app.py theme={null}
from beam import function


@function(cpu=1, memory="1Gi")
def square(x: int):
    return {"result": x**2}


if __name__ == "__main__":
    print(square.remote(x=12))
```
Run it like any other Python file:
```sh theme={null}
python app.py
```
Beam syncs your code, launches a container, runs the function, and streams the result back to your shell:
```
=> Building image
=> Using cached image
=> Syncing files
=> Files synced
=> Running function:
{'result': 144}
=> Function complete
```
The container spins up in seconds, runs your code, and shuts itself down. No idle costs, no infrastructure to clean up.
## Deploy a Web Endpoint
To turn your code into a live web API, swap `@function` for `@endpoint`. We'll include `numpy` in the image to show how easily you can add Python packages.
* `Image()` defines your container environment. You can add Python packages, system dependencies, or even custom Dockerfiles.
* `@endpoint` turns your function into a real, live web API that runs in the cloud.
```python app.py theme={null}
from beam import endpoint, Image


@endpoint(
    name="quickstart",
    cpu=1,
    memory="1Gi",
    image=Image().add_python_packages(["numpy"]),
)
def predict(**inputs):
    x = inputs.get("x", 256)
    return {"result": x**2}
```
### Deployment
Deploy the endpoint to the cloud:
```sh theme={null}
beam deploy app.py:predict
```
### Call the API
When the deploy finishes, Beam prints your endpoint URL along with a ready-to-run `curl` command. Replace `[TOKEN]` with your token and use the URL from your deploy output:
```sh curl theme={null}
curl -X POST 'https://app.beam.cloud/endpoint/quickstart' \
-H 'Authorization: Bearer [TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"x": 12}'
```
```typescript TypeScript theme={null}
import { beamOpts, Deployments } from "@beamcloud/beam-js";

beamOpts.token = process.env.BEAM_TOKEN!;
beamOpts.workspaceId = process.env.BEAM_WORKSPACE_ID!;

const deployment = await Deployments.get({
  name: "quickstart",
  stubType: "endpoint/deployment",
});
const response = await deployment.call({ x: 12 });
console.log(response);
```
Either way, you'll get back:
```json theme={null}
{ "result": 144 }
```
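The same call can also be made from plain Python. The sketch below mirrors the `curl` command using only the standard library; `build_request` and `call_endpoint` are hypothetical helper names, and the URL and token are placeholders that you would replace with the values from your own deploy output.

```python
import json
import urllib.request


def build_request(url: str, token: str, x: int) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    return urllib.request.Request(
        url,
        data=json.dumps({"x": x}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_endpoint(url: str, token: str, x: int) -> dict:
    """Send the request and parse the JSON response."""
    with urllib.request.urlopen(build_request(url, token, x)) as response:
        return json.load(response)


# Example (placeholders, not real credentials):
# print(call_endpoint("https://app.beam.cloud/endpoint/quickstart", "[TOKEN]", 12))
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.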
The container spins up in seconds, runs your code, and shuts itself down. No idle costs. No infrastructure to clean up.
## What Next?
Here are some other things you can try:
* [Customize your container image](/v2/environment/custom-images)
* [Add a GPU to your app](/v2/environment/gpu)
* [Run a scheduled job](/v2/function/scheduled-job)
* [Parallelize a function across 10 containers](/v2/scaling/parallelizing-functions)
# Networking
Source: https://docs.beam.cloud/v2/pod/networking
## Exposing Ports
You can expose TCP ports to the outside world by specifying the ports you want to expose in the `ports` parameter.
`ports` accepts a list, so you can expose multiple ports too.
In the example below, we expose two ports:
* `8888` for a Jupyter Notebook server
* `3000` for a separate application or web server
```python theme={null}
from beam import Image, Pod

pod = Pod(
    image=Image(base_image="jupyter/base-notebook:latest"),
    ports=[8888, 3000],
    entrypoint=["start-notebook.py"],
)
```
Once your Pod is running, both ports will be available at a public URL.
## Network Security
### Blocking Outbound Traffic
You can block all outbound network access from your Pod while still allowing inbound connections to exposed ports. This is useful for security-sensitive workloads that shouldn't communicate with external services.
```python theme={null}
from beam import Image, Pod

pod = Pod(
    image=Image(base_image="python:3.11-slim"),
    ports=[8000],
    block_network=True,  # Block all outbound traffic
    entrypoint=["python", "-m", "http.server", "8000"],
)
```
With `block_network=True`, the Pod can receive requests on exposed ports but cannot make outbound connections to external services.
### Allow Lists (CIDR Ranges)
For more fine-grained control, you can specify an allow list of CIDR ranges that your Pod is permitted to connect to. All other outbound traffic will be blocked.
```python theme={null}
from beam import Image, Pod

pod = Pod(
    image=Image(base_image="python:3.11-slim"),
    ports=[8000],
    allow_list=[
        "8.8.8.8/32",  # Allow Google DNS
        "10.0.0.0/8",  # Allow private network range
        "2001:db8::/32",  # Allow IPv6 range
    ],
    entrypoint=["python", "app.py"],
)
```
**Important Notes:**
* Maximum of 10 CIDR entries per Pod
* Supports both IPv4 and IPv6 addresses
* Must use proper CIDR notation (e.g., `"8.8.8.8/32"` for a single IP)
* Cannot use `allow_list` and `block_network` together - they are mutually exclusive
* Invalid CIDR values will trigger an error at creation time
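Because invalid values fail at creation time, it can be convenient to check an allow list locally first. The sketch below applies the rules listed above using Python's `ipaddress` module. `validate_allow_list` is a hypothetical helper, and Beam's own validation may differ in detail (for example, in whether it rejects networks with host bits set, which `ip_network` does by default).

```python
import ipaddress

MAX_ENTRIES = 10  # limit stated in the notes above


def validate_allow_list(allow_list: list[str], block_network: bool = False) -> list[str]:
    """Return normalized CIDR strings, or raise ValueError on the first problem."""
    if allow_list and block_network:
        raise ValueError("allow_list and block_network are mutually exclusive")
    if len(allow_list) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} CIDR entries are allowed")
    normalized = []
    for entry in allow_list:
        if "/" not in entry:
            raise ValueError(f"{entry!r} is missing a prefix length, e.g. /32")
        # ip_network raises ValueError for malformed input or set host bits
        normalized.append(str(ipaddress.ip_network(entry)))
    return normalized


print(validate_allow_list(["8.8.8.8/32", "10.0.0.0/8", "2001:db8::/32"]))
```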
## Static IPs
Pods are served in a static IP range, making it possible to whitelist the Beam IP range from the client.
For the static IP range, send us a message in [Slack](https://join.slack.com/t/beam-cloud/shared_invite/zt-3enuvj3r7-OeAzVPYvyqQHy9avNrLL0w).
# Host a Web Service
Source: https://docs.beam.cloud/v2/pod/web-service
[`Pod`](/v2/reference/py-sdk#pod) provides a way to run serverless containers on the cloud. It enables you to quickly launch a container as an HTTPS server that you can access from a web browser.
Pods run in containers that are isolated from your host system, so you can run untrusted code safely.
This can be used for a variety of use cases, such as:
* Hosting GUIs, like Jupyter Notebooks, Streamlit or Reflex apps, and ComfyUI
* Testing code in an isolated environment as part of a CI/CD pipeline
* Securely executing code generated by LLMs
...and much more (if you've got a cool use case, [let us know!](https://join.slack.com/t/beam-cloud/shared_invite/zt-3enuvj3r7-OeAzVPYvyqQHy9avNrLL0w))
## Launching Cloud Containers
Containers can be launched programmatically through the Python SDK, or with the Beam CLI.
For example, the following code is used to launch a cloud-hosted Jupyter Notebook:
```python Python theme={null}
from beam import Image, Pod

notebook = Pod(
    image=Image(base_image="jupyter/base-notebook:latest"),
    ports=[8888],
    cpu=1,
    memory=1024,
    env={
        "NOTEBOOK_ARGS": "--ip='' --NotebookApp.token='' --NotebookApp.notebook_dir=/tmp"
    },
    entrypoint=["start-notebook.py"],
)

nb = notebook.create()
print("Container hosted at:", nb.url)
```
```shell CLI theme={null}
beam run --image jupyter/base-notebook:latest --ports 8888 \
--env NOTEBOOK_ARGS="--ip='' --NotebookApp.token='' --NotebookApp.notebook_dir=/tmp" \
--entrypoint "start-notebook.py"
```
When this code is executed, Beam will launch a container and expose it as a publicly available HTTPS server:
```
$ python app.py
=> Building image
=> Using cached image
=> Syncing files
=> Creating container
=> Container created successfully ===> pod-2929b184-b445-4f23-abc6-7c4b151001da-ec86d9ac
Container hosted at: https://2929b184-b445-4f23-abc6-7c4b151001da-8888.app.beam.cloud
```
### Accessing Containers via HTTP
You can then enter this URL in the browser to interact with your hosted container instance.
### Securely Executing Untrusted Code
Beam launches containers in environments that are isolated from your host system, making it safe to execute untrusted or LLM-generated code.
## Parameters
Pods can be heavily customized to fit your needs.
### Using Custom Images
You can customize the container image using the [`Image`](/v2/reference/py-sdk#image) object. This can be customized with Python packages, shell commands, Conda packages, and much more.
```python Python theme={null}
from beam import Image, Pod

pod = Pod(
    image=Image(base_image="jupyter/base-notebook:latest"),
    entrypoint=["start-notebook.py"],
)
```
```shell CLI theme={null}
beam run --image jupyter/base-notebook:latest --entrypoint "start-notebook.py"
```
### Specifying Entry Points
An *entry point* is the command or script that will run when the container starts. You can interact with Pods using the CLI or the Python SDK.
```python Python theme={null}
from beam import Image, Pod

pod = Pod(
    image=Image(base_image="jupyter/base-notebook:latest"),
    entrypoint=["start-notebook.py"],
)
pod.create()
```
```shell CLI theme={null}
beam run \
--image jupyter/base-notebook:latest \
--entrypoint "start-notebook.py"
```
### Passing Environment Variables
You can pass environment variables into your container for credentials or other parameters. Like entry points, environment variables can be defined in either the CLI or the Python SDK:
```python Python theme={null}
from beam import Image, Pod

Pod(
    image=Image(base_image="jupyter/base-notebook:latest"),
    env={"NOTEBOOK_ARGS": "--ip='' --NotebookApp.token='' --NotebookApp.notebook_dir=/tmp"},
    entrypoint=["start-notebook.py"],
)
```
```shell CLI theme={null}
beam run \
--image jupyter/base-notebook:latest \
--env NOTEBOOK_ARGS="--ip='' --NotebookApp.token='' --NotebookApp.notebook_dir=/tmp" \
--entrypoint "start-notebook.py"
```
## Deploying a Pod
Pods can be deployed as persistent endpoints using the `beam deploy` command.
When deploying a Pod, don't forget to include the `name` field.
```python app.py theme={null}
from beam import Pod

pod = Pod(
    name="my-deployed-pod",
    cpu=2,
    memory="1Gi",
    ports=[8000],
    entrypoint=["python", "-m", "http.server", "8000"],
)
```
```sh theme={null}
beam deploy app.py:pod
```
## Terminating a Pod
Pod instances can be terminated directly using the `terminate()` method.
Alternatively, you can terminate the container the Pod is running on by using the `beam container stop [CONTAINER-ID]` command.
```python theme={null}
from beam import Pod
# Initialize a pod
notebook = Pod()
# Launch the pod
notebook.create()
# Terminate the pod
notebook.terminate()
```
## Lifecycle
### Timeouts
Pods are serverless and automatically scale-to-zero.
By default, a pod is terminated after 10 minutes without any active connections to the hosted URL, or earlier if the process exits on its own. Making a connection request (e.g. accessing the URL in your browser) keeps the container alive until the timeout is reached.
You can set a custom timeout by passing the `keep-warm-seconds` parameter when creating a pod. If you specify -1, the pod will not spin down due to inactivity and will remain up until either the entrypoint process exits or you explicitly stop the container.
**Keep Alive for 5 minutes**
```bash theme={null}
beam run --image jupyter/base-notebook:latest --keep-warm-seconds 300
```
**Keep Alive Indefinitely**
*There is no upper limit on the duration of a session*.
```bash theme={null}
beam run --image jupyter/base-notebook:latest --keep-warm-seconds -1
```
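If you launch pods from a script, the keep-warm rules above can be captured in a small helper. The following is a minimal sketch that only builds the command line and does not run it. The function name `beam_run_command` is hypothetical, and rejecting values below -1 is an assumption made for this example, not a documented CLI rule.

```python
import shlex


def beam_run_command(image: str, keep_warm_seconds: int) -> list[str]:
    """Build the argv for `beam run` with an explicit keep-warm window."""
    # -1 means "never spin down due to inactivity". Treating anything lower
    # as a mistake is an assumption of this sketch, not a CLI rule.
    if keep_warm_seconds < -1:
        raise ValueError("keep_warm_seconds must be -1 or a non-negative number")
    return [
        "beam", "run",
        "--image", image,
        "--keep-warm-seconds", str(keep_warm_seconds),
    ]


cmd = beam_run_command("jupyter/base-notebook:latest", 300)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where the `beam` CLI is installed and configured.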
### List Running Pods
You can list all running Pods using the `beam container list` command.
```bash theme={null}
$ beam container list
ID Status Stub ID Deployment ID Scheduled At Uptime
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
pod-58a38dba-8b7b-4db3-b002-436c6d9b4858-a613eecd RUNNING 58a38dba-8b7b-4db3-b002-436c6d9b4858 5 seconds ago 4 seconds
1 items
```
You can kill any container by running `beam container stop [CONTAINER-ID]`.
# Reference
Source: https://docs.beam.cloud/v2/reference/api
### Authentication
All APIs are authenticated through [Bearer Authentication](https://swagger.io/docs/specification/authentication/bearer-authentication/).
All API responses follow standard HTTP semantics. In case of an error, you
will receive a non-2xx response with a JSON body describing the error.
## Tasks
The Tasks API lets you interact with tasks. See the left navigation for detailed parameters and responses for each endpoint.
## Pods - Beta
The Pods API is currently in beta: it is experimental and subject to change. Please contact us if you have any feedback or questions.
The Pods API lets you create and interact with runtime containers. Interactive reference pages, generated from our OpenAPI definition, are listed in the sidebar under Pods, with detailed parameters, responses, and a playground for each endpoint.
# List Containers/Pods/Sandboxes
Source: https://docs.beam.cloud/v2/reference/api-docs/gatewayservice/get-containers
v2/reference/api-docs/gateway.swagger.json GET /containers
List containers/pods/sandboxes associated with your workspace.
# Stop Container/Pod/Sandbox
Source: https://docs.beam.cloud/v2/reference/api-docs/gatewayservice/post-containers-stop
v2/reference/api-docs/gateway.swagger.json POST /containers/{containerId}/stop
Stop a running container/pod/sandbox by its ID.
# Create a Pod
Source: https://docs.beam.cloud/v2/reference/api-docs/podservice/post-pods
v2/reference/api-docs/pods.swagger.json POST /pods
Create a new pod and return its identifiers and initial state. Provide exactly one of `stubId` or `checkpointId`.
# Cancel Task
Source: https://docs.beam.cloud/v2/reference/api-docs/tasks/tasks-cancel
### Cancelling Tasks
Tasks can be cancelled through the `api.beam.cloud/v2/task/cancel/` endpoint.
#### Request
```bash theme={null}
curl -X DELETE --compressed 'https://api.beam.cloud/v2/task/cancel/' \
-H 'Authorization: Bearer [YOUR_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"task_ids": ["TASK_ID"]}'
```
This API accepts a list of task IDs, which can be passed in like this:
```json theme={null}
{
  "task_ids": [
    "70101e46-269c-496b-bc8b-1f7ceeee2cce",
    "81bdd7a3-3622-4ee0-8024-733227d511cd",
    "7679fb12-94bb-4619-9bc5-3bd9c4811dca"
  ]
}
```
#### Response
`200`
```json theme={null}
{}
```
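The same request can be prepared from Python with the standard library alone. This is a minimal sketch: the helper name `build_cancel_request` is hypothetical, and the request is only constructed, not sent, so it can be inspected without a valid token.

```python
import json
import urllib.request

CANCEL_URL = "https://api.beam.cloud/v2/task/cancel/"


def build_cancel_request(token: str, task_ids: list[str]) -> urllib.request.Request:
    """Prepare (but do not send) a cancel request for one or more tasks."""
    body = json.dumps({"task_ids": task_ids}).encode("utf-8")
    return urllib.request.Request(
        CANCEL_URL,
        data=body,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_cancel_request(
    "YOUR_TOKEN", ["70101e46-269c-496b-bc8b-1f7ceeee2cce"]
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```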
# Get Task Status
Source: https://docs.beam.cloud/v2/reference/api-docs/tasks/tasks-status
### Query Task Status
You can check the status of any task by querying the `task` API:
```sh theme={null}
https://api.beam.cloud/v2/task/{TASK_ID}/
```
### Task Statuses
Your payload will return the status of the task. These are the possible statuses for a task:
| Status | Description |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PENDING` | The task is enqueued and has not started yet. |
| `RUNNING` | The task is running. |
| `COMPLETE` | The task completed without any errors. |
| `RETRY`     | The task is being retried. The retry limit defaults to 3 unless `max_retries` is set in the function decorator.                                                                           |
| `CANCELLED` | The task was cancelled by the client. |
| `TIMEOUT` | The task timed out, based on the `timeout` provided in the function decorator. |
| `EXPIRED` | The task remained in the queue and was never picked up by a worker. **For endpoints, this usually occurs when the task does not start running before the request timeout (180 seconds).** |
| `FAILED` | The task did not complete successfully. |
#### Request
```sh theme={null}
curl -X GET \
'https://api.beam.cloud/v2/task/{TASK_ID}/' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json'
```
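A common use of this request is to repeat it until the task stops changing. The sketch below shows one way to structure that loop. It assumes that `PENDING`, `RUNNING`, and `RETRY` are the only non-final statuses, which is an inference from the table above rather than a documented guarantee. `wait_for_task` and its parameters are hypothetical names, and the status source is injected so the loop can be tried without network access.

```python
import time
from typing import Callable

# Assumption: every status outside this set is final.
ACTIVE_STATUSES = {"PENDING", "RUNNING", "RETRY"}


def wait_for_task(
    fetch_status: Callable[[], str],
    interval: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a final status, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status not in ACTIVE_STATUSES:
            return status
        sleep(interval)
    raise TimeoutError(f"task still active after {max_polls} polls")


# Stand-in for a function that calls the task endpoint and reads "status".
fake_statuses = iter(["PENDING", "RUNNING", "COMPLETE"])
final = wait_for_task(lambda: next(fake_statuses), interval=0, sleep=lambda _: None)
print(final)  # COMPLETE
```

In real use, `fetch_status` would issue the GET request shown above and return the `status` field of the JSON response.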
#### Response
The response to `/task` returns the following data:
| Field | Type | Description |
| ------------------------- | ------- | ----------------------------------------------------------------------------------------------------------- |
| `id` | string | The unique identifier of the task. |
| `started_at` | string | The timestamp when the task started, in ISO 8601 format. Null if the task hasn't started yet. |
| `ended_at` | string | The timestamp when the task ended, in ISO 8601 format. Null if the task is still running or hasn't started. |
| `status` | string | The current status of the task (e.g., COMPLETE, RUNNING, etc.). |
| `container_id` | string | The identifier of the container running the task. |
| `updated_at` | string | The timestamp when the task was last updated, in ISO 8601 format. |
| `created_at` | string | The timestamp when the task was created, in ISO 8601 format. |
| `outputs` | array | An array containing the outputs of the task. |
| `stats` | object | An object containing statistics about the task's execution environment. |
| `stats.active_containers` | integer | The number of active containers for the task. |
| `stats.queue_depth` | integer | The depth of the queue for the deployment. |
| `stub` | object | An object containing detailed information about the task's configuration and deployment. |
| `stub.id` | string | The identifier of the deployment stub. |
| `stub.name` | string | The name of the deployment stub. |
| `stub.type` | string | The type of the deployment stub. |
| `stub.config` | string | The full runtime configuration for the deployment, returned as a JSON string. |
| `stub.config_version` | integer | The version number of the deployment stub configuration. |
| `stub.object_id` | integer | The object identifier associated with the deployment stub. |
| `stub.created_at` | string | The timestamp when the deployment stub was created, in ISO 8601 format. |
| `stub.updated_at` | string | The timestamp when the deployment stub was last updated, in ISO 8601 format. |
To parse `stub.config`, use `json.loads()` or an equivalent JSON decoder in
your language.
Here's what the response payload looks like as JSON:
```json theme={null}
{
  "id": "07ce4078-bccc-4a42-b530-5f2653484a6a",
  "started_at": "2024-07-22T14:02:28.466278Z",
  "ended_at": "2024-07-22T14:02:28.475954Z",
  "status": "COMPLETE",
  "container_id": "endpoint-d327e987-759d-493e-b3e4-005774bcf998-8b747792",
  "updated_at": "2024-07-22T14:02:28.477026Z",
  "created_at": "2024-07-22T14:02:28.413232Z",
  "outputs": [],
  "stats": {
    "active_containers": 0,
    "queue_depth": 0
  },
  "stub": {
    "id": "d327e987-759d-493e-b3e4-005774bcf998",
    "name": "endpoint/deployment/app:squared",
    "type": "",
    "config": "{\"runtime\":{\"cpu\":1000,\"gpu\":\"\",\"memory\":128,\"image_id\":\"4724a2a2dfb601d8\"},\"handler\":\"app:squared\",\"on_start\":\"\",\"python_version\":\"python3.10\",\"keep_warm_seconds\":180,\"max_pending_tasks\":100,\"callback_url\":\"\",\"task_policy\":{\"max_retries\":0,\"timeout\":180,\"expires\":\"0001-01-01T00:00:00Z\"},\"workers\":1,\"authorized\":false,\"volumes\":null,\"autoscaler\":{\"type\":\"queue_depth\",\"max_containers\":1,\"tasks_per_container\":1}}",
    "config_version": 0,
    "object_id": 0,
    "created_at": "0001-01-01T00:00:00Z",
    "updated_at": "0001-01-01T00:00:00Z"
  }
}
```
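Because `stub.config` is a JSON string nested inside the JSON response, it has to be decoded twice. The following sketch illustrates this with a trimmed-down payload that reuses a few values from the sample response. The payload is built locally for illustration and is not fetched from the API.

```python
import json

# A trimmed stand-in for the response body, built locally for illustration.
config_string = json.dumps(
    {
        "runtime": {"cpu": 1000, "memory": 128},
        "handler": "app:squared",
        "task_policy": {"max_retries": 0, "timeout": 180},
    }
)
response_text = json.dumps(
    {
        "id": "07ce4078-bccc-4a42-b530-5f2653484a6a",
        "status": "COMPLETE",
        "stub": {"name": "endpoint/deployment/app:squared", "config": config_string},
    }
)

task = json.loads(response_text)              # first decode: the response body
config = json.loads(task["stub"]["config"])   # second decode: the embedded string

print(task["status"], config["runtime"]["cpu"], config["task_policy"]["timeout"])
```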
# CLI Reference
Source: https://docs.beam.cloud/v2/reference/cli
The `beam` CLI lets you work with Beam from the command line, from uploading files to volumes to deploying your applications.
You can use the `--help` flag to get information about any command.
### Most Common Commands
You'll be using these a lot!
* `beam deploy` – deploy an app to the cloud
* `beam shell` – SSH into a container to debug it interactively
* `beam serve` – create a temporary live preview of your app
* `beam logs` – stream logs from a task, container, or deployment
## Installation
This installs the Beam SDK and CLI in your Python environment.
```bash theme={null}
uv tool install beam-client
```
You can find instructions for installing the CLI on Windows [here](/v2/getting-started/installation#windows).
## Setup Credentials
Beam will create a credentials file in `~/.beam/config.ini`. When you run `beam config create`, your API keys will be saved to this file.
## Config
Configures your Beam [API keys](/v2/reference/cli#setup-credentials) and saves a profile to `~/.beam/config.ini`.
```bash theme={null}
beam config
```
```bash theme={null}
$ beam config create prod
Context Name [prod]:
Gateway Host [gateway.beam.cloud]:
Gateway Port [443]:
Token:
```
* `Context Name` (required) -- the name of the profile, e.g. `prod` or `staging`.
* `Gateway Host` (optional) -- used only for self-hosting. If you are using beam.cloud, you can leave this blank.
* `Gateway Port` (optional) -- used only for self-hosting. If you are using beam.cloud, you can leave this blank.
* `Token` (required) -- your API token, found on [this page of the dashboard](https://platform.beam.cloud/settings/api-keys).
### Create
Create a new context.
```bash theme={null}
beam config create [NAME]
```
```bash theme={null}
$ beam config create prod-env
Context Name [prod-env]:
Gateway Host [gateway.beam.cloud]:
Gateway Port [443]:
Token: [YOUR-TOKEN]
Added new context!
```
If you are prompted to enter a value for `Gateway Host` or `Gateway Port`, you
can leave both fields blank.
### Delete
Delete a saved context.
```bash theme={null}
beam config delete [NAME]
```
```bash theme={null}
$ beam config delete prod-env
Do you want to continue? [y/N]: y
Deleted context prod-env.
```
### List
Lists saved contexts.
```bash theme={null}
beam config list
```
```bash theme={null}
$ beam config list
Name Host Port Token
───────────────────────────────────────────────────────
default gateway.beam.cloud 443 qONcMO...
staging gateway.beam.cloud 443 qONcMO...
```
### Select
Set the default context.
```bash theme={null}
beam config select [NAME]
```
```bash theme={null}
$ beam config select staging-env
Default context updated with 'staging-env'.
```
### Specifying Context
Most commands support the `--context` (or `-c`) flag, which allows you to specify which config profile to use for that command. This is useful when working with multiple environments (e.g., development, staging, production).
```bash theme={null}
beam deploy app.py:handler --context staging
beam task list -c production
beam volume list --context dev
```
If no context is specified, the default context (set via `beam config select`) will be used.
## Deployment
### Create
Deploys your app and creates a persistent web endpoint to access it.
You can run this command with `beam deploy [...]` or `beam deploy create [...]`.
```bash theme={null}
beam deploy [FILE:FUNCTION] --name [APP-NAME]
```
```bash theme={null}
$ beam deploy create app.py:handler --name inference-app
=> Building image
=> Using cached image
=> Syncing files
=> Files synced
=> Deploying endpoint
=> Deployed
```
### List
Lists all deployments.
```bash theme={null}
beam deployment list
```
```bash theme={null}
$ beam deployment list
ID Name Active Version Created At Updated At Stub Name Workspace Name
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
9c9e99aa-a64e-4… whisper-stt Yes 1 1 day ago 1 day ago endpoint/deplo… cf2db0
e1831baa-e4a9-4… inference-app Yes 2 5 days ago 5 days ago endpoint/deplo… cf2db0
6983cfe2-abf7-4… vllm-app Yes 2 7 days ago 7 days ago endpoint/deplo… cf2db0
3 items
```
### Stop
Stops a deployment.
```bash theme={null}
beam deployment stop [DEPLOYMENT-ID]
```
```bash theme={null}
$ beam deployment stop c7b9fdaa-a25a-4db0-a825-c31f94c91c3f
Stopped c7b9fdaa-a25a-4db0-a825-c31f94c91c3f
```
### Start
Starts an inactive deployment.
```bash theme={null}
beam deployment start [DEPLOYMENT-ID]
```
```bash theme={null}
$ beam deployment start c555edd8-3f10-4b54-ac1c-4e1e5e10eabd
Starting c555edd8-3f10-4b54-ac1c-4e1e5e10eabd
```
### Delete
Deletes a deployment.
```bash theme={null}
beam deployment delete [DEPLOYMENT-ID]
```
```bash theme={null}
$ beam deployment delete c7b9fdaa-a25a-4db0-a825-c31f94c91c3f
Deleted deployment c7b9fdaa-a25a-4db0-a825-c31f94c91c3f
```
## Shell
### SSH Into Containers
Allows you to interactively access a container on Beam.
```bash theme={null}
$ beam shell app.py:handler
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
=> Files synced
Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 6.8.0-51-generic x86_64)
root@runc:/mnt/code#
```
You can also shell into a running container by passing its container ID:
```bash theme={null}
beam shell --container-id [CONTAINER-ID]
```
## Serve
### Create a Preview Environment
Creates a temporary deployment preview.
```bash theme={null}
beam serve [FILE:FUNCTION]
```
```bash theme={null}
$ beam serve app.py:predict
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
=> Files synced
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/id/55108039-e3bf-409b-bad5-f4982b2f1c02' \
-H 'Accept: */*' \
-H 'Accept-Encoding: gzip, deflate' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
=> Watching /Users/beam for changes...
⠇ Serving endpoint...
```
## Container
Manage the containers running in your account.
### List
Lists all containers.
```bash theme={null}
beam container list
```
```bash theme={null}
$ beam container list
ID Status Stub Id Scheduled At
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
function-ee5c046f-985f-40b7-a0fa-477794e0a052-6c0d5340 RUNNING 27d567fe-8bd3-41b4-bd5b-2e6ce1afb454 3 seconds ago
1 item(s)
```
### Stop
Terminate a running container.
```bash theme={null}
beam container stop [CONTAINER-ID]
```
```bash theme={null}
$ beam container stop function-ee5c046f-985f-40b7-a0fa-477794e0a052-6c0d5340
Stopped container.
```
## Task
Any code you run on Beam creates a task: each time you run a function or invoke an API, a new task is created.
### List Tasks
Lists all tasks.
```bash theme={null}
beam task list
```
```bash theme={null}
$ beam task list
Task ID Status Started At Ended At Container ID Stub Name Workspace Name
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
29d739d5-624f-41… COMPLETE 29 minutes ago 29 minutes ago endpoint-b4e9a64… endpoint/deploym… cf2db0
f45fe342-0bff-4e… COMPLETE 35 minutes ago 35 minutes ago endpoint-c1dd0b6… endpoint/deploym… cf2db0
4b9a1b6f-e34d-4d… COMPLETE 1 day ago 1 day ago endpoint-05fb7c9… endpoint/deploym… cf2db0
38910b63-8c8a-42… COMPLETE 1 day ago 1 day ago endpoint-05fb7c9… endpoint/deploym… cf2db0
cf051d10-fa28-42… COMPLETE 1 day ago 1 day ago endpoint-05fb7c9… endpoint/deploym… cf2db0
```
### Stop a Task
Stops a task.
```bash theme={null}
beam task stop [TASK-ID]
```
```bash theme={null}
$ beam task stop c6d9e4a3-9262-485a-a7bb-a72980008c02
Stopped task c6d9e4a3-9262-485a-a7bb-a72980008c02.
```
## Volume
Beam allows you to create highly-available storage volumes that can be used across tasks. You might use volumes for things like storing model weights or large datasets.
### Create a Volume
Creates a volume.
```bash theme={null}
beam volume create [VOLUME-NAME]
```
```bash theme={null}
$ beam volume create weights
Name Created At Updated At Workspace Name
───────────────────────────────────────────────────────
weights May 07 2024 May 07 2024 cf2db0
```
### Delete a Volume
```bash theme={null}
beam volume delete [VOLUME-NAME]
```
```bash theme={null}
$ beam volume delete model-weights
Any apps (functions, endpoints, task queue, etc) that
refer to this volume should be updated before it is deleted.
Are you sure? (y/n) [n]: y
Deleted volume model-weights
```
### List Volumes
List all volumes mounted to your apps.
```bash theme={null}
beam volume list
```
```bash theme={null}
$ beam volume list
Name Size Created At Updated At Workspace Name
─────────────────────────────────────────────────────────────────────────────────────
weights 240.23 MiB 2 days ago 2 days ago cf2db0
1 volumes | 240.23 MiB used
```
### List Volume Contents
List all contents of a volume.
```bash theme={null}
beam ls [VOLUME-NAME]
```
```bash theme={null}
$ beam ls weights
Name Size Modified Time IsDir
──────────────────────────────────────────────────────────────────
.locks 0.00 B 29 minutes ago Yes
models--facebook--opt-125m 240.23 MiB 28 minutes ago Yes
2 items | 240.23 MiB used
```
### Copy Files to Volumes
Copies a file to a volume.
```bash theme={null}
beam cp [LOCAL-PATH] beam://[VOLUME-NAME]
```
```bash theme={null}
=> weights (copying 1 object)
[LennonBeatlemania.pth] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 kB 0:00:00
```
### Move Files in Volumes
Move files around a volume.
```bash theme={null}
beam mv [SOURCE] [DEST]
```
```bash theme={null}
$ beam mv file.txt files/text-files
Moved file.txt to files/text-files/file.txt
```
### Remove Files from Volumes
Remove a file from a volume.
```bash theme={null}
beam rm [FILE]
```
```bash theme={null}
=> weights/app.py (1 object deleted)
app.py
```
### Downloading Data
You can download directories and individual files.
```bash theme={null}
beam cp beam://myvol/file.txt . # beam://myvol/file.txt => ./file.txt
beam cp beam://myvol/file.txt file.new # beam://myvol/file.txt => ./file.new
beam cp beam://myvol/mydir . # beam://myvol/mydir/file.txt => ./file.txt
```
## Secret
Secrets and environment variables can be injected into the containers that run your apps. After a secret is saved, it can be used in your application code like this:
```python theme={null}
from beam import function


@function(secrets=["AWS_ACCESS_KEY"])
def handler():
    import os

    my_secret = os.environ["AWS_ACCESS_KEY"]
    print(f"Secret: {my_secret}")
```
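Since secrets arrive as ordinary environment variables, a handler can fail early with a clear message when one is missing, and avoid writing the full value to logs. The helpers below are a minimal sketch using only the standard library. `require_secret` and `masked` are hypothetical names, not part of the Beam SDK, and the value set at the bottom is a placeholder so the example runs locally.

```python
import os


def require_secret(name: str) -> str:
    """Return the secret's value, or raise a descriptive error if it is unset."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set in this container")
    return value


def masked(value: str, visible: int = 4) -> str:
    """Show only the first few characters so logs do not leak the secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Placeholder so the sketch runs outside Beam; on Beam the platform sets it.
os.environ.setdefault("AWS_ACCESS_KEY", "example-value")
print(f"Secret: {masked(require_secret('AWS_ACCESS_KEY'))}")
```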
### List Secrets
Lists all secrets that exist.
```bash theme={null}
beam secret list
```
```bash theme={null}
$ beam secret list
Name Last Updated Created
──────────────────────────────────────────────────
AWS_KEY 19 hours ago 19 hours ago
AWS_ACCESS_KEY 20 seconds ago 20 seconds ago
AWS_REGION 7 seconds ago 7 seconds ago
3 items
```
### Create a Secret
Creates a secret.
```bash theme={null}
beam secret create [KEY] [VALUE]
```
```bash theme={null}
$ beam secret create AWS_ACCESS_KEY ASIAY34FZKBOKMUTVV7A
=> Created secret with name: 'AWS_ACCESS_KEY'
```
### Show a Secret
Shows the value of a secret.
```bash theme={null}
beam secret show [KEY]
```
```bash theme={null}
$ beam secret show AWS_ACCESS_KEY
=> Secret 'AWS_ACCESS_KEY': ASIAY34FZKBOKMUTVV7A
```
### Modify a Secret
Modifies the value of a secret.
```bash theme={null}
beam secret modify [KEY] [VALUE]
```
```bash theme={null}
$ beam secret modify AWS_ACCESS_KEY ASIAY34FZKBOKMUTVV7A
=> Modified secret 'AWS_ACCESS_KEY'
```
### Delete a Secret
Permanently deletes a secret.
```bash theme={null}
beam secret delete [KEY]
```
```bash theme={null}
$ beam secret delete AWS_ACCESS_KEY
=> Deleted secret 'AWS_ACCESS_KEY'
```
## Logs
You can stream logs from a task, deployment, or a container to your shell.
### Deployment
Streams logs for a deployment.
```bash theme={null}
beam logs --deployment-id [DEPLOYMENT-ID]
```
You can find the deployment ID by running `beam deployment list`.
```bash theme={null}
$ beam logs --deployment-id 2089b5b5-9b2a-450e-9a56-a50d4f0a8d4c
Starting task worker[0]
Worker[0] ready
Running task <25156946-2c3a-4900-8a2c-568f162493a7>
Task completed <25156946-2c3a-4900-8a2c-568f162493a7>, took 3.4570693s
Spinning down taskqueue
```
### Task
Streams logs for a task.
```bash theme={null}
beam logs --task-id [TASK-ID]
```
You can find the task ID by running `beam task list`.
### Container
Streams logs for a container.
```bash theme={null}
beam logs --container-id [CONTAINER-ID]
```
You can find the container ID by running `beam container list`.
### Stub ID
Streams logs for a stub ID.
```bash theme={null}
beam logs --stub-id [STUB-ID]
```
## Machine
Manage the machines available on Beam.
### List
List the available GPUs at any given moment.
```bash theme={null}
$ beam machine list
GPU Type Available
──────────────────────
A10G Yes
H100 Yes
```
## Helpers and utils
### Ignore Local Files
You can create a `.beamignore` file in your project's root directory to tell Beam which local files and directories to ignore when syncing to Beam.
This follows the conventions of [`.gitignore`](https://git-scm.com/docs/gitignore).
**Ignoring Files**
```
do_not_sync.txt
.DS_Store
```
**Ignoring Folders**
```
images/*
node_modules/*
```
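To get a rough feel for which paths the patterns above would exclude, you can approximate the matching locally. The sketch below uses Python's `fnmatch` module, which is only an approximation of `.gitignore` semantics and is not Beam's actual matcher. `is_ignored` is a hypothetical helper written for this illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PATTERNS = ["do_not_sync.txt", ".DS_Store", "images/*", "node_modules/*"]


def is_ignored(path: str, patterns: list[str] = PATTERNS) -> bool:
    """Approximate check: would this relative path be skipped during sync?"""
    name = PurePosixPath(path).name
    for pattern in patterns:
        if "/" in pattern:
            # Patterns containing a slash are compared against the whole path.
            if fnmatchcase(path, pattern):
                return True
        elif fnmatchcase(name, pattern):
            # Bare names match a file at any depth.
            return True
    return False


for candidate in ["app.py", "images/logo.png", "src/.DS_Store"]:
    print(candidate, is_ignored(candidate))
```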
# Python SDK Reference
Source: https://docs.beam.cloud/v2/reference/py-sdk
Beam's Python SDK is the heart of the Beam platform. Unlike traditional cloud providers, Beam apps are defined entirely in code — no YAML, no config files. All infrastructure and runtime configuration is expressed in Python.
This reference outlines every available decorator, object, and configuration option in the SDK. For a quickstart or high-level overview, check out the [Getting Started guide](/v2/getting-started/quickstart).
## Environment
### `Image`
Defines a custom container image that your code will run in.
An Image object encapsulates the configuration of a custom container image that will be used as the runtime environment for executing tasks.
```python theme={null}
from beam import endpoint, Image

image = (
    Image(
        base_image="docker.io/nvidia/cuda:12.3.1-runtime-ubuntu20.04",
        python_version="python3.9",
    )
    .add_commands(["apt-get update -y", "apt-get install ffmpeg -y"])
    .add_python_packages(["transformers", "torch"])
    .build_with_gpu(gpu="A10G")
)


@endpoint(image=image)
def handler():
    return {}
```
The Python version to be used in the image. Defaults to Python 3.8.
A list of Python packages to install in the container image. Alternatively, a
string containing a path to a requirements.txt can be provided. Default is \[].
A list of shell commands to run when building your container image. These
commands can be used for setting up the environment, installing dependencies,
etc. Default is \[].
A custom base image to replace the default ubuntu20.04 image used in your container. This can be a public or private image from Docker Hub, Amazon ECR, Google Cloud Artifact Registry, or
NVIDIA GPU Cloud Registry. The formats for these registries are respectively `docker.io/my-org/my-image:0.1.0`,
`111111111111.dkr.ecr.us-east-1.amazonaws.com/my-image:latest`,
`us-east4-docker.pkg.dev/my-project/my-repo/my-image:0.1.0`, and `nvcr.io/my-org/my-repo:0.1.0`. Default is None.
A key/value pair or key list of environment variables that contain credentials to
a private registry. When provided as a dict, you must supply the correct keys and values.
When provided as a list, the keys are used to look up the environment variable value
for you. Default is None.
#### List of Base Image Creds
```python theme={null}
image = Image(
base_image="111111111111.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
base_image_creds=[
"AWS_ACCESS_KEY_ID",
"AWS_SECRET_ACCESS_KEY",
"AWS_SESSION_TOKEN",
"AWS_REGION",
],
)
```
#### Dict of Base Image Creds
```python theme={null}
image = Image(
base_image="111111111111.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
    base_image_creds={
        "AWS_ACCESS_KEY_ID": "xxxx",
        "AWS_SECRET_ACCESS_KEY": "xxxx",
        "AWS_REGION": "xxxx",
    },
)
```
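The difference between the two forms can be pictured as follows: a dict is used as given, while a list of keys is resolved against the local environment. The sketch below only illustrates that described behavior and is not Beam's implementation. `resolve_registry_creds` is a hypothetical helper, and the environment variable set at the bottom is a placeholder.

```python
import os
from collections.abc import Mapping


def resolve_registry_creds(creds) -> dict[str, str]:
    """Turn either credential form into a plain key/value dict."""
    if isinstance(creds, Mapping):
        # Dict form: the caller supplies both keys and values.
        return dict(creds)
    if isinstance(creds, str):
        raise TypeError("pass a list of keys, not a single string")
    # List form: each key is looked up in the current environment.
    missing = [key for key in creds if key not in os.environ]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return {key: os.environ[key] for key in creds}


os.environ["AWS_REGION"] = "us-east-1"  # placeholder for this example
print(resolve_registry_creds(["AWS_REGION"]))
print(resolve_registry_creds({"AWS_REGION": "xxxx"}))
```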
Adds environment variables to an image. These will be available when building the image
and when the container is running. This can be a string, a list of strings, or a
dictionary of strings. The string must be in the format of `KEY=VALUE`. If a list of
strings is provided, each element should be in the same format. Default is None.
Builds the image on a GPU.
### `Image.from_registry()`
Create an Image from a remote container registry.
The full URI of the registry image.
Credentials for private registries. Either a dict of key to value, or a list
of env var keys to read at build time.
```python theme={null}
from beam import Image, endpoint

image = Image.from_registry("docker.io/library/python:3.11-slim")


@endpoint(image=image)
def handler():
    return {}
```
### `Image.from_id()`
Create an image from a filesystem snapshot.
Snapshot to use as the base.
```python theme={null}
Image.from_id("snapshot-123")
```
### `Image.from_dockerfile()`
Build the base image using a local Dockerfile.
Path to Dockerfile.
Directory to sync as build context. Defaults to the Dockerfile directory.
```python theme={null}
image = Image.from_dockerfile("./Dockerfile", context_dir="./app").add_python_packages(["uvicorn"])
```
### `Image.add_python_packages()`
Queue pip packages to install during the build. Accepts a list or a path to requirements.txt.
Package names or a `requirements.txt` path.
```python theme={null}
image = Image().add_python_packages(["transformers==4.44.0", "torch==2.4.0"])
```
### `Image.add_commands()`
Shell commands to run during the build in the order added.
Shell commands.
```python theme={null}
image = Image().add_commands(["apt-get update -y", "apt-get install -y ffmpeg"])
```
### `Image.with_envs()`
Add environment variables available during build and at runtime.
One `KEY=VALUE`, a list of them, or a dict.
```python theme={null}
image = Image().with_envs({"HF_HOME": "/models", "HF_HUB_ENABLE_HF_TRANSFER": "1"})
```
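The three accepted input shapes (a single `KEY=VALUE` string, a list of such strings, or a dict) all describe the same mapping. As a minimal sketch of that equivalence, the helper below normalizes each shape into a dict. `normalize_envs` is a hypothetical name used for illustration and does not reflect how the SDK handles the values internally.

```python
def normalize_envs(envs) -> dict[str, str]:
    """Convert a KEY=VALUE string, a list of them, or a dict into one dict."""
    if isinstance(envs, dict):
        return {str(key): str(value) for key, value in envs.items()}
    if isinstance(envs, str):
        envs = [envs]
    result: dict[str, str] = {}
    for item in envs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, separator, value = item.partition("=")
        if not separator or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        result[key] = value
    return result


print(normalize_envs("HF_HOME=/models"))
print(normalize_envs(["HF_HOME=/models", "HF_HUB_ENABLE_HF_TRANSFER=1"]))
print(normalize_envs({"HF_HOME": "/models"}))
```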
### `Image.with_secrets()`
Expose platform secrets to the build environment.
Secret names created via the platform.
```python theme={null}
image = Image().with_secrets(["HF_TOKEN"])
```
### `Image.micromamba()`
Switch package management to micromamba and target a micromamba Python.
```python theme={null}
image = Image(python_version="python3.11").micromamba()
```
### `Image.add_micromamba_packages()`
Install micromamba packages and optional channels.
Package names or a `requirements.txt` path.
Micromamba channels.
```python theme={null}
image = Image().micromamba().add_micromamba_packages(packages=["pandas", "numpy"], channels=["conda-forge"])
```
### `Image.build_with_gpu()`
Request the build to run on a GPU node. Useful when installers detect a GPU and compile CUDA extensions during installation.
GPU type such as `T4`, `A10G`, `H100`, `4090`.
```python theme={null}
image = Image().add_commands(["pip install xformers"]).build_with_gpu("A10G")
```
## `Context`
Context is a dataclass used to store various useful fields you might want to access in your entry point logic.
| Field Name | Type | Default Value | Purpose |
| ---------------- | -------------- | ------------- | ------------------------------------------------------ |
| `container_id` | Optional\[str] | None | Unique identifier for a container |
| `stub_id` | Optional\[str] | None | Identifier for a stub |
| `stub_type` | Optional\[str] | None | Type of the stub (function, endpoint, task queue, etc) |
| `callback_url` | Optional\[str] | None | URL called when the task status changes |
| `task_id` | Optional\[str] | None | Identifier for the specific task |
| `timeout` | Optional\[int] | None | Maximum time allowed for the task to run (seconds) |
| `on_start_value` | Optional\[Any] | None | Any values returned from the `on_start` function |
| `bind_port` | int | 0 | Port number to bind a service to |
| `python_version` | str | "" | Version of Python to be used |
## `Client`
You can use this to track the state of tasks and deployments.
```python theme={null}
from beam import Client, function


@function()
def handler():
    client = Client(
        token="YOUR_TOKEN"
    )

    # Get a deployment by its ID
    deployment = client.get_deployment_by_id("YOUR_DEPLOYMENT_ID")

    # Submit a task
    file = client.upload_file("./app.py")
    task = deployment.submit(input={"audio_file": file})
    task = deployment.submit(input={"text": "Hello world"})

    # Get a deployment by its stub ID
    deployment = client.get_deployment_by_stub_id("YOUR_STUB_ID")
    task = deployment.submit(input={"data": "example"})

    # Get a task by its ID
    task = client.get_task_by_id("YOUR_TASK_ID")
    result = task.result(wait=True)


if __name__ == "__main__":
    handler.remote()
```
Authentication token for the Beam API. If not provided, will use the
`BEAM_TOKEN` environment variable.
### `Client.upload_file()`
Upload a local file to be used as input to a function or deployment.
The path to the local file to upload.
```python theme={null}
client = Client(token="YOUR_TOKEN")
file_url = client.upload_file("./data.csv")
```
### `Client.get_task_by_id()`
Retrieve a task by its task ID.
The task ID to retrieve.
```python theme={null}
client = Client(token="YOUR_TOKEN")
task = client.get_task_by_id("YOUR_TASK_ID")
result = task.result(wait=True)
```
### `Client.get_deployment_by_id()`
Retrieve a deployment using its deployment ID.
The deployment ID to retrieve.
```python theme={null}
client = Client(token="YOUR_TOKEN")
deployment = client.get_deployment_by_id("YOUR_DEPLOYMENT_ID")
task = deployment.submit(input={"text": "Hello world"})
```
### `Client.get_deployment_by_stub_id()`
Retrieve a deployment using the associated stub ID.
The stub ID associated with the deployment.
```python theme={null}
client = Client(token="YOUR_TOKEN")
deployment = client.get_deployment_by_stub_id("YOUR_STUB_ID")
task = deployment.submit(input={"data": "example"})
```
## Sandbox
A sandboxed container for running Python code or arbitrary processes.
You can use this to create isolated environments where you can execute code,
manage files, and run processes.
Whether to block all outbound network access from the sandbox. When enabled,
the sandbox cannot make outbound connections to external services, but inbound
connections to exposed ports are still allowed. Cannot be used together with
`allow_list`.
List of CIDR ranges that the sandbox is allowed to connect to. All other
outbound network access will be blocked. Must use CIDR notation (e.g.,
`"8.8.8.8/32"` for a single IP, `"10.0.0.0/8"` for a range). Supports both
IPv4 and IPv6. Maximum of 10 CIDR entries. Cannot be used together with
`block_network`.
List of ports to expose from the sandbox. When specified, these ports will
be accessible via public URLs immediately upon sandbox creation. You can also
dynamically expose additional ports at runtime using the `expose_port()` method.
Default is an empty list.
### `Sandbox.connect()`
Connect to an existing sandbox instance by ID.
The container ID of the existing sandbox instance.
```python theme={null}
# Connect to an existing sandbox
instance = sandbox.connect("sandbox-123")
```
### `Sandbox.create()`
Create a new sandbox instance.
This method creates a new containerized sandbox environment with the
specified configuration.
```python theme={null}
# Create a new sandbox
instance = Sandbox().create()
print(f"Sandbox created with ID: {instance.sandbox_id()}")
```
### `Sandbox.create_from_memory_snapshot()`
Create a new sandbox instance from a memory snapshot.
This method creates a new containerized sandbox environment with the
specified configuration from a memory snapshot.
```python theme={null}
# Create a new sandbox from a memory snapshot
instance = Sandbox().create_from_memory_snapshot(snapshot_id)
print(f"Sandbox created with ID: {instance.sandbox_id()}")
```
### `Sandbox.debug()`
Print the debug buffer contents to stdout.
This method outputs any debug information that has been collected
during sandbox operations.
## SandboxInstance
A sandbox instance that provides access to the sandbox internals.
This class represents an active sandboxed container and provides methods for
process management, file system operations, preview URLs, and lifecycle
management.
### `SandboxInstance.expose_port()`
Dynamically expose a port to the internet.
This method creates a public URL that allows external access to a specific
port within the sandbox. The URL is SSL-terminated and provides secure
access to services running in the sandbox.
The port number to expose within the sandbox.
```python theme={null}
# Expose port 8000 for a web service
url = instance.expose_port(8000)
print(f"Web service available at: {url}")
```
### `SandboxInstance.list_urls()`
List the URLs / ports that are exposed on the sandbox.
This method returns a list of preview URLs / ports that are exposed on the sandbox.
```python theme={null}
urls = instance.list_urls()
for port, url in urls.items():
    print(f"Port {port} available at: {url}")
print(f"Available URLs: {urls}")
```
### `SandboxInstance.update_network_permissions()`
Update the network permissions of a running sandbox without restart.
This method allows you to dynamically change network access policies while
the sandbox is running. You can block all outbound traffic or specify an
allowlist of CIDR ranges. Exposed ports remain accessible regardless of
these restrictions.
Whether to block all outbound network access from the sandbox. When enabled,
the sandbox cannot make outbound connections to external services, but inbound
connections to exposed ports are still allowed. Cannot be used together with
`allow_list`.
List of CIDR ranges that the sandbox is allowed to connect to. All other
outbound network access will be blocked. Must use CIDR notation (e.g.,
`"8.8.8.8/32"` for a single IP, `"10.0.0.0/8"` for a range). Supports both
IPv4 and IPv6. Maximum of 10 CIDR entries. Cannot be used together with
`block_network=True`.
```python theme={null}
# Block all outbound traffic
instance.update_network_permissions(block_network=True)
# Allow only specific destinations
instance.update_network_permissions(
allow_list=["8.8.8.8/32", "10.0.0.0/8"]
)
# Remove all restrictions
instance.update_network_permissions(block_network=False, allow_list=[])
```
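Because the two parameters are mutually exclusive and the allowlist has format and size limits, it can help to check the arguments locally before calling the method. The sketch below uses only the standard library; `check_network_args` is a hypothetical helper, and tolerating host bits (`strict=False`) is an assumption rather than documented Beam behavior.

```python
import ipaddress
from typing import List, Optional

MAX_CIDR_ENTRIES = 10  # limit stated in the parameter description


def check_network_args(block_network: bool = False,
                       allow_list: Optional[List[str]] = None) -> List[str]:
    """Hypothetical pre-flight check; returns normalized CIDR strings."""
    entries = allow_list or []
    if block_network and entries:
        raise ValueError("block_network and allow_list cannot be combined")
    if len(entries) > MAX_CIDR_ENTRIES:
        raise ValueError(f"allow_list supports at most {MAX_CIDR_ENTRIES} entries")
    normalized = []
    for entry in entries:
        if "/" not in entry:
            raise ValueError(f"{entry!r} is not in CIDR notation")
        # ip_network parses both IPv4 and IPv6 ranges and raises
        # ValueError on malformed input.
        normalized.append(str(ipaddress.ip_network(entry, strict=False)))
    return normalized
```

A call such as `check_network_args(allow_list=["8.8.8.8/32", "10.0.0.0/8"])` returns the validated list, which can then be passed on as `allow_list`.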
### `SandboxInstance.sandbox_id()`
Get the ID of the sandbox.
```python theme={null}
sandbox_id = instance.sandbox_id()
print(f"Working with sandbox: {sandbox_id}")
```
### `SandboxInstance.terminate()`
Terminate the container associated with this sandbox instance.
This method stops the sandbox container and frees up associated resources.
Once terminated, the sandbox instance cannot be used for further operations.
```python theme={null}
# Terminate the sandbox
success = instance.terminate()
if success:
    print("Sandbox terminated successfully")
```
### `SandboxInstance.update_ttl()`
Update the keep warm setting of the sandbox.
This method allows you to change how long the sandbox will remain active
before automatically shutting down.
The number of seconds to keep the sandbox alive. Use -1 for sandboxes that
never timeout.
```python theme={null}
# Keep the sandbox alive for 1 hour
instance.update_ttl(3600)
# Make the sandbox never timeout
instance.update_ttl(-1)
```
### `SandboxInstance.create_image_from_filesystem()`
Create a filesystem snapshot of the current sandbox.
This method captures the filesystem state of the sandbox as an immutable artifact.
You can later restore this snapshot into a new sandbox instance.
```python theme={null}
# Take a filesystem snapshot of the sandbox
image_id = instance.create_image_from_filesystem()
print(f"Created image: {image_id}")
```
### `SandboxInstance.snapshot_memory()`
Create a memory snapshot of the current sandbox.
This method captures the memory state of the sandbox as an immutable artifact.
You can later restore this snapshot into a new sandbox instance.
```python theme={null}
# Take a memory snapshot of the sandbox
snapshot_id = instance.snapshot_memory()
print(f"Created snapshot: {snapshot_id}")
```
## SandboxProcess
Represents a running process within a sandbox.
This class provides control and monitoring capabilities for processes
running in the sandbox. It allows you to wait for completion, kill
processes, check status, and access output streams.
### `SandboxProcess.kill()`
Kill the process.
This method forcefully terminates the running process. Use this
when you need to stop a process that is not responding or when
you want to cancel a long-running operation.
```python theme={null}
process = pm.exec("sleep", "100")
# Kill the process after 5 seconds
import time
time.sleep(5)
process.kill()
```
### `SandboxProcess.logs()`
Returns a combined stream of both stdout and stderr.
This is a convenience property that combines both output streams.
The streams are read concurrently, so if one stream is empty, it won't block
the other stream from being read.
```python theme={null}
process = pm.exec("python3", "-c", "import sys; print('stdout'); print('stderr', file=sys.stderr)")
# Read combined output
for line in process.logs:
    print(f"LOG: {line.strip()}")
# Or read all at once
all_logs = process.logs.read()
```
### `SandboxProcess.status()`
Get the status of the process.
This method returns the current exit code and status string of the process.
An exit code of -1 indicates the process is still running.
```python theme={null}
import time

process = pm.exec("sleep", "5")
# Check status periodically
while True:
    exit_code, status = process.status()
    if exit_code >= 0:
        print(f"Process finished with exit code: {exit_code}")
        break
    time.sleep(1)
```
### `SandboxProcess.stderr`
Get a handle to a stream of the process's stderr.
```python theme={null}
process = pm.exec("python3", "-c", "import sys; print('Error', file=sys.stderr)")
stderr_content = process.stderr.read()
print(f"STDERR: {stderr_content}")
```
### `SandboxProcess.stdout`
Get a handle to a stream of the process's stdout.
```python theme={null}
process = pm.exec("echo", "Hello World")
stdout_content = process.stdout.read()
print(f"STDOUT: {stdout_content}")
```
### `SandboxProcess.wait()`
Wait for the process to complete.
This method blocks until the process finishes execution and returns
the exit code. It polls the process status until completion.
```python theme={null}
process = pm.exec("long_running_command")
exit_code = process.wait()
if exit_code == 0:
    print("Command completed successfully")
```
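Since `wait()` blocks until the process finishes, a bounded wait can be built on top of `status()` instead. The following is a minimal sketch, not part of the SDK: `wait_with_timeout` and `FakeProcess` are hypothetical names, and the status strings returned by the stand-in are invented so the example runs without a sandbox.

```python
import time
from typing import Optional


def wait_with_timeout(process, timeout: float, poll_interval: float = 0.5) -> Optional[int]:
    """Poll process.status() until the process exits or the timeout elapses.

    Returns the exit code, or None if the process is still running at the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        exit_code, _status = process.status()
        if exit_code >= 0:  # -1 means the process is still running
            return exit_code
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


class FakeProcess:
    """Stand-in for a SandboxProcess so the sketch runs locally."""

    def __init__(self, polls_until_done: int):
        self._remaining = polls_until_done

    def status(self):
        if self._remaining > 0:
            self._remaining -= 1
            return -1, "running"  # invented status string
        return 0, "exited"  # invented status string


exit_code = wait_with_timeout(FakeProcess(polls_until_done=2), timeout=5, poll_interval=0.01)
```

When the helper returns `None`, the caller can decide whether to keep waiting or call `process.kill()`.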
## SandboxProcessManager
Manager for executing and controlling processes within a sandbox.
This class provides a high-level interface for running commands and Python
code within the sandbox environment. It supports both blocking and non-blocking
execution, environment variable configuration, and working directory specification.
### `SandboxProcessManager.exec()`
Run an arbitrary command in the sandbox.
This method executes shell commands within the sandbox environment.
The command is executed using the shell available in the sandbox.
The command and its arguments to execute.
The working directory to run the command in. Default is None.
Environment variables to set for the command. Default is None.
```python theme={null}
# Run a simple command
process = pm.exec("ls", "-la")
process.wait()
# Run with custom environment
process = pm.exec("echo", "$CUSTOM_VAR", env={"CUSTOM_VAR": "hello"})
# Run in specific directory
process = pm.exec("pwd", cwd="/tmp")
```
### `SandboxProcessManager.get_process()`
Get a process by its PID.
The process ID to look up.
```python theme={null}
try:
    process = pm.get_process(12345)
    print(f"Found process: {process.pid}")
except SandboxProcessError:
    print("Process not found")
```
### `SandboxProcessManager.list_processes()`
List all processes running in the sandbox.
```python theme={null}
processes = pm.list_processes()
for pid, process in processes.items():
    print(f"Process {pid} is running")
```
### `SandboxProcessManager.run_code()`
Run Python code in the sandbox.
This method executes Python code within the sandbox environment. The code
is executed using the Python interpreter available in the sandbox.
The Python code to execute.
Whether to wait for the process to complete. If True, returns
SandboxProcessResponse. If False, returns SandboxProcess.
The working directory to run the code in. Default is None.
Environment variables to set for the process. Default is None.
```python theme={null}
# Run blocking Python code
result = pm.run_code("print('Hello from sandbox!')")
print(result.result)
# Run non-blocking Python code
process = pm.run_code("import time; time.sleep(10)", blocking=False)
# Do other work while process runs
process.wait()
```
## SandboxProcessResponse
Response object containing the results of a completed process execution.
This class encapsulates the output and status information from a process
that has finished running in the sandbox.
## SandboxProcessStream
A stream-like interface for reading process output in real-time.
This class provides an iterator interface for reading stdout or stderr
from a running process. It buffers output and provides both line-by-line
iteration and bulk reading capabilities.
Example:
```python theme={null}
# Get a process stream
process = pm.exec("echo", "Hello World")
# Read line by line
for line in process.stdout:
    print(f"Output: {line.strip()}")
# Read all output at once
all_output = process.stdout.read()
```
### `SandboxProcessStream()`
### `SandboxProcessStream.read()`
Read all remaining output from the stream.
## SandboxProcessError
## SandboxConnectionError
## SandboxFileInfo
Metadata of a file in the sandbox.
This class provides detailed information about files and directories
within the sandbox filesystem, including permissions, ownership,
and modification times.
## SandboxFileSystem
File system interface for managing files within a sandbox.
This class provides a comprehensive API for file operations within
the sandbox, including uploading, downloading, listing, and managing
files and directories.
### `SandboxFileSystem.create_directory()`
Create a directory in the sandbox.
Note: This method is not yet implemented.
The path where the directory should be created.
### `SandboxFileSystem.delete_directory()`
Delete a directory in the sandbox.
Note: This method is not yet implemented.
The path of the directory to delete.
### `SandboxFileSystem.delete_file()`
Delete a file in the sandbox.
This method removes a file from the sandbox filesystem.
The path to the file within the sandbox.
```python theme={null}
# Delete a temporary file
fs.delete_file("/tmp/temp_file.txt")
# Delete a log file
fs.delete_file("/var/log/old_log.log")
```
### `SandboxFileSystem.download_file()`
Download a file from the sandbox to a local path.
This method downloads a file from the sandbox filesystem and
saves it to the specified local path.
The path to the file within the sandbox.
The destination path on the local filesystem.
```python theme={null}
# Download a log file
fs.download_file("/var/log/app.log", "local_app.log")
# Download to a specific directory
fs.download_file("/output/result.txt", "./results/result.txt")
```
### `SandboxFileSystem.find_in_files()`
Find files matching a pattern in the sandbox.
This method searches for files within the specified directory
that match the given pattern.
The directory path to search in.
The pattern to match files against.
```python theme={null}
# Find all Python files
python_files = fs.find_in_files("/workspace", "*.py")
# Find all log files
log_files = fs.find_in_files("/var/log", "*.log")
```
### `SandboxFileSystem.list_files()`
List the files in a directory in the sandbox.
This method returns information about all files and directories
within the specified directory in the sandbox.
The path to the directory within the sandbox.
```python theme={null}
# List files in the root directory
files = fs.list_files("/")
for file_info in files:
    if file_info.is_dir:
        print(f"Directory: {file_info.name}")
    else:
        print(f"File: {file_info.name} ({file_info.size} bytes)")
# List files in a specific directory
workspace_files = fs.list_files("/workspace")
```
### `SandboxFileSystem.replace_in_files()`
Replace a string in all files in a directory.
This method performs a find-and-replace operation on all files
within the specified directory, replacing occurrences of the
old string with the new string.
The directory path to search in.
The string to find and replace.
The string to replace with.
```python theme={null}
# Replace a configuration value
fs.replace_in_files("/config", "old_host", "new_host")
# Update version numbers
fs.replace_in_files("/app", "1.0.0", "1.1.0")
```
### `SandboxFileSystem.stat_file()`
Get the metadata of a file in the sandbox.
This method retrieves detailed information about a file or directory
within the sandbox, including size, permissions, ownership, and
modification time.
The path to the file within the sandbox.
```python theme={null}
# Get file information
file_info = fs.stat_file("/path/to/file.txt")
print(f"File size: {file_info.size} bytes")
print(f"Is directory: {file_info.is_dir}")
print(f"Modified: {file_info.mod_time}")
```
### `SandboxFileSystem.upload_file()`
Upload a local file to the sandbox.
This method reads a file from the local filesystem and uploads
it to the specified path within the sandbox.
The path to the local file to upload.
The destination path within the sandbox.
```python theme={null}
# Upload a Python script
fs.upload_file("my_script.py", "/workspace/script.py")
# Upload to a subdirectory
fs.upload_file("config.json", "/app/config/config.json")
```
## SandboxFileSystemError
## SandboxFilePosition
A position in a file.
### `SandboxFilePosition()`
## SandboxFileSearchMatch
A match in a file.
### `SandboxFileSearchMatch()`
## SandboxFileSearchRange
A range in a file.
### `SandboxFileSearchRange()`
## `Pod`
A **Pod** is an object that allows you to run arbitrary services in a fast, scalable, and secure remote container on Beam.
You can think of a Pod as a lightweight compute environment that you fully control—complete with a custom container, ports you can expose, environment variables, volumes, secrets, and GPUs.
```python theme={null}
from beam import Pod, Image

# Create a Pod that runs a simple HTTP server
pod = Pod(
    name="web-server",
    cpu=2,
    memory="512Mi",
    image=Image(
        base_image="python:3.9-slim",
        python_packages=["requests"],
    ),
    ports=[8000],  # We'll expose port 8000
)

# Spin up the Pod container, running `python -m http.server 8000`
result = pod.create(entrypoint=["python", "-m", "http.server", "8000"])
print("Container ID:", result.container_id)
print("URL:", result.url)
```
When you run this snippet (e.g., `python app.py`), Beam will:
* Build your container (if necessary) and sync your local files to the remote environment.
* Create a Pod container with the specified resources (2 CPU cores, 512 MiB memory).
* Run `python -m http.server 8000` inside that remote container.
* Expose the container on port 8000. You’ll get back a container ID and a URL to access it.
* Once the Pod is running, you can perform additional operations—like opening an interactive shell inside the container or deploying the Pod as a named app.
The command to run in the container. By default, nothing is specified, so you
must provide an entrypoint to actually run anything. You can override or
provide this entrypoint at creation time using `pod.create(entrypoint=...)`.
A list of ports to expose. If provided, the container will be accessible
through an HTTP URL for each port opened. For example, if `[8000]` is
specified, you'll get `:8000`.
An optional name for the pod. If you plan to deploy this Pod (i.e., treat it
as a persistent app), you should specify a name. If you do not specify a name,
Beam will generate a random name at deploy time, or you must specify
`--name=...` in the CLI.
The amount of CPU allocated to the container. For example, `2` means 2 CPU
cores, `"500m"` might mean half a CPU core, `1.0` means 1 CPU core, etc.
The amount of memory (in MiB) allocated to the container. You can also specify
this as a string with units (e.g., `"512Mi"`, `"2Gi"`).
The type or name of the GPU device to be used for GPU-accelerated tasks. You
can specify multiple GPUs by providing a list (in which case the scheduler
prioritizes their selection based on the order in the list). If no GPU is
required, leave it empty.
The number of GPUs allocated to the container. If a GPU is specified but this
value is set to 0, it will automatically be updated to 1.
The container image to be used for running the Pod. Defaults to a basic Beam
`Image` object, which can be customized (e.g., `base_image=`,
`python_packages=`, and more).
A list of volumes to be mounted into the container. Volumes allow you to
persist data or mount external storage services, such as S3-compatible
buckets.
A list of secrets that are injected into the container as environment
variables. Each secret must be configured in your Beam project.
A dictionary of environment variables to inject into the container.
For example: `{"MY_API_KEY": "abc123"}`.
The number of seconds to keep the container alive after the last request. A
value of `-1` means never scale down to zero (i.e., keep the container running
indefinitely). This only applies if you deploy the Pod.
If `False`, allows the container to be accessed without an auth token. This is
useful for public-facing services. If you need to secure it behind an auth
token, set it to `True`.
Whether to block all outbound network access from the pod. When enabled, the
pod cannot make outbound connections to external services, but inbound
connections to exposed ports are still allowed. Cannot be used together with
`allow_list`.
List of CIDR ranges that the pod is allowed to connect to. All other outbound
network access will be blocked. Must use CIDR notation (e.g., `"8.8.8.8/32"`
for a single IP, `"10.0.0.0/8"` for a range). Supports both IPv4 and IPv6.
Maximum of 10 CIDR entries. Cannot be used together with `block_network`.
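The `memory` setting described above accepts either an integer number of MiB or a string with units. As a rough illustration of how the two forms relate, the sketch below converts both to MiB. `memory_to_mib` is a hypothetical helper, not a Beam function, and it only handles the `Mi` and `Gi` suffixes that appear in this reference.

```python
from typing import Union

# Unit suffixes used in this reference, expressed in MiB.
_UNITS_IN_MIB = {"Mi": 1, "Gi": 1024}


def memory_to_mib(memory: Union[int, str]) -> int:
    """Hypothetical helper: convert a memory setting to an integer MiB count."""
    if isinstance(memory, int):
        return memory  # plain integers are already in MiB
    for suffix, factor in _UNITS_IN_MIB.items():
        if memory.endswith(suffix):
            return int(memory[: -len(suffix)]) * factor
    raise ValueError(f"Unrecognized memory value: {memory!r}")
```

Under these assumptions, `memory=512` and `memory="512Mi"` describe the same allocation, and `"2Gi"` corresponds to 2048 MiB.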
#### `Create`
Create a new container that runs until it completes or is explicitly killed.
```python theme={null}
from beam import Pod

pod = Pod(cpu=2, memory="1Gi", ports=[8080])
result = pod.create(entrypoint=["python", "-m", "http.server", "8080"])

if result.ok:
    print("Pod created successfully!")
    print("Container ID:", result.container_id)
    print("URL:", result.url)
else:
    print("Failed to create Pod")
```
#### `Deploy`
Deploy the Pod as a named persistent service. Pods can be deployed programmatically via Python or through the CLI.
**Deploying via Python**
```python app.py theme={null}
from beam import Pod

pod = Pod(
    name="my-deployed-pod",
    cpu=2,
    memory="1Gi",
    ports=[8000],
    entrypoint=["python", "-m", "http.server", "8000"],
)

# Deploy the Pod
ok = pod.deploy()
if ok:
    print("Pod successfully deployed!")
else:
    print("Pod deployment failed!")
```
```sh theme={null}
python app.py
```
**Deploying via CLI**
```python app.py theme={null}
from beam import Pod

pod = Pod(
    name="my-deployed-pod",
    cpu=2,
    memory="1Gi",
    ports=[8000],
    entrypoint=["python", "-m", "http.server", "8000"],
)
```
```sh theme={null}
beam deploy app.py:pod
```
## `Function`
Decorator for defining a remote function.
This method allows you to run the decorated function in a remote container.
```python Function theme={null}
from beam import Image, function


@function(
    cpu=1.0,
    memory=128,
    gpu="T4",
    image=Image(python_packages=["torch"]),
    keep_warm_seconds=1000,
)
def transcribe(filename: str):
    print(filename)
    return "some_result"


# Call the function in a remote container
transcribe.remote("some_file.mp4")

# Map the function over several inputs
# Each of these inputs will be routed to remote containers
for result in transcribe.map(["file1.mp4", "file2.mp4"]):
    print(result)
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
MiB, or as a string with units (e.g., "1Gi").
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty. Multiple GPUs can be
specified as a list.
The container image used for task execution.
The maximum number of seconds a task can run before timing out. Set to -1 to
disable the timeout.
The maximum number of times a task will be retried if the container crashes.
An optional URL to send a callback to when a task is completed, timed out, or
cancelled.
A list of storage volumes to be associated with the function.
A list of secrets that are injected into the container as environment
variables.
An optional name for this function, used during deployment. If not specified,
you must specify the name at deploy time with the `--name` argument.
The task policy for the function. This helps manage the lifecycle of an
individual task. Setting values here will override timeout and retries.
A list of exceptions that will trigger a retry.
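To make the interaction between the retry count and the list of retryable exceptions concrete, here is a purely local illustration. It does not use Beam: `run_with_retries` is a hypothetical helper, and the parameter names `retries` and `retry_for` are assumptions chosen for the sketch.

```python
from typing import Callable, Sequence, Type


def run_with_retries(task: Callable[[], object],
                     retries: int,
                     retry_for: Sequence[Type[BaseException]]):
    """Local illustration: re-run `task` when it raises a listed exception."""
    attempts = 0
    while True:
        try:
            return task()
        except tuple(retry_for):
            if attempts >= retries:
                raise  # retry budget exhausted; surface the last error
            attempts += 1


calls = {"count": 0}


def flaky():
    # Fails twice with a retryable error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("transient failure")
    return "done"


outcome = run_with_retries(flaky, retries=3, retry_for=[TimeoutError])
```

Exceptions that are not in the list propagate immediately, so only failures you consider transient consume the retry budget.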
#### `Remote`
You can run any function remotely on Beam by using the `.remote()` method:
```python theme={null}
from beam import function


@function(cpu=8)
def square(i: int):
    return i**2


if __name__ == "__main__":
    # Run the `square` function remotely on Beam
    result = square.remote(i=12)
    print(result)
```
The code above is invoked by running `python example.py`:
```bash theme={null}
(.venv) user@MacBook % python example.py
=> Building image
=> Using cached image
=> Syncing files
=> Files synced
=> Running function:
=> Function complete <908c76b1-ee68-4b33-ac3a-026ae646625f>
144
```
#### `Map`
You can scale out workloads to many containers using the `.map()` method. You might use this for parallelizing computational-heavy tasks, such as batch inference or data processing jobs.
```python theme={null}
from beam import function


@function(cpu=8)
def square(i: int):
    return i**2


def main():
    numbers = list(range(10))
    squared = []

    # Run a remote container for every item in list
    for result in square.map(numbers):
        squared.append(result)


if __name__ == "__main__":
    main()
```
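If you want to reason about the fan-out pattern without launching containers, a local analogue can be written with the standard library. This is only a stand-in: `local_map` is a hypothetical helper that uses threads instead of remote containers, and whether Beam's `.map()` yields results in input order is not stated in this reference.

```python
from concurrent.futures import ThreadPoolExecutor


def square(i: int) -> int:
    return i**2


def local_map(fn, inputs, max_workers: int = 4):
    """Local stand-in for a fan-out: run `fn` once per input in worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        yield from pool.map(fn, inputs)


squared = list(local_map(square, range(10)))
print(squared)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The calling code has the same shape as the Beam example: iterate over the mapped results and collect them.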
## `Schedule`
This method allows you to schedule the decorated function to run at specific intervals defined by a cron expression.
```python theme={null}
from beam import schedule
@schedule(when="*/5 * * * *", name="scheduled-job")
def task():
    print("Hi, from scheduled task!")
```
The cron expression or predefined schedule that determines when the task will run.
This parameter defines the interval or specific time when the task should execute.
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
megabytes (e.g., 128 for 128 megabytes).
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty.
The container image used for the task execution.
The maximum number of seconds a task can run before it times out. Default is 180.
Set it to -1 to disable the timeout.
The number of concurrent tasks to handle per container. Modifying this
parameter can improve throughput for certain workloads. Workers will share the
CPU, Memory, and GPU defined. You may need to increase these values to
increase concurrency.
The maximum number of tasks that can be pending in the queue. If the number of
pending tasks exceeds this value, the task queue will stop accepting new
tasks.
An optional URL to send a callback to when a task is completed, timed out, or
cancelled.
The maximum number of times a task will be retried if the container crashes.
A list of volumes to be mounted to the container.
A list of secrets that are injected into the container as environment
variables.
An optional name for this scheduled function, used during deployment. If not specified,
you must specify the name at deploy time with the `--name` argument.
**Scheduling Options**
| **Predefined Schedule** | **Description** | **Cron Expression** |
| -------------------------- | ---------------------------------------------------------- | ------------------- |
| `@yearly` (or `@annually`) | Run once a year at midnight on January 1st | `0 0 1 1 *` |
| `@monthly` | Run once a month at midnight on the first day of the month | `0 0 1 * *` |
| `@weekly` | Run once a week at midnight on Sunday | `0 0 * * 0` |
| `@daily` (or `@midnight`) | Run once a day at midnight | `0 0 * * *` |
| `@hourly` | Run once an hour at the beginning of the hour | `0 * * * *` |
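
The predefined schedules are shorthand for ordinary five-field cron expressions. The sketch below restates the table as a lookup; `to_cron` is a hypothetical helper for illustration and is not part of the Beam SDK.

```python
# Predefined schedules and their cron equivalents, taken from the table above.
PREDEFINED_SCHEDULES = {
    "@yearly": "0 0 1 1 *",
    "@annually": "0 0 1 1 *",
    "@monthly": "0 0 1 * *",
    "@weekly": "0 0 * * 0",
    "@daily": "0 0 * * *",
    "@midnight": "0 0 * * *",
    "@hourly": "0 * * * *",
}


def to_cron(when: str) -> str:
    """Hypothetical helper: expand a predefined schedule to a cron expression."""
    expr = PREDEFINED_SCHEDULES.get(when, when)
    # A standard cron expression has five whitespace-separated fields.
    if len(expr.split()) != 5:
        raise ValueError(f"Expected five cron fields, got: {when!r}")
    return expr
```

For example, `to_cron("@daily")` gives `0 0 * * *`, while an explicit expression such as `*/5 * * * *` passes through unchanged.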
## `Endpoint`
Decorator used for deploying a web endpoint.
```python theme={null}
from beam import endpoint, Image
@endpoint(
cpu=1.0,
memory=128,
gpu="T4",
image=Image(python_packages=["torch"]),
keep_warm_seconds=1000,
)
def multiply(**inputs):
    result = inputs["x"] * 2
    return {"result": result}
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
megabytes (e.g., 128 for 128 megabytes).
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty.
The container image used for the task execution.
The maximum number of seconds a task can run before it times out. Default is 180.
Set it to -1 to disable the timeout.
The number of concurrent tasks to handle per container. Modifying this
parameter can improve throughput for certain workloads. Workers will share the
CPU, Memory, and GPU defined. You may need to increase these values to
increase concurrency.
The duration in seconds to keep the task queue warm even if there are no
pending tasks. Keeping the queue warm helps to reduce the latency when new
tasks arrive. Default is 10s.
The maximum number of tasks that can be pending in the queue. If the number of
pending tasks exceeds this value, the task queue will stop accepting new
tasks.
A function that runs when the container first starts. The return values of the
`on_start` function can be retrieved by passing a `context` argument to your
handler function.
A list of volumes to be mounted to the container.
A list of secrets that are injected into the container as environment
variables.
An optional name for this endpoint, used during deployment. If not specified,
you must specify the name at deploy time with the `--name` argument.
If false, allows the endpoint to be invoked without an auth token.
The maximum number of times a task will be retried if the container crashes.
Capture a memory snapshot of the running container after `on_start` completes,
speeding up cold boot. Initial checkpoints can take up to 3 minutes to
capture, and 5 minutes to distribute among our servers.
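When an endpoint requires authorization, callers have to attach the auth token to each request. The sketch below builds such a request with the standard library without sending it. Sending the token as an `Authorization: Bearer ...` header is an assumption here, `build_endpoint_request` is a hypothetical helper, and the URL is a placeholder rather than a real deployment URL.

```python
import json
import urllib.request


def build_endpoint_request(url: str, payload: dict, token: str = "") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a deployed endpoint."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the auth token is sent as a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder URL; substitute the URL shown for your own deployment.
request = build_endpoint_request("https://example.com/multiply", {"x": 21}, token="YOUR_TOKEN")
```

Passing the resulting object to `urllib.request.urlopen` would send it; for an endpoint that allows unauthenticated calls, the `token` argument can be omitted.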
#### `Serve`
[`beam serve`](/v2/reference/cli#serve) monitors changes in your local file system, live-reloads the remote environment as you work, and forwards remote container logs to your local shell.
Serve is great for prototyping. You can develop in a containerized cloud environment in real time, with adjustable CPU, memory, and GPU resources.
It's also great for testing an app before deploying it. Served functions are orchestrated identically to deployments, which means you can test your Beam workflow end-to-end before deploying.
To start an ephemeral `serve` session, you'll use the `serve` command:
```sh theme={null}
beam serve app.py
```
Sessions end automatically after 10 minutes of inactivity.
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
## `Task Queue`
Decorator for defining a task queue.
This method allows you to create a task queue out of the decorated function.
The tasks are executed asynchronously. You can interact with the task queue either through an API (when deployed), or directly in Python through the `.put()` method.
```python Task Queue theme={null}
from beam import Image, task_queue
# Define the task queue
@task_queue(
cpu=1.0,
memory=128,
gpu="T4",
image=Image(python_packages=["torch"]),
keep_warm_seconds=1000,
)
def transcribe(filename: str):
    return {}


transcribe.put("some_file.mp4")
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
megabytes (e.g., 128 for 128 megabytes).
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty.
The container image used for the task execution.
The maximum number of seconds a task can run before it times out. Default is 180.
Set it to -1 to disable the timeout.
The number of concurrent tasks to handle per container. Modifying this
parameter can improve throughput for certain workloads. Workers will share the
CPU, Memory, and GPU defined. You may need to increase these values to
increase concurrency.
The duration in seconds to keep the task queue warm even if there are no
pending tasks. Keeping the queue warm helps to reduce the latency when new
tasks arrive. Default is 10s.
The maximum number of tasks that can be pending in the queue. If the number of
pending tasks exceeds this value, the task queue will stop accepting new
tasks.
An optional URL to send a callback to when a task is completed, timed out, or
cancelled.
The maximum number of times a task will be retried if the container crashes.
A list of volumes to be mounted to the container.
A list of secrets that are injected into the container as environment
variables.
An optional name for this task queue, used during deployment. If not specified,
you must specify the name at deploy time with the `--name` argument.
A list of exceptions that will trigger a retry.
Capture a memory snapshot of the running container after `on_start` completes,
speeding up cold boot. Initial checkpoints can take up to 3 minutes to
capture, and 5 minutes to distribute among our servers.
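The pending-task limit described above behaves like a bounded queue: once it is full, new work is refused until something is taken off. The following local sketch illustrates that idea with the standard library. `PendingTasks` is a hypothetical class, and how Beam reports a rejected task to the caller is not covered in this reference.

```python
import queue


class PendingTasks:
    """Local illustration of a queue that refuses work once it is full."""

    def __init__(self, max_pending_tasks: int):
        self._queue = queue.Queue(maxsize=max_pending_tasks)

    def put(self, task) -> bool:
        """Return True if the task was accepted, False if the queue is full."""
        try:
            self._queue.put_nowait(task)
            return True
        except queue.Full:
            return False

    def get(self):
        """Remove and return the oldest pending task."""
        return self._queue.get_nowait()


pending = PendingTasks(max_pending_tasks=2)
accepted = [pending.put(name) for name in ("a.mp4", "b.mp4", "c.mp4")]
print(accepted)  # [True, True, False]
```

Producers that see a rejection can back off and retry later, which is one reason to size the limit against the expected burst of incoming tasks.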
#### `Serve`
[`beam serve`](/v2/reference/cli#serve) monitors changes in your local file system, live-reloads the remote environment as you work, and forwards remote container logs to your local shell.
Serve is great for prototyping. You can develop in a containerized cloud environment in real time, with adjustable CPU, memory, and GPU resources.
It's also great for testing an app before deploying it. Served functions are orchestrated identically to deployments, which means you can test your Beam workflow end-to-end before deploying.
To start an ephemeral `serve` session, you'll use the `serve` command:
```sh theme={null}
beam serve app.py
```
Sessions end automatically after 10 minutes of inactivity.
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
## `ASGI`
Decorator used for creating and deploying an ASGI application.
```python theme={null}
from beam import asgi, Image


@asgi(
    cpu=1.0,
    memory=128,
    gpu="T4",
    image=Image(python_packages=["fastapi"]),
    keep_warm_seconds=10,
    max_pending_tasks=100,
)
def web_server(context):
    from fastapi import FastAPI

    app = FastAPI()

    @app.post("/hello")
    async def something():
        return {"hello": True}

    @app.post("/warmup")
    async def warmup():
        return {"status": "warm"}

    return app
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
MiB, or as a string with units (e.g., "1Gi").
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty.
The container image used for task execution.
A list of volumes to be mounted to the container.
The maximum number of seconds a task can run before timing out. Set to -1 to
disable the timeout.
The maximum number of times a task will be retried if the container crashes.
The number of processes handling tasks per container. Workers share CPU,
memory, and GPU resources.
The maximum number of concurrent requests the ASGI application can handle.
The duration in seconds to keep the task queue warm when there are no pending
tasks.
The maximum number of tasks that can be pending in the queue.
A list of secrets injected into the container as environment variables.
An optional name for this ASGI application, used during deployment.
If false, allows the ASGI application to be invoked without an auth token.
Configure deployment autoscaling using various strategies.
An optional URL to send a callback when a task is completed, timed out, or
canceled.
The task policy for the function, overriding timeout and retries.
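The memory setting above accepts either a number of MiB or a string with units such as `"1Gi"`. The sketch below shows one way such values could be normalized to MiB. The helper `memory_to_mib` is hypothetical and is not part of the Beam SDK. Only the `Gi` form appears in this reference, so treating `Mi` and `Ti` the same way is an assumption.

```python
import re

# Binary unit multipliers expressed in MiB. Only "Gi" is shown in the
# reference text; "Mi" and "Ti" are assumed to follow the same convention.
_UNITS_IN_MIB = {"Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}


def memory_to_mib(memory) -> int:
    """Convert an int (already MiB) or a string such as "1Gi" to MiB."""
    if isinstance(memory, int):
        return memory
    match = re.fullmatch(r"(\d+)(Mi|Gi|Ti)", memory.strip())
    if match is None:
        raise ValueError(f"unrecognized memory value: {memory!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS_IN_MIB[unit]


print(memory_to_mib(128))      # 128
print(memory_to_mib("1Gi"))    # 1024
print(memory_to_mib("512Mi"))  # 512
```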
#### `Serve`
[`beam serve`](/v2/reference/cli#serve) monitors changes in your local file system, live-reloads the remote environment as you work, and forwards remote container logs to your local shell.
Serve is great for prototyping. You can develop in a containerized cloud environment in real time, with adjustable CPU, memory, and GPU resources.
It's also great for testing an app before deploying it. Served functions are orchestrated identically to deployments, which means you can test your Beam workflow end-to-end before deploying.
To start an ephemeral `serve` session, you'll use the `serve` command:
```sh theme={null}
beam serve app.py
```
Sessions end automatically after 10 minutes of inactivity.
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
## `Realtime`
Decorator for creating a real-time application built on top of ASGI/websockets.\
The handler function runs every time a message is received over the websocket.
```python theme={null}
from beam import realtime


def generate_text():
    return ["this", "could", "be", "anything"]


@realtime(
    cpu=1.0,
    memory=128,
    gpu="T4",
)
def handler(context):
    return generate_text()
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
MiB, or as a string with units (e.g., "1Gi").
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU is required, leave it empty.
The container image used for task execution.
A list of volumes to be mounted to the ASGI application.
The maximum number of seconds a task can run before timing out. Set to -1 to
disable the timeout.
The number of processes handling tasks per container. Workers share CPU,
memory, and GPU resources.
The maximum number of concurrent requests the ASGI application can handle.
This allows processing multiple requests concurrently.
The duration in seconds to keep the task queue warm even if there are no
pending tasks.
The maximum number of tasks that can be pending in the queue.
A list of secrets injected into the container as environment variables.
An optional name for this ASGI application, used during deployment. If not
specified, you must provide the name during deployment.
If false, allows the ASGI application to be invoked without an auth token.
Configure a deployment autoscaler to scale the function horizontally using
various autoscaling strategies.
An optional URL to send a callback to when a task is completed, timed out, or
canceled.
#### `Serve`
[`beam serve`](/v2/reference/cli#serve) monitors changes in your local file system, live-reloads the remote environment as you work, and forwards remote container logs to your local shell.
Serve is great for prototyping. You can develop in a containerized cloud environment in real time, with adjustable CPU, memory, and GPU resources.
It's also great for testing an app before deploying it. Served functions are orchestrated identically to deployments, which means you can test your Beam workflow end-to-end before deploying.
To start an ephemeral `serve` session, you'll use the `serve` command:
```sh theme={null}
beam serve app.py
```
Sessions end automatically after 10 minutes of inactivity.
By default, Beam will sync all the files in your working directory to the
remote container. This allows you to use the files you have locally while
developing. If you want to prevent some files from getting uploaded, you can
create a [`.beamignore`](/v2/reference/cli#ignore-local-files).
### `Function`
Decorator for defining a remote function.
This method allows you to run the decorated function in a remote container.
```python Function theme={null}
from beam import Image, function


@function(
    cpu=1.0,
    memory=128,
    gpu="T4",
    image=Image(python_packages=["torch"]),
)
def transcribe(filename: str):
    print(filename)
    return "some_result"


# Call the function in a remote container
transcribe.remote("some_file.mp4")

# Map the function over several inputs
# Each of these inputs will be routed to remote containers
for result in transcribe.map(["file1.mp4", "file2.mp4"]):
    print(result)
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
MiB, or as a string with units (e.g., "1Gi").
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty. Multiple GPUs can be
specified as a list.
The container image used for task execution.
The maximum number of seconds a task can run before timing out. Set to -1 to
disable the timeout.
The maximum number of times a task will be retried if the container crashes.
An optional URL to send a callback to when a task is completed, timed out, or
cancelled.
A list of storage volumes to be associated with the function.
A list of secrets that are injected into the container as environment
variables.
An optional name for this function, used during deployment. If not specified,
you must specify the name at deploy time with the `--name` argument.
The task policy for the function. This helps manage the lifecycle of an
individual task. Setting values here will override timeout and retries.
A list of exceptions that will trigger a retry.
Determines whether the function continues running in the background after the
client disconnects.
## `Bot`
Decorator for defining a bot with multiple states and transitions.
The `bot` decorator allows you to define a bot with specific states (locations) and transitions. These bots run as distributed, stateful workflows, where each transition is executed in a remote container.
```python theme={null}
from beam import Bot, BotContext, BotLocation, Image
from pydantic import BaseModel


# Define input and output types for the bot
class ProductName(BaseModel):
    product_name: str


class URL(BaseModel):
    url: str


class ReviewPage(BaseModel):
    review_page: str


# Create the bot
bot = Bot(
    model="gpt-4o",
    api_key="YOUR_API_KEY",
    locations=[
        BotLocation(marker=ProductName),
        BotLocation(marker=URL, expose=False),
        BotLocation(marker=ReviewPage, expose=False),
    ],
    description="This bot retrieves product reviews and summarizes them.",
)


# Define a transition
@bot.transition(
    inputs={ProductName: 1},
    outputs=[URL],
    description="Retrieve Google Shopping URLs for a product",
    cpu=1,
    memory=128,
    image=Image(python_packages=["serpapi", "google-search-results"]),
)
def get_product_urls(context: BotContext, inputs):
    product_name = inputs[ProductName][0].product_name
    # Perform some action
    return {URL: [URL(url="https://example.com")]}
```
The underlying language model (e.g., `"gpt-4o"`) used by the bot.
The OpenAI API key used to authenticate requests to OpenAI.
A list of `BotLocation` objects defining the bot's states. Each location
corresponds to a type (e.g., `BaseModel`) that the bot operates on.
A human-readable description of the bot's purpose.
Specifies whether the bot requires an auth token passed to invoke it.
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
megabytes (e.g., 128 for 128 megabytes).
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU required, leave it empty.
The container image used for task execution.
The maximum number of seconds a task can run before it times out. Default is 180.
Set it to -1 to disable the timeout.
The number of concurrent tasks to handle per container. Modifying this
parameter can improve throughput for certain workloads. Workers will share the
CPU, Memory, and GPU defined. You may need to increase these values to
increase concurrency.
The duration in seconds to keep the task queue warm even if there are no
pending tasks. Keeping the queue warm helps to reduce the latency when new
tasks arrive. Default is 10s.
The maximum number of tasks that can be pending in the queue. If the number of
pending tasks exceeds this value, the task queue will stop accepting new
tasks.
A function that runs when the container first starts. The return values of the
`on_start` function can be retrieved by passing a `context` argument to your
handler function.
A list of volumes to be mounted to the container.
A list of secrets that are injected into the container as environment
variables.
An optional name for this endpoint, used during deployment. If not specified,
you must specify the name at deploy time with the `--name` argument.
If false, allows the endpoint to be invoked without an auth token.
The maximum number of times a task will be retried if the container crashes.
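To make the `inputs={ProductName: 1}` and `outputs=[URL]` arguments in the transition example more concrete, here is a minimal, self-contained sketch of Petri-net firing semantics in plain Python. It does not use the Beam SDK, and `Net` and `Transition` are hypothetical names. It only illustrates the rule that a transition fires when each input location holds at least the required number of markers, consuming them and producing markers in its output locations.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transition:
    name: str
    inputs: Dict[str, int]  # location name -> number of markers required
    outputs: List[str]      # locations that receive the produced markers
    action: Callable[[Dict[str, list]], Dict[str, list]]


@dataclass
class Net:
    markers: Dict[str, list] = field(default_factory=lambda: defaultdict(list))

    def enabled(self, t: Transition) -> bool:
        # A transition can fire only when every input location holds enough markers.
        return all(len(self.markers[loc]) >= n for loc, n in t.inputs.items())

    def fire(self, t: Transition) -> bool:
        if not self.enabled(t):
            return False
        # Consume the required markers from each input location.
        consumed = {
            loc: [self.markers[loc].pop(0) for _ in range(n)]
            for loc, n in t.inputs.items()
        }
        # Place whatever the action produces into the output locations.
        produced = t.action(consumed)
        for loc in t.outputs:
            self.markers[loc].extend(produced.get(loc, []))
        return True


get_urls = Transition(
    name="get_product_urls",
    inputs={"ProductName": 1},
    outputs=["URL"],
    action=lambda consumed: {
        "URL": [f"https://example.com/{consumed['ProductName'][0]}"]
    },
)

net = Net()
print(net.fire(get_urls))  # False: no ProductName marker yet
net.markers["ProductName"].append("red-shoes")
print(net.fire(get_urls))  # True
print(dict(net.markers))   # {'ProductName': [], 'URL': ['https://example.com/red-shoes']}
```

In this sketch, the first `fire` call does nothing because the input location is empty. Once a marker arrives, the transition consumes it and emits a marker into the `URL` location.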
# Autoscaling
### `QueueDepthAutoscaler`
Adds an autoscaler to an app.
```python theme={null}
from beam import endpoint, QueueDepthAutoscaler


@endpoint(
    autoscaler=QueueDepthAutoscaler(
        min_containers=1, max_containers=3, tasks_per_container=1
    ),
)
def handler():
    return {"success": "true"}
```
The number of containers to keep running at baseline. The containers will
continue running until the deployment is stopped.
The maximum number of tasks that can be queued up to a single container. This can
help manage throughput and cost of compute. When `tasks_per_container` is
0, a container can process any number of tasks.
The maximum number of containers that the autoscaler can create. It defines an
upper limit to avoid excessive resource consumption.
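The three settings above interact in a simple way: queue depth divided by the per-container task limit gives a target, which is then clamped between the minimum and maximum container counts. The function below is a sketch of that arithmetic under those assumptions. `desired_containers` is a hypothetical helper, and Beam's actual scaling algorithm is not described in this reference.

```python
import math


def desired_containers(
    pending_tasks: int,
    tasks_per_container: int,
    min_containers: int,
    max_containers: int,
) -> int:
    """Target container count for a queue depth, clamped to the configured bounds."""
    if tasks_per_container <= 0:
        # A limit of 0 means a container may take any number of tasks, so in
        # this sketch queue depth alone never asks for more than one container.
        needed = 1 if pending_tasks > 0 else 0
    else:
        needed = math.ceil(pending_tasks / tasks_per_container)
    return max(min_containers, min(needed, max_containers))


# Same bounds as the example above: min 1, max 3, one task per container.
print(desired_containers(0, 1, 1, 3))   # 1 (baseline is kept running)
print(desired_containers(2, 1, 1, 3))   # 2
print(desired_containers(10, 1, 1, 3))  # 3 (capped at the maximum)
```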
# Data Structures
### `Simple Queue`
Creates a Queue instance.
Use this as a concurrency-safe distributed queue, accessible both locally and within remote containers.
Serialization is done using cloudpickle, so any object that cloudpickle supports should work here. The interface is that of a standard Python queue.
Because this is backed by a distributed queue, it will persist between runs.
```python Simple Queue theme={null}
from beam import Queue
val = [1, 2, 3]
# Initialize the Queue
q = Queue(name="myqueue")
for i in range(100):
    # Insert something to the queue
    q.put(val)
while not q.empty():
    # Remove something from the queue
    val = q.pop()
    print(val)
```
The name of the queue (any arbitrary string).
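Because values are serialized when they are put on the queue, a consumer receives a copy rather than the original object. The sketch below illustrates that consequence with a local stand-in. `LocalPickleQueue` is hypothetical, it uses the standard library's `pickle` in place of cloudpickle, and it assumes first-in, first-out ordering. It is not the Beam `Queue` class.

```python
import pickle
from collections import deque


class LocalPickleQueue:
    """Local stand-in that stores serialized bytes, as a remote-backed queue would."""

    def __init__(self) -> None:
        self._items = deque()

    def put(self, value) -> None:
        # The value is snapshotted at this point.
        self._items.append(pickle.dumps(value))

    def pop(self):
        return pickle.loads(self._items.popleft())

    def empty(self) -> bool:
        return not self._items


q = LocalPickleQueue()
val = [1, 2, 3]
q.put(val)
val.append(4)     # mutating after put() does not change the queued copy
print(q.pop())    # [1, 2, 3]
print(q.empty())  # True
```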
### Map
Creates a Map Instance.
Use this as a concurrency-safe key/value store, accessible both locally and within
remote containers.
Serialization is done using cloudpickle, so any object that cloudpickle supports should work here. The interface is that of a standard Python dictionary.
Because this is backed by a distributed dictionary, it will persist between runs.
```python Map theme={null}
from beam import Map
# Name the map
m = Map(name="test")
# Set a key
m["some_key"] = True
# Delete a key
del m["some_key"]
# Iterate through the map
for k, v in m.items():
    print("key: ", k)
    print("value: ", v)
```
The name of the map (any arbitrary string).
# Storage
Beam allows you to create highly-available storage volumes that can be used across tasks. You might use volumes for things like storing model weights or large datasets.
### `Volume`
Creates a Volume instance.
When your container runs, your volume will be available at `./{mount_path}` and `/volumes/{name}`.
```python theme={null}
from beam import function, Volume

VOLUME_PATH = "./model_weights"


@function(
    volumes=[Volume(name="model-weights", mount_path=VOLUME_PATH)],
)
def load_model():
    from transformers import AutoModel

    # Load model from cloud storage cache
    AutoModel.from_pretrained(VOLUME_PATH)
```
The name of the volume, a descriptive identifier for the data volume.
The path where the volume is mounted within the container environment.
### `CloudBucket`
Creates a CloudBucket instance.
When your container runs, your cloud bucket will be available at `./{mount_path}` and `/volumes/{name}`.
```python theme={null}
from beam import CloudBucket, CloudBucketConfig, function

# Cloud Bucket
weights = CloudBucket(
    name="weights",
    mount_path="./weights",
    config=CloudBucketConfig(
        access_key="my-access-key",
        secret_key="my-secret-key",
        endpoint="https://s3-endpoint.com",
    ),
)


@function(volumes=[weights])
def my_function():
    pass
```
The name of the cloud bucket. It must be the same as the bucket name in the cloud
provider.
The path where the cloud bucket is mounted within the container environment.
Configuration for the cloud bucket.
## `CloudBucketConfig`
Configuration for a cloud bucket.
```python theme={null}
from beam import CloudBucketConfig
config = CloudBucketConfig(
    read_only=False,
    access_key="my-access-key",
    secret_key="my-secret-key",
    endpoint="https://s3-endpoint.com",
    region="us-west-2",
)
```
Whether the volume is read-only.
The beam secret name for the S3 access key for the external provider.
The beam secret name for the S3 secret key for the external provider.
The S3 endpoint for the external provider.
The region for the external provider.
## `Output`
A file that a task has created.
Use this to save a file that you may want to share later.
```python theme={null}
from beam import Image as BeamImage, Output, function


@function(
    image=BeamImage(
        python_packages=[
            "pillow",
        ],
    ),
)
def save_image():
    from PIL import Image as PILImage

    # Generate a 100x100 white PIL image
    pil_image = PILImage.new("RGB", (100, 100), color="white")

    # Save image file
    output = Output.from_pil_image(pil_image)
    output.save()

    # Retrieve pre-signed URL for output file
    url = output.public_url(expires=400)
    print(url)

    # Print other details about the output
    print(f"Output ID: {output.id}")
    print(f"Output Path: {output.path}")
    print(f"Output Stats: {output.stat()}")
    print(f"Output Exists: {output.exists()}")

    return {"image": url}


if __name__ == "__main__":
    save_image()
```
When you run this function, it will return a pre-signed URL to the image:
```bash theme={null}
https://app.stage.beam.cloud/output/id/abe0c95a-2cd1-40b3-bace-9225f2c79c6d
```
The length of time the pre-signed URL will be available for. The file will be
automatically deleted after this period.
#### Files
Saving a file and generating a public URL.
```python theme={null}
myfile = "path/to/my.txt"
output = Output(path=myfile)
output.save()
output_url = output.public_url()
```
#### PIL Images
Saving a `PIL.Image` object.
```python theme={null}
image = pipe( ... )
output = Output.from_pil_image(image)
output.save()
```
#### Directories
Saving a directory.
```python theme={null}
from pathlib import Path

mydir = Path("/volumes/myvol/mydir")  # or use a str
output = Output(path=mydir)
output.save()
```
## Experimental
### `Signal`
Creates a Signal instance. Signals can be used to notify a container to perform specific actions using a flag.
For example, signals can reload global state, send a webhook, or terminate the container.
This is a great tool for automated retraining and deployment.
```python theme={null}
some_global_model = None


def load_model():
    global some_global_model
    some_global_model = LoadIt()


# Setting up a consumer of a signal
s = Signal(name="reload-model", handler=load_model, clear_after_interval=5)


@endpoint(on_start=load_model)
def handler(**kwargs):
    global some_global_model
    return some_global_model(kwargs["param1"])


# Trigger load_model to execute again while the container is still running
s = Signal(name="reload-model")
s.set(ttl=60)
```
The name of the signal.
A function to be called when the signal is set. If not provided, no handler
will be executed.
The number of seconds after which the signal will be automatically cleared if
both `handler` and `clear_after_interval` are set.
## Integrations
### `vllm`
A wrapper around the vLLM library that allows you to deploy it as an ASGI app.
```python theme={null}
from beam import integrations
e = integrations.VLLMArgs()
e.device = "cpu"
e.chat_template = "./chatml.jinja"
vllm_app = integrations.VLLM(name="vllm-abstraction-1", vllm_args=e)
```
The number of CPU cores allocated to the container.
The amount of memory allocated to the container. It should be specified in
MiB, or as a string with units (e.g., "1Gi").
The type or name of the GPU device to be used for GPU-accelerated tasks. If
not applicable or no GPU is required, leave it empty.
The container image used for task execution. This will include an
`add_python_packages` call with `["fastapi", "vllm", "huggingface_hub"]` added
to ensure vLLM can run.
The number of workers to run in the container.
The maximum number of concurrent requests the container can handle.
The number of seconds to keep the container warm after the last request.
The maximum number of pending tasks allowed in the container.
The maximum number of seconds to wait for the container to start.
Whether the endpoints require authorization.
The name of the container. If not specified, you must provide it during
deployment.
The volumes to mount into the container. Default is a single volume named
`vllm_cache` mounted to `./vllm_cache`, used as the download directory for
vLLM models.
A list of secrets to pass to the container. To enable Hugging Face
authentication for downloading models, set the `HF_TOKEN` in the secrets.
The autoscaler to use for scaling container deployments.
The arguments to configure the vLLM model.
# Utils
### `env`
You can use `env.is_remote()` to only import Python packages when your app is running remotely. This is used to avoid import errors, since your Beam app might be using Python packages that aren't installed on your local computer.
```python theme={null}
from beam import env
if env.is_remote():
    import torch
```
The alternative to `env.is_remote()` is to import packages inline in your functions. For more information on this topic, [visit this page](/v2/environment/remote-versus-local).
# TypeScript SDK Reference
Source: https://docs.beam.cloud/v2/reference/ts-sdk
Beam's TypeScript SDK provides a powerful client library for interacting with the Beam platform. Unlike decorators and frameworks in the Python SDK, the TypeScript SDK focuses on programmatic access to Beam's infrastructure and resources.
This reference outlines every available class, method, and configuration option in the TypeScript SDK.
# Installation
Install the package with `npm`:
```sh theme={null}
npm install @beamcloud/beam-js@rc
```
...or using `yarn`:
```sh theme={null}
yarn add @beamcloud/beam-js@rc
```
# Configuration
Locate your Beam Token (API Key) and Workspace ID in the [dashboard](https://platform.beam.cloud/settings/api-keys) and set them as environment variables.
```bash theme={null}
export BEAM_TOKEN=YOUR_BEAM_TOKEN
export BEAM_WORKSPACE_ID=YOUR_WORKSPACE_ID
```
## `beamOpts`
Global configuration object for the Beam client.
```typescript theme={null}
import { beamOpts } from "@beamcloud/beam-js";
beamOpts.token = process.env.BEAM_TOKEN!;
beamOpts.workspaceId = process.env.BEAM_WORKSPACE_ID!;
beamOpts.gatewayUrl = "https://app.beam.cloud"; // Optional, defaults to https://app.beam.cloud
```
**Required Configuration:**
* `token`: Your Beam authentication token
* `workspaceId`: Your Beam workspace ID
**Optional Configuration:**
* `gatewayUrl`: The Beam gateway URL (defaults to `https://app.beam.cloud`)
## Quickstart
Run a simple Node.js server in a sandbox. This example uses the `Image` class to create a custom container image and the `Sandbox` class to create a sandbox instance.
```typescript theme={null}
import { beamOpts, Image, Sandbox } from "@beamcloud/beam-js";

beamOpts.token = process.env.BEAM_TOKEN!;
beamOpts.workspaceId = process.env.BEAM_WORKSPACE_ID!;

async function main() {
  const image = new Image({
    baseImage: "node:20",
    commands: [
      "apt update",
      "apt install -y nodejs npm",
      "git clone https://github.com/beam-cloud/quickstart-node.git /app",
    ],
  });

  const sandbox = new Sandbox({
    name: "quickstart",
    image: image,
    cpu: 2,
    memory: 1024,
    keepWarmSeconds: 300,
  });

  const instance = await sandbox.create();

  // Start the server inside the sandbox
  const serverProcess = await instance.exec([
    "sh",
    "-c",
    "cd /app && node server.js",
  ]);

  const url = await instance.exposePort(3000);
  console.log(`Server is running at ${url}`);
}

main();
```
# Environment
## `Image`
Defines a custom container image that your code will run in.
An Image object encapsulates the configuration of a custom container image that will be used as the runtime environment for executing tasks.
```typescript theme={null}
import {
  Image,
  PythonVersion,
  PythonVersionAlias,
  GpuType,
  GpuTypeAlias,
} from "@beamcloud/beam-js";

const image = new Image({
  baseImage: "docker.io/nvidia/cuda:12.3.1-runtime-ubuntu20.04",
  pythonVersion: PythonVersion.Python311, // Type-safe enum approach
  commands: ["apt-get update -y", "apt-get install ffmpeg -y"],
  pythonPackages: ["transformers", "torch"],
  gpu: GpuType.A10G, // Type-safe enum approach
});
await image.build();

// Alternative using string literals for convenience
const imageWithStringLiterals = new Image({
  baseImage: "docker.io/nvidia/cuda:12.3.1-runtime-ubuntu20.04",
  pythonVersion: "python3.11", // String literal for Python version
  commands: ["apt-get update -y", "apt-get install ffmpeg -y"],
  pythonPackages: ["transformers", "torch"],
  gpu: "A10G", // String literal for GPU
});
await imageWithStringLiterals.build();
```
**Constructor Parameters:**
The Python version to be used in the image. Can be a PythonVersion enum (e.g.,
PythonVersion.Python311) or string literal (e.g., "python3.11"). Defaults to
Python 3.
A list of Python packages to install in the container image. Alternatively, a
string containing a path to a requirements.txt can be provided.
A list of shell commands to run when building your container image. These
commands can be used for setting up the environment, installing dependencies,
etc.
A custom base image to replace the default ubuntu20.04 image used in your
container. This can be a public or private image from Docker Hub, Amazon ECR,
Google Cloud Artifact Registry, or NVIDIA GPU Cloud Registry.
Credentials for accessing private registries. Can be a dictionary of key/value
pairs or an array of environment variable names.
Environment variables to add to the image. These will be available when
building the image and when the container is running.
A list of secrets that are injected into the container as environment
variables.
Builds the image on a GPU node. Can be a GpuType enum (e.g., GpuType.H100) or
string literal (e.g., "H100").
### `Image.fromDockerfile()`
Create an Image from a local Dockerfile.
```typescript theme={null}
import { Image } from "@beamcloud/beam-js";
const image = await Image.fromDockerfile("./Dockerfile", "./context");
```
Path to the Dockerfile.
Directory to sync as build context. Defaults to the Dockerfile directory.
### `Image.addPythonPackages()`
Add pip packages to install during the build. Accepts a list or a path to requirements.txt.
```typescript theme={null}
// Using an array of package names
image.addPythonPackages(["transformers==4.44.0", "torch==2.4.0"]);
// Using a requirements.txt file
image.addPythonPackages("./requirements.txt");
```
Package names or a `requirements.txt` path.
### `Image.withEnvs()`
Add environment variables available during build and at runtime.
```typescript theme={null}
image.withEnvs({ HF_HOME: "/models", HF_HUB_ENABLE_HF_TRANSFER: "1" });
```
Environment variables as key/value pairs, array of "KEY=VALUE" strings, or
single string.
### `Image.withSecrets()`
Expose platform secrets to the build environment.
```typescript theme={null}
image.withSecrets(["HF_TOKEN"]);
```
Secret names created via the platform.
### `Image.micromamba()`
Switch package management to micromamba and target a micromamba Python.
```typescript theme={null}
image.micromamba();
```
### `Image.addMicromambaPackages()`
Install micromamba packages and optional channels.
```typescript theme={null}
image.addMicromambaPackages(["pandas", "numpy"], ["conda-forge"]);
```
Package names or a `requirements.txt` path.
Micromamba channels.
### `Image.build()`
Build the image and return the result.
```typescript theme={null}
const result = await image.build();
console.log("Build successful:", result.success);
```
# Deployments
## `Deployments`
You can use this to manage and interact with deployed Beam applications.
```typescript theme={null}
import { Deployments } from "@beamcloud/beam-js";

// List all deployments
const deployments = await Deployments.list();

// Get a deployment by ID
const deployment = await Deployments.get({ id: "deployment-id" });

// Get a deployment by name and stub type
const deploymentByName = await Deployments.get({
  name: "my-app",
  stubType: "endpoint/deployment",
});

// Call the deployment
const response = await deployment.call({ message: "Hello World" });

// Connect to realtime deployment
const ws = await deployment.realtime("/", (event) => {
  console.log("Received:", event.data);
});
```
### `Deployments.list()`
List all deployments in your workspace.
```typescript theme={null}
const deployments = await Deployments.list({
  stubType: "endpoint/deployment",
  active: true,
  limit: 10,
});
```
### `Deployments.get()`
Retrieve a deployment by ID, name, or URL.
```typescript theme={null}
// By ID
const deployment = await Deployments.get({ id: "deployment-id" });

// By name and stub type
const deploymentByName = await Deployments.get({
  name: "my-app",
  stubType: "endpoint/deployment",
});
```
The deployment ID to retrieve.
The deployment name (must be used with stubType).
The stub type (must be used with name).
The deployment URL.
## `Deployment`
A deployment instance with methods for interaction.
### `Deployment.call()`
Call the deployment with data.
```typescript theme={null}
const response = await deployment.call(
  { message: "Hello World" },
  "/endpoint-path", // optional path
  "POST", // optional HTTP method
);
```
The data to send to the deployment.
Optional path to append to the deployment URL.
HTTP method to use for the request.
### `Deployment.realtime()`
Connect to a realtime deployment via WebSocket.
```typescript theme={null}
const ws = await deployment.realtime("/", (event) => {
  console.log("Message received:", event.data);
});
// Send a message
ws.send(JSON.stringify({ message: "Hello" }));
```
Optional path to append to the WebSocket URL.
Optional message handler function.
### `Deployment.httpUrl()` / `Deployment.websocketUrl()`
Get the HTTP or WebSocket URL for the deployment.
```typescript theme={null}
const httpUrl = deployment.httpUrl("/api/predict");
const wsUrl = deployment.websocketUrl("/realtime");
```
# Sandbox
A sandboxed container for running Python code or arbitrary processes.
You can use this to create isolated environments where you can execute code,
manage files, and run processes.
```typescript theme={null}
import { Sandbox, Image, GpuType } from "@beamcloud/beam-js";

const sandbox = new Sandbox({
  name: "my-sandbox",
  cpu: 2,
  memory: "1Gi",
  gpu: GpuType.T4, // Using enum
  image: new Image({
    pythonPackages: ["numpy", "pandas"],
  }),
  keepWarmSeconds: 300,
});

// Alternative with string literal
const sandboxWithStringGpu = new Sandbox({
  name: "my-sandbox-2",
  cpu: 2,
  memory: "1Gi",
  gpu: "T4", // Using string literal
  image: new Image({
    pythonPackages: ["numpy", "pandas"],
  }),
  keepWarmSeconds: 300,
});

// Create a new sandbox instance
const instance = await sandbox.create();

// Or connect to an existing one
const existingInstance = await Sandbox.connect("sandbox-id");
```
You can also configure sandbox networking at creation time to pre-expose ports
or restrict outbound traffic.
```typescript theme={null}
const networkedSandbox = new Sandbox({
  name: "networked-sandbox",
  image: new Image({
    baseImage: "node:20",
  }),
  ports: [3000, 8080],
  allowList: ["8.8.8.8/32"],
});

const networkedInstance = await networkedSandbox.create();
const urls = await networkedInstance.listUrls();
console.log("Known URLs:", urls);
```
Ports to expose immediately when the sandbox is created. These ports will be
available via public URLs as soon as the sandbox starts.
Blocks all outbound network access from the sandbox while still allowing
inbound traffic to exposed ports. Cannot be used together with `allowList`.
CIDR ranges that the sandbox is allowed to connect to. When specified, all
other outbound traffic is blocked. Cannot be used together with
`blockNetwork`.
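To see what a CIDR allow list such as `["8.8.8.8/32", "10.0.0.0/8"]` actually covers, the short Python sketch below checks destination addresses against a list of ranges using the standard library's `ipaddress` module. It is only an illustration of CIDR matching: `is_allowed` is a hypothetical helper, and how the sandbox enforces the list internally is not described here.

```python
import ipaddress
from typing import Sequence


def is_allowed(destination: str, allow_list: Sequence[str]) -> bool:
    """Return True if `destination` falls inside any CIDR range in `allow_list`."""
    address = ipaddress.ip_address(destination)
    return any(address in ipaddress.ip_network(cidr) for cidr in allow_list)


allow_list = ["8.8.8.8/32", "10.0.0.0/8"]
print(is_allowed("8.8.8.8", allow_list))    # True: a /32 matches exactly one address
print(is_allowed("8.8.4.4", allow_list))    # False
print(is_allowed("10.12.0.7", allow_list))  # True: a /8 covers 10.0.0.0 through 10.255.255.255
```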
### `Sandbox.create()`
Create a new sandbox instance.
```typescript theme={null}
const instance = await sandbox.create(["python", "app.py"]); // optional entrypoint
console.log(`Sandbox created with ID: ${instance.sandboxId}`);
```
### `Sandbox.connect()`
Connect to an existing sandbox instance by ID.
```typescript theme={null}
const instance = await Sandbox.connect("sandbox-123");
```
The container ID of the existing sandbox instance.
### `Sandbox.createFromSnapshot()`
Create a sandbox instance from a filesystem snapshot.
```typescript theme={null}
const instance = await Sandbox.createFromSnapshot("snapshot-123");
```
The ID of the snapshot to create the sandbox from.
## SandboxInstance
A sandbox instance that provides access to the sandbox internals.
### `SandboxInstance.runCode()`
Run Python code in the sandbox.
```typescript theme={null}
const result = await instance.runCode(`
import numpy as np
print("NumPy version:", np.__version__)
result = np.array([1, 2, 3, 4, 5])
print("Array:", result)
`);
console.log("Output:", result.result);
```
The Python code to execute.
Whether to wait for the process to complete.
### `SandboxInstance.exec()`
Run an arbitrary command in the sandbox.
```typescript theme={null}
// Using an array of command and arguments
const process = await instance.exec(["ls", "-la", "/workspace"]);
const exitCode = await process.wait();
// Using a single string command
const process2 = await instance.exec("ls -la /workspace");
// With execution options
const process3 = await instance.exec(["node", "server.js"], {
  cwd: "/app",
  env: { NODE_ENV: "production" },
});
// Get the process ID
const pid = process.pid;
// Read output
const stdout = await process.stdout.read();
const stderr = await process.stderr.read();
```
The command to execute. Can be a single string or an array of strings (command
and arguments).
Optional execution options:
* `cwd` -- The working directory for the command.
* `env` -- Environment variables to set for the command.
### `SandboxInstance.exposePort()`
Dynamically expose a port to the internet.
```typescript theme={null}
const url = await instance.exposePort(8000);
console.log(`Web service available at: ${url}`);
```
The port number to expose within the sandbox.
Use `listUrls()` to inspect every currently exposed URL, including ports
exposed at creation time with `ports`.
### `SandboxInstance.listUrls()`
List the currently exposed preview/public URLs for the sandbox. Returns a
promise that resolves to an object mapping each exposed port number to its URL.
```typescript theme={null}
const urls = await instance.listUrls();
for (const [port, url] of Object.entries(urls)) {
  console.log(`Port ${port} available at: ${url}`);
}
```
### `SandboxInstance.updateNetworkPermissions()`
Dynamically update outbound network permissions for the sandbox. This method
is asynchronous and returns a promise.
Because the method signature is positional, pass `false` as the first argument
when updating to an allow list without fully blocking outbound traffic.
```typescript theme={null}
// Block all outbound traffic
await instance.updateNetworkPermissions(true);
// Allow only specific CIDR ranges
await instance.updateNetworkPermissions(false, ["8.8.8.8/32", "10.0.0.0/8"]);
// Remove all outbound restrictions
await instance.updateNetworkPermissions(false, []);
```
* `blockNetwork` -- If `true`, blocks all outbound network access from the sandbox. Cannot be used together with `allowList`.
* `allowList` -- Optional list of allowed CIDR ranges. Passing `[]` removes outbound restrictions. Cannot be used together with `blockNetwork=true`.
`allowList` entries must use CIDR notation such as `"8.8.8.8/32"` or
`"10.0.0.0/8"`. `blockNetwork=true` and `allowList` are mutually exclusive, and
exposed ports remain reachable regardless of outbound restrictions.
### `SandboxInstance.snapshot()`
Create a memory snapshot of the current sandbox. This method captures the memory state of the sandbox as an immutable artifact. You can later restore this snapshot into a new sandbox instance using `createFromSnapshot()`.
```typescript theme={null}
const snapshotId = await instance.snapshot();
console.log(`Created memory snapshot: ${snapshotId}`);
// Restore from memory snapshot
const restoredInstance = await Sandbox.createFromSnapshot(snapshotId);
```
### `SandboxInstance.createImageFromFilesystem()`
Create an image from the sandbox filesystem. This method returns an image ID that can be used to create new sandboxes with the same filesystem state. You can use the `Image.fromId()` method to create a new image instance.
```typescript theme={null}
const imageId = await instance.createImageFromFilesystem();
console.log(`Created image from filesystem: ${imageId}`);
// Use the new image as the base image for new sandboxes
const image = Image.fromId(imageId);
const newSandbox = new Sandbox({
  name: "from-filesystem",
  image: image,
  // ... other config
});
```
### `SandboxInstance.updateTtl()`
Update the keep warm setting of the sandbox.
```typescript theme={null}
// Keep alive for 1 hour
await instance.updateTtl(3600);
// Never time out
await instance.updateTtl(-1);
```
The number of seconds to keep the sandbox alive. Use -1 for sandboxes that
never time out.
### `SandboxInstance.sandboxId()`
Get the ID of the sandbox.
```typescript theme={null}
const sandboxId = instance.sandboxId();
```
### `SandboxInstance.terminate()`
Terminate the container associated with this sandbox instance.
```typescript theme={null}
const success = await instance.terminate();
```
## SandboxProcess
Represents a running process within a sandbox.
### `SandboxProcess.wait()`
Wait for the process to complete.
```typescript theme={null}
const process = await instance.exec(["sleep", "5"]);
const exitCode = await process.wait();
console.log("Process exited with code:", exitCode);
```
### `SandboxProcess.kill()`
Kill the process.
```typescript theme={null}
const process = await instance.exec(["sleep", "100"]);
await process.kill();
```
### `SandboxProcess.status()`
Get the status of the process.
```typescript theme={null}
const [exitCode, status] = await process.status();
if (exitCode >= 0) {
  console.log("Process finished with exit code:", exitCode);
}
```
### `SandboxProcess.stdout` / `SandboxProcess.stderr`
Get handles to the process output streams.
```typescript theme={null}
const process = await instance.exec(["echo", "Hello World"]);
const output = await process.stdout.read();
console.log("Output:", output);
```
## SandboxFileSystem
File system interface for managing files within a sandbox.
### `SandboxFileSystem.uploadFile()`
Upload a local file to the sandbox.
```typescript theme={null}
await instance.fs.uploadFile("./local-file.txt", "workspace/uploaded-file.txt");
```
The path to the local file to upload.
The destination path within the sandbox.
### `SandboxFileSystem.downloadFile()`
Download a file from the sandbox to a local path.
```typescript theme={null}
await instance.fs.downloadFile(
  "workspace/output.txt",
  "./downloaded-output.txt",
);
```
The path to the file within the sandbox.
The destination path on the local filesystem.
### `SandboxFileSystem.listFiles()`
List the files in a directory in the sandbox.
```typescript theme={null}
const files = await instance.fs.listFiles("/workspace");
for (const file of files) {
  console.log(`${file.name} (${file.isDir ? "directory" : "file"})`);
}
```
The path to the directory within the sandbox.
### `SandboxFileSystem.deleteFile()`
Delete a file in the sandbox.
```typescript theme={null}
await instance.fs.deleteFile("/tmp/temp-file.txt");
```
The path to the file within the sandbox.
### `SandboxFileSystem.statFile()`
Get metadata of a file in the sandbox.
```typescript theme={null}
const fileInfo = await instance.fs.statFile("/path/to/file.txt");
console.log(`File size: ${fileInfo.size} bytes`);
console.log(`Is directory: ${fileInfo.isDir}`);
```
The path to the file within the sandbox.
# Pod
A **Pod** is an object that allows you to run arbitrary services in a fast, scalable, and secure remote container on Beam.
You can think of a Pod as a lightweight compute environment that you fully control—complete with a custom container, ports you can expose, environment variables, volumes, secrets, and GPUs.
```typescript theme={null}
import { Pod, Image } from "@beamcloud/beam-js";
// Create a Pod that runs a simple HTTP server
const pod = new Pod({
  name: "web-server",
  cpu: 2,
  memory: "512Mi",
  image: new Image({
    baseImage: "python:3.9-slim",
    pythonPackages: ["requests"],
  }),
  ports: [8000],
});
// Create the pod container
const result = await pod.create(["python", "-m", "http.server", "8000"]);
console.log("Container ID:", result.containerId);
console.log("URL:", result.url);
```
### `Pod.create()`
Create a new container that runs until it completes or is explicitly killed.
```typescript theme={null}
const result = await pod.create(["python", "app.py"]);
console.log("Pod created successfully:", result.url);
```
The command to run in the container.
# Storage
## `Volume`
Creates a Volume instance.
When your container runs, your volume will be available at `./{mountPath}` and `/volumes/{name}`.
```typescript theme={null}
import { Pod, Volume } from "@beamcloud/beam-js";
const volume = new Volume("model-weights", "./weights");
await volume.getOrCreate();
// Use with Pod or Sandbox
const pod = new Pod({
  name: "my-pod",
  volumes: [volume],
  // ... other config
});
```
The name of the volume, a descriptive identifier for the data volume.
The path where the volume is mounted within the container environment.
### `Volume.getOrCreate()`
Get or create the volume in the platform.
```typescript theme={null}
const success = await volume.getOrCreate();
console.log("Volume ready:", volume.ready);
```
# Utils
## `TaskPolicy`
Task policy for managing the lifecycle of individual tasks.
```typescript theme={null}
import { TaskPolicy } from "@beamcloud/beam-js";
const policy = new TaskPolicy({
  maxRetries: 3,
  timeout: 300,
  ttl: 3600,
});
```
* `maxRetries` -- The maximum number of times a task will be retried if the container crashes.
* `timeout` -- The maximum number of seconds a task can run before it times out. Set it to -1 to disable the timeout.
* `ttl` -- The expiration time for a task in seconds. Must be greater than 0 and less than 24 hours (86400 seconds).
# Types
The TypeScript SDK includes comprehensive type definitions for all resources:
* `GpuType`: Enum of available GPU types (T4, A10G, A100, H100, etc.)
* `GpuTypeAlias`: Union type allowing both enum values and string literals for GPU specification
* `PythonVersion`: Enum of supported Python versions
* `PythonVersionAlias`: Union type allowing both enum values and string literals for Python version specification
* `TaskStatus`: Enum of task statuses (PENDING, RUNNING, COMPLETE, etc.)
* `DeploymentData`: Interface for deployment data
* `TaskData`: Interface for task data
* `ExecOptions`: Options for sandbox command execution (cwd, env)
* `PodInstanceData`: Interface for pod instance data
* And many more...
```typescript theme={null}
import {
  GpuType,
  GpuTypeAlias,
  PythonVersion,
  PythonVersionAlias,
  TaskStatus,
  DeploymentData,
} from "@beamcloud/beam-js";
// Use types for better TypeScript support
const gpu: GpuType = GpuType.A10G; // Enum approach
const gpuAlias: GpuTypeAlias = "A10G"; // String literal approach
const anotherGpu: GpuTypeAlias = GpuType.H100; // Both work with GpuTypeAlias
const python: PythonVersion = PythonVersion.Python311; // Enum approach
const pythonAlias: PythonVersionAlias = "python3.11"; // String literal approach
const anotherPython: PythonVersionAlias = PythonVersion.Python310; // Both work with PythonVersionAlias
// Both approaches work seamlessly for any alias type
function createResource(gpu: GpuTypeAlias, python: PythonVersionAlias) {
  console.log(`Using GPU: ${gpu}, Python: ${python}`);
}
createResource(GpuType.H100, PythonVersion.Python311); // Works with enums
createResource("H100", "python3.11"); // Works with strings
createResource("A10G", PythonVersion.Python310); // Works with mixed approaches
```
## `ExecOptions`
Options for configuring command execution in a sandbox.
```typescript theme={null}
import { ExecOptions } from "@beamcloud/beam-js";
const opts: ExecOptions = {
  cwd: "/app",
  env: { NODE_ENV: "production", DEBUG: "true" },
};
const process = await instance.exec(["node", "server.js"], opts);
```
* `cwd` -- The working directory for the command.
* `env` -- Environment variables to set for the command.
# Using Beam Docs with AI Tools
Source: https://docs.beam.cloud/v2/resources/ai-tools
Bring the Beam documentation into your LLMs, IDEs, and agents
The Beam docs are built to work well with AI assistants and coding agents. You can feed them to an LLM, open them in your editor, or connect them as a tool.
## llms.txt
We publish machine-readable indexes of the documentation that follow the [llms.txt](https://llmstxt.org) standard:
* [llms.txt](https://docs.beam.cloud/llms.txt) — a concise, structured index of every page, ideal for giving a model a map of the docs.
* [llms-full.txt](https://docs.beam.cloud/llms-full.txt) — the full text of the documentation in a single file, ideal for pasting into a model with a large context window.
## Markdown for any page
Append `.md` to the URL of any docs page to get its raw Markdown. For example:
```text theme={null}
https://docs.beam.cloud/v2/getting-started/quickstart.md
```
This is handy for piping a single page into an LLM or referencing it from an agent.
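As a minimal sketch of the rule above, the helper below turns a docs page URL into its Markdown URL. The name `markdown_url` is hypothetical and not part of any Beam tooling. It only manipulates the string and makes no network request, and dropping the query string and fragment is an assumption about what you would want.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Return the raw-Markdown URL for a docs page by appending `.md` to its path."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")  # ignore a trailing slash
    if not path.endswith(".md"):
        path += ".md"
    # Assumption: query string and fragment are not needed, so they are dropped.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(markdown_url("https://docs.beam.cloud/v2/getting-started/quickstart"))
# https://docs.beam.cloud/v2/getting-started/quickstart.md
```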
## Copy and open in your assistant
Every page has a menu in the top-right corner that lets you:
* **Copy page** as Markdown to paste into any chat.
* **Open in ChatGPT, Claude, or Perplexity** with the page preloaded.
* **Open in Cursor or VS Code** to use the page as context while you build.
* **Connect via MCP** so your agent can query the docs directly.
## Building on Beam with an agent
When working with a coding agent, point it at the resources above so it has accurate, up-to-date context:
* Share `llms.txt` so the agent knows the structure of the docs.
* Share the relevant `.md` pages (for example, the [Python SDK reference](/v2/reference/py-sdk.md) or [Quickstart](/v2/getting-started/quickstart.md)) for the task at hand.
* Connect the docs MCP server so the agent can search the documentation on demand.
# FAQ
Source: https://docs.beam.cloud/v2/resources/faq
This is an ongoing list of issues people sometimes encounter while using Beam. If you're having an issue, check this list first.
## `concurrency_limit_reached` or `cpu quota exceeded`
Beam offers several plans, each with its own CPU and GPU quotas.
| Plan | CPU Quota | GPU Quota |
| ---------- | --------- | --------- |
| Free Trial | 10 | 5 |
| Developer | 10 | 5 |
| Team | 1,000 | 20 |
| Growth | 10,000+ | 100+ |
If you get this message, make sure you've added a payment method to your account and [selected the pay-as-you-go developer plan on this page](https://platform.beam.cloud/settings/plans).
## `Unable to connect to gateway`
Make sure you're on the latest version of the `beam-client` CLI.
```bash theme={null}
uv tool upgrade beam-client
```
Run this command to validate your version of the CLI:
```bash theme={null}
beam --version
```
[You can see the latest CLI releases here](https://github.com/beam-cloud/beam-client/releases).
## `No space left on device`
This error typically occurs when your app runs out of disk space. For example, if you're downloading a 30Gi file and your app only has 8Gi of memory, you might see this error.
For more information on configuring RAM for your apps, [read more on this page](/v2/environment/gpu#configuring-cpu-and-memory).
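One way to catch this early is to check free disk space before starting a large download. The sketch below uses only the Python standard library. The helper name `has_room` and its `headroom` margin are illustrative assumptions, and it says nothing about how Beam sizes a container's disk.

```python
import shutil

GIB = 1024 ** 3


def has_room(path: str, needed_bytes: int, headroom: float = 0.10) -> bool:
    """Return True if `path` has space for `needed_bytes` plus a safety margin."""
    free = shutil.disk_usage(path).free
    return free >= needed_bytes * (1 + headroom)


# Check the current directory before fetching a 30Gi file.
if not has_room(".", 30 * GIB):
    print("Not enough free disk space for a 30Gi download")
```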
## `cannot import name 'App' from 'beam'`
If you're seeing this error, it's because you're trying to use Beam V2 with a V1 app. There is no `App` class in Beam V2.
For more information on using Beam V2, [read more on this page](/v2/releases/v2-upgrade).
## `Unable to locate config file`
This typically happens when there are multiple Python environments on your computer.
If you are using Conda, we recommend exiting Conda and using a standard Python virtual environment instead: `python3 -m virtualenv .venv && source .venv/bin/activate`
The most common way of solving this is by running `which python` and installing `beam-client` to that specific path.
For example:
```bash theme={null}
$ which python
python: aliased to /usr/bin/python3 # gotcha!
$ /usr/bin/python3 -m virtualenv .venv && source .venv/bin/activate
$ (.venv) /usr/bin/python3 -m pip install --upgrade beam-client
```
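To see which interpreter is actually in use, you can also ask Python itself. This is a small diagnostic sketch using only the standard library. The function name `describe_interpreter` is hypothetical, and it assumes the client is importable as `beam`, as in the examples elsewhere in these docs.

```python
import importlib.util
import sys


def describe_interpreter() -> dict:
    """Report which Python is running and whether it can import `beam`."""
    spec = importlib.util.find_spec("beam")
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "beam_location": spec.origin if spec else None,  # None means not installed here
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If `beam_location` is `None`, `beam-client` is not installed for the interpreter that ran the script.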
## TensorFlow Can't Find GPUs
If you're using TensorFlow, you might run into an issue where `tf` doesn't recognize the available GPUs on the device.
Make sure to install `tensorflow[and-cuda]`, otherwise the regular version of
`tf` won't have access to the GPU device.
```python theme={null}
from beam import Image, endpoint, env
if env.is_remote():
    import tensorflow as tf
@endpoint(
    name="tensorflow-gpu",
    cpu=1,
    memory="4Gi",
    gpu="A10G",
    # Make sure to use `tensorflow[and-cuda]` in order to access GPU resources
    image=Image().add_python_packages(["tensorflow[and-cuda]"]),
)
def predict():
    # Show available GPUs
    gpus = tf.config.list_physical_devices("GPU")
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
    print("Is built with CUDA:", tf.test.is_built_with_cuda())
    print("Is GPU available:", tf.test.is_gpu_available())
    print("GPUs available:", tf.config.list_physical_devices("GPU"))
```
# Pricing and Billing
Source: https://docs.beam.cloud/v2/resources/pricing-and-billing
Beam is serverless, which means your apps will scale-to-zero by default. Billing is based on the lifecycle of your containers. You are only charged when your containers are running.
## What am I charged for?
You are charged whenever a container is running. This includes:
* Running any code you've defined in an [`on_start`](/v2/endpoint/loaders) function
* Running your application code
* Any [`keep_warm_seconds`](/v2/endpoint/keep-warm) you've set
## What am I *not* charged for?
* Waiting for a machine to start
* Pulling your container image
## Default Container Spin-down Times
After handling a request, Beam keeps containers running ("warm") for a certain amount of time in order to quickly handle future requests. By default, these are the container "keep warm" times for each deployment type:
| Deployment Type | Container Keep Warm Duration |
| ----------------------- | ---------------------------- |
| Endpoints/ASGI/Realtime | 180s |
| Task Queues | 10s |
| Pods | 600s |
## Real-World Example
You've deployed a REST API. You've added two Python Packages in your `Image()`, which are loaded when your app first starts.
You've also set `keep_warm_seconds=300`, which will keep the container alive for 300 seconds (5 minutes) after each request.
```python app.py theme={null}
from beam import endpoint
# This runs once when the container first starts
def load_models():
    return {}
@endpoint(keep_warm_seconds=300, on_start=load_models)
def predict():
    return {}
```
Let's pretend you deploy this and call the API. Suppose it takes:
* 1s to boot your application and run your `on_start` function.
* 100ms to run your task.
* 300s to keep the container alive, based on the `keep_warm_seconds` argument.
You would be billed for a total of 301.1 seconds.
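The arithmetic can be written out as a short sketch. The function name and dictionary keys below are hypothetical. The default keep-warm values are taken from the table above, and the model covers a single request only.

```python
# Default keep-warm durations in seconds, from the table above.
DEFAULT_KEEP_WARM_S = {"endpoint": 180, "task_queue": 10, "pod": 600}


def billable_seconds(startup_s: float, task_s: float, keep_warm_s: float) -> float:
    """Total container runtime billed for a single request, in seconds."""
    return startup_s + task_s + keep_warm_s


# The example above: 1s startup, 100ms task, keep_warm_seconds=300.
total = billable_seconds(startup_s=1.0, task_s=0.1, keep_warm_s=300)
print(f"Billed for {total:.1f} seconds")  # Billed for 301.1 seconds

# The same request on an endpoint that uses the default keep-warm time.
default_total = billable_seconds(1.0, 0.1, DEFAULT_KEEP_WARM_S["endpoint"])
print(f"Billed for {default_total:.1f} seconds")  # Billed for 181.1 seconds
```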
# Configuration
Source: https://docs.beam.cloud/v2/sandbox/configuration
Learn how to configure and customize your sandbox environment
Sandboxes are configurable cloud environments. You control CPU, memory, GPU, dependencies, environment variables, and storage so each sandbox can be customized exactly to your needs.
## Basic
Start with the simplest configuration:
```python theme={null}
from beam import Sandbox, Image
# Default: 1 CPU, 128MB RAM, Python 3.11
sandbox = Sandbox()
sb = sandbox.create()
```
This gives you a minimal environment that works well for simple scripts and running untrusted code. For most real work, you'll want to customize the resources.
## Advanced compute settings
You can increase CPU, memory, or assign a GPU - depending on your use case:
```python theme={null}
# Simple scripts, web scraping
sandbox = Sandbox(cpu=1.0)
# Data processing, API development
sandbox = Sandbox(cpu=2.0, memory="16Gi")
# Machine learning, complex analysis, rendering
sandbox = Sandbox(cpu=4.0, memory="16Gi", gpu="A10G")
```
## Customize your environment
See the [Images](/v2/environment/custom-images) section for more information on how to customize the runtime.
## Persistent Storage
Beam supports two types of persistent storage: fast distributed volumes and cloud buckets you already manage.
### Distributed Storage Volumes
Mount fast storage volumes to persist files between sessions:
```python theme={null}
from beam import Volume
# Mount a storage volume to your sandbox
volume = Volume(name="documents", mount_path="/workspace/documents")
sandbox = Sandbox(volumes=[volume])
```
Use volumes when you:
* Are working on a project that spans multiple sessions
* Need to share data between different sandbox instances
* Want to keep work safe even if the sandbox crashes
### Cloud Buckets
For large datasets or team sharing, you can use your own buckets:
```python theme={null}
from beam import CloudBucket
# Connect to your S3 bucket
bucket = CloudBucket(
    bucket_name="my-data-bucket",
    mount_path="/data"
)
sandbox = Sandbox(volumes=[bucket])
```
Use cloud buckets for:
* Sensitive data
* Connecting existing object storage
* Long-term data storage in your own infrastructure
## Session Management
### Timeout Configuration
Set timeouts to control costs:
```python theme={null}
# Quick tasks (testing, simple scripts)
sandbox = Sandbox(keep_warm_seconds=1800) # 30 minutes
# Development sessions
sandbox = Sandbox(keep_warm_seconds=3600) # 1 hour
# Long-running tasks (training, processing)
sandbox = Sandbox(keep_warm_seconds=7200) # 2 hours
# Manual termination only
sandbox = Sandbox(keep_warm_seconds=-1)
```
Start with shorter timeouts and increase as needed. You can always create a new sandbox if you need more time.
### Manual vs Automatic Termination
```python theme={null}
# Auto-terminate after 1 hour
sandbox = Sandbox(keep_warm_seconds=3600)
# Manual termination only (you control when it stops)
sandbox = Sandbox(keep_warm_seconds=-1)
```
Use manual termination for:
* Long-running training jobs
* Collaborative development sessions
* When you need to pause and resume work
## Environment Variables and Secrets
### Environment Variables
Pass configuration to your applications:
```python theme={null}
sandbox = Sandbox(
    env={
        "DATABASE_URL": "postgresql://user:pass@host:5432/db",
        "API_KEY": "your-api-key",
        "DEBUG": "true",
        "ENVIRONMENT": "development"
    }
)
```
Environment variables are good for:
* Keeping sensitive data out of your code
* Configuring applications for different environments
* Sharing configuration across team members
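Inside the sandbox, your application reads these values back from its process environment. The sketch below shows one way to turn them into a typed config using only the standard library. `AppConfig` and `load_config` are hypothetical names, and the variable names mirror the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    debug: bool
    environment: str


def load_config(environ: Mapping[str, str] = os.environ) -> AppConfig:
    """Build a typed config from environment variables (values are always strings)."""
    return AppConfig(
        database_url=environ["DATABASE_URL"],  # required: raises KeyError if unset
        debug=environ.get("DEBUG", "false").lower() == "true",
        environment=environ.get("ENVIRONMENT", "development"),
    )


# A plain dict stands in for the sandbox environment here.
config = load_config({"DATABASE_URL": "postgresql://user:pass@host:5432/db", "DEBUG": "true"})
print(config.debug, config.environment)  # True development
```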
### Secrets Management
You can attach secrets to your sandbox using Beam's secret management system - they will be exposed as environment variables inside the Sandbox:
```python theme={null}
sandbox = Sandbox(secrets=["OPENAI_API_KEY"])
```
You add secrets using the [Beam CLI](https://docs.beam.cloud/v2/reference/cli#create-a-secret):
```sh theme={null}
$ beam secret create OPENAI_API_KEY ASIAY34FZKBOKMUTVV7A
=> Created secret with name: 'OPENAI_API_KEY'
```
Use secrets for:
* Database passwords
* API keys and tokens
* Private keys and certificates
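Because secrets arrive as environment variables, code running in the sandbox can check for them at startup and fail with a clear message if one is missing. This is a minimal sketch. `require_secret` and `mask` are hypothetical helpers, not part of the Beam SDK.

```python
import os


def require_secret(name: str) -> str:
    """Return the secret's value, or fail fast with a clear error if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Secret {name!r} is not set in this environment")
    return value


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safe to log."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Stand-in value for local runs only; in a sandbox the secret is already set.
os.environ.setdefault("OPENAI_API_KEY", "demo-value-1234")
print(mask(require_secret("OPENAI_API_KEY")))
```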
## Best Practices
### Start Small, Scale Up
```python theme={null}
# Start with minimal resources
sandbox = Sandbox(cpu=1.0, memory="1Gi")
# If you need more power, create a new sandbox
powerful_sandbox = Sandbox(cpu=4.0, memory="8Gi")
```
You only pay for what you use. Start small and scale up as needed.
## Common Mistakes
### Over-provisioning
```python theme={null}
# Don't do this for simple scripts
sandbox = Sandbox(cpu=8.0, memory="32Gi", gpu="A10G") # Overkill!
```
Start with minimal resources and scale up as needed.
### Including Unnecessary Packages
```python theme={null}
# Don't include packages you don't need
from beam import PythonVersion
image = Image(python_version=PythonVersion.Python311).add_python_packages([
    "flask", "django", "fastapi", "tornado", "bottle"  # Pick one!
])
```
Only include packages you actually use.
### Long Timeouts for Short Tasks
```python theme={null}
# Don't set 2-hour timeout for 5-minute tasks
sandbox = Sandbox(keep_warm_seconds=7200) # Wasteful!
```
Try to match the timeout to the expected task duration. You can always extend the timeout dynamically, like so:
```python theme={null}
sandbox = Sandbox(keep_warm_seconds=300)
sb = sandbox.create()
# Do some stuff
sb.update_ttl(300)  # Reset TTL back to 300 seconds
```
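For multi-step work, one possible pattern is to reset the TTL before each step, so that a short timeout never expires mid-run. The sketch below assumes only that the sandbox instance has an `update_ttl(seconds)` method, as used elsewhere in these docs. `HasTtl`, `run_with_ttl_refresh`, and `FakeSandbox` are hypothetical names, and the fake object exists only so the example runs without a Beam account.

```python
from typing import Callable, Iterable, Protocol


class HasTtl(Protocol):
    # Assumption: the return value of update_ttl is not needed here.
    def update_ttl(self, ttl: int) -> None: ...


def run_with_ttl_refresh(sb: HasTtl, steps: Iterable[Callable[[], None]], ttl: int = 300) -> int:
    """Run each step, resetting the sandbox TTL first so it cannot expire mid-run."""
    completed = 0
    for step in steps:
        sb.update_ttl(ttl)  # reset the countdown before starting the next step
        step()
        completed += 1
    return completed


class FakeSandbox:
    """Local stand-in that records TTL updates instead of calling Beam."""

    def __init__(self) -> None:
        self.ttl_updates = []

    def update_ttl(self, ttl: int) -> None:
        self.ttl_updates.append(ttl)


fake = FakeSandbox()
done = run_with_ttl_refresh(fake, [lambda: None, lambda: None], ttl=300)
print(done, fake.ttl_updates)  # 2 [300, 300]
```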
## What's Next?
Now that you understand configuration, let's put it to work:
* **[Process Management](/v2/sandbox/processes)**: Run code and commands in your configured environment
* **[File System Operations](/v2/sandbox/filesystem)**: Upload, download, and manage files
* **[Networking](/v2/sandbox/networking)**: Deploy web services and expose them to the internet
* **[Examples](/v2/sandbox/overview)**: See real-world configurations in action
# File System Operations
Source: https://docs.beam.cloud/v2/sandbox/filesystem
Upload, download, and manage files within your sandbox environment
Each sandbox has a built-in file system API available at `sb.fs`. You can upload local files, download files from the sandbox, list directories, and manage files with full metadata access.
## Uploading Files
### Basic File Upload
```python theme={null}
from beam import Sandbox, Image, PythonVersion
sandbox = Sandbox(image=Image(python_version=PythonVersion.Python311))
sb = sandbox.create()
# Upload a local file to the sandbox
sb.fs.upload_file("my_script.py", "/workspace/my_script.py")
```
### Uploading Multiple Files
```python theme={null}
# Upload several files
files_to_upload = [
    ("main.py", "/workspace/main.py"),
    ("requirements.txt", "/workspace/requirements.txt"),
    ("data.csv", "/workspace/data/data.csv"),
    ("config.yaml", "/workspace/config/config.yaml")
]
for local_path, sandbox_path in files_to_upload:
    sb.fs.upload_file(local_path, sandbox_path)
    print(f"Uploaded {local_path} to {sandbox_path}")
```
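If you would rather not list every file by hand, the pairs can be generated from a local directory. This is a sketch using only the standard library. `plan_uploads` is a hypothetical helper, and it only builds the `(local_path, sandbox_path)` list without uploading anything. Whether `upload_file` creates missing parent directories is not stated here, so treat nested paths as an assumption to verify.

```python
from pathlib import Path, PurePosixPath
from typing import List, Tuple


def plan_uploads(local_dir: str, sandbox_dir: str = "/workspace") -> List[Tuple[str, str]]:
    """Map every file under `local_dir` to a matching path under `sandbox_dir`."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Sandbox paths always use forward slashes, whatever the local OS.
            relative = path.relative_to(root).as_posix()
            pairs.append((str(path), str(PurePosixPath(sandbox_dir) / relative)))
    return pairs


# Usage with a running sandbox `sb` (not executed here):
# for local_path, sandbox_path in plan_uploads("./project"):
#     sb.fs.upload_file(local_path, sandbox_path)
```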
## Downloading Files
### Basic File Download
```python theme={null}
# Download a file from the sandbox
sb.fs.download_file("/workspace/output.txt", "local_output.txt")
# Download to a specific directory
sb.fs.download_file("/workspace/results/data.csv", "downloads/data.csv")
```
### Downloading Multiple Files
```python theme={null}
# Download all files in a directory
files = sb.fs.list_files("/workspace/results")
for file in files:
    if not file.is_dir:
        local_path = f"downloads/{file.name}"
        sb.fs.download_file(f"/workspace/results/{file.name}", local_path)
        print(f"Downloaded {file.name}")
```
## File Management
### Listing Files and Directories
```python theme={null}
# List files in workspace
files = sb.fs.list_files("/workspace")
for file in files:
    if file.is_dir:
        print(f"{file.name}/")
    else:
        print(f"{file.name} ({file.size} bytes)")
```
### File Information
```python theme={null}
# Get detailed information about a file
file_info = sb.fs.stat_file("/workspace/my_script.py")
print(f"Name: {file_info.name}")
print(f"Size: {file_info.size} bytes")
print(f"Is Directory: {file_info.is_dir}")
print(f"Permissions: {oct(file_info.permissions)}")
print(f"Owner: {file_info.owner}")
print(f"Group: {file_info.group}")
print(f"Modified: {file_info.mod_time}")
```
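The octal value can be hard to read at a glance. Assuming `permissions` is a standard Unix mode integer (the `oct()` call above suggests it is, but that is an assumption), the standard library can render it as the familiar `rwx` string. `describe_permissions` is a hypothetical helper.

```python
import stat


def describe_permissions(mode: int) -> tuple:
    """Return the permission bits as an octal string and as an `rwx` string."""
    bits = stat.S_IMODE(mode)  # strip file-type bits, keep permission bits
    return oct(bits), stat.filemode(bits)[1:]  # drop the leading file-type character


print(describe_permissions(0o644))     # ('0o644', 'rw-r--r--')
print(describe_permissions(0o100755))  # ('0o755', 'rwxr-xr-x')
```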
# Networking
Source: https://docs.beam.cloud/v2/sandbox/networking
Expose ports dynamically for services running inside your sandbox
The Sandbox provides some basic network tools. You can run web services and expose them to the internet behind SSL-terminated endpoints. This is useful for web development, API testing, and running interactive applications with LLMs (think v0, reflex.build, etc).
## Exposing Ports
You can expose ports from your Sandbox in two ways: statically at creation time, or dynamically at runtime.
### Static Port Exposure
Specify ports when creating the sandbox to have them exposed immediately with public URLs:
```python theme={null}
from beam import Sandbox, Image, PythonVersion
# Create a sandbox with pre-exposed ports
sandbox = Sandbox(
    image=Image(python_version=PythonVersion.Python311),
    ports=[8000, 8080, 3000]  # Ports exposed at creation
)
sb = sandbox.create()
# URLs are immediately available
urls = sb.list_urls()
for port, url in urls.items():
    print(f"Port {port} exposed at: {url}")
```
### Dynamic Port Exposure
You can also expose ports dynamically after the sandbox is created using the `expose_port()` method.
#### Expose a port
```python theme={null}
from beam import Sandbox, Image, PythonVersion
sandbox = Sandbox(image=Image(python_version=PythonVersion.Python311))
sb = sandbox.create()
# Expose port 8000
url = sb.expose_port(8000)
print(f"Port 8000 exposed at: {url}")
# The URL will be something like:
# https://384ced3c-f837-4429-bada-39e0b965c9f4-8000.app.beam.cloud
```
#### Expose multiple ports
```python theme={null}
# Expose multiple ports
ports = [8000, 8080, 3000]
urls = {}
for port in ports:
    url = sb.expose_port(port)
    urls[port] = url
    print(f"Port {port} exposed at: {url}")
# Access different services
print(f"Main app: {urls[8000]}")
print(f"Admin panel: {urls[8080]}")
print(f"API server: {urls[3000]}")
```
#### List exposed ports/preview URLs
```python theme={null}
# List exposed ports & preview URLs
urls = sb.list_urls()
for port, url in urls.items():
    print(f"Port {port} exposed at: {url}")
```
## Network Security
### Blocking Outbound Traffic
You can block all outbound network access from your Sandbox while still allowing inbound connections to exposed ports. This is useful for security-sensitive workloads or when executing untrusted code.
```python theme={null}
from beam import Sandbox, Image, PythonVersion
# Create a sandbox with blocked outbound network
sandbox = Sandbox(
    image=Image(python_version=PythonVersion.Python311),
    block_network=True,  # Block all outbound traffic
)
sb = sandbox.create()
# The sandbox can still receive requests on exposed ports
url = sb.expose_port(8000)
print(f"Port 8000 exposed at: {url}")
# But it cannot make outbound connections to external services
```
With `block_network=True`, the Sandbox can receive requests on exposed ports but cannot initiate outbound connections to external services.
### Allow Lists (CIDR Ranges)
For more fine-grained control, you can specify an allow list of CIDR ranges that your Sandbox is permitted to connect to. All other outbound traffic will be blocked.
```python theme={null}
from beam import Sandbox, Image, PythonVersion
# Create a sandbox with an allow list
sandbox = Sandbox(
    image=Image(python_version=PythonVersion.Python311),
    allow_list=[
        "8.8.8.8/32",  # Allow Google DNS
        "10.0.0.0/8",  # Allow private network range
        "2001:db8::/32",  # Allow IPv6 range
    ],
)
sb = sandbox.create()
# The sandbox can only connect to addresses in the allow list
# All other outbound traffic is blocked
```
**Important Notes:**
* Maximum of 10 CIDR entries per Sandbox
* Supports both IPv4 and IPv6 addresses
* Must use proper CIDR notation (e.g., `"8.8.8.8/32"` for a single IP, `"10.0.0.0/8"` for a range)
* Cannot use `allow_list` and `block_network` together - they are mutually exclusive
* Invalid CIDR values will trigger an error at creation time
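The rules above can be checked locally before creating a sandbox. This is a sketch using Python's standard `ipaddress` module, not Beam's own validation. `validate_network_config` is a hypothetical helper, and rejecting entries with host bits set (such as `10.0.0.1/8`) is a stricter local assumption that Beam may not share.

```python
import ipaddress
from typing import List, Optional

MAX_CIDR_ENTRIES = 10  # limit stated in the notes above


def validate_network_config(
    block_network: bool = False, allow_list: Optional[List[str]] = None
) -> List[str]:
    """Check an allow list against the documented rules and return normalized CIDRs."""
    entries = allow_list or []
    if block_network and entries:
        raise ValueError("block_network and allow_list are mutually exclusive")
    if len(entries) > MAX_CIDR_ENTRIES:
        raise ValueError(f"at most {MAX_CIDR_ENTRIES} CIDR entries are allowed")
    normalized = []
    for entry in entries:
        if "/" not in entry:
            raise ValueError(f"{entry!r} is missing a prefix length, e.g. /32")
        # strict=True raises ValueError for entries with host bits set.
        normalized.append(str(ipaddress.ip_network(entry, strict=True)))
    return normalized


print(validate_network_config(allow_list=["8.8.8.8/32", "10.0.0.0/8", "2001:db8::/32"]))
# ['8.8.8.8/32', '10.0.0.0/8', '2001:db8::/32']
```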
### Updating Network Permissions at Runtime
You can dynamically update the network permissions of a running Sandbox without restarting it. This allows you to change access policies during the sandbox's lifetime.
```python theme={null}
from beam import Sandbox, Image, PythonVersion
# Create a sandbox with no network restrictions
sandbox = Sandbox(image=Image(python_version=PythonVersion.Python311))
sb = sandbox.create()
# Later, block all outbound traffic
sb.update_network_permissions(block_network=True)
# Or update to use an allowlist instead
sb.update_network_permissions(
    allow_list=[
        "8.8.8.8/32",  # Allow Google DNS
        "10.0.0.0/8",  # Allow private network range
    ]
)
# Remove all restrictions
sb.update_network_permissions(block_network=False, allow_list=[])
```
**Important Notes:**
* Cannot use `block_network=True` and `allow_list` together - they are mutually exclusive
* Exposed ports remain accessible regardless of network restrictions
* Changes take effect immediately without requiring a restart
# Overview
Source: https://docs.beam.cloud/v2/sandbox/overview
Run anything in secure code execution environments
Sandboxes are ultra-fast, Python-native environments for running any workload - with GPUs, networking, and persistent storage - in seconds.
## Features
* **Ultra Fast Boot Times**: Sandboxes cold boot in 1–3 seconds, even with dependencies included.
* **Image Caching**: Beam caches dependencies in your base image, so subsequent sandboxes boot faster. You can also build custom images for each app.
* **Snapshots**: Create Snapshots of the filesystem and restart Sandboxes from a previous state.
* **Preview URLs**: Dynamically expose ports behind SSL-terminated, authenticated endpoints.
* **Session Management**: Keep sandboxes running indefinitely, or configure them to shut down automatically after any period you choose.
## Quick Start
Create a sandbox, run some code, and see the results:
```python theme={null}
from beam import PythonVersion, Image, Sandbox
# Create a sandbox with the tools you need
sandbox = Sandbox(image=Image(python_version=PythonVersion.Python311))
# Launch it into the cloud
sb = sandbox.create()
# Run some code - this happens in the cloud, not on your machine!
result = sb.process.run_code("print('hello from the sandbox!')").result
print(result)
# Clean up - shut down the sandbox
sb.terminate()
```
## Running a Node.js server
You can run arbitrary code on Beam. It doesn't need to be Python!
For example, let's run a Node server. We'll track the startup time too:
```python theme={null}
import time
from beam import Image, Sandbox
start = time.time()
# Create a sandbox on port 3000
sb = Sandbox(image=Image().from_registry("node:20")).create()
url = sb.expose_port(3000)
# Terminate sandbox after 5 minutes
sb.update_ttl(300)
# Run some code
sb.process.exec("sh", "-c", "npx http-server -p 3000 -c-1")
elapsed = time.time() - start
print(f"Node app running at: {url}")
print(f"Sandbox started in {elapsed:.2f} seconds")
```
## Core Features
### Process Management
Run Python code, shell commands, or start long-running processes:
```python theme={null}
# Run some Python code
result = sb.process.run_code("print('Hello from sandbox!')")
print(result.result)
# Execute arbitrary shell commands
process = sb.process.exec("ls", "-la", "/workspace")
process.wait()
print(process.logs.read())
# Expose a port to the internet
url = sb.expose_port(8000)
print(f"Sandbox running here: {url}")
# Start a web server in the background
server_process = sb.process.exec("python3", "-m", "http.server", "8000")
try:
    for line in server_process.logs:
        print(line, end="")
finally:
    sb.terminate()
```
### File System Operations
Upload local files, download results, and manage your workspace:
```python theme={null}
# Upload local files to the sandbox
sb.fs.upload_file("my_script.py", "/workspace/my_script.py")
# Run it
result = sb.process.run_code("exec(open('/workspace/my_script.py').read())")
# Download a file from the sandbox to your local
sb.fs.download_file("/workspace/output.csv", "local_results.csv")
```
### Dynamic Preview URLs
Expose ports to make your services accessible over the internet:
```python theme={null}
# Start a Flask app
process = sb.process.exec("python3", "app.py", cwd="/workspace")
# Expose it to the world
url = sb.expose_port(5000)
print(f"Your app is live at: {url}")
```
## Key Concepts
### SandboxInstance
When you create a sandbox, you get a `SandboxInstance` object that provides:
* `process`: Run commands and code with real-time output
* `fs`: Upload, download, and manage files
* `expose_port()`: Make your services accessible to the internet
* `terminate()`: Cleanup when you're done
### Lifecycle
1. **Create**: Configure your environment (CPU, memory, packages, etc.)
2. **Launch**: Start the sandbox with `create()`
3. **Use**: Execute code, manage files, expose services
4. **Terminate**: Clean up with `terminate()` (or let it auto-terminate)
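The lifecycle above can be wrapped so that the terminate step always runs, even if the "use" step raises. This is a minimal sketch, not a Beam API: `running_sandbox` is a hypothetical helper, and `FakeSandbox` is a stand-in so the example runs without cloud access. With Beam, you would pass something like `Sandbox(image=...).create` as the `create` argument.

```python
from contextlib import contextmanager


@contextmanager
def running_sandbox(create):
    """Yield a sandbox instance and always terminate it afterwards.

    `create` is any zero-argument callable that returns an object with a
    terminate() method (hypothetical helper, not part of the Beam SDK).
    """
    sb = create()
    try:
        yield sb
    finally:
        sb.terminate()


class FakeSandbox:
    """Stand-in used only to demonstrate the helper locally."""

    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True


fake = FakeSandbox()
with running_sandbox(lambda: fake) as sb:
    pass  # execute code, manage files, expose services here
print(fake.terminated)
```

The `finally` clause is the design point: a sandbox that is never terminated keeps running until its own timeout, so cleanup should not depend on the happy path.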
## What's Next?
Now that you understand what Sandbox can do, let's dive deeper into each capability:
* **[Configuration](/v2/sandbox/configuration)**: Learn how to customize your sandbox for different use cases
* **[Process Management](/v2/sandbox/processes)**: Master running code and commands with real-time feedback
* **[File System Operations](/v2/sandbox/filesystem)**: Upload, download, and manage files inside your Sandbox
* **[Networking](/v2/sandbox/networking)**: Deploy web services and expose them to the internet
* **[Examples](/v2/sandbox/overview)**: See real-world patterns and workflows
# Process Management
Source: https://docs.beam.cloud/v2/sandbox/processes
Execute code and commands with real-time output streaming in your sandbox
The Sandbox provides process management through the `process` property. You can execute Python code, run shell commands, and manage long-running processes with real-time output streaming.
## Running Python Code
### Basic Code Execution
```python theme={null}
from beam import Sandbox, Image, PythonVersion
sandbox = Sandbox(image=Image(python_version=PythonVersion.Python311))
sb = sandbox.create()
# Run simple Python code
result = sb.process.run_code("print('Hello from sandbox!')")
print(result.result) # Hello from sandbox!
print(f"Exit code: {result.exit_code}") # 0
```
### Complex Python Scripts
```python theme={null}
# Multi-line Python code
code = """
import numpy as np
import pandas as pd
# Generate sample data
data = np.random.randn(1000, 3)
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
# Calculate statistics
stats = df.describe()
print("Data Statistics:")
print(stats)
# Save results
df.to_csv('/workspace/data.csv', index=False)
print("Data saved to /workspace/data.csv")
"""
response = sb.process.run_code(code)
print(response.result)
```
### Error Handling
```python theme={null}
# Code with errors
response = sb.process.run_code("""
import nonexistent_module
print("This won't execute")
""")
print(f"Exit code: {response.exit_code}") # Non-zero exit code
print(f"Error output: {response.result}") # Error message
```
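If you would rather fail loudly than inspect `exit_code` by hand, the check can be folded into a small helper. This is a sketch under the assumption, taken from the examples above, that the response exposes `exit_code` and `result`. The names `SandboxCodeError` and `check_response` are hypothetical, and `SimpleNamespace` stands in for a real response so the example runs locally.

```python
from types import SimpleNamespace


class SandboxCodeError(RuntimeError):
    """Raised when sandboxed code exits with a non-zero exit code."""


def check_response(response):
    """Return response.result on success; raise SandboxCodeError otherwise."""
    if response.exit_code != 0:
        raise SandboxCodeError(
            f"sandboxed code failed with exit code {response.exit_code}:\n"
            f"{response.result}"
        )
    return response.result


# Stand-in objects shaped like the responses shown above.
ok = SimpleNamespace(exit_code=0, result="Hello from sandbox!\n")
failed = SimpleNamespace(exit_code=1, result="ModuleNotFoundError: ...")

print(check_response(ok), end="")
try:
    check_response(failed)
except SandboxCodeError as err:
    print(f"caught: {err}")
```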
## Executing Commands
### Basic Command Execution
```python theme={null}
# Run a simple command
process = sb.process.exec("ls", "-la", "/workspace")
# Wait for completion
exit_code = process.wait()
print(f"Command completed with exit code: {exit_code}")
# Get all output
for line in process.logs:
    print(line, end="")
```
### Shell Commands
```python theme={null}
# Use shell features by running the command through a shell
process = sb.process.exec("sh", "-c", "echo $HOME && pwd && whoami")
# Wait and get output
process.wait()
for line in process.logs:
    print(line, end="")
```
### Working Directory
```python theme={null}
# Execute in specific directory
process = sb.process.exec("ls", "-la", cwd="/workspace")
process.wait()
# Create directory and work in it
sb.process.run_code("import os; os.makedirs('/workspace/myproject', exist_ok=True)")
process = sb.process.exec("touch", "test.txt", cwd="/workspace/myproject")
```
### Environment Variables
```python theme={null}
# Set environment variables for command
env = {
    "DATABASE_URL": "postgresql://localhost/mydb",
    "DEBUG": "true",
    "API_KEY": "secret-key",
}
# Pipes are a shell feature, so run the pipeline through `sh -c`
process = sb.process.exec("sh", "-c", "env | grep DATABASE", env=env)
process.wait()
```
## Non-blocking Execution
### Background Processes
```python theme={null}
# Start a long-running process without waiting
process = sb.process.run_code("""
import time
for i in range(10):
    print(f"Processing {i}...")
    time.sleep(1)
""", blocking=False)
print(f"Process started with PID: {process.pid}")
# Do other work while it runs
print("Process is running in background...")
# Check if still running
print(f"Exit code: {process.exit_code}") # -1 if still running
# Wait for completion when ready
process.wait()
print("Process completed!")
```
### Real-time Output Streaming
```python theme={null}
# Start process and stream output in real-time
process = sb.process.run_code("""
import time
for i in range(5):
    print(f"Step {i}: Processing...")
    time.sleep(1)
print("Done!")
""", blocking=False)
# Stream output as it happens
for line in process.logs:
    print(f"[REAL-TIME] {line}", end="")
```
## Process Control
### Process Management
```python theme={null}
# Start multiple processes
process1 = sb.process.exec("sleep", "30", blocking=False)
process2 = sb.process.exec("sleep", "60", blocking=False)
print(f"Process 1 PID: {process1.pid}")
print(f"Process 2 PID: {process2.pid}")
# List all running processes
for p in sb.process.list_processes():
    print(f"PID {p.pid}: {p.status()}")
# Kill specific process
process1.kill()
print("Process 1 killed")
# Get process by PID
specific_process = sb.process.get_process(process2.pid)
print(f"Process 2 status: {specific_process.status()}")
```
### Process Status and Monitoring
```python theme={null}
import time
# Start a process
process = sb.process.exec("sleep", "10", blocking=False)
# Monitor status
while True:
    exit_code, status = process.status()
    print(f"PID {process.pid}: Exit code {exit_code}, Status: {status}")
    if exit_code >= 0:
        print("Process completed")
        break
    time.sleep(1)
```
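The polling loop above runs until the process exits. A bounded variant can be sketched as follows, assuming (as in the example above) that `status()` returns an `(exit_code, status)` tuple and that a negative exit code means the process is still running. `wait_with_timeout` is a hypothetical helper, and `FakeProcess` is a stand-in so the example runs locally.

```python
import time


def wait_with_timeout(process, timeout_s, poll_interval_s=0.5):
    """Poll process.status() until the process exits or timeout_s elapses.

    Returns the exit code, or None if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        exit_code, _status = process.status()
        if exit_code >= 0:
            return exit_code
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval_s)


class FakeProcess:
    """Stand-in that reports 'running' (-1) twice, then exit code 0."""

    def __init__(self):
        self._codes = [-1, -1, 0]

    def status(self):
        code = self._codes.pop(0) if len(self._codes) > 1 else self._codes[0]
        return code, "fake"


print(wait_with_timeout(FakeProcess(), timeout_s=5, poll_interval_s=0.01))
```

When the helper returns `None`, the caller can decide whether to keep waiting or to stop the process with `process.kill()`.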
### Process Output Streams
```python theme={null}
# Start process with output
process = sb.process.run_code("""
import sys
print("This goes to stdout")
print("This also goes to stdout", file=sys.stdout)
print("This goes to stderr", file=sys.stderr)
""", blocking=False)
# Read stdout
print("=== STDOUT ===")
for line in process.stdout:
    print(f"STDOUT: {line}", end="")
# Read stderr
print("=== STDERR ===")
for line in process.stderr:
    print(f"STDERR: {line}", end="")
# Read combined logs
print("=== COMBINED LOGS ===")
for line in process.logs:
    print(f"LOG: {line}", end="")
```
### List Running Processes
```python theme={null}
# List running processes
processes = sb.process.list_processes()
for pid, process in processes.items():
    print(f"PID {process.pid}: {process.exit_code}")
```
# Snapshots
Source: https://docs.beam.cloud/v2/sandbox/snapshots
Snapshots let you capture the filesystem and/or memory of a Sandbox as an immutable artifact.
You can then use this artifact to create new Sandboxes with that same captured state.
Use snapshots when you want to:
* Fork Sandboxes to test different variations of code
* Initialize Sandboxes with existing state for faster cold starts
* Save a reproducible environment you can return to later
## Creating a Filesystem Snapshot
```python theme={null}
from beam import Image, Sandbox
# Create a sandbox and make some changes to the filesystem
sandbox = Sandbox(cpu=1).create()
p = sandbox.process.exec("sh", "-c", "mkdir -p /something && touch /something/file.txt")
# Read the logs
p.wait()
print(p.logs.read())
# Generate a filesystem snapshot and terminate the sandbox
image_id = sandbox.create_image_from_filesystem()
sandbox.terminate()
```
## Using Filesystem Snapshots
You can use Snapshots as a base image for any other abstraction or Sandbox on Beam, using `Image.from_id`:
```python theme={null}
from beam import Image, Sandbox
# Creates an image from a filesystem snapshot
image = Image.from_id(image_id)
sandbox = Sandbox(image=image).create()
p = sandbox.process.exec("ls", "-l", "/something")
p.wait()
print(p.logs.read())
sandbox.terminate()
```
## Creating a Memory Snapshot
You can also create a memory snapshot of a running Sandbox, which will capture the state of the sandbox's memory - including all running processes and exposed ports.
```python theme={null}
from beam import Image, Sandbox
# Create a sandbox, expose a port, and start a web server
sandbox = Sandbox(cpu=1).create()
sandbox.expose_port(8000)
p = sandbox.process.exec("python", "-c", "import http.server; http.server.HTTPServer(('', 8000), http.server.SimpleHTTPRequestHandler).serve_forever()")
# Generate a memory snapshot and terminate the sandbox
snapshot_id = sandbox.snapshot_memory()
print(sandbox.list_urls())
sandbox.terminate()
```
## Using Memory Snapshots
You can use memory snapshots as a starting point for a new Sandbox, using `Sandbox().create_from_memory_snapshot`:
```python theme={null}
from beam import Image, Sandbox
# Creates a new sandbox from a memory snapshot
sandbox = Sandbox().create_from_memory_snapshot(snapshot_id)
print(sandbox.list_urls())
sandbox.terminate()
```
# Scaling Out
Source: https://docs.beam.cloud/v2/scaling/concurrency
You can scale out your app to multiple containers by adding autoscaling.
## Scaling Horizontally (Adding More Containers)
When you deploy a Task Queue or endpoint, Beam creates a queueing system that manages each task that's created when your API is called.
You can configure how Beam scales based on the number of tasks in the queue.
### Scale by Queue Depth
Our simplest autoscaling strategy allows you to scale by the number of tasks in the queue.
This allows you to control how many tasks each container can process before scaling up. For example, you could set up an autoscaler to run 30 tasks per container. When you pass 30 tasks in your queue, we will add a container. When you pass 60, we'll add another container (up until `max_containers` is reached).
```python theme={null}
from beam import QueueDepthAutoscaler, endpoint
autoscaling_config = QueueDepthAutoscaler(
    max_containers=5,
    tasks_per_container=30,
)
@endpoint(autoscaler=autoscaling_config)
def function():
    ...
```
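To build intuition for the thresholds described above, here is a rough model of the scaling rule. It assumes a simple ceiling rule (one container per `tasks_per_container` queued tasks, capped at `max_containers`). That rule is an illustration inferred from the description, not Beam's actual scheduler logic, and `estimated_containers` is a hypothetical function.

```python
import math


def estimated_containers(queue_depth, tasks_per_container, max_containers,
                         min_containers=0):
    """Estimate container count under an assumed ceiling rule."""
    needed = math.ceil(queue_depth / tasks_per_container)
    return max(min_containers, min(needed, max_containers))


# Walk through the thresholds for tasks_per_container=30, max_containers=5.
for depth in (0, 30, 31, 60, 61, 500):
    count = estimated_containers(depth, tasks_per_container=30, max_containers=5)
    print(f"{depth:>3} queued tasks -> {count} container(s)")
```

Under this model, 31 queued tasks need a second container and 61 need a third, which matches the "pass 30, pass 60" description, while 500 queued tasks are capped at the configured maximum of 5.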
## Setting Always-On Containers
Any running containers count towards billable usage. Take care to avoid
setting `min_containers` unless you're comfortable paying for usage 24/7.
You can configure the number of containers running at baseline using the `min_containers` field.
By setting `min_containers=1`, 1 container will *always* remain running until the deployment is stopped.
```python theme={null}
from beam import endpoint, QueueDepthAutoscaler
@endpoint(
    autoscaler=QueueDepthAutoscaler(
        min_containers=1, max_containers=3, tasks_per_container=1
    ),
)
def handler():
    return {"success": "true"}
```
If you redeploy an app that has `min_containers` set, make sure to explicitly
stop the previous deployment versions in order to avoid running containers
that you are no longer using.
# Concurrent Inputs
Source: https://docs.beam.cloud/v2/scaling/concurrent-inputs
## Increasing Throughput in a Single Container
You can increase throughput for your workloads by configuring the number of workers to launch per container. For example, if you have 4 workers on 1 container, you can run 4 tasks at once.
Workers are especially useful for CPU workloads, since you can increase throughput by adding workers and additional CPU cores, rather than autoscaling to additional containers.
```python theme={null}
from beam import Image, QueueDepthAutoscaler, task_queue
@task_queue(
    cpu=4,  # 1 CPU core per worker is a good rule of thumb
    workers=4,  # Launch 4 workers per container to increase throughput
    image=Image(python_version="python3.8", python_packages=["pandas", "csaps"]),
    autoscaler=QueueDepthAutoscaler(max_containers=5, tasks_per_container=1),
)
def handler():
    import pandas as pd
    print(pd)
    import time
    time.sleep(5)
    return {"result": True}
```
Workers are always orchestrated together. When the container launches, all the workers launch.
This can result in higher throughput than using multiple containers with horizontal autoscaling.
### Worker Use-Cases
Workers allow you to increase your *per container* throughput, vertically.
Autoscaling allows you to scale the *number of containers* and increase throughput
horizontally.
Each worker will share the CPU, Memory, and GPU defined in your app. This means that you'll usually need to increase these values in order to utilize more workers.
For example, let's say your model uses 3Gi of GPU memory, and your app is deployed on a T4 GPU with 16Gi of GPU memory. You can safely deploy with 4 workers, since 4 × 3Gi = 12Gi fits within the 16Gi of GPU memory available.
It's not always possible to fit multiple workers onto a single machine. In order to use workers effectively, you'll need to know how much compute is consumed by your task.
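The sizing arithmetic from the T4 example can be written down as a quick check. This is only a sketch: `max_workers_that_fit` is a hypothetical helper, and it assumes each worker's memory use is a fixed, known number, which ignores any extra per-process overhead your framework may add.

```python
def max_workers_that_fit(total_gpu_memory_gi, per_worker_gpu_memory_gi):
    """Upper bound on workers, assuming a fixed memory cost per worker."""
    if per_worker_gpu_memory_gi <= 0:
        raise ValueError("per-worker memory must be positive")
    return int(total_gpu_memory_gi // per_worker_gpu_memory_gi)


total_gi = 16      # T4 GPU memory from the example above
per_worker_gi = 3  # memory used by one copy of the model
workers = 4

print(max_workers_that_fit(total_gi, per_worker_gi))
print(f"{workers} workers use {workers * per_worker_gi}Gi of {total_gi}Gi")
```

Choosing fewer workers than the upper bound, as the example does with 4, leaves headroom for memory spikes during inference.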
When you've added multiple workers, you'll be able to see each time a new worker is started in your logs.
# Parallelizing Functions
Source: https://docs.beam.cloud/v2/scaling/parallelizing-functions
How to parallelize your functions
## Fanning Out Workloads
You can scale out individual Python functions to many containers using the `.map()` method.
You might use this for parallelizing computationally heavy tasks, such as batch inference or data processing jobs.
```python theme={null}
from beam import function
@function(cpu=0.1)
def square(i: int):
    return i**2
def main():
    numbers = list(range(10))
    squared = []
    # Run a remote container for every item in list
    for result in square.map(numbers):
        print(result)
        squared.append(result)
if __name__ == "__main__":
    main()
```
When we run this Python module, 10 containers will be spawned to run the workload:
```bash theme={null}
$ python math-app.py
=> Building image
=> Using cached image
=> Syncing files
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Running function:
=> Function complete
=> Function complete <531e1f2c-a4f2-4edf-9cb9-6240df959815>
=> Function complete
=> Function complete <2a3dde03-20df-4805-8fb0-ec9743f2bde3>
=> Function complete <59b64517-7b4a-4260-8c65-d0fbb9b98a76>
=> Function complete
=> Function complete <1256a9ac-c035-412a-ac65-c94248f1ce99>
=> Function complete <476189dd-ba28-4646-9911-96ef8794cb58>
=> Function complete <04ef44cd-ff64-4ef2-a087-00c01ce5a2e4>
=> Function complete <104a602c-93a7-43d5-983c-071f64d91a2c>
```
## Passing Multiple Arguments
The `.map()` method can also parallelize functions that require multiple parameters. Simply pass a list of tuples, where each tuple corresponds to a set of arguments for your function.
Below is an example that counts how many prime numbers appear between a start and a stop index for each tuple in ranges:
```python theme={null}
from beam import function
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True
@function(cpu=0.1)
def count_primes_in_range(start: int, stop: int) -> int:
    """
    Returns the number of prime numbers in the range [start, stop).
    """
    return sum(is_prime(i) for i in range(start, stop))
def main():
    # Each tuple represents (start, stop)
    ranges = [
        (0, 10),
        (10, 20),
        (20, 30),
    ]
    # .map() will launch a remote container for each tuple
    for result in count_primes_in_range.map(ranges):
        print(result)
if __name__ == "__main__":
    main()
```
In this example:
1. `ranges` is a list of tuples `(start, stop)`.
2. Calling `count_primes_in_range.map(ranges)` spawns a remote execution for each tuple, passing `(start, stop)` to the function.
3. Each remote call returns the number of prime numbers in that sub-range, which we print out.
With `.map()`, Beam takes care of distributing each item (or tuple of items) to separate containers for parallel processing. This approach makes it easy to scale out CPU-heavy or data-intensive tasks with minimal code.
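For larger jobs you will usually generate the list of tuples rather than write it by hand. As a minimal sketch, the hypothetical helper `split_range` below divides a half-open interval into contiguous `(start, stop)` chunks that could be passed to `.map()`. It is plain Python and does not depend on Beam.

```python
def split_range(start, stop, chunks):
    """Split [start, stop) into `chunks` contiguous (start, stop) tuples."""
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    base, extra = divmod(stop - start, chunks)
    ranges = []
    cursor = start
    for i in range(chunks):
        # The first `extra` chunks take one additional item so sizes
        # differ by at most one.
        size = base + (1 if i < extra else 0)
        ranges.append((cursor, cursor + size))
        cursor += size
    return ranges


print(split_range(0, 30, 3))
print(split_range(0, 10, 3))
# The result could be used as: count_primes_in_range.map(split_range(0, 30, 3))
```

Because each tuple becomes one remote call, the number of chunks also determines how many containers the job fans out to.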
# Compute Pools
Source: https://docs.beam.cloud/v2/scaling/pools
Compute pools are groups of dedicated machines reserved for your workloads. Instead of running on shared, on-demand capacity, your containers are scheduled onto nodes that belong only to you.
Pools are useful when you need guaranteed capacity, consistent hardware, or want to run Beam workloads on your own machines.
## Creating a Pool
You can reserve dedicated hardware from the dashboard. Open the **Add Hardware** dialog, choose an instance type, and give your pool a name. Any nodes you reserve under the same pool name will be added to that pool.
After clicking **Reserve & install**, the nodes are provisioned and joined to your pool automatically.
Pools remain active until they are explicitly terminated. You will continue
to be billed for reserved nodes until you remove them from your account.
## Using a Pool
Once your pool is set up, route workloads to it by passing the `pool` argument to your function decorator:
```python theme={null}
from beam import function
@function(gpu="H200", pool="H200-pool")
def handler():
return {}
```
Any tasks for this function will be scheduled onto machines in the pool, rather than on-demand serverless capacity.
## Bring Your Own Hardware
You can also connect your own machines to a pool. From the **Add Hardware** dialog, choose **Bring your own hardware**, enter a pool name, and run the generated command on your machine.
Machines that run this command join the pool and appear in your fleet automatically.
# Privacy Policy
Source: https://docs.beam.cloud/v2/security/privacy-policy
**Your privacy is important to us.** This Privacy Policy document explains the
collection, use and disclosure of information that we receive through Slai. This
Privacy Policy does not apply to any third-party websites, services or
applications, even if they are accessible through our Services.
**Effective Date**: October 3, 2024
This policy describes how Smartshare Inc. (“we,” “us,” or “Company”) collects,
aggregates, stores, safeguards and uses the data and information (including
non-public personal information, or “NPI”) provided by users through our
websites, [www.beam.cloud](http://www.beam.cloud) and [www.slai.io](http://www.slai.io/) (the “Site”), as well as
information collected by us through other means, including by email, over the
phone, or in offline communications. This Site is operated by the Company and
has been created to provide information about our company, products, and
services (together, the “Services”). This policy applies to the Site, the
Services, and our mobile, tablet and other smart device applications, and
application program interfaces (collectively, "Application"). The Site,
Application and Services together are hereinafter collectively referred to as
the “Site.”
**We take your privacy and the security of your information seriously.**
This policy explains:
* What information we collect
* How we use the information we collect
* Choices you can make about the way your information is collected and used
* How we protect personal information electronically and physically
This policy is incorporated into and a material term of your registration and/or
use of Company’s products and services, including our websites,
[www.slai.io](http://www.slai.io/) and [www.beam.cloud](http://www.beam.cloud/). By using the Site, you consent to the
practices set forth in this Privacy Policy.
**INFORMATION WE COLLECT**
**Information You Provide to Us**
Company collects information from you when you choose to provide it to us
through the Site or through any other means. This may include when you create an
account on the Site, register or request products or services, request
information from us, sign up for newsletters or our email lists, use our Site,
or otherwise contact us.
The information we collect may include your name, address, email address,
telephone or mobile phone number, and information from other third-party
applications you connect to the Site. You may be required to provide certain
personal and/or business information to apply for and receive Company products
or services.
**Information We Automatically Collect**
We may use cookies or other technologies to automatically collect certain
information when you visit our Site or interact with our emails. For example, if
you are responding to an offer, promotional email or other email from us, we may
automatically populate your personal information into our system once you enter
your offer code or similar identifying device or otherwise accept your offer or
promotion. Additionally, we may automatically collect certain non-personal
information from you such as your browser type, operating system, software
version, and Internet Protocol ("IP") address. We also may collect information
about your use of the Site, including the date and time of access, the areas or
pages that you visit, the amount of time you spend using the Site, the number of
times you return, whether you open, forward, or click-through emails, and other
Site usage data.
You may adjust your browser or operating system settings to limit this tracking
or to decline cookies, but by doing so, you may not be able to use certain
features on the Site or take full advantage of all of our offerings. Check the
"Help" menu of your browser or operating system to learn how to adjust your
tracking settings or cookie preferences. Please note that our system may not
respond to Do Not Track requests or headers from some or all browsers.
**HOW WE USE INFORMATION WE COLLECT**
Company uses the data and information you provide in a manner that is consistent
with this Privacy Policy and applicable law. If you provide personal data for a
certain reason, we may use the personal data in connection with the reason for
which it was provided. For instance, if you contact us by email, we will use the
personal data you provide to answer your question or resolve your problem. Also,
if you provide personal data in order to obtain access to the Site, we will use
your personal data to provide you with access to the Site and to monitor your
use of the Site.
Company may also use your personal data and other personally non-identifiable
information collected through the Site or the provision of the Services to help
us improve the content and functionality of the Site or the Services, to better
understand our users and to improve the Site and the Services. Company and its
affiliates may use this information to contact you in the future to tell you
about services we believe will be of interest to you. If at any time you wish
not to receive any future marketing communications or you wish to have your name
deleted from our mailing lists, please contact us as indicated below.
**SHARING OF INFORMATION WE COLLECT**
Company is not in the business of selling your information. There are, however,
certain circumstances in which we may share your personal data with certain
third parties without further notice to you, as set forth below:
**Agents, Consultants and Third Party Service Providers:**
Company, like many businesses, sometimes hires other companies to perform
certain business-related functions. Examples of such functions include mailing
information, maintaining databases and processing payments. When we employ
another entity to perform a function of this nature, we only provide them with
the information that they need to perform their specific function.
**Business Transfers:**
As we develop our business, we might sell or buy businesses or assets. In the
event of a corporate sale, merger, reorganization, dissolution or similar event,
personal data may be part of the transferred assets.
**Related Companies:**
We may also share your personal data with our corporate affiliates and
subsidiaries, if any, for purposes consistent with this Privacy Policy.
**Legal Requirements:**
Company may disclose your personal data if required to do so by law or in the
good faith belief that such action is necessary to (i) comply with a legal
obligation, (ii) protect and defend the rights or property of Company, (iii) act
in urgent circumstances to protect the personal safety of users of the Site, the
Services or the public, or (iv) protect against legal liability.
**LINKS TO OTHER WEBSITES**
The Site may have links to third-party websites, which may have privacy policies
that differ from our own. We are not responsible for the practices of such
sites, nor does any such link imply that Company endorses or has reviewed the
third-party site subject to such link. We suggest contacting those sites
directly for information on their privacy policies.
**CHILDREN AND MINORS**
Company does not knowingly collect personal data from minors under the age of
18\. If you are under the age of 18, please do not submit any personal data
through the Site. We encourage parents and legal guardians to monitor their
children’s Internet usage and to help enforce our Privacy Policy by instructing
their children never to provide personal data without the parent’s permission.
If you have reason to believe that a minor under the age of 18 has provided
personal data to Company through the Site, please contact us, and we will
endeavor to delete that information from our databases.
**DATA SECURITY**
We have taken certain physical, administrative, and technical steps to safeguard
the information we collect from and about our customers and Site visitors. While
we make reasonable efforts to help ensure the integrity and security of our
network and systems, we cannot guarantee our security measures. Therefore, you
should take special care in deciding what information you send to us via email.
Please keep this in mind when disclosing any personal data to the Company via
the Internet.
**ACCESS TO YOUR PERSONAL INFORMATION**
To keep your personal data accurate, current, and complete, please contact us as
specified below. We will take reasonable steps to update or correct personal
information in our possession that you have previously submitted via the Site.
**INFORMATION FOR CALIFORNIA RESIDENTS**
This section applies to the personal information we may collect from California
residents that use the Site and our compliance with the California Consumer
Privacy Act (“CCPA”).
The CCPA provides California residents that use the Site, subject to
limitations, the right to request more details about the types or specific
pieces of personal information we collect (as described in the “Information We
Collect” section), to delete their personal information, to opt out of any
“sales” that may be occurring, and to not be discriminated against for making
requests protected by the CCPA.
California residents that use the Site may make a request pursuant to their
rights under the CCPA by contacting us at [support@slai.io](mailto:support@slai.io). Please mark your
inquiry “CCPA Request”. We will verify your request using the information you
have provided to us in use of the Site, including email address.
Government-issued identification may be required in order to process your
request. California residents that use the Site can also designate an authorized
agent to exercise these rights on their behalf.
**INFORMATION FOR EEA USERS**
If you live outside of the United States, you understand and agree that we may
transfer your information to the United States. This site is subject to U.S.
laws, which may not afford the same level of protection of those in your
country. If you do not want your information transferred to the United States,
do not use the Site.
***What Rights Do I Have?***
Individuals located in the European Economic Area (EEA) have certain rights in
respect of your NPI, including the right of access, rectification, restriction,
opposition, erasure and data portability. Where possible, we rely on user
consent as a lawful basis for processing personal data and obtain such consent
in compliance with applicable laws. In some cases, Company may process NPI
pursuant to legal obligation or to protect your vital interests or those of
another person.
***How May I Exercise My Individual Rights?***
Company users may access and update their NPI by sending an email to
[support@slai.io](mailto:support@slai.io). Users located within the EEA may contact us with questions or
requests regarding their NPI using the contact information below. Please note
that Company may request additional information from you to verify your identity
before we disclose any personal or account information. We only send marketing
communications to users we believe to be located in the EEA with your prior
consent, and you may opt out of such communications at any time by clicking the
“unsubscribe” link found within Company email updates and changing your contact
preferences. Please note, you will continue to receive essential account-related
information, even if you unsubscribe from promotional emails.
***Who Can I Contact at Company Regarding Data Protection Issues?***
Company has designated a Data Protection Officer to assist with data privacy and
data protection issues. You may contact him or her by emailing
[support@slai.io](mailto:support@slai.io) and addressing your questions or
concerns to the Data Protection Officer.
**IF YOU HAVE QUESTIONS**
If you have any questions about this Privacy Statement or the practices
described herein, you may contact us at [support@slai.io](mailto:support@slai.io).
**CHANGES TO THIS STATEMENT**
Company reserves the right to revise this Privacy Policy at any time. When we
do, we will post the change(s) on the Site. This Privacy Policy was last updated
on the date indicated above. Your continued use of the Site after any changes or
revisions to this Privacy Policy shall indicate your agreement with the terms of
such revised Privacy Policy.
# Terms and Conditions
Source: https://docs.beam.cloud/v2/security/terms-and-conditions
These are the terms the Beam Platform is provided under.
**Date last updated**: January 23, 2025
**IMPORTANT – CAREFULLY READ ALL THE TERMS AND CONDITIONS OF THIS BEAM TERMS OF
SERVICE AND THE BEAM [PRIVACY POLICY](/v2/security/privacy-policy), WHICH IS
INCORPORATED BY REFERENCE, (COLLECTIVELY THE "AGREEMENT"). BY CREATING AN
ACCOUNT TO USE THE BEAM PLATFORM AS A SERVICE ("SERVICE"),
CLICKING "I ACCEPT", OR PROCEEDING WITH THE USE OF THE SERVICE, YOU, INDIVIDUALLY,
AND/OR YOU AS AN AUTHORIZED REPRESENTATIVE OF YOUR COMPANY ON WHOSE BEHALF YOU
USE THE SERVICE ("YOU"), ARE INDICATING THAT YOU HAVE READ, UNDERSTOOD AND
ACCEPT THIS AGREEMENT WITH SMARTSHARE, INC., A DELAWARE CORPORATION ("Beam"),
AND THAT YOU AGREE TO BE BOUND BY THE TERMS. YOU AGREE THAT YOU WILL (A) INFORM
ANY EMPLOYEES OR CONTRACTORS AT YOUR COMPANY OF THE POLICIES AND PRACTICES THAT
ARE RELEVANT TO THEIR USE OF THE SERVICES AND OF ANY SETTINGS THAT MAY IMPACT
THE PROCESSING OF THEIR DATA; AND (B) ENSURE THE TRANSFER AND PROCESSING OF ANY
SUCH EMPLOYEE'S OR CONTRACTOR'S DATA UNDER THIS AGREEMENT IS LAWFUL. IF YOU DO
NOT AGREE WITH ALL OF THE TERMS OF THIS AGREEMENT, YOU MAY NOT USE THE SERVICE.
THE EFFECTIVE DATE OF THIS AGREEMENT SHALL BE THE DATE THAT YOU REGISTER TO USE
THE SERVICE**
**PLEASE NOTE: THESE TERMS OF SERVICE CONTAIN AN ARBITRATION CLAUSE AND CLASS
ACTION WAIVER THAT APPLIES TO ALL USERS. If You reside in the United States,
this provision applies to all disputes with Beam. If You reside outside of the
United States, this provision applies to any action You bring against Beam in
the United States. It affects how disputes with Beam are resolved. By accepting
these Terms of Service, You agree to be bound by the arbitration clause and
class action waiver. Please read it carefully.**
## SERVICE
1. Provision of Service. Beam grants You the right to access and use the Service
in accordance with this Agreement and Your applicable subscription
("Subscription") indicated on the order form and/or online checkout. You will
comply with all user documentation and all laws, rules, and regulations
applicable to the use of Service.
2. Restrictions on use of the Service. You may not: (i) modify, alter, tamper
with, repair, or otherwise create derivative works of the Service; (ii)
reverse engineer, disassemble, or decompile the Service or apply any other
process or procedure to derive the source code of the Service; (iii) access
or use the Service in a way intended to avoid incurring fees or exceeding
usage limits or quotas; (iv) rent, transfer, resell, or sublicense the
Service; (v) attempt to disable or circumvent any security, billing, or
monitoring mechanisms used by the Service; (vi) use the Service to perform a
malicious activity; (vii) upload or otherwise process any malicious content
to or through the Service; or (viii) benchmark or perform competitive
analysis on the Service. The specific Subscription You select may have
limitations as outlined in the applicable Subscription order form and/or
online checkout.
3. Updates to the Service. Beam may from time to time make updates to the
Service as it deems reasonably necessary, and this Agreement shall apply to
such updated Service. Your continued use of the updated Service indicates
Your acceptance of the updated Service.
4. Use of the Services may require the use of certain third party products and
services ("**Third Party Services**"). Use of any Third Party Services is at
your sole risk and will be governed by separate terms and conditions,
separate privacy policies relating to usage of data you may share through the
Third Party Services in the course of using the Services, other applicable
policies, and may include separate fees and charges. Beam may display content
from third parties through the Services or may provide information about or
links to Third Party Services. Your interactions with any such third parties,
and any terms, conditions, warranties, or representations associated with
such interactions, are solely between you and the applicable third parties.
Beam is not responsible or liable for any loss or damage of any sort incurred
as the result of any such interactions or as the result of the presence of
such third-party information made available through the Services.
## REGISTRATION; SUBSCRIPTION AND FEES
1. Registration. To register to use the Service, You must provide Beam with the
information requested in the registration process, including Your name and
work email address. You are responsible for all activities that occur under
Your account; Beam and Beam's affiliates are not responsible for unauthorized
access to Your account. You will contact Beam immediately if You believe an
unauthorized third party may be using Your account or if Your account
information is lost or stolen. You will provide complete and accurate
information during the registration process and will update it to ensure it
remains accurate.
2. Some parts of the Services are billed on a Subscription basis. You will be
billed on a recurring and periodic basis ("Billing Cycle") with payment terms
as set forth on the applicable order form and/or online checkout. Billing
cycles are set on either a calendar month or an annual basis, depending on the
type of Subscription plan You select when purchasing a Subscription. At the
end of each Billing Cycle, Your Subscription will automatically renew for
additional successive periods of equal duration to the initial Subscription
term unless You cancel it before the end of the then current Subscription
period. If a free trial period applies to You, Your Subscription will be
charged upon the expiration of any applicable free trial period.
Subscriptions canceled prior to the expiration of any trial period will not
be charged. You may cancel Your Subscription renewal by contacting the Beam
customer support team at [support@slai.io](mailto:support@slai.io), or
through the account management portal where applicable.
3. A valid payment method is required to process the payment for Your
Subscription. You shall provide Beam with accurate and complete billing
information including full name, address, state, zip code, telephone number,
and valid payment method information. By submitting such payment
information, You automatically authorize Beam to charge all Subscription fees
incurred through Your account to any such payment instruments. Should
automatic billing fail to occur for any reason, Beam will issue an electronic
invoice indicating that You must proceed manually, within a certain deadline
date, with the full payment corresponding to the billing period as indicated
on the invoice.
4. Beam, in its sole discretion and at any time, may modify the Subscription
fees for the Subscriptions. Any Subscription fee change will become effective
at the end of the then-current Billing Cycle. Beam will provide You with a
reasonable prior notice of any change in Subscription fees to give You an
opportunity to terminate Your Subscription before such change becomes
effective. Your continued use of the Services after the Subscription fee
change comes into effect constitutes Your agreement to pay the modified
Subscription fee amount.
5. Unless otherwise agreed to in the applicable order form, all fees are payable
in the currency of the United States through our payment processor
("Stripe"). You will be responsible for all taxes resulting from the
performance of the Service other than taxes on Beam's income. If all or any
part of any payment owed to Beam under this Agreement is withheld, based upon
a claim that such withholding is required pursuant to the tax laws of any
country or its political subdivisions and/or any tax treaty between the U.S.
and any such country, such payment shall be increased by the amount necessary
to result in a net payment to Beam of the amounts otherwise payable under
this Agreement. All fees paid or payable under this Agreement are
non-refundable and Subscriptions are non-cancelable during the Subscription
term. Beam may change its fees and payment terms at its discretion.
6. Payments through Stripe. In order to make payments to Beam, You may be
required to provide Your credit card details to Stripe. Payment processing
services by Stripe are subject to the Stripe Security Policy, found
[here](https://stripe.com/docs/security/stripe), and the Stripe Privacy
Policy, found [here](https://stripe.com/privacy), which Stripe may update
from time to time. As a condition of Beam enabling payment processing
services through Stripe, You agree to provide Beam accurate and complete
information about You and Your business, and You authorize Beam to share it
and transaction information (exclusive of any credit or debit card numbers,
details or associated passwords) related to Your use of the payment
processing services provided by Stripe.
7. Communications. You expressly agree that Beam, or its payment processor, is
permitted to bill You any applicable fees, any applicable tax and any other
charges You may incur with Beam in connection with Your use of the Service.
The fees will be billed to the credit card or other payment account You
provide in accordance with the billing terms in effect at the time the fees
are due and payable. You acknowledge and agree that Beam will automatically
charge Your credit card or other payment account on record with Beam. If
payment is not received or cannot be charged to Your credit card account for
any reason, Beam reserves the right to either suspend or terminate Your
access to the Service and terminate this Agreement. By using the Service, You
consent to receiving electronic communications from Beam. These electronic
communications may include notices about applicable fees and charges related
to the Service and transactional or other information concerning or related
to the Service. These electronic communications are part of Your relationship
with Beam and You receive them as part of Your use of the Service. You agree
that any notices, agreements, disclosures or other communications that we
send You electronically will satisfy any legal communication requirements,
including that such communications be in writing.
8. Acceptable Use. In addition to the prohibitions set forth in Section 1(b)
above, You agree not to, and not to allow third parties to use the Service:
to violate, or encourage the violation of, the legal rights of others (for
example, infringing or misappropriating the intellectual property rights of
others in violation of the Digital Millennium Copyright Act); to engage in,
promote or encourage illegal activity; for any unlawful, invasive,
infringing, defamatory or fraudulent purpose (for example, this may include
phishing, creating a pyramid scheme or mirroring a website); to intentionally
distribute viruses, worms, Trojan horses, corrupted files, hoaxes, or other
items of a destructive or deceptive nature; to interfere with the use of the
Service, or the equipment used to provide the Service, by customers,
authorized resellers, or other authorized users; to disable, interfere with
or circumvent any aspect of the Service; to generate, distribute, publish or
facilitate unsolicited mass email, promotions, advertisements or other
solicitations ("spam"); or to use the Service, or any interfaces provided
with the Service in a manner that violates the terms of this Agreement. If
You become aware of any use or content that is in violation of the foregoing
Acceptable Use restrictions, You agree to promptly remedy such use or
content. If You fail to do so, Beam or its providers may suspend or disable
access to the Service (including Your Data) until You comply.
## INTELLECTUAL PROPERTY RIGHTS AND OWNERSHIP
1. Beam Rights. This Agreement does not transfer any right, title or interest
in any intellectual property right to each other, except as expressly set
forth in this Agreement. Beam owns all rights, title and interest in and to
the Service. There are no implied rights. Beam reserves all rights not
expressly granted herein.
2. We welcome and encourage You to provide feedback, comments and suggestions
for improvements to the Service ("Feedback"). You may submit Feedback by
emailing us through the Contact section of the website, or by other means of
communication. Any Feedback You submit to us will be considered
non-confidential and non-proprietary to You. By submitting Feedback to us,
You grant us a non-exclusive, worldwide, royalty-free, irrevocable,
sub-licensable, perpetual license to use and publish those ideas and
materials for any purpose, without compensation to You and without the
obligation to identify You.
3. Your Rights in Your Data. You represent and warrant to Beam that: (1) You or
Your licensors own all right, title, and interest in and to any and all
permitted electronic data uploaded and stored by You in the Service ("Your
Data"); (2) You have all rights in Your Data necessary to grant the rights
contemplated by this Agreement; and (3) none of Your Data violates this
Agreement, any applicable law or regulation or any third party's
intellectual property or other right. For the avoidance of doubt, as between
Beam and You, You will retain all right, title and interest in all Your Data
and to all models and analyses created by You or Your authorized personnel
using the Services.
## YOUR DATA
1. You are solely responsible for the development, content, operation,
maintenance, and use of Your Data. You will ensure that Your Data, and Your
use of it, complies with this Agreement and any applicable laws and
regulations. You are responsible for properly configuring and using the
Service and taking Your own steps to maintain appropriate security,
protection and backup of Your Data. You hereby consent that Beam may use Your
Data, the queries and models You submit to the Service, and metadata about
Your usage of the Service to measure and improve the Service and support Your
usage of the Service. If You include any data about any individual in Your
use of the Service, (1) Beam will hold and store Your Data on Your behalf,
and You are the data controller of such data; (2) Beam will process personal
data in compliance with this Section, Your instructions and in accordance
with Beam's privacy policy; (3) You agree to follow all applicable
instructions to parameterize Your Data as set forth in the
Beam [documentation](https://docs.slai.io) ("Documentation"); and (4) You
warrant that: (a) Your instructions to Beam comply with applicable privacy
and data protection laws and regulations, (b) You have all appropriate
consents and an appropriate lawful basis to provide the data to the Service,
and (c) You have provided proper privacy notifications to individuals as
required by applicable laws and regulations. If You are located in the
European Union or will transmit any of Your Data that includes personal data
regarding a resident of the European Union, You may contact us at [dpa@slai.io](mailto:dpa@slai.io)
to request a data processing addendum that is pre-signed by Beam and You
agree that under this Agreement, Beam is merely a data processor. Beam will
use commercially reasonable efforts designed to prevent the unauthorized
disclosure or destruction of Your Data stored with Beam in accordance with
our [Security Policy](/v2/security/terms-and-conditions).
2. HIPAA Data. You agree not to upload to any Service any HIPAA data unless You
have entered into a BAA with Beam. Unless a BAA is in place, Beam will have no
liability under this Agreement for HIPAA data, notwithstanding anything to
the contrary in this Agreement or in HIPAA or any similar federal or state
laws, rules or regulations. If You are permitted to submit HIPAA data to a
Service, then You may submit HIPAA data to Beam and/or the Service only by
uploading it as Customer Data. Upon mutual execution of the BAA, the BAA is
incorporated by reference into this Agreement and is subject to its terms.
## CONFIDENTIAL INFORMATION
"Confidential Information" means any proprietary information that is marked
"confidential" or "proprietary" or any other similar term or in relation to
which its confidentiality should by its nature be inferred or, if disclosed
orally, is identified as being confidential at the time of disclosure and,
within two (2) weeks thereafter, is summarized, appropriately labeled and
provided in tangible form, received by the other party during, or prior to,
entering into this Agreement including, without limitation, the Service and any
non-public technical and business information. Confidential Information does not
include information that (i) is or becomes generally known to the public through
no fault of or breach of this Agreement by the receiving party; (ii) is
rightfully known by the receiving party at the time of disclosure without an
obligation of confidentiality; (iii) is independently developed by the receiving
party without the use of the disclosing party's Confidential Information; or
(iv) the receiving party rightfully obtains from a third party without
restriction on use or disclosure. You and Beam will maintain the confidentiality
of Confidential Information. The receiving party of any Confidential Information
of the other party agrees not to use such Confidential Information for any
purpose except as necessary to fulfill its obligations and exercise its rights
under this Agreement. The receiving party shall protect the secrecy of and
prevent disclosure and unauthorized use of the disclosing party's Confidential
Information using the same degree of care that it takes to protect its own
confidential information and in no event shall use less than reasonable care.
The receiving party may disclose the Confidential Information of the disclosing
party if required by judicial or administrative process, provided that the
receiving party first provides to the disclosing party prompt notice of such
required disclosure (to the extent allowed) to enable the disclosing party to
seek a protective order. Upon termination or expiration of this Agreement, the
receiving party will destroy (and provide written certification of such
destruction) the disclosing party's Confidential Information.
### TERM; TERMINATION
1. Term; Termination. The term of this Agreement commences when You accept this
Agreement (such as by creating an account or proceeding with the use of the
Service) and will remain in effect until terminated in accordance with this
Agreement. You may terminate this Agreement at any time by canceling Your
account by contacting us at [support@slai.io](mailto:support@slai.io) or
through the account management portal where applicable. Beam may terminate
this Agreement at any time on thirty (30) days advance notice. Beam may also
terminate Your account and this Agreement, or suspend Your account,
immediately if (i) Beam changes the way Beam provides or discontinues the
Service; (ii) Your account was suspended under Section 7 of this Agreement
and You have not remediated the reason for the suspension; or (iii) Beam
determines that: (1) Your use of the Service poses a security risk to the
Service or any third party; (2) Your use of the Service may adversely impact
other users of the Service; (3) Your use of the Service may subject Beam,
Beam's affiliates, or any third party to liability; (4) Your use of the
Service may be fraudulent; (5) You are in breach of this Agreement; or (6)
You have ceased to operate in the ordinary course, made an assignment for the
benefit of creditors or similar disposition of Your assets, or become the
subject of any bankruptcy, reorganization, liquidation, dissolution or
similar proceeding.
2. Effect of Termination. Upon termination of this Agreement (i) all Your rights
under this Agreement immediately terminate and You must cease using the
Service, (ii) You are solely responsible for deleting or retrieving Your Data
from the Service prior to termination for any reason, and (iii) You must pay
all unpaid fees to Beam. If either party terminates Your account or this
Agreement, Beam will provide You with a reasonable opportunity to retrieve
Your Data from the Service, if You so request. Such a request must be sent by
email to Beam at [support@slai.io](mailto:support@slai.io) within seven (7)
days after You receive notice regarding the termination. In any event, Your
Data will be deleted from the Service no earlier than thirty (30) days after
the termination notice regarding Your account has been sent to You.
3. You understand and agree that Beam may change, suspend or discontinue any
part of the Service and the Service as a whole. Beam will notify You of any
material change to or discontinuation of the Service by email or via Beam's
website. If Beam discontinues the Service (excluding for Your breach), You
will receive a pro-rata refund for any pre-paid but unused fees.
### SUSPENSION
Without limiting other available remedies included in this Agreement or
otherwise, Beam may suspend Your access to the Service if You are in
non-compliance with this Agreement.
### WARRANTY AND WARRANTY DISCLAIMER
1. Beam warrants that the Service will materially conform to the
specifications set forth in the applicable Documentation for the duration
of Your Subscription term. If Beam is unable to correct any reported
non-conformity with this warranty, Beam may terminate the applicable
Subscription and as Your sole remedy, You will be entitled to receive a
pro-rata refund of any prepaid but unused Subscription fees. This warranty
will not apply if the error or non-conformance was caused by misuse of the
Service, or third-party hardware, software, or services used in connection
with the Service.
2. You should regularly back up Your Data while using the Service. Beam
PROVIDES THE SERVICE ON AN "AS IS" BASIS. Beam DOES NOT MAKE ANY
WARRANTIES REGARDING THE PERFORMANCE OF THE SERVICE OR UPTIME OF THE
SERVICE, OR THAT YOUR USE OF THE SERVICES WILL BE SECURE, UNINTERRUPTED OR
ERROR FREE, WHETHER EXPRESS, IMPLIED, STATUTORY OR OTHERWISE. Beam
EXPRESSLY DISCLAIMS ALL WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE
IMPLIED WARRANTIES OF NON-INFRINGEMENT OF THIRD PARTY RIGHTS, TITLE,
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. Beam HAS NO
RESPONSIBILITY FOR LOSS OF YOUR DATA OR INABILITY TO USE THE SERVICE FOR
ANY REASONS, INCLUDING, WITHOUT LIMITATION, IF DUE TO THE ACTS OR
OMISSIONS OF ITS THIRD PARTY HOSTING PROVIDERS.
### LIMITATION OF LIABILITY.
NEITHER SLAI, ITS AFFILIATES OR THEIR LICENSORS ARE LIABLE FOR SPECIAL,
INCIDENTAL, CONSEQUENTIAL OR INDIRECT DAMAGES, INCLUDING WITHOUT LIMITATION,
LOST PROFITS, LOST SAVINGS, OR DAMAGES ARISING FROM LOSS OF USE, LOSS OF
QUERIES, CONTENT OR DATA OR ANY ACTUAL OR ANTICIPATED DAMAGES, REGARDLESS OF THE
LEGAL THEORY ON WHICH SUCH DAMAGES MAY BE BASED, AND EVEN IF BEAM HAS BEEN
ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. SLAI’S AND SLAI’S AFFILIATES' AND
LICENSORS' AGGREGATE LIABILITY FOR ANY PERMITTED DIRECT DAMAGES UNDER THIS
AGREEMENT WILL BE LIMITED TO THE GREATER OF (i) THE AMOUNT OF ONE HUNDRED
DOLLARS; OR (ii) THE FEES THAT YOU HAVE ACTUALLY PAID OR PAYABLE TO BEAM FOR THE
RELEVANT SERVICES WITHIN THE SIX (6) MONTH PERIOD IMMEDIATELY PRECEDING THE
EVENT GIVING RISE TO THE CLAIM FOR DAMAGES. SECTION 9 ON LIMITATION OF LIABILITY
AND SECTION 8 ABOVE ON WARRANTY DISCLAIMER FAIRLY ALLOCATE THE RISKS IN THIS
AGREEMENT. THIS ALLOCATION IS AN ESSENTIAL ELEMENT OF THE BASIS OF THE BARGAIN
BETWEEN THE PARTIES AND THAT THE LIMITATIONS SPECIFIED IN THIS SECTION 9 SHALL
APPLY NOTWITHSTANDING ANY FAILURE OF THE ESSENTIAL PURPOSE OF THESE TERMS OR ANY
LIMITED REMEDY HEREUNDER.
### INDEMNIFICATION
You will, at Beam's option, defend, indemnify, and hold Beam, Beam's affiliates
and licensors, and each of their respective employees, officers, directors, and
representatives harmless from and against any claims, damages, losses,
liabilities, costs, and expenses (including reasonable legal fees) arising out
of or relating to any third party claim concerning: (a) breach of this Agreement
or violation of applicable law or regulation by You; (b) Your Data or the
combination of Your Data with other applications, content or processes,
including any claim involving alleged infringement or misappropriation of
third-party rights by Your Data or by the use, development, design, production,
advertising or marketing of Your Data; or (c) the use of the Services. Beam will
promptly notify You of any claim subject to this Section, but Beam's failure to
promptly notify You will only affect Your obligations to the extent that Beam's
failure prejudices Your ability to defend the claim. You may: (a) use counsel of
Your own choosing (subject to Beam's written consent) to defend against any
claim; and (b) settle the claim as You deem appropriate, provided that You
obtain Beam's prior written consent before entering into any settlement.
### GENERAL
1. 1. Miscellaneous. Beam and You are independent contractors, and neither
party, nor any of their respective affiliates, is an agent of the other
for any purpose or has the authority to bind the other. This Agreement
does not create any third party beneficiary rights in any individual or
entity that is not a party to this Agreement. You may not assign this
Agreement, or delegate or sublicense any of Your rights under this
Agreement, without Beam's prior written consent. Beam may without
restriction, assign, transfer or delegate this Agreement and any rights
and obligations hereunder, at its sole discretion, with 30 days prior
notice. Your right to terminate this Agreement at any time remains
unaffected. A party's failure to enforce any provision of this Agreement
will not constitute a present or future waiver of such provision nor
limit that party's right to enforce such provision at a later time. If
any portion of this Agreement is held to be invalid or unenforceable,
the remaining portions of this Agreement will remain in full force and
effect. In any action or proceeding to enforce rights under this
Agreement, the prevailing party shall be entitled to recover costs and
attorneys' fees.
2. Entire Agreement. This Agreement is the entire
agreement between You and Beam regarding the subject matter of this
Agreement. This Agreement supersedes all prior or contemporaneous
representations, understandings, agreements, or communications between
You and Beam, whether written or verbal, regarding the subject matter of
this Agreement.
3. Notice. All communications and notices to be made or
given pursuant to this Agreement must be in English. Beam may provide
any notice to You under this Agreement by posting a notice in the
Service or sending a message to the email address associated with Your
account. You will be deemed to have received any email sent to the email
address then associated with Your account when Beam sends the email,
whether or not You actually receive the email. To give Beam notice under
this Agreement, You must (1) email Beam at [legal@slai.io](mailto:legal@slai.io), or (2) send
Beam Your notice by certified mail, return receipt requested, to Beam at
1 Broadway 14th Floor, Cambridge MA 02142, Attn: Smartshare.
2. Dispute Resolution and Arbitration Agreement and Choice of Law and
Jurisdiction
3. This Dispute Resolution and Arbitration Agreement shall apply if You (i)
reside in the United States; or (ii) do not reside in the United States,
but bring any claim against Beam in the United States.
4. AGREEMENT TO ARBITRATE. ANY CONTROVERSY OR CLAIM ARISING OUT OF OR
RELATING TO THIS AGREEMENT, OR THE BREACH THEREOF, SHALL BE SETTLED BY
ARBITRATION ADMINISTERED BY THE AMERICAN ARBITRATION ASSOCIATION IN
ACCORDANCE WITH ITS COMMERCIAL ARBITRATION RULES, AND JUDGMENT ON THE
AWARD RENDERED BY THE ARBITRATOR MAY BE ENTERED IN ANY COURT HAVING
JURISDICTION THEREOF. IF THERE IS A DISPUTE ABOUT WHETHER THIS
ARBITRATION AGREEMENT CAN BE ENFORCED OR APPLIES TO OUR DISPUTE, YOU AND
BEAM AGREE THAT THE ARBITRATOR WILL DECIDE THAT ISSUE.
5. 1. Pre-Arbitration Dispute Resolution and Notification. Prior to
initiating an arbitration, You and Beam each agree to notify the
other party of the dispute and attempt to negotiate an informal
resolution to it first. We will contact You at the email address You
have provided to us; You can contact Beam's customer service team by
emailing us at the contact addresses provided on the Site. If after a
good faith effort to negotiate, one of us feels the dispute has not
and cannot be resolved informally, the party intending to pursue
arbitration agrees to notify the other party via email prior to
initiating the arbitration. In order to initiate arbitration, a claim
must be filed with the AAA and the written Demand for Arbitration
(available at
[www.adr.org](http://www.adr.org/cs/idcplg?IdcService=GET%5FFILE\&dDocName=ADRSTAGE2034889\&RevisionSelectionMethod=LatestReleased))
provided to the other party, as specified in the AAA Rules.
2. Exceptions to Arbitration Agreement. You and Beam each agree that the
following claims are exceptions to the Arbitration Agreement and will
be brought in a judicial proceeding in a court of competent
jurisdiction: (i) Any claim related to actual or threatened
infringement, misappropriation or violation of a party's copyrights,
trademarks, trade secrets, patents, or other intellectual property
rights; (ii) Any claim seeking emergency injunctive relief based on
exigent circumstances (e.g., imminent danger or commission of a
crime, hacking, cyber-attack).
3. Arbitration Rules and Governing Law. This Arbitration Agreement
evidences a transaction in interstate commerce and thus the Federal
Arbitration Act governs the interpretation and enforcement of this
provision. The arbitration will be administered by AAA in accordance
with the Commercial Arbitration Rules (the "AAA Rules") then in
effect, except as modified here. The AAA Rules are available at
[www.adr.org](http://www.adr.org/) or by calling the AAA at
1–800–778–7879.
4. Modification to AAA Rules - Arbitration Hearing/Location. You agree
that any required arbitration hearing will be conducted in the
English language by one (1) mutually agreed upon arbitrator, at
Beam's sole and complete discretion, (a) in Delaware or in any other
location to which You and Beam both agree; (b) via phone or video
conference; or (c) for any claim or counterclaim under \$25,000, by
solely the submission of documents to the arbitrator.
5. **JURY TRIAL WAIVER. YOU AND BEAM ACKNOWLEDGE AND AGREE THAT WE ARE
EACH WAIVING THE RIGHT TO A TRIAL BY JURY AS TO ALL ARBITRABLE
DISPUTES.**
6. **NO CLASS ACTIONS OR REPRESENTATIVE PROCEEDINGS. YOU AND SLAI
ACKNOWLEDGE AND AGREE THAT WE ARE EACH WAIVING THE RIGHT TO
PARTICIPATE AS A PLAINTIFF OR CLASS USER IN ANY PURPORTED CLASS
ACTION LAWSUIT, CLASS-WIDE ARBITRATION, PRIVATE ATTORNEY-GENERAL
ACTION, OR ANY OTHER REPRESENTATIVE PROCEEDING AS TO ALL DISPUTES.
FURTHER, UNLESS YOU AND BEAM BOTH OTHERWISE AGREE IN WRITING, THE
ARBITRATOR MAY NOT CONSOLIDATE MORE THAN ONE PARTY'S CLAIMS AND MAY
NOT OTHERWISE PRESIDE OVER ANY FORM OF ANY CLASS OR REPRESENTATIVE
PROCEEDING. IF THIS PARAGRAPH IS HELD UNENFORCEABLE WITH RESPECT TO
ANY DISPUTE, THEN THE ENTIRETY OF THE ARBITRATION AGREEMENT WILL BE
DEEMED VOID WITH RESPECT TO SUCH DISPUTE.**
7. Severability. Except as provided in the immediately preceding
paragraph, in the event that any portion of this Arbitration
Agreement is deemed illegal or unenforceable, such provision shall be
severed and the remainder of the Arbitration Agreement shall be given
full force and effect.
8. Changes. Notwithstanding the provisions of Section 3 ("Modification
of These Terms"), if Beam changes this Section ("Dispute Resolution
and Arbitration Agreement") after the date You last accepted these
Terms (or accepted any subsequent changes to these Terms), You may
reject any such change by sending us written notice (including by
email) within thirty (30) days of the date such change became
effective. By rejecting any change, You are agreeing that You will
arbitrate any Dispute between You and Beam in accordance with the
provisions of the "Dispute Resolution and Arbitration Agreement"
section as of the date You last accepted these Terms (or accepted any
subsequent changes to these Terms).
9. Choice of Law; Jurisdiction. If You reside in the United States,
these Terms will be interpreted in accordance with the laws of the
State of Delaware and the United States of America, without regard to
conflict-of-law provisions. Judicial proceedings (other than small
claims actions) that are excluded from the Arbitration Agreement
above must be brought in state or federal court in Delaware, unless
we both agree to some other location. You and we both consent to
venue and personal jurisdiction in Delaware.
2. Survival. All provisions of this Agreement which by their nature should
survive termination shall survive termination, including, without
limitation, accrued payment obligations, ownership provisions, warranty
disclaimers, indemnity, limitations of liability and dispute resolution.
3. 1. Force Majeure. Beam is not liable for any delay or failure to perform any
obligation under this Agreement where the delay or failure results from
any cause beyond Beam's reasonable control, including acts of God, labor
disputes or other industrial disturbances, systemic electrical,
telecommunications, or other utility failures, earthquake, storms or
other elements of nature, blockages, embargoes, riots, acts or orders of
government, acts of terrorism, or war.
2. Government Licensees. The Service is a commercial computer software
program developed solely at private expense. As defined in U.S. Federal
Acquisition Regulations (FAR) section 2.101 and U.S. Defense Federal
Acquisition Regulations (DFAR) sections 252.227-7014(a)(1) and
252.227-7014(a)(5) (or otherwise as applicable to You), the Service
licensed in this Agreement is deemed to be "commercial items" and
"commercial computer software" and "commercial computer software
documentation." Consistent with FAR section 12.212 and DFAR section
227.7202, (or such other similar provisions as may be applicable to You),
any use, modification, reproduction, release, performance, display, or
disclosure of such service commercial item, or service commercial
computer software, or service commercial documentation by the U.S.
government (or any agency or contractor thereof) shall be governed solely
by the terms of this Agreement and shall be prohibited except to the
extent expressly permitted by the terms of this Agreement.
3. Changes to the Terms. Beam reserves the right to modify this Agreement at
any time in accordance with this provision. If we make changes to this
Agreement, we will post this Agreement on the Beam website. If You
disagree with the revised Agreement, You may terminate this Agreement
with immediate effect by following the procedure described in the "Term
and Termination" section. If You do not terminate Your Agreement before
the date the revised Agreement becomes effective, Your continued access
to or use of the Services will constitute acceptance of the revised
Agreement.
# Amazon Web Services
Source: https://docs.beam.cloud/v2/self-hosting/aws
Learn how to deploy Beam OSS (Beta9) to Amazon EKS.
## Prerequisites
* Amazon EKS
* Karpenter
* Helm and kubectl
* Beta9 CLI
## Dependencies
Beta9 uses an S3-compatible object storage system for its file system. In this example, we'll deploy localstack.
Without a Localstack license, its data is temporary. If its pod is deleted, the data will be lost. We recommend that you use AWS S3 or something similar.
```sh theme={null}
helm repo add localstack https://localstack.github.io/helm-charts
helm install localstack localstack/localstack --values=- < Welcome to Beta9! Let's get started
,#@@&&&&&&&&&@&/
@&&&&&&&&&&&&&&&&&&&&@#
*@&&&&&&&&&&&&&&&&&&&&&@/
## /&&&&&&&&&&&&&@&&&&&&&&@,
@&&&&&. (&&&&&&@/ &&&&&&&&&&/
&&&&&&&&&@* %&@. @& ,@&&&&&&&,
.@&&&&&&&&&&& &&* ,@&&&&&&&&
*&&&&&&&&&&&@, %&@/@&* @&&&&&&&&@
.@&&&&&&&&&* *&@ .@&&&&&&&&&&
%&&&&&&&& /@@* .@&&&&&&&&&&@,
&&&&&&&/.#@&&. .&&& %&&&&&@,
/&&&&&&&@%*,,*#@&&( ,@&&
/&&&&&&&&&&&&&&,
#@&&&&&&&&&&,
,(&@@&&&,
Context Name [default]:
Gateway Host [0.0.0.0]:
Gateway Port [1993]:
Token:
Added new context!
```
Confirm the config was created and has a token set.
```sh theme={null}
cat ~/.beta9/config.ini
[default]
token =
gateway_host = localhost
gateway_port = 1993
```
## Setting Configuration Values
Set up your [config file](https://github.com/beam-cloud/beta9/blob/main/pkg/common/config.default.yaml). You will need to set a few values in it and create a secret in your cluster, under the `beta9` namespace.
### Recommended Settings
```yaml theme={null}
gateway:
  externalURL: https://app.example.com
imageService:
  registryStore: s3
  registryCredentialProvider: aws
  registries:
    s3:
      bucketName:
      region:
      # keys not needed if using iam with k8s service account (irsa)
      accessKey:
      secretKey:
  runner:
    baseImageTag: 0.1.10
    baseImageName: beta9-runner
    baseImageRegistry: public.ecr.aws/n4e0e1y0
worker:
  imageTag: 0.1.143
  imageName: beta9-worker
  imageRegistry: public.ecr.aws/n4e0e1y0
  serviceAccountName:
storage:
  mode: juicefs
  juicefs:
    awsS3Bucket:
    # keys not needed if using iam with k8s service account (irsa)
    awsAccessKey:
    awsSecretKey:
```
## Mounting Secrets
Once you've configured the config and created a secret in K8s, you'll need to do two more things:
1. Mount the secret to the gateway by modifying the persistence value in the `values.yaml` file.
2. Add an env var to the gateway called `CONFIG_PATH` that points to where you are mounting the secret.
## IAM Policies
To access the S3 bucket referenced in your config/secret, you'll also need to set up an IAM role that a K8s service account can authenticate with.
This is called [EKS IRSA](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html). Once you have set this up, you'll need to add an annotation to the K8s service account that points to the IAM role.
Here is an example in the `values.yaml` file:
```yaml theme={null}
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam:::role/beta9-role
  name: beta9-role
```
We recommend saving secrets with the [External Secrets Operator](https://github.com/external-secrets/external-secrets), but you can also create secrets manually in the cluster.
To create a secret manually, create your secrets file on disk and run `kubectl apply` like you would normally.
## Gotchas
* Make sure your ingress supports gRPC and HTTP.
* Your IAM permissions need to be set correctly. You will need to create S3 buckets manually or in Terraform.
* If you are using [Karpenter](https://karpenter.sh/) for your autoscaler, you'll need to add a label to the nodes which you want the Beta9 scheduler to pick up.
# Local Machine
Source: https://docs.beam.cloud/v2/self-hosting/local-machine
Learn how to deploy Beam OSS (Beta9) to your local machine.
## Prerequisites
* Kubernetes
* Helm and kubectl
* Beta9 CLI
## Dependencies
Beta9 uses an S3-compatible object storage system for its file system. In this example, we'll deploy localstack.
Without a Localstack license, its data is temporary. If its pod is deleted, the data will be lost. We recommend that you use AWS S3 or something similar.
```sh theme={null}
helm repo add localstack https://localstack.github.io/helm-charts
helm install localstack localstack/localstack --values=- < Welcome to Beta9! Let's get started
,#@@&&&&&&&&&@&/
@&&&&&&&&&&&&&&&&&&&&@#
*@&&&&&&&&&&&&&&&&&&&&&@/
## /&&&&&&&&&&&&&@&&&&&&&&@,
@&&&&&. (&&&&&&@/ &&&&&&&&&&/
&&&&&&&&&@* %&@. @& ,@&&&&&&&,
.@&&&&&&&&&&& &&* ,@&&&&&&&&
*&&&&&&&&&&&@, %&@/@&* @&&&&&&&&@
.@&&&&&&&&&* *&@ .@&&&&&&&&&&
%&&&&&&&& /@@* .@&&&&&&&&&&@,
&&&&&&&/.#@&&. .&&& %&&&&&@,
/&&&&&&&@%*,,*#@&&( ,@&&
/&&&&&&&&&&&&&&,
#@&&&&&&&&&&,
,(&@@&&&,
Context Name [default]:
Gateway Host [0.0.0.0]:
Gateway Port [1993]:
Token:
Added new context!
```
Confirm the config was created and has a token set.
```sh theme={null}
cat ~/.beta9/config.ini
[default]
token =
gateway_host = localhost
gateway_port = 1993
```
# Overview
Source: https://docs.beam.cloud/v2/self-hosting/overview
Beta9 is the open source project that powers Beam
## Beam vs. Beta9
**Beam and Beta9 have similar functionality.**
You can switch between the two products by changing the SDK imports and CLI commands used:
| | [beam.cloud](https://beam.cloud) | [Beta9](https://github.com/beam-cloud/beta9/) |
| ------------ | -------------------------------- | --------------------------------------------- |
| Installation | `uv tool install beam-client` | `pip install beta9` |
| Imports | `from beam import endpoint` | `from beta9 import endpoint` |
| CLI Commands | `beam serve app.py:function` | `beta9 serve app.py:function` |
## Self-Hosting Beta9
## Architecture
## Contributor Guide
We welcome contributions, big or small! These are the most helpful things for us:
* Rank features in our roadmap
* Open a PR
* Submit a [feature request](https://github.com/beam-cloud/beta9/issues/new?assignees=\&labels=\&projects=\&template=feature-request.md\&title=) or [bug report](https://github.com/beam-cloud/beta9/issues/new?assignees=\&labels=\&projects=\&template=bug-report.md\&title=)
# Querying Task Status
Source: https://docs.beam.cloud/v2/task-queue/query-status
You can check the status of any task by querying the `task` API:
```sh theme={null}
https://api.beam.cloud/v2/task/{TASK_ID}/
```
## Task Statuses
Your payload will return the status of the task. These are the possible statuses for a task:
| Status | Description |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `PENDING` | The task is enqueued and has not started yet. |
| `RUNNING` | The task is running. |
| `COMPLETE` | The task completed without any errors. |
| `RETRY`     | The task is being retried. Tasks are retried up to 3 times by default, unless `max_retries` is provided in the function decorator.                                                        |
| `CANCELLED` | The task was cancelled by the client. |
| `TIMEOUT` | The task timed out, based on the `timeout` provided in the function decorator. |
| `EXPIRED` | The task remained in the queue and was never picked up by a worker. **For endpoints, this usually occurs when the task does not start running before the request timeout (180 seconds).** |
| `FAILED` | The task did not complete successfully. |
### Request
```sh theme={null}
curl -X GET \
'https://api.beam.cloud/v2/task/{TASK_ID}/' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json'
```
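The request above can be wrapped in a small polling loop. The sketch below is a minimal illustration, not an official client: `fetch_task` and `wait_for_task` are hypothetical helper names, the polling interval and poll limit are arbitrary, and treating `PENDING`, `RUNNING`, and `RETRY` as the only non-terminal statuses is an assumption drawn from the status table.

```python
import json
import time
import urllib.request

# Statuses from the table above after which we assume the task will not change again.
TERMINAL_STATUSES = {"COMPLETE", "CANCELLED", "TIMEOUT", "EXPIRED", "FAILED"}


def fetch_task(task_id: str, token: str) -> dict:
    """GET /v2/task/{task_id}/ and decode the JSON body."""
    request = urllib.request.Request(
        f"https://api.beam.cloud/v2/task/{task_id}/",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_id, token, fetch=fetch_task, interval=2.0, max_polls=30):
    """Poll until the task reaches a terminal status (hypothetical helper)."""
    for _ in range(max_polls):
        task = fetch(task_id, token)
        if task["status"] in TERMINAL_STATUSES:
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

The `fetch` argument is injectable so the loop can be exercised without network access, for example by passing a function that returns canned payloads.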
### Response
The response to `/task` returns the following data:
| Field | Type | Description |
| ------------------------- | ------- | ----------------------------------------------------------------------------------------------------------- |
| `id` | string | The unique identifier of the task. |
| `started_at` | string | The timestamp when the task started, in ISO 8601 format. Null if the task hasn't started yet. |
| `ended_at` | string | The timestamp when the task ended, in ISO 8601 format. Null if the task is still running or hasn't started. |
| `status` | string | The current status of the task (e.g., COMPLETE, RUNNING, etc.). |
| `container_id` | string | The identifier of the container running the task. |
| `updated_at` | string | The timestamp when the task was last updated, in ISO 8601 format. |
| `created_at` | string | The timestamp when the task was created, in ISO 8601 format. |
| `outputs` | array | An array containing the outputs of the task. |
| `stats` | object | An object containing statistics about the task's execution environment. |
| `stats.active_containers` | integer | The number of active containers for the task. |
| `stats.queue_depth` | integer | The depth of the queue for the deployment. |
| `stub` | object | An object containing detailed information about the task's configuration and deployment. |
| `stub.id` | string | The identifier of the deployment stub. |
| `stub.name` | string | The name of the deployment stub. |
| `stub.type` | string | The type of the deployment stub. |
| `stub.config` | string | The configuration details of the deployment stub in JSON format. |
| `stub.config_version` | integer | The version number of the deployment stub configuration. |
| `stub.object_id` | integer | The object identifier associated with the deployment stub. |
| `stub.created_at` | string | The timestamp when the deployment stub was created, in ISO 8601 format. |
| `stub.updated_at` | string | The timestamp when the deployment stub was last updated, in ISO 8601 format. |
Here's what the response payload looks like as JSON:
```json theme={null}
{
  "id": "c5f01c46-4eb3-4021-9d5f-eae9a08c4aad",
  "started_at": "2025-05-22T22:49:03.839612Z",
  "ended_at": "2025-05-22T22:49:03.913964Z",
  "status": "COMPLETE",
  "container_id": "taskqueue-da2e6878-e202-40d4-9b7a-21706f3a2b13-c23f1166",
  "updated_at": "2025-05-22T22:49:03.915891Z",
  "created_at": "2025-05-22T22:49:03.832363Z",
  "outputs": [],
  "stats": {
    "active_containers": 1,
    "queue_depth": 0
  },
  "stub": {
    "id": "da2e6878-e202-40d4-9b7a-21706f3a2b13",
    "name": "taskqueue/serve/app:handler",
    "type": "taskqueue/serve",
    "config": {
      "runtime": {
        "cpu": 1000,
        "gpu": "",
        "gpu_count": 1,
        "memory": 128,
        "image_id": "d055bc4ee4ad0e61",
        "gpus": ["A10G"]
      },
      "handler": "app:handler",
      "on_start": "",
      "on_deploy": "",
      "on_deploy_stub_id": "",
      "python_version": "python3",
      "keep_warm_seconds": 10,
      "max_pending_tasks": 100,
      "callback_url": "",
      "task_policy": {
        "max_retries": 3,
        "timeout": 3600,
        "expires": "0001-01-01T00:00:00Z",
        "ttl": 7200
      },
      "workers": 1,
      "concurrent_requests": 1,
      "authorized": true,
      "volumes": null,
      "autoscaler": {
        "type": "queue_depth",
        "max_containers": 1,
        "tasks_per_container": 1,
        "min_containers": 0
      },
      "extra": {},
      "checkpoint_enabled": false,
      "work_dir": "",
      "entry_point": null,
      "ports": null
    },
    "config_version": 0,
    "created_at": "2025-05-22T22:48:57.156033Z",
    "updated_at": "2025-05-22T22:48:57.156033Z"
  },
  "deployment": {
    "name": null,
    "version": null
  }
}
```
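The timestamps in this payload can be used to work out how long a task waited before starting and how long it ran. The sketch below is one way to do that, using only the `created_at`, `started_at`, and `ended_at` fields documented above; `parse_timestamp` and `task_timings` are hypothetical helper names, and the code assumes the task has already finished so that none of the fields are null.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def task_timings(task: dict) -> dict:
    """Return seconds spent waiting to start and seconds spent running."""
    created = parse_timestamp(task["created_at"])
    started = parse_timestamp(task["started_at"])
    ended = parse_timestamp(task["ended_at"])
    return {
        "wait_seconds": (started - created).total_seconds(),
        "run_seconds": (ended - started).total_seconds(),
    }


# Timestamps copied from the example response above.
sample = {
    "created_at": "2025-05-22T22:49:03.832363Z",
    "started_at": "2025-05-22T22:49:03.839612Z",
    "ended_at": "2025-05-22T22:49:03.913964Z",
}
print(task_timings(sample))
```

For tasks that are still pending or running, check for null values before calling `task_timings`.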
## Cancelling Tasks
Tasks can be cancelled through the `api.beam.cloud/v2/task/cancel/` endpoint.
### Request
```bash theme={null}
curl -X DELETE --compressed 'https://api.beam.cloud/v2/task/cancel/' \
-H 'Authorization: Bearer [YOUR_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"task_ids": ["TASK_ID"]}'
```
This API accepts a list of tasks, which can be passed in like this:
```json theme={null}
{
  "task_ids": [
    "70101e46-269c-496b-bc8b-1f7ceeee2cce",
    "81bdd7a3-3622-4ee0-8024-733227d511cd",
    "7679fb12-94bb-4619-9bc5-3bd9c4811dca"
  ]
}
```
### Response
`200`
```json theme={null}
{}
```
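For callers working in Python rather than curl, the same cancel request could be built with the standard library. This is a sketch under stated assumptions: `build_cancel_request` is a hypothetical helper, and the URL, method, headers, and body simply mirror the curl example above.

```python
import json
import urllib.request


def build_cancel_request(task_ids, token):
    """Build (but do not send) the DELETE request for /v2/task/cancel/."""
    body = json.dumps({"task_ids": list(task_ids)}).encode("utf-8")
    return urllib.request.Request(
        "https://api.beam.cloud/v2/task/cancel/",
        data=body,
        method="DELETE",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_cancel_request(["TASK_ID"], "YOUR_TOKEN")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the method and body before anything is cancelled.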
# Running Async Tasks
Source: https://docs.beam.cloud/v2/task-queue/running-tasks
### What Are Task Queues?
Task Queues are great for deploying resource-intensive functions on Beam.
Instead of processing tasks immediately, the task queue enables you to add tasks to a queue and process them later, either sequentially or concurrently.
### Creating a Task Queue
You can run any function as a task queue by using the `task_queue` decorator:
```python theme={null}
from beam import task_queue, Output


@task_queue(cpu=1.0, memory=128)
def handler():
    result = 839 * 18

    # Save the result to a text file
    file_name = "result.txt"
    with open(file_name, "w") as f:
        f.write(f"The result is: {result}")

    # Upload task result to Beam to retrieve later
    Output(path=file_name).save()
```
You’ll be able to access the `result.txt` file when the task completes.
**Endpoints vs. Task Queues**
Endpoints are RESTful APIs, designed for synchronous tasks that can complete in 180 seconds or less. For longer running tasks, you'll want to use an async [`task_queue`](/v2/task-queue/running-tasks) instead.
### Sending Async Requests
Because task queues run asynchronously, the API will return a Task ID.
**Example Request**
```bash Request theme={null}
curl -X POST "https://9655d778-58c2-4c5d-8c55-03735b63607e.app.beam.cloud" \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
```
**Example Response**
```bash Response theme={null}
{ "task_id": "edbcf7ff-e8ce-4199-8661-8e15ed880481" }
```
### Viewing Task Responses
Because `task_queue` is async, you will need to make a separate API call to retrieve the task output.
### Saving and Returning Output Files
You can save files using Beam's [Output](/v2/reference/py-sdk#output) class.
The code below saves a file, wraps it in an `Output`, and generates a URL that can be retrieved later:
```python app.py theme={null}
from beam import task_queue, Output


@task_queue(
    cpu=1.0,
    memory=128,
    gpu="A10G",
    callback_url="https://webhook.site/9b74f73d-9ec1-4c8e-adcc-07c78aafab6d",
)
def handler():
    result = 839 * 18

    # Create a new text file with the result
    file_name = "sum.txt"

    # Write to new text file
    with open(file_name, "w") as f:
        f.write(f"The result is: {result}")

    # Save output
    output_file = Output(path=file_name)

    # Uploads the file to Beam storage
    output_file.save()
```
### Retrieving Results
There are two ways to retrieve response payloads:
1. Beam makes a webhook request to your server, based on the [`callback_url`](/v2/topics/callbacks) in your decorator
2. You save an `Output` and call the `/task` API
#### Webhooks
If you've added a [`callback_url`](/v2/topics/callbacks) to your decorator, Beam will fire a webhook to your server with the task response when it completes:
```json theme={null}
{
  "data": {
    "url": "https://app.beam.cloud/output/id/00894876-38df-42c8-a098-879db17e1bf8"
  }
}
```
For testing purposes, you can set up a temporary webhook URL using
[https://webhook.site](https://webhook.site)
#### Polling for Results
`Output` payloads can be retrieved by polling the `task` API:
```bash theme={null}
curl -X GET \
'https://api.beam.cloud/v2/task/{TASK_ID}/' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json'
```
Your Output will be available in the `outputs` list in the response:
```json theme={null}
{
  "id": "828a5f6b-0852-44cb-97dc-3c2105b745d3",
  "started_at": "2025-05-22T23:19:58.995396Z",
  "ended_at": "2025-05-22T23:19:59.061813Z",
  "status": "COMPLETE",
  "container_id": "taskqueue-2365b036-39df-408f-946f-b25025d1251a-bf09bf62",
  "updated_at": "2025-05-22T23:19:59.063168Z",
  "created_at": "2025-05-22T23:19:58.950594Z",
  "outputs": [
    {
      "name": "sum.txt",
      "url": "https://app.beam.cloud/output/id/c339b459-34de-4f0c-adb9-8be7c20951ce",
      "expires_in": 3600
    }
  ],
  "stats": {
    "active_containers": 1,
    "queue_depth": 0
  }
}
```
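Once the response has been decoded, pulling the download links out of `outputs` takes only a few lines. The sketch below assumes the payload shape shown above; `output_urls` is a hypothetical helper name, not part of the Beam SDK.

```python
def output_urls(task: dict) -> dict:
    """Map each saved Output's file name to its download URL."""
    return {item["name"]: item["url"] for item in task.get("outputs", [])}


# Trimmed copy of the example response above.
task = {
    "status": "COMPLETE",
    "outputs": [
        {
            "name": "sum.txt",
            "url": "https://app.beam.cloud/output/id/c339b459-34de-4f0c-adb9-8be7c20951ce",
            "expires_in": 3600,
        }
    ],
}
print(output_urls(task))
```

A task that saved no `Output` returns an empty `outputs` list, in which case the helper returns an empty dictionary.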
### Retry Behavior
Task Queues include a built-in retry system. If a task fails for any reason,
such as an out-of-memory error or an application exception, your task will be
retried three times before automatically moving to a failed state.
### Programmatically Enqueuing Tasks
You can interact with the task queue either through an API (when deployed), or directly in Python through the `.put()` method.
This is useful for queueing tasks programmatically without exposing an
endpoint.
```python app.py theme={null}
from beam import task_queue, Image


@task_queue(
    cpu=1.0,
    memory=128,
    gpu="T4",
    image=Image(python_packages=["torch"]),
    keep_warm_seconds=1000,
)
def multiply(x):
    result = x * 2
    return {"result": result}


# Manually insert task into the queue
multiply.put(x=10)
```
If invoked directly from your local computer, the code above will produce this output:
```
$ python app.py
=> Building image
=> Using cached image
=> Syncing files
=> Files synced
Enqueued task: f0d205da-e74b-47ba-b7c3-8e1b9a3c0669
```
# Task Callbacks
Source: https://docs.beam.cloud/v2/topics/callbacks
Set up a callback to your server when a task finishes running
If you supply a `callback_url` argument to your function decorator, Beam will make a POST request to your server whenever a task finishes running. *Callbacks fire for both successful and failed tasks.*
Callbacks include the Beam Task ID in the request headers, and the task response URL-encoded in the request body.
For testing purposes, you can set up a temporary webhook URL using
[https://webhook.site](https://webhook.site)
## Registering a Callback URL
Callbacks can be added onto endpoints, functions, and task queues:
```python theme={null}
from beam import function


@function(callback_url="https://your-server.io")
def handler(x):
    return {"result": x}


if __name__ == "__main__":
    handler.remote(x=10)
```
## Callback format
### Data Payload
The callback will send the response from your function as JSON, in the `data` field:
```
{
  "data": {
    "result": 10
  }
}
```
## Request headers
The request headers include the following fields:
* `x-task-timestamp` -- timestamp the task was created.
* `x-task-signature` -- signature to verify that the request was sent from Beam.
* `x-task-status` -- status of the task.
* `x-task-id` -- the task ID.
## Request Level Callbacks
There are cases where you might want to define a different `callback_url` for each request, for example if you have different environments for staging and prod.
You can pass `callback_url` as a payload to anything you're running on Beam, and we'll use that as the callback for the request:
```sh theme={null}
curl -X POST \
--compressed 'https://multiply-712408b-v1.app.beam.cloud' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"callback_url": "https://webhook.site/341d3777-cdd0-4c7e-82cb-dcc06ea4f774"}'
```
When using request-level callbacks, you must include either the `callback_url` value or kwargs (`**inputs`) as input to the handler function:
```python theme={null}
from beam import endpoint


@endpoint()
def handler(callback_url):  # Make sure to pass this value!
    return {"response": "true"}


@endpoint()
def handler(**inputs):  # You can use kwargs too
    return {"response": "true"}
```
## Verifying Requests
### Timestamp Verification
To secure your server against replay attacks, a **timestamp** and **signature** are included in the callback request headers.
As a best practice, check the timestamp header of each callback request. If the timestamp is more than 5 seconds old, there is a risk that the callback was not fired from Beam.
### Signature Verification
The most secure way of verifying a callback request is through **signature verification**.
Your Signature Token can be found in the dashboard, on the `Settings` -> `General` page.
#### Validating a Signature
The callback request will include a header field called `x-task-signature`.
`x-task-signature` is a unique signature generated by converting the request body to base64, concatenating it with the timestamp, and signing it with your Beam **signature token**.
The code below shows how to validate a callback signature:
```python theme={null}
import base64
import hashlib
import hmac


def verify_signature(
    request_body: bytes, secret_key: str, timestamp: int, signature: str
) -> bool:
    # Encode request body to Base64
    base64_payload = base64.b64encode(request_body).decode()

    # Create data to sign by concatenating base64 payload with timestamp
    data_to_sign = f"{base64_payload}:{timestamp}"

    # Initialize HMAC with SHA256 and secret key
    h = hmac.new(secret_key.encode(), data_to_sign.encode(), hashlib.sha256)

    # Compute the HMAC signature
    computed_signature = h.hexdigest()

    # Compare in constant time; True only if the signatures match
    return hmac.compare_digest(signature.encode(), computed_signature.encode())
```
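The timestamp and signature checks can be combined into a single gate that a webhook handler calls before trusting a request. The sketch below is an illustration only: `expected_signature` and `is_valid_callback` are hypothetical helpers, the header names come from the "Request headers" section, and it assumes that `x-task-timestamp` is a Unix timestamp in seconds and that header keys have already been lower-cased by your web framework.

```python
import base64
import hashlib
import hmac
import time

MAX_AGE_SECONDS = 5  # threshold suggested under "Timestamp Verification"


def expected_signature(body: bytes, secret_key: str, timestamp: str) -> str:
    """Recompute the signature: HMAC-SHA256 over "<base64 body>:<timestamp>"."""
    payload = base64.b64encode(body).decode()
    data_to_sign = f"{payload}:{timestamp}"
    return hmac.new(secret_key.encode(), data_to_sign.encode(), hashlib.sha256).hexdigest()


def is_valid_callback(headers: dict, body: bytes, secret_key: str, now=None) -> bool:
    """Return True only if the callback is fresh and correctly signed."""
    now = time.time() if now is None else now
    try:
        timestamp = headers["x-task-timestamp"]
        signature = headers["x-task-signature"]
        age = now - float(timestamp)  # assumes Unix seconds
    except (KeyError, ValueError):
        return False
    if abs(age) > MAX_AGE_SECONDS:
        return False
    expected = expected_signature(body, secret_key, timestamp)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Pass the raw request bytes as `body`, not a re-serialized JSON object, since any change in whitespace or key order would produce a different signature.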
# Integrate into CI/CD
Source: https://docs.beam.cloud/v2/topics/ci
You can integrate Beam into an existing CI/CD process to deploy your code automatically.
## Automated Deploys
It's fairly straightforward to set up automation for deploying your code to Beam. At a high level, the following steps are all you need:
```sh theme={null}
pip3 install --upgrade pip
pip3 install beam-client
beam configure default --token $BEAM_TOKEN
beam deploy file.py:function
```
## Example: Github Actions
You can set up a Github workflow to deploy your code whenever a new commit is made to your Git repo.
### Setup Environment Variables
First, add your `BEAM_TOKEN` to your [Github Secrets](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions#creating-secrets-for-a-repository).
### Create Actions file
For a detailed walk-through of this step, [Github's
documentation](https://docs.github.com/en/actions/quickstart) is the best
resource.
1. Create a directory called `.github/workflows` in your project.
2. In the `.github/workflows` directory, create a file named `beam-actions.yml`
### Deploying to Different Environments
You might want to set up separate Beam apps for your `staging` and `prod` environments.
In your Beam app, you can set your app name to update dynamically based on the Github branch you've deployed from. `BEAM_DEPLOY_ENV` is set in the Github Actions script, based on the branch name:
```python app.py theme={null}
from beam import endpoint
import os


@endpoint(name=f'app-{os.getenv("BEAM_DEPLOY_ENV", "staging")}')
def handler():
    return {}
```
If you push to the `main` branch, the app `app-prod` will be deployed. If you push to the `staging` branch, `app-staging` will be deployed. You can customize this with your own branch names.
Here's what the Github Action looks like. Make sure you've added a `BEAM_TOKEN` to your [Github Secrets](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions#creating-secrets-for-a-repository):
```yaml beam-actions.yml theme={null}
name: Deploy to Beam

on:
  push:
    branches:
      - main
      - staging

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set environment variables
        run: |
          if [[ "${{ github.ref }}" == 'refs/heads/main' ]]; then
            echo "Setting environment variables: PROD"
            echo "BEAM_DEPLOY_ENV=prod" >> $GITHUB_ENV
          elif [[ "${{ github.ref }}" == 'refs/heads/staging' ]]; then
            echo "Setting environment variables: STAGING"
            echo "BEAM_DEPLOY_ENV=staging" >> $GITHUB_ENV
          fi

      - name: Authenticate and deploy to Beam
        env:
          BEAM_TOKEN: ${{ secrets.BEAM_TOKEN }}
        run: |
          pip3 install --upgrade pip
          pip3 install beam-client
          pip3 install fastapi
          echo "beam configure default --token $BEAM_TOKEN"
          beam configure default --token $BEAM_TOKEN
          beam deploy app.py:handler
```
When you push to either `main` or `staging`, the corresponding app is deployed for each push.
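To make the branch-to-app mapping explicit, here is the same logic expressed in Python. It is only an illustration of what the workflow and `app.py` do together: `BRANCH_TO_ENV`, `deploy_env_for_ref`, and `app_name` are hypothetical names, and the mapping is copied from the workflow's "Set environment variables" step.

```python
import os

# Mirrors the branch checks in the workflow's "Set environment variables" step.
BRANCH_TO_ENV = {"refs/heads/main": "prod", "refs/heads/staging": "staging"}


def deploy_env_for_ref(ref: str):
    """Return the BEAM_DEPLOY_ENV value for a Git ref, or None if unmapped."""
    return BRANCH_TO_ENV.get(ref)


def app_name(environ=os.environ) -> str:
    """Same expression app.py uses to name the deployment."""
    return f'app-{environ.get("BEAM_DEPLOY_ENV", "staging")}'
```

Note the fallback: when `BEAM_DEPLOY_ENV` is not set, for example when deploying from a laptop, the app name resolves to `app-staging`.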
# Cold Start Performance
Source: https://docs.beam.cloud/v2/topics/cold-start
This page covers a list of optimizations to make your containers boot up as fast as possible.
# Cold Start Optimizations
## Cache Models in Volumes
To avoid downloading your models from the internet on each request, you can cache them in Beam's Volumes.
In the example below, the models are saved to the Volume by passing the `cache_dir` argument to the Hugging Face Transformers `from_pretrained` method:
```python theme={null}
from beam import Image, endpoint, Volume

# Path to cache model weights
CACHE_PATH = "./weights"


@endpoint(
    volumes=[Volume(name="weights", mount_path=CACHE_PATH)],
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(
        python_version="python3.9",
        python_packages=[
            "transformers",
            "torch",
        ],
    ),
)
def predict():
    from transformers import AutoTokenizer, OPTForCausalLM
    import torch

    model = OPTForCausalLM.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)

    # Run inference
    model.generate(...)

    return {"text": ""}
```
Alternatively, if you want to use Transformers' pipeline abstraction, you can pass the `cache_dir` argument to the underlying models using the `model_kwargs` argument of the pipeline:
```python theme={null}
from beam import Image, endpoint, Volume

# Path to cache model weights
CACHE_PATH = "./weights"


@endpoint(
    volumes=[Volume(name="weights", mount_path=CACHE_PATH)],
    ...
)
def predict():
    from transformers import pipeline

    # Load the model
    generator = pipeline(
        "text-generation",
        model="facebook/opt-125m",
        model_kwargs={"cache_dir": CACHE_PATH},
    )

    # Run inference
    generator(...)

    return {"text": ""}
```
## Load Models Using `on_start`
In addition to using a Volume, it's best practice to ensure models are only loaded *once* when the container first starts. Beam lets you define an `on_start` function that will run exactly *once* when the container first starts.
This example combines the `on_start` functionality with the Volume caching:
```python theme={null}
from beam import Image, endpoint, Volume

# Path to cache model weights
CACHE_PATH = "./weights"


# This runs once when the container first starts
def download_models():
    from transformers import AutoTokenizer, OPTForCausalLM
    import torch

    model = OPTForCausalLM.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    return model


@endpoint(
    on_start=download_models,
    volumes=[Volume(name="weights", mount_path=CACHE_PATH)],
    cpu=1,
    memory="16Gi",
    gpu="T4",
    image=Image(
        python_version="python3.9",
        python_packages=[
            "transformers",
            "torch",
        ],
    ),
)
def predict(context):
    # Retrieve the model returned by the on_start function
    model = context.on_start_value

    # Run inference
    model.generate(...)

    return {"text": ""}
```
## Enable Checkpoint Restore (New)
Checkpoint restore lets you set a `checkpoint_enabled` flag on your decorator, which captures a memory snapshot of the running container after `on_start` completes. This means that you can load a model onto a GPU, run some setup logic, and when the app cold starts, it will start *right from that point*.
```python theme={null}
@endpoint(
    secrets=["HF_TOKEN"],
    on_start=load_models,
    name="meta-llama-3.1-8b-instruct",
    cpu=2,
    memory="16Gi",
    gpu="H100",
    keep_warm_seconds=30,
    checkpoint_enabled=True,  # Add this field!
)
```
Checkpoint restore is available on these GPU types:
* RTX4090
* H100
* A10G
### Notes
* If checkpoint fails, please forward us any errors that appear in logs. It's likely the reason for failure is a missing volume -- to resolve that you need to ensure the cache path is set properly for the model.
* If checkpoint fails, the deployment will revert to standard cold boots. To try checkpointing again, you will need to redeploy.
Checkpoints can take up to 3 minutes to capture, and 5 minutes to distribute
among our servers. To properly benchmark the cold start improvement, you need
to call the app after it has been spun down for a few minutes. Otherwise it
may block as the checkpoint is syncing.
## Measuring Cold Start
We've made it easier to optimize your cold starts by adding a cold start profile to each task.
You can view the cold start profile of a task by clicking on any task in the tasks table.
This breakdown shows the entire lifecycle of your task: spinning up a container, running your `on_start` function, and running the task itself.
Here's a breakdown of a serverless cold start:
* **Container Start Time**. This is typically under 1s.
* **Image Load Time**. Pulling your container image from our image cache. This varies based on the size of your model and the dependencies you've added.
* **Application Start Time**. Running your code. This is the time spent running your `on_start` function and loading your model onto the GPU.
# Runtime Variables
Source: https://docs.beam.cloud/v2/topics/context
Accessing information about the runtime while running tasks
## Available Runtime Variables
In order to access information about the runtime while running a task, you can use the `context` value.
`context` includes important contextual information about the runtime, like the current `task_id` and `callback_url`.
| Field Name | Purpose |
| ---------------- | ------------------------------------------------------ |
| `container_id` | Unique identifier for a container |
| `stub_id` | Identifier for a stub |
| `stub_type` | Type of the stub (function, endpoint, task queue, etc) |
| `callback_url` | URL called when the task status changes |
| `task_id` | Identifier for the specific task |
| `timeout` | Maximum time allowed for the task to run (seconds) |
| `on_start_value` | Any values returned from the `on_start` function |
| `bind_port` | Port number to bind a service to |
| `python_version` | Version of Python to be used |
## Using a Runtime Variable
Any of the fields above can be accessed on the `context` variable:
```python theme={null}
from beam import task_queue


@task_queue()
def handler(context):
    task_id = context.task_id
    return {}
```
# Public Endpoints
Source: https://docs.beam.cloud/v2/topics/public-endpoints
Deploying public web endpoints on Beam
## Creating a Public Endpoint
By default, endpoints are private and require a bearer token to access. You can remove the authentication requirement for endpoints using the `authorized=False` argument:
```python auth.py theme={null}
from beam import endpoint


@endpoint(authorized=False)  # Disable authentication
def create_public_endpoint():
    print("This API can be invoked without an auth token")
    return {"success": "true"}
```
## Invoking a Public Endpoint
Public endpoints have slightly different URL schemes than private ones:
```
https://app.beam.cloud/endpoint/public/[STUB-ID]
```
```
https://app.beam.cloud/endpoint/public/4f78aaae-f35c-4eb0-9236-cdd34509bad8
```
You can find your **Stub ID** on the deployment detail page in the web dashboard.
You can view the API URL by clicking the `Call API` button on the deployment detail page in the web dashboard.
A full request to a public endpoint might look something like this:
```bash theme={null}
curl -X POST \
--compressed 'https://app.beam.cloud/endpoint/public/4f78aaae-f35c-4eb0-9236-cdd34509bad8' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-d '{}'
```
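The same request can be sketched in Python using only the standard library. The `build_public_request` helper name is an assumption for illustration. The URL is the example one from above, so the network call is left commented out.

```python
import json
import urllib.request

# URL copied from the example above; replace the ID with your own Stub ID.
PUBLIC_URL = "https://app.beam.cloud/endpoint/public/4f78aaae-f35c-4eb0-9236-cdd34509bad8"


def build_public_request(url, payload):
    """Build (but do not send) a JSON POST request with no auth header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_public_request(PUBLIC_URL, {})
print(request.get_method(), request.full_url)

# To actually call the endpoint, uncomment:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Note that no `Authorization` header is set, which is the difference from calling a private endpoint.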
# Send Events Between Apps
Source: https://docs.beam.cloud/v2/topics/signal
There are certain cases where you'll want to send events between different apps running on Beam.
A common scenario is if you have a model inference and retraining pipeline, where the inference app (App #1) needs to use the latest version of a trained model (App #2).
See the code for this example on GitHub.
## Invoking Functions in Other Apps
This example demonstrates how to invoke functions in other apps on Beam. Specifically, we cover the scenario with an inference and a retraining function.
The retraining function needs a way to tell the inference function to use the latest model.
We use an `experimental.Signal()`, which is a special type of event listener that can be triggered from the retrain function.
### App 1: Retraining App
This is the retraining app. Below, we register a `Signal` that will fire an event to our inference app, which is subscribed to this Signal event.
```python theme={null}
from beam import endpoint, experimental


@endpoint(name="trainer")
def train():
    # Send a signal to another app letting it know that it needs to reload the models
    s = experimental.Signal(name="reload-model")
    s.set(ttl=60)
```
### App 2: Inference App
Below is the inference app, which needs to re-run its `on_start` function when retraining is finished.
You'll notice that a Signal is registered with a handler that tells us which function to run when an event is fired.
```python theme={null}
from beam import endpoint, Volume, experimental, Image

VOLUME_NAME = "brand_classifier"
CACHE_PATH = f"./{VOLUME_NAME}-cache"


def load_latest_model():
    # Preload and return the model and tokenizer
    global model, tokenizer
    print("Loading latest...")
    model = lambda x: x + 1  # This is just example code
    s.clear()  # Clear the signal so it doesn't fire again


# Set a signal handler - when invoked, it will run the handler function
s = experimental.Signal(
    name="reload-model",
    handler=load_latest_model,
)


@endpoint(
    name="inference",
    volumes=[Volume(name=VOLUME_NAME, mount_path=CACHE_PATH)],
    image=Image(python_packages=["transformers", "torch"]),
    on_start=load_latest_model,
)
def predict(**inputs):
    global model, tokenizer  # These will have the latest values
    return {"success": "true"}
```
To test this example, you can open two terminal windows:
* In window 1, serve and invoke the inference function
* In window 2, serve and invoke the retrain function
Look at the logs in window 1 -- you'll notice that the signal has fired, and `load_latest_model` ran again.
## Clearing Signals
Signals refresh every second while a container is running, until `signal.clear()` is called. We recommend calling `signal.clear()` after each signal invocation.
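To illustrate why clearing matters, here is a toy in-memory model of the set, refresh, and clear cycle. It is not Beam's `experimental.Signal` implementation. The `ToySignal` class, its `poll` method, and the assumption that a set signal invokes its handler on every refresh are all hypothetical and used only for this sketch.

```python
class ToySignal:
    """In-memory toy, NOT Beam's experimental.Signal."""

    def __init__(self, handler):
        self.handler = handler
        self.is_set = False

    def set(self):
        self.is_set = True

    def clear(self):
        self.is_set = False

    def poll(self):
        # Stands in for the periodic (roughly one second) refresh.
        if self.is_set:
            self.handler()


def count_handler_runs(clear_in_handler, polls=3):
    """Set the signal once, poll it `polls` times, and count handler runs."""
    runs = []

    def handler():
        runs.append("reload")
        if clear_in_handler:
            signal.clear()

    signal = ToySignal(handler)
    signal.set()
    for _ in range(polls):
        signal.poll()
    return len(runs)


print(count_handler_runs(clear_in_handler=True))
print(count_handler_runs(clear_in_handler=False))
```

Under these assumptions, a handler that clears the signal runs once per `set`, while one that never clears it runs again on every refresh.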
# Timeouts and Retries
Source: https://docs.beam.cloud/v2/topics/timeouts-and-retries
You can customize the default timeout and retry behavior for your tasks.
# Timeouts
### Default Timeouts
Tasks automatically time out after 20 minutes *if they haven't started running*. This default exists to prevent stuck tasks from consuming compute resources and potentially blocking other tasks in the queue.
### Customizing Timeouts
You can specify your own timeouts. Timeouts can be used for endpoints, task queues, and functions:
```python timeout.py theme={null}
from beam import function
import time


@function(timeout=600)  # Override default timeout
def timeout():
    # Without the timeout specified above, this function would time out at 300s
    time.sleep(350)


if __name__ == "__main__":
    timeout()
```
# Retries
Beam includes retry logic, which can be customized using the parameters below.
### Max Retries
You can configure tasks to automatically retry based on a specific exception in your app.
In the example below, we'll specify `retries` and `retry_for`:
```python timeout.py theme={null}
from beam import task_queue


@task_queue(retries=2, retry_for=[Exception])  # Override default retry logic
def handler():
    raise Exception("Something went wrong, retry please!")
```
### Retry for Exceptions
When the task is invoked, we'll see the exception get caught and the task automatically retry:
```sh theme={null}
Running task <87067d0e-5900-413b-a3a3-5ee4c85706ad>
Traceback (most recent call last):
  File "/mnt/code/app.py", line 6, in handler
    raise Exception("Something went wrong, retry please!")
Exception: Something went wrong, retry please!
retry_for error caught: Exception('Something went wrong, retry please!')
Retrying task <87067d0e-5900-413b-a3a3-5ee4c85706ad> after Exception exception
Running task <87067d0e-5900-413b-a3a3-5ee4c85706ad>
```
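The behaviour in the log above can be modelled in plain Python. This is a minimal sketch of the retry semantics, not Beam's implementation. The `with_retries` decorator is hypothetical, and it assumes that `retries=2` means up to two additional attempts after the first failure.

```python
import functools


def with_retries(retries, retry_for):
    """Toy decorator: re-run the function up to `retries` extra times
    when it raises one of the exception types listed in `retry_for`."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return fn(*args, **kwargs)
                except tuple(retry_for) as exc:
                    if attempt >= retries:
                        raise  # Retries exhausted, surface the last error
                    attempt += 1
                    print(f"retry {attempt}/{retries} after {type(exc).__name__}")

        return wrapper

    return decorator


attempts = []


@with_retries(retries=2, retry_for=[ValueError])
def flaky():
    # Fails on the first two calls, succeeds on the third.
    attempts.append(1)
    if len(attempts) < 3:
        raise ValueError("Something went wrong, retry please!")
    return "ok"


result = flaky()
print(result, len(attempts))
```

Exceptions that are not listed in `retry_for` propagate immediately in this sketch, which is why choosing a narrow exception type is usually preferable to a bare `Exception`.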