With Beam, you can deploy web servers that use the ASGI protocol. This means that you can deploy applications built with popular frameworks like FastAPI and Django.
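
For context, an ASGI application is an async callable that receives a connection `scope` plus `receive` and `send` channels, and frameworks like FastAPI build on that interface. The sketch below is framework-free and not Beam-specific. The names `app` and `call_app` are illustrative, and `call_app` is a hypothetical in-process harness that only mimics what a real ASGI server does.

```python
import asyncio
import json


async def app(scope, receive, send):
    """A bare ASGI application: answers every HTTP request with a JSON body."""
    assert scope["type"] == "http"
    body = json.dumps({"path": scope["path"]}).encode()
    # An HTTP response is sent as two messages: the start (status, headers) and the body
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path):
    """Hypothetical harness standing in for a real ASGI server."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app("/warmup"))
print(messages[0]["status"], messages[1]["body"].decode())
```

A FastAPI instance exposes this same callable interface, so a function that returns a FastAPI app gives a server a compatible ASGI application to run.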

Multiple Endpoints Per App

In the example below, we deploy a FastAPI web server that uses the Hugging Face Transformers library to perform sentiment analysis and text generation.

The example also includes a warmup endpoint, which you can call ahead of time to get the container ready for incoming requests.

This example uses Pydantic to validate and parse request inputs. You can read more about it in the Pydantic documentation.
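
As a rough illustration of what Pydantic does with a request body, the sketch below builds a model with the same fields as `GenerateInput` and feeds it two payloads by hand. It runs outside FastAPI and Beam, and the payload values are made up for the demonstration.

```python
from pydantic import BaseModel, ValidationError


class GenerateInput(BaseModel):
    text: str
    max_length: int


# A complete payload is parsed into typed attributes
ok = GenerateInput(**{"text": "The meaning of life is ", "max_length": 50})
print(ok.text, ok.max_length)

# A payload missing a required field is rejected before any handler code runs
try:
    GenerateInput(**{"text": "The meaning of life is "})
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

Inside a FastAPI route, the same failure surfaces to the caller as a 422 response, so every required field must be present in the request body.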

app.py
from beam import Image, asgi
from pydantic import BaseModel


# Request payload for API, declared with Pydantic
class GenerateInput(BaseModel):
    text: str
    max_length: int


class SentimentInput(BaseModel):
    text: str


def init_models():
    from transformers import pipeline

    # Name of the text-generation model, returned so the handler can reference it
    model = "gpt2"

    # Initialize two simple models
    sentiment_analyzer = pipeline("sentiment-analysis")
    text_generator = pipeline("text-generation", model=model)

    return sentiment_analyzer, text_generator, model


@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["transformers", "torch", "fastapi", "pydantic"]),
    on_start=init_models,
    memory=2048,
)
def web_server(context):
    import asyncio

    from fastapi import FastAPI, Query

    app = FastAPI()

    sentiment_analyzer, text_generator, generate_model = context.on_start_value

    @app.post("/sentiment")
    async def analyze_sentiment(input: SentimentInput):
        # Unpack request input and send to ML model
        result = sentiment_analyzer(input.text)
        return result

    @app.post("/generate")
    async def generate_text(input: GenerateInput):
        result = text_generator(input.text, max_length=input.max_length)
        return result

    @app.post("/warmup")
    async def warmup():
        return {"status": "warm"}

    return app

Launch a Preview Environment

Just like an endpoint, you can prototype your web server using beam serve. This command will monitor changes in your local file system, live-reload the remote environment as you work, and forward remote container logs to your local shell.

beam serve app.py:web_server

Deploying the Web Server

When you are ready to deploy your web server, run the following command:

beam deploy app.py:web_server

You’ll see some logs in the console that show the progress of your deployment.

=> Building image
=> Syncing files
...
=> Invocation details
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-d '{}'

Sending Requests

If we wanted to generate text using our deployed example from above, we would send a POST request to the /generate route like this:

curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
-H 'Connection: keep-alive' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-d '{"text": "The meaning of life is ", "max_length": 50}'
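
The same kind of request can be made from Python. The sketch below targets the /sentiment route and uses only the standard library. `build_request` is an illustrative helper, the URL and token are the placeholders from the example above, and the payload text is made up. The request is built but not sent, so the snippet runs without a deployed app.

```python
import json
import urllib.request

# Placeholder values copied from the example above; substitute your own
BASE_URL = "https://sentiment-and-generation-53b4230-v1.app.beam.cloud"
AUTH_TOKEN = "[YOUR_AUTH_TOKEN]"


def build_request(route, payload):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{route}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {AUTH_TOKEN}",
        },
        method="POST",
    )


request = build_request("/sentiment", {"text": "I love this product"})
print(request.get_method(), request.full_url)

# To actually send it (requires a deployed app and a valid token):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```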

Concurrent Requests

When building an ASGI app, you can specify the number of concurrent requests your app can handle using the concurrent_requests parameter in the @asgi decorator.

@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["transformers", "torch", "fastapi", "pydantic"]),
    on_start=init_models,
    memory=2048,
    concurrent_requests=10,
)

This allows you to increase the number of requests your app can handle at once, which can help you achieve higher throughput. For instance, if your app is doing I/O-bound work, additional requests can be handled while your I/O operations complete in the background.

We can simulate this by adding a model endpoint that pretends to do some expensive I/O to our example from above.

    @app.get("/model")
    async def model(model: str = Query(...)):
        # Pretend we're doing expensive I/O here to demonstrate the value of concurrent requests
        await asyncio.sleep(10)
        return {"model": model}

Now, if you send a request to /model and then send another request to /generate, the second request can complete before the first, as long as it takes less than the 10 seconds the first one spends waiting.
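
The completion order can be reproduced locally with plain asyncio, without Beam or FastAPI. In this sketch the two coroutines are stand-ins for the routes, and the sleep durations are shortened, arbitrary values chosen for the demonstration.

```python
import asyncio


async def slow_model_endpoint():
    # Stand-in for the /model route: awaits simulated I/O
    await asyncio.sleep(0.2)
    return "model"


async def fast_endpoint():
    # Stand-in for a quicker route
    await asyncio.sleep(0.05)
    return "generate"


async def main():
    finished = []

    async def track(coro):
        # Record each result at the moment its coroutine completes
        finished.append(await coro)

    # Start the slow request first, then the fast one
    await asyncio.gather(track(slow_model_endpoint()), track(fast_endpoint()))
    return finished


completion_order = asyncio.run(main())
print(completion_order)  # ['generate', 'model']
```

This only works because the slow handler awaits its I/O. A blocking call inside an async handler holds the event loop, and other requests on the same loop wait until it returns.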

Response Types

Beam supports various response types, including any FastAPI response type. You can find a list of FastAPI response types in the FastAPI documentation.

Uploading Local Files

If your web server needs access to local files like model weights or other resources, you can use Beam volumes.

To add files to a volume, you can use the beam cp command.

beam cp [local-file] beam://[volume-name]

Then, you can define a volume and pass it into your @asgi decorator like this:

from beam import asgi, Volume, Image

@asgi(
    name="sentiment-analysis",
    image=Image(python_packages=["fastapi"]),
    volumes=[Volume(name="model-weights", mount_path="./model_weights")],
)
def web_server():
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/")
    async def root():
        with open("./model_weights/somefile.txt", "r") as f:
            return {"message": f.read()}

    return app
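
If a file has not been copied to the volume yet, the `open` call in the route fails at request time. One option is a small helper that reports the missing file clearly. The sketch below is a hypothetical helper, not part of Beam, and it uses a temporary directory in place of the mounted volume so it can run locally.

```python
import tempfile
from pathlib import Path


def read_weights_file(mount_path, filename):
    """Read a text file from a mounted directory, failing with a clear message."""
    path = Path(mount_path) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{filename} not found under {mount_path}; was it uploaded?"
        )
    return path.read_text()


# Local demonstration: a temporary directory stands in for the volume mount
with tempfile.TemporaryDirectory() as mount:
    (Path(mount) / "somefile.txt").write_text("hello from the volume")
    message = read_weights_file(mount, "somefile.txt")
    print(message)
```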