With Beam, you can deploy web servers that use the ASGI protocol. This means that you can deploy applications built with popular frameworks like FastAPI and Django.
In the example below, we deploy a FastAPI web server that uses the Hugging Face Transformers library to perform sentiment analysis and text generation. We also include a warmup endpoint so that we can preemptively get our container ready for incoming requests.
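A minimal sketch of what that server might look like (the model choices, request shapes, and endpoint paths other than `/generate` are assumptions, not the exact example):

```python
from beam import Image, asgi

@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["fastapi", "transformers", "torch"]),
)
def web_server():
    from fastapi import FastAPI
    from transformers import pipeline

    app = FastAPI()

    # Load the models once when the container starts
    sentiment = pipeline("sentiment-analysis")
    generator = pipeline("text-generation")

    @app.get("/warmup")
    async def warmup():
        # Hitting this endpoint ahead of time spins up the container
        # so later requests don't pay the cold-start cost
        return {"status": "warm"}

    @app.post("/sentiment")
    async def predict_sentiment(payload: dict):
        return {"result": sentiment(payload["text"])}

    @app.post("/generate")
    async def generate(payload: dict):
        return {"result": generator(payload["text"])}

    return app
```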
Just like an endpoint, you can prototype your web server using `beam serve`. This command will monitor changes in your local file system, live-reload the remote environment as you work, and forward remote container logs to your local shell.
```bash
beam serve app.py:web_server
```
Serve sessions end automatically after 10 minutes of inactivity. The entire
duration of the session is counted towards billable usage, even if the session
is not receiving requests.
The container handling the app will spin down after 180 seconds of inactivity by default; this can be customized with the `keep_warm_seconds` parameter. The container is billed for the time it is active and handling requests.
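For example, to keep the container alive through five minutes of inactivity instead (a sketch reusing the decorator from the example above; the value is illustrative):

```python
@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["fastapi", "transformers", "torch"]),
    keep_warm_seconds=300,  # stay alive for 5 minutes of inactivity
)
def web_server():
    ...
```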
If we wanted to perform text generation using our deployed example from above, we would send a POST request like this:
```bash
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d '{"text": "The meaning of life is "}'
```
When building an ASGI app, you can specify the number of concurrent requests your app can handle using the `concurrent_requests` parameter in the `@asgi` decorator.
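For example, reusing the decorator from our example above (the value here is illustrative):

```python
@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["fastapi", "transformers", "torch"]),
    concurrent_requests=5,  # handle up to 5 requests at once per container
)
def web_server():
    ...
```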
Raising `concurrent_requests` increases the number of requests your app can handle at once, which can help you achieve higher throughput. For instance, if your app is doing I/O-bound work, additional requests can be handled while your I/O operations complete in the background.

We can simulate this by adding a model endpoint that pretends to do some expensive I/O to our example from above.
@app.get("/model")async def model(model: str = Query(...)): # Pretend we're doing expensive I/O here to demonstrate the value of concurrent requests await asyncio.sleep(10) return {"model": model}
Now, if you send a request to `/model` and then immediately send another request to `/generate`, you will see that the second request completes before the first.
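For example, using the deployment URL from above (the `&` backgrounds the first request so both run concurrently):

```bash
# Kick off the slow /model request in the background (~10 seconds)
curl 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/model?model=gpt2' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' &

# This /generate request completes first, even though it was sent second
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d '{"text": "The meaning of life is "}'
```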
If your web server needs access to local files like model weights or other resources, you can use Beam volumes. To add files to a volume, use the `beam cp` command.
```bash
beam cp [local-file] beam://[volume-name]
```
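For example, to upload a local file into the volume used in the example below (the file name is illustrative):

```bash
beam cp ./somefile.txt beam://model-weights
```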
Then, you can define a volume and pass it into your `@asgi` decorator like this:
```python
from beam import asgi, Volume, Image

@asgi(
    name="sentiment-analysis",
    image=Image(python_packages=["fastapi"]),
    volumes=[Volume(name="model-weights", mount_path="./model_weights")],
)
def web_server():
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/")
    async def root():
        with open("./model_weights/somefile.txt", "r") as f:
            return {"message": f.read()}

    return app
```