With Beam, you can deploy web servers that use the ASGI protocol. This means that you can deploy applications built with popular frameworks like FastAPI and Django.
In the example below, we deploy a FastAPI web server that uses the Hugging Face Transformers library to perform sentiment analysis and text generation. We also include a warmup endpoint so that we can preemptively get our container ready for incoming requests.
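A minimal sketch of what that server might look like (the model choices, request shapes, and endpoint paths other than `/generate` are assumptions, not the exact example):

```python
from beam import Image, asgi

@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["fastapi", "transformers", "torch"]),
)
def web_server():
    from fastapi import FastAPI
    from transformers import pipeline

    app = FastAPI()

    # Load the models once when the container starts
    sentiment = pipeline("sentiment-analysis")
    generator = pipeline("text-generation")

    @app.get("/warmup")
    async def warmup():
        # Hitting this endpoint ahead of time spins up the container
        # so later requests don't pay the cold-start cost
        return {"status": "warm"}

    @app.post("/sentiment")
    async def predict_sentiment(payload: dict):
        return {"result": sentiment(payload["text"])}

    @app.post("/generate")
    async def generate(payload: dict):
        return {"result": generator(payload["text"])}

    return app
```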
Just like an endpoint, you can prototype your web server using `beam serve`. This command will monitor changes in your local file system, live-reload the remote environment as you work, and forward remote container logs to your local shell.
```bash
beam serve app.py:web_server
```
Serve sessions end automatically after 10 minutes of inactivity. The entire
duration of the session is counted towards billable usage, even if the session
is not receiving requests.
The container handling the app will spin down after 180 seconds of inactivity by default; this can be customized with the `keep_warm_seconds` parameter. The container is billed for the time it is active and handling requests.
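For example, to keep the container alive through five minutes of inactivity instead (a sketch reusing the decorator from the example above; the value is illustrative):

```python
@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["fastapi", "transformers", "torch"]),
    keep_warm_seconds=300,  # stay alive for 5 minutes of inactivity
)
def web_server():
    ...
```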
If we wanted to perform text generation using our deployed example from above, we would send a POST request like this:
```bash
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d '{"text": "The meaning of life is "}'
```
When building an ASGI app, you can specify the number of concurrent requests your app can handle using the `concurrent_requests` parameter in the `@asgi` decorator.
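For example, reusing the decorator from our example above (the value here is illustrative):

```python
@asgi(
    name="sentiment-and-generation",
    image=Image(python_packages=["fastapi", "transformers", "torch"]),
    concurrent_requests=5,  # handle up to 5 requests at once per container
)
def web_server():
    ...
```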
Raising `concurrent_requests` increases the number of requests your app can handle at once, which can help you achieve higher throughput. For instance, if your app is doing I/O-bound work, additional requests can be handled while your I/O operations complete in the background.

We can simulate this by adding a model endpoint that pretends to do some expensive I/O to our example from above.
@app.get("/model")async def model(model: str = Query(...)): # Pretend we're doing expensive I/O here to demonstrate the value of concurrent requests await asyncio.sleep(10) return {"model": model}
Now, if you send a request to `/model` and then immediately send another request to `/generate`, you will see that the second request completes before the first.
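For example, using the deployment URL from above (the `&` backgrounds the first request so both run concurrently):

```bash
# Kick off the slow /model request in the background (~10 seconds)
curl 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/model?model=gpt2' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' &

# This /generate request completes first, even though it was sent second
curl -X POST 'https://sentiment-and-generation-53b4230-v1.app.beam.cloud/generate' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
  -d '{"text": "The meaning of life is "}'
```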
If your web server needs access to local files like model weights or other resources, you can use Beam volumes. To add files to a volume, use the `beam cp` command.
```bash
beam cp [local-file] beam://[volume-name]
```
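For example, to upload a local file into the volume used in the example below (the file name is illustrative):

```bash
beam cp ./somefile.txt beam://model-weights
```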
Then, you can define a volume and pass it into your `@asgi` decorator like this:
```python
from beam import asgi, Volume, Image

@asgi(
    name="sentiment-analysis",
    image=Image(python_packages=["fastapi"]),
    volumes=[Volume(name="model-weights", mount_path="./model_weights")],
)
def web_server():
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/")
    async def root():
        with open("./model_weights/somefile.txt", "r") as f:
            return {"message": f.read()}

    return app
```