A beginner’s guide to running highly performant inference workloads on Beam.
To get started, wrap your inference function in an `endpoint` decorator with an `Image`.

The `Endpoint` is the wrapper for your inference function. The first thing we pass to `endpoint` is an `Image`. The `Image` defines the container image your code will run on. You can install Python packages in it, and run shell commands at build time with the `commands` argument. Read more about custom images in the Beam docs.

We also use `env.is_remote()` to conditionally import packages only when inside the remote cloud environment, so they don't need to be installed on your local machine.
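As a minimal sketch, here are the imports, the conditional import guard, and the image definition (the package list and the exact `Image` keyword arguments are illustrative and may vary between SDK versions):

```python
from beam import Image, endpoint, env

# Heavy ML dependencies only exist inside the remote container, so guard the import
if env.is_remote():
    from transformers import AutoModelForCausalLM, AutoTokenizer

# The Image defines the container your code runs on: a Python version,
# pip packages to install, and optional shell commands run at build time.
image = Image(
    python_version="python3.10",
    python_packages=["torch", "transformers"],
    commands=["apt-get update -y"],
)
```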
For this example, we'll load `facebook/opt-125m` via Hugging Face Transformers. Since we'll deploy this as a REST API, we add an `endpoint()` decorator above the inference function:
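A sketch of the decorated function, continuing from the image defined above (the GPU type, memory size, and `prompt` parameter are illustrative assumptions, not requirements):

```python
@endpoint(
    image=image,
    gpu="T4",       # illustrative; pick hardware that fits your model
    memory="16Gi",
    cpu=1,
)
def predict(prompt: str = "Hello, my name is"):
    # For now the model is loaded on every request; we'll fix that below
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    return {"text": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```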
When you run your app, your working directory is uploaded to Beam. You can exclude files from the upload by listing them in a `.beamignore` file, which follows the same pattern syntax as a `.gitignore`.
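For example, a `.beamignore` might look like this (the patterns are just an illustration):

```
venv/
__pycache__/
*.safetensors
```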
To test the endpoint while developing, run `beam serve app.py:predict`. This will spin up a container for your endpoint, give you a temporary URL to invoke it, and watch your local files. As you save edits while `beam serve` is running, you'll notice the server reloading with your code changes.
You’ll use this workflow anytime you’re developing an app on Beam. Trust us — it makes the development process uniquely fast and painless.
Right now, the model is downloaded and loaded on every request. To avoid this, Beam provides an `on_start` method, which you can pass to your function decorators. `on_start` is run exactly once when the container first starts:
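Continuing the sketch, the loading logic moves into a function (the name `load_models` is just an example) that is passed to the decorator:

```python
def load_models():
    # Runs once per container, before any requests are handled
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    return model, tokenizer


@endpoint(
    image=image,
    gpu="T4",
    memory="16Gi",
    on_start=load_models,  # loaded once at startup, not per request
)
def predict(context, prompt: str = "Hello, my name is"):
    ...
```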
The value returned by the `on_start` function can be retrieved from `context.on_start_value`:
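Inside the handler, that looks like this (again assuming the `load_models` sketch above):

```python
@endpoint(image=image, gpu="T4", memory="16Gi", on_start=load_models)
def predict(context, prompt: str = "Hello, my name is"):
    # Unpack whatever load_models returned
    model, tokenizer = context.on_start_value

    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    return {"text": tokenizer.decode(outputs[0], skip_special_tokens=True)}
```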
The `on_start` method saves us from having to download the model multiple times, but we can avoid downloading the model entirely by caching it in a Storage Volume.
Beam allows you to create highly-available storage volumes that can be used across tasks. You might use volumes for things like storing model weights or large datasets.
To do this, mount a volume on the endpoint and point the `cache_dir` argument in transformers at the volume's mount path:
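A sketch with a volume added (the volume name and mount path are placeholders):

```python
from beam import Volume

CACHE_PATH = "./model-weights"


def load_models():
    # Weights land in the volume on the first run and are read
    # straight from it on every cold start after that
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", cache_dir=CACHE_PATH)
    return model, tokenizer


@endpoint(
    image=image,
    gpu="T4",
    memory="16Gi",
    on_start=load_models,
    volumes=[Volume(name="model-weights", mount_path=CACHE_PATH)],
)
def predict(context, prompt: str = "Hello, my name is"):
    model, tokenizer = context.on_start_value
    ...
```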
Finally, you can scale the endpoint out horizontally by attaching a `QueueDepthAutoscaler`, which adds containers based on how many tasks are waiting in the queue (see the sketch after this list).

`QueueDepthAutoscaler` takes two parameters:

- `max_containers`: the maximum number of containers the endpoint can scale out to
- `tasks_per_container`: how many queued tasks a single container should handle before another container is added
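For example, to allow up to five containers and add one for roughly every 30 queued tasks (the numbers are arbitrary):

```python
from beam import QueueDepthAutoscaler


@endpoint(
    image=image,
    gpu="T4",
    memory="16Gi",
    on_start=load_models,
    volumes=[Volume(name="model-weights", mount_path=CACHE_PATH)],
    autoscaler=QueueDepthAutoscaler(
        max_containers=5,        # never run more than 5 containers
        tasks_per_container=30,  # add a container per 30 queued tasks
    ),
)
def predict(context, prompt: str = "Hello, my name is"):
    model, tokenizer = context.on_start_value
    ...
```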