You can pass the `cache_dir` argument in the Hugging Face Transformers method:
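For instance, a minimal sketch of passing `cache_dir` to `from_pretrained` (the `gpt2` model and `./model_cache` path are only illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative persistent cache location (e.g. a mounted Volume path).
CACHE_DIR = "./model_cache"

# Weights are downloaded into CACHE_DIR on the first run and reused afterwards.
tokenizer = AutoTokenizer.from_pretrained("gpt2", cache_dir=CACHE_DIR)
model = AutoModelForCausalLM.from_pretrained("gpt2", cache_dir=CACHE_DIR)
```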
You can also pass the `cache_dir` argument to the underlying models using the `model_kwargs` argument of the pipeline:
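A minimal sketch of the same idea through the pipeline API (model name and path again illustrative):

```python
from transformers import pipeline

# model_kwargs is forwarded to the underlying model's from_pretrained call,
# so the weights end up in the same persistent cache directory.
generator = pipeline(
    "text-generation",
    model="gpt2",
    model_kwargs={"cache_dir": "./model_cache"},  # illustrative path
)
```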
Another option is to add an `on_start` function that will run exactly once when the container first starts:
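A minimal sketch of the pattern; how the function gets registered as `on_start` depends on your platform's decorator, so that wiring is only hinted at in a comment, and the model name is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Module-level cache populated once by on_start and reused by every request.
models = {}


def on_start():
    """Runs exactly once when the container first starts."""
    models["tokenizer"] = AutoTokenizer.from_pretrained("gpt2")
    models["model"] = AutoModelForCausalLM.from_pretrained("gpt2")


# Registration is platform-specific, e.g. something like
# @endpoint(on_start=on_start) on the handler (placeholder name).
def handler(prompt: str) -> str:
    tokenizer, model = models["tokenizer"], models["model"]
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```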
This example combines the `on_start` functionality with Volume caching:
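A rough sketch of that combination, assuming the Volume is mounted at `./model_cache` (the mount path, model name, and registration details are assumptions):

```python
from transformers import pipeline

# Assumed Volume mount path; the mount itself is configured on the platform side.
VOLUME_PATH = "./model_cache"

pipe = None


def on_start():
    """Runs once per container; weights are cached on the Volume across restarts."""
    global pipe
    pipe = pipeline(
        "text-generation",
        model="gpt2",
        model_kwargs={"cache_dir": VOLUME_PATH},
    )


def handler(prompt: str) -> str:
    return pipe(prompt, max_new_tokens=32)[0]["generated_text"]
```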
You can set the `checkpoint_enabled` flag on your decorator, which will capture a memory snapshot of the running container after `on_start` completes. This means that you can load a model onto a GPU, run some setup logic, and when the app cold starts, it will start right from that point.
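A minimal sketch of where the flag would sit, assuming a decorator-based API like the one described above; the decorator's name and import are placeholders, while `checkpoint_enabled` and `on_start` come from the text:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

models = {}


def on_start():
    # Everything set up here, including the weights already sitting on the GPU,
    # is captured in the memory snapshot taken once on_start completes.
    models["tokenizer"] = AutoTokenizer.from_pretrained("gpt2")
    models["model"] = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")


# Placeholder decorator shape, the flag itself being the point here:
# @endpoint(on_start=on_start, checkpoint_enabled=True)
def handler(prompt: str) -> str:
    # After a cold start, the restored snapshot already contains the GPU-loaded
    # model, so requests skip the download-and-load work entirely.
    tokenizer, model = models["tokenizer"], models["model"]
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```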
A cold start involves several steps, ending with running the `on_start` function and running the task itself. Here's a breakdown of a serverless cold start:
In this example, the cold start includes downloading the model in `on_start`, and loading it on the GPU.