# GPU Acceleration

## Running Tasks on GPU
You can run any code on a cloud GPU by passing a `gpu` argument in your function decorator.
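For example, a minimal sketch using Beam's Python SDK (this assumes the `function` decorator; `predict` is a placeholder name):

```python
from beam import function

# Request a T4 GPU for this function; Beam schedules it on a
# cloud machine with the requested hardware attached.
@function(gpu="T4")
def predict():
    ...
```

This is a configuration sketch rather than locally runnable code, since it deploys to Beam's cloud.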
## Available GPUs

Currently available GPU options are:

| GPU | VRAM |
|---|---|
| A10G | 24Gi |
| T4 | 16Gi |
| A100-40 | 40Gi |
| H100 | 80Gi |
| A6000 | 40Gi |
| RTX4090 | 24Gi |
### Check GPU Availability

Run `beam machine list` to check whether a machine is available.
Looking for a specific GPU that isn’t listed here? Let us know!
## Prioritizing GPU Types

You can split traffic across multiple GPUs by passing a list to the `gpu` parameter.

The list is ordered by priority. You can choose which GPUs to prioritize by specifying them at the front of the list.
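Such a priority list might look like the following (a sketch assuming the Python SDK's `function` decorator; `predict` is a placeholder name):

```python
from beam import function

# GPUs earlier in the list are tried first; if the preferred GPU
# is unavailable, Beam falls back to the next one in the list.
@function(gpu=["T4", "A10G", "A100-40"])
def predict():
    ...
```

As with the earlier snippet, this is deployment configuration and runs on Beam's cloud rather than locally.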
In this example, the T4 is prioritized over the A10G, followed by the A100-40.
## Configuring CPU and Memory

In addition to choosing a GPU, you'll also choose an amount of CPU and memory to allocate to your functions.

Note the distinction: a GPU has its own onboard memory (VRAM), while the server it runs on has system memory (RAM).
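A sketch of allocating CPU and memory alongside a GPU (parameter names assume Beam's Python SDK; `run_inference` is a placeholder name):

```python
from beam import function

# 4 CPU cores, 32Gi of system RAM, and an A100-40 with 40Gi of VRAM.
@function(cpu=4, memory="32Gi", gpu="A100-40")
def run_inference():
    ...
```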
### RAM vs. VRAM
VRAM is the amount of memory available on the GPU device. For example, if you are running inference on a 13B parameter LLM, you’ll usually need at least 40Gi of VRAM in order for the model to be loaded onto the GPU.
In contrast, RAM determines how much data can be held in memory and accessed by the CPU on the server. For example, if you download a 20Gi file, you'll need sufficient disk space and RAM to work with it.
In the context of LLMs, here are some approximate guidelines for resources to use in your apps:
| LLM Parameters | Recommended CPU | Recommended Memory (RAM) | Recommended GPU |
|---|---|---|---|
| 0-7B | 2 | 32Gi | A10G (24Gi VRAM) |
| 7-14B | 4 | 32Gi | A100-40 (40Gi VRAM) |
| 14B+ | 4 | 32Gi | H100 (80Gi VRAM) |
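The VRAM figures above follow from simple arithmetic: model weights in fp16 take roughly 2 bytes per parameter, plus working overhead for activations and the KV cache. A small illustrative calculation (the 20% overhead factor is an assumption for illustration, not a Beam-published number):

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_param: int = 2,  # fp16 weights
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to serve a model: weights plus ~20% overhead."""
    return params_billion * bytes_per_param * overhead

# A 13B-parameter model in fp16 needs roughly 31 GB, so a GPU with
# 40Gi of VRAM (e.g. an A100-40) is a sensible minimum.
print(round(estimate_vram_gb(13), 1))  # → 31.2
```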
## Monitoring Resource Usage
In the web dashboard, you can monitor the amount of CPU, Memory, and GPU memory used for your tasks.
On a deployment, click the **Metrics** button.

On this page, you can see resource usage over time. The graph also highlights the periods when your resource usage exceeded the limits set on your app.
## GPU Regions
Beam runs on servers distributed around the world, with primary locations in the United States, Europe, and Asia. If you would like your workloads to run in a specific region of the globe, please reach out.