Running Tasks on GPU

You can run any code on a cloud GPU by passing a gpu argument to your function decorator.

from beam import endpoint


# Request an A100 with 40Gi of VRAM for this endpoint
@endpoint(gpu="A100-40")
def handler():
    print("This is running on a GPU now")
    return {}

Available GPUs

Currently available GPU options are:

  • A10G (24Gi)
  • T4 (16Gi)
  • A100-40 (40Gi)
  • H100 (80Gi)
  • A6000 (48Gi)
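
For example, to run on an H100 instead of the A100-40 shown above, pass its name to the same gpu argument (a minimal sketch reusing the endpoint decorator from the example above):

from beam import endpoint


@endpoint(gpu="H100")
def handler():
    # Runs on a server with an H100 (80Gi of VRAM) attached
    print("This is running on an H100 now")
    return {}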

Looking for a specific GPU that isn’t listed here? Let us know!

Configuring CPU and Memory

In addition to choosing a GPU, you can choose the amount of CPU and memory to allocate to your functions.
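
For example, a handler that needs four CPU cores and 32Gi of RAM alongside an A100-40 might look like the sketch below. It assumes cpu takes a core count and memory takes a size string, in the same style as the gpu argument:

from beam import endpoint


@endpoint(cpu=4, memory="32Gi", gpu="A100-40")
def handler():
    print("Running with 4 CPU cores, 32Gi of RAM, and an A100-40 GPU")
    return {}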

GPUs have their own on-board memory (VRAM) and run inside servers that have their own memory (RAM).

RAM vs. VRAM

VRAM is the memory available on the GPU device itself. For example, if you are running inference on a 13B-parameter LLM, you'll usually need at least 40Gi of VRAM to load the model onto the GPU.
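
As a rough sanity check, the weights alone take parameter count times bytes per parameter. The sketch below applies that to a 13B-parameter model in 16-bit precision; the extra headroom in the 40Gi recommendation covers activations, the KV cache, and framework overhead, which this estimate ignores:

params = 13e9           # 13B parameters
bytes_per_param = 2     # fp16/bf16 stores each parameter in 2 bytes
weights_gib = params * bytes_per_param / (1024 ** 3)
print(f"Weights alone: ~{weights_gib:.0f}Gi")  # ~24Gi, before any overhead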

In contrast, RAM determines how much data the CPU on the server can hold and access. For example, if you download a 20Gi file, you'll need enough RAM to process it and enough disk space to store it.
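
A common way to keep RAM usage low when fetching large files is to stream them to disk in chunks instead of buffering the whole file in memory. A minimal sketch using the requests library (the URL and filename are placeholders):

import requests

url = "https://example.com/model-weights.bin"  # placeholder URL

# Stream the response to disk in 1MiB chunks so the whole
# file never has to fit in RAM at once.
with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    with open("model-weights.bin", "wb") as f:
        for chunk in resp.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)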

In the context of LLMs, here are some approximate guidelines for resources to use in your apps:

LLM Parameters | Recommended CPU | Recommended Memory (RAM) | Recommended GPU
0-7B           | 2               | 32Gi                     | A10G (24Gi VRAM)
7-14B          | 4               | 32Gi                     | A100-40 (40Gi VRAM)
14B+           | 4               | 32Gi                     | H100 (80Gi VRAM)
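
If you prefer these guidelines in code, here is a hypothetical helper (the function name and mapping are illustrative, not part of Beam's API) that picks resources from the table above based on parameter count:

from beam import endpoint


# Illustrative only: encodes the guideline table above.
def recommended_resources(params_billions: float) -> dict:
    if params_billions <= 7:
        return {"cpu": 2, "memory": "32Gi", "gpu": "A10G"}
    if params_billions <= 14:
        return {"cpu": 4, "memory": "32Gi", "gpu": "A100-40"}
    return {"cpu": 4, "memory": "32Gi", "gpu": "H100"}


@endpoint(**recommended_resources(13))  # 13B model -> 4 CPU, 32Gi RAM, A100-40
def handler():
    return {}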

GPU Regions

Beam runs on servers distributed around the world, with primary locations in the United States and Europe. If you would like your workloads to run in a specific region of the globe, please reach out.