Beam allows you to create highly-available storage volumes that can be used across tasks. You might use volumes for things like storing model weights or large datasets.
Beam Volumes are mounted directly to the containers that run your code, so they are more performant than using cloud object storage.
We strongly recommend using Beam Volumes for any data you plan to access from your Beam functions.
How to Write Files in Beam Containers
There are two use cases for saving files: persistent files that you want to access between tasks, and temporary files that are deleted when your container spins down.
- Persisting Files: write to a volume.
- Temporary Files: write to the /tmp directory in your Beam container. For example, you could save an image to /tmp/myimage.png.
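The two cases above differ only in where the file is written. The sketch below uses only the Python standard library to show this. `save_file` and its `directory` parameter are hypothetical names chosen for illustration, not part of the Beam SDK, and the sketch assumes the volume is already mounted at the path you pass in.

```python
from pathlib import Path


def save_file(data: bytes, filename: str, directory: str = "/tmp") -> Path:
    """Write `data` to `directory/filename` and return the resulting path.

    With the default "/tmp", the file is temporary and disappears when the
    container spins down. Passing a volume mount path instead (for example
    "./model_weights") is assumed to persist the file between tasks.
    """
    target_dir = Path(directory)
    # Create the directory if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(data)
    return target
```

For example, `save_file(image_bytes, "myimage.png")` would write a temporary file, while `save_file(image_bytes, "myimage.png", directory="./model_weights")` would write to a mounted volume.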
Reading and Writing to Volumes
You can read from and write to a Volume with ordinary Python file operations:
from beam import function, Volume

VOLUME_PATH = "./model_weights"


@function(
    volumes=[Volume(name="model-weights", mount_path=VOLUME_PATH)],
)
def access_files():
    # Write files to a volume
    with open(f"{VOLUME_PATH}/somefile.txt", "w") as f:
        f.write("This is being written to a file in the volume")

    # Read files from a volume
    with open(f"{VOLUME_PATH}/somefile.txt", "r") as f:
        print(f.readlines())


if __name__ == "__main__":
    access_files()
It can take up to 60 seconds for any files written to a distributed volume to become available to other containers.
To run this code, run python [filename].py. You’ll see it print the text we just wrote to the file.
(.venv) $ python reading_and_writing_data.py
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
Collecting files from /Users/beta9/beam/examples/06_volume
=> Files synced
=> Running function: <reading_and_writing_data:access_files>
['This is being written to a file in the volume']
=> Function complete <e1526222-f665-47a5-9377-6f9036de3951>
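Because files written to a distributed volume can take up to 60 seconds to appear in other containers, a reader may need to wait before opening a file that another task has just written. The helper below is a minimal polling sketch using the standard library. `wait_for_file` is a hypothetical name, not a Beam API. It only checks that the path exists, so it cannot tell whether the writer has finished writing.

```python
import os
import time


def wait_for_file(path: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True if the file appeared in time, False otherwise.
    The 60-second default mirrors the propagation delay noted above.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A consumer could call `wait_for_file("./model_weights/somefile.txt")` before opening the file, and treat a `False` result as an error.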
Creating a Volume
Volumes can be attached to anything you run on Beam.
By default, Volumes are shared across all apps in your Beam account.
from beam import function, Volume

VOLUME_PATH = "./model_weights"


@function(
    volumes=[Volume(name="model-weights", mount_path=VOLUME_PATH)],
)
def load_model():
    from transformers import AutoModel

    # Load model from cloud storage cache
    AutoModel.from_pretrained(VOLUME_PATH)
If you add a volume to your app, it will be created automatically. You can also create volumes manually in the CLI, by using:
$ beam volume create my-volume
Name        Created At   Updated At   Workspace Name
───────────────────────────────────────────────────────
my-volume   just now     just now     f6fa28
Uploading Data
You can upload files with the CLI using the beam cp command.
beam cp [local-file] beam://[volume-name]
Files
beam cp file.txt beam://myvol/ # ./file.txt => beam://myvol/file.txt
beam cp file.txt beam://myvol/file.txt # ./file.txt => beam://myvol/file.txt
beam cp file.txt beam://myvol/file.new # ./file.txt => beam://myvol/file.new
beam cp file.txt beam://myvol/hello # ./file.txt => beam://myvol/hello.txt (keeps the extension)
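The four file examples above follow a small set of naming rules, which can be modeled as a pure function. This is an illustrative sketch that reproduces the mappings shown above, not the CLI's actual implementation. `resolve_remote_file_path` is a hypothetical name, and the handling of a nested destination ending in `/` is an assumption that the examples do not cover.

```python
from pathlib import PurePosixPath

BEAM_PREFIX = "beam://"


def resolve_remote_file_path(local_file: str, remote_uri: str) -> str:
    """Return the beam:// path a single uploaded file would end up at."""
    if not remote_uri.startswith(BEAM_PREFIX):
        raise ValueError(f"expected a {BEAM_PREFIX} URI, got {remote_uri!r}")

    local = PurePosixPath(local_file)
    volume, _, remote_path = remote_uri[len(BEAM_PREFIX):].partition("/")

    if remote_path == "":
        # beam://myvol/ -> keep the local file name
        target = local.name
    elif remote_path.endswith("/"):
        # Assumption: a trailing slash means "copy into this directory"
        target = remote_path + local.name
    elif PurePosixPath(remote_path).suffix == "":
        # beam://myvol/hello -> hello.txt (keeps the local extension)
        target = remote_path + local.suffix
    else:
        # beam://myvol/file.new -> use the destination name as given
        target = remote_path

    return f"{BEAM_PREFIX}{volume}/{target}"
```

Calling `resolve_remote_file_path("file.txt", "beam://myvol/hello")` returns `"beam://myvol/hello.txt"`, matching the last example above.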
Directories
beam cp mydir beam://myvol # ./mydir/file.txt => beam://myvol/file.txt
beam cp mydir beam://myvol/mydir # ./mydir/file.txt => beam://myvol/mydir/file.txt
beam cp mydir beam://myvol/newdir # ./mydir/file.txt => beam://myvol/newdir/file.txt
Downloading Data
Files
beam cp beam://myvol/file.txt . # beam://myvol/file.txt => ./file.txt
beam cp beam://myvol/file.txt file.new # beam://myvol/file.txt => ./file.new
Directories
beam cp beam://myvol/mydir . # beam://myvol/mydir/file.txt => ./file.txt
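If you want to script uploads and downloads, one option is to call `beam cp` from Python with `subprocess`. The sketch below only uses the `beam cp [source] [destination]` form shown above. The helper names (`volume_uri`, `build_cp_command`, `copy`) are hypothetical, and running `copy` assumes the Beam CLI is installed and authenticated on the machine.

```python
import subprocess
from typing import List


def volume_uri(volume: str, path: str = "") -> str:
    """Build a beam:// URI for a volume, optionally with a path inside it."""
    return f"beam://{volume}/{path}" if path else f"beam://{volume}"


def build_cp_command(source: str, destination: str) -> List[str]:
    """Return the argument list for `beam cp source destination`."""
    return ["beam", "cp", source, destination]


def copy(source: str, destination: str) -> None:
    """Run `beam cp`, raising CalledProcessError if the CLI exits non-zero."""
    subprocess.run(build_cp_command(source, destination), check=True)
```

For example, `copy("file.txt", volume_uri("myvol"))` would upload a file, and `copy(volume_uri("myvol", "file.txt"), ".")` would download it into the current directory.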
CLI Management Commands
Create a Volume
beam volume create [VOLUME-NAME]
$ beam volume create weights
Name      Created At    Updated At    Workspace Name
───────────────────────────────────────────────────────
weights   May 07 2024   May 07 2024   cf2db0
Delete a Volume
beam volume delete [VOLUME-NAME]
$ beam volume delete model-weights
Any apps (functions, endpoints, taskqueue, etc) that
refer to this volume should be updated before it is deleted.
Are you sure? (y/n) [n]: y
Deleted volume model-weights
List Volumes
$ beam volume list
Name      Size         Created At   Updated At   Workspace Name
─────────────────────────────────────────────────────────────────────────────────────
weights   240.23 MiB   2 days ago   2 days ago   cf2db0
1 volumes | 240.23 MiB used
List Volume Contents
$ beam ls weights
Name                         Size         Modified Time    IsDir
──────────────────────────────────────────────────────────────────
.locks                       0.00 B       29 minutes ago   Yes
models--facebook--opt-125m   240.23 MiB   28 minutes ago   Yes
2 items | 240.23 MiB used
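You can get a similar overview from inside a running container, because a mounted volume behaves like an ordinary directory. The sketch below uses only the standard library to list the top-level entries of a directory with their sizes. `list_top_level` is a hypothetical helper, not a Beam API, and its output is only loosely modeled on the listing above.

```python
import os
from typing import List, Tuple


def _tree_size(path: str) -> int:
    """Sum the sizes (in bytes) of all regular files under `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def list_top_level(root: str) -> List[Tuple[str, int, bool]]:
    """Return (name, size_in_bytes, is_dir) for each entry directly under `root`."""
    rows = []
    for entry in sorted(os.scandir(root), key=lambda e: e.name):
        is_dir = entry.is_dir(follow_symlinks=False)
        if is_dir:
            size = _tree_size(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        rows.append((entry.name, size, is_dir))
    return rows
```

Inside a function with the volume mounted at `./model_weights`, `list_top_level("./model_weights")` would return one tuple per top-level file or directory.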
Copy Files to Volumes
beam cp [LOCAL-PATH] beam://[VOLUME-NAME]
$ beam cp my-file beam://my-volume
[beam://my-volume/my-file] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MiB 1.29 MiB/s 0:00:07
Move Files in Volumes
$ beam mv file.txt files/text-files
Moved file.txt to files/text-files/file.txt
Remove Files from Volumes
=> weights/app.py (1 object deleted)
app.py