This guide demonstrates how to run the Meta Llama 3.1 8B Instruct model on Beam.
You need a Hugging Face access token to run this example. You can sign up for Hugging Face, retrieve your token from the settings page, and store it in the Beam Secrets Manager.
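Beam exposes stored secrets to your container as environment variables, so the token can be read at runtime. Here is a minimal sketch, assuming the secret was saved under the name `HF_TOKEN` (the name is an assumption; use whatever you chose). Passing the token explicitly via the `token` parameter is optional, since the Hugging Face libraries also pick up the `HF_TOKEN` environment variable automatically:

```python
import os

from transformers import AutoTokenizer

# Beam injects saved secrets as environment variables.
# "HF_TOKEN" is an assumed secret name.
hf_token = os.environ["HF_TOKEN"]

# transformers accepts the token directly when downloading gated
# models such as Meta Llama 3.1
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    token=hf_token,
)
```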
The first thing we’ll do is set up an Image with the Python packages required for this app. We use the `env.is_remote()` check to conditionally import these packages only when the script is running remotely on Beam.
```python
from beam import endpoint, Image, Volume, env

# This ensures that these packages are only loaded when the script is running remotely on Beam
if env.is_remote():
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

# Model parameters
MODEL_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct"
MAX_LENGTH = 512
TEMPERATURE = 0.7
TOP_P = 0.9
TOP_K = 50
REPETITION_PENALTY = 1.05
NO_REPEAT_NGRAM_SIZE = 2
DO_SAMPLE = True
NUM_BEAMS = 1
EARLY_STOPPING = True

BEAM_VOLUME_PATH = "./cached_models"


# This runs once when the container first starts
def load_models():
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_NAME,
        cache_dir=BEAM_VOLUME_PATH,
        padding_side='left',
    )
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        device_map="auto",
        torch_dtype=torch.float16,
        cache_dir=BEAM_VOLUME_PATH,
        use_cache=True,
        low_cpu_mem_usage=True,
    )
    model.eval()
    return model, tokenizer
```
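The snippet above only defines the loader. To show how it fits into a deployment, here is a hedged sketch (continuing from the imports above) of an Image definition and an endpoint that runs `load_models` once at container start via `on_start`. The package list, GPU type, and endpoint name are assumptions for illustration, not part of the original example:

```python
# A minimal sketch of how load_models might be wired up. The package
# list, GPU type, and endpoint name below are assumptions.
image = Image(
    python_version="python3.11",
    python_packages=["torch", "transformers", "accelerate"],
)


@endpoint(
    name="llama-3-1-8b-instruct",  # hypothetical name
    image=image,
    gpu="A100-40",  # assumed GPU; choose one with enough VRAM for fp16 weights
    volumes=[Volume(name="cached_models", mount_path=BEAM_VOLUME_PATH)],
    on_start=load_models,  # runs once when the container first starts
)
def generate(context, **inputs):
    # Whatever load_models returns is available on the request context
    model, tokenizer = context.on_start_value
    ...
```

Mounting a Volume at `BEAM_VOLUME_PATH` means the model weights are downloaded once and reused across container restarts instead of being re-fetched from Hugging Face on every cold start.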