Large Language Models
Serve and run inference with open and custom LLMs.
Hugging Face Models
A beginner’s guide to running performant inference workloads on Beam.
Llama 3.1 8B
Serve Meta’s Llama 3.1 8B model on a GPU.
Run an OpenAI-Compatible vLLM Server
Host an OpenAI-compatible inference server with vLLM.
Chat with DeepSeek R1
Run the DeepSeek R1 reasoning model.
Qwen2.5-7B with SGLang
Serve Qwen2.5-7B with the SGLang runtime.
Image and Video
Generate and transform images and video on GPUs.
Serverless ComfyUI
Host ComfyUI for image generation workflows.
Text-to-Video with Mochi
Generate video from text with the Mochi model.
Stable Diffusion with LoRAs
Run Stable Diffusion with custom LoRA adapters.
Audio and Transcription
Transcribe and synthesize speech.
Faster Whisper
Transcribe audio with Faster Whisper.
Parler TTS
Synthesize speech with Parler TTS.
Zonos
Generate speech with the Zonos model.
Web Apps
Host interactive apps and scrape the web.
Web Scraping with Beam Functions
Build a web scraper that runs on Beam functions.
Running Streamlit Apps
Host a Streamlit app behind a public URL.
Agents
Build and coordinate AI agents.
Building AI Agents
Build stateful agents with concurrency built in.
Research Assistant
A research assistant that synchronizes state across tasks.
Fine-Tuning
Fine-tune open models on GPUs.
Fine-Tuning Gemma with LoRA
Fine-tune Google’s Gemma model with LoRA.
Fine-Tuning Llama 3.1 8B with Unsloth
Fast fine-tuning of Llama 3.1 8B with Unsloth.