Stable Diffusion with LoRAs
Introduction
This guide demonstrates how to run Stable Diffusion with custom LoRAs.
View the Code
See the code for this example on GitHub.
Setup Remote Environment
The first thing we'll do is set up an Image with the Python packages required for this app.
Because this script will run remotely, we need to make sure our local Python interpreter doesn't try to install these packages locally.
We'll use an if env.is_remote() check to conditionally import the Python packages only when the script is running remotely on Beam.
from beam import Image, Volume, endpoint, Output, env

# This check ensures that the packages are only imported when running this script remotely on Beam
if env.is_remote():
    from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
    import torch
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    import os
    import uuid

# The container image for the remote runtime
image = Image(
    python_version="python3.9",
    python_packages=[
        "diffusers[torch]>=0.10",
        "transformers",
        "huggingface_hub",
        "torch",
        "peft",
        "pillow",
        "accelerate",
        "safetensors",
        "xformers",
    ],
)
Pre-Load Models
Next, we’ll set up a function to run once when the container first starts up. This allows us to cache the model in memory between requests and ensures we don’t unnecessarily re-load the model.
CACHE_PATH = "./models"
MODEL_URL = "https://huggingface.co/martyn/sdxl-turbo-mario-merge-top-rated/blob/main/topRatedTurboxlLCM_v10.safetensors"
LORA_WEIGHT_NAME = "raw.safetensors"
LORA_REPO = "ntc-ai/SDXL-LoRA-slider.raw"

# This function runs once when the container first boots
def load_models():
    # Download the LoRA weights into the cache volume
    hf_hub_download(repo_id=LORA_REPO, filename=LORA_WEIGHT_NAME, cache_dir=CACHE_PATH)

    # Load the base SDXL checkpoint and move it to the GPU
    pipe = StableDiffusionXLPipeline.from_single_file(
        MODEL_URL,
        torch_dtype=torch.float16,
        safety_checker=None,
        cache_dir=CACHE_PATH,
    ).to("cuda")

    return pipe
Inference Function
Here’s our inference function. By adding the @endpoint()
decorator to it, we can expose this function as a RESTful API.
There are a few things to take note of:
- an image with the Python requirements we defined above
- an on_start function that runs once when the container first boots. The value returned from on_start (in this case, our pipe handler) is available in the inference function via the context value: pipe = context.on_start_value
- volumes, which are used to store the downloaded LoRAs and model weights on Beam
- keep_warm_seconds, which tells Beam how long to keep the container running between requests
@endpoint(
    image=image,
    on_start=load_models,
    keep_warm_seconds=60,
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    volumes=[Volume(name="models", mount_path=CACHE_PATH)],
)
def generate(context, prompt="medieval rich kingpin sitting in a tavern, raw"):
    # Retrieve pre-loaded model from loader
    pipe = context.on_start_value

    pipe.enable_sequential_cpu_offload()
    pipe.enable_attention_slicing("max")
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    # Use a unique adapter name
    adapter_name = f"raw_{uuid.uuid4().hex}"

    # Load the LoRA weights from the Hugging Face repo
    pipe.load_lora_weights(
        LORA_REPO, weight_name=LORA_WEIGHT_NAME, adapter_name=adapter_name
    )

    # Activate the LoRA under its unique adapter name
    pipe.set_adapters([adapter_name], adapter_weights=[2.0])

    # Generate image
    image = pipe(
        prompt,
        negative_prompt="nsfw",
        width=512,
        height=512,
        guidance_scale=2,
        num_inference_steps=10,
    ).images[0]

    # Save image file
    output = Output.from_pil_image(image).save()

    # Retrieve pre-signed URL for output file
    url = output.public_url()

    return {"image": url}
Saving Image Outputs
Notice the Output.from_pil_image(image).save() call below.
This will generate a shareable URL to access the images created from the inference function:
from beam import Output
# Save image file
output = Output.from_pil_image(image).save()
# Retrieve pre-signed URL for output file
url = output.public_url()
Create a Preview Deployment
You can spin up a temporary REST API to test this endpoint on Beam, using the beam serve
command:
beam serve app.py:generate
When you run this command, Beam will spin up a GPU-backed container to test your code on the cloud:
=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
=> Files synced
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/id/bcaa198b-2556-4c8c-9429-46d3202dbc95' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
=> Watching '/Users/beta9/beam/examples/07_image_generation' for changes...
You can paste the curl
command in your shell to call the API.
The API will return a pre-signed URL with the image generated:
{"image":"https://app.beam.cloud/output/id/09cb70bf-b5e8-4679-9da2-71611a1c3b57"}
Example output image for the prompt: "medieval rich kingpin sitting in a tavern, raw"
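Since generate accepts a prompt keyword argument, you can also pass your own prompt in the request body instead of an empty JSON payload. A minimal example, reusing the endpoint URL printed above and assuming Beam passes top-level JSON keys through as keyword arguments to the endpoint function:
curl -X POST 'https://app.beam.cloud/endpoint/id/bcaa198b-2556-4c8c-9429-46d3202dbc95' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"prompt": "a medieval blacksmith forging a sword in candlelight, raw"}'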
Deploy to Production
The beam serve
command is used for temporary APIs. When you’re ready to move to production, deploy a persistent endpoint:
beam deploy app.py:generate
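Once deployed, the endpoint is called the same way as the preview deployment, using the persistent URL and auth token Beam gives you. A sketch of the request (the endpoint ID below is a placeholder for the one printed by your deployment):
curl -X POST 'https://app.beam.cloud/endpoint/id/[DEPLOYMENT_ID]' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{"prompt": "medieval rich kingpin sitting in a tavern, raw"}'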