Introduction

This guide demonstrates how to run Stable Diffusion with custom LoRAs.

View the Code

See the code for this example on GitHub.

Setup Remote Environment

First, we’ll set up an Image with the Python packages this app requires.

Because this script will run remotely, we need to make sure our local Python interpreter doesn’t try to import packages that are only installed in the remote image.

We’ll use the env.is_remote() check to import these packages only when the script is running remotely on Beam.

app.py
from beam import Image, Volume, endpoint, Output, env


# This check ensures that the packages are only imported when running this script remotely on Beam
if env.is_remote():
    from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
    import torch
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file
    import os
    import uuid


# The container image for the remote runtime
image = (
    Image(python_version="python3.9")
    .add_python_packages(
        [
            "diffusers[torch]>=0.10",
            "transformers",
            # The hf-transfer extra installs huggingface_hub itself plus faster downloads
            "huggingface_hub[hf-transfer]",
            "torch",
            "peft",
            "pillow",
            "accelerate",
            "safetensors",
            "xformers",
        ]
    )
    .with_envs("HF_HUB_ENABLE_HF_TRANSFER=1")
)

Pre-Load Models

Next, we’ll set up a function to run once when the container first starts up. This allows us to cache the model in memory between requests and ensures we don’t unnecessarily re-load the model.

app.py
CACHE_PATH = "./models"
MODEL_URL = "https://huggingface.co/martyn/sdxl-turbo-mario-merge-top-rated/blob/main/topRatedTurboxlLCM_v10.safetensors"

LORA_WEIGHT_NAME = "raw.safetensors"
LORA_REPO = "ntc-ai/SDXL-LoRA-slider.raw"


# This function runs once when the container first boots
def load_models():
    # Download the LoRA weights into the cache directory
    hf_hub_download(repo_id=LORA_REPO, filename=LORA_WEIGHT_NAME, cache_dir=CACHE_PATH)

    pipe = StableDiffusionXLPipeline.from_single_file(
        MODEL_URL,
        torch_dtype=torch.float16,
        safety_checker=None,
        cache_dir=CACHE_PATH,
    ).to("cuda")

    return pipe

Inference Function

Here’s our inference function. By adding the @endpoint() decorator to it, we can expose this function as a RESTful API.

The decorator takes a few arguments worth noting:

  • an image with the Python requirements we defined above
  • an on_start function that runs once when the container first boots. The value from on_start (in this case, our pipe handler) is available in the inference function using the context value: pipe = context.on_start_value
  • volumes, which are used to store the downloaded LoRAs and model weights on Beam
  • keep_warm_seconds, which tells Beam how long to keep the container running between requests

app.py
@endpoint(
    image=image,
    on_start=load_models,
    keep_warm_seconds=60,
    cpu=2,
    memory="32Gi",
    gpu="A10G",
    volumes=[Volume(name="models", mount_path=CACHE_PATH)],
)
def generate(context, prompt="medieval rich kingpin sitting in a tavern, raw"):
    # Retrieve pre-loaded model from loader
    pipe = context.on_start_value

    pipe.enable_sequential_cpu_offload()
    pipe.enable_attention_slicing("max")

    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

    # Use a unique adapter name so repeated requests don't collide
    adapter_name = f"raw_{uuid.uuid4().hex}"

    # Load the LoRA weights from the Hugging Face repo under that adapter name
    pipe.load_lora_weights(
        LORA_REPO, weight_name=LORA_WEIGHT_NAME, adapter_name=adapter_name
    )

    # Activate the LoRA, using the same adapter name it was loaded under
    pipe.set_adapters([adapter_name], adapter_weights=[2.0])

    # Generate image
    image = pipe(
        prompt,
        negative_prompt="nsfw",
        width=512,
        height=512,
        guidance_scale=2,
        num_inference_steps=10,
    ).images[0]

    # Save image file
    output = Output.from_pil_image(image).save()

    # Retrieve pre-signed URL for output file
    url = output.public_url()

    return {"image": url}
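
The on_start behaviour described above can be pictured with a toy simulation in plain Python. This is only a sketch of the lifecycle, not Beam's actual runtime: the SimpleNamespace object is a hypothetical stand-in for the real context, and the dictionary stands in for the loaded pipeline.

```python
from types import SimpleNamespace

load_count = 0  # Tracks how many times the (pretend) expensive load runs


def load_models():
    """Stand-in for the real loader; returns a fake pipeline object."""
    global load_count
    load_count += 1
    return {"name": "fake-pipeline"}


def generate(context, prompt="hello"):
    # Same access pattern as the real inference function
    pipe = context.on_start_value
    return {"pipeline": pipe["name"], "prompt": prompt}


# Container boot: the loader runs once and its return value is stored
context = SimpleNamespace(on_start_value=load_models())

# Each request reuses the stored value instead of loading again
results = [generate(context, prompt=p) for p in ["first", "second", "third"]]
```

Three requests are served here, but the loader runs only once. That is the reason for moving model loading out of the inference function.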

Saving Image Outputs

Notice the Output.from_pil_image(image).save() call in the inference function. It saves the generated image as an output file.

Calling public_url() on the saved output then returns a shareable, pre-signed URL for the image:

app.py
from beam import Output

# Save image file
output = Output.from_pil_image(image).save()

# Retrieve pre-signed URL for output file
url = output.public_url()

Create a Preview Deployment

You can spin up a temporary REST API to test this endpoint on Beam, using the beam serve command:

beam serve app.py:generate

When you run this command, Beam will spin up a GPU-backed container to test your code in the cloud:

=> Building image
=> Using cached image
=> Syncing files
Reading .beamignore file
=> Files synced
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/id/bcaa198b-2556-4c8c-9429-46d3202dbc95' \
-H 'Connection: keep-alive' \
-H 'Authorization: Bearer [YOUR_AUTH_TOKEN]' \
-H 'Content-Type: application/json' \
-d '{}'
=> Watching '/Users/beta9/beam/examples/07_image_generation' for changes...

You can paste the curl command in your shell to call the API.

The API returns a pre-signed URL for the generated image:

{"image":"https://app.beam.cloud/output/id/09cb70bf-b5e8-4679-9da2-71611a1c3b57"}

Generated image for the prompt “medieval rich kingpin sitting in a tavern, raw”.
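
If you’d rather call the endpoint from Python than from curl, the sketch below builds an equivalent request using only the standard library. It makes a few assumptions: ENDPOINT_URL and AUTH_TOKEN are placeholders you must replace with the values printed by beam serve, the helper names build_request and generate_image_url are made up for this example, and the JSON body’s keys are assumed to be passed to generate as keyword arguments (so "prompt" overrides the default prompt).

```python
import json
import urllib.request

# Placeholders: substitute the URL and token printed by `beam serve`
ENDPOINT_URL = "https://app.beam.cloud/endpoint/id/[YOUR_ENDPOINT_ID]"
AUTH_TOKEN = "[YOUR_AUTH_TOKEN]"


def build_request(prompt, url=ENDPOINT_URL, token=AUTH_TOKEN):
    """Build a POST request like the curl command, but with a custom prompt."""
    # Assumption: keys in the JSON body map to keyword arguments of generate()
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image_url(prompt):
    """Send the request and return the pre-signed image URL from the response."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        return json.loads(response.read())["image"]
```

With real values filled in, generate_image_url("a castle at dawn, raw") would return the URL from the "image" field of the response.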

Deploy to Production

The beam serve command is used for temporary APIs. When you’re ready to move to production, deploy a persistent endpoint:

beam deploy app.py:generate