
# Zonos

This guide demonstrates how to deploy a Text-to-Speech (TTS) API using the [Zonos model](https://github.com/Zyphra/Zonos) from Zyphra. The API converts input text into spoken audio, leveraging a pre-trained transformer model and speaker embeddings derived from an example audio file. We use Beam’s infrastructure for compute and file output handling.

<Card title="View the Code" icon="github" href="https://github.com/beam-cloud/examples/tree/main/audio_and_transcription/zonos">
  See the full code for this example on GitHub.
</Card>

## Setup

### Environment Configuration

First, create a file named `app.py`:

```python
import os
import uuid

from beam import Image, endpoint, Output, env

# Heavy dependencies exist only inside the remote container, so import them there.
if env.is_remote():
    import torchaudio
    from zonos.model import Zonos
    from zonos.conditioning import make_cond_dict
    from zonos.utils import DEFAULT_DEVICE as device

# Custom image configuration
image = (
    Image(
        base_image="nvidia/cuda:12.4.1-devel-ubuntu22.04",
        python_version="python3.11"
    )
    .add_commands(["apt update && apt install -y espeak-ng git"])
    .add_commands([
        "pip install -U uv",
        "git clone https://github.com/Zyphra/Zonos.git /tmp/Zonos",
        "cd /tmp/Zonos && pip install setuptools wheel && pip install -e .",
    ])
)

@endpoint(
    name="zonos-tts",
    image=image,
    cpu=12,
    memory="32Gi",
    gpu="H100",
    timeout=-1
)
def generate(**inputs):
    text = inputs.get("text")

    if not text:
        return {"error": "Please provide a text"}

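    # The example audio path below is relative to the cloned repository.
    os.chdir("/tmp/Zonos")

    # Load the pre-trained model (this runs on every request).
    model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device=device)

    # Build a speaker embedding from the reference audio clip.
    wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
    speaker = model.make_speaker_embedding(wav, sampling_rate)

    # Combine the text, speaker, and language into the model's conditioning input.
    cond_dict = make_cond_dict(text=text, speaker=speaker, language="en-us")
    conditioning = model.prepare_conditioning(cond_dict)

    # Generate audio codes from the conditioning.
    codes = model.generate(conditioning)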

    # Save generated audio
    file_name = f"/tmp/zonos_out_{uuid.uuid4()}.wav"
    wavs = model.autoencoder.decode(codes).cpu()
    torchaudio.save(file_name, wavs[0], model.autoencoder.sampling_rate)

    # Upload and get public URL
    output_file = Output(path=file_name)
    output_file.save()
    public_url = output_file.public_url(expires=1200000000)

    return {"output_url": public_url}

if __name__ == "__main__":
    generate()
```
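
Because `generate` calls `Zonos.from_pretrained` on every request, each call pays the model loading cost. The following is a minimal sketch of one way to avoid that, assuming the container process is reused between requests: wrap the loader in `functools.lru_cache` so it runs once per process. `load_model`, `get_model`, and `LOAD_CALLS` are hypothetical names introduced for illustration, and `load_model` is a stand-in so the snippet runs without a GPU; in `app.py` it would wrap the real `from_pretrained` call.

```python
from functools import lru_cache

LOAD_CALLS = []  # records each real load, only so the caching effect can be observed


def load_model(name: str) -> dict:
    """Hypothetical stand-in for Zonos.from_pretrained(name, device=device)."""
    LOAD_CALLS.append(name)
    return {"name": name}


@lru_cache(maxsize=1)
def get_model(name: str = "Zyphra/Zonos-v0.1-transformer") -> dict:
    # The body runs only on the first call per process; later calls reuse the result.
    return load_model(name)


first = get_model()
second = get_model()
print(first is second, len(LOAD_CALLS))
```

Whether this helps in practice depends on how long a container stays warm between requests, so treat it as an optimization to measure rather than a guaranteed speedup.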

## Deployment

Run this command to deploy the endpoint:

```bash
beam deploy app.py:generate
```

The command prints the build progress, followed by a sample `curl` invocation that contains the endpoint URL:

```bash
=> Building image
=> Syncing files
=> Deploying
=> Deployed 🎉
=> Invocation details
curl -X POST 'https://app.beam.cloud/endpoint/zonos-tts/v1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
-d '{"text": "On Beam run AI workloads anywhere with zero complexity."}'
```

## API Usage

The deployed endpoint accepts POST requests with a JSON payload containing the text to convert to speech.

### Request Format

```json
{
  "text": "Your text to convert to speech"
}
```
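
The same request can be assembled from Python. This is a minimal sketch using only the standard library; `build_request` is a hypothetical helper, and `ENDPOINT_URL` is assumed to be the URL printed by your own deployment (the value here is copied from the sample output in this guide).

```python
import json
import urllib.request

# Assumption: replace with the URL printed by `beam deploy` for your account.
ENDPOINT_URL = "https://app.beam.cloud/endpoint/zonos-tts/v1"


def build_request(text: str, token: str, url: str = ENDPOINT_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for the TTS endpoint."""
    if not text:
        raise ValueError("text must be a non-empty string")
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


req = build_request("Hello from Zonos", token="YOUR_AUTH_TOKEN")
```

To send it, pass `req` to `urllib.request.urlopen` and read the JSON body from the response.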

### Example Request

```bash
curl -X POST 'https://app.beam.cloud/endpoint/zonos-tts/v1' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer {YOUR_AUTH_TOKEN}' \
-d '{"text": "On Beam run AI workloads anywhere with zero complexity. One line of Python, global GPUs, full control"}'
```

### Example Response

The API returns a JSON object with a URL to the generated audio file:

```json
{
  "output_url": "https://app.beam.cloud/output/id/704defd0-9370-4499-9124-677925e64961"
}
```
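
On the client side, the response can be handled along these lines. This is a sketch, not part of the example repository: `extract_output_url` is a hypothetical helper that returns the `output_url` field and raises if the handler sent back its `{"error": ...}` payload (which it does when no text is provided).

```python
import json


def extract_output_url(response_body: str) -> str:
    """Return the audio URL from the endpoint's JSON response, or raise on an error payload."""
    payload = json.loads(response_body)
    if "error" in payload:
        # The handler returns {"error": ...} when the request has no text.
        raise ValueError(payload["error"])
    return payload["output_url"]


sample = '{"output_url": "https://app.beam.cloud/output/id/704defd0-9370-4499-9124-677925e64961"}'
audio_url = extract_output_url(sample)
```

From there, something like `urllib.request.urlretrieve(audio_url, "speech.wav")` should download the generated file, assuming the link has not expired.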
