# Audio & Music

BonzAI provides text-to-speech and music generation pipelines, all running locally.

## Text-to-Speech

### Turbo TTS (Kokoro)

Fast, high-quality speech synthesis. Best for real-time companion voice responses.

```
POST http://localhost:65000/audio/turbo
```
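This page does not document the request schema, so the following is only a minimal sketch of calling the Turbo endpoint with Python's standard library. The JSON field `text`, the `Content-Type: application/json` header, the idea that the response body is the audio itself, and the helper names `build_turbo_request`/`synthesize_turbo` are all hypothetical. Check the server's actual schema before relying on them.

```python
import json
import urllib.request

# Endpoint taken from this page; served by the local BonzAI instance.
TURBO_URL = "http://localhost:65000/audio/turbo"


def build_turbo_request(text: str) -> urllib.request.Request:
    """Build a POST request for Turbo TTS (Kokoro).

    ASSUMPTION: the server accepts a JSON body with a "text" field.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        TURBO_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_turbo(text: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw response body.

    ASSUMPTION: the response body is the synthesized audio.
    """
    with urllib.request.urlopen(build_turbo_request(text), timeout=timeout) as resp:
        return resp.read()
```

With the server running, `synthesize_turbo("Hello!")` returns the response bytes. Keeping request construction separate from sending makes the request easy to inspect without a live server.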

### Quality TTS (Qwen3-TTS)

Persona-based speech synthesis with fine-grained control over voice characteristics. Higher quality than Turbo TTS, but slower to generate.

```
POST http://localhost:65000/audio/quality/persona
```
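The persona request can be sketched in the same way. The payload shape here is a loudly hypothetical placeholder: `text` is the line to speak, and `persona` is assumed to be a free-text description of the voice. The real endpoint may expect different or more detailed fields for its voice controls. The longer default timeout reflects only the note above that this pipeline is slower than Turbo.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass

# Endpoint taken from this page.
PERSONA_URL = "http://localhost:65000/audio/quality/persona"


@dataclass
class PersonaRequest:
    """HYPOTHETICAL request shape for Quality TTS (Qwen3-TTS)."""
    text: str     # what the voice should say
    persona: str  # ASSUMPTION: free-text description of the desired voice


def build_persona_request(req: PersonaRequest) -> urllib.request.Request:
    """Serialize the dataclass to JSON and wrap it in a POST request."""
    body = json.dumps(asdict(req)).encode("utf-8")
    return urllib.request.Request(
        PERSONA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_persona(req: PersonaRequest, timeout: float = 120.0) -> bytes:
    """Send the request and return the raw response body (assumed to be audio)."""
    with urllib.request.urlopen(build_persona_request(req), timeout=timeout) as resp:
        return resp.read()
```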

## Music Generation (ACE-Step)

Generate original music with lyrics using the ACE-Step model.

```
POST http://localhost:65000/audio/music
```
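Here is a sketch of a music request that also handles the case where the local server is down or returns an error. The `prompt` (style description) and `lyrics` fields are hypothetical, not the documented schema. The 600-second timeout is an arbitrary, generous guess, because generation time is not documented here.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

# Endpoint taken from this page.
MUSIC_URL = "http://localhost:65000/audio/music"


def build_music_request(prompt: str, lyrics: str) -> urllib.request.Request:
    """HYPOTHETICAL body for ACE-Step: a style prompt plus lyrics."""
    payload = {"prompt": prompt, "lyrics": lyrics}
    return urllib.request.Request(
        MUSIC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_music(prompt: str, lyrics: str, timeout: float = 600.0) -> Optional[bytes]:
    """Return the response body, or None on an HTTP error or unreachable server."""
    try:
        with urllib.request.urlopen(build_music_request(prompt, lyrics), timeout=timeout) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:  # server answered with a 4xx/5xx status
        print(f"music generation failed: HTTP {err.code}")
    except urllib.error.URLError as err:  # e.g. nothing listening on port 65000
        print(f"could not reach BonzAI: {err.reason}")
    return None
```

`HTTPError` is caught before `URLError` because it is a subclass of `URLError`.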

## Companion Voice Pipeline

In the roleplay system, companions use a multi-modal pipeline:

1. **Text response** generated by the selected LLM
2. **TTS** converts the response to speech (Qwen3-TTS for quality, Kokoro for speed)
3. **Playback** of the audio in the chat interface
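The three steps above can be sketched as one function that takes the LLM and TTS calls as parameters. Only the two endpoint URLs come from this page. `companion_turn`, the `"speed"`/`"quality"` mode names, and the `generate_reply`/`synthesize` callables are hypothetical stand-ins for however the roleplay system actually wires these pieces together.

```python
from typing import Callable, Literal, Tuple

BASE_URL = "http://localhost:65000"

# Endpoints from this page: Kokoro for speed, Qwen3-TTS for quality.
TTS_ENDPOINTS = {
    "speed": f"{BASE_URL}/audio/turbo",
    "quality": f"{BASE_URL}/audio/quality/persona",
}

Mode = Literal["speed", "quality"]


def companion_turn(
    user_message: str,
    generate_reply: Callable[[str], str],     # HYPOTHETICAL: wraps the selected LLM
    synthesize: Callable[[str, str], bytes],  # HYPOTHETICAL: (endpoint_url, text) -> audio
    mode: Mode = "speed",
) -> Tuple[str, bytes]:
    """Run one pipeline turn: LLM text, then TTS. Playback is left to the chat UI."""
    reply = generate_reply(user_message)             # step 1: text response
    audio = synthesize(TTS_ENDPOINTS[mode], reply)   # step 2: text-to-speech
    return reply, audio                              # step 3: caller plays the audio
```

Passing the LLM and TTS calls in as functions keeps the sketch independent of any particular client code and lets it run with simple stubs.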

Each companion can have a distinct voice persona based on their personality and gender.

## Minting Audio NFTs

Generated audio can be minted as NFTs on Base or LUKSO. Minting requires **LVL2** (5,000 BONZAI held).
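A client could check the documented preconditions before attempting a mint. This is a hypothetical pre-check sketch, not how BonzAI enforces the requirement. It assumes "5,000 BONZAI held" is a minimum (so exactly 5,000 qualifies) and that the balance is given in whole tokens rather than on-chain base units. `can_mint_audio` is an illustrative name.

```python
# From this page: LVL2 corresponds to 5,000 BONZAI held.
LVL2_MIN_BONZAI = 5_000

# Chains listed on this page for audio NFT minting.
MINT_CHAINS = {"base", "lukso"}


def can_mint_audio(bonzai_held: float, chain: str) -> bool:
    """Return True if the holder meets LVL2 and targets a supported chain.

    ASSUMPTION: the threshold is inclusive (>=) and `bonzai_held` is in whole
    tokens, not base units.
    """
    return bonzai_held >= LVL2_MIN_BONZAI and chain.lower() in MINT_CHAINS
```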
