Audio & Music

BonzAI provides text-to-speech and music generation pipelines, all running locally.

Text-to-Speech

Turbo TTS (Kokoro)

Fast, high-quality speech synthesis. Best for real-time companion voice responses.

POST http://localhost:65000/audio/turbo
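
The request body for this endpoint is not documented on this page, so the following is only a minimal sketch of how a client might call it. The JSON `text` field, the `build_turbo_request` and `synthesize` helper names, and the assumption that the response body is raw audio are all hypothetical; only the URL and the POST method come from the text above.

```python
import json
import urllib.request

TURBO_URL = "http://localhost:65000/audio/turbo"


def build_turbo_request(text: str) -> urllib.request.Request:
    # ASSUMPTION: the server accepts a JSON body with a "text" field.
    # The real schema is not documented here and may differ.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        TURBO_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, timeout: float = 30.0) -> bytes:
    # Sends the request to the local server and returns the raw response body.
    # ASSUMPTION: the response body is the audio data itself.
    with urllib.request.urlopen(build_turbo_request(text), timeout=timeout) as resp:
        return resp.read()
```

Building the request separately from sending it makes it possible to inspect the payload without a running server.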

Quality TTS (Qwen3-TTS)

Persona-based speech synthesis with fine-grained control over voice characteristics. Higher quality than Turbo TTS, but slower to generate.

POST http://localhost:65000/audio/quality/persona
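
As an illustration of what a persona-based request could look like, here is a small sketch that models the payload as a dataclass and serializes it to JSON. The `PersonaRequest` type and its `text` and `persona` fields are assumptions made for this example, not the documented schema.

```python
import json
from dataclasses import asdict, dataclass

QUALITY_URL = "http://localhost:65000/audio/quality/persona"


@dataclass
class PersonaRequest:
    # ASSUMPTION: both field names are hypothetical placeholders.
    text: str     # the sentence to speak
    persona: str  # free-form description of the desired voice


def to_json_body(req: PersonaRequest) -> bytes:
    # Serializes the dataclass into a UTF-8 JSON body suitable for a POST.
    return json.dumps(asdict(req)).encode("utf-8")
```

For example, `to_json_body(PersonaRequest(text="Hello", persona="warm, calm"))` yields the bytes that would be posted to `QUALITY_URL`.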

Music Generation (ACE-Step)

Generate original music with lyrics using the ACE-Step model.

POST http://localhost:65000/audio/music
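
A music request might be sketched along the following lines. The `prompt` and `lyrics` fields, the generous timeout, and the assumption that the response body is the audio file itself are all illustrative guesses; check the actual API before relying on them.

```python
import json
import urllib.request
from pathlib import Path

MUSIC_URL = "http://localhost:65000/audio/music"


def build_music_request(prompt: str, lyrics: str) -> urllib.request.Request:
    # ASSUMPTION: hypothetical "prompt" (style description) and "lyrics" fields.
    payload = {"prompt": prompt, "lyrics": lyrics}
    return urllib.request.Request(
        MUSIC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_to_file(prompt: str, lyrics: str, out_path: Path,
                     timeout: float = 600.0) -> Path:
    # ASSUMPTION: the response body is the generated audio file.
    # Music generation is likely slower than TTS, hence the long timeout.
    with urllib.request.urlopen(build_music_request(prompt, lyrics),
                                timeout=timeout) as resp:
        out_path.write_bytes(resp.read())
    return out_path
```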

Companion Voice Pipeline

In the roleplay system, companions use a multi-modal pipeline:

  1. Text response generated by the selected LLM

  2. TTS converts the response to speech (Qwen3-TTS for quality, Kokoro for speed)

  3. Audio plays back in the chat interface

Each companion can have a distinct voice persona based on its personality and gender.
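
The three pipeline steps can be sketched as a single function. This is not BonzAI's implementation: `run_voice_pipeline`, the `"speed"` and `"quality"` mode names, and the three injected callables are hypothetical, and only the two endpoint URLs come from this page.

```python
from typing import Callable

# Endpoint per TTS mode, taken from the sections above.
TTS_ENDPOINTS = {
    "speed": "http://localhost:65000/audio/turbo",              # Kokoro
    "quality": "http://localhost:65000/audio/quality/persona",  # Qwen3-TTS
}


def run_voice_pipeline(
    user_message: str,
    generate_reply: Callable[[str], str],     # step 1: the selected LLM
    synthesize: Callable[[str, str], bytes],  # step 2: (endpoint URL, text) -> audio
    play: Callable[[bytes], None],            # step 3: playback in the chat UI
    mode: str = "speed",
) -> str:
    if mode not in TTS_ENDPOINTS:
        raise ValueError(f"unknown TTS mode: {mode!r}")
    reply = generate_reply(user_message)
    audio = synthesize(TTS_ENDPOINTS[mode], reply)
    play(audio)
    return reply
```

Passing the LLM, TTS client, and player in as callables keeps the sketch independent of any particular model or audio backend, and lets each stage be replaced with a stub when testing.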

Minting Audio NFTs

Generated audio can be minted as an NFT on Base or LUKSO. Minting requires LVL2 (5,000 BONZAI held).
