Audio & Music
BonzAI provides text-to-speech and music generation pipelines, all running locally.
Turbo TTS (Kokoro)
Fast, high-quality speech synthesis. Best for real-time companion voice responses.
POST http://localhost:65000/audio/turbo
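The request body for this endpoint is not documented here, so the following is only a minimal sketch of calling it from Python's standard library. The JSON field names `text` and `voice`, and the helper names `build_turbo_request` and `synthesize_to_file`, are assumptions for illustration. The sketch also assumes the server answers with raw audio bytes.

```python
import json
import urllib.request

TURBO_URL = "http://localhost:65000/audio/turbo"


def build_turbo_request(text, voice=None):
    """Build (but do not send) a POST request for Turbo TTS.

    The JSON fields "text" and "voice" are assumed, not a documented schema.
    """
    payload = {"text": text}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        TURBO_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, voice=None, timeout=30):
    """Send the request and write the response body to out_path.

    Assumes the response body is audio data; adjust if the server returns JSON.
    """
    request = build_turbo_request(text, voice)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
```

With BonzAI running locally, `synthesize_to_file("Hello there", "hello.wav")` would save the reply, provided the assumptions above match the actual API.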
Quality TTS (Qwen3-TTS)
Persona-based speech synthesis with fine-grained control over voice characteristics. Higher quality than Turbo TTS, but slower to generate.
POST http://localhost:65000/audio/quality/persona
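As a rough illustration of what a persona request could look like, the sketch below groups voice characteristics into a small dataclass and serializes it into the request body. The `VoicePersona` fields (`name`, `gender`, `style`) and the `text`/`persona` body layout are hypothetical; the real endpoint may expect different parameters.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass

PERSONA_URL = "http://localhost:65000/audio/quality/persona"


@dataclass
class VoicePersona:
    # Hypothetical examples of "voice characteristics"; not a documented schema.
    name: str
    gender: str
    style: str = "neutral"


def build_persona_request(text, persona):
    """Build (but do not send) a POST request for persona-based TTS."""
    payload = {"text": text, "persona": asdict(persona)}  # assumed body shape
    return urllib.request.Request(
        PERSONA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`. A generous timeout is probably sensible, since this endpoint trades speed for quality.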
Music Generation (ACE-Step)
Generate original music with lyrics using the ACE-Step model.
POST http://localhost:65000/audio/music
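A music request would presumably carry a style description and the lyrics. The sketch below shows one way to assemble it; the field names `prompt`, `lyrics`, and `duration_seconds` are assumptions and should be checked against the actual API before use.

```python
import json
import urllib.request

MUSIC_URL = "http://localhost:65000/audio/music"


def build_music_request(prompt, lyrics="", duration_seconds=30):
    """Build (but do not send) a POST request for music generation.

    "prompt", "lyrics", and "duration_seconds" are assumed field names.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    payload = {
        "prompt": prompt,
        "lyrics": lyrics,
        "duration_seconds": duration_seconds,
    }
    return urllib.request.Request(
        MUSIC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```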
Companion Voice Pipeline
In the roleplay system, companions use a multi-modal pipeline:
1. Text response generated by the selected LLM
2. TTS converts the response to speech (Qwen3-TTS for quality, Kokoro for speed)
3. Audio plays back in the chat interface
Each companion can have a distinct voice persona based on their personality and gender.
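The three pipeline steps can be sketched as a single function that receives each stage as a callable. This is an illustration of the flow only, not BonzAI's implementation: `run_voice_pipeline`, the `mode` values, and the callable signatures are all hypothetical.

```python
# Endpoint per trade-off, taken from the sections above.
TTS_ENDPOINTS = {
    "quality": "http://localhost:65000/audio/quality/persona",  # Qwen3-TTS
    "speed": "http://localhost:65000/audio/turbo",  # Kokoro
}


def run_voice_pipeline(user_message, generate_text, synthesize, play, mode="speed"):
    """Run the three steps in order and return the reply text.

    generate_text(message) -> str            step 1: the selected LLM
    synthesize(endpoint_url, text) -> bytes  step 2: text-to-speech
    play(audio_bytes) -> None                step 3: playback in the chat UI
    """
    if mode not in TTS_ENDPOINTS:
        raise ValueError(f"unknown mode: {mode!r}")
    reply = generate_text(user_message)
    audio = synthesize(TTS_ENDPOINTS[mode], reply)
    play(audio)
    return reply
```

Passing the stages in as callables keeps the sketch independent of any particular LLM or audio backend, and makes it easy to swap Kokoro for Qwen3-TTS by changing `mode`.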
Minting Audio NFTs
Generated audio can be minted as an NFT on Base or LUKSO. Minting requires LVL2 (5,000 BONZAI held).