Video Generation

BonzAI generates videos locally using the LTX-2 model (GGUF quantized, ~10 GB).

Endpoint

POST http://localhost:65000/video

Modes

  • Text-to-Video: Generate a video from a text prompt

  • Image-to-Video: Animate a static image into a video clip
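A minimal client sketch for the text-to-video mode, using only the Python standard library. The request fields and defaults come from the table below; the assumption (not confirmed by this page) is that the server returns the encoded video bytes directly in the response body.

```python
import json
import urllib.request

VIDEO_ENDPOINT = "http://localhost:65000/video"

def build_video_request(prompt, height=480, width=768, num_frames=81,
                        fps=24, num_inference_steps=30, guidance_scale=3.5,
                        image_base64=None):
    """Assemble a /video request body using the documented defaults."""
    body = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_frames": num_frames,
        "fps": fps,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    if image_base64 is not None:
        body["image_base64"] = image_base64  # switches the server to image-to-video mode
    return body

def submit(body, timeout=600):
    """POST the request. Assumes the response body is the finished video."""
    req = urllib.request.Request(
        VIDEO_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage would look like `open("cat.mp4", "wb").write(submit(build_video_request("A cat walking in a garden, cinematic lighting")))` — note that local generation of 81 frames can take several minutes, hence the generous timeout.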

Request Format

{
  "prompt": "A cat walking in a garden, cinematic lighting",
  "height": 480,
  "width": 768,
  "num_frames": 81,
  "fps": 24,
  "num_inference_steps": 30,
  "guidance_scale": 3.5,
  "image_base64": "..."
}
Parameter            Default     Description

prompt               (required)  Text description of the video
height               480         Video height in pixels
width                768         Video width in pixels
num_frames           81          Number of frames to generate
fps                  24          Frames per second
num_inference_steps  30          Denoising steps (higher = better quality, slower)
guidance_scale       3.5         How closely the output follows the prompt
image_base64         (none)      Optional base64-encoded image for image-to-video mode
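For image-to-video mode, the keyframe image travels inside the JSON body as base64 text. A small helper sketch (the helper name and keyword-override style are illustrative, not part of the API):

```python
import base64

def image_to_video_body(image_path, prompt, **overrides):
    """Build an image-to-video request body with the documented defaults.

    The image file is read as raw bytes and base64-encoded so it can be
    embedded in the JSON payload's image_base64 field.
    """
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    body = {
        "prompt": prompt,
        "height": 480,
        "width": 768,
        "num_frames": 81,
        "fps": 24,
        "num_inference_steps": 30,
        "guidance_scale": 3.5,
        "image_base64": encoded,
    }
    body.update(overrides)  # e.g. num_frames=121 for a longer clip
    return body
```

The resulting dict can be POSTed to the endpoint above like any text-to-video request; the presence of `image_base64` is what selects the mode.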

Companion Video Pipeline

In the roleplay system, companions can generate contextual videos:

  1. LLM generates the scene description

  2. Optionally, an image is generated first as a keyframe

  3. LTX-2 generates the video, conditioned on audio if available
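The three steps above can be sketched as a single orchestration function. Everything here is a stand-in: `llm`, `image_model`, and the `audio_base64` field are hypothetical placeholders, not BonzAI's actual internal API.

```python
def companion_video(scene_request, llm, image_model=None, audio_clip=None):
    """Sketch of the companion video pipeline (assumed interfaces).

    llm:         callable turning roleplay context into a scene description
    image_model: optional callable returning a base64 keyframe image
    audio_clip:  optional base64 audio used for conditioning
    """
    # 1. The LLM turns the roleplay context into a scene description.
    prompt = llm(scene_request)

    body = {"prompt": prompt}

    # 2. Optionally render a keyframe first and condition the video on it.
    if image_model is not None:
        body["image_base64"] = image_model(prompt)

    # 3. Hand off to LTX-2; attach audio conditioning if a clip is available.
    #    ("audio_base64" is a hypothetical field name for illustration.)
    if audio_clip is not None:
        body["audio_base64"] = audio_clip

    return body  # ready to POST to the /video endpoint
```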

Minting Video NFTs

Generated videos can be minted as NFTs on Base. This requires LVL4 (25,000 BONZAI held).
