Text Generation (LLM)

BonzAI ships with 20 large language models accessible through the GenerativeText interface. All models run locally via OpenClaw on port 3002 using an OpenAI-compatible API.
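Because the API is OpenAI-compatible, any OpenAI-style client can talk to it. The sketch below uses only the Python standard library and assumes OpenClaw exposes the standard `/v1/models` listing route on port 3002; the route and response shape are assumptions based on the OpenAI wire format, not confirmed BonzAI specifics.

```python
import json
import urllib.error
import urllib.request

# Assumed base URL: OpenClaw's OpenAI-compatible server on port 3002.
BONZAI_BASE = "http://localhost:3002/v1"

def model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style model-list response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str = BONZAI_BASE) -> list[str]:
    """Query the (assumed) standard OpenAI /models route."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as resp:
        return model_ids(json.load(resp))

if __name__ == "__main__":
    try:
        print(list_models())
    except OSError:
        # Covers connection-refused and timeouts when OpenClaw is not running.
        print("OpenClaw is not reachable on port 3002")
```

Whatever IDs the server reports back are the names to pass in chat requests.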

Available Models

Small (2-4 GB VRAM)

| Model | Parameters | Size | Best For |
|---|---|---|---|
| Gemma 3 1B | 1B | 0.8 GB | Quick responses, low-resource machines |
| DeepSeek R1 1.5B | 1.5B | 1.1 GB | Reasoning on constrained hardware |
| Qwen 2.5 3B | 3B | 2 GB | Balanced quality/speed |
| Phi-3 Mini | 3.8B | 2.4 GB | General purpose |
| Phi-4 Mini | 3.8B | 2.5 GB | Improved Phi-3 successor |

Medium (6-10 GB VRAM)

| Model | Parameters | Size | Best For |
|---|---|---|---|
| LLaMA 2 7B | 7B | 4 GB | General chat |
| Mistral 7B | 7B | 4.4 GB | Instruction following |
| DeepSeek R1 7B | 7B | 4.7 GB | Chain-of-thought reasoning |
| GLM-4 7B Flash | 7B | 4.7 GB | Fast bilingual (EN/CN) |
| Qwen Coder 7B | 7B | 4.7 GB | Code generation |
| LLaMA 3.1 8B | 8B | 5 GB | Latest Meta model |
| Yi Coder 9B | 9B | 5.5 GB | Code + general |
| MythoMax 13B | 13B | 7.9 GB | Creative writing, roleplay |
| Hermes 4 14B | 14B | 9 GB | Tool use, function calling |

Large (12-24 GB VRAM)

| Model | Parameters | Size | Best For |
|---|---|---|---|
| Qwen Coder 14B | 14B | 9 GB | Advanced code generation |
| GPT-OSS 20B | 20B | 12 GB | Open-source GPT alternative |
| Codestral 22B | 22B | 13 GB | Production-grade coding |
| Qwen3 Coder 30B | 30B | 18 GB | Top-tier code + reasoning |
| DeepSeek R1 32B | 32B | 19 GB | Advanced reasoning |
| Qwen Coder 32B | 32B | 19 GB | Large-scale code projects |

API Endpoint

All LLM inference goes through OpenClaw's OpenAI-compatible endpoint on port 3002, and requests follow the standard OpenAI chat-completions schema.
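A minimal request sketch, using only the standard library. It assumes the server exposes the standard OpenAI `/v1/chat/completions` route; the model name `mistral-7b` is illustrative, so substitute whatever ID the server actually reports.

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str, **options) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    body.update(options)  # e.g. temperature, max_tokens
    return body

def chat(prompt: str, model: str = "mistral-7b") -> str:
    # Assumed route: the standard OpenAI chat-completions path.
    req = urllib.request.Request(
        "http://localhost:3002/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        reply = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return reply["choices"][0]["message"]["content"]
```

Any OpenAI SDK pointed at `http://localhost:3002/v1` should work the same way, since the wire format is identical.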

Model Downloads

Models are downloaded automatically on first use to ~/bonzai-models/. All model files use the GGUF quantization format for efficient local inference.

Important: Model filenames are always lowercase (e.g., qwen3.5-27b-q2_k.gguf), even if the download URL uses mixed case.
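Because download URLs may use mixed case while on-disk names are always lowercase, it helps to normalize before checking whether a model is already present. A small sketch (the helper name and the check in `__main__` are illustrative, not part of BonzAI's API):

```python
from pathlib import Path

# Download directory stated in the docs above.
MODELS_DIR = Path.home() / "bonzai-models"

def local_model_path(filename: str) -> Path:
    """Map a (possibly mixed-case) download filename to its on-disk path.

    BonzAI stores model files lowercase, so lowercase the name
    before looking it up.
    """
    return MODELS_DIR / filename.lower()

if __name__ == "__main__":
    # Example: a mixed-case URL basename resolves to a lowercase local file.
    path = local_model_path("Mistral-7B-Q4_K_M.gguf")
    print(path, "exists" if path.exists() else "not downloaded yet")
```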
