LLM Gateway

Location: llm-gateway/ (separate directory)
Default Port: 4000
Backend: Ollama (local model runtime)

An OpenAI-compatible API gateway backed by Ollama. It provides user management, API key authentication, usage tracking, rate limiting, and a web admin UI, all fully self-hosted.

Architecture

Clients (OpenAI SDK, curl, tbjs, etc.)
        │
        ▼
┌──────────────────────┐
│   LLM Gateway :4000  │  FastAPI · Auth · Rate Limiting · Usage Tracking
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│   Ollama :11434      │  Text · Vision · Embedding · Audio · TTS
└──────────────────────┘

Features

OpenAI-compatible API: drop-in replacement for the OpenAI API; works with the openai SDK
Multi-modal: text, vision (llava, qwen-vl), audio, embeddings, TTS
Smart routing: auto-detects the content type (image or audio) and routes the request to a capable model
Live Voice: real-time voice conversation via WebSocket
User management: API keys, tiers (PAYG / subscription), balance tracking
HuggingFace GGUF import: search and download models directly
Admin UI: model management, user management, system monitoring
Playground: chat with streaming, image upload, audio recording
Two deployment modes: bare metal (native Ollama) or Docker

Quick Start

1. Install Ollama

# Linux / macOS
curl -fsSL https://ollama.com/install.sh | sh

# Windows: download from ollama.com/download

2. Setup Gateway

cd llm-gateway
bash setup.sh          # Linux/macOS
# or:
powershell -ExecutionPolicy Bypass -File win_setup.ps1   # Windows

3. Start

source venv/bin/activate
uvicorn server:app --host 0.0.0.0 --port 4000

4. Via ToolBoxV2

tb llm-gateway setup
tb llm-gateway start
tb llm-gateway status
tb llm-gateway stop
tb llm-gateway restart

OpenAI-Compatible API

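from openai import OpenAI

# Point the official OpenAI SDK at the local gateway instead of api.openai.com
client = OpenAI(
    base_url="http://localhost:4000/v1",
    api_key="sk-admin-..."   # admin_key from data/config.json
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Print the assistant's reply
print(response.choices[0].message.content)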
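
For clients that cannot install the OpenAI SDK, the same call can be sketched with the standard library alone. This is a minimal sketch, assuming the gateway accepts the key as an `Authorization: Bearer` header (the scheme the OpenAI SDK sends); `build_chat_request` is a hypothetical helper name, not part of the gateway.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages, stream=False):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {"model": model, "messages": messages, "stream": stream}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: same Bearer scheme the OpenAI SDK uses
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:4000/v1",
    "sk-admin-...",
    "llama3.2",
    [{"role": "user", "content": "Hello!"}],
)

# To actually send it (requires a running gateway):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the request separately from sending it keeps the payload easy to inspect and log before anything goes over the wire.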

Endpoints

Chat / Completions

POST /v1/chat/completions          → Chat (streaming supported)
POST /v1/embeddings                → Embeddings
POST /v1/audio/speech              → TTS
POST /v1/audio/transcriptions      → STT
GET  /v1/models                    → List loaded models
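
Since the chat endpoint supports streaming, a client has to reassemble the reply from chunks. The sketch below assumes the gateway follows OpenAI's server-sent-events format (`data: {json}` lines terminated by `data: [DONE]`), which is not confirmed here; `iter_stream_text` is a hypothetical helper and the sample lines are made up for illustration.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Illustrative stream, not captured from a real gateway
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
reply = "".join(iter_stream_text(sample))
```

With the OpenAI SDK, passing `stream=True` to `client.chat.completions.create` does this parsing for you; the manual version is mainly useful for raw HTTP clients.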

Live Voice (WebSocket)

POST   /v1/audio/live               → Create session
WS     /v1/audio/live/ws/{token}    → Stream voice
DELETE /v1/audio/live/{token}       → Close session
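
The three endpoints form a session lifecycle: create, stream, close. The sketch below only derives the URLs for one session. It assumes the create call returns the token (the response shape is not documented here) and that the WebSocket is served on the same host and port as the HTTP API; `live_voice_urls` and the token value are hypothetical.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def live_voice_urls(base_url, token):
    """Return the create, stream and close URLs for one live-voice session."""
    parts = urlsplit(base_url.rstrip("/"))
    # WebSocket scheme mirrors the HTTP scheme (http -> ws, https -> wss)
    ws_scheme = "wss" if parts.scheme == "https" else "ws"
    safe_token = quote(token, safe="")  # escape characters unsafe in a path segment
    path = parts.path + "/v1/audio/live"
    return {
        "create": urlunsplit((parts.scheme, parts.netloc, path, "", "")),
        "stream": urlunsplit((ws_scheme, parts.netloc, f"{path}/ws/{safe_token}", "", "")),
        "close": urlunsplit((parts.scheme, parts.netloc, f"{path}/{safe_token}", "", "")),
    }


urls = live_voice_urls("http://localhost:4000", "example-token")
```

A client would POST to `urls["create"]`, open a WebSocket on `urls["stream"]`, and send DELETE to `urls["close"]` when the conversation ends.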

Admin

GET/POST /admin/api/users          → User management
GET/POST /admin/api/models/*       → Model load/unload
GET      /admin/api/system         → System stats
GET/PUT  /admin/api/config         → Configuration

Model Types

Type	Capabilities	Example Models
`text`	Chat + tool calling	llama3.2, qwen2.5, mistral
`vision`	Chat + images	llava, qwen2-vl, bakllava
`omni`	Chat + image + audio	qwen2-audio
`embedding`	Text vectors	nomic-embed-text, mxbai-embed-large
`tts`	Text-to-speech	kokoro
`audio`	Speech-to-text	whisper
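
Smart routing has to map each request onto one of these types. The following is a rough sketch of that decision, not the gateway's actual routing code. It assumes messages use the OpenAI content-part format, where images arrive as `image_url` parts and audio as `input_audio` parts; `required_model_type` is a hypothetical name.

```python
def required_model_type(messages):
    """Return the least capable chat model type ('text', 'vision', 'omni') a request needs."""
    has_image = has_audio = False
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain string content is text only
        for part in content:
            kind = part.get("type")
            has_image = has_image or kind == "image_url"
            has_audio = has_audio or kind == "input_audio"
    if has_audio:
        return "omni"    # chat + image + audio
    if has_image:
        return "vision"  # chat + images
    return "text"


text_request = [{"role": "user", "content": "Hello!"}]
image_request = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
    ],
}]
```

Here `required_model_type(text_request)` yields `text` and `required_model_type(image_request)` yields `vision`; a router could then pick any loaded model of that type.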

Configuration (`data/config.json`)

Key	Default	Description
`ollama_url`	`http://127.0.0.1:11434`	Ollama server URL
`backend_mode`	`bare`	`bare` (native) or `docker`
`admin_key`	generated	Admin API key
`hf_token`	null	HuggingFace token
`rate_limits.payg`	5	Req/min for PAYG users
`rate_limits.sub`	10	Req/min for subscribers
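
The `rate_limits` values are requests per minute per tier. One common way to enforce such a limit is a sliding-window counter, sketched below. This illustrates the technique only and is not the gateway's implementation; the nested JSON layout, the class name, and the demo API key are all assumptions.

```python
import time
from collections import defaultdict, deque

# Assumed nested layout for the dotted keys in the table above
CONFIG = {"rate_limits": {"payg": 5, "sub": 10}}


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window` seconds."""

    def __init__(self, window=60.0, clock=time.monotonic):
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self.hits = defaultdict(deque)  # api key -> timestamps of recent requests

    def allow(self, api_key, limit):
        now = self.clock()
        hits = self.hits[api_key]
        # Drop timestamps that have left the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= limit:
            return False  # over the limit; request is not recorded
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
payg_limit = CONFIG["rate_limits"]["payg"]
# Six back-to-back requests from one hypothetical PAYG key
results = [limiter.allow("sk-user-demo", payg_limit) for _ in range(6)]
```

With the default PAYG limit of 5, the first five calls are allowed and the sixth is rejected until the oldest request is more than a minute old.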

Docker

# Gateway only (Ollama on host)
OLLAMA_URL=http://host.docker.internal:11434 docker compose up gateway

# Full stack (Gateway + Ollama)
docker compose --profile ollama up

GPU support: uncomment the deploy.resources.reservations.devices block in compose.yaml.

Web UIs

URL	Description
`/`	Landing page
`/admin/`	Admin panel
`/playground/`	Chat playground
`/live`	Live voice
`/user/`	User dashboard

Testing

# 99 tests total
python -m unittest discover tests/

# Individual modules
python -m unittest tests.test_server          # 38 tests
python -m unittest tests.test_model_manager   # 38 tests
python -m unittest tests.test_live_handler    # 23 tests

ISAA Agent Framework — Uses LLM Gateway as a backend provider
System Stack — How gateway fits into the overall architecture