Open WebUI
Docker-friendly chat UI. UPtrim auto-resolves users from OWUI's auth headers.
- Auto-identifies each user
- Signed HMAC identity for teams
- New users get private memory on first message
- AI tools surface inline in chat
Point your app's API URL at UPtrim. That's the integration.
Power-user chat UI. UPtrim picks up identity from existing auth tokens.
UPtrim implements the full OpenAI /chat/completions spec.
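Because the spec is the standard OpenAI one, any client that can POST JSON works. A minimal stdlib-only sketch (the localhost:9099 address comes from the quick-start below; the model name is whatever your backend serves):

```python
import json
import urllib.request

BASE_URL = "http://localhost:9099/v1"  # UPtrim's OpenAI-compatible endpoint

def chat(messages, model="llama3.1"):
    """POST a standard /chat/completions request and return the reply text."""
    payload = {"model": model, "messages": messages}
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # UPtrim ignores the key unless trust mode is strict
            "Authorization": "Bearer anything",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# The payload shape -- identical to what any OpenAI SDK sends:
example = {"model": "llama3.1",
           "messages": [{"role": "user", "content": "Hello"}]}
```

Swapping providers is just changing `BASE_URL`; nothing else in your client code moves.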
Your local LLMs. Plus, in v2.0, the big cloud providers — on the same proxy.
Run any GPU-accelerated backend that speaks the OpenAI API. Swap between up to five backends on paid tiers.
OAuth sign-in, no raw keys. Smart routing pushes hard problems to the cloud, keeps small talk local.
Everything runs local on one box. Nothing to wire up.
Memory, files, users, and history all live in one local file. Fast keyword search, WAL-mode concurrency, trivial backup.
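The single-file pattern is easy to picture with Python's built-in sqlite3. A sketch, assuming a hypothetical `memories` table (UPtrim's actual schema isn't documented here):

```python
import os
import sqlite3
import tempfile

# One local file holds everything; backup is just copying it.
path = os.path.join(tempfile.mkdtemp(), "uptrim.db")
db = sqlite3.connect(path)
db.execute("PRAGMA journal_mode=WAL")  # WAL mode: readers don't block the writer
db.execute("CREATE TABLE memories (user TEXT, fact TEXT)")
db.execute("INSERT INTO memories VALUES ('alice', 'prefers dark mode')")
db.commit()

# Fast keyword search -- a plain LIKE scan here; FTS5 is the usual upgrade path
rows = db.execute(
    "SELECT fact FROM memories WHERE fact LIKE ?", ("%dark%",)
).fetchall()
print(rows)  # [('prefers dark mode',)]
```

With WAL enabled, the dashboard can read while the proxy writes, which is the concurrency property the single-file design leans on.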
Bundled bge-base-en-v1.5 for semantic search — runs on CPU, no external vector DB needed.
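Under the hood, semantic search reduces to nearest-neighbor by cosine similarity over embeddings. A toy sketch with hand-made 4-dim vectors standing in for bge-base-en-v1.5's real 768-dim output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for real embeddings (bge-base-en-v1.5 produces 768 dims)
store = {
    "alice prefers dark mode": [0.9, 0.1, 0.0, 0.1],
    "the meeting is on friday": [0.0, 0.8, 0.5, 0.1],
}
query = [0.85, 0.15, 0.05, 0.1]  # pretend embedding of "what theme does alice like?"

# Rank stored facts by similarity to the query vector
best = max(store, key=lambda k: cosine(store[k], query))
print(best)  # alice prefers dark mode
```

At this scale a brute-force scan beats a dedicated vector DB, which is why the bundled-model, no-external-service design holds up for personal memory stores.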
Expose UPtrim to your team via HTTPS without port-forwarding. Built-in tunnel management in the dashboard.
Premium-tier plugin engine. Write .trim scripts to extract, inject, filter, and react to events.
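The .trim syntax itself isn't shown here, but the four hook types map onto a familiar event-registry pattern. A Python analogy (every name below is illustrative, not UPtrim's API):

```python
import re

# Illustrative only: the registry and hook names sketch the
# extract/inject/filter/react lifecycle, not the actual .trim API.
hooks = {"extract": [], "inject": [], "filter": [], "react": []}

def on(event):
    """Register a function to run when the given event fires."""
    def register(fn):
        hooks[event].append(fn)
        return fn
    return register

@on("filter")
def redact_emails(message):
    # Filter hooks rewrite a message before it reaches the model
    return re.sub(r"\S+@\S+", "[email]", message)

msg = "contact me at alice@example.com please"
for fn in hooks["filter"]:
    msg = fn(msg)
print(msg)  # contact me at [email] please
```

Extract hooks would pull facts out of replies, inject hooks add context to prompts, and react hooks fire on events without modifying anything; filter is shown because it's the easiest to demo end to end.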
Full REST surface for managing memories, users, files, and settings. Build your own tooling on top.
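Tooling on top of a REST surface usually starts with a URL builder. A sketch against the quick-start host (the endpoint paths below are placeholders, not UPtrim's documented routes):

```python
from urllib.parse import urljoin

BASE = "http://localhost:9099/"  # same host the dashboard runs on

def endpoint(resource, item=None):
    """Build a resource URL. Paths here are hypothetical examples."""
    path = f"api/{resource}" + (f"/{item}" if item else "")
    return urljoin(BASE, path)

print(endpoint("memories"))        # http://localhost:9099/api/memories
print(endpoint("users", "alice"))  # http://localhost:9099/api/users/alice
```

From there, standard GET/POST/DELETE calls with any HTTP client cover the memories, users, files, and settings resources the API exposes.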
Premium tier: hand complex multi-step tasks to your installed Claude Code app, preserving session & context cache.
You need a local model runner (Ollama is the easiest start), then point UPtrim at it. Here's the 5-minute setup.
Ollama is a one-line install on any OS. Pull a model like llama3.1, qwen2.5-coder, or mistral. It'll listen on localhost:11434.
curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1
ollama serve
Download UPtrim, launch it, paste the free developer key from the README. On first run it'll ask for your LLM backend URL — enter Ollama's.
./uptrim --web --backend http://localhost:11434
In Open WebUI, SillyTavern, or any OpenAI-compatible app, set the API base URL to http://localhost:9099/v1. Use any API key value (UPtrim ignores it unless trust mode is strict). Chat normally — the proxy records memory in the background.
Visit http://localhost:9099/dashboard to see every memory, conversation, setting, and agent. You can edit or delete any stored fact. Export encrypted backups anytime.
UPtrim itself is light (200 MB RAM, no GPU). The heavy lift is your local LLM. Here's what you can run at each tier of hardware.
Any modern laptop, 16 GB RAM. Runs 3B models (Phi-3, Llama-3.2-3B) at ~10 tok/s on CPU.
RTX 3060, M2/M3 Mac, or any 8 GB VRAM GPU. Runs 8B models (Llama-3.1-8B, Qwen2.5-Coder-7B) at 40-60 tok/s.
RTX 4080/4090, M3 Max/M4, dual GPUs. Runs 32-70B models at usable speeds. Best experience.
No GPU? No problem. Pair a small local model with hybrid cloud routing (Pro tier) — local handles fast tasks, Claude or GPT handle the heavy lifting, and your memory stays put.
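Hybrid routing boils down to a classifier over incoming requests. A toy heuristic (the threshold and keywords are made up for illustration, not UPtrim's actual routing policy):

```python
def route(prompt: str) -> str:
    """Toy router: short chat stays local; long or heavy work goes to cloud."""
    heavy_markers = ("refactor", "prove", "analyze", "debug")
    if len(prompt) > 500 or any(m in prompt.lower() for m in heavy_markers):
        return "cloud"  # e.g. Claude or GPT via the Pro-tier hybrid path
    return "local"      # small talk stays on the local model

print(route("hey, how's it going?"))             # local
print(route("Refactor this 2,000-line module"))  # cloud
```

Whichever side answers, memory extraction happens on the proxy, so stored facts never leave the box.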
Takes about 2 minutes once your local model is running.
Launch UPtrim, open localhost:9099, and paste your free developer key when prompted.
Set your app's API base URL to http://localhost:9099/v1. No key rotation needed.
Open localhost:9099/dashboard to see what it knows.