
Works With What You Already Use.

No new tools to learn. UPtrim speaks the OpenAI chat API, so it drops into the apps and backends you already run. v2.0 extends that reach to Claude and GPT via OAuth, with no raw API keys.

Chat Clients

Point your app's API URL at UPtrim. That's the integration.

Supported

Open WebUI

Docker-friendly chat UI. UPtrim auto-resolves users from OWUI's auth headers.

  • Auto-identifies each user
  • Signed HMAC identity for teams
  • New users get private memory on first message
  • AI tools surface inline in chat
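The signed-identity flow can be sketched with stdlib HMAC. The header names and shared-secret scheme below are illustrative assumptions, not UPtrim's actual wire format:

```python
import hashlib
import hmac

# Illustrative only: header names and the signing scheme are assumptions,
# not UPtrim's documented protocol.
SHARED_SECRET = b"team-shared-secret"

def sign_identity(user_id: str) -> dict:
    """Client/UI side: build auth headers carrying an HMAC-signed user id."""
    signature = hmac.new(SHARED_SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return {"X-User-Id": user_id, "X-User-Signature": signature}

def verify_identity(headers: dict) -> bool:
    """Proxy side: recompute the HMAC and compare in constant time."""
    expected = hmac.new(
        SHARED_SECRET, headers["X-User-Id"].encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, headers["X-User-Signature"])
```

Because the signature covers the user id, a team member can't impersonate another user by editing the header.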
Supported

SillyTavern

Power-user chat UI. UPtrim picks up identity from existing auth tokens.

  • Recognizes users from any auth scheme
  • Works with custom identity plugins
  • Per-character memory lanes
  • Users manage their own access
Compatible

Any OpenAI API Client

UPtrim implements the full OpenAI /chat/completions spec.

  • Drop-in for LibreChat, LobeChat, etc.
  • Works with custom Python / Node apps
  • Streaming responses supported
  • Tool-use compatible
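Any client that can issue an OpenAI-style request can talk to the proxy. A minimal stdlib sketch, assuming the default base URL http://localhost:9099/v1 and an Ollama model name:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, messages: list) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions request aimed at the proxy."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={
            "Content-Type": "application/json",
            # UPtrim ignores the key unless trust mode is strict.
            "Authorization": "Bearer any-value",
        },
        method="POST",
    )

req = build_chat_request(
    "http://localhost:9099/v1",
    "llama3.1",
    [{"role": "user", "content": "Hello!"}],
)
# With the proxy running, send it:
#   body = json.load(urllib.request.urlopen(req))
#   print(body["choices"][0]["message"]["content"])
```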

Model Backends

Your local LLMs. Plus, in v2.0, the big cloud providers — on the same proxy.

🏠 Local Backends v1.0 • Live

Run any GPU-accelerated backend that speaks the OpenAI API. Swap between up to 5 on paid tiers.

llama.cpp · Ollama · vLLM · LM Studio · Text-Gen-WebUI · NVIDIA CUDA · AMD ROCm · Apple MLX

☁ Cloud Providers v2.0 • Pro tier

OAuth sign-in, no raw keys. Smart routing pushes hard problems to the cloud, keeps small talk local.

Anthropic Claude · OpenAI GPT · OAuth Login · Cost Tracking · Smart Router · Capability Matrix
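UPtrim's actual routing policy isn't spelled out here, but the idea can be sketched with a toy heuristic: short conversational turns stay on the local model, while long or code-heavy prompts escalate to a cloud backend.

```python
def route(prompt: str, local_ctx_limit: int = 2048) -> str:
    """Toy routing heuristic (illustrative, NOT UPtrim's real policy):
    keep small talk local, push hard problems to the cloud."""
    hard_markers = ("```", "traceback", "refactor", "prove")
    # Crude words -> tokens estimate (~4 tokens per 3 words).
    rough_tokens = len(prompt.split()) * 4 // 3
    if rough_tokens > local_ctx_limit:
        return "cloud"  # won't fit the local context comfortably
    if any(marker in prompt.lower() for marker in hard_markers):
        return "cloud"  # looks like heavy reasoning or code work
    return "local"
```

A real router would also weigh per-model capability and cost, which is what the cost tracking and capability matrix feed into.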

Ops & Storage

Everything runs locally on one box. Nothing to wire up.

Built-In

SQLite + FTS5

Memory, files, users, and history all live in one local file. Fast keyword search, WAL-mode concurrency, trivial backup.
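The single-file design is easy to poke at with Python's stdlib sqlite3 module, provided your build ships FTS5 (most modern CPython builds do). The schema below is a made-up stand-in, not UPtrim's real one:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # UPtrim uses one on-disk file; in-memory for demo
# Hypothetical schema for illustration; not UPtrim's actual tables.
con.execute("CREATE VIRTUAL TABLE memories USING fts5(user, fact)")
con.executemany(
    "INSERT INTO memories VALUES (?, ?)",
    [
        ("alice", "prefers dark mode in every editor"),
        ("alice", "works on a Rust codebase"),
        ("bob", "allergic to peanuts"),
    ],
)
# FTS5 keyword search, best match first (rank is BM25-based):
rows = con.execute(
    "SELECT user, fact FROM memories WHERE memories MATCH ? ORDER BY rank",
    ("rust",),
).fetchall()
```

On a real on-disk database, `PRAGMA journal_mode=WAL` is what gives readers and the writer concurrent access.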

New in v2.0

Local Embeddings

Bundled bge-base-en-v1.5 for semantic search — runs on CPU, no external vector DB needed.
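Semantic search reduces to nearest-neighbor lookup over embedding vectors. The toy 3-dimensional vectors below stand in for bge-base-en-v1.5's 768-dimensional output; only the ranking logic is real:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-d embeddings standing in for real 768-d model output.
store = {
    "likes hiking on weekends":   [0.9, 0.1, 0.2],
    "favourite editor is neovim": [0.1, 0.95, 0.1],
    "owns a border collie":       [0.2, 0.1, 0.9],
}

def search(query_vec, k=1):
    """Return the k stored texts whose embeddings are nearest the query."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
    return ranked[:k]
```

At bge-base's scale (~110M parameters) this brute-force scan stays fast enough on CPU for a personal memory store, which is why no external vector DB is needed.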

Built-In

Cloudflare Tunnels

Expose UPtrim to your team via HTTPS without port-forwarding. Built-in tunnel management in the dashboard.

New in v2.0

TrimScript Plugins

Premium-tier plugin engine. Write .trim scripts to extract, inject, filter, and react to events.

Built-In

FastAPI Admin API

Full REST surface for managing memories, users, files, and settings. Build your own tooling on top.
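"Build your own tooling" can start as a thin REST wrapper. The `/admin/...` paths below are hypothetical placeholders, not documented routes; check the actual API reference for the real ones:

```python
import json
import urllib.request

class UptrimAdmin:
    """Thin client sketch for the admin REST API.
    The /admin/... paths used with it are hypothetical examples."""

    def __init__(self, base_url: str = "http://localhost:9099"):
        self.base_url = base_url.rstrip("/")

    def url(self, path: str) -> str:
        return f"{self.base_url}/{path.lstrip('/')}"

    def get(self, path: str):
        with urllib.request.urlopen(self.url(path)) as resp:
            return json.load(resp)

admin = UptrimAdmin()
# With the proxy running, e.g.:
#   memories = admin.get("/admin/memories")   # hypothetical route
```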

New in v2.0

Claude Code Subprocess

Premium tier: hand complex multi-step tasks to your installed Claude Code app, preserving session & context cache.

Local LLM Setup

New to local AI?

You need a local model runner (Ollama is the easiest start), then point UPtrim at it. Here's the 5-minute setup.

1. Install Ollama (or llama.cpp)

Ollama is a one-line install on any OS. Pull a model like llama3.1, qwen2.5-coder, or mistral. It'll listen on localhost:11434.

curl -fsSL https://ollama.com/install.sh | sh
ollama pull llama3.1
ollama serve
2. Run UPtrim & point it at Ollama

Download UPtrim, launch it, paste the free developer key from the README. On first run it'll ask for your LLM backend URL — enter Ollama's.

./uptrim --web --backend http://localhost:11434
3. Point your chat app at UPtrim

In Open WebUI, SillyTavern, or any OpenAI-compatible app, set the API base URL to http://localhost:9099/v1. Use any API key value (UPtrim ignores it unless trust mode is strict). Chat normally — the proxy records memory in the background.

4. Open the dashboard

Visit http://localhost:9099/dashboard to see every memory, conversation, setting, and agent. You can edit or delete any stored fact. Export encrypted backups anytime.

Hardware

Minimum & recommended.

UPtrim itself is light (200 MB RAM, no GPU). The heavy lift is your local LLM. Here's what you can run at each tier of hardware.

MIN

Laptop CPU

Any modern laptop, 16 GB RAM. Runs 3B models (Phi-3, Llama-3.2-3b) at ~10 tok/s on CPU.

llama-3.2-3b · phi-3-mini
REC

8 GB GPU

RTX 3060, M2/M3 Mac, or any 8 GB VRAM GPU. Runs 8B models (Llama-3.1-8B, Qwen2.5-Coder-7B) at 40-60 tok/s.

llama-3.1-8b · qwen2.5-coder-7b
PRO

16+ GB GPU

RTX 4080/4090, M3 Max/M4, dual GPUs. Runs 32-70B models at usable speeds. Best experience.

qwen2.5-coder-32b · llama-3.1-70b

No GPU? No problem. Pair a small local model with hybrid cloud routing (Pro tier) — local handles fast tasks, Claude or GPT handle the heavy lifting, and your memory stays put.
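A rough way to size these tiers yourself: a quantized model needs about parameters × bytes-per-weight, plus overhead for the KV cache and runtime buffers. A hedged back-of-envelope helper (real usage varies with quantization and context length):

```python
def approx_vram_gb(params_billion: float,
                   bits_per_weight: float = 4.5,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate for a quantized model.
    ~4.5 bits/weight approximates a Q4_K-style quant; the overhead
    factor covers KV cache and buffers. Heuristic, not a guarantee."""
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)
```

By this estimate an 8B model lands around 5-6 GB (fits the 8 GB tier), while a 32B model needs roughly 20+ GB, matching the 16+ GB recommendation above.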

UPtrim in 3 Steps

Takes about 2 minutes once your local model is running.

  1. Download & run UPtrim Grab it from GitHub and launch it (it listens on localhost:9099), then paste your free developer key when prompted.
  2. Point your chat app at it In Open WebUI or SillyTavern, set the OpenAI API base URL to http://localhost:9099/v1. No key rotation needed.
  3. Chat normally UPtrim learns who you are in the background. Open the dashboard at localhost:9099/dashboard to see what it knows.

Works With Your Stack.

Drop it in today. Add hybrid cloud routing when v2.0 lands — same proxy, more reach.

Download v1.0 Free See the Full Platform →