Persistent Memory for Your LLM Stack
UPtrim sits between your chat clients and LLM backend to keep conversations coherent at scale — graduated trimming, strict isolation, identity-aware memory, and file retrieval.
Core Pillars
Six capabilities that make your LLM stack production-ready for real multi-user deployments.
Context Trimming Engine
Pressure-aware history management with soft/hard token zones and adaptive pruning. Long chats stay coherent without hitting context limits.
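The soft/hard zone idea can be sketched roughly as follows. The thresholds, function names, and token estimator here are illustrative assumptions, not UPtrim's actual API:

```python
# Illustrative sketch of pressure-aware trimming with soft/hard token zones.
# Thresholds and heuristics are assumptions for explanation only.

def estimate_tokens(message: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(message) // 4)

def trim_history(messages: list[str], soft_limit: int = 3000,
                 hard_limit: int = 4000) -> list[str]:
    """Keep the newest messages; prune gently past the soft zone,
    aggressively once the hard zone is breached."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= soft_limit:
        return messages                       # no pressure: keep everything
    # Past the hard zone, cut deeper to leave headroom for new turns.
    budget = soft_limit if total <= hard_limit else int(soft_limit * 0.75)
    kept, used = [], 0
    for m in reversed(messages):              # walk newest-first
        cost = estimate_tokens(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return list(reversed(kept))
```

Dropping oldest-first preserves the recent exchange, which is usually what keeps a long chat coherent.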
Multi-User Isolation
Strict, required, and quarantine identity modes enforce per-user memory boundaries in shared deployments. Zero bleed.
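Per-user isolation boils down to namespacing every read and write by user identity. A minimal sketch (class and method names are hypothetical, not UPtrim's API):

```python
# Illustrative sketch of per-user memory namespacing; names are assumptions.
from collections import defaultdict

class MemoryStore:
    """Each user's memories live in a separate namespace, so a lookup
    for one user can never return another user's entries."""

    def __init__(self) -> None:
        self._spaces: dict[str, list[str]] = defaultdict(list)

    def remember(self, user_id: str, fact: str) -> None:
        self._spaces[user_id].append(fact)

    def recall(self, user_id: str) -> list[str]:
        # Only this user's namespace is ever consulted.
        return list(self._spaces[user_id])
```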
Memory Intelligence
Category-aware extraction, dedup, contradiction resolution, audit trails, TTL rules, and intent-aware relevance scoring.
File-Aware Retrieval
Upload documents, inject excerpts into context, and optionally use embeddings for deeper semantic search across your knowledge base.
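Excerpt injection means ranking document chunks against the query and spending context budget only on the best matches. The word-overlap scorer below is a stand-in for illustration; the embeddings path mentioned above would replace `score` with vector similarity:

```python
# Illustrative sketch of excerpt selection for context injection.
# Keyword overlap is a stand-in; an embeddings backend would swap in
# vector similarity for `score`.

def score(query: str, chunk: str) -> int:
    q = set(query.lower().split())
    return len(q & set(chunk.lower().split()))

def top_excerpts(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank chunks by overlap with the query; keep the best k that match at all.
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return [c for c in ranked[:k] if score(query, c) > 0]
```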
Dashboard + TUI
Full web dashboard for memory management, user admin, and real-time monitoring. Terminal UI for headless environments.
Client Compatibility
Drop-in support for Open WebUI and SillyTavern with automatic identity-aware header resolution. No code changes needed.
Up and Running in Minutes
Three steps from download to full operation.
Connect Clients
Point Open WebUI or SillyTavern at the proxy endpoint. Identity resolution is automatic.
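From the backend's point of view this is just an OpenAI-style endpoint with an identity header attached. A sketch of what a request aimed at the proxy might look like; the port, path, and header name are assumptions, not documented values:

```python
# Illustrative sketch: chat clients talk to UPtrim instead of the backend.
# The base URL and identity header name below are assumptions.

PROXY_BASE = "http://localhost:8000/v1"           # hypothetical UPtrim address

def chat_request(user_id: str, messages: list[dict]) -> dict:
    """Build an OpenAI-style request aimed at the proxy; the identity
    header lets UPtrim resolve which user's memory to use."""
    return {
        "url": f"{PROXY_BASE}/chat/completions",
        "headers": {"X-User-Id": user_id},        # hypothetical header name
        "json": {"model": "default", "messages": messages},
    }
```

In Open WebUI or SillyTavern you would set the API base URL to the proxy address; the clients then send this shape automatically.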
Set Identity Mode
Choose strict, required, or quarantine profiles to control memory safety for each user.
Tune Context + Memory
Adjust trimming thresholds, the memory injection budget, and file-upload settings from the dashboard.
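As a rough picture of the knobs involved, a hypothetical settings block might look like this. Every key name is an assumption for illustration; the real setting names live in the dashboard:

```python
# Hypothetical tuning knobs; all key names are assumptions, not real settings.
UPTRIM_SETTINGS = {
    "trim": {
        "soft_limit_tokens": 3000,    # start graduated pruning here
        "hard_limit_tokens": 4000,    # never exceed this
    },
    "memory": {
        "inject_budget_tokens": 512,  # max tokens of recalled memory per request
    },
    "files": {
        "max_upload_mb": 25,
        "use_embeddings": True,       # enable deeper semantic search
    },
}
```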