ARCHITECTURE
Every layer independently replaceable.
A production stack where the LLM provider, orchestrator, and tool layer are each swappable without touching the others. When a provider deprecates a model or an API changes, your stack keeps running.
Core stack — Included in every deployment
Self-maintaining
- SSOT compiler: one config tree renders all containers, units, scripts
- Pre-flight checks refuse to render on drift
- Sub-2h bare-metal recovery, tested and verified
- Operator-grade DR playbook (see /resources for the public summary)
Meaning — When hardware dies, you rebuild from a single command. When a container drifts from the source of truth, the pre-flight check refuses to render until you resolve it. When you need to hand the stack to a new operator, the runbook is already written. Maintenance isn’t a line item in your budget — it’s built into the architecture.
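The render-and-refuse behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual SSOT compiler: the config shape, the function names (`render`, `preflight`, `apply`), and the use of a SHA-256 manifest to detect drift are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


class DriftError(RuntimeError):
    """Raised when a deployed file no longer matches what was last rendered."""


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def render(config: dict) -> dict:
    """Render every artifact from the single config tree (hypothetical shape)."""
    artifacts = {}
    for name, svc in sorted(config["services"].items()):
        artifacts[f"{name}.env"] = f"IMAGE={svc['image']}\nPORT={svc['port']}\n"
    return artifacts


def preflight(out_dir: Path, manifest: dict) -> None:
    """Refuse to continue if any rendered file was changed or removed by hand."""
    drifted = []
    for filename, expected in manifest.items():
        path = out_dir / filename
        if not path.exists() or _digest(path.read_text(encoding="utf-8")) != expected:
            drifted.append(filename)
    if drifted:
        raise DriftError(f"drift detected in: {', '.join(sorted(drifted))}")


def apply(config: dict, out_dir: Path, manifest: dict) -> dict:
    """Run the pre-flight check, then write artifacts and return the new manifest."""
    preflight(out_dir, manifest)
    out_dir.mkdir(parents=True, exist_ok=True)
    new_manifest = {}
    for filename, text in render(config).items():
        (out_dir / filename).write_text(text, encoding="utf-8")
        new_manifest[filename] = _digest(text)
    return new_manifest
```

In this sketch, a first call to `apply` with an empty manifest writes the files, and any later call raises `DriftError` if a file on disk was edited outside the config tree. Resolving the drift means either moving the change into the config or restoring the rendered file.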
Decoupling & resilience
- 6+ LLM providers behind one OpenAI-compatible router (LiteLLM)
- Provider swap is a config diff — application code unchanged
- Per-call routing by cost, latency, capability
- Docker Compose, not Kubernetes — right scale for this problem
Meaning — When your LLM provider deprecates a model, your application keeps running. You swap providers with a config change, not a six-week migration. When pricing shifts, you route around it. When a sovereignty review requires EU-only inference, you switch providers without rewriting a line of application code.
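To illustrate per-call routing by cost, latency, and capability, here is a small sketch in plain Python. It does not use or reproduce LiteLLM's API. The route names, regions, prices, and latencies are invented placeholders, and `pick_route` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str                 # hypothetical model alias
    region: str                # e.g. "eu" or "us"
    usd_per_1k_tokens: float   # illustrative price, not a real quote
    p50_latency_ms: int        # illustrative latency
    capabilities: frozenset


# The routing table is data. Swapping a provider means editing this list,
# while callers of pick_route() stay unchanged.
ROUTES = [
    Route("provider-a/large", "us", 0.0100, 900, frozenset({"chat", "vision"})),
    Route("provider-b/small", "eu", 0.0005, 300, frozenset({"chat"})),
    Route("provider-c/medium", "eu", 0.0030, 500, frozenset({"chat", "vision"})),
]


def pick_route(routes, needs, prefer="cost", region=None):
    """Pick the cheapest (or fastest) route that has the needed capabilities."""
    candidates = [
        r for r in routes
        if needs <= r.capabilities and (region is None or r.region == region)
    ]
    if not candidates:
        raise LookupError("no route satisfies the request")
    if prefer == "cost":
        return min(candidates, key=lambda r: r.usd_per_1k_tokens)
    return min(candidates, key=lambda r: r.p50_latency_ms)
```

Under these assumptions, an EU-only requirement is a filter (`region="eu"`) and a deprecated model is a deleted row. Neither touches the calling code.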
Cost control
- Per-provider, per-model, per-call cost tracking
- Token-level billing attribution
- Budget alerts before you burn through credits
- Cost dashboard accessible without SSH
Meaning — You see exactly which provider burns the most money per query, per document class, per month. Before a pilot goes to production and costs 10× what you budgeted, you catch it. Before a single runaway workflow drains your credits overnight, the alert fires. Cost control isn’t a feature — it’s why your CFO signs off on the deployment.
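A rough sketch of token-level attribution with a budget alert might look like this. The `CostLedger` class, its field names, the 80% alert threshold, and the prices in the usage note are assumptions for illustration. They are not the stack's actual tracking code.

```python
from collections import defaultdict


class CostLedger:
    """Attribute spend per (provider, model, document class) and flag budget risk."""

    def __init__(self, monthly_budget_usd, alert_fraction=0.8):
        self.monthly_budget_usd = monthly_budget_usd
        self.alert_fraction = alert_fraction      # assumed threshold
        self.by_key = defaultdict(float)
        self.total_usd = 0.0

    def record(self, provider, model, doc_class,
               prompt_tokens, completion_tokens,
               usd_per_1k_prompt, usd_per_1k_completion):
        """Price one call from its token counts and add it to the ledger."""
        cost = (prompt_tokens * usd_per_1k_prompt
                + completion_tokens * usd_per_1k_completion) / 1000
        self.by_key[(provider, model, doc_class)] += cost
        self.total_usd += cost
        return cost

    def should_alert(self):
        """True once spend reaches the alert fraction of the monthly budget."""
        return self.total_usd >= self.alert_fraction * self.monthly_budget_usd

    def top_spender(self):
        """Return the (key, usd) pair with the highest accumulated cost."""
        return max(self.by_key.items(), key=lambda kv: kv[1])
```

With a 10 USD budget, for example, recording calls worth 3 USD and then 5 USD would trip `should_alert()` on the second call, before the budget itself is exhausted.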
Add-ons — Scoped per engagement
ADD-ON
Document intelligence
- 4 OCR engines: Vision LLM, Surya, Tesseract, gemma4-ocr
- Per-document-class routing (not one engine for everything)
- Deterministic post-processing pipeline
- Audit trail: which engine, why, what it produced
Meaning — Every document class routes to the engine that handles it best. Clean PDFs go to Tesseract. Messy scans go to Vision LLM. Tables go to Surya. When an engine fails on a document class, you route around it instead of retiring the whole pipeline.
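Per-class routing with an audit trail can be sketched as below. The primary engine for each class follows the examples in the text. The fallback order, the class labels, and the `process` and `AuditEntry` names are assumptions. The engines are passed in as plain callables, so no real OCR library is invoked here.

```python
from dataclasses import dataclass

# First entry per class comes from the text above; fallbacks are assumed.
ROUTING = {
    "clean_pdf": ["tesseract", "vision_llm"],
    "messy_scan": ["vision_llm", "surya"],
    "table": ["surya", "vision_llm"],
}


@dataclass
class AuditEntry:
    doc_id: str
    doc_class: str
    engine: str        # which engine ran
    reason: str        # why it was chosen
    output_chars: int  # size of what it produced


def process(doc_id, doc_class, engines, audit, disabled=frozenset()):
    """Run the first enabled engine for this class and record the decision."""
    for rank, name in enumerate(ROUTING[doc_class]):
        if name in disabled:
            continue
        text = engines[name](doc_id)
        reason = ("primary engine for class" if rank == 0
                  else "fallback: earlier engine disabled")
        audit.append(AuditEntry(doc_id, doc_class, name, reason, len(text)))
        return text
    raise RuntimeError(f"no engine available for class {doc_class!r}")
```

Disabling one engine for a class (`disabled={"surya"}`) shifts that class to its fallback and leaves the other classes untouched. The audit list shows the switch.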
ADD-ON
Custom MCP bridges
- Bespoke Model Context Protocol servers for your environment
- SAP / ERP automation bridges
- Read-only SQL/PostgreSQL execution within your private subnet
- Filesystem RAG pipelines for internal document stores
- Live web orchestration via privacy-hardened proxies
Meaning — Your AI talks directly to your internal systems — SAP, SQL databases, document stores, web APIs — without exposing them to the public internet. Each MCP server is container-isolated and independently restartable. Custom-built for your stack, your data, your security boundaries.
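One way to think about read-only SQL execution is to have the database itself reject writes instead of filtering query strings. The sketch below uses Python's built-in `sqlite3` purely as a stand-in so it runs without external dependencies. The stack described here targets SQL/PostgreSQL, where the comparable control would be a role limited to `SELECT` or a read-only transaction. The `open_read_only` and `run_query` helpers and the row cap are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open the database in read-only mode so the engine rejects writes."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def run_query(conn: sqlite3.Connection, sql: str, params=(), max_rows: int = 100):
    """Execute one statement and cap how many rows are handed back to the model."""
    cursor = conn.execute(sql, params)
    return cursor.fetchmany(max_rows)
```

With this setup a `SELECT` returns rows as usual, while an `INSERT`, `UPDATE`, or `DELETE` raises `sqlite3.OperationalError` no matter how the statement is phrased.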
ADD-ON
Custom integrations
- Single sign-on (SSO) with your identity provider
- Custom branding and UI theming for LibreChat
- Environment-specific compliance mappings (ISO 27001, SOC 2, NIS2)
- Custom agent workflows for your industry vertical
- Any environment-specific requirement not covered by the standard stack
Meaning — Your AI stack fits your organization — not the other way around. Authentication, branding, compliance mappings, and agent behavior are tailored to your environment during scoping.
For full custom GRC pipeline deployments, see cplt.tech.
ADD-ON
Observability stack
- Prometheus metrics for every container, service, provider call
- Grafana dashboards: cost per model, latency per provider, error rates per endpoint
- LiteLLM native cost tracking with token-level attribution
- Alert routing to email, Slack, PagerDuty, or your existing monitoring stack
- Runs on CPLT’s own production stack today
Meaning — When something breaks at 2 AM, you don’t open a Zoom call — you open the dashboard. Cost overruns, model degradation, and silent failures fire alerts before they become incidents. Built on the same Prometheus + Grafana stack that monitors CPLT’s production infrastructure.
ADD-ON
Off-site backup pipeline
- Encrypted daily snapshots to S3, Backblaze B2, or your object storage of choice
- Three-layer architecture: local volume → host snapshot → encrypted off-site sync
- Sub-2h bare-metal rebuild procedure, tested and verified on CPLT’s own production stack
- Restore verification scripts — not just “hope the tarball is valid”
- Currently running daily on CPLT’s own infrastructure (verified May 2026)
Meaning — Disaster recovery isn’t a slide deck — it’s a procedure that runs every night and a rebuild that completes in under two hours. The same pipeline that protects CPLT’s production stack protects yours.
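A restore verification step, as opposed to hoping the tarball is valid, could look something like this. It is a sketch under stated assumptions: the snapshot is a tar archive that has already been decrypted, a SHA-256 checksum was recorded at backup time, and `verify_snapshot` and its parameters are hypothetical names that do not come from CPLT's scripts.

```python
import hashlib
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large snapshots do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_snapshot(archive: Path, expected_sha256: str, required_members: set) -> list:
    """Return a list of problems; an empty list means the snapshot passed."""
    if sha256_of(archive) != expected_sha256:
        return ["checksum mismatch"]
    names = set()
    try:
        with tarfile.open(archive, "r:*") as tar:
            for member in tar:
                names.add(member.name)
                if member.isfile():
                    # Read each file fully so truncated or corrupt data surfaces now.
                    with tar.extractfile(member) as fh:
                        while fh.read(1 << 20):
                            pass
    except (tarfile.TarError, OSError, EOFError) as exc:
        return [f"archive unreadable: {exc}"]
    missing = required_members - names
    if missing:
        return [f"missing members: {sorted(missing)}"]
    return []
```

Running a check like this after each nightly sync turns a silent bad backup into a visible failure long before a restore is needed.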
See it running or scope a deployment?
Discuss the architecture in detail, or scope a sovereign deployment for your infrastructure.
Scope your deployment →
Read the DR summary →