Self-Hosted AI: CPLT vs OpenAI, Anthropic, Together, Anyscale, Ollama

If you're reading this, you've probably already had the conversation: "We can't put production data into ChatGPT, but we can't afford a frontier-tier private deployment either." What you actually need is a real architecture, on real hardware, with a real audit trail.

This page compares CPLT against the providers most often shortlisted for that need. We've tried to be honest — when a competitor wins on something, we say so. The goal is to help you choose correctly, not to talk you into anything.

Last fact-checked May 2026 against publicly published pricing and documentation. If anything's wrong or out of date, tell us — we'll fix it.

The matrix

	CPLT build & handover	OpenAI Enterprise	Anthropic Claude	Together AI Dedicated	Anyscale Endpoints	Ollama (DIY)
Deployment Model
Runs on your hardware / VPC	Yes	No	No	VPC option	VPC option	Yes
Air-gap deployable	Yes	No	No	No	No	Yes
Multi-model behind one API	6+ providers	OpenAI only	Claude only	Together catalog	Anyscale catalog	Local only
Bring your own GPU	Yes	No	No	No	No	Yes
Compliance & Sovereignty
Data never leaves your network	Yes	No	No	VPC only	VPC only	Yes
EU-hosted by default	Yes	Region opt-in	Region opt-in	US-primary	US-primary	You choose
GDPR Art. 28 DPA included	Yes	Yes	Yes	Yes	Yes	N/A (no third-party processor)
Operator-grade DR documentation	Included	SaaS-side only	SaaS-side only	SaaS-side only	SaaS-side only	DIY
Full audit log of every request	On your infra	Vendor-side	Vendor-side	Vendor-side	Vendor-side	DIY
Cost & Predictability
Pricing model	Fixed-scope project	Per-token + seats	Per-token + seats	Per-hour GPU	Per-hour GPU	Hardware only
Predictable monthly run-rate	Yes (own infra)	Usage-driven	Usage-driven	Reserved tiers	Reserved tiers	Yes
Typical entry investment	€5K–€10K (Tactical); €15K–€60K (build)	~$60+/seat/mo + tokens	~$30+/seat/mo + tokens	$2–8/GPU-hr × 24×7	$2–8/GPU-hr × 24×7	Hardware only; no license fee
Mandatory retainer / minimums	None	Annual commit	Annual commit	Reserved minimum	Reserved minimum	None
Lock-in & Exit
You own the build artefacts	Yes — full handover	No	No	No	No	Yes
Swap underlying model without rewrite	LiteLLM router	No	No	In-vendor only	In-vendor only	DIY
Operate without the original builder	Yes — runbooks shipped	Yes	Yes	Yes	Yes	Yes
Custom MCP / tool integrations	Yes (add-on)	Function calling	Tool use API	No	No	DIY
Operator Burden
You operate the GPUs	Yes — you do	No	No	No	No	Yes
You patch & maintain the stack	Self-maintaining + your team	No	No	No	No	Fully on you
Frontier-class quality on day 1	70B open-weight	GPT-class	Claude-class	Frontier OSS	Frontier OSS	7–70B local

Yes = capability present; qualified entries (e.g. "VPC only", "In-vendor only") = partial / conditional; No = not offered

Sources: vendor public pricing & docs as of May 2026. "Per-token" / "per-seat" figures reflect published list prices and are not audited. CPLT figures reflect typical engagement bands, not committed rates. Your actual scope dictates your actual price.
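The "× 24×7" in the dedicated-GPU columns is easy to underestimate. Below is a minimal Python sketch of the arithmetic. It assumes an average month of 730 hours (24 × 365 / 12) and uses the $2–8/GPU-hr band from the matrix. It ignores reserved-tier discounts, storage, and egress, and `always_on_cost` is a name invented for this example.

```python
# Always-on cost of rented GPUs, using the $2-8/GPU-hr band quoted in the matrix.
# Assumption: billing is a flat hourly rate with no reserved-tier discount.
HOURS_PER_MONTH = 24 * 365 / 12  # 730.0 hours in an average month


def always_on_cost(rate_per_gpu_hour: float, gpus: int = 1) -> tuple[float, float]:
    """Return (monthly, annual) cost in USD for GPUs billed around the clock."""
    monthly = rate_per_gpu_hour * gpus * HOURS_PER_MONTH
    return monthly, monthly * 12


if __name__ == "__main__":
    for rate in (2.0, 8.0):
        monthly, annual = always_on_cost(rate)
        print(f"${rate:.0f}/GPU-hr -> ${monthly:,.0f}/month, ${annual:,.0f}/year per GPU")
```

At $2/GPU-hr a single always-on GPU comes to $1,460 a month ($17,520 a year). At $8/GPU-hr it comes to $5,840 a month ($70,080 a year). A fixed-scope build is a one-off cost (hardware aside), while rental recurs, so compare the two over your planning horizon rather than month to month.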

When to choose which

No vendor wins everything. Honest guidance on when to skip CPLT.

CPLT CHOOSE US

Choose us when: Your data can't leave your network, your finance team needs a fixed-cost line item, and your compliance team needs a deterministic audit trail. You want to own the platform, not rent it indefinitely.

Regulated mid-market (legal, health, finance, public sector)
Existing GPUs or budget for hardware
Internal team that can run a Linux box
You want exit options, not a deeper integration with a single vendor
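The matrix places the "full audit log of every request" on your own infrastructure, and the card above calls for a deterministic audit trail. As an illustration only, and not a description of what CPLT ships, here is a minimal Python sketch of one common way to make such a log tamper-evident: each record stores the hash of the record before it. The field names and functions are hypothetical. The sketch records token counts rather than prompt text.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so identical records always hash identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_audit_record(log: list[dict], user: str, model: str,
                        prompt_tokens: int, completion_tokens: int) -> dict:
    """Append a record that commits to the hash of the previous record."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record["hash"] = _digest(record)  # hash is computed before the key is added
    log.append(record)
    return record


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered entry, or one removed
    from the middle, breaks the chain. (Truncating the tail is not detected.)"""
    prev_hash = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

In practice the records would be appended to durable storage on your infrastructure rather than held in a Python list. The chaining logic is the same either way.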

OpenAI / Anthropic FRONTIER APIS

Choose them when: You need absolute frontier-class quality, you don't have data-sovereignty requirements, and per-token economics work for your usage profile. They're the right answer for a lot of teams — just not for the ones who can't send the data.

Consumer products with low compliance burden
Internal tools where leakage risk is low
Burst usage too small to justify dedicated hardware
You need GPT-class or Claude-class reasoning specifically
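Whether per-token economics work for your usage profile mostly comes down to monthly volume. The rough break-even sketch below, in Python, finds the token volume at which per-token spend equals the cost of one GPU rented 24×7. You supply the blended per-million-token price yourself. The $5 figure below is a hypothetical placeholder, not a quoted rate, and the function name is invented for illustration.

```python
# Rough break-even between per-token API pricing and one GPU rented 24x7.
# Assumptions: blended_price_per_million averages input and output pricing and is
# supplied by you; this does NOT check whether one GPU can actually serve that
# volume, which depends on model size, batching, and latency targets.
HOURS_PER_MONTH = 24 * 365 / 12  # 730.0


def break_even_tokens_per_month(gpu_rate_per_hour: float,
                                blended_price_per_million: float) -> float:
    """Monthly token volume at which per-token spend equals one always-on GPU."""
    gpu_monthly = gpu_rate_per_hour * HOURS_PER_MONTH
    return gpu_monthly / blended_price_per_million * 1_000_000


if __name__ == "__main__":
    # Hypothetical $5 per million tokens, GPU at the low end of the $2-8 band.
    tokens = break_even_tokens_per_month(2.0, 5.0)
    print(f"break-even at ~{tokens / 1e6:,.0f}M tokens/month")
```

Below the break-even volume, per-token pricing is usually the cheaper option. Above it, dedicated capacity starts to pay off, provided a single GPU can actually handle the load.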

Together / Anyscale DEDICATED OSS

Choose them when: You want frontier OSS models (Llama, Qwen, DeepSeek) without operating GPUs yourself, you're OK with US-region hosting or VPC deployment, and your scale justifies reserved capacity.

Sustained 24×7 inference on a single model
Engineering team that doesn't want hardware
You're comfortable being a tenant, even a single-tenant one
VPC deployment satisfies your compliance posture

Ollama / DIY ROLL YOUR OWN

Choose it when: You have strong infra engineering in-house, you're happy to own the full stack (auth, audit, DR, observability, model routing), and your throughput needs are modest.

You have a senior platform engineer with capacity
Single-tenant, single-team, single-model is enough
You don't need a defensible compliance narrative
Time-to-deployment is less important than zero spend

What we're not

CPLT is not a SaaS. We don't operate your GPUs, host your model, or take a per-token margin. If you want someone else to be on call for inference, use a managed provider such as OpenAI, Anthropic, or Together. We build the platform, hand it over, and leave. Your team owns operations from day one.

We're also not a frontier-model lab. The models we deploy are open-weight (Llama, Qwen, DeepSeek, Mistral), routed through LiteLLM so you can swap models or add frontier APIs later if your data classification allows. If you genuinely need GPT-5-class reasoning today on data that can't leave your network, no self-hosted option meets that bar, including ours. Your choices are to wait for the open-weight gap to close (our estimate is roughly six months) or to accept a hybrid posture.
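The point of a router is that application code asks for a logical model name, and configuration decides which backend actually serves the request. Below is a hypothetical Python sketch of that idea with a data-classification check added on top. It is not LiteLLM's API. The backend names, the "confidential" label, and `pick_backend` are assumptions made for illustration.

```python
# Hypothetical sketch of alias-based, classification-aware model routing.
# Not LiteLLM's API: names and the data-class scheme are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str      # a local open-weight deployment or a hosted API
    on_prem: bool  # True if requests never leave your network


# Preference order per alias: try the hosted frontier API first, fall back to local.
ROUTES: dict[str, list[Backend]] = {
    "chat-default": [
        Backend("hosted-frontier-api", on_prem=False),
        Backend("local-llama-70b", on_prem=True),
    ],
}


def pick_backend(alias: str, data_class: str) -> Backend:
    """Return the first backend for this alias that the data classification permits."""
    for backend in ROUTES[alias]:
        # Confidential data may only go to backends inside your network.
        if data_class == "confidential" and not backend.on_prem:
            continue
        return backend
    raise LookupError(f"no permitted backend for {alias!r} at {data_class!r}")


if __name__ == "__main__":
    print(pick_backend("chat-default", "public").name)
    print(pick_backend("chat-default", "confidential").name)
```

Swapping the model behind "chat-default" means editing the route table, not the callers. In a LiteLLM deployment the alias-to-model mapping lives in the router configuration. The classification check shown here is an illustrative policy layer, not a built-in LiteLLM feature being described.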

And we're a small, focused team. If your procurement requires a Tier-1 vendor with global support and a red phone, we are not that vendor. If it requires a deterministic build, a documented handover, and a price you can sign off on, we are.

Still deciding?

Download the full Architecture Decision Matrix — an 8-page PDF with build-vs-buy worksheets, hardware sizing tables, and a vendor-neutral RFP template you can use against anyone in this comparison (including us).
