Blog
Field notes from a production self-hosted AI stack — KV cache, inference scaling, RAG infrastructure, Linux tuning. No hype, no listicles.
€8K Server vs €60K/year OpenAI Enterprise: When Each One Actually Wins
Every CTO evaluating generative AI eventually does the same napkin math:
Read post →
GDPR Article 28 for AI Vendors: What OpenAI's DPA Actually Says (And What It Doesn't)
When enterprise procurement teams start auditing LLM architectures, the conversation hits a brick wall at GDPR Article 28. If your application or your employees are sending European personal data to an external model,…
Read post →
The 3 Hidden Failure Modes of Self-Hosted LLMs (When You Scale Past 25 Users)
Spinning up a local Large Language Model is trivial today. A Docker Compose file, a consumer GPU, and you have a private AI assistant. The self-hosted AI dream is real — until you invite your team to use it.
Read post →
Sovereign OCR: When Scanned PDFs Eat Your AI Pipeline (And Your Compliance Posture)
You bought a private LLM. You routed every chat through your own infrastructure. You wrote a DPA addendum your legal team actually signed. And then you piped your documents through a SaaS OCR API to extract the text —…
Read post →
vLLM vs llama.cpp vs Ollama at 25 Users: What the Published Benchmarks Actually Show
If you run ollama run llama3.1 on your MacBook, see tokens flying across the screen at 80 TPS, and conclude you're ready to deploy an enterprise AI API — you're walking into a trap.
Read post →