Why on-premises is not "cloud without internet"
The engineering trade-offs behind real isolation, GDPR data residency, and where most "private ChatGPT" pitches fall apart under audit.
01 - INTRODUCTION
The phrase that breaks every audit.
Every quarter or so, a prospective client forwards me a deck from another vendor. The deck always has a slide titled "Private AI." Underneath it, there's a diagram with a server icon and the words "runs in your environment." When I ask what that means, the answer is almost always the same: a managed cloud tenant in a region "close to" the client, with traffic routed through a VPN.
That is not on-premises. That is cloud with a polite accent. The distinction matters because under audit, only one of those things survives a serious question, and it is not the polite one.
This is the conversation I have with every new client in week one. It is also why a five-day deployment costs what it costs, and why I turn down about a third of the projects that come in.
02 - THE COMPLIANCE GAP
Three questions that break the pitch.
A compliance officer does not need to understand transformers. They need answers to three questions, and they will keep asking until they get them in writing - and under the EU AI Act, those answers now have to survive a regulator, not just an internal review:
- Where does the data physically reside the moment after a user presses Enter? Not the marketing answer - the network path, traceable to a rack unit, with GDPR data residency implications you can prove on paper.
- Who has root on the machine that holds the model weights and the embeddings? "We do" is not an answer; "your sysadmin, with these named keys" is.
- If your vendor disappears tomorrow - bankrupt, acquired, sanctioned - what happens to the system on Monday morning?
Most "private ChatGPT" pitches answer question one with a region, question two with a shared-responsibility chart, and question three with a contractual clause. None of those are technical answers. They are billing answers in technical clothing - and they fall apart the moment the conversation shifts to GDPR liability for cloud LLMs or to a concrete deployment like the private RAG contract review for a Dubai and London law firm, where every retrieval is bound to a rack unit and cited on paper. The regulatory pressure is not theoretical either: TechCrunch's reporting on the Italian DPA's formal notification to OpenAI for GDPR violations shows how "trust us, it's encrypted in transit" stops working the moment a regulator opens a file.
If the answer to "where is the data" includes the word region, you are looking at cloud. If it includes the words rack unit, you are looking at on-prem.
03 - WHAT IT ACTUALLY MEANS
On-premises, defined narrowly.
For the systems I deploy, on-premises has a working definition with four clauses. Any system that fails one of them is not on-prem, regardless of what the brochure says.
- Locality. The hardware running inference is in a building the client owns or leases, with badge access controlled by the client's own physical-security process.
- Egress posture. The default network policy is deny, in the sense of NIST SP 800-53 SC-7 (Boundary Protection). Any outbound connection is a deliberate, named exception logged at the firewall - including model updates, telemetry, and license checks.
- Key custody. The client holds the only credentials with administrative access. No vendor, including me, has standing access after handover.
- Operational independence. The system continues to serve users with the upstream link severed for at least 30 days, following CISA guidance on isolated systems. If a license server is required, the system is not on-prem.
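The egress-posture clause is the easiest of the four to test mechanically. The following is a minimal sketch, not a complete audit tool: it attempts outbound TCP connections from inside the boundary and reports any that succeed without being a named exception. The function name `probe_egress`, the allowlist format, and the probe targets are all hypothetical; a real test would use the client's own firewall exception list and would also cover DNS and UDP, which this sketch does not.

```python
import socket
from typing import Callable, Iterable

Target = tuple[str, int]  # (host, port)


def probe_egress(
    targets: Iterable[Target],
    allowlist: frozenset[Target] = frozenset(),
    timeout: float = 3.0,
    connect: Callable[..., object] = socket.create_connection,
) -> list[Target]:
    """Return every target that was reachable but is not a named exception.

    Under a default-deny policy the expected result is an empty list.
    `connect` is injectable so the logic can be exercised without a network.
    """
    violations: list[Target] = []
    for target in targets:
        try:
            conn = connect(target, timeout=timeout)
        except OSError:
            # Refused, timed out, or unresolvable: the outcome default-deny should produce.
            continue
        close = getattr(conn, "close", None)
        if callable(close):
            close()
        if target not in allowlist:
            violations.append(target)
    return violations
```

Run from the inference host with a probe list such as `[("1.1.1.1", 443), ("8.8.8.8", 53)]` (illustrative public addresses), an empty return value is the result worth attaching to the audit pack. A non-empty one names the exact path that needs either a firewall rule or a documented exception.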
Why these four, and not more
I used to keep a longer list. After a dozen deployments I cut it. Anything outside these four, the client can verify from their own audit log, so it needs no checklist. The four themselves cannot be verified after the fact, so they go in the contract.
04 - ARCHITECTURE
What the diagram looks like.
This is the reference deployment I bring to a first conversation. Three zones, one boundary, an explicit place for every byte. The inference stack is Ollama serving Llama 3.3 70B on a Mac Studio M4 Max. The browser layer is Open WebUI; RAG runs on AnythingLLM. Everything left of the firewall belongs to the client. Everything to the right does not exist as far as the system is concerned.
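One way to make "an explicit place for every byte" checkable is to assert that every service endpoint in the configuration points at a loopback or private LAN address. This is a minimal sketch under stated assumptions: the `ENDPOINTS` map, the 10.20.0.x addresses, and the Open WebUI and AnythingLLM ports are illustrative, not taken from the reference deployment; 11434 is Ollama's default listen port. Endpoints given as DNS names are treated as outside the boundary until someone reviews them, because they cannot be judged offline.

```python
import ipaddress
from urllib.parse import urlsplit

# Loopback plus RFC 1918 / RFC 4193 ranges: addresses that stay on the client's side.
INSIDE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12",
                 "192.168.0.0/16", "::1/128", "fc00::/7")
]

# Hypothetical endpoint map for the three components named above.
ENDPOINTS = {
    "ollama": "http://127.0.0.1:11434",
    "open-webui": "http://10.20.0.11:8080",
    "anythingllm": "http://10.20.0.12:3001",
}


def is_inside_boundary(url: str) -> bool:
    """True if the URL's host is a literal loopback or private address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name: flag it for manual review
    return any(addr in net for net in INSIDE_NETWORKS)


def outside_boundary(endpoints: dict[str, str]) -> list[str]:
    """Return the names of endpoints that are not provably inside the boundary."""
    return sorted(name for name, url in endpoints.items()
                  if not is_inside_boundary(url))
```

A check like this only inspects configuration. It complements the firewall policy and does not replace it.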
05 - TRADE-OFFS
What you give up. Honestly.
On-prem is not free. It costs in three places, and pretending otherwise is the fastest way to lose a client's trust in week three.
The real cost is operational, not capital
Hardware is a one-time line that amortizes. The real ongoing cost of on-prem is process - a named person who owns the rotation schedule, the audit log review, and the disaster-recovery drill. If the client cannot name that person on day one, I do not take the project. The same pattern shows up in the Gartner Peer Community analysis of total cost of ownership for custom-built AI solutions versus purchased platforms: hardware amortizes, but unowned process compounds into the line items that quietly break the second-year budget.
"Air-gapped" is a posture, not a product. The product is the discipline that maintains the posture after you have left the building.
06 - CLOSING
The question that ends the conversation.
When a client is unsure whether they need on-prem, I ask them one question: "If your AI vendor's status page went red on a Friday at 5pm, would your business stop?" If the answer is yes, the conversation about cloud is over. If the answer is no, they probably do not need me. The question stops being hypothetical the moment you read TechCrunch's account of the January 2025 ChatGPT major outage - a single upstream incident that took business-critical workflows offline across multiple jurisdictions for hours of a working day.
The rest is engineering. Model selection, appliance sizing, ingestion pipeline - these are problems with known answers, and one full instance is written up in the 14-day private LLM rollout playbook. The hard part is naming what you are actually buying. It is not intelligence in the cloud. It is intelligence you own.
07 - WORK WITH ME
On-premise private AI deployment, done as engineering.
If the four-clause definition above describes a posture you actually need - locality, no egress, key custody, operational independence - the next step is not another deck. It is sizing the box, picking the model, writing the runbook. That is the scope of an on-premise private AI deployment: a network-isolated, GDPR-compliant private LLM appliance running Ollama with RAG over your own documents, delivered to law firms, clinics, accountants, universities and crypto media across the EU and UAE.
What a private LLM engagement covers
- Scoping call. One hour against the three audit questions from section 02 - your data, your regulator, your hardware budget. If the honest answer is cloud, you hear that first.
- Appliance sizing. Hardware selection for the model class you actually need - Apple Silicon, RTX A6000 or H100 - referenced against the benchmarks in Apple Silicon as an inference node, not a sales sheet.
- Private RAG over your corpus. Ingestion pipeline, clause-aware chunking, retrieval eval set, post-hoc citation verifier - the work that separates a demo from a production self-hosted ChatGPT alternative. One fully shipped instance is documented in the medical NER pipeline for UAE clinic referral letters, where hand-written referrals become structured EHR records on a single on-prem appliance.
- Handover and audit pack. Runbook, network diagram, egress test results, data-flow map suitable for a DPO, a GDPR auditor, or an EU AI Act review.
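To make the "post-hoc citation verifier" in the private RAG item concrete, here is a minimal sketch of one possible check: every quote the model attributes to a retrieved chunk must appear verbatim in that chunk after whitespace and case normalization. The `Citation` type, the chunk-id scheme, and the exact-match rule are assumptions for illustration; a production verifier would likely tolerate light paraphrase and handle quotes that span chunks.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    chunk_id: str  # hypothetical id of the retrieved chunk the answer points to
    quote: str     # text the answer claims that chunk contains


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case before comparing."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def unverified_citations(
    citations: list[Citation], chunks: dict[str, str]
) -> list[Citation]:
    """Return citations whose quote is empty, points at an unknown chunk,
    or does not occur verbatim in the cited chunk."""
    failures: list[Citation] = []
    for citation in citations:
        source = chunks.get(citation.chunk_id)
        quote = _normalize(citation.quote)
        if source is None or not quote or quote not in _normalize(source):
            failures.append(citation)
    return failures
```

An answer goes to the user only when `unverified_citations` returns an empty list. Anything else is a candidate for regeneration or a visible "unverified" flag, depending on how strict the client's review process is.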
For regulated teams in the EU and UAE who have already answered the three audit questions and need a network-isolated, GDPR-compliant on-premise private AI deployment with private RAG, Ollama and a documented audit pack for the DPO, the intake form lives on the service page.