Why now
01.01 Why are firms moving AI on-premises in 2026?
Three things shifted at once.
- Regulation caught up with the cloud. The EU AI Act is in force, ISO/IEC 42001 is now a procurement checkbox, and UAE PDPL enforcement has teeth. "Our vendor is SOC 2" is no longer a defensible answer when auditors ask where the data physically sits.
- Open-weight models closed the gap. On public benchmarks (MMLU, IFEval, HumanEval), Llama, Qwen, and DeepSeek now sit in the same tier as GPT-4-class commercial models - close enough that, for the work most firms actually do (drafting, summarising, document Q&A), model quality is no longer the bottleneck.
- The hardware became affordable. A single workstation runs a 70B model for a team of 4 to 6 users at the cost of one year of cloud AI seats.
The result: keeping data inside the office is no longer the expensive option. In many cases it is the cheaper one.
01.02 Why not wait another year?
Two reasons not to.
- Your competitors are not waiting. Boutique law firms in London, Dubai, and Singapore are deploying private AI in 2026 and using it as a differentiator on RFPs. The window where this is a competitive edge, rather than table stakes, is short.
- The regulatory direction is one-way. Every revision of GDPR, AI Act, PDPL, and HIPAA tightens, never loosens. Moving sensitive workflows to local infrastructure now is cheaper than retrofitting compliance later.
If you have a 12-month horizon, start the audit now. If you have a 3-year horizon, you are already late.
01.03 Is the US-cloud-for-everything era really over?
For sensitive workloads in regulated industries: yes, for most of Europe and the Gulf. Not because US clouds are unsafe, but because the legal premise changed.
The CLOUD Act gives US authorities reach into data held by US providers regardless of where the server sits. Separately, the EU's top court has twice struck down EU-US data-transfer frameworks over US government access to personal data (Schrems I in 2015, Schrems II in 2020). The result is a slow but steady migration of regulated workloads to local infrastructure: EU sovereign clouds, on-premises systems, or hybrid setups with hard data-residency boundaries.
Most workloads can stay in the cloud. The ones with attorney-client privilege, patient data, or trade secrets increasingly cannot.
01.04 I am an independent consultant. Why does that matter to you?
Because the advice you get is not shaped by anyone's quota.
- I do not resell Azure, AWS, or Google Cloud. No partner tier, no kickback, no incentive to push you toward a platform that is not the right fit.
- I do not have an SDR team booking your time. The first call is with me, and so is the last.
- If on-premises is the wrong answer for your situation, I will tell you on the discovery call and not invoice you for it.
The trade-off: I am one person. I am deliberately careful about how many clients I take on, and I will not take a project I cannot deliver. If that is a risk for you, the support category below addresses it directly.
General
02.01 What is zedbyl.tech?
zedbyl.tech is the consulting practice of Nikita Chetverikov, an independent specialist who designs, installs, and hands off on-premises AI systems for businesses that cannot send their data to a third party.
The deliverable is a working appliance inside your office: the AI model, the interface, the document search, the integrations. Plus the documentation your team needs to run it without me.
02.02 Who is this for?
Organisations of 5 to 50 people in fields where data leaving the building is a compliance problem: law firms, private clinics, financial advisories, family offices, government suppliers, R&D teams handling proprietary work.
A simple test: if your clients, your regulator, or your insurance policy would have a problem with you sending documents to a US cloud, this is for you. If you just want a cheaper ChatGPT, it is not.
02.03 What does "on-premises AI" actually mean in plain English?
The AI model and all of your documents live on a computer that physically sits in your office. The system can be disconnected from the internet entirely and it still works.
Nothing leaves your network. There is no call to OpenAI, Anthropic, Microsoft, or anyone else. When someone asks a question, the answer is computed on your machine, from your data, by software you own.
02.04 How is this different from a "private ChatGPT"?
"Private ChatGPT" almost always means a contract with OpenAI or Microsoft that promises not to train on your data. The promise is legal, not technical: your text still travels to their servers.
On-premises means the data has no network path to a model vendor. The privacy is built into where the machine sits, not into a clause in a contract. The difference matters the day a regulator, an auditor, or a client asks you to prove it.
02.05 Where are you based and how do you work with clients?
Based in Thailand on a permanent basis. Working hours GMT+7.
- Asia-Pacific. I travel regularly to Singapore, Vietnam, Hong Kong, and Japan and can schedule on-site time around those trips.
- Gulf. I also fly to Dubai on a regular basis and can plan on-site engagements across the UAE around those visits.
- Elsewhere in Asia. For any other location in Asia, including private engagements that require my physical presence, I am available by personal invitation.
- Europe and the Americas. On-site work in Europe or the United States is by personal invitation and a signed contract only. Remote engagements are open to clients in those regions year-round.
Typical buyer: a managing partner, a founder, or a head of operations, someone who can authorise the engagement without a six-month committee.
Deployment & architecture
03.01 How long does a typical deployment take?
Between 5 and 14 working days for the standard appliance. Most of the time goes into loading your documents and connecting your existing tools, not into the AI itself.
| Phase | Duration |
| --- | --- |
| Discovery | 1 to 2 days |
| Hardware setup | 1 day |
| Model & interface | 1 day |
| Documents & integrations | 3 to 8 days |
| Training & handoff | 1 day |
This is the timeline for the Sovereign Deployment tier, delivered under a fixed fee.
03.02 What does the process look like, step by step?
- Scoping call (90 min). What you want to do, what data is involved, who needs access.
- Written proposal. Hardware, model, integrations, timeline, fixed price.
- Hardware procurement. You buy it directly, or I procure it for you at cost.
- On-site installation. 1 to 3 days in your office.
- Documents & tuning. Loading your files, tuning search, iterating with a small pilot group.
- Team training. Two 60-minute sessions, recorded.
- Handoff. Written runbook, admin credentials, 30-day warranty.
03.03 Can the system work fully offline?
Yes. Once installed, the AI model, the document database, and the chat interface all run locally. You can unplug the network cable and the system keeps working at full quality.
Internet is only needed for the initial model download and, if you choose, for occasional software updates. For air-gapped environments, updates are delivered on physical media.
03.04 Does it work in Arabic, French, German, Chinese?
Yes. The default models I deploy (Llama, Qwen) are strong in major business languages: Arabic, French, German, Spanish, Chinese, Japanese, Hindi. Qwen in particular handles Arabic at near-English quality, which matters for clients in the GCC.
Document search works in mixed-language collections: your contracts can be in Arabic and English in the same folder, and the system handles it.
03.05 Does it connect to the tools we already use?
Standard integrations covered out of the box:
- Document storage: SharePoint, Nextcloud, local file shares, S3-compatible buckets.
- Email: Microsoft 365, Google Workspace, IMAP.
- CRM: HubSpot, Bitrix24, Salesforce.
- Automation: n8n as the orchestration layer, with 350+ connectors.
Anything custom is possible if it has an API. Listed and priced in the proposal.
03.06 Can we run a hybrid, some local and some cloud?
Yes, and it is a common pattern. Confidential workflows (contracts, patient records, financials) run on the local model. Lower-sensitivity tasks (public-web research, general drafting) can route to a commercial API if you want them to.
The routing is explicit: the user sees which model is answering. No silent fallback to OpenAI without you knowing.
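A minimal sketch of what explicit routing could look like, assuming each request carries a declared task type. The backend names, the task labels, and the `route` function are hypothetical and not taken from any deployed system. Only task types on an allow-list may go to the cloud; everything else, including unknown types, stays local, and the banner tells the user which model is answering.

```python
from dataclasses import dataclass

# Hypothetical backend names and task labels, chosen for illustration only.
LOCAL = "local-llama"
CLOUD = "commercial-api"
CLOUD_ALLOWED = {"web_research", "general_drafting"}


@dataclass(frozen=True)
class Route:
    backend: str  # which model will answer
    banner: str   # text shown to the user alongside the answer


def route(task_type: str, cloud_enabled: bool) -> Route:
    """Send a task to the cloud only if it is allow-listed and cloud is enabled.

    Anything else (confidential work, unknown task types, cloud disabled)
    stays on the local model. There is no silent fallback in either direction.
    """
    if cloud_enabled and task_type in CLOUD_ALLOWED:
        return Route(CLOUD, f"Answered by {CLOUD} (leaves the network)")
    return Route(LOCAL, f"Answered by {LOCAL} (on-premises)")


if __name__ == "__main__":
    for task in ("contract", "web_research", "unlabelled"):
        print(task, "->", route(task, cloud_enabled=True).banner)
```

The allow-list direction is a deliberate choice in this sketch: a task that nobody classified is treated as confidential by default.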
03.07 Can our team keep it running after you leave?
Yes. The stack is deliberately boring: Ollama, Open WebUI, AnythingLLM, n8n. No exotic dependencies. A competent IT person handles daily operations.
Handoff includes a written runbook, recorded admin training, and admin credentials. Most clients do not call me for 6+ months after handoff.
Hardware & infrastructure
04.01 What hardware do you recommend?
Three honest tiers, sized so the AI model runs entirely in memory at full quality. No swap, no compression shortcut.
- Sanctum Max: single workstation for 4 to 6 users.
- Sanctum Ultra: single high-memory workstation for 10 to 20 users.
- Sanctum Bespoke: multi-node GPU cluster for 30+ users or specialised workloads.
Full configurations, benchmarks, and the reasoning behind each tier are on the Sanctum Box product page.
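As a rough illustration of why memory drives the sizing, the sketch below estimates the footprint of model weights. It assumes "full quality" means 16-bit weights (2 bytes per parameter), which is an interpretation, not a stated specification. The 25% headroom for context cache and the operating system, and the 192 GB and 128 GB machine sizes, are illustrative assumptions as well.

```python
def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 parameters * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param


def fits_in_memory(params_billion: float, bytes_per_param: float,
                   memory_gb: float, headroom: float = 0.25) -> bool:
    """True if weights plus a headroom fraction fit in the given memory."""
    needed = weights_gb(params_billion, bytes_per_param) * (1 + headroom)
    return needed <= memory_gb


if __name__ == "__main__":
    print(weights_gb(70, 2))           # 140.0 GB for a 70B model at 16-bit
    print(fits_in_memory(70, 2, 192))  # True: 175 GB needed
    print(fits_in_memory(70, 2, 128))  # False
```

Real memory use also depends on context length and the number of concurrent sessions, so a calculation like this is a starting point for the benchmark, not a replacement for it.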
04.02 Why a Mac Studio workstation instead of an NVIDIA box?
For 70B-class models with 4 to 20 concurrent users, an Apple Silicon workstation is the most efficient appliance available. Unified memory replaces VRAM, the machine draws under 200 watts, sits silently in an office, and comes with a first-party warranty.
For higher concurrency or models above 200 billion parameters, NVIDIA still wins. That is what Sanctum Bespoke is for.
04.03 Can it run on hardware we already own?
If you already have a recent Apple Silicon workstation (M2 Ultra, M3/M4 Max), or a workstation with a recent NVIDIA card (A6000, RTX 4090, RTX 5090), yes. I will benchmark it during discovery and tell you honestly whether it is enough for your workload.
Older or smaller hardware: I will tell you it is not enough and explain why. Underspec hardware is the single biggest reason on-prem projects fail.
04.04 How many people can use one machine at the same time?
Measured in real engagements with Llama 3.3 70B at full quality:
| Tier | Active users |
| --- | --- |
| Sanctum Max | 4 to 6 |
| Sanctum Ultra | 10 to 20 |
| Sanctum Bespoke (2-node) | 30 to 40 |
"Active" means a user mid-prompt. A team of 50 with normal bursty usage fits comfortably on Ultra. A team of 50 all running long-document analysis at once does not.
04.05 What happens when we outgrow the first machine?
Add a second node behind a load balancer. The interface, document database, and integrations all stay in place. Only the AI itself scales out. No re-platforming, no data migration.
Most clients reach this point 12 to 18 months in. Worth budgeting for from day one.
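The scale-out idea above can be sketched as a least-busy picker in front of two inference nodes. This is an illustration of the pattern, not the balancer used in any particular deployment. The node URLs are hypothetical, and the `Node` and `LeastBusyBalancer` names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    url: str             # hypothetical internal address of an inference node
    in_flight: int = 0   # requests currently being answered
    healthy: bool = True


class LeastBusyBalancer:
    """Send each request to the healthy node with the fewest requests in flight."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def acquire(self) -> Node:
        candidates = [n for n in self.nodes if n.healthy]
        if not candidates:
            raise RuntimeError("no healthy inference node available")
        node = min(candidates, key=lambda n: n.in_flight)  # ties go to the first node
        node.in_flight += 1
        return node

    def release(self, node: Node) -> None:
        node.in_flight = max(0, node.in_flight - 1)


if __name__ == "__main__":
    a = Node("http://node-a.internal:11434")
    b = Node("http://node-b.internal:11434")
    lb = LeastBusyBalancer([a, b])
    first, second = lb.acquire(), lb.acquire()
    print(first.url, second.url)  # one request lands on each node
```

Because only the model-serving layer sits behind the balancer, the interface and document database keep pointing at a single address while nodes are added behind it.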
Models & capabilities
05.01 Which AI models do you deploy?
Open-weight models that you own outright. No license fee, no API meter. Current defaults in 2026:
- Llama 3.3 70B: general reasoning, drafting, summarising.
- Qwen 2.5 72B: strongest multilingual option, especially Arabic.
- DeepSeek-V3: heavy reasoning when memory permits.
- Nomic Embed v2: document search.
Model choice is part of discovery. Swapping a model later is a one-line config change. You are not locked in.
05.02 Are open-source models actually good enough?
For the work my clients actually do (contract review, report drafting, internal Q&A, document search, structured data extraction): yes, comfortably. On public benchmarks (MMLU, IFEval, HumanEval) Llama 3.3 70B sits in the same tier as GPT-4-class commercial models, and for the office workflows above the gap closed in 2024–2025.
I do not ask you to take that on faith. Every engagement includes a customer-specific evaluation set: your tasks, your documents, your acceptance criteria, agreed at discovery and re-run at handover. The results land in the model card you keep.
For frontier reasoning (advanced math, novel research), commercial frontier models still win. That is rarely the workload an office actually has.
05.03 Can it read our internal documents?
Yes. This is the core of every deployment. Your documents are read, indexed, and stored in a local search database. The AI retrieves the relevant passages for each question and answers with citations so you can verify.
Supported formats: PDF, DOCX, XLSX, PPTX, EML, MD, TXT, HTML. Scanned PDFs go through local OCR. Photos and audio are supported with optional modules.
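The retrieve-then-cite flow described above can be sketched in a few lines. This is a simplified stand-in: a real deployment would rank passages with an embedding model, whereas this example uses plain word-count cosine similarity so it runs with the standard library alone. The file names, passages, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, passages: list[tuple[str, str]], k: int = 2):
    """Return up to k (score, source, text) tuples with a non-zero score, best first."""
    q = Counter(tokens(question))
    scored = [(cosine(q, Counter(tokens(text))), source, text)
              for source, text in passages]
    scored = [s for s in scored if s[0] > 0]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:k]


def build_prompt(question: str, hits) -> str:
    """Number each retrieved passage so the model can cite it as [n]."""
    context = "\n".join(f"[{i}] ({source}) {text}"
                        for i, (_, source, text) in enumerate(hits, 1))
    return ("Answer using only the passages below and cite them as [n].\n"
            f"{context}\n\nQuestion: {question}")


if __name__ == "__main__":
    docs = [
        ("lease.pdf p.3", "The notice period for termination is 90 days."),
        ("handbook.docx p.1", "Employees receive 25 days of annual leave."),
    ]
    hits = retrieve("What is the notice period for termination?", docs, k=1)
    print(build_prompt("What is the notice period for termination?", hits))
```

Carrying the source label alongside each passage is what makes the citation checkable: the reader can open the named file and page and confirm the answer.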
05.04 Can it generate images, voice, or video locally?
Images and voice: yes, on the standard appliance, with optional modules. Images via Stable Diffusion XL or Flux.1. Voice transcription via Whisper. Voice synthesis via Coqui or F5-TTS.
Video: yes, locally, but only on a dedicated GPU cluster. Modern open-weight video models (HunyuanVideo, Wan 2.1, LTX-Video, CogVideoX) require multiple high-end GPUs and significant cooling. This is a different class of machine from the standard Sanctum tiers.
If video generation is part of your workflow, get in touch to discuss a Bespoke cluster sized for it. The conversation starts with what you want to produce, how often, and at what quality, and goes from there.
05.05 How current is the AI's knowledge?
The model's general-knowledge cutoff is typically 6 to 18 months before deployment. For most office work, this does not matter. The AI reasons over your documents, not its memory.
For tasks that need current information, you have two options: re-ingest the relevant new documents (no retraining needed) or enable optional web search through a controlled local proxy.
Security & compliance
06.01 Can you prove data does not leave the building?
Yes. The proof is physical, not contractual. Every deployment ships with:
- A documented network egress policy you (or your IT) enforce at the firewall.
- Optional air-gap mode: the appliance has no route to the internet at all.
- Full traffic logs from the appliance interface, exportable for audit.
- An 11-point compliance checklist your auditor can run themselves.
If a regulator unplugs the network cable mid-audit, the system keeps working.
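One way an IT team might spot-check an egress policy is to try opening outbound connections from the appliance and confirm that every attempt fails. The sketch below is a hypothetical check, not an item from the compliance checklist mentioned above. The probe targets are examples to replace with whatever endpoints your policy must block.

```python
import socket

# Hypothetical probe targets; replace with endpoints your policy must block.
PROBES = [("api.openai.com", 443), ("1.1.1.1", 53)]


def egress_report(probes=PROBES, connect=socket.create_connection, timeout=3.0):
    """Try each outbound connection and record whether it succeeded.

    Returns {"host:port": True if reachable}. On an air-gapped appliance
    every value should be False.
    """
    report = {}
    for host, port in probes:
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            report[f"{host}:{port}"] = True
        except OSError:  # covers DNS failure, refusal, timeout, no route
            report[f"{host}:{port}"] = False
    return report


def is_air_gapped(report: dict) -> bool:
    return not any(report.values())


if __name__ == "__main__":
    result = egress_report()
    print(result, "air-gapped:", is_air_gapped(result))
```

A passing check from inside the machine complements the firewall rules and traffic logs; it does not replace them, since it only tests the targets it is given.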
06.02 Are you compliant with GDPR, UAE PDPL, HIPAA, EU AI Act?
Compliance is a property of your organisation, not of a piece of software. What I deliver is the technical foundation that makes your compliance position defensible: data residency, role-based access, audit logs, encryption at rest, no third-party processors.
Final sign-off is yours and your legal counsel's. I provide the documentation pack they will ask for, including the artefacts needed for an ISO 42001 audit and an EU AI Act risk assessment.
06.03 Does this help us prepare for an ISO 42001 or AI Act audit?
Yes, and increasingly this is the explicit reason clients call. The deliverable includes:
- A model card describing each deployed model, its training data lineage, and known limitations.
- An access-control matrix mapping users to allowed actions.
- Logging policies that satisfy AI Act Article 12 (record-keeping).
- A risk-assessment template aligned with ISO 42001 Annex A.
It does not replace your legal counsel or auditor. It gives them documents to start from rather than blank pages.
06.04 Who has access once the system is deployed?
Only the users you create. The interface supports role-based access (admin, editor, viewer) and integrates with your identity provider (SAML, LDAP, Microsoft Entra).
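A role-based model of this kind can be pictured as a small matrix from role to allowed actions. The three roles come from the text above; the action names and the `is_allowed` helper are hypothetical, and a real matrix would be agreed per deployment.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    EDITOR = "editor"
    VIEWER = "viewer"


# Hypothetical action names, for illustration only.
MATRIX = {
    Role.ADMIN: {"chat", "upload_document", "delete_document",
                 "manage_users", "view_audit_log"},
    Role.EDITOR: {"chat", "upload_document", "delete_document"},
    Role.VIEWER: {"chat"},
}


def is_allowed(role: Role, action: str) -> bool:
    """Deny by default: an action not listed for the role is refused."""
    return action in MATRIX.get(role, set())


if __name__ == "__main__":
    print(is_allowed(Role.VIEWER, "chat"))          # True
    print(is_allowed(Role.EDITOR, "manage_users"))  # False
```

Keeping the matrix as plain data means the same table can be printed into the audit documentation and enforced in the interface.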
After handoff I have no access. No backdoor, no telemetry, no remote-support agent quietly running. If you need me back, you grant access on the spot and revoke it after.
06.05 What happens if the hardware is stolen?
Full-disk encryption is on by default. On Apple Silicon workstations, that means FileVault with the recovery key escrowed in your password manager. On Linux nodes, it means LUKS with the recovery key escrowed the same way.
A stolen machine without the unlock credential is a brick. For higher-risk deployments I also recommend a tamper-evident enclosure and physical access logs.
06.06 Do you sign NDAs and DPAs?
Yes. Mutual NDA before discovery, and a Data Processing Agreement before any contact with production data. Your templates are fine. I have my own if you prefer.
I do not subcontract. You sign with one person, and the same person shows up in your office.
Pricing & engagement
07.01 How much does a deployment cost?
Pricing depends on tier, the size of your document set, and the integrations involved. Every proposal is itemised so you see exactly what you are paying for, with hardware billed at cost and no markup.
Hardware tier configurations and indicative ranges are published on the Sanctum Box product page. Fixed fees for the consulting tiers (assessment, deployment, operations, bespoke) are listed in EUR on the services & pricing page. For a tailored quote, book a 15-minute call and I will return a written proposal within two working days.
07.02 Are there ongoing fees?
No mandatory subscription, no per-seat fee, no API meter. The appliance is yours.
Two optional retainers exist if you want them: a quarterly check-in and a priority support retainer (Managed Operations tier, €490/mo, cancel-any-month). After the 30-day warranty, most clients run unsupported.
07.03 What is the difference between Max, Ultra, and Bespoke?
Scale, mostly.
- Max: one workstation. The minimum honest configuration. Right for 4 to 6 users in one office.
- Ultra: one high-memory workstation. Right for 10 to 20 users with mixed workloads.
- Bespoke: multi-node, GPU-based, custom integrations. Right when you have an unusual constraint: very high concurrency, custom-trained models, video generation, or regulated environments with their own hardware requirements.
Full hardware specifications live on the Sanctum Box product page. These hardware tiers are separate from the consulting tiers (Architectural Assessment, Sovereign Deployment, Managed Operations, Bespoke Capability Build) — those are priced on the services & pricing page.
07.04 Do you offer a pilot or proof of concept?
A two-week paid pilot is available for organisations evaluating a Bespoke engagement. It runs on loaner hardware in your office, with a representative subset of your documents, and a success metric you define up front.
Pilot fee is credited against the full engagement if you proceed. If you do not, you keep the documentation and benchmark numbers.
If a paid pilot is too much, the Architectural Assessment tier (€490, 90 minutes + written deliverable) is the lighter entry point.
07.05 Fixed price or hourly?
Always fixed price for the scope in the proposal. Out-of-scope work is quoted separately before it starts, never quietly absorbed into the invoice. Examples of fixed prices for each consulting tier are on the services & pricing page.
Invoices in EUR, USD, or AED. Payment by bank transfer. Crypto on request. No corporate cards.
Support & operations
08.01 What if something breaks?
The first 30 days after handoff carry a full warranty. Anything that does not work as specified is fixed at no cost. After that, the system is yours.
Hardware faults go to the manufacturer or your hardware vendor under their warranty. Software issues are handled by your IT team via the runbook, or by me on the priority-support retainer.
08.02 Do you train our team?
Yes, two passes. A user session for the people using the chat interface (60 min, recorded). An admin session for whoever keeps the system running (60 min, recorded, plus the written runbook).
Both sessions are included in the deployment fee. Extra workshops are quoted as needed.
08.03 How are model updates handled?
Open-weight model families typically ship a new version every 4 to 8 months. Updates are opt-in, never automatic. The process: I notify clients on retainer, benchmark the new model on a copy of your workload, and only swap if it is measurably better at your tasks.
Old weights are kept on disk. Rollback is one config line.
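The "swap only if measurably better, keep the old one for rollback" rule can be sketched as follows. This is an illustration under stated assumptions: scores are per-task numbers between 0 and 1 from a benchmark run, the 0.02 minimum gain is an arbitrary example threshold, and the config keys and model tags are hypothetical.

```python
from statistics import mean


def should_swap(current_scores, candidate_scores, min_gain=0.02) -> bool:
    """Swap only if the candidate's mean score beats the current model by min_gain."""
    return mean(candidate_scores) - mean(current_scores) >= min_gain


def apply_update(config: dict, candidate: str,
                 current_scores, candidate_scores) -> dict:
    """Return a new config. The previous model name is kept so rollback stays possible."""
    if not should_swap(current_scores, candidate_scores):
        return config
    return {**config, "model": candidate, "previous_model": config["model"]}


def rollback(config: dict) -> dict:
    """Point the config back at the previous model, if one was recorded."""
    return {**config, "model": config.get("previous_model", config["model"])}


if __name__ == "__main__":
    cfg = {"model": "llama3.3:70b"}
    updated = apply_update(cfg, "candidate:70b", [0.80, 0.82], [0.90, 0.88])
    print(updated["model"])            # candidate:70b
    print(rollback(updated)["model"])  # llama3.3:70b
```

Both the swap and the rollback change a single `model` value, which is the sense in which either one is a one-line change.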
08.04 Can we see who uses the system and how?
A built-in admin dashboard shows active sessions, queries per day, response time, and storage usage. Per-user activity is visible only to designated admins.
Query content is not logged centrally by default. Only metadata. If your compliance posture requires content logging, it is a config option and an audit decision, not a default.
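A minimal sketch of metadata-only logging with content logging as an explicit switch. The field names and the `log_record` function are hypothetical and do not describe the log format of any specific tool in the stack.

```python
import json
import time


def log_record(user_id: str, prompt: str, response_ms: int,
               log_content: bool = False, now=time.time) -> str:
    """Build one JSON log line.

    By default only metadata is recorded (who, when, how long, how big).
    The prompt text is included only when log_content is explicitly True.
    """
    record = {
        "ts": int(now()),
        "user": user_id,
        "prompt_chars": len(prompt),
        "response_ms": response_ms,
    }
    if log_content:
        record["prompt"] = prompt
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(log_record("u1", "Summarise the indemnity clause", 420))
    print(log_record("u1", "Summarise the indemnity clause", 420, log_content=True))
```

Making content logging a named parameter that defaults to off keeps the decision visible in configuration, where an auditor can see it.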
08.05 What if you, as the consultant, disappear?
Fair question to ask of an independent. Two layers of protection:
- No proprietary code. Everything I install is open-source. Ollama, Open WebUI, AnythingLLM, n8n, all backed by communities and companies that will outlive me.
- Written runbook. A plain-English operations manual delivered at handover. Any competent sysadmin - yours, an MSP, or another independent - can pick it up. The stack is common enough that finding someone is a hiring problem, not a sourcing problem.
If I get hit by a bus, your appliance still runs, and someone else can maintain it. That is the design intent, not an afterthought.
08.06 What is NOT included in a standard engagement?
Things I will not do, or will quote separately:
- Custom model training from scratch (fine-tuning is in scope, pre-training is not).
- Consumer-facing chatbots or marketing assistants.
- Ongoing operations as a managed service. You own the appliance.
- Cloud-hosted versions of the same stack. Defeats the point.
Your question is not here?
Then it is probably specific to your firm and worth a 15-minute call. You speak to me directly. No SDR, no quota. I reply within a working day from Thailand (GMT+7).