Q: What is zedbyl.tech?

zedbyl.tech is the consulting practice of Nikita Chetverikov, an independent specialist who designs, installs, and hands off on-premises AI systems for businesses that cannot send their data to a third party. The deliverable is a working appliance inside your office: the AI model, the interface, the document search, the integrations. Plus the documentation your team needs to run it without me.

Question 1

Why are firms moving AI on-premises in 2026?

Accepted Answer

Three things shifted at once.

Regulation caught up with the cloud. The EU AI Act is in force, ISO/IEC 42001 is now a procurement checkbox, and UAE PDPL enforcement has teeth. "Our vendor is SOC 2" is no longer a defensible answer when auditors ask where the data physically sits.
Open-weight models closed the gap. On public benchmarks (MMLU, IFEval, HumanEval), Llama, Qwen, and DeepSeek now sit in the same tier as GPT-4-class commercial models: close enough that model quality is no longer the bottleneck for the work most firms actually do (drafting, summarising, document Q&A).
The hardware became affordable. A single workstation runs a 70B model for a team of 4 to 6 users at the cost of one year of cloud AI seats.

The result: keeping data inside the office is no longer the expensive option. In many cases it is the cheaper one.

Question 2

Why not wait another year?

Accepted Answer

Two reasons not to.

Your competitors are not waiting. Boutique law firms in London, Dubai, and Singapore are deploying private AI in 2026 and using it as a differentiator on RFPs. The window where this is a competitive edge, rather than table stakes, is short.
The regulatory direction is one-way. Every revision of GDPR, the AI Act, PDPL, and HIPAA tightens, never loosens. Moving sensitive workflows to local infrastructure now is cheaper than retrofitting compliance later.

If you have a 12-month horizon, start the audit now. If you have a 3-year horizon, you are already late.

Question 3

Is the US-cloud-for-everything era really over?

Accepted Answer

For sensitive workloads in regulated industries: yes, for most of Europe and the Gulf. Not because US clouds are unsafe, but because the legal premise changed.

The CLOUD Act gives US authorities reach into data held by US providers regardless of where the server sits. The Court of Justice of the EU has twice struck down EU-US data-transfer frameworks over US government access to personal data (Schrems I in 2015, Schrems II in 2020). The result is a slow but steady migration of regulated workloads to local infrastructure: EU sovereign clouds, on-premises systems, or hybrid setups with hard data-residency boundaries.

Most workloads can stay in the cloud. The ones with attorney-client privilege, patient data, or trade secrets increasingly cannot.

Question 4

I am an independent consultant. Why does that matter to you?

Accepted Answer

Because the advice you get is not shaped by anyone's quota.

I do not resell Azure, AWS, or Google Cloud. No partner tier, no kickback, no incentive to push you toward a platform that is not the right fit.
I do not have an SDR team booking your time. The first call is with me, and so is the last.
If on-premises is the wrong answer for your situation, I will tell you on the discovery call and not invoice you for it.

The trade-off: I am one person. I am deliberately careful about how many clients I take on, and I will not take a project I cannot deliver. If that is a risk for you, the support category below addresses it directly.

Question 5

What is zedbyl.tech?

Accepted Answer

zedbyl.tech is the consulting practice of Nikita Chetverikov, an independent specialist who designs, installs, and hands off on-premises AI systems for businesses that cannot send their data to a third party.

The deliverable is a working appliance inside your office: the AI model, the interface, the document search, the integrations. Plus the documentation your team needs to run it without me.

Question 6

Who is this for?

Accepted Answer

Organisations of 5 to 50 people in fields where data leaving the building is a compliance problem: law firms, private clinics, financial advisories, family offices, government suppliers, R&D teams handling proprietary work. A simple test: if your clients, your regulator, or your insurance policy would have a problem with you sending documents to a US cloud, this is for you. If you just want a cheaper ChatGPT, it is not.

Question 7

What does "on-premises AI" actually mean in plain English?

Accepted Answer

The AI model and all of your documents live on a computer that physically sits in your office. The system can be disconnected from the internet entirely and it still works.

Nothing leaves your network. There is no call to OpenAI, Anthropic, Microsoft, or anyone else. When someone asks a question, the answer is computed on your machine, from your data, by software you own.

Question 8

How is this different from a "private ChatGPT"?

Accepted Answer

"Private ChatGPT" almost always means a contract with OpenAI or Microsoft that promises not to train on your data. The promise is legal, not technical: your text still travels to their servers.

On-premises means the data has no network path to a model vendor. The privacy is built into where the machine sits, not into a clause in a contract. The difference matters the day a regulator, an auditor, or a client asks you to prove it.

Question 9

Where are you based and how do you work with clients?

Accepted Answer

Based in Thailand on a permanent basis. Working hours GMT+7.

Asia-Pacific. I travel regularly to Singapore, Vietnam, Hong Kong, and Japan and can schedule on-site time around those trips.
Gulf. I also fly to Dubai on a regular basis and can plan on-site engagements across the UAE around those visits.
Personal invitation. Anywhere else in Asia, including private engagements that require my physical presence, I am available by personal invitation.
Europe and the Americas. On-site work in Europe or the United States is by personal invitation and a signed contract only. Remote engagements are open to clients in those regions year-round.

Typical buyer: a managing partner, a founder, or a head of operations, someone who can authorise the engagement without a six-month committee.

Question 10

How long does a typical deployment take?

Accepted Answer

Between 5 and 14 working days for the standard appliance. Most of the time goes into loading your documents and connecting your existing tools, not into the AI itself.

Stage	Duration
Discovery	1 to 2 days
Hardware setup	1 day
Model & interface	1 day
Documents & integrations	3 to 8 days
Training & handoff	1 day

This is the timeline for the Sovereign Deployment tier, delivered under a fixed fee.

Question 11

What does the process look like, step by step?

Accepted Answer

Scoping call (90 min). What you want to do, what data is involved, who needs access.
Written proposal. Hardware, model, integrations, timeline, fixed price.
Hardware procurement. You buy it directly, or I procure it for you at cost.
On-site installation. 1 to 3 days in your office.
Documents & tuning. Loading your files, tuning search, iterating with a small pilot group.
Team training. Two 60-minute sessions, recorded.
Handoff. Written runbook, admin credentials, 30-day warranty.

Question 12

Can the system work fully offline?

Accepted Answer

Yes. Once installed, the AI model, the document database, and the chat interface all run locally. You can unplug the network cable and the system keeps working at full quality.

Internet is only needed for the initial model download and, if you choose, for occasional software updates. For air-gapped environments, updates are delivered on physical media.

Question 13

Does it work in Arabic, French, German, Chinese?

Accepted Answer

Yes. The default models I deploy (Llama, Qwen) are strong in major business languages: Arabic, French, German, Spanish, Chinese, Japanese, Hindi. Qwen in particular handles Arabic at near-English quality, which matters for clients in the GCC.

Document search works in mixed-language collections: your contracts can be in Arabic and English in the same folder, and the system handles it.

Question 14

Does it connect to the tools we already use?

Accepted Answer

Standard integrations covered out of the box:

Document storage: SharePoint, Nextcloud, local file shares, S3-compatible buckets.
Email: Microsoft 365, Google Workspace, IMAP.
CRM: HubSpot, Bitrix24, Salesforce.
Automation: n8n as the orchestration layer, with 350+ connectors.

Anything custom is possible if it has an API. Listed and priced in the proposal.

Question 15

Can we run a hybrid, some local and some cloud?

Accepted Answer

Yes, and it is a common pattern. Confidential workflows (contracts, patient records, financials) run on the local model. Lower-sensitivity tasks (public-web research, general drafting) can route to a commercial API if you want them to.

The routing is explicit: the user sees which model is answering. No silent fallback to OpenAI without you knowing.
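A minimal sketch of what explicit routing could look like, not the routing code of any particular deployment. The backend names, the `Route` type, and the `route` function are hypothetical and exist only for illustration. The point it shows: a confidential request is refused outright when the local model is down, instead of quietly going to a commercial API.

```python
from dataclasses import dataclass

# Hypothetical backend names; a real deployment maps these to its own endpoints.
LOCAL = "local-model"
CLOUD = "commercial-api"


@dataclass(frozen=True)
class Route:
    backend: str  # which model answers
    label: str    # text shown to the user next to the answer


def route(confidential: bool, cloud_enabled: bool, local_up: bool = True) -> Route:
    """Pick a backend explicitly; confidential work never leaves the local model."""
    if confidential:
        if not local_up:
            # No silent fallback: fail loudly instead of sending data out.
            raise RuntimeError("local model unavailable; confidential request refused")
        return Route(LOCAL, "Answered by: local model")
    if cloud_enabled:
        return Route(CLOUD, "Answered by: commercial API")
    if not local_up:
        raise RuntimeError("local model unavailable and cloud routing is disabled")
    return Route(LOCAL, "Answered by: local model")
```

The `label` field is what makes the routing visible: whatever the interface shows next to an answer comes from the same decision that picked the backend.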

Question 16

Can our team keep it running after you leave?

Accepted Answer

Yes. The stack is deliberately boring: Ollama, Open WebUI, AnythingLLM, n8n. No exotic dependencies. A competent IT person handles daily operations.

Handoff includes a written runbook, recorded admin training, and admin credentials. Most clients do not call me for 6+ months after handoff.

Question 17

What hardware do you recommend?

Accepted Answer

Three honest tiers, sized so the AI model runs entirely in memory at full quality. No swap, no compression shortcut.

Sanctum Max: single workstation for 4 to 6 users.
Sanctum Ultra: single high-memory workstation for 10 to 20 users.
Sanctum Bespoke: multi-node GPU cluster for 30+ users or specialised workloads.

Full configurations, benchmarks, and the reasoning behind each tier are on the Sanctum Box product page.

Question 18

Why a Mac Studio workstation instead of an NVIDIA box?

Accepted Answer

For 70B-class models with 4 to 20 concurrent users, an Apple Silicon workstation is the most efficient appliance available. Unified memory replaces VRAM, the machine draws under 200 watts, sits silently in an office, and comes with a first-party warranty.

For higher concurrency or models above 200 billion parameters, NVIDIA still wins. That is what Sanctum Bespoke is for.

Question 19

Can it run on hardware we already own?

Accepted Answer

If you already have a recent Apple Silicon workstation (M2 Ultra, M3/M4 Max), or a workstation with a recent NVIDIA card (A6000, RTX 4090, RTX 5090), yes. I will benchmark it during discovery and tell you honestly whether it is enough for your workload.

Older or smaller hardware: I will tell you it is not enough and explain why. Underspec hardware is the single biggest reason on-prem projects fail.

Question 20

How many people can use one machine at the same time?

Accepted Answer

Measured in real engagements with Llama 3.3 70B at full quality:

Tier	Active users
Sanctum Max	4 to 6
Sanctum Ultra	10 to 20
Sanctum Bespoke (2-node)	30 to 40

"Active" means a user mid-prompt. A team of 50 with normal bursty usage fits comfortably on Ultra. A team of 50 all running long-document analysis at once does not.

Question 21

What happens when we outgrow the first machine?

Accepted Answer

Add a second node behind a load balancer. The interface, document database, and integrations all stay in place. Only the AI itself scales out. No re-platforming, no data migration.

Most clients reach this point 12 to 18 months in. Worth budgeting for from day one.
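To picture the scale-out step, here is a small round-robin sketch. In practice this job belongs to a proper load balancer; the `RoundRobin` class and the node addresses below are hypothetical and only illustrate how requests alternate between nodes while everything else stays where it is.

```python
from itertools import cycle


class RoundRobin:
    """Hand out model nodes in turn; the interface and document store are untouched."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._cycle = cycle(self._nodes)

    def next_node(self) -> str:
        """Return the node that should serve the next request."""
        return next(self._cycle)


# Hypothetical internal addresses for a two-node setup.
balancer = RoundRobin(["http://node-a.internal", "http://node-b.internal"])
first_four = [balancer.next_node() for _ in range(4)]
```

Adding capacity in this picture means adding one more address to the list, which is why no data migration is involved.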

Question 22

Which AI models do you deploy?

Accepted Answer

Open-weight models that you own outright. No license fee, no API meter. Current defaults in 2026:

Llama 3.3 70B: general reasoning, drafting, summarising.
Qwen 2.5 72B: strongest multilingual option, especially Arabic.
DeepSeek-V3: heavy reasoning when memory permits.
Nomic Embed v2: document search.

Model choice is part of discovery. Swapping a model later is a one-line config change. You are not locked in.

Question 23

Are open-source models actually good enough?

Accepted Answer

For the work my clients actually do (contract review, report drafting, internal Q&A, document search, structured data extraction): yes, comfortably. On public benchmarks (MMLU, IFEval, HumanEval) Llama 3.3 70B sits in the same tier as GPT-4-class commercial models, and for the office workflows above the gap closed in 2024–2025.

I do not ask you to take that on faith. Every engagement includes a customer-specific evaluation set: your tasks, your documents, your acceptance criteria, agreed at discovery and re-run at handover. The results land in the model card you keep.

For frontier reasoning (advanced math, novel research), commercial frontier models still win. That is rarely the workload an office actually has.

Question 24

Can it read our internal documents?

Accepted Answer

Yes. This is the core of every deployment. Your documents are read, indexed, and stored in a local search database. The AI retrieves the relevant passages for each question and answers with citations so you can verify.

Supported formats: PDF, DOCX, XLSX, PPTX, EML, MD, TXT, HTML. Scanned PDFs go through local OCR. Photos and audio are supported with optional modules.
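The retrieve-then-cite flow can be sketched in a few lines. This is a toy illustration under stated assumptions: word-count cosine similarity stands in for a real embedding model, and the `retrieve` function and the passage dictionaries (with `source` and `text` keys) are hypothetical, not the search stack of any deployment.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lower-case word counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[dict], k: int = 2) -> list[dict]:
    """Return up to k passages most similar to the question, each with its source."""
    q = tokens(question)
    scored = [(cosine(q, tokens(p["text"])), p) for p in passages]
    scored = [pair for pair in scored if pair[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{"source": p["source"], "text": p["text"]} for _, p in scored[:k]]
```

Each returned passage keeps its `source`, so the answer built from it can point back to the file and page it came from.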

Question 25

Can it generate images, voice, or video locally?

Accepted Answer

Images and voice: yes, on the standard appliance, with optional modules. Images via Stable Diffusion XL or Flux.1. Voice transcription via Whisper. Voice synthesis via Coqui or F5-TTS.

Video: yes, locally, but only on a dedicated GPU cluster. Modern open-weight video models (HunyuanVideo, Wan 2.1, LTX-Video, CogVideoX) require multiple high-end GPUs and significant cooling. This is a different class of machine from the standard Sanctum tiers.

If video generation is part of your workflow, get in touch to discuss a Bespoke cluster sized for it. The conversation starts with what you want to produce, how often, and at what quality, and goes from there.

Question 26

How current is the AI's knowledge?

Accepted Answer

The model's general-knowledge cutoff is typically 6 to 18 months before deployment. For most office work, this does not matter. The AI reasons over your documents, not its memory.

For tasks that need current information, you have two options: re-ingest the relevant new documents (no retraining needed) or enable optional web search through a controlled local proxy.

Question 27

Can you prove data does not leave the building?

Accepted Answer

Yes. The proof is physical, not contractual. Every deployment ships with:

A documented network egress policy you (or your IT) enforce at the firewall.
Optional air-gap mode: the appliance has no route to the internet at all.
Full traffic logs from the appliance interface, exportable for audit.
An 11-point compliance checklist your auditor can run themselves.

If a regulator unplugs the network cable mid-audit, the system keeps working.
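As an illustration of how exported traffic logs might be checked, the sketch below flags any destination address outside private, loopback, or link-local ranges. It assumes the export can be reduced to a list of destination IP strings; that format and the `external_destinations` name are assumptions for this example, not the appliance's actual log schema.

```python
import ipaddress


def external_destinations(dest_ips):
    """Return the destinations that are not private, loopback, or link-local."""
    flagged = []
    for raw in dest_ips:
        ip = ipaddress.ip_address(raw)
        if not (ip.is_private or ip.is_loopback or ip.is_link_local):
            flagged.append(raw)
    return flagged


# Hypothetical excerpt from an exported log: two internal hosts, one public address.
sample = ["192.168.1.20", "10.0.0.5", "8.8.8.8"]
leaks = external_destinations(sample)
```

On a deployment that keeps everything inside the network, a check like this should come back with an empty list.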

Question 28

Are you compliant with GDPR, UAE PDPL, HIPAA, EU AI Act?

Accepted Answer

Compliance is a property of your organisation, not of a piece of software. What I deliver is the technical foundation that makes your compliance position defensible: data residency, role-based access, audit logs, encryption at rest, no third-party processors. Final sign-off is yours and your legal counsel's. I provide the documentation pack they will ask for, including the artefacts needed for an ISO 42001 audit and an EU AI Act risk assessment.

Question 29

Does this help us prepare for an ISO 42001 or AI Act audit?

Accepted Answer

Yes, and increasingly this is the explicit reason clients call. The deliverable includes:

A model card describing each deployed model, its training data lineage, and known limitations.
An access-control matrix mapping users to allowed actions.
Logging policies that satisfy AI Act Article 12 (record-keeping).
A risk-assessment template aligned with ISO 42001 Annex A.

It does not replace your legal counsel or auditor. It gives them documents to start from rather than blank pages.

Question 30

Who has access once the system is deployed?

Accepted Answer

Only the users you create. The interface supports role-based access (admin, editor, viewer) and integrates with your identity provider (SAML, LDAP, Microsoft Entra).

After handoff I have no access. No backdoor, no telemetry, no remote-support agent quietly running. If you need me back, you grant access on the spot and revoke it after.
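Role-based access of this kind boils down to a small lookup table. The three roles come from the answer above; the action names and the `is_allowed` helper are hypothetical, chosen only to show the shape of such a matrix.

```python
# Roles are the ones named above; the actions are illustrative assumptions.
PERMISSIONS = {
    "admin": {"chat", "upload_documents", "manage_users", "view_usage"},
    "editor": {"chat", "upload_documents"},
    "viewer": {"chat"},
}


def is_allowed(role: str, action: str) -> bool:
    """Unknown roles get an empty permission set, so the default is deny."""
    return action in PERMISSIONS.get(role, set())
```

For example, `is_allowed("viewer", "manage_users")` is false, and so is any check for a role that was never created.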

Question 31

What happens if the hardware is stolen?

Accepted Answer

Full-disk encryption is on by default. On Apple Silicon workstations, FileVault with the recovery key escrowed in your password manager. On Linux nodes, LUKS with the same.

A stolen machine without the unlock credential is a brick. For higher-risk deployments I also recommend a tamper-evident enclosure and physical access logs.

Question 32

Do you sign NDAs and DPAs?

Accepted Answer

Yes. Mutual NDA before discovery, and a Data Processing Agreement before any contact with production data. Your templates are fine. I have my own if you prefer.

I do not subcontract. You sign with one person, and the same person shows up in your office.

Question 33

How much does a deployment cost?

Accepted Answer

Pricing depends on tier, the size of your document set, and the integrations involved. Every proposal is itemised so you see exactly what you are paying for, with hardware billed at cost and no markup.

Hardware tier configurations and indicative ranges are published on the Sanctum Box product page. Fixed fees for the consulting tiers (assessment, deployment, operations, bespoke) are listed in EUR on the services & pricing page. For a tailored quote, book a 15-minute call and I will return a written proposal within two working days.

Question 34

Are there ongoing fees?

Accepted Answer

No mandatory subscription, no per-seat fee, no API meter. The appliance is yours.

Two optional retainers exist if you want them: a quarterly check-in and a priority support retainer (Managed Operations tier, €490/mo, cancel-any-month). After the 30-day warranty, most clients run unsupported.

Question 35

What is the difference between Max, Ultra, and Bespoke?

Accepted Answer

Scale, mostly.

Max: one workstation. The minimum honest configuration. Right for 4 to 6 users in one office.
Ultra: one high-memory workstation. Right for 10 to 20 users with mixed workloads.
Bespoke: multi-node, GPU-based, custom integrations. Right when you have an unusual constraint: very high concurrency, custom-trained models, video generation, or regulated environments with their own hardware requirements.

Full hardware specifications live on the Sanctum Box product page. These hardware tiers are separate from the consulting tiers (Architectural Assessment, Sovereign Deployment, Managed Operations, Bespoke Capability Build), which are priced on the services & pricing page.

Question 36

Do you offer a pilot or proof of concept?

Accepted Answer

A two-week paid pilot is available for organisations evaluating a Bespoke engagement. It runs on loaner hardware in your office, with a representative subset of your documents, and a success metric you define up front.

Pilot fee is credited against the full engagement if you proceed. If you do not, you keep the documentation and benchmark numbers.

If a paid pilot is too much, the Architectural Assessment tier (€490, 90 minutes + written deliverable) is the lighter entry point.

Question 37

Fixed price or hourly?

Accepted Answer

Always fixed price for the scope in the proposal. Out-of-scope work is quoted separately before it starts, never quietly absorbed into the invoice. Examples of fixed prices for each consulting tier are on the services & pricing page.

Invoices in EUR, USD, or AED. Payment by bank transfer. Crypto on request. No corporate cards.

Question 38

What if something breaks?

Accepted Answer

The first 30 days after handoff carry a full warranty. Anything that does not work as specified is fixed at no cost. After that, the system is yours.

Hardware faults go to the manufacturer or your hardware vendor under their warranty. Software issues are handled by your IT team via the runbook, or by me on the priority-support retainer.

Question 39

Do you train our team?

Accepted Answer

Yes, two passes.

A user session for the people using the chat interface (60 min, recorded).
An admin session for whoever keeps the system running (60 min, recorded, plus the written runbook).

Both sessions are included in the deployment fee. Extra workshops are quoted as needed.

Question 40

How are model updates handled?

Accepted Answer

Open-weight models release a new version every 4 to 8 months. Updates are opt-in, never automatic. The process: I notify clients on retainer, benchmark on a copy of your workload, and only swap if the new model is measurably better at your tasks.

Old weights are kept on disk. Rollback is one config line.
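The "swap only if measurably better" rule can be written down as a simple comparison. This is a sketch under assumptions: scores are per-task numbers between 0 and 1 from the same evaluation set, and the `should_swap` function and its `min_gain` threshold of 0.02 are hypothetical values for illustration, not a fixed acceptance criterion.

```python
def should_swap(current_scores: dict, candidate_scores: dict, min_gain: float = 0.02) -> bool:
    """Swap only if the candidate's mean score beats the current model by min_gain."""
    if not current_scores:
        raise ValueError("the evaluation set is empty")
    if current_scores.keys() != candidate_scores.keys():
        raise ValueError("both models must be scored on the same tasks")
    n = len(current_scores)
    current_mean = sum(current_scores.values()) / n
    candidate_mean = sum(candidate_scores.values()) / n
    return (candidate_mean - current_mean) >= min_gain
```

If the function returns false, nothing changes: the current weights stay active and the candidate is simply not deployed.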

Question 41

Can we see who uses the system and how?

Accepted Answer

A built-in admin dashboard shows active sessions, queries per day, response time, and storage usage. Per-user activity is visible only to designated admins.

Query content is not logged centrally by default. Only metadata. If your compliance posture requires content logging, it is a config option and an audit decision, not a default.
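Metadata-only logging can be pictured as a record builder that leaves the query text out unless content logging is switched on. The field names and the `log_record` function are hypothetical; this is a sketch of the idea, not the dashboard's real schema.

```python
from datetime import datetime, timezone


def log_record(user_id: str, query: str, latency_ms: int, log_content: bool = False) -> dict:
    """Build one log entry; the query text is stored only when log_content is True."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "query_chars": len(query),  # size only, not the text itself
        "latency_ms": latency_ms,
    }
    if log_content:
        # Opt-in: an explicit audit decision rather than the default.
        record["query"] = query
    return record
```

With the default settings the record carries who asked, when, and how long it took, but nothing about what was asked.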

Question 42

What if you, as the consultant, disappear?

Accepted Answer

Fair question to ask of an independent. Two layers of protection:

No proprietary code. Everything I install is open-source. Ollama, Open WebUI, AnythingLLM, n8n, all backed by communities and companies that will outlive me.
Written runbook. A plain-English operations manual delivered at handover. Any competent sysadmin (yours, an MSP, or another independent) can pick it up. The stack is common enough that finding someone is a hiring problem, not a sourcing problem.

If I get hit by a bus, your appliance still runs, and someone else can maintain it. That is the design intent, not an afterthought.

Question 43

What is NOT included in a standard engagement?

Accepted Answer

Things I will not do, or will quote separately:

Custom model training from scratch (fine-tuning is in scope, pre-training is not).
Consumer-facing chatbots or marketing assistants.
Ongoing operations as a managed service. You own the appliance.
Cloud-hosted versions of the same stack. Defeats the point.
