COMPLIANCE · Field note · 001

Public AI assistants in higher education: the GDPR exposure most institutions have not assessed

When staff paste student work into a public AI assistant - ChatGPT, Claude, Gemini, whichever - the institution becomes the controller for a processor it never contracted. A walk through GDPR Articles 5, 28, 32 and 35, the rulings already issued, and the architectural fix that does not require banning AI.

Published: May 4, 2026
Reading time: 11 min
Author: Nikita Chetverikov
Category: on-prem · compliance

01 - THE FORWARDED SCREENSHOT
A grading workflow, in three messages.

A university DPO forwarded me three screenshots last month. The first was a Slack message from a tenured professor: "ChatGPT cut my grading time in half - anyone else?" The second was a thread of replies, mostly enthusiastic. The third was the screenshot that ended the DPO's afternoon: a ChatGPT conversation, pasted as proof, containing the unredacted text of a Master's thesis along with the student's name in the file header.

The thesis had not been submitted publicly. The student had not consented to its content being processed by a third party. The professor had not informed the institution that personal data was being shared outside the controller's perimeter. All three are problems under European data-protection law, and only one of them - the consent gap - is the one most academics worry about.

This is the conversation I have with every higher-education prospect on the first call. It is not a lecture about ethics. It walks through the four GDPR articles already triggered, the rulings already issued, and the one architecture that closes the gap without forcing the institution to ban a tool the staff will use anyway.

Heuristic

If a member of staff is using any public AI assistant for tasks touching student records, grant proposals, or unpublished research, the institution is already the controller for a processor it has not contracted. The only question is whether the auditor finds out before the regulator does.

02 - WHO BECOMES THE CONTROLLER
Art. 4(7), Art. 28, and the chain of custody.

Under Article 4(7), the controller is the entity that determines the purposes and means of processing personal data. When a professor uses a public AI assistant to grade an essay, the institution - not the individual professor, not the AI vendor - is the controller. The professor is acting in the scope of employment; the institution prescribed the educational record; the legal basis for processing the student's work flows from the enrolment contract.

This is the part most academics get wrong. They reason that the assistant is a personal tool, that they signed up with a personal account, and that what they paste into it is their own business. That reasoning collapses as soon as the input contains data the institution holds under a duty of confidentiality.

  • The student's essay is personal data - it identifies the student, contains pseudonymous evaluations, and may contain special-category data (mental health disclosures, ethnic identity in personal-statement text).
  • The grade and feedback are personal data the controller is generating about the data subject. Article 22, which restricts decisions based solely on automated processing, comes into play once a model's output effectively determines the grade.
  • The grant proposal, the unpublished paper, the supervision notes - all of these contain data the institution is contractually or statutorily obliged to protect.

Once that data leaves the institutional perimeter and enters the vendor's infrastructure, the vendor is processing it on the institution's behalf. Under Article 28, that processing requires a written contract specifying scope, instructions, security measures, sub-processors, and audit rights. A free-tier acceptance of consumer Terms of Service is not such a contract. An enterprise agreement is not automatically enough either - it has to be evaluated against the institution's risk profile, the data categories actually being processed, and any third-country transfer implications under Schrems II. The data-residency engineering trade-offs of on-premise LLM deployment are where this controller-processor question stops being legal theory and becomes an architectural decision.
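The Article 28 elements named above (scope, instructions, security measures, sub-processors, audit rights) can be tracked as a plain checklist. What follows is a minimal sketch, not a legal test: the `VendorAgreement` record and its field names are hypothetical and are not taken from any regulator's template.

```python
from dataclasses import dataclass, fields


@dataclass
class VendorAgreement:
    """Hypothetical record of what a signed vendor agreement covers."""
    vendor: str
    defines_scope: bool = False
    documents_instructions: bool = False
    specifies_security_measures: bool = False
    lists_sub_processors: bool = False
    grants_audit_rights: bool = False


def missing_article_28_elements(agreement: VendorAgreement) -> list[str]:
    """Return the names of the checklist elements the agreement does not cover."""
    missing = []
    for field in fields(agreement):
        value = getattr(agreement, field.name)
        # Only the boolean checklist fields count; `vendor` is a label.
        if isinstance(value, bool) and not value:
            missing.append(field.name)
    return missing


# Accepting consumer terms of service ticks none of the boxes.
consumer_tos = VendorAgreement(vendor="consumer terms of service")
print(missing_article_28_elements(consumer_tos))
```

An empty list only means the checklist is complete. Whether the agreement is adequate for the data categories and transfer risks involved still needs the evaluation described above.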

"The institution is liable for processing it never knew about, by a processor it never contracted, of data it never authorised to leave."

- Field journal, conversation with university DPO, March 2026

03 - THE THREE ARTICLES AUDITORS QUOTE
Art. 5(1)(f), Art. 32, Art. 35.

A regulator does not need to read your AI policy to issue a finding. They read three articles, in order, and ask three questions. The institution either has documented answers or it does not.

  1. Article 5(1)(f) - integrity and confidentiality. Personal data must be processed in a manner that ensures appropriate security against unauthorised or unlawful processing. The auditor's question: "Show me the document trail that proves student data has not been transferred to an unauthorised processor." If the answer involves screenshots and Slack threads, the answer is no.
  2. Article 32 - security of processing. The controller must implement appropriate technical and organisational measures, including pseudonymisation and encryption where appropriate, and must regularly test and evaluate effectiveness. The auditor's question: "What measures prevent staff from pasting personal data into a public AI service, and how is the effectiveness of those measures tested?"
  3. Article 35 - Data Protection Impact Assessment. Required before any new high-risk processing. AI-assisted evaluation of educational records meets every published criterion for high risk: systematic evaluation, automated decision-making with significant effects, large-scale processing of data concerning vulnerable subjects (students). The auditor's question: "Show me the DPIA for AI-assisted grading."
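One technical measure of the kind the Article 32 question asks about is screening text for identifiers before it leaves the network. The sketch below is illustrative only: the student-number format (`S` followed by seven digits) is a hypothetical assumption, and two regular expressions will not catch names or free-text disclosures. Masking of this kind is also weaker than pseudonymisation in the GDPR sense, because no re-identification key is kept.

```python
import re

# Illustrative patterns only. "student_id" assumes a hypothetical format
# ("S" followed by seven digits); substitute the institution's own scheme.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "student_id": re.compile(r"\bS\d{7}\b"),
}


def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return the matches per pattern; an empty dict means nothing was found."""
    hits = {name: pattern.findall(text) for name, pattern in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}


def mask_identifiers(text: str) -> str:
    """Replace each matched identifier with a placeholder naming its type."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[{name.upper()}]", text)
    return text


sample = "Thesis by S1234567, contact anna@example.edu"
print(find_identifiers(sample))
print(mask_identifiers(sample))
```

A check like this is testable, which is the second half of the auditor's question: a fixed set of sample documents can be run through it on a schedule and the miss rate recorded.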
  • GDPR articles triggered: 4 minimum. Art. 4(7), 5(1)(f), 28, 32. Art. 35 if no DPIA exists. Art. 22 if AI influences the grade.
  • Maximum administrative fine: €20M or 4% turnover. Whichever is higher, under Art. 83(5). Public-sector caps vary by member state but the reputational damage does not.
  • Outbound connections required: 0 for compliant alternative. A locally-hosted LLM removes the controller-processor relationship entirely. No transfer means no Article 28 obligation.

The DPIA is the part most institutions skip - and the part most regulators ask for first. Without it, every other defence collapses; the controller cannot show that risk was identified, mitigated, or accepted at institutional level. With it, the institution is at least defensible when something goes wrong.
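The screening step that precedes a DPIA can be written down explicitly. The sketch below assumes the nine criteria from the Article 29 Working Party guidelines on DPIAs (WP 248 rev.01, endorsed by the EDPB) and their rule of thumb that processing meeting two or more criteria will in most cases need a DPIA. The criterion names are abbreviated paraphrases, and the mapping of AI-assisted grading onto them follows the description in item 3 above.

```python
# Criteria paraphrased from the WP 248 rev.01 guidelines; wording is abbreviated.
CRITERIA = (
    "evaluation_or_scoring",
    "automated_decision_with_significant_effect",
    "systematic_monitoring",
    "sensitive_or_highly_personal_data",
    "large_scale_processing",
    "matching_or_combining_datasets",
    "vulnerable_data_subjects",
    "innovative_technology",
    "blocks_a_right_or_service",
)


def dpia_likely_required(criteria_met: set[str], threshold: int = 2) -> bool:
    """Apply the rule of thumb: `threshold` or more criteria met suggests a DPIA."""
    unknown = criteria_met - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(criteria_met) >= threshold


# AI-assisted grading as characterised in the text above.
ai_grading = {
    "evaluation_or_scoring",
    "automated_decision_with_significant_effect",
    "large_scale_processing",
    "vulnerable_data_subjects",
}
print(dpia_likely_required(ai_grading))
```

The function answers only whether an assessment is needed. The DPIA itself, with its risk identification and mitigation record, remains a document the institution has to write.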

04 - WHAT THE DPAS ALREADY SAID
Italy, Hamburg, and the EDPB.

European data-protection authorities have not been quiet on this. The pattern across rulings is consistent enough that any institution still operating without an architectural answer is doing so against published guidance.

  • Italy (Garante per la protezione dei dati personali) issued the first temporary block on ChatGPT in March 2023, citing absence of legal basis for training data and insufficient age verification. The block was lifted after remediation, but the underlying findings - particularly on processor obligations and the inadequacy of consumer-grade Terms of Service for institutional use - were never reversed. A subsequent investigation, concluded in late 2024, confirmed Article 5 and Article 6 violations and resulted in a €15 million GDPR fine against OpenAI for unlawful ChatGPT training data processing.
  • Hamburg (HmbBfDI) published guidance specifically aimed at educational institutions in 2023, updated through 2024 and 2025. The guidance is direct: pasting student data into a public AI service without a contract that meets Article 28 requirements is unlawful, regardless of the staff member's intent. The guidance recommends institutional alternatives - explicitly including locally-hosted models - as the only route to compliant AI use in teaching.
  • The European Data Protection Board (EDPB) has issued opinions confirming that risk assessments must address third-country transfers (Schrems II), processor accountability (Art. 28), and the specific risks of large language models trained on web-scraped corpora. The Board's framing is that AI compliance is not a checkbox; it is an ongoing accountability obligation. The IAPP's analysis of the EDPB Opinion 28/2024 on personal data processing in AI models walks through the four-question framework regulators are now applying to controller-processor accountability for large language models.
Strategic frame

The DPAs are not asking institutions to ban AI. They are asking institutions to use AI in a configuration where the controller-processor chain is documented, the data flow is bounded, and the risk is assessed. Locally-hosted models are the simplest architecture that satisfies all three.

05 - THE FERPA ECHO
Why this matters for US-EU programmes.

Many European universities run joint degree programmes, exchange agreements, or research collaborations with US institutions. As soon as a student record crosses the Atlantic - even as a single line in an evaluation - two regulatory regimes attach at once: GDPR on the European side, and FERPA (the Family Educational Rights and Privacy Act) on the American.

FERPA is narrower than GDPR but the overlap with AI use is precise. FERPA prohibits disclosure of personally identifiable information from education records to third parties without consent, with limited exceptions. A US institution that allows staff to paste student records into a public AI assistant may be making an unauthorised disclosure under FERPA, while the partnered European institution simultaneously becomes a controller for an unauthorised processor under GDPR. Both regimes can trigger off the same screenshot.

The compliance officer's question gets harder when two regulators are watching. A locally-hosted LLM simplifies both regimes at once. There is no disclosure under FERPA because the data does not leave the institution. There is no third-country transfer under GDPR because there is no transfer at all. The institutional answer to "where did the data go?" is the same in both jurisdictions: it stayed here. The same regulatory pressure already drove a comparable architecture in legal services, documented in our notes on a private RAG deployment for contract review at a regulated law firm.

06 - THE ARCHITECTURAL FIX
Private LLM, not a banning policy.

Banning public AI assistants does not work. Staff use them anyway, on personal devices, on home networks, with the same student data - and the institution loses the ability to even see the problem. Detection becomes impossible, training turns adversarial, and the DPO writes memos nobody reads.

The architectural fix is to give staff something that does the job, sits inside the institutional perimeter, and is governed by the same data-protection regime as every other system on the network. A locally-hosted large language model - Llama 3.3 or Qwen 2.5, served through Ollama, with a familiar interface like Open WebUI or AnythingLLM - does this. The model runs on hardware the institution owns. The data never leaves the network. The Article 28 processor relationship dissolves because there is no third party to process anything. A worked example of this rollout is documented in our notes from a 14-day private LLM deployment.
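The claim that the data never leaves the network can be checked against configuration. The sketch below flags any configured endpoint whose host is not `localhost` or an IP literal in a private or loopback range. It deliberately does not resolve hostnames, so a named internal host is reported as unverified rather than trusted. Port 11434 is Ollama's default; the other addresses are made-up examples.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_host(host: str) -> bool:
    """True for 'localhost' or an IP literal in a private or loopback range."""
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Hostnames are not resolved here, so they
        # are treated as unverified rather than assumed internal.
        return False
    return address.is_private or address.is_loopback


def unverified_endpoints(urls: list[str]) -> list[str]:
    """Return the URLs whose host could not be confirmed as internal."""
    return [url for url in urls if not is_internal_host(urlparse(url).hostname or "")]


# Hypothetical configuration for the inference node and the web UI.
configured = [
    "http://localhost:11434",      # Ollama default port
    "http://10.20.0.5:3000",       # example internal web UI address
    "https://api.example.com/v1",  # example external endpoint
]
print(unverified_endpoints(configured))
```

A configuration check complements network-level controls such as firewall rules and segmentation; it does not replace them.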

What this means for the DPIA

  • Data flow becomes mappable on one page. User → internal web UI → local inference node → audit log. No external endpoints, no DNS resolutions to public services, no transfer.
  • Processor obligations dissolve. No Article 28 contract is required because no processor exists. The model weights are static artefacts under the institution's custody, not a service operated by another party.
  • Article 32 measures are documentable. Encryption at rest, network segmentation, role-based access, audit log immutability - all standard institutional IT controls, applied to one more system on the network.
  • Article 35 conclusions are defensible. The DPIA can document the residual risk as "comparable to existing institutional systems," because it is.
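Audit log immutability, named in the list above, is commonly approximated with a hash chain: each entry's hash covers its own content and the previous entry's hash, so a later edit is detectable. The sketch below assumes hypothetical field names and omits timestamps and storage. It makes tampering evident, not impossible, and a real deployment would also need write-once storage or external anchoring.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], user: str, action: str) -> None:
    """Append an entry whose hash covers its content and the previous hash."""
    previous_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"user": user, "action": action, "previous_hash": previous_hash}
    log.append({**body, "hash": _digest(body)})


def chain_is_intact(log: list[dict]) -> bool:
    """Recompute every hash and link; any mismatch means the log was altered."""
    previous_hash = GENESIS_HASH
    for entry in log:
        body = {key: entry[key] for key in ("user", "action", "previous_hash")}
        if entry["previous_hash"] != previous_hash or entry["hash"] != _digest(body):
            return False
        previous_hash = entry["hash"]
    return True


audit_log: list[dict] = []
append_entry(audit_log, "staff-042", "prompt submitted")
append_entry(audit_log, "staff-042", "response returned")
print(chain_is_intact(audit_log))

audit_log[0]["action"] = "edited after the fact"
print(chain_is_intact(audit_log))
```

The first check passes and the second fails, because the stored hash of the edited entry no longer matches its content.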

What it costs

For a department of 30-50 staff with research and grading workloads, a single Mac Studio appliance with a 70-billion-parameter model handles interactive use comfortably - the throughput numbers behind that claim are in our Apple Silicon inference benchmarks for the M4 Max and M3 Ultra as on-premise LLM nodes. Hardware budget sits in the four-figure range, one-time. Setup is a two-week engagement. Operational overhead is the same as any other internal service - patches, backups, periodic review.
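The sizing arithmetic behind a 70-billion-parameter model on a single machine is short. The sketch below counts memory for the weights alone at a few quantisation levels. The bit widths are assumptions, since the text above does not state which quantisation is used, and the KV cache and runtime overhead come on top of these figures.

```python
def weight_memory_gb(parameters_billion: float, bits_per_weight: int) -> float:
    """Memory needed for the model weights alone, in decimal gigabytes."""
    total_bytes = parameters_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed quantisation levels, for illustration only.
for bits in (16, 8, 4):
    print(f"70B at {bits}-bit: {weight_memory_gb(70, bits):.0f} GB")
```

At 4 bits per weight the weights come to roughly 35 GB, which is why the amount of unified memory, more than raw compute, tends to drive the hardware choice.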

Compare that to a single fine, a single subject-access request that surfaces undocumented processing, or one news cycle in which the institution explains to parents and applicants that student records were pasted into a US service without authorisation. The architecture is not the expensive option. The status quo is. The broader market is moving the same way: Menlo Ventures' 2025 mid-year analysis of enterprise LLM spend tracks a measurable shift toward on-premise and self-hosted inference in regulated sectors, with budgets reallocating away from public-API consumption.

"You do not need to ban the tool. You need to host the tool. The first ends a conversation; the second ends the liability."

- Closing line of every DPO call, 2026

The rest is engineering. Model size for the team, hardware sizing, integrating the document corpus for retrieval - these are problems with known answers. The hard part is naming what the institution is buying: not faster grading, but the legal posture that makes any AI use defensible at all.

07 - NEXT STEP
GDPR-compliant private LLM deployment for your institution

The architectural fix above - a private LLM for the university, network-isolated, with documentable Article 32 controls and a DPIA conclusion that survives audit - is not a research project. It is a two-week engagement with a known cost envelope. The harder question is which model size, which corpus to retrieve over, and which staff workflows to migrate first.

That scoping is exactly what a structured engagement covers: workflow analysis, hardware sizing, model selection, RAG over the institutional document corpus, and a written roadmap with timeline and total cost. These are the artefacts a DPO, a CIO, and a finance director each need before signing off on on-premise LLM compliance as the standing posture, not a pilot. For European universities weighing a GDPR-compliant private LLM deployment, the full scope (DPIA-ready audit, on-premise inference hardware sizing, Article 28 documentation and a written rollout plan) is described on the private LLM deployment services for GDPR-regulated institutions page, alongside indicative pricing for on-premise AI engagements across the EU and UAE.

Nikita Chetverikov
Fullstack · Private AI