The EU AI Act's high-risk system provisions became fully enforceable in August 2025. Nine months later, most organizations deploying AI in the European Union still do not understand what the regulation actually requires at the technical level. They understand the categories. They understand the risk tiers. They have read the summaries. What they have not done is map the specific articles to specific technical controls and asked the question: does our current infrastructure satisfy these requirements, or are we hoping that our existing SOC 2 certification covers it?
It does not. SOC 2 does not cover it. ISO 27001 does not cover it. Neither framework was designed for AI-specific data segregation, and the AI Act's requirements go significantly beyond what either framework addresses. This post maps the regulation to technical reality, article by article, and explains what data segregation actually means when AI models are in the pipeline.
The Enforcement Timeline Is Not Upcoming. It Is Now.
The AI Act entered into force on August 1, 2024. The prohibitions on unacceptable-risk AI systems applied from February 2, 2025. The provisions for high-risk AI systems, which contain the data governance requirements that most organizations need to worry about, became applicable on August 2, 2025. General-purpose AI model obligations applied from August 2, 2025. We are past every major enforcement date.
The penalties are not theoretical. For violations of the prohibited AI practices provisions, fines can reach 35 million EUR or 7% of total worldwide annual turnover, whichever is higher. For violations of the high-risk AI system requirements, including the data governance requirements we are about to discuss, fines can reach 15 million EUR or 3% of global turnover. For supplying incorrect or misleading information to authorities, fines can reach 7.5 million EUR or 1% of turnover. These are per-violation figures. A systematic failure in data governance across multiple deployed models creates multiplicative exposure.
The AI Act's penalties are calculated against global turnover, not EU revenue. A company with 10 billion EUR in global revenue faces up to 700 million EUR in fines for prohibited practices violations and 300 million EUR for high-risk system violations, regardless of how much of that revenue comes from the EU.
Article 10: Data Governance for Training Data
Article 10 is the core data governance provision. It establishes requirements for training, validation, and testing data sets used in high-risk AI systems. The article requires that these data sets be "relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose." That is the language most people cite. But the technical requirements are in the details.
Training Data Must Not Leak Into Inference
Article 10(2) requires that data governance practices address "the relevant design choices" and "data collection processes and the origin of data, and in the case of personal data, the original purpose of the data collection." Article 10(4) requires that data sets "take into account, to the extent required by the intended purpose, the characteristics or elements that are particular to the specific geographical, contextual, behavioural or functional setting."
What this means technically is that training data must be segregated from inference data. A model trained on European patient data must not leak that training data during inference on new patient inputs. This is not just a matter of not returning training examples verbatim. Model inversion attacks can reconstruct training data from model outputs. Membership inference attacks can determine whether a specific data point was in the training set. If your model leaks training data through its outputs, you are in violation of Article 10's data governance requirements because you have failed to maintain the integrity and segregation of the training data set.
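To make the membership inference threat concrete, here is a minimal sketch of its simplest form, the confidence-thresholding attack: overfitted models tend to be more confident on records they were trained on. The model interface, the threshold, and the probed record are illustrative placeholders, not part of any specific deployment; real attacks calibrate the threshold with shadow models.

```python
# Minimal sketch of a confidence-thresholding membership inference attack.
# `model`, the threshold, and the probed record are illustrative placeholders.

def membership_inference(model, record, threshold: float = 0.95) -> bool:
    """Guess whether `record` was part of the model's training set.

    Overfitted models tend to be more confident on examples they memorized
    during training, so an unusually high top-class confidence is treated
    as evidence of membership.
    """
    confidence = max(model.predict_proba(record))
    return confidence >= threshold

# Usage sketch (hypothetical deployed model and candidate record):
# probably_trained_on = membership_inference(deployed_model, candidate_patient_record)
```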
Conventional defenses against training data leakage include differential privacy during training and output filtering. Differential privacy introduces accuracy loss that may itself conflict with Article 15's accuracy requirements (discussed below). Output filtering is heuristic and incomplete. Neither approach provides a mathematical guarantee of non-leakage. Fully Homomorphic Encryption provides that guarantee: if inference inputs and outputs are encrypted, and the model operator never holds plaintext, then no training data can be extracted from the inference pipeline because the pipeline never produces plaintext outputs that could be analyzed for training data artifacts.
Data Provenance and Documentation
Article 10(2)(b) requires documentation of "data collection processes and the origin of data." This means you need an auditable record of every data point used in training, where it came from, and how it was processed. Article 10(2)(f) requires "examination in view of possible biases that are likely to affect the health and safety of persons." This means you need provenance records detailed enough to support bias audits.
H33-74 attestation generates a cryptographic proof at every data processing step. When training data is ingested, processed, augmented, and used for training, each step produces an attestation. These attestations form a tamper-evident chain. When a regulator asks for documentation of your data collection processes, you do not produce a manually maintained spreadsheet. You produce a cryptographic chain that proves, mathematically, which data was processed, when, and in what order. The chain cannot be forged, backdated, or modified after the fact.
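The mechanics of such a chain are straightforward to illustrate. The sketch below is not the H33-74 wire format (which also carries signatures over each record); it assumes SHA-256 and a handful of illustrative field names, and shows only why hashing each record together with its predecessor makes any later edit or backdating detectable.

```python
import hashlib
import json
import time

def make_attestation(prev_hash: str, step: str, data_digest: str) -> dict:
    """Append one processing step (ingest, augment, train, ...) to the chain.

    Each record commits to the previous record's hash, so editing or
    reordering any historical step changes every hash that follows it.
    (Illustrative fields only; the real H33-74 format also carries signatures.)
    """
    record = {
        "step": step,                # e.g. "ingest", "augment", "train"
        "data_digest": data_digest,  # hash of the data artifact processed at this step
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    return record

# Build a three-step provenance chain for one training run.
genesis = make_attestation("0" * 64, "ingest", hashlib.sha256(b"raw dataset v1").hexdigest())
step2 = make_attestation(genesis["hash"], "augment", hashlib.sha256(b"augmented dataset v1").hexdigest())
step3 = make_attestation(step2["hash"], "train", hashlib.sha256(b"model weights v1").hexdigest())
```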
Article 15: Accuracy, Robustness, and Cybersecurity
Article 15 requires that high-risk AI systems "achieve an appropriate level of accuracy, robustness, and cybersecurity, and perform consistently in those respects throughout their lifecycle." That single sentence contains three requirements that interact in ways most compliance teams have not considered.
Accuracy Must Be Maintained Under Security Controls
If your security controls degrade model accuracy, you have a compliance conflict. Differential privacy, as noted above, trades accuracy for privacy. Aggressive input sanitization can remove features that the model depends on. Quantization for secure deployment can reduce prediction precision. Article 15 does not say "achieve accuracy unless your security controls interfere." It says achieve accuracy AND robustness AND cybersecurity. All three. Simultaneously.
FHE-based inference with an exact-arithmetic scheme does not degrade accuracy. BFV (Brakerski/Fan-Vercauteren) inference produces results that are bit-for-bit identical to plaintext evaluation of the same integer-encoded model. CKKS (approximate arithmetic) introduces bounded error that is orders of magnitude smaller than the noise already present in neural network weights. Neither scheme requires an accuracy-security tradeoff. You get mathematical data protection with no accuracy cost.
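To see why exact-arithmetic inference can match plaintext results, it helps to look at the fixed-point encoding both scheme families apply before encryption. The plaintext-only sketch below uses an arbitrary scale and toy dimensions and performs no actual encryption; it just shows that once weights and activations are encoded as integers, the arithmetic itself introduces no error beyond the initial rounding.

```python
import numpy as np

# Fixed-point encoding sketch: exact (BFV-style) and approximate (CKKS-style) pipelines
# encode real-valued weights and activations as scaled integers before encryption.
# The scale below (2**20) and the layer dimensions are illustrative choices only.
SCALE = 2 ** 20

def encode(x: np.ndarray) -> np.ndarray:
    return np.round(x * SCALE).astype(np.int64)

def decode(x: np.ndarray, levels: int = 1) -> np.ndarray:
    return x / (SCALE ** levels)

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))
activations = rng.normal(size=8)

# One encoded matrix-vector product: the integer arithmetic is exact, so the only
# deviation from float inference is the initial rounding onto a grid of width 1/SCALE.
encoded_result = encode(weights) @ encode(activations)
error = np.max(np.abs(decode(encoded_result, levels=2) - weights @ activations))
print(f"max deviation from float inference: {error:.2e}")  # roughly a few 1e-6 at this scale
```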
Robustness Against Adversarial Inputs
Article 15(4) addresses robustness: high-risk systems must be as resilient as possible against errors, faults, and inconsistencies that may occur within the system or the environment in which it operates. Article 15(5) goes further and explicitly names AI-specific attacks, including inputs designed to cause the model to make a mistake (adversarial examples or model evasion), among the vulnerabilities that technical measures must address. Adversarial machine learning attacks, where carefully crafted inputs cause models to produce incorrect outputs, therefore put a system out of compliance if it is not robust against them.
When inference operates on encrypted inputs, adversarial attacks become significantly harder. An attacker cannot craft adversarial perturbations without knowing the plaintext values of the input. The encryption converts the adversarial optimization problem from a tractable gradient-based search to an intractable search over the ciphertext space. This does not eliminate all adversarial risk, but it raises the bar from "graduate student with a GPU" to "break lattice-based cryptography," the same family of hardness assumptions behind the post-quantum key exchange now being deployed to protect TLS connections.
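For readers who have not seen one, the gradient-based search is easy to write down. The sketch below implements the classic fast gradient sign method against a toy logistic classifier; the weights and epsilon are arbitrary illustrations. Note that every step needs the plaintext input, which is exactly what an encrypted inference pipeline withholds from a third party.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturbation(w: np.ndarray, b: float, x: np.ndarray, y: int, eps: float = 0.05) -> np.ndarray:
    """Fast gradient sign method against a logistic classifier p(y=1|x) = sigmoid(w.x + b).

    The gradient of the cross-entropy loss with respect to the *plaintext* input x
    is (p - y) * w; the attack nudges x by eps in the direction of that gradient's sign.
    Without plaintext access to x, this gradient cannot be formed.
    """
    p = sigmoid(float(w @ x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Usage sketch: push a correctly classified input across the decision boundary.
w, b = np.array([2.0, -1.5, 0.5]), 0.1
x, y = np.array([0.2, 0.4, -0.1]), 0
x_adv = fgsm_perturbation(w, b, x, y)
```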
Cybersecurity as a Regulatory Requirement
Article 15(5) requires that "high-risk AI systems shall be resilient against attempts by unauthorised third parties to alter their use, outputs or performance by exploiting system vulnerabilities." This explicitly requires cybersecurity controls around the AI system itself, not just the infrastructure it runs on. Your cloud provider's security certifications cover the infrastructure. They do not cover the AI-specific attack surface: model theft, data extraction, inference manipulation, and training data poisoning.
FHE addresses the data extraction and inference manipulation vectors by eliminating the plaintext attack surface. There is no plaintext to extract because the model operator never holds it. And because the hash of every input and output ciphertext is bound into the attestation, tampering with ciphertexts in flight changes those hashes and fails attestation verification.
Article 13: Transparency and Information Provision
Article 13 requires that high-risk AI systems "be designed and developed in such a way as to ensure that their operation is sufficiently transparent to enable deployers to interpret a system's output and use it appropriately." This includes providing information about "the level of accuracy, including its metrics, the robustness and cybersecurity measures, and any known or foreseeable circumstances that may have an impact on that expected level of accuracy."
Transparency does not mean the model must be interpretable in the white-box sense. It means the deployer must have sufficient information to understand what the model does, how it performs, and when it might fail. H33-74 attestations contribute to Article 13 compliance by providing verifiable, per-inference records of model version, configuration, input hash, and output hash. A deployer can verify that every inference was performed by the claimed model version, that inputs were processed in the claimed configuration, and that outputs correspond to the recorded inputs. This is more transparency than any conventional inference pipeline provides, where the deployer must trust the model operator's logs.
Article 14: Human Oversight
Article 14 requires that high-risk AI systems "be designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons during the period in which they are in use." Human oversight means a human can intervene, override, or halt the system. It also means the human has sufficient information to exercise that oversight meaningfully.
Cryptographic attestation enables meaningful human oversight by providing the information substrate. An oversight officer reviewing AI decisions can verify, for each decision, that the correct model was used, that the input was processed without tampering, and that the output matches the attested record. Without cryptographic verification, the oversight officer is reviewing logs that the model operator generated and controls. That is not oversight. That is reviewing the operator's self-assessment.
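As a sketch of what that looks like from the reviewer's side (with illustrative field names and hashing, not the H33-74 format): recompute the hashes of the ciphertexts that were actually processed for a decision and compare them, along with the model version, against the attested record.

```python
import hashlib

def verify_decision(attestation: dict, input_ciphertext: bytes, output_ciphertext: bytes,
                    expected_model_version: str) -> bool:
    """Check one AI decision against its attestation record (illustrative fields).

    The oversight officer verifies independently rather than trusting operator logs:
    the attested hashes must match the ciphertexts that were actually processed,
    and the attested model version must match the version approved for use.
    """
    return (
        attestation["input_hash"] == hashlib.sha256(input_ciphertext).hexdigest()
        and attestation["output_hash"] == hashlib.sha256(output_ciphertext).hexdigest()
        and attestation["model_version"] == expected_model_version
    )
```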
What "Data Segregation" Actually Means Under the AI Act
The term "data segregation" appears throughout AI Act guidance and supporting documentation. It refers to three distinct requirements that the regulation implies but does not always state in plain language.
Segregation Requirement 1: Training Data Cannot Leak Into Inference Outputs
The model must not reveal its training data through its outputs. This includes verbatim memorization (the model regurgitates training examples), statistical leakage (membership inference attacks succeed), and reconstruction (model inversion attacks recover training data from confidence scores or embeddings). Conventional mitigations include differential privacy (which degrades accuracy), output clipping (which is incomplete), and access controls (which are organizational, not mathematical).
FHE satisfies this requirement by encrypting inference outputs. Even if the model internally memorizes training data, the output is ciphertext. The model operator cannot inspect the output for training data leakage because the output is encrypted. The data owner, who receives the decrypted output, sees only the model's response to their input. They cannot perform membership inference or model inversion attacks without access to the model weights and repeated query access, which the encrypted pipeline restricts.
Segregation Requirement 2: Inference Inputs Must Not Be Accessible to the Model Operator
The model operator processes data on behalf of the data owner. Under the AI Act's data governance requirements, the operator must not have access to the plaintext inference inputs. This is distinct from GDPR's data processing requirements, which permit data processing under appropriate legal bases. The AI Act adds the requirement that the processing itself must preserve data governance properties, which means the operator must not be able to inspect, copy, or exfiltrate inference inputs.
This is the requirement that SOC 2 and ISO 27001 cannot satisfy. SOC 2 Type II attests that an organization has implemented and operates security controls. It does not attest that the organization cannot access the data it processes. ISO 27001 certifies an information security management system. It does not certify that the certified organization is mathematically prevented from accessing client data. Both frameworks trust the organization to follow its own policies. The AI Act requires more: it requires that data governance is enforced, not merely documented.
FHE enforces this requirement by construction. The model operator receives ciphertexts encrypted under the data owner's key. The operator does not have the key. The operator cannot decrypt the inputs. This is not a policy. It is a mathematical fact. The security relies on the hardness of the Learning With Errors (LWE) problem, which is believed to be resistant to both classical and quantum computers.
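For readers who want to see the assumption itself, here is a toy LWE instance. Given the public pair (A, b = A·s + e mod q), recovering the secret s is the LWE problem. The parameters below are far too small to be secure and are chosen purely for illustration.

```python
import numpy as np

# Toy LWE instance (parameters far too small for real security).
n, m, q = 16, 32, 3329          # secret dimension, number of samples, modulus
rng = np.random.default_rng(42)

s = rng.integers(0, q, size=n)           # secret vector (held by the data owner)
A = rng.integers(0, q, size=(m, n))      # public random matrix
e = rng.integers(-2, 3, size=m)          # small noise
b = (A @ s + e) % q                      # public vector

# The operator sees only (A, b). Recovering s from them is the LWE problem, which is
# believed hard for both classical and quantum computers at realistic parameter sizes
# (n in the hundreds to thousands, with appropriately chosen modulus and noise).
```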
Segregation Requirement 3: Model Outputs Must Not Reveal Training Data
This is related to Requirement 1 but distinct. Requirement 1 addresses the model's tendency to memorize training data. Requirement 3 addresses the information-theoretic content of the model's outputs. A model's confidence scores, embedding vectors, attention weights, and intermediate representations all contain information about the training distribution. Given enough queries, an attacker can reconstruct training data from these signals.
FHE addresses this by encrypting all outputs, including intermediate representations if the architecture exposes them. The data owner receives only the final decrypted output, not the model's internal state. The model operator sees only ciphertext at every layer. There are no confidence scores to analyze, no embedding vectors to cluster, no attention weights to invert. The information-theoretic leakage channel is closed because the channel is encrypted.
| Segregation Requirement | SOC 2 Coverage | ISO 27001 Coverage | FHE Coverage |
|---|---|---|---|
| Training data does not leak into inference | Not addressed | Not addressed | Encrypted outputs prevent analysis |
| Operator cannot access inference inputs | Policy-based only | Policy-based only | Mathematical impossibility |
| Outputs do not reveal training data | Not addressed | Not addressed | All outputs encrypted |
The SOC 2 and ISO 27001 Gap
Many organizations assume that their existing SOC 2 Type II report or ISO 27001 certification covers AI Act compliance. This assumption is wrong, and it creates significant regulatory exposure.
SOC 2 evaluates controls across five trust service criteria: security, availability, processing integrity, confidentiality, and privacy. None of these criteria include AI-specific controls. SOC 2 does not evaluate whether a model leaks training data. It does not evaluate whether inference inputs are protected from the model operator. It does not evaluate whether model outputs contain information that could be used to reconstruct training data. SOC 2 was designed for cloud service providers processing conventional data. It predates the AI-specific threat models that the AI Act addresses.
ISO 27001 certifies that an organization has implemented an information security management system (ISMS) with appropriate controls from Annex A. The 2022 revision added controls for cloud security and some data protection measures, but it does not include AI-specific data segregation controls. An organization can be ISO 27001 certified while operating AI systems that violate every data segregation requirement in the AI Act.
The gap is not that SOC 2 and ISO 27001 are bad frameworks. They are excellent frameworks for what they cover. The gap is that the AI Act creates new requirements that did not exist when these frameworks were designed, and neither framework has been updated to address them. Relying on SOC 2 or ISO 27001 as evidence of AI Act compliance is like relying on a food safety certification as evidence of building code compliance. Different risks, different requirements, different controls.
H33-74 Attestation as Compliance Evidence
The AI Act requires not just compliance but evidence of compliance. Article 11 requires a technical documentation package. Article 12 requires automatic logging of events throughout the AI system's lifecycle. Article 17 requires a quality management system with documented procedures. All of these requirements demand auditable records.
H33-74 attestation produces compliance evidence as a byproduct of normal operation. Every inference generates a 74-byte cryptographic proof containing the input ciphertext hash, the output ciphertext hash, the model version commitment, the timestamp, and a triple-signed post-quantum signature bundle. These attestations are chained, so any modification to a historical record invalidates all subsequent records. The chain provides the automatic logging required by Article 12, the technical documentation substrate required by Article 11, and the quality evidence required by Article 17.
When a national supervisory authority requests evidence that your high-risk AI system complies with Article 10's data governance requirements, you present the attestation chain. Each attestation proves that a specific inference was performed on encrypted data, that the model operator never held plaintext, and that the output was encrypted. The chain proves that this was true for every inference, not just a sample. The signatures prove that the attestations were not fabricated after the fact. The post-quantum signature bundle ensures that the evidence remains valid even against future quantum computing attacks, which matters for records that may need to be verified years or decades after generation.
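Re-verifying such a chain is mechanical, which is the point. The sketch below shows the kind of check an auditor's tooling could run; the field names and hashing match the illustrative provenance example earlier, not the actual H33-74 format, which additionally carries signatures over each record.

```python
import hashlib
import json

def verify_chain(records: list[dict]) -> bool:
    """Walk a chain of attestation-like records and confirm nothing was altered.

    Each record's hash must recompute from its own contents, and each record must
    point at the hash of the record before it. Any edit, insertion, or backdating
    breaks one of these checks. (Illustrative fields; the real chain is also signed.)
    """
    prev_hash = "0" * 64
    for record in records:
        body = {k: v for k, v in record.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if record["hash"] != recomputed or record["prev_hash"] != prev_hash:
            return False
        prev_hash = record["hash"]
    return True
```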
Per-Inference Compliance vs. Point-in-Time Audits
SOC 2 Type II reports cover a specific audit period, typically 6 or 12 months. ISO 27001 certifications are valid for three years with annual surveillance audits. Both are point-in-time assessments. They attest that controls were in place during the audit period. They do not attest to what happened between audits or to any specific inference.
H33-74 attestation is per-inference. Every single model invocation produces its own cryptographic proof. There are no gaps between audits. There are no sampling strategies. There is no reliance on the operator's self-reported logs. The attestation is generated by the cryptographic pipeline itself, not by a human operator who might make mistakes, cut corners, or face pressure to misrepresent compliance status.
For organizations operating high-risk AI systems under the AI Act, per-inference compliance evidence is not a luxury. It is the difference between being able to demonstrate compliance for any specific decision that a regulator questions and hoping that your periodic audit report is sufficient. Given the penalties at stake, hope is not a compliance strategy.
Practical Implementation
Integrating FHE-based data segregation into an existing AI pipeline does not require replacing your models, retraining your systems, or migrating your infrastructure. The H33 pipeline operates as a wrapper around your existing inference endpoint. Inputs are encrypted client-side before they reach your infrastructure. Your model operates on the encrypted data using our homomorphic inference engine. Outputs are encrypted and returned to the client for decryption. Attestations are generated automatically at each step.
The integration surface is your model's input and output boundary. If your model accepts a JSON payload and returns a JSON response, the encrypted version accepts an encrypted payload and returns an encrypted response. The model's internal architecture does not change. The training pipeline does not change. The deployment infrastructure does not change. What changes is the data protection guarantee: from policy-based to mathematical.
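Schematically, the client-side change is small. The sketch below shows the shape of that boundary: encrypt before the request, decrypt after the response, with the FHE operations passed in as callables standing in for the client-side library. The endpoint URL is a placeholder, not a published API.

```python
import json
import urllib.request
from typing import Callable

def encrypted_inference(payload: dict,
                        encrypt: Callable[[bytes], bytes],
                        decrypt: Callable[[bytes], bytes],
                        endpoint: str = "https://inference.example.com/v1/predict") -> dict:
    """Call an existing JSON inference endpoint through a client-side FHE boundary.

    `encrypt` and `decrypt` stand in for the client-side FHE library (the secret key
    never leaves the client); the endpoint URL is a placeholder. The model behind the
    endpoint is unchanged; the only thing it ever receives is ciphertext.
    """
    ciphertext = encrypt(json.dumps(payload).encode())            # encrypt client-side
    request = urllib.request.Request(
        endpoint,
        data=ciphertext,
        headers={"Content-Type": "application/octet-stream"},
    )
    with urllib.request.urlopen(request) as response:
        encrypted_result = response.read()                        # server only saw ciphertext
    return json.loads(decrypt(encrypted_result))                  # decrypt client-side
```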
For organizations that need to demonstrate compliance to national supervisory authorities, we provide an attestation verification API that auditors can access directly. The API accepts an attestation and returns a verification result without revealing the underlying data. Auditors can verify every inference in your system without seeing a single data point. That is what data segregation means when it is implemented correctly.
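As an illustration of how that could look from the auditor's side, the sketch below submits an attestation to a verification endpoint and reads back a verdict. The URL and the response shape are placeholders, not documented API details; the point is that the request carries only the attestation, never the underlying data.

```python
import json
import urllib.request

def verify_attestation(attestation: dict,
                       api_url: str = "https://verify.example.com/v1/attestations/verify") -> bool:
    """Submit an attestation for verification and return the verdict.

    The URL and the {"valid": true/false} response shape are illustrative placeholders.
    Only the attestation itself crosses the wire; no plaintext data is ever exposed.
    """
    request = urllib.request.Request(
        api_url,
        data=json.dumps(attestation).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("valid", False)
```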
Map Your AI Act Compliance Gaps
We will review your current AI deployment architecture against the specific articles of the EU AI Act and identify where your existing controls fall short. No sales pitch. Just the technical gap analysis.
Schedule a Demo