The security industry has spent thirty years building walls. Firewalls, network segmentation, VPC isolation, private endpoints, air gaps. These tools work when the threat model is unauthorized network access. They fail completely when the threat model is the computation itself.
AI inference is the computation. When a model runs, the input data must be plaintext in processor memory. Every matrix multiplication, every attention head calculation, every activation function operates on unencrypted numbers. The data is exposed not because the network is insecure, but because the math requires it. No firewall addresses this. No VPC addresses this. No private endpoint addresses this. The data is plaintext because the model needs it to be plaintext.
This is the fundamental disconnect in how the industry discusses AI data protection. Organizations deploy network-layer controls and believe they have achieved data isolation. They have not. They have achieved network isolation. The data itself remains exposed at the point of computation, which is the only point that matters.
The VPC Illusion
Virtual Private Clouds are the most commonly cited data isolation mechanism for enterprise AI deployments. The argument is straightforward: deploy the model inside your VPC, and the data never leaves your network boundary. This is true at the network layer and irrelevant at the data layer.
Inside your VPC, the model still processes plaintext. The inference server's CPU or GPU memory contains your unencrypted data. Every administrator with access to the VPC can access that data. Every process running on the inference server can access that memory space. Every memory dump, core dump, or debugging session captures plaintext data. Every logging framework that captures inference inputs and outputs stores plaintext.
The VPC prevents external network access to the inference server. It does not prevent internal access. It does not encrypt the data during processing. It does not protect against a compromised instance within the VPC. It does not protect against a malicious or negligent administrator. It does not protect against a supply chain attack on the model serving infrastructure.
More importantly, VPC deployment of large language models is often impractical. Running a frontier model requires significant GPU infrastructure. Most organizations do not maintain GPU clusters capable of serving LLMs at production latency. They use hosted APIs precisely because the infrastructure cost of self-hosting is prohibitive. The VPC isolation argument assumes a deployment model that most organizations cannot afford.
Private Endpoints Are Network Controls, Not Data Controls
Cloud providers offer private endpoints that route API traffic over private network connections rather than the public internet. AWS PrivateLink, Azure Private Link, and Google Private Service Connect create network-level connectivity between your VPC and the provider's service. The traffic does not traverse the public internet. This is a network security control. It is not a data security control.
The private endpoint changes the network path. It does not change what happens at the destination. The AI vendor's inference server still receives your plaintext data, processes it in plaintext, and returns a plaintext response. The private endpoint ensures that no one can intercept the traffic on the public internet. It does nothing about the fact that the vendor's infrastructure processes your data without encryption.
If your security architecture equates private endpoints with data isolation, your architecture has a gap at the most critical point: the moment of computation.
The TEE Promise and Its Documented Failures
Trusted Execution Environments represent a more sophisticated approach to data isolation during computation. Intel SGX, AMD SEV, and ARM Confidential Compute Architecture create hardware-enforced enclaves where data is encrypted in memory and decrypted only inside the processor. The operating system, hypervisor, and other processes cannot access data inside the enclave. In theory, this solves the plaintext-during-computation problem.
In practice, TEEs have a troubled security history. The documented side-channel attacks against these technologies undermine the isolation guarantees they claim to provide.
Foreshadow (L1TF, 2018): Exploits speculative execution in Intel processors to extract data from SGX enclaves. An attacker running code on the same physical core can read enclave memory by triggering terminal page faults and recovering residual L1 data cache contents through a transient-execution covert channel. Intel issued microcode updates, but the attack demonstrated that SGX's isolation model is vulnerable to microarchitectural side channels.
AEPIC Leak (2022): Exploits the Advanced Programmable Interrupt Controller on Intel processors to read stale data from SGX enclaves without any side channel. This is an architectural bug, not a speculative execution vulnerability. It leaks enclave data through legitimate APIC register reads. No microcode fix fully addresses the underlying architectural issue.
CacheOut (2020): Exploits Intel's Transactional Synchronization Extensions to selectively evict data from the L1 cache, enabling targeted extraction of data from SGX enclaves. The attacker can choose which cache lines to evict and read, making the attack precise rather than probabilistic.
These are not theoretical attacks. They are published, peer-reviewed, and demonstrated against production hardware. Intel has issued mitigations for some of these vulnerabilities, but the pattern is clear: hardware-based isolation is subject to hardware-based attacks. Every new processor generation introduces new microarchitectural features, and every new microarchitectural feature introduces new potential side channels.
AMD SEV has its own vulnerability history. The SEVered attack demonstrated plaintext recovery from SEV-encrypted virtual machines. The undeSErVed attack bypassed AMD's Secure Encrypted Virtualization protections. AMD SEV-SNP addresses some of these issues but introduces additional complexity and has not yet accumulated the years of adversarial scrutiny that SGX has endured.
ARM CCA is newer still. It has not been deployed at scale for AI inference workloads, and its security properties are largely unvalidated by the broader research community. Trusting it for high-sensitivity data processing requires a degree of faith in ARM's implementation that has not yet been earned through adversarial testing.
The Hardware Vendor Trust Problem
All TEE approaches share a fundamental trust assumption: you must trust the hardware vendor. Intel's attestation service validates that an enclave is running genuine Intel SGX. AMD's attestation validates genuine AMD SEV-SNP. ARM's attestation validates genuine ARM CCA. If the hardware vendor is compromised, coerced, or negligent, the attestation is meaningless.
This is not a paranoid concern. Hardware vendors are subject to the same legal compulsion as software vendors. A government that can issue a FISA 702 order to a cloud provider can issue one to a hardware vendor. The TEE attestation proves that the hardware is genuine. It does not prove that the hardware has not been modified at the fabrication level to include a backdoor that the attestation does not detect.
The cryptographic community has a term for this: "trusting the manufacturer." It is the weakest form of trust in a security system. True data isolation should not require trusting any single vendor, hardware or software.
FHE: Encryption That Survives Computation
Fully Homomorphic Encryption is the only technology that maintains encryption through computation without trusting the compute infrastructure. The mathematical foundation is different from every approach discussed above. FHE does not isolate the computation environment. It encrypts the data in a way that allows computation to proceed on the ciphertext directly.
When data is encrypted with an FHE scheme, it becomes a set of polynomial coefficients in a lattice. Addition of two ciphertexts produces a ciphertext that, when decrypted, yields the sum of the two plaintexts. Multiplication of two ciphertexts produces a ciphertext that, when decrypted, yields the product. These homomorphic properties extend to the complex operations required for neural network inference: matrix multiplications, convolutions, and polynomial approximations of activation functions.
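The additive half of this contract can be seen in miniature in the classic Paillier cryptosystem, which is additively (not fully) homomorphic: multiplying two ciphertexts decrypts to the sum of the plaintexts. The sketch below uses deliberately tiny, insecure parameters purely to make the property concrete; real FHE schemes such as CKKS support both addition and multiplication over lattice-based ciphertexts with entirely different machinery.

```python
import math
import random

# Toy Paillier parameters -- far too small to be secure; illustration only
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)          # decryption constant

def enc(m):
    """Encrypt integer m (mod n) with fresh randomness."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    """Decrypt a ciphertext back to an integer mod n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Homomorphic addition: the product of two ciphertexts decrypts to the sum
c_sum = (enc(12) * enc(30)) % n2
assert dec(c_sum) == 42
```

The compute side never touches `lam` or `mu`; multiplying ciphertexts is all it can do, which is exactly the point.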
The critical property is that the compute infrastructure never possesses the decryption key. The data enters the system as ciphertext. Every computation operates on ciphertext. The result exits as ciphertext. Only the data owner, who holds the secret key, can decrypt the result. A breach of the compute infrastructure yields ciphertext that is computationally indistinguishable from random noise without the secret key.
This is not network isolation. This is not process isolation. This is not hardware-enforced memory isolation with documented side-channel vulnerabilities. This is mathematical isolation. The security guarantee derives from the hardness of the Ring Learning With Errors problem, which is believed to be resistant to both classical and quantum computing attacks.
Three FHE Schemes for Three Workload Types
Not all AI workloads have the same computational profile. Neural network inference requires different arithmetic than decision tree evaluation, which requires different arithmetic than boolean classification. H33 implements all three major FHE schemes, matching the encryption to the workload.
CKKS: Neural Network Inference
The CKKS scheme, developed by Cheon, Kim, Kim, and Song, supports approximate arithmetic on encrypted data. It encodes floating-point values into polynomial coefficients and supports SIMD-style parallel operations across multiple data slots within a single ciphertext. This makes it natural for neural network inference, where the dominant operations are matrix multiplications and activation functions on floating-point values.
CKKS supports addition, multiplication, and rotation operations on encrypted vectors. A matrix-vector multiplication can be decomposed into a series of rotations, multiplications, and additions on CKKS ciphertexts. Activation functions like ReLU, which are not polynomial, are approximated by low-degree polynomials that can be evaluated homomorphically. The approximation introduces controlled error, but for inference tasks where the model's own floating-point arithmetic already introduces rounding, the CKKS approximation error is within the model's existing noise budget.
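The approximation step can be made concrete with a least-squares polynomial fit. The interval [-4, 4] and degree 7 below are illustrative choices, not H33 parameters; under CKKS, only the resulting polynomial would be evaluated homomorphically, and the worst-case error over the interval bounds the extra noise the encrypted activation introduces.

```python
import numpy as np

# Fit a degree-7 polynomial to ReLU on a bounded interval; inputs
# must be bounded both for the fit and for CKKS scale management
xs = np.linspace(-4.0, 4.0, 801)
relu = np.maximum(xs, 0.0)
coeffs = np.polyfit(xs, relu, deg=7)   # least-squares coefficients
approx = np.polyval(coeffs, xs)

# Worst-case approximation error over the interval
max_err = float(np.max(np.abs(approx - relu)))
```

For inference models whose own float arithmetic already rounds, an error of this magnitude typically disappears into the existing noise budget.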
The SIMD structure of CKKS is particularly valuable for batching. A ciphertext over a polynomial ring of dimension N provides N/2 slots, each encoding an independent data value. Operations on the ciphertext apply to all slots simultaneously, so a single encrypted matrix multiplication processes multiple data points in parallel, amortizing the cost of FHE across the batch.
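The rotate-multiply-add decomposition of a matrix-vector product described above can be sketched in plaintext with the standard diagonal method. The NumPy code below is a plaintext model: under CKKS, each rotation and elementwise product would be a homomorphic operation on ciphertext slots, but the access pattern is identical.

```python
import numpy as np

def matvec_by_rotations(M, v):
    """Compute M @ v using only slot rotations, elementwise
    multiplications, and additions -- the operation set CKKS
    provides on encrypted vectors (the diagonal method)."""
    n = len(v)
    acc = np.zeros(n)
    for i in range(n):
        # i-th generalized diagonal of M: diag_i[j] = M[j][(j + i) % n]
        diag_i = np.array([M[j][(j + i) % n] for j in range(n)])
        # rotate v left by i slots so slot j holds v[(j + i) % n]
        acc = acc + diag_i * np.roll(v, -i)
    return acc

M = np.arange(16.0).reshape(4, 4)
v = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(matvec_by_rotations(M, v), M @ v)
```

An n x n matrix costs n rotations rather than n squared scalar products, which is why the slot layout matters so much for encrypted inference throughput.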
BFV: Decision Trees and Scoring
The BFV scheme, developed by Brakerski, Fan, and Vercauteren, supports exact integer arithmetic on encrypted data. Unlike CKKS, BFV does not introduce approximation error. The result of a homomorphic computation on BFV ciphertexts decrypts to the exact integer result of the same computation on the plaintexts.
This exactness makes BFV suitable for workloads where precision matters: credit scoring models, rule-based classification, threshold comparisons, and decision tree traversal. A decision tree node compares an input feature against a threshold. This comparison can be expressed as an integer arithmetic operation on BFV ciphertexts. The tree traversal becomes a series of encrypted comparisons and multiplexer operations, producing an encrypted classification result.
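The branch-free circuit structure described above can be modeled in plaintext. In the sketch below the comparison bits and multiplexers operate on ordinary integers; under BFV every value would be a ciphertext and each comparison would itself be a homomorphic subcircuit, but the shape is the same: evaluate both branches, select arithmetically, never branch on data. The tree, feature names, and thresholds are hypothetical.

```python
def mux(b, if_one, if_zero):
    """Arithmetic multiplexer: b is a 0/1 selector bit.
    Both branches are always computed -- no data-dependent control flow."""
    return b * if_one + (1 - b) * if_zero

def tree_classify(income, debt):
    # Hypothetical two-level scoring tree; every node is a comparison
    # bit feeding a mux, exactly as the encrypted circuit is laid out
    b_root = int(income > 50)       # root split
    b_left = int(debt > 20)         # left-child split
    b_right = int(debt > 40)        # right-child split
    left_result = mux(b_left, 1, 2)     # leaf class labels
    right_result = mux(b_right, 3, 4)
    return mux(b_root, right_result, left_result)
```

Because both subtrees are evaluated on every input, the circuit leaks nothing about which path the data took.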
H33's BFV implementation operates at production scale. The inner product benchmark, which is the dominant operation in scoring and classification workloads, processes 32 users per ciphertext in 943 microseconds. At sustained throughput, H33 achieves 2,293,766 authentications per second on Graviton4 hardware. This is not a research prototype. It is a production system processing real workloads.
TFHE: Boolean Classification
The TFHE scheme, developed by Chillotti, Gama, Georgieva, and Izabachène, operates at the bit level. Each ciphertext encrypts a single bit, and the homomorphic operations are boolean gates: AND, OR, XOR, NOT. This makes TFHE suitable for binary classification, threshold functions, and any workload that can be expressed as a boolean circuit.
The advantage of TFHE is that boolean gates do not accumulate noise the way polynomial arithmetic does in BFV and CKKS. Each gate includes a bootstrapping step that refreshes the ciphertext, keeping the noise level constant regardless of circuit depth. This means TFHE can evaluate arbitrarily deep circuits without the noise management complexity of the other schemes.
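As an illustration of expressing a comparison as a boolean circuit, the sketch below builds a 16-bit equality check from XOR, NOT, and AND gates on plaintext bits. Under TFHE each gate would be a homomorphic gate with its own bootstrapping step, so the depth of the AND chain costs gate evaluations, not noise.

```python
def to_bits(x, width=16):
    """Little-endian bit decomposition of an unsigned integer."""
    return [(x >> i) & 1 for i in range(width)]

def eq16(x, y):
    """16-bit equality as a boolean circuit: x == y iff every pair
    of corresponding bits XORs to 0."""
    diff = [a ^ b for a, b in zip(to_bits(x), to_bits(y))]
    acc = 1 - diff[0]            # NOT gate on the first difference bit
    for d in diff[1:]:
        acc = acc & (1 - d)      # AND(acc, NOT(d))
    return acc

assert eq16(54321, 54321) == 1
assert eq16(54321, 54320) == 0
```

Sixteen XORs, sixteen NOTs, and fifteen ANDs: forty-seven gates total, each of which is one bootstrapped homomorphic operation in a TFHE evaluation.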
H33's TFHE implementation on Graviton4 achieves 768 TPS for 16-bit equality checks and 372 TPS for 16-bit greater-than comparisons. The 96-channel parallel implementation processes multiple boolean circuits simultaneously, exploiting the wide SIMD capabilities of the ARM Neoverse V2 cores.
| FHE Scheme | Arithmetic | Use Case | H33 Performance |
|---|---|---|---|
| CKKS | Approximate (floating-point) | Neural net inference, embeddings | SIMD batched, per-slot ops |
| BFV | Exact (integer) | Scoring, decision trees, auth | 2,293,766 auth/sec sustained |
| TFHE | Boolean (bit-level) | Binary classification, thresholds | 768 TPS (16-bit EQ, Graviton4) |
Why "Encrypted at Rest" and "Encrypted in Transit" Are Insufficient
The standard security triad of encryption at rest, encryption in transit, and access controls has served the industry well for decades. Data is encrypted on disk with AES-256. Data is encrypted on the network with TLS 1.3. Access to decrypted data is controlled by IAM policies, RBAC, and audit logging. This model works when the computation can be trusted.
AI inference breaks this model because the computation occurs at the vendor. The vendor decrypts the data from its at-rest encryption to load it into GPU memory. The vendor decrypts the data from its in-transit encryption when the API request arrives. During inference, the data is plaintext. The "encrypted at rest" and "encrypted in transit" guarantees have a gap in the middle, and that gap is where the computation happens.
This gap is not theoretical. It is architectural. Every AI inference pipeline in production today has this gap unless it uses FHE. The gap exists whether you deploy on-premises, in a VPC, through a private endpoint, or inside a TEE. The only variation is who can access the plaintext during the gap and how difficult it is for them to do so.
With a public API, the vendor's entire infrastructure can access the plaintext. With a VPC deployment, your administrators can access the plaintext. With a TEE, an attacker with physical access or a side-channel exploit can access the plaintext. With FHE, nobody can access the plaintext because nobody has the decryption key except you.
The progression is clear. Public API: widest exposure. VPC: narrower exposure. TEE: narrowest exposure with documented exceptions. FHE: zero exposure. Each step reduces the trust surface. Only FHE eliminates it entirely.
H33-74: Every Inference Attested
Encryption without attestation is incomplete. If you encrypt data and send it to a compute provider, you need assurance that the provider actually performed the computation you requested on the ciphertext you sent, and that the result you received corresponds to that computation. Without attestation, a malicious provider could return a fabricated result without performing the computation.
H33-74 provides post-quantum attestation for every inference. Each operation generates a 74-byte cryptographic attestation that binds the input ciphertext, the computation performed, and the output ciphertext. The attestation is signed with three independent post-quantum signature schemes based on three independent mathematical hardness assumptions. Forging an attestation requires breaking all three simultaneously: lattice problems, structured-lattice problems, and hash-based constructions.
The 74-byte size is not arbitrary. It is the result of aggressive compression: 32 bytes stored on-chain, 42 bytes in Cachee. This compression represents a 285x reduction from the raw signature material while preserving full cryptographic binding. Every attestation is independently verifiable. The verification operation completes in 71 microseconds.
This attestation layer transforms FHE from a privacy tool into a compliance tool. Auditors do not just want to know that data was encrypted. They want cryptographic proof that a specific computation was performed on specific encrypted inputs at a specific time. H33-74 provides exactly that proof, for every operation, without exception.
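The binding property can be illustrated with a plain hash construction. The sketch below is not the H33-74 format — it omits the post-quantum signatures, the 74-byte compression, and the on-chain split — but it shows what it means for an attestation to bind input ciphertext, computation identity, and output ciphertext, so that tampering with any one of them fails verification.

```python
import hashlib

def bind(input_ct: bytes, computation_id: str, output_ct: bytes) -> bytes:
    """Illustrative binding digest over (input, computation, output).
    A deployed attestation would additionally sign this binding."""
    h = hashlib.sha256()
    for part in (input_ct, computation_id.encode(), output_ct):
        # length-prefix each field so field boundaries are unambiguous
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()

def verify(att: bytes, input_ct: bytes, computation_id: str,
           output_ct: bytes) -> bool:
    """Recompute the binding and compare."""
    return att == bind(input_ct, computation_id, output_ct)

att = bind(b"ct-in", "resnet50/infer", b"ct-out")
assert verify(att, b"ct-in", "resnet50/infer", b"ct-out")
assert not verify(att, b"ct-in", "resnet50/infer", b"ct-tampered")
```

Signing this digest is what upgrades it from an integrity check to an attestation: the provider cannot later deny having produced that output from that input.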
The Performance Question
The historical objection to FHE has been performance. Early FHE implementations were millions of times slower than plaintext computation. This objection is outdated. Modern FHE implementations with hardware-optimized polynomial arithmetic, SIMD batching, and algorithmic improvements have reduced the overhead to the point where production deployment is viable.
H33 achieves 2,293,766 authentications per second sustained on Graviton4 metal hardware. The per-operation latency is 42 microseconds for a full pipeline that includes FHE encryption, homomorphic computation, batch attestation with post-quantum signatures, and cached ZKP verification. The FHE batch processes 32 users per ciphertext in 943 microseconds. The batch attestation completes in 391 microseconds. The ZKP cached lookup completes in 0.358 microseconds.
These are not benchmarks on specialized hardware. Graviton4 is a commercially available ARM processor from AWS. The c8g.metal-48xl instance provides 192 vCPUs at approximately $2.30 per hour on-demand. The per-authentication cost is approximately $3.8 x 10^-10. At this cost structure, FHE encryption for every AI inference is economically viable for any enterprise workload.
The performance gap between FHE and plaintext computation still exists, but it is no longer the orders-of-magnitude penalty that made FHE impractical in 2015. For workloads where data privacy is a requirement rather than a preference, the question is not "can we afford FHE?" The question is "can we afford not to use it?"
Data Isolation That Actually Isolates
The term "data isolation" has been diluted by vendors who use it to describe network segmentation, access controls, and deployment topologies. These are valuable security measures. They are not data isolation. Data isolation means that the data is inaccessible to everyone except its owner, including the compute infrastructure, including the administrators of that infrastructure, including a nation-state adversary who compromises that infrastructure.
Network isolation means an attacker cannot reach the server. Data isolation means that even if an attacker reaches the server, breaches every perimeter, compromises every process, and dumps every byte of memory, they obtain nothing useful. They obtain ciphertext. They obtain polynomial coefficients in a lattice that are computationally indistinguishable from random noise. They obtain exactly what they would obtain by generating random bytes.
That is isolation. Not a wall that can be climbed. Not a moat that can be drained. Not a hardware enclave that can be side-channeled. Mathematical isolation, where the gap between the attacker's position and the plaintext is a computational problem that the attacker cannot solve, not with classical computers, not with quantum computers, not with any foreseeable technology.
H33 delivers this isolation at production scale, with post-quantum attestation for every operation, across three FHE schemes matched to your workload. The data enters encrypted. It stays encrypted through every computation. It exits encrypted. The model never sees the plaintext. The infrastructure never sees the plaintext. Nobody sees the plaintext except you.
That is what data isolation for AI actually means. Everything else is a firewall.
Encrypt Through the Model
See how H33 delivers true data isolation for AI inference with Fully Homomorphic Encryption. Not network isolation. Not hardware isolation. Mathematical isolation at 2,293,766 operations per second.
Schedule a Demo