Encryption is the foundation of data security. Every organization with sensitive data encrypts it—AES-256 for storage, TLS 1.3 for network connections, KMS for key management. These are solved problems and they work. Data sitting on a disk or moving across a network is cryptographically protected. An attacker who steals encrypted data at rest or intercepts it in transit gets nothing useful.
But AI does not operate on data at rest or in transit. AI operates on data in use—loaded into memory, multiplied through model weights, transformed through layers of computation. And every encryption scheme deployed in production today requires decryption before that computation can happen. The data must be plaintext for the model to process it. That requirement creates the exposure window that encryption was supposed to prevent.
The Encryption Gap
Think of data protection as a chain with three links:
| Data State | Standard Encryption | Where It Exists | Risk Level |
|---|---|---|---|
| At Rest | AES-256, dm-crypt, LUKS | Disk, database, object storage | Protected |
| In Transit | TLS 1.3, mTLS, WireGuard | Network, API calls, replication | Protected |
| In Use | None (plaintext required) | GPU VRAM, CPU cache, RAM, logs | Exposed |
The third link is broken. Data is unprotected precisely when AI is processing it. This is not a configuration error or a deployment mistake. It is a fundamental limitation of traditional encryption: the mathematical operations that AES and TLS perform are not compatible with the mathematical operations that AI models need. You cannot run a matrix multiplication on AES ciphertext and get a meaningful result.
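The incompatibility is easy to demonstrate. The sketch below uses a toy keystream cipher as a stand-in for AES-CTR (which likewise XORs plaintext with a keystream); the key and values are arbitrary demo choices. Decryption round-trips correctly, but arithmetic performed on the ciphertexts does not correspond to arithmetic on the plaintexts:

```python
import hashlib

def keystream(key: bytes, counter: int) -> int:
    # Derive a 64-bit keystream word (toy stand-in for an AES-CTR block).
    digest = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")

def encrypt(key: bytes, counter: int, m: int) -> int:
    return m ^ keystream(key, counter)   # ciphertext = plaintext XOR keystream

def decrypt(key: bytes, counter: int, c: int) -> int:
    return c ^ keystream(key, counter)

key = b"demo-key"                        # arbitrary demo key
a, b = 1234, 5678
ca, cb = encrypt(key, 0, a), encrypt(key, 1, b)

# Round-trip decryption works as expected.
assert decrypt(key, 0, ca) == a and decrypt(key, 1, cb) == b

# But arithmetic on ciphertext does not survive decryption:
# adding the ciphertexts and decrypting yields garbage, not a + b.
garbled = decrypt(key, 0, (ca + cb) % 2**64)
print(garbled == a + b)                  # almost certainly False
```

The same failure applies to any standard cipher: the encryption function has no algebraic structure that computation can exploit, which is exactly what makes it secure and exactly what makes it useless for in-use protection.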
Where Plaintext Leaks in AI Systems
The exposure window during AI computation is not a single point of failure. Plaintext data spreads across multiple locations in the compute stack during every inference request.
GPU VRAM
Model inference loads input data directly into GPU memory. For a biometric matching request, the user's face embedding sits in VRAM as a plaintext vector. For a medical imaging model, the patient's scan exists unencrypted in GPU buffers. GPU memory is not cleared between allocations by default—a subsequent process may read residual data from a previous inference call.
KV Cache
Large language models maintain key-value caches that store intermediate representations of input data. These caches persist for the duration of a session and often longer (for efficiency). The KV cache for a single conversation can contain reconstructable representations of every input the user provided. In multi-tenant GPU environments, KV cache isolation failures have been demonstrated in research settings.
CPU Cache and System Memory
Data moves between GPU and CPU during preprocessing, postprocessing, and orchestration. Input validation, tokenization, output formatting—all happen on the CPU with plaintext data in RAM. Side-channel attacks (Spectre, Meltdown, and their successors) have demonstrated the ability to read data from CPU caches across process boundaries.
Observability and Logging
Production AI systems are instrumented. Request logs capture input payloads for debugging. Metrics pipelines record latency distributions with sample data. Error logs dump request context when failures occur. Tracing systems propagate request data across microservices. Every observability layer is a potential plaintext leak. A logging library that captures input tensors for debugging purposes has the same practical effect as a data breach—sensitive data stored in plaintext on a logging backend.
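Scrubbing sensitive fields before records reach any backend reduces, but does not eliminate, this surface. A minimal sketch using Python's standard logging module; the field names (`embedding`, `input_tensor`, `patient_id`) are hypothetical placeholders, not names from any particular system:

```python
import logging

# Hypothetical field names for illustration; real systems configure their own.
SENSITIVE_KEYS = {"embedding", "input_tensor", "patient_id"}

class RedactionFilter(logging.Filter):
    """Scrub sensitive payload fields before any handler can persist them."""
    def filter(self, record: logging.LogRecord) -> bool:
        payload = getattr(record, "payload", None)
        if isinstance(payload, dict):
            record.payload = {
                k: "[REDACTED]" if k in SENSITIVE_KEYS else v
                for k, v in payload.items()
            }
        return True   # keep the record, just scrubbed

logger = logging.getLogger("inference")
logger.addFilter(RedactionFilter())
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(message)s %(payload)s"))
logger.addHandler(handler)

logger.warning("request failed",
               extra={"payload": {"embedding": [0.12, 0.98], "request_id": "r-42"}})
# Emits the record with the embedding replaced by '[REDACTED]'.
```

Even with filters like this in place, tracing systems, crash dumps, and third-party instrumentation remain independent leak paths—redaction is a mitigation, not a guarantee.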
Adding more encryption at rest or in transit does not help. Encrypting the database with a stronger cipher, adding another TLS layer, or rotating keys more frequently—none of these address the fundamental gap. The data is decrypted at the compute layer because the compute layer requires plaintext. The solution must operate at the compute layer, not around it.
Why More Encryption Does Not Help
Organizations that recognize the in-use exposure gap often try to solve it with more of the same technology:
- Encrypt the logging backend. The logs still contain plaintext data—you have just encrypted the container holding the plaintext. Anyone with log access can read the data.
- Use column-level database encryption. The data is still decrypted into the application layer for AI processing. Column-level encryption protects the database administrator, not the AI pipeline.
- Add mTLS between every microservice. Data is encrypted on the wire but decrypted at every service boundary. Each service processes plaintext.
- Rotate keys more frequently. Key rotation limits the blast radius of a key compromise. It does nothing about data that is already decrypted in memory.
Each of these measures improves security at the storage or network layer. None of them address the compute layer. They are applying the right solution to the wrong problem.
FHE: The Missing Layer
Fully homomorphic encryption closes the gap by making decryption unnecessary for computation. FHE encodes data into polynomial rings where addition and multiplication on ciphertext correspond exactly to addition and multiplication on plaintext. The server computes on encrypted data and returns an encrypted result. The plaintext never exists outside the key holder's device.
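The principle of computing on ciphertext can be made concrete with textbook Paillier, a much simpler scheme than the polynomial-ring constructions FHE actually uses: Paillier is additively homomorphic only, and the parameters below are toy and insecure, chosen purely so the homomorphic property is visible. Multiplying two ciphertexts yields a ciphertext of the sum, with no decryption on the computing side:

```python
from math import gcd, lcm
import random

# Toy Paillier keypair (textbook parameters -- nowhere near secure key sizes).
p, q = 17, 19
n = p * q
n2 = n * n
g = n + 1
lam = lcm(p - 1, q - 1)
# mu is the modular inverse of L(g^lam mod n^2), where L(x) = (x - 1) // n.
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while gcd(r, n) != 1:                # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = 42, 7
ca, cb = encrypt(a), encrypt(b)
# Multiplying ciphertexts adds the underlying plaintexts -- no decryption needed.
assert decrypt((ca * cb) % n2) == (a + b) % n
```

Paillier supports only encrypted addition; FHE schemes such as BFV support both addition and multiplication on ciphertext, which is what makes encrypted neural-network inference possible.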
This is not partial encryption. It is not tokenization with a lookup table. The guarantee rests on a standard cryptographic hardness assumption (Ring-LWE, for schemes such as BFV): the ciphertext is computationally indistinguishable from random to anyone without the decryption key, including the server performing the computation. A complete server compromise—root access, memory dumps, disk images—yields nothing.
With FHE, the encryption chain is complete:
| Data State | With FHE | Risk Level |
|---|---|---|
| At Rest | Encrypted (AES-256 + FHE ciphertext) | Protected |
| In Transit | Encrypted (TLS 1.3 + FHE ciphertext) | Protected |
| In Use | Encrypted (FHE computation on ciphertext) | Protected |
GPU VRAM, KV caches, CPU memory, and logs all contain only ciphertext. There is no plaintext to leak because plaintext never exists on the server.
H33: Production-Speed FHE
The historical objection to FHE has been performance. Early implementations were millions of times slower than plaintext computation. That objection no longer holds. H33's optimized BFV scheme processes encrypted operations at 38.5 microseconds each—fast enough for real-time inference at scale.
The full pipeline: FHE encryption, ZK-STARK proof of correct computation, and Dilithium post-quantum signature attestation. One API call. 2,172,518 operations per second sustained on a single AWS Graviton4 instance. No GPU. Per-operation cost below $0.000001.
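The two quoted figures are mutually consistent; the concurrency count below is inferred from them, not a number stated by H33:

```python
latency_s = 38.5e-6           # quoted per-operation latency
sustained_ops = 2_172_518     # quoted sustained throughput, one instance

single_stream = 1 / latency_s              # one sequential stream of operations
implied_lanes = sustained_ops * latency_s  # concurrency implied by the two figures

print(round(single_stream))   # 25974 ops/s for a single stream
print(round(implied_lanes))   # ~84 concurrent lanes -- inferred, not quoted
```

A single stream at 38.5 microseconds per operation yields roughly 26,000 operations per second, so the sustained throughput implies on the order of 84 operations in flight at once—plausible for a high-core-count CPU instance.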
Encryption is necessary for AI data protection. It is not sufficient. FHE makes it sufficient—by extending cryptographic protection to the one place traditional encryption cannot reach: the computation itself.
Standard encryption protects data everywhere except where AI processes it. FHE protects data everywhere including where AI processes it. That distinction is the difference between encrypting the container and encrypting the contents.