The Encryption Spectrum
Encryption is not a single technology. It is a spectrum, and where you sit on that spectrum determines what you can and cannot do with protected data.
At one end: traditional encryption (AES-256, RSA, ChaCha20). These algorithms are fast, battle-tested, and ubiquitous. Every HTTPS connection, every encrypted hard drive, every VPN tunnel uses them. They protect data when it is stored (at rest) and when it is moving between systems (in transit). But the moment your application needs to actually process that data—run an AI model, match a biometric, score a transaction—you must decrypt it first. That plaintext window is where breaches happen.
At the other end: fully homomorphic encryption (FHE). FHE allows computation directly on ciphertext. The server never sees plaintext. The result, when decrypted by the key holder, is mathematically identical to what you would get from processing the raw data. No plaintext window. No exposure during computation. The problem has always been speed: historically, FHE was 10,000x to 1,000,000x slower than plaintext operations.
In the middle: H33's approach. Production-optimized FHE combined with ZK-STARK proofs and post-quantum Dilithium signatures. Not generic FHE. Not a research library. A purpose-built pipeline that processes encrypted data at 38.5 microseconds per operation on a single ARM CPU.
Traditional Encryption
Traditional encryption does exactly what it was designed to do: make data unreadable to anyone without the key. AES-256 encrypts a block in nanoseconds. TLS 1.3 secures your connection with sub-millisecond handshakes. These are solved problems.
The failure point is architectural, not algorithmic. Traditional encryption protects data at rest (stored on disk, in a database, on a backup tape) and in transit (moving between your browser and a server, between microservices, across a VPN). But it provides zero protection at the point of use.
When your AI model processes a medical image, the image is decrypted in GPU memory. When your fraud detection system scores a transaction, the transaction details exist in plaintext in RAM. When your biometric system matches a fingerprint, the template is exposed in memory. Every encrypted-at-rest database decrypts rows into memory for every query. The data is protected everywhere except where it is most valuable—during computation.
For many workloads, this tradeoff is acceptable. If computation happens in a trusted environment that you fully control—your own servers, inside a secure enclave, within a compliance boundary—the plaintext window is manageable. You audit the environment, restrict access, monitor memory.
But for AI workloads processing sensitive data at scale—biometrics, medical records, financial data, PII—the plaintext window is a liability. A compromised hypervisor, a rogue employee, a government subpoena, or a side-channel attack can expose data that was supposed to be protected. And biometric data, unlike passwords, cannot be rotated once compromised.
Homomorphic Encryption
Homomorphic encryption eliminates the plaintext window entirely. The server computes on encrypted data and returns an encrypted result. The plaintext never exists outside the key holder's control.
There are three types, and the distinction matters:
- Partially Homomorphic Encryption (PHE) — supports one operation. RSA supports multiplication. Paillier supports addition. Useful for specific protocols (e-voting, private set intersection) but cannot run general computation.
- Somewhat Homomorphic Encryption (SHE) — supports a limited number of operations before noise overwhelms the ciphertext. Useful for shallow circuits but cannot handle the depth required by AI inference.
- Fully Homomorphic Encryption (FHE) — supports arbitrary computation. Addition and multiplication of ciphertexts, unlimited depth (via bootstrapping or careful noise management). This is the only type that can run AI inference on encrypted data.
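The additive case is easy to see concretely. Below is a toy Paillier sketch in pure Python, with tiny, insecure primes and illustrative names only, showing the PHE property from the list above: multiplying two ciphertexts decrypts to the sum of the two plaintexts.

```python
# Toy Paillier cryptosystem (additively homomorphic PHE). Parameters are
# far too small to be secure; this is an illustration, not a real library.
import math
import secrets

p, q = 293, 433                 # tiny primes; real deployments use ~1536-bit primes
n = p * q
n2 = n * n
g = n + 1                       # standard generator choice that simplifies decryption
lam = math.lcm(p - 1, q - 1)    # Carmichael function lambda(n)

def rand_unit() -> int:
    # Per-ciphertext randomness r must be coprime to n for decryption to work
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            return r

def encrypt(m: int) -> int:
    return (pow(g, m, n2) * pow(rand_unit(), n, n2)) % n2

def decrypt(c: int) -> int:
    # L(x) = (x - 1) // n; with g = n + 1, L(g^lam mod n^2) = lam,
    # so the decoding constant is simply lam^{-1} mod n
    l = (pow(c, lam, n2) - 1) // n
    return (l * pow(lam, -1, n)) % n

c1, c2 = encrypt(41), encrypt(17)
# Multiplying ciphertexts adds the plaintexts: Enc(41) * Enc(17) -> 58
assert decrypt((c1 * c2) % n2) == 58
```

The server holding `c1` and `c2` can compute the encrypted sum without ever learning 41 or 17; only the key holder can decrypt the result.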
FHE is based on the Ring Learning With Errors (RLWE) problem—a lattice-based hard problem that is believed to be resistant to both classical and quantum computers. This means FHE is inherently post-quantum secure. You do not need to layer on additional quantum-resistant algorithms for the encryption itself.
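For rough intuition about how LWE-style encryption hides data behind noise, here is a toy single-bit sketch. It uses plain LWE rather than the ring variant, and parameters far too small to be secure; every name and constant here is illustrative.

```python
# Toy LWE bit encryption: b = <a, s> + e + bit * (q/2) mod q.
# Recovering "bit" without s requires separating the noise e from the
# inner product, which is the (believed quantum-hard) lattice problem.
import secrets

q, n_dim = 3329, 16                                 # toy modulus and secret dimension
s = [secrets.randbelow(q) for _ in range(n_dim)]    # secret key vector

def encrypt_bit(bit: int):
    a = [secrets.randbelow(q) for _ in range(n_dim)]
    e = secrets.randbelow(7) - 3                    # small noise in [-3, 3]
    b = (sum(ai * si for ai, si in zip(a, s)) + e + bit * (q // 2)) % q
    return a, b

def decrypt_bit(ct) -> int:
    a, b = ct
    d = (b - sum(ai * si for ai, si in zip(a, s))) % q
    # After subtracting <a, s>, d sits near 0 for bit=0 and near q/2 for bit=1
    return 1 if q // 4 < d < 3 * q // 4 else 0

assert decrypt_bit(encrypt_bit(0)) == 0
assert decrypt_bit(encrypt_bit(1)) == 1
```

The same small-noise structure is what forces the bootstrapping and noise management mentioned above: each homomorphic operation grows `e`, and decryption fails once the noise crosses the decision threshold.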
Why Generic FHE Is Too Slow
The theoretical power of FHE has never been in question. The practical problem has always been performance.
Zama's TFHE bootstrap takes approximately 800 microseconds per gate on an NVIDIA H100 GPU—hardware that costs over $30,000 per card. Building a full AI inference pipeline on TFHE requires GPU clusters costing $200,000–$400,000, plus PhD-level cryptographers at $400,000–$600,000 per year to manage parameter selection, noise budgets, and circuit optimization.
Generic BFV and CKKS libraries (OpenFHE, Microsoft SEAL) operate at 4–7 milliseconds per homomorphic operation. A biometric matching pipeline that requires dozens of encrypted operations takes tens to hundreds of milliseconds per query. At that speed, running 2 million authentications per second requires thousands of servers.
The bottleneck in generic FHE is not the encryption itself. It is the Number Theoretic Transform (NTT)—the polynomial multiplication engine at the core of every lattice-based scheme. Textbook NTT implementations perform a division-based modular reduction in every butterfly, and integer division is among the slowest arithmetic instructions on modern CPUs. Eliminating it from the hot path is where the 20x–100x speedup lives.
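A minimal sketch of the division-free trick, using Montgomery reduction with toy constants. This mirrors the general technique only, not H33's actual kernel.

```python
# Division-free modular multiplication via Montgomery reduction: the modular
# reduction in each NTT butterfly becomes multiplies, adds, a mask, and a
# shift. Constants are illustrative (a small NTT-friendly prime).
q = 3329                      # toy NTT-friendly prime
R = 1 << 16                   # Montgomery radix, a power of two > q
q_inv = pow(-q, -1, R)        # precomputed once: -q^{-1} mod R

def mont_reduce(t: int) -> int:
    # Computes t * R^{-1} mod q with no division instruction:
    m = (t * q_inv) & (R - 1)        # "mod R" is a bitmask, not a divide
    u = (t + m * q) >> 16            # exact shift: the low 16 bits cancel
    return u - q if u >= q else u    # single conditional subtraction

def mont_mul(a: int, b: int) -> int:
    # a, b in Montgomery form (x * R mod q); result stays in Montgomery form
    return mont_reduce(a * b)

a, b = 7, 9
aR, bR = (a * R) % q, (b * R) % q    # one-time conversion into Montgomery form
assert mont_reduce(mont_mul(aR, bR)) == (a * b) % q
```

Twiddle factors can be stored in Montgomery form once at setup, so the butterfly loop itself never leaves the division-free path; this is the shape of the "Montgomery NTT" optimization described in the next section.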
H33's Approach: Optimized FHE at Production Speed
H33 does not ship a generic FHE library. H33 ships a production pipeline where every layer—encryption, proof generation, signature verification—has been optimized at the instruction level for a specific set of operations: encrypted biometric matching, identity verification, and AI inference on sensitive data.
The core optimizations:
- Montgomery NTT with Harvey lazy reduction — eliminates modular division from the hot path entirely. Butterfly values stay in [0, 2q) between stages, deferring reduction. Twiddle factors are stored in Montgomery form.
- NTT-domain fused inner products — a single final INTT instead of per-chunk transforms. Saves 2×M transforms per multiply (where M = number of moduli).
- Pre-NTT public keys at keygen — public key pk0 is stored in NTT form, eliminating a clone and forward NTT per encryption.
- SIMD batching — 32 users packed into a single ciphertext (4096 slots ÷ 128 dimensions). Batch verification cost is flat across batch sizes: ~1.04ms whether matching 1 user or 32.
- In-process ZKP caching — DashMap lookups at 0.059 microseconds, replacing TCP-serialized cache layers that caused 11x regressions.
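The slot arithmetic behind SIMD batching can be illustrated on plaintext lists. The real system operates on encrypted ciphertext slots; this sketch only mirrors the data layout (32 users × 128 dimensions = 4096 slots), with all names invented for illustration.

```python
# Plaintext illustration of slot packing: one slot-wise multiply plus
# per-block sums yields all 32 users' inner products at once, which is why
# batch cost is flat from 1 to 32 users.
import random

USERS, DIM = 32, 128
SLOTS = USERS * DIM                       # 4096, matching the parameters above

random.seed(0)
stored = [random.random() for _ in range(SLOTS)]   # 32 templates, concatenated
probe = [random.random() for _ in range(DIM)]      # one probe template

packed_probe = probe * USERS              # broadcast the probe into every block
products = [s * p for s, p in zip(stored, packed_probe)]   # one slot-wise mul

# Each user's similarity score is a block-wise sum over 128 consecutive slots
scores = [sum(products[u * DIM:(u + 1) * DIM]) for u in range(USERS)]
assert len(scores) == USERS
```

In the encrypted setting the slot-wise multiply is a single homomorphic operation, so the per-ciphertext work is the same whether 1 slot block or all 32 carry live users.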
The result: 38.5 microseconds per authentication. That is the full pipeline—FHE encryption, ZK-STARK proof verification, Dilithium signature attestation. No GPU. One ARM CPU (AWS Graviton4). 2,172,518 authentications per second sustained over 120 seconds with ±0.71% variance.
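A quick sanity check on how those two headline numbers relate. The concurrency figure derived here is an inference from latency × throughput (Little's law), not a number stated above.

```python
# Relating per-operation latency to sustained aggregate throughput.
per_op_s = 38.5e-6                   # 38.5 microseconds per authentication
throughput = 2_172_518               # sustained authentications per second

single_lane = 1 / per_op_s           # one serial pipeline lane: ~25,974 ops/sec
concurrency = throughput * per_op_s  # implied parallel lanes in flight: ~84
```

The sustained rate therefore implies on the order of 84 operations in flight at once across the CPU's cores and pipeline stages, consistent with a many-core ARM server part rather than a single serial lane.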
Comparison Table
| Capability | Traditional Encryption | Generic FHE | H33 Privacy-Preserving AI |
|---|---|---|---|
| Protects data at rest | Yes | Yes | Yes |
| Protects data in transit | Yes | Yes | Yes |
| Protects data in use | No | Yes | Yes |
| AI inference on encrypted data | No | Yes (slow) | Yes (38.5µs) |
| Requires GPU | No | Yes ($200K+) | No |
| Post-quantum secure | No | Depends on scheme | Yes (FIPS 203/204) |
| Production-ready | Yes | Limited | Yes (2.17M/sec) |
When to Use What
These are not competing technologies. They solve different problems at different layers of the stack.
Use traditional encryption when:
- Data is at rest or in transit and processing happens in fully trusted environments you control.
- You need disk encryption, TLS, database encryption-at-rest, or VPN tunnels.
- The compute environment is audited, access-controlled, and you accept the plaintext window risk.
Use generic FHE libraries (Zama, OpenFHE, SEAL) when:
- You are building custom cryptographic protocols for research or blockchain applications.
- You need fine-grained control over FHE parameters, noise budgets, and circuit design.
- Latency is not a hard constraint and you have GPU infrastructure available.
- Your team includes cryptographers who can tune parameters and manage bootstrapping.
Use H33 when:
- Production AI workloads must process data that can never be exposed—biometrics, medical records, financial PII, identity verification.
- You need sub-millisecond latency and millions of operations per second.
- Post-quantum security is a requirement (FIPS 203/204 compliance, NIST PQC standards).
- You cannot justify $200K+ GPU clusters and $500K/year cryptographer salaries for an encryption pipeline.
Traditional encryption protects the container. Homomorphic encryption protects the contents during processing. H33 makes that protection fast enough for production. The question is not "which one?" but "where in your stack does plaintext exposure create risk?"