FHE AI Privacy · 5 min read

Can AI Work on Encrypted Data?

Yes. Fully homomorphic encryption (FHE) allows mathematical operations on ciphertext that produce results identical to operating on plaintext. H33 runs complete AI inference pipelines — biometric matching, fraud detection, compliance checks — on data that remains encrypted throughout. 38.5µs per operation. No decryption at any point.

38.5µs per authentication · 2.17M auth/sec sustained · 0% plaintext exposure

This is not theoretical. AI systems have been processing encrypted data in production since 2025. The technology that makes it possible — fully homomorphic encryption — was once considered too slow for real-world use. That constraint no longer holds. Modern FHE implementations operate at microsecond latencies, making encrypted AI inference faster than most plaintext API round trips.

How FHE Enables AI on Encrypted Data

Fully homomorphic encryption works by encoding data into polynomial rings. In this mathematical space, addition and multiplication on ciphertext correspond exactly to addition and multiplication on plaintext. This property — called homomorphism — means any computation that can be expressed as a series of additions and multiplications can run on encrypted data.
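The homomorphic property is easiest to see in a stripped-down scheme. The sketch below is a toy LWE-style cipher, not H33's BFV, with deliberately insecure, illustrative parameters: adding two ciphertexts componentwise yields a ciphertext of the XOR of the two underlying bits, noise included.

```python
import random

# Toy symmetric LWE-style scheme (illustrative only, NOT secure parameters).
# A ciphertext is (a, b) with b = <a, s> + e + (q//2)*m mod q, where m is a
# bit, s is the secret key, and e is a small random error ("noise").
q, n = 2**16, 32
s = [random.randrange(q) for _ in range(n)]

def encrypt(m):
    a = [random.randrange(q) for _ in range(n)]
    e = random.randint(-2, 2)                        # fresh noise is tiny
    b = (sum(x * y for x, y in zip(a, s)) + e + (q // 2) * m) % q
    return a, b

def decrypt(ct):
    a, b = ct
    phase = (b - sum(x * y for x, y in zip(a, s))) % q
    return 1 if q // 4 <= phase < 3 * q // 4 else 0  # closer to q/2 -> bit 1

def add(ct1, ct2):
    # Homomorphic addition: componentwise sum; decrypts to m1 XOR m2.
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

c = add(encrypt(1), encrypt(0))
print(decrypt(c))  # 1: the phases (and the small noises) were summed
```

The server that computes `add` never touches `s`; only the key holder can run `decrypt`. This is the homomorphism in miniature: ciphertext arithmetic tracks plaintext arithmetic.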

Because the core AI operations decompose into these primitives, entire inference pipelines run without decryption. Neural network layers are matrix multiplications and additions. Distance calculations for biometric matching are inner products. Fraud scoring models are weighted sums. All of these work natively on ciphertext; non-linear steps such as activation functions are handled with low-degree polynomial approximations, which again reduce to additions and multiplications.
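To see how a weighted-sum model maps onto these primitives, the toy scheme below (again an insecure, illustrative stand-in for BFV, using a BFV-style scaling factor delta = q // t) evaluates a fraud-score-style weighted sum on encrypted features using only ciphertext additions and plaintext-scalar multiplications.

```python
import random

q, t, n = 2**32, 256, 32            # toy parameters: NOT secure
delta = q // t                      # scaling factor, as in BFV's Delta = floor(q/t)
s = [random.randrange(q) for _ in range(n)]

def encrypt(m):
    a = [random.randrange(q) for _ in range(n)]
    b = (sum(x * y for x, y in zip(a, s)) + random.randint(-3, 3) + delta * m) % q
    return a, b

def decrypt(ct):
    a, b = ct
    phase = (b - sum(x * y for x, y in zip(a, s))) % q
    return round(phase / delta) % t  # rounding strips the accumulated noise

def scale(k, ct):                   # multiply by a plaintext weight
    a, b = ct
    return [k * x % q for x in a], k * b % q

def add(c1, c2):                    # homomorphic addition
    (a1, b1), (a2, b2) = c1, c2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

features, weights = [3, 1, 4], [2, 5, 1]
enc = [encrypt(x) for x in features]     # client encrypts its features
acc = scale(weights[0], enc[0])          # server computes the weighted sum
for w, c in zip(weights[1:], enc[1:]):
    acc = add(acc, scale(w, c))
print(decrypt(acc))  # 15 == 3*2 + 1*5 + 4*1
```

The model weights stay in plaintext on the server; only the features are encrypted, and the score comes back as ciphertext.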

The result, when decrypted by the data owner, is mathematically identical to running the same model on plaintext. The server performing the computation never sees the raw data, never sees the raw result, and cannot extract information from the encrypted intermediates.

Why FHE Was Historically Too Slow

When Craig Gentry published the first FHE scheme in 2009, a single encrypted multiplication took approximately 30 minutes. The core problem was noise: every operation on a ciphertext adds a small amount of random noise, and after enough operations the noise overwhelms the signal and decryption fails. Gentry's solution, bootstrapping, periodically resets the noise, but bootstrapping itself was prohibitively expensive.

By 2020, bootstrapping had dropped to seconds. By 2023, optimized leveled FHE schemes (which avoid bootstrapping entirely by carefully managing noise budgets) achieved millisecond latencies. But even milliseconds were too slow for high-throughput use cases like real-time authentication or streaming fraud detection.

How H33 Solved the Performance Problem

H33's BFV implementation applies six key optimizations that collectively reduce per-authentication latency from milliseconds to 38.5 microseconds:

Montgomery NTT. The Number Theoretic Transform converts polynomial multiplication from O(n²) to O(n log n). H33 stores NTT twiddle factors in Montgomery form, eliminating all modular division from the hot path. Every butterfly operation uses Montgomery reduction, which needs only multiplies, shifts, and a conditional subtract, instead of expensive division.
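As a sketch of the reduction step itself, the snippet below implements textbook Montgomery reduction (REDC) in Python. The 64-bit NTT-friendly prime is an illustrative assumption, not H33's actual modulus, and a real kernel would do this in fixed-width machine arithmetic.

```python
# Montgomery reduction (REDC): computes T * R^{-1} mod q without any
# division by q, using only multiplies, shifts, and a conditional subtract.
q = 0xFFFFFFFF00000001        # an NTT-friendly 64-bit prime (2^64 - 2^32 + 1)
R = 1 << 64
q_inv = pow(-q, -1, R)        # q' = -q^{-1} mod R, precomputed once

def redc(T):                  # requires 0 <= T < R*q
    m = (T * q_inv) % R       # low 64 bits only: a multiply, no division
    t = (T + m * q) >> 64     # exact shift: the low word is forced to zero
    return t - q if t >= q else t

def mont(x):                  # to Montgomery form: x*R mod q
    return (x << 64) % q

def mont_mul(x_bar, y_bar):   # product of two Montgomery-form operands
    return redc(x_bar * y_bar)

a, b = 123456789, 987654321
assert redc(mont_mul(mont(a), mont(b))) == a * b % q
```

Because twiddle factors are stored in Montgomery form once, every butterfly multiply lands directly in `mont_mul` with no conversion in the hot path.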

Harvey lazy reduction. Standard NTT requires reducing each intermediate value to [0, q) after every butterfly. Harvey's technique allows values to remain in [0, 2q) between stages, cutting reduction operations by half. This is safe because the final INTT performs a single reduction pass.
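The payoff can be shown in miniature with a chain of modular additions: deferring the reduction to the end gives the same result as reducing at every step, at a fraction of the reduction count. This is a generic illustration of lazy reduction, not Harvey's butterfly itself.

```python
# Lazy reduction in miniature: defer the mod-q reduction across a chain of
# additions and reduce once at the end. Intermediate values drift outside
# [0, q), which is fine as long as they stay below the overflow limit of
# the underlying word size (Python ints never overflow, so any bound works).
q = 0xFFFFFFFF00000001
vals = [q - 1, q - 2, q - 3, q - 4]

# Eager: reduce after every addition (len(vals) - 1 reductions).
eager = 0
for v in vals:
    eager = (eager + v) % q

# Lazy: accumulate unreduced, then one reduction at the end.
lazy = sum(vals) % q
assert eager == lazy
```

In a fixed-width NTT kernel the bound is what matters: values kept in [0, 2q) still fit a 64-bit word for a sub-63-bit modulus, so the deferred reductions cost nothing in correctness.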

SIMD batching. BFV supports packing 32 independent user templates into a single ciphertext using CRT-based slot encoding. One encrypted inner product processes 32 authentications simultaneously. The batch completes in 1,232 microseconds — 38.5 microseconds per user.
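The slot-packing idea can be demonstrated with ordinary integers: the Chinese Remainder Theorem packs several independent residues into one number, and a single big multiplication multiplies every slot at once. The moduli below are illustrative; BFV performs the analogous decomposition on its plaintext polynomial ring.

```python
from math import prod

# CRT batching in miniature: pack k independent values into one residue
# modulo the product of k pairwise-coprime moduli. One multiplication on
# the packed value multiplies every slot simultaneously.
moduli = [257, 263, 269, 271]          # four coprime "slots"
M = prod(moduli)

def pack(values):                      # CRT: find x with x = v_i (mod m_i)
    x = 0
    for v, m in zip(values, moduli):
        Mi = M // m
        x = (x + v * Mi * pow(Mi, -1, m)) % M
    return x

def unpack(x):
    return [x % m for m in moduli]

a = pack([3, 5, 7, 11])
b = pack([2, 4, 6, 8])
print(unpack(a * b % M))   # [6, 20, 42, 88]: four products from one multiply
```

The FHE analogue is the same amortization: one encrypted inner product over a packed ciphertext yields 32 per-user scores at once.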

NTT-domain persistence. Templates are stored in NTT form after enrollment, eliminating a forward NTT per multiply. The secret key stays in NTT form permanently. Only the final result is transformed back to coefficient domain.
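The caching pattern looks like this with a toy cyclic NTT mod 257 (a naive O(n²) transform for clarity, unlike the negacyclic, optimized transforms a real BFV library uses): transform the stored template once at enrollment, then each query costs one forward transform, a pointwise multiply, and one inverse transform.

```python
# Toy cyclic NTT over the integers mod 257, length 8.
p, n, w = 257, 8, 64               # w = 64 has multiplicative order 8 mod 257

def ntt(a, root):                  # naive O(n^2) transform, for clarity
    return [sum(a[j] * pow(root, i * j, p) for j in range(n)) % p
            for i in range(n)]

def intt(A):                       # inverse: conjugate root, scale by 1/n
    inv_n = pow(n, -1, p)
    return [x * inv_n % p for x in ntt(A, pow(w, -1, p))]

template = [1, 2, 3, 4, 0, 0, 0, 0]
T = ntt(template, w)               # transformed ONCE, cached at enrollment

def cyclic_mul(query):             # per query: one forward NTT, not two
    Q = ntt(query, w)
    return intt([x * y % p for x, y in zip(T, Q)])

# Convolving with 5 at slot 0 scales the template by 5 (cyclically).
print(cyclic_mul([5, 0, 0, 0, 0, 0, 0, 0]))  # [5, 10, 15, 20, 0, 0, 0, 0]
```

Skipping the template's forward transform on every query is exactly the saving described above, and the same trick applies to any operand that is reused across many multiplications, including the secret key.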

Pre-computed delta*m. The encryption step computes delta times the plaintext message. H33 pre-computes this at enrollment time, removing a u128 multiplication from the encrypt hot loop.

Batch attestation. Instead of signing each authentication individually with Dilithium, H33 signs once per 32-user batch, eliminating 31 of every 32 post-quantum signatures.

What This Means in Practice

An organization sends encrypted biometric data to H33's API. The API runs a full FHE inner product, generates a ZK-STARK proof of correct computation, signs the batch with ML-DSA (Dilithium), and returns an encrypted result — all without the data ever existing in plaintext on H33's servers. The data owner decrypts locally to get the authentication decision.

This eliminates entire categories of risk. There is no plaintext in server memory to leak. There is no unencrypted data in logs. There are no decryption keys on the inference server. A complete server compromise yields only ciphertext that is computationally indistinguishable from random noise.

For regulated industries — healthcare under HIPAA, finance under PCI DSS, any organization under GDPR — this architecture satisfies data protection requirements by construction rather than by policy. The data is never exposed because the mathematics make exposure impossible.

Production Numbers

H33 sustains 2,172,518 authentications per second on a single AWS Graviton4 instance (c8g.metal-48xl, 192 vCPUs). Each authentication takes 38.5 microseconds end-to-end: a 32-user batch spends 939 microseconds on the FHE computation and 291 microseconds on Dilithium attestation, plus 0.059 microseconds per user for the cached ZK-STARK lookup, for roughly 1,232 microseconds per batch. Variance is ±0.71% over sustained 120-second runs.

The per-authentication cost is less than $0.000001. FHE-encrypted AI inference is not just technically viable — it is economically viable at any scale.

Run AI on Encrypted Data Today

One API call. FHE encryption, ZK-STARK proofs, and Dilithium signatures, all operating on data that never leaves ciphertext form.

Explore AI Data Security → · Read the Docs · Live Demo
Free tier · 1,000 operations/month · No credit card required