Machine learning models often need access to sensitive data. FHE enables ML inference on encrypted data -- the model never sees the plaintext input, yet produces correct predictions. This opens new possibilities for privacy-preserving AI.
The Private ML Challenge
Traditional ML deployment creates privacy tensions:
- Cloud ML services see all your data
- Sensitive inputs (medical, financial, personal) are exposed
- Model providers may learn from your data — a growing concern for AI companies
- Regulatory constraints limit where data can be processed
FHE ML allows you to use powerful cloud models while keeping data completely private. The server performs arbitrary computation on ciphertexts, returns encrypted results, and at no point gains access to the underlying plaintext. This is not differential privacy or secure enclaves -- it is mathematically provable confidentiality rooted in the hardness of lattice problems.
How FHE ML Works
The process involves several steps:
FHE ML Inference Flow
1. Client encrypts input with their FHE key
2. Server receives encrypted input
3. Server evaluates ML model on encrypted data
4. Server returns encrypted prediction
5. Client decrypts to get prediction
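The five steps above can be sketched end to end. This is a minimal mock, not a real FHE library: `MockCiphertext`, `encrypt`, and `decrypt` are simulations that omit keys and actual cryptography so the protocol shape stays visible; in practice you would use a real FHE library.

```python
# Mock of the FHE inference flow above. No real cryptography is performed;
# a real ciphertext is a lattice polynomial, not a wrapped Python list.

class MockCiphertext:
    def __init__(self, values):
        self._values = values  # opaque to the server in a real deployment

def encrypt(values):
    # Step 1: client encrypts input with their FHE key (simulated here)
    return MockCiphertext(list(values))

def decrypt(ct):
    # Step 5: client decrypts to recover the prediction
    return ct._values

def server_linear_model(ct, weights, bias):
    # Steps 2-4: server evaluates W.x + b homomorphically and returns an
    # encrypted prediction; it never sees the plaintext input.
    score = sum(w * v for w, v in zip(weights, ct._values)) + bias
    return MockCiphertext([score])

x = [0.5, -1.0, 2.0]          # client's sensitive input
W, b = [1.0, 2.0, 0.5], 0.1   # server's plaintext model parameters

ct_in = encrypt(x)
ct_out = server_linear_model(ct_in, W, b)
prediction = decrypt(ct_out)[0]
print(prediction)  # matches the plaintext computation 0.5 - 2.0 + 1.0 + 0.1
```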
The server performs real computation -- matrix multiplications, activations, etc. -- all on encrypted values. Because FHE supports both addition and multiplication on ciphertexts, any polynomial function can be evaluated homomorphically. Neural networks, which are fundamentally sequences of linear transforms and nonlinear activations, map naturally onto this paradigm.
Supported Operations
FHE supports operations needed for ML:
- Linear layers: Matrix multiplication via homomorphic operations
- Convolutions: Implemented as matrix operations
- Activations: Polynomial approximations of ReLU, sigmoid, etc.
- Pooling: Average pooling works directly; max pooling requires approximation
// Example: encrypted linear layer
// W is the server's plaintext weight matrix; encrypted_x is the
// encrypted input; encrypted_b is the encrypted bias
encrypted_output = W * encrypted_x + encrypted_b
// Polynomial ReLU approximation (degree 2)
// x^2/4 + x/2 + 1/4, i.e. (x + 1)^2 / 4, approximates ReLU for small x
encrypted_relu = a*encrypted_x^2 + b*encrypted_x + c  // a = 1/4, b = 1/2, c = 1/4
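The quadratic approximation above can be checked in plain Python. A small sketch (with the coefficients a = 1/4, b = 1/2, c = 1/4 from the formula) comparing the surrogate against true ReLU at a few points:

```python
def relu(x):
    return max(0.0, x)

def poly_relu(x):
    # (x + 1)^2 / 4 = x^2/4 + x/2 + 1/4, a degree-2 ReLU surrogate.
    # Only one ciphertext-ciphertext multiply (x * x), so it costs a
    # single level of multiplicative depth under FHE.
    return 0.25 * x * x + 0.5 * x + 0.25

for x in (-1.0, 0.0, 1.0):
    print(x, relu(x), poly_relu(x))
# The surrogate agrees with ReLU at x = -1 and x = 1, and is off by
# 0.25 at x = 0; the error grows outside the small-x range.
```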
Every multiplication on ciphertexts increases noise. After enough multiplications, decryption fails. This is why multiplicative depth -- the longest chain of sequential multiplications in the computation graph -- is the central constraint when designing FHE-compatible neural networks. Techniques like bootstrapping can refresh noise, but at significant computational cost.
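Because depth, not total multiplication count, is the constraint, evaluation order matters. A small illustration (not from the source): computing x^d naively takes d - 1 sequential multiplies, while repeated squaring needs only about log2(d) levels of depth.

```python
import math

def naive_depth(degree):
    # x^d as x * x * ... * x: degree - 1 sequential ciphertext multiplies
    return degree - 1

def repeated_squaring_depth(degree):
    # x^d via x -> x^2 -> x^4 -> ...: ceil(log2(d)) sequential multiplies
    return math.ceil(math.log2(degree))

for d in (2, 8, 64):
    print(d, naive_depth(d), repeated_squaring_depth(d))
# x^64 drops from 63 sequential multiplies to 6 levels of depth,
# consuming far less of the ciphertext's noise budget.
```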
Model Architecture Considerations
Some architectures work better with FHE:
FHE-Friendly:
- Shallow networks (fewer multiplications)
- Polynomial activations
- Convolutional networks
- Square activations
FHE-Challenging:
- Very deep networks
- Attention mechanisms (expensive comparisons)
- Batch normalization (division issues)
- Sparse operations
The key insight: train your model in plaintext with FHE constraints in mind. Replace ReLU with low-degree polynomial activations during training, not after; models trained this way typically lose less than 1% accuracy compared to their plaintext counterparts on standard benchmarks.
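As a concrete sketch of an FHE-friendly forward pass, here is a toy two-layer network using a square activation (one of the FHE-friendly choices listed above). The shapes and weights are illustrative assumptions, not from the source, and only inference is shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP with a square activation as an FHE-friendly
# stand-in for ReLU. Shapes are arbitrary for illustration.
W1 = rng.standard_normal((8, 4)) * 0.1   # layer 1 weights (plaintext)
b1 = np.zeros(8)
W2 = rng.standard_normal((1, 8)) * 0.1   # layer 2 weights (plaintext)
b2 = np.zeros(1)

def forward(x):
    h = W1 @ x + b1
    h = h * h            # square activation: one multiplicative level
    return W2 @ h + b2   # total multiplicative depth stays small

x = rng.standard_normal(4)
y = forward(x)
print(y.shape)  # a single scalar prediction
```

Training this network in plaintext with the square activation in place, rather than swapping it in afterward, is what keeps the accuracy loss small.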
CKKS for ML
CKKS is the preferred scheme for ML because:
- Native approximate arithmetic suits ML's tolerance for imprecision
- Efficient rescaling after multiplications
- Real number encoding matches neural network weights
- Good vectorization for batched inference
CKKS encodes vectors of real (or complex) numbers into a single polynomial, then encrypts that polynomial. A single ciphertext can hold thousands of values simultaneously, and homomorphic operations apply element-wise across the entire packed vector. This SIMD batching is critical for performance -- without it, encrypted matrix multiplication would be orders of magnitude slower.
H33 uses BFV FHE for encrypted biometric authentication, packing 32 user templates into a single ciphertext via SIMD batching (4,096 slots / 128 dimensions). The result: 2,172,518 authentications per second on Graviton4 hardware at ~42 microseconds per auth -- proving that FHE-based inference is production-viable at scale.
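The 4,096 slots / 128 dimensions packing can be simulated in numpy. This only models the data layout, not the cryptography: under real FHE, the per-template summation is done with slot rotations (Galois keys), which the `reshape(...).sum(...)` below stands in for.

```python
import numpy as np

SLOTS, DIM = 4096, 128
N_TEMPLATES = SLOTS // DIM  # 32 templates per ciphertext

rng = np.random.default_rng(1)
templates = rng.integers(0, 16, size=(N_TEMPLATES, DIM))  # stored templates
probe = rng.integers(0, 16, size=DIM)                      # fresh biometric probe

# Pack: concatenate all templates into one 4096-slot vector and tile the
# probe so it lines up slot-for-slot (simulating SIMD batching).
packed_templates = templates.reshape(-1)
packed_probe = np.tile(probe, N_TEMPLATES)

# One element-wise multiply acts on all 32 templates at once; summing
# each 128-slot block (rotations under real FHE) yields 32 inner products.
products = packed_templates * packed_probe
scores = products.reshape(N_TEMPLATES, DIM).sum(axis=1)

assert np.array_equal(scores, templates @ probe)  # matches per-template dots
```

One homomorphic multiply thus scores a probe against 32 stored templates simultaneously, which is where the throughput numbers above come from.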
BFV vs. CKKS: Choosing the Right Scheme
While CKKS dominates general-purpose ML inference, BFV excels when computation involves exact integer arithmetic -- classification labels, lookup indices, or biometric template matching. H33's authentication pipeline uses BFV with a single 56-bit modulus (N=4096, t=65537) specifically because biometric inner products are integer operations where approximate arithmetic would introduce unacceptable error in match/no-match decisions.
| Property | CKKS | BFV |
|---|---|---|
| Arithmetic type | Approximate (real/complex) | Exact (integer mod t) |
| Best for ML | Regression, CNNs, embeddings | Classification, template matching |
| Rescaling | Built-in (mod switch = rescale) | Not needed (integer domain) |
| Noise behavior | Absorbed into approximation error | Must stay below decryption threshold |
| H33 usage | H33-CKKS engine | H33-128/H33-256 biometric auth |
Performance Reality
FHE ML is slower than plaintext, but increasingly practical:
- Simple classifiers: Milliseconds
- CNNs on small images: Seconds
- Larger networks: Minutes
- With GPU acceleration: 10-100x faster
The overhead comes from two sources: ciphertext expansion (a single float becomes a polynomial thousands of elements long) and the cost of homomorphic multiplication (which involves NTT transforms, relinearization, and noise management). H33 mitigates both through aggressive optimization -- Montgomery NTT with Harvey lazy reduction eliminates division from the hot path, and NTT-domain persistence keeps intermediate results in transform space to avoid redundant forward/inverse transforms.
Latency Breakdown: What Matters Most
In practice, the performance bottleneck is almost always multiplicative operations, not additions. A single homomorphic multiply can be 100-1000x more expensive than a homomorphic add. This is why network depth (sequential multiplies) matters far more than width (parallel additions). Reducing a network from 10 layers to 5 layers roughly halves the encrypted inference time, while doubling the width of each layer adds negligible overhead.
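A back-of-envelope cost model makes the depth-versus-width trade-off concrete. The 300x multiply/add ratio below is an assumed value inside the 100-1000x range stated above, and the layer widths are arbitrary:

```python
MULT_COST = 300.0  # relative cost of one homomorphic multiply (assumed)
ADD_COST = 1.0     # relative cost of one homomorphic add (assumed)

def inference_cost(depth, width):
    # Each layer contributes one sequential multiply level; widening a
    # layer mostly adds parallel additions, which are comparatively cheap.
    return depth * MULT_COST + depth * width * ADD_COST

base = inference_cost(depth=10, width=64)
shallower = inference_cost(depth=5, width=64)
wider = inference_cost(depth=10, width=128)

print(shallower / base)  # ~0.5: halving depth roughly halves cost
print(wider / base)      # only slightly above 1: widening is cheap
```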
Optimization Hierarchy for FHE ML
1. Reduce multiplicative depth -- biggest single win
2. Use SIMD batching -- amortize ciphertext overhead across thousands of values
3. Stay in NTT domain -- avoid redundant forward/inverse transforms
4. Batch attestation -- verify results once per batch, not per inference
5. Hardware-aware tuning -- match parameters to cache hierarchy
Securing the Pipeline: Beyond Encryption
Encrypting the data is only half the story. A production FHE ML system also needs to prove that the server actually ran the correct model on the encrypted input, rather than returning a fabricated result. This is where zero-knowledge proofs enter the picture.
H33's production stack pairs FHE computation with ZKP verification and post-quantum attestation in a single API call. After the FHE batch completes (~1,109 microseconds for 32 users), an in-process DashMap lookup verifies the proof in 0.085 microseconds, and a Dilithium signature attests the result in ~244 microseconds. The entire pipeline -- encrypt, compute, verify, attest -- completes in ~42 microseconds per authentication, fully post-quantum secure.
Real-World Applications
FHE ML is being used for:
- Medical diagnosis on encrypted patient data
- Financial fraud detection without exposing transactions
- Face recognition with encrypted templates (H33's specialty)
- Sentiment analysis on encrypted text
- Credit scoring without revealing personal financial history
- Genomic analysis where patient DNA never leaves encrypted form
The common thread across all these applications is the same: the data is too sensitive to expose, the computation is too valuable to forgo, and FHE resolves the tension between the two. As parameter selection improves and hardware catches up, the performance gap between encrypted and plaintext inference will continue to narrow.
FHE ML represents the future of privacy-preserving AI -- powerful models that respect data privacy by design.
Ready to Go Quantum-Secure?
Start protecting your users with post-quantum authentication today. 1,000 free auths, no credit card required.
Get Free API Key →