Machine learning models often need access to sensitive data. FHE enables ML inference on encrypted data -- the model never sees the plaintext input, yet produces correct predictions. This opens new possibilities for privacy-preserving AI.
The Private ML Challenge
Traditional ML deployment creates privacy tensions:
- Cloud ML services see all your data
- Sensitive inputs (medical, financial, personal) are exposed
- Model providers may learn from your data — a growing concern for AI companies
- Regulatory constraints limit where data can be processed
FHE ML allows you to use powerful cloud models while keeping data completely private. The server performs arbitrary computation on ciphertexts, returns encrypted results, and at no point gains access to the underlying plaintext. This is not differential privacy or secure enclaves -- it is mathematically provable confidentiality rooted in the hardness of lattice problems.
How FHE ML Works
The process involves several steps:
FHE ML Inference Flow
1. Client encrypts input with their FHE key
2. Server receives encrypted input
3. Server evaluates ML model on encrypted data
4. Server returns encrypted prediction
5. Client decrypts to get prediction
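The five steps above can be sketched end to end. This is a minimal mock, not a real FHE library: `MockCiphertext`, `encrypt`, and `decrypt` are simulations that omit keys and actual cryptography so the protocol shape stays visible; in practice you would use a real FHE library.

```python
# Mock of the FHE inference flow above. No real cryptography is performed;
# a real ciphertext is a lattice polynomial, not a wrapped Python list.

class MockCiphertext:
    def __init__(self, values):
        self._values = values  # opaque to the server in a real deployment

def encrypt(values):
    # Step 1: client encrypts input with their FHE key (simulated here)
    return MockCiphertext(list(values))

def decrypt(ct):
    # Step 5: client decrypts to recover the prediction
    return ct._values

def server_linear_model(ct, weights, bias):
    # Steps 2-4: server evaluates W.x + b homomorphically and returns an
    # encrypted prediction; it never sees the plaintext input.
    score = sum(w * v for w, v in zip(weights, ct._values)) + bias
    return MockCiphertext([score])

x = [0.5, -1.0, 2.0]          # client's sensitive input
W, b = [1.0, 2.0, 0.5], 0.1   # server's plaintext model parameters

ct_in = encrypt(x)
ct_out = server_linear_model(ct_in, W, b)
prediction = decrypt(ct_out)[0]
print(prediction)  # matches the plaintext computation 0.5 - 2.0 + 1.0 + 0.1
```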
The server performs real computation -- matrix multiplications, activations, etc. -- all on encrypted values. Because FHE supports both addition and multiplication on ciphertexts, any polynomial function can be evaluated homomorphically. Neural networks, which are fundamentally sequences of linear transforms and nonlinear activations, map naturally onto this paradigm.
Supported Operations
FHE supports operations needed for ML:
- Linear layers: Matrix multiplication via homomorphic operations
- Convolutions: Implemented as matrix operations
- Activations: Polynomial approximations of ReLU, sigmoid, etc.
- Pooling: Average pooling works directly; max pooling requires approximation
// Example: encrypted linear layer
// W is the server's plaintext weight matrix; encrypted_x is the
// encrypted input; encrypted_b is the encrypted bias
encrypted_output = W * encrypted_x + encrypted_b
// Polynomial ReLU approximation (degree 2)
// x^2/4 + x/2 + 1/4, i.e. (x + 1)^2 / 4, approximates ReLU for small x
encrypted_relu = a*encrypted_x^2 + b*encrypted_x + c  // a = 1/4, b = 1/2, c = 1/4
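The quadratic approximation above can be checked in plain Python. A small sketch (with the coefficients a = 1/4, b = 1/2, c = 1/4 from the formula) comparing the surrogate against true ReLU at a few points:

```python
def relu(x):
    return max(0.0, x)

def poly_relu(x):
    # (x + 1)^2 / 4 = x^2/4 + x/2 + 1/4, a degree-2 ReLU surrogate.
    # Only one ciphertext-ciphertext multiply (x * x), so it costs a
    # single level of multiplicative depth under FHE.
    return 0.25 * x * x + 0.5 * x + 0.25

for x in (-1.0, 0.0, 1.0):
    print(x, relu(x), poly_relu(x))
# The surrogate agrees with ReLU at x = -1 and x = 1, and is off by
# 0.25 at x = 0; the error grows outside the small-x range.
```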
Every multiplication on ciphertexts increases noise. After enough multiplications, decryption fails. This is why multiplicative depth -- the longest chain of sequential multiplications in the computation graph -- is the central constraint when designing FHE-compatible neural networks. Techniques like bootstrapping can refresh noise, but at significant computational cost.
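Because depth, not total multiplication count, is the constraint, evaluation order matters. A small illustration (not from the source): computing x^d naively takes d - 1 sequential multiplies, while repeated squaring needs only about log2(d) levels of depth.

```python
import math

def naive_depth(degree):
    # x^d as x * x * ... * x: degree - 1 sequential ciphertext multiplies
    return degree - 1

def repeated_squaring_depth(degree):
    # x^d via x -> x^2 -> x^4 -> ...: ceil(log2(d)) sequential multiplies
    return math.ceil(math.log2(degree))

for d in (2, 8, 64):
    print(d, naive_depth(d), repeated_squaring_depth(d))
# x^64 drops from 63 sequential multiplies to 6 levels of depth,
# consuming far less of the ciphertext's noise budget.
```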
Model Architecture Considerations
Some architectures work better with FHE:
FHE-Friendly:
- Shallow networks (fewer multiplications)
- Polynomial activations
- Convolutional networks
- Square activations
FHE-Challenging:
- Very deep networks
- Attention mechanisms (expensive comparisons)
- Batch normalization (division issues)
- Sparse operations
The key insight: train your model in plaintext with FHE constraints in mind. Replace ReLU with low-degree polynomial activations during training, not after; models trained this way typically lose less than 1% accuracy compared to their plaintext counterparts on standard benchmarks.
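As a concrete sketch of an FHE-friendly forward pass, here is a toy two-layer network using a square activation (one of the FHE-friendly choices listed above). The shapes and weights are illustrative assumptions, not from the source, and only inference is shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP with a square activation as an FHE-friendly
# stand-in for ReLU. Shapes are arbitrary for illustration.
W1 = rng.standard_normal((8, 4)) * 0.1   # layer 1 weights (plaintext)
b1 = np.zeros(8)
W2 = rng.standard_normal((1, 8)) * 0.1   # layer 2 weights (plaintext)
b2 = np.zeros(1)

def forward(x):
    h = W1 @ x + b1
    h = h * h            # square activation: one multiplicative level
    return W2 @ h + b2   # total multiplicative depth stays small

x = rng.standard_normal(4)
y = forward(x)
print(y.shape)  # a single scalar prediction
```

Training this network in plaintext with the square activation in place, rather than swapping it in afterward, is what keeps the accuracy loss small.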
CKKS for ML
CKKS is the preferred scheme for ML because:
- Native approximate arithmetic suits ML's tolerance for imprecision
- Efficient rescaling after multiplications
- Real number encoding matches neural network weights
- Good vectorization for batched inference
CKKS encodes vectors of real (or complex) numbers into a single polynomial, then encrypts that polynomial. A single ciphertext can hold thousands of values simultaneously, and homomorphic operations apply element-wise across the entire packed vector. This SIMD batching is critical for performance -- without it, encrypted matrix multiplication would be orders of magnitude slower.
H33 uses BFV FHE for encrypted biometric authentication, packing 32 user templates into a single ciphertext via SIMD batching (4,096 slots / 128 dimensions). The result: 2,172,518 authentications per second on Graviton4 hardware at ~42 microseconds per auth -- proving that FHE-based inference is production-viable at scale.
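The 4,096 slots / 128 dimensions packing can be simulated in numpy. This only models the data layout, not the cryptography: under real FHE, the per-template summation is done with slot rotations (Galois keys), which the `reshape(...).sum(...)` below stands in for.

```python
import numpy as np

SLOTS, DIM = 4096, 128
N_TEMPLATES = SLOTS // DIM  # 32 templates per ciphertext

rng = np.random.default_rng(1)
templates = rng.integers(0, 16, size=(N_TEMPLATES, DIM))  # stored templates
probe = rng.integers(0, 16, size=DIM)                      # fresh biometric probe

# Pack: concatenate all templates into one 4096-slot vector and tile the
# probe so it lines up slot-for-slot (simulating SIMD batching).
packed_templates = templates.reshape(-1)
packed_probe = np.tile(probe, N_TEMPLATES)

# One element-wise multiply acts on all 32 templates at once; summing
# each 128-slot block (rotations under real FHE) yields 32 inner products.
products = packed_templates * packed_probe
scores = products.reshape(N_TEMPLATES, DIM).sum(axis=1)

assert np.array_equal(scores, templates @ probe)  # matches per-template dots
```

One homomorphic multiply thus scores a probe against 32 stored templates simultaneously, which is where the throughput numbers above come from.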
BFV vs. CKKS: Choosing the Right Scheme
While CKKS dominates general-purpose ML inference, BFV excels when computation involves exact integer arithmetic -- classification labels, lookup indices, or biometric template matching. H33's authentication pipeline uses BFV with a single 56-bit modulus (N=4096, t=65537) specifically because biometric inner products are integer operations where approximate arithmetic would introduce unacceptable error in match/no-match decisions.
| Property | CKKS | BFV |
|---|---|---|
| Arithmetic type | Approximate (real/complex) | Exact (integer mod t) |
| Best for ML | Regression, CNNs, embeddings | Classification, template matching |
| Rescaling | Built-in (mod switch = rescale) | Not needed (integer domain) |
| Noise behavior | Absorbed into approximation error | Must stay below decryption threshold |
| H33 usage | H33-CKKS engine | H33-128/H33-256 biometric auth |
Performance Reality
FHE ML is slower than plaintext, but increasingly practical:
- Simple classifiers: Milliseconds
- CNNs on small images: Seconds
- Larger networks: Minutes
- With GPU acceleration: 10-100x faster
The overhead comes from two sources: ciphertext expansion (a single float becomes a polynomial thousands of elements long) and the cost of homomorphic multiplication (which involves NTT transforms, relinearization, and noise management). H33 mitigates both through aggressive optimization -- Montgomery NTT with Harvey lazy reduction eliminates division from the hot path, and NTT-domain persistence keeps intermediate results in transform space to avoid redundant forward/inverse transforms.
Latency Breakdown: What Matters Most
In practice, the performance bottleneck is almost always multiplicative operations, not additions. A single homomorphic multiply can be 100-1000x more expensive than a homomorphic add. This is why network depth (sequential multiplies) matters far more than width (parallel additions). Reducing a network from 10 layers to 5 layers roughly halves the encrypted inference time, while doubling the width of each layer adds negligible overhead.
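A back-of-envelope cost model makes the depth-versus-width trade-off concrete. The 300x multiply/add ratio below is an assumed value inside the 100-1000x range stated above, and the layer widths are arbitrary:

```python
MULT_COST = 300.0  # relative cost of one homomorphic multiply (assumed)
ADD_COST = 1.0     # relative cost of one homomorphic add (assumed)

def inference_cost(depth, width):
    # Each layer contributes one sequential multiply level; widening a
    # layer mostly adds parallel additions, which are comparatively cheap.
    return depth * MULT_COST + depth * width * ADD_COST

base = inference_cost(depth=10, width=64)
shallower = inference_cost(depth=5, width=64)
wider = inference_cost(depth=10, width=128)

print(shallower / base)  # ~0.5: halving depth roughly halves cost
print(wider / base)      # only slightly above 1: widening is cheap
```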
Optimization Hierarchy for FHE ML
1. Reduce multiplicative depth -- biggest single win
2. Use SIMD batching -- amortize ciphertext overhead across thousands of values
3. Stay in NTT domain -- avoid redundant forward/inverse transforms
4. Batch attestation -- verify results once per batch, not per inference
5. Hardware-aware tuning -- match parameters to cache hierarchy
Securing the Pipeline: Beyond Encryption
Encrypting the data is only half the story. A production FHE ML system also needs to prove that the server actually ran the correct model on the encrypted input, rather than returning a fabricated result. This is where zero-knowledge proofs enter the picture.
H33's production stack pairs FHE computation with ZKP verification and post-quantum attestation in a single API call. After the FHE batch completes (~1,109 microseconds for 32 users), an in-process DashMap lookup verifies the proof in 0.085 microseconds, and a Dilithium signature attests the result in ~244 microseconds. The entire pipeline -- encrypt, compute, verify, attest -- completes in ~42 microseconds per authentication, fully post-quantum secure.
Real-World Applications
FHE ML is being used for:
- Medical diagnosis on encrypted patient data
- Financial fraud detection without exposing transactions
- Face recognition with encrypted templates (H33's specialty)
- Sentiment analysis on encrypted text
- Credit scoring without revealing personal financial history
- Genomic analysis where patient DNA never leaves encrypted form
The common thread across all these applications is the same: the data is too sensitive to expose, the computation is too valuable to forgo, and FHE resolves the tension between the two. As parameter selection improves and hardware catches up, the performance gap between encrypted and plaintext inference will continue to narrow.
FHE ML represents the future of privacy-preserving AI -- powerful models that respect data privacy by design.
Ready to Go Quantum-Secure?
Start protecting your users with post-quantum authentication today. 1,000 free auths, no credit card required.
Get Free API Key →