FHE · 8 min read

NTT Performance Deep Dive:
Sub-2ms FHE at Production Scale

How H33 achieves 1.85ms NTT roundtrip for N=16384. AVX-512 optimization, CKKS encoding, and polynomial multiplication benchmarks.

~50µs per auth · 1.2M/s throughput · 128-bit security · 32 users/batch

The Number Theoretic Transform (NTT) is the heart of modern FHE. Homomorphic multiplication and rotation each require multiple NTTs (addition is coefficient-wise and works directly on whichever representation the ciphertext is in). If the NTT is slow, your entire FHE system is slow.
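To make the transform concrete, here is a minimal scalar sketch of a forward and inverse NTT. It is not H33's implementation: the modulus 998244353 and generator 3 are a common textbook choice picked for illustration, the function names are hypothetical, and the transform is the plain cyclic variant rather than the negacyclic one FHE schemes typically use.

```rust
/// Illustrative NTT-friendly prime: 119 * 2^23 + 1 (an assumption, not H33's modulus).
const Q: u64 = 998_244_353;
/// 3 is a primitive root modulo Q.
const G: u64 = 3;

/// Modular exponentiation by squaring. Q < 2^30, so products fit in u64.
fn pow_mod(mut base: u64, mut exp: u64, q: u64) -> u64 {
    let mut acc = 1u64;
    base %= q;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % q;
        }
        base = base * base % q;
        exp >>= 1;
    }
    acc
}

/// In-place iterative radix-2 transform.
/// `root` must be a primitive n-th root of unity mod Q, with n a power of two.
fn ntt_in_place(a: &mut [u64], root: u64) {
    let n = a.len();
    assert!(n.is_power_of_two());
    // Bit-reversal permutation so the butterflies can run in place.
    let mut j = 0usize;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j ^= bit;
        if i < j {
            a.swap(i, j);
        }
    }
    // log2(n) stages of butterflies.
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(root, (n / len) as u64, Q);
        for start in (0..n).step_by(len) {
            let mut w = 1u64;
            for k in 0..len / 2 {
                let u = a[start + k];
                let v = a[start + k + len / 2] * w % Q;
                a[start + k] = (u + v) % Q;
                a[start + k + len / 2] = (u + Q - v) % Q;
                w = w * w_len % Q;
            }
        }
        len <<= 1;
    }
}

/// Forward transform of a power-of-two-length slice (length at most 2^23).
fn forward(a: &mut [u64]) {
    let root = pow_mod(G, (Q - 1) / a.len() as u64, Q);
    ntt_in_place(a, root);
}

/// Inverse transform: use the inverse root, then scale by n^-1 mod Q.
fn inverse(a: &mut [u64]) {
    let n = a.len() as u64;
    let root = pow_mod(G, (Q - 1) / n, Q);
    let root_inv = pow_mod(root, Q - 2, Q);
    ntt_in_place(a, root_inv);
    let n_inv = pow_mod(n, Q - 2, Q);
    for x in a.iter_mut() {
        *x = *x * n_inv % Q;
    }
}
```

A "roundtrip" in the benchmarks below corresponds to calling `forward` and then `inverse` on the same buffer, which should return the original coefficients.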

H33's January 2026 benchmarks show production-ready NTT performance: 1.85ms roundtrip for N=16384, the most common production size. Here's how we got there.

NTT 16K roundtrip: 1.85ms
NTT 4K roundtrip: 386.9µs
CKKS encode: 45.2µs
Cache hit rate: 99.2%

NTT Across All Sizes

FHE applications use different polynomial degrees based on security and precision requirements. We benchmarked every common size:

| NTT Size | Forward | Inverse | Roundtrip | Use Case |
|---|---|---|---|---|
| N=256 | 4.2 µs | 4.5 µs | 8.7 µs | Testing/Development |
| N=1024 | 21.3 µs | 22.8 µs | 44.1 µs | Simple computations |
| N=4096 | 188.5 µs | 198.4 µs | 386.9 µs | Standard security |
| N=8192 | 412 µs | 438 µs | 850 µs | Extended precision |
| N=16384 | 892 µs | 956 µs | 1.85 ms | Production default |
| N=32768 | 1.92 ms | 2.06 ms | 3.98 ms | High security |
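One way to sanity-check these figures is to compare them against the N log N cost of a radix-2 NTT. The sketch below copies the roundtrip column from the table and computes, for each adjacent pair of sizes, the measured ratio next to the ratio an N log2 N model predicts. The function names are hypothetical and the model ignores cache and memory effects.

```rust
/// Roundtrip times in microseconds, copied from the table above.
const ROUNDTRIP_US: [(u32, f64); 6] = [
    (256, 8.7),
    (1024, 44.1),
    (4096, 386.9),
    (8192, 850.0),
    (16384, 1850.0),
    (32768, 3980.0),
];

/// Ratio predicted by an N * log2(N) cost model when going from n0 to n1.
fn predicted_ratio(n0: u32, n1: u32) -> f64 {
    let cost = |n: u32| n as f64 * (n as f64).log2();
    cost(n1) / cost(n0)
}

/// (n0, n1, observed ratio, predicted ratio) for each adjacent pair of sizes.
fn scaling_report() -> Vec<(u32, u32, f64, f64)> {
    ROUNDTRIP_US
        .windows(2)
        .map(|w| {
            let (n0, t0) = w[0];
            let (n1, t1) = w[1];
            (n0, n1, t1 / t0, predicted_ratio(n0, n1))
        })
        .collect()
}
```

With the table's numbers, the measured ratios track the model closely for every step except 1024 to 4096, where the measured ratio (about 8.8) is well above the predicted 4.8. The model alone does not explain that step.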

AVX-512 Optimization

Our NTT implementation uses AVX-512 SIMD instructions to process coefficients in parallel, eight 64-bit lanes per 512-bit register. The speedup over scalar code grows with polynomial size, from 1.26x at N=1024 to 1.45x at N=16384:

| Poly Size | Scalar | AVX-512 | Speedup |
|---|---|---|---|
| N=1024 | 68.4 µs | 54.2 µs | 1.26x |
| N=4096 | 586 µs | 412 µs | 1.42x |
| N=8192 | 1.28 ms | 892 µs | 1.43x |
| N=16384 | 2.78 ms | 1.92 ms | 1.45x |

The ~1.4x speedup might seem modest, but it applies to every transform: a typical FHE operation requires 8 to 12 NTTs, so the absolute time saved per operation adds up quickly.
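The cumulative effect can be worked out from the N=16384 row of the table. The sketch below is plain arithmetic on those two published numbers; the constant and function names are hypothetical.

```rust
/// Per-NTT times in microseconds for N=16384, from the table above.
const SCALAR_US: f64 = 2780.0;
const AVX512_US: f64 = 1920.0;

/// Time saved (in microseconds) across `ntt_count` NTTs
/// when each one runs on the AVX-512 path instead of the scalar path.
fn saved_us(ntt_count: u32) -> f64 {
    ntt_count as f64 * (SCALAR_US - AVX512_US)
}

/// Speedup factor, which is the same no matter how many NTTs are run.
fn speedup() -> f64 {
    SCALAR_US / AVX512_US
}
```

For 8 NTTs this gives 6,880 µs saved (about 6.9 ms) and for 12 NTTs 10,320 µs (about 10.3 ms) per operation. The ratio itself stays at roughly 1.45x; it is the absolute saving that grows with the number of transforms.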

CKKS Encoding Performance

CKKS is the scheme of choice for approximate arithmetic on encrypted data—perfect for biometrics, ML inference, and analytics. Our encoding benchmarks:

| Operation | Slots | Time | Target |
|---|---|---|---|
| encode_real | 512 | 45.2 µs | <100 µs |
| encode_complex | 512 | 52.8 µs | <100 µs |
| decode_real | 512 | 38.6 µs | <100 µs |

512 Slots = 512 Values in Parallel

CKKS batching means a single ciphertext can hold 512 values that are all processed simultaneously. A 45.2µs encode time amortizes to just 88 nanoseconds per value.
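The same amortization applies to the other rows of the table. A small sketch of the arithmetic, using only the published times and the 512-slot count (names are hypothetical):

```rust
/// (operation, total time in microseconds) for 512 slots, from the table above.
const CKKS_512: [(&str, f64); 3] = [
    ("encode_real", 45.2),
    ("encode_complex", 52.8),
    ("decode_real", 38.6),
];
const SLOTS: f64 = 512.0;

/// Amortized cost per slot in nanoseconds.
fn per_value_ns(total_us: f64) -> f64 {
    total_us * 1000.0 / SLOTS
}

/// Amortized per-slot cost for every operation in the table.
fn per_value_report() -> Vec<(&'static str, f64)> {
    CKKS_512
        .iter()
        .map(|&(name, us)| (name, per_value_ns(us)))
        .collect()
}
```

This works out to roughly 88 ns per value for `encode_real`, 103 ns for `encode_complex`, and 75 ns for `decode_real`.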

Parameter Caching

NTT requires precomputed root-of-unity tables and other parameters. Regenerating these for every operation would be wasteful, so we cache them and reuse them across operations; the headline figures above report a 99.2% cache hit rate.
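H33's actual cache design is not shown here. As a rough illustration of the idea, the sketch below keys precomputed tables by transform size and modulus, shares them through `Arc`, and counts hits and misses. Every type and function name is hypothetical, and the table layout (successive powers of a caller-supplied root) is an assumption.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cache key: transform size and modulus.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ParamKey {
    n: usize,
    q: u64,
}

/// Precomputed powers of a root of unity (assumed layout).
struct NttTables {
    twiddles: Vec<u64>,
}

#[derive(Default)]
struct Inner {
    tables: HashMap<ParamKey, Arc<NttTables>>,
    hits: u64,
    misses: u64,
}

/// Thread-safe cache; tables are built once per key and then shared.
#[derive(Default)]
struct ParamCache {
    inner: Mutex<Inner>,
}

impl ParamCache {
    /// Return the cached tables for `key`, calling `build` only on a miss.
    fn get_or_build(
        &self,
        key: ParamKey,
        build: impl FnOnce(ParamKey) -> NttTables,
    ) -> Arc<NttTables> {
        let mut guard = self.inner.lock().unwrap();
        let inner = &mut *guard;
        if let Some(t) = inner.tables.get(&key) {
            inner.hits += 1;
            return Arc::clone(t);
        }
        inner.misses += 1;
        let t = Arc::new(build(key));
        inner.tables.insert(key, Arc::clone(&t));
        t
    }

    /// Fraction of lookups served from the cache so far.
    fn hit_rate(&self) -> f64 {
        let inner = self.inner.lock().unwrap();
        let total = inner.hits + inner.misses;
        if total == 0 {
            0.0
        } else {
            inner.hits as f64 / total as f64
        }
    }
}

/// Table of root^0 .. root^(n-1) mod q; the caller supplies a suitable root.
fn power_table(key: ParamKey, root: u64) -> NttTables {
    let mut twiddles = Vec::with_capacity(key.n);
    let mut w = 1u64;
    for _ in 0..key.n {
        twiddles.push(w);
        w = (w as u128 * root as u128 % key.q as u128) as u64;
    }
    NttTables { twiddles }
}
```

This version holds the lock while building, which keeps the code short and guarantees each table is built only once, at the cost of blocking other lookups during a miss. With a small, fixed set of parameter sizes, misses should happen only on first use of each size.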

Comparison to Competitors

How does H33 compare to other FHE providers?

| Provider | N=16384 NTT | H33 Advantage |
|---|---|---|
| Zama (Concrete) | ~15ms | 8x faster |
| SEAL (Microsoft) | ~8ms | 4x faster |
| OpenFHE | ~5ms | 2.7x faster |
| H33 | 1.85ms | Baseline |

In this comparison, our Rust implementation with AVX-512 intrinsics is faster than each library listed, including the C++ libraries SEAL and OpenFHE. We attribute the difference to the combination of Rust's zero-cost abstractions and careful memory layout optimization.
