The Number Theoretic Transform (NTT) is the heart of modern FHE. Every homomorphic multiplication and rotation requires multiple NTT transforms. If NTT is slow, your entire FHE system is slow.
H33's January 2026 benchmarks show production-ready NTT performance: 1.85ms roundtrip for N=16384, the most common production size. Here's how we got there.
NTT Across All Sizes
FHE applications use different polynomial degrees based on security and precision requirements. We benchmarked every common size:
| NTT Size | Forward | Inverse | Roundtrip | Use Case |
|---|---|---|---|---|
| N=256 | 4.2 µs | 4.5 µs | 8.7 µs | Testing/Development |
| N=1024 | 21.3 µs | 22.8 µs | 44.1 µs | Simple computations |
| N=4096 | 188.5 µs | 198.4 µs | 386.9 µs | Standard security |
| N=8192 | 412 µs | 438 µs | 850 µs | Extended precision |
| N=16384 | 892 µs | 956 µs | 1.85 ms | Production default |
| N=32768 | 1.92 ms | 2.06 ms | 3.98 ms | High security |
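The roundtrip column measures a forward NTT followed by an inverse NTT. The Rust sketch below shows that roundtrip for the negacyclic ring Z_q[X]/(X^N + 1) used by FHE schemes. It is a minimal scalar version under stated assumptions. It uses a single 30-bit prime q = 998244353 (primitive root 3) and plain `%` reduction. Production libraries, presumably including the one benchmarked here, instead use several larger RNS primes, Barrett or Montgomery reduction, and precomputed twiddle tables. The function names are illustrative and are not H33's API.

```rust
/// NTT-friendly prime q = 119 * 2^23 + 1 with primitive root 3 (assumption for
/// this sketch; production FHE uses several larger primes in RNS form).
const Q: u64 = 998_244_353;
const G: u64 = 3;

/// Modular multiply: both inputs are below Q < 2^30, so the product fits in u64.
fn mul_mod(a: u64, b: u64) -> u64 {
    a * b % Q
}

/// Square-and-multiply modular exponentiation.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= Q;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// In-place iterative Cooley-Tukey NTT (cyclic) with primitive n-th root `omega`.
fn ntt_cyclic(a: &mut [u64], omega: u64) {
    let n = a.len();
    // Bit-reversal permutation so the butterflies produce natural-order output.
    let mut j = 0;
    for i in 1..n {
        let mut bit = n >> 1;
        while j & bit != 0 {
            j ^= bit;
            bit >>= 1;
        }
        j |= bit;
        if i < j {
            a.swap(i, j);
        }
    }
    // Butterfly stages of width 2, 4, ..., n.
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(omega, (n / len) as u64);
        let half = len / 2;
        for start in (0..n).step_by(len) {
            let mut w = 1;
            for k in 0..half {
                let u = a[start + k];
                let v = mul_mod(a[start + k + half], w);
                a[start + k] = (u + v) % Q;
                a[start + k + half] = (u + Q - v) % Q;
                w = mul_mod(w, w_len);
            }
        }
        len <<= 1;
    }
}

/// Primitive 2n-th root of unity psi for the negacyclic ring Z_Q[X]/(X^n + 1).
fn psi_for(n: usize) -> u64 {
    pow_mod(G, (Q - 1) / (2 * n as u64))
}

/// Forward negacyclic NTT: twist coefficient i by psi^i, then cyclic NTT with omega = psi^2.
fn ntt_forward(a: &mut [u64]) {
    let psi = psi_for(a.len());
    let mut t = 1;
    for x in a.iter_mut() {
        *x = mul_mod(*x, t);
        t = mul_mod(t, psi);
    }
    ntt_cyclic(a, mul_mod(psi, psi));
}

/// Inverse negacyclic NTT: cyclic NTT with omega^-1, then scale coefficient i by n^-1 * psi^-i.
fn ntt_inverse(a: &mut [u64]) {
    let n = a.len();
    let psi_inv = pow_mod(psi_for(n), Q - 2); // Fermat inverse; Q is prime
    ntt_cyclic(a, mul_mod(psi_inv, psi_inv));
    let mut t = pow_mod(n as u64, Q - 2);
    for x in a.iter_mut() {
        *x = mul_mod(*x, t);
        t = mul_mod(t, psi_inv);
    }
}
```

Multiplying two forward transforms pointwise and then calling `ntt_inverse` gives their product modulo X^N + 1. This is why each ciphertext multiplication costs several transforms.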
AVX-512 Optimization
Our NTT implementation uses AVX-512 SIMD instructions to process 4 coefficients in parallel. The speedup over scalar code is consistent across sizes:
| Poly Size | Scalar | AVX-512 | Speedup |
|---|---|---|---|
| N=1024 | 68.4 µs | 54.2 µs | 1.26x |
| N=4096 | 586 µs | 412 µs | 1.42x |
| N=8192 | 1.28 ms | 892 µs | 1.43x |
| N=16384 | 2.78 ms | 1.92 ms | 1.45x |
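The ~1.4x speedup might seem modest. It does not multiply across transforms, but the absolute savings add up. A typical FHE operation requires 8 to 12 NTTs, so at N=16384 the roughly 0.86 ms saved per transform comes to about 7 to 10 ms per operation.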
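Each transform gets the same speedup, so the speedup on the NTT share of an operation stays at about 1.45x. What grows with the number of transforms is the absolute time saved. The quick check below uses the N=16384 row of the table. It assumes that every transform in the operation runs at that size. The helper names are illustrative.

```rust
/// Per-transform timings at N=16384 from the AVX-512 table (milliseconds).
const SCALAR_MS: f64 = 2.78;
const AVX512_MS: f64 = 1.92;

/// Time saved on one operation that performs `ntt_count` transforms,
/// assuming each one is an N=16384 NTT with the table's per-NTT saving.
fn savings_per_op_ms(ntt_count: u32) -> f64 {
    (SCALAR_MS - AVX512_MS) * ntt_count as f64
}

/// Speedup on the NTT portion: independent of how many transforms run.
fn ntt_speedup() -> f64 {
    SCALAR_MS / AVX512_MS
}
```

For example, `savings_per_op_ms(8)` is about 6.88 ms and `savings_per_op_ms(12)` is about 10.32 ms, while `ntt_speedup()` stays near 1.45.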
CKKS Encoding Performance
CKKS is the scheme of choice for approximate arithmetic on encrypted data, which makes it well suited to biometrics, ML inference, and analytics. Our encoding benchmarks:
| Operation | Slots | Time | Target |
|---|---|---|---|
| encode_real | 512 | 45.2 µs | <100 µs |
| encode_complex | 512 | 52.8 µs | <100 µs |
| decode_real | 512 | 38.6 µs | <100 µs |
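The Rust sketch below shows what `encode_real` and `decode_real` compute. It is a naive O(N²) CKKS encoder and decoder built on the canonical embedding, and it rests on several assumptions:

- Slots use the conventional ordering, with slot j at the root ζ^(5^j), where ζ = e^(iπ/N).
- The caller chooses the scale Δ.
- There is no RNS decomposition.

Real encoders replace the double loop with an FFT-style transform. `C64`, `encode`, and `decode` are hypothetical names.

```rust
use std::f64::consts::PI;

/// Minimal complex value for slot data (avoids an external crate in this sketch).
#[derive(Clone, Copy, Debug)]
struct C64 {
    re: f64,
    im: f64,
}

/// Exponents e_j = 5^j mod 2n for j in 0..n/2: slot j lives at the root
/// xi_j = exp(i * pi * e_j / n) of X^n + 1 (the usual CKKS slot ordering).
fn slot_exponents(n: usize) -> Vec<usize> {
    let mut e = 1;
    (0..n / 2)
        .map(|_| {
            let cur = e;
            e = e * 5 % (2 * n);
            cur
        })
        .collect()
}

/// Naive O(n^2) encode: m_i = round((2/n) * Re(sum_j delta * z_j * xi_j^-i)),
/// so that m(xi_j) ~= delta * z_j in every slot.
fn encode(z: &[C64], n: usize, delta: f64) -> Vec<i64> {
    assert_eq!(z.len(), n / 2, "CKKS packs n/2 complex slots");
    let exps = slot_exponents(n);
    (0..n)
        .map(|i| {
            let mut acc = 0.0;
            for (zj, &e) in z.iter().zip(&exps) {
                // Angle of xi_j^-i, reduced mod 2n first to keep the argument small.
                let theta = -PI * ((e * i) % (2 * n)) as f64 / n as f64;
                acc += zj.re * theta.cos() - zj.im * theta.sin();
            }
            (2.0 * delta * acc / n as f64).round() as i64
        })
        .collect()
}

/// Decode: evaluate m at each slot root and divide out the scale delta.
fn decode(m: &[i64], delta: f64) -> Vec<C64> {
    let n = m.len();
    slot_exponents(n)
        .iter()
        .map(|&e| {
            let (mut re, mut im) = (0.0, 0.0);
            for (i, &mi) in m.iter().enumerate() {
                let theta = PI * ((e * i) % (2 * n)) as f64 / n as f64;
                re += mi as f64 * theta.cos();
                im += mi as f64 * theta.sin();
            }
            C64 { re: re / delta, im: im / delta }
        })
        .collect()
}
```

Encoding real values is the special case where every `im` is zero. Complex values go through the same routine.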
512 Slots = 512 Values in Parallel
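With CKKS batching, a ring of degree N packs up to N/2 values into one ciphertext. The 512-slot figures above therefore correspond to N=1024 at full packing, and every homomorphic operation acts on all 512 values at once. A 45.2 µs encode time amortizes to about 88 nanoseconds per value.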
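The amortization is plain division. The small helper below makes the per-value cost of each table row explicit. It assumes full packing (N/2 slots), the standard CKKS layout. `slots_for` and `per_value_ns` are illustrative names.

```rust
/// Full CKKS packing: a ring of degree n holds n/2 complex slots.
fn slots_for(n: u32) -> u32 {
    n / 2
}

/// Amortized cost per packed value in nanoseconds, given one batch call in microseconds.
fn per_value_ns(batch_us: f64, slots: u32) -> f64 {
    batch_us * 1_000.0 / slots as f64
}
```

Applied to the other rows, `encode_complex` works out to about 103 ns per value and `decode_real` to about 75 ns per value.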
Parameter Caching
NTT requires precomputed root-of-unity tables and other parameters. Regenerating these for every operation would be wasteful. Our caching architecture:
- Pre-warmed sizes: 1024, 4096, 8192, 16384, 32768 are always hot
- Cache hit rate: 99.2% in production
- Redis backend: Parameters persist across process restarts
- Memory footprint: ~50MB for all common sizes
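The Rust sketch below shows the in-process half of this caching idea. It assumes that tables are keyed by N alone and that they hold only ψ power tables for a single 30-bit prime. Real parameter sets carry more, such as RNS primes, bit-reversed twiddles, and reduction constants. The Redis persistence layer is deliberately left out. `NttTables`, `tables_for`, and `prewarm` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// NTT-friendly prime and primitive root (assumption for this sketch).
const Q: u64 = 998_244_353;
const G: u64 = 3;

/// Sizes listed above as always pre-warmed.
const PREWARM: [usize; 5] = [1024, 4096, 8192, 16384, 32768];

/// Hypothetical per-size parameter set: powers of a primitive 2n-th root psi and of psi^-1.
struct NttTables {
    n: usize,
    psi_pows: Vec<u64>,
    psi_inv_pows: Vec<u64>,
}

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= Q;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % Q;
        }
        base = base * base % Q;
        exp >>= 1;
    }
    acc
}

impl NttTables {
    /// The expensive step the cache avoids repeating.
    fn build(n: usize) -> NttTables {
        let psi = pow_mod(G, (Q - 1) / (2 * n as u64));
        let psi_inv = pow_mod(psi, Q - 2);
        let powers = |root: u64| -> Vec<u64> {
            let mut acc = 1;
            (0..n)
                .map(|_| {
                    let cur = acc;
                    acc = acc * root % Q;
                    cur
                })
                .collect()
        };
        NttTables { n, psi_pows: powers(psi), psi_inv_pows: powers(psi_inv) }
    }
}

/// Process-wide cache, created on first use.
fn cache() -> &'static Mutex<HashMap<usize, Arc<NttTables>>> {
    static CACHE: OnceLock<Mutex<HashMap<usize, Arc<NttTables>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Return shared tables for size n, building them only on a miss.
fn tables_for(n: usize) -> Arc<NttTables> {
    let mut map = cache().lock().unwrap();
    map.entry(n).or_insert_with(|| Arc::new(NttTables::build(n))).clone()
}

/// Build every pre-warmed size up front so hot-path lookups hit the cache.
fn prewarm() {
    for &n in &PREWARM {
        tables_for(n);
    }
}
```

Holding the mutex while building keeps the sketch simple, but it blocks other lookups during a miss. Pre-warming moves that cost to startup.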
Comparison to Competitors
How does H33 compare to other FHE libraries?
| Provider | N=16384 NTT | H33 Advantage |
|---|---|---|
| Zama (Concrete) | ~15ms | 8x faster |
| SEAL (Microsoft) | ~8ms | 4x faster |
| OpenFHE | ~5ms | 2.7x faster |
| H33 | 1.85ms | Baseline |
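In these measurements, our Rust implementation with AVX-512 intrinsics consistently outperforms the C++ libraries SEAL and OpenFHE. We attribute the gap to explicit AVX-512 vectorization and careful memory layout. Rust's zero-cost abstractions let us keep that code high-level without paying for it at runtime.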
Experience Production FHE Performance
Run encrypted computations at scale with sub-2ms latency.