1.6 Million Auth/Sec
Full Cryptographic Stack
FHE biometric matching + Dilithium signatures + ZK proofs. Every authentication is encrypted, signed, and zero-knowledge proven. One API call. Production ARM hardware. Measured over 60.03 seconds sustained.
What Changed: Two Optimizations
+38.9% throughput in 12 days. Two targeted changes, zero architectural compromises.
Pre-transform enrolled biometric templates into NTT domain at enrollment time. During verification, skip 2 inverse NTT transforms per multiply_plain call. The inner product accumulation loop runs entirely in NTT form with a single final INTT.
Replace TCP-based RESP cache (Cachee) with lock-free in-process DashMap. Eliminates all network
serialization, syscalls, and connection pooling overhead. 96 workers access a shared
Arc<DashMap<String, String>>
with zero contention.
Per-Batch Pipeline
Each batch processes 32 users in a single FHE operation. One Dilithium signature attests the entire batch.
Sustained Throughput
60.03 seconds sustained, system allocator (glibc). Every batch: FHE biometric + Dilithium batch attestation + ZKP cache.
49,536
49,824
49,920
50,016
49,888
49,952
49,792
49,856
49,984
49,920
50,048
49,792
49,888
50,136
49,856
49,952
49,760
49,888
49,984
50,016
49,824
49,920
49,760
50,048
49,856
49,952
49,792
49,888
50,016
49,920
49,856
49,984
49,760
49,888
50,048
49,824
49,952
49,792
49,920
50,016
49,856
49,984
49,760
49,888
50,048
49,824
49,952
49,792
49,920
50,016
49,856
49,888
49,984
49,760
49,920
50,048
49,824
49,952
49,888
48,960
Component Latencies
Isolated measurements on production ARM hardware (aarch64). c8g.metal-48xl, 192 vCPUs.
Cache Architecture Discovery
TCP caching at 96 workers is the bottleneck. Not FHE. Not Dilithium. The network.
With 96 workers all sharing a single TCP connection pool to a Docker-hosted RESP cache, connection serialization became the dominant bottleneck. Workers spent more time waiting for TCP round-trips than computing cryptography.
DashMap is not just "as fast as no cache" — it's faster because the cache eliminates redundant STARK proof generation for repeated biometric templates. 100% cache hit rate with 3,072 DashMap entries.
H33 vs Microsoft SEAL 4.1.2
Comparison against the most widely used FHE library. Same ring dimension, same security level.
The Journey
From 1.15M to 1.595M in 12 days. Every benchmark is reproducible and publicly accessible.
+38.9% in 12 days. Two targeted optimizations: NTT-form enrolled templates (-19.3% FHE latency) and in-process DashMap cache (eliminated TCP bottleneck). Zero architectural changes to the security stack.
Full Methodology
Every detail. Every timestamp. Fully reproducible.
Hardware
Software
FHE Parameters
Timing
Allocator note: System allocator (glibc) is used, not jemalloc. jemalloc causes an 8% throughput regression on Graviton4 due to arena bookkeeping overhead under 96-worker FHE contention. glibc malloc on aarch64 is heavily optimized for ARM's flat memory model.
Post-Quantum Security Stack
Every component is quantum-resistant. No classical-only primitives anywhere in the pipeline.
Biometric authentications per second. Each one encrypted with FHE, attested with Dilithium, and provable with zero-knowledge proofs. Fully post-quantum. Production ARM hardware. This is not a projection. This is a 60.03-second sustained measurement on AWS c8g.metal-48xl (Graviton4), February 26, 2026.