Benchmarks · 5 min read

January 2026 Benchmark Report:
H33 Performance Analysis

Complete benchmark analysis of H33's January 2026 performance: 1.28ms full auth, 42µs session resume, 8.6M/s batch authentication throughput, and more.

Auth/sec: 2.17M/s
Per Auth: ~42µs
CPU Cores: 96
Platform: Graviton4

Our January 2026 benchmark suite represents the most comprehensive performance analysis we've published (see the live benchmarks page for the latest numbers). Testing was conducted on AWS c8g.metal-48xl instances with AWS Graviton4 (Neoverse V2) processors, measuring production-representative workloads across all authentication modes. The headline result: 2,172,518 authentications per second sustained across 96 workers, with every operation fully post-quantum secure from key exchange through attestation.

Full Auth: 1.28ms
Session Resume: 42µs
Batch Auth: 8.6M/s
Cache Speedup: 67x

Test Infrastructure

Benchmark Environment

Instance: AWS c8g.metal-48xl
CPU: AWS Graviton4 (Neoverse V2, 96 cores)
Memory: 377 GiB DDR5
OS: Amazon Linux 2023
H33 Version: 2.4.0

All benchmarks were run with warm caches where applicable, representing typical production conditions. Cold-start measurements are noted separately. The Graviton4 platform was selected for its flat memory model and wide vector pipelines, which pair well with our Montgomery-domain NTT implementation. We use the system allocator rather than jemalloc—on this architecture, glibc malloc is heavily optimized for ARM's memory hierarchy, and jemalloc's arena bookkeeping introduces measurable overhead under tight 96-worker FHE loops.

Full Stack Authentication

Full Stack Auth combines biometric verification, FHE-encrypted matching, ZK proof generation, and Dilithium attestation into a single API call. Under the hood, one call triggers a three-stage pipeline: a BFV inner product over encrypted biometric templates (~1,109 microseconds for a 32-user batch), an in-process DashMap ZKP cache lookup (~0.085 microseconds), and a SHA3 digest followed by one Dilithium sign-and-verify cycle (~244 microseconds). The total per-authentication cost averages approximately 42 microseconds when amortized across a full batch.
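
The amortized figure follows from the three stage timings quoted above. A minimal sketch of the arithmetic, using only numbers from this report; the variable names are illustrative, and the throughput line is an idealized ceiling that assumes all 96 workers run independent batches back to back with no contention:

```python
# Stage timings for one 32-user batch, in microseconds (taken from the text above).
FHE_BATCH_US = 1109.0    # BFV inner product over encrypted templates
ZKP_LOOKUP_US = 0.085    # in-process DashMap cache lookup
ATTEST_US = 244.0        # SHA3 digest plus one Dilithium sign-and-verify
USERS_PER_BATCH = 32
WORKERS = 96

batch_us = FHE_BATCH_US + ZKP_LOOKUP_US + ATTEST_US
per_auth_us = batch_us / USERS_PER_BATCH

# Idealized ceiling: every worker finishes batches back to back, no contention.
ideal_auths_per_sec = WORKERS * USERS_PER_BATCH / (batch_us * 1e-6)

print(f"batch: {batch_us:.3f} us, per auth: {per_auth_us:.1f} us")
print(f"ideal throughput: {ideal_auths_per_sec / 1e6:.2f} M auth/s")
```

The per-authentication cost comes out at about 42.3 microseconds, and the idealized ceiling at about 2.27 million authentications per second, so the measured 2,172,518 per second is roughly 96% of that ceiling.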

| Mode | Latency | Description |
| --- | --- | --- |
| Turbo | 1.28ms | Optimized for speed, full security |
| Standard | 633µs | Balanced performance and features |
| Precision | 2.1ms | Maximum accuracy, extended checks |

Key Insight

Every mode is fully post-quantum secure. The latency difference between Turbo and Precision comes from the number of NTT polynomial multiplications and the depth of the biometric comparison circuit—not from any reduction in cryptographic strength.

The BFV Pipeline: Why N=4096 Matters

H33 uses the BFV (Brakerski/Fan-Vercauteren) fully homomorphic encryption scheme with a polynomial degree of N=4096, a single 56-bit modulus Q, and a plaintext modulus t=65537. This parameter set was deliberately chosen for authentication workloads: it satisfies the CRT batching condition (t is congruent to 1 mod 2N), which enables SIMD-style packing of 4,096 plaintext slots. Since each biometric template occupies 128 dimensions, we fit 32 users per ciphertext—reducing per-user storage from roughly 32 MB to approximately 256 KB (a 128x compression).
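
The parameter relationships in this paragraph can be checked in a few lines. This is only a sketch: `supports_batching` and `pack_templates` are hypothetical helper names, and the contiguous slot layout is an assumption made for illustration (real BFV encoders typically arrange slots differently).

```python
N = 4096       # polynomial degree
T = 65537      # plaintext modulus (prime)
DIMS = 128     # dimensions per biometric template


def supports_batching(n: int, t: int) -> bool:
    """CRT batching needs a prime t with t = 1 (mod 2n); only the congruence is checked here."""
    return t % (2 * n) == 1


def pack_templates(templates, n=N, dims=DIMS, t=T):
    """Lay templates end to end in one slot vector, zero-padding unused slots.

    The contiguous layout is an illustrative assumption, not H33's actual encoding.
    """
    capacity = n // dims
    if len(templates) > capacity:
        raise ValueError(f"at most {capacity} templates fit in one ciphertext")
    slots = [0] * n
    for user, template in enumerate(templates):
        if len(template) != dims:
            raise ValueError("every template must have exactly `dims` entries")
        slots[user * dims:(user + 1) * dims] = [v % t for v in template]
    return slots


print(supports_batching(N, T), N // DIMS)
```

With N=4096 and 128-dimensional templates the capacity is 4096 / 128 = 32 users per ciphertext, which is where the batch size used throughout this report comes from.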

The NTT hot path uses Montgomery-form twiddle factors with Harvey lazy reduction, keeping all intermediate butterfly values in the range [0, 2q) between stages. This eliminates division from the inner loop entirely. Enrolled templates are stored in NTT form at enrollment time so that multiply_plain_accumulate never needs a forward NTT during verification. The result: FHE batch processing for 32 users completes in roughly 1,109 microseconds, down 19.3% from the previous baseline of 1,375 microseconds.
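
The bounded-range property can be illustrated on a single butterfly. The sketch below follows the lazy butterfly from Harvey's NTT work, but with two deliberate simplifications: it uses a precomputed Shoup-style quotient instead of Montgomery form, and a small well-known NTT prime instead of the 56-bit production modulus. The function names are illustrative and not taken from H33.

```python
BETA_BITS = 64
BETA = 1 << BETA_BITS      # models a 64-bit machine word
Q = 998244353              # example NTT-friendly prime (119 * 2**23 + 1), not H33's modulus


def shoup_precompute(w: int, q: int = Q) -> int:
    """floor(w * beta / q): the only division, done once per twiddle outside the hot loop."""
    return (w << BETA_BITS) // q


def lazy_butterfly(x: int, y: int, w: int, w_shoup: int, q: int = Q):
    """Map (x, y) to (x + y, w * (x - y)) mod q, with inputs and outputs in [0, 2q)."""
    two_q = 2 * q
    s = x + y
    if s >= two_q:                            # one conditional subtraction
        s -= two_q
    t = x - y + two_q                         # shifted into (0, 4q) to stay non-negative
    quotient = (w_shoup * t) >> BETA_BITS     # high word of a 64x64-bit multiply
    d = (w * t - quotient * q) & (BETA - 1)   # low-word arithmetic, lands in [0, 2q)
    return s, d
```

Neither output is fully reduced modulo q; both are only guaranteed to lie below 2q, which is what allows the next stage to consume them directly. The bound holds as long as 4q fits in the machine word.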

Session Management

Session operations show the benefit of our context caching architecture:

| Operation | Latency | Speedup |
| --- | --- | --- |
| Session Resume | 42µs | 4.4x vs Full Auth |
| Incremental Auth (5% delta) | <50µs | 4.4x vs Full Auth |
| Session Validation | 12µs | 18x vs Full Auth |

Session resume avoids re-running the full FHE pipeline by retaining the ZKP cache entry from the original authentication. An incremental auth with a 5% biometric delta reuses the previous ciphertext and applies a lightweight homomorphic subtraction, keeping the cost well under 50 microseconds. Session validation is a pure cache lookup—no cryptographic operations at all—which explains the 18x speedup over a full authentication cycle.

Proof Operations

ZK proof performance across generation, verification, and caching:

| Operation | Latency | Notes |
| --- | --- | --- |
| Proof Generation | 1.28ms | Dilithium3 signatures |
| Proof Verification (cold) | 2.14ms | First verification |
| Proof Verification (cached) | 32µs | 67x speedup |
| Biometric ZK Proof | 260µs | FHE + ZK combined |

The 67x cache speedup is achieved via an in-process DashMap that replaces our previous TCP-based cache proxy. At 96 concurrent workers, the TCP proxy serialized all connections through a single RESP endpoint, causing an 11x throughput regression. The in-process DashMap delivers 0.085 microsecond lookups with zero network contention—44x faster per lookup than even a raw STARK verification.
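
The cold-versus-cached distinction can be sketched as a memoizing wrapper around a verifier. This is an illustration only: `ProofCache` and the `verify` callable are hypothetical, and a plain Python dict stands in for DashMap, which is a sharded concurrent map in Rust.

```python
import hashlib


class ProofCache:
    """In-process cache of verification results, keyed by the SHA3-256 digest of the proof."""

    def __init__(self, verify):
        self._verify = verify     # expensive cold verifier (hypothetical callable)
        self._results = {}        # digest -> bool; stand-in for a concurrent map
        self.cold_calls = 0

    def check(self, proof: bytes) -> bool:
        key = hashlib.sha3_256(proof).digest()
        cached = self._results.get(key)
        if cached is not None:
            return cached         # cached path: no cryptographic verification
        self.cold_calls += 1
        result = self._verify(proof)
        self._results[key] = result
        return result


cache = ProofCache(lambda proof: proof.startswith(b"valid"))
print(cache.check(b"valid-proof"), cache.check(b"valid-proof"), cache.cold_calls)
```

Because the map lives in the worker's own address space, a cached check involves only a hash and a lookup, with no socket or serialization step.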

Batch Processing

Batch operations demonstrate sub-linear scaling for high-throughput scenarios. The key optimization here is batch attestation: instead of signing and verifying each user individually, H33 computes a single Dilithium sign+verify pair for the entire batch of 32 users. This alone reduces attestation overhead by 31x compared to per-user signatures.

| Batch Size | Total Latency | Per-User |
| --- | --- | --- |
| 10 users | 12µs | 1.2µs |
| 100 users | 45µs | 0.45µs |
| 1,000 users | 116µs | 0.116µs |
| 10,000 users | 890µs | 0.089µs |
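
The batch attestation idea described above, one signature over a digest of the whole batch, can be sketched as follows. The `sign` callable is a placeholder for a Dilithium signer (none ships in the Python standard library), and the length-prefixed digest encoding is an assumption made for illustration, not H33's wire format.

```python
import hashlib


def batch_digest(results) -> bytes:
    """SHA3-256 over (user_id, matched) pairs, length-prefixed to avoid ambiguity."""
    h = hashlib.sha3_256()
    for user_id, matched in results:
        h.update(len(user_id).to_bytes(2, "big"))
        h.update(user_id)
        h.update(b"\x01" if matched else b"\x00")
    return h.digest()


def attest_per_user(results, sign):
    """Baseline: one signature per user."""
    return [sign(batch_digest([r])) for r in results]


def attest_batch(results, sign):
    """Batch attestation: one signature covers every result in the batch."""
    return sign(batch_digest(results))


calls = []


def counting_sign(digest: bytes) -> bytes:
    """Stand-in signer that only records how often it is invoked."""
    calls.append(digest)
    return digest[::-1]


batch = [(f"user{i}".encode(), True) for i in range(32)]
attest_batch(batch, counting_sign)
print(len(calls))
```

For a 32-user batch the per-user path invokes the signer 32 times and the batch path once; the trade-off is that a verifier needs the full batch of results to recompute the digest.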

At 1,000 users in 116µs, that works out to roughly 8.6 million authentications per second on a single node. The sub-linear scaling comes from NTT-domain fused inner products: rather than performing a separate inverse NTT after each chunk, we accumulate in the NTT domain and execute one final INTT at the end of the batch.
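
Fusing works because the transform is linear: the inverse NTT of a sum equals the sum of the inverse NTTs. The toy sketch below demonstrates this with a naive O(n²) transform over a tiny prime. It uses cyclic rather than negacyclic convolution and transforms both operands on the fly, so it illustrates the principle only and does not mirror H33's kernels.

```python
Q = 17     # toy prime modulus
N = 8      # transform length
W = 2      # primitive 8th root of unity mod 17 (2**4 = 16 = -1 mod 17)


def ntt(a, root=W):
    """Naive O(n^2) number-theoretic transform, written for clarity rather than speed."""
    return [sum(a[j] * pow(root, j * k, Q) for j in range(N)) % Q for k in range(N)]


def intt(a):
    """Inverse transform: run the NTT with the inverse root, then scale by 1/N."""
    inv_n = pow(N, Q - 2, Q)
    inv_w = pow(W, Q - 2, Q)
    return [(inv_n * x) % Q for x in ntt(a, inv_w)]


def fused_inner_product(pairs):
    """Accumulate pointwise products in the NTT domain, then run a single inverse NTT."""
    acc = [0] * N
    for a, b in pairs:
        fa, fb = ntt(a), ntt(b)
        acc = [(s + x * y) % Q for s, x, y in zip(acc, fa, fb)]
    return intt(acc)


def unfused_inner_product(pairs):
    """Reference path: one inverse NTT per chunk, summed in the coefficient domain."""
    total = [0] * N
    for a, b in pairs:
        prod = intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])
        total = [(s + p) % Q for s, p in zip(total, prod)]
    return total
```

Both functions return the same coefficients, but the fused version runs one inverse transform regardless of how many chunks are accumulated.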

Batch ZKP

| Batch Size | Total Latency | vs Sequential |
| --- | --- | --- |
| 10 proofs | 4.2ms | 64% faster |
| 100 proofs | 35ms | 73% faster |
| 1,000 proofs | 310ms | 77% faster |

FHE Operations

Fully Homomorphic Encryption performance for biometric matching:

| Operation | Latency |
| --- | --- |
| Template Encryption | 85µs |
| Encrypted Matching | 260µs |
| Result Decryption | 45µs |
| End-to-End FHE Auth | 260µs |

Template encryption uses parallel NTT across all moduli via Rayon, with batch CBD sampling (one RNG call per 10 coefficients) cutting noise generation time by 5x. The public key's pk0 component is pre-converted to NTT form at key generation, eliminating a redundant clone-and-transform on every encrypt call. Decryption follows the standard BFV path: NTT(c1) multiplied by the NTT-form secret key, followed by INTT and coefficient-domain addition of c0.
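
One way to get ten coefficients from a single RNG call is to slice a 64-bit word into 6-bit chunks. The sketch below assumes a centered binomial distribution with η = 3 (so 2η = 6 bits per coefficient, and ten coefficients use 60 of the 64 bits); the report does not state η, so treat that value, the function name, and the use of Python's non-cryptographic `random.Random` as illustrative assumptions.

```python
import random

ETA = 3                                    # assumed CBD parameter, not stated in the report
BITS_PER_COEFF = 2 * ETA                   # 6 bits: ETA bits for each half
COEFFS_PER_DRAW = 64 // BITS_PER_COEFF     # 10 coefficients per 64-bit draw
MASK = (1 << ETA) - 1


def cbd_batch(rng: random.Random, count: int):
    """Sample `count` centered-binomial coefficients, 10 per 64-bit RNG call."""
    out = []
    while len(out) < count:
        word = rng.getrandbits(64)         # one RNG call feeds up to 10 coefficients
        for _ in range(COEFFS_PER_DRAW):
            a = word & MASK
            b = (word >> ETA) & MASK
            # popcount(a) - popcount(b) lies in [-ETA, ETA] and is centered on zero
            out.append(bin(a).count("1") - bin(b).count("1"))
            word >>= BITS_PER_COEFF
            if len(out) == count:
                break
    return out


noise = cbd_batch(random.Random(1), 4096)
print(min(noise), max(noise))
```

A production sampler would draw from a cryptographically secure generator; the seeded `random.Random` here is only for reproducibility of the sketch.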

Key Insight

The multiply_plain_ntt() optimization skips 2×M inverse NTT transforms per call (where M is the number of CRT moduli). Since ciphertexts remain in NTT form, partial_decrypt_crt skips the forward NTT on c1, and combine_crt applies the INTT to c0 only once at the very end. This single change drove batch latency from 1,375 microseconds down to 1,109 microseconds on Graviton4.

Memory and CPU Utilization

Resource consumption under sustained load:

The low memory footprint is a direct consequence of SIMD batching. Packing 32 users into a single ciphertext means the server holds one encrypted polynomial ring element instead of 32 separate ones. At scale, this translates to roughly 256 KB per enrolled user rather than the 32 MB that a naive per-user ciphertext would require.

Methodology

All benchmarks follow these principles:

Reproducing These Results

You can reproduce these benchmarks with your own H33 API key:

npm install @h33/benchmark-suite
h33-benchmark --api-key YOUR_KEY --suite full

The benchmark suite includes all tests documented here and outputs comparable metrics for your infrastructure. For Graviton4-specific profiling, pass --platform graviton4 to enable ARM-native NTT paths and NEON-accelerated Galois operations.
