Performance · 5 min read

Key Pool Architecture: 104x Speedup for Post-Quantum Crypto

How H33 achieves 104x faster key generation through pre-computed key pools. From 36.6µs to 0.35µs per operation.

Auth/sec: 2.17M/s
Per Auth: ~42µs
CPU Cores: 96
Platform: Graviton4

Post-quantum key generation is expensive. ML-KEM (Kyber) keygen takes 33.3µs. Dilithium keygen takes 36.6µs. For high-throughput applications processing millions of requests per second, these microseconds add up fast.

Our solution: pre-generated key pools. By generating keys in advance during idle time, we reduce effective keygen latency to sub-microsecond levels—a 79-104x improvement.

Signature Pool Speedup: 104x
Dilithium Pool Speedup: 96x
ML-KEM Pool Speedup: 79x

The Problem with On-Demand Keygen

Consider a typical authentication flow:

  1. User initiates authentication
  2. Generate ephemeral Dilithium keypair (36.6µs)
  3. Sign challenge (45µs)
  4. Verify signature (36.9µs)
  5. Generate ML-KEM keypair for session (33.3µs)
  6. Establish encrypted channel (72.6µs)

Key generation alone accounts for about 70µs (36.6µs + 33.3µs), roughly 31% of the 224.4µs flow above. At scale, this becomes the bottleneck. In H33's production pipeline, every authentication must complete BFV fully homomorphic encryption over 128-dimensional biometric vectors, a ZKP lookup verification, and a Dilithium attestation signature, all in ~42µs per user. There is no budget for on-demand keygen in that hot path. See our production benchmarks for the full throughput analysis.
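The share of time spent on key generation can be checked directly from the step latencies listed in the flow. The sketch below only restates that arithmetic; the constant and function names are illustrative.

```rust
/// Per-step latencies in microseconds, copied from the authentication flow.
/// The boolean marks the steps that are key generation.
const STEPS: [(&str, f64, bool); 5] = [
    ("Dilithium keygen", 36.6, true),
    ("Sign challenge", 45.0, false),
    ("Verify signature", 36.9, false),
    ("ML-KEM keygen", 33.3, true),
    ("Establish encrypted channel", 72.6, false),
];

/// Returns (keygen_us, total_us, keygen_share_of_total).
fn keygen_share() -> (f64, f64, f64) {
    let total: f64 = STEPS.iter().map(|s| s.1).sum();
    let keygen: f64 = STEPS.iter().filter(|s| s.2).map(|s| s.1).sum();
    (keygen, total, keygen / total)
}
```

Calling `keygen_share()` gives 69.9µs of keygen out of 224.4µs in total, a share of a little over 31%.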

Key Insight

Post-quantum key generation is dominated by lattice sampling: generating structured noise from centered binomial distributions, then applying Number Theoretic Transforms. These operations are CPU-intensive but independent of user identity—which means they can be pre-computed without any loss of security.

How Key Pools Work

The key insight is that cryptographic key generation doesn't need user-specific input. Keys are random—we can generate them in advance.

// Traditional approach: generate on demand
let keypair = dilithium::keygen();  // 36.6µs

// Pool approach: grab pre-generated key
let keypair = key_pool.acquire();   // 0.35µs

The acquire() call is a single atomic pop from a lock-free MPMC (multi-producer, multi-consumer) queue. No mutex contention, no allocation, no lattice sampling. The cost is dominated by one CAS (compare-and-swap) instruction plus a cache-line transfer, with the CAS retried if another core wins the race. In practice this is typically under 400 nanoseconds, even under heavy contention across 96 Graviton4 cores.
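One detail worth making explicit is what the hot path does when the pool happens to be empty. A minimal sketch, assuming a fallback to on-demand generation: `SimplePool`, `put`, and `acquire_or_else` are hypothetical names, and a `Mutex`-guarded `VecDeque` stands in for the lock-free queue so that the example needs only the standard library. It illustrates the control flow, not the performance.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical pool type for illustration only. A Mutex-guarded VecDeque
/// stands in for the lock-free MPMC queue described in the article.
pub struct SimplePool<K> {
    ready: Mutex<VecDeque<K>>,
}

impl<K> SimplePool<K> {
    pub fn new() -> Self {
        Self { ready: Mutex::new(VecDeque::new()) }
    }

    /// Adds a pre-generated key (called off the hot path).
    pub fn put(&self, key: K) {
        self.ready.lock().unwrap().push_back(key);
    }

    /// Number of keys currently available.
    pub fn len(&self) -> usize {
        self.ready.lock().unwrap().len()
    }

    /// Hot path: pop a ready key, or fall back to on-demand generation
    /// when the pool is empty. The lock is released before `generate` runs.
    pub fn acquire_or_else(&self, generate: impl FnOnce() -> K) -> K {
        let popped = self.ready.lock().unwrap().pop_front();
        popped.unwrap_or_else(generate)
    }
}
```

With this shape, an exhausted pool degrades to the direct keygen latency instead of blocking the caller.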

Our pool architecture:

Pool Internals: Replenishment Strategy

A naive pool risks two failure modes: exhaustion under burst traffic, and wasted CPU cycles pre-generating keys that expire unused. H33 solves both with a watermark-based replenishment strategy.

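use std::sync::Arc;
use std::time::Duration;
use crossbeam_queue::ArrayQueue; // assumption: ArrayQueue comes from the crossbeam-queue crate

struct KeyPool<K> {
    queue: Arc<ArrayQueue<K>>,  // lock-free bounded queue
    high_watermark: usize,       // stop urgent replenishment once the queue reaches this size
    low_watermark: usize,        // trigger urgent replenishment
    ttl: Duration,               // max key age before discard (enforcement not shown here)
}

// Background replenisher (per-core pinned thread)
fn replenish_loop(pool: &KeyPool<DilithiumKeypair>) {
    loop {
        if pool.queue.len() < pool.low_watermark {
            // Urgent: generate in a tight loop until the high watermark is reached
            while pool.queue.len() < pool.high_watermark {
                let kp = dilithium::keygen(); // 36.6µs each
                // push fails only if the queue is already full; the extra key is dropped
                pool.queue.push(kp).ok();
            }
        }
        // Otherwise sleep briefly before checking the queue depth again
        std::thread::park_timeout(Duration::from_millis(1));
    }
}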
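The two watermarks in the loop above form a hysteresis band: generation starts below the low watermark and continues until the high watermark is reached, so the replenisher does not flap on and off around a single threshold. A minimal sketch of that decision as a pure function (the function name is hypothetical):

```rust
/// Decides whether the replenisher should be generating keys, given the
/// current queue length and whether it was already generating.
fn next_filling_state(len: usize, low: usize, high: usize, filling: bool) -> bool {
    if len < low {
        true // below the low watermark: start (or keep) filling
    } else if len >= high {
        false // reached the high watermark: stop
    } else {
        filling // inside the band: keep doing whatever we were doing
    }
}
```

Between the two watermarks the answer depends on history, which is what keeps each refill running in one long batch.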

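On a 96-core Graviton4 instance, H33 dedicates 4 cores to background key generation. Each core produces approximately 27,300 Dilithium keypairs per second, or about 109,000 keys/second across the four cores. With a pool capacity of 50,000 keys per type and a low watermark at 10,000, a full pool holds 40,000 keys above the low watermark. That is enough to cover about 370ms of demand at 109,000 keys/second (40,000 / 109,000 ≈ 0.37s) before the pool even reaches the low watermark with no replenishment at all, and longer once background generation is counted. This is far longer than real-world burst durations at 2.17M auth/sec, where most authentications reuse session keys.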
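The burst figures can be sanity-checked with a small calculation. The sketch below assumes a simple linear model (constant demand and supply rates, pool starting full); the function name and the model are illustrative, not H33's implementation.

```rust
/// Seconds until a full pool falls to its low watermark when demand exceeds
/// supply. Returns None when supply keeps up and the pool never drains.
fn seconds_to_low_watermark(
    capacity: u64,
    low_watermark: u64,
    demand_per_sec: f64,
    supply_per_sec: f64,
) -> Option<f64> {
    let net_drain = demand_per_sec - supply_per_sec;
    if net_drain <= 0.0 {
        return None;
    }
    let headroom = capacity.saturating_sub(low_watermark) as f64;
    Some(headroom / net_drain)
}
```

Under this model, `seconds_to_low_watermark(50_000, 10_000, 109_000.0, 0.0)` is about 0.37 seconds, which matches the 370ms figure when no replenishment is credited. With the four replenisher cores supplying 4 × 27,300 keys/second, the same demand gives `None`, because supply slightly exceeds demand.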

Benchmark Results

January 2026 benchmarks on AWS c8g.metal-48xl (AWS Graviton4, 96 cores):

| Operation | Direct | Pool | Speedup |
|---|---|---|---|
| ML-KEM Keygen | 33.3 µs | 0.42 µs | 79x |
| Dilithium Keygen | 36.6 µs | 0.38 µs | 96x |
| Signature Pool Keygen | 36.6 µs | 0.35 µs | 104x |

Throughput Impact

The throughput improvements are dramatic:

| Operation | Single Thread | With Pool | 64-Core Max |
|---|---|---|---|
| ML-KEM Keygen | 30,030 ops/sec | 2.38M ops/sec | 152M ops/sec |
| Dilithium Keygen | 27,322 ops/sec | 2.63M ops/sec | 168M ops/sec |
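The figures in the two tables follow from the per-operation latencies. The sketch below reproduces them, assuming ideal linear scaling across cores for the 64-core column; the function names are illustrative.

```rust
/// Operations per second for one thread at a given per-operation latency (µs).
fn ops_per_sec(latency_us: f64) -> f64 {
    1_000_000.0 / latency_us
}

/// Speedup of a pooled acquire over direct key generation.
fn speedup(direct_us: f64, pool_us: f64) -> f64 {
    direct_us / pool_us
}

/// Aggregate rate under ideal linear scaling; ignores cross-core contention.
fn ideal_aggregate(latency_us: f64, cores: u32) -> f64 {
    ops_per_sec(latency_us) * cores as f64
}
```

For example, `ops_per_sec(33.3)` is about 30,030, `speedup(36.6, 0.35)` is about 104.6, and `ideal_aggregate(0.42, 64)` is about 152 million.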

152 Million Keys Per Second

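With key pooling and 64 cores, H33 can hand out up to 152 million ML-KEM keypairs per second from the pool (64 × 2.38M acquires/sec). This is a peak acquire rate: sustained throughput is still limited by how fast background threads refill the pool, at about 30,030 ML-KEM keypairs per second per generating core. For bursty authentication traffic, that is enough headroom for virtually any application.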

Integration with the H33 Auth Pipeline

Key pools are not a standalone optimization—they are tightly integrated into H33's full authentication stack. Each authentication request flows through three stages: BFV fully homomorphic encryption (N=4096, batching 32 users per ciphertext), a ZKP STARK lookup for identity verification, and a Dilithium attestation signature. The total pipeline completes in ~42µs per user on Graviton4.

Without key pools, the Dilithium keygen for attestation would add 36.6µs—nearly doubling the per-auth latency. With pools, that cost drops to 0.35µs, keeping key generation below 1% of the total pipeline budget. This is what makes 2.17M authentications per second possible on a single c8g.metal-48xl instance.

Key Insight

Key pooling converts a latency-bound operation (36.6µs synchronous keygen) into a throughput-bound operation (background generation on idle cores). The hot path never waits for lattice sampling, NTTs, or CSPRNG draws. It simply pops a pre-built keypair off a lock-free queue.
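The split between background generation and a cheap hot path can be demonstrated with the standard library alone. This is a sketch under stated assumptions: `slow_keygen` is a placeholder for a real lattice keygen, `start_pool` is a hypothetical name, and `std::sync::mpsc::sync_channel` (bounded, but single-consumer) stands in for the lock-free MPMC queue.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Placeholder for an expensive keygen call. The u64 "key" is a stand-in
/// and has no cryptographic meaning.
fn slow_keygen(seed: u64) -> u64 {
    seed.wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407)
}

/// Spawns a background producer that keeps a bounded buffer of keys full.
/// `send` blocks while the buffer is full, so the producer idles on its own.
fn start_pool(capacity: usize) -> Receiver<u64> {
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        let mut seed = 0u64;
        // The loop ends when the receiver is dropped and `send` returns Err.
        while tx.send(slow_keygen(seed)).is_ok() {
            seed += 1;
        }
    });
    rx
}
```

A caller would hold the `Receiver` and use `try_recv()` on the hot path, so it never executes `slow_keygen` itself; the generation cost is paid entirely on the background thread.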

Security Considerations

Pre-generating keys doesn't compromise security:

Additionally, pool keys carry a time-to-live (TTL). Any key that remains in the pool beyond its TTL window is discarded and replaced. This bounds the exposure window and ensures that even if an attacker could observe pool state at time t, keys generated before t − TTL are already gone. Combined with mlock() to prevent swap-to-disk and madvise(MADV_DONTDUMP) to exclude keys from core dumps, the pool's memory surface is hardened against both local and physical adversaries.
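TTL enforcement can be done lazily at acquire time. The sketch below assumes each pooled key is stamped with its creation time and checked on pop; `TtlPool`, `put_at`, and `acquire_at` are hypothetical names, and the pool is single-threaded for clarity. Time is passed in explicitly so the behavior is easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// A pooled key stamped with its creation time.
struct Stamped<K> {
    key: K,
    created: Instant,
}

/// Hypothetical TTL-aware pool, single-threaded for clarity.
struct TtlPool<K> {
    entries: VecDeque<Stamped<K>>,
    ttl: Duration,
}

impl<K> TtlPool<K> {
    fn new(ttl: Duration) -> Self {
        Self { entries: VecDeque::new(), ttl }
    }

    /// Stores a key together with the time it was generated.
    fn put_at(&mut self, key: K, created: Instant) {
        self.entries.push_back(Stamped { key, created });
    }

    /// Pops the oldest key still within its TTL at `now`, discarding any
    /// expired keys found ahead of it.
    fn acquire_at(&mut self, now: Instant) -> Option<K> {
        while let Some(entry) = self.entries.pop_front() {
            if now.duration_since(entry.created) <= self.ttl {
                return Some(entry.key);
            }
            // Expired: the entry is dropped here. A real implementation
            // would also wipe the key bytes before freeing them.
        }
        None
    }
}
```

Because keys are stored in generation order, expired entries always sit at the front of the queue, so each acquire discards at most the stale prefix.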

When to Use Key Pools

Key pools are most beneficial when:

For long-term identity keys that are generated once and used repeatedly, direct generation is fine—the 33-37µs overhead happens only once.

The pattern also generalizes beyond post-quantum primitives. Any cryptographic operation that is stateless and expensive—RSA keygen (orders of magnitude slower), elliptic curve point generation, even FHE parameter setup—benefits from the same pre-computation approach. The lock-free queue is the universal accelerator; the lattice math is just the most compelling use case because the per-key cost is high enough to matter at scale, but low enough that a small number of background threads can keep the pool saturated.

Experience 104x Faster Key Generation

Key pooling is enabled by default on all H33 API endpoints.
