Engineering · NIST · 9 min read

Post-Quantum Key Exchange
ML-KEM/Kyber in Production

ML-KEM (FIPS 203) is the NIST-standardized post-quantum key encapsulation mechanism that replaces ECDH and RSA key exchange. This is a deep technical dive into how it works, what the performance looks like in production, and what you need to know to implement it correctly.

Why Key Exchange Is the First Thing to Migrate

Of all the cryptographic primitives in your stack, key exchange is the most urgent to migrate to post-quantum algorithms. The reason is straightforward: key exchange is the only primitive vulnerable to the Harvest Now, Decrypt Later attack. Digital signatures can be forged only in real-time (an attacker needs a quantum computer at the moment of forgery). But key exchange sessions recorded today can be decrypted retroactively once a quantum computer is available.

This is why NIST prioritized ML-KEM (originally CRYSTALS-Kyber) as the first post-quantum standard to finalize. FIPS 203 was published in August 2024, and CNSA 2.0 mandates its adoption for National Security Systems by January 2027. The message from every standards body is the same: migrate key exchange first, migrate it now.

How Lattice-Based Key Encapsulation Works

ML-KEM is based on the Module Learning With Errors (MLWE) problem, which is a structured variant of the Learning With Errors (LWE) problem introduced by Oded Regev in 2005. The security assumption is that given a matrix A and a vector b = As + e (where s is a secret vector and e is a small error vector), it is computationally infeasible to recover s -- even with a quantum computer.

The "Module" in MLWE means the matrix entries are elements of a polynomial ring R_q = Z_q[X]/(X^n + 1), rather than individual integers. This provides a compact representation that keeps key sizes manageable while maintaining strong security guarantees. The ring structure enables efficient Number Theoretic Transform (NTT) operations for polynomial multiplication, which is where most of the computational work happens.
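To make the ring arithmetic concrete, here is a minimal Python sketch of schoolbook multiplication in Z_q[X]/(X^N + 1), using the ML-KEM modulus but a toy degree N = 8. The NTT mentioned above computes the same product in O(N log N); this version only illustrates the wrap-around rule X^N = -1.

```python
Q = 3329  # the ML-KEM modulus
N = 8     # toy degree; ML-KEM uses n = 256

def poly_mul(a, b):
    """Schoolbook multiplication in Z_q[X]/(X^N + 1).

    Products that overflow degree N wrap around with a sign flip,
    because X^N = -1 in this ring. A real implementation replaces
    this O(N^2) loop with an O(N log N) NTT."""
    out = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                out[k] = (out[k] + a[i] * b[j]) % Q
            else:  # X^(i+j) = -X^(i+j-N), since X^N = -1
                out[k - N] = (out[k - N] - a[i] * b[j]) % Q
    return out

# Multiplying X by X^(N-1) wraps the top coefficient to -1 = Q - 1.
x = [0] * N; x[1] = 1          # the polynomial X
top = [0] * N; top[N - 1] = 1  # the polynomial X^(N-1)
print(poly_mul(x, top))        # → [3328, 0, 0, 0, 0, 0, 0, 0]
```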

The key encapsulation flow works in three steps:

  1. KeyGen: Generate a random matrix A (from a seed), a secret vector s, and an error vector e. Compute the public key t = As + e. The public key is (A, t), with A represented in practice by its 32-byte seed; the secret key is s.
  2. Encapsulate: The sender samples a random message m, derives randomness from H(m), and computes ciphertext components using the public key. The shared secret K is derived from m and H(ciphertext). The ciphertext is sent to the key holder.
  3. Decapsulate: The key holder uses their secret key to recover m from the ciphertext, re-derives the randomness, re-encapsulates, and checks that the result matches. If it matches, K is output as the shared secret. If not, a pseudorandom rejection value is returned (Fujisaki-Okamoto transform for CCA2 security).

Why not just Diffie-Hellman on a lattice? Unlike classical DH, there is no known efficient way to do a non-interactive key exchange on lattices. ML-KEM is a KEM (Key Encapsulation Mechanism), not a key agreement protocol. This means the flow is asymmetric: one party encapsulates, the other decapsulates. In TLS, this maps naturally to the client encapsulating with the server's public key.
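The three-step flow, including the Fujisaki-Okamoto re-encryption check, can be sketched with a deliberately insecure toy: plain LWE rather than MLWE, tiny parameters, an 8-bit message, and ad-hoc serialization. Every name below is hypothetical illustration, not H33's API or the FIPS 203 algorithm.

```python
import hashlib, random

Q, N, M = 3329, 8, 16          # toy sizes, nowhere near a secure parameter set

def keygen(seed=0):
    rng = random.Random(seed)   # demo RNG; real code needs a CSPRNG
    A = [[rng.randrange(Q) for _ in range(N)] for _ in range(M)]
    s = [rng.randrange(Q) for _ in range(N)]
    e = [rng.randrange(-2, 3) for _ in range(M)]    # small error vector
    b = [(sum(A[i][j] * s[j] for j in range(N)) + e[i]) % Q for i in range(M)]
    return (A, b), s            # public key (A, b = As + e), secret key s

def _rbits(m, k):
    # Deterministic per-bit randomness derived from the message (FO style)
    h = hashlib.shake_128(m + bytes([k])).digest(M)
    return [x & 1 for x in h]

def _enc_bit(pk, bit, r):
    A, b = pk                   # encode the bit in the high half of Z_q
    u = [sum(A[i][j] for i in range(M) if r[i]) % Q for j in range(N)]
    v = (sum(b[i] for i in range(M) if r[i]) + bit * (Q // 2)) % Q
    return u, v

def _dec_bit(s, ct):
    u, v = ct
    d = (v - sum(u[j] * s[j] for j in range(N))) % Q
    return 1 if Q // 4 < d < 3 * Q // 4 else 0   # decode by rounding

def _serialize(cts):
    return b"".join(bytes(str(ct), "ascii") for ct in cts)

def encapsulate(pk):
    m = random.randbytes(1)     # toy 8-bit message; ML-KEM uses 32 bytes
    bits = [(m[0] >> k) & 1 for k in range(8)]
    cts = [_enc_bit(pk, bit, _rbits(m, k)) for k, bit in enumerate(bits)]
    K = hashlib.sha3_256(m + _serialize(cts)).digest()
    return cts, K

def decapsulate(pk, s, cts, z=b"reject-seed"):
    bits = [_dec_bit(s, ct) for ct in cts]
    m = bytes([sum(bit << k for k, bit in enumerate(bits))])
    rects = [_enc_bit(pk, bit, _rbits(m, k)) for k, bit in enumerate(bits)]
    if rects == cts:            # FO check; must be constant-time in real code
        return hashlib.sha3_256(m + _serialize(cts)).digest()
    return hashlib.sha3_256(z + _serialize(cts)).digest()  # implicit rejection

pk, sk = keygen()
ct, K_sender = encapsulate(pk)
assert decapsulate(pk, sk, ct) == K_sender
```

Note that a tampered ciphertext fails the re-encryption check and yields an unrelated pseudorandom value instead of an error, which is exactly the implicit-rejection behavior the FO transform provides.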

Parameter Sets: ML-KEM-512, 768, and 1024

FIPS 203 defines three parameter sets, corresponding to NIST security levels 1, 3, and 5:

| Parameter Set | Security Level | Classical Equivalent | Public Key | Ciphertext | Shared Secret |
|---|---|---|---|---|---|
| ML-KEM-512 | Level 1 | AES-128 | 800 bytes | 768 bytes | 32 bytes |
| ML-KEM-768 | Level 3 | AES-192 | 1,184 bytes | 1,088 bytes | 32 bytes |
| ML-KEM-1024 | Level 5 | AES-256 | 1,568 bytes | 1,568 bytes | 32 bytes |

For comparison, X25519 (the most common classical key exchange in TLS 1.3) uses 32-byte public keys and produces 32-byte shared secrets. The key size increase is significant -- ML-KEM-768 public keys are 37x larger than X25519. This has real implications for bandwidth, especially in high-frequency API environments or IoT deployments with constrained links.

H33 uses ML-KEM-768 as the default for all API key exchange operations, providing NIST Level 3 security (equivalent to AES-192 against classical attacks, and resistant to all known quantum algorithms). For customers requiring NIST Level 5 compliance (defense, intelligence, critical infrastructure), ML-KEM-1024 is available via configuration.

Performance: ML-KEM vs. ECDH in Production

One of the most common concerns about post-quantum migration is performance. The good news: ML-KEM is fast. In many benchmarks, it is actually faster than ECDH for the computational operations, though the larger key and ciphertext sizes add bandwidth overhead.

| Operation | X25519 (ECDH) | ML-KEM-768 | Difference |
|---|---|---|---|
| Key Generation | ~50 µs | ~30 µs | ML-KEM 40% faster |
| Encapsulation / DH | ~120 µs | ~40 µs | ML-KEM 67% faster |
| Decapsulation | ~120 µs | ~45 µs | ML-KEM 63% faster |
| Public Key Size | 32 bytes | 1,184 bytes | 37x larger |
| Ciphertext Size | 32 bytes | 1,088 bytes | 34x larger |
| Combined Wire Overhead | 64 bytes | 2,272 bytes | +2,208 bytes |

The computational performance advantage of ML-KEM comes from the NTT-based polynomial arithmetic, which maps efficiently to modern CPU architectures. On ARM platforms like AWS Graviton4, the NTT operations benefit from the wide pipelines and high memory bandwidth. On x86_64, AVX2 and AVX-512 provide additional vectorization opportunities for the modular arithmetic.

The 6-14% TLS handshake overhead reported in recent large-scale studies (including Cloudflare's and Google's hybrid PQ deployments) comes primarily from the additional bytes on the wire, not from computational cost. On fast links (datacenter-to-datacenter, broadband), the overhead is at the lower end. On constrained links (mobile, satellite, IoT), the overhead can be higher due to the impact of additional round-trip bytes on congestion windows.

Hybrid Mode: ML-KEM + X25519 for the Transition Period

During the transition period, many deployments use hybrid key exchange that combines ML-KEM with X25519. The shared secret is derived from both the classical and post-quantum key exchanges, so the connection is secure as long as either algorithm remains unbroken. This provides a safety net: if a flaw is discovered in ML-KEM, the classical X25519 component still protects the session (against classical adversaries). If quantum computers arrive, the ML-KEM component protects against quantum attack.
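A minimal sketch of the combining step, assuming an HKDF-style extract-then-expand construction. TLS's actual key schedule differs (draft-ietf-tls-hybrid-design feeds the concatenated shares into the existing HKDF-based schedule), so the function name, labels, and input ordering here are illustrative only:

```python
import hmac, hashlib, os

def combine_hybrid(ss_ecdh: bytes, ss_mlkem: bytes, transcript: bytes) -> bytes:
    """Derive one session secret from both shares: the output is
    unpredictable as long as either input share is unpredictable."""
    ikm = ss_ecdh + ss_mlkem                    # concatenated shared secrets
    prk = hmac.new(transcript, ikm, hashlib.sha256).digest()  # HKDF-Extract
    return hmac.new(prk, b"hybrid key expand" + b"\x01",
                    hashlib.sha256).digest()    # HKDF-Expand, one block

ss_classical = os.urandom(32)   # stand-in for the X25519 shared secret
ss_pq = os.urandom(32)          # stand-in for the ML-KEM-768 shared secret
key = combine_hybrid(ss_classical, ss_pq, b"handshake transcript hash")
assert len(key) == 32
```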

TLS hybrid key exchange is defined in draft-ietf-tls-hybrid-design and is already deployed at scale by Cloudflare (X25519Kyber768Draft00) and Google (X25519Kyber768). Chrome and Firefox both support hybrid PQ key exchange in production builds.

H33 supports hybrid mode for TLS termination but recommends pure ML-KEM for API-to-API communication where both endpoints are under your control. The hybrid overhead is small (~2,300 additional bytes per handshake), but for high-frequency API calls with session resumption, eliminating the classical component simplifies the key schedule and reduces the attack surface.

Implementation Considerations

Key Sizes and Session Management

The larger key sizes in ML-KEM have implications beyond bandwidth. TLS session tickets and resumption tokens that embed key material will be larger. DNS-based certificate retrieval (DANE) and OCSP stapling payloads increase. If your infrastructure uses UDP-based protocols (QUIC, DTLS), the larger handshake may exceed typical MTU sizes and require fragmentation.

H33 mitigates these issues through aggressive session resumption. After the initial ML-KEM handshake establishes a shared secret, subsequent API calls within the session window use symmetric-key authenticated encryption (AES-256-GCM) derived from the PQ-established key. The PQ handshake cost is amortized across hundreds or thousands of API calls per session.
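The amortization pattern can be sketched as follows. This is an assumption-laden illustration, not H33's wire format: the production AEAD is AES-256-GCM, but Python's standard library has no AES, so per-call protection is shown with HMAC-SHA256 instead.

```python
import hmac, hashlib, os

class Session:
    """Amortize one expensive ML-KEM handshake across many API calls.

    Sketch only: the session-key derivation label and framing below are
    hypothetical, and HMAC-SHA256 stands in for an AEAD."""
    def __init__(self, pq_shared_secret: bytes):
        # Derive a symmetric session key once from the PQ-established secret
        self.key = hashlib.sha3_256(b"session-key" + pq_shared_secret).digest()
        self.counter = 0

    def protect(self, payload: bytes) -> bytes:
        # Cheap symmetric work per call; no new handshake needed
        self.counter += 1
        ctr = self.counter.to_bytes(8, "big")
        tag = hmac.new(self.key, ctr + payload, hashlib.sha256).digest()
        return ctr + tag + payload   # 8-byte counter || 32-byte tag || data

sess = Session(os.urandom(32))   # in practice, the ML-KEM shared secret
frames = [sess.protect(b"GET /v1/auth") for _ in range(3)]
```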

Constant-Time Implementation

ML-KEM implementations must be constant-time to prevent side-channel attacks. The Fujisaki-Okamoto transform in the decapsulation step is particularly sensitive: the comparison between the re-encapsulated ciphertext and the received ciphertext must not leak timing information, and the rejection path (returning a pseudorandom value instead of the real shared secret) must be indistinguishable from the success path in execution time.
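The comparison pattern looks like this. Python is used here only to illustrate the shape of the code; CPython gives no timing guarantees, so real implementations use `hmac.compare_digest` or constant-time primitives in Rust or C:

```python
def ct_eq(a: bytes, b: bytes) -> bool:
    """Compare without early exit: accumulate differences with bitwise OR
    so every byte position is always examined, regardless of where the
    first mismatch occurs."""
    if len(a) != len(b):
        return False
    acc = 0
    for x, y in zip(a, b):
        acc |= x ^ y       # nonzero iff any byte differs
    return acc == 0

assert ct_eq(b"\x01\x02", b"\x01\x02")
assert not ct_eq(b"\x01\x02", b"\x01\x03")
```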

H33's ML-KEM implementation is written in pure Rust with no external dependencies. All comparison operations use constant-time primitives (bitwise OR accumulation, not early-exit comparison). Noise sampling uses the Centered Binomial Distribution (CBD), which consumes a fixed number of bits per coefficient and is therefore rejection-free by construction, with batched RNG calls -- one call per 10 coefficients -- eliminating variable-time sampling loops.
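CBD sampling for η = 2 (the value ML-KEM-768 uses for its noise) can be sketched as below. The bit-to-coefficient mapping is simplified relative to FIPS 203's exact byte ordering, but the key property holds: every 4 input bits yield exactly one coefficient, so running time does not depend on the data.

```python
import hashlib

def cbd_eta2(prf_bytes: bytes, n: int):
    """Sample n coefficients from the centered binomial distribution
    CBD_2: each coefficient is (sum of 2 bits) - (sum of 2 bits), giving
    values in [-2, 2] with no rejection loop."""
    bits = [(byte >> i) & 1 for byte in prf_bytes for i in range(8)]
    coeffs = []
    for k in range(n):
        b = bits[4 * k : 4 * k + 4]          # 2*eta = 4 bits per coefficient
        coeffs.append((b[0] + b[1]) - (b[2] + b[3]))
    return coeffs

# In FIPS 203 the PRF is SHAKE-256 over (seed || nonce); 256 coefficients
# at 4 bits each need exactly 128 bytes of PRF output.
stream = hashlib.shake_256(b"noise seed" + b"\x00").digest(128)
cs = cbd_eta2(stream, 256)
assert len(cs) == 256 and all(-2 <= c <= 2 for c in cs)
```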

Seed Expansion and Deterministic Key Generation

ML-KEM generates the public matrix A from a 32-byte seed using SHAKE-128 (an extendable output function from the SHA-3 family). This means the matrix does not need to be transmitted or stored -- both parties can regenerate it from the seed. This is a significant space optimization: the matrix A would otherwise be k*k*n*log2(q) bits (roughly 3.5 KB for ML-KEM-768), but the seed is always 32 bytes.
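The expansion of one matrix entry can be sketched in the style of FIPS 203's SampleNTT: squeeze SHAKE-128, split each 3-byte group into two 12-bit candidates, and keep only those below q. The domain-separation byte ordering here is simplified relative to the spec.

```python
import hashlib

Q = 3329

def sample_poly(seed: bytes, row: int, col: int, n: int = 256):
    """Expand one entry of the matrix A from a 32-byte seed.

    Uniform sampling mod q via rejection: candidates >= q are discarded.
    This loop IS variable-time, which is acceptable here because the
    matrix A is public -- only secret-dependent code must be constant-time."""
    xof = hashlib.shake_128(seed + bytes([col, row]))
    stream = xof.digest(1024)            # squeeze a generous buffer
    coeffs, i = [], 0
    while len(coeffs) < n and i + 3 <= len(stream):
        b0, b1, b2 = stream[i : i + 3]
        i += 3
        d1 = b0 + 256 * (b1 % 16)        # low 12 bits of the 3-byte group
        d2 = b1 // 16 + 16 * b2          # high 12 bits
        for d in (d1, d2):
            if d < Q and len(coeffs) < n:
                coeffs.append(d)         # accept; values >= q are rejected
    return coeffs

a00 = sample_poly(b"\x00" * 32, 0, 0)
assert len(a00) == 256 and all(0 <= c < Q for c in a00)
```

Because the output is a deterministic function of the seed, expanded matrices can be cached and reused, which is the basis of the optimization described below.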

However, the seed expansion is a non-trivial computational cost. In H33's production benchmarks on Graviton4, SHAKE-128 expansion accounts for approximately 15% of the total keygen time. We pre-expand and cache matrices for frequently-used parameter sets, reducing repeated keygen operations to a single CBD sampling plus NTT transform.

H33's Pure Rust ML-KEM Implementation

H33 implements ML-KEM natively, without wrapping OpenSSL, liboqs, or any external library. The implementation sits in our src/pqc/kyber.rs module alongside the hybrid key exchange logic in src/pqc/hybrid.rs. The key design decisions (constant-time comparison primitives, rejection-free CBD noise sampling, and cached matrix expansion) are described in the sections above.

The result is ML-KEM-768 encapsulation in under 40 microseconds on Graviton4, with the full key exchange adding less than 100 microseconds to an API call. At H33's production throughput of 2.17 million authentications per second, the ML-KEM overhead is invisible in the pipeline -- the BFV FHE operations dominate at 939 microseconds per 32-user batch.

Ready to implement post-quantum key exchange? H33 handles ML-KEM for you on every API call. No library integration, no parameter tuning, no constant-time verification. Read the API docs or start with the free tier.

Further Reading