ML-KEM and ML-DSA Implementation Guide

NIST finalized FIPS 203 (ML-KEM) and FIPS 204 (ML-DSA) in August 2024. This guide covers the practical side of implementing both post-quantum algorithms: parameter selection, key management, performance, and common pitfalls.

FIPS 203: ML-KEM Overview

ML-KEM (Module-Lattice-Based Key-Encapsulation Mechanism), formerly CRYSTALS-Kyber, is NIST's standardized post-quantum key encapsulation mechanism, intended to replace ECDH and RSA-based key establishment. Its security relies on the hardness of the Module Learning With Errors (MLWE) problem, which is believed to resist both classical and quantum attacks.

ML-KEM Parameter Comparison

Parameter	ML-KEM-512	ML-KEM-768	ML-KEM-1024
Security Level	NIST Level 1	NIST Level 3	NIST Level 5
Public Key Size	800 bytes	1,184 bytes	1,568 bytes
Ciphertext Size	768 bytes	1,088 bytes	1,568 bytes
Shared Secret	32 bytes	32 bytes	32 bytes

For most applications, ML-KEM-768 (Level 3) provides the best balance of security and performance.
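As a rough illustration of parameter selection, the table above can be held as data and queried instead of hardcoding sizes throughout an application. This is a minimal sketch: the class and function names (`MlKemParams`, `handshake_overhead`, `smallest_set_for_level`) are hypothetical and belong to no particular library; only the sizes come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MlKemParams:
    """One ML-KEM parameter set (hypothetical helper type)."""
    name: str
    nist_level: int
    public_key_bytes: int
    ciphertext_bytes: int
    shared_secret_bytes: int = 32


# Sizes copied from the comparison table above.
ML_KEM_PARAMS = {
    p.name: p
    for p in (
        MlKemParams("ML-KEM-512", 1, 800, 768),
        MlKemParams("ML-KEM-768", 3, 1184, 1088),
        MlKemParams("ML-KEM-1024", 5, 1568, 1568),
    )
}


def handshake_overhead(name: str) -> int:
    """Bytes on the wire for one public key plus one ciphertext."""
    p = ML_KEM_PARAMS[name]
    return p.public_key_bytes + p.ciphertext_bytes


def smallest_set_for_level(min_level: int) -> MlKemParams:
    """Pick the lowest-level parameter set that meets a required NIST level."""
    candidates = [p for p in ML_KEM_PARAMS.values() if p.nist_level >= min_level]
    if not candidates:
        raise ValueError(f"no ML-KEM parameter set reaches level {min_level}")
    return min(candidates, key=lambda p: p.nist_level)
```

For example, `handshake_overhead("ML-KEM-768")` adds the 1,184-byte public key and the 1,088-byte ciphertext, which is the per-handshake cost to budget for in a protocol message.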

FIPS 204: ML-DSA Overview

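ML-DSA (Module-Lattice-Based Digital Signature Algorithm), formerly CRYSTALS-Dilithium, is NIST's standardized post-quantum signature scheme, intended to replace RSA, ECDSA, and Ed25519 signatures. It follows the "Fiat-Shamir with Aborts" paradigm: the signer retries with a fresh masking value until the candidate signature passes checks that ensure it leaks nothing about the private key.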

ML-DSA Parameter Comparison

Parameter	ML-DSA-44	ML-DSA-65	ML-DSA-87
Security Level	NIST Level 2	NIST Level 3	NIST Level 5
Public Key	1,312 bytes	1,952 bytes	2,592 bytes
Signature	2,420 bytes	3,309 bytes	4,627 bytes

Implementation Considerations

Constant-Time Implementation

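Both ML-KEM and ML-DSA must be implemented in constant time with respect to secret data to prevent timing side-channel attacks. The critical operations are polynomial multiplication, the ciphertext comparison in ML-KEM decapsulation, and every other private-key operation. ML-DSA signing is a special case: its rejection-sampling loop runs a variable number of iterations by design, so the requirement there is that the rejection checks and loop timing reveal nothing about the private key.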
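The ciphertext comparison in decapsulation is usually followed by a branch-free choice between the real shared secret and a rejection value. The sketch below shows that pattern only; the function names are hypothetical, and CPython gives no real constant-time guarantee, so production code belongs in a language such as Rust or C where timing behavior can be checked.

```python
import hmac


def ct_select(take_first: int, a: bytes, b: bytes) -> bytes:
    """Return a if take_first == 1, else b, without branching on take_first."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    mask = -take_first & 0xFF  # 0xFF when take_first is 1, 0x00 when it is 0
    inv = ~mask & 0xFF
    return bytes((x & mask) | (y & inv) for x, y in zip(a, b))


def select_shared_secret(
    ct_received: bytes,
    ct_reencrypted: bytes,
    k_real: bytes,
    k_reject: bytes,
) -> bytes:
    """Choose the decapsulation output after the re-encryption check.

    hmac.compare_digest compares the full inputs without an early exit on
    the first differing byte.
    """
    equal = int(hmac.compare_digest(ct_received, ct_reencrypted))
    return ct_select(equal, k_real, k_reject)
```

The point of the design is that a mismatching ciphertext follows the same code path as a matching one, so an attacker who submits modified ciphertexts cannot learn from timing whether the check passed.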

Key Management

Post-quantum keys are larger than classical keys. An ML-DSA-65 public key is 1,952 bytes compared to 32 bytes for Ed25519. Plan for key rotation with clear expiration dates. Implement key versioning so old signatures can be verified with the correct key version. Consider hybrid schemes during transition.
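Key versioning and expiration can be sketched as a small registry that keeps every key for verification but only offers unexpired keys for signing. All names here (`KeyRecord`, `KeyRegistry`, and their methods) are hypothetical, and whether an expired key may still verify old signatures is a policy decision this sketch simply assumes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class KeyRecord:
    version: int
    algorithm: str       # e.g. "ML-DSA-65"
    public_key: bytes
    not_after: datetime  # expiration date for signing use


class KeyRegistry:
    """Hypothetical in-memory registry of versioned public keys."""

    def __init__(self) -> None:
        self._records: Dict[int, KeyRecord] = {}

    def add(self, record: KeyRecord) -> None:
        if record.version in self._records:
            raise ValueError(f"version {record.version} already registered")
        self._records[record.version] = record

    def for_verification(self, version: int) -> KeyRecord:
        """Look up the exact key version a signature was made with."""
        return self._records[version]

    def current_for_signing(self, now: datetime) -> KeyRecord:
        """Return the newest key that has not yet expired."""
        live = [r for r in self._records.values() if r.not_after > now]
        if not live:
            raise LookupError("no unexpired key available")
        return max(live, key=lambda r: r.version)
```

Storing the version number alongside each signature is what makes `for_verification` usable later, after the signing key has been rotated out.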

Hybrid Deployment

During transition, combine classical and post-quantum algorithms. For key exchange, concatenate the ECDH and ML-KEM shared secrets and pass the result through a key derivation function, so that the derived key is intended to stay secure as long as either component remains unbroken. For signatures, create nested signatures: sign with Ed25519, then sign the result with ML-DSA. H33 uses this approach with three signature families providing three independent hardness assumptions.
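One way to combine the two shared secrets is to concatenate them and run the result through HKDF. The sketch below implements HKDF-SHA256 (RFC 5869) from the standard library; the `hybrid-kex-v1` label, the transcript binding, and the function names are assumptions for illustration, not a standardized combiner or H33's actual construction.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF extract-then-expand with SHA-256, following RFC 5869."""
    if length > 255 * 32:
        raise ValueError("requested output too long")
    # Extract: an empty salt is replaced by a block of zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def combine_shared_secrets(ecdh_ss: bytes, mlkem_ss: bytes, transcript: bytes) -> bytes:
    """Derive one 32-byte session key from both shared secrets.

    Assumes both inputs have fixed lengths (ML-KEM always yields 32 bytes),
    so plain concatenation is unambiguous.
    """
    if len(mlkem_ss) != 32:
        raise ValueError("ML-KEM shared secret must be 32 bytes")
    return hkdf_sha256(
        ikm=ecdh_ss + mlkem_ss,
        salt=b"",
        info=b"hybrid-kex-v1" + transcript,
    )
```

Binding the handshake transcript into `info` is a common design choice: two sessions that happen to share secrets but differ in negotiated parameters still derive different keys.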

Performance Optimization

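ML-KEM and ML-DSA performance is dominated by polynomial arithmetic over the ring Z_q[X]/(X^n + 1), with n = 256 and q = 3329 for ML-KEM or q = 8380417 for ML-DSA. Implementations normally perform the multiplications with the number-theoretic transform (NTT). Properly optimized implementations achieve signing speeds of tens of thousands of operations per second. On ARM processors, NEON SIMD accelerates polynomial operations. On x86, AVX2 and AVX-512 provide similar benefits.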
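To make the ring arithmetic concrete, here is a schoolbook multiplication in Z_q[X]/(X^n + 1). It is a reference sketch for checking results, not an optimized routine: real implementations use the NTT and SIMD, and the function name is hypothetical. The default modulus is ML-KEM's q = 3329.

```python
Q = 3329  # ML-KEM modulus
N = 256   # polynomial degree bound shared by ML-KEM and ML-DSA


def negacyclic_mul(a, b, q=Q):
    """Multiply two polynomials modulo X^n + 1 and modulo q.

    Coefficients are listed lowest degree first. Because X^n = -1 in this
    ring, any product term of degree n or more wraps around with a sign flip.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("polynomials must have the same length")
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                out[k - n] = (out[k - n] - ai * bj) % q
    return out
```

This runs in O(n^2) coefficient multiplications, while an NTT-based multiplication needs O(n log n), which is where most of the optimization effort in production code goes.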

Common Pitfalls

Weak random number generators. Both algorithms require high-quality randomness. Use the OS CSPRNG (getrandom, /dev/urandom).
Missing key validation. Always validate public keys before use. Malformed keys can cause undefined behavior.
Forgetting memory zeroization. Private keys must be zeroized after use. In Rust, use the zeroize crate with ZeroizeOnDrop.
Hardcoding algorithm parameters. Design with crypto agility. Use abstract interfaces for all cryptographic operations.
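For the key-validation pitfall, the input checks that FIPS 203 requires on an ML-KEM public (encapsulation) key can be sketched as a length check plus a modulus check: every packed 12-bit coefficient must be below q = 3329. The function name and the dictionary of ranks are hypothetical; the checks follow the layout of 384·k bytes of packed coefficients followed by a 32-byte seed.

```python
Q = 3329
# Module rank k for each parameter set.
ML_KEM_K = {"ML-KEM-512": 2, "ML-KEM-768": 3, "ML-KEM-1024": 4}


def check_encapsulation_key(ek: bytes, param_set: str = "ML-KEM-768") -> bool:
    """Return True if ek passes the length and modulus checks.

    The key is public, so returning early on failure leaks nothing secret.
    """
    k = ML_KEM_K[param_set]
    if len(ek) != 384 * k + 32:
        return False
    packed = ek[: 384 * k]
    # Every 3 bytes hold two 12-bit coefficients, little-endian bit order.
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        c0 = b0 | ((b1 & 0x0F) << 8)
        c1 = (b1 >> 4) | (b2 << 4)
        if c0 >= Q or c1 >= Q:
            return False
    return True
```

A caller would run this before encapsulating to a key received from the network and reject the handshake on `False`, rather than passing unchecked bytes into the arithmetic.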

H33's Implementation

H33 uses ML-DSA-65 (FIPS 204) as one of three signature families in production, contributing to the 391-microsecond batch attestation latency for 32 authentications. ML-KEM-768 handles key exchange. All implementations are constant-time Rust with memory zeroization and hardware-specific ARM Graviton4 optimizations.
