Engineering NIST · 12 min read

FIPS 204 (ML-DSA/Dilithium): From Zero to Production

CRYSTALS-Dilithium, now standardized as ML-DSA under FIPS 204, is the post-quantum replacement for RSA and ECDSA digital signatures. This guide covers the algorithm internals, all three parameter sets, production performance numbers, implementation pitfalls, and how H33 deploys Dilithium at 2.17 million operations per second.

What ML-DSA Replaces

Nearly every digital signature in production today relies on one of three algorithms: RSA (factoring hardness), ECDSA (elliptic curve discrete logarithm), or Ed25519 (twisted Edwards curve discrete logarithm). All three are broken by Shor's algorithm running on a cryptographically relevant quantum computer (CRQC). The timeline to a CRQC is debated, but many cryptographers expect it to fall within the useful lifetime of systems being deployed today. Signatures applied to 30-year mortgage documents, medical records, or government certificates must remain trustworthy for decades. If the signing algorithm is broken during that period, an attacker can forge signatures that are indistinguishable from genuine ones, so every signature made under that key loses its evidentiary value.

ML-DSA (Module Lattice-Based Digital Signature Algorithm), standardized as FIPS 204 in August 2024, is NIST's primary recommendation for post-quantum digital signatures. It was previously known as CRYSTALS-Dilithium during the NIST post-quantum competition. The algorithm's security is based on the hardness of two lattice problems: Module Learning With Errors (MLWE) and Module Short Integer Solution (MSIS). No known quantum algorithm efficiently solves either problem.

How Module-LWE Digital Signatures Work

At the highest level, ML-DSA signatures work by proving knowledge of a short secret vector without revealing it. The signer holds a secret key consisting of short polynomial vectors s1 and s2. The public key is a compressed representation of t = A·s1 + s2, where A is a public matrix. To sign a message, the signer generates a random masking vector, computes a commitment, derives a challenge hash from the commitment and the message, and then produces a response that combines the masking vector with the secret key scaled by the challenge.

The critical property is rejection sampling. If the response would reveal information about the secret key (specifically, if its coefficients are too large), the signer discards the attempt and tries again with a new masking vector. This rejection loop ensures that the distribution of valid signatures is independent of the secret key. FIPS 204 lists an expected 5.1 attempts per signature for ML-DSA-65 (4.25 for ML-DSA-44 and 3.85 for ML-DSA-87), though each attempt is computationally inexpensive.
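The key-independence property can be seen in a one-dimensional toy model. The sketch below is not ML-DSA: it replaces polynomial vectors with a single integer, and the names `GAMMA`, `BETA`, and `accepted_distribution` are illustrative assumptions. It enumerates every possible mask and shows that the accepted responses have exactly the same distribution for two different secrets.

```python
from collections import Counter

GAMMA = 100  # masks y are drawn uniformly from [-GAMMA, GAMMA] (toy value)
BETA = 10    # upper bound on |c * s|, the challenge-scaled secret (toy value)


def accepted_distribution(c_times_s: int) -> Counter:
    """Count each accepted response z = y + c*s over every possible mask y.

    A response is accepted only if |z| <= GAMMA - BETA; anything larger
    could reveal which direction the secret shifted the mask.
    """
    assert abs(c_times_s) <= BETA
    counts = Counter()
    for y in range(-GAMMA, GAMMA + 1):
        z = y + c_times_s
        if abs(z) <= GAMMA - BETA:
            counts[z] += 1
    return counts


dist_a = accepted_distribution(7)     # one secret
dist_b = accepted_distribution(-10)   # a very different secret

# Both secrets yield the same uniform distribution over [-90, 90].
same_distribution = dist_a == dist_b

# Acceptance probability and the expected number of attempts (geometric law).
acceptance = sum(dist_a.values()) / (2 * GAMMA + 1)
expected_attempts = 1 / acceptance
```

In the real scheme the bound applies to every coefficient of a vector of polynomials, and further checks can also reject, which is why the expected attempt count is several rather than close to one.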

Verification is straightforward: the verifier reconstructs the commitment from the signature and public key, recomputes the challenge hash, and checks that the response coefficients are within the required bounds. Verification is deterministic and always succeeds or fails in a single pass, which is why verify is consistently faster than sign.

Parameter Sets: ML-DSA-44, ML-DSA-65, ML-DSA-87

FIPS 204 defines three parameter sets targeting NIST security levels 2, 3, and 5. The naming convention (44, 65, 87) refers to the dimensions of the module lattice matrices: (k, l) pairs of (4,4), (6,5), and (8,7) respectively. Higher dimensions mean larger keys, larger signatures, and more computation, but stronger security guarantees.

| Parameter | ML-DSA-44 | ML-DSA-65 | ML-DSA-87 | RSA-2048 | ECDSA-P256 |
|---|---|---|---|---|---|
| NIST Level | 2 | 3 | 5 | ~1 | ~2 |
| Public Key | 1,312 B | 1,952 B | 2,592 B | 256 B | 64 B |
| Signature Size | 2,420 B | 3,309 B | 4,627 B | 256 B | 64 B |
| Secret Key | 2,560 B | 4,032 B | 4,896 B | ~2,048 B | 32 B |
| Sign Time | ~120 µs | ~180 µs | ~280 µs | ~1,500 µs | ~70 µs |
| Verify Time | ~45 µs | ~75 µs | ~110 µs | ~35 µs | ~90 µs |
| Quantum Safe | Yes | Yes | Yes | No | No |

Performance Context

ML-DSA-44 sign is roughly 2x slower than ECDSA-P256 sign but 12x faster than RSA-2048 sign. ML-DSA-65 verify is comparable to ECDSA-P256 verify. The performance cost of post-quantum signatures is modest and dominated by signature size, not computation time.
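The ratios quoted here follow directly from the comparison table. A short calculation, using only figures copied from that table, makes the contrast between the time cost and the size cost explicit:

```python
# Figures copied from the comparison table (times in microseconds, sizes in bytes).
sign_us = {"ML-DSA-44": 120, "ECDSA-P256": 70, "RSA-2048": 1500}
sig_bytes = {"ML-DSA-44": 2420, "ECDSA-P256": 64, "RSA-2048": 256}

# Signing time: ML-DSA-44 relative to the classical schemes.
slowdown_vs_ecdsa = sign_us["ML-DSA-44"] / sign_us["ECDSA-P256"]
speedup_vs_rsa = sign_us["RSA-2048"] / sign_us["ML-DSA-44"]

# Signature size: the dominant cost of moving to ML-DSA.
size_vs_ecdsa = sig_bytes["ML-DSA-44"] / sig_bytes["ECDSA-P256"]
size_vs_rsa = sig_bytes["ML-DSA-44"] / sig_bytes["RSA-2048"]

print(f"sign: {slowdown_vs_ecdsa:.1f}x slower than ECDSA, {speedup_vs_rsa:.1f}x faster than RSA")
print(f"size: {size_vs_ecdsa:.1f}x ECDSA, {size_vs_rsa:.1f}x RSA")
```

Signing is under 2x slower than ECDSA-P256, while the signature is nearly 38x larger, so bandwidth and storage are usually the first things to budget for.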

Implementation Considerations

Deterministic vs. Hedged Signing

FIPS 204 supports two signing modes. Deterministic signing derives all randomness from the secret key and the message via a PRF. Given the same key and message, it always produces the same signature. This eliminates the risk of nonce-reuse attacks (which famously broke Sony's PS3 ECDSA implementation) and makes testing reproducible.

Hedged signing, which is the default mode in FIPS 204, mixes 32 fresh random bytes from an RNG into the PRF input. This provides defense-in-depth against fault injection attacks: even if an adversary can induce identical internal states through voltage glitching, the RNG contribution prevents repeated nonces. If the RNG itself fails, the scheme degrades to the security of deterministic mode instead of failing outright. FIPS 204 advises against deterministic mode on platforms where side-channel or fault attacks are plausible (smart cards, HSMs, embedded devices). For server-side signing in controlled environments, deterministic mode is typically sufficient and simplifies testing.
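The difference between the two modes comes down to one 32-byte input. In FIPS 204 the per-signature mask seed is derived as SHAKE256(K || rnd || mu), where `rnd` is 32 zero bytes in deterministic mode and 32 random bytes in hedged mode. The sketch below shows only that derivation step, not a full signer; the function name `mask_seed` and the sample inputs are illustrative assumptions.

```python
import hashlib
import secrets


def mask_seed(K: bytes, mu: bytes, hedged: bool = True) -> bytes:
    """Derive the 64-byte seed that drives the masking vector.

    K  : 32-byte private seed stored in the secret key
    mu : 64-byte message representative (hash of tr and the message)
    """
    # Hedged mode injects fresh randomness; deterministic mode uses zeros.
    rnd = secrets.token_bytes(32) if hedged else bytes(32)
    return hashlib.shake_256(K + rnd + mu).digest(64)


# Illustrative inputs only; real values come from key generation and hashing.
K = bytes(range(32))
mu = hashlib.shake_256(b"example message").digest(64)

det_1 = mask_seed(K, mu, hedged=False)
det_2 = mask_seed(K, mu, hedged=False)   # identical to det_1
hed_1 = mask_seed(K, mu, hedged=True)
hed_2 = mask_seed(K, mu, hedged=True)    # differs from hed_1
```

Because K is always part of the input, a weak or stuck RNG in hedged mode still leaves the seed unpredictable to anyone who does not hold the secret key.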

Key Serialization

ML-DSA keys use a packed binary format defined in the FIPS 204 specification. The secret key includes the seed (rho), the signing key (K), the hash of the public key (tr), and the polynomial vectors s1, s2, and t0 in a compressed representation. The public key contains rho and the high-order bits of the polynomial vector t. Implementors must follow the exact bit-packing specified in the standard; deviations will produce interoperability failures.
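The key sizes in the comparison table follow from this layout. As a sanity check, the sketch below recomputes them from the FIPS 204 parameters (n = 256 coefficients per polynomial, d = 13 dropped low-order bits, a 23-bit modulus, and the per-set values of k, l, and eta). The helper names are assumptions made for illustration.

```python
N = 256      # coefficients per polynomial
D = 13       # low-order bits of t kept in the secret key as t0
Q_BITS = 23  # bit length of q - 1, with q = 8380417

PARAMS = {   # name: (k, l, eta)
    "ML-DSA-44": (4, 4, 2),
    "ML-DSA-65": (6, 5, 4),
    "ML-DSA-87": (8, 7, 2),
}


def public_key_bytes(k: int) -> int:
    """rho (32 bytes) plus k polynomials of t1 at 10 bits per coefficient."""
    t1_bits = Q_BITS - D
    return 32 + k * N * t1_bits // 8


def secret_key_bytes(k: int, l: int, eta: int) -> int:
    """rho (32) + K (32) + tr (64) + packed s1, s2 and t0."""
    eta_bits = (2 * eta).bit_length()   # coefficients lie in [-eta, eta]
    s_bytes = (k + l) * N * eta_bits // 8
    t0_bytes = k * N * D // 8
    return 32 + 32 + 64 + s_bytes + t0_bytes


sizes = {
    name: (public_key_bytes(k), secret_key_bytes(k, l, eta))
    for name, (k, l, eta) in PARAMS.items()
}
for name, (pk, sk) in sizes.items():
    print(f"{name}: public key {pk} B, secret key {sk} B")
```

A serializer whose output length differs from these values has a packing bug, which makes the length check a cheap first interoperability test.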

Signature Encoding

Signatures encode the challenge hash (c_tilde), the response vector (z), and hint bits (h) in a deterministic packed format. The hint vector uses a compressed encoding that stores only the positions of non-zero entries, but FIPS 204 writes those positions into a fixed field of ω + k bytes and zero-pads the unused slots. Encoded signatures are therefore always the same length for a given parameter set (4,627 bytes for ML-DSA-87), regardless of the hint weight, and implementations can treat them as fixed-length.
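The encoded length can be recomputed from the parameters: c_tilde occupies λ/4 bytes, each of the l polynomials in z is packed at 1 + log2(γ1) bits per coefficient, and the hint field occupies ω + k bytes. The sketch below applies that formula to the three FIPS 204 parameter sets; the function and dictionary names are assumptions made for illustration.

```python
N = 256  # coefficients per polynomial

SIG_PARAMS = {  # name: (k, l, lambda_bits, log2(gamma1), omega)
    "ML-DSA-44": (4, 4, 128, 17, 80),
    "ML-DSA-65": (6, 5, 192, 19, 55),
    "ML-DSA-87": (8, 7, 256, 19, 75),
}


def signature_bytes(k: int, l: int, lam: int, gamma1_log2: int, omega: int) -> int:
    """Total encoded length: c_tilde, then z, then the hint field."""
    c_tilde_bytes = lam // 4                     # challenge hash
    z_bytes = l * N * (1 + gamma1_log2) // 8     # response vector
    hint_bytes = omega + k                       # positions plus per-row counters
    return c_tilde_bytes + z_bytes + hint_bytes


sig_sizes = {name: signature_bytes(*p) for name, p in SIG_PARAMS.items()}
for name, size in sig_sizes.items():
    print(f"{name}: {size} B")
```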

Constant-Time Implementation

Like all signature algorithms, ML-DSA implementations must be constant-time with respect to secret data to prevent timing side-channel attacks. The rejection sampling loop is the primary concern. The scheme is designed so that the number of rejected attempts reveals essentially nothing about the secret key, but the work inside each iteration (the bound checks on the response and the arithmetic on s1, s2, and t0) must not branch on, or index memory by, secret values. H33's implementation uses fixed-iteration loops with conditional moves (CMOV) to ensure that the execution path is identical regardless of the rejection count.

How H33 Uses Dilithium in Production

H33's production pipeline uses ML-DSA-87 (Dilithium-5, NIST Level 5) for batch attestation. Rather than signing each authentication individually, we apply a single Dilithium signature to an entire batch of 32 users processed through our BFV FHE engine. This batch attestation model delivers enormous efficiency gains.

The FHE engine processes 32 biometric templates simultaneously using SIMD batching (4096 polynomial slots divided by 128 feature dimensions equals 32 users per ciphertext). After the encrypted computation completes and the STARK zero-knowledge proof verifies the result, a single Dilithium-5 signature attests to the entire batch. The total sign + verify time is 291 microseconds per batch, which amortizes to approximately 9.1 microseconds per user for the signature component. Compare this to signing each authentication individually, which would cost 280 microseconds times 32, or 8,960 microseconds per batch for signing alone. Measured against the 291-microsecond batch figure, batch attestation delivers roughly a 31x reduction in signature overhead (8,960 / 291 ≈ 30.8).
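The amortization arithmetic can be reproduced in a few lines. Every input below is a figure quoted in this section; only the variable names are assumptions.

```python
SLOTS = 4096            # polynomial slots per BFV ciphertext
FEATURE_DIMS = 128      # feature dimensions per biometric template
BATCH_SIGN_VERIFY_US = 291   # one ML-DSA-87 sign + verify per batch
SINGLE_SIGN_US = 280         # ML-DSA-87 sign time per individual signature

users_per_batch = SLOTS // FEATURE_DIMS

# Signature cost per user when one signature covers the whole batch.
per_user_us = BATCH_SIGN_VERIFY_US / users_per_batch

# Cost of signing every authentication separately (signing only).
individual_total_us = SINGLE_SIGN_US * users_per_batch

reduction = individual_total_us / BATCH_SIGN_VERIFY_US

print(f"{users_per_batch} users per batch")
print(f"{per_user_us:.1f} us per user, {reduction:.1f}x reduction")
```

Note that the comparison sets sign-only time for the individual case against sign + verify time for the batch case, so the quoted reduction is, if anything, conservative.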

Why ML-DSA-87 (Not ML-DSA-44)

H33 uses the highest security level (NIST L5) despite the computational overhead because the signature is applied once per 32-user batch, not once per user. The marginal signing cost of ML-DSA-87 over ML-DSA-44 is approximately 160 microseconds per batch (about 280 versus 120 microseconds), which is 5 microseconds per user. At that cost, there is no engineering justification for using a lower security level.

Nested Hybrid Signatures: Algorithm-Family Independence

Beyond batch attestation, H33 offers nested hybrid signatures that chain Ed25519, Dilithium-5, and FALCON-512 into a sequential commitment chain. The outer signature commits to the inner signatures, creating a structure where an adversary must break all three algorithms from three independent mathematical families (elliptic curves, MLWE lattices, and NTRU lattices) to forge a signature.

This is the H33-3-Key product. The full triple-sign completes in approximately 450 microseconds and triple-verify in approximately 240 microseconds. The combined signature is approximately 5,390 bytes. For applications where a single signature algorithm represents unacceptable concentration risk, nested hybrid signatures provide cryptographic diversification at sub-millisecond latency.
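The chaining idea can be sketched independently of any particular signature library. The example below is one plausible reading of "the outer signature commits to the inner signatures", not H33's actual wire format or ordering. To stay self-contained it uses HMAC-SHA256 as a stand-in for each signer, which is a loud assumption: HMAC is a symmetric MAC, not a public-key signature, and a real deployment would call Ed25519, ML-DSA-87, and FALCON-512 in its place. All function names are hypothetical.

```python
import hashlib
import hmac


def stand_in_sign(key: bytes, data: bytes) -> bytes:
    """Placeholder for a real signing call (NOT a public-key signature)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def nested_sign(keys: list[bytes], message: bytes) -> list[bytes]:
    """Each layer signs the message plus every signature produced before it."""
    signatures: list[bytes] = []
    for key in keys:
        transcript = message + b"".join(signatures)
        signatures.append(stand_in_sign(key, transcript))
    return signatures


def nested_verify(keys: list[bytes], message: bytes, signatures: list[bytes]) -> bool:
    """Accept only if every layer verifies over the same growing transcript."""
    if len(signatures) != len(keys):
        return False
    ok = True
    for i, key in enumerate(keys):
        transcript = message + b"".join(signatures[:i])
        expected = stand_in_sign(key, transcript)
        ok &= hmac.compare_digest(expected, signatures[i])
    return ok


# Three independent keys standing in for the three algorithms.
keys = [b"layer-1-ed25519", b"layer-2-ml-dsa-87", b"layer-3-falcon-512"]
message = b"batch attestation payload"

chain = nested_sign(keys, message)
valid = nested_verify(keys, message, chain)

# Replacing the innermost signature invalidates the layers that commit to it.
tampered = [bytes(32)] + chain[1:]
tampered_valid = nested_verify(keys, message, tampered)
```

Because every later layer covers the earlier signatures, a forger who breaks only one algorithm cannot swap in a forged inner signature without also re-signing the outer layers.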

Comparison to Other Post-Quantum Signature Schemes

FIPS 204 (ML-DSA) is not the only post-quantum signature scheme NIST has selected. FIPS 205 (SLH-DSA, formerly SPHINCS+) provides hash-based signatures that rely only on the security of the underlying hash function. SLH-DSA signatures are conservative and well-understood, but they are significantly larger (7,856 to 49,856 bytes depending on parameter set) and slower to generate. FN-DSA (formerly FALCON), planned for standardization as FIPS 206, provides NTRU-lattice-based signatures with the smallest signature sizes (666 bytes for FALCON-512) but requires complex floating-point sampling during key generation and signing.

For most applications, ML-DSA is the right default. It offers the best balance of performance, key size, signature size, and implementation simplicity. Its rejection sampling is straightforward to implement correctly. It does not require floating-point arithmetic. It also received extensive cryptanalytic scrutiny throughout the NIST competition, where it was a primary target of research.

Getting Started with FIPS 204

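For teams implementing ML-DSA directly, the reference implementation is maintained by the CRYSTALS team in the pq-crystals/dilithium repository, NIST publishes test vectors through its ACVP program, and optimized implementations exist for most major languages. In Rust, the ml-dsa crate provides a pure-Rust implementation. In Go, ML-DSA is available through Cloudflare's CIRCL library; the standard library does not expose a public ML-DSA package as of Go 1.24. OpenSSL supports ML-DSA natively from version 3.5; earlier 3.x releases need the Open Quantum Safe oqs-provider.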

For teams that want production-grade ML-DSA without managing the cryptographic implementation, H33's API exposes Dilithium signing and verification as a single endpoint. You send the data; we sign it with ML-DSA-87, return the signature, and handle key management, rotation, and side-channel resistance. The free tier includes enough credits for development and testing. See the documentation for integration examples in Python, Node.js, Go, and Rust.