April 20, 2026 Cryptography 14 min read

NIST FIPS 204: Post-Quantum Proof for Financial Systems

ML-DSA is finalized. SLH-DSA is finalized. FN-DSA, the FALCON-based standard, is being finalized as FIPS 206. Three independent mathematical hardness assumptions, one 74-byte attestation. Here is how H33 implements all three for financial infrastructure.

Eric Beans CEO, H33.ai

On August 13, 2024, NIST published FIPS 204 — the Federal Information Processing Standard for ML-DSA, the Module-Lattice-Based Digital Signature Algorithm. This was not a draft. Not a candidate. Not a "for comment" publication. It was a finalized federal standard with assigned parameter sets, reference implementations, and a clear mandate: this is what post-quantum digital signatures look like for systems that must comply with federal cryptographic requirements. Alongside it, NIST published FIPS 205 (SLH-DSA, the stateless hash-based signature scheme formerly known as SPHINCS+) and is finalizing FIPS 206 (FN-DSA, based on FALCON-512). Three standards. Three mathematical foundations. Three independent bets on which hardness assumptions survive future cryptanalysis.

For financial systems — banks, payment processors, clearinghouses, broker-dealers, insurance companies, any institution that processes transactions, makes compliance decisions, or maintains audit trails — these three standards represent the end of a long waiting period. The post-quantum transition is no longer a research problem. It is an implementation problem. The algorithms are specified. The parameters are fixed. The security margins are analyzed. What remains is building systems that use them correctly, at production scale, without breaking existing infrastructure.

This post is a deep technical dive into FIPS 204 specifically — what ML-DSA is, how it works, why it matters for financial systems — and how H33 implements it alongside FALCON-512 (FIPS 206) and SLH-DSA (FIPS 205) in a three-family construction that keeps the persistent audit trail footprint at 74 bytes despite the combined signature weight of approximately 21,000 bytes.

What ML-DSA is

ML-DSA (Module-Lattice-Based Digital Signature Algorithm) is a digital signature scheme whose security is based on the hardness of two related problems over module lattices: the Module Learning With Errors (MLWE) problem and the Module Short Integer Solution (MSIS) problem. Both are believed to be hard for classical and quantum computers alike. RSA (whose security rests on integer factoring) and ECDSA (whose security rests on the elliptic curve discrete logarithm problem) both fall to Shor's algorithm in polynomial time; no known quantum algorithm solves MLWE or MSIS in polynomial time.

The scheme was originally published as CRYSTALS-Dilithium during NIST's post-quantum competition. After three rounds of public cryptanalysis, parameter refinement, and implementation review, it was standardized as FIPS 204 under the name ML-DSA with three parameter sets: ML-DSA-44 (NIST Security Level 2), ML-DSA-65 (NIST Security Level 3), and ML-DSA-87 (NIST Security Level 5). H33 uses ML-DSA-65. Security Level 3 means that forging a signature is expected to be at least as hard as a key search on AES-192, for classical and quantum adversaries alike.

The concrete sizes for ML-DSA-65 in FIPS 204 are: public key 1,952 bytes, secret key 4,032 bytes, signature 3,309 bytes. These are larger than classical ECDSA (public key 33 bytes, signature 64 bytes) by a factor of roughly 50 to 60. This size increase is inherent to lattice-based cryptography and is the fundamental engineering challenge that any post-quantum signature deployment must address. We will return to how H33 handles this below.

How ML-DSA works (simplified)

At a high level, ML-DSA signing works through a rejection sampling process over structured lattice elements. The signer has a secret key consisting of small polynomial vectors (s1, s2) and a public key consisting of a matrix A and the vector t = As1 + s2. To sign a message m, the signer:

1. Generates a random masking vector y with coefficients bounded by a parameter gamma1.

2. Computes w = Ay and extracts the high-order bits of w to produce w1.

3. Computes a challenge hash c = H(tr || m || w1), where tr is a hash of the public key and H is the SHAKE-256 extendable output function.

4. Computes z = y + c*s1 (the response vector).

5. Checks whether z has coefficients that are too large (which would leak information about s1). If so, the signer rejects this attempt and starts over with a new y. This is the rejection sampling step — it ensures that the distribution of z does not depend on s1.

6. Also checks that the low-order bits of Az - c*t are small enough. If not, the signer rejects and retries.

7. If both checks pass, the signature is (z, c, h) where h encodes hint bits for the verifier.

Verification reconstructs w1 from the signature and public key, recomputes the challenge hash, and checks that it matches. The security argument is that without knowing s1, an adversary cannot produce a z that simultaneously satisfies the rejection bounds and produces the correct challenge hash — because doing so would require solving an MSIS instance, which is believed to be hard.
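The rejection step in items 4 and 5 above can be illustrated with a toy sketch in Python. It works over plain integer vectors rather than polynomial rings, uses a scalar challenge, and skips the high-bits, low-bits, and hint logic entirely. GAMMA1, BETA, and the function names are illustrative assumptions, not FIPS 204 parameters or H33 code.

```python
import secrets

GAMMA1 = 1 << 10  # toy masking bound (an assumption, not a FIPS 204 value)
BETA = 4          # toy bound on |c * s_i|: secret coefficients in [-4, 4], c in {-1, 1}

def sample_mask(n: int) -> list[int]:
    """Draw n masking coefficients uniformly from [-GAMMA1 + 1, GAMMA1]."""
    return [secrets.randbelow(2 * GAMMA1) - GAMMA1 + 1 for _ in range(n)]

def respond(secret: list[int], challenge: int) -> tuple[list[int], int]:
    """Retry until z = y + c*s lies inside the safe range; return (z, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        y = sample_mask(len(secret))
        z = [yi + challenge * si for yi, si in zip(y, secret)]
        # Accept only if every coefficient is far enough from the edge that
        # the same z could have been produced by any admissible secret.
        if max(abs(zi) for zi in z) < GAMMA1 - BETA:
            return z, attempts

z, attempts = respond([4, -4, 0, 2], challenge=1)
print(f"accepted after {attempts} attempt(s)")
```

Because every accepted coefficient lies strictly inside the range that any secret could have reached, the accepted z values are uniformly distributed on that range whatever the secret is. That independence is the property the real scheme relies on.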

The rejection sampling step is why ML-DSA signing does not take a fixed number of iterations (though each iteration is constant-time internally). On average, signing requires approximately 5.1 iterations of the rejection loop for ML-DSA-65. Each iteration involves polynomial multiplication over the ring Z_q[X]/(X^n + 1) with n=256 and q=8380417, performed using the Number Theoretic Transform (NTT). This is computationally inexpensive on modern hardware: H33's implementation completes ML-DSA-65 signing in under 200 microseconds on ARM Graviton4.
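To make the ring arithmetic concrete, here is a minimal schoolbook sketch of multiplication in Z_q[X]/(X^n + 1). It computes the same product an NTT-based multiplier would, but in O(n^2) time rather than O(n log n). The function name is illustrative, and this is not how a production implementation would do it.

```python
Q = 8380417  # ML-DSA modulus
N = 256      # ML-DSA ring degree

def negacyclic_mul(a: list[int], b: list[int], n: int = N, q: int = Q) -> list[int]:
    """Multiply two polynomials of degree < n modulo X^n + 1 and modulo q."""
    out = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] = (out[k] + ai * bj) % q
            else:
                # X^n = -1 in this ring, so terms that wrap around change sign.
                out[k - n] = (out[k - n] - ai * bj) % q
    return out

# X^(n-1) * X = X^n = -1, which is q - 1 modulo q.
x = [0, 1] + [0] * (N - 2)
x_top = [0] * (N - 1) + [1]
assert negacyclic_mul(x_top, x)[0] == Q - 1
```

The sign flip on wrap-around is what distinguishes this "negacyclic" product from an ordinary cyclic convolution, and it is the structure the NTT exploits.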

Why financial systems need ML-DSA now

Financial systems have three properties that make the post-quantum transition urgent rather than merely important. First, they have long-lived records. Audit trails, compliance records, and transaction histories must be retained for seven to ten years or longer. Records signed with classical algorithms today must survive verification attempts within that window, and many estimates place cryptographically relevant quantum computers within that timeframe. Second, they face sophisticated adversaries. Nation-state intelligence services, organized crime syndicates, and insider threats all target financial infrastructure. These adversaries have the resources and motivation to run the signature analogue of a harvest-now-decrypt-later attack: collecting signed records and public keys today, then forging signatures under those keys once quantum computers arrive. Third, they face regulatory mandates. OMB M-23-02, CNSA 2.0, and sector-specific guidance from the OCC, the Federal Reserve, and the FDIC all point toward mandatory post-quantum migration within the next four years.

ML-DSA (FIPS 204) is the primary NIST recommendation for general-purpose post-quantum digital signatures. It is the algorithm that federal agencies are being directed to adopt first. It is the algorithm with the most implementation experience, the most third-party analysis, and the broadest vendor support. For financial systems that need to begin the post-quantum transition today, ML-DSA is the starting point.

But it should not be the ending point. This is where H33's three-family approach diverges from a naive FIPS 204 deployment.

Why three families, not one

ML-DSA's security rests on the MLWE and MSIS problems over module lattices. These problems have been studied extensively, but not for nearly as long as integer factoring (which has been studied since at least Fermat, 400 years ago) or the discrete logarithm problem (studied intensively since Diffie-Hellman in 1976). Lattice problems in their modern formulation date to Ajtai's 1996 work — thirty years of cryptanalytic attention, compared to centuries for factoring and fifty years for discrete log.

Thirty years is substantial. But it is not long enough to be certain. Lattice cryptanalysis is an active field. The BKZ algorithm and its variants continue to improve. Sieving algorithms for the Shortest Vector Problem are being refined. It is possible — not likely, but possible — that a structural breakthrough against module lattices emerges within the next decade. If that happens, and your entire audit trail is signed with ML-DSA alone, every record becomes forgeable simultaneously.

H33's approach is to use ML-DSA as one of three signature families in the H33-74 substrate. The three families are:

1. ML-DSA-65 (FIPS 204): Security based on MLWE and MSIS over module lattices. Public key: 1,952 bytes. Signature: 3,309 bytes.

2. FALCON-512 (FIPS 206): Security based on the Short Integer Solution problem over NTRU lattices. Public key: 897 bytes. Signature: approximately 666 bytes. FALCON uses a different lattice structure than ML-DSA (NTRU lattices versus module lattices), meaning a breakthrough against one does not automatically imply a breakthrough against the other.

3. SLH-DSA-SHA2-128f (FIPS 205): Security based entirely on the collision resistance and pre-image resistance of SHA-256. Public key: 32 bytes. Signature: 17,088 bytes. This is a hash-based signature scheme whose security assumption is strictly weaker than those of the other two: it relies only on hash function properties that have been studied for decades, not on any algebraic structure. If SHA-256 remains pre-image resistant (and there is no credible threat to this), SLH-DSA remains secure regardless of any advance in lattice cryptanalysis.

The three families rely on three independent hardness assumptions: MLWE lattices, NTRU lattices, and hash function pre-image resistance. A breakthrough against MLWE does not break FALCON (different lattice structure) and does not break SLH-DSA (no lattice at all). A breakthrough against NTRU lattices does not break ML-DSA (different lattice structure) and does not break SLH-DSA. Only a breakthrough against hash function pre-image resistance threatens SLH-DSA, and that would break essentially all of modern cryptography simultaneously.

This is assumption diversity. It is the cryptographic equivalent of portfolio diversification: spreading risk across independent mathematical bets so that a single catastrophic failure does not wipe out the entire portfolio. For financial institutions that understand risk management, this should be a natural fit. You would not put your entire balance sheet in a single asset class. You should not put your entire audit trail's integrity in a single mathematical assumption.

The size problem and the 74-byte solution

The combined size of the three-family signature bundle is approximately 21,063 bytes (3,309 + 666 + 17,088). For a single record, 21KB is manageable. At scale it is not: every million records adds about 21 GB, and an audit trail of 100 million records carries roughly 2 TB of signature overhead. This is the practical objection to multi-family signatures, and it is legitimate.

H33-74 solves it architecturally. The full 21KB signature bundle exists ephemerally — it is produced during signing and consumed during verification, but what persists in the audit trail is a 74-byte substrate. The substrate contains:

A 32-byte SHA3-256 hash of the attested data (the content hash). A 1-byte version identifier. A 1-byte computation type identifier. An 8-byte millisecond timestamp. A 16-byte nonce. These form the 58-byte substrate proper. Plus a 42-byte compact receipt: 1-byte version, 1-byte family bitfield, a 32-byte SHA3-256 hash of the concatenated signature bundle, and an 8-byte timestamp. Total persistent footprint: 74 bytes.

The 32-byte commitment in the receipt binds the full 21KB signature bundle without carrying it. To verify, a verifier needs the substrate (74 bytes), the original data (to recompute the content hash), and the full signature bundle (21KB, retrieved from ephemeral storage or a verification service). The verifier checks: (1) the content hash matches the data, (2) each of the three signatures verifies against the corresponding public key on the substrate's canonical encoding, (3) the SHA3-256 hash of the concatenated signatures matches the receipt's commitment. If all three checks pass, the attestation is valid.
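A minimal sketch of the encoding and of the commitment check, using Python's hashlib and struct. The field order, the big-endian byte order, and every function name here are assumptions made for illustration; the text above fixes only the field sizes, the hash function, and the idea of hashing the concatenated signatures.

```python
import hashlib
import hmac
import struct

# Assumed layouts (big-endian, no padding).
SUBSTRATE_FMT = ">32sBBQ16s"  # content hash, version, computation type, timestamp (ms), nonce
RECEIPT_FMT = ">BB32sQ"       # version, family bitfield, bundle commitment, timestamp

def encode_substrate(data: bytes, version: int, comp_type: int,
                     timestamp_ms: int, nonce: bytes) -> bytes:
    """Pack the 58-byte encoding; the content hash is SHA3-256 of the attested data."""
    content_hash = hashlib.sha3_256(data).digest()
    return struct.pack(SUBSTRATE_FMT, content_hash, version, comp_type, timestamp_ms, nonce)

def bundle_commitment(ml_dsa_sig: bytes, falcon_sig: bytes, slh_dsa_sig: bytes) -> bytes:
    """SHA3-256 over the signatures concatenated in a fixed order."""
    return hashlib.sha3_256(ml_dsa_sig + falcon_sig + slh_dsa_sig).digest()

def encode_receipt(version: int, family_bits: int,
                   commitment: bytes, timestamp_ms: int) -> bytes:
    """Pack the 42-byte compact receipt."""
    return struct.pack(RECEIPT_FMT, version, family_bits, commitment, timestamp_ms)

def commitment_matches(receipt: bytes, sigs: tuple[bytes, bytes, bytes]) -> bool:
    """Check that the receipt's 32-byte commitment binds this exact signature bundle."""
    _, _, commitment, _ = struct.unpack(RECEIPT_FMT, receipt)
    return hmac.compare_digest(commitment, bundle_commitment(*sigs))
```

Changing a single byte of any signature changes the SHA3-256 digest, so a receipt cannot be paired with a different bundle than the one it was issued for.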

For financial systems, this means the audit database stores 74 bytes per record — not 21KB. The signature bundles can be stored in separate cold storage, in a verification service, or in any system optimized for large-blob storage that is consulted only when verification is needed. The hot path — the audit trail itself, the records that examiners browse and that compliance officers query — carries only the 74-byte substrates. The storage overhead is effectively zero.

Performance characteristics for financial workloads

Financial systems care about three performance metrics: signing latency (how long it takes to produce an attestation), verification latency (how long it takes to verify one), and throughput (how many attestations can be produced per second at sustained load).

H33's production numbers on ARM Graviton4 hardware: signing latency is dominated by SLH-DSA (the hash-based signature, which requires computing many hash chains), at approximately 391 microseconds for the full three-family batch attestation of 32 records. Per-record signing cost is approximately 12 microseconds when amortized across the batch. Verification of a single three-family attestation completes in under 1 millisecond. Sustained throughput is 2,216,488 attestations per second on a single ARM server running the full pipeline.

For context: Fedwire processes approximately 800,000 transfers per day. The Clearing House's RTP network processes approximately 1 million real-time payments per day. Visa processes approximately 65,000 transactions per second at peak. H33's attestation throughput exceeds all of these workloads by substantial margins on a single server. The performance overhead of post-quantum attestation is not a barrier to deployment in any financial system currently operating.
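The comparison is easier to judge once the daily figures are converted to per-second rates. The short calculation below uses only the numbers quoted above; the variable names are illustrative, and daily volumes are averaged evenly over 24 hours, which understates intraday peaks.

```python
SECONDS_PER_DAY = 86_400
H33_PER_SECOND = 2_216_488  # sustained attestations per second, single server

workloads_per_second = {
    "Fedwire (daily average)": 800_000 / SECONDS_PER_DAY,
    "RTP (daily average)": 1_000_000 / SECONDS_PER_DAY,
    "Visa (peak)": 65_000,
}

for name, rate in workloads_per_second.items():
    headroom = H33_PER_SECOND / rate
    print(f"{name}: {rate:,.1f}/s, headroom {headroom:,.0f}x")
```

Fedwire and RTP average on the order of ten transactions per second, while the quoted Visa peak leaves a margin of roughly 34x.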

ML-DSA implementation details

H33's ML-DSA-65 implementation is written in Rust with architecture-specific optimizations for ARM (Graviton4, Apple Silicon) and x86-64 platforms. Key implementation choices:

NTT implementation: Polynomial multiplication uses the Number Theoretic Transform with Montgomery reduction. All twiddle factors are precomputed in Montgomery form, eliminating division from the hot path. The forward and inverse NTT implementations use radix-4 butterflies where the architecture supports it, with Harvey lazy reduction (values held in [0, 2q) between stages to defer modular reduction).
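As a rough illustration of why Montgomery form removes division from the hot path, here is a sketch of Montgomery reduction for the ML-DSA modulus with R = 2^32. It is written with Python integers for clarity and is not H33's code. The function names are illustrative, and the final conditional subtraction is not constant-time as written.

```python
Q = 8380417
R_BITS = 32
R = 1 << R_BITS
Q_NEG_INV = (-pow(Q, -1, R)) % R  # -q^(-1) mod R, precomputed once

def mont_reduce(a: int) -> int:
    """Return a * R^(-1) mod q for 0 <= a < q*R using multiply, mask, and shift."""
    m = (a * Q_NEG_INV) & (R - 1)  # chosen so that a + m*q is divisible by R
    t = (a + m * Q) >> R_BITS      # exact division by R; t lies in [0, 2q)
    return t - Q if t >= Q else t

def to_mont(x: int) -> int:
    """One-time conversion into Montgomery form (this step does use a division)."""
    return (x << R_BITS) % Q

def mont_mul(a_m: int, b_m: int) -> int:
    """Multiply two values in Montgomery form; the result stays in Montgomery form."""
    return mont_reduce(a_m * b_m)

def from_mont(x_m: int) -> int:
    return mont_reduce(x_m)

a, b = 1234567, 7654321
assert from_mont(mont_mul(to_mont(a), to_mont(b))) == (a * b) % Q
```

Precomputing twiddle factors in Montgomery form means the only conversions happen at the boundaries of the transform, not inside each butterfly.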

Rejection sampling: The rejection loop uses constant-time coefficient comparison to avoid timing side channels. The expected number of iterations is approximately 5.1 for ML-DSA-65; in practice, H33's implementation typically completes signing in 1-8 iterations with the median at 4.

Batch optimization: For the H33-74 batch attestation pipeline, multiple substrates share a single ML-DSA signing operation via Merkle aggregation. The batch root is signed once, and individual substrates carry Merkle inclusion proofs. This amortizes the ML-DSA signing cost across the entire batch.

Key storage: ML-DSA-65 secret keys (4,032 bytes) are stored in encrypted form within the Trusted Execution Environment. Key material never exists in plaintext outside the TEE boundary. Key generation uses the deterministic key generation path specified in FIPS 204 Section 6.1, seeded from a hardware random number generator.
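The batch optimization described above can be sketched as a Merkle tree over the substrate encodings in a batch. Only the root would be signed; each record keeps the sibling hashes needed to recompute it. The choice of SHA3-256 as the tree hash, the 0x00/0x01 domain-separation prefixes, the power-of-two batch size, and the function names are all assumptions for illustration, not details confirmed by the text.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def merkle_root_and_proofs(leaves: list[bytes]) -> tuple[bytes, list[list[bytes]]]:
    """Build a Merkle tree over a power-of-two batch; return (root, per-leaf proofs)."""
    assert leaves and len(leaves) & (len(leaves) - 1) == 0, "batch size must be a power of two"
    level = [h(b"\x00" + leaf) for leaf in leaves]   # leaf hashes
    proofs: list[list[bytes]] = [[] for _ in leaves]
    positions = list(range(len(leaves)))             # each leaf's index at the current level
    while len(level) > 1:
        for i, pos in enumerate(positions):
            proofs[i].append(level[pos ^ 1])         # sibling at this level
        level = [h(b"\x01" + level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        positions = [pos >> 1 for pos in positions]
    return level[0], proofs

def verify_inclusion(leaf: bytes, index: int, proof: list[bytes], root: bytes) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    node = h(b"\x00" + leaf)
    for sibling in proof:
        node = h(b"\x01" + sibling + node) if index & 1 else h(b"\x01" + node + sibling)
        index >>= 1
    return node == root

batch = [f"substrate-{i}".encode() for i in range(32)]
root, proofs = merkle_root_and_proofs(batch)
assert all(verify_inclusion(batch[i], i, proofs[i], root) for i in range(32))
```

With a batch of 32, each proof holds five sibling hashes, and a single signature over the root covers every record in the batch.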

FALCON-512 implementation details

FALCON-512 is the most complex of the three families to implement. Its signing algorithm requires sampling from a discrete Gaussian distribution over NTRU lattices using a fast Fourier sampling tree (the "FALCON tree"). This is algorithmically intricate but produces the smallest signatures of the three families: approximately 666 bytes.

H33's FALCON-512 implementation handles the constant-time floating-point requirements that make FALCON implementation notoriously difficult. The sampling tree uses double-precision floating-point arithmetic, and all comparisons and branches are implemented in constant time to prevent timing side channels. The implementation passes the NIST Known Answer Tests and produces signatures that verify against the FALCON reference implementation.

FALCON-512's contribution to the three-family construction is twofold: it provides a second lattice-based signature (diversifying within the lattice family by using NTRU lattices instead of module lattices), and its compact signature size (666 bytes versus ML-DSA's 3,309 bytes) means this second lattice family adds little to a bundle that is dominated by SLH-DSA's 17,088 bytes.

SLH-DSA-SHA2-128f implementation details

SLH-DSA (formerly SPHINCS+) is conceptually the simplest of the three families and provides the strongest security argument: its security relies only on the pre-image resistance of SHA-256, a property that has been studied for over twenty years with no meaningful degradation. The "f" in the parameter name stands for "fast" — the signing-optimized variant that trades larger signatures (17,088 bytes) for faster signing (compared to the "s" variant's 7,856-byte signatures with slower signing).

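H33 uses the parameter set that the SPHINCS+ submission called sha2-128f-simple. The "simple" and "robust" variants of SPHINCS+ differ in their tweakable hash construction: the robust variant masks each hash input with a pseudorandomly generated bitmask, while the simple variant omits the masks. Dropping the masks gains approximately 2x signing speed, at the cost of a security argument that leans more heavily on idealized assumptions about the hash function. FIPS 205 standardizes only the simple construction, so SLH-DSA-SHA2-128f is this variant. For the H33-74 use case, where each substrate also carries a unique nonce, the simple variant is appropriate and faster.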

The implementation uses SHA-256 hardware acceleration (ARM SHA extensions on Graviton4, SHA-NI on x86-64) to accelerate the extensive hash chain computations that dominate SLH-DSA signing time. On Graviton4, SLH-DSA-SHA2-128f signing completes in approximately 300 microseconds — the slowest of the three families and the dominant contributor to overall signing latency.
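The "hash chain" workload can be pictured with a single Winternitz-style chain, the building block that SLH-DSA repeats many thousands of times per signature. This is a deliberately simplified sketch: it omits the public seed and address tweaks that FIPS 205 mixes into every hash call, and it omits the checksum chains that stop an attacker from advancing a chain. The function names are illustrative.

```python
import hashlib

N_BYTES = 16  # hash output length for the 128-bit parameter sets
W = 16        # Winternitz parameter: each chain encodes one 4-bit digit

def F(x: bytes) -> bytes:
    """One chain step: SHA-256 truncated to N_BYTES (tweaks omitted in this sketch)."""
    return hashlib.sha256(x).digest()[:N_BYTES]

def chain(x: bytes, steps: int) -> bytes:
    for _ in range(steps):
        x = F(x)
    return x

def chain_public(secret: bytes) -> bytes:
    """The public value is the end of the chain, W - 1 steps from the secret."""
    return chain(secret, W - 1)

def chain_sign(secret: bytes, digit: int) -> bytes:
    """Reveal the chain value `digit` steps from the secret."""
    return chain(secret, digit)

def chain_verify(sig: bytes, digit: int, public: bytes) -> bool:
    """Walk the remaining steps and compare against the public end of the chain."""
    return chain(sig, W - 1 - digit) == public

secret = bytes(range(N_BYTES))
public = chain_public(secret)
assert chain_verify(chain_sign(secret, 7), 7, public)
```

Every step is one SHA-256 compression call, which is why hardware SHA extensions translate almost directly into faster signing.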

The construction that ties them together

The three signatures are produced over a common signing message: the SHA3-256 hash of the substrate's canonical 58-byte encoding. This ensures that all three signatures attest to exactly the same data. The three signatures are concatenated in a fixed order (ML-DSA || FALCON || SLH-DSA) and hashed with SHA3-256 to produce the 32-byte commitment stored in the compact receipt.

Verification requires all three signatures to pass. If any single signature fails verification, the entire attestation is rejected. Because acceptance is the AND of the three verifications (as opposed to an OR or threshold construction), the substrate remains unforgeable as long as at least one family remains secure: an adversary must break all three to forge a substrate, not just one.
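A minimal sketch of the all-must-pass rule. The verifiers are passed in as plain callables so the composition logic is visible on its own. The `toy_family` helper is a hash-based stand-in with no security whatsoever, used only to make the example runnable; a real deployment would plug in ML-DSA, FALCON, and SLH-DSA verifiers. All names are illustrative.

```python
import hashlib
from typing import Callable, Sequence

# A verifier takes (message, signature) and returns True if the signature is valid.
Verifier = Callable[[bytes, bytes], bool]

def verify_attestation(substrate_encoding: bytes, sigs: Sequence[bytes],
                       verifiers: Sequence[Verifier], commitment: bytes) -> bool:
    """Accept only if the bundle hash matches and every family's signature verifies."""
    if len(sigs) != len(verifiers):
        return False
    message = hashlib.sha3_256(substrate_encoding).digest()  # common signing message
    if hashlib.sha3_256(b"".join(sigs)).digest() != commitment:
        return False
    return all(verify(message, sig) for verify, sig in zip(verifiers, sigs))

def toy_family(tag: bytes):
    """Stand-in for a signature scheme. NOT secure; for demonstration only."""
    def sign(msg: bytes) -> bytes:
        return hashlib.sha3_256(tag + msg).digest()
    def verify(msg: bytes, sig: bytes) -> bool:
        return sig == sign(msg)
    return sign, verify

families = [toy_family(t) for t in (b"ml-dsa", b"falcon", b"slh-dsa")]
encoding = bytes(58)
message = hashlib.sha3_256(encoding).digest()
sigs = [sign(message) for sign, _ in families]
commitment = hashlib.sha3_256(b"".join(sigs)).digest()
verifiers = [verify for _, verify in families]
assert verify_attestation(encoding, sigs, verifiers, commitment)
```

A forger who can produce valid signatures for two of the three families still fails the `all(...)` check on the third.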

For financial systems specifically, this means: your audit trail's cryptographic integrity survives even if one of the three mathematical families is broken. A lattice breakthrough compromises ML-DSA and possibly FALCON (if the breakthrough extends to NTRU lattices), but SLH-DSA remains intact because it uses no lattice mathematics at all. A hash function breakthrough compromises SLH-DSA but leaves both lattice-based schemes intact. Only a simultaneous breakthrough against all three independent mathematical foundations — MLWE lattices, NTRU lattices, and hash pre-image resistance — compromises the audit trail. That is three independent mathematical catastrophes, not one.

Deployment for financial institutions

H33 provides the three-family attestation primitive through a single API endpoint. A financial institution integrating H33-74 into its audit trail workflow makes one API call per record (or per batch of records), receives a 74-byte substrate, and stores it alongside the existing audit record. No cryptographic library integration required. No key management infrastructure to build. No NTT implementation to write. No FALCON sampling tree to debug. One API call, one 74-byte response, and the record is attested with three-family post-quantum security at production pricing.

For institutions that require on-premises deployment — which includes most systemically important financial institutions — H33 provides the attestation runtime for deployment inside the institution's own Trusted Execution Environment. The three-family signing keys are generated on the institution's hardware, stored in the institution's HSM, and never leave the institution's security boundary. H33's role in this model is providing the software and the operational support, not holding keys or processing data.

The standards are finalized. The implementation is production-ready. The performance exceeds any financial workload. The persistent footprint is 74 bytes. The security rests on three independent mathematical foundations rather than one. For financial systems that take their audit trail integrity seriously — and regulators are going to insist that all of them do — the question is not whether to deploy post-quantum signatures. It is whether to deploy one family or three. H33's answer is three, because the records you sign today need to survive an uncertain cryptanalytic future, and assumption diversity is the only responsible approach to that uncertainty.

Three Families. One API Call. 74 Bytes.

H33 implements FIPS 204, FIPS 205, and FIPS 206 in a single attestation primitive for financial systems. Production-ready today.
