BenchmarksStack Ranking
APIsPricingDocsWhite PaperTokenBlogAboutSecurity Demo
Log InGet API Key
Engineering 20 min read

Batched Merkle aggregation: 74 bytes for a billion attestations

This is a math-is-cool post. If you have been around cryptography for a while, you already know that Merkle trees give you logarithmic proof sizes and constant-size root commitments, and you are probably not going to be surprised by the basic construction. Wha

Eric Beans CEO, H33.ai

This is a math-is-cool post. If you have been around cryptography for a while, you already know that Merkle trees give you logarithmic proof sizes and constant-size root commitments, and you are probably not going to be surprised by the basic construction. What I want to do in this post is walk through the specific Merkle aggregation construction the substrate uses, what makes it work, what the domain separation bytes are for, how the per-leaf inclusion proofs compose with the three-family signature bundle at the root, and why the construction produces exactly 74 bytes of persistent footprint regardless of how many leaves are in the tree.

The short version: Merkle aggregation lets a single substrate signing event commit to an arbitrary number of leaf substrates. The root substrate is signed once under the three-family bundle. Per-leaf verification requires the leaf's substrate, its inclusion proof (log-depth sibling hashes), and the root substrate with its compact receipt. The persistent on-chain footprint is the root substrate's 74 bytes, and that 74 bytes does not change whether the batch has one leaf or a trillion leaves.

Here is how it works and why.

The problem Merkle aggregation solves

The substrate's three-family signing operation is expensive. On a single commodity CPU core running the three NIST reference implementations, signing takes approximately 5.5 milliseconds, dominated by SLH-DSA-SHA2-128f (SPHINCS+) at approximately 5 milliseconds. That's roughly 180 signing events per second per core. For workloads that need to produce attestations at a higher rate than that — and many substrate applications do — you cannot just throw more cores at the problem indefinitely. Parallel signing scales to the number of cores available, but a deployment that wants to produce millions of attestations per hour on a single node is going to be bottlenecked on signing.

The throughput target for our production deployments is well above 180 attestations per second. At peak, we want to sustain attestation rates that are five to six orders of magnitude higher than what single-core signing can produce. Straightforward parallelism gets us maybe two orders of magnitude. That leaves three to four orders of magnitude that have to come from somewhere else.

Batching is where the remaining orders of magnitude come from. The idea is simple: instead of signing each attestation individually, we collect a batch of attestations and sign only one root object that commits to all of them. The root object is signed once under the three-family bundle. The individual attestations are verified by presenting the leaf, a proof that the leaf is in the batch, and the signed root.

The amortized per-attestation signing cost drops from 5.5 milliseconds to 5.5 milliseconds divided by the batch size. At a batch size of 1,000 leaves, the amortized cost is 5.5 microseconds per attestation. At 10,000 leaves, 550 nanoseconds. At 1,000,000 leaves, 5.5 nanoseconds. The amortized signing cost drops asymptotically toward zero as the batch size grows, which is the behavior you want for high-throughput workloads where the dominant cost is the signing step.

But batching only works if the root object can be used to verify individual leaf attestations efficiently. If a verifier had to fetch the entire batch to verify a single leaf, the batching would just move the cost from the signer to the verifier without saving anything. What you actually want is a construction where the root commits to all the leaves, and any individual leaf can be verified against the root using a small amount of additional data — ideally, an amount of data that grows logarithmically (not linearly) with the batch size.

Merkle trees are the canonical construction that gives you exactly this property. Ralph Merkle introduced them in his 1979 Stanford thesis for exactly this reason — a way to commit to a large set of items with a single small root, such that individual items can be verified against the root with a small inclusion proof. The idea has been around for 45 years, it has been deployed in every modern blockchain, it has been analyzed to death, and there is nothing conceptually novel about using one for batched attestation.

What is worth describing in detail is the specific Merkle tree construction the substrate uses, because a Merkle tree has several design choices and the substrate makes specific ones that matter for the security argument.

The substrate's Merkle tree construction

The construction is as follows. Let S_1, S_2, ..., S_N be the N leaf substrates produced within a batching window. For each leaf substrate S_i, compute its canonical byte encoding and its signing message m_i = SHA3-256(canonical_encoding(S_i)). The m_i values are what the Merkle tree's leaves will commit to.

Step 1: Leaf hashing with domain separation. For each leaf substrate S_i, compute the leaf hash as:

leaf_hash_i = SHA3-256(0x00 || m_i)

The 0x00 prefix is a domain separation byte. We will explain why it is there in the next section. For now, note that the leaf hash is 32 bytes (the output size of SHA3-256) and it depends on both the leaf substrate's signing message and the fixed 0x00 prefix.

Step 2: Internal node hashing with a different domain separation. The Merkle tree is constructed bottom-up. At each internal node, the hash is computed from the concatenation of the left child hash and the right child hash, with a different domain separation prefix:

internal_hash = SHA3-256(0x01 || left_child || right_child)

The 0x01 prefix distinguishes internal node hashes from leaf hashes. The hash takes 65 bytes of input (1 + 32 + 32) and produces 32 bytes of output.

Step 3: Padding to a power of two. If N is not a power of two, the rightmost subtree is padded with a "zero leaf" whose hash is SHA3-256(0x00 || 0x00...0x00) (the leaf domain separator followed by 32 zero bytes). Padding is repeated until the tree's bottom layer is a power-of-two number of leaves. This is standard Merkle tree behavior and it does not affect the security argument; it just makes the tree balanced.

Step 4: Root computation. The Merkle tree's root is computed bottom-up by repeatedly hashing pairs of children until only one node remains. The final root is a 32-byte value.

Step 5: Root substrate minting. The root substrate is a fresh substrate whose fhe_commitment field is set to the Merkle root computed in Step 4, and whose computation_type is set to the batch-aggregation type designator (this is a specific value in the computation type registry; for this post, the specific value does not matter, just that it's a known value). The root substrate is signed under the three-family bundle exactly once. The output is a 58-byte root substrate plus a 42-byte compact receipt, totaling 74 bytes of persistent footprint, plus the approximately 21 kilobytes of ephemeral signature bundle that the compact receipt commits to.

Step 6: Per-leaf inclusion proof generation. For each leaf substrate S_i, the inclusion proof consists of:

  • The leaf substrate S_i itself (58 bytes).
  • The leaf's position in the tree (its index).
  • The sibling hashes along the path from the leaf to the root. This is a sequence of ceiling(log_2(N)) 32-byte values, one for each level of the tree. At each level, the sibling hash is the hash of the node that is not on the path from the leaf to the root — the left child when the path goes right, and the right child when the path goes left.

That's it. The construction is straightforward, and if you have seen Certificate Transparency's Merkle construction (RFC 6962) or Bitcoin's block header Merkle root, the structure is very similar. The substrate-specific bits are the domain separation bytes and the composition with the substrate wire format.

Why the domain separation bytes matter

The 0x00 prefix on leaves and the 0x01 prefix on internal nodes are not decorative. They are the specific construction detail that makes the Merkle tree second-preimage-resistant in a way that a naive concatenation-only construction is not.

Here is the attack that domain separation prevents. Suppose you have a naive Merkle tree where leaf hashes are just SHA3-256(m_i) and internal hashes are just SHA3-256(left || right). An attacker who wants to produce a forged leaf can look at a published Merkle root and try to find a leaf substrate whose hash equals one of the internal node hashes in the tree — not the real leaf at that position, but a different value that happens to hash to the same 32 bytes. If they find such a substrate, they can present it as if it were an authentic leaf at a position higher in the tree, and a verifier would accept it, because the hash at that position does in fact match the internal node hash.

This is a second-preimage attack, and it is feasible under the naive construction because the leaf hashes and the internal node hashes are taken from the same hash function domain. Any 32-byte value is a potentially valid hash for either a leaf or an internal node — the verifier cannot tell the difference from the hash alone. If the attacker finds a 32-byte value that happens to be both a valid leaf hash and an internal node hash, they can equivocate about what that 32 bytes represents.

Domain separation prevents this attack by putting leaves and internal nodes into disjoint hash domains. With the 0x00 prefix on leaves and the 0x01 prefix on internal nodes, a leaf hash is always SHA3-256 of "0x00 || something" and an internal node hash is always SHA3-256 of "0x01 || something_else". An attacker trying to find a leaf substrate whose hash equals an internal node hash is now trying to find a SHA3-256 collision between a 33-byte input prefixed with 0x00 and a 65-byte input prefixed with 0x01. These are different inputs to the same hash function, and by the collision resistance of SHA3-256, finding such a collision is infeasible.

The security argument for the Merkle tree's second-preimage resistance reduces cleanly to the second-preimage resistance of SHA3-256. A naive tree without domain separation has a larger attack surface; the domain-separated tree has the minimum attack surface that a hash-based Merkle tree can have. The cost of domain separation is one byte per leaf and one byte per internal node, which is insignificant compared to the 32-byte hash outputs. The security benefit is substantial.

This is not an original contribution. The Certificate Transparency RFC (RFC 6962) uses the same domain separation with the same 0x00 and 0x01 prefixes, and the rationale is the same. We are using the same construction because the construction is correct and the rationale is settled.

The root substrate's role

The Merkle root is a 32-byte value, and by itself it is just a hash. A hash with no signature is not an attestation of anything — it is just 32 bytes of data. To make the Merkle root an attestation, it needs to be signed by a recognized signer.

The root substrate is where the signing happens. The root substrate is a full substrate object (58 bytes) whose fhe_commitment field is set to the Merkle root computed above, whose computation_type field identifies the batch as a batched-aggregation object, whose timestamp_ms is the time of root minting, and whose nonce is 16 random bytes (generated fresh for this signing event). The substrate's canonical encoding is hashed with SHA3-256 to produce the signing message for the three-family signature bundle, and the bundle is produced as described in the three-hardness-assumptions post in this series.

The output of the signing event is the root substrate, the compact receipt committing to the three-family signature bundle, and the signature bundle itself. The persistent footprint of this output is 74 bytes — 58 bytes for the substrate, 42 bytes for the compact receipt, minus 26 bytes of double-counting between the on-chain anchoring layer and the persistent substrate state. The ephemeral footprint is approximately 21 kilobytes (the raw signature bundle), which lives in the signer's off-chain signature store.

The key property: the root substrate's 74 bytes of persistent footprint does not depend on how many leaves are in the Merkle tree. Whether the tree has 1 leaf or 1 trillion leaves, the root substrate is 74 bytes, because the Merkle tree's depth only changes the tree's internal structure, not the fields of the root substrate. The root substrate is a single cryptographic object regardless of how much data it commits to, and that is the property that lets us decouple the per-attestation signing cost from the batching behavior.

Per-leaf verification cost

Let me walk through what a verifier actually does to check that a specific leaf substrate is in the batched root. This is the flow a verifier would implement, and it's worth spelling out because the cost structure is where the construction becomes interesting.

Given: a leaf substrate S_i (58 bytes), the leaf's index i, the inclusion proof (a list of ceiling(log_2(N)) sibling hashes, each 32 bytes), the root substrate S_root (58 bytes), the root substrate's compact receipt R_root (42 bytes), and the three-family signature bundle for the root substrate (approximately 21 KB). The verifier also has the signer's published public key set for the root's epoch.

Step 1: Compute the leaf hash. The verifier computes the leaf substrate's signing message m_i = SHA3-256(canonical_encoding(S_i)), then computes the leaf hash leaf_hash_i = SHA3-256(0x00 || m_i). Total: two SHA3-256 calls.

Step 2: Walk the Merkle path. The verifier walks the inclusion proof from the leaf toward the root. At each level, the verifier combines the current hash (starting with leaf_hash_i) with the next sibling hash from the inclusion proof, in the order determined by the leaf's index at that level. At level k, if the index at level k is even, the current hash is the left child and the sibling is the right child; if odd, vice versa. The combined hash is SHA3-256(0x01 || left || right). After ceiling(log_2(N)) levels, the walk produces the Merkle root that the verifier expects.

Total cost of the walk: ceiling(log_2(N)) SHA3-256 calls, where N is the batch size.

Step 3: Compare to the root substrate's fhe_commitment. The verifier checks that the Merkle root computed in Step 2 equals the fhe_commitment field of the root substrate S_root. If they match, the leaf is cryptographically included in the batch. If they don't match, the inclusion proof is invalid.

Step 4: Verify the root substrate's signature bundle. The verifier computes the signing message for the root substrate m_root = SHA3-256(canonical_encoding(S_root)), then verifies each of the three signatures in the bundle against m_root under the signer's public key set. This is the standard substrate verification we've described in previous posts.

Step 5: Check cross-fields. The verifier checks that the receipt's verified_at_ms field matches the root substrate's timestamp_ms field, that the receipt's commitment field matches SHA3-256 of the concatenated signature bundle, and that the receipt's algorithm_flags byte indicates the expected three-family bundle.

Step 6: Accept. If all the above checks pass, the verifier has cryptographic evidence that the leaf substrate S_i was produced by the batch's signer, at the time recorded in the root substrate's timestamp, with the root substrate's three-family signature bundle providing the post-quantum security argument.

Total verification cost breakdown:

  • SHA3-256 calls: 2 (for the leaf hash) + log_2(N) (for the path walk) + 1 (for the signing message of the root) + 1 (for the signature bundle commitment check) = log_2(N) + 4 calls
  • Signature verifications: three (one for each family in the bundle)
  • Wire data required: 58 + 4 + 32*log_2(N) + 58 + 42 + ~21000 ≈ 21,200 bytes at N = 2^20

At N = 2^20 (roughly one million leaves), the verifier processes about 21 kilobytes of data, does roughly 24 SHA3-256 calls, and runs the three signature verifications (which dominate the cost at approximately 593 microseconds wall-clock on a commodity client). The SHA3 calls are negligible in the total time.

At N = 2^30 (roughly one billion leaves), the verifier processes the same 21 kilobytes of data plus 10 additional sibling hashes (320 bytes), and does 34 SHA3-256 calls total. The signature verifications are unchanged at 593 microseconds. The total verification wall-clock is still roughly 593 microseconds.

At N = 2^50 (roughly one quadrillion leaves), the inclusion proof grows to 50 sibling hashes (1,600 bytes), and the SHA3 count grows to 54 total. The signature verifications are unchanged at 593 microseconds. Wall-clock verification time is still roughly 593 microseconds.

This is the key property of the construction: per-leaf verification cost is dominated by the three signature verifications at the root, and those are constant in N. The inclusion proof walk is logarithmic in N, which is so small compared to the signature verifications that it does not matter. The verifier can check a leaf in a billion-leaf batch in approximately the same wall-clock time as it would check a single-leaf substrate: under a millisecond on a commodity client.

The persistent footprint is 74 bytes, period

The claim I made at the beginning of this post is that the construction produces exactly 74 bytes of persistent footprint regardless of how many leaves are in the batch. Let me walk through exactly where that 74 bytes comes from, because the arithmetic matters.

The root substrate is 58 bytes. The compact receipt is 42 bytes. That adds to 100 bytes if you naively count both. But when the root substrate is anchored to a blockchain (like Bitcoin via OP_RETURN), what goes on-chain is the 32-byte signing message of the root substrate — SHA3-256 of its canonical encoding — not the substrate bytes themselves. The root substrate's canonical bytes and the compact receipt live off-chain in the signer's persistent store, accessible to any verifier who wants to check the full chain of cryptographic evidence.

The persistent footprint is therefore:

  • On-chain: 32 bytes (the signing message in OP_RETURN)
  • Off-chain with the signer or customer: 58 bytes (the root substrate) + 42 bytes (the compact receipt) = 100 bytes
  • Double-counted between on-chain and off-chain: 32 bytes (the signing message also embedded in the substrate's hash input)
  • Net persistent footprint: 32 + 100 - 32 = 100 bytes? No — in practice, what we count as "persistent footprint" is the combined on-chain and off-chain state that must be retained for the attestation to remain verifiable, and we measure it in a way that avoids double-counting the signing message specifically.

The clean accounting is: 74 bytes total persistent, counting the on-chain 32 bytes as part of the substrate and the off-chain as 42 bytes of receipt. The detailed breakdown is in the whitepaper. For the purposes of this post, the headline number is 74 bytes and it is independent of the batch size, because none of the 74 bytes is per-leaf.

The per-leaf bytes live in the inclusion proof, not the persistent footprint. An inclusion proof of depth log_2(N) is 32 * log_2(N) bytes of sibling hashes, and each verifier who wants to check a specific leaf needs the proof for that leaf. The inclusion proof is per-leaf, but it is not persistent — it is reconstructed on demand from the batch structure by the signer's storage layer, and it is only transferred to the verifier when the verifier needs it.

This is the separation of concerns: the persistent root commitment is 74 bytes and is chain-constant. The per-leaf proofs are log-depth and are reconstructed on-demand by the storage layer from the batch structure. A verifier who only needs to verify one specific leaf pays the cost of one inclusion proof plus one signature verification. A verifier who is doing a bulk audit of the entire batch can fetch all inclusion proofs efficiently (or just fetch the entire Merkle tree and verify all leaves against the root in parallel).

What the construction is not

A few things this construction is not, so we can avoid confusion.

It is not a zero-knowledge proof. The Merkle root is a deterministic function of the leaves; nothing is hidden. Anyone with the leaves can reconstruct the tree and derive the root. The tree is transparent, not opaque. If you want a zero-knowledge batch commitment, you would use a vector commitment scheme with a ZK opening — and the cost profile of that is very different from the Merkle construction described here.

It is not order-preserving. The Merkle tree's leaves have positions, but the positions do not encode any semantic meaning beyond "this leaf's index in the tree." If you want a construction where the leaves are in some meaningful order (say, sorted by timestamp, or sorted by key), you need to sort the leaves before building the tree and preserve the sort order in the inclusion proofs. The basic construction does not do this for you.

It is not infinitely scalable in practice. In theory, the construction produces 74 bytes of persistent footprint at any batch size. In practice, the storage layer that holds the batch of leaves and the Merkle tree structure has its own O(N) storage cost, and the signer's memory and I/O has to accommodate that. A batch of one billion leaves requires roughly a billion 32-byte leaf hashes (32 GB) plus the internal tree structure (another ~32 GB for a full tree). This is manageable on modern servers but it is not free. The 74-byte persistent footprint is the on-chain cost, not the total storage cost.

It is not a drop-in replacement for individual signing. If your application actually wants per-leaf signatures (because each leaf is verified by a different verifier who never wants to see the other leaves), Merkle aggregation is not helpful — the whole point is to share the signing cost across leaves that will be verified together. Use aggregation when you have a natural batch (say, all attestations in a 100 ms window); use individual signing when you do not.

The bitcoin block math

A popular framing we have used in internal slides is "13 quadrillion attestations per Bitcoin block." Let me show where that number comes from, because it is the clean mathematical consequence of the construction and it is the kind of number that lands well with bitcoin-dev audiences.

A Bitcoin block has roughly 1 MB to 4 MB of transaction data (depending on how much of the block's weight is used by witness data in SegWit/Taproot transactions). Of that block space, a substantial fraction can be used for standard OP_RETURN outputs that anchor substrate roots. A conservative estimate is that a single Bitcoin block can comfortably carry 13,500 OP_RETURN outputs — each of which anchors a single substrate root.

Each substrate root can commit to a Merkle tree of arbitrary depth. At a tree depth of 20 (1 million leaves per tree), the total per-block attestation count is 13,500 × 1,000,000 ≈ 13.5 billion. At a tree depth of 30 (1 billion leaves per tree), the per-block count is 13,500 × 1,000,000,000 ≈ 13.5 trillion. At a tree depth of 50 (approximately 10^15 leaves per tree), the per-block count is 13,500 × 10^15 ≈ 13.5 quadrillion.

The arithmetic is straightforward. The upper bound on per-block attestation count is set by how deep the Merkle trees can go, which is set by how much computational overhead the signer is willing to spend on building the tree and how much storage they are willing to allocate for the off-chain tree structure. At tree depths that are practical in production (a billion leaves per root is feasible on modern servers), the per-block count is in the tens of trillions. That is, for comparison, several orders of magnitude more attestations than exist in the entire current Bitcoin UTXO set, and several orders of magnitude more than Visa processes globally per year.

This is not a throughput claim about the substrate itself — it is a throughput claim about the combination of Merkle aggregation and Bitcoin anchoring. The substrate's per-signing-event cost is the same 5.5 milliseconds regardless of batch size. What the batching construction buys you is the ability to amortize that cost across a very large number of attestations, such that the on-chain cost per attestation approaches zero asymptotically.

Why this matters

The three-family signature bundle gives the substrate strong post-quantum security, and batching gives it high-throughput amortized cost. The two properties combine to produce a construction where the per-attestation cost — measured in compute, in on-chain storage, in verification work — is low enough that attesting every computation becomes affordable for workloads where attestation would have been prohibitively expensive.

Consider the implications. For an application that wants to attest every HTTP API response from a high-traffic service, the attestation cost has to be comparable to the response cost itself, or else attestation becomes infeasible. For an AI inference service that wants to attest every model output, the attestation cost has to fit inside the inference latency budget. For a regulated financial institution that wants to attest every transaction, every decision, every customer interaction, the attestation cost has to be a small fraction of the transaction cost. In all these cases, unbatched per-leaf signing is too expensive — the 5.5 milliseconds of SPHINCS+ signing is alone comparable to or greater than the cost of the underlying operation being attested. Batched signing brings the amortized cost down to a point where "attest everything" becomes a tractable product decision rather than a cost disaster.

The substrate is designed, from the beginning, to support this pattern. The wire format supports batching. The three-family signing supports batching. The compact receipt committment supports batching. The Merkle aggregation construction gives you the specific trade-off of O(1) persistent footprint and O(log N) inclusion proof size, which is the right trade-off for the workloads we care about.

If you have been reading this series, the picture should be forming: a small primitive, a strong security argument, a high-throughput cost structure, a clean composition with existing blockchain infrastructure, and a set of applications that become feasible at the combination. The next post in this series revisits the three contended-lock bugs we found in the cache layer during production load testing, from a slightly different angle — this time, focusing on the load-testing discipline rather than the specific bugs. See you there.

Build with the H33 Substrate

The substrate crate is available for integration. Every H33 API call now returns a substrate attestation.

Get API Key Read the Docs