Cryptographic Document Version Control in 74 Bytes
Every document version attested with a 74-byte post-quantum substrate. Three hardness assumptions per save. Anchored to Bitcoin. No trust in the document server required.
The trust problem with document versioning
Every mainstream document collaboration system stores revision history on a server controlled by the operator. Google Docs, SharePoint, Notion, Confluence, Git-hosted wikis -- the architecture is the same everywhere. The server records who changed what, when, and the server presents that history to you when you ask for it. The implicit trust assumption is that the server operator has not altered, deleted, reordered, or fabricated any of those records.
That assumption is structurally unsound, not because any particular operator is dishonest, but because the architecture makes honesty unverifiable. A Google Workspace administrator can delete a document revision and no external party can detect the deletion. A SharePoint tenant admin can restore a previous version over a current one, rewriting the effective history. A Git server operator who controls the bare repository can force-push a rewritten branch and, if they also control the backup infrastructure, leave no trace. These are not theoretical attacks; they are administrative capabilities documented in the products' own admin guides.
Courts accept server-generated audit logs as evidence in litigation, regulatory proceedings, and compliance audits. But server logs are written by the server. The entity producing the evidence is the same entity whose conduct the evidence is supposed to document. In adversarial contexts -- employment disputes, contract disagreements, regulatory investigations, fraud examinations -- this is a structural vulnerability. The party with administrative access to the document system can, in principle, alter the evidentiary record before producing it. The opposing party has no independent mechanism to detect the alteration.
For unregulated collaboration -- drafting a blog post, iterating on a slide deck, editing shared notes -- this does not matter. The stakes are low and trust in the platform operator is reasonable. But for regulated industries the stakes are categorically different. A hospital altering the timestamp on a medical record. A bank modifying the version history of a compliance filing. A law firm changing the metadata on a document produced in discovery. A government agency backdating a policy memo. In each case, the current architecture provides no cryptographic defense. The only barriers are procedural controls, access policies, and the assumption that audit logs on the same server are themselves trustworthy.
The fundamental issue is that document versioning systems conflate storage with attestation. The same system that stores the document also attests to its history. When the storage operator is also the attestation authority, the attestation is only as trustworthy as the operator. For a version history to be independently verifiable, the attestation must be cryptographically bound to the document content and anchored to something the operator does not control.
What substrate-based version control looks like
The H33 substrate is a 74-byte cryptographic primitive. It binds a computation result (in this case, the exact bytes of a document at a specific point in time) to a three-family post-quantum signature. The primitive has two parts: a 32-byte canonical commitment and a 42-byte compact receipt. The canonical commitment is a 32-byte value that commits to a SHA3-256 hash of the document content, a computation type byte, a millisecond timestamp, and a nonce. Those fields together exceed 32 bytes, so the commitment binds them rather than storing them verbatim. The compact receipt contains a cryptographic commitment to a signature bundle produced by three independent post-quantum signature families. The total persistent footprint per version is 74 bytes.
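The article does not publish the internal byte layout. The sketch below assumes one way a 32-byte commitment could bind the four fields: SHA3-256 over a hypothetical encoding of the type byte, an 8-byte big-endian millisecond timestamp, a 16-byte nonce, and the content hash. It also assumes the 74-byte substrate is the commitment followed by the receipt. Field order, widths, and function names are illustrative, not H33's format.

```python
import hashlib
import os
import time

COMMITMENT_LEN = 32
RECEIPT_LEN = 42
SUBSTRATE_LEN = COMMITMENT_LEN + RECEIPT_LEN  # 74 bytes


def canonical_commitment(content: bytes, computation_type: int,
                         timestamp_ms: int, nonce: bytes) -> bytes:
    """Hypothetical 32-byte commitment over type, time, nonce and content hash."""
    content_hash = hashlib.sha3_256(content).digest()
    preimage = (bytes([computation_type])
                + timestamp_ms.to_bytes(8, "big")   # assumed 8-byte ms timestamp
                + nonce                             # assumed 16-byte nonce
                + content_hash)
    return hashlib.sha3_256(preimage).digest()


def pack_substrate(commitment: bytes, receipt: bytes) -> bytes:
    """Concatenate commitment and receipt (assumed order) into 74 bytes."""
    if len(commitment) != COMMITMENT_LEN or len(receipt) != RECEIPT_LEN:
        raise ValueError("expected a 32-byte commitment and a 42-byte receipt")
    return commitment + receipt


def unpack_substrate(substrate: bytes) -> tuple[bytes, bytes]:
    """Split a 74-byte substrate back into (commitment, receipt)."""
    if len(substrate) != SUBSTRATE_LEN:
        raise ValueError("substrate must be exactly 74 bytes")
    return substrate[:COMMITMENT_LEN], substrate[COMMITMENT_LEN:]


if __name__ == "__main__":
    c = canonical_commitment(b"draft v1", 0x10, int(time.time() * 1000), os.urandom(16))
    s = pack_substrate(c, bytes(RECEIPT_LEN))  # all-zero receipt as a placeholder
    print(len(s), s.hex()[:16])
```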
Here is what happens when a user saves a document in a substrate-integrated system. The application serializes the document to its canonical byte representation. It calls Mint(0x10, canonical_document_bytes), where 0x10 is the computation type for document attestation. The substrate layer computes SHA3-256 over the canonical bytes, producing the 32-byte content hash. It constructs the canonical commitment structure. It signs the commitment with three post-quantum signature algorithms: ML-DSA-65 (NIST FIPS 204), FALCON-512 (NIST Round 3 finalist, since selected by NIST for standardization as FN-DSA), and SLH-DSA-SHA2-128f-simple (NIST FIPS 205). It then distills the three signatures into a 42-byte compact receipt. Together with the 32-byte commitment, that receipt replaces 21,087 bytes of raw signatures with 74 persistent bytes, a distillation ratio of roughly 285:1. The result is a 74-byte substrate attestation that cryptographically binds the exact document state to a specific moment in time.
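The Mint steps above can be sketched as follows. The Python standard library has no post-quantum signatures, so the three signers are passed in as callables; `FAMILIES`, `mint`, and the commitment layout are hypothetical. The article does not describe how the 42-byte receipt is distilled. Here it is a stand-in: a length-prefixed SHAKE-256 commitment to the whole bundle, truncated to 42 bytes. It is a placeholder, not H33's method.

```python
import hashlib
import time
from typing import Callable

# Hypothetical labels; a real system would bind these to actual implementations.
FAMILIES = ("ML-DSA-65", "FALCON-512", "SLH-DSA-SHA2-128f-simple")
Signer = Callable[[bytes], bytes]  # message -> signature


def mint(computation_type: int, result: bytes, signers: dict[str, Signer],
         nonce: bytes) -> tuple[bytes, int, dict[str, bytes]]:
    """Sketch of Mint: returns (74-byte substrate, timestamp_ms, signature bundle)."""
    if set(signers) != set(FAMILIES):
        raise ValueError("exactly one signer per required family")
    # 1. Hash the canonical bytes locally.
    content_hash = hashlib.sha3_256(result).digest()
    # 2. Assumed commitment layout: type || timestamp_ms || nonce || content_hash.
    timestamp_ms = int(time.time() * 1000)
    commitment = hashlib.sha3_256(
        bytes([computation_type]) + timestamp_ms.to_bytes(8, "big")
        + nonce + content_hash
    ).digest()
    # 3. Sign the commitment once per family.
    bundle = {family: signers[family](commitment) for family in FAMILIES}
    # 4. Placeholder distillation: commit to every signature, keep 42 bytes.
    shake = hashlib.shake_256()
    for family in FAMILIES:
        sig = bundle[family]
        shake.update(len(sig).to_bytes(4, "big") + sig)
    receipt = shake.digest(42)
    return commitment + receipt, timestamp_ms, bundle
```

The full bundle is returned alongside the substrate because, per the article, it is kept for on-demand full verification while only the 74 bytes are persisted per version.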
The document itself is not stored in the substrate. The substrate stores only the binding. This is a critical design property. The substrate does not increase document storage costs. It does not require the document to be transmitted to an external service. The SHA3-256 computation runs locally on the document bytes. The signing runs locally. The 74-byte result is stored alongside the document metadata in whatever storage system the application already uses -- a database row, a file system attribute, an object store tag. The substrate is metadata, not a copy.
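To make "the substrate is metadata, not a copy" concrete, here is a minimal sketch using an in-memory SQLite database. The table and column names are hypothetical. The point is that the row holds a pointer to wherever the document bytes already live, plus a 74-byte BLOB that the schema rejects if it has any other length.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical version-history table with a 74-byte substrate column."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS document_versions (
            doc_id      TEXT    NOT NULL,
            version     INTEGER NOT NULL,
            storage_key TEXT    NOT NULL,  -- where the document bytes already live
            substrate   BLOB    NOT NULL CHECK (length(substrate) = 74),
            PRIMARY KEY (doc_id, version)
        )
        """
    )
    return conn


def record_version(conn: sqlite3.Connection, doc_id: str, version: int,
                   storage_key: str, substrate: bytes) -> None:
    """Store only the 74-byte binding next to existing metadata, never the document."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO document_versions VALUES (?, ?, ?, ?)",
            (doc_id, version, storage_key, substrate),
        )


def substrate_for(conn: sqlite3.Connection, doc_id: str, version: int) -> bytes:
    """Fetch the stored substrate for one version."""
    row = conn.execute(
        "SELECT substrate FROM document_versions WHERE doc_id = ? AND version = ?",
        (doc_id, version),
    ).fetchone()
    if row is None:
        raise KeyError((doc_id, version))
    return row[0]
```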
Once the 74-byte attestation exists, it can be independently anchored to Bitcoin via OP_RETURN. A single Bitcoin transaction can anchor a batch of substrates (more on this in the Merkle aggregation section below). The Bitcoin anchor provides a timestamp that no party controls -- not the document operator, not H33, not anyone. The document version is now bound to a specific block height on the most durable append-only ledger in existence.
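For illustration, the output script that carries a substrate is small. `0x6a` is the OP_RETURN opcode. Opcodes `0x01` to `0x4b` push that many bytes directly, and `OP_PUSHDATA1` (`0x4c`) handles 76 to 255 bytes. A 74-byte substrate therefore needs a 76-byte script, within the 80-byte payload limit that Bitcoin Core's default relay policy has long applied to OP_RETURN outputs. This sketch only builds the script; funding, fees, and signing the transaction are out of scope.

```python
OP_RETURN = 0x6a
OP_PUSHDATA1 = 0x4c


def op_return_script(payload: bytes) -> bytes:
    """Build an OP_RETURN output script that carries `payload`."""
    if len(payload) <= 75:
        push = bytes([len(payload)])               # direct push opcode 0x01..0x4b
    elif len(payload) <= 255:
        push = bytes([OP_PUSHDATA1, len(payload)])
    else:
        raise ValueError("payload too large for this sketch")
    return bytes([OP_RETURN]) + push + payload


if __name__ == "__main__":
    script = op_return_script(bytes(74))  # placeholder 74-byte substrate
    print(len(script), script[:2].hex())
```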
Substrate-referencing-substrate: hash chains
A single substrate attestation proves that a specific document state existed at a specific time. But document version control is not about individual snapshots. It is about the ordered sequence of changes: version 1 preceded version 2, version 2 preceded version 3, and no versions were inserted, deleted, or reordered between them.
The substrate supports this through a hash-chaining mechanism. When a new version of a document is attested, the computation result r passed to Mint is the new document bytes concatenated with the 32-byte canonical commitment from the prior version's substrate. Mint's SHA3-256 content hash therefore covers both. The new substrate commits to the current document state and, transitively, to the entire prior history. The structure is a hash chain. Each link commits backward to its predecessor, because a commitment cannot reference data that does not exist yet, so every later link depends on every earlier one. The result is a directed acyclic graph (DAG) of commitments, in this case a linear one.
Consider a document with five versions. The substrate chain looks like this:
V1: SHA3-256(doc_v1) -> Substrate_1
V2: SHA3-256(doc_v2 || Substrate_1.commitment) -> Substrate_2
V3: SHA3-256(doc_v3 || Substrate_2.commitment) -> Substrate_3
V4: SHA3-256(doc_v4 || Substrate_3.commitment) -> Substrate_4
V5: SHA3-256(doc_v5 || Substrate_4.commitment) -> Substrate_5
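The chain in the listing can be sketched in a few lines of Python. For brevity, this sketch treats each substrate's commitment as the link hash itself: SHA3-256 over the document bytes concatenated with the previous link. Timestamps, nonces, and signatures are omitted. That simplification is an assumption, not the substrate's actual commitment format.

```python
import hashlib


def build_chain(versions: list[bytes]) -> list[bytes]:
    """Return one 32-byte link per version, each folding in the previous link."""
    links: list[bytes] = []
    for doc in versions:
        prior = links[-1] if links else b""   # V1 has no predecessor
        links.append(hashlib.sha3_256(doc + prior).digest())
    return links


if __name__ == "__main__":
    original = build_chain([b"v1", b"v2", b"v3", b"v4", b"v5"])
    altered = build_chain([b"v1", b"v2", b"EDITED", b"v4", b"v5"])
    # Links 1-2 are unchanged; links 3, 4 and 5 all change.
    print([a == b for a, b in zip(original, altered)])
```

Changing version 3 changes link 3 and every link after it, which is the cascade described next.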
An attacker who wants to alter version 3 must produce a forged Substrate_3 with a different content hash. But Substrate_4 commits to Substrate_3.commitment, so the attacker must also forge Substrate_4. And Substrate_5 commits to Substrate_4.commitment, so the attacker must forge that too. The cascade continues forward through every subsequent version. Each forgery requires producing a valid three-family post-quantum signature -- breaking not just one signature scheme but all three simultaneously. If any single substrate in the chain has been anchored to Bitcoin, the attacker must also rewrite the Bitcoin blockchain from that block forward.
The chain is self-authenticating. Any verifier who possesses the document versions and the substrate sequence can independently verify the entire version history without trusting the document server. The verifier does not need access to the server's database, the server's audit logs, or the server's cooperation. The verification is a local computation: hash the document bytes, check the chain linkage, verify the three-family signatures, confirm the Bitcoin anchor. If all checks pass, the history is authentic. If any check fails, the specific link where tampering occurred is identified.
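The linkage step of that local verification can be sketched as a single pass that reports the first link that fails to recompute. It assumes each link is SHA3-256 over the document bytes concatenated with the previous link. Signature verification and the Bitcoin anchor check are left out; `first_broken_link` is a hypothetical name.

```python
import hashlib
from typing import Optional


def first_broken_link(versions: list[bytes], links: list[bytes]) -> Optional[int]:
    """Recompute every link locally; return the 0-based index of the first
    mismatch, or None if the whole chain checks out."""
    if len(versions) != len(links):
        raise ValueError("need exactly one link per version")
    prior = b""
    for i, (doc, claimed) in enumerate(zip(versions, links)):
        expected = hashlib.sha3_256(doc + prior).digest()
        if expected != claimed:
            return i                  # tampering detected at this link
        prior = claimed
    return None
```

A swapped document and a swapped link are both caught at the position where they occur, with no server involvement.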
This is structurally different from Git. Git commits form a hash chain using SHA-1 (or, in newer versions, SHA-256). But Git commits are not signed by default, and when they are signed, they use a single classical signature scheme (typically GPG with RSA or ECDSA). A compromised Git server can fabricate unsigned commits at will, and a signed history is only as strong as one key and one algorithm. The substrate chain requires forging three independent post-quantum signatures per link, one from each of three distinct mathematical families.
Why 74 bytes per version is viable
The most common objection to cryptographic attestation of document versions is overhead. If every save produces attestation data, and documents are saved frequently, the attestation metadata will eventually dominate the storage budget. This objection is well-founded for attestation schemes that produce large artifacts. A raw ML-DSA-65 signature is 3,309 bytes, a raw FALCON-512 signature is 690 bytes, and a raw SLH-DSA-SHA2-128f-simple signature is 17,088 bytes. Three raw signatures per version would cost 21,087 bytes per save. A document with 10,000 versions would accumulate about 211 MB of signature overhead. That is prohibitive.
The substrate's 285:1 distillation eliminates this problem. The three raw signatures (21,087 bytes) are distilled into a 42-byte compact receipt that cryptographically commits to all three. With the 32-byte commitment, the persistent record is 74 bytes, which is where the 285:1 ratio comes from (21,087 / 74 ≈ 285). The full signature bundle is stored once in hot cache and can be retrieved on demand for full verification, but the persistent per-version cost is 74 bytes.
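The ratio can be checked directly from the signature sizes quoted above. Against the full 74-byte substrate, the raw bundle shrinks about 285-fold. Against the 42-byte receipt alone, the figure would be about 502.

```python
RAW_SIGNATURE_BYTES = {
    "ML-DSA-65": 3309,
    "FALCON-512": 690,
    "SLH-DSA-SHA2-128f-simple": 17088,
}
SUBSTRATE_BYTES = 74
RECEIPT_BYTES = 42

raw_bundle = sum(RAW_SIGNATURE_BYTES.values())
ratio_vs_substrate = raw_bundle / SUBSTRATE_BYTES
ratio_vs_receipt = raw_bundle / RECEIPT_BYTES

print(raw_bundle)                  # 21087
print(round(ratio_vs_substrate))   # 285
print(round(ratio_vs_receipt))     # 502
```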
At 74 bytes per version, the numbers are straightforward:
- 10 versions: 740 bytes, under one kilobyte.
- 1,000 versions: 74 KB, about the size of a small web image.
- 10,000 versions: 740 KB, smaller than a single high-resolution photograph embedded in the document.
- 100,000 versions: 7.4 MB, roughly the size of one or two high-resolution photographs.
- 1,000,000 versions: 74 MB. For a corpus with a million attested edits, the total attestation overhead is smaller than a short video file.
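The figures in the list are simple multiplication. This short script reproduces them and sets them beside what storing the three raw signatures per version would cost, using the sizes quoted earlier.

```python
SUBSTRATE_BYTES = 74
RAW_BUNDLE_BYTES = 3309 + 690 + 17088  # three raw signatures, 21,087 bytes


def overhead(versions: int) -> tuple[int, int]:
    """Return (substrate bytes, raw-signature bytes) for `versions` saves."""
    return versions * SUBSTRATE_BYTES, versions * RAW_BUNDLE_BYTES


for n in (10, 1_000, 10_000, 100_000, 1_000_000):
    sub, raw = overhead(n)
    print(f"{n:>9,} versions: {sub:>12,} B substrate vs {raw:>16,} B raw")
```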
The substrate's constant-size property means the attestation's storage cost is independent of the document size. A 100-byte text file and a 100 MB PDF both produce a 74-byte substrate. A one-line code change and a 10,000-line refactor both produce a 74-byte substrate. The attestation overhead scales linearly with the number of versions, not with the size of the content being versioned; only the one-time SHA3-256 pass over the document bytes grows with document size. For any realistic document versioning workload, the attestation metadata is a rounding error in the total storage budget.
Three hardness assumptions per version
Each substrate attestation is signed by three post-quantum signature algorithms from three independent mathematical families:
- ML-DSA-65 (FIPS 204): security rests on the hardness of the Module Learning With Errors (MLWE) problem over structured lattices.
- FALCON-512 (NIST Round 3 finalist, since selected by NIST for standardization as FN-DSA): security rests on the hardness of the Short Integer Solution (SIS) problem over NTRU lattices, a lattice family mathematically distinct from MLWE.
- SLH-DSA-SHA2-128f-simple (FIPS 205): security rests on the pre-image resistance of SHA-256, a hash-based construction with no algebraic structure to attack.
These are not three implementations of the same assumption. They are three independent mathematical bets. An adversary who breaks MLWE lattices (compromising ML-DSA-65) still faces NTRU lattices (FALCON-512) and hash pre-image resistance (SLH-DSA). An adversary who breaks all structured lattices still faces the hash-based scheme, whose security relies only on the pre-image resistance of SHA-256 -- the same assumption that underpins Bitcoin itself. Forging a substrate attestation requires simultaneously breaking all three.
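"Forging requires breaking all three" follows from the acceptance rule being a logical AND. The sketch below makes that rule explicit. The per-family verifiers are injected as hypothetical callables, because the standard library has no post-quantum verification. A bundle is accepted only if every required family is present and verifies, so a forger who defeats one or two families still fails.

```python
from typing import Callable

Verifier = Callable[[bytes, bytes], bool]  # (message, signature) -> valid?

REQUIRED_FAMILIES = ("ML-DSA-65", "FALCON-512", "SLH-DSA-SHA2-128f-simple")


def bundle_is_valid(commitment: bytes, bundle: dict[str, bytes],
                    verifiers: dict[str, Verifier]) -> bool:
    """Accept only if every required family is present and verifies."""
    for family in REQUIRED_FAMILIES:
        sig = bundle.get(family)
        check = verifiers.get(family)
        if sig is None or check is None or not check(commitment, sig):
            return False  # one failing family is enough to reject
    return True
```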
Compare this to classical document version control. Git uses SHA-1 (or SHA-256) commit hashes for integrity, but commit hashes are not signatures; they are digests intended to be collision-resistant. SHA-1 has known practical collisions, which is why Git hardened its SHA-1 implementation with collision detection and added a SHA-256 object format. A compromised server that controls the repository can produce a new commit with any content and a valid hash. Git's signed commits add a single classical signature (RSA or ECDSA), creating a single point of failure: one key compromise, one algorithm break, and the entire commit history is forgeable. GPG-signed Git commits are better than unsigned ones, but they rest on a single hardness assumption that a sufficiently capable quantum computer could break in polynomial time via Shor's algorithm.
The substrate's three-family construction means that each document version is protected by three independent hardness assumptions. The practical consequence: even if one family is broken by a future cryptanalytic advance or quantum algorithm, the remaining two families continue to protect every previously attested version. The attestation degrades gracefully rather than failing catastrophically. This is not a theoretical hedge; it is a structural property of the signature bundle, enforced at mint time by the substrate implementation.
Integration pattern
The substrate is infrastructure, not a user-facing feature. Users do not interact with substrates, do not see 74-byte hex strings, and do not need to understand post-quantum cryptography. The integration happens at the application layer, invisible to the end user, in three places: save, export, and encrypt.
On save: When a user saves a document, the application serializes the document to its canonical byte representation and calls the substrate API. The API returns a 74-byte attestation. The application stores the attestation alongside the document metadata, typically as a column in the version history table or an attribute on the object in the document store. The attestation does not noticeably delay the save, because the substrate mint completes in microseconds. On our production infrastructure, the H33 pipeline sustains 1,667,875 attestations per second on a single Graviton4 instance. Document save latency is not measurably affected.
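"Canonical byte representation" matters because the same logical document must always hash to the same SHA3-256 value. The sketch below assumes a JSON document model. It uses sorted keys and compact separators as a simple stand-in canonical form, not a full scheme such as RFC 8785 (JCS), which also pins down number formatting. The `attest` argument is a hypothetical callable, for example a wrapper around the API call.

```python
import hashlib
import json
from typing import Callable


def canonical_bytes(document: dict) -> bytes:
    """Deterministic serialization: sorted keys, no extra whitespace, UTF-8."""
    return json.dumps(document, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def on_save(document: dict, attest: Callable[[bytes], bytes]) -> bytes:
    """Save hook: canonicalize first, then hand the bytes to the attester."""
    return attest(canonical_bytes(document))


if __name__ == "__main__":
    memo = {"title": "Policy memo", "body": "Effective immediately."}
    # Stand-in attester that just returns the content hash.
    print(on_save(memo, lambda data: hashlib.sha3_256(data).digest()).hex())
```

Two dictionaries with the same fields in a different insertion order produce identical bytes, and therefore the same attestation input.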
On export: When a user exports a document (PDF, DOCX, signed package), the export includes the substrate attestation as embedded metadata. A PDF exported from a substrate-integrated system carries its attestation in the document metadata fields. Any recipient can verify the attestation independently using the open-source verifier without contacting the originating server.
On encrypt: When a document is encrypted for storage or transmission, the substrate attests the plaintext content before encryption. This ensures that the attestation binds to the actual document content, not to the ciphertext (which changes with each encryption operation due to random IVs and nonces). The attestation remains verifiable even if the encryption scheme changes, the key is rotated, or the ciphertext is re-encrypted.
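The ordering (attest the plaintext, then encrypt) can be illustrated with a deliberately toy cipher. `toy_encrypt` XORs the plaintext with a SHAKE-256 keystream under a fresh random nonce. It is NOT secure and only stands in for a real AEAD such as AES-GCM. Its sole purpose here is to show that the ciphertext changes on every encryption while the attested hash does not.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """NOT SECURE: SHAKE-256 keystream XOR, standing in for a real AEAD."""
    nonce = os.urandom(16)                        # fresh per encryption
    stream = hashlib.shake_256(key + nonce).digest(len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Inverse of toy_encrypt."""
    nonce, body = blob[:16], blob[16:]
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def attest_then_encrypt(key: bytes, plaintext: bytes) -> tuple[bytes, bytes]:
    """Hash the plaintext first (this is what gets attested), then encrypt."""
    content_hash = hashlib.sha3_256(plaintext).digest()
    return content_hash, toy_encrypt(key, plaintext)
```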
The application calls a single API endpoint:
POST /api/v1/substrate/attest
Content-Type: application/octet-stream
{canonical_document_bytes}
Response: {
"substrate": "74-byte hex",
"commitment": "32-byte hex",
"receipt": "42-byte hex",
"computation_type": "0x10",
"timestamp_ms": 1744531200000
}
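A client for that endpoint can be sketched with the standard library. The base URL is a placeholder, and the `Authorization: Bearer` header is an assumption, because the article does not say how API keys are sent. The parser only sanity-checks the hex lengths implied by the field descriptions (74, 32, and 42 bytes). To send the request, pass it to `urllib.request.urlopen` and feed the response body to `parse_attest_response`.

```python
import json
import urllib.request

# Hex characters expected for each field: two per byte.
EXPECTED_HEX_LEN = {"substrate": 148, "commitment": 64, "receipt": 84}


def build_attest_request(base_url: str, document: bytes,
                         api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST to /api/v1/substrate/attest."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/substrate/attest",
        data=document,
        method="POST",
        headers={
            "Content-Type": "application/octet-stream",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def parse_attest_response(body: bytes) -> dict:
    """Decode the JSON response and check that each hex field has the right length."""
    resp = json.loads(body)
    for field, n in EXPECTED_HEX_LEN.items():
        value = resp[field]
        if len(value) != n:
            raise ValueError(f"{field}: expected {n} hex chars, got {len(value)}")
        bytes.fromhex(value)  # raises ValueError on non-hex input
    return resp
```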
Verification is a local operation. The open-source verifier (Apache 2.0 licensed) takes the document bytes and the substrate, recomputes the SHA3-256 hash, checks the chain linkage if a prior substrate is provided, and verifies the compact receipt against the three-family signature commitment. No network call is required. No API key is required. No trust in H33 is required. The verification is deterministic and reproducible by any party with the document and its substrate.
Legal and compliance applications
The combination of substrate attestation and Bitcoin anchoring produces a specific kind of evidence: cryptographic proof that a document existed in a specific state at a specific time, signed by three independent post-quantum signature families, anchored to an immutable public ledger. This has direct applications in regulated industries where the integrity of document history is a legal requirement, not merely a convenience.
SEC filings and financial compliance. Public companies are required to maintain accurate records of financial disclosures. The preparation of an SEC filing involves multiple drafts, reviews, and revisions before the final submission. A substrate-attested version history provides cryptographic evidence of the drafting process -- which version was reviewed by counsel, which version was approved by the audit committee, which version was submitted to EDGAR. If a regulator later questions whether a disclosure was backdated or a revision was fabricated, the substrate chain provides independently verifiable evidence that does not depend on the company's own server logs.
HIPAA-covered records. The HIPAA Security Rule requires covered entities to maintain the integrity of electronic protected health information (ePHI). The current standard of practice is access controls and audit logs stored on the same system as the records. A substrate-attested medical record provides a stronger guarantee: the record's content at each point in time is cryptographically bound to a post-quantum signature and a Bitcoin timestamp. An alteration to a medical record after attestation is detectable by any party with the substrate, without requiring access to the hospital's audit system.
Legal discovery. Document production in litigation is governed by rules that require producing documents in their original form. Parties routinely dispute the authenticity of produced documents, and courts resolve these disputes based on metadata, server logs, and expert testimony about the reliability of the producing party's document management system. A substrate-attested document carries its own proof of authenticity. The producing party provides the document and its substrate chain. The receiving party verifies the chain locally. Disputes about document authenticity become disputes about mathematics, not about the trustworthiness of the producing party's IT department.
Title insurance and real estate. Property records, title documents, and closing packages involve dozens of documents that must be maintained in their exact form for decades. A substrate-attested closing package provides cryptographic proof that the title commitment, the survey, the deed, and the settlement statement existed in their exact attested form on the closing date. Twenty years later, when a title dispute arises, the substrate chain provides evidence that does not depend on the survival or integrity of the title company's document management system.
Court filings and judicial records. Court documents filed electronically are stored on systems operated by court clerks. A substrate-attested filing provides an independent anchor: even if the court's electronic filing system is compromised, the parties retain their substrate attestations as proof of what was filed and when.
Audit trails. SOX, SOC 2, ISO 27001, FedRAMP -- every major compliance framework requires audit trails. Current audit trails are server-generated logs stored alongside the data they audit. A substrate-attested audit trail is self-authenticating: the trail's integrity can be verified by any auditor without trusting the system that generated the trail.
The common thread across all of these applications: the substrate removes the operator from the trust chain. The document server stores the documents and the substrates, but the server cannot forge a substrate without breaking three independent post-quantum signature families and rewriting the Bitcoin blockchain. The integrity guarantee transfers from "we trust the server operator" to "we trust the mathematics."
Batched Merkle aggregation for high-frequency saves
Real document collaboration systems generate saves at high frequency. A team of 1,000 concurrent users with auto-save every 30 seconds produces approximately 2,000 save events per minute (1,000 users × 2 saves per minute). Signing that volume is not the bottleneck; our pipeline sustains 1,667,875 attestations per second. The bottleneck is anchoring. Anchoring each individual substrate to Bitcoin would require one OP_RETURN transaction per save, which is neither economical nor necessary.
The substrate supports batched Merkle aggregation. The mechanism works as follows. During a configurable batch window (e.g., 10 seconds), individual document saves produce individual substrate attestations as described above. Each attestation is complete and independently verifiable -- the user's save is not delayed by batching. At the end of the batch window, the system collects all substrates produced during the window and arranges them as leaves in a Merkle tree. The Merkle root is signed with a single three-family signature. The resulting 74-byte root attestation covers the entire batch.
The structure is:
Batch window: 10 seconds
Saves in window: 333 (from 1,000 users, ~2 saves/min each)
Merkle tree:
Level 0: [S_1, S_2, S_3, ..., S_333] (individual substrates)
Level 1: [H(S_1||S_2), H(S_3||S_4), ...] (pairwise hashes)
...
Root: SHA3-256(...) (single 32-byte root)
Root attestation: Mint(0x11, merkle_root) -> 74-byte root substrate
Bitcoin anchor: OP_RETURN(root_substrate) -> 1 transaction per batch
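A minimal Python sketch of the tree above follows, with two stated assumptions. First, each leaf is SHA3-256 of the substrate, so every proof element is 32 bytes. Second, an unpaired last node is duplicated, as Bitcoin does. The article specifies neither. Production trees usually also add leaf/node domain-separation prefixes (as in RFC 6962), which are omitted here. All names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA3-256, the hash used throughout this sketch."""
    return hashlib.sha3_256(data).digest()


def merkle_levels(substrates: list[bytes]) -> list[list[bytes]]:
    """Every level of the tree, leaves first; the last level holds the root."""
    if not substrates:
        raise ValueError("a batch needs at least one substrate")
    level = [h(s) for s in substrates]        # assumed: leaf = SHA3-256(substrate)
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                    # assumed: duplicate an unpaired last node
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        index //= 2
    return proof
```

For a 333-save batch, `merkle_levels` yields ten levels (333, 167, 84, 42, 21, 11, 6, 3, 2, 1), so every proof has nine siblings.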
To verify that a specific document version was included in the batch, the verifier needs the individual substrate, the Merkle proof (a sequence of sibling hashes from the leaf to the root), and the root substrate. The verification is logarithmic in the batch size: for a batch of 333 saves, the Merkle proof is ceil(log2(333)) = 9 sibling hashes, or 9 * 32 = 288 bytes. Combined with the individual substrate (74 bytes) and the root substrate (74 bytes), the total verification payload per document version is 436 bytes -- still far smaller than a single raw SLH-DSA signature.
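The inclusion check and the 436-byte figure can be sketched as follows. The sketch assumes leaves are SHA3-256 of the substrate and that each proof step records which side its sibling sits on. It does not verify the root substrate's three-family signature. `verification_payload_bytes` uses `(n - 1).bit_length()`, which equals ⌈log₂ n⌉ for n ≥ 1 without floating-point rounding.

```python
import hashlib

HASH_LEN = 32
SUBSTRATE_LEN = 74


def verify_inclusion(substrate: bytes, proof: list[tuple[bytes, bool]],
                     root: bytes) -> bool:
    """Fold the sibling path from the leaf up and compare with the root."""
    node = hashlib.sha3_256(substrate).digest()
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = hashlib.sha3_256(pair).digest()
    return node == root


def verification_payload_bytes(batch_size: int) -> int:
    """Individual substrate + Merkle proof + root substrate, in bytes."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    proof_len = (batch_size - 1).bit_length()   # ceil(log2(batch_size))
    return SUBSTRATE_LEN + proof_len * HASH_LEN + SUBSTRATE_LEN
```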
Under Merkle aggregation, the additional per-save cost is effectively one SHA3-256 hash operation (to place the individual substrate in the tree) plus the cost of the root's three-family signature divided by the batch size. For a batch of 333 saves, each save carries approximately 1/333 of one three-family signature: microseconds, not milliseconds. The Bitcoin anchoring cost is similarly amortized: one OP_RETURN transaction per batch window, regardless of how many saves occurred in that window.
This design has a specific property that matters for compliance: individual substrates are produced synchronously at save time, before the batch window closes. The user's document version is attested the moment it is saved. The batching affects only the Bitcoin anchor, not the attestation. If the system fails between save and batch close, the individual substrate is still valid and independently verifiable -- it simply lacks a Bitcoin anchor until the next successful batch. The attestation is never lost; only the anchor is deferred.
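The property described above (attest now, anchor later, never lose an attestation) can be sketched as a small state machine. `attest` and `anchor` are hypothetical callables. If an anchor attempt fails, the pending substrates stay queued and are included in the next window's batch.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Batcher:
    """Attest every save immediately; anchor pending substrates per batch window."""
    attest: Callable[[bytes], bytes]          # hypothetical: bytes -> 74-byte substrate
    anchor: Callable[[list[bytes]], bool]     # hypothetical: True if the anchor tx succeeded
    pending: list[bytes] = field(default_factory=list)
    anchored: list[bytes] = field(default_factory=list)

    def save(self, canonical: bytes) -> bytes:
        substrate = self.attest(canonical)    # synchronous: the version is attested now
        self.pending.append(substrate)
        return substrate

    def close_window(self) -> bool:
        if not self.pending:
            return True
        batch = list(self.pending)
        if not self.anchor(batch):
            return False                      # keep pending; retry next window
        self.anchored.extend(batch)
        self.pending.clear()
        return True
```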
Patent pending -- H33 substrate Claims 124-125 cover batched Merkle response attestation.
What this means in practice
The version control systems we use today were designed to track changes. They were not designed to prove that the tracked changes are authentic. Tracking and proving are different engineering problems with different threat models. Tracking requires a database. Proving requires cryptography.
The substrate adds the cryptographic layer without replacing the tracking layer. Google Docs continues to store revision history in its database. SharePoint continues to maintain its version metadata. Git continues to build its commit DAG. The substrate runs alongside these systems, producing a parallel chain of 74-byte attestations that are independently verifiable, post-quantum secure, and anchored to Bitcoin. If the tracking system is honest, the substrate confirms it. If the tracking system is dishonest, the substrate exposes it.
For most document workflows, this is invisible infrastructure. The user saves, the substrate mints, the attestation is stored, and nobody thinks about it until somebody needs to prove that a specific version of a specific document existed at a specific time in its exact form. When that moment arrives -- in a courtroom, in a regulatory examination, in an audit, in a dispute -- the substrate provides an answer that does not depend on the credibility of the party presenting the evidence. It depends on SHA3-256, ML-DSA-65, FALCON-512, SLH-DSA-SHA2-128f-simple, and the Bitcoin blockchain. Three hardness assumptions and an append-only ledger. That is the trust model.
Seventy-four bytes per version. No trust in the server. No trust in the operator. Verification is local, open-source, and post-quantum secure. The document speaks for itself.
Build with the H33 Substrate
The substrate API is live. Attest document versions with three-family post-quantum signatures in a single API call.