NIST-Compliant AI Audit Trails — Prove Every Decision
AI agents make thousands of decisions that affect people's lives. Nobody can prove what any agent actually produced. NIST post-quantum attestation changes that permanently.
An AI agent denied your mortgage application this morning. The denial letter cites insufficient income documentation and elevated risk indicators from automated underwriting. You want to challenge it. The bank says the AI model produced that output. But here is the question nobody can answer with certainty: did it? Can the bank prove — mathematically, independently, without trusting their own logging infrastructure — that the specific output cited in your denial letter is exactly what the model produced? That the input fed to the model was exactly what they claim? That nothing was altered between model inference and the decision that affected your life?
They cannot. Not today. Not with any AI system deployed in production anywhere in the world. The gap between what AI agents produce and what can be proven about their outputs is the accountability crisis hiding beneath every enterprise AI deployment. It is not a theoretical concern. It is a practical gap that regulators are beginning to notice, that litigation is beginning to exploit, and that no amount of logging — traditional logging, structured logging, immutable append-only logging — can close. Because logs are assertions. They are notes a system writes about itself. They are not proof.
Proof requires cryptography. Specifically, it requires digital signatures — a signer attesting to specific bytes at a specific time, verifiable by anyone with the public key and zero trust in the signer's infrastructure. And for AI audit trails that need to remain verifiable for five, seven, or ten years, those signatures need to be post-quantum. Classical signatures (RSA, ECDSA) are expected to become forgeable by quantum computers within the retention window of AI audit records being created today.
The AI accountability gap
Consider the lifecycle of an AI decision in a typical enterprise deployment. A request arrives — a loan application, a fraud screening, a medical triage, a content moderation decision. The request is preprocessed: normalized, tokenized, enriched with contextual data from other systems. The preprocessed input is fed to a model. The model produces an output. The output is postprocessed: thresholded, formatted, enriched with explanations. The postprocessed result is delivered to a downstream system or a human operator who acts on it.
At every handoff in this pipeline, data can be altered — deliberately by a malicious actor, accidentally by a software bug, silently by a version mismatch between pipeline components. The preprocessing step could inject or omit data. The model could be swapped between the version that was tested and a different version in production. The postprocessing step could modify thresholds, suppress certain outputs, or add explanations that misrepresent what the model actually produced. The delivery step could alter the output before it reaches the action point.
Traditional logging records what each component claims happened. But every log entry is written by the component itself. A compromised preprocessing step logs that it faithfully passed through all input data — even if it silently dropped a critical field. A tampered postprocessing step logs that it applied the standard threshold — even if it used a different one for this particular request. The logs are consistent, complete, and wrong. Nobody can detect the discrepancy because the logs are the only record, and the logs are written by the systems being audited.
This is not a hypothetical. This is the current state of every enterprise AI deployment. The audit trail is self-reported by the audited system. It is the equivalent of a company preparing its own financial audit with no external auditor and no signed source documents. We would never accept this for financial reporting. We are accepting it for AI decisions that deny mortgages, flag fraud, approve medical treatments, and moderate speech.
What cryptographic attestation provides
Cryptographic attestation breaks the self-reporting loop. Instead of each pipeline component logging what it claims to have produced, each component signs what it actually produced — at the exact point of production, using a signing key that the component controls. The signature is a mathematical commitment that cannot be forged without the signing key and cannot be altered after the fact. A downstream verifier — a regulator, an auditor, an affected individual's legal counsel — can check the signature directly against the signer's public key without any trust in the signer's logging infrastructure, database integrity, or operational honesty.
For an AI inference pipeline, this means: the preprocessing step signs its output (the exact bytes fed to the model). The model runtime signs its output (the exact bytes the model produced). The postprocessing step signs its output (the exact bytes delivered downstream). Each signature is independently verifiable. If any component's signed output does not match the next component's signed input, the discrepancy is detectable and provable. The chain of attestations creates a verifiable record of exactly what happened at each stage, signed by the component that did it, verifiable by anyone.
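The per-stage signing pattern can be sketched in a few lines. The snippet below is a minimal illustration, not the H33-74 protocol: HMAC-SHA3-256 stands in for the three-family post-quantum signature bundle so the example runs on the Python standard library alone, and the keys and payloads are invented.

```python
import hashlib
import hmac

# HMAC-SHA3-256 stands in for a real post-quantum signature so this
# sketch runs on the standard library alone. In H33-74 each component
# would hold a three-family signing key set instead.
def sign(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha3_256).digest()

def verify(key: bytes, payload: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sign(key, payload), sig)

PRE_KEY = b"preprocessor-signing-key"     # hypothetical per-component keys
MODEL_KEY = b"model-runtime-signing-key"

# The preprocessing step signs the exact bytes it feeds to the model.
pre_out = b'{"income": 84000, "dti": 0.31}'
pre_sig = sign(PRE_KEY, pre_out)

# The model runtime verifies its input against the upstream attestation
# before inference, then signs the exact bytes it produced.
assert verify(PRE_KEY, pre_out, pre_sig)
model_out = b'{"decision": "deny", "score": 0.17}'
model_sig = sign(MODEL_KEY, model_out)

# Any alteration between the stages is detectable: the upstream
# signature no longer verifies over the tampered bytes.
tampered = pre_out.replace(b"84000", b"24000")
assert not verify(PRE_KEY, tampered, pre_sig)
```

The essential property is that each component signs at the point of production, so a later alteration of the handed-off bytes is detectable by re-verifying the upstream signature, with no reliance on the components' own logs.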
This is what H33 provides through the H33-74 substrate. Every computation — every model inference, every preprocessing step, every postprocessing transformation — produces a 74-byte attestation that binds the computation output to a three-family post-quantum signature. The attestation is independently verifiable by anyone with the public key set. No API call to H33 required. No trust in the AI vendor's infrastructure required. Pure cryptographic verification.
Why post-quantum matters for AI audit trails
AI audit trails have a uniquely long verification horizon. Consider the EU AI Act, which requires that high-risk AI systems maintain audit logs for the lifetime of the system plus a reasonable period afterward. Consider the proposed US AI accountability frameworks, which suggest retention periods of five to ten years for AI decisions affecting individuals. Consider litigation timelines — a class action challenging AI-driven lending discrimination may not be filed until years after the decisions were made, and discovery may require verification of AI outputs from half a decade earlier.
Classical digital signatures (RSA-2048, ECDSA-P256) can be broken by a sufficiently large quantum computer running Shor's algorithm. The timeline is uncertain, but the most informed estimates suggest the early 2030s for cryptographically relevant quantum computers. An AI audit record signed with ECDSA today and challenged in litigation in 2032 may be unverifiable — worse, it may be forgeable. An adversary with quantum computing access could create alternative AI outputs, sign them with forged classical signatures, and present them as the original record. The audit trail that was supposed to prove what the AI did becomes a liability.
Post-quantum signatures eliminate this risk. H33-74 uses three post-quantum signature families from the NIST standardization program: ML-DSA-65 (FIPS 204, lattice-based), FALCON-512 (the basis of FN-DSA in draft FIPS 206, NTRU-lattice-based), and SLH-DSA-SHA2-128f (FIPS 205, hash-based). Each relies on a different mathematical hardness assumption. To forge an H33-74 attestation, an adversary must simultaneously break MLWE lattices, NTRU lattices, and hash function pre-image resistance — three independent mathematical catastrophes, not one. AI audit records signed with H33-74 today remain verifiable and unforgeable regardless of quantum computing advances.
The H33-Agent-74 delegation chain
Multi-agent AI systems — where multiple specialized agents collaborate on a task, passing outputs between each other — create a particularly challenging audit problem. Agent A produces an intermediate result. Agent B consumes that result and produces its own output. Agent C takes Agent B's output and makes a final decision. If the final decision is challenged, the verifier needs to trace the entire chain: what did A actually produce? Did B receive exactly that? What did B produce? Did C receive exactly that?
H33-Agent-74 is the delegation chain protocol built on the H33-74 substrate specifically for multi-agent pipelines. Each agent in the chain produces an attestation of its output. Each attestation includes, in its canonical data, the content hash of the previous agent's attestation — creating a cryptographic chain where each link is independently verifiable and the chain as a whole proves the exact sequence of transformations from initial input to final output.
The construction works as follows. Agent A processes input data and produces output O_A. Agent A signs O_A with its three-family key set, producing substrate S_A with content hash H(O_A). Agent B receives O_A and S_A. Before processing, Agent B verifies S_A to confirm that O_A is authentic and untampered. Agent B then processes O_A to produce O_B. Agent B signs O_B with its own key set, but includes H(S_A) in the canonical data that forms S_B. This creates a cryptographic link: S_B commits to O_B and also commits to S_A, which commits to O_A. Agent C repeats the pattern, including H(S_B) in its canonical data.
The result is a verifiable chain of custody for AI outputs. A regulator or auditor can start at any point in the chain and verify backward to the original input. Every link in the chain is independently verifiable. No single agent can retroactively alter its output without invalidating all downstream attestations. No downstream agent can claim it received different input than what the upstream agent actually produced. The cryptographic chain makes the entire pipeline's behavior provable, not just loggable.
Practical deployment: attesting model inference
The most common integration point is model inference attestation. When a model produces an output, the attestation captures the essential audit fields: a SHA3-256 hash of the model input, a SHA3-256 hash of the model output, a model version identifier, and a millisecond timestamp. These fields are canonically encoded and form the content that the substrate attests. The resulting 74-byte substrate is stored alongside the model output in whatever storage system the organization already uses.
The attestation does not store the input or output themselves — those remain wherever they already live. The substrate binds them. Given the original input and output, any verifier can recompute the SHA3-256 hashes and check them against the substrate. If they match, the verifier knows that the specific input produced the specific output at the specific time, as attested by the holder of the signing keys. If they do not match, something was altered.
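As a sketch, the four audit fields might be canonically encoded as a fixed-layout byte string. The layout below is a hypothetical assumption (the real H33-74 canonical encoding is not specified in this article), but the hash-binding logic it demonstrates is the point: the substrate commits to hashes, and a verifier recomputes them from the originals.

```python
import hashlib
import struct

def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

# Hypothetical canonical layout for the four audit fields:
# input hash (32) || output hash (32) || model-version hash (32) ||
# millisecond timestamp (8, big-endian unsigned).
def canonical_data(model_input: bytes, model_output: bytes,
                   model_version: str, ts_ms: int) -> bytes:
    return (sha3(model_input)
            + sha3(model_output)
            + sha3(model_version.encode())
            + struct.pack(">Q", ts_ms))

record = canonical_data(
    b'{"applicant_income": 84000}',   # exact bytes fed to the model
    b'{"decision": "deny"}',          # exact bytes the model produced
    "underwriting-v4.2",              # invented version identifier
    1714000000000,
)

# A verifier holding the original input and output recomputes the
# hashes and compares them to the attested canonical data.
def matches(record: bytes, model_input: bytes, model_output: bytes) -> bool:
    return (record[:32] == sha3(model_input)
            and record[32:64] == sha3(model_output))

assert matches(record, b'{"applicant_income": 84000}', b'{"decision": "deny"}')
# A single altered byte anywhere in the input breaks the binding.
assert not matches(record, b'{"applicant_income": 84001}', b'{"decision": "deny"}')
```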
For organizations running models at scale, H33's batch attestation aggregates multiple inference attestations into a single Merkle tree, signed once per batch with the three-family bundle. At one million inferences per batch, the per-inference signing cost is amortized to effectively zero. The production infrastructure processes 2,216,488 attestations per second — sufficient for any enterprise AI workload.
Regulatory alignment: what auditors will ask
The regulatory landscape for AI accountability is converging on a common set of requirements across jurisdictions. The EU AI Act requires that high-risk AI systems maintain logs sufficient to enable monitoring and post-hoc analysis of decisions. The proposed US AI frameworks (NIST AI RMF, proposed FTC AI rules, proposed SEC AI disclosure rules) emphasize accountability and auditability. Financial regulators (OCC, Fed, FDIC) are extending existing model risk management guidance (SR 11-7) to cover AI/ML models with explicit expectations around audit trail integrity.
What all these frameworks have in common is the expectation that AI decisions are reconstructable and verifiable after the fact. Not just logged — verifiable. An examiner should be able to take a specific AI decision, request the supporting records, and independently confirm that the decision was produced by the claimed model on the claimed inputs at the claimed time. Classical logging meets the "logged" requirement. It does not meet the "independently verifiable" requirement because logs can be altered by anyone with administrative access to the logging system.
Post-quantum attestation via H33-74 meets both requirements. The attestation is logged (stored alongside existing records) and independently verifiable (checkable with public keys, no infrastructure trust required). When an examiner asks "can you prove this AI decision was produced by the model you claim, on the inputs you claim, at the time you claim?" — the answer is yes, mathematically, with a 74-byte proof that survives quantum computing advances.
The multi-model problem
Modern AI systems rarely use a single model. A fraud detection pipeline might use an embedding model, a scoring model, a rules engine, and a decision model. A medical triage system might use a symptom extraction model, a differential diagnosis model, and a severity scoring model. Each model in the pipeline can be updated independently. Each model may be provided by a different vendor. Each model's output feeds the next model's input.
Without attestation, version coordination across multi-model pipelines is operationally challenging and unverifiable. Did the scoring model receive output from embedding model v2.3 or v2.4? Was the rules engine updated between the time the scoring model ran and the time the decision model ran? These questions matter because model version changes can materially affect outputs, and regulators evaluating contested AI decisions need to know exactly which models produced which outputs.
H33-74 attestations include model version in the canonical data. The delegation chain construction links each model's attestation to the previous model's attestation. The result is a verifiable record of exactly which model version produced exactly which output, in exactly which sequence, with exactly which inputs at each stage. Model version coordination becomes cryptographically provable, not just operationally claimed.
Tamper detection across organizational boundaries
AI decisions frequently cross organizational boundaries. A bank uses a third-party AI vendor for credit scoring. An insurance company uses an external model for claims assessment. A healthcare provider uses a vendor's model for clinical decision support. In each case, the AI vendor produces an output, transmits it to the customer, and the customer acts on it. If the decision is challenged, both parties need to agree on what the model actually produced.
Without attestation, this is a he-said-she-said problem. The vendor says the model produced output X. The customer says they received output Y. Neither can prove their claim because the transmission channel is not cryptographically attested. Logs on both sides may be inconsistent, and neither party's logs are trustworthy to the other party because each party controls their own logging infrastructure.
H33-74 resolves this by having the vendor attest the output at the point of production. The vendor signs the model output with their three-family key set. The customer receives the output and the 74-byte attestation. The customer can independently verify the attestation against the vendor's published public keys. If the customer later claims they received a different output, the attestation proves otherwise. If the vendor later claims they produced a different output, the customer's copy of the attestation proves otherwise. The cryptographic attestation creates a shared, independently verifiable record that neither party can unilaterally alter.
Performance at AI scale
AI inference pipelines operate at machine speed. A large language model serving API requests handles thousands of inferences per second. A real-time fraud detection system evaluates millions of transactions per day. A content moderation pipeline processes billions of items per month. Any attestation system that introduces meaningful latency or throughput constraints is unusable at these scales.
H33's attestation infrastructure is engineered for exactly this workload. The production pipeline processes 2,216,488 attestations per second on a single ARM server. Per-attestation latency is under 42 microseconds — invisible at the timescales of model inference (which typically takes milliseconds to seconds). The 74-byte persistent footprint means storage overhead is negligible: one billion AI decisions per day adds 74 gigabytes of attestation data — less than the model weights of a single large language model.
Batch attestation further reduces overhead for high-volume workloads. Within a configurable time window, individual attestations are aggregated into a Merkle tree and signed once with the three-family bundle. For AI workloads that do not require per-inference real-time verification — batch processing, offline analysis, periodic compliance reporting — this reduces the signing cost per inference to effectively zero while maintaining independent verifiability for any individual inference via its Merkle inclusion proof.
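The aggregation step can be illustrated with a toy Merkle tree over per-inference records. The tree shape and odd-level padding rule below are illustrative assumptions, not the H33 batch format; the takeaway is that one signature over the root covers the whole batch, while any single inference stays verifiable via its inclusion proof.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(x) for x in leaves]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves: list[bytes], index: int):
    """Sibling hashes from the leaf at `index` up to the root."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2 == 0))  # (sibling, on right?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sib_is_right in proof:
        node = h(node + sibling) if sib_is_right else h(sibling + node)
    return node == root

# A batch of per-inference attestation records, aggregated into one tree.
batch = [f"inference-{i}".encode() for i in range(1000)]
root = merkle_root(batch)       # only the root receives the signature bundle

proof = inclusion_proof(batch, 371)
assert verify_inclusion(batch[371], proof, root)
assert not verify_inclusion(b"forged-inference", proof, root)
```

Signing cost is one three-family bundle per batch rather than per inference, and each inclusion proof is only logarithmic in the batch size (10 sibling hashes for 1,000 leaves).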
The five-year verification window
An AI audit trail created today must remain verifiable for the duration of its retention period. For high-risk AI decisions, that retention period is five to ten years. Within that window, the cryptographic signatures protecting the audit trail must remain unforgeable. Any signature scheme that may be broken within the retention window creates a gap between the record's retention requirement and its actual integrity guarantee.
Classical signatures (RSA-2048, ECDSA-P256) have an uncertain remaining lifespan against quantum adversaries. The post-quantum transition is not optional for records that need to survive five to ten years. H33-74's three-family construction provides defense in depth: even if one of the three mathematical assumptions is broken within the retention window, the remaining two families maintain the attestation's integrity. An adversary would need three simultaneous mathematical breakthroughs — against MLWE lattices, NTRU lattices, and hash function pre-image resistance — to forge a single attestation. This is the strongest guarantee currently achievable against an uncertain cryptanalytic future.
For AI systems specifically, this means every attested decision remains provable regardless of quantum computing advances. A model inference attested in April 2026 is verifiable in April 2036 with the same mathematical certainty it had on the day it was created. No re-signing required. No key rotation required. No infrastructure changes required. The 74 bytes persist, and the proof persists with them.
Implementation path
Integrating H33-74 attestation into an existing AI pipeline requires minimal architectural change. The attestation API accepts the canonical data (model input hash, model output hash, model version, timestamp) and returns a 74-byte substrate. The integration point is a single API call after model inference, before the output is delivered downstream. For organizations that want on-premises deployment, the attestation runtime runs inside a Trusted Execution Environment on the organization's own hardware with no external dependency.
The verification path is equally straightforward. Given the original model input and output, a verifier recomputes the hashes, checks them against the substrate, and verifies the three-family signatures against the signer's published public keys. The verification crate is source-available for independent audit. No runtime dependency on H33 infrastructure. No API call required. Pure offline cryptographic verification.
AI decisions that affect people's lives deserve better than self-reported logs. They deserve proof — mathematical, independently verifiable, quantum-resistant proof that what was recorded is what actually happened. H33-74 provides that proof at machine speed, at negligible storage cost, with three-family post-quantum security that survives the entire retention window. The technology exists. The NIST standards are published. The only remaining question is whether your AI audit trails will be provable when someone finally asks for proof.
Prove Every AI Decision
H33-74 provides NIST-compliant post-quantum attestation for AI inference pipelines. 74 bytes per decision. Independently verifiable. Quantum-resistant.
Get API Key Read the Docs