Agent Audit Trails

Cryptographic evidence for every action in an autonomous agent sequence.

Autonomous AI agents make sequences of decisions, call external tools, modify state, and produce outcomes that no single human reviewed. The audit problem is harder than it is for single-shot AI: the agent's reasoning chain, tool invocations, and state changes each need verifiable evidence. H33 produces audit trails for each step in the agent sequence and for the sequence as a whole. These trails are tamper-evident, replayable, and portable.

Why agentic AI is a harder audit problem

A single AI decision is one prompt and one response. An agent is different. An agent plans a sequence of steps; calls external tools (search, code execution, API requests, database queries); reads intermediate results and decides what to do next; maintains state across steps; may call other agents; and produces a final outcome that depends on the whole sequence.

The audit surface is no longer one decision. It is many decisions with dependencies, executed over seconds to hours. Each step requires its own audit trail, and the sequence as a whole requires an audit trail that links the steps. Standard logging is insufficient: ordinary logs record events, but they do not by themselves link the steps to one another or make tampering evident.

How H33 audits an agent sequence

For each step in the agent's sequence, H33 produces an evidence bundle. The bundle documents the step's input (the prior state, the prior tool result, or the initial prompt); the step's planning output; the tool called (if any), the parameters passed, and the result returned; the policy that governed the step; the authority of the agent at the time; and the confidence reported by the agent for the step's decision. The bundle is signed with algorithms from three independent post-quantum families. The bundle's commitment can be anchored.

The sequence as a whole is linked by a hash chain: each bundle references the prior bundle's digest. Tampering with any bundle in the chain, or removing a bundle from the middle, breaks the chain at the modified point.
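
The hash-chain linking described above can be sketched in a few lines of Python. This is an illustration only, not H33's bundle format: the `StepBundle` fields, the `GENESIS` placeholder, the JSON encoding, and the use of SHA-256 are all assumptions made for the example, and the post-quantum signatures and anchoring are left out because the text does not specify them. The sketch assumes the verifier holds a trusted head digest, for example one that was signed or anchored.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional

GENESIS = "0" * 64  # hypothetical placeholder digest for the first bundle


def canonical_digest(payload: dict) -> str:
    """SHA-256 over a deterministic JSON encoding of the payload."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class StepBundle:
    # Field names are illustrative; they mirror the items listed in the text.
    step_index: int
    step_input: str
    planning_output: str
    tool_name: Optional[str]
    tool_params: dict
    tool_result: Optional[str]
    policy_id: str
    authority: str
    confidence: float
    prev_digest: str  # digest of the previous bundle in the sequence

    def digest(self) -> str:
        return canonical_digest(asdict(self))


def append_step(chain: list, **fields) -> StepBundle:
    """Create a bundle that references the digest of the current last bundle."""
    prev = chain[-1].digest() if chain else GENESIS
    bundle = StepBundle(step_index=len(chain), prev_digest=prev, **fields)
    chain.append(bundle)
    return bundle


def chain_is_intact(chain: list, head_digest: str) -> bool:
    """Recompute every link and compare the result with a trusted head digest."""
    expected = GENESIS
    for index, bundle in enumerate(chain):
        if bundle.step_index != index or bundle.prev_digest != expected:
            return False
        expected = bundle.digest()
    return expected == head_digest
```

Editing a bundle changes its digest, so the next bundle's `prev_digest` no longer matches. Removing a bundle from the middle fails the same check. The head digest covers the final bundle, which no later bundle references.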

The tool-call audit problem

The hardest part of agent auditing is the tool call. An agent decides to invoke an external tool, the tool executes, and the agent processes the result. Each component needs auditable evidence: the agent's decision to invoke the tool, the parameters passed, the tool's identity and version, the tool's result, and the agent's processing of the result. H33 captures each. The agent's decision is documented in PolicyBind and AuthorityBind. The tool's identity is documented in ModelFingerprint (extended to cover tools). The tool's input and output are documented in PipelineDag, with input and output digests linked into the hash chain.
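
As a rough sketch of what capturing a tool call might look like, the Python below wraps a tool invocation and records a fingerprint of the tool's identity and version together with digests of its input and output. The names `ToolCallRecord`, `call_and_record`, and `digest_of` are hypothetical and do not represent the PolicyBind, AuthorityBind, ModelFingerprint, or PipelineDag objects, whose structure the text does not describe. The sketch also assumes that parameters and results are JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Tuple


def digest_of(value: Any) -> str:
    """SHA-256 over a deterministic JSON encoding (value must be JSON-serializable)."""
    encoded = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ToolCallRecord:
    tool_fingerprint: str  # digest of the tool's name and version
    input_digest: str      # digest of the parameters passed to the tool
    output_digest: str     # digest of the result the tool returned


def call_and_record(
    tool: Callable[..., Any], name: str, version: str, params: dict
) -> Tuple[Any, ToolCallRecord]:
    """Invoke the tool and return its result with a record of what was called."""
    result = tool(**params)
    record = ToolCallRecord(
        tool_fingerprint=digest_of({"name": name, "version": version}),
        input_digest=digest_of(params),
        output_digest=digest_of(result),
    )
    return result, record
```

A verifier that later receives the claimed parameters and result can recompute both digests and compare them with the record. A change in tool version produces a different fingerprint even when the input and output are identical.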

Multi-agent systems

When agents call other agents, the audit problem compounds. H33 handles multi-agent systems by linking the sub-agent's bundle chain into the calling agent's bundle. The CorpusBind and EvidenceAttestation objects extend to capture sub-agent outputs as evidence rows. The calling agent's sequence is verifiable; the sub-agent's sub-sequence is verifiable; the combined trajectory is verifiable. The verifier can verify the top-level agent sequence, or recursively verify the sub-agent sequences, depending on the audit's depth requirement.
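
One way to picture depth-controlled verification is the sketch below. It assumes a simplified model in which a calling step commits to the head digest of the sub-agent's chain; the text does not say how H33 actually links the chains. `Bundle`, `make_chain`, and `verify` are hypothetical names, and signatures are again omitted.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Optional

GENESIS = "0" * 64  # hypothetical placeholder digest for the start of a chain


@dataclass
class Bundle:
    payload: str
    prev_digest: str
    sub_head: Optional[str] = None  # head digest of a sub-agent chain, if any
    sub_chain: list = field(default_factory=list)  # carried alongside, not hashed

    def digest(self) -> str:
        body = {
            "payload": self.payload,
            "prev_digest": self.prev_digest,
            "sub_head": self.sub_head,
        }
        encoded = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def make_chain(steps: list) -> list:
    """Build a chain from (payload, sub_chain_or_None) pairs."""
    chain, prev = [], GENESIS
    for payload, sub in steps:
        sub = sub or []
        sub_head = sub[-1].digest() if sub else None
        bundle = Bundle(payload, prev, sub_head, sub)
        chain.append(bundle)
        prev = bundle.digest()
    return chain


def verify(chain: list, head_digest: str, depth: int = 0) -> bool:
    """Check this chain; with depth > 0, also descend into sub-agent chains."""
    expected = GENESIS
    for bundle in chain:
        if bundle.prev_digest != expected:
            return False
        if depth > 0 and bundle.sub_head is not None:
            if not verify(bundle.sub_chain, bundle.sub_head, depth - 1):
                return False
        expected = bundle.digest()
    return expected == head_digest
```

With `depth=0` only the calling agent's chain is checked, and the committed `sub_head` is taken on trust. Each additional level of depth recomputes one more layer of sub-agent chains against the head digest its caller committed to.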

Use cases

Customer service agent audit. A customer service AI agent handles a complaint, accessing customer records, ordering a refund, escalating to a human. The audit trail documents each step.

Code-execution agent audit. A developer-assistance agent generates code, executes it, observes results, modifies the code, and presents output. The audit trail documents each generation, execution, and modification.

Research-assistant agent audit. A research-assistant agent reads documents, summarizes findings, queries databases, and produces a report. The audit trail documents what was read, queried, summarized, and how it fed the report.

Financial-decision agent audit. A trading agent analyzes market data, consults policy, generates a recommendation, and executes the trade.

Common questions

Does this slow down the agent?
Bundle generation runs alongside agent execution and adds milliseconds per step.
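
To illustrate what running alongside agent execution could mean in practice, here is a minimal sketch that hands each step record to a background worker, so the agent loop only pays for an enqueue. `BackgroundBundler` is a hypothetical class, not part of H33, and the sketch makes no claim about H33's actual latency.

```python
import hashlib
import queue
import threading


class BackgroundBundler:
    """Chains step records on a worker thread, off the agent's critical path."""

    def __init__(self) -> None:
        self._pending: queue.Queue = queue.Queue()
        self.digests: list = []  # hex digests, one per submitted record, in order
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, step_record: bytes) -> None:
        """Called from the agent loop; returns as soon as the record is queued."""
        self._pending.put(step_record)

    def _run(self) -> None:
        prev = b""
        while True:
            record = self._pending.get()
            if record is None:  # sentinel: no more records
                break
            digest = hashlib.sha256(prev + record).digest()
            self.digests.append(digest.hex())
            prev = digest

    def close(self) -> None:
        """Flush remaining records and stop the worker."""
        self._pending.put(None)
        self._worker.join()
```

The agent calls `submit` after each step and `close` at the end of the sequence. The queue preserves submission order, so the chained digests come out in step order.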

Can I audit only some steps?
Yes. The audit policy specifies which steps require full bundles, which require summarized evidence, and which require no audit trail.
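
A selective audit policy of this kind might be represented as a mapping from step kind to audit level. The step kinds, the `AuditLevel` enum, and the choice to default unknown kinds to a full bundle are all assumptions made for illustration; the text does not describe the actual policy format.

```python
from enum import Enum


class AuditLevel(Enum):
    FULL = "full"        # full evidence bundle
    SUMMARY = "summary"  # summarized evidence
    NONE = "none"        # no audit trail


# Hypothetical policy: the step kinds on the left are illustrative only.
EXAMPLE_POLICY = {
    "tool_call": AuditLevel.FULL,
    "planning": AuditLevel.SUMMARY,
    "scratchpad": AuditLevel.NONE,
}


def level_for(step_kind: str, policy: dict = EXAMPLE_POLICY) -> AuditLevel:
    """Look up the audit level; unknown step kinds fall back to FULL."""
    return policy.get(step_kind, AuditLevel.FULL)
```

Falling back to `FULL` is a conservative design choice for the sketch: a step kind that the policy forgot to mention is over-audited, not silently skipped.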

Does this work with LangChain, AutoGen, CrewAI, etc.?
Yes. The bundle generation is framework-agnostic. Integration with specific agent frameworks is supported via standard hooks.

How is this different from agent observability tools?
Observability tools track agent execution for debugging. H33 audit trails produce tamper-evident, verifiable, portable evidence.

Can the audit trail be summarized for executive review?
The full audit trail is the canonical evidence. Summaries can be generated from it for human review.

