Replayable AI Execution

Operational Workflow

Eight Steps from Registration to Verified Replay

Each step produces a cryptographic artifact. The execution DAG grows with every action. At session close, the entire DAG is committed as a single H33-74 proof bundle. Any session can be replayed deterministically.

Register agent with canonical name

Every agent receives a canonical name following the H33 naming convention: h33.agent.<org>.<domain>.<role>.<env>.<seq>. The registration creates the agent's identity in the governance graph and binds it to the organization's policy hierarchy. The registration itself is attested with H33-74.
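
As a rough illustration of the naming convention, a client could check a name before attempting registration. The helper below is hypothetical and not part of the @h33/agent SDK, and the allowed characters per segment (lowercase letters, digits, hyphens) are an assumption rather than something the H33 spec is known to require.

```typescript
// Hypothetical helper: parses h33.agent.<org>.<domain>.<role>.<env>.<seq>.
// The per-segment character set is an assumption, not taken from the H33 spec.
interface CanonicalAgentName {
  org: string;
  domain: string;
  role: string;
  env: string;
  seq: string;
}

const SEGMENT = /^[a-z0-9-]+$/;

function parseCanonicalName(name: string): CanonicalAgentName | null {
  const parts = name.split(".");
  // Expect exactly seven segments: h33, agent, org, domain, role, env, seq.
  if (parts.length !== 7) return null;
  const [prefix, kind, org, domain, role, env, seq] = parts as [
    string, string, string, string, string, string, string,
  ];
  if (prefix !== "h33" || kind !== "agent") return null;
  if (![org, domain, role, env, seq].every((s) => SEGMENT.test(s))) return null;
  return { org, domain, role, env, seq };
}
```

Returning `null` instead of throwing keeps the check cheap to use in form validation or CI linting; a stricter variant could also constrain `env` to a fixed list.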

Start session -- every action becomes a node in the execution DAG

Opening a session creates the root node of the execution DAG. The root contains the agent identity, the session timestamp, the initial policy snapshot, and the parent session hash (if this is a continuation). Every subsequent action appends a node to the DAG. The DAG is append-only by construction: each node includes the hash of its parent, so modifying any node invalidates every descendant.
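
The parent-hash property can be sketched in a few lines. This is a minimal illustration, not the H33 node format: it uses SHA-256 from Node's `crypto` module as a stand-in hash, models the simplest case of a linear chain (one parent per node), and the `DagNode` field names are assumptions.

```typescript
import { createHash } from "node:crypto";

// Illustrative node shape; field names are assumptions, not the H33 schema.
interface DagNode {
  parentHash: string | null; // null for the session root
  payload: string;           // serialized action data
  hash: string;              // SHA-256 over parentHash and payload
}

function hashNode(parentHash: string | null, payload: string): string {
  return createHash("sha256")
    .update(parentHash ?? "")
    .update("\n")
    .update(payload)
    .digest("hex");
}

// Appends a node whose hash commits to the previous node's hash.
function appendNode(chain: readonly DagNode[], payload: string): DagNode[] {
  const last = chain[chain.length - 1];
  const parentHash = last ? last.hash : null;
  return [...chain, { parentHash, payload, hash: hashNode(parentHash, payload) }];
}

// Returns the index of the first node whose hash or parent link is wrong, or -1.
function firstInvalidIndex(chain: readonly DagNode[]): number {
  let expectedParent: string | null = null;
  let index = 0;
  for (const node of chain) {
    if (node.parentHash !== expectedParent) return index;
    if (node.hash !== hashNode(node.parentHash, node.payload)) return index;
    expectedParent = node.hash;
    index++;
  }
  return -1;
}
```

Editing a node's payload makes its own hash check fail. If the editor also recomputes that node's hash, the next node's `parentHash` no longer matches, so the break simply moves one step downstream. That is the sense in which modifying any node invalidates every descendant.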

Agent calls tools -- each tool call attested with ToolEnvelope

Every external tool call is wrapped in a ToolEnvelope containing the tool name, request payload hash, response payload hash, latency measurement, and the policy scope that authorized the call. The ToolEnvelope is attested and appended to the DAG. An auditor can verify exactly what the agent sent, what it received, and whether the call was authorized.
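
To make the envelope contents concrete, here is a minimal sketch of how the listed fields could be assembled after a tool call returns. The `ToolEnvelopeSketch` type and `buildEnvelope` function are hypothetical names, SHA-256 is a stand-in hash, and `JSON.stringify` is used for brevity even though it is not a canonical serialization (key order can vary between producers).

```typescript
import { createHash } from "node:crypto";

// Fields mirror the prose above; the exact H33 schema is not shown in this
// document, so treat this shape as an assumption.
interface ToolEnvelopeSketch {
  tool: string;
  requestHash: string;
  responseHash: string;
  latencyMs: number;
  authorizedBy: string; // policy scope that authorized the call
}

const sha256Hex = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Assumes request and response are JSON-serializable values (not undefined).
function buildEnvelope(
  tool: string,
  request: unknown,
  response: unknown,
  latencyMs: number,
  authorizedBy: string,
): ToolEnvelopeSketch {
  return {
    tool,
    requestHash: sha256Hex(JSON.stringify(request)),
    responseHash: sha256Hex(JSON.stringify(response)),
    latencyMs,
    authorizedBy,
  };
}
```

Because only hashes are stored, an auditor who holds the original request and response can recompute both digests and compare them with the envelope, while the envelope itself does not carry the payloads.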

Policy evaluation -- deterministic 5-scope check before each action

Before every action, the governance engine evaluates five policy scopes: organization, team, project, session, and action. Each scope produces a deterministic boolean. This is a gate, not advisory. Denials are attested as structured rejection nodes in the DAG -- proving not just what the agent did, but what it was prevented from doing and why.
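
The gate semantics can be sketched as follows. This is an illustration under stated assumptions, not the governance engine's code: each scope is modeled as a pure predicate, the type and function names are hypothetical, and the sketch evaluates all five scopes rather than stopping at the first denial so that a rejection record can list every scope that said no.

```typescript
type Scope = "organization" | "team" | "project" | "session" | "action";

const SCOPES: readonly Scope[] = ["organization", "team", "project", "session", "action"];

interface ProposedAction {
  type: string;
  tool?: string;
}

// Each policy must be a pure function of the action so that replay
// re-derives the same boolean.
type ScopePolicy = (action: ProposedAction) => boolean;

interface PolicyDecision {
  authorized: boolean;
  results: Record<Scope, boolean>;
  deniedBy: Scope[];
}

function evaluatePolicy(
  policies: Record<Scope, ScopePolicy>,
  action: ProposedAction,
): PolicyDecision {
  const results = {} as Record<Scope, boolean>;
  const deniedBy: Scope[] = [];
  for (const scope of SCOPES) {
    const allowed = policies[scope](action);
    results[scope] = allowed;
    if (!allowed) deniedBy.push(scope);
  }
  // A single denial closes the gate.
  return { authorized: deniedBy.length === 0, results, deniedBy };
}
```

A caller would execute the action only when `authorized` is true and otherwise append the decision to the DAG as the rejection node.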

Memory checkpoints -- hash-chained state snapshots

At configurable intervals, the agent's working memory is hashed and committed as a checkpoint node in the DAG. The checkpoint captures the agent's state at that moment without exposing the state contents. During replay, the checkpoint hashes serve as intermediate verification points -- if the replayed state hash matches the checkpoint, the execution up to that point is confirmed correct.
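
A checkpoint hash is only useful for replay if equal states always serialize to the same bytes. The sketch below assumes the working memory is a JSON-compatible value and uses a small sorted-key serializer with SHA-256; both choices, and the function names, are assumptions made for illustration rather than the H33 checkpoint format.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so equal states always hash identically.
// Assumes the value is JSON-compatible (no undefined, functions, or cycles).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Chains each checkpoint to the previous one (null for the first checkpoint).
function checkpointHash(state: unknown, previousCheckpoint: string | null): string {
  return createHash("sha256")
    .update(previousCheckpoint ?? "")
    .update("\n")
    .update(canonicalize(state))
    .digest("hex");
}
```

One caveat with any hash-only checkpoint: if the state is small or guessable, the hash alone can be brute-forced, so hiding the contents depends on the state having enough entropy or on adding a secret salt.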

Session closes with H33-74 commitment

When the session closes, the entire execution DAG is committed to a final H33-74 proof bundle. The bundle contains the root hash, the final node hash, the total action count, the total tool call count, and the three PQ signatures. The 74-byte bundle is the entire session's audit trail in a single independently verifiable object.

Replay any session deterministically

Any session can be replayed using the H33 Verifier CLI. The replay engine re-executes the decision sequence, re-derives every intermediate hash, and compares against the original attestation. If every node matches, the session is confirmed authentic. If any node diverges, the CLI identifies the exact point of divergence.
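
The comparison step can be sketched independently of how hashes are re-derived. In this hypothetical helper, `attested` is the list of node hashes from the original session and `rederive` is a caller-supplied function that re-executes the decision sequence and returns the recomputed hashes; neither name comes from the H33 Verifier CLI. Running several iterations catches nondeterminism that a single pass might miss.

```typescript
interface ReplayReport {
  allMatch: boolean;
  iterations: number;             // iterations actually run
  divergenceIndex: number | null; // first node that differed, if any
}

function verifyReplay(
  attested: readonly string[],
  rederive: () => readonly string[],
  iterations: number,
): ReplayReport {
  for (let run = 1; run <= iterations; run++) {
    const replayed = rederive();
    // Compare position by position; a missing or extra node also counts.
    const length = Math.max(attested.length, replayed.length);
    for (let i = 0; i < length; i++) {
      if (attested[i] !== replayed[i]) {
        return { allMatch: false, iterations: run, divergenceIndex: i };
      }
    }
  }
  return { allMatch: true, iterations, divergenceIndex: null };
}
```

Stopping at the first divergent node mirrors the behavior described above: because every later hash depends on the earlier ones, the first mismatch is the one worth investigating.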

Compare pre/post with diff

The h33 diff command compares two session receipts node-by-node. This is used for regression testing (did the agent's behavior change between versions?), compliance auditing (did the agent's behavior change between policy updates?), and incident investigation (what exactly changed between the known-good session and the suspected-bad session?).
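
A node-by-node comparison can be approximated with a positional diff over the two receipts' node hashes. The sketch below is an assumption about the general shape of such a diff, not the `h33 diff` implementation: it compares index by index and does not try to realign the sequences after an insertion.

```typescript
type NodeChange =
  | { index: number; type: "changed"; before: string; after: string }
  | { index: number; type: "added"; after: string }
  | { index: number; type: "removed"; before: string };

function diffNodeHashes(
  before: readonly string[],
  after: readonly string[],
): NodeChange[] {
  const changes: NodeChange[] = [];
  const length = Math.max(before.length, after.length);
  for (let i = 0; i < length; i++) {
    const b: string | undefined = before[i];
    const a: string | undefined = after[i];
    if (b === a) continue;
    if (b === undefined && a !== undefined) {
      changes.push({ index: i, type: "added", after: a });
    } else if (a === undefined && b !== undefined) {
      changes.push({ index: i, type: "removed", before: b });
    } else if (a !== undefined && b !== undefined) {
      changes.push({ index: i, type: "changed", before: b, after: a });
    }
  }
  return changes;
}
```

With hash-chained nodes, one behavioral change alters every later hash, so the first entry in the result usually marks where the two sessions actually parted ways.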

Verify this workflow

# Replay a session 100 times; every iteration must reproduce the attested hashes
h33 replay session session.json --iterations 100

# Verify a session receipt
h33 verify receipt session-receipt.json

# Diff two session receipts
h33 diff receipt before.json after.json

# Validate conformance vectors
h33 verify conformance agent-vectors-v1.json

SDK Integration

Full Agent Lifecycle in TypeScript

The @h33/agent SDK provides the full lifecycle: registration, session management, action attestation, tool envelopes, policy evaluation, memory checkpoints, and replay verification.

TypeScript agent-lifecycle.ts

import { H33Agent, Session, ToolEnvelope } from '@h33/agent';

// Step 1: Register agent with canonical name
const agent = await H33Agent.register({
  name: 'h33.agent.acme.finance.analyst.prod.001',
  apiKey: process.env.H33_API_KEY,
  endpoint: 'https://api.h33.ai/v1',
  policy: {
    scopes: ['organization', 'team', 'project', 'session', 'action'],
  },
});

// Step 2: Start session
const session: Session = await agent.startSession({
  purpose: 'quarterly-revenue-analysis',
  parentSession: null,          // no continuation
  memoryCheckpointInterval: 10, // checkpoint every 10 actions
});

console.log(`Session ${session.id} started`);
console.log(`  Root hash: ${session.rootHash}`);

// Step 3: Execute actions with tool calls
const dataResult = await session.action({
  type: 'tool_call',
  tool: 'database.query',
  input: { sql: 'SELECT revenue FROM q1_2026 WHERE region = $1', params: ['EMEA'] },
});

// Every tool call is wrapped in a ToolEnvelope
console.log(`  Tool envelope: ${dataResult.toolEnvelope.hash}`);
console.log(`  Policy authorized: ${dataResult.policyResult.authorized}`);
console.log(`  DAG node: ${dataResult.nodeHash}`);

// Step 4: Agent performs analysis (attested action)
const analysis = await session.action({
  type: 'computation',
  description: 'Calculate YoY growth rate for EMEA revenue',
  input: { currentRevenue: dataResult.output.encrypted, priorRevenue: '...' },
});

// Step 5: Memory checkpoint (automatic at interval, or manual)
const checkpoint = await session.checkpoint();
console.log(`  Checkpoint hash: ${checkpoint.hash}`);
console.log(`  Actions since last checkpoint: ${checkpoint.actionCount}`);

// Step 6: Close session with H33-74 commitment
const receipt = await session.close();

console.log(`Session closed`);
console.log(`  Total actions: ${receipt.actionCount}`);
console.log(`  Total tool calls: ${receipt.toolCallCount}`);
console.log(`  Final DAG hash: ${receipt.dagHash}`);
console.log(`  H33-74 bundle: ${receipt.attestation.bundleSize} bytes`);
console.log(`  Replay-deterministic: ${receipt.replayDeterministic}`);

// Step 7: Replay the session programmatically
const replayResult = await agent.replay({
  sessionReceipt: receipt,
  iterations: 100,
});

console.log(`Replay: ${replayResult.allMatch ? 'PASS' : 'FAIL'}`);
console.log(`  Iterations: ${replayResult.iterations}`);
console.log(`  Divergence point: ${replayResult.divergenceNode ?? 'none'}`);

// Step 8: Diff two sessions
// previousReceipt is assumed to be a receipt returned by an earlier session.close()
const diff = await agent.diff({
  before: previousReceipt,
  after: receipt,
});

console.log(`Diff: ${diff.identical ? 'identical' : diff.changes.length + ' changes'}`);
for (const change of diff.changes) {
  console.log(`  Node ${change.index}: ${change.type} -- ${change.description}`);
}

curl verify-session.sh

# Verify a session receipt via the public API
curl -X POST https://api.h33.ai/v1/verify/session \
  -H "Content-Type: application/json" \
  -d @session-receipt.json

# Replay a session via the public API
curl -X POST https://api.h33.ai/v1/replay/session \
  -H "Content-Type: application/json" \
  -d '{"receipt": "session-receipt.json", "iterations": 100}'

Applicable Specifications

Specs, Schemas, and Conformance Vectors

Every component in the agent governance workflow maps to a published specification, a machine-readable schema, or a conformance vector.

Component	Specification	Conformance Vector
Agent registration	Agent Governance Spec	`agent-register-v1`
Session lifecycle	Agent Governance Spec	`session-lifecycle-v1`
ToolEnvelope	Agent Governance Spec	`tool-envelope-v1`
5-scope policy evaluation	Agent Governance Spec	`policy-5scope-v1`
Memory checkpoints	Agent Governance Spec	`memory-checkpoint-v1`
Execution DAG	Agent Governance Spec	`execution-dag-v1`
Session attestation	H33-74 Proof Bundle Spec	`session-attest-v1`
Deterministic replay	Governance Replay Spec	`replay-agent-v1`
Adversarial validation	H33-Chaos Spec	`chaos-agent-v1`

Frequently Asked Questions

Answers to operational questions about agent attestation, execution DAGs, deterministic replay, and conformance validation.

What does "replayable" mean in the context of AI execution?

Replayable means that given the same session artifact, any independent party can re-execute the agent's decision sequence and arrive at the identical attestation hash. This is not log replay. It is deterministic cryptographic replay: every action, tool call, policy evaluation, and state transition is hash-chained into an execution DAG. The replay engine re-derives every intermediate hash and compares it against the original attestation. If any node diverges, the replay fails and identifies the exact point of divergence.

How is each agent action attested?

Every action produces a node in the execution DAG. The node contains: the action type, the input hash, the output hash, the policy evaluation result (5-scope check), the parent node hash, and a timestamp. The node is signed with three post-quantum signature families and compressed to a 74-byte H33-74 proof bundle. The parent hash creates a hash chain -- modifying any earlier node invalidates every subsequent node in the DAG. This is append-only by construction, not by policy.

What is a ToolEnvelope and why does it matter?

A ToolEnvelope wraps every external tool call with a cryptographic attestation. When an agent calls an external API, database, or service, the ToolEnvelope captures: the tool name, the request payload hash, the response payload hash, the latency, the policy scope that authorized the call, and the H33-74 attestation. An auditor can confirm not just that the agent made a tool call, but exactly what it sent, what it received, and whether the call was authorized by the governance policy.

How does deterministic policy evaluation work?

Before every action, the governance engine evaluates the proposed action against five policy scopes: organization, team, project, session, and action. Each scope produces a deterministic boolean result. If any scope denies the action, the action does not execute, and the denial itself is attested as a structured rejection node in the execution DAG. The 5-scope evaluation is hash-chained, meaning the policy decision is as replayable as the action itself.

What are conformance vectors and how are they validated?

Conformance vectors are canonical input/output pairs that define correct behavior for the agent governance engine. There are 20 conformance vectors covering policy evaluation, session lifecycle, tool authorization, memory checkpointing, and replay determinism. Each vector specifies an input, the expected output, and the expected attestation hash. Vectors are published at /conformance/agent-vectors/v1/ and are machine-verifiable: h33 verify conformance agent-vectors-v1.json.
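
A vector check reduces to running the engine on each input and comparing both the output and the attestation hash. The sketch below is hypothetical throughout: the `ConformanceVector` fields follow the description above, but the real vector file layout is not shown in this document, and the attestation hash is modeled as SHA-256 over the JSON-encoded input/output pair purely as a placeholder for the actual H33-74 attestation.

```typescript
import { createHash } from "node:crypto";

// Assumed vector shape: an input, the expected output, and the expected hash.
interface ConformanceVector {
  id: string;
  input: string;
  expectedOutput: string;
  expectedHash: string;
}

interface VectorResult {
  id: string;
  outputOk: boolean;
  hashOk: boolean;
}

// Placeholder attestation: SHA-256 over the JSON-encoded [input, output] pair.
function attest(input: string, output: string): string {
  return createHash("sha256").update(JSON.stringify([input, output])).digest("hex");
}

function checkVectors(
  vectors: readonly ConformanceVector[],
  engine: (input: string) => string,
): VectorResult[] {
  return vectors.map((vector) => {
    const output = engine(vector.input);
    return {
      id: vector.id,
      outputOk: output === vector.expectedOutput,
      hashOk: attest(vector.input, output) === vector.expectedHash,
    };
  });
}
```

Reporting the output and hash checks separately helps distinguish an engine that computes the wrong answer from one that computes the right answer but attests it differently.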

Replayable AI Execution: Every Agent Decision as an Independently Verifiable Artifact