H33-Chaos

Philosophy

Break Before They Do

Unit tests prove code does what you intended. H33-Chaos proves code resists what adversaries intend. The difference is the gap between passing an exam and surviving a fight.

01 — Adversarial-First

Attack the graph, not the API

Every test in H33-Chaos operates below the API surface. We don't test endpoints — we tamper with the underlying execution DAG, forge identity assertions, corrupt hash chains, and inject nodes into sessions we don't own. If the infrastructure catches it, the API never needs to.

02 — Deterministic Proof

Same graph, same timestamp, byte-identical output

Replay determinism isn't a metric; it's a binary. Either every replay of every graph at every timestamp produces byte-identical output, or the system fails. We test this with a 50,000-operation replay run (50-node DAG × 10 timestamps × 100 replays), plus replays across concurrent threads, growing DAGs, and shuffled insertion orders.
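As an illustration only, here is a minimal sketch of what a byte-identical replay check could look like. The `Node` struct, the `replay` function, and the serialization layout are assumptions made for this example; they are not the H33 ReplayEngine API. The idea is that replay filters by timestamp and orders nodes canonically, so neither insertion order nor later growth can leak into the output.

```rust
use std::collections::BTreeMap;

/// Hypothetical node: an id, a logical timestamp, and an opaque payload.
#[derive(Clone, Debug)]
struct Node {
    id: u64,
    timestamp: u64,
    payload: Vec<u8>,
}

/// Replay the graph as of `at`: keep nodes with timestamp <= at, order them
/// canonically by (timestamp, id), and serialize them to bytes.
fn replay(nodes: &[Node], at: u64) -> Vec<u8> {
    // BTreeMap iterates in key order, so insertion order cannot affect output.
    let mut ordered: BTreeMap<(u64, u64), &Node> = BTreeMap::new();
    for n in nodes.iter().filter(|n| n.timestamp <= at) {
        ordered.insert((n.timestamp, n.id), n);
    }
    let mut out = Vec::new();
    for ((ts, id), n) in ordered {
        out.extend_from_slice(&ts.to_be_bytes());
        out.extend_from_slice(&id.to_be_bytes());
        out.extend_from_slice(&(n.payload.len() as u64).to_be_bytes());
        out.extend_from_slice(&n.payload);
    }
    out
}

/// A 50-node sample graph with timestamps 0, 10, 20, ...
fn sample_graph() -> Vec<Node> {
    (0..50u64)
        .map(|i| Node { id: i, timestamp: i * 10, payload: vec![i as u8; 4] })
        .collect()
}

fn main() {
    let graph = sample_graph();
    let baseline = replay(&graph, 250);

    // Repeated replays must be byte-identical.
    for _ in 0..100 {
        assert_eq!(replay(&graph, 250), baseline);
    }

    // Insertion-order independence: a reversed graph replays identically.
    let mut reversed = graph.clone();
    reversed.reverse();
    assert_eq!(replay(&reversed, 250), baseline);

    // Post-growth stability: nodes added after T do not change replay at T.
    let mut grown = graph.clone();
    grown.push(Node { id: 999, timestamp: 10_000, payload: vec![0xff] });
    assert_eq!(replay(&grown, 250), baseline);

    println!("replay output: {} bytes, zero divergences", baseline.len());
}
```

Any single failed `assert_eq!` here counts as a divergence, which matches the binary framing above: there is no partial credit.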

03 — Cascade Awareness

One corrupted node must invalidate everything downstream

Integrity verification is of little use if the effects of corruption stay local. H33-Chaos specifically tests that a single tampered node causes every downstream verification to fail. Cascade detection is the difference between catching fraud and missing it.
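The cascade property can be sketched with a toy hash chain. This is an assumption-laden illustration: `toy_hash` is a non-cryptographic stand-in for a real hash, and `Link`, `build_chain`, and `verify` are hypothetical names, not H33 types. The point it shows is that the verifier carries the recomputed hash forward, so one mismatch poisons every later link.

```rust
/// Toy 64-bit mixing function in the style of FNV-1a, seeded with the
/// predecessor hash. Assumption: a stand-in for a real cryptographic hash.
fn toy_hash(seed: u64, bytes: &[u8]) -> u64 {
    let mut h = seed ^ 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical chain link: a payload plus the hash recorded at insertion.
struct Link {
    payload: Vec<u8>,
    stored_hash: u64,
}

/// Build a chain where each stored hash covers the predecessor hash and payload.
fn build_chain(payloads: &[&str]) -> Vec<Link> {
    let mut prev = 0u64; // genesis predecessor
    let mut chain = Vec::new();
    for p in payloads {
        prev = toy_hash(prev, p.as_bytes());
        chain.push(Link { payload: p.as_bytes().to_vec(), stored_hash: prev });
    }
    chain
}

/// Recompute every hash from the root. The recomputed value, not the stored
/// one, is carried forward, so one mismatch propagates to all later links.
fn verify(chain: &[Link]) -> Vec<bool> {
    let mut prev = 0u64;
    let mut results = Vec::new();
    for link in chain {
        prev = toy_hash(prev, &link.payload);
        results.push(prev == link.stored_hash);
    }
    results
}

fn main() {
    let mut chain = build_chain(&["a", "b", "c", "d", "e"]);
    assert!(verify(&chain).iter().all(|ok| *ok));

    // Tamper with one byte of an interior node.
    chain[2].payload[0] ^= 0x01;
    let results = verify(&chain);

    // Upstream links still verify; the tampered link and all later ones fail.
    assert_eq!(results, vec![true, true, false, false, false]);
    println!("{:?}", results);
}
```

Had `verify` chained from each link's stored hash instead of the recomputed one, only the tampered link would fail and the corruption would stay local, which is exactly the failure mode this principle rules out.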

Attack Surface

Four Categories of Adversarial Testing

Each category targets a different layer of the cryptographic infrastructure. Tests are designed to model realistic attack scenarios, not theoretical edge cases.

DAG Tampering

19 tests

✓

Forged Receipt Injection

Insert receipt with valid structure but forged hash chain

✓

Payload Mutation

Modify node payload after insertion, verify integrity catch

✓

Node Deletion

Remove interior node from DAG, verify chain break detection

✓

Node Insertion

Inject unauthorized node between existing chain links

✓

Node Reordering

Swap node positions to test ordering-sensitive verification

✓

Circular Dependency

Create self-referencing predecessor chain

✓

Cross-Session Injection

Insert node from Session A into Session B's chain

✓

1,000-Node Stress

Massive DAG integrity under high node count

✓

Cascade Corruption

Single tampered node must invalidate all downstream

+ 10 additional vectors: hash collision attempts, merkle root forgery, predecessor chain manipulation, double-spend nodes, timestamp backdating, phantom sessions, root hash substitution, parallel chain grafting, state rollback injection, orphan node flooding
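To make one of these vectors concrete, the following sketch shows how a circular or self-referencing predecessor chain might be detected. The predecessor map and the `chain_is_acyclic` function are hypothetical and exist only for this example; the actual AgentExecutionDag layout is not described here.

```rust
use std::collections::{HashMap, HashSet};

/// Follow predecessor links from `start` back toward the root.
/// Returns false if a node repeats (a cycle) or a predecessor is missing.
/// `preds` maps node id -> predecessor id (None marks the root).
fn chain_is_acyclic(preds: &HashMap<u32, Option<u32>>, start: u32) -> bool {
    let mut seen = HashSet::new();
    let mut current = Some(start);
    while let Some(id) = current {
        // `insert` returns false when the id was already visited.
        if !seen.insert(id) {
            return false;
        }
        match preds.get(&id) {
            Some(next) => current = *next,
            None => return false, // dangling predecessor reference
        }
    }
    true
}

fn main() {
    let mut preds: HashMap<u32, Option<u32>> = HashMap::new();
    preds.insert(1, None);
    preds.insert(2, Some(1));
    preds.insert(3, Some(2));
    assert!(chain_is_acyclic(&preds, 3));

    // Self-referencing node.
    preds.insert(4, Some(4));
    assert!(!chain_is_acyclic(&preds, 4));

    // Rewire the root to point at the tip, closing a loop 3 -> 2 -> 1 -> 3.
    preds.insert(1, Some(3));
    assert!(!chain_is_acyclic(&preds, 3));

    println!("cycle checks passed");
}
```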

Replay Attacks

13 tests

✓

1,000-Iteration Determinism

Replay same graph 1,000 times, verify byte-identical output

✓

8-Thread Concurrent Replay

Parallel replay must produce identical results across threads

✓

Replay Determinism Score

50-node DAG × 10 timestamps × 100 replays = 50,000 ops

✓

Fork Divergence

Different policy inputs produce provably different outputs

✓

Post-Growth Stability

Replay at timestamp T after DAG grows beyond T

✓

Insertion-Order Independence

Shuffled insertion order produces identical replay output

+ 7 additional vectors: timestamp sensitivity, replay frame hash verification, fork-and-merge consistency, selective node replay, boundary timestamp precision, partial graph replay, cross-session replay isolation

Session Exploitation

12 tests

✓

Expired Session Use

Attempt to record action on closed session

✓

Wrong Agent Binding

Agent A tries to use Agent B's session

✓

Scope Violations

Tool call outside session's declared scope boundaries

✓

Race Conditions

Concurrent session start/close with interleaved actions

✓

Delegation Depth

Exceeded delegation budget via chained sub-delegations

✓

Chain Break Detection

Session with missing predecessor in action chain

+ 6 additional vectors: double-close, session hijacking via forged canonical name, scope escalation, action-after-H33-74-close, temporal boundary crossing, session-to-session leakage
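A rough sketch of the checks these session tests exercise is given below. The `Session` struct, its fields, and the `SessionError` variants are assumptions made for illustration; they are not the SessionManager interface. It models three of the listed cases: use after close, wrong-agent binding, and out-of-scope tool calls, plus double-close.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum SessionError {
    Closed,
    WrongAgent,
    OutOfScope,
}

/// Hypothetical session bound to one agent and a declared tool scope.
struct Session {
    agent: String,
    scope: HashSet<String>,
    closed: bool,
    actions: Vec<String>,
}

impl Session {
    fn new(agent: &str, scope: &[&str]) -> Self {
        Session {
            agent: agent.to_string(),
            scope: scope.iter().map(|s| s.to_string()).collect(),
            closed: false,
            actions: Vec::new(),
        }
    }

    /// Record a tool call only if the session is open, the caller is the
    /// bound agent, and the tool is inside the declared scope.
    fn record_action(&mut self, caller: &str, tool: &str) -> Result<(), SessionError> {
        if self.closed {
            return Err(SessionError::Closed);
        }
        if caller != self.agent {
            return Err(SessionError::WrongAgent);
        }
        if !self.scope.contains(tool) {
            return Err(SessionError::OutOfScope);
        }
        self.actions.push(tool.to_string());
        Ok(())
    }

    /// Closing twice is rejected rather than silently accepted.
    fn close(&mut self) -> Result<(), SessionError> {
        if self.closed {
            return Err(SessionError::Closed);
        }
        self.closed = true;
        Ok(())
    }
}

fn main() {
    let mut s = Session::new("agent-a", &["search", "read_file"]);
    assert_eq!(s.record_action("agent-a", "search"), Ok(()));
    assert_eq!(s.record_action("agent-b", "search"), Err(SessionError::WrongAgent));
    assert_eq!(s.record_action("agent-a", "delete_db"), Err(SessionError::OutOfScope));
    assert_eq!(s.close(), Ok(()));
    assert_eq!(s.close(), Err(SessionError::Closed));
    assert_eq!(s.record_action("agent-a", "search"), Err(SessionError::Closed));
    println!("recorded actions: {}", s.actions.len());
}
```

Rejected calls leave `actions` untouched, so a failed attack attempt cannot add entries to the session's action chain.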

Identity Forgery

10 tests

✓

Forged Agent Identity

Fabricated AgentIdentity with valid-looking structure

✓

Revoked Key Usage

Sign action with key that has been rotated out

✓

Agent Impersonation

Use Agent A's canonical name with Agent B's key

✓

100-Agent Stress

Registry integrity under high agent count

✓

Collision Resistance

Canonical name uniqueness under adversarial registration

+ 5 additional vectors: key rotation mid-session, multi-key signing, ghost agent registration, name_history tampering, cross-org identity leakage
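The identity checks can be pictured with a small registry sketch. Everything here is hypothetical: `Registry`, `IdentityError`, and the use of plain integer key ids in place of real public keys and signatures are simplifications for the example, not the AgentRegistry design. It covers impersonation (right name, wrong key), revoked-key use after rotation, and duplicate canonical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum IdentityError {
    UnknownAgent,
    NameTaken,
    KeyMismatch,
    KeyRevoked,
}

/// Hypothetical registry: canonical name -> active key id, plus rotated-out keys.
#[derive(Default)]
struct Registry {
    active: HashMap<String, u64>,
    revoked: HashMap<String, Vec<u64>>,
}

impl Registry {
    /// Canonical names are unique; a second registration is rejected.
    fn register(&mut self, name: &str, key: u64) -> Result<(), IdentityError> {
        if self.active.contains_key(name) {
            return Err(IdentityError::NameTaken);
        }
        self.active.insert(name.to_string(), key);
        Ok(())
    }

    /// Rotate to a new key and remember the old one as revoked.
    fn rotate(&mut self, name: &str, new_key: u64) -> Result<(), IdentityError> {
        let slot = self.active.get_mut(name).ok_or(IdentityError::UnknownAgent)?;
        let old = std::mem::replace(slot, new_key);
        self.revoked.entry(name.to_string()).or_default().push(old);
        Ok(())
    }

    /// Accept an action only if `key` is the currently active key for `name`.
    fn check(&self, name: &str, key: u64) -> Result<(), IdentityError> {
        let active = self.active.get(name).ok_or(IdentityError::UnknownAgent)?;
        if self.revoked.get(name).map_or(false, |ks| ks.contains(&key)) {
            return Err(IdentityError::KeyRevoked);
        }
        if *active != key {
            return Err(IdentityError::KeyMismatch);
        }
        Ok(())
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.register("agent-a", 1).unwrap();
    reg.register("agent-b", 2).unwrap();

    // Impersonation: Agent A's name with Agent B's key.
    assert_eq!(reg.check("agent-a", 2), Err(IdentityError::KeyMismatch));
    // Adversarial re-registration of an existing canonical name.
    assert_eq!(reg.register("agent-a", 9), Err(IdentityError::NameTaken));
    // After rotation, the old key is rejected as revoked.
    reg.rotate("agent-a", 3).unwrap();
    assert_eq!(reg.check("agent-a", 1), Err(IdentityError::KeyRevoked));
    assert_eq!(reg.check("agent-a", 3), Ok(()));
    assert_eq!(reg.check("ghost", 7), Err(IdentityError::UnknownAgent));
    println!("identity checks passed");
}
```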

Results

Every Attack Detected. Every Corruption Caught.

H33-Chaos runs as part of every CI pipeline. No code merges to main without a clean Chaos run. These are not aspirational numbers — they are gating criteria.

Category	Tests	Passed	Detection Rate
DAG Tampering	19	19	100.0%
Replay Attacks	13	13	100.0%
Session Exploitation	12	12	100.0%
Identity Forgery	10	10	100.0%
Total	54	54	100.0%

Deterministic Replay

Replay Determinism Score: 100.0%

Same graph + same timestamp = byte-identical output. This is not a percentage that can be 99.9%. It is a binary that must be 100.0% or the system is broken.

// Replay determinism test output
PASS replay_determinism_score
     50-node DAG x 10 timestamps x 100 replays
     Total operations: 50,000
     Divergences:      0
     Score:            100.0%

PASS concurrent_replay_determinism (8 threads)
PASS post_growth_replay_stability
PASS insertion_order_independence
PASS fork_divergence_verification
PASS 1000_iteration_consistency

All 13 replay tests passed. Zero divergence.

Architecture

How H33-Chaos Works

H33-Chaos operates at the data structure level, not the API level. It directly manipulates the execution DAG, identity registry, session manager, and replay engine to simulate attacks that would bypass API-level validation.

Below the API Surface

Tests operate on raw AgentExecutionDag, AgentRegistry, SessionManager, and ReplayEngine structs. No HTTP. No auth middleware. No safety nets. If a data structure is vulnerable, Chaos finds it.

Hash Chain Verification

Every node in the execution DAG contains a SHA3-256 hash of its payload, its predecessor hash, and the H33_AGENT_V1 domain separator. Chaos tests that tampering with any single byte causes verification failure across the entire downstream chain.
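The shape of that hash input can be sketched as follows. Two things here are assumptions rather than documented behavior: the hash function is a dependency-free FNV-1a stand-in for SHA3-256 (it is not cryptographically secure), and the length-prefixed field encoding is one plausible way to combine the three inputs. `node_hash` and `push_field` are hypothetical names.

```rust
/// Domain separator named in the text.
const DOMAIN: &[u8] = b"H33_AGENT_V1";

/// Stand-in for SHA3-256. Assumption: FNV-1a keeps this sketch free of
/// external crates; it must not be used where collision resistance matters.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= *b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Append a field with a length prefix so field boundaries are unambiguous.
fn push_field(buf: &mut Vec<u8>, field: &[u8]) {
    buf.extend_from_slice(&(field.len() as u64).to_be_bytes());
    buf.extend_from_slice(field);
}

/// Node hash = H(domain separator, predecessor hash, payload).
fn node_hash(domain: &[u8], predecessor: u64, payload: &[u8]) -> u64 {
    let mut buf = Vec::new();
    push_field(&mut buf, domain);
    push_field(&mut buf, &predecessor.to_be_bytes());
    push_field(&mut buf, payload);
    fnv1a(&buf)
}

fn main() {
    let h = node_hash(DOMAIN, 0, b"tool_call:search");

    // Changing one payload byte changes the hash.
    assert_ne!(h, node_hash(DOMAIN, 0, b"tool_call:sEarch"));
    // The same payload under another domain separator hashes differently.
    assert_ne!(h, node_hash(b"H33_AGENT_V2", 0, b"tool_call:search"));
    // A different predecessor hash also changes the result.
    assert_ne!(h, node_hash(DOMAIN, 1, b"tool_call:search"));

    println!("node hash: {:016x}", h);
}
```

Because the predecessor hash is part of each node's input, a change to any upstream node changes every hash computed after it, which is what lets a single-byte tamper surface across the downstream chain.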

CI/CD Gating

H33-Chaos runs on every pull request before it merges. The full 54-test suite executes in under 3 seconds. A single failure blocks the merge. No exceptions, no overrides, no "skip tests" flags.

Conformance Vector Generation

H33-Chaos produces 20 conformance test vectors (AGT-TV-001 through AGT-TV-020) that any independent implementation can use to verify compatibility. The vectors are the contract; the test suite enforces it.

# Run the full H33-Chaos adversarial suite
cargo test --test dag_tampering --release
cargo test --test replay_attacks --release
cargo test --test session_attacks --release
cargo test --test identity_attacks --release

# Generate conformance vectors
cargo run --bin generate_agent_vectors --release

Performance Under Adversarial Load

Graviton4 Metal Benchmark

Adversarial testing is meaningless without production-scale validation. These numbers were measured on Graviton4 bare metal under sustained load.

24.79M attestations/sec (192-core parallel)

2.12M sessions/sec (full lifecycle)

2.5ms replay latency (10K-action graph)

166ms verification (100K-node DAG)


Guarantees

What H33-Chaos Proves

Integrity

Tampered data cannot survive verification

A single modified byte in any node payload, predecessor hash, or session binding causes the entire downstream chain to fail verification. Corruption doesn't hide — it cascades into detection.

Determinism

Replay is binary, not probabilistic

The 50,000-operation determinism run (50-node DAG × 10 timestamps × 100 replays), together with the concurrent-thread, growing-DAG, and shuffled-insertion-order replay tests, produced byte-identical output. The Replay Determinism Score is 100.0%: not 99.99%, not "approximately." Exactly 100.0%.

Isolation

Sessions cannot cross-contaminate

Every session is cryptographically bound to its agent identity and scope declaration. Wrong-agent binding, cross-session injection, and scope escalation are all caught at the data structure level before any API handler fires.