Adversarial Validation Infrastructure

H33-Chaos

We don't test whether our infrastructure works. We test whether it survives. H33-Chaos is a purpose-built adversarial framework that attacks every cryptographic primitive, every execution graph, every session boundary, and every identity assertion in the H33 stack. If it passes Chaos, it ships.

54 attack vectors tested
4 attack categories
0 undetected attacks
100.0% detection rate
Philosophy

Break Before They Do

Unit tests prove code does what you intended. H33-Chaos proves code resists what adversaries intend. The difference is the gap between passing an exam and surviving a fight.

01 — Adversarial-First

Attack the graph, not the API

Every test in H33-Chaos operates below the API surface. We don't test endpoints — we tamper with the underlying execution DAG, forge identity assertions, corrupt hash chains, and inject nodes into sessions we don't own. If the infrastructure catches it, the API never needs to.

02 — Deterministic Proof

Same graph, same timestamp, byte-identical output

Replay determinism isn't a metric; it's a binary. Either every replay of every graph at every timestamp produces byte-identical output, or the system fails. The scoring test alone runs 50,000 replay operations (a 50-node DAG × 10 timestamps × 100 replays), and separate tests cover concurrent threads, growing DAGs, and shuffled insertion orders.

03 — Cascade Awareness

One corrupted node must invalidate everything downstream

Integrity verification is of little use if a corrupted node fails only its own check. H33-Chaos specifically tests that a single tampered node causes every downstream verification to fail. Cascade detection is the difference between catching fraud and missing it.

Attack Surface

Four Categories of Adversarial Testing

Each category targets a different layer of the cryptographic infrastructure. Tests are designed to model realistic attack scenarios, not theoretical edge cases.

DAG Tampering

19 tests
Forged Receipt Injection
Insert receipt with valid structure but forged hash chain
Payload Mutation
Modify node payload after insertion, verify integrity catch
Node Deletion
Remove interior node from DAG, verify chain break detection
Node Insertion
Inject unauthorized node between existing chain links
Node Reordering
Swap node positions to test ordering-sensitive verification
Circular Dependency
Create self-referencing predecessor chain
Cross-Session Injection
Insert node from Session A into Session B's chain
1,000-Node Stress
Massive DAG integrity under high node count
Cascade Corruption
Single tampered node must invalidate all downstream

+ 10 additional vectors: hash collision attempts, merkle root forgery, predecessor chain manipulation, double-spend nodes, timestamp backdating, phantom sessions, root hash substitution, parallel chain grafting, state rollback injection, orphan node flooding
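Several of the vectors above (circular dependency, node deletion, orphan nodes) reduce to structural checks on predecessor links. The following is a minimal sketch of such a check, assuming each node has at most one predecessor. `DagError` and `check_predecessors` are hypothetical names chosen for illustration, not part of the H33 codebase, and hash verification is deliberately left out.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical error type for structural DAG faults.
#[derive(Debug, PartialEq)]
enum DagError {
    /// `node` points at a predecessor that is not in the graph
    /// (what a deleted interior node or an orphan looks like).
    MissingPredecessor { node: u32, missing: u32 },
    /// Walking predecessors from `node` revisits a node.
    Cycle { node: u32 },
}

/// `pred` maps node id -> predecessor id (`None` marks a root).
/// Assumption: a single predecessor per node, to keep the sketch short.
fn check_predecessors(pred: &BTreeMap<u32, Option<u32>>) -> Result<(), DagError> {
    for &start in pred.keys() {
        let mut seen = HashSet::new();
        let mut cur = start;
        loop {
            // Revisiting a node means the predecessor chain loops.
            if !seen.insert(cur) {
                return Err(DagError::Cycle { node: start });
            }
            match pred.get(&cur).copied().flatten() {
                None => break, // reached a root
                Some(p) if !pred.contains_key(&p) => {
                    return Err(DagError::MissingPredecessor { node: cur, missing: p });
                }
                Some(p) => cur = p,
            }
        }
    }
    Ok(())
}
```

With a chain 1 ← 2 ← 3, removing node 2 from the map makes node 3 report `MissingPredecessor`, and a node whose predecessor is itself reports `Cycle`. A `BTreeMap` is used so the iteration order, and therefore the first error reported, is stable across runs.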

Replay Attacks

13 tests
1,000-Iteration Determinism
Replay same graph 1,000 times, verify byte-identical output
8-Thread Concurrent Replay
Parallel replay must produce identical results across threads
Replay Determinism Score
50-node DAG × 10 timestamps × 100 replays = 50,000 ops
Fork Divergence
Different policy inputs produce provably different outputs
Post-Growth Stability
Replay at timestamp T after DAG grows beyond T
Insertion-Order Independence
Shuffled insertion order produces identical replay output

+ 7 additional vectors: timestamp sensitivity, replay frame hash verification, fork-and-merge consistency, selective node replay, boundary timestamp precision, partial graph replay, cross-session replay isolation
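The replay properties listed above can be illustrated with a small sketch. It assumes a replay is "the canonical byte serialization of every action with timestamp at or before T"; the `Action` struct, the `replay` function, and the (timestamp, id) sort key are hypothetical choices for illustration and do not describe the actual `ReplayEngine`.

```rust
use std::thread;

/// Hypothetical action record; the real node layout is not shown in the source.
#[derive(Clone, Debug)]
struct Action {
    timestamp: u64,
    id: u32,
    payload: String,
}

/// Replay the graph as of timestamp `at` into canonical bytes.
/// Sorting by (timestamp, id) makes the output independent of insertion order,
/// and filtering by `at` makes it independent of later growth.
fn replay(actions: &[Action], at: u64) -> Vec<u8> {
    let mut visible: Vec<&Action> = actions.iter().filter(|a| a.timestamp <= at).collect();
    visible.sort_by_key(|a| (a.timestamp, a.id));
    let mut out = Vec::new();
    for a in visible {
        out.extend_from_slice(&a.timestamp.to_be_bytes());
        out.extend_from_slice(&a.id.to_be_bytes());
        // Length prefix keeps adjacent payloads from blurring together.
        out.extend_from_slice(&(a.payload.len() as u64).to_be_bytes());
        out.extend_from_slice(a.payload.as_bytes());
    }
    out
}

/// Run the same replay on `threads` threads and compare every result
/// against a single-threaded reference, byte for byte.
fn concurrent_replays_agree(actions: &[Action], at: u64, threads: usize) -> bool {
    let expected = replay(actions, at);
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| s.spawn(|| replay(actions, at)))
            .collect();
        handles
            .into_iter()
            .all(|h| h.join().map(|out| out == expected).unwrap_or(false))
    })
}
```

Under these assumptions, reversing the input slice or appending an action with a timestamp later than `at` leaves `replay(actions, at)` unchanged, while replaying at a later timestamp that admits more actions changes the bytes.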

Session Exploitation

12 tests
Expired Session Use
Attempt to record action on closed session
Wrong Agent Binding
Agent A tries to use Agent B's session
Scope Violations
Tool call outside session's declared scope boundaries
Race Conditions
Concurrent session start/close with interleaved actions
Delegation Depth
Exceeded delegation budget via chained sub-delegations
Chain Break Detection
Session with missing predecessor in action chain

+ 6 additional vectors: double-close, session hijacking via forged canonical name, scope escalation, action-after-H33-74-close, temporal boundary crossing, session-to-session leakage
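To make the session vectors concrete, here is a minimal sketch of the kind of checks a session object might enforce. It assumes a session is bound to one agent name, a set of allowed tool names, and a delegation depth budget. `Session`, `SessionError`, and every method name are hypothetical and do not reflect the real `SessionManager` API.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum SessionError {
    Closed,
    AlreadyClosed,
    WrongAgent,
    OutOfScope,
    DelegationTooDeep,
}

#[derive(Debug)]
struct Session {
    agent: String,
    scope: HashSet<String>,
    open: bool,
    depth: u32,
    max_depth: u32,
}

impl Session {
    fn new(agent: &str, scope: &[&str], max_depth: u32) -> Self {
        Session {
            agent: agent.to_string(),
            scope: scope.iter().map(|t| t.to_string()).collect(),
            open: true,
            depth: 0,
            max_depth,
        }
    }

    /// Record a tool call; rejects closed sessions, wrong agents, and out-of-scope tools.
    fn record(&mut self, agent: &str, tool: &str) -> Result<(), SessionError> {
        if !self.open {
            return Err(SessionError::Closed);
        }
        if agent != self.agent {
            return Err(SessionError::WrongAgent);
        }
        if !self.scope.contains(tool) {
            return Err(SessionError::OutOfScope);
        }
        Ok(())
    }

    /// Create a sub-session one level deeper; the child inherits the scope unchanged.
    fn delegate(&self, sub_agent: &str) -> Result<Session, SessionError> {
        if !self.open {
            return Err(SessionError::Closed);
        }
        if self.depth + 1 > self.max_depth {
            return Err(SessionError::DelegationTooDeep);
        }
        Ok(Session {
            agent: sub_agent.to_string(),
            scope: self.scope.clone(),
            open: true,
            depth: self.depth + 1,
            max_depth: self.max_depth,
        })
    }

    /// Close the session; a second close is reported rather than ignored.
    fn close(&mut self) -> Result<(), SessionError> {
        if !self.open {
            return Err(SessionError::AlreadyClosed);
        }
        self.open = false;
        Ok(())
    }
}
```

Each adversarial test then amounts to driving the object into a forbidden state and asserting that the matching error comes back, for example calling `record` after `close` and expecting `SessionError::Closed`.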

Identity Forgery

10 tests
Forged Agent Identity
Fabricated AgentIdentity with valid-looking structure
Revoked Key Usage
Sign action with key that has been rotated out
Agent Impersonation
Use Agent A's canonical name with Agent B's key
100-Agent Stress
Registry integrity under high agent count
Collision Resistance
Canonical name uniqueness under adversarial registration

+ 5 additional vectors: key rotation mid-session, multi-key signing, ghost agent registration, name_history tampering, cross-org identity leakage
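The identity vectors can be sketched in the same style. The example below assumes a registry that maps a unique canonical name to one active key and a list of rotated-out keys. Keys are plain strings standing in for real public keys, no signatures are verified, and `Registry`, `IdentityError`, and the method names are hypothetical rather than the real `AgentRegistry` API.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum IdentityError {
    NameTaken,
    UnknownAgent,
    RevokedKey,
    KeyMismatch,
}

struct AgentRecord {
    active_key: String,
    revoked: Vec<String>,
}

#[derive(Default)]
struct Registry {
    agents: HashMap<String, AgentRecord>,
}

impl Registry {
    /// Canonical names must be unique; a second registration is rejected.
    fn register(&mut self, name: &str, key: &str) -> Result<(), IdentityError> {
        if self.agents.contains_key(name) {
            return Err(IdentityError::NameTaken);
        }
        let record = AgentRecord { active_key: key.to_string(), revoked: Vec::new() };
        self.agents.insert(name.to_string(), record);
        Ok(())
    }

    /// Rotating a key moves the old one onto the revoked list.
    fn rotate(&mut self, name: &str, new_key: &str) -> Result<(), IdentityError> {
        let rec = self.agents.get_mut(name).ok_or(IdentityError::UnknownAgent)?;
        let old = std::mem::replace(&mut rec.active_key, new_key.to_string());
        rec.revoked.push(old);
        Ok(())
    }

    /// Accept only the agent's current key; distinguish revoked keys from foreign ones.
    fn check(&self, name: &str, key: &str) -> Result<(), IdentityError> {
        let rec = self.agents.get(name).ok_or(IdentityError::UnknownAgent)?;
        if rec.active_key.as_str() == key {
            Ok(())
        } else if rec.revoked.iter().any(|k| k.as_str() == key) {
            Err(IdentityError::RevokedKey)
        } else {
            Err(IdentityError::KeyMismatch)
        }
    }
}
```

In this model, impersonation (Agent A's name with Agent B's key) surfaces as `KeyMismatch`, a rotated-out key as `RevokedKey`, and a name that was never registered as `UnknownAgent`.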

Results

Every Attack Detected. Every Corruption Caught.

H33-Chaos runs as part of every CI pipeline. No code merges to main without a clean Chaos run. These are not aspirational numbers — they are gating criteria.

Category              Tests  Passed  Failed  Detection Rate
DAG Tampering            19      19       0          100.0%
Replay Attacks           13      13       0          100.0%
Session Exploitation     12      12       0          100.0%
Identity Forgery         10      10       0          100.0%
Total                    54      54       0          100.0%
VERDICT: ALL 54 ATTACK VECTORS DETECTED
No undetected attacks. No false negatives. No corruptions survived verification. Every tampered node, forged identity, and replay manipulation was caught before it could propagate.
Deterministic Replay

Replay Determinism Score: 100.0%

Same graph + same timestamp = byte-identical output. This is not a percentage that can be 99.9%. It is a binary that must be 100.0% or the system is broken.

// Replay determinism test output
PASS replay_determinism_score
     50-node DAG x 10 timestamps x 100 replays
     Total operations: 50,000
     Divergences: 0
     Score: 100.0%
PASS concurrent_replay_determinism (8 threads)
PASS post_growth_replay_stability
PASS insertion_order_independence
PASS fork_divergence_verification
PASS 1000_iteration_consistency
All 13 replay tests passed. Zero divergence.
Architecture

How H33-Chaos Works

H33-Chaos operates at the data structure level, not the API level. It directly manipulates the execution DAG, identity registry, session manager, and replay engine to simulate attacks that would bypass API-level validation.

Below the API Surface

Tests operate on raw AgentExecutionDag, AgentRegistry, SessionManager, and ReplayEngine structs. No HTTP. No auth middleware. No safety nets. If the data structures are vulnerable, Chaos finds it.

Hash Chain Verification

Every node in the execution DAG contains a SHA3-256 hash of its payload, its predecessor hash, and the H33_AGENT_V1 domain separator. Chaos tests that tampering with any single byte causes verification failure across the entire downstream chain.
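A minimal sketch of how such a chain can make corruption cascade, under loud assumptions: a 64-bit FNV-1a hash stands in for SHA3-256 purely to keep the example dependency-free (FNV-1a is not cryptographic), and the field order inside `node_hash` is a guess. `Node`, `build_chain`, and `verify_chain` are hypothetical names.

```rust
/// Domain separator named in the text.
const DOMAIN: &[u8] = b"H33_AGENT_V1";

/// Stand-in hash: 64-bit FNV-1a. NOT cryptographic; a real chain would use SHA3-256.
fn fnv1a(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for part in parts {
        for &b in *part {
            h ^= b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    h
}

#[derive(Clone, Debug)]
struct Node {
    payload: Vec<u8>,
    /// Hash stored when the node was inserted.
    hash: u64,
}

/// Hash of domain separator, predecessor hash, and payload (field order is an assumption).
fn node_hash(payload: &[u8], prev: u64) -> u64 {
    let prev_bytes = prev.to_be_bytes();
    fnv1a(&[DOMAIN, &prev_bytes[..], payload])
}

/// Build a linear chain; the first node uses 0 as its predecessor hash.
fn build_chain(payloads: &[&str]) -> Vec<Node> {
    let mut prev = 0u64;
    payloads
        .iter()
        .map(|p| {
            let hash = node_hash(p.as_bytes(), prev);
            prev = hash;
            Node { payload: p.as_bytes().to_vec(), hash }
        })
        .collect()
}

/// Verify each node against the hash RECOMPUTED for its predecessor,
/// not the stored one, so a tampered node poisons everything after it.
fn verify_chain(nodes: &[Node]) -> Vec<bool> {
    let mut prev = 0u64;
    nodes
        .iter()
        .map(|n| {
            let recomputed = node_hash(&n.payload, prev);
            prev = recomputed;
            recomputed == n.hash
        })
        .collect()
}
```

The design choice that produces the cascade is in `verify_chain`: because the recomputed hash is carried forward, changing the payload of the second node in a four-node chain makes the second, third, and fourth checks fail while the first still passes.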

CI/CD Gating

H33-Chaos runs on every pull request before it can merge. The full 54-test suite executes in under 3 seconds. A single failure blocks the merge. No exceptions, no overrides, no "skip tests" flags.

Conformance Vector Generation

H33-Chaos produces 20 conformance test vectors (AGT-TV-001 through AGT-TV-020) that any independent implementation can use to verify compatibility. The vectors are the contract; the test suite enforces it.

# Run the full H33-Chaos adversarial suite
cargo test --test dag_tampering --release
cargo test --test replay_attacks --release
cargo test --test session_attacks --release
cargo test --test identity_attacks --release

# Generate conformance vectors
cargo run --bin generate_agent_vectors --release
Performance Under Adversarial Load

Graviton4 Metal Benchmark

Adversarial testing is meaningless without production-scale validation. These numbers were measured on Graviton4 bare metal under sustained load.

24.79M attestations/sec (192-core parallel)
2.12M sessions/sec (full lifecycle)
2.5ms replay latency (10K-action graph)
166ms verification (100K-node DAG)
Guarantees

What H33-Chaos Proves

Integrity

Tampered data cannot survive verification

A single modified byte in any node payload, predecessor hash, or session binding causes the entire downstream chain to fail verification. Corruption doesn't hide — it cascades into detection.

Determinism

Replay is binary, not probabilistic

All 50,000 replay operations in the scoring test, along with the concurrent-thread, growing-DAG, and shuffled-insertion-order tests, produced byte-identical output. The Replay Determinism Score is exactly 100.0%: not 99.99%, not "approximately."

Isolation

Sessions cannot cross-contaminate

Every session is cryptographically bound to its agent identity and scope declaration. Wrong-agent binding, cross-session injection, and scope escalation are all caught at the data structure level before any API handler fires.

Verify independently. Trust nothing.

H33-Chaos results are reproducible. The conformance vectors are published. The verifier is open. Run it yourself.
