We don't test whether our infrastructure works. We test whether it survives. H33-Chaos is a purpose-built adversarial framework that attacks every cryptographic primitive, every execution graph, every session boundary, and every identity assertion in the H33 stack. If it passes Chaos, it ships.
Unit tests prove code does what you intended. H33-Chaos proves code resists what adversaries intend. The difference is the gap between passing an exam and surviving a fight.
Every test in H33-Chaos operates below the API surface. We don't test endpoints — we tamper with the underlying execution DAG, forge identity assertions, corrupt hash chains, and inject nodes into sessions we don't own. If the infrastructure catches it, the API never needs to.
Replay determinism isn't a metric — it's a binary. Either every replay of every graph at every timestamp produces byte-identical output, or the system fails. We test this with 50,000 replay operations across concurrent threads, growing DAGs, and shuffled insertion orders.
Integrity verification isn't useful if detection stays local to the corrupted node. H33-Chaos specifically tests that a single tampered node causes every downstream verification to fail. Cascade detection is the difference between catching fraud and missing it.
Each category targets a different layer of the cryptographic infrastructure. Tests are designed to model realistic attack scenarios, not theoretical edge cases.
DAG Tampering (+ 10 additional vectors): hash collision attempts, Merkle root forgery, predecessor chain manipulation, double-spend nodes, timestamp backdating, phantom sessions, root hash substitution, parallel chain grafting, state rollback injection, orphan node flooding
Replay Attacks (+ 7 additional vectors): timestamp sensitivity, replay frame hash verification, fork-and-merge consistency, selective node replay, boundary timestamp precision, partial graph replay, cross-session replay isolation
Session Exploitation (+ 6 additional vectors): double-close, session hijacking via forged canonical name, scope escalation, action-after-H33-74-close, temporal boundary crossing, session-to-session leakage
Identity Forgery (+ 5 additional vectors): key rotation mid-session, multi-key signing, ghost agent registration, name_history tampering, cross-org identity leakage
H33-Chaos runs as part of every CI pipeline. No code merges to main without a clean Chaos run. These are not aspirational numbers — they are gating criteria.
| Category | Tests | Passed | Failed | Detection Rate |
|---|---|---|---|---|
| DAG Tampering | 19 | 19 | 0 | 100.0% |
| Replay Attacks | 13 | 13 | 0 | 100.0% |
| Session Exploitation | 12 | 12 | 0 | 100.0% |
| Identity Forgery | 10 | 10 | 0 | 100.0% |
| Total | 54 | 54 | 0 | 100.0% |
Same graph + same timestamp = byte-identical output. This is not a percentage that can be 99.9%. It is a binary that must be 100.0% or the system is broken.
// Replay determinism test output
PASS replay_determinism_score
50-node DAG x 10 timestamps x 100 replays
Total operations: 50,000
Divergences: 0
Score: 100.0%
PASS concurrent_replay_determinism (8 threads)
PASS post_growth_replay_stability
PASS insertion_order_independence
PASS fork_divergence_verification
PASS 1000_iteration_consistency
All 13 replay tests passed. Zero divergence.
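The determinism check above can be sketched in a few lines of Rust. This is a minimal illustration, not the H33 `ReplayEngine`: the `Event` struct, the `replay` function, and the `determinism_score` helper are hypothetical names, and the canonical `(timestamp, id)` ordering is an assumption about how insertion-order independence might be achieved. It counts one operation per (timestamp, replay) pair, which may differ from how the suite arrives at its 50,000 operations.

```rust
/// Hypothetical event type for this sketch; not the real H33 node layout.
#[derive(Clone, Debug)]
struct Event {
    id: u32, // assumed unique, so (timestamp, id) is a total order
    timestamp: u64,
    payload: Vec<u8>,
}

/// Replays every event at or before `at` in canonical (timestamp, id) order
/// and returns the serialized frame. Because the order is derived from the
/// data, not from insertion order, shuffled inputs yield identical bytes.
fn replay(events: &[Event], at: u64) -> Vec<u8> {
    let mut visible: Vec<&Event> = events.iter().filter(|e| e.timestamp <= at).collect();
    visible.sort_by_key(|e| (e.timestamp, e.id));

    let mut out = Vec::new();
    for e in visible {
        out.extend_from_slice(&e.id.to_le_bytes());
        out.extend_from_slice(&e.timestamp.to_le_bytes());
        out.extend_from_slice(&(e.payload.len() as u32).to_le_bytes());
        out.extend_from_slice(&e.payload);
    }
    out
}

/// Replays each timestamp `replays` times and compares every result against
/// a reference frame. Returns (total operations, divergences).
fn determinism_score(events: &[Event], timestamps: &[u64], replays: usize) -> (usize, usize) {
    let mut total = 0;
    let mut divergences = 0;
    for &t in timestamps {
        let reference = replay(events, t);
        for _ in 0..replays {
            total += 1;
            if replay(events, t) != reference {
                divergences += 1;
            }
        }
    }
    (total, divergences)
}
```

Any nonzero divergence count is a failure; there is no threshold to tune. Comparing `replay` on a reversed copy of the same events is a quick way to exercise insertion-order independence.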
H33-Chaos operates at the data structure level, not the API level. It directly manipulates the execution DAG, identity registry, session manager, and replay engine to simulate attacks that would bypass API-level validation.
Tests operate on raw AgentExecutionDag, AgentRegistry, SessionManager, and ReplayEngine structs. No HTTP. No auth middleware. No safety nets. If the data structures are vulnerable, Chaos finds it.
Every node in the execution DAG contains a SHA3-256 hash of its payload, its predecessor hash, and the H33_AGENT_V1 domain separator. Chaos tests that tampering with any single byte causes verification failure across the entire downstream chain.
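A small sketch can make the cascade concrete. It assumes a linear chain instead of a full DAG, and it substitutes a 64-bit FNV-1a hash for SHA3-256 so the example needs no external crates; FNV-1a is not collision resistant and is only a stand-in here. The `Node` struct, the order of the hash inputs, and the function names are illustrative assumptions, not the `AgentExecutionDag` layout.

```rust
/// Domain separator named in the text; how it enters the hash is assumed.
const DOMAIN: &[u8] = b"H33_AGENT_V1";

/// Stand-in hash (64-bit FNV-1a). NOT SHA3-256 and not cryptographically secure.
fn fnv1a(parts: &[&[u8]]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for part in parts {
        // Length-prefix each part so part boundaries affect the result.
        let len = (part.len() as u64).to_le_bytes();
        for b in len.iter().chain(part.iter()) {
            h ^= *b as u64;
            h = h.wrapping_mul(0x100000001b3);
        }
    }
    h
}

/// Hypothetical node: payload plus predecessor hash plus its own hash.
#[derive(Clone, Debug)]
struct Node {
    payload: Vec<u8>,
    prev_hash: u64, // 0 for the root
    hash: u64,      // H(DOMAIN, prev_hash, payload)
}

fn node_hash(prev_hash: u64, payload: &[u8]) -> u64 {
    let prev = prev_hash.to_le_bytes();
    fnv1a(&[DOMAIN, &prev[..], payload])
}

fn build_chain(payloads: &[&[u8]]) -> Vec<Node> {
    let mut chain = Vec::new();
    let mut prev = 0u64;
    for p in payloads {
        let hash = node_hash(prev, p);
        chain.push(Node { payload: p.to_vec(), prev_hash: prev, hash });
        prev = hash;
    }
    chain
}

/// Returns one verdict per node. A node passes only if its own hash
/// recomputes, it links to its predecessor's stored hash, and every
/// ancestor passed, so one bad node fails everything downstream.
fn verify_chain(chain: &[Node]) -> Vec<bool> {
    let mut results = Vec::with_capacity(chain.len());
    let mut prev_stored = 0u64;
    let mut ancestors_ok = true;
    for node in chain {
        let local_ok = node.prev_hash == prev_stored
            && node.hash == node_hash(node.prev_hash, &node.payload);
        ancestors_ok = ancestors_ok && local_ok;
        results.push(ancestors_ok);
        prev_stored = node.hash;
    }
    results
}
```

In a five-node chain, flipping one bit in the third node's payload leaves the first two verdicts `true` and turns the remaining three `false`, which is the cascade behavior the tests look for.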
H33-Chaos runs on every pull request before it can merge. The full 54-test suite executes in under 3 seconds. A single failure blocks the merge. No exceptions, no overrides, no "skip tests" flags.
H33-Chaos produces 20 conformance test vectors (AGT-TV-001 through AGT-TV-020) that any independent implementation can use to verify compatibility. The vectors are the contract; the test suite enforces it.
# Run the full H33-Chaos adversarial suite
cargo test --test dag_tampering --release
cargo test --test replay_attacks --release
cargo test --test session_attacks --release
cargo test --test identity_attacks --release
# Generate conformance vectors
cargo run --bin generate_agent_vectors --release
Adversarial testing is meaningless without production-scale validation. These numbers were measured on Graviton4 bare metal under sustained load.
A single modified byte in any node payload, predecessor hash, or session binding causes the entire downstream chain to fail verification. Corruption doesn't hide — it cascades into detection.
50,000 replay operations across concurrent threads, growing DAGs, and shuffled insertion orders all produced byte-identical output. The Replay Determinism Score is 100.0% — not 99.99%, not "approximately." Exactly 100.0%.
Every session is cryptographically bound to its agent identity and scope declaration. Wrong-agent binding, cross-session injection, and scope escalation are all caught at the data structure level before any API handler fires.
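The shape of those checks can be sketched as follows. This is a simplified model under stated assumptions: the `Session` struct, `SessionError` variants, and method names are hypothetical, and the binding here is a plain field comparison where the real `SessionManager` is described as using a cryptographic binding.

```rust
use std::collections::HashSet;

/// Hypothetical rejection reasons for this sketch.
#[derive(Debug, PartialEq)]
enum SessionError {
    WrongAgent,
    OutOfScope,
    Closed,
    AlreadyClosed,
}

/// Hypothetical session bound to one agent and a declared scope.
struct Session {
    agent_id: String,
    scope: HashSet<String>,
    closed: bool,
    actions: Vec<String>,
}

impl Session {
    fn open(agent_id: &str, scope: &[&str]) -> Self {
        Session {
            agent_id: agent_id.to_string(),
            scope: scope.iter().map(|s| s.to_string()).collect(),
            closed: false,
            actions: Vec::new(),
        }
    }

    /// Records an action only if the session is open, the caller is the
    /// bound agent, and the action was declared in the session scope.
    fn record(&mut self, agent_id: &str, action: &str) -> Result<(), SessionError> {
        if self.closed {
            return Err(SessionError::Closed);
        }
        if agent_id != self.agent_id {
            return Err(SessionError::WrongAgent);
        }
        if !self.scope.contains(action) {
            return Err(SessionError::OutOfScope);
        }
        self.actions.push(action.to_string());
        Ok(())
    }

    /// Closing twice is rejected instead of being silently accepted.
    fn close(&mut self) -> Result<(), SessionError> {
        if self.closed {
            return Err(SessionError::AlreadyClosed);
        }
        self.closed = true;
        Ok(())
    }
}
```

Because every rejection happens inside `record` and `close`, a test can call them directly with a wrong agent, an undeclared action, or a closed session and expect an error without any API handler in the loop.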
H33-Chaos results are reproducible. The conformance vectors are published. The verifier is open. Run it yourself.