From 209 STARK Tests to Autonomous Enforcement

Most companies building AI governance start with policies. We started with proofs.

The H33 governance runtime did not begin as a compliance product. It began as a STARK engine — a zero-knowledge proof system built from scratch in Rust, operating over a Goldilocks prime field with algebraic intermediate representations, FRI polynomial commitments, and Fiat-Shamir transcripts. 209 tests cover the STARK foundation alone: biometric proofs, secp256k1 signature circuits, compliance circuits, identity circuits, device attestation proofs, canonical serialization, and transcript versioning.

From that foundation, we built upward — through integrity accumulators, route attestation, policy attestation, result attestation, state transition chains, governance graphs, search engines, replay engines, streaming layers, trust lifecycle management, customer profiles, and autonomous enforcement.

The result is 459 tests across 19 modules. A complete cryptographic governance operating system.

The Foundation: STARK Engine

The STARK engine is not a wrapper around an external library. It is a native Rust implementation with several properties that matter for governance:

Goldilocks prime field. The field modulus p = 2⁶⁴ - 2³² + 1 allows native u64 arithmetic without big-integer overhead. Field multiplication, addition, and inversion are all implemented with u64 operations and lazy reduction.
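To make the field arithmetic concrete, here is a minimal sketch of Goldilocks addition, multiplication, and inversion. It is an illustration, not the H33 implementation: it widens to u128 and reduces eagerly with `%` rather than using lazy reduction, and the function names are our own.

```rust
/// Goldilocks prime: p = 2^64 - 2^32 + 1.
pub const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Addition mod p (widened to u128 so the sum cannot overflow).
pub fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

/// Multiplication mod p: take the full 128-bit product, then reduce.
pub fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Exponentiation by squaring.
pub fn pow(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Inverse via Fermat's little theorem: a^(p-2) is the inverse of a for a != 0.
pub fn inv(a: u64) -> Option<u64> {
    if a % P == 0 {
        None
    } else {
        Some(pow(a, P - 2))
    }
}
```

Because 2^64 is congruent to 2^32 - 1 modulo p, production implementations can replace the `%` with a few shifts and subtractions, which is where the performance benefit of this modulus comes from.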

Algebraic Intermediate Representation (AIR). Constraints are expressed as algebraic relations over execution traces. The system supports multiple circuit implementations: biometric verification (7 columns), secp256k1 signature verification (1,210 columns, 836 constraints), PQ signature compression (8 columns), compliance (6 columns), identity (6 columns), and device attestation (5-proof composition).

FRI commitment scheme. FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity) provides the core commitment mechanism. Per-round domain separation gives each folding round a unique label. Blowup factor validation rejects any factor that is not a power of two or is smaller than 2.
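The two checks described here are small enough to sketch directly. The function names and the label format below are hypothetical; only the rules (power of two, minimum of 2, one distinct label per folding round) come from the text.

```rust
/// Accept only blowup factors that are at least 2 and a power of two.
pub fn validate_blowup(factor: usize) -> Result<usize, String> {
    if factor < 2 {
        return Err(format!("blowup factor {} is below the minimum of 2", factor));
    }
    if !factor.is_power_of_two() {
        return Err(format!("blowup factor {} is not a power of two", factor));
    }
    Ok(factor)
}

/// Hypothetical label format: one distinct domain separator per folding round.
pub fn fri_round_label(round: u32) -> String {
    format!("fri/fold/round-{}", round)
}
```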

Fiat-Shamir transcript. The transcript binds all public inputs and configuration parameters before generating challenges. Transcript version 2 includes hierarchical domain constants for STARK, FRI, biometric, compliance, device, secp256k1, and PQ signature circuits.
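The binding property can be illustrated with a toy transcript. This is a sketch under loud assumptions: FNV-1a is a non-cryptographic placeholder for the real transcript hash, and the `Transcript` API, labels, and seeding are invented for illustration. What it shows is the ordering rule: absorb the version and every public input first, then derive challenges.

```rust
const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;

/// FNV-1a: a NON-cryptographic placeholder for the real transcript hash.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

pub struct Transcript {
    state: u64,
}

impl Transcript {
    /// Seed the state with a domain label (hypothetical) and the transcript version.
    pub fn new(version: u32) -> Self {
        let seed = fnv1a(FNV_OFFSET, b"stark/transcript");
        Transcript { state: fnv1a(seed, &version.to_le_bytes()) }
    }

    /// Absorb a labeled message; both parts are length-prefixed to avoid ambiguity.
    pub fn absorb(&mut self, label: &str, data: &[u8]) {
        for part in [label.as_bytes(), data] {
            self.state = fnv1a(self.state, &(part.len() as u64).to_le_bytes());
            self.state = fnv1a(self.state, part);
        }
    }

    /// Derive a challenge and ratchet the state so the next challenge differs.
    pub fn challenge(&mut self) -> u64 {
        self.state = fnv1a(self.state, b"challenge");
        self.state
    }
}
```

Two provers that absorb the same version and inputs derive the same challenge; changing a single input byte or the version changes it.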

Canonical serialization. Deterministic byte encoding ensures that identical logical state produces identical byte sequences across machines, architectures, and compiler versions. Field elements use 8-byte little-endian encoding. Vectors are length-prefixed. Maps use declaration order.
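A minimal sketch of the two encoding rules stated above. The 8-byte little-endian field encoding comes from the text; the width of the length prefix is not specified there, so the u64 prefix used here is an assumption, as are the function names.

```rust
/// Field element: 8 bytes, little-endian.
pub fn encode_field(x: u64) -> [u8; 8] {
    x.to_le_bytes()
}

/// Vector: length prefix followed by each element in order.
/// ASSUMPTION: the prefix is a little-endian u64; the text does not give its width.
pub fn encode_vec(elems: &[u64]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + elems.len() * 8);
    out.extend_from_slice(&(elems.len() as u64).to_le_bytes());
    for &e in elems {
        out.extend_from_slice(&encode_field(e));
    }
    out
}
```

Fixing the byte order explicitly is what removes the dependence on machine architecture: `to_le_bytes` yields the same bytes on little-endian and big-endian hosts.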

209 Tests Are the Trust Foundation

These 209 tests are not incidental. Every governance receipt, every integrity root, and every replay frame ultimately depends on the correctness of the underlying cryptographic primitives.

Layer by Layer

Integrity Accumulator (32 tests)

The integrity accumulator is a recursive hash chain. Every authenticated request folds into a continuously evolving integrity root. The root is a SHA3-256 hash that commits to every auth event in order.

Properties: append-only (events can only be added, never removed), order-preserving (event sequence is committed via chain hash), tamper-evident (any modification breaks the root), constant-size summary (root + chain_hash + count regardless of N), and replay-detectable (duplicate event hashes are tracked).

Each fold produces an IntegrityReceipt with sequence number, event hash, previous root, new root, timestamp, auth method, transcript version, and optional bindings to route, policy, and result receipts. Every receipt is PQ-signed with ML-DSA-65.
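The fold operation can be sketched as follows. This is a simplified illustration, not the production accumulator: FNV-1a stands in for SHA3-256, the receipt carries only four of the fields listed above, PQ signing is omitted, and the separate chain hash is left out. The zero genesis root and all names other than IntegrityReceipt are assumptions.

```rust
use std::collections::HashSet;

const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;

/// FNV-1a: a NON-cryptographic placeholder standing in for SHA3-256.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

#[derive(Debug, Clone, PartialEq)]
pub struct IntegrityReceipt {
    pub sequence: u64,
    pub event_hash: u64,
    pub previous_root: u64,
    pub new_root: u64,
}

#[derive(Debug, PartialEq)]
pub enum FoldError {
    ReplayedEvent(u64),
}

#[derive(Default)]
pub struct Accumulator {
    root: u64,          // assumed genesis root of 0
    count: u64,
    seen: HashSet<u64>, // event hashes already folded (replay detection)
}

impl Accumulator {
    /// Fold one event: new_root = H(previous_root || event_hash || sequence).
    pub fn fold(&mut self, event: &[u8]) -> Result<IntegrityReceipt, FoldError> {
        let event_hash = fnv1a(FNV_OFFSET, event);
        if !self.seen.insert(event_hash) {
            return Err(FoldError::ReplayedEvent(event_hash));
        }
        let previous_root = self.root;
        let sequence = self.count;
        let mut h = fnv1a(FNV_OFFSET, &previous_root.to_le_bytes());
        h = fnv1a(h, &event_hash.to_le_bytes());
        h = fnv1a(h, &sequence.to_le_bytes());
        self.root = h;
        self.count += 1;
        Ok(IntegrityReceipt { sequence, event_hash, previous_root, new_root: h })
    }

    pub fn root(&self) -> u64 {
        self.root
    }
}
```

Because each root commits to the previous one, folding the same events in a different order yields a different root, and each receipt's previous root must equal the new root of the receipt before it.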

Route Attestation (15 tests)

The IQ router is a claim-aware, two-stage engine selection system. Stage 1 applies hard filters (data type, operation type, security level, authorization claims, time validity, evidence class) and produces one of four outcomes: a direct route, a route-and-derive, a derive-first-then-reroute, or a hard reject. Stage 2 applies derived tag scoring for cases requiring tag-aware selection.

Every routing decision produces a RouteDecisionReceipt that binds: request hash, selected engine, rejected engines with reasons, scoring weights version, router schema version, policy version, hardware snapshot ID, security target, latency target, transcript version, engine manifest version, derivation selector version, stage 1 determinism hash, and credit cost.

Policy Attestation (10 tests)

Every policy gate evaluation produces a PolicyDecisionReceipt binding: policy ID, policy version, request hash, route receipt hash, allowed/denied decision, enforcement mode (Enforce/AuditOnly), required security target, required engine class, data classification, and tenant identity hash.

The critical design decision: a denied policy cannot produce a successful execution event. This is enforced cryptographically. The verifier checks the governance chain and rejects any bundle where a denied policy is linked to a successful execution.

Result Attestation (10 tests)

Every computation result produces a ResultAttestationReceipt binding: request hash, route receipt hash, policy decision hash, event hash, engine ID, parameter set ID, input commitment, output commitment, result type, success/failure status, and error code.

A failed result cannot masquerade as a successful execution. The status is committed into the hash. The verifier rejects bundles where a failed result is bound to a governed execution event.
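The two verifier rules, for denied policies and for failed results, can be sketched together. The receipt type names come from the text, but the structs below keep only the fields needed for the checks. The ExecutionEvent shape, the u64 hashes, and `verify_bundle` itself are assumptions made for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum VerifyError {
    BrokenLink,
    DeniedPolicyExecuted,
    FailedResultBoundToSuccess,
}

pub struct PolicyDecisionReceipt {
    pub hash: u64,
    pub allowed: bool,
}

/// Hypothetical shape of a governed execution event.
pub struct ExecutionEvent {
    pub hash: u64,
    pub policy_decision_hash: u64,
    pub success: bool,
}

pub struct ResultAttestationReceipt {
    pub policy_decision_hash: u64,
    pub event_hash: u64,
    pub success: bool,
}

pub fn verify_bundle(
    policy: &PolicyDecisionReceipt,
    event: &ExecutionEvent,
    result: &ResultAttestationReceipt,
) -> Result<(), VerifyError> {
    // The receipts must reference each other by hash.
    if event.policy_decision_hash != policy.hash
        || result.policy_decision_hash != policy.hash
        || result.event_hash != event.hash
    {
        return Err(VerifyError::BrokenLink);
    }
    // Rule 1: a denied policy cannot be linked to a successful execution.
    if !policy.allowed && event.success {
        return Err(VerifyError::DeniedPolicyExecuted);
    }
    // Rule 2: a failed result cannot be bound to a successful execution.
    if !result.success && event.success {
        return Err(VerifyError::FailedResultBoundToSuccess);
    }
    Ok(())
}
```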

State Transition (8 tests)

State transitions operate within namespaces. Each namespace maintains an independent chain where transition N+1's prior state commitment must equal transition N's new state commitment.

The StateChainManager enforces: namespace continuity, mutation idempotency, tenant isolation, and genesis handling. Cross-tenant mutation is detected both at recording time and at verification time.
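Namespace continuity and tenant isolation reduce to a pairwise check over the chain. The sketch below assumes u64 state commitments, a zero genesis commitment, and a hypothetical `verify_namespace_chain` function; mutation idempotency is not modeled.

```rust
#[derive(Debug, PartialEq)]
pub enum ChainError {
    BadGenesis,
    CrossTenant { index: usize },
    BrokenContinuity { index: usize },
}

pub struct Transition {
    pub tenant: u64,
    pub prior_state: u64,
    pub new_state: u64,
}

/// ASSUMPTION: the first transition in a namespace starts from this commitment.
pub const GENESIS_STATE: u64 = 0;

/// Verify one namespace's chain of transitions.
pub fn verify_namespace_chain(chain: &[Transition]) -> Result<(), ChainError> {
    let first = match chain.first() {
        Some(t) => t,
        None => return Ok(()), // an empty namespace has nothing to verify
    };
    if first.prior_state != GENESIS_STATE {
        return Err(ChainError::BadGenesis);
    }
    for (i, pair) in chain.windows(2).enumerate() {
        let (prev, next) = (&pair[0], &pair[1]);
        // Tenant isolation: every transition belongs to the same tenant.
        if next.tenant != first.tenant {
            return Err(ChainError::CrossTenant { index: i + 1 });
        }
        // Continuity: transition N+1 starts where transition N ended.
        if next.prior_state != prev.new_state {
            return Err(ChainError::BrokenContinuity { index: i + 1 });
        }
    }
    Ok(())
}
```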

Governance Graph (13 tests)

All attestation types project into a unified GovernanceGraph. Eight node types: Route, Policy, Event, Result, StateTransition, Checkpoint, Federation, Anchor. Each node exposes: canonical hash, transcript version, signer key ID, parent references, timestamp, tenant binding.

The graph verifier runs six checks: transcript version consistency, orphan reference detection, orphan node detection, cycle detection (DFS with gray/black coloring), cross-tenant contamination, and missing governance lineage.
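Two of the six checks, cycle detection and orphan reference detection, can be sketched over a plain map from node ID to parent IDs. The u64 IDs and function names are assumptions. Unvisited nodes play the role of white: a node is gray while its parents are being explored and black once it is finished, so reaching a gray node means a cycle.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Color {
    Gray,  // on the current DFS path
    Black, // fully explored
}

fn visit(n: u64, parents: &HashMap<u64, Vec<u64>>, color: &mut HashMap<u64, Color>) -> bool {
    match color.get(&n) {
        Some(Color::Gray) => return true,   // back edge: cycle found
        Some(Color::Black) => return false, // already cleared
        None => {}
    }
    color.insert(n, Color::Gray);
    if let Some(ps) = parents.get(&n) {
        for &p in ps {
            if visit(p, parents, color) {
                return true;
            }
        }
    }
    color.insert(n, Color::Black);
    false
}

/// `parents` maps each node ID to the IDs of its parent references.
pub fn has_cycle(parents: &HashMap<u64, Vec<u64>>) -> bool {
    let mut color = HashMap::new();
    parents.keys().any(|&n| visit(n, parents, &mut color))
}

/// Parent IDs that are referenced but do not exist as nodes.
pub fn orphan_references(parents: &HashMap<u64, Vec<u64>>) -> Vec<u64> {
    let mut out: Vec<u64> = parents
        .values()
        .flatten()
        .copied()
        .filter(|p| !parents.contains_key(p))
        .collect();
    out.sort_unstable();
    out.dedup();
    out
}
```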

Deterministic Root Hash

The graph root hash is deterministic. Insertion order does not matter. This is tested explicitly with three different insertion orders producing identical roots.
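One common way to get order independence is to sort the node hashes before hashing them. The sketch below uses that approach as an assumption; the text does not say how the root is actually computed, and FNV-1a is again a non-cryptographic placeholder for the real hash.

```rust
const FNV_OFFSET: u64 = 0xCBF2_9CE4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01B3;

/// FNV-1a: a NON-cryptographic placeholder for the real hash function.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(FNV_PRIME);
    }
    h
}

/// Root over a set of node hashes, independent of insertion order.
pub fn graph_root(node_hashes: &[u64]) -> u64 {
    // Canonicalize: sort and drop duplicates so only the set matters.
    let mut sorted = node_hashes.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    // Commit to the node count first, then to each hash in canonical order.
    let mut h = fnv1a(FNV_OFFSET, &(sorted.len() as u64).to_le_bytes());
    for n in sorted {
        h = fnv1a(h, &n.to_le_bytes());
    }
    h
}
```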

Governance Search (20 tests)

The query engine builds secondary indexes at construction time: prefix index, type index, tenant index, signer index, transcript version index, child index (reverse edges), and time-sorted index.

Search supports: exact hash lookup (O(1)), partial hash prefix lookup, faceted search with composable filters, parent/child traversal, upstream/downstream lineage traversal, shortest path between nodes, natural language query parsing, and saved queries with deterministic query hashing.
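Partial hash prefix lookup maps naturally onto an ordered index: seek to the first key at or after the prefix, then read while keys still start with it. This sketch assumes hashes are stored as hex strings in a `BTreeSet`; the `PrefixIndex` type is hypothetical and covers only the prefix index, not the other secondary indexes.

```rust
use std::collections::BTreeSet;

pub struct PrefixIndex {
    hashes: BTreeSet<String>,
}

impl PrefixIndex {
    /// Build the index once, at construction time.
    pub fn new<I, S>(hashes: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        PrefixIndex { hashes: hashes.into_iter().map(Into::into).collect() }
    }

    /// All stored hashes that start with `prefix`, in sorted order.
    pub fn by_prefix(&self, prefix: &str) -> Vec<&str> {
        self.hashes
            .range(prefix.to_owned()..)            // seek to the first candidate
            .take_while(|h| h.starts_with(prefix)) // stop at the first non-match
            .map(|h| h.as_str())
            .collect()
    }
}
```

The seek costs O(log n) and the scan touches only matching entries, so short prefixes over large graphs stay cheap.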

Governance Replay (14 tests)

The replay engine produces: point-in-time snapshots (ReplayFrame), forward replay, reverse replay, scoped replay (tenant/namespace/policy/route), replay diffs (ReplayDiff), and evenly-spaced timelines.

Integrity checks within replay detect: continuity gaps, missing lineage, transcript incompatibilities, and signer discontinuities.
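Point-in-time snapshots and diffs can be sketched as a filter over timestamped events. ReplayFrame and ReplayDiff are named in the text, but their fields here are assumptions, as are the `GovernanceEvent` type and both function names. Scoping and the integrity checks are not modeled.

```rust
use std::collections::BTreeSet;

pub struct GovernanceEvent {
    pub timestamp: u64,
    pub node_hash: u64,
}

#[derive(Debug, PartialEq)]
pub struct ReplayFrame {
    pub at: u64,
    pub nodes: BTreeSet<u64>,
}

#[derive(Debug, PartialEq)]
pub struct ReplayDiff {
    pub added: Vec<u64>,
    pub removed: Vec<u64>,
}

/// Snapshot of every node recorded at or before time `at`.
pub fn frame_at(events: &[GovernanceEvent], at: u64) -> ReplayFrame {
    let nodes = events
        .iter()
        .filter(|e| e.timestamp <= at)
        .map(|e| e.node_hash)
        .collect();
    ReplayFrame { at, nodes }
}

/// Nodes present in `to` but not in `from`, and the reverse.
pub fn diff(from: &ReplayFrame, to: &ReplayFrame) -> ReplayDiff {
    ReplayDiff {
        added: to.nodes.difference(&from.nodes).copied().collect(),
        removed: from.nodes.difference(&to.nodes).copied().collect(),
    }
}
```

Forward replay is then a sequence of frames at increasing times, and reverse replay is the same sequence walked backwards.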

Governance Stream + Durability (24 tests)

The event stream provides: append-only ordered events, cursor-based reads, subscription filtering, checkpoint resume, and bounded eviction. The durability layer provides: a persistence abstraction, log entry verification, corruption detection, cursor checkpoint persistence, and recovery from the durable log.

The delivery manager provides: at-least-once webhook delivery, idempotency key enforcement, exponential backoff retry (1s to 1h), dead-letter queue, and delivery status tracking. SIEM adapters format events for Splunk, Datadog, Elastic, and Sentinel.
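The retry schedule can be sketched in a few lines. The 1-second floor and 1-hour ceiling come from the text; the doubling factor, the absence of jitter, and the function name are assumptions.

```rust
use std::time::Duration;

const BASE_SECS: u64 = 1;
const MAX_SECS: u64 = 3600;

/// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, ... capped at 1h.
pub fn backoff_delay(attempt: u32) -> Duration {
    // checked_shl returns None once the shift would exceed 63 bits.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(BASE_SECS.saturating_mul(factor).min(MAX_SECS))
}
```

With doubling, the cap is reached on the thirteenth attempt (2^12 seconds exceeds one hour), after which a delivery would retry hourly until it is moved to the dead-letter queue.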

Trust Lifecycle (13 tests)

The signer registry tracks each key through five states: Pending, Active, Rotating, Revoked, and Expired. Replacement chain continuity is enforced. The terminal key in any replacement chain must be Active.
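The terminal-key rule can be sketched as a walk along replacement links. The five state names come from the text; the `SignerKey` struct, the u32 key IDs, and the `replaced_by` link are assumptions about how a registry might represent a chain.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyState {
    Pending,
    Active,
    Rotating,
    Revoked,
    Expired,
}

pub struct SignerKey {
    pub state: KeyState,
    pub replaced_by: Option<u32>, // hypothetical link to the successor key
}

/// Follow replacement links from `start`. The chain is valid only if it is
/// unbroken, has no loop, and ends at a key in the Active state.
pub fn terminal_is_active(registry: &HashMap<u32, SignerKey>, start: u32) -> bool {
    let mut seen = HashSet::new();
    let mut current = start;
    loop {
        if !seen.insert(current) {
            return false; // loop in the replacement chain
        }
        match registry.get(&current) {
            None => return false, // broken continuity: successor is unknown
            Some(key) => match key.replaced_by {
                Some(next) => current = next,
                None => return key.state == KeyState::Active,
            },
        }
    }
}
```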

The monotonic verifier detects: timestamp rollback per signer, replayed timestamps, and impossible ordering across lineage. Trust policies restrict allowed algorithms, allowed trust domains, and signer separation rules. The production policy allows only ML-DSA-65 and ML-DSA-87.
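Per-signer monotonicity and the algorithm allow-list can be sketched as follows. The type and function names are hypothetical, and cross-lineage ordering, trust domains, and signer separation are not modeled. The two allowed algorithm names are the ones stated above.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum MonotonicError {
    ReplayedTimestamp,
    Rollback,
}

#[derive(Default)]
pub struct MonotonicVerifier {
    last_seen: HashMap<String, u64>, // signer key ID -> latest accepted timestamp
}

impl MonotonicVerifier {
    /// Accept a timestamp only if it is strictly newer than the signer's last one.
    pub fn observe(&mut self, signer: &str, timestamp: u64) -> Result<(), MonotonicError> {
        if let Some(&last) = self.last_seen.get(signer) {
            if timestamp == last {
                return Err(MonotonicError::ReplayedTimestamp);
            }
            if timestamp < last {
                return Err(MonotonicError::Rollback);
            }
        }
        self.last_seen.insert(signer.to_owned(), timestamp);
        Ok(())
    }
}

/// Production trust policy: only these two signature algorithms are allowed.
pub fn production_allows(algorithm: &str) -> bool {
    matches!(algorithm, "ML-DSA-65" | "ML-DSA-87")
}
```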

Customer Profiles (12 tests)

Banking: Q128, BFV-64/TFHE, quorum of 3, 7-year retention, 5 required receipt types

Healthcare: Q256, BFV-64 only, quorum of 5, 6-year retention

Insurance: Q128, BFV-64/CKKS, quorum of 3, 10-year retention

Government: Q256, BFV-64/TFHE, quorum of 5, 20-year retention, all 8 receipt types

Crypto: Q128, BFV-64/32/TFHE, quorum of 3, 5-year retention

AI Governance: Q128, BFV-64/CKKS/TFHE, quorum of 1, 3-year retention
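The six profiles can be expressed as a static configuration table. The values below are copied from the profiles above; the struct, its field names, and the `profile` lookup are assumptions, and reading "BFV-64/32" as the two engines BFV-64 and BFV-32 is our interpretation. A receipt-type count is recorded only where the profile states one.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Profile {
    pub name: &'static str,
    pub security: &'static str,
    pub engines: &'static [&'static str],
    pub quorum: u8,
    pub retention_years: u8,
    pub required_receipt_types: Option<u8>, // None where the profile states no count
}

pub static PROFILES: [Profile; 6] = [
    Profile { name: "Banking", security: "Q128", engines: &["BFV-64", "TFHE"],
              quorum: 3, retention_years: 7, required_receipt_types: Some(5) },
    Profile { name: "Healthcare", security: "Q256", engines: &["BFV-64"],
              quorum: 5, retention_years: 6, required_receipt_types: None },
    Profile { name: "Insurance", security: "Q128", engines: &["BFV-64", "CKKS"],
              quorum: 3, retention_years: 10, required_receipt_types: None },
    Profile { name: "Government", security: "Q256", engines: &["BFV-64", "TFHE"],
              quorum: 5, retention_years: 20, required_receipt_types: Some(8) },
    // ASSUMPTION: "BFV-64/32" is read as the two engines BFV-64 and BFV-32.
    Profile { name: "Crypto", security: "Q128", engines: &["BFV-64", "BFV-32", "TFHE"],
              quorum: 3, retention_years: 5, required_receipt_types: None },
    Profile { name: "AI Governance", security: "Q128", engines: &["BFV-64", "CKKS", "TFHE"],
              quorum: 1, retention_years: 3, required_receipt_types: None },
];

/// Look up a profile by its exact name.
pub fn profile(name: &str) -> Option<&'static Profile> {
    PROFILES.iter().find(|p| p.name == name)
}
```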

Autonomous Enforcement (13 tests)

Ten triggers map to nine actions across four enforcement levels.

The critical architectural decision: enforcement actions are themselves PQ-signed and auditable. The enforcer does not operate outside the governance chain. It operates within it. Every block, every isolation, every lockdown produces an EnforcementDecisionReceipt that is hash-committed, PQ-signed, and independently verifiable.

CriticalLockdown blocks everything.

This is the sentence that changes the category from observability to operational control.
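A minimal sketch of the lockdown rule, under heavy assumptions. Only CriticalLockdown and EnforcementDecisionReceipt are named in the text; the other three level names, the receipt fields, and the rule that lower levels block only when a trigger fired are hypothetical. Hash commitment and PQ signing of the receipt are omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementLevel {
    Monitor,          // hypothetical name
    Restrict,         // hypothetical name
    Isolate,          // hypothetical name
    CriticalLockdown, // named in the text
}

#[derive(Debug, PartialEq)]
pub struct EnforcementDecisionReceipt {
    pub request_hash: u64,
    pub level: EnforcementLevel,
    pub blocked: bool,
}

/// Every decision yields a receipt, so the enforcer's own actions are auditable.
pub fn decide(
    level: EnforcementLevel,
    request_hash: u64,
    trigger_fired: bool,
) -> EnforcementDecisionReceipt {
    let blocked = match level {
        // CriticalLockdown blocks everything, whether or not a trigger fired.
        EnforcementLevel::CriticalLockdown => true,
        // ASSUMPTION: the lowest level only observes.
        EnforcementLevel::Monitor => false,
        // ASSUMPTION: intermediate levels block only when a trigger fired.
        EnforcementLevel::Restrict | EnforcementLevel::Isolate => trigger_fired,
    };
    EnforcementDecisionReceipt { request_hash, level, blocked }
}
```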

Why This Architecture

We started from proofs because proofs compose. A hash chain that is correct at the primitive level remains correct when you build receipts on top of it. Receipts that are correct remain correct when you build a governance graph on top of them. A governance graph that is correct remains correct when you build search, replay, streaming, and enforcement on top of it.

Each layer adds capability without weakening the guarantees of the layers below. The enforcer can block execution because the integrity pipeline guarantees that the block is cryptographically recorded. The replay engine can reconstruct state because the accumulator guarantees that every event is hash-linked and ordered. The search engine can traverse lineage because the governance graph guarantees that every parent reference points to an existing node.

This is the difference between building governance as an afterthought and building it as a substrate.

459 tests. 19 modules. Every decision provable. Every state replayable. Every enforcement auditable.