HATS Replayable Adversarial Resilience — Audit-Grade

The asserted-test problem. Today's AI adversarial testing produces a report. The vendor says "we tested." The regulator asks "how do we know?" The auditor asks "can we reproduce the result years later?" Nobody can. HATS makes the answer cryptographic, replayable, and chain-portable.

Read the canonical spec. This page describes one HATS record kind (adversarial_resilience). The full eight-layer schema, field-by-field invariants, and verification protocol are documented in HATS Record v1.

The framework underneath

The methodology HATS implements is the CKA-based surrogate selection framework proposed by Cox & Bunzel (OWASP AI Exchange / Fraunhofer SIT / ATHENE, 2025). The paper observes that exhaustive coverage of adversarial subspaces is computationally infeasible (NP-complete; high-dimensional input spaces), so resilience testing must instead deliberately span the space of surrogate models. The framework selects surrogates at both high and low Centered Kernel Alignment similarity to the target model, then evaluates transferred attack success rates across that bounded but representative set.
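As a rough illustration of the similarity measure the framework relies on, the sketch below computes linear CKA between two activation matrices (one row per probe input, one column per feature). It assumes the linear variant of CKA and uses hypothetical function names; the source does not state which kernel or which layers the framework uses.

```python
import math

def _center_columns(rows):
    """Subtract each column's mean so every feature is zero-centered."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def _cross_sq_norm(a, b):
    """Squared Frobenius norm of A^T B for row-major matrices A and B."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(p * q for p, q in zip(col_a, col_b))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features.

    x and y must have the same number of rows (the same probe inputs).
    """
    xc, yc = _center_columns(x), _center_columns(y)
    denom = math.sqrt(_cross_sq_norm(xc, xc) * _cross_sq_norm(yc, yc))
    return _cross_sq_norm(yc, xc) / denom

# Toy activations: four probe inputs, two features each (made-up numbers).
acts = [[1.0, 2.0], [3.0, 5.0], [4.0, 1.0], [0.0, 2.0]]
scaled = [[2.0 * v for v in row] for row in acts]
print(linear_cka(acts, scaled))
```

Linear CKA is invariant to isotropic scaling and to permuting features, so the scaled copy scores approximately 1 against the original.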

The framework is sound. What it does not address: how the regulator, auditor, or downstream consumer of the result independently verifies that the testing was done as claimed.

Cox & Bunzel (2025). Quantifying the Risk of Transferred Black Box Attacks. OWASP AI Exchange / Fraunhofer SIT / TU-Darmstadt / ATHENE. Proposes CKA-similarity-thresholded surrogate selection (r₁ ≈ 0.55 high similarity; r₂ ≈ 0.35 low similarity), AutoAttack ensemble (APGD + FAB + Square), and regression-based risk estimation.

What HATS adds

HATS produces a cryptographically signed record of every decision the framework requires, plus the inputs, the outputs, and the verifier configuration. The record is anchored to one or more public chains for independent notarization, and the entire testing decision is reproducible by anyone with the open-source HATS verifier.
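A minimal sketch of how a record could be reduced to a single identifier before it is signed and anchored. The canonical-JSON encoding, the use of SHA-256, and the function name `record_digest` are assumptions for illustration only; the actual encoding is whatever HATS Record v1 specifies.

```python
import hashlib
import json

def record_digest(record: dict) -> str:
    """Hash a canonical serialization so equal records always give equal digests."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=True,
    )
    return "0x" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Hypothetical fragment of a record, using values from the illustrative example.
fragment = {"kind": "adversarial_resilience", "r1": 0.55, "r2": 0.35}
print(record_digest(fragment))
```

The signatures and chain anchors would then commit to this digest, so a change to any recorded field changes the value every verifier recomputes.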

What gets recorded

The target model M_p with architecture, training data commitment, weight hash, and pre-production version stamp.
The high-similarity surrogate set M₁ — each model, its CKA similarity score against M_p, the architecture, and the rationale for inclusion.
The low-similarity surrogate set M₂ — same fields. The cardinalities and the thresholds r₁ and r₂ used for inclusion.
The attack corpus — AutoAttack ensemble configuration, perturbation budgets, attack-specific parameters, RNG seed for reproducibility.
Per-surrogate attack outcomes — which attacks transferred, transfer rates, confidence intervals.
The risk-estimation methodology — regression model used, prior, posterior, and the produced risk estimate with its confidence interval.
The verifier-binding metadata — open-source verifier version, dependency hashes, container image digest, build-environment commitment.
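The field list above can be pictured as a small set of immutable data structures. This is a sketch only: the class and field names are hypothetical and do not reproduce the HATS Record v1 schema, and the `rationale` and `training_data_commitment` values are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TargetModel:
    architecture: str
    training_data_commitment: str
    weight_hash: str
    version_stamp: str

@dataclass(frozen=True)
class Surrogate:
    name: str
    cka: float          # CKA similarity against the target model
    rationale: str      # why this surrogate was included
    transferred: int    # attacks that transferred
    attempted: int      # attacks attempted

@dataclass(frozen=True)
class AdversarialResilienceRecord:
    target: TargetModel
    r1: float                              # high-similarity threshold
    r2: float                              # low-similarity threshold
    high_similarity: Tuple[Surrogate, ...]
    low_similarity: Tuple[Surrogate, ...]
    attack_corpus: dict                    # ensemble config, budgets, RNG seed
    risk_estimate: dict                    # method, estimate, interval
    verifier_binding: dict                 # verifier version, digests

example = AdversarialResilienceRecord(
    target=TargetModel("ResNet-50", "placeholder-commitment",
                       "0x2d8f4a6b9c1e3f5a...", "2026-Q2-rc3"),
    r1=0.55,
    r2=0.35,
    high_similarity=(Surrogate("DenseNet-201", 0.587, "placeholder", 12, 48),),
    low_similarity=(Surrogate("MobileNetV2", 0.298, "placeholder", 3, 48),),
    attack_corpus={"ensemble": "AutoAttack", "seed": 0x847921A},
    risk_estimate={"estimate": 0.196, "half_width": 0.041},
    verifier_binding={"verifier": "hats-verify-cli v0.4.0"},
)
print(len(example.high_similarity), len(example.low_similarity))
```

Freezing the dataclasses mirrors the intent that a record is written once and then only read by verifiers.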

A HATS adversarial-resilience record, visualized

HATS · Adversarial Resilience Record

0x9c4a3b1e7f8a2d6b5c9e1f3a8d4b7c2e5f9a1d3b6c8e4f7a2b5d9c1e6f3a8b4d

✓ ML-DSA-65 · FALCON-512 · SLH-DSA-128f

Target model M_p

ResNet-50 · trained on ImageNet-1K · weight hash 0x2d8f4a6b9c1e3f5a... · pre-production stamp 2026-Q2-rc3

Surrogate set M_1 · CKA ≥ r_1 (0.55)

Model · CKA · Transfer rate
DenseNet-201 · 0.587 · 12 / 48 attacks
VGG-19 · 0.612 · 9 / 48 attacks
ResNet-152 · 0.694 · 18 / 48 attacks

Surrogate set M_2 · CKA ≤ r_2 (0.35)

Model · CKA · Transfer rate
MobileNetV2 · 0.298 · 3 / 48 attacks
EfficientNet-B0 · 0.331 · 5 / 48 attacks

Attack corpus

AutoAttack ensemble · APGD (loss CE + DLR) · FAB (targeted + untargeted) · Square Attack · perturbation budget ε = 8/255 (L∞) · RNG seed 0x847921a

Risk estimate

Transfer rate 0.196 ± 0.041 · Bayesian regression with CKA-weighted prior · 95% credible interval · regression model hash 0x4b3e2f...

Verifier binding

hats-verify-cli v0.4.0 · container digest sha256:7a2b5d9c... · open-source · independently runnable without H33 infrastructure

Independent notarization anchors

POLYGON · block 11,234,567 · 0xa1b2c3d4e5f6...8a9b0c1d2e3f · ✓ VERIFIED
BITCOIN · block 891,234 · 8f7e6d5c4b3a...9c8b7a6f5e4d · ✓ VERIFIED
ETHEREUM · block 19,876,543 · 0x4d3e2f1a0b9c...8e7d6f5a4b3c · ✓ VERIFIED

Illustrative. The record above is a structural example showing the HATS field set. Real records produced by the HATS verifier on production models render the same shape with real model identities, real CKA scores, and real anchor transactions.

Replay: how an auditor verifies the result years later

A regulator, auditor, or downstream consumer with only the HATS record and the open-source verifier can independently reproduce the testing decision. The replay does not require H33's infrastructure, the original testing vendor, or the AI provider's continued cooperation.

Fetch the record

Pull the HATS record from any of the chain anchors or from the customer's preservation store. Verify the three post-quantum signatures.
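The "all three signatures must verify" policy can be sketched as below. The scheme names come from the illustrative record; everything else is an assumption. In particular, the per-scheme verifier callables are hypothetical stand-ins: a real replay would bind each one to the signer's public key and to an actual post-quantum signature library, which this sketch does not implement.

```python
from typing import Callable, Mapping

# (message, signature) -> True if the signature is valid for the message.
Verifier = Callable[[bytes, bytes], bool]

REQUIRED_SCHEMES = ("ML-DSA-65", "FALCON-512", "SLH-DSA-128f")

def all_signatures_valid(
    message: bytes,
    signatures: Mapping[str, bytes],
    verifiers: Mapping[str, Verifier],
) -> bool:
    """Accept the record only if every required scheme has a valid signature."""
    for scheme in REQUIRED_SCHEMES:
        signature = signatures.get(scheme)
        verify = verifiers.get(scheme)
        if signature is None or verify is None:
            return False  # a missing signature or verifier is a failure
        if not verify(message, signature):
            return False
    return True

# Stub verifiers for demonstration only; they perform no cryptography.
stub = {name: (lambda msg, sig: sig == b"ok") for name in REQUIRED_SCHEMES}
sigs = {name: b"ok" for name in REQUIRED_SCHEMES}
print(all_signatures_valid(b"record-bytes", sigs, stub))
```

Treating a missing signature as a failure, rather than skipping it, keeps the check from silently degrading to fewer than three schemes.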

Verify surrogate inclusion

Obtain the weights for the target model and each surrogate, and confirm they match the recorded weight hashes. Then recompute CKA similarity between each surrogate and the target model. Confirm each surrogate met its threshold (M_1 ≥ r_1, M_2 ≤ r_2).
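The threshold check itself is simple arithmetic. The sketch below applies it to the scores from the illustrative record; the function name is hypothetical, and a real replay would feed in recomputed CKA values (possibly with a numerical tolerance) rather than the recorded ones.

```python
R1, R2 = 0.55, 0.35  # thresholds from the illustrative record

M1 = {"DenseNet-201": 0.587, "VGG-19": 0.612, "ResNet-152": 0.694}
M2 = {"MobileNetV2": 0.298, "EfficientNet-B0": 0.331}

def inclusion_violations(m1, m2, r1, r2):
    """Return the surrogates that do not satisfy their set's threshold."""
    violations = [name for name, cka in m1.items() if cka < r1]   # M_1 needs CKA >= r_1
    violations += [name for name, cka in m2.items() if cka > r2]  # M_2 needs CKA <= r_2
    return violations

print(inclusion_violations(M1, M2, R1, R2))  # prints []
```

An empty list means every surrogate in the example sits on the correct side of its threshold.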

Re-derive the attacks

Using the recorded RNG seed, attack parameters, and surrogate weights, deterministically regenerate the AutoAttack corpus. Confirm the attack set matches what was evaluated.
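The idea behind seeded regeneration can be shown with a toy stand-in: the same seed and parameters always give the same pseudo-random perturbations, so two parties can compare digests instead of exchanging the corpus. This is not AutoAttack, and the function name is hypothetical. Reproducing a real attack run bit-for-bit also depends on framework and hardware determinism, which is outside this sketch.

```python
import hashlib
import random
import struct

def perturbation_digest(seed: int, n_values: int, epsilon: float) -> str:
    """Digest of a seeded stream of perturbation values bounded by epsilon."""
    rng = random.Random(seed)  # private generator; global state is untouched
    digest = hashlib.sha256()
    for _ in range(n_values):
        delta = rng.uniform(-epsilon, epsilon)
        digest.update(struct.pack(">d", delta))  # fixed big-endian encoding
    return digest.hexdigest()

SEED = 0x847921A       # RNG seed from the illustrative record
EPSILON = 8 / 255      # L-infinity budget from the illustrative record

original = perturbation_digest(SEED, 1000, EPSILON)
replayed = perturbation_digest(SEED, 1000, EPSILON)
print(original == replayed)  # prints True
```

Any change to the seed, the budget, or the number of values produces a different stream and therefore a different digest.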

Reproduce the risk estimate

Run the recorded regression model with the recorded prior on the recorded transfer outcomes. Confirm the produced risk estimate (0.196 ± 0.041) is reproducible.
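One sanity check an auditor could run alongside the regression is plain arithmetic on the recorded transfer counts. In the illustrative record, the pooled transfer rate across all five surrogates matches the headline figure. This does not reproduce the Bayesian regression or the ± 0.041 interval, whose details are not given here; it only confirms that the point estimate is consistent with the counts.

```python
# Transferred attacks per surrogate, from the illustrative record.
TRANSFERS = {
    "DenseNet-201": 12,
    "VGG-19": 9,
    "ResNet-152": 18,
    "MobileNetV2": 3,
    "EfficientNet-B0": 5,
}
ATTACKS_PER_SURROGATE = 48

def pooled_transfer_rate(transfers, attacks_per_surrogate):
    """Total transferred attacks divided by total attacks attempted."""
    attempted = attacks_per_surrogate * len(transfers)
    return sum(transfers.values()) / attempted

rate = pooled_transfer_rate(TRANSFERS, ATTACKS_PER_SURROGATE)
print(round(rate, 3))  # 47 / 240, prints 0.196
```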

Check anchor consistency

Verify the chain anchors on Polygon, Bitcoin, and Ethereum reference the same record hash. Independent notarization confirms the record existed by the recorded block timestamps.
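A sketch of the consistency check, assuming the record hash committed by each anchor has already been read from the chain. The `Anchor` type and its fields are hypothetical, the anchored hash below is a made-up placeholder, and fetching or parsing on-chain data is out of scope.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass(frozen=True)
class Anchor:
    chain: str
    block: int
    anchored_hash: str  # record hash committed on this chain

def _normalize(hex_hash: str) -> str:
    """Lower-case and drop an optional 0x prefix so encodings compare equal."""
    h = hex_hash.lower()
    return h[2:] if h.startswith("0x") else h

def anchors_consistent(record_hash: str, anchors: Iterable[Anchor]) -> bool:
    """True only if at least one anchor exists and all commit to record_hash."""
    anchors = list(anchors)
    expected = _normalize(record_hash)
    return bool(anchors) and all(
        _normalize(a.anchored_hash) == expected for a in anchors
    )

record_hash = "0x" + "ab" * 32  # placeholder value, not a real record hash
anchors = [
    Anchor("polygon", 11_234_567, record_hash),
    Anchor("bitcoin", 891_234, "AB" * 32),       # same hash, different encoding
    Anchor("ethereum", 19_876_543, record_hash),
]
print(anchors_consistent(record_hash, anchors))  # prints True
```

Normalizing case and prefix avoids false mismatches between chains that render hashes differently.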

Conclude

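If every step above succeeds, the testing decision is cryptographically reproducible: the surrogate selection satisfied the framework's thresholds, the risk estimate is defensible because it can be recomputed from the recorded inputs, and the replay succeeded without operator infrastructure.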

What this changes

Traditional AI security testing

Vendor asserts they tested. Report PDF. Auditor trusts the report. No reproducibility years later. No regulator-direct verification. Vendor lock-in for evidence.

HATS replayable adversarial resilience

Cryptographic record of what was tested, against what surrogates with what CKA scores, with what attack outcomes. Regulator verifies directly. Result reproducible from the record alone for the lifetime of the chain anchors and the verifier. No operator dependency.

Regulatory mapping

EU AI Act Article 15 — accuracy, robustness, cybersecurity

Article 15 requires high-risk AI systems to be designed and developed with appropriate levels of accuracy, robustness, and cybersecurity, including resilience to attempts by unauthorized third parties to alter the use or performance through exploiting system vulnerabilities. HATS records demonstrate the robustness testing methodology, the surrogate coverage, and the produced risk estimate to the national competent authority and the AI Office.

EU AI Act Article 12 — automatic logging

Article 12 requires automatic logging of AI system activity. HATS records each adversarial test event as one log entry with structural metadata. Logging compliance becomes cryptographically verifiable. See the EU AI Act crosswalk.

NIST AI Risk Management Framework

The NIST AI RMF (AI 100-1) emphasizes test, evaluation, validation, and verification (TEVV) as a continuous lifecycle activity. HATS records make TEVV provable rather than asserted, supporting the Measure and Manage functions of the framework.

OWASP AI Exchange

The OWASP AI Exchange community advocates for measurable AI security controls. HATS records implement the Cox & Bunzel framework as a cryptographically verifiable layer that the broader community can adopt as the audit-grade extension of the methodology.

Stronger than "AI security testing." Audit-grade adversarial resilience evidence. Cryptographically reproducible. Chain-portable. Because the record can be anchored to more than one chain, it survives the testing vendor, the AI provider, and the loss of any single chain it was anchored to.

Integration shape

For AI providers and AI security testing vendors

Wrap the testing pipeline with the HATS recorder. Every surrogate selection decision, every attack run, every regression step emits a structured field that composes into the final record. Adds milliseconds to the testing pipeline. The output is one cryptographically signed record per pre-production model evaluation.
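The wrapping pattern might look like the sketch below, in which each pipeline stage writes its structured output into a recorder that later composes the final record. The `Recorder` class and its methods are hypothetical illustrations, not the HATS recorder's API.

```python
from contextlib import contextmanager

class Recorder:
    """Collects one structured field per pipeline step, in execution order."""

    def __init__(self):
        self.fields = []

    @contextmanager
    def step(self, name: str):
        data = {}
        try:
            yield data  # the pipeline stage fills this dict
        finally:
            # The field is kept even if the stage raises, so failures are visible.
            self.fields.append({"step": name, "data": data})

    def compose(self) -> dict:
        """Assemble the collected fields into one record body."""
        return {"kind": "adversarial_resilience", "fields": list(self.fields)}

recorder = Recorder()
with recorder.step("surrogate_selection") as out:
    out["r1"], out["r2"] = 0.55, 0.35
with recorder.step("attack_run") as out:
    out["seed"] = 0x847921A

record_body = recorder.compose()
print([f["step"] for f in record_body["fields"]])
```

A context manager keeps the instrumentation to one line per stage, which is one way an existing pipeline could be wrapped without restructuring it.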

For AI deployers and customers

Request HATS records as part of vendor due diligence. Run the open-source HATS verifier to confirm the testing claims independently. Anchor the records to your own preservation chain so the evidence does not depend on the vendor's continued operation.

For regulators and auditors

Verify any HATS record directly using the open-source verifier. The record reproduces the testing decision deterministically. The verification does not require the AI provider, the testing vendor, or H33 to be operational.

HATS is the audit-grade extension of the framework

The Cox & Bunzel methodology gives you bounded coverage. HATS makes the result cryptographically replayable.
