HATS · H33 AI Trust Standard · AI Security

Replayable Adversarial-Risk Evidence for AI Models

HATS records which surrogate models were selected, why they were selected, which attacks transferred, and what risk estimate was produced, in a form that can be independently verified years later. This goes beyond "AI security testing": it is audit-grade adversarial-resilience evidence.

The asserted-test problem. Today's AI adversarial testing produces a report. The vendor says "we tested." The regulator asks "how do we know?" The auditor asks "can we reproduce the result years later?" Nobody can. HATS makes the answer cryptographic, replayable, and chain-portable.
Read the canonical spec. This page describes one HATS record kind (adversarial_resilience). The full eight-layer schema, field-by-field invariants, and verification protocol are documented in HATS Record v1.

The framework underneath

The methodology HATS implements is the CKA-based surrogate selection framework proposed by Cox & Bunzel (OWASP AI Exchange / Fraunhofer SIT / ATHENE, 2025). The paper observes that exhaustive coverage of adversarial subspaces is computationally infeasible (NP-complete; high-dimensional input spaces), so resilience testing must instead deliberately span the space of surrogate models. The framework selects surrogates at both high and low Centered Kernel Alignment similarity to the target model, then evaluates transferred attack success rates across that bounded but representative set.
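
For readers unfamiliar with Centered Kernel Alignment, the following is a minimal sketch of the linear variant computed on layer activations. It assumes activations have already been extracted as `(n_samples, n_features)` matrices. Which CKA variant, which layer, and which probe inputs the framework uses are not stated on this page, so `linear_cka` and the random test matrices are illustrative only.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by self-similarity.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

# Synthetic activations (assumption: stand-ins for real model activations).
rng = np.random.default_rng(0)
acts_target = rng.normal(size=(256, 64))
# An orthogonal rotation of the same features: linear CKA should stay near 1.
acts_rotated = acts_target @ np.linalg.qr(rng.normal(size=(64, 64)))[0]
# Independent features: linear CKA should be much lower.
acts_unrelated = rng.normal(size=(256, 32))

print(linear_cka(acts_target, acts_rotated))
print(linear_cka(acts_target, acts_unrelated))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, which is why the rotated copy scores close to 1 while the unrelated matrix scores far lower.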

The framework is sound. What it does not address: how the regulator, auditor, or downstream consumer of the result independently verifies that the testing was done as claimed.

Cox & Bunzel (2025). Quantifying the Risk of Transferred Black Box Attacks. OWASP AI Exchange / Fraunhofer SIT / TU-Darmstadt / ATHENE. Proposes CKA-similarity-thresholded surrogate selection (r1 ≈ 0.55 high similarity; r2 ≈ 0.35 low similarity), AutoAttack ensemble (APGD + FAB + Square), and regression-based risk estimation.

What HATS adds

HATS produces a cryptographically signed record of every decision the framework requires, plus the inputs, the outputs, and the verifier configuration. The record is anchored to one or more public chains for independent notarization, and the entire testing decision is reproducible by anyone with the open-source HATS verifier.

What gets recorded

A HATS adversarial-resilience record, visualized

HATS · Adversarial Resilience Record
0x9c4a3b1e7f8a2d6b5c9e1f3a8d4b7c2e5f9a1d3b6c8e4f7a2b5d9c1e6f3a8b4d
✓ ML-DSA-65 · FALCON-512 · SLH-DSA-128f
Target model M_p
ResNet-50 · trained on ImageNet-1K · weight hash 0x2d8f4a6b9c1e3f5a... · pre-production stamp 2026-Q2-rc3
Surrogate set M_1 · CKA ≥ r_1 (0.55)
Model · CKA · Transfer rate
DenseNet-201 · 0.587 · 12 / 48 attacks
VGG-19 · 0.612 · 9 / 48 attacks
ResNet-152 · 0.694 · 18 / 48 attacks
Surrogate set M_2 · CKA ≤ r_2 (0.35)
Model · CKA · Transfer rate
MobileNetV2 · 0.298 · 3 / 48 attacks
EfficientNet-B0 · 0.331 · 5 / 48 attacks
Attack corpus
AutoAttack ensemble · APGD (loss CE + DLR) · FAB (targeted + untargeted) · Square Attack · perturbation budget ε = 8/255 (L∞) · RNG seed 0x847921a
Risk estimate
Transfer rate 0.196 ± 0.041 · Bayesian regression with CKA-weighted prior · 95% credible interval · regression model hash 0x4b3e2f...
Verifier binding
hats-verify-cli v0.4.0 · container digest sha256:7a2b5d9c... · open-source · independently runnable without H33 infrastructure
Independent notarization anchors
POLYGON · block 11,234,567 · 0xa1b2c3d4e5f6...8a9b0c1d2e3f · ✓ VERIFIED
BITCOIN · block 891,234 · 8f7e6d5c4b3a...9c8b7a6f5e4d · ✓ VERIFIED
ETHEREUM · block 19,876,543 · 0x4d3e2f1a0b9c...8e7d6f5a4b3c · ✓ VERIFIED
Illustrative. The record above is a structural example showing the HATS field set. Real records produced by the HATS verifier on production models render the same shape with real model identities, real CKA scores, and real anchor transactions.

Replay: how an auditor verifies the result years later

A regulator, auditor, or downstream consumer with only the HATS record and the open-source verifier can independently reproduce the testing decision. The replay does not require H33's infrastructure, the original testing vendor, or the AI provider's continued cooperation.

01 · Fetch the record

Pull the HATS record from any of the chain anchors or from the customer's preservation store. Verify the three post-quantum signatures.

02 · Verify surrogate inclusion

Obtain the target and surrogate weights and confirm that they match the recorded weight hashes. Recompute CKA similarity between each surrogate and the target model, then confirm that each surrogate met its threshold (M_1 ≥ r_1, M_2 ≤ r_2).
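
The threshold check in this step can be sketched as follows, using the CKA scores and thresholds from the illustrative record above. In a real replay the scores would be recomputed from the model weights, not read back from the record. The dictionary layout and the function name `threshold_violations` are assumptions made for this example, not part of the HATS schema.

```python
R1_HIGH = 0.55  # r_1: minimum CKA for the high-similarity set M_1
R2_LOW = 0.35   # r_2: maximum CKA for the low-similarity set M_2

# CKA scores copied from the illustrative record (hypothetical layout).
recorded_cka = {
    "M_1": {"DenseNet-201": 0.587, "VGG-19": 0.612, "ResNet-152": 0.694},
    "M_2": {"MobileNetV2": 0.298, "EfficientNet-B0": 0.331},
}

def threshold_violations(cka_by_set, r1=R1_HIGH, r2=R2_LOW):
    """Return (model, score, reason) for every surrogate outside its threshold."""
    violations = []
    for name, score in cka_by_set["M_1"].items():
        if score < r1:
            violations.append((name, score, f"below r_1={r1}"))
    for name, score in cka_by_set["M_2"].items():
        if score > r2:
            violations.append((name, score, f"above r_2={r2}"))
    return violations

violations = threshold_violations(recorded_cka)
print(violations or "all surrogates meet their thresholds")
```

An empty list means every surrogate sits on the correct side of its threshold; any entry identifies the surrogate that should not have been included.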

03 · Re-derive the attacks

Using the recorded RNG seed, attack parameters, and surrogate weights, deterministically regenerate the AutoAttack corpus. Confirm the attack set matches what was evaluated.
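
The property this step relies on is that a fixed seed yields an identical random stream. The sketch below is not AutoAttack: it draws stand-in sign perturbations at the recorded budget (ε = 8/255) purely to show how a replayed draw can be compared to the original by digest. The function names and tensor shape are assumptions for illustration.

```python
import hashlib
import numpy as np

RECORDED_SEED = 0x847921A  # RNG seed from the illustrative record
EPSILON = 8 / 255          # L-infinity perturbation budget from the record

def draw_perturbations(seed: int, n: int, shape: tuple) -> np.ndarray:
    """Draw n random +/-EPSILON perturbations (a stand-in, not a real attack)."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n, *shape))
    return signs * EPSILON

def digest(array: np.ndarray) -> str:
    """SHA-256 over the raw bytes, so two draws can be compared cheaply."""
    return hashlib.sha256(np.ascontiguousarray(array).tobytes()).hexdigest()

original = draw_perturbations(RECORDED_SEED, n=48, shape=(3, 8, 8))
replayed = draw_perturbations(RECORDED_SEED, n=48, shape=(3, 8, 8))
other = draw_perturbations(RECORDED_SEED + 1, n=48, shape=(3, 8, 8))

print(digest(original) == digest(replayed))  # same seed, same stream
print(digest(original) == digest(other))     # different seed, different stream
```

A seed alone does not guarantee identical results across library versions or hardware, which is presumably one reason the record also binds a specific verifier version and container digest.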

04 · Reproduce the risk estimate

Run the recorded regression model with the recorded prior on the recorded transfer outcomes. Confirm that the output matches the recorded risk estimate (0.196 ± 0.041).
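
As a sanity check on the illustrative record, the pooled transfer rate across all five surrogates can be computed directly from the recorded counts. This is plain arithmetic, not the recorded Bayesian regression: the ± 0.041 interval depends on the regression model and prior, which this page does not specify, so the sketch does not attempt to reproduce it.

```python
from fractions import Fraction

# (transferred, attempted) per surrogate, copied from the illustrative record.
outcomes = {
    "DenseNet-201": (12, 48),
    "VGG-19": (9, 48),
    "ResNet-152": (18, 48),
    "MobileNetV2": (3, 48),
    "EfficientNet-B0": (5, 48),
}

transferred = sum(hit for hit, _ in outcomes.values())
attempted = sum(total for _, total in outcomes.values())
pooled_rate = Fraction(transferred, attempted)

print(transferred, attempted, round(float(pooled_rate), 3))  # 47 240 0.196
```

The pooled rate of 47 / 240 rounds to the same 0.196 shown as the record's point estimate, so the example counts and the example estimate are at least arithmetically consistent.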

05 · Check anchor consistency

Verify that the chain anchors on Polygon, Bitcoin, and Ethereum all reference the same record hash. Independent notarization then confirms that the record existed no later than the recorded block timestamps.
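
A minimal sketch of the consistency check, assuming the hash committed in each anchor transaction has already been fetched from its chain by some other means. The `Anchor` type and `anchors_consistent` function are hypothetical names for this example, and the anchor data simply reuses the record hash and block numbers from the illustrative record.

```python
from dataclasses import dataclass

RECORD_HASH = "0x9c4a3b1e7f8a2d6b5c9e1f3a8d4b7c2e5f9a1d3b6c8e4f7a2b5d9c1e6f3a8b4d"

@dataclass(frozen=True)
class Anchor:
    chain: str
    block: int
    record_hash: str  # hash committed in the anchor transaction (assumed pre-fetched)

def normalize(hex_hash: str) -> str:
    """Lowercase and drop an optional 0x prefix so encodings compare equal."""
    hex_hash = hex_hash.lower()
    return hex_hash[2:] if hex_hash.startswith("0x") else hex_hash

def anchors_consistent(expected_hash: str, anchors: list) -> bool:
    """True only if there is at least one anchor and all commit to expected_hash."""
    expected = normalize(expected_hash)
    return bool(anchors) and all(normalize(a.record_hash) == expected for a in anchors)

anchors = [
    Anchor("POLYGON", 11_234_567, RECORD_HASH),
    Anchor("BITCOIN", 891_234, RECORD_HASH[2:]),        # same hash, no 0x prefix
    Anchor("ETHEREUM", 19_876_543, RECORD_HASH.upper().replace("0X", "0x")),
]
print(anchors_consistent(RECORD_HASH, anchors))
```

Treating an empty anchor list as a failure is a deliberate choice here: a record with no anchors has no independent notarization to check.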

06 · Conclude

If every step passes, the testing decision is cryptographically reproducible, the surrogate selection was sound under the framework, and the risk estimate is defensible. The replay succeeds without operator infrastructure.

What this changes

Traditional AI security testing
Vendor asserts they tested. Report PDF. Auditor trusts the report. No reproducibility years later. No regulator-direct verification. Vendor lock-in for evidence.
HATS replayable adversarial resilience
Cryptographic record of what was tested, against what surrogates with what CKA scores, with what attack outcomes. Regulator verifies directly. Result reproducible from the record alone for the lifetime of the chain anchors and the verifier. No operator dependency.

Regulatory mapping

EU AI Act Article 15 — accuracy, robustness, cybersecurity

Article 15 requires high-risk AI systems to be designed and developed to achieve an appropriate level of accuracy, robustness, and cybersecurity, including resilience against attempts by unauthorized third parties to alter their use, outputs, or performance by exploiting system vulnerabilities. HATS records document the robustness testing methodology, the surrogate coverage, and the resulting risk estimate for the national competent authority and the AI Office.

EU AI Act Article 12 — automatic logging

Article 12 requires high-risk AI systems to technically allow for the automatic recording of events (logs) over the lifetime of the system. HATS records each adversarial test event as one log entry with structural metadata, so logging compliance becomes cryptographically verifiable. See the EU AI Act crosswalk.

NIST AI Risk Management Framework

The NIST AI RMF (AI 100-1) emphasizes test, evaluation, validation, and verification (TEVV) as a continuous lifecycle activity. HATS records make TEVV provable rather than asserted, supporting the Measure and Manage functions of the framework.

OWASP AI Exchange

The OWASP AI Exchange community advocates for measurable AI security controls. HATS records implement the Cox & Bunzel framework as a cryptographically verifiable layer that the broader community can adopt as the audit-grade extension of the methodology.

Stronger than "AI security testing." Audit-grade adversarial resilience evidence. Cryptographically reproducible. Chain-portable. Survives the testing vendor, the AI provider, and, when anchored to more than one chain, the loss of any single chain it was anchored to.

Integration shape

For AI providers and AI security testing vendors

Wrap the testing pipeline with the HATS recorder. Every surrogate selection decision, every attack run, and every regression step emits a structured field that composes into the final record. Recording adds milliseconds to the testing pipeline. The output is one cryptographically signed record per pre-production model evaluation.
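
The shape of such a wrapper might look like the sketch below. This is not the HATS recorder API: the `Recorder` class, its method names, the field names, and the use of SHA-256 over canonical JSON are all assumptions made for illustration, and the signing and anchoring steps are omitted. The canonical schema is defined in HATS Record v1.

```python
import hashlib
import json

class Recorder:
    """Hypothetical stand-in for a pipeline recorder; not the real HATS API."""

    def __init__(self):
        self.fields = {}

    def emit(self, name: str, value) -> None:
        """Store one structured field; refuse silent overwrites."""
        if name in self.fields:
            raise ValueError(f"field already recorded: {name}")
        self.fields[name] = value

    def finalize(self) -> tuple:
        """Return (canonical_json, sha256_hex) for the collected fields."""
        # Sorted keys and fixed separators make the bytes independent of emit order.
        canonical = json.dumps(self.fields, sort_keys=True, separators=(",", ":"))
        return canonical, hashlib.sha256(canonical.encode("utf-8")).hexdigest()

recorder = Recorder()
recorder.emit("surrogate.ResNet-152.cka", 0.694)           # surrogate selection decision
recorder.emit("attack.ResNet-152.transferred", [18, 48])   # attack run outcome
recorder.emit("attack.rng_seed", "0x847921a")              # attack parameter
canonical_json, record_digest = recorder.finalize()
print(record_digest)
```

Canonicalizing before hashing means two pipelines that emit the same fields in a different order still produce the same digest, which is the property a replaying verifier needs.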

For AI deployers and customers

Request HATS records as part of vendor due diligence. Run the open-source HATS verifier to confirm the testing claims independently. Anchor the records to your own preservation chain so the evidence does not depend on the vendor's continued operation.

For regulators and auditors

Verify any HATS record directly using the open-source verifier. The record reproduces the testing decision deterministically. The verification does not require the AI provider, the testing vendor, or H33 to be operational.

HATS is the audit-grade extension of the framework

The Cox & Bunzel methodology gives you bounded coverage. HATS makes the result cryptographically replayable.
