Why We Built a Second STARK Engine
H33's production authentication pipeline runs on a custom STARK engine optimized for short, fixed-width execution traces — 32-user biometric verification batches processed at 2.17 million authentications per second on AWS Graviton4. That engine is tuned for a specific job: prove that an FHE inner product was computed correctly, attest it with a Dilithium ML-DSA-65 signature, and do it all in under 39 microseconds per auth.
But the next generation of H33 products — starting with H33-ZK-Procure — requires a fundamentally different proof geometry. Instead of short traces with narrow columns, procurement evaluation needs long, variable-width traces: thousands of rows representing file-by-file analysis across an entire codebase, with wide columns tracking multiple scoring dimensions simultaneously.
Extending the production engine was the obvious path. We rejected it for three reasons:
1. No production risk. The auth STARK processes billions of verifications per month. Modifying its constraint system to accommodate variable-width traces introduces regression risk we refused to accept.
2. Different circuit geometry. Auth traces are fixed at 32 rows with lookup tables. Evaluation traces run to 65,536+ rows with branching transition logic. The AIR (Algebraic Intermediate Representation) for these two workloads shares almost no structure.
3. Independent release cycle. A second engine ships on its own timeline, with its own benchmarks and its own security review. No coupling, no coordination overhead.
The result is H33-ZKP-AIR: a standalone STARK proving system built on the Goldilocks prime field (p = 2^64 − 2^32 + 1), SHA3-256 Merkle commitments, and FRI polynomial proofs. No trusted setup. Post-quantum secure. And now, production-hardened against the Fiat-Shamir class of vulnerabilities.
The Bug That Burned Six Vendors
In March 2026, OtterSec published findings showing that six independent zkVM implementations contained critical flaws in their Fiat-Shamir transforms. The Fiat-Shamir heuristic is how you convert an interactive zero-knowledge proof into a non-interactive one: instead of a live verifier sending random challenges, the prover hashes the protocol transcript and derives challenges deterministically.
The bug pattern across all six implementations was the same: incomplete transcript binding. Not all prior commitments were included in the hash input before deriving challenges. This gap allows a malicious prover to manipulate the challenge value and construct a "proof" that the verifier accepts — for a statement that is mathematically false.
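To make the failure mode concrete, here is a minimal, hypothetical sketch of the pattern (not any vendor's actual code): a challenge derived from an incomplete transcript is independent of the omitted commitment, so a malicious prover can swap that commitment after seeing the challenge. All names and byte strings below are illustrative.

```python
import hashlib

def challenge(*transcript_parts: bytes) -> int:
    """Derive a Fiat-Shamir challenge by hashing the given transcript parts."""
    h = hashlib.sha3_256()
    for part in transcript_parts:
        h.update(len(part).to_bytes(8, "big") + part)  # length-prefixed
    return int.from_bytes(h.digest(), "big")

public_inputs = b"statement"
trace_commitment = b"merkle-root-of-trace"
constraint_commitment = b"merkle-root-of-constraints"

# BROKEN: the constraint commitment is never absorbed, so the challenge
# does not depend on it -- the prover may replace it after the fact.
broken = challenge(public_inputs, trace_commitment)

# CORRECT: every commitment made so far is absorbed before the challenge.
correct = challenge(public_inputs, trace_commitment, constraint_commitment)

# Swapping the unbound commitment leaves the broken challenge unchanged,
# while the correctly bound challenge changes with any forged commitment.
assert broken == challenge(public_inputs, trace_commitment)
assert correct != challenge(public_inputs, trace_commitment, b"forged")
```

The essence of the OtterSec finding is the `broken` line: nothing in the protocol forces the omitted commitment into the hash, and the verifier has no way to notice.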
This is not a theoretical concern. A working exploit means a prover can claim they executed a computation correctly when they did not. In a ZK-rollup context, this means fabricating state transitions. In an authentication context, it means forging identity proofs. In a procurement context — our context — it means faking a vendor's code quality score.
The fact that six independent teams made the same mistake tells you something important: this is an easy error to introduce and a hard one to catch without a deliberate audit.
What We Audited
We audited both H33 STARK engines against six specific criteria derived from the OtterSec findings:
| Check | What It Means |
|---|---|
| Transcript Completeness | Every commitment (trace, constraint, FRI layer) is absorbed into the hash before the dependent challenge is derived |
| No Mutable State Injection | No data can be modified between a commitment and its corresponding challenge |
| Domain Separation | Each proof type and each protocol step uses a unique prefix, preventing cross-context replay |
| Length-Extension Resistance | Hash inputs are length-prefixed and labeled, preventing concatenation collisions |
| Challenge Freshness | Each protocol step derives a new challenge from the full transcript up to that point |
| Prover/Verifier Symmetry | The verifier reconstructs the exact same transcript as the prover — any divergence is a bug |
Production STARK: Results
Our production STARK engine (the one powering 2.17M auth/sec) is not affected by the OtterSec pattern.
The Fiat-Shamir transcript implementation follows a strict absorb-then-squeeze discipline. Public inputs are absorbed first. The trace commitment is absorbed before constraint composition coefficients are derived. The constraint commitment is absorbed before the out-of-domain challenge. OOD evaluations are absorbed before FRI. The full chain of dependencies is correct, and the verifier mirrors the prover's transcript step-for-step on the production code path.
SHA3-256 (Keccak sponge construction) provides native resistance to length-extension attacks. Every absorb operation is tagged with a label and a length prefix. Every squeeze operation increments a monotonic counter. Domain separation is enforced at the protocol level (`STARK_PROOF_TRANSCRIPT` vs `FRI_COMMITMENT_TRANSCRIPT`) and at the operation level (`APPEND:` vs `CHALLENGE:`).
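The discipline described above can be sketched as a small transcript object. This is an illustrative Python model, not the production Rust implementation; the label scheme and byte layout are assumptions chosen to mirror the properties listed (domain separation, labeled length-prefixed absorbs, a monotonic squeeze counter, and prover/verifier symmetry).

```python
import hashlib

class Transcript:
    def __init__(self, domain: bytes):  # e.g. b"STARK_PROOF_TRANSCRIPT"
        self._state = domain            # domain separation per proof type
        self._squeezes = 0              # monotonic squeeze counter

    def absorb(self, label: bytes, data: bytes) -> None:
        # Every absorb is labeled and length-prefixed, so distinct input
        # sequences can never collide by concatenation.
        self._state += b"APPEND:" + label + len(data).to_bytes(8, "big") + data

    def challenge(self, label: bytes) -> bytes:
        # Each challenge hashes the full transcript so far plus the counter,
        # so every protocol step gets a fresh, fully bound challenge.
        h = hashlib.sha3_256()
        h.update(self._state)
        h.update(b"CHALLENGE:" + label + self._squeezes.to_bytes(8, "big"))
        self._squeezes += 1
        digest = h.digest()
        self._state += digest  # bind the challenge itself into the transcript
        return digest

# Symmetry check: prover and verifier run the identical sequence of
# operations; any divergence in the derived challenges is a bug.
prover = Transcript(b"STARK_PROOF_TRANSCRIPT")
verifier = Transcript(b"STARK_PROOF_TRANSCRIPT")
for t in (prover, verifier):
    t.absorb(b"public-inputs", b"statement")
    t.absorb(b"trace-commitment", b"merkle-root")
assert prover.challenge(b"composition-coeffs") == verifier.challenge(b"composition-coeffs")
```

Note how the transcript absorbs each challenge back into its state: later challenges then depend on earlier ones, which is exactly the "full transcript up to that point" property in the audit checklist.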
The OtterSec pattern was absent, but the audit was not clean. We identified two issues unrelated to transcript completeness:

Issue 1 (Critical API bug): A secondary verification code path skipped the constraint coefficient challenge derivation, causing the verifier's transcript to diverge from the prover's. The production code path was unaffected; this only impacted a convenience API that is not called in production. We removed the broken path and made the correct path the only path.

Issue 2 (FRI query binding): The FRI verifier accepted prover-supplied query positions without independently regenerating them from the Fiat-Shamir transcript. A sophisticated attacker could theoretically choose advantageous query positions to weaken FRI soundness. We added transcript-derived position regeneration with strict matching: any mismatch between claimed and derived positions now causes immediate rejection.
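A minimal sketch of the Issue 2 fix, under assumed parameters (the expansion scheme, query count, and domain size here are illustrative, not the engine's actual ones): the verifier re-derives the query positions from the transcript and rejects any claimed set that diverges.

```python
import hashlib

def derive_query_positions(transcript: bytes, num_queries: int, domain_size: int):
    """Deterministically expand the transcript into distinct query positions."""
    positions = []
    counter = 0
    while len(positions) < num_queries:
        h = hashlib.sha3_256(transcript + counter.to_bytes(8, "big")).digest()
        pos = int.from_bytes(h[:8], "big") % domain_size
        if pos not in positions:  # keep positions distinct
            positions.append(pos)
        counter += 1
    return positions

def verify_positions(transcript, claimed, num_queries, domain_size):
    # Strict matching: the verifier never trusts prover-supplied positions.
    derived = derive_query_positions(transcript, num_queries, domain_size)
    if claimed != derived:
        raise ValueError("FRI query positions diverge from transcript")
    return derived

transcript = b"all commitments and challenges absorbed so far"
honest = derive_query_positions(transcript, 4, 1 << 16)
assert verify_positions(transcript, honest, 4, 1 << 16) == honest

# Prover-chosen (potentially advantageous) positions are rejected outright:
forged = list(reversed(honest))
try:
    verify_positions(transcript, forged, 4, 1 << 16)
    raise AssertionError("forged positions accepted")
except ValueError:
    pass
```

The key design point is that the positions are a pure function of the transcript, so the prover commits to all FRI layers before learning where it will be queried.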
Both issues were fixed within hours of discovery. Neither was exploitable in our production deployment, but both represented real gaps that could have become exploitable as the codebase evolved. We do not wait for exploitability to fix vulnerabilities.
H33-ZKP-AIR: Architecture
The new engine is designed for a different class of computation. Where the production STARK handles fixed 32-row traces in microseconds, ZKP-AIR handles variable traces up to hundreds of thousands of rows — the scale needed for codebase analysis, compliance evaluation, and procurement scoring.
| Property | Auth STARK (Production) | ZKP-AIR (New) |
|---|---|---|
| Field | BLS12-381 scalar | Goldilocks (2^64 − 2^32 + 1) |
| Trace Width | Fixed, narrow (7 columns) | Variable, wide (3–32+ columns) |
| Trace Length | 32 rows | 256 – 65,536+ rows |
| Hash | Poseidon + SHA3-256 | SHA3-256 (domain-separated) |
| Polynomial Commitment | FRI (custom) | FRI (Winterfell 0.13) |
| Trusted Setup | None | None |
| Post-Quantum | Yes (hash-based) | Yes (hash-based) |
The Goldilocks field was chosen for native u64 arithmetic on ARM (Graviton4) and x86 without requiring big-integer libraries. The special form of the prime (2^64 − 2^32 + 1) allows modular reduction using only shifts, adds, and subtracts: no division in the hot path. Our benchmarks show 3 nanoseconds per field multiplication on Apple M4 Max, with NTT transforms completing in 2.2 milliseconds at 65,536 elements.
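The reduction trick rests on two congruences: 2^64 ≡ 2^32 − 1 and 2^96 ≡ −1 (mod p). A 128-bit product therefore folds into the field with shifts, adds, and subtracts. This sketch is written in Python for readability (the engine itself would use u64/u128 arithmetic with conditional corrections instead of `%`):

```python
P = 2**64 - 2**32 + 1          # the Goldilocks prime
MASK32 = (1 << 32) - 1

def goldilocks_mul(a: int, b: int) -> int:
    x = a * b                  # up to 128 bits
    lo = x & ((1 << 64) - 1)   # low 64 bits
    hi = x >> 64               # high 64 bits
    h0 = hi & MASK32           # bits 64..95:  weight 2^64 ≡ 2^32 − 1 (mod p)
    h1 = hi >> 32              # bits 96..127: weight 2^96 ≡ −1      (mod p)
    t = lo + h0 * MASK32 - h1  # fold: lo + h0*(2^32 − 1) − h1
    return t % P               # a real impl uses conditional add/sub of P here

# The fold agrees with ordinary modular multiplication:
assert goldilocks_mul(2**63, 2**63) == (2**126) % P
```

Because `h0 * MASK32` is itself a shift and a subtract (`(h0 << 32) - h0`), the whole reduction avoids division, which is what makes the 3 ns multiplication figure plausible on commodity cores.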
For the proving engine itself, we integrated Winterfell (Facebook Research), a production-grade STARK library that has been through multiple independent audits. Winterfell handles the FRI commitment scheme, the prover pipeline, and the verification logic. Our custom Goldilocks field, Merkle trees, and transcript implementation serve as the utility layer, while Winterfell provides the soundness-critical core.
Early Benchmarks
These numbers are from our local development environment (Apple M4 Max, single core, `--release` mode). Graviton4 numbers will follow once we complete the production deployment.
| Trace Rows | Prove | Verify | Proof Size |
|---|---|---|---|
| 256 | 1.9 ms | 141 µs | 18 KB |
| 1,024 | 6.7 ms | 201 µs | 31 KB |
| 4,096 | 31 ms | 302 µs | 43 KB |
| 16,384 | 141 ms | 337 µs | 53 KB |
| 65,536 | 610 ms | 429 µs | 68 KB |
Verification is sub-millisecond at every trace size. This matters because the verification path runs in the browser (or on the procurement team's machine) — it needs to be instant.
What This Means for H33 Customers
ZKP-AIR powers the next wave of H33 products. H33-ZK-Procure is the first: cryptographic procurement intelligence where vendors get graded without exposing source code. The STARK proves the evaluation was executed correctly. The Dilithium signature seals it with post-quantum certainty.
More broadly, ZKP-AIR is a general-purpose proving system for any computation where you need to say "this function ran on this data and produced this output" with cryptographic certainty. Audit trails. Supply chain attestation. Compliance proofs. Regulatory reporting where the underlying data is sensitive but the conclusion must be verifiable.
ZKP-AIR runs on H33's standardized credit system. Same packs, same dashboard, same API. Credits are fungible across all H33 products — use them for FHE biometrics, ZK lookups, procurement assessments, or any combination.
The Transparency Principle
Six teams shipped broken Fiat-Shamir implementations. None of them caught it internally. The bugs were found by a third-party auditor. That tells you everything you need to know about the limits of internal review.
We are not immune to this. The two issues we found in our own codebase were not the OtterSec pattern, but they were real — and they existed in code that had passed our internal review process. The difference is not that we write perfect code. The difference is that we audit aggressively, fix immediately, and publish the results.
If you are evaluating ZK infrastructure — from any vendor, including us — ask three questions:
1. Show me your Fiat-Shamir transcript. Is the entire protocol transcript absorbed before each challenge? Is the hash domain-separated per proof type? Is the verifier's transcript reconstruction identical to the prover's?
2. Show me your FRI verifier. Does it independently derive query positions from the transcript, or does it trust positions supplied by the prover? Does it verify folding consistency between layers?
3. When was your last audit, and what did it find? If the answer is "nothing," that means they did not look hard enough.
We publish our audit findings because trust in cryptographic infrastructure is not built on claims. It is built on evidence. The math is the trust layer. But the math only works if the implementation is correct — and the only way to know the implementation is correct is to look.
H33-ZKP-AIR is available now in the H33 platform. Explore the product page, read the API documentation, or get started with a free API key.