Why We Built a Second STARK Engine
H33's production authentication pipeline runs on a custom STARK engine optimized for short, fixed-width execution traces — 32-user biometric verification batches processed at 2.17 million authentications per second on AWS Graviton4. That engine is tuned for a specific job: prove that an FHE inner product was computed correctly, attest it with a Dilithium ML-DSA-65 signature, and do it all in under 39 microseconds per auth.
But the next generation of H33 products — starting with H33-ZK-Procure — requires a fundamentally different proof geometry. Instead of short traces with narrow columns, procurement evaluation needs long, variable-width traces: thousands of rows representing file-by-file analysis across an entire codebase, with wide columns tracking multiple scoring dimensions simultaneously.
Extending the production engine was the obvious path. We rejected it for three reasons:
1. No production risk. The auth STARK processes billions of verifications per month. Modifying its constraint system to accommodate variable-width traces introduces regression risk we refused to accept.
2. Different circuit geometry. Auth traces are fixed at 32 rows with lookup tables. Evaluation traces run to 65,536+ rows with branching transition logic. The AIR (Algebraic Intermediate Representation) for these two workloads shares almost no structure.
3. Independent release cycle. A second engine ships on its own timeline, with its own benchmarks and its own security review. No coupling, no coordination overhead.
The result is H33-ZKP-AIR: a standalone STARK proving system built on the Goldilocks prime field (p = 2^64 − 2^32 + 1), SHA3-256 Merkle commitments, and FRI polynomial proofs. No trusted setup. Post-quantum secure. And now, production-hardened against the Fiat-Shamir class of vulnerabilities.
The Bug That Burned Six Vendors
In March 2026, OtterSec published findings showing that six independent zkVM implementations contained critical flaws in their Fiat-Shamir transforms. The Fiat-Shamir heuristic is how you convert an interactive zero-knowledge proof into a non-interactive one: instead of a live verifier sending random challenges, the prover hashes the protocol transcript and derives challenges deterministically.
The bug pattern across all six implementations was the same: incomplete transcript binding. Not all prior commitments were included in the hash input before deriving challenges. This gap allows a malicious prover to manipulate the challenge value and construct a "proof" that the verifier accepts — for a statement that is mathematically false.
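To make the failure mode concrete, here is a minimal, hypothetical sketch of the pattern (not any vendor's actual code): a challenge derived from an incomplete transcript is independent of the omitted commitment, so a malicious prover can swap that commitment after seeing the challenge. All names and byte strings below are illustrative.

```python
import hashlib

def challenge(*transcript_parts: bytes) -> int:
    """Derive a Fiat-Shamir challenge by hashing the given transcript parts."""
    h = hashlib.sha3_256()
    for part in transcript_parts:
        h.update(len(part).to_bytes(8, "big") + part)  # length-prefixed
    return int.from_bytes(h.digest(), "big")

public_inputs = b"statement"
trace_commitment = b"merkle-root-of-trace"
constraint_commitment = b"merkle-root-of-constraints"

# BROKEN: the constraint commitment is never absorbed, so the challenge
# does not depend on it -- the prover may replace it after the fact.
broken = challenge(public_inputs, trace_commitment)

# CORRECT: every commitment made so far is absorbed before the challenge.
correct = challenge(public_inputs, trace_commitment, constraint_commitment)

# Swapping the unbound commitment leaves the broken challenge unchanged,
# while the correctly bound challenge changes with any forged commitment.
assert broken == challenge(public_inputs, trace_commitment)
assert correct != challenge(public_inputs, trace_commitment, b"forged")
```

The essence of the OtterSec finding is the `broken` line: nothing in the protocol forces the omitted commitment into the hash, and the verifier has no way to notice.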
This is not a theoretical concern. A working exploit means a prover can claim they executed a computation correctly when they did not. In a ZK-rollup context, this means fabricating state transitions. In an authentication context, it means forging identity proofs. In a procurement context — our context — it means faking a vendor's code quality score.
The fact that six independent teams made the same mistake tells you something important: this is an easy error to introduce and a hard one to catch without a deliberate audit.
What We Audited
We audited both H33 STARK engines against six specific criteria derived from the OtterSec findings:
| Check | What It Means |
|---|---|
| Transcript Completeness | Every commitment (trace, constraint, FRI layer) is absorbed into the hash before the dependent challenge is derived |
| No Mutable State Injection | No data can be modified between a commitment and its corresponding challenge |
| Domain Separation | Each proof type and each protocol step uses a unique prefix, preventing cross-context replay |
| Length-Extension Resistance | Hash inputs are length-prefixed and labeled, preventing concatenation collisions |
| Challenge Freshness | Each protocol step derives a new challenge from the full transcript up to that point |
| Prover/Verifier Symmetry | The verifier reconstructs the exact same transcript as the prover — any divergence is a bug |
Production STARK: Results
Our production STARK engine (the one powering 2.17M auth/sec) is not affected by the OtterSec pattern.
The Fiat-Shamir transcript implementation follows a strict absorb-then-squeeze discipline. Public inputs are absorbed first. The trace commitment is absorbed before constraint composition coefficients are derived. The constraint commitment is absorbed before the out-of-domain challenge. OOD evaluations are absorbed before FRI. The full chain of dependencies is correct, and the verifier mirrors the prover's transcript step-for-step on the production code path.
SHA3-256 (Keccak sponge construction) provides native resistance to length-extension attacks. Every absorb operation is tagged with a label and a length prefix. Every squeeze operation increments a monotonic counter. Domain separation is enforced at the protocol level (`STARK_PROOF_TRANSCRIPT` vs `FRI_COMMITMENT_TRANSCRIPT`) and at the operation level (`APPEND:` vs `CHALLENGE:`).
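The discipline described above can be sketched as a small transcript object. This is an illustrative Python model, not the production Rust implementation; the label scheme and byte layout are assumptions chosen to mirror the properties listed (domain separation, labeled length-prefixed absorbs, a monotonic squeeze counter, and prover/verifier symmetry).

```python
import hashlib

class Transcript:
    def __init__(self, domain: bytes):  # e.g. b"STARK_PROOF_TRANSCRIPT"
        self._state = domain            # domain separation per proof type
        self._squeezes = 0              # monotonic squeeze counter

    def absorb(self, label: bytes, data: bytes) -> None:
        # Every absorb is labeled and length-prefixed, so distinct input
        # sequences can never collide by concatenation.
        self._state += b"APPEND:" + label + len(data).to_bytes(8, "big") + data

    def challenge(self, label: bytes) -> bytes:
        # Each challenge hashes the full transcript so far plus the counter,
        # so every protocol step gets a fresh, fully bound challenge.
        h = hashlib.sha3_256()
        h.update(self._state)
        h.update(b"CHALLENGE:" + label + self._squeezes.to_bytes(8, "big"))
        self._squeezes += 1
        digest = h.digest()
        self._state += digest  # bind the challenge itself into the transcript
        return digest

# Symmetry check: prover and verifier run the identical sequence of
# operations; any divergence in the derived challenges is a bug.
prover = Transcript(b"STARK_PROOF_TRANSCRIPT")
verifier = Transcript(b"STARK_PROOF_TRANSCRIPT")
for t in (prover, verifier):
    t.absorb(b"public-inputs", b"statement")
    t.absorb(b"trace-commitment", b"merkle-root")
assert prover.challenge(b"composition-coeffs") == verifier.challenge(b"composition-coeffs")
```

Note how the transcript absorbs each challenge back into its state: later challenges then depend on earlier ones, which is exactly the "full transcript up to that point" property in the audit checklist.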
The OtterSec pattern was absent, but the audit was not clean. We identified two issues unrelated to transcript completeness:

Issue 1 (Critical API bug): A secondary verification code path skipped the constraint coefficient challenge derivation, causing the verifier's transcript to diverge from the prover's. The production code path was unaffected; this only impacted a convenience API that is not called in production. We removed the broken path and made the correct path the only path.

Issue 2 (FRI query binding): The FRI verifier accepted prover-supplied query positions without independently regenerating them from the Fiat-Shamir transcript. A sophisticated attacker could theoretically choose advantageous query positions to weaken FRI soundness. We added transcript-derived position regeneration with strict matching: any mismatch between claimed and derived positions now causes immediate rejection.
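A minimal sketch of the Issue 2 fix, under assumed parameters (the expansion scheme, query count, and domain size here are illustrative, not the engine's actual ones): the verifier re-derives the query positions from the transcript and rejects any claimed set that diverges.

```python
import hashlib

def derive_query_positions(transcript: bytes, num_queries: int, domain_size: int):
    """Deterministically expand the transcript into distinct query positions."""
    positions = []
    counter = 0
    while len(positions) < num_queries:
        h = hashlib.sha3_256(transcript + counter.to_bytes(8, "big")).digest()
        pos = int.from_bytes(h[:8], "big") % domain_size
        if pos not in positions:  # keep positions distinct
            positions.append(pos)
        counter += 1
    return positions

def verify_positions(transcript, claimed, num_queries, domain_size):
    # Strict matching: the verifier never trusts prover-supplied positions.
    derived = derive_query_positions(transcript, num_queries, domain_size)
    if claimed != derived:
        raise ValueError("FRI query positions diverge from transcript")
    return derived

transcript = b"all commitments and challenges absorbed so far"
honest = derive_query_positions(transcript, 4, 1 << 16)
assert verify_positions(transcript, honest, 4, 1 << 16) == honest

# Prover-chosen (potentially advantageous) positions are rejected outright:
forged = list(reversed(honest))
try:
    verify_positions(transcript, forged, 4, 1 << 16)
    raise AssertionError("forged positions accepted")
except ValueError:
    pass
```

The key design point is that the positions are a pure function of the transcript, so the prover commits to all FRI layers before learning where it will be queried.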
Both issues were fixed within hours of discovery. Neither was exploitable in our production deployment, but both represented real gaps that could have become exploitable as the codebase evolved. We do not wait for exploitability to fix vulnerabilities.
H33-ZKP-AIR: Architecture
The new engine is designed for a different class of computation. Where the production STARK handles fixed 32-row traces in microseconds, ZKP-AIR handles variable traces up to hundreds of thousands of rows — the scale needed for codebase analysis, compliance evaluation, and procurement scoring.
| Property | Auth STARK (Production) | ZKP-AIR (New) |
|---|---|---|
| Field | BLS12-381 scalar | Goldilocks (2^64 − 2^32 + 1) |
| Trace Width | Fixed, narrow (7 columns) | Variable, wide (3–32+ columns) |
| Trace Length | 32 rows | 256 – 65,536+ rows |
| Hash | Poseidon + SHA3-256 | SHA3-256 (domain-separated) |
| Polynomial Commitment | FRI (custom) | FRI (Winterfell 0.13) |
| Trusted Setup | None | None |
| Post-Quantum | Yes (hash-based) | Yes (hash-based) |
The Goldilocks field was chosen for native u64 arithmetic on ARM (Graviton4) and x86 without requiring big-integer libraries. The special form of the prime (2^64 − 2^32 + 1) allows modular reduction using only shifts, adds, and subtracts: no division in the hot path. Our benchmarks show 3 nanoseconds per field multiplication on Apple M4 Max, with NTT transforms completing in 2.2 milliseconds at 65,536 elements.
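The reduction trick rests on two congruences: 2^64 ≡ 2^32 − 1 and 2^96 ≡ −1 (mod p). A 128-bit product therefore folds into the field with shifts, adds, and subtracts. This sketch is written in Python for readability (the engine itself would use u64/u128 arithmetic with conditional corrections instead of `%`):

```python
P = 2**64 - 2**32 + 1          # the Goldilocks prime
MASK32 = (1 << 32) - 1

def goldilocks_mul(a: int, b: int) -> int:
    x = a * b                  # up to 128 bits
    lo = x & ((1 << 64) - 1)   # low 64 bits
    hi = x >> 64               # high 64 bits
    h0 = hi & MASK32           # bits 64..95:  weight 2^64 ≡ 2^32 − 1 (mod p)
    h1 = hi >> 32              # bits 96..127: weight 2^96 ≡ −1      (mod p)
    t = lo + h0 * MASK32 - h1  # fold: lo + h0*(2^32 − 1) − h1
    return t % P               # a real impl uses conditional add/sub of P here

# The fold agrees with ordinary modular multiplication:
assert goldilocks_mul(2**63, 2**63) == (2**126) % P
```

Because `h0 * MASK32` is itself a shift and a subtract (`(h0 << 32) - h0`), the whole reduction avoids division, which is what makes the 3 ns multiplication figure plausible on commodity cores.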
For the proving engine itself, we integrated Winterfell (Facebook Research), a production-grade STARK library that has been through multiple independent audits. Winterfell handles the FRI commitment scheme, the prover pipeline, and the verification logic. Our custom Goldilocks field, Merkle trees, and transcript implementation serve as the utility layer, while Winterfell provides the soundness-critical core.
Early Benchmarks
These numbers are from our local development environment (Apple M4 Max, single core, `--release` mode). Graviton4 numbers will follow once we complete the production deployment.
| Trace Rows | Prove | Verify | Proof Size |
|---|---|---|---|
| 256 | 1.9 ms | 141 µs | 18 KB |
| 1,024 | 6.7 ms | 201 µs | 31 KB |
| 4,096 | 31 ms | 302 µs | 43 KB |
| 16,384 | 141 ms | 337 µs | 53 KB |
| 65,536 | 610 ms | 429 µs | 68 KB |
Verification is sub-millisecond at every trace size. This matters because the verification path runs in the browser (or on the procurement team's machine) — it needs to be instant.
What This Means for H33 Customers
ZKP-AIR powers the next wave of H33 products. H33-ZK-Procure is the first: cryptographic procurement intelligence where vendors get graded without exposing source code. The STARK proves the evaluation was executed correctly. The Dilithium signature seals it with post-quantum certainty.
More broadly, ZKP-AIR is a general-purpose proving system for any computation where you need to say "this function ran on this data and produced this output" with cryptographic certainty. Audit trails. Supply chain attestation. Compliance proofs. Regulatory reporting where the underlying data is sensitive but the conclusion must be verifiable.
ZKP-AIR runs on H33's standardized credit system. Same packs, same dashboard, same API. Credits are fungible across all H33 products — use them for FHE biometrics, ZK lookups, procurement assessments, or any combination.
The Transparency Principle
Six teams shipped broken Fiat-Shamir implementations. None of them caught it internally. The bugs were found by a third-party auditor. That tells you everything you need to know about the limits of internal review.
We are not immune to this. The two issues we found in our own codebase were not the OtterSec pattern, but they were real — and they existed in code that had passed our internal review process. The difference is not that we write perfect code. The difference is that we audit aggressively, fix immediately, and publish the results.
If you are evaluating ZK infrastructure — from any vendor, including us — ask three questions:
1. Show me your Fiat-Shamir transcript. Is the entire protocol transcript absorbed before each challenge? Is the hash domain-separated per proof type? Is the verifier's transcript reconstruction identical to the prover's?
2. Show me your FRI verifier. Does it independently derive query positions from the transcript, or does it trust positions supplied by the prover? Does it verify folding consistency between layers?
3. When was your last audit, and what did it find? If the answer is "nothing," that means they did not look hard enough.
We publish our audit findings because trust in cryptographic infrastructure is not built on claims. It is built on evidence. The math is the trust layer. But the math only works if the implementation is correct — and the only way to know the implementation is correct is to look.
H33-ZKP-AIR is available now in the H33 platform. Explore the product page, read the API documentation, or get started with a free API key.