
H33 ZK-STARK Audit: AIR Constraints and Fiat-Shamir Transcript Binding

Five critical findings from our internal STARK audit, how we fixed them, and why transparent proofs matter for production cryptography

Zero-knowledge proofs are easy to get wrong. The mathematics is elegant, the security properties are powerful, and the implementation pitfalls are subtle enough to pass code review while leaving the system fundamentally broken. When we built H33's STARK-based verification layer, we knew that the only responsible path was to audit every component as if an adversary had read our source code. This article describes what we found, what we fixed, and why the AIR constraint system and Fiat-Shamir transform are the two places where STARK implementations most commonly fail.

H33 uses ZK-STARK proofs as the verification layer in its cryptographic pipeline. Every batch of authenticated operations produces a STARK proof that the FHE computation was performed correctly, without revealing the encrypted data or intermediate computation steps. This proof is then signed with post-quantum digital signatures and distilled into a 74-byte H33-74 attestation primitive. The STARK layer sits between the FHE computation and the signature layer, and its correctness is critical to the integrity of the entire pipeline.

Why STARKs Over Groth16

Before describing the audit findings, it is worth explaining why H33 uses STARKs rather than the more widely deployed Groth16 proof system. Groth16 is a SNARK (Succinct Non-Interactive Argument of Knowledge) that produces very small proofs (three group elements, under 200 bytes) and has fast verification. It is the backbone of most ZK rollups and privacy-preserving blockchain systems today. So why not use it?

The first reason is the trusted setup. Groth16 requires a per-circuit trusted setup ceremony where cryptographic parameters are generated. If the randomness used during this ceremony is ever compromised or not properly destroyed, an attacker can forge proofs that verify as valid for any statement. This is not a theoretical concern; it is an operational burden that requires careful ceremony management, multi-party computation protocols, and trust in the participants. For a production pipeline that processes millions of operations per second, the trusted setup introduces a systemic risk that we chose to eliminate entirely.

The second reason is quantum resistance. Groth16 relies on elliptic curve pairings over BN254 or BLS12-381 curves. These pairings are based on the hardness of the discrete logarithm problem in elliptic curve groups, which is efficiently solvable by Shor's algorithm on a sufficiently large quantum computer. When H33's pipeline is designed to protect data for decades, using a verification layer that will be broken by quantum computers is incoherent. STARKs rely on the collision resistance of hash functions (SHA3-256 in our case), which are believed to remain secure against quantum attack with appropriate parameter sizes.

The third reason is transparency. STARK proofs are transparent: the verifier's challenges are derived deterministically from the proof transcript via the Fiat-Shamir transform, and no secret parameters are involved anywhere in the setup, proving, or verification process. Anyone can verify a STARK proof with nothing more than the proof itself and the public statement. This transparency aligns with H33's philosophy that cryptographic claims should be independently verifiable.

Understanding the AIR Constraint System

AIR stands for Algebraic Intermediate Representation. It is the constraint system that STARKs use to encode computations. The basic idea is straightforward: a computation is represented as an execution trace, which is a table of values where each row represents one step of the computation and each column represents a register or state variable. The AIR defines polynomial constraints that must hold between consecutive rows of the trace (transition constraints) and at specific positions in the trace (boundary constraints).

For example, if you want to prove that you computed a Fibonacci-like sequence correctly, the transition constraint would be: for every row i, the value in column C at row i+1 must equal the sum of columns A and B at row i. The boundary constraints would fix the initial values: column A at row 0 equals some known starting value. The prover generates a valid execution trace, commits to it using a Merkle tree, and then proves that the polynomial constraints hold everywhere on the trace without revealing the trace itself.
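The Fibonacci-style AIR above can be sketched in a few lines of Python. This is a toy illustration over a small prime field; the field size, column layout, and function names are illustrative, not H33's actual trace format:

```python
# Toy AIR sketch: columns A, B, C; transition constraint C[i+1] = A[i] + B[i];
# boundary constraint A[0] = public start value. Illustrative only.
P = 2**31 - 1  # small prime field for the example

def build_trace(a0, b0, steps):
    """Generate an execution trace of (A, B, C) rows satisfying the AIR."""
    a, b = a0 % P, b0 % P
    c = (a + b) % P
    rows = [(a, b, c)]
    for _ in range(steps - 1):
        # New C is the sum of the previous row's A and B; A and B shift.
        a, b, c = b, c, (a + b) % P
        rows.append((a, b, c))
    return rows

def check_air(trace, a0):
    """Directly evaluate the boundary and transition constraints on a trace."""
    if trace[0][0] != a0 % P:          # boundary constraint at row 0
        return False
    return all(                         # transition constraint between rows
        trace[i + 1][2] == (trace[i][0] + trace[i][1]) % P
        for i in range(len(trace) - 1)
    )
```

A real prover would never evaluate constraints row by row like this; it commits to the trace and proves the constraints hold as polynomial identities. The direct evaluation is still useful as a reference check.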

H33's AIR is more complex than a Fibonacci example. It encodes the verification of FHE computation results: given committed ciphertext inputs and a committed output, the AIR proves that the homomorphic operations were applied correctly according to the defined circuit. The trace has multiple columns representing intermediate polynomial evaluations, modular arithmetic steps, and accumulator states. The constraint degree (the highest degree polynomial in the transition constraints) directly affects the proof size and verification time, so keeping it low is an engineering priority.

How Fiat-Shamir Binds the Transcript

In an interactive STARK protocol, the verifier sends random challenges to the prover at several points during the proof process. These challenges force the prover to commit to specific values before knowing what questions the verifier will ask. The Fiat-Shamir heuristic replaces this interaction by having the prover compute the challenges as hash digests of everything committed so far. The "transcript" is the running record of all committed values and derived challenges.

The security of this transformation depends entirely on the transcript being complete and correctly ordered. If the prover can influence or predict a challenge before committing to the values that challenge is supposed to test, the soundness of the proof collapses. This is why Fiat-Shamir transcript binding is the single most critical implementation detail in any non-interactive STARK system.

In H33's implementation, the Fiat-Shamir transcript is built using SHA3-256 in a sponge construction. Each commitment (trace polynomial commitments, constraint evaluations, FRI layer commitments) is absorbed into the sponge before the corresponding challenge is squeezed out. The transcript also includes domain parameters, the AIR constraint definitions, and the public inputs, ensuring that the proof is bound not just to the committed values but to the specific computation being proved.
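A simplified Python sketch of such a transcript follows. It uses `hashlib`'s SHA3-256 over the accumulated transcript rather than a true sponge API, and the class and method names are illustrative, not H33's implementation:

```python
import hashlib

class Transcript:
    """Simplified Fiat-Shamir transcript sketch: every absorbed value is
    recorded, and each challenge is a SHA3-256 digest of everything so far."""

    def __init__(self, domain_params: bytes, public_inputs: bytes):
        self._data = bytearray()
        self._counter = 0
        # Bind the transcript to the statement, not just the commitments.
        self.absorb(domain_params)
        self.absorb(public_inputs)

    def absorb(self, value: bytes):
        # Length-prefix each value so the concatenation is unambiguous.
        self._data += len(value).to_bytes(8, "little") + value

    def squeeze(self) -> bytes:
        # Each challenge depends on all prior absorbs plus a counter.
        h = hashlib.sha3_256(
            bytes(self._data) + self._counter.to_bytes(8, "little")
        )
        self._counter += 1
        digest = h.digest()
        self.absorb(digest)  # challenges themselves become part of the record
        return digest
```

The key property is that a challenge cannot be computed before the values it is meant to test have been absorbed, which is exactly the ordering guarantee that Fiat-Shamir soundness rests on.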

The Five Critical Audit Findings

Finding 1: Incomplete Domain Separation in Fiat-Shamir

The initial implementation absorbed commitments into the Fiat-Shamir transcript in the correct order, but did not include domain separation tags between different types of commitments. This meant that a trace commitment and a constraint evaluation commitment of the same byte length could theoretically be swapped by a malicious prover without changing the transcript hash. While exploiting this required constructing specific collisions, the absence of domain separation violated the principle that each commitment type should be cryptographically distinguishable in the transcript.

The fix added unique prefix bytes for each commitment type: 0x01 for trace commitments, 0x02 for constraint evaluations, 0x03 for FRI layer commitments, and 0x04 for the final FRI polynomial. The transcript now unambiguously encodes both the values and their roles in the proof.
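The domain-separation fix can be illustrated as follows. The tag byte values come from the text above; the transcript-state mechanics are a simplified stand-in for the real sponge:

```python
import hashlib

# Prefix bytes from the fix: one unique tag per commitment type.
TAG_TRACE      = b"\x01"  # trace commitments
TAG_CONSTRAINT = b"\x02"  # constraint evaluations
TAG_FRI_LAYER  = b"\x03"  # FRI layer commitments
TAG_FRI_FINAL  = b"\x04"  # final FRI polynomial

def absorb_tagged(state: bytes, tag: bytes, commitment: bytes) -> bytes:
    """Fold a tagged commitment into a running SHA3-256 transcript state.
    The tag makes commitments of different roles cryptographically distinct
    even when their byte contents are identical."""
    return hashlib.sha3_256(state + tag + commitment).digest()
```

With the tags in place, the same 32-byte value absorbed as a trace commitment and as a constraint evaluation produces different transcript states, closing the type-confusion avenue described above.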

Finding 2: Missing Boundary Constraint Validation

The verifier checked transition constraints (relationships between consecutive rows) correctly but did not independently validate boundary constraints (values at specific positions) against the public inputs. The boundary constraint values were included in the proof and accepted without verification against the expected public input values. A malicious prover could have generated a valid proof for a different set of public inputs than the verifier expected.

This was a straightforward logic error, not a subtle mathematical issue. The fix added explicit verification that the boundary constraint values claimed in the proof match the public inputs provided to the verifier. This check happens before any polynomial evaluation, ensuring that an invalid proof is rejected as early as possible.
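The shape of the added check is simple enough to sketch directly. The data layout (a mapping from column/row positions to field elements) and the function name are illustrative, not H33's API:

```python
def validate_boundary_constraints(claimed, expected_public_inputs):
    """Reject a proof early if its claimed boundary values do not match the
    verifier's own public inputs. Both arguments map (column, row) positions
    to field elements; this runs before any polynomial evaluation."""
    for position, expected in expected_public_inputs.items():
        if claimed.get(position) != expected:
            return False
    return True
```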

Finding 3: Out-of-Domain Sampling Bias

STARK verification involves evaluating committed polynomials at a random out-of-domain (OOD) point. This point must be sampled uniformly from the field, excluding the evaluation domain. The initial implementation sampled the OOD point by taking a Fiat-Shamir challenge and reducing it modulo the field size, but did not explicitly check whether the result fell within the evaluation domain. If it did, the proof could verify for an incorrect computation because the constraint polynomial would be zero at that point by construction.

The probability of this occurring was approximately 2^(-40) per operation for our parameter sizes, which sounds small until you consider H33's throughput. At 2,293,766 operations per second, a 2^(-40) per-operation event would be expected roughly once every five and a half days. The fix adds a rejection sampling loop that re-derives the OOD point with an incremented counter if it falls within the evaluation domain.
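A minimal sketch of the rejection-sampling fix (names are illustrative; the bias introduced by reducing a 256-bit digest modulo the field size is ignored here for brevity):

```python
import hashlib

def sample_ood_point(transcript_state: bytes, field_size: int,
                     eval_domain: set) -> int:
    """Derive an out-of-domain point from the Fiat-Shamir transcript state,
    re-hashing with an incremented counter until the point lands outside
    the evaluation domain."""
    counter = 0
    while True:
        digest = hashlib.sha3_256(
            transcript_state + counter.to_bytes(8, "little")
        ).digest()
        point = int.from_bytes(digest, "little") % field_size
        if point not in eval_domain:
            return point  # valid OOD point
        counter += 1      # rejected: re-derive deterministically
```

Because the counter is part of the hash input, the re-derivation is deterministic: prover and verifier independently arrive at the same OOD point.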

Finding 4: Insufficient Trace Commitment Binding

The Merkle tree commitment to the execution trace used SHA3-256 for internal nodes but did not include the leaf index in the hash of leaf nodes. This meant that a prover could potentially swap two leaf positions in the tree without changing the root hash, as long as the leaf values happened to hash to the same internal node parent. While this required finding specific tree structures, it weakened the binding between the trace polynomial evaluations and their positions in the evaluation domain.

The fix prepends the leaf index (as a 64-bit little-endian integer) to the leaf data before hashing. This ensures that every leaf position is cryptographically bound to its index in the tree, making position swaps impossible regardless of the leaf values.
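The leaf-hashing fix is a one-liner worth showing explicitly. The function name is illustrative; the 64-bit little-endian index encoding is as described above:

```python
import hashlib

def hash_leaf(index: int, leaf_data: bytes) -> bytes:
    """Hash a Merkle leaf with its position bound in: the leaf index is
    prepended as a 64-bit little-endian integer before hashing, so identical
    leaf values at different positions produce different leaf hashes."""
    return hashlib.sha3_256(index.to_bytes(8, "little") + leaf_data).digest()
```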

Finding 5: Constraint Degree Overflow in Composition Polynomial

The composition polynomial is formed by combining all transition and boundary constraints with random linear coefficients derived from the Fiat-Shamir transcript. The degree of this composition polynomial must be bounded for the FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity) protocol to work correctly. The initial implementation computed the composition polynomial degree from the maximum individual constraint degree but did not account for the degree increase when multiplying constraints by their zerofier polynomials (the polynomials that enforce where each constraint should hold).

In certain configurations with high-degree transition constraints, the actual composition polynomial degree exceeded the FRI domain size, causing the FRI protocol to operate on an aliased version of the polynomial. This did not produce incorrect proofs in our test suite because our standard parameters kept constraint degrees low, but it would have caused soundness failures with alternative AIR configurations. The fix computes the composition polynomial degree as the maximum of (constraint degree + zerofier degree) across all constraints, and validates that this does not exceed the FRI domain bounds before proof generation begins.
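The corrected degree accounting can be sketched as follows. The representation of a constraint as a (constraint degree, zerofier degree) pair is an illustrative simplification of the real AIR metadata:

```python
def composition_degree(constraints):
    """constraints: list of (constraint_degree, zerofier_degree) pairs.
    The composition polynomial degree must include the zerofier's
    contribution for each constraint, not just the constraint itself."""
    return max(c_deg + z_deg for c_deg, z_deg in constraints)

def validate_fri_bound(constraints, fri_domain_size):
    """Fail fast before proof generation if the composition polynomial
    would alias on the FRI domain."""
    deg = composition_degree(constraints)
    if deg >= fri_domain_size:
        raise ValueError(
            f"composition degree {deg} exceeds FRI domain size "
            f"{fri_domain_size}"
        )
    return deg
```

The pre-fix bug corresponds to computing `max(c_deg for ...)` alone: within bounds for low-degree constraints, silently aliased for high-degree ones.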

Lessons From the Audit

The five findings above share a common pattern: each one involved a component that worked correctly in isolation but failed to maintain security guarantees at the boundary between components. Domain separation failures happen at the boundary between commitment types. Boundary constraint validation failures happen at the boundary between the proof and the public inputs. OOD sampling failures happen at the boundary between the random oracle and the evaluation domain. These are integration bugs, not algorithm bugs, and they are the hardest kind to find through standard testing because each unit test passes.

The audit reinforced our approach of differential testing: running the same computation through the STARK prover-verifier and through a reference implementation that directly evaluates the constraints on the plaintext trace. Any discrepancy between the two results indicates either a bug in the STARK system or a bug in the reference implementation, and investigating the discrepancy always reveals something worth fixing. After the audit, we expanded the differential test suite to over 1,100 test vectors covering edge cases in domain boundaries, maximum-degree constraints, and minimal-length traces.
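The differential-testing harness can be sketched in a few lines. The `prove` and `verify` callables are hypothetical placeholders standing in for the STARK prover and verifier:

```python
def differential_check(trace, constraints, prove, verify):
    """Compare two independent answers to the same question: does this trace
    satisfy the constraints? The reference path evaluates the constraints
    directly on the plaintext trace; the STARK path runs a full prove/verify
    round trip. Any disagreement indicates a bug in one of the two."""
    reference_ok = all(constraint(trace) for constraint in constraints)
    stark_ok = verify(prove(trace))
    return reference_ok == stark_ok
```

In practice the harness runs this over a large corpus of generated traces, and every disagreement is triaged to a root cause before the corpus grows.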

Performance After Fixes

All five fixes were implemented without measurable impact on proving or verification performance. The domain separation tags add one byte per commitment to the Fiat-Shamir transcript. The boundary constraint validation is a constant-time comparison performed once per proof. The OOD rejection sampling almost never needs a second iteration (the probability of a retry is vanishingly small). Prepending the leaf index adds 8 bytes per leaf hash. The composition degree check is a one-time validation during proof setup.

H33's STARK verification in production uses cached lookups through CacheeEngine, which brings the per-verification latency to sub-microsecond for previously seen computation patterns. The proving side runs as part of the batch attestation stage, contributing to the total pipeline latency of 38 microseconds per authentication. The fixes ensure that this performance comes without compromising the cryptographic soundness that makes the proof meaningful in the first place.

Transparent Security as a Design Principle

The choice of STARKs over Groth16 was not just a technical preference. It reflects a design principle: every security claim that H33 makes should be independently verifiable without trusting anyone, including H33. STARK proofs are transparent. The verifier needs no secret parameters. The proof can be checked by anyone with the public statement and a SHA3-256 implementation. There is no trusted setup to compromise, no ceremony to audit, and no secret toxic waste to destroy.

Combined with H33's three-family post-quantum signature scheme and the 74-byte H33-74 attestation primitive, the STARK layer completes a verification chain that runs from the raw encrypted computation all the way to an on-chain attestation record. Every link in this chain is based on well-studied mathematical hardness assumptions, implemented in memory-safe Rust, and audited for the integration bugs that standard testing misses. That is what it takes to build a cryptographic pipeline that earns the word "production."
