The Problem With Most ZKP Implementations
Zero-knowledge proofs have captured the imagination of the cryptography community for good reason. The ability to prove you know something without revealing what you know is foundational to modern privacy-preserving systems. But there is a significant gap between what gets published in academic papers and what actually runs in production at scale.
Most STARK implementations you encounter today fall into one of three categories. First, there are academic prototypes that demonstrate the mathematical properties of the proof system but run in minutes or hours per proof. Second, there are blockchain-specific implementations that are optimized for on-chain verification but are designed for a very different use case than real-time authentication. Third, there are conference demos that show impressive proofs over trivial computations—proving you can add two numbers, not proving you can authenticate a biometric template against an encrypted database.
H33 sits in none of these categories. We built a STARK proof system from scratch, purpose-designed for production authentication workloads. It runs on every API call. It is integrated into the same pipeline that does FHE biometric matching and post-quantum signature generation. And it is fast enough that the ZKP stage accounts for less than one percent of our total pipeline latency.
This post explains how we got there, what architectural decisions made it possible, and why most teams attempting to ship ZKPs in production fail.
What is an AIR and Why Does Column Count Matter?
AIR stands for Algebraic Intermediate Representation. It is the constraint system that defines what a valid computation looks like in the STARK framework. Think of it as the specification that the prover must satisfy: a set of polynomial constraints over a trace of execution steps, where each row represents a state transition and each column represents a register or variable in the computation.
The number of columns in an AIR determines how much state each step of the computation can carry. A 2-column AIR can prove something like a Fibonacci sequence. A 4-column AIR can handle basic state machines. H33 uses a 7-column AIR that encodes the full authentication verification pipeline. Those seven columns represent: the input commitment hash, the FHE ciphertext fingerprint, the biometric match score, the signature verification status, the chain state hash, the timestamp, and the output attestation digest.
Each row in the trace represents one step of the verification process, and the transition constraints enforce that each step follows correctly from the previous one. If any step is tampered with (for example, if someone tries to claim an authentication succeeded when it did not), the constraint polynomial will not vanish on the trace domain, and the proof will fail to verify.
The reason column count matters for production systems is that more columns mean more expressiveness per row. A narrower AIR requires more rows to express the same computation, which increases proof generation time. A wider AIR with more columns can encode a richer computation in fewer rows, at the cost of slightly larger proofs. Seven columns is our sweet spot: expressive enough to capture the full verification pipeline, narrow enough to keep proof sizes under 50 kilobytes.
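To make the trace-and-constraint idea concrete, here is a minimal sketch of the 2-column Fibonacci case mentioned above. It is a toy, not H33's 7-column AIR: the modulus, the function names, and the direct row-by-row check are all assumptions for illustration. A real STARK verifier never sees the full trace and checks the constraints through polynomial commitments instead.

```python
# Toy 2-column AIR over a prime field (illustrative only; not H33's AIR).
P = 2**31 - 1  # example modulus, chosen arbitrarily for this sketch

def build_trace(steps):
    """Each row holds two consecutive Fibonacci values (a, b)."""
    trace = [(1, 1)]
    for _ in range(steps - 1):
        a, b = trace[-1]
        trace.append((b, (a + b) % P))
    return trace

def check_trace(trace):
    """Return True if the boundary and transition constraints all hold."""
    if trace[0] != (1, 1):                      # boundary constraint on row 0
        return False
    for (a, b), (a_next, b_next) in zip(trace, trace[1:]):
        if (a_next - b) % P != 0:               # transition: a' = b
            return False
        if (b_next - a - b) % P != 0:           # transition: b' = a + b
            return False
    return True

trace = build_trace(8)
tampered = list(trace)
tampered[4] = (tampered[4][0], (tampered[4][1] + 1) % P)  # corrupt one cell
print(check_trace(trace), check_trace(tampered))  # True False
```

Each constraint is written as an expression that must equal zero, which is the form an AIR uses: the prover's job is to show those expressions vanish on every row.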
The Constraint System in Detail
The constraints in our AIR fall into three categories: boundary constraints, transition constraints, and consistency constraints.
Boundary constraints enforce that the first and last rows of the trace have specific values. The first row must commit to the input data (the encrypted biometric template and the FHE parameters). The last row must contain a valid attestation digest that matches the computation. This prevents a prover from generating a valid-looking trace that does not actually correspond to a real authentication.
Transition constraints enforce the relationships between consecutive rows. For example, if row i contains a biometric match score, row i+1 must contain the correct signature verification result for that score. You cannot skip steps. You cannot reorder steps. The polynomial constraints enforce the exact computation sequence that a correct authentication follows.
Consistency constraints enforce column-level invariants. The chain state hash in column 5, for instance, must be the SHA3-256 hash of the previous chain state concatenated with the current row's data. This creates a hash chain within the proof itself, ensuring that even the internal state of the computation is tamper-evident.
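The hash-chain invariant described for the chain state column can be sketched outside the proof system as follows. This is an assumption-laden illustration: the all-zero genesis value, the byte encoding of a row, and the function names are hypothetical, and inside a real AIR the hash would be expressed as polynomial constraints rather than a library call.

```python
import hashlib

GENESIS = b"\x00" * 32  # hypothetical initial chain state

def chain_step(prev_state: bytes, row_data: bytes) -> bytes:
    """New chain state = SHA3-256(previous state || current row data)."""
    return hashlib.sha3_256(prev_state + row_data).digest()

def build_chain(rows):
    """Return the chain state after each row."""
    states, state = [], GENESIS
    for row in rows:
        state = chain_step(state, row)
        states.append(state)
    return states

def chain_is_consistent(rows, states):
    """Recompute every link and compare it with the claimed state."""
    if len(rows) != len(states):
        return False
    prev = GENESIS
    for row, state in zip(rows, states):
        if chain_step(prev, row) != state:
            return False
        prev = state
    return True

rows = [b"row-0", b"row-1", b"row-2"]
states = build_chain(rows)
print(chain_is_consistent(rows, states))                            # True
print(chain_is_consistent([b"row-0", b"forged", b"row-2"], states))  # False
```

Changing any row invalidates that row's link, because the stored state no longer matches the recomputed hash.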
When all three constraint types are satisfied simultaneously, you get a proof that the entire authentication pipeline executed correctly—from encrypted input to signed output—without revealing any of the intermediate values. The verifier learns only that the computation was correct and the output is authentic.
Why SHA3-256 and Post-Quantum Security
One of the most important architectural decisions in H33's STARK system is the choice of hash function. We use SHA3-256 (Keccak) exclusively in our proof system. This is not arbitrary. It is a deliberate decision that ties the security of our ZKP system to the security of our broader post-quantum architecture.
STARK proofs derive their security from the collision resistance of the underlying hash function. If you can find collisions in the hash function, you can forge proofs. By using SHA3-256, we ground the security of our proof system in the same hash family (FIPS 202) that NIST's post-quantum signature standards build on. SHA3-256 provides 128-bit collision resistance against classical attacks (the birthday bound on a 256-bit digest) and 256-bit preimage resistance. On the quantum side, Grover's algorithm gives a quadratic speedup for search, so preimage resistance drops from 256 bits to approximately 128 bits. The best known quantum collision search (Brassard-Høyer-Tapp) has a query complexity of roughly 2^85, but it needs a correspondingly large quantum-accessible memory, which is why 256-bit hashes are generally still treated as adequate for post-quantum collision resistance.
This means our STARK proofs are post-quantum secure without requiring any lattice-based or code-based assumptions. Beyond the standard random-oracle modeling of the hash that makes the proof non-interactive, the proof system relies only on the hardness of finding collisions in SHA3-256. This is about as minimal a security assumption as a proof system can have, and hash collision resistance is one of the most well-studied hardness assumptions in cryptography.
Compare this to pairing-based SNARK systems such as Groth16, which rely on bilinear pairings over elliptic curves. The discrete-logarithm assumptions behind those curves fall to Shor's algorithm on a sufficiently large quantum computer, so an attacker with such a machine could forge proofs for a system deployed today. Our STARK proofs do not share that weakness, because there is no algebraic group structure for Shor's algorithm to exploit, only a hash function.
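For reference, the textbook generic-attack bounds for an ideal n-bit hash can be tabulated in a few lines. This is a sketch of the usual asymptotic figures only: it ignores memory and circuit costs, and the function name is hypothetical.

```python
def generic_security_bits(digest_bits: int) -> dict:
    """Approximate log2 work factors for generic attacks on an ideal hash."""
    return {
        "classical_preimage": digest_bits,            # brute-force search
        "classical_collision": digest_bits // 2,      # birthday bound
        "quantum_preimage_grover": digest_bits // 2,  # quadratic speedup
        "quantum_collision_bht": digest_bits / 3,     # query count only
    }

for name, bits in generic_security_bits(256).items():
    print(f"{name}: ~{bits:.0f} bits")
```

The quantum collision figure is a query-complexity bound; the memory it requires is the reason it is usually not treated as a practical attack.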
The DashMap Caching Architecture
Here is where most ZKP implementations fall apart in production: proof generation is expensive. Even optimized STARK provers take milliseconds to generate a proof. When you are processing over a million authentications per second, you cannot afford to generate a fresh proof for every single request.
H33 solves this with a caching layer built on DashMap, a concurrent hash map implemented in Rust. The architecture works as follows. When a batch of 32 authentications completes, the system generates one STARK proof for the entire batch. That proof is then stored in a DashMap keyed by the attestation digest. Subsequent verification requests for any authentication in that batch can be resolved with a single DashMap lookup instead of a full proof re-verification.
The cached lookup time is 0.062 microseconds. To put that in perspective, a full STARK verification takes approximately 71 microseconds. The cache achieves a speedup of over 1,000 times compared to re-verifying the proof from scratch. Since most verification requests hit the cache (because proofs are batch-generated ahead of time), the effective average ZKP latency in our pipeline is a fraction of a microsecond.
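The relationship between hit rate and average latency is simple enough to write down. The sketch below uses the two figures quoted above and assumes, purely for illustration, that a cache miss costs exactly one full verification; the function name is hypothetical.

```python
CACHE_HIT_US = 0.062     # cached lookup, from the text
FULL_VERIFY_US = 71.0    # full STARK verification, from the text

def expected_latency_us(hit_rate: float) -> float:
    """Average latency, assuming a miss costs one full verification."""
    return hit_rate * CACHE_HIT_US + (1.0 - hit_rate) * FULL_VERIFY_US

speedup = FULL_VERIFY_US / CACHE_HIT_US
print(f"speedup: {speedup:.0f}x")
for rate in (0.99, 0.999, 1.0):
    print(f"hit rate {rate}: {expected_latency_us(rate):.3f} us")
```

Because the miss cost is three orders of magnitude larger than the hit cost, the average is dominated by the miss rate, which is why pre-generating proofs matters so much.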
The DashMap choice was deliberate. We evaluated several caching strategies during development. A TCP-based cache (like a RESP proxy) introduced serialization overhead that was catastrophic at high concurrency. A mutex-protected HashMap could not handle the contention from 96 concurrent workers. DashMap splits the map into shards, each guarded by its own reader-writer lock, so threads that touch different shards do not block each other and concurrent reads of the same shard proceed in parallel. The overhead compared to a raw HashMap is negligible, and the concurrency characteristics are exactly what a 96-worker production workload requires.
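The sharding idea behind a concurrent map can be sketched in a few lines. This is a simplified stand-in written in Python with one ordinary lock per shard; it is not DashMap's implementation, and the class name, shard count, and key format are assumptions.

```python
import threading

class ShardedMap:
    """Hash map split into independently locked shards (illustrative)."""

    def __init__(self, shard_count=16):
        self._shards = [({}, threading.Lock()) for _ in range(shard_count)]

    def _shard(self, key):
        # The key's hash picks the shard, so unrelated keys rarely contend.
        return self._shards[hash(key) % len(self._shards)]

    def insert(self, key, value):
        table, lock = self._shard(key)
        with lock:
            table[key] = value

    def get(self, key, default=None):
        table, lock = self._shard(key)
        with lock:
            return table.get(key, default)

    def __len__(self):
        total = 0
        for table, lock in self._shards:
            with lock:
                total += len(table)
        return total

def worker(cache, worker_id, count):
    for i in range(count):
        cache.insert((worker_id, i), b"proof-bytes")

cache = ShardedMap()
threads = [threading.Thread(target=worker, args=(cache, w, 1000)) for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(cache))  # 8000
```

The design point is that a writer only excludes other threads working on the same shard, instead of stalling every thread behind one global mutex.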
Batch Proof Generation: 32 Authentications Per Proof
Generating a STARK proof for each individual authentication would be wasteful. The fixed costs of proof generation (committing to the trace, computing the FRI (Fast Reed-Solomon Interactive Oracle Proof of Proximity) layers, and building the Merkle trees) are amortized across all rows in the trace. A trace covering 32 authentications takes only marginally longer to prove than a trace covering one.
H33 batches 32 authentications into a single proof. Each authentication occupies a set of rows in the 7-column trace. The transition constraints link the authentications together in sequence, so the proof attests to the correctness of all 32 simultaneously. The resulting proof is approximately 45 kilobytes and can be verified in 71 microseconds.
This batching strategy has three benefits. First, it amortizes proof generation cost. The per-authentication cost of the ZKP stage drops to the point where it accounts for less than one percent of pipeline latency. Second, it creates a natural Merkle aggregation: each batch proof is a commitment to 32 authentications, and batches can themselves be aggregated into higher-level proofs (this is covered by our patent, Claims 124-125). Third, it aligns with our FHE batching: the BFV inner product already processes 32 users per ciphertext, so the ZKP batch boundary matches the FHE batch boundary, eliminating any awkward misalignment between the two stages.
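The commitment side of this aggregation can be illustrated with plain Merkle trees. The sketch below groups records into batches of 32, commits to each batch with a SHA3-256 Merkle root, and then commits to the batch roots. It is a hash-commitment illustration only, not recursive proof composition, and the record encoding and function names are hypothetical.

```python
import hashlib

BATCH_SIZE = 32

def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def merkle_root(leaves):
    """Binary Merkle root; assumes the number of leaves is a power of two."""
    level = [sha3(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [sha3(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def batch_roots(records):
    """Split records into batches of 32 and commit to each batch."""
    return [merkle_root(records[i:i + BATCH_SIZE])
            for i in range(0, len(records), BATCH_SIZE)]

records = [f"auth-{i}".encode() for i in range(128)]  # placeholder records
roots = batch_roots(records)       # one root per batch of 32
top_root = merkle_root(roots)      # one commitment over all four batches
print(len(roots), top_root.hex()[:16])
```

Altering a single record changes its batch root and therefore the top-level root, while the other batch roots stay the same.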
The Full Pipeline: FHE, Attestation, and ZKP
The STARK proof does not exist in isolation. It is the third and final stage of a three-stage pipeline that processes every authentication request.
Stage 1 is FHE Batch Processing. The BFV homomorphic encryption engine computes an inner product between the encrypted biometric template and the stored reference, processing 32 users per ciphertext. This stage takes approximately 943 microseconds per batch on Graviton4 hardware and accounts for 70 percent of total pipeline latency.
Stage 2 is Batch Attestation. The results from Stage 1 are hashed with SHA3-256 and signed with ML-DSA (formerly CRYSTALS-Dilithium), a NIST-standardized post-quantum signature algorithm. One signature covers the entire batch of 32 authentications, and the signature is verified immediately to ensure it is valid before it is returned to the caller. This stage takes approximately 391 microseconds and accounts for 29 percent of pipeline latency.
Stage 3 is the ZKP Cached Lookup. The attestation digest from Stage 2 is used to look up the corresponding STARK proof in the DashMap cache. If found (which it almost always is, because proofs are pre-generated during batch attestation), the lookup returns in 0.062 microseconds. If not found, a fresh proof is generated on demand. This stage accounts for less than one percent of total latency.
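The lookup-with-fallback flow of Stage 3 follows a familiar get-or-compute pattern. The sketch below shows that control flow only: a plain dictionary stands in for the concurrent cache, and `generate_proof` is a hypothetical placeholder for the real prover.

```python
proof_cache = {}  # attestation digest -> proof bytes (stand-in for the cache)

def generate_proof(digest: bytes) -> bytes:
    """Hypothetical placeholder; the real prover is far more expensive."""
    return b"proof-for-" + digest

def lookup_or_prove(digest: bytes):
    """Return (proof, was_cached); prove on demand only on a miss."""
    proof = proof_cache.get(digest)
    if proof is not None:
        return proof, True
    proof = generate_proof(digest)
    proof_cache[digest] = proof   # later lookups for this digest now hit
    return proof, False

digest = b"\x01" * 32
print(lookup_or_prove(digest)[1])  # False: first request misses
print(lookup_or_prove(digest)[1])  # True: second request hits
```

When proofs are inserted ahead of time by the batch stage, the miss branch becomes the rare path.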
The total pipeline latency for a batch of 32 authentications is approximately 1,345 microseconds, which works out to about 42 microseconds per authentication. The STARK proof is the fastest stage by far, not because STARK proofs are inherently cheap, but because the caching architecture makes verification effectively free.
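The per-stage figures can be checked with a few lines of arithmetic. One thing the calculation surfaces is that the three quoted stage times sum to about 1,334 microseconds, so roughly 11 microseconds of the reported 1,345 total are not attributed to a stage in the text. The variable names below are illustrative.

```python
STAGES_US = {"fhe_batch": 943.0, "attestation": 391.0, "zkp_lookup": 0.062}
REPORTED_TOTAL_US = 1345.0
BATCH_SIZE = 32

stage_sum = sum(STAGES_US.values())
per_auth_us = REPORTED_TOTAL_US / BATCH_SIZE
shares = {name: 100 * us / REPORTED_TOTAL_US for name, us in STAGES_US.items()}

print(f"sum of stages: {stage_sum:.3f} us")
print(f"unattributed:  {REPORTED_TOTAL_US - stage_sum:.3f} us")
print(f"per auth:      {per_auth_us:.2f} us")
for name, pct in shares.items():
    print(f"{name}: {pct:.3f}%")
```

The shares come out at roughly 70 percent, 29 percent, and well under one percent, matching the stage descriptions.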
Why Most Teams Fail to Ship ZKPs in Production
We have talked to dozens of engineering teams attempting to integrate zero-knowledge proofs into production systems. The failure modes are remarkably consistent.
The first failure mode is treating proof generation as a request-time operation. If you generate a proof on every request, your latency budget is dominated by the prover. Even a fast prover takes milliseconds. At any reasonable request rate, this becomes the bottleneck. The solution is to decouple proof generation from request serving—generate proofs in batches, cache them, and serve verification from the cache.
The second failure mode is choosing the wrong proof system for the workload. Pairing-based SNARKs such as Groth16 have smaller proofs but require a trusted setup and are not post-quantum secure. Bulletproofs are compact but slow to verify. PLONK is flexible but the prover is expensive. STARKs have larger proofs but are transparent (no trusted setup), post-quantum secure, and have relatively fast verification. For an authentication workload where you need post-quantum security and the proofs are verified server-side (so proof size is not a critical constraint), STARKs are the correct choice.
The third failure mode is using an off-the-shelf ZKP library without understanding the constraint system. Generic ZKP toolkits (like Circom or Noir) compile high-level programs into constraint systems. The resulting systems are often orders of magnitude less efficient than a hand-crafted AIR, because the compiler cannot make domain-specific optimizations. H33's 7-column AIR was designed by hand to encode exactly the computation we need to prove, with no wasted constraints and no unnecessary columns.
The fourth failure mode is ignoring concurrency. Most ZKP libraries are single-threaded. Rust's type system and ownership model let us build a proof generation pipeline that runs across 96 concurrent workers without data races. The DashMap cache handles concurrent reads and writes from all 96 workers without funneling them through a single global lock. This is not a feature you can bolt on after the fact; it has to be designed into the architecture from the beginning.
Proof Verification: What the Verifier Actually Checks
When a verifier receives a STARK proof from H33, what are they actually checking? The verification algorithm performs four steps.
First, it checks the boundary constraints. The verifier confirms that the committed first and last rows of the trace have the values claimed by the prover. This establishes that the computation started with the correct inputs and ended with the claimed output.
Second, it checks the transition constraints at random points. Instead of verifying every row transition (which would require the full trace), the verifier uses the FRI protocol to spot-check that the constraint polynomials are low-degree. If the constraints are satisfied at enough random points, the probability that they are satisfied everywhere is overwhelming. This is the source of STARK soundness: a cheating prover would have to commit to a function that is far from any low-degree polynomial yet still passes the random spot checks, which happens only with negligible probability.
Third, it verifies the Merkle commitments. The trace and the constraint evaluations are committed via Merkle trees. The verifier checks that the values opened during the FRI protocol are consistent with the Merkle root. This binds the prover to a specific trace before the random challenges are chosen.
Fourth, it checks the FRI layers. The FRI protocol involves multiple rounds of polynomial folding, where each round reduces the degree of the polynomial. The verifier checks that each folding step was performed correctly. If all FRI layers check out, the verifier accepts the proof.
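A single FRI folding round, and the consistency check a verifier applies to it, can be sketched over a small prime field. This toy works directly on coefficients and checks every point; a real implementation works on committed evaluations over a structured domain and samples only a few points. The modulus, challenge value, and function names are assumptions.

```python
P = 97  # small prime, for illustration only

def evaluate(coeffs, x):
    """Horner evaluation of a polynomial with coefficients c0, c1, ... mod P."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc

def fold(coeffs, beta):
    """Split f(x) = fe(x^2) + x*fo(x^2) and return g = fe + beta*fo."""
    even, odd = coeffs[0::2], coeffs[1::2]
    odd = odd + [0] * (len(even) - len(odd))
    return [(e + beta * o) % P for e, o in zip(even, odd)]

def inv(a):
    return pow(a, P - 2, P)  # modular inverse via Fermat's little theorem

def fold_check(f_x, f_neg_x, g_x2, x, beta):
    """Verifier-side check that g(x^2) is the correct fold of f at x, -x."""
    even_part = (f_x + f_neg_x) * inv(2) % P
    odd_part = (f_x - f_neg_x) * inv(2 * x % P) % P
    return g_x2 == (even_part + beta * odd_part) % P

f = [3, 1, 4, 1, 5, 9, 2, 6]   # degree-7 polynomial
beta = 11                      # verifier's random challenge (fixed here)
g = fold(f, beta)              # degree at most 3
ok = all(
    fold_check(evaluate(f, x), evaluate(f, -x % P), evaluate(g, x * x % P), x, beta)
    for x in range(1, P)
)
print(len(g), ok)  # 4 True
```

Each round halves the degree bound, so after a logarithmic number of rounds the prover is left with a constant that the verifier can check directly.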
The entire verification process takes 71 microseconds for a fresh proof, or 0.062 microseconds for a cached lookup. The verification is deterministic and requires no interaction with the prover, making it suitable for asynchronous and offline verification scenarios.
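The Merkle commitment check from the third step reduces to recomputing a root from a leaf and its sibling path. Here is a minimal sketch using SHA3-256; the leaf encoding, the absence of domain-separation prefixes, and the function names are simplifying assumptions, not the format of H33's proofs.

```python
import hashlib

def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def build_levels(leaves):
    """All tree levels, leaf hashes first; assumes a power-of-two leaf count."""
    levels = [[sha3(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha3(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels

def open_path(levels, index):
    """Sibling hashes from the leaf level up to just below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # sibling differs in the lowest bit
        index //= 2
    return path

def verify_path(root, leaf, index, path):
    """Recompute the root from one opened leaf and its sibling path."""
    node = sha3(leaf)
    for sibling in path:
        node = sha3(node + sibling) if index % 2 == 0 else sha3(sibling + node)
        index //= 2
    return node == root

leaves = [f"trace-row-{i}".encode() for i in range(8)]
levels = build_levels(leaves)
root = levels[-1][0]
path = open_path(levels, 5)
print(verify_path(root, leaves[5], 5, path))   # True
print(verify_path(root, b"forged", 5, path))   # False
```

The verifier needs only the root, the opened value, and a path whose length is logarithmic in the number of leaves, which keeps the opening check cheap.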
The Open Verifier Model
H33 open-sourced the STARK verifier while keeping the proving engine proprietary. This is a deliberate architectural decision that aligns with how we think about trust in cryptographic systems.
The verifier is the component that establishes trust. If you cannot inspect the verification algorithm, you are trusting H33's assertion that the proofs are valid. By open-sourcing the verifier, we eliminate that trust assumption. Anyone can download the verifier, inspect the code, and independently verify any proof generated by H33. The verification does not require any secret keys, trusted hardware, or proprietary software.
The prover, on the other hand, is the component that generates value. It contains the optimized constraint system, the batch generation pipeline, the caching layer, and the integration with our FHE and signature engines. These are proprietary because they represent years of engineering investment and contain patented technology. But their correctness is verifiable from the outside: if the prover generated an invalid proof, the open-source verifier would reject it.
This model of open verification and proprietary computation is analogous to how TLS works. The TLS protocol and its certificate validation rules are public and standardized. Certificate authorities are commercial operations, and TLS implementations range from open source to proprietary. Trust flows from the public verifiability of the protocol, not from trust in any particular implementation.
Performance Characteristics on Graviton4
All of H33's production benchmarks are run on AWS c8g.metal-48xl instances with 192 vCPUs powered by Graviton4 processors. Here are the ZKP-specific performance numbers from our production environment.
Proof generation for a 32-authentication batch takes approximately 12 milliseconds. This includes trace generation, constraint evaluation, FRI commitment, and Merkle tree construction. Since proof generation happens asynchronously (not on the request path), this latency does not affect request-serving performance.
Fresh proof verification takes 71 microseconds. This is the time to perform the full FRI verification, including Merkle path checks and constraint evaluation at random points.
Cached proof lookup takes 0.062 microseconds. This is a DashMap get operation that returns the pre-verified proof status for a given attestation digest.
In our sustained benchmark (120 seconds of continuous load), the ZKP stage adds less than 0.4 microseconds of average latency per authentication, accounting for the mix of cache hits and occasional cache misses. At the production throughput of 2,209,429 authentications per second, the ZKP system processes approximately 69,000 proof verifications per second (one per batch of 32) while serving millions of cached lookups.
Comparison to Other Approaches
How does H33's STARK approach compare to alternative ZKP strategies used in production?
Groth16 (used by Zcash and many DeFi protocols) produces very small proofs (roughly 128 bytes on BN254, 192 bytes on BLS12-381) and has fast verification (approximately 1 millisecond). However, it requires a trusted setup ceremony, is not post-quantum secure, and the prover is extremely expensive. Groth16 is optimized for on-chain verification where proof size is critical. For server-side authentication, the trusted setup is an unnecessary liability.
PLONK (used by several L2 rollups) eliminates the per-circuit trusted setup but still requires a universal setup. It has flexible constraint expressiveness but generally slower prover performance than STARKs. In its standard KZG-based form, PLONK is also built on elliptic curve pairings and is therefore not quantum-resistant.
Bulletproofs (used by Monero) require no trusted setup and produce compact proofs. However, verification time grows linearly with the size of the statement being proven, even though the proof itself is logarithmic in size, making it slower than STARK verification for complex computations. Bulletproofs are also based on the discrete logarithm problem and are vulnerable to quantum attacks.
H33's STARK system has larger proofs (approximately 45 kilobytes versus 128 bytes for Groth16) but is the only option that is simultaneously transparent, post-quantum secure, and fast enough for real-time authentication workloads. Since our proofs are verified server-side and cached, the larger proof size is irrelevant—we are not putting these proofs on a blockchain where every byte costs gas.
Patent Coverage and Intellectual Property
H33's STARK proof system is covered by our patent portfolio, specifically Claims 124-125 which address batched Merkle response attestation. The batch-prove-then-cache architecture, the 7-column AIR constraint system, and the integration with the FHE and signature pipeline are all part of the protected technology.
The open-source verifier is released under a permissive license, and anyone can use it to verify H33 proofs. The proving engine, caching layer, and integration pipeline remain proprietary.
What Comes Next
We are working on three extensions to the STARK system that will ship in the coming months.
First, recursive proof composition. Currently, each batch proof is independent. Recursive composition would allow us to combine multiple batch proofs into a single proof that attests to thousands of authentications. This is relevant for audit trails and compliance reporting, where a single proof can demonstrate that an entire day's worth of authentications was computed correctly.
Second, client-side verification. While the verifier is already open-source, we are building a WebAssembly version that can run in the browser. This will allow end users to verify proofs directly without trusting any server.
Third, cross-proof attestation. We are building the infrastructure to link STARK proofs with our H33-74 attestation format, so that a 74-byte attestation can reference a STARK proof that provides a deeper level of computational integrity verification.
Conclusion
Shipping STARK proofs in production is not a matter of plugging in a library. It requires a custom constraint system designed for the specific computation, a caching architecture that decouples proof generation from request serving, a concurrency model that handles dozens of parallel workers, and deep integration with the rest of the cryptographic pipeline.
H33 has built all of this. Our 7-column AIR encodes the full authentication verification pipeline. Our DashMap cache delivers proof lookups in 0.062 microseconds. Our batch generation amortizes proving costs across 32 authentications. And the entire system is post-quantum secure, relying only on SHA3-256 collision resistance.
If you are evaluating ZKP solutions for production use, ask your vendor these questions: What is your constraint system? How many columns is your AIR? What is your cached verification latency? How do you handle concurrent proof generation? If they cannot answer, they are not shipping real STARK proofs. They are shipping demos.
Try H33 STARK Verification
The H33 STARK verifier is open source. Download it, inspect it, and verify any proof generated by the H33 platform independently.