
From Milliseconds to Microseconds: The H33 Performance Journey

When we started building H33, ZK proof generation took seconds and FHE operations took minutes. Sub-millisecond authentication with full cryptographic privacy seemed impossible. This is the story of how we got to 1.28ms full auth, 50µs session resume, and batch throughput of about 8.6M authentications per second.

"The goal was never just 'fast enough.' It was authentication so fast that adding security adds zero perceived latency."

The Starting Point

Our first prototype used off-the-shelf cryptographic libraries. The numbers were... humbling:

Early 2025
2.3 seconds
Initial ZK proof generation using standard libraries

2.3 seconds for a ZK proof. Acceptable for blockchain transactions. Completely unusable for authentication. Users would abandon the login before it completed.

Phase 1: Circuit Optimization

The first breakthrough came from rethinking our ZK circuits. Generic ZK circuits are designed for arbitrary computation. We needed circuits optimized specifically for authentication.

  • Removed unnecessary constraints: Our circuits don't need to prove arbitrary computation, just identity claims
  • Optimized hash functions: Switched to ZK-friendly hashes (Poseidon) for in-circuit operations
  • Reduced public inputs: Minimized data that must be revealed
Q2 2025
180ms
After circuit optimization (roughly 13x improvement)

180ms was usable but not invisible. Users would still perceive a slight delay.

Phase 2: The Rust Rewrite

Our JavaScript implementation hit a wall. Garbage collection pauses alone could exceed our latency budget. We rewrote the cryptographic core in Rust:

  • Zero-copy operations: No unnecessary buffer allocations
  • SIMD vectorization: AVX-512 for parallel field operations
  • Memory-mapped I/O: Direct hardware access for key operations
  • No GC pauses: Deterministic memory management
Q3 2025
4.2ms
After Rust rewrite (43x improvement)

4.2ms was getting close. Sub-10ms is generally imperceptible. But we knew we could do better.
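
To make the "zero-copy" and "no GC pauses" points concrete, here is a minimal sketch of in-place field addition over borrowed slices. It is illustrative only: the modulus `P` and the function name `add_assign_mod` are assumptions for this example, not H33's actual field or API. The loop allocates nothing, and its simple shape is the kind compilers can often auto-vectorize.

```rust
/// Illustrative modulus: the Mersenne prime 2^31 - 1.
/// This is an assumption for the example, not H33's actual field.
const P: u32 = 2_147_483_647;

/// Adds `rhs` into `acc` element-wise modulo `P`, writing results in place.
/// Precondition: every element of both slices is already reduced (< P).
fn add_assign_mod(acc: &mut [u32], rhs: &[u32]) {
    assert_eq!(acc.len(), rhs.len(), "slices must have equal length");
    for (a, b) in acc.iter_mut().zip(rhs) {
        // Both operands are below 2^31, so the sum fits in a u32.
        let sum = *a + *b;
        // One conditional subtraction brings the sum back into [0, P).
        *a = if sum >= P { sum - P } else { sum };
    }
}
```

Because the function borrows its inputs instead of returning a new vector, the caller decides when buffers are allocated and freed, which is what makes latency deterministic.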

Phase 3: Parallelism and Batching

Modern CPUs have many cores. Our single-threaded prover was leaving performance on the table:

  • Parallel witness generation: Independent witness components computed simultaneously
  • Batched NTT operations: Number Theoretic Transforms grouped for cache efficiency
  • Work stealing: Idle cores pick up work from busy cores
Q4 2025
890µs
After parallelization (4.7x improvement)

Sub-millisecond! But our batch processing revealed even more opportunity.
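
The parallel witness generation idea can be sketched with nothing but the standard library. This assumes each component depends only on its own input; `compute_component` is a hypothetical stand-in for real witness logic, and the static chunking here is simpler than the work stealing described above, which typically comes from a scheduler library.

```rust
use std::thread;

/// Hypothetical stand-in for computing one independent witness component.
fn compute_component(input: u64) -> u64 {
    input.wrapping_mul(input).wrapping_add(1)
}

/// Computes every component, splitting the inputs across up to `workers` threads.
fn parallel_witness(inputs: &[u64], workers: usize) -> Vec<u64> {
    let mut out = vec![0u64; inputs.len()];
    if inputs.is_empty() {
        return out;
    }
    let workers = workers.max(1);
    // Ceiling division so every input lands in some chunk.
    let chunk = (inputs.len() + workers - 1) / workers;
    // Scoped threads may borrow `inputs` and `out`; all are joined before the scope returns.
    thread::scope(|s| {
        for (src, dst) in inputs.chunks(chunk).zip(out.chunks_mut(chunk)) {
            s.spawn(move || {
                for (i, o) in src.iter().zip(dst.iter_mut()) {
                    *o = compute_component(*i);
                }
            });
        }
    });
    out
}
```

Each thread writes to a disjoint slice of the output, so no locks are needed.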

Phase 4: Intelligent Caching

We noticed that verification was often redundant. The same proofs were being verified multiple times:

  • Proof fingerprinting: Unique identifier for each proof
  • Cache-aware verification: Skip full verification for known-valid proofs
  • Session context caching: Reuse verification work across requests

The Cache Breakthrough

Cold verification: 2.14ms → Cached verification: 32µs
67x speedup for returning users
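
A minimal sketch of cache-aware verification, under stated assumptions: `VerifyCache` and `fingerprint` are hypothetical names, and the standard library's `DefaultHasher` stands in for the fingerprint function only to keep the example dependency-free. It is not collision-resistant, so a production system would need a cryptographic hash here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in fingerprint. NOT cryptographic; a real system needs a collision-resistant hash.
fn fingerprint(proof: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    proof.hash(&mut hasher);
    hasher.finish()
}

/// Remembers fingerprints of proofs that already passed full verification.
struct VerifyCache {
    known_valid: HashSet<u64>,
    /// Counts how often the expensive path actually ran.
    full_verifications: usize,
}

impl VerifyCache {
    fn new() -> Self {
        VerifyCache { known_valid: HashSet::new(), full_verifications: 0 }
    }

    /// Returns true if the proof is valid, running `full_verify` only on a cache miss.
    fn verify(&mut self, proof: &[u8], full_verify: impl Fn(&[u8]) -> bool) -> bool {
        let fp = fingerprint(proof);
        if self.known_valid.contains(&fp) {
            return true; // cache hit: skip the expensive check
        }
        self.full_verifications += 1;
        let ok = full_verify(proof);
        if ok {
            // Only valid proofs are cached; invalid ones are always re-checked.
            self.known_valid.insert(fp);
        }
        ok
    }
}
```

Caching only successful results is a deliberate choice in this sketch: a rejected proof never gets a fast path.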

Phase 5: Session Resume

If a user's context hasn't changed, why re-authenticate everything? Session resume verifies only that the session is still valid:

January 2026
50µs
Session resume for returning users
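
The session resume decision can be sketched as a single check. This is an assumption-laden illustration: the `Session` fields, the `AuthPath` enum, and the idea of comparing a context fingerprint plus an expiry time are hypothetical, since the exact validity criteria are not spelled out here.

```rust
/// Which path a request takes.
#[derive(Debug, PartialEq)]
enum AuthPath {
    Resume,
    FullAuth,
}

/// Hypothetical session record; field names are assumptions for this example.
struct Session {
    context_fingerprint: u64,
    expires_at_secs: u64,
}

/// Resumes only when a session exists, its context is unchanged, and it has not expired.
fn choose_path(session: Option<&Session>, current_context: u64, now_secs: u64) -> AuthPath {
    match session {
        Some(s) if s.context_fingerprint == current_context && now_secs < s.expires_at_secs => {
            AuthPath::Resume
        }
        // Missing, expired, or changed context falls back to full authentication.
        _ => AuthPath::FullAuth,
    }
}
```

Any doubt about the session sends the request down the full authentication path, so the fast path never weakens the slow one.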

The Final Numbers

Our January 2026 benchmarks represent the culmination of this journey:

  • Full Auth (Turbo): 1.28ms including biometrics (about 1,800x faster than the 2.3-second proof generation of our first prototype)
  • Session Resume: 50µs
  • Cached Proof Verify: 32µs (67x faster than cold)
  • Batch Auth (1000 users): 116µs per batch, about 8.6M authentications/second
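
For readers who want to check the ratios themselves, the arithmetic behind the cache speedup and the batch throughput can be reproduced with two small helpers. The function names are illustrative, and all inputs are in microseconds.

```rust
/// Speedup factor between a slower and a faster latency, both in microseconds.
fn speedup(before_us: f64, after_us: f64) -> f64 {
    before_us / after_us
}

/// Authentications per second when `batch_size` users complete in `batch_us` microseconds.
fn throughput_per_second(batch_size: f64, batch_us: f64) -> f64 {
    batch_size / (batch_us / 1_000_000.0)
}
```

For example, `speedup(2140.0, 32.0)` gives 66.875, which rounds to the 67x cache figure, and `throughput_per_second(1000.0, 116.0)` gives roughly 8.6 million per second.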

What We Learned

1. Specialize ruthlessly. Generic solutions are generic-speed. Every optimization came from understanding exactly what we needed to compute.

2. Measure everything. Intuition about performance is usually wrong. Profile, don't guess.

3. Question assumptions. "ZK proofs are slow" was an assumption. We challenged it.

4. Cache is king. The fastest computation is the one you don't do. Our 67x cache speedup proves it.

5. Hardware matters. SIMD, cache locality, memory bandwidth—understanding hardware unlocked our biggest gains.

What's Next

We're not done. Our roadmap includes:

  • GPU acceleration for batch proof generation
  • Custom ASIC design for FHE operations
  • Recursive proof composition for unlimited scale
  • Sub-10µs session validation

The journey from milliseconds to microseconds taught us that performance limits are often just engineering challenges. We're excited to see how much further we can go.

Experience the Performance

1.28ms full auth. 50µs session resume. High-throughput auth. Try it yourself.
