You want production-grade biometric authentication with FHE, zero-knowledge proofs, and post-quantum signatures. There are two paths: build it yourself from open-source libraries, or use H33. One path takes 200-900ms per authentication. The other takes 1.28ms.
This post walks through the DIY path, component by component, and explains where the time goes.
The Shopping List
To build what H33's CollectiveAuthority pipeline does, you need six components. Each one is a separate library with its own API, its own data formats, and its own performance characteristics.
The DIY Bill of Materials (6 libraries)
Every library boundary is a cost center. Every time data crosses from one library to another, you pay a serialization tax: encode the output of library A into bytes, decode them into library B's format, validate, allocate new memory. These boundaries add up.
Where the Time Actually Goes
Layer 1: FHE (2.85ms)
SEAL's BFV implementation works. Encrypt a biometric template (680µs), compute Euclidean distance on the ciphertext (1,530µs), relinearize (490µs), decrypt the result (150µs). Total: 2.85ms on server hardware. Multiply + relinearize is 71% of SEAL's time.
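As a quick sanity check, the component numbers above do add up. The figures below are quoted from this post, not re-measured:

```rust
// Sanity-check the SEAL latency budget quoted above (microseconds).
// These figures come from the post, not from a live benchmark.
fn bfv_total_us() -> u32 {
    let (encrypt, multiply, relin, decrypt) = (680, 1_530, 490, 150);
    encrypt + multiply + relin + decrypt // 2,850 µs = 2.85 ms
}

fn mul_relin_share_pct() -> u32 {
    // Multiply + relinearize as a share of the whole pipeline.
    (1_530 + 490) * 100 / bfv_total_us() // 70 (the post rounds to 71%)
}

fn main() {
    println!("total: {} µs", bfv_total_us());
    println!("mul+relin share: ~{}%", mul_relin_share_pct());
}
```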
H33 does the same FHE operations plus threshold decryption, ZK proofs, and PQ signatures in 1.28ms—2.2× faster single-thread. The difference comes from a single modulus Q=56 (vs SEAL's multi-prime Q=109), Montgomery NTT with Harvey lazy reduction (no division in the hot path), no relinearization (the auth circuit is shallow), and batch CBD sampling (5× faster noise generation). At production scale with SIMD batching: 12.5× faster.
Layer 2: Threshold Decryption (~50-100ms)
SEAL doesn't do threshold decryption. The ciphertext decrypts with a single key. For production authentication, that's a non-starter—a single-key architecture means one compromised node exposes every user.
You need Shamir secret sharing with a k-of-n threshold. Building this from scratch means:
- Splitting the secret key into n shares at setup
- Coordinating k decryption parties at runtime
- Combining partial decryptions with Lagrange interpolation
- Validating each share's contribution
A clean implementation in C++ or Rust takes 50-100ms. Most of that time goes to coordination overhead, not the math. H33's integrated threshold runs in ~330µs of partial decryption plus ~75µs to combine, because there's no IPC, no serialization, and share validation is fused with the combination step.
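To make the moving parts concrete, here is a minimal k-of-n Shamir sketch over a small prime field. It is illustrative only: a production threshold-FHE scheme shares an RLWE secret key and combines partial decryptions rather than reconstructing the raw secret, and it needs a cryptographic RNG and constant-time field arithmetic.

```rust
// Minimal Shamir k-of-n sketch over the prime field p = 2^31 - 1.
// Illustrative only: not constant-time, no secure RNG, and it recombines
// the raw secret rather than partial decryptions.
const P: u64 = 2_147_483_647; // 2^31 - 1, prime

fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}
fn inv_mod(a: u64) -> u64 { pow_mod(a, P - 2) } // Fermat inverse

// Evaluate the sharing polynomial secret + c1*x + c2*x^2 + ... at x.
fn share(secret: u64, coeffs: &[u64], x: u64) -> u64 {
    let mut acc = 0;
    for &c in coeffs.iter().rev() {
        acc = (acc * x + c) % P;
    }
    (acc * x + secret) % P
}

// Lagrange interpolation at x = 0 recovers the secret from any k shares.
fn combine(points: &[(u64, u64)]) -> u64 {
    let mut secret = 0;
    for (i, &(xi, yi)) in points.iter().enumerate() {
        let (mut num, mut den) = (1, 1);
        for (j, &(xj, _)) in points.iter().enumerate() {
            if i != j {
                num = num * (P - xj % P) % P;        // (0 - xj)
                den = den * ((P + xi - xj) % P) % P; // (xi - xj)
            }
        }
        secret = (secret + yi * num % P * inv_mod(den)) % P;
    }
    secret
}

fn main() {
    let secret = 123_456_789;
    let coeffs = [987_654, 42]; // degree 2 => threshold k = 3
    let shares: Vec<(u64, u64)> =
        (1..=5).map(|x| (x, share(secret, &coeffs, x))).collect();
    assert_eq!(combine(&shares[0..3]), secret); // any 3 of 5 recover it
    assert_eq!(combine(&shares[2..5]), secret);
    println!("recovered: {}", combine(&shares[1..4]));
}
```

The coordination overhead the post describes is everything around this math: getting k parties to produce their contributions and transporting them, which is exactly what a fused in-process pipeline avoids.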
Layer 3: Zero-Knowledge Proof (~50-200ms)
You need to prove the computation was correct without revealing the inputs. Off-the-shelf options:
- Groth16 (snarkjs): ~100-300ms proof generation for a simple circuit
- PLONK (various): ~80-200ms depending on implementation
- RISC Zero: ~500ms-2s for general programs
H33 uses a Circle STARK over the M31 field with Poseidon2 hashing. Proving takes <20ms (async). Verification takes 2.09ns (cached). The proof is ~180KB. This is possible because the circuit is purpose-built for authentication—it's not a general-purpose VM executing arbitrary programs.
Layer 4: Post-Quantum Signatures (~1-5ms)
liboqs provides reference implementations of NIST post-quantum algorithms. Dilithium sign + verify through liboqs typically runs 1-5ms depending on compilation flags and platform.
H33's native Rust Dilithium runs in 238µs end-to-end (7.4µs batch-amortized). That's 4-21× faster than liboqs, primarily because H33's implementation shares the FHE layer's Montgomery NTT infrastructure: shared twiddle tables, shared SIMD paths, shared memory pools.
Layer 5: Attestation Chain (~5-20ms)
You also need a SHA3 attestation chain that ties the FHE result, ZK proof, and PQ signature together into a single verifiable bundle. This is pure glue code, but it involves hashing, serialization, and typically JSON or protobuf encoding for the attestation record.
In a DIY stack, this is where you discover that SEAL outputs seal::Ciphertext objects, your ZK library outputs Proof structs, and liboqs outputs raw byte arrays. Making them talk to each other is the tax you pay for integration.
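The chaining itself is simple. In the sketch below, Rust's std `DefaultHasher` is purely a stand-in for SHA3-256 (the standard library ships no SHA3), and the artifact bytes and fold order are invented for illustration; a real chain needs a collision-resistant hash and a canonical byte encoding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// One link of the chain: bind the running digest to the next artifact.
// DefaultHasher stands in for SHA3-256 here; do not use it for security.
fn link(prev: u64, artifact: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    prev.hash(&mut h);     // bind this record to the previous one
    artifact.hash(&mut h); // bind it to the artifact's bytes
    h.finish()
}

fn main() {
    // The three artifacts the post names, as opaque byte blobs.
    let fhe_result: &[u8] = b"fhe-ciphertext-bytes";
    let zk_proof: &[u8] = b"stark-proof-bytes";
    let pq_signature: &[u8] = b"dilithium-signature-bytes";

    // Fold them into one chained head; a verifier replaying the same
    // bytes in the same order reproduces the same head.
    let mut head = 0u64;
    for artifact in [fhe_result, zk_proof, pq_signature] {
        head = link(head, artifact);
    }
    println!("attestation head: {:016x}", head);
}
```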
Layer 6: Biometric Encoding (~50-500ms)
Before any crypto happens, you need to encode the raw biometric signal into a fixed-dimension vector suitable for FHE computation. This includes normalization, quantization, and quality checks. Off-the-shelf face encoding pipelines (FaceNet, ArcFace) run 50-500ms depending on model size and hardware.
H33's encoding pipeline runs in ~300µs because it operates on pre-extracted embeddings with a purpose-built quantization scheme matched to the BFV plaintext space.
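A minimal sketch of that quantization step, with an invented scale factor and a common BFV plaintext prime; H33's real parameters are not published in this post.

```rust
// Hypothetical quantizer mapping a unit-normalized f32 embedding into a
// BFV plaintext space. SCALE (1024) is invented for illustration;
// 65,537 is a commonly used BFV plaintext prime.
const SCALE: f32 = 1024.0;
const PLAIN_MODULUS: i64 = 65_537;

fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

// Each coordinate lands in [-SCALE, SCALE], then is reduced into [0, t)
// so the encrypted Euclidean distance stays inside the plaintext modulus.
fn quantize(v: &[f32]) -> Vec<i64> {
    l2_normalize(v)
        .iter()
        .map(|x| ((x * SCALE).round() as i64).rem_euclid(PLAIN_MODULUS))
        .collect()
}

fn main() {
    let embedding = [0.6f32, 0.8, 0.0];
    println!("{:?}", quantize(&embedding)); // [614, 819, 0]
}
```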
The Integration Tax
The individual library latencies explain maybe half the gap. The other half is integration overhead:
| Overhead Source | DIY Cost | H33 Cost |
|---|---|---|
| Serialization between libraries | 5-15ms | 0 (fused pipeline) |
| Memory allocation / copying | 3-10ms | ~50µs (arena pools) |
| Format conversion | 2-8ms | 0 (native types) |
| Error handling / retry logic | 1-5ms | 0 (single Result chain) |
| Thread coordination | 2-10ms | ~10µs (Rayon work-stealing) |
| Total integration overhead | 13-48ms | <100µs |
In a fused pipeline, data stays in the same memory space, in the same type system, in the same thread pool. There are no serialization boundaries because there's nothing to serialize—the output of BFV encrypt is already in the format that FHE distance expects.
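The point is easiest to see in code. In the toy pipeline below every type is an invented stand-in (this is not H33's API): each stage consumes the previous stage's native output, so there is nothing to serialize, and a single `?` chain replaces per-library error handling.

```rust
// Toy fused pipeline. All types and functions are illustrative stand-ins,
// not H33's API: the point is that each stage's output type IS the next
// stage's input type, so no encode/decode boundary ever appears.
struct Ciphertext(Vec<u64>);
struct DistanceCt(Vec<u64>);
struct Attested(Vec<u64>);

#[derive(Debug)]
struct AuthError(&'static str);

fn encrypt(template: &[u64]) -> Result<Ciphertext, AuthError> {
    Ok(Ciphertext(template.to_vec())) // placeholder for BFV encrypt
}
fn fhe_distance(ct: Ciphertext) -> Result<DistanceCt, AuthError> {
    Ok(DistanceCt(ct.0)) // placeholder: real code squares/sums in the ring
}
fn attest(d: DistanceCt) -> Result<Attested, AuthError> {
    Ok(Attested(d.0)) // placeholder for the attestation chain
}

// One function, one error type, zero serialization boundaries.
fn authenticate(template: &[u64]) -> Result<Attested, AuthError> {
    attest(fhe_distance(encrypt(template)?)?)
}

fn main() {
    let out = authenticate(&[1, 2, 3]).expect("auth failed");
    println!("attested {} coefficients", out.0.len());
}
```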
What You Give Up Going DIY
Beyond performance, the DIY path has structural gaps:
Missing from the DIY stack
- No unified security proof. Each library has its own security model. The composition may have gaps.
- No shared NTT infrastructure. SEAL's NTT, liboqs's NTT, and your ZK library's NTT are three separate implementations doing the same math.
- No production hardening. Side-channel resistance, constant-time operations, and fault injection detection are per-library concerns.
- No upgrade path. When NIST finalizes ML-DSA-87, you update liboqs. When SEAL releases 5.0, you update SEAL. Version compatibility is your problem.
- 6+ dependencies to audit. Each library has its own CVE surface. Supply chain risk multiplies.
The Math
At the best case (200ms DIY, well-optimized):
200ms ÷ 1.28ms = 156.3x slower
Throughput: 5 auth/sec (DIY) vs — (H33)
Daily: 432K auths (DIY) vs billions/day (H33)
Annual server cost at 10M auths/day:
DIY: ~24 servers × $2.40/hr × 24h ≈ $1,382/day
H33: 2 servers × $2.40/hr × 24h ≈ $115/day
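The cost arithmetic checks out; server counts and the $2.40/hr rate are the post's figures, and integer division drops the cents:

```rust
// Reproduce the per-day server-cost arithmetic above.
// Rate is expressed in cents/hour to keep the math in integers.
fn daily_cost_usd(servers: u32, hourly_rate_cents: u32) -> u32 {
    servers * hourly_rate_cents * 24 / 100
}

fn main() {
    println!("DIY: ${}/day", daily_cost_usd(24, 240)); // 24 × $2.40 × 24h
    println!("H33: ${}/day", daily_cost_usd(2, 240));  //  2 × $2.40 × 24h
}
```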
At the worst case (900ms DIY, reference implementations):
900ms ÷ 1.28ms = 703.1x slower
Throughput: 1.1 auth/sec (DIY) vs — (H33)
Daily: 95K auths (DIY) vs billions/day (H33)
Bottom line
H33 is 156-703× faster than a DIY equivalent stack while providing stronger security guarantees (unified security proof, shared constant-time NTT, single audit surface) and costing a fraction of the infrastructure.
When DIY Makes Sense
To be fair: if you only need FHE without ZK proofs, without post-quantum signatures, and without threshold decryption, then SEAL is a fine library. If you're doing research on new FHE schemes, SEAL is the right tool.
But if you're building production authentication—where you need the complete security stack running at production throughput—the 200-900ms penalty of bolting libraries together isn't a performance problem. It's an architecture problem. And architecture problems don't get fixed with faster hardware.
Skip the Integration
One API call. Seven cryptographic operations. 1.28ms. Get your free API key and see the difference.
Get Free API Key