Authentication systems have traditionally forced developers to choose between security depth and latency. Adding biometric checks introduces round-trip delays. Layering on zero-knowledge proofs means another service call. Post-quantum signature verification adds yet another hop. H33's Full Stack Auth endpoint removes that trade-off by collapsing biometric verification, fully homomorphic encryption, zero-knowledge proof generation, and post-quantum attestation into a single API call. Requests are processed in 32-user batches, and the amortized cost works out to roughly 42 microseconds per authentication on production hardware.
This guide walks through what happens inside that call, how to integrate it into your application, and why the architecture decisions behind it matter for real-world security posture.
Anatomy of a Full Stack Auth Call
When your application sends a request to the /v1/auth/full-stack endpoint, three distinct cryptographic stages execute in sequence within a single process. No network hops between services, no message queues, no serialization overhead. Every stage runs in the same memory space on the same core.
Stage 1: FHE Biometric Matching
The user's biometric template, a 128-dimensional feature vector extracted from a face scan, fingerprint, or voice sample, arrives already encrypted under the BFV fully homomorphic encryption scheme. H33 uses SIMD batching to pack 32 user templates into a single ciphertext: at polynomial degree N=4096 with plaintext modulus t=65537 there are 4,096 polynomial slots, and 4,096 / 128 = 32 templates fit side by side. The server computes an encrypted inner product between the probe template and the enrolled reference without ever decrypting either one.
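The slot arithmetic can be illustrated with a plaintext analogue. The sketch below is not FHE: it only shows how 32 templates of 128 dimensions fill 4,096 slots and how a slot-wise multiply followed by a per-block sum yields one inner product per user. The contiguous block layout and the function names are assumptions made for illustration; the article does not describe H33's actual packing order.

```python
N_SLOTS = 4096                  # polynomial slots at N=4096 (from the text)
DIM = 128                       # template dimensionality (from the text)
T = 65537                       # plaintext modulus t (from the text)
USERS_PER_CT = N_SLOTS // DIM   # 32 templates per ciphertext


def pack(templates):
    """Place each template in its own contiguous 128-slot block (assumed layout)."""
    if len(templates) > USERS_PER_CT:
        raise ValueError("too many templates for one ciphertext")
    slots = [0] * N_SLOTS
    for user, template in enumerate(templates):
        if len(template) != DIM:
            raise ValueError("template must have 128 dimensions")
        for i, value in enumerate(template):
            slots[user * DIM + i] = value % T
    return slots


def batched_inner_products(probe_slots, ref_slots):
    """Slot-wise multiply, then sum each 128-slot block, all modulo t."""
    products = [(p * r) % T for p, r in zip(probe_slots, ref_slots)]
    return [
        sum(products[user * DIM:(user + 1) * DIM]) % T
        for user in range(USERS_PER_CT)
    ]


# Toy data: user u has a constant template of value u + 1; references are all 2.
probes = pack([[user + 1] * DIM for user in range(USERS_PER_CT)])
refs = pack([[2] * DIM for _ in range(USERS_PER_CT)])
scores = batched_inner_products(probes, refs)
```

Because all arithmetic is modulo t, templates would need to be quantized so that each true inner product stays below 65537; otherwise the score wraps around.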
This is not a toy demonstration of FHE. The BFV engine uses Montgomery-form NTT arithmetic with Harvey lazy reduction, keeping all butterfly values in the range [0, 2q) between stages to eliminate costly modular divisions from the hot path. Enrolled templates are stored pre-transformed in NTT domain, which means the multiply-and-accumulate step skips the forward NTT entirely. The result: a 32-user batch completes in approximately 1,109 microseconds on Graviton4 hardware, or about 35 microseconds per user for the FHE stage alone.
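To see why keeping the enrolled operand in NTT form saves work, consider a deliberately tiny transform. The sketch below uses a cyclic NTT of length 4 modulo 17 with a naive O(n²) evaluation; these parameters are toy assumptions, whereas BFV at N=4096 uses a negacyclic transform over much larger moduli and the Montgomery and lazy-reduction techniques described above. The point is only that the enrolled vector is transformed once, ahead of time, and each request then needs a pointwise multiply rather than a second forward transform.

```python
Q = 17       # toy prime modulus (assumption; real BFV moduli are far larger)
N = 4        # toy transform length (the text uses N=4096)
OMEGA = 4    # primitive 4th root of unity mod 17, since 4**4 % 17 == 1


def ntt(values, root=OMEGA):
    """Naive forward transform: evaluate at successive powers of the root."""
    return [
        sum(values[j] * pow(root, i * j, Q) for j in range(N)) % Q
        for i in range(N)
    ]


def intt(values):
    """Inverse transform: use the inverse root and scale by N^-1 mod Q."""
    n_inv = pow(N, -1, Q)
    root_inv = pow(OMEGA, -1, Q)
    return [(n_inv * x) % Q for x in ntt(values, root_inv)]


def cyclic_convolution(a, b):
    """Reference result computed directly in the coefficient domain."""
    return [
        sum(a[i] * b[(k - i) % N] for i in range(N)) % Q
        for k in range(N)
    ]


enrolled = [3, 1, 4, 1]
ENROLLED_NTT = ntt(enrolled)   # computed once, at enrollment time


def multiply_with_enrolled(probe):
    """Per request: transform only the probe, then multiply pointwise."""
    probe_ntt = ntt(probe)
    return intt([(x * y) % Q for x, y in zip(probe_ntt, ENROLLED_NTT)])
```

If the probe also arrives in NTT form, the remaining forward transform disappears from the request path as well, which matches the "NTT-domain persistence" idea.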
Traditional biometric systems decrypt templates server-side, exposing raw biometric data in memory. If an attacker compromises the server, they harvest every user's fingerprint or face encoding permanently — biometrics cannot be rotated like passwords. FHE matching means the server never sees plaintext biometrics, even during comparison.
Stage 2: Zero-Knowledge Proof Verification
Once the encrypted match score is computed, the system generates a zero-knowledge proof attesting that the match exceeded the configured threshold — without revealing the score itself or any information about the biometric templates. The ZKP subsystem uses STARK-based lookup arguments (plookup) with SHA3-256 as the hash primitive, making it post-quantum secure by construction since no elliptic curve assumptions are involved.
In production, H33 caches ZKP results using an in-process DashMap rather than an external cache service. This architectural decision proved critical at scale: when we tested a TCP-based cache proxy (Cachee over RESP) with 96 concurrent workers, throughput collapsed from 1.51 million auth/sec to just 136,000 — an 11x regression caused by connection serialization at the single proxy. The in-process DashMap delivers lookups in 0.085 microseconds, which is 44x faster than recomputing the raw STARK proof, with zero network contention.
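The in-process caching pattern can be sketched in a few lines. This is a hypothetical Python stand-in, not H33's code: a plain dict guarded by a lock plays the role of DashMap, and the choice to key entries by a SHA3-256 digest of the statement bytes is an assumption, since the article does not say what the cache is keyed on. The `prove` callable represents the expensive STARK path.

```python
import hashlib
import threading


class ProofCache:
    """In-process proof cache; a dict plus a lock stands in for DashMap."""

    def __init__(self, prove):
        self._prove = prove              # expensive proof function (hypothetical)
        self._entries = {}
        self._lock = threading.Lock()
        self.misses = 0                  # number of times the slow path ran

    def get_or_prove(self, statement: bytes) -> bytes:
        key = hashlib.sha3_256(statement).digest()   # assumed cache key
        with self._lock:
            cached = self._entries.get(key)
        if cached is not None:
            return cached
        proof = self._prove(statement)   # slow path runs outside the lock
        with self._lock:
            self.misses += 1
            # setdefault keeps the first stored proof if two threads raced.
            return self._entries.setdefault(key, proof)
```

A lookup here is a hash plus a memory access in the caller's own process, so there is no socket or shared proxy connection for 96 workers to queue behind.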
Stage 3: Post-Quantum Attestation
The final stage produces a cryptographic attestation binding the authentication result to a timestamp and session context. H33 uses CRYSTALS-Dilithium (ML-DSA) for signatures, a lattice-based scheme selected by NIST for post-quantum standardization. A SHA3-256 digest of the authentication payload is computed, then signed and verified with Dilithium keys.
A key optimization here is batch attestation: rather than signing each of the 32 users in a batch individually, the system produces a single Dilithium signature covering the entire batch. This reduces the attestation cost from 32 sign-and-verify cycles to just one, cutting that stage's contribution from roughly 7,800 microseconds down to about 244 microseconds per batch.
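Batch attestation amounts to hashing all per-user results into one digest and signing that digest once. The sketch below shows the shape of the idea under stated assumptions: the length-prefixed encoding of results is hypothetical, and `placeholder_sign` uses HMAC-SHA-256 purely as a stand-in because the Python standard library has no Dilithium (ML-DSA) implementation. It is not a post-quantum signature and is not H33's attestation format.

```python
import hashlib
import hmac

BATCH_SIZE = 32
PER_SIGNATURE_US = 244   # per sign-and-verify cycle, from the text


def batch_digest(results):
    """SHA3-256 over all results, length-prefixed so boundaries are unambiguous."""
    h = hashlib.sha3_256()
    for result in results:
        h.update(len(result).to_bytes(4, "big"))
        h.update(result)
    return h.digest()


def attest_batch(results, sign):
    """Sign one digest for the whole batch instead of one per user."""
    return sign(batch_digest(results))


def placeholder_sign(digest, key=b"demo-key"):
    """Stand-in signer (HMAC-SHA-256). A real deployment would use ML-DSA."""
    return hmac.new(key, digest, hashlib.sha256).digest()


results = [f"user-{i}:match".encode() for i in range(BATCH_SIZE)]
attestation = attest_batch(results, placeholder_sign)

# Cost comparison implied by the text: 32 cycles versus 1.
individual_cost_us = BATCH_SIZE * PER_SIGNATURE_US   # 7808
batched_cost_us = PER_SIGNATURE_US                   # 244
```

One consequence of this design is that a verifier needs the full set of batch results (or an equivalent commitment) to recompute the digest, since the signature covers the batch as a whole rather than any single user.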
| Stage | Operation | Latency (32-user batch) | PQ-Secure |
|---|---|---|---|
| 1. FHE Batch | BFV inner product | ~1,109 µs | Yes (lattice) |
| 2. ZKP | In-process DashMap lookup | ~0.085 µs | Yes (SHA3-256) |
| 3. Attestation | SHA3 digest + Dilithium sign/verify | ~244 µs | Yes (ML-DSA) |
| Total | Full pipeline | ~1,356 µs | |
| Per auth | Total ÷ 32 | ~42 µs | |
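The table's arithmetic can be checked directly. The short calculation below uses only figures quoted in this article; the throughput ceiling additionally assumes that each of the 96 workers processes one 32-user batch at a time with no contention between workers, which is a simplifying assumption rather than a documented property. Note that the three itemized stages sum to slightly less than the stated total, so the stated total is used for the per-auth figure.

```python
BATCH_SIZE = 32
WORKERS = 96               # worker count quoted in the performance section
STAGE_US = {
    "fhe_batch": 1109.0,
    "zkp_lookup": 0.085,
    "attestation": 244.0,
}
STATED_TOTAL_US = 1356.0   # total as given in the table

itemized_us = sum(STAGE_US.values())          # about 1353.085
per_auth_us = STATED_TOTAL_US / BATCH_SIZE    # 42.375

# Idealized ceiling: every worker finishes one batch per STATED_TOTAL_US.
ceiling_per_sec = WORKERS * BATCH_SIZE / (STATED_TOTAL_US * 1e-6)

print(f"itemized stages: {itemized_us:.3f} µs")
print(f"per auth:        {per_auth_us:.3f} µs")
print(f"ideal ceiling:   {ceiling_per_sec:,.0f} auth/sec")
```

Under that assumption the ceiling comes out a little above the sustained 2,172,518 authentications per second reported later in this article, so the per-batch latency and the throughput figure are mutually consistent.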
Integration: Making Your First Call
Integrating Full Stack Auth requires three steps: enrolling a biometric template, requesting authentication, and verifying the attestation on your backend. Here is the authentication request:
POST /v1/auth/full-stack HTTP/1.1
Host: api.h33.ai
Authorization: Bearer h33_sk_your_api_key
Content-Type: application/json

{
  "user_id": "usr_8f3a9b2c",
  "biometric_probe": "<base64-encoded encrypted template>",
  "options": {
    "zk_proof": true,
    "attestation": "dilithium",
    "threshold": 0.92
  }
}
The response includes the encrypted match result, the ZKP proof bytes, and the Dilithium signature over the attestation payload. Your backend verifies the signature using H33's public verification key, which confirms both the identity match and the integrity of the entire pipeline without trusting H33's server at the application layer.
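For reference, the request above could be assembled from Python's standard library roughly as follows. This is a sketch rather than an official client: the `https` scheme and the `build_auth_request` helper are assumptions, and the encrypted probe bytes are expected to come from H33's client SDK. The response is deliberately not parsed because the article does not give the response field names.

```python
import base64
import json
import urllib.request

# Host and path are from the request shown above; the https scheme is assumed.
API_URL = "https://api.h33.ai/v1/auth/full-stack"


def build_auth_request(api_key, user_id, encrypted_probe, threshold=0.92):
    """Build (but do not send) a Full Stack Auth request."""
    body = {
        "user_id": user_id,
        # The probe must already be FHE-encrypted on the client device.
        "biometric_probe": base64.b64encode(encrypted_probe).decode("ascii"),
        "options": {
            "zk_proof": True,
            "attestation": "dilithium",
            "threshold": threshold,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending is then a matter of passing the returned object to `urllib.request.urlopen`, with a timeout and error handling appropriate to your application.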
Production Performance at Scale
On a Graviton4 c8g.metal-48xl instance (192 vCPUs, 377 GiB RAM), H33's Full Stack Auth sustains 2,172,518 authentications per second across 96 parallel workers. That number is not a burst figure — it is the sustained throughput measured over continuous load with all three cryptographic stages active on every request.
Several architectural decisions make this possible:
- System allocator over jemalloc — On aarch64, glibc's malloc is heavily optimized for ARM's flat memory model. jemalloc's arena bookkeeping becomes pure overhead under 96 workers doing tight FHE loops, causing an 8% throughput regression in testing.
- NTT-domain persistence — Public keys, enrolled templates, and intermediate ciphertext values remain in NTT (Number Theoretic Transform) form wherever possible, eliminating redundant forward and inverse transforms.
- Batch CBD sampling — Error polynomials for encryption use a single RNG call per 10 coefficients rather than per-coefficient sampling, yielding a 5x speedup in noise generation.
- Pre-computed delta*m — The plaintext scaling factor is computed once and cached, removing a u128 multiplication from the encrypt hot loop.
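The batch CBD sampling item can be made concrete with a small sketch. It assumes a centered binomial distribution with η = 3, so each coefficient consumes 6 random bits and ten coefficients fit in one 64-bit draw; the article states only the "one RNG call per 10 coefficients" ratio, so η and the bit layout are illustrative guesses. `random.Random` is used for reproducibility here, whereas real encryption noise would need a cryptographically secure generator.

```python
import random

COEFFS_PER_WORD = 10   # from the text: one RNG call per 10 coefficients
ETA = 3                # assumed CBD parameter: 3 + 3 bits per coefficient
BITS_PER_COEFF = 2 * ETA


def popcount(x):
    """Number of set bits in a small non-negative integer."""
    return bin(x).count("1")


def sample_cbd(n, rng):
    """Sample n centered-binomial coefficients in [-ETA, ETA]."""
    out = []
    while len(out) < n:
        word = rng.getrandbits(64)   # one draw feeds up to 10 coefficients
        for i in range(COEFFS_PER_WORD):
            if len(out) == n:
                break
            chunk = (word >> (BITS_PER_COEFF * i)) & 0x3F
            a = popcount(chunk & 0b111)   # low 3 bits
            b = popcount(chunk >> ETA)    # high 3 bits
            out.append(a - b)
    return out


noise = sample_cbd(4096, random.Random(0))
```

For a 4,096-coefficient error polynomial this needs 410 generator calls instead of 4,096, which is where the savings in the hot path would come from.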
The fastest cryptographic operation is the one you avoid. Every optimization in H33's pipeline either eliminates a transform, fuses two steps into one, or moves computation from request time to key-generation time.
Security Model and Trust Boundaries
Full Stack Auth is designed around the principle that the server should be untrusted. The FHE layer ensures biometric plaintext never exists on the server. The ZKP layer ensures the match decision is verifiable without revealing the score. The Dilithium attestation ensures the result cannot be forged or tampered with in transit. Even if an attacker gains full read access to server memory, the biometric data they find there is ciphertext only, and the attestations are lattice-based signatures. Both remain secure against quantum adversaries under current hardness assumptions.
This stands in contrast to conventional multi-factor authentication, where the server must decrypt and compare secrets at some point in the flow. With H33, the trust boundary sits at the client SDK level: as long as the biometric template is encrypted before leaving the device, the entire server-side pipeline operates on ciphertext alone.
When to Use Full Stack Auth
Full Stack Auth is the right choice when your application handles sensitive identity verification and you need defense-in-depth without latency penalties. Common use cases include financial services login flows, healthcare patient identity, government credentialing, and any system where regulatory frameworks demand that biometric data remain encrypted at rest and in transit. At approximately 42 microseconds per authentication, the endpoint introduces negligible overhead compared to a typical TLS handshake — which means security is no longer the bottleneck.
For applications that need only a subset of the pipeline — FHE matching without attestation, or ZKP generation without biometrics — H33 exposes each stage as an independent endpoint as well. But for the majority of production deployments where maximum security coverage matters, a single call to /v1/auth/full-stack delivers everything in one round trip.