The Problem with Biometric Authentication
Biometric authentication is the strongest identity signal available. Passwords can be shared. MFA tokens can be phished. But your facial geometry, fingerprint minutiae, and iris pattern are uniquely yours. The problem isn't the biometric — it's how every system handles the biometric template.
Today's biometric systems follow a pattern that would horrify any security engineer if applied to passwords: they decrypt the stored template, load it into plaintext memory, perform a distance comparison, and then (hopefully) wipe the memory. During that comparison window, the template exists unencrypted in RAM. It can be extracted via memory dumps, cold-boot attacks, speculative execution side-channels, or a compromised process on the same host.
When a password database is breached, you issue new passwords. When a biometric template database is breached, you cannot issue new faces. The damage is permanent. The Illinois Biometric Information Privacy Act (BIPA) has generated over $5 billion in settlements since 2023 precisely because legislators understand this irreversibility.
The regulatory landscape reflects the severity. BIPA imposes $5,000 per willful violation. GDPR Article 9 classifies biometric data as a special category requiring explicit consent and enhanced safeguards. CCPA/CPRA treats biometric information as sensitive personal information with strict purpose limitations. HIPAA applies when biometrics are used in healthcare contexts. And these regulations are tightening, not loosening.
The core question: can you authenticate a biometric without ever seeing the biometric?
What If Templates Never Decrypted?
Fully homomorphic encryption (FHE) allows computation directly on ciphertext. You can add, multiply, and compare encrypted values without ever decrypting them. The result of the computation is itself encrypted — only the key holder can decrypt the final answer.
For biometric matching, this means: encrypt the stored template once during enrollment. Encrypt the probe template during authentication. Compute the inner product (similarity score) between the two encrypted vectors. The result — a single encrypted similarity score — is the only value that ever gets decrypted. The templates themselves remain encrypted throughout the entire pipeline.
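As a plaintext point of reference, the score that the encrypted pipeline is described as producing can be sketched in a few lines. This is a minimal illustration, not H33 code: the names `normalize`, `dot`, and `is_match` are hypothetical, and it assumes embeddings are L2-normalized so that the inner product equals cosine similarity.

```rust
/// Scale a vector to unit length (L2 norm of 1).
fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    v.iter().map(|x| x / norm).collect()
}

/// Inner product of two equal-length vectors.
/// For unit-length inputs this is the cosine similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Accept the probe when its similarity to the enrolled template
/// reaches the caller-supplied threshold.
fn is_match(probe: &[f32], enrolled: &[f32], threshold: f32) -> bool {
    dot(probe, enrolled) >= threshold
}
```

In the encrypted setting, only the value returned by `dot` would ever be decrypted; both input vectors would stay in ciphertext.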
H33 uses the BFV (Brakerski/Fan-Vercauteren) FHE scheme with parameters tuned for biometric inner product matching:
- Polynomial degree N = 4,096 — determines the number of SIMD slots available for parallel computation.
- Plaintext modulus t = 65,537 — satisfies the CRT batching condition t ≡ 1 (mod 2N), enabling 4,096 independent slots per ciphertext.
- Single 56-bit ciphertext modulus Q — H33-128 security level with minimal noise budget overhead for inner product depth.
- 128-dimensional facial embeddings — standard output from modern face recognition models (ArcFace, AdaFace, CosFace).
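The arithmetic behind these parameters is easy to check yourself. The sketch below is a standalone sanity check with hypothetical helper names; it verifies the batching condition and the slots-per-user division, and it uses trial division for primality because t is small.

```rust
const N: u64 = 4096; // polynomial degree
const T: u64 = 65_537; // plaintext modulus
const DIM: u64 = 128; // embedding dimension

/// CRT batching needs t ≡ 1 (mod 2N).
fn batching_condition_holds(t: u64, n: u64) -> bool {
    t % (2 * n) == 1
}

/// Trial-division primality test; adequate for a 17-bit modulus.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// How many embeddings of `dim` values fit into `n` slots.
fn users_per_ciphertext(n: u64, dim: u64) -> u64 {
    n / dim
}
```

Here 65,537 − 1 = 65,536 = 8 × 8,192, so the condition holds with 2N = 8,192.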
The security of BFV rests on the Ring Learning With Errors (RLWE) problem — a lattice-based hardness assumption that is believed resistant to both classical and quantum attack. This means H33's biometric matching is post-quantum secure by construction, not by adding a post-quantum wrapper after the fact.
The Benchmark
Every number in this section is measured, not estimated. The benchmark runs on
c8g.metal-48xl (AWS Graviton4, 192 vCPUs, 377 GiB RAM) using Criterion.rs v0.5
with 120-second sustained measurement windows. The full authentication pipeline —
from encrypted biometric matching through post-quantum attestation and ZKP verification —
breaks down as follows:
| Stage | Operation | Latency | % Pipeline | Notes |
|---|---|---|---|---|
| 1 | FHE Batch (32 users) | 939 µs | 76.2% | BFV inner product, NTT-domain fused |
| 2 | Dilithium Attestation | 291 µs | 23.6% | 1 sign+verify per 32-user batch |
| 3 | ZKP Cache Lookup | 0.059 µs | <0.01% | STARK proof via DashMap cache |
| 4 | ML Agents | ~2.35 µs | 0.19% | Harvest + SideChannel + CryptoHealth |
| Total | 32-user batch | 1,232 µs | 100% | |
| Per authentication | | 38.5 µs | | 1,232 µs ÷ 32 users |
Faster than a blink of an eye (300,000 µs). Faster than a network round-trip to the nearest data center. Faster than a single frame at 240fps (4,167 µs). Thirty-two users authenticated on encrypted biometric data in the time it takes most systems to decrypt a single template.
Pipeline Breakdown
The FHE inner product dominates the pipeline at 76.2%. This is expected — homomorphic multiplication is the computationally expensive operation. Everything else combined accounts for less than a quarter of total latency.
The Dilithium attestation (ML-DSA sign + verify) provides post-quantum integrity for the entire batch result. One signature covers all 32 users, amortizing the 291µs cost to ~9µs per user. The ZKP cache lookup and ML security agents add negligible overhead. See the full optimization journey for how each stage was tuned.
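The totals and percentages in the table follow directly from the four stage latencies. The snippet below reproduces that arithmetic; the constants are copied from the table and the function names are illustrative only.

```rust
/// Stage latencies in microseconds, copied from the benchmark table.
const STAGES_US: [(&str, f64); 4] = [
    ("FHE batch (32 users)", 939.0),
    ("Dilithium attestation", 291.0),
    ("ZKP cache lookup", 0.059),
    ("ML agents", 2.35),
];
const BATCH_SIZE: f64 = 32.0;

/// Total latency of one 32-user batch.
fn total_us() -> f64 {
    STAGES_US.iter().map(|(_, us)| us).sum()
}

/// Share of the pipeline taken by a stage, in percent.
fn share_percent(stage_us: f64) -> f64 {
    100.0 * stage_us / total_us()
}

/// Amortized latency per authentication.
fn per_auth_us() -> f64 {
    total_us() / BATCH_SIZE
}
```

The same division gives the amortized attestation cost: 291 µs ÷ 32 is about 9 µs per user.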
How SIMD Batching Works
The BFV scheme supports Single Instruction, Multiple Data (SIMD) batching through the
Chinese Remainder Theorem (CRT). When the plaintext modulus t satisfies
t ≡ 1 (mod 2N), the polynomial ring decomposes into N
independent slots. Each slot can hold a separate integer, and all homomorphic operations
execute on all slots in parallel.
With N = 4,096 and each biometric embedding using 128 dimensions:
4,096 slots ÷ 128 dimensions = 32 users per ciphertext. Each user's 128-dimensional embedding occupies 128 consecutive slots. A single FHE inner product operation matches all 32 users simultaneously — same ciphertext, same computational cost as matching one user.
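The slot layout described above can be sketched on plain integers. This is an illustration of the packing order only, with hypothetical names (`pack`, `slot_range`); it does not perform any encryption, and the real encoder is not shown in this article.

```rust
use std::ops::Range;

const SLOTS: usize = 4096; // N
const DIM: usize = 128; // values per embedding
const MAX_USERS: usize = SLOTS / DIM; // 32

/// Slots occupied by the user at position `user` in the batch.
fn slot_range(user: usize) -> Range<usize> {
    user * DIM..(user + 1) * DIM
}

/// Lay out up to 32 embeddings in consecutive 128-slot blocks.
/// Unused blocks stay zero. Returns None if the batch is too large.
fn pack(embeddings: &[[u16; DIM]]) -> Option<Vec<u16>> {
    if embeddings.len() > MAX_USERS {
        return None;
    }
    let mut slots = vec![0u16; SLOTS];
    for (user, emb) in embeddings.iter().enumerate() {
        slots[slot_range(user)].copy_from_slice(emb);
    }
    Some(slots)
}
```

User 0 lands in slots 0 to 127, user 1 in slots 128 to 255, and so on up to user 31 in slots 3,968 to 4,095.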
This SIMD packing also delivers a large storage reduction. A single 128-dimensional float32 biometric template requires about 32 MB of FHE storage when encrypted individually. With SIMD batching, 32 templates share one ciphertext, reducing storage to approximately 256 KB per user, a 128× reduction.
Enrollment
The batch_enroll() function on CollectiveAuthority packs up to 32
biometric embeddings into a single ciphertext. Templates are stored in NTT (Number Theoretic
Transform) form to eliminate a forward transform during every match operation.
```rust
use h33::{CollectiveAuthority, BiometricEmbedding};

// Create authority with BFV parameters (N=4096, t=65537)
let authority = CollectiveAuthority::new(H33_128)?;

// Pack up to 32 user embeddings into one ciphertext
let embeddings: Vec<BiometricEmbedding> = users
    .iter()
    .map(|u| u.facial_embedding()) // 128-dim f32 vector
    .collect();

// batch_enroll() handles:
//   1. Quantize f32 → u16 (preserving cosine similarity ordering)
//   2. SIMD-pack 32 embeddings into 4096 plaintext slots
//   3. BFV encrypt → ciphertext (stored in NTT form)
//   4. Dilithium-sign the enrollment commitment
let enrolled_ct = authority.batch_enroll(&embeddings)?;

// Storage: ~256 KB per user (vs ~32 MB without batching)
// Templates NEVER exist in plaintext after this point
```
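The quantization step is not spelled out here, so the following is only one plausible shape for it: symmetric scaling and rounding, followed by the modular wraparound that any BFV slot imposes. The scale of 128, the signed intermediate type, and every function name are assumptions made for illustration. The point it demonstrates is general: slot arithmetic is modulo t, so the quantized inner product has to stay within ±(t − 1)/2 to be recovered correctly.

```rust
const T: i64 = 65_537; // plaintext modulus

/// Assumed quantizer: multiply by a fixed scale and round to nearest.
fn quantize(v: &[f32], scale: f32) -> Vec<i64> {
    v.iter().map(|x| (x * scale).round() as i64).collect()
}

/// Integer inner product of two quantized embeddings.
fn int_dot(a: &[i64], b: &[i64]) -> i64 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Value as a slot would hold it: reduced into [0, T).
fn to_slot(x: i64) -> i64 {
    x.rem_euclid(T)
}

/// Centered lift back to a signed score in [-(T-1)/2, (T-1)/2].
fn from_slot(r: i64) -> i64 {
    if r > (T - 1) / 2 { r - T } else { r }
}
```

With unit-length inputs and a scale of 128, scores stay near ±128² = 16,384, which fits inside ±32,768. A score of 40,000 would wrap to a negative value after the centered lift, which is why the scale has to be chosen with the modulus in mind.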
Verification
The batch_verify_multi() function performs the FHE inner product between an
encrypted probe and all 32 enrolled templates simultaneously. The result is an encrypted
similarity score vector — the only value that gets decrypted is the score, never
the template.
```rust
// Encrypt the probe (live capture) biometric
let probe_ct = authority.encrypt_probe(&live_embedding)?;

// Match against all 32 enrolled templates in one operation
// Internally: NTT-domain fused inner product (ONE final INTT)
let results = authority.batch_verify_multi(
    &probe_ct,
    &enrolled_ct,
)?;

// results: Vec<MatchResult> for all 32 users
// Each MatchResult contains:
//   - user_id: which slot matched
//   - score: decrypted similarity (only this leaves ciphertext)
//   - dilithium_attestation: PQ signature over the result
//   - stark_proof: cached ZKP of correct computation
for result in &results {
    if result.score >= threshold {
        // Authenticated: template was NEVER decrypted
        grant_access(result.user_id);
    }
}
```
The critical property: the cost of batch_verify_multi() does not depend on how many slots are populated. Whether you match 1 user or 32 users, the operation takes approximately the same time as the full batch measured above. This is because the FHE computation operates on the full ciphertext regardless of occupancy; empty slots simply carry zero-valued embeddings that produce zero similarity scores.
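The zero-score behavior of empty slots is easy to see on plaintext slot vectors. The sketch below assumes the probe is replicated into every block before matching, which this article does not state explicitly, and it computes each block's sum directly, whereas a BFV implementation would typically form those sums with slot rotations. Function names are hypothetical.

```rust
/// Copy one embedding into `blocks` consecutive blocks.
fn replicate(embedding: &[i64], blocks: usize) -> Vec<i64> {
    embedding.repeat(blocks)
}

/// One inner product per `dim`-sized block of the two slot vectors.
/// Work is proportional to the total slot count, not to how many
/// blocks actually hold an enrolled template.
fn blockwise_scores(probe: &[i64], enrolled: &[i64], dim: usize) -> Vec<i64> {
    assert_eq!(probe.len(), enrolled.len(), "slot vectors must match");
    probe
        .chunks(dim)
        .zip(enrolled.chunks(dim))
        .map(|(p, e)| p.iter().zip(e).map(|(x, y)| x * y).sum::<i64>())
        .collect()
}
```

A batch with only two enrolled users still yields one score per block; the unused blocks simply score zero.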
Throughput at Scale
Single-batch latency tells you how fast one authentication is. Throughput tells you how many authentications you can sustain per second when the system is fully loaded. H33's Rayon-based worker pool scales near-linearly with available cores:
| Workers | Batch/sec | Auth/sec | Instance |
|---|---|---|---|
| 1 | ~800 | ~25,600 | Single core |
| 32 | ~6,600 | ~213,000 | c8g.16xlarge |
| 96 | ~67,900 | ~2,172,518 | c8g.metal-48xl |
On encrypted data. With post-quantum attestation. With zero-knowledge proof verification. Sustained over 120 seconds with ±0.71% variance. On a single instance. This is not a theoretical projection — it is a measured production benchmark on commodity cloud hardware.
The near-linear scaling is possible because each worker operates on independent ciphertexts.
There is no shared mutable state in the FHE hot path. The ZKP cache uses a sharded concurrent
DashMap (in-process, no TCP serialization) that adds 0.059µs per lookup
with no measurable contention at 96 workers. The Dilithium attestation is batched, with one
signature per 32-user batch, amortizing the 291µs sign-and-verify cost across all users.
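The "independent batches, no shared mutable state" pattern can be sketched with the standard library alone. H33 is described as using a Rayon worker pool; this example swaps in `std::thread::scope` so that it needs no external crate, and `score_batch` is a trivial stand-in for the real per-batch FHE work.

```rust
use std::thread;

/// Stand-in for the per-batch work (sum of squares, not real FHE).
fn score_batch(batch: &[i64]) -> i64 {
    batch.iter().map(|x| x * x).sum()
}

/// Split the batches across `workers` threads. Each thread reads only
/// its own chunk and returns its own results, so no locks are needed.
fn process_parallel(batches: &[Vec<i64>], workers: usize) -> Vec<i64> {
    let workers = workers.max(1);
    let chunk = ((batches.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = batches
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || part.iter().map(|b| score_batch(b)).collect::<Vec<i64>>())
            })
            .collect();
        // Join in spawn order so results line up with the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Because the workers never write to shared data, adding threads adds throughput until cores or memory bandwidth run out.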
Variance collapsed from ±6% (v9 pipeline) to ±0.71% (v10) through elimination of allocation jitter. The system allocator (glibc on aarch64) outperforms jemalloc by 8% in this workload due to ARM's flat memory model — jemalloc's arena bookkeeping is pure overhead under tight FHE loops.
Compliance by Architecture
Most compliance frameworks require "adequate safeguards" for biometric data. FHE doesn't just satisfy the safeguard requirement — it eliminates the exposure surface entirely. If the template never decrypts, there is no plaintext to protect, no memory to wipe, and no breach window to close.
| Regulation | Requirement | Traditional Biometrics | H33 FHE Biometrics |
|---|---|---|---|
| BIPA | Written consent + destruction schedule + no sale | Decrypt-compare-wipe cycle. Exposure window exists. | Templates never decrypt. No exposure window. |
| GDPR Art. 9 | Explicit consent + DPIA + purpose limitation + data minimization | Plaintext templates in memory during processing. | Processing on encrypted data. Only similarity score decrypted. |
| CCPA/CPRA | Opt-out rights + reasonable security + breach notification | Breach of template DB exposes irreversible biometrics. | Breach of DB yields only ciphertext. No biometric exposure. |
| HIPAA | PHI encryption at rest + in transit + access controls | Encrypted at rest, decrypted during matching. | Encrypted at rest, in transit, AND during matching. |
The compliance argument is straightforward: if a biometric template is encrypted with FHE and never decrypted during processing, it is encrypted at rest, in transit, and in use. This is the trifecta that every framework asks for but assumes is impossible. FHE makes it real.
For organizations operating under multiple jurisdictions simultaneously — a global bank processing face authentication across EU, US, and APAC regions — FHE biometrics provide a single architectural answer that satisfies all of them. No per-jurisdiction carve-outs. No "decrypt in the EU but not in Illinois" conditional logic.
The Optimization Journey
H33's FHE biometric pipeline did not start at sub-millisecond batch latency. The first working prototype ran at approximately 50 milliseconds per batch: functional, but too slow for production authentication. Over six months of systematic optimization, we achieved a roughly 50× speedup through a series of targeted improvements:
- The fused_inv_mont factor combines inverse NTT scaling and Montgomery reduction into a single operation: 3 REDC operations reduced to 2.
For the deep technical breakdown of each optimization, including the failed approaches (arena pooling, fused NTT pre-twist, jemalloc on Graviton4), see the NTT performance deep dive and complete optimization journey.
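To make the fused_inv_mont idea concrete, here is a small sketch of how a Montgomery reduction (REDC) can be saved by folding the 1/N scaling of the inverse NTT into the step that leaves Montgomery form. It is not H33's implementation: the 30-bit prime, R = 2^32, and all function names are assumptions, and it isolates only the final scaling step, where two reductions become one.

```rust
// Illustrative NTT-friendly prime (NOT H33's 56-bit modulus): 119 * 2^23 + 1.
const Q: u32 = 998_244_353;
const N: u64 = 4096; // polynomial degree

/// -Q^{-1} mod 2^32 via Newton iteration on the 2-adic inverse.
fn neg_q_inv() -> u32 {
    let mut inv: u32 = Q; // correct to 3 bits: Q*Q ≡ 1 (mod 8) for odd Q
    for _ in 0..5 {
        inv = inv.wrapping_mul(2u32.wrapping_sub(Q.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

/// REDC: returns t * R^{-1} mod Q for R = 2^32, given t < Q * R.
fn redc(t: u64) -> u32 {
    let m = (t as u32).wrapping_mul(neg_q_inv());
    let sum = t as u128 + (m as u128) * (Q as u128);
    let r = (sum >> 32) as u64;
    if r >= Q as u64 { (r - Q as u64) as u32 } else { r as u32 }
}

fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1u64;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

/// N^{-1} mod Q in ordinary (non-Montgomery) form, via Fermat.
fn n_inv() -> u32 {
    pow_mod(N, Q as u64 - 2, Q as u64) as u32
}

/// x * R mod Q (enter Montgomery form).
fn to_mont(x: u32) -> u32 {
    (((x as u64) << 32) % Q as u64) as u32
}

/// Unfused: scale inside Montgomery form, then convert out (2 REDCs).
fn unfused(x_mont: u32) -> u32 {
    let scaled = redc(x_mont as u64 * to_mont(n_inv()) as u64);
    redc(scaled as u64)
}

/// Fused: multiply by the plain N^{-1}; the single REDC both applies
/// the scaling and cancels the Montgomery factor R.
fn fused(x_mont: u32) -> u32 {
    redc(x_mont as u64 * n_inv() as u64)
}
```

Both paths return x · N⁻¹ mod Q for an input in Montgomery form; the fused one gets there with one reduction fewer per coefficient, which adds up over 4,096 coefficients per transform.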
Getting Started
H33's biometric API exposes the full encrypted matching pipeline through two endpoints: enrollment and verification. The FHE encryption, SIMD batching, Dilithium attestation, and ZKP caching are handled server-side. You send biometric embeddings; you get back cryptographically attested match results. The templates never leave their ciphertext.
```bash
# Enroll a biometric template (128-dim embedding)
curl -X POST https://api.h33.ai/v1/biometric/enroll \
  -H "Authorization: Bearer $H33_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_a1b2c3d4",
    "embedding": [0.0234, -0.1847, 0.0912, ...],
    "modality": "face",
    "model": "arcface_r100"
  }'

# Response:
# {
#   "enrolled": true,
#   "user_id": "usr_a1b2c3d4",
#   "batch_slot": 14,
#   "fhe_scheme": "BFV",
#   "security_level": "H33-128",
#   "dilithium_commitment": "0x3a7f..."
# }
```
```bash
# Verify a live biometric against enrolled template
curl -X POST https://api.h33.ai/v1/biometric/verify \
  -H "Authorization: Bearer $H33_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_a1b2c3d4",
    "embedding": [0.0251, -0.1823, 0.0894, ...],
    "modality": "face",
    "threshold": 0.85
  }'

# Response:
# {
#   "match": true,
#   "score": 0.9847,
#   "latency_us": 38.5,
#   "fhe_scheme": "BFV",
#   "template_decrypted": false,
#   "dilithium_attestation": "0x8b2e...",
#   "stark_proof_id": "prf_9x8w7v6u"
# }
```
For SDK integration, server-side libraries, webhook configuration, and batch enrollment workflows, see the full API documentation.
What You Get
| Capability | Specification |
|---|---|
| Encrypted matching latency | 38.5 µs per auth (32-user batch) |
| Throughput (metal-48xl) | 2,172,518 auth/sec sustained |
| Template exposure | Zero — templates never decrypt |
| Post-quantum security | Lattice-based FHE + Dilithium attestation |
| ZKP verification | STARK proof per batch (cached: 0.059µs) |
| Storage per user | ~256 KB (128× reduction via SIMD) |
| Supported modalities | Face, fingerprint, iris, voice, palm |
| Embedding dimensions | 128, 256, 512 (configurable) |