Encrypted Biometric Matching at 967µs: The First Public FHE Benchmark

The Problem with Biometric Authentication

Biometric authentication is the strongest identity signal available. Passwords can be shared. MFA tokens can be phished. But your facial geometry, fingerprint minutiae, and iris pattern are uniquely yours. The problem isn't the biometric — it's how every system handles the biometric template.

Today's biometric systems follow a pattern that would horrify any security engineer if applied to passwords: they decrypt the stored template, load it into plaintext memory, perform a distance comparison, and then (hopefully) wipe the memory. During that comparison window, the template exists unencrypted in RAM. It can be extracted via memory dumps, cold-boot attacks, speculative execution side-channels, or a compromised process on the same host.

The Irreversibility Problem

When a password database is breached, you issue new passwords. When a biometric template database is breached, you cannot issue new faces. The damage is permanent. The Illinois Biometric Information Privacy Act (BIPA) has generated over $5 billion in settlements since 2023 precisely because legislators understand this irreversibility.

The regulatory landscape reflects the severity. BIPA provides for damages of up to $5,000 per intentional or reckless violation. GDPR Article 9 classifies biometric data as a special category requiring explicit consent and enhanced safeguards. CCPA/CPRA treats biometric information as sensitive personal information with strict purpose limitations. HIPAA applies when biometrics are used in healthcare contexts. And these regulations are tightening, not loosening.

The core question: can you authenticate a biometric without ever seeing the biometric?

What If Templates Never Decrypted?

Fully homomorphic encryption (FHE) allows computation directly on ciphertext. You can add and multiply encrypted values, and build richer functions such as similarity scores from those two operations, without ever decrypting them. The result of the computation is itself encrypted: only the key holder can decrypt the final answer.

For biometric matching, this means: encrypt the stored template once during enrollment. Encrypt the probe template during authentication. Compute the inner product (similarity score) between the two encrypted vectors. The result — a single encrypted similarity score — is the only value that ever gets decrypted. The templates themselves remain encrypted throughout the entire pipeline.

H33 uses the BFV (Brakerski/Fan-Vercauteren) FHE scheme with parameters tuned for biometric inner product matching:

Polynomial degree N = 4,096 — determines the number of SIMD slots available for parallel computation.
Plaintext modulus t = 65,537 — satisfies the CRT batching condition t ≡ 1 (mod 2N), enabling 4,096 independent slots per ciphertext.
Single 56-bit ciphertext modulus Q — H33-128 security level with minimal noise budget overhead for inner product depth.
128-dimensional facial embeddings — standard output from modern face recognition models (ArcFace, AdaFace, CosFace).
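
The parameter choices above can be sanity-checked with a few lines of arithmetic. The sketch below is not H33 code; the function names are made up for illustration. It only checks the congruence t ≡ 1 (mod 2N) stated above (batching also needs t to be prime, which 65,537 is) and counts how many 128-dimensional embeddings fit in N slots.

```rust
/// Returns true when t ≡ 1 (mod 2N), the batching condition quoted in the article.
/// (Hypothetical helper, named for this sketch only.)
fn supports_batching(poly_degree: u64, plain_modulus: u64) -> bool {
    plain_modulus % (2 * poly_degree) == 1
}

/// Number of whole embeddings that fit into the N slots of one ciphertext.
fn embeddings_per_ciphertext(poly_degree: u64, embedding_dim: u64) -> u64 {
    poly_degree / embedding_dim
}
```

With N = 4,096 and t = 65,537 the check passes because 65,537 = 8 × 8,192 + 1, and 4,096 ÷ 128 gives the 32 users per ciphertext used throughout the article.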

The security of BFV rests on the Ring Learning With Errors (RLWE) problem — a lattice-based hardness assumption that is believed resistant to both classical and quantum attack. This means H33's biometric matching is post-quantum secure by construction, not by adding a post-quantum wrapper after the fact.

The Benchmark

Every number in this section is measured, not estimated. The benchmark runs on c8g.metal-48xl (AWS Graviton4, 192 vCPUs, 377 GiB RAM) using Criterion.rs v0.5 with 120-second sustained measurement windows. The full authentication pipeline — from encrypted biometric matching through post-quantum attestation and ZKP verification — breaks down as follows:

Stage	Operation	Latency	% Pipeline	Notes
1	FHE Batch (32 users)	939 µs	76.2%	BFV inner product, NTT-domain fused
2	Dilithium Attestation	291 µs	23.6%	1 sign+verify per 32-user batch
3	ZKP Cache Lookup	0.059 µs	<0.01%	STARK proof via DashMap cache
4	ML Agents	~2.35 µs	0.19%	Harvest + SideChannel + CryptoHealth
Total	32-user batch	1,232 µs	100%
	Per authentication	38.5 µs		1,232 µs ÷ 32 users

38.5 Microseconds Per Authentication

Faster than a blink of an eye (300,000 µs). Faster than a network round-trip to the nearest data center. Faster than a single frame at 240fps (4,167 µs). Thirty-two users authenticated on encrypted biometric data in the time it takes most systems to decrypt a single template.

Pipeline Breakdown

The FHE inner product dominates the pipeline at 76.2%. This is expected — homomorphic multiplication is the computationally expensive operation. Everything else combined accounts for less than a quarter of total latency.

[Figure: share of pipeline latency by stage. FHE Batch: 939 µs (76.2%). Dilithium: 291 µs (23.6%). ZKP Cache: 0.059 µs (<0.01%). ML Agents: ~2.35 µs (0.19%).]

The Dilithium attestation (ML-DSA sign + verify) provides post-quantum integrity for the entire batch result. One signature covers all 32 users, amortizing the 291µs cost to ~9µs per user. The ZKP cache lookup and ML security agents add negligible overhead. See the full optimization journey for how each stage was tuned.
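
The percentages and per-user figures follow directly from the four stage latencies in the benchmark table. As a quick check, here is a small sketch that reproduces them; the constant and function names are illustrative, and the only inputs are the published numbers.

```rust
/// Stage latencies in microseconds, copied from the benchmark table.
const STAGES_US: [(&str, f64); 4] = [
    ("FHE batch", 939.0),
    ("Dilithium attestation", 291.0),
    ("ZKP cache lookup", 0.059),
    ("ML agents", 2.35),
];
const BATCH_SIZE: f64 = 32.0;

/// Total latency of one 32-user batch.
fn batch_total_us() -> f64 {
    STAGES_US.iter().map(|(_, us)| us).sum()
}

/// Share of the batch total taken by one stage, in percent.
fn share_percent(stage_us: f64) -> f64 {
    100.0 * stage_us / batch_total_us()
}

/// Cost of a per-batch latency once it is spread over 32 users.
fn per_user_us(us: f64) -> f64 {
    us / BATCH_SIZE
}
```

Summing the stages gives about 1,232 µs per batch, so `per_user_us(batch_total_us())` lands at roughly 38.5 µs and `per_user_us(291.0)` at roughly 9 µs, matching the figures quoted above.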

How SIMD Batching Works

The BFV scheme supports Single Instruction, Multiple Data (SIMD) batching through the Chinese Remainder Theorem (CRT). When the plaintext modulus t satisfies t ≡ 1 (mod 2N), the polynomial ring decomposes into N independent slots. Each slot can hold a separate integer, and all homomorphic operations execute on all slots in parallel.

With N = 4,096 and each biometric embedding using 128 dimensions:

SIMD Slot Arithmetic

4,096 slots ÷ 128 dimensions = 32 users per ciphertext. Each user's 128-dimensional embedding occupies 128 consecutive slots. A single FHE inner product operation matches all 32 users simultaneously — same ciphertext, same computational cost as matching one user.
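
To make the slot layout concrete, here is a plaintext simulation of the packing. Nothing is encrypted here and none of these functions are part of H33's API; the names are invented for this sketch. It lays 128-slot segments back to back, multiplies slot by slot, and sums each segment, which is the arithmetic the encrypted inner product would carry out on all 32 users at once (an FHE implementation would typically use slot rotations for the segment sums).

```rust
const SLOTS: usize = 4096;
const DIM: usize = 128;
const USERS: usize = SLOTS / DIM; // 32

/// Lay out up to 32 embeddings back to back; unused slots stay zero.
fn pack(embeddings: &[[i64; DIM]]) -> Vec<i64> {
    assert!(embeddings.len() <= USERS);
    let mut slots = vec![0i64; SLOTS];
    for (user, emb) in embeddings.iter().enumerate() {
        slots[user * DIM..(user + 1) * DIM].copy_from_slice(emb);
    }
    slots
}

/// Repeat one probe embedding across all 32 user segments.
fn tile_probe(probe: &[i64; DIM]) -> Vec<i64> {
    probe.iter().copied().cycle().take(SLOTS).collect()
}

/// One slot-wise multiply over all 4,096 slots, then one sum per 128-slot segment.
fn batched_scores(enrolled: &[i64], probe: &[i64]) -> Vec<i64> {
    let products: Vec<i64> = enrolled.iter().zip(probe).map(|(a, b)| a * b).collect();
    products
        .chunks(DIM)
        .map(|segment| segment.iter().sum::<i64>())
        .collect()
}
```

The loop in `batched_scores` always walks all 4,096 slots, whether 1 or 32 segments are populated, and unpopulated segments come out as zero scores.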

This SIMD packing also delivers a massive storage reduction. Encrypted individually, a single 128-dimensional float32 biometric template carries about 32 MB of FHE storage overhead. With SIMD batching, 32 templates share one ciphertext, which brings storage down to approximately 256 KB per user, a 128× reduction.

Enrollment

The batch_enroll() function on CollectiveAuthority packs up to 32 biometric embeddings into a single ciphertext. Templates are stored in NTT (Number Theoretic Transform) form to eliminate a forward transform during every match operation.

Rust batch_enrollment.rs

use h33::{CollectiveAuthority, BiometricEmbedding};

// Create authority with BFV parameters (N=4096, t=65537)
let authority = CollectiveAuthority::new(H33_128)?;

// Pack up to 32 user embeddings into one ciphertext
let embeddings: Vec<BiometricEmbedding> = users
    .iter()
    .map(|u| u.facial_embedding())  // 128-dim f32 vector
    .collect();

// batch_enroll() handles:
//   1. Quantize f32 → u16 (preserving cosine similarity ordering)
//   2. SIMD-pack 32 embeddings into 4096 plaintext slots
//   3. BFV encrypt → ciphertext (stored in NTT form)
//   4. Dilithium-sign the enrollment commitment
let enrolled_ct = authority.batch_enroll(&embeddings)?;

// Storage: ~256 KB per user (vs ~32 MB without batching)
// Templates NEVER exist in plaintext after this point
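
The quantization step listed in the comments above is where floating-point embeddings become integers modulo t. The sketch below shows one way this could work; it is an assumption, not H33's actual quantizer. It assumes unit-norm embeddings and a hypothetical scale of 160, and stores slot values as u32 residues. With that scale, a quantized 128-dimensional vector has norm at most about 160 + 0.5·√128 ≈ 165.7, so by Cauchy-Schwarz any inner product stays below 165.7² ≈ 27,450, which is under t/2 = 32,768 and therefore never wraps around the modulus.

```rust
const T: i64 = 65_537;    // plaintext modulus from the article
const SCALE: f32 = 160.0; // hypothetical scale factor, chosen for this sketch

/// Round a unit-norm embedding to integers and map negatives to t - |x|.
fn quantize(embedding: &[f32]) -> Vec<u32> {
    embedding
        .iter()
        .map(|&x| ((x * SCALE).round() as i64).rem_euclid(T) as u32)
        .collect()
}

/// Inner product of two quantized vectors, reduced modulo t as slot arithmetic would be.
fn inner_product_mod_t(a: &[u32], b: &[u32]) -> u32 {
    let sum = a
        .iter()
        .zip(b)
        .fold(0i64, |acc, (&x, &y)| (acc + (x as i64) * (y as i64)) % T);
    sum as u32
}

/// Map a residue back to the signed range (-t/2, t/2].
fn center(residue: u32) -> i64 {
    let r = residue as i64;
    if r > T / 2 { r - T } else { r }
}
```

Because the bound keeps every score inside the signed range, centering the residue recovers the exact integer inner product, and the ordering of similarity scores survives the trip through modular arithmetic.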

Verification

The batch_verify_multi() function performs the FHE inner product between an encrypted probe and all 32 enrolled templates simultaneously. The result is an encrypted similarity score vector — the only value that gets decrypted is the score, never the template.

Rust batch_verification.rs

// Encrypt the probe (live capture) biometric
let probe_ct = authority.encrypt_probe(&live_embedding)?;

// Match against all 32 enrolled templates in one operation
// Internally: NTT-domain fused inner product (ONE final INTT)
let results = authority.batch_verify_multi(
    &probe_ct,
    &enrolled_ct,
)?;

// results: Vec<MatchResult> for all 32 users
// Each MatchResult contains:
//   - user_id: which slot matched
//   - score: decrypted similarity (only this leaves ciphertext)
//   - dilithium_attestation: PQ signature over the result
//   - stark_proof: cached ZKP of correct computation

for result in &results {
    if result.score >= threshold {
        // Authenticated — template was NEVER decrypted
        grant_access(result.user_id);
    }
}

The critical property: the cost of batch_verify_multi() does not depend on how many slots are populated. Whether you match 1 user or 32 users, the operation takes approximately the same ~1,040µs. This is because the FHE computation operates on the full ciphertext regardless of occupancy: empty slots simply carry zero-valued embeddings that produce zero similarity scores.
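
After decryption, each batch position holds one integer score modulo t. The following sketch shows how such a value might be turned into an accept or reject decision. The scale of 160 and both function names are assumptions made for illustration; the article does not describe how H33 decodes scores.

```rust
const T: i64 = 65_537;    // plaintext modulus from the article
const SCALE: f64 = 160.0; // hypothetical quantization scale, an assumption of this sketch

/// Convert a decrypted residue mod t into an approximate cosine similarity.
fn to_cosine(residue: u32) -> f64 {
    let r = residue as i64;
    let signed = if r > T / 2 { r - T } else { r };
    signed as f64 / (SCALE * SCALE)
}

/// Indices of the batch positions whose score clears the threshold.
fn accepted_slots(scores: &[u32], threshold: f64) -> Vec<usize> {
    scores
        .iter()
        .enumerate()
        .filter(|&(_, &s)| to_cosine(s) >= threshold)
        .map(|(i, _)| i)
        .collect()
}
```

An empty position decrypts to zero, which maps to a similarity of 0.0 and is rejected by any positive threshold such as the 0.85 used in the API example later in this article.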

Throughput at Scale

Single-batch latency tells you how fast one authentication is. Throughput tells you how many authentications you can sustain per second when the system is fully loaded. H33's Rayon-based worker pool scales near-linearly with available cores:

Workers	Batch/sec	Auth/sec	Instance
1	~800	~25,600	Single core
32	~6,600	~213,000	c8g.16xlarge
96	~67,900	~2,172,518	c8g.metal-48xl

2.17 Million Biometric Authentications Per Second

On encrypted data. With post-quantum attestation. With zero-knowledge proof verification. Sustained over 120 seconds with ±0.71% variance. On a single instance. This is not a theoretical projection — it is a measured production benchmark on commodity cloud hardware.

The near-linear scaling is possible because each worker operates on independent ciphertexts. There is no shared mutable state in the FHE hot path. The ZKP cache uses a lock-free DashMap (in-process, no TCP serialization) that adds 0.059µs per lookup with zero contention at 96 workers. The Dilithium attestation is batched — one signature per 32-user batch — amortizing the 291µs signing cost across all users.
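
The Auth/sec column is the Batch/sec column multiplied by the 32 users in each batch. A short sketch of that conversion, plus a scaling-efficiency ratio against the single-worker figure, is shown below; the function names are illustrative and the inputs are the rounded table values.

```rust
const USERS_PER_BATCH: u64 = 32;

/// Authentications per second implied by a batch rate.
fn auths_per_sec(batches_per_sec: u64) -> u64 {
    batches_per_sec * USERS_PER_BATCH
}

/// Measured batch rate divided by ideal linear scaling from one worker.
fn scaling_efficiency(batches_per_sec: f64, workers: f64, single_worker_rate: f64) -> f64 {
    batches_per_sec / (workers * single_worker_rate)
}
```

Using the rounded ~67,900 batches/sec for 96 workers gives 2,172,800 auth/sec, in line with the measured 2,172,518, and about 88% of ideal linear scaling from the ~800 batches/sec single-worker figure.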

Variance collapsed from ±6% (v9 pipeline) to ±0.71% (v10) through elimination of allocation jitter. The system allocator (glibc on aarch64) outperforms jemalloc by 8% in this workload due to ARM's flat memory model — jemalloc's arena bookkeeping is pure overhead under tight FHE loops.

Compliance by Architecture

Most compliance frameworks require "adequate safeguards" for biometric data. FHE doesn't just satisfy the safeguard requirement — it eliminates the exposure surface entirely. If the template never decrypts, there is no plaintext to protect, no memory to wipe, and no breach window to close.

Regulation	Requirement	Traditional Biometrics	H33 FHE Biometrics
BIPA	Written consent + destruction schedule + no sale	Decrypt-compare-wipe cycle. Exposure window exists.	Templates never decrypt. No exposure window.
GDPR Art. 9	Explicit consent + DPIA + purpose limitation + data minimization	Plaintext templates in memory during processing.	Processing on encrypted data. Only similarity score decrypted.
CCPA/CPRA	Opt-out rights + reasonable security + breach notification	Breach of template DB exposes irreversible biometrics.	Breach of DB yields only ciphertext. No biometric exposure.
HIPAA	PHI encryption at rest + in transit + access controls	Encrypted at rest, decrypted during matching.	Encrypted at rest, in transit, AND during matching.

The compliance argument is straightforward: if a biometric template is encrypted with FHE and never decrypted during processing, it is encrypted at rest, in transit, and in use. This is the trifecta that every framework asks for but assumes is impossible. FHE makes it real.

For organizations operating under multiple jurisdictions simultaneously — a global bank processing face authentication across EU, US, and APAC regions — FHE biometrics provide a single architectural answer that satisfies all of them. No per-jurisdiction carve-outs. No "decrypt in the EU but not in Illinois" conditional logic.

The Optimization Journey

H33's FHE biometric pipeline did not start at 967µs. The first working prototype ran at approximately 50 milliseconds per batch — functional, but too slow for production authentication. Over six months of systematic optimization, we achieved a 50× speedup through a series of targeted improvements:

Montgomery NTT

Replaced division-based modular reduction with Montgomery multiplication in the NTT hot path. Twiddle factors stored in Montgomery form. Zero division instructions in the inner loop.

Eliminated ~40% of NTT cycle time
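
For readers unfamiliar with the technique, here is a textbook Montgomery multiplication sketch. It is not H33's implementation; the struct and method names are invented here, and it assumes an odd modulus below 2^63 with R = 2^64. The hot-path operation, `mul`, uses only multiplications, shifts, and one conditional subtraction. The `%` in `to_mont` is a one-time conversion of the kind that would be done when precomputing twiddle factors.

```rust
/// Montgomery context for an odd modulus q < 2^63, with R = 2^64.
struct Montgomery {
    q: u64,
    q_neg_inv: u64, // -q^{-1} mod 2^64
}

impl Montgomery {
    fn new(q: u64) -> Self {
        assert!(q % 2 == 1 && q < (1u64 << 63));
        // Newton iteration: each step doubles the number of correct low bits.
        let mut inv = q; // correct to 3 bits because q * q ≡ 1 (mod 8) for odd q
        for _ in 0..5 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
        }
        Montgomery { q, q_neg_inv: inv.wrapping_neg() }
    }

    /// REDC: returns t * R^{-1} mod q for any t < q * R, with no division.
    fn redc(&self, t: u128) -> u64 {
        let m = (t as u64).wrapping_mul(self.q_neg_inv);
        let u = ((t + (m as u128) * (self.q as u128)) >> 64) as u64;
        if u >= self.q { u - self.q } else { u }
    }

    /// Convert into Montgomery form (a * R mod q); done once, outside the hot loop.
    fn to_mont(&self, a: u64) -> u64 {
        (((a as u128) << 64) % (self.q as u128)) as u64
    }

    /// Convert out of Montgomery form.
    fn from_mont(&self, a: u64) -> u64 {
        self.redc(a as u128)
    }

    /// Multiply two values that are already in Montgomery form.
    fn mul(&self, a: u64, b: u64) -> u64 {
        self.redc((a as u128) * (b as u128))
    }
}
```

Values stay in Montgomery form across the whole transform, so the conversion cost is paid once per operand rather than once per butterfly.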

NTT-Domain Fused Inner Product

Instead of running an INTT after each polynomial multiplication, accumulate all products in the NTT domain and perform a single INTT at the end. Saves 2×M transforms per call (M = moduli count).

1,375µs → 1,080µs per batch
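
The fusion is valid because the inverse NTT is a linear map, so it commutes with summation. Writing k for the number of products being accumulated (a symbol introduced here for illustration) and ⊙ for the slot-wise product in the NTT domain:

```latex
\sum_{i=1}^{k} a_i \cdot b_i
  = \sum_{i=1}^{k} \mathrm{INTT}\!\left(\hat{a}_i \odot \hat{b}_i\right)
  = \mathrm{INTT}\!\left(\sum_{i=1}^{k} \hat{a}_i \odot \hat{b}_i\right),
\qquad \hat{a}_i = \mathrm{NTT}(a_i),\quad \hat{b}_i = \mathrm{NTT}(b_i)
```

The middle expression needs k inverse transforms; the right-hand side needs one, regardless of k.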

Pre-NTT Enrolled Templates

Templates stored in NTT form after enrollment. Eliminates a forward NTT during every verification — the most expensive single operation in the pipeline.

Saved ~200µs per batch

Harvey Lazy Reduction

Butterfly values kept in [0, 2q) between NTT stages instead of fully reducing to [0, q). Halves the number of conditional subtractions per butterfly.

~15% NTT speedup
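
A minimal sketch of a lazy butterfly in the style of Harvey's NTT arithmetic is shown below. It is a textbook illustration under stated assumptions (q < 2^62, 64-bit words, twiddle multiplication by a precomputed Shoup quotient), not H33's code, and the `Twiddle` type and function name are hypothetical. Inputs and outputs stay in [0, 2q), and the only conditional subtraction is on the sum.

```rust
/// A twiddle factor w together with its Shoup quotient floor(w * 2^64 / q).
struct Twiddle {
    w: u64,
    w_shoup: u64,
}

impl Twiddle {
    fn new(w: u64, q: u64) -> Self {
        assert!(w < q);
        Twiddle { w, w_shoup: (((w as u128) << 64) / (q as u128)) as u64 }
    }
}

/// Lazy butterfly: takes x, y in [0, 2q) and returns (x + y, w * (x - y)),
/// each in [0, 2q) and congruent mod q to the exact value. Requires q < 2^62.
fn lazy_butterfly(x: u64, y: u64, tw: &Twiddle, q: u64) -> (u64, u64) {
    let two_q = 2 * q;
    let mut sum = x + y; // below 4q
    if sum >= two_q {
        sum -= two_q; // the single conditional subtraction
    }
    let diff = x + two_q - y; // in (0, 4q), left unreduced
    let quot = (((tw.w_shoup as u128) * (diff as u128)) >> 64) as u64;
    // w * diff - quot * q fits in [0, 2q), so wrapping arithmetic is exact here.
    let prod = tw.w.wrapping_mul(diff).wrapping_sub(quot.wrapping_mul(q));
    (sum, prod)
}
```

A full reduction to [0, q) is only needed once, after the last stage of the transform.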

INTT Post-Processing Fusion

Precomputed fused_inv_mont factor combines inverse NTT scaling and Montgomery reduction into a single operation: 3 REDC operations reduced to 2.

~8% INTT speedup

Batch Dilithium Attestation

One Dilithium sign + verify per 32-user batch instead of per user. Amortizes the 291µs cost to ~9µs per authentication.

32× reduction in per-user attestation overhead

In-Process DashMap ZKP Cache

Replaced TCP-based cache proxy (which caused 11× regression at 96 workers) with lock-free in-process DashMap. 0.059µs lookups, zero contention.

44× faster than raw STARK proving

For the deep technical breakdown of each optimization, including the failed approaches (arena pooling, fused NTT pre-twist, jemalloc on Graviton4), see the NTT performance deep dive and complete optimization journey.

Getting Started

H33's biometric API exposes the full encrypted matching pipeline through two endpoints: enrollment and verification. The FHE encryption, SIMD batching, Dilithium attestation, and ZKP caching are handled server-side. You send biometric embeddings; you get back cryptographically attested match results. The templates never leave their ciphertext.

cURL enroll.sh

# Enroll a biometric template (128-dim embedding)
curl -X POST https://api.h33.ai/v1/biometric/enroll \
  -H "Authorization: Bearer $H33_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_a1b2c3d4",
    "embedding": [0.0234, -0.1847, 0.0912, ...],
    "modality": "face",
    "model": "arcface_r100"
  }'

# Response:
# {
#   "enrolled": true,
#   "user_id": "usr_a1b2c3d4",
#   "batch_slot": 14,
#   "fhe_scheme": "BFV",
#   "security_level": "H33-128",
#   "dilithium_commitment": "0x3a7f..."
# }

cURL verify.sh

# Verify a live biometric against enrolled template
curl -X POST https://api.h33.ai/v1/biometric/verify \
  -H "Authorization: Bearer $H33_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "usr_a1b2c3d4",
    "embedding": [0.0251, -0.1823, 0.0894, ...],
    "modality": "face",
    "threshold": 0.85
  }'

# Response:
# {
#   "match": true,
#   "score": 0.9847,
#   "latency_us": 38.5,
#   "fhe_scheme": "BFV",
#   "template_decrypted": false,
#   "dilithium_attestation": "0x8b2e...",
#   "stark_proof_id": "prf_9x8w7v6u"
# }

For SDK integration, server-side libraries, webhook configuration, and batch enrollment workflows, see the full API documentation.

What You Get

Capability	Specification
Encrypted matching latency	38.5 µs per auth (32-user batch)
Throughput (metal-48xl)	2,172,518 auth/sec sustained
Template exposure	Zero — templates never decrypt
Post-quantum security	Lattice-based FHE + Dilithium attestation
ZKP verification	STARK proof per batch (cached: 0.059µs)
Storage per user	~256 KB (128× reduction via SIMD)
Supported modalities	Face, fingerprint, iris, voice, palm
Embedding dimensions	128, 256, 512 (configurable)
