Face recognition has become the dominant biometric modality for mobile unlock, remote onboarding, border control, and continuous authentication. Apple's Face ID processes over a billion unlocks per day. Financial institutions use face verification for account recovery. Law enforcement deploys face identification across surveillance networks. The technology works—and that is precisely what makes its security implications so consequential.
A compromised password can be rotated. A leaked face template cannot. Your face is permanent, public, and increasingly reconstructable from stolen embeddings. Building a secure face recognition system requires understanding the full pipeline: how embeddings are generated, how they are attacked, and how they must be protected.
Face embeddings are biometric data with infinite shelf life. Unlike passwords or tokens, a leaked face template compromises the user permanently. Storing plaintext embeddings on a server is the biometric equivalent of storing passwords in cleartext—it is indefensible in 2026. Every architecture decision in this guide flows from this fact.
How Modern Face Recognition Works
Modern face recognition is a four-stage pipeline: detection, alignment, embedding, and matching. Each stage has distinct security properties and distinct failure modes.
Stage 1: Face Detection
Before a face can be recognized, it must be found. Detection answers a simple question: where in this image are there faces? It outputs bounding boxes, not identities. Modern detectors like RetinaFace and MTCNN (Multi-Task Cascaded Convolutional Networks) achieve >99% detection rates under controlled conditions, but degrade with extreme angles, partial occlusion, and low lighting.
Detection is not recognition. A system that detects 50 faces in a crowd has identified zero people—it has only located regions of interest for the next stage.
Stage 2: Alignment and Normalization
Detected faces are cropped and aligned to a canonical pose using facial landmark detection. Five-point alignment (two eyes, nose tip, two mouth corners) is standard. The aligned face is then resized to a fixed resolution (typically 112x112 or 160x160 pixels) and normalized.
This step is critical for accuracy. A 10-degree head tilt that is not corrected can shift the embedding significantly, causing a false rejection. Good alignment compensates for pose, expression, and moderate occlusion.
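The five-point alignment step reduces to a least-squares similarity transform (scale, rotation, translation) mapping detected landmarks onto a canonical template, typically solved with Umeyama's method. A minimal sketch with NumPy; the template coordinates are illustrative values close to the common 112x112 convention, not an exact specification:

```python
import numpy as np

# Illustrative canonical 5-point template for a 112x112 crop
# (left eye, right eye, nose tip, left mouth corner, right mouth corner)
CANONICAL_5PT = np.array([
    [38.3, 51.7], [73.5, 51.5],
    [56.0, 71.7],
    [41.5, 92.4], [70.7, 92.2],
])

def similarity_transform(src, dst):
    """Least-squares similarity transform (Umeyama) mapping src landmarks
    onto dst. Returns a 2x3 affine matrix suitable for image warping."""
    src_mean, dst_mean = src.mean(0), dst.mean(0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, d])                      # reflection guard
    R = U @ D @ Vt
    scale = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])

# Example: detector output that is a rotated, scaled, shifted template
theta = np.deg2rad(10)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
detected = (CANONICAL_5PT @ rot.T) * 1.3 + [20, -5]

M = similarity_transform(detected, CANONICAL_5PT)
aligned = detected @ M[:, :2].T + M[:, 2]
print(np.abs(aligned - CANONICAL_5PT).max() < 1e-6)  # exact for a pure similarity
```

In production the 2x3 matrix `M` is passed to an image-warping routine (e.g. OpenCV's `warpAffine`) to produce the aligned crop that feeds the embedding network.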
Stage 3: CNN Embedding Extraction
The aligned face is passed through a deep convolutional neural network (CNN) that compresses the image into a compact numerical vector—the face embedding. This is the core of modern face recognition.
Key Embedding Architectures
- ArcFace (2019)—Additive angular margin loss. Pushes same-identity embeddings closer together and different-identity embeddings further apart on a hypersphere. Produces 512-dimensional embeddings. Current state of the art for verification accuracy.
- CosFace (2018)—Cosine margin loss. Similar principle to ArcFace but applies the margin differently. Slightly simpler to train, marginally lower accuracy.
- FaceNet (2015)—Google's triplet-loss approach. Produces 128-dimensional embeddings. Foundational work, but outperformed by angular margin methods on modern benchmarks.
- MobileFaceNet—Lightweight variant for edge/mobile deployment. Produces 128-dimensional embeddings with <1M parameters, enabling on-device inference at 5-10ms per face on modern mobile SoCs.
These networks are trained on millions of face images (MS1M, WebFace260M, VGGFace2) and learn to project faces into a high-dimensional space where geometric distance corresponds to identity similarity. Two photos of the same person yield embeddings that are close together (cosine similarity >0.6); two different people yield embeddings that are far apart (cosine similarity <0.3).
The embedding itself is a vector of 128 to 512 floating-point numbers. It is not a compressed image—you cannot directly "see" a face by looking at its embedding. But as we will discuss later, reconstruction attacks have made this distinction increasingly fragile.
Stage 4: Matching
Matching compares a probe embedding (the face being verified) against one or more enrolled templates. The comparison is typically a cosine similarity or Euclidean distance calculation, producing a similarity score that is compared against a threshold.
```python
import numpy as np

def cosine_similarity(embedding_a, embedding_b):
    """Compute cosine similarity between two face embeddings."""
    dot = np.dot(embedding_a, embedding_b)
    norm_a = np.linalg.norm(embedding_a)
    norm_b = np.linalg.norm(embedding_b)
    return dot / (norm_a * norm_b)

# Typical thresholds:
#   > 0.60 = same person (permissive)
#   > 0.68 = same person (balanced)
#   > 0.75 = same person (strict / high-security)

score = cosine_similarity(probe, enrolled)
if score > threshold:
    grant_access()
```
Face Detection vs. Recognition vs. Verification
These three terms are frequently conflated, but they represent fundamentally different tasks with different security implications, different regulatory treatment, and different computational requirements.
| Task | Question Answered | Comparison | Use Case | Privacy Risk |
|---|---|---|---|---|
| Detection | Is there a face in this image? | None | Camera autofocus, crowd counting | Low |
| Verification (1:1) | Is this the person they claim to be? | 1 vs. 1 | Phone unlock, KYC, border e-gates | Medium |
| Identification (1:N) | Who is this person? | 1 vs. N | Surveillance, law enforcement, photo tagging | High |
Verification is a cooperative, consent-based process. The user claims an identity (by entering a username, scanning an ID, or selecting an account), and the system confirms the claim by comparing the live capture against a single enrolled template. The search space is 1.
Identification is a search problem. A probe face is compared against an entire gallery of N enrolled faces to determine identity. The search space can be millions. This is the mode used in surveillance and law enforcement, and it carries orders-of-magnitude greater privacy risk because it operates without active consent—the subject may not even know they are being scanned.
1:N identification requires all N templates to be accessible for comparison. This means a centralized gallery that, if breached, exposes every enrolled user. 1:1 verification can operate with a single template stored on-device or in a per-user encrypted silo. The security architectures are fundamentally different—never use a 1:N design when 1:1 verification suffices.
Accuracy: Benchmarks, Progress, and Bias
LFW and Modern Benchmarks
The Labeled Faces in the Wild (LFW) benchmark was the gold standard for face verification from 2007 to roughly 2018. It consists of 13,233 face images of 5,749 people, collected from news articles. Modern systems have saturated LFW—ArcFace achieves 99.83% accuracy, and the benchmark no longer meaningfully discriminates between top systems.
More challenging benchmarks have emerged:
| Benchmark | Challenge | SOTA Accuracy | Pairs / Subjects |
|---|---|---|---|
| LFW | Unconstrained faces | 99.83% | 6,000 pairs |
| MegaFace | Million-scale identification | 98.35% | 1M distractors |
| IJB-C | Mixed media (stills + video) | 97.2% | 3,531 subjects |
| TinyFace | Low-resolution faces (<20px) | ~72% | 5,139 faces |
| NIST FRVT 1:1 | Visa/mugshot photos | 99.97% | Millions |
NIST FRVT: The Authoritative Benchmark
NIST's Face Recognition Vendor Test (FRVT) is the most rigorous and comprehensive evaluation of face recognition algorithms. It tests commercial and academic algorithms on operational datasets (visa photos, mugshots, border crossing images) with millions of comparisons. Key findings from recent FRVT reports:
- Top-tier false non-match rates (FNMR) dropped from 4.1% (2014) to 0.08% (2024) at a fixed false match rate of 0.01%, representing a 50x improvement in one decade
- The best algorithms now exceed human performance on controlled-photo verification tasks
- Performance degrades significantly with aging (10+ years between photos), low resolution, and extreme pose variation
Racial and Demographic Bias
NIST's 2019 demographic study (NISTIR 8280) tested 189 algorithms from 99 developers and found systematic disparities:
- False positive rates were 10-100x higher for African American and Asian faces compared to Caucasian faces in one-to-one verification across most U.S.-developed algorithms
- False negatives were highest for African American women, meaning these users were most likely to be incorrectly rejected
- Algorithms developed in Asian countries showed smaller disparities across race, suggesting training data composition is a primary driver
- Age effects compound racial bias: elderly individuals from under-represented demographics had the worst performance
Demographic bias is not merely an ethical concern—it is a security failure. Higher false positive rates for specific demographics mean those groups face disproportionate risk of wrongful identification. Higher false negative rates mean those users experience disproportionate denial of service. Any production face recognition system must measure and report per-demographic error rates. Systems that only report aggregate accuracy are hiding their failure modes.
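Per-demographic reporting is not hard to implement once labeled comparison scores exist. A minimal sketch, assuming a set of scored comparisons tagged with ground truth and a demographic label (the function name, labels, and score distributions below are illustrative):

```python
import numpy as np

def per_group_error_rates(scores, is_genuine, groups, threshold):
    """FMR / FNMR per demographic group at a fixed decision threshold."""
    scores = np.asarray(scores)
    is_genuine = np.asarray(is_genuine)
    groups = np.asarray(groups)
    report = {}
    for g in np.unique(groups):
        m = groups == g
        imp = scores[m & ~is_genuine]   # impostor comparisons in this group
        gen = scores[m & is_genuine]    # genuine comparisons in this group
        fmr = float((imp > threshold).mean()) if imp.size else float("nan")
        fnmr = float((gen <= threshold).mean()) if gen.size else float("nan")
        report[str(g)] = {"FMR": fmr, "FNMR": fnmr, "n": int(m.sum())}
    return report

# Synthetic example: group B's impostor scores sit slightly higher, which
# surfaces as a higher FMR at the same global threshold.
rng = np.random.default_rng(0)
n = 5000
grp = rng.choice(["A", "B"], size=n)
gen_flag = rng.random(n) < 0.5
score = np.where(gen_flag, rng.normal(0.75, 0.05, n), rng.normal(0.30, 0.08, n))
score += np.where((grp == "B") & ~gen_flag, 0.10, 0.0)

rep = per_group_error_rates(score, gen_flag, grp, threshold=0.55)
print(rep["B"]["FMR"] > rep["A"]["FMR"])  # disparity invisible in the aggregate
```

An aggregate FMR over both groups would average the disparity away; the per-group table is what exposes it.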
Presentation Attacks: How Face Recognition Gets Spoofed
A presentation attack (also called a spoof) attempts to fool a face recognition system by presenting a fake biometric sample to the sensor. Face recognition is uniquely vulnerable because the biometric is visible and publicly available—anyone can obtain a high-resolution photograph of your face from social media.
Attack Taxonomy
| Attack Type | Sophistication | Cost | Effectiveness vs. Basic Systems | Effectiveness vs. PAD |
|---|---|---|---|---|
| Printed photo | Low | ~$0.10 | High | Blocked |
| Screen replay (phone/tablet) | Low | ~$0 | High | Blocked |
| Video replay | Low-Medium | ~$0 | High | Partially blocked |
| Paper/cardboard mask | Medium | ~$5 | High | Blocked |
| 3D silicone mask | High | $3,000+ | High | Partially blocked |
| Hyper-realistic mask (custom) | Very High | $10,000+ | High | Often bypasses |
| Real-time deepfake (face swap) | High | ~$0 (open-source) | High | Emerging threat |
| Injection attack (camera bypass) | High | Varies | Bypasses everything | Bypasses sensor-level PAD |
Injection attacks deserve special attention. Instead of presenting a fake face to the camera, the attacker intercepts the data stream between the camera and the processing software, injecting a pre-recorded or synthesized image directly. This bypasses all sensor-level liveness detection because no physical presentation occurs. Defense requires cryptographic binding between the capture device and the processing pipeline.
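One way to realize that cryptographic binding, shown here as an illustrative protocol sketch rather than any specific platform API, is for the attested camera subsystem to MAC every frame together with a server-issued nonce. An interposed buffer then cannot inject or replay frames, because it holds neither the device key nor a valid nonce:

```python
import hmac
import hashlib
import os

# Illustrative only: in practice the key lives in a secure element, and an
# asymmetric attestation signature is used so the verifier never holds it.
DEVICE_KEY = os.urandom(32)

def sign_frame(frame_bytes: bytes, nonce: bytes) -> bytes:
    """Camera-side: bind the raw sensor frame to this verification session."""
    return hmac.new(DEVICE_KEY, nonce + frame_bytes, hashlib.sha256).digest()

def verify_frame(frame_bytes: bytes, nonce: bytes, tag: bytes) -> bool:
    """Pipeline-side: accept only frames MACed by the attested sensor."""
    expected = hmac.new(DEVICE_KEY, nonce + frame_bytes, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

nonce = os.urandom(16)                  # fresh per verification session
frame = b"\x89raw-sensor-frame-bytes"
tag = sign_frame(frame, nonce)

assert verify_frame(frame, nonce, tag)                   # genuine capture passes
assert not verify_frame(b"injected-frame", nonce, tag)   # injected frame fails
assert not verify_frame(frame, os.urandom(16), tag)      # stale nonce fails
```

The per-session nonce is what defeats replay: even a previously valid signed frame is rejected outside the session it was captured in.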
ISO 30107-3: The PAD Testing Framework
ISO/IEC 30107-3 defines the standard methodology for evaluating Presentation Attack Detection (PAD) systems. It specifies two key metrics:
- APCER (Attack Presentation Classification Error Rate)—the proportion of attack presentations incorrectly classified as bona fide. Lower is better. Measured per Presentation Attack Instrument (PAI) species.
- BPCER (Bona Fide Presentation Classification Error Rate)—the proportion of legitimate presentations incorrectly classified as attacks. This is the false rejection rate of the PAD system.
ISO 30107-3 requires testing against specific PAI species (printed photos at various resolutions, screen replays at various sizes, 3D masks of various materials) and reporting APCER separately for each species. A vendor who reports only aggregate PAD accuracy is not ISO-compliant—they may be hiding a species that bypasses their system. Always demand per-species APCER/BPCER results.
The standard also defines three levels of evaluation rigor (Level 1: minimal presentation, Level 2: moderate effort, Level 3: sophisticated attacks including custom 3D masks). Most commercial PAD certifications are Level 1 or Level 2. Level 3 certification, which includes hyper-realistic masks and deepfakes, is rare and significantly more expensive to achieve.
Liveness Detection: Passive vs. Active
Liveness detection is the primary defense against presentation attacks. It determines whether the biometric sample is coming from a living person at the point of capture, as opposed to a reproduced artifact. Two fundamentally different approaches exist.
Active Liveness
Active liveness requires the user to perform a specific action during capture:
- Head turn: Rotate head left/right to demonstrate 3D geometry
- Blink detection: Blink on command to prove eye movement
- Smile/expression change: Alter facial expression on prompt
- Random challenge: Speak a random number, follow a moving dot with eyes
Advantages: simple to implement, intuitive for users, and effective against static photo attacks. Disadvantages: it adds friction (2-5 seconds per verification), is vulnerable to video replay attacks that include the required motion, raises accessibility issues for users with motor impairments, and can be defeated by real-time deepfake face-swap software that maps the attacker's movements onto the victim's face.
Passive Liveness
Passive liveness analyzes the captured image or short video without requiring any specific user action. The system determines liveness from inherent properties of the presentation:
Texture Analysis
Detects moiré patterns from screens, halftone dots from print, and specular highlights inconsistent with skin reflectance. LBP (Local Binary Pattern) and CNN-based texture classifiers pick up micro-texture and frequency-domain cues invisible to the naked eye.
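The screen-replay cue can be illustrated with a crude spectral heuristic, not a production PAD classifier: periodic moiré overlays from a recaptured display produce sharp off-center peaks in the 2D Fourier spectrum, while natural skin texture does not. Everything below (the function, the synthetic "textures") is a toy sketch:

```python
import numpy as np

def spectral_peak_ratio(gray, dc_radius=4):
    """Ratio of the strongest non-DC spectral magnitude to the median
    spectral magnitude. Sharp periodic patterns push this ratio up."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(gray)))
    h, w = f.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Mask out the DC region so image brightness does not dominate
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > dc_radius ** 2
    vals = f[mask]
    return float(vals.max() / np.median(vals))

rng = np.random.default_rng(1)
live = rng.normal(0.5, 0.1, (128, 128))        # stand-in for skin texture
x = np.arange(128)
# Recaptured-screen stand-in: the same texture plus a periodic overlay
moire = live + 0.2 * np.sin(2 * np.pi * 9 * x / 128)[None, :]

print(spectral_peak_ratio(moire) > spectral_peak_ratio(live))
```

Real texture-based PAD models learn these cues with CNNs over many attack species rather than a single hand-built statistic, but the frequency-domain intuition is the same.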
Depth Estimation
Monocular depth networks or structured light sensors (Face ID's 30,000-dot projector) estimate facial 3D structure. Flat presentations (photos, screens) produce zero depth variation. 3D masks may pass depth checks.
Micro-Movement Analysis
Involuntary physiological movements (blood flow-induced skin color changes, micro-saccades, pupil dilation) are present in live faces and absent in reproductions. Requires 30+ fps capture and temporal analysis over 0.5-2 seconds.
rPPG (Remote Photoplethysmography)
Detects blood pulse signal from subtle color variations in facial skin. A live face shows periodic 0.8-1.5 Hz signal corresponding to heart rate. Photos, screens, and masks produce no rPPG signal. Effective but requires 3+ seconds of video.
Passive liveness is preferred for production systems because it eliminates user friction and is harder to game. The user simply looks at the camera, and the system makes a liveness determination in the background. H33 uses multi-layer passive detection that combines texture analysis, depth estimation, and temporal features for seamless security with zero additional interaction steps.
Deepfake Detection: The Emerging Battleground
Deepfakes represent a qualitative shift in presentation attacks. Unlike static reproductions, deepfakes can generate photorealistic video of a target face in real time, rendering traditional liveness detection partially obsolete.
Real-Time Face Swap Attacks
Software like DeepFaceLive enables real-time face swapping through a virtual camera. The attacker's face is captured by a real webcam (passing liveness checks for head movement, blinking, and expression changes), while the face identity is swapped to match the victim. The resulting video stream is injected into the verification application as if it came from a physical camera.
This defeats active liveness completely—the "person" is genuinely alive and moving, but their face identity has been swapped. It also defeats most passive liveness checks because the underlying video has real-world texture, depth cues, and physiological signals from the attacker's actual face.
Current Detection Techniques
| Technique | What It Detects | Limitations |
|---|---|---|
| Frequency analysis | GAN-generated faces leave spectral artifacts (checkerboard patterns in FFT) | Latest diffusion models produce cleaner spectra; rapidly diminishing effectiveness |
| Temporal inconsistency | Frame-to-frame flicker at face boundaries, identity instability across angles | Higher-quality models are increasingly temporally consistent |
| Physiological signal analysis | Absence or distortion of rPPG signal, inconsistent pupil response | Advanced swaps preserve source rPPG from the attacker's real face |
| Source device verification | Validates that the image comes from a known, unmodified camera device | Requires hardware attestation; not universally supported |
| Injection detection | Detects virtual cameras, modified drivers, interposed video streams | Cat-and-mouse: new injection tools emerge faster than detections |
Deepfake detection is a fundamentally asymmetric problem. Detectors must identify every possible synthesis method. Attackers need only find one method that evades detection. As generative models improve, the detection window shrinks. No system should rely solely on deepfake detection for security. Defense in depth—combining liveness, device attestation, behavioral biometrics, and cryptographic template protection—is the only viable strategy.
Template Security: Why Storing Embeddings Is Dangerous
A face embedding is often treated as "not an image" and therefore safe to store. This assumption is false. Research has demonstrated that face images can be partially reconstructed from their embeddings, making template databases a high-value target.
Reconstruction Attacks
Reconstruction attacks use neural networks trained to invert the embedding process—taking a 512-dimensional vector and generating a face image that, while not pixel-identical to the original, is close enough to fool both human observers and face recognition systems.
- NbNet (2020): Reconstructed faces from ArcFace embeddings that achieved 95%+ match rate against the original enrolled template
- Vec2Face (2023): Generated high-quality 1024x1024 face images from embeddings using a StyleGAN-based decoder
- Embedding inversion as a service: Pre-trained reconstruction models are now available in open-source repositories, lowering the bar from "research lab" to "script kiddie"
The implication is stark: a database of face embeddings is functionally equivalent to a database of face photographs from a privacy and security standpoint. Any breach that exposes embeddings compromises the user's biometric identity permanently.
Template Protection Approaches
- Cancelable biometrics—Apply a non-invertible transformation (random projection, bio-hashing) to the embedding before storage. If compromised, the transformation parameters can be changed, generating a new template. Limitation: the transformation must preserve match accuracy, which limits its strength.
- Fuzzy vaults / fuzzy extractors—Bind a cryptographic key to the biometric template using error-correcting codes. The key can only be recovered by presenting a biometric sample that is "close enough" to the enrolled template. Limitation: parameter tuning is difficult, and cross-matching attacks exist.
- On-device matching—Store the template in a secure enclave on the user's device (Apple Secure Enclave, Android StrongBox). The template never leaves the device. Limitation: does not support server-side 1:N search, and device loss means template loss.
- Fully Homomorphic Encryption (FHE)—Encrypt the template and perform matching entirely in the encrypted domain. The server never sees the plaintext embedding. No reconstruction is possible because the plaintext never exists on the server. This is the only approach that provides both server-side matching and complete template confidentiality.
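Of the approaches above, cancelable biometrics is the simplest to sketch: a user-specific random projection, revocable by changing the seed. The parameters and function below are illustrative, and the sketch also shows the approach's stated limitation, since the projection must preserve match geometry:

```python
import numpy as np

def cancelable_template(embedding, user_seed, out_dim=96):
    """Project the embedding through a user-specific random matrix derived
    from a revocable seed. If the stored template leaks, issue a new seed
    and re-enroll; the old template no longer matches."""
    rng = np.random.default_rng(user_seed)
    proj = rng.normal(size=(out_dim, len(embedding))) / np.sqrt(out_dim)
    t = proj @ np.asarray(embedding, dtype=float)
    return t / np.linalg.norm(t)

rng = np.random.default_rng(42)
enroll = rng.normal(size=128)
enroll /= np.linalg.norm(enroll)
probe = enroll + 0.1 * rng.normal(size=128)   # same identity, noisy recapture
probe /= np.linalg.norm(probe)

old = cancelable_template(enroll, user_seed=1001)
new = cancelable_template(enroll, user_seed=2002)   # after revocation
live = cancelable_template(probe, user_seed=1001)

print(float(old @ live))   # high: same person, same seed (similarity preserved)
print(float(old @ new))    # near zero: revoked template is unlinkable
```

Random projections approximately preserve inner products (the Johnson-Lindenstrauss property), which is exactly why matching still works after transformation, and also why an attacker who recovers the transformation parameters regains much of the original geometry.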
FHE Face Matching: Encrypted Inner Product
Fully Homomorphic Encryption enables arithmetic on ciphertexts—you can add and multiply encrypted values without decrypting them, and the result, when decrypted, matches what you would have gotten from operating on the plaintexts. For face matching, this means computing the inner product (cosine similarity) between two face embeddings without ever decrypting either one.
How It Works
Face matching reduces to an inner product. Given a probe embedding p = [p_0, p_1, ..., p_127] and an enrolled template t = [t_0, t_1, ..., t_127], the similarity score is:
```
score = SUM(p_i * t_i)  for i = 0..127

// With BFV FHE:
// 1. Probe is encrypted client-side:  ct_probe = Encrypt(p, pk)
// 2. Template is stored encrypted:    ct_template = Encrypt(t, pk)
// 3. Server computes:                 ct_score = FHE_InnerProduct(ct_probe, ct_template)
// 4. ct_score is returned to the client (or compared to an encrypted threshold)
//
// The server NEVER sees p, t, or score in plaintext
```
BFV (Brakerski/Fan-Vercauteren) is ideal for this computation because face embeddings can be quantized to integers without meaningful accuracy loss, and BFV's exact integer arithmetic avoids the approximation noise of CKKS. The inner product is a sequence of multiply-and-accumulate operations—exactly what BFV does efficiently.
SIMD Batching: 32 Faces per Ciphertext
BFV supports SIMD (Single Instruction, Multiple Data) batching via the CRT (Chinese Remainder Theorem). With polynomial degree N=4096 and plaintext modulus t=65537, each ciphertext contains 4096 plaintext slots. A 128-dimensional face embedding occupies 128 slots, so 32 face embeddings fit in a single ciphertext (4096 / 128 = 32).
This means a single FHE inner product operation matches 32 users simultaneously. The cost is amortized: ~1,375 microseconds for the batch, or roughly 43 microseconds per face match.
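The slot layout can be simulated in plaintext to make the batching concrete. The sketch below assumes template *i* occupies slots [i·128, (i+1)·128); in the encrypted version, the same elementwise multiply and log2(128) rotate-and-add steps run on ciphertexts instead of arrays:

```python
import numpy as np

SLOTS, DIM = 4096, 128
USERS = SLOTS // DIM        # 32 templates per ciphertext

rng = np.random.default_rng(7)
templates = rng.integers(-500, 500, size=(USERS, DIM))
# Noisy recapture of user 13 (quantized integer embedding, as BFV requires)
probe = templates[13] + rng.integers(-5, 5, size=DIM)

packed_templates = templates.reshape(SLOTS)   # one "ciphertext" of 4096 slots
packed_probe = np.tile(probe, USERS)          # probe replicated into each block

products = packed_templates * packed_probe          # one SIMD slot-wise multiply
scores = products.reshape(USERS, DIM).sum(axis=1)   # rotate-and-add per block

best = int(np.argmax(scores))
print(best)  # → 13
```

One multiply plus seven rotation-sum rounds thus yields all 32 inner products at once, which is where the ~43 microseconds per face figure comes from.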
H33 Encrypted Face Matching Performance
The key security property: the server processes face templates without ever accessing plaintext embeddings. Even if the server is fully compromised—root access, memory dumps, the works—the attacker obtains only ciphertexts. Without the decryption key (which never leaves the client or a hardware security module), the templates are computationally indistinguishable from random noise.
And because BFV is a lattice-based cryptosystem, it is post-quantum secure. Shor's algorithm cannot help. Face templates encrypted with BFV today will remain secure even after cryptographically relevant quantum computers arrive. This is not true of templates encrypted with RSA or ECC-based schemes.
```rust
// H33 encrypted face matching — 32 users per ciphertext
let authority = CollectiveAuthority::new(4096, 65537);

// Enroll 32 users — templates stored in NTT-domain ciphertext
for (i, template) in face_embeddings.iter().enumerate() {
    authority.batch_enroll(i, template);
}

// Verify — probe encrypted client-side, compared in FHE domain
let encrypted_probe = authority.encrypt_probe(&live_capture_embedding);
let results = authority.batch_verify_multi(&encrypted_probe);
// results: Vec<bool> — one match result per user

// Server saw ZERO plaintext templates. Quantum-safe by construction.
```
1:N Search vs. 1:1 Verification Architecture
The architectural difference between 1:1 verification and 1:N identification is not just a matter of scale—it fundamentally changes the security model.
1:1 Verification
User claims identity first. Only one enrolled template is retrieved for comparison. Compatible with per-user encryption, on-device storage, and FHE matching. The search space is exactly 1. False positive rate is the only relevant error metric. Scales to unlimited users with O(1) per-verification cost.
1:N Identification
No identity claim. Probe is compared against every enrolled template. Requires centralized gallery access. False positive rate multiplied by N (gallery size) determines practical error rate. A 0.01% FAR against 1M subjects produces ~100 false matches per query. Fundamentally harder to secure.
For 1:N search with FHE, the SIMD batching property becomes critical. Searching a gallery of 1 million faces requires comparing against 31,250 ciphertexts (1M / 32 users per ciphertext). At ~1,375 microseconds per ciphertext operation, this would take approximately 43 seconds—viable for offline processing but not real-time identification. Techniques like locality-sensitive hashing (LSH) in the encrypted domain can reduce the search space, but 1:N encrypted search remains an active research area.
For most authentication use cases, 1:1 verification is both sufficient and vastly more secure. Reserve 1:N identification for use cases that genuinely require it (law enforcement, deduplication) and apply proportionate security controls.
Edge vs. Cloud Processing
Where face recognition inference runs has major implications for latency, privacy, and attack surface.
| Factor | Edge (On-Device) | Cloud | Hybrid (FHE) |
|---|---|---|---|
| Latency | <50ms | 100-500ms (network) | ~50µs (compute) + RTT |
| Template exposure | Never leaves device | Plaintext on server | Encrypted on server |
| Model updates | Requires app update | Instant | Instant |
| Scalability | Per-device compute | Elastic | Elastic |
| Cross-device | Re-enrollment needed | Universal | Universal |
| Quantum safety | Depends on enclave | Not by default | Lattice-based |
The hybrid FHE approach combines the best properties of both: the template is encrypted on-device and sent to the cloud for matching. The cloud has elastic compute for scaling, can update matching algorithms instantly, and supports cross-device access. But the template is never exposed in plaintext on the server. This is the architecture H33 uses.
Privacy Regulations Specific to Face Recognition
Face recognition occupies a unique regulatory position. Unlike other biometrics, it can operate covertly (without the subject's knowledge or consent) and at distance. This has triggered legislation specifically targeting face recognition that goes beyond general data protection frameworks.
BIPA (Illinois Biometric Information Privacy Act)
The most consequential biometric privacy law in the United States. BIPA requires:
- Written informed consent before collecting any biometric identifier (face geometry, fingerprint, iris scan)
- Published data retention and destruction policies
- No sale, lease, or trade of biometric data
- Private right of action with statutory damages of $1,000 per negligent violation and $5,000 per intentional/reckless violation
BIPA has produced landmark settlements: Facebook paid $650 million (2021) and Google paid $100 million (2023) over face recognition features that collected face geometry without adequate consent. Clearview AI faces ongoing litigation. The private right of action provision makes BIPA uniquely dangerous for companies that handle face data carelessly.
EU AI Act
The EU AI Act (effective August 2024, phased enforcement through 2027) classifies real-time remote biometric identification in public spaces as a prohibited practice, with narrow exceptions for law enforcement. Non-real-time biometric identification systems are classified as high-risk and subject to:
- Mandatory risk assessments and conformity procedures
- Human oversight requirements
- Bias testing and documentation obligations
- Registration in an EU database before deployment
City and State Bans
Multiple U.S. jurisdictions have enacted outright bans or moratoriums on government use of face recognition:
| Jurisdiction | Scope | Year |
|---|---|---|
| San Francisco, CA | Government use prohibited | 2019 |
| Oakland, CA | Government use prohibited | 2019 |
| Boston, MA | Government use prohibited | 2020 |
| Portland, OR | Government + private sector prohibited | 2020 |
| King County, WA | Government use prohibited | 2021 |
| Vermont | Law enforcement use restricted statewide | 2020 |
| Virginia | Law enforcement use restricted statewide | 2021 |
The regulatory direction is clear: face recognition is moving toward strict consent requirements, purpose limitations, and outright bans for surveillance applications. Systems that process face data in plaintext create regulatory liability that scales with the template database size. FHE-based matching provides a technical defense: if the server never sees plaintext templates, the exposure surface for regulatory violations is fundamentally reduced.
Implementation Best Practices
Building a secure face recognition system requires getting dozens of details right. The following practices reflect lessons from production deployments, NIST FRVT results, and real-world attack post-mortems.
Enrollment Quality
- Capture multiple images (3-5 minimum) under varied lighting and slight pose changes during enrollment. Average the embeddings to create a more robust template that is less sensitive to capture conditions during verification.
- Enforce quality thresholds: Reject enrollment images with face confidence <0.95, inter-pupillary distance <60 pixels, extreme head pose (>30 degrees yaw), or heavy occlusion. Poor enrollment is the primary driver of false rejections in production.
- Re-enrollment cadence: Face appearance changes over time (aging, weight change, facial hair, glasses). Offer voluntary re-enrollment every 12-24 months and trigger mandatory re-enrollment after 5+ years or repeated verification failures.
Threshold Tuning
The verification threshold is the single most important operational parameter. It directly controls the FAR/FRR tradeoff:
| Threshold (Cosine Sim.) | FAR | FRR | Use Case |
|---|---|---|---|
| 0.55 | ~1:1,000 | ~0.5% | Low-security convenience (photo app tagging) |
| 0.65 | ~1:100,000 | ~2% | Standard authentication (phone unlock) |
| 0.75 | ~1:1,000,000 | ~5% | Financial transactions, high-security access |
| 0.85 | ~1:10,000,000 | ~12% | Border control, forensic identification |
Threshold must be tuned per-model (different embedding architectures produce different similarity distributions), per-population (demographic composition affects optimal thresholds), and per-risk-level (a bank transfer requires a different threshold than a social media login).
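Tuning reduces to sweeping a threshold over empirical genuine and impostor score distributions. A sketch with synthetic distributions; the shapes are illustrative only, and as stated above, real tuning must use scores from your actual model and population:

```python
import numpy as np

def far_frr_at(threshold, impostor_scores, genuine_scores):
    """Empirical FAR / FRR at a given cosine-similarity threshold."""
    far = float((np.asarray(impostor_scores) > threshold).mean())
    frr = float((np.asarray(genuine_scores) <= threshold).mean())
    return far, frr

rng = np.random.default_rng(4)
genuine = rng.normal(0.78, 0.06, 200_000)    # mated comparison scores
impostor = rng.normal(0.25, 0.10, 200_000)   # non-mated comparison scores

for th in (0.55, 0.65, 0.75):
    far, frr = far_frr_at(th, impostor, genuine)
    print(f"threshold {th:.2f}: FAR={far:.2e} FRR={frr:.2%}")
# Raising the threshold drives FAR down and FRR up; pick per risk level.
```

Note that measuring very low FARs honestly requires very many impostor comparisons: at a target FAR of 1:1,000,000 you need on the order of tens of millions of non-mated pairs before the estimate means anything.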
Fallback Mechanisms
Face recognition will fail. Cameras malfunction. Lighting conditions change. Users alter their appearance. Liveness detection produces false rejects. A production system must have graceful fallbacks:
- Multi-factor cascade: If face verification fails twice, fall back to PIN/password + SMS OTP. Never lock users out of their accounts based solely on biometric failure.
- Escalation path: After N failed attempts, route to a human review queue or offer alternative verification (ID document + selfie).
- Timeout and rate limiting: Lock out face verification (not the account) after 5 consecutive failures for 15 minutes. This prevents brute-force presentation attacks without permanently denying service.
- Audit trail: Log every verification attempt (result, confidence score, liveness result, device metadata) for forensic analysis. Never log the raw image or embedding.
Anti-Spoofing Defense in Depth
Layered Anti-Spoofing Architecture
- Device attestation—Verify the capture device is genuine (not a virtual camera) using platform-level attestation (SafetyNet, DeviceCheck, WebAuthn).
- Passive liveness—Texture, depth, and micro-movement analysis on the captured frames.
- Deepfake detection—Frequency analysis and temporal consistency checks for synthesized content.
- Injection detection—Verify the image byte stream originates from the attested camera sensor, not an interposed buffer.
- Behavioral signals—Device orientation, touch patterns during capture, and session-level behavioral biometrics as secondary confidence signals.
No single layer is sufficient. A printed photo defeats a naive camera check but fails texture analysis. A real-time deepfake passes liveness but may fail injection detection. A compromised camera app passes device attestation but fails behavioral checks. The attacker must defeat all layers simultaneously, which compounds the difficulty exponentially.
Embedding Model Selection
- Use 128-dimensional embeddings for FHE matching—128 dimensions provide 99.5%+ of the discriminative power of 512 dimensions while fitting exactly 32 users per BFV ciphertext (4096 slots / 128 dims). The accuracy tradeoff is negligible; the throughput improvement is 4x.
- Quantize to integers—BFV operates on integers. Multiply float embeddings by a scaling factor (e.g., 10,000) and round. The quantization noise is orders of magnitude smaller than inter-identity variation and does not affect match accuracy at any practical threshold.
- Normalize embeddings to unit length—L2-normalize all embeddings before storage. This ensures cosine similarity reduces to a simple inner product, eliminating the need for an expensive division in the encrypted domain.
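The last two practices combine into one short preprocessing routine. A sketch using the scaling factor mentioned above; the function name is illustrative, and in a real BFV deployment the plaintext modulus must leave headroom for the accumulated inner-product magnitude:

```python
import numpy as np

SCALE = 10_000   # quantization scale from the text

def prepare_for_bfv(embedding, scale=SCALE):
    """L2-normalize, then quantize to integers for exact BFV arithmetic."""
    v = np.asarray(embedding, dtype=float)
    v = v / np.linalg.norm(v)
    return np.round(v * scale).astype(np.int64)

rng = np.random.default_rng(5)
a = rng.normal(size=128)
b = a + 0.1 * rng.normal(size=128)   # same identity, noisy recapture

qa, qb = prepare_for_bfv(a), prepare_for_bfv(b)
int_score = int(qa @ qb)             # what the encrypted circuit computes
float_cos = float((a / np.linalg.norm(a)) @ (b / np.linalg.norm(b)))

# The quantized integer inner product tracks float cosine very closely,
# so the match/no-match decision is unchanged at any practical threshold.
print(abs(int_score / SCALE**2 - float_cos) < 1e-3)
```

Because both vectors are unit length before quantization, the integer inner product divided by SCALE² is directly comparable against the cosine-similarity thresholds discussed earlier, with no division needed inside the encrypted domain.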
H33 provides post-quantum face authentication with FHE biometric processing (BFV lattice-based, 32 faces per ciphertext), ML-DSA digital signatures for attestation, and ML-KEM key exchange—all in a single API call at ~50 microseconds per authentication. Face templates are encrypted at capture, matched in the encrypted domain, and never exposed as plaintext on any server. The cryptographic foundation is lattice-based and quantum-resistant by construction.