
Biometric Accuracy Metrics: FAR, FRR, and EER Explained

Eric Beans, CEO | 14 min read

Biometric accuracy is measured in error rates. Understanding these rates is essential for deploying any biometric system, whether operating on plaintext or encrypted data. The three primary metrics, False Acceptance Rate, False Rejection Rate, and Equal Error Rate, define operational characteristics and determine suitability for different security contexts.

False Acceptance Rate (FAR)

FAR measures the probability of incorrectly accepting an unauthorized person. This is the security-critical metric: a false acceptance means an impostor gained access. A FAR of 0.001 (0.1%) means one in a thousand impostor attempts succeeds. FAR is controlled by the decision threshold: with similarity scores, where a higher score means a closer match, raising the threshold reduces FAR but increases false rejections. High-security applications (banking, government) typically require FAR below one in a million (0.0001%). Consumer applications typically accept around one in fifty thousand (0.002%).

False Rejection Rate (FRR)

FRR measures the probability of incorrectly rejecting an authorized person. This is the usability metric. An FRR of 0.01 (1%) means one in a hundred legitimate attempts fails. FRR trades off against FAR through the threshold: moving the threshold to lower one raises the other. High FRR destroys user experience; if users must retry repeatedly, they abandon biometrics for passwords. Most production systems target FRR below 1%, with the best achieving below 0.1%.
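The threshold trade-off between the two rates can be sketched in a few lines of Python. This is a minimal illustration, not H33 code: the function name `far_frr`, the toy score lists, and the convention that a similarity score at or above the threshold means "accept" are all assumptions made for the example.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for similarity scores; score >= threshold means accept."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))

# Toy data (hypothetical): same-person and different-person match scores.
genuine = [91, 88, 95, 79, 84, 90, 93, 87, 72, 96]
impostor = [40, 55, 62, 48, 81, 35, 59, 66, 44, 51]

# Raising the threshold lowers FAR and raises FRR.
for threshold in (60, 75, 85):
    far, frr = far_frr(genuine, impostor, threshold)
    print(f"threshold={threshold}: FAR={far:.2f} FRR={frr:.2f}")
```

On this toy data the loop moves from FAR 0.30 and FRR 0.00 at threshold 60 to FAR 0.00 and FRR 0.30 at threshold 85, which is the inverse relationship described above.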

Equal Error Rate (EER)

EER is the error rate at the threshold where FAR equals FRR, which gives a single-number accuracy summary. An EER of 0.5% means that at this threshold 0.5% of impostor attempts are accepted and 0.5% of legitimate attempts are rejected. EER is useful for comparing systems but is rarely the deployed operating point. Security-critical deployments operate with FAR much lower than FRR; convenience deployments do the reverse.
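One simple way to estimate the EER from evaluation scores is to sweep candidate thresholds and keep the one where FAR and FRR are closest. The sketch below assumes similarity scores with "accept if score >= threshold"; the function name `eer` and the toy data are illustrative, and real evaluations usually interpolate between thresholds rather than picking the nearest observed score.

```python
def eer(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as the threshold.

    Returns (threshold, rate), where rate is the mean of FAR and FRR at the
    threshold that minimizes |FAR - FRR|.
    """
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]

# Toy data (hypothetical).
genuine = [91, 88, 95, 79, 84, 90, 93, 87, 72, 96]
impostor = [40, 55, 62, 48, 81, 35, 59, 66, 44, 51]

threshold, rate = eer(genuine, impostor)
print(f"EER is about {rate:.1%} at threshold {threshold}")
```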

FHE and Accuracy: Zero Degradation

The critical question for encrypted biometric matching: does FHE affect accuracy? For BFV-based FHE, no. BFV performs exact integer arithmetic modulo the plaintext modulus. As long as the parameters are sized so that the noise budget is not exhausted and the result fits within the plaintext modulus, a homomorphic inner product decrypts to exactly the same value as the plaintext computation. No approximation, no rounding, no noise-induced error in the final result. This property is specific to exact schemes such as BFV; CKKS performs approximate arithmetic, and its small errors can change match outcomes for scores near the threshold, which is why H33 uses BFV for biometrics.
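The exactness argument can be illustrated without any cryptography. The sketch below is not FHE and not H33's implementation: it only mimics the integer arithmetic modulo a plaintext modulus that a BFV computation decrypts to. The modulus value `T`, the helper names, and the small integer templates are assumptions chosen for the example.

```python
# Plain-Python stand-in for BFV's plaintext arithmetic. This is NOT encryption:
# it only shows that arithmetic modulo t reproduces the integer inner product
# exactly when the true result fits inside the signed range of the modulus.
T = 65537  # illustrative plaintext modulus (assumption; real parameters differ)

def centered(x, t=T):
    """Map a residue mod t into the signed range around zero."""
    x %= t
    return x - t if x > t // 2 else x

def inner_product_mod_t(a, b, t=T):
    """Inner product where every multiply and add is reduced modulo t."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + (x % t) * (y % t)) % t
    return centered(acc, t)

# Hypothetical quantized feature vectors (signed integers).
enrolled = [12, -7, 30, 5, -18, 22, 9, -3]
probe = [11, -6, 28, 7, -20, 21, 10, -2]

plain = sum(x * y for x, y in zip(enrolled, probe))
modular = inner_product_mod_t(enrolled, probe)
print(plain, modular, plain == modular)
```

The two values agree bit for bit because the true score is smaller than half the modulus. If the modulus were too small for the score range, the modular result would wrap around, which is why parameter sizing matters.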

To be precise: the FAR, FRR, and EER of H33's encrypted matching are identical to plaintext matching with the same templates and threshold. Encryption adds zero accuracy penalty. Organizations can develop threshold policies using plaintext testing data and deploy the same thresholds with FHE matching knowing accuracy is identical.

Multi-Modal Fusion

Multi-modal systems combine modalities (face plus fingerprint) for accuracy better than any single modality. FHE supports fusion naturally: each modality produces an encrypted score, fusion (weighted sum, max rule) is performed homomorphically, and the fused score remains encrypted. If face EER is 0.5% and fingerprint is 0.3%, fused EER can be below 0.05% when the two modalities' errors are largely independent. H33's FHE pipeline supports this without accuracy degradation from encryption.
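The two fusion rules named above can be sketched in plaintext. The weights, the fused threshold, and the example scores are hypothetical, and the code says nothing about how the rules are evaluated homomorphically; integer weights are used only so that the fused score stays an exact integer.

```python
def fuse_weighted(face_score, finger_score, w_face=2, w_finger=3):
    """Weighted-sum fusion with integer weights (weights are assumptions)."""
    return w_face * face_score + w_finger * finger_score

def fuse_max(face_score, finger_score):
    """Max-rule fusion: keep the stronger of the two modality scores."""
    return max(face_score, finger_score)

# Hypothetical attempts as (face score, fingerprint score).
attempts = {
    "genuine, poor face capture": (58, 91),
    "impostor, lookalike face": (77, 32),
}
FUSED_THRESHOLD = 350  # hypothetical; would be tuned on evaluation data

for label, (face, finger) in attempts.items():
    fused = fuse_weighted(face, finger)
    decision = "accept" if fused >= FUSED_THRESHOLD else "reject"
    print(f"{label}: fused={fused} -> {decision}")
```

With a face-only threshold of 75, the genuine attempt (58) would be rejected and the lookalike impostor (77) accepted. The weighted sum gets both decisions right in this toy case because the fingerprint score disagrees with the misleading face score.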

Production Validation

H33 validates accuracy continuously using standard biometric evaluation protocols. Encrypted and plaintext paths must produce bit-identical integer similarity scores for every test case. Every release is verified against reference benchmarks ensuring no accuracy regression. The encrypted path does not approximate; it computes exactly.
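A parity check of this kind can be sketched as a small regression harness. Everything here is illustrative: `find_mismatches` is a hypothetical helper, and the two candidate functions are plain-Python stand-ins for an exact path and a lossy path, not real encrypted computations.

```python
def find_mismatches(reference_fn, candidate_fn, cases):
    """Return the indexes of test cases where the two scoring paths disagree."""
    return [i for i, (a, b) in enumerate(cases)
            if reference_fn(a, b) != candidate_fn(a, b)]

def plaintext_score(a, b):
    """Reference path: integer inner product."""
    return sum(x * y for x, y in zip(a, b))

def exact_candidate(a, b):
    """Stand-in for an exact path: same arithmetic, different evaluation order."""
    return sum(x * y for x, y in zip(reversed(a), reversed(b)))

def lossy_candidate(a, b):
    """Stand-in for an approximate path: drops the lowest bit of the score."""
    return (plaintext_score(a, b) >> 1) << 1

cases = [
    ([1, 2, 3], [4, 5, 6]),   # score 32
    ([1, 1, 1], [1, 1, 1]),   # score 3
    ([2, 0, 5], [7, 3, 1]),   # score 19
]

print(find_mismatches(plaintext_score, exact_candidate, cases))
print(find_mismatches(plaintext_score, lossy_candidate, cases))
```

A release gate built this way would fail on any non-empty mismatch list, since even a one-unit difference in a score can flip a decision at the threshold.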

Threshold Selection Strategy

Selecting the right decision threshold is the most impactful configuration decision in any biometric deployment. The threshold determines the operating point on the FAR/FRR curve, and the right choice depends entirely on the security requirements and user experience goals of the specific deployment.

For banking and financial applications, the threshold should be set to achieve FAR below one in a million (0.0001%). This strict threshold means some legitimate users will experience false rejections (typically FRR around 1 to 3%), but the security guarantee is that unauthorized access is extremely unlikely. Failed authentications can fall back to multi-factor alternatives (OTP, security questions) without significant user experience degradation because financial applications already use multi-factor authentication.

For consumer applications like phone unlock or app login, the threshold should target FRR below 0.5% to ensure a seamless user experience. This typically corresponds to FAR around one in fifty thousand (0.002%), which provides adequate security for most consumer scenarios. The goal is to make biometric authentication feel instantaneous and reliable, encouraging adoption over password-based alternatives.

For healthcare and government applications, thresholds are often set by regulatory requirements. NIST SP 800-76 specifies minimum accuracy requirements for federal biometric systems. HIPAA-covered entities may need to demonstrate specific FAR levels for biometric access to protected health information. H33's configurable threshold system allows per-deployment tuning to meet these specific regulatory requirements.
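Choosing a threshold from a FAR target can be sketched as follows. This is a simplified illustration that assumes integer similarity scores, "accept if score >= threshold", and an evaluation set of impostor scores; the function names and toy data are hypothetical and do not describe H33's configuration API.

```python
def threshold_for_target_far(impostor_scores, target_far):
    """Smallest integer threshold whose empirical FAR is at most target_far."""
    ranked = sorted(impostor_scores, reverse=True)
    allowed = int(target_far * len(ranked))  # impostor scores allowed to pass
    if allowed >= len(ranked):
        return ranked[-1]  # target is so loose that every impostor may pass
    # Sit just above the first impostor score that must be rejected.
    return ranked[allowed] + 1

def frr_at(genuine_scores, threshold):
    """Fraction of genuine scores that fall below the threshold."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)

# Toy data (hypothetical).
genuine = [91, 88, 95, 79, 84, 90, 93, 87, 72, 96]
impostor = [40, 55, 62, 48, 81, 35, 59, 66, 44, 51]

for profile, target in (("strict", 0.0), ("convenience", 0.1)):
    t = threshold_for_target_far(impostor, target)
    print(f"{profile}: threshold={t} FRR={frr_at(genuine, t):.2f}")
```

On the toy data the strict profile lands on a higher threshold and pays for it with a higher FRR, mirroring the banking versus consumer trade-off described above. Real targets such as one in a million need far more impostor scores than this example uses.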

Environmental Impact on Accuracy

Biometric accuracy is not a fixed property; it varies with environmental conditions. Lighting affects face recognition accuracy significantly: low light degrades feature extraction quality, harsh directional light creates shadows that alter facial geometry measurements, and infrared illumination (used by some sensors) produces different results than visible light. H33's quality assessment module evaluates capture conditions before processing, rejecting captures that fall below quality thresholds rather than processing low-quality data that would degrade accuracy.

For fingerprint recognition, moisture, dryness, cuts, and dirt on fingers affect capture quality. Cold temperatures constrict blood vessels, reducing fingerprint ridge contrast. Manual laborers may have worn ridges that produce poor-quality prints. H33's adaptive threshold system can adjust acceptance criteria based on sensor-reported quality metrics, maintaining consistent security properties despite varying environmental conditions.

For iris recognition, pupil dilation (affected by lighting and medication) changes the visible iris area. Glasses, contact lenses, and eye disease can affect capture quality. The iris modality is generally the most accurate single biometric but requires specialized hardware and controlled capture conditions for optimal performance.

Longitudinal Accuracy Monitoring

Biometric accuracy is not a deploy-and-forget metric. It must be monitored continuously because it changes over time. Template aging (the gradual divergence between enrolled templates and current biometric presentations) degrades accuracy unless templates are updated. Sensor degradation affects capture quality. Population changes (new users with different demographic characteristics) can shift the overall accuracy profile.

H33 provides continuous accuracy monitoring through the dashboard. Key metrics tracked include: the distribution of match scores for genuine and impostor pairs (which should show clear separation), the FRR over time (which should remain stable or improve with template updates), and the score distribution by demographic group (to detect and address bias). Alert thresholds trigger notifications when accuracy metrics deviate from established baselines, enabling proactive intervention before users experience degraded authentication quality.
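A baseline-deviation alert of the kind described can be sketched with aggregate counts only. The function name `frr_alerts`, the weekly window, and the 50% tolerance are assumptions for illustration; they are not H33 dashboard settings.

```python
def frr_alerts(weekly_counts, baseline_frr, tolerance=0.5):
    """Flag weeks whose FRR exceeds the baseline by more than the tolerance.

    weekly_counts: list of (genuine_attempts, false_rejects) per week.
    Returns a list of (week_number, frr) for the weeks that breach the limit.
    """
    limit = baseline_frr * (1 + tolerance)
    alerts = []
    for week, (attempts, rejects) in enumerate(weekly_counts, start=1):
        if attempts == 0:
            continue  # no genuine attempts, nothing to measure
        frr = rejects / attempts
        if frr > limit:
            alerts.append((week, frr))
    return alerts

# Hypothetical aggregates: FRR drifts upward, as it might with template aging.
weeks = [(10000, 95), (10000, 110), (10000, 160), (10000, 210)]

for week, frr in frr_alerts(weeks, baseline_frr=0.01):
    print(f"week {week}: FRR {frr:.2%} exceeds the alert limit")
```

The same pattern applies to other monitored quantities, such as the gap between the genuine and impostor score distributions.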

All accuracy monitoring operates on encrypted data. The match scores used for monitoring are aggregated statistics derived from the encrypted matching pipeline. Individual match results are attested with H33-74, but the underlying biometric data remains encrypted throughout the monitoring process. This means accuracy monitoring itself does not create additional privacy exposure, maintaining the fundamental promise of FHE-based biometric authentication.

Demographic Fairness in Biometric Accuracy

Biometric accuracy varies across demographic groups, and this variation is a critical consideration for any production deployment. Face recognition systems have historically shown higher error rates for certain demographic groups, particularly darker-skinned individuals and women. These disparities result from biased training data, sensor characteristics (many cameras are optimized for lighter skin tones), and algorithmic design choices.

H33's FHE-based matching does not inherently address demographic accuracy disparities because the accuracy of the matching computation depends on the quality of the feature extraction model and the enrollment template, not the matching arithmetic. However, H33's monitoring system tracks accuracy metrics by demographic group (when demographic data is available and collection is consented), enabling organizations to detect and address disparities before they affect users.

The NIST Face Recognition Vendor Test (FRVT), continued since 2023 under the name Face Recognition Technology Evaluation (FRTE), provides benchmark data on demographic disparities across different algorithms and vendors. Organizations deploying biometric authentication should reference these results when selecting their feature extraction model and should establish per-group accuracy thresholds that ensure equitable performance. H33's configurable threshold system supports per-group thresholds, allowing organizations to tune acceptance criteria to achieve equitable accuracy across all demographic groups served by the system.
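Detecting a disparity starts with computing error rates per group at the deployed threshold. The sketch below is illustrative only: the trial format, the group labels, and the helper names are assumptions, and each trial is labeled with the group of the person presenting.

```python
from collections import defaultdict

def group_error_rates(trials, threshold):
    """trials: iterable of (group, is_genuine, score). Returns {group: (FAR, FRR)}."""
    counts = defaultdict(lambda: {"gen": 0, "fr": 0, "imp": 0, "fa": 0})
    for group, is_genuine, score in trials:
        c = counts[group]
        if is_genuine:
            c["gen"] += 1
            c["fr"] += score < threshold   # genuine attempt rejected
        else:
            c["imp"] += 1
            c["fa"] += score >= threshold  # impostor attempt accepted
    return {
        g: (c["fa"] / c["imp"] if c["imp"] else None,
            c["fr"] / c["gen"] if c["gen"] else None)
        for g, c in counts.items()
    }

def make_trials(group, genuine, impostor):
    """Build labeled trials from lists of genuine and impostor scores."""
    return ([(group, True, s) for s in genuine]
            + [(group, False, s) for s in impostor])

# Hypothetical scores for two groups evaluated with one shared threshold.
trials = (make_trials("A", [90, 85, 78, 92], [40, 55, 60, 71])
          + make_trials("B", [82, 74, 69, 88], [45, 58, 72, 76]))

rates = group_error_rates(trials, threshold=75)
for group, (far, frr) in sorted(rates.items()):
    print(f"group {group}: FAR={far:.2f} FRR={frr:.2f}")
```

In this toy data the shared threshold is error-free for group A but gives group B both false accepts and false rejects. A gap like this is a sign that the feature extraction model separates group B's scores less cleanly, which a per-group threshold can only partly offset.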

Accuracy Under Adversarial Conditions

Biometric systems face adversarial threats that go beyond simple impostor attacks. Presentation attacks (spoofing) use fake biometric artifacts, such as printed face photos, silicone fingerprint molds, or high-resolution iris images, to deceive the biometric sensor. Morphing attacks blend two individuals' biometrics into a single artifact that matches both identities. Deepfake attacks use AI-generated video to impersonate a target individual in real time.

These adversarial attacks affect the FAR metric specifically: they represent sophisticated impostor attempts that may produce higher match scores than random impostor attempts. The effective FAR under adversarial conditions can be significantly worse than the FAR measured under benign conditions. Production biometric systems must incorporate liveness detection, presentation attack detection, and anti-morphing measures to maintain acceptable FAR under adversarial conditions.
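A rough way to reason about the gap between benign and adversarial FAR is to treat impostor traffic as a mix of zero-effort attempts and attack presentations. The model below is a simplifying assumption made for illustration, not a standard formula or an H33 measurement; all parameter names and numbers are hypothetical.

```python
def effective_far(zero_effort_far, attack_share, attack_detect_rate, attack_match_rate):
    """Blend benign and attack traffic into one acceptance rate.

    zero_effort_far:    FAR measured with ordinary (non-attack) impostors
    attack_share:       fraction of impostor attempts that are attacks
    attack_detect_rate: fraction of attacks stopped by liveness/PAD checks
    attack_match_rate:  fraction of undetected attacks that pass the matcher
    """
    benign = (1 - attack_share) * zero_effort_far
    adversarial = attack_share * (1 - attack_detect_rate) * attack_match_rate
    return benign + adversarial

# Hypothetical numbers: a one-in-a-million matcher facing 1% attack traffic.
with_liveness = effective_far(1e-6, 0.01, 0.95, 0.6)
without_liveness = effective_far(1e-6, 0.01, 0.0, 0.6)
print(f"with liveness detection:    {with_liveness:.6f}")
print(f"without liveness detection: {without_liveness:.6f}")
```

Even under these made-up numbers the attack term dominates the result, which is why the benign FAR alone says little about security against a motivated attacker.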

H33's biometric pipeline includes liveness detection as a pre-processing step before FHE encryption. The liveness assessment runs on the client device and evaluates frame-to-frame consistency, depth estimation (on devices with depth sensors), and micro-movement analysis to distinguish live presentations from static or replayed artifacts. Only biometric captures that pass liveness detection are encrypted and submitted for matching. This client-side liveness check operates on plaintext data, but only on the user's own device, before the capture leaves the user's control, and provides the first line of defense against presentation attacks.

Benchmarking Methodology

Accurate biometric benchmarking requires careful methodology to produce meaningful results. The evaluation dataset must include both genuine pairs (same person, different captures) and impostor pairs (different people). The number of impostor pairs limits how small a FAR can be measured reliably: with too few impostor pairs, a measured FAR of zero may only mean that no false acceptance happened to occur in the sample, which produces artificially optimistic FAR measurements that do not reflect real-world performance.

The NIST Biometric Evaluation Framework recommends using at least 10,000 genuine pairs and 1,000,000 impostor pairs for FAR measurements below 0.001%. For enterprise deployments, the evaluation dataset should be representative of the actual user population in terms of demographics, age distribution, and environmental conditions (lighting, camera quality, capture distance).
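The link between sample size and measurable FAR can be made concrete with the statistical "rule of three": if zero false accepts are observed in N independent impostor trials, the 95% upper confidence bound on the true FAR is roughly 3/N. The sketch below computes the exact bound; the function name is hypothetical, and the independence assumption is optimistic when many pairs share the same subjects.

```python
import math

def far_upper_bound_zero_errors(n_impostor_trials, confidence=0.95):
    """Upper confidence bound on FAR when no false accepts were observed.

    Solves (1 - p) ** n = 1 - confidence for p, assuming independent trials.
    """
    return -math.expm1(math.log(1 - confidence) / n_impostor_trials)

for n in (10_000, 1_000_000, 30_000_000):
    bound = far_upper_bound_zero_errors(n)
    print(f"{n:>10,} clean impostor trials -> FAR below {bound:.2e} (95% confidence)")
```

With one million error-free impostor pairs the bound is about three in a million, so a clean run of that size supports a claim of FAR below 0.001% but not a claim of FAR below one in a million.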

H33 provides a benchmarking toolkit that generates genuine and impostor pairs from customer-provided evaluation datasets, runs the complete FHE matching pipeline, and produces accuracy reports including FAR/FRR curves, EER, and demographic breakdowns. The toolkit runs in the same encrypted domain as production, ensuring that benchmark results accurately predict production accuracy. Organizations can run benchmarks before deployment and periodically during operation to verify that accuracy remains within acceptable bounds as the user population and environmental conditions evolve over time.

The benchmarking process itself operates entirely in the encrypted domain. Evaluation templates are encrypted using BFV, matching is performed homomorphically, and scores are collected in encrypted form. The benchmarking toolkit decrypts only the aggregate statistics (FAR, FRR, score distributions) using a benchmarking key that never exposes individual match scores or biometric templates. This ensures that even the benchmarking process maintains the privacy guarantees that define H33's approach to biometric authentication.

Accurate Biometrics, Fully Encrypted

H33 delivers production biometric accuracy with zero plaintext exposure.