BFV FHE Performance

Version: 1.0.0
Status: Production
Last Updated: 2026-05-23
Hardware: AWS c8g.metal-48xl (Graviton4, 192 vCPU, 371 GiB)
FHE Scheme: Brakerski/Fan-Vercauteren (BFV)
Canonical URL: https://h33.ai/benchmarks/bfv/

1. Scope

This document presents production benchmarks for the BFV fully homomorphic encryption scheme as implemented in the H33 cryptographic pipeline. Measurements span five security tiers (H0 through H-256) and cover encrypt, decrypt, multiply, and inner-product operations. All numbers are from Graviton4 bare metal under sustained load.

2. Definitions

BFV (Brakerski/Fan-Vercauteren)
An integer-arithmetic FHE scheme supporting SIMD-style batching via the Chinese Remainder Theorem. Operations are performed on vectors of plaintext integers packed into a single ciphertext.
SIMD Batch
The technique of encoding multiple independent plaintext values into a single ciphertext polynomial using the CRT decomposition. For N=4096, up to 4096 independent values can be packed; H33 uses 32-user batches for biometric authentication.
Noise Budget
The remaining capacity for homomorphic operations before decryption fails. Each multiplication consumes noise budget. When the budget reaches zero, the ciphertext can no longer be decrypted correctly.
Inner Product
A batched homomorphic operation that multiplies two ciphertexts element-wise, then applies rotate-and-sum so that each packed vector is reduced to a single scalar. Used in biometric matching (computing cosine similarity on encrypted feature vectors).
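The packing and rotate-and-sum steps defined above can be sketched on plaintext slot vectors. This is a minimal simulation under stated assumptions, not H33 code, and nothing here is encrypted: the helper names (`pack`, `rotate_left`, `batched_inner_product`) are hypothetical, and the slots are treated as one flat cyclic vector, whereas real BFV batching typically arranges them as two rows of N/2 slots. The constants (N=4096, t=65537, 128-element vectors, 32 users) come from this document.

```python
N = 4096          # polynomial degree = number of plaintext slots
T = 65537         # plaintext modulus t
DIM = 128         # feature-vector length per user
USERS = N // DIM  # 32 users share one ciphertext

def pack(vectors):
    """Flatten USERS vectors of length DIM into one N-slot vector mod T."""
    assert len(vectors) == USERS and all(len(v) == DIM for v in vectors)
    return [x % T for v in vectors for x in v]

def rotate_left(slots, k):
    """Cyclic left rotation: output slot i holds input slot i + k."""
    k %= len(slots)
    return slots[k:] + slots[:k]

def batched_inner_product(probes, templates):
    """Slot-wise multiply, then log2(DIM) rotate-and-add steps.

    After the loop, the first slot of each DIM-sized block holds that
    user's inner product mod T; the other slots hold partial sums.
    """
    a, b = pack(probes), pack(templates)
    acc = [(x * y) % T for x, y in zip(a, b)]
    step = DIM // 2
    while step >= 1:
        rotated = rotate_left(acc, step)
        acc = [(x + y) % T for x, y in zip(acc, rotated)]
        step //= 2
    return acc[::DIM]  # one result per user
```

With 128-element vectors the loop runs 7 times (steps 64, 32, ..., 1), so one multiply plus 7 rotations serves all 32 users at once.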

3. Parameter Table

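| Tier | N | Q (bits) | t | Security (classical) | Multiplicative Depth | Use Case |
|---|---|---|---|---|---|---|
| H0 | 2,048 | 40 | 65,537 | 112-bit | 1 | Development / testing |
| H1 | 4,096 | 56 | 65,537 | 128-bit | 1 | Biometric auth (production) |
| H33 | 4,096 | 56 | 65,537 | 128-bit | 1 | H33-128 (production default) |
| H2 | 8,192 | 109 | 65,537 | 128-bit | 3 | Multi-hop computation |
| H-256 | 16,384 | 218 | 65,537 | 192-bit | 7 | H33-256 deep circuits |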

The production authentication pipeline uses the H33 tier (N=4096, single 56-bit modulus, t=65537). This is designated biometric_fast() in the codebase. The H-256 tier is reserved for deep computation workflows requiring 7+ multiplicative levels.
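As a rough illustration of how a tier might be chosen from the parameter table, the sketch below encodes the five rows and picks the smallest tier that meets a required multiplicative depth and security level. The `Tier` class and `pick_tier` function are hypothetical and do not correspond to any H33 API; only the numeric values are taken from the table. Preferring H33 over the identically parameterized H1 is an assumption based on H33 being labeled the production default.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    n: int              # polynomial degree N
    q_bits: int         # ciphertext modulus size in bits
    t: int              # plaintext modulus
    security_bits: int  # classical security level
    depth: int          # supported multiplicative depth

# Values copied from the parameter table in this document.
TIERS = [
    Tier("H0", 2048, 40, 65537, 112, 1),
    Tier("H1", 4096, 56, 65537, 128, 1),
    Tier("H33", 4096, 56, 65537, 128, 1),
    Tier("H2", 8192, 109, 65537, 128, 3),
    Tier("H-256", 16384, 218, 65537, 192, 7),
]

def pick_tier(required_depth, min_security_bits=128):
    """Return the smallest-N tier meeting both constraints (hypothetical helper)."""
    candidates = [
        tier for tier in TIERS
        if tier.depth >= required_depth and tier.security_bits >= min_security_bits
    ]
    if not candidates:
        raise ValueError("no tier supports the requested depth and security")
    # Smaller N is cheaper; on ties, prefer the production default H33.
    return min(candidates, key=lambda tier: (tier.n, tier.name != "H33"))
```

For example, `pick_tier(1)` selects H33, `pick_tier(3)` selects H2, and `pick_tier(7)` selects H-256.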

4. Per-Tier Benchmarks

4.1. Encrypt / Decrypt Latency

| Tier | Encrypt (median) | Decrypt (median) | CT Size |
|---|---|---|---|
| H0 | 48 us | 18 us | 10 KiB |
| H1 | 102 us | 38 us | 32 KiB |
| H33 | 102 us | 38 us | 32 KiB |
| H2 | 310 us | 105 us | 142 KiB |
| H-256 | 1,180 us | 390 us | 570 KiB |

4.2. Multiply Latency

| Tier | CT x CT (median) | CT x PT (median) | Relinearize |
|---|---|---|---|
| H0 | 62 us | 22 us | 41 us |
| H1 | 198 us | 64 us | 128 us |
| H33 | 198 us | 64 us | 128 us |
| H2 | 810 us | 242 us | 520 us |
| H-256 | 3,400 us | 980 us | 2,100 us |

4.3. Inner Product (Biometric Match)

| Tier | Vector Length | Inner Product (median) | Users per CT |
|---|---|---|---|
| H0 | 128 | 320 us | 16 |
| H1 | 128 | 943 us | 32 |
| H33 | 128 | 943 us | 32 |
| H2 | 256 | 4,200 us | 32 |
| H-256 | 512 | 18,600 us | 32 |

5. SIMD Batch Throughput

The H33 tier packs 32 independent user authentications into a single ciphertext. The following table shows the end-to-end FHE stage (Stage 1) throughput for batch biometric authentication.

| Metric | Value |
|---|---|
| Batch size | 32 users per ciphertext |
| Batch FHE latency | 943 us |
| Per-user FHE latency | 29.5 us |
| Single-core throughput | 33,934 users/sec |
| 192-core throughput (sustained 30s) | 1,667,875 auth/sec (full pipeline) |

The 1,667,875 auth/sec figure includes the complete pipeline (FHE + attestation + ZKP), not FHE alone. FHE (Stage 1) accounts for 70% of the pipeline latency. See the Agent Infrastructure page for the full pipeline breakdown.
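The per-user latency and single-core throughput figures follow directly from the batch latency and batch size. The short sketch below reproduces that arithmetic; the function names are illustrative only, and the inputs (943 us, 32 users) are the values reported in this section.

```python
BATCH_LATENCY_US = 943.0  # FHE latency for one 32-user batch
BATCH_SIZE = 32           # users packed into one ciphertext

def per_user_latency_us(batch_latency_us, batch_size):
    """Amortized FHE latency per user within one batch."""
    return batch_latency_us / batch_size

def single_core_users_per_sec(batch_latency_us, batch_size):
    """Users processed per second when one core runs batches back to back."""
    batches_per_sec = 1_000_000 / batch_latency_us
    return batches_per_sec * batch_size
```

Here 943 / 32 is about 29.5 us per user, and (1,000,000 / 943) x 32 is about 33,934 users/sec, which matches the single-core row.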

6. Graviton4 Scaling

BFV inner-product scaling across cores (H33 tier, batch of 32):

| Cores | Batches/sec | Users/sec | Efficiency |
|---|---|---|---|
| 1 | 1,060 | 33,934 | 100% |
| 16 | 16,800 | 537,600 | 99.1% |
| 48 | 50,100 | 1,603,200 | 98.6% |
| 96 | 99,400 | 3,180,800 | 97.7% |
| 192 | 195,600 | 6,259,200 | 96.2% |

Near-linear scaling is achieved because each FHE batch is independent (no shared ciphertext state). The slight efficiency drop at 192 cores (96.2%) is attributable to L3 cache contention on the 56-bit modulus NTT butterfly operations.
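Scaling efficiency here is presumably the measured throughput divided by the ideal linear throughput (cores times the single-core rate). The sketch below applies that definition to the batches/sec column; the definition itself is an assumption, since the document does not state the formula. Recomputing from the rounded table values lands within about 0.2 percentage points of the published efficiency column, which is consistent with the published figures having been derived from unrounded measurements.

```python
SINGLE_CORE_BATCHES_PER_SEC = 1_060  # 1-core row of the scaling table

# cores -> measured batches/sec, copied from the scaling table
MEASURED = {16: 16_800, 48: 50_100, 96: 99_400, 192: 195_600}

def scaling_efficiency(cores, batches_per_sec, baseline=SINGLE_CORE_BATCHES_PER_SEC):
    """Measured throughput as a percentage of ideal linear scaling."""
    return 100.0 * batches_per_sec / (cores * baseline)

for cores, rate in MEASURED.items():
    print(cores, round(scaling_efficiency(cores, rate), 1))
```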

7. Montgomery Radix-4 Optimization

The H33 BFV implementation uses a Montgomery radix-4 NTT with Harvey lazy reduction. This optimization reduces the number of modular reductions per butterfly from 2 to approximately 0.5, yielding a 1.8x speedup over the textbook radix-2 NTT at N=4096.

| NTT Variant | N=4096 Forward (median) | Reductions per Butterfly |
|---|---|---|
| Radix-2 (textbook) | 14.2 us | 2.0 |
| Radix-4 (Harvey lazy) | 7.9 us | ~0.5 |
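To show what lazy reduction means in practice, the sketch below implements a Harvey-style lazy butterfly inside a small radix-2 NTT. It is not the H33 implementation: it uses a Shoup precomputed multiplier rather than Montgomery form, it is radix-2 rather than radix-4, and all function names are illustrative. The idea it demonstrates is that intermediate values are allowed to drift in the range [0, 4p), so each butterfly needs only one conditional subtraction, and full reduction mod p happens once at the end.

```python
BETA = 1 << 64  # machine word size assumed by the precomputation; needs p < BETA / 4

def shoup(w, p):
    """Precomputed quotient floor(w * BETA / p) for the twiddle factor w."""
    return (w * BETA) // p

def lazy_butterfly(x, y, w, w_shoup, p):
    """Inputs in [0, 4p). Outputs are congruent to x + w*y and x - w*y mod p
    and stay in [0, 4p), with no full modular reduction."""
    if x >= 2 * p:            # the single conditional correction
        x -= 2 * p
    q = (w_shoup * y) // BETA
    t = w * y - q * p         # congruent to w*y mod p, lies in [0, 2p)
    return x + t, x - t + 2 * p

def bit_reverse(a):
    bits = len(a).bit_length() - 1
    out = [0] * len(a)
    for i, v in enumerate(a):
        out[int(format(i, "0{}b".format(bits))[::-1], 2)] = v
    return out

def ntt_lazy(a, p, root):
    """Iterative radix-2 NTT; root must be a primitive len(a)-th root of unity mod p."""
    a = bit_reverse(a)
    n = len(a)
    length = 2
    while length <= n:
        w_len = pow(root, n // length, p)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for j in range(half):
                i, k = start + j, start + j + half
                a[i], a[k] = lazy_butterfly(a[i], a[k], w, shoup(w, p), p)
                w = w * w_len % p
        length *= 2
    return [v % p for v in a]  # one final reduction per coefficient

def ntt_naive(a, p, root):
    """Reference O(n^2) transform used only to check the lazy version."""
    n = len(a)
    return [sum(a[j] * pow(root, j * k, p) for j in range(n)) % p for k in range(n)]
```

A production implementation would precompute the twiddle table and its Shoup quotients once instead of calling `shoup` inside the loop.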

8. Reproducibility

# Build with production flags
$ cargo build --release --features bfv_bench

# Run per-tier benchmarks
$ ./target/release/examples/bfv_bench --tier h33 --iterations 100000

# Run SIMD batch throughput
$ ./target/release/examples/graviton4_bench --batch-size 32 --duration 30

# Run scaling test
$ ./target/release/examples/bfv_bench --tier h33 --threads 192 --iterations 10000

Do NOT use --features parallel for BiometricFast builds. It enables Rayon work-stealing, which causes a 37% throughput regression from contention at 96 or more workers. Use OS-level thread pinning instead.
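One way to picture OS-level pinning is a fixed worker-to-CPU assignment applied through the scheduler affinity call. The sketch below is a Linux-only illustration in Python, not the mechanism the benchmark binaries use, which this document does not describe; `assign_cpus` and `pin_current_thread` are hypothetical helper names. `os.sched_setaffinity` and `os.sched_getaffinity` are standard-library calls that exist only on platforms that support them.

```python
import os

def assign_cpus(num_workers, cpus):
    """Map worker index -> CPU id, round-robin over the allowed CPUs."""
    cpus = sorted(cpus)
    return {worker: cpus[worker % len(cpus)] for worker in range(num_workers)}

def pin_current_thread(cpu):
    """Restrict the calling thread to one CPU. Returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {cpu})  # pid 0 means the caller
    return True

def allowed_cpus():
    """CPUs the caller may run on, falling back to a plain count if unavailable."""
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(0))
    return list(range(os.cpu_count() or 1))
```

Each worker would call `pin_current_thread(plan[worker_id])` once at startup, with `plan = assign_cpus(num_workers, allowed_cpus())`, so that independent batches stay on a fixed core instead of migrating between cores.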