This document presents production benchmarks for the BFV fully homomorphic encryption scheme as implemented in the H33 cryptographic pipeline. Measurements span five security tiers (H0 through H-256) and cover encrypt, decrypt, multiply, and inner-product operations. All numbers are from Graviton4 bare metal under sustained load.
| Tier | N | Q (bits) | t | Security (classical) | Multiplicative Depth | Use Case |
|---|---|---|---|---|---|---|
| H0 | 2,048 | 40 | 65,537 | 112-bit | 1 | Development / testing |
| H1 | 4,096 | 56 | 65,537 | 128-bit | 1 | Biometric auth (production) |
| H33 | 4,096 | 56 | 65,537 | 128-bit | 1 | H33-128 (production default) |
| H2 | 8,192 | 109 | 65,537 | 128-bit | 3 | Multi-hop computation |
| H-256 | 16,384 | 218 | 65,537 | 192-bit | 7 | H33-256 deep circuits |
The production authentication pipeline uses the H33 tier (N=4096, single 56-bit modulus, t=65537). This is designated `biometric_fast()` in the codebase. The H-256 tier is reserved for deep computation workflows that need up to 7 multiplicative levels.
| Tier | Encrypt (median) | Decrypt (median) | CT Size |
|---|---|---|---|
| H0 | 48 us | 18 us | 10 KiB |
| H1 | 102 us | 38 us | 32 KiB |
| H33 | 102 us | 38 us | 32 KiB |
| H2 | 310 us | 105 us | 142 KiB |
| H-256 | 1,180 us | 390 us | 570 KiB |
| Tier | CT x CT (median) | CT x PT (median) | Relinearize |
|---|---|---|---|
| H0 | 62 us | 22 us | 41 us |
| H1 | 198 us | 64 us | 128 us |
| H33 | 198 us | 64 us | 128 us |
| H2 | 810 us | 242 us | 520 us |
| H-256 | 3,400 us | 980 us | 2,100 us |
| Tier | Vector Length | Inner Product (median) | Users per CT |
|---|---|---|---|
| H0 | 128 | 320 us | 16 |
| H1 | 128 | 943 us | 32 |
| H33 | 128 | 943 us | 32 |
| H2 | 256 | 4,200 us | 32 |
| H-256 | 512 | 18,600 us | 32 |
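
The "Users per CT" column is consistent with standard BFV batching, where a ciphertext exposes N plaintext slots and each user occupies one vector's worth of contiguous slots. The sketch below is only an illustration of that arithmetic: the `BatchParams` type and its methods are hypothetical names, not part of the H33 codebase, and the one-slot-per-coefficient layout is an assumption the source does not state.

```rust
/// Illustrative BFV batching parameters (hypothetical type, not the pipeline's API).
#[derive(Clone, Copy, Debug)]
pub struct BatchParams {
    /// Polynomial degree N.
    pub n: u64,
    /// Plaintext modulus t.
    pub t: u64,
}

impl BatchParams {
    /// Standard BFV batching needs t = 1 (mod 2N) so that x^N + 1
    /// splits into N linear factors modulo t, giving N slots.
    pub fn supports_batching(&self) -> bool {
        self.t % (2 * self.n) == 1
    }

    /// Users per ciphertext, assuming N slots and `vector_len` slots per user.
    /// Returns None if batching is unavailable or the vector length is zero.
    pub fn users_per_ct(&self, vector_len: u64) -> Option<u64> {
        if vector_len == 0 || !self.supports_batching() {
            return None;
        }
        Some(self.n / vector_len)
    }
}
```

For example, `BatchParams { n: 4096, t: 65537 }.users_per_ct(128)` evaluates to `Some(32)`, matching the H33 row, and the same formula reproduces the other rows of the table.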
The H33 tier packs 32 independent user authentications into a single ciphertext. The following table shows the end-to-end FHE stage (Stage 1) throughput for batch biometric authentication.
| Metric | Value |
|---|---|
| Batch size | 32 users per ciphertext |
| Batch FHE latency | 943 us |
| Per-user FHE latency | 29.5 us |
| Single-core throughput | 33,934 users/sec |
| 192-core throughput (sustained 30s) | 1,667,875 auth/sec (full pipeline) |
The 1,667,875 auth/sec figure includes the complete pipeline (FHE + attestation + ZKP), not FHE alone. FHE (Stage 1) accounts for 70% of the pipeline latency. See the Agent Infrastructure page for the full pipeline breakdown.
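
The per-user latency and single-core throughput rows follow directly from the batch latency and batch size. A minimal sketch of that arithmetic is shown here; the function names are illustrative and not taken from the codebase.

```rust
/// Amortized latency per user, in microseconds, for one SIMD batch.
pub fn per_user_latency_us(batch_latency_us: f64, users_per_batch: u32) -> f64 {
    batch_latency_us / users_per_batch as f64
}

/// Users processed per second on one core, assuming batches run back to back.
pub fn users_per_sec(batch_latency_us: f64, users_per_batch: u32) -> f64 {
    users_per_batch as f64 * 1_000_000.0 / batch_latency_us
}
```

With the H33 figures, `per_user_latency_us(943.0, 32)` is about 29.5 us and `users_per_sec(943.0, 32)` is about 33,934, matching the table.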
BFV inner-product scaling across cores (H33 tier, batch of 32):
| Cores | Batches/sec | Users/sec | Efficiency |
|---|---|---|---|
| 1 | 1,060 | 33,934 | 100% |
| 16 | 16,800 | 537,600 | 99.1% |
| 48 | 50,100 | 1,603,200 | 98.6% |
| 96 | 99,400 | 3,180,800 | 97.7% |
| 192 | 195,600 | 6,259,200 | 96.2% |
Near-linear scaling is achieved because each FHE batch is independent (no shared ciphertext state). The slight efficiency drop at 192 cores (96.2%) is attributable to L3 cache contention on the 56-bit modulus NTT butterfly operations.
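
The efficiency column can be read as measured batch throughput divided by ideal linear scaling of the single-core rate. The sketch below assumes that definition, which the source does not spell out; the function names are illustrative. Results may differ from the table by about a tenth of a percentage point depending on which single-core baseline is used.

```rust
/// Parallel efficiency: measured throughput over ideal linear scaling.
/// `single_core_batches_per_sec` is the 1-core baseline (1,060 in the table).
pub fn scaling_efficiency(
    batches_per_sec: f64,
    cores: u32,
    single_core_batches_per_sec: f64,
) -> f64 {
    batches_per_sec / (cores as f64 * single_core_batches_per_sec)
}

/// Users per second implied by a batch rate and a fixed batch size.
pub fn users_from_batches(batches_per_sec: u64, users_per_batch: u64) -> u64 {
    batches_per_sec * users_per_batch
}
```

For the 192-core row, `users_from_batches(195_600, 32)` gives 6,259,200 and `scaling_efficiency(195_600.0, 192, 1_060.0)` gives roughly 0.961.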
The H33 BFV implementation uses a Montgomery radix-4 NTT with Harvey lazy reduction. This optimization reduces the number of modular reductions per butterfly from 2 to approximately 0.5, yielding a 1.8x speedup over the textbook radix-2 NTT at N=4096.
| NTT Variant | N=4096 Forward (median) | Reductions per Butterfly |
|---|---|---|
| Radix-2 (textbook) | 14.2 us | 2.0 |
| Radix-4 (Harvey lazy) | 7.9 us | ~0.5 |
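
To illustrate what lazy reduction means in a butterfly, here is a minimal sketch of a radix-2 butterfly in the style of Harvey's algorithm, using a precomputed (Shoup-style) twiddle quotient instead of Montgomery form. It is deliberately simpler than the radix-4 Montgomery kernel described above and is not the H33 implementation. All names are hypothetical, and the modulus is only assumed to satisfy p < 2^62.

```rust
/// Precompute floor(w * 2^64 / p) for a twiddle factor w < p.
pub fn shoup_precompute(w: u64, p: u64) -> u64 {
    (((w as u128) << 64) / (p as u128)) as u64
}

/// Lazy modular multiply: returns a value congruent to w*y mod p, in [0, 2p).
/// No final conditional subtraction is performed.
pub fn mul_lazy(y: u64, w: u64, w_shoup: u64, p: u64) -> u64 {
    // Estimate the quotient from the high 64 bits of w_shoup * y.
    let q = ((w_shoup as u128 * y as u128) >> 64) as u64;
    // The difference fits in 64 bits, so wrapping arithmetic is exact here.
    w.wrapping_mul(y).wrapping_sub(q.wrapping_mul(p))
}

/// Lazy forward butterfly: inputs and outputs live in [0, 4p).
/// Outputs are congruent to x + w*y and x - w*y modulo p.
pub fn butterfly_lazy(x: u64, y: u64, w: u64, w_shoup: u64, p: u64) -> (u64, u64) {
    debug_assert!(p < (1u64 << 62));
    let two_p = 2 * p;
    // One conditional subtraction brings x into [0, 2p).
    let x = if x >= two_p { x - two_p } else { x };
    let t = mul_lazy(y, w, w_shoup, p); // t in [0, 2p)
    (x + t, x + two_p - t)
}
```

The point of the design is that values are allowed to drift up to 4p between stages, so full reduction into [0, p) is deferred to a single pass at the end of the transform.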
# Build with production flags
$ cargo build --release --features bfv_bench
# Run per-tier benchmarks
$ ./target/release/examples/bfv_bench --tier h33 --iterations 100000
# Run SIMD batch throughput
$ ./target/release/examples/graviton4_bench --batch-size 32 --duration 30
# Run scaling test
$ ./target/release/examples/bfv_bench --tier h33 --threads 192 --iterations 10000

Do NOT use --features parallel for BiometricFast builds. It enables Rayon work-stealing, which causes a 37% throughput regression from contention at 96+ workers. Use OS-level thread pinning instead.