H33 BFV vs Microsoft SEAL 4.1.2
Executive Summary
What was measured: Both systems execute the same biometric authentication pipeline:
- Encode — Pack 128-dim biometric embedding into BFV plaintext
- Encrypt — BFV encryption of probe embedding
- Compute — Homomorphic inner product: `multiply_plain` + 7 Galois rotations + `add_inplace`
- Decrypt — Recover similarity score from ciphertext
H33 additionally includes threshold decryption (k=3 of n=5), H33 ZKP Stark Lookup prove/verify, and Dilithium sign/verify. SEAL's number is FHE-only.
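The Compute step can be illustrated with a plaintext simulation. This is a minimal sketch, not H33's or SEAL's code. It does a slot-wise product and then rotate-and-add steps by 1, 2, 4, ..., 64 slots, which is why a 128-dim embedding needs exactly log2(128) = 7 rotations. For simplicity it assumes the embedding fills a 128-slot cyclic row; real BFV batching at N=4096 packs 2048 slots per row. The helper names are hypothetical.

```python
T = 65537  # plain modulus t used in both configurations
DIM = 128  # biometric embedding dimension


def rotate(v, r):
    """Cyclic left rotation by r slots (plaintext stand-in for a Galois rotation)."""
    return v[r:] + v[:r]


def inner_product_sim(probe, template, t=T):
    """Slot-wise multiply (multiply_plain), then log2(len) rotate-and-add steps."""
    acc = [(a * b) % t for a, b in zip(probe, template)]
    rotations = 0
    step = 1
    while step < len(acc):
        # add_inplace(acc, rotate(acc, step)): slot i now sums 2*step consecutive products
        acc = [(x + y) % t for x, y in zip(acc, rotate(acc, step))]
        rotations += 1
        step *= 2
    return acc[0], rotations  # every slot now holds the full inner product


probe = [i % 7 for i in range(DIM)]
template = [(3 * i + 1) % 11 for i in range(DIM)]
score, rotations = inner_product_sim(probe, template)
print(score, rotations)
```

After the last step every slot holds the same sum, so the decryptor can read the score from any slot.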
Head-to-Head: N=4096 (128-bit, NIST Level 1)
| Component | SEAL 4.1.2 | H33 BFV | Speedup |
|---|---|---|---|
| Encode | 0.04 ms | <0.01 ms | — |
| Encrypt | 0.53 ms | 0.42 ms | 1.3x |
| Compute (inner product) | 2.13 ms | 0.26 ms | 8.2x |
| Decrypt† | 0.15 ms (1 key, 1 server sees plaintext) | 0.33 ms (3-of-5 threshold, zero data exposure) | 0.45x (3x the work)† |
| Pipeline Total | 2.85 ms | 1.28 ms | 2.2x |
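| Auth/sec (single thread) | 350 | 781 | 2.2x |

† H33 performs a 3-of-5 threshold decryption (three partial decryptions plus a combine step), while SEAL performs a single-key decryption, so this row compares different amounts of work. See "The Decrypt Row" below.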
Visual: N=4096 Pipeline
Head-to-Head: N=16384 (256-bit, NIST Level 5)
| Component | SEAL 4.1.2 | H33 BFV | Speedup |
|---|---|---|---|
| Encode | 0.18 ms | <0.01 ms | — |
| Encrypt | 2.58 ms | ~1.5 ms | 1.7x |
| Compute (inner product) | 15.49 ms | ~3.2 ms | 4.8x |
| Decrypt† | 0.84 ms (1 key, 1 server sees plaintext) | ~1.4 ms (3-of-5 threshold, zero data exposure) | 0.6x (3x the work)† |
| Pipeline Total | 19.08 ms | 5.98 ms | 3.2x |
| Auth/sec (single thread) | 52 | 167 | 3.2x |
Visual: N=16384 Pipeline
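The Auth/sec and Speedup columns follow directly from the per-auth latencies: throughput is 1000 / latency_ms, truncated to a whole number as the tables appear to do, and speedup is the SEAL latency divided by the H33 latency. A small sketch that recomputes both head-to-head tables:

```python
def auth_per_sec(latency_ms):
    """Single-thread throughput implied by one per-authentication latency."""
    return int(1000.0 / latency_ms)  # truncated, matching the tables


def speedup(seal_ms, h33_ms):
    return seal_ms / h33_ms


pipeline_totals_ms = {
    "N=4096": (2.85, 1.28),
    "N=16384": (19.08, 5.98),
}

for name, (seal_ms, h33_ms) in pipeline_totals_ms.items():
    print(
        f"{name}: SEAL {auth_per_sec(seal_ms)}/s, H33 {auth_per_sec(h33_ms)}/s, "
        f"speedup {speedup(seal_ms, h33_ms):.1f}x"
    )
```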
The Decrypt Row: Why It's the Most Important Number
SEAL: Single-Key Decrypt
One server holds the complete secret key. One multiply operation. One server sees the plaintext biometric data in the clear.
Risk: Compromise that server → compromise every user's biometric. Single point of failure.
H33: Threshold Decrypt (k=3 of n=5)
Three independent authorities each compute a partial decryption with their key share. Shares combined via Shamir reconstruction + SHA3 integrity verification. No single server ever holds the full key. No single server ever sees the plaintext.
Guarantee: Even with 2 compromised authorities, zero data exposure.
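The combine step relies on Shamir's scheme being linear. Any k=3 of the n=5 shares recover the key through Lagrange coefficients evaluated at zero, so each authority can apply its own share to the ciphertext and the combiner only sums weighted partial results. The sketch below is a scalar toy over a prime field. It assumes a hypothetical modulus 2^61 - 1 and fixed polynomial coefficients. Real threshold BFV schemes operate on ring elements and typically add noise to each partial decryption, and this report also adds a SHA3 integrity check; none of that is modeled here.

```python
P = 2**61 - 1  # toy prime field modulus (assumption, not a BFV parameter)


def make_shares(secret, coeffs, n):
    """Shamir shares f(1..n) of f(x) = secret + coeffs[0]*x + coeffs[1]*x^2 + ..."""
    shares = {}
    for x in range(1, n + 1):
        y = secret
        for power, c in enumerate(coeffs, start=1):
            y = (y + c * pow(x, power, P)) % P
        shares[x] = y
    return shares


def lagrange_at_zero(xs):
    """Coefficients lambda_i such that sum(lambda_i * f(x_i)) == f(0) mod P."""
    lambdas = {}
    for i in xs:
        num, den = 1, 1
        for j in xs:
            if j != i:
                num = num * j % P
                den = den * (j - i) % P
        lambdas[i] = num * pow(den, -1, P) % P
    return lambdas


def partial_decrypt(c, key_share):
    """One authority's contribution: ciphertext component times its key share."""
    return c * key_share % P


def combine(partials):
    """Lagrange-weighted sum of k partial results equals c * secret_key."""
    lambdas = lagrange_at_zero(list(partials))
    return sum(lambdas[x] * d for x, d in partials.items()) % P


secret_key = 123456789
shares = make_shares(secret_key, coeffs=[987654321, 555555555], n=5)  # degree k - 1 = 2
c = 42424242
partials = {x: partial_decrypt(c, shares[x]) for x in (1, 3, 5)}  # any 3 of 5 authorities
print(combine(partials) == c * secret_key % P)
```

No authority ever handles `secret_key` itself; only the weighted sum of partial results reproduces `c * secret_key`.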
| Decrypt path | Operations | Time (N=4096) |
|---|---|---|
| SEAL single-key decrypt | 1 full decrypt | 0.15 ms |
| H33 threshold decrypt | 3 partial decrypts + Shamir + SHA3 | 0.33 ms |
| H33 amortized per share | 0.33 ms / 3 | 0.11 ms |

H33 does 3x the cryptographic work and, at N=4096, completes in only 2.2x the time (0.33 ms vs. 0.15 ms). Amortized over the three shares, including the combine step, that is 0.11 ms per share, below SEAL's 0.15 ms single-key decrypt. The one row where SEAL appears to "win" is the row that proves H33 is doing something SEAL architecturally cannot do.
H33 Full-Stack Component Breakdown
H33-128 (N=4096) — 1.28 ms
| Component | Time |
|---|---|
| Encode + quantize | 0.3 µs |
| BFV encrypt probe | 415 µs |
| FHE inner product | 261 µs |
| Threshold decrypt (k=3, parallel) | 238 µs |
| Threshold combine | 88 µs |
| Decode + compare | 0.5 µs |
| H33 ZKP Stark Lookup prove | 2.0 µs |
| H33 ZKP Stark Lookup verify | 0.2 µs |
| Dilithium sign | 81 µs |
| Dilithium verify | 25 µs |
H-256 (N=16384) — 5.98 ms
| Component | Time |
|---|---|
| FHE pipeline (all-in) | 6,359 µs |
| H33 ZKP Stark Lookup prove | 2.0 µs |
| H33 ZKP Stark Lookup verify | 0.2 µs |
| Dilithium sign (ML-DSA) | 80.7 µs |
| Dilithium verify | 24.8 µs |
FHE dominates at 98.3% of the summed component latency (6,359 µs of about 6,467 µs). ZKP Stark Lookup + Dilithium combined: <108 µs (1.7%).
Corrected Product Card
| Tier | N | Full Auth | vs SEAL Pipeline | vs SEAL Compute | Auth/sec |
|---|---|---|---|---|---|
| H0 (Dev) | 1,024 | 356 µs | — | — | 2,809 |
| H33-128 | 4,096 | 1.28 ms | 2.2x | 8.2x | 781 |
| H-256 | 16,384 | 5.98 ms | 3.2x | 4.8x | 167 |
All H33 numbers include: FHE BFV + threshold decrypt (k=3/n=5, zero data exposure) + H33 ZKP Stark Lookup + Dilithium ML-DSA. SEAL numbers are single-key FHE-only.
Multi-Threaded Scaling
H33-128 (N=4096) Batch
| Workers | Auth/sec | Scaling | Per-auth |
|---|---|---|---|
| 1 | 614 | 1.0x | 1.63 ms |
| 4 | 2,116 | 3.4x | 0.47 ms |
| 8 | 3,098 | 5.0x | 0.32 ms |
| 16 | 3,433 | 5.6x | 0.29 ms |
| 48 | 3,202 | 5.2x | 0.31 ms |
H-256 (N=16384) Batch
| Workers | Auth/sec | Scaling | Per-auth |
|---|---|---|---|
| 1 | 5 | 1.0x | 207 ms |
| 4 | 19 | 3.9x | 52 ms |
| 8 | 37 | 7.7x | 27 ms |
| 16 | 32 | 6.5x | 32 ms |
| 48 | 33 | 6.8x | 30 ms |
Memory contention limits scaling beyond 8-16 workers for N=16384. These are M4 Max numbers; AWS Graviton4 (96 cores) achieves 1,148,018 auth/sec for H33-128 with SIMD batching.
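The Scaling and Per-auth columns can be derived from the Auth/sec column alone. Scaling is throughput relative to the 1-worker row, and per-auth time is 1000 / auth_per_sec milliseconds. The short sketch below recomputes them for the H33-128 batch table. It also adds parallel efficiency (scaling divided by worker count), which is an extra metric the tables do not report.

```python
h33_128_batch = [(1, 614), (4, 2116), (8, 3098), (16, 3433), (48, 3202)]  # (workers, auth/sec)


def scaling_report(rows):
    """Return (workers, scaling, per_auth_ms, efficiency) for each batch row."""
    base = rows[0][1]  # single-worker throughput
    report = []
    for workers, rate in rows:
        scaling = rate / base
        per_auth_ms = 1000.0 / rate
        efficiency = scaling / workers  # 1.0 would be perfect linear scaling
        report.append((workers, scaling, per_auth_ms, efficiency))
    return report


for workers, scaling, per_auth_ms, eff in scaling_report(h33_128_batch):
    print(f"{workers:>2} workers: {scaling:.1f}x, {per_auth_ms:.2f} ms/auth, {eff:.0%} efficiency")
```

Efficiency falls steeply once the worker count passes the machine's 16 cores, which matches the plateau in both batch tables.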
Methodology: How We Made This Apples-to-Apples
1. Cryptographic Parameters
| Config | N | Coeff Modulus (H33) | Coeff Modulus (SEAL) | Plain Modulus | Security |
|---|---|---|---|---|---|
| 128-bit | 4,096 | Q = 56 bits (single prime)* | Q = 109 bits (3 primes: {36, 36, 37}) | t = 65,537 | 128-bit (NIST L1) |
| 256-bit | 16,384 | Q = 216 bits | Q = 216 bits (4 primes: {54, 54, 54, 54}) | t = 65,537 | 256-bit (NIST L5) |

N and t are identical in both configurations. The coefficient modulus Q matches at N=16,384 but differs at N=4,096.
*H33 uses a single 56-bit modulus for the 128-bit config (no relinearization needed for shallow auth circuits). SEAL uses a multi-prime Q = 109 bits to support general-purpose depth.
2. Identical Workload
- Embedding dimension: 128 (standard biometric embedding size)
- Operation: encrypted inner product (`multiply_plain` + 7 Galois rotations + `add_inplace`)
- Pipeline: Encode → Encrypt → Compute → Decrypt
3. Fairness to SEAL
- Pre-generated keys: All SEAL keys (public, relin, Galois) generated in keygen phase, not counted in per-auth timing.
- Warmup: 2-3 warmup iterations before measurement.
- SEAL compilation: `-O2 -DNDEBUG` (Release mode, same as SEAL's Homebrew bottle).
- No SEAL modifications: Stock Microsoft SEAL 4.1.2 from Homebrew. Zero patches.
- Consistent timing: Both use high-resolution monotonic clocks.
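Pre-generating Galois keys means creating, at minimum, one key per rotation step the circuit uses. This sketch assumes the convention of SEAL's batching, where a left row rotation by r slots corresponds to the Galois element 3^r mod 2N. Under that assumption, the seven power-of-two steps at N=4096 map to the elements computed below. The function name is illustrative, not a SEAL API.

```python
def galois_elements(poly_degree, steps, generator=3):
    """Galois element generator**step mod 2N for each left row rotation (SEAL-style convention)."""
    m = 2 * poly_degree
    return [pow(generator, step, m) for step in steps]


N = 4096
DIM = 128
steps = [1 << k for k in range(DIM.bit_length() - 1)]  # 1, 2, 4, ..., 64
elements = galois_elements(N, steps)
print(list(zip(steps, elements)))
```

Because each step doubles, each element is the square of the previous one mod 2N.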
4. What H33 Does Differently
- Montgomery NTT: Twiddle factors stored in Montgomery form, so the hot path performs no integer division. 44% faster NTT.
- Radix-4 butterfly: 4 elements per step (vs. 2 for radix-2), halving the number of NTT stages (log4 N instead of log2 N).
- Parallel moduli (Rayon): All CRT moduli processed concurrently. 1.8x multiply speedup.
- Batch CBD (centered binomial distribution) sampling: 1 RNG call per 10 noise coefficients. 5x faster noise sampling.
- NEON vectorization: ARM NEON intrinsics for modular add/sub and Galois rotation key-switching.
- Pre-NTT'd Galois keys: Eliminates 7 forward NTTs during rotation.
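Montgomery multiplication is what removes division from the hot path. Values are kept as a·R mod q with R = 2^64, and each product is reduced (REDC) using only multiplications, a mask, and a shift. Below is a minimal Python sketch under those assumptions. Python's big integers stand in for the 128-bit intermediate a Rust implementation would use, and the class name is hypothetical.

```python
class Montgomery:
    """Montgomery arithmetic modulo an odd q with R = 2**r_bits (q < R)."""

    def __init__(self, q, r_bits=64):
        if q % 2 == 0 or q >= 1 << r_bits:
            raise ValueError("q must be odd and smaller than R")
        self.q = q
        self.r_bits = r_bits
        self.mask = (1 << r_bits) - 1
        self.q_inv = pow(q, -1, 1 << r_bits)  # q^-1 mod R, precomputed once
        self.r2 = pow(1 << r_bits, 2, q)      # R^2 mod q, used to enter Montgomery form

    def redc(self, t):
        """Return t * R^-1 mod q for 0 <= t < q * R, using multiply, mask and shift only."""
        m = ((t & self.mask) * self.q_inv) & self.mask  # m = t * q^-1 mod R
        u = (t - m * self.q) >> self.r_bits             # exact: t - m*q is divisible by R
        return u + self.q if u < 0 else u

    def to_mont(self, a):
        # One-time conversion per value (the % is outside the multiply hot path).
        return self.redc((a % self.q) * self.r2)

    def mul(self, a_mont, b_mont):
        return self.redc(a_mont * b_mont)  # (aR)(bR)R^-1 = abR mod q

    def from_mont(self, a_mont):
        return self.redc(a_mont)


ctx = Montgomery(2**61 - 1)
a, b = 123456789012345, 987654321098765
product = ctx.from_mont(ctx.mul(ctx.to_mont(a), ctx.to_mont(b)))
print(product == a * b % ctx.q)
```

Twiddle factors can be converted with `to_mont` once at setup, so every butterfly multiply goes through `mul` alone.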
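A radix-4 butterfly computes a 4-point transform in a single step. The sketch below checks one butterfly against a direct 4-point DFT over a toy NTT-friendly prime q = 65537. It relies on 3 being a generator mod 65537, so that 3^((q-1)/4) is a primitive 4th root of unity. A full BFV NTT is negacyclic and applies per-stage twiddles, which this sketch omits.

```python
Q = 65537                     # toy NTT-friendly prime: 4 divides Q - 1
W4 = pow(3, (Q - 1) // 4, Q)  # primitive 4th root of unity, so W4**2 == -1 mod Q


def dft4_naive(a):
    """Direct 4-point DFT: X[k] = sum_j a[j] * W4**(j*k) mod Q."""
    return [sum(a[j] * pow(W4, j * k, Q) for j in range(4)) % Q for k in range(4)]


def radix4_butterfly(a):
    """Same transform in one radix-4 butterfly: one twiddle multiply, 8 adds/subs."""
    a0, a1, a2, a3 = a
    t0 = (a0 + a2) % Q
    t1 = (a0 - a2) % Q
    t2 = (a1 + a3) % Q
    t3 = (a1 - a3) * W4 % Q  # W4**2 == -1 turns the odd terms into +/- W4
    return [(t0 + t2) % Q, (t1 + t3) % Q, (t0 - t2) % Q, (t1 - t3) % Q]


x = [11, 22, 33, 44]
print(radix4_butterfly(x) == dft4_naive(x))
```

At N = 4096 this gives log4 4096 = 6 stages instead of log2 4096 = 12.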
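Batch CBD sampling slices one 64-bit random word into several small samples. The document does not state H33's CBD parameter. This sketch assumes η = 3, so each sample takes 2η = 6 bits and 10 samples fit in one word, which matches the "1 RNG call per 10 coefficients" figure. Each sample is the popcount of η bits minus the popcount of another η bits, giving values in [-η, η] with variance η/2. The function names are hypothetical.

```python
import random

ETA = 3                # assumption: CBD parameter eta; each sample uses 2*ETA = 6 bits
BITS = 2 * ETA
PER_WORD = 64 // BITS  # 10 samples from one 64-bit word (4 bits unused)


def cbd_from_word(word):
    """Split one 64-bit random word into PER_WORD samples from CBD(eta), each in [-ETA, ETA]."""
    samples = []
    for i in range(PER_WORD):
        chunk = (word >> (BITS * i)) & ((1 << BITS) - 1)
        plus = bin(chunk & ((1 << ETA) - 1)).count("1")  # popcount of the low ETA bits
        minus = bin(chunk >> ETA).count("1")             # popcount of the high ETA bits
        samples.append(plus - minus)
    return samples


def sample_noise(num_coeffs, rng):
    """Noise polynomial coefficients, drawing one 64-bit word per PER_WORD coefficients."""
    coeffs = []
    while len(coeffs) < num_coeffs:
        coeffs.extend(cbd_from_word(rng.getrandbits(64)))
    return coeffs[:num_coeffs]


noise = sample_noise(4096, random.SystemRandom())  # SystemRandom draws from os.urandom
print(min(noise), max(noise))
```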
5. Measurement Details
| Parameter | H33 | SEAL |
|---|---|---|
| Benchmark tool | Criterion.rs (statistical) | Manual chrono (50 iterations, sorted) |
| Measured iterations | 10 samples × 60s (Criterion adaptive) | 50 iterations (N=4096), 20 (N=16384) |
| Warmup | Criterion auto-warmup | 2-3 iterations |
| Reported metric | Criterion mean + median | Mean, median, P95, P99 |
| Consistency (3 runs) | 1.17, 1.23, 1.34 ms (H33-128) | 2.85 ms (stable across runs) |
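The SEAL harness reports mean, median, P95, and P99 over a sorted list of 50 (or 20) iterations after a short warmup. Below is a minimal sketch of that style of measurement using a high-resolution performance counter. The nearest-rank percentile rule is an assumption, because the harness's exact rule is not stated.

```python
import math
import statistics
import time


def bench(fn, iterations=50, warmup=3):
    """Run fn() `warmup` times untimed, then return `iterations` latencies in ms."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()  # high-resolution performance counter
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    return samples


def percentile(sorted_samples, p):
    """Nearest-rank percentile of an already sorted list."""
    rank = max(1, math.ceil(p * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(samples):
    s = sorted(samples)
    return {
        "mean": statistics.fmean(s),
        "median": statistics.median(s),
        "p95": percentile(s, 95),
        "p99": percentile(s, 99),
    }


stats = summarize(bench(lambda: sum(i * i for i in range(10_000))))
print({k: round(v, 3) for k, v in stats.items()})
```

With only 50 samples, the nearest-rank P99 is the maximum observed latency, so it is sensitive to a single outlier.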
6. Platform
| Component | Specification |
|---|---|
| CPU | Apple M4 Max (16 cores: 12P + 4E) |
| RAM | 128 GB unified memory |
| OS | macOS Darwin 24.5.0 (Sequoia) |
| Rust | stable, --features parallel (Rayon enabled) |
| SEAL | 4.1.2 via Homebrew (arm64 native bottle) |
| C++ compiler | AppleClang 17.0.0, -O2 |
Reproduce It Yourself
Run H33 Benchmarks
```shell
# Full pipeline (Criterion, ~5 min)
cargo bench --features parallel \
  --bench h33_full_auth -- "H-256 Full Pipeline"
cargo bench --features parallel \
  --bench h33_full_auth -- "H33-128 Full Pipeline"

# Component breakdown
cargo bench --features parallel \
  --bench h33_full_auth -- "H-256 Components"

# Scaling report
cargo bench --features parallel \
  --bench h33_full_auth -- "Pipeline Scaling"
```
Run SEAL Benchmark
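```shell
# Install SEAL
brew install seal

# Build & run
cd ~/seal-benchmark
mkdir -p build && cd build
cmake .. -DCMAKE_PREFIX_PATH=/opt/homebrew
# --config only takes effect with multi-config generators (e.g. Xcode);
# with the default Makefile generator the build type is set at configure time.
cmake --build . --config Release \
  -j$(sysctl -n hw.ncpu)
./seal_biometric_bench
```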
Source: ~/seal-benchmark/seal_biometric_bench.cpp
Reference: SEAL CKKS Baseline (N=8192)
| Component | SEAL CKKS N=8192 |
|---|---|
| Encode | 0.42 ms |
| Encrypt | 1.47 ms |
| Compute (inner product) | 4.34 ms |
| Decrypt | 0.32 ms |
| Pipeline Total | 6.56 ms |
| Auth/sec (single thread) | 153 |
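CKKS requires N=8192 minimum for this workload. H33's BFV at N=4096 (1.28 ms) completes the same computation about 5x faster (6.56 / 1.28 ≈ 5.1) because integer arithmetic avoids rescaling overhead. Even SEAL's own BFV at N=4096 (2.85 ms) is about 2.3x faster than this CKKS baseline.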