The Fastest FHE Implementation in Existence

Your Data Is Never Decrypted.
Not During Auth. Not During Matching. Not Ever.

Fully Homomorphic Encryption lets H33 verify your identity by computing directly on encrypted data. The server never sees your plaintext — because it never needs to.

0 bytes plaintext on server · 1.28ms full encrypted auth · 3 proprietary libraries · 23,296 lines of Rust

Every Other Auth System Has a Fatal Flaw

Traditional biometric authentication has a fundamental design flaw: to compare your biometric against a stored template, the system must decrypt both.

That moment of decryption — even if it lasts only milliseconds — is when your biometric data can be intercepted, copied, or stolen. Unlike a password, you cannot change your fingerprint or face.

Every major biometric breach exploits this same window.

H33 eliminates the decryption window entirely. Matching happens on ciphertext. Your biometric template is never reconstituted as plaintext on the server.
Traditional Authentication
Biometric → Encrypt → DECRYPT → Compare → Result
Plaintext exposed for 2–50ms per auth
H33 FHE Authentication
Biometric → Encrypt → Match on Ciphertext → Threshold Decrypt
Plaintext never exists on the server. Zero exposure.

One Problem, Three Solutions

H33 built three purpose-built FHE libraries from scratch — 23,296 lines of Rust, zero external dependencies. Each optimized for a different workload.

H33 BFV Production Default
Integer arithmetic FHE for identity verification
The workhorse. Verifies biometrics, compares encrypted integers, and handles exact-match operations — all without decryption. Powers every H33 auth call in production.
  • Biometric authentication (face, fingerprint, iris)
  • Identity verification and document matching
  • Encrypted database comparisons
4,574 lines of Rust · 32 users per ciphertext · 256KB per user (128× reduction) · ~50µs per auth (batched)
Under the Hood
Bajard Full-RNS multiplication eliminates all BigInt operations — 4× faster multiply. Montgomery NTT with Harvey lazy reduction keeps values in [0,2q) between stages. SIMD batching packs 32 users per ciphertext (4096 slots ÷ 128 dims). CRT condition: t=65537, N=4096. NTT-form enrolled templates skip the forward transform in multiply_plain_accumulate. Pre-NTT public key eliminates clone+NTT per encrypt.
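The slot arithmetic above (4096 slots ÷ 128 dims = 32 users) can be pictured with a plaintext model. The sketch below is not H33 code: the layout `slot = user * 128 + dim` and the function names are assumptions made for illustration, and the per-user sum that an FHE evaluator would compute with rotations is written as an ordinary loop.

```rust
/// Figures quoted in the text above.
const T: u64 = 65_537; // plaintext modulus
const SLOTS: usize = 4096;
const DIMS: usize = 128;
const USERS_PER_CT: usize = SLOTS / DIMS; // 32

/// Pack up to 32 templates of 128 values into one 4096-slot vector.
/// Assumed layout: slot index = user * DIMS + dim.
fn pack_templates(templates: &[[u64; DIMS]]) -> Vec<u64> {
    assert!(templates.len() <= USERS_PER_CT);
    let mut slots = vec![0u64; SLOTS];
    for (user, tpl) in templates.iter().enumerate() {
        for (dim, &v) in tpl.iter().enumerate() {
            slots[user * DIMS + dim] = v % T;
        }
    }
    slots
}

/// Slot-wise multiply, then sum each 128-slot block modulo T.
/// One pass yields one inner product per packed user.
fn batched_inner_products(enrolled: &[u64], probes: &[u64]) -> Vec<u64> {
    assert_eq!(enrolled.len(), SLOTS);
    assert_eq!(probes.len(), SLOTS);
    (0..USERS_PER_CT)
        .map(|user| {
            let base = user * DIMS;
            (0..DIMS).fold(0u64, |acc, d| {
                (acc + enrolled[base + d] * probes[base + d]) % T
            })
        })
        .collect()
}
```

Because every user occupies its own block of slots, the cost of the multiply is the same whether one block or all 32 are filled, which is the intuition behind the amortized per-auth figure.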
H33 CKKS ML / AI Workloads
Approximate arithmetic FHE for machine learning
Runs ML inference, scoring, and analytics on encrypted data. Full bootstrapping means unlimited computation depth — something Microsoft SEAL cannot do.
  • ML inference on encrypted patient data
  • Encrypted credit scoring and risk analysis
  • Privacy-preserving analytics and aggregation
2,786 lines of Rust · Unlimited multiplicative depth · Auto Turbo ↔ Precision switching · 45.2µs to encode real numbers
Under the Hood
Complex number encoding via canonical embedding. Full bootstrapping via Chebyshev polynomial approximation for unlimited computation depth. Hybrid bridge (ckks_bridge.rs, 613 lines) auto-switches between Turbo (N=4096, fast) and Precision (N=16384, deep) contexts with ~1.7ms one-time context switch cost. Rescaling for noise management across levels.
H33 BFV32 Mobile / Edge
ARM-native 32-bit FHE for mobile devices
Optimized for phones and edge devices. Uses native ARM NEON SIMD instructions for near-2× faster transforms on Apple Silicon. Same wire format as BFV — fully server-compatible.
  • On-device biometric auth (iPhone, Android)
  • Edge computing and IoT authentication
  • Low-power / battery-constrained environments
631 lines of Rust · vmull_u32 4-lane ARM NEON SIMD · ~2× faster NTT on M1–M4 · Same wire format (server-compatible)
Under the Hood
32-bit coefficient representation with moduli < 2^30. Native vmull_u32 for 4-lane ARM NEON SIMD on Apple Silicon (M1–M4). Polynomial<u32> → Ntt32 → BfvContext32 pipeline. Serialization normalizes to shared ciphertext wire format so the server can use 64-bit BfvContext with AVX-512.
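To see why moduli below 2^30 suit 32-bit lanes, here is a portable scalar model of the arithmetic. It is a sketch under stated assumptions, not the BFV32 source: `widening_mul2`, `mulmod4` and `lazy_sum4` are hypothetical names, and the two-lane helper mirrors the NEON `vmull_u32` intrinsic, which widens two 32-bit lanes to two 64-bit products per call, so a four-lane vector is handled as a low half and a high half.

```rust
/// Scalar stand-in for a widening 32x32 -> 64-bit multiply on two lanes.
fn widening_mul2(a: [u32; 2], b: [u32; 2]) -> [u64; 2] {
    [a[0] as u64 * b[0] as u64, a[1] as u64 * b[1] as u64]
}

/// Multiply four coefficient pairs modulo q, where q < 2^30.
fn mulmod4(a: [u32; 4], b: [u32; 4], q: u32) -> [u32; 4] {
    assert!(q > 0 && q < (1 << 30), "modulus must be below 2^30");
    let lo = widening_mul2([a[0], a[1]], [b[0], b[1]]);
    let hi = widening_mul2([a[2], a[3]], [b[2], b[3]]);
    let q = q as u64;
    [
        (lo[0] % q) as u32,
        (lo[1] % q) as u32,
        (hi[0] % q) as u32,
        (hi[1] % q) as u32,
    ]
}

/// Sum four residues that are each below q < 2^30 and reduce once at the end.
/// The unreduced sum stays below 2^32, so it cannot overflow a u32.
fn lazy_sum4(v: [u32; 4], q: u32) -> u32 {
    assert!(q < (1 << 30) && v.iter().all(|&x| x < q));
    v.iter().sum::<u32>() % q
}
```

The two spare bits per word are what allow several additions between reductions; a production kernel would likely replace the `%` with Montgomery or Barrett reduction.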

Authentication Without Decryption

A single API call. Four cryptographic operations. Your plaintext never touches the server.

🤚 Capture: User’s device captures biometric
🔒 Encrypt: Client-side BFV encrypt on device · 0.42ms
🛡 Match: FHE inner product on ciphertexts (server-side) · 0.26ms
🔑 Result: Threshold k-of-n decrypt (distributed) · 0.33ms
Total: 1.28ms

The server processes your auth request without ever reconstructing your biometric as plaintext.

BFV Encrypt 0.42ms + FHE Inner Product 0.26ms + Threshold Decrypt 0.33ms + ZKP STARK ~2.2µs + Dilithium ~106µs + Encode/Other ~0.12ms

23,296 Lines of Proprietary Cryptographic Infrastructure

Every H33 FHE library runs on a shared engine of optimized primitives — all written from scratch in Rust with zero external FHE dependencies.

23,296 lines of Rust · 4 SIMD platforms · 2 GPU backends · 0 external FHE dependencies
Threshold FHE (1,006 lines): k-of-n Shamir secret sharing. Partial decryptions are noisy shares, so no single server ever sees the plaintext result.
Montgomery NTT (1,549 lines): Radix-4 with Harvey lazy reduction. Twiddles in Montgomery form, so there is no division in the hot path.
Speculative Execution (674 lines): Evaluates multiple parameter sets in parallel. First valid result wins. 30–50% latency reduction.
Arena Allocators (638 lines): Lock-free pre-allocated buffers. Zero allocation in the hot path. CiphertextPool + PolynomialPool.
GPU Acceleration (2,152 lines): CUDA NTT offload (1,610 lines) + Apple Metal compute shaders (542 lines) for batch workloads.
Multi-Platform SIMD (ARM NEON · AVX-512 · CUDA · Metal): Platform-optimized NTT butterflies and branchless Galois rotation with vectorized key-switching.
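The "first valid result wins" idea behind speculative execution can be sketched with standard-library threads and a channel. This is an illustration only: `first_valid` is a hypothetical helper, and the page does not describe how the real engine schedules or cancels work.

```rust
use std::sync::mpsc;
use std::thread;

/// Run `eval` on every candidate in its own thread and return the first
/// `Some` result to arrive. Slower workers keep running in the background
/// and their results are discarded.
fn first_valid<P, R, F>(candidates: Vec<P>, eval: F) -> Option<R>
where
    P: Send + 'static,
    R: Send + 'static,
    F: Fn(P) -> Option<R> + Clone + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    for p in candidates {
        let tx = tx.clone();
        let eval = eval.clone();
        thread::spawn(move || {
            // A send error only means the receiver already has its answer.
            let _ = tx.send(eval(p));
        });
    }
    // Drop the original sender so the iterator ends once all workers report.
    drop(tx);
    rx.into_iter().flatten().next()
}
```

For example, `first_valid(vec![1000u64, 4096, 3000], |n: u64| if n.is_power_of_two() { Some(n) } else { None })` returns `Some(4096)`, since only one candidate is valid. When several candidates are valid, whichever thread finishes first determines the result.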

The Fastest FHE — Verified on AWS Graviton4

c8g.metal-48xl · 192 vCPUs · 377 GiB · ARM NEON · February 2026

1.2M auth/sec sustained · ~50µs per auth (batched) · 96 workers
H0 Mode (Dev/Testing Only): N = 1,024 · ~57-bit security · 1.28ms full auth pipeline · Classical only
H33 Mode (NIST Level 1): N = 4,096 · 128-bit security · 0.42ms BFV encrypt · Production default
H-256 Mode (NIST Level 5): N = 8,192 · 256-bit security · ~1.1ms · Maximum security tier
1.28ms Pipeline Breakdown
BFV Encrypt 0.42ms · FHE Inner Product 0.26ms · Threshold Decrypt 0.33ms · ZKP + Dilithium + Other 0.27ms
NTT 16384 Roundtrip: 1.85ms (target: <2.5ms ✓ PASS)
CKKS Encode Real: 45.2µs (target: <100µs ✓ PASS)
FHE Computation Cache: <100µs (10–100× speedup)
Cache Speedup: 67× (repeated operations)

How H33 Compares

For teams evaluating FHE implementations, here is how H33 stacks up against the two most common alternatives.

vs. Microsoft SEAL — Operation Benchmarks

Operation | Microsoft SEAL | H33-FHE | H33 Advantage
BFV Multiply (n=16k) | 180ms | 45ms | 4× faster
Relinearize | 90ms | 25ms | 3.6× faster
Key Switch | 90ms | 25ms | 3.6× faster
NTT (n=64k) | 3.5ms | <1ms | 3.5× faster
CKKS Multiply (n=16k) | 50ms | 20ms | 2.5× faster
FHE Schemes | BFV + CKKS | BFV + CKKS + BFV32 | 3 integrated
Bootstrapping | Not supported | Full CKKS bootstrap | Unlimited depth
GPU Acceleration | CPU only | CUDA + Metal | Multi-platform
AWS Graviton4 (c8g.metal-48xl, ARM NEON). February 2026.

vs. Zama — The Commercial Alternative

ZAMA
  • Speed: Baseline
  • License: $100K–$500K+
  • Token: $ZAMA required
  • Stack: FHE only
H33
  • Speed: ~100× faster
  • License: $0. Ever.
  • Token: None
  • Stack: FHE + ZK + PQ + Bio

10M Operations / Year

Zama (license + ops + token): $150K–$600K+
H33 (operations only): $35K–$70K
You save: 53% minimum ($70K vs. $150K)

The Engineering Behind the Speed

For engineers and cryptographers who want to understand why H33 is architecturally faster — not just benchmarked faster.

1
Full-RNS BFV Multiplication

Eliminates expensive RNS ↔ BigInt conversions using Bajard et al.'s scaling technique. All operations stay in residue number system representation. bajard_rns.rs (766 lines).
Result: 4× faster multiply
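As background for why staying in RNS form avoids big-integer work, the sketch below multiplies two numbers residue by residue and only uses CRT reconstruction to check the answer. It shows plain RNS arithmetic, not the Bajard scaling step itself, and the two moduli and all function names are illustrative assumptions rather than H33 parameters.

```rust
/// Two small coprime primes standing in for an RNS basis (illustrative only).
const Q1: u64 = 65_537;
const Q2: u64 = 998_244_353;

fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1u64;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = (acc as u128 * base as u128 % m as u128) as u64;
        }
        base = (base as u128 * base as u128 % m as u128) as u64;
        exp >>= 1;
    }
    acc
}

/// Split an integer into its residues modulo Q1 and Q2.
fn to_rns(x: u128) -> (u64, u64) {
    ((x % Q1 as u128) as u64, (x % Q2 as u128) as u64)
}

/// Multiply channel by channel; every operand fits in a machine word.
fn mul_rns(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    (a.0 * b.0 % Q1, a.1 * b.1 % Q2)
}

/// CRT reconstruction into [0, Q1 * Q2). Used here only to verify results;
/// this is the kind of conversion a full-RNS pipeline tries to avoid.
fn from_rns(r: (u64, u64)) -> u128 {
    let inv_q1 = pow_mod(Q1, Q2 - 2, Q2); // Q1^{-1} mod Q2 (Q2 is prime)
    let diff = (r.1 + Q2 - r.0 % Q2) % Q2;
    let k = (diff as u128 * inv_q1 as u128 % Q2 as u128) as u64;
    r.0 as u128 + Q1 as u128 * k as u128
}
```

Calling `from_rns(mul_rns(to_rns(a), to_rns(b)))` returns `a * b` reduced modulo `Q1 * Q2`.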

2
Montgomery NTT with Harvey Lazy Reduction

Radix-4 butterfly with twiddles in Montgomery form. Values stay in [0,2q) between stages — single final reduction. Pre-NTT pk0 at keygen. INTT post-processing fusion: precomputed fused_inv_mont (3 REDC → 2). ntt.rs (1,549 lines) + montgomery.rs (654 lines).
Result: 3.5× faster transforms
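The "values stay in [0,2q)" property can be demonstrated with a small Montgomery multiplier that skips the conditional subtraction. This is a minimal sketch assuming R = 2^64 and an odd modulus q < 2^62; the `Mont` type and its methods are hypothetical and are not taken from ntt.rs or montgomery.rs.

```rust
/// Montgomery arithmetic with R = 2^64 for an odd modulus q < 2^62.
struct Mont {
    q: u64,
    q_inv_neg: u64, // -q^{-1} mod 2^64
}

impl Mont {
    fn new(q: u64) -> Self {
        assert!(q % 2 == 1 && q < (1 << 62));
        // Newton iteration for q^{-1} mod 2^64. The starting guess q is
        // correct to 3 bits, and each step doubles the number of correct bits.
        let mut inv = q;
        for _ in 0..5 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
        }
        Mont { q, q_inv_neg: inv.wrapping_neg() }
    }

    /// REDC without the final conditional subtraction.
    /// For t < q * 2^64 the result lies in [0, 2q).
    fn redc_lazy(&self, t: u128) -> u64 {
        let m = (t as u64).wrapping_mul(self.q_inv_neg);
        ((t + m as u128 * self.q as u128) >> 64) as u64
    }

    /// Lazy product: inputs in [0, 2q), output in [0, 2q).
    fn mul_lazy(&self, a: u64, b: u64) -> u64 {
        self.redc_lazy(a as u128 * b as u128)
    }

    /// Enter Montgomery form: x * 2^64 mod q.
    fn to_mont(&self, x: u64) -> u64 {
        (((x as u128) << 64) % self.q as u128) as u64
    }

    /// Leave Montgomery form with the single final reduction into [0, q).
    fn from_mont(&self, x: u64) -> u64 {
        let r = self.redc_lazy(x as u128);
        if r >= self.q { r - self.q } else { r }
    }
}
```

Chaining `mul_lazy` calls never needs an intermediate correction because 4q² < q·2^64 whenever q < 2^62, so the only comparison against q happens in `from_mont`.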

3
CKKS Bootstrapping + Hybrid Bridge

Full bootstrapping via Chebyshev polynomial approximation for unlimited computation depth. Hybrid bridge (ckks_bridge.rs, 613 lines) auto-switches between Turbo (N=4096) and Precision (N=16384) contexts. ckks.rs (2,786 lines).
Result: Unlimited multiplicative depth
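Chebyshev approximation replaces a non-polynomial function with a polynomial that an FHE scheme can evaluate using only additions and multiplications. The plaintext sketch below fits a sine on [-1, 1] and evaluates it with the Clenshaw recurrence; the function names and the choice of sin(πx) with 24 terms are assumptions for illustration and say nothing about the degree or target function used in ckks.rs.

```rust
use std::f64::consts::PI;

/// Chebyshev coefficients c_0..c_{n-1} of `f` on [-1, 1],
/// computed from the values of `f` at the n Chebyshev nodes.
fn cheb_coeffs(f: impl Fn(f64) -> f64, n: usize) -> Vec<f64> {
    let samples: Vec<f64> = (0..n)
        .map(|k| f((PI * (k as f64 + 0.5) / n as f64).cos()))
        .collect();
    (0..n)
        .map(|j| {
            let sum: f64 = samples
                .iter()
                .enumerate()
                .map(|(k, fk)| fk * (PI * j as f64 * (k as f64 + 0.5) / n as f64).cos())
                .sum();
            2.0 * sum / n as f64
        })
        .collect()
}

/// Evaluate c_0/2 + sum_{j>=1} c_j * T_j(x) with the Clenshaw recurrence.
fn cheb_eval(c: &[f64], x: f64) -> f64 {
    let (mut b1, mut b2) = (0.0, 0.0);
    for &cj in c.iter().skip(1).rev() {
        let b0 = 2.0 * x * b1 - b2 + cj;
        b2 = b1;
        b1 = b0;
    }
    x * b1 - b2 + 0.5 * c[0]
}
```

With `let c = cheb_coeffs(|x: f64| (PI * x).sin(), 24);`, `cheb_eval(&c, x)` tracks `(PI * x).sin()` closely across [-1, 1]. Under encryption the same recurrence would consume multiplicative levels, which is why polynomial degree matters for bootstrapping cost.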

4
SIMD Batching Architecture

4096 slots ÷ 128 biometric dimensions = 32 users per ciphertext. CRT condition: t=65537, N=4096 (ψ=6561). Template storage: 32MB/user → 256KB/user (128× reduction). Batch verify is constant time: ~1.04ms for 1–32 users.
Result: Amortized cost of ~50µs per auth
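The quoted parameters can be checked directly. Slot batching requires t ≡ 1 (mod 2N), and ψ must be a primitive 2N-th root of unity modulo t, which for a power-of-two N is equivalent to ψ^N ≡ −1 (mod t). The sketch below verifies both for t=65537, N=4096, ψ=6561; the function names are hypothetical.

```rust
const T: u64 = 65_537;
const N: u64 = 4096;
const PSI: u64 = 6561;

fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1u64;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = (acc as u128 * base as u128 % m as u128) as u64;
        }
        base = (base as u128 * base as u128 % m as u128) as u64;
        exp >>= 1;
    }
    acc
}

/// Batching condition: t = 1 (mod 2N), so x^N + 1 splits into N linear
/// factors modulo t and each factor becomes one slot.
fn batching_condition_holds(t: u64, n: u64) -> bool {
    t % (2 * n) == 1
}

/// For N a power of two, psi is a primitive 2N-th root of unity mod t
/// exactly when psi^N = -1 (mod t).
fn is_primitive_2n_root(psi: u64, n: u64, t: u64) -> bool {
    pow_mod(psi, n, t) == t - 1
}

/// Users per ciphertext for a given embedding dimension.
fn users_per_ciphertext(slots: u64, dims: u64) -> u64 {
    slots / dims
}
```

Both checks pass for the values above, and `users_per_ciphertext(4096, 128)` gives 32.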

5
Threshold Decryption

Shamir secret sharing over BFV secret keys. k-of-n threshold — each party computes a partial decryption (noisy share). Lagrange interpolation mod q. SHA3-256 attestation hash per partial decryption. threshold.rs (1,006 lines).
Result: No single party ever sees the plaintext
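The k-of-n mechanics can be shown on a single scalar. The sketch below is a simplification with loudly stated assumptions: it shares one number over the prime field 2^31 − 1, uses caller-supplied polynomial coefficients (a real system must draw them from a cryptographic RNG), and reconstructs the secret itself, whereas a threshold FHE scheme interpolates noisy partial decryptions so the key is never reassembled. All names are hypothetical.

```rust
/// Prime field modulus 2^31 - 1 (illustrative; not an H33 parameter).
const P: u64 = 2_147_483_647;

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1u64;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// Modular inverse via Fermat's little theorem.
fn inv_mod(a: u64) -> u64 {
    pow_mod(a, P - 2)
}

/// Evaluate f(x) = secret + c1*x + c2*x^2 + ... at x = 1..=n.
/// `coeffs` holds [c1, c2, ...], so the threshold is coeffs.len() + 1.
fn split(secret: u64, coeffs: &[u64], n: u64) -> Vec<(u64, u64)> {
    (1..=n)
        .map(|x| {
            // Horner's rule over the non-constant terms, highest degree first.
            let y = coeffs.iter().rev().fold(0u64, |acc, &c| (acc * x + c % P) % P);
            (x, (y * x + secret % P) % P)
        })
        .collect()
}

/// Lagrange interpolation at x = 0 over the given shares.
fn reconstruct(shares: &[(u64, u64)]) -> u64 {
    let mut secret = 0u64;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        let (mut num, mut den) = (1u64, 1u64);
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                num = num * xj % P;
                den = den * ((xj + P - xi) % P) % P;
            }
        }
        secret = (secret + yi * num % P * inv_mod(den)) % P;
    }
    secret
}
```

With `split(123_456, &[111, 222], 5)` any three of the five shares recover 123456, while two shares interpolate to a different value.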

6
Hardware Acceleration

ARM NEON: branchless Galois permutation, vectorized key-switch (galois_neon.rs, 384 lines). x86_64 AVX-512: 8 coefficients per instruction. CUDA: GPU NTT offload for batch workloads (1,610 lines). Metal: Apple Silicon compute shaders (542 lines). BFV32: native vmull_u32 4-lane SIMD for mobile.
Result: Platform-optimal performance everywhere

7
Production Optimizations

Arena allocators for zero allocation in hot path. Speculative execution engine for parallel parameter evaluation. Batch attestation: 1 Dilithium sign+verify per 32-user batch (31× savings). Batch CBD sampling: 1 RNG call per 10 coefficients (5× faster). Montgomery domain persistence across the full pipeline.
Result: Production-grade from day one
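One way a single RNG call could feed 10 coefficients is to slice a 64-bit word into 6-bit chunks and take a centered binomial sample from each. The sketch below assumes η = 3 (samples in −3..=3) and a 64-bit word, neither of which is stated on this page; `cbd10_from_word` is a hypothetical name.

```rust
/// Derive 10 centered-binomial samples (eta = 3, range -3..=3) from one
/// 64-bit random word. Each coefficient consumes 6 bits:
/// popcount(low 3 bits) - popcount(high 3 bits). The top 4 bits are unused.
fn cbd10_from_word(word: u64) -> [i8; 10] {
    let mut out = [0i8; 10];
    for (i, c) in out.iter_mut().enumerate() {
        let bits = (word >> (6 * i)) & 0x3f;
        let a = (bits & 0b111).count_ones() as i8;
        let b = (bits >> 3).count_ones() as i8;
        *c = a - b;
    }
    out
}
```

In use, `word` would come from a cryptographically secure generator; the function itself is deterministic, which makes it easy to test.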

Deploy Encrypted Authentication Today

One API call. Full post-quantum security. Your data never decrypted.

View Per-Auth Pricing →
🛡 FIPS 203/204 Compliant
🔐 128-bit NIST Level 1
AWS Graviton4
1,743 Tests Passing
📜 33 Patent Claims
External Crypto Review →