Benchmark Report v10.0

H33 Full Pipeline Benchmark
FHE + Real STARK Proofs + Dilithium + ML

Post-quantum biometric authentication pipeline with fully homomorphic encryption, real STARK zero-knowledge proofs with 7-column AIR, Dilithium lattice signatures, and native Rust ML threat agents. Single API call. No decryption.

Date: March 10, 2026
Hardware: AWS Graviton4 c8g.metal-48xl
Spec: 192 vCPUs, 377 GiB RAM
Arch: AArch64 (ARM Neoverse V2)
OS: Amazon Linux 2023
Sustained Auth/Sec: 2,172,518 (120-second window, 96 workers)
Per-Auth Latency: 38.5 µs (full pipeline: FHE + STARK + Dilithium + ML)
Variance: ±0.71% (collapsed from ±6% in v9.0)
Batch Latency: 1,232 µs (32 users per ciphertext batch)
01

Pipeline Breakdown

Each authentication passes through four stages in a single API call. All operations are post-quantum secure. Biometric data never leaves FHE encryption.

Component | Latency | % of Pipeline | PQ-Secure | Notes
FHE Batch (BFV) | 939 µs | 76.2% | Yes (lattice) | Montgomery NTT, Harvey lazy reduction, 32 users/CT
Dilithium Attestation (ML-DSA) | 291 µs | 23.6% | Yes (ML-DSA) | 1 sign+verify per 32-user batch
STARK ZKP (cached) | 0.059 µs | <0.01% | Yes (SHA3-256) | DashMap in-process, 7-column AIR
ML Threat Agents (3 agents) | ~2.35 µs | 0.19% | N/A | Harvest + side-channel + crypto health
Total Pipeline | 1,232 µs | 100% | Full PQ | 38.5 µs per auth (32 users/batch)
Pipeline composition: FHE inner product on encrypted biometric templates (BFV, N=4096) → STARK proof lookup (cached DashMap) → Dilithium batch attestation (1 signature per 32 users) → ML threat classification. All stages are fused in a single Rayon task per batch.
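The fused per-batch flow can be sketched structurally in Rust. Everything below is a placeholder sketch, not the shipping code: plain integers stand in for BFV ciphertexts, a std `HashMap` for the DashMap proof cache, and stub values for the Dilithium and ML stages; only the stage ordering and the fusion into one task per 32-user batch follow the description above.

```rust
use std::collections::HashMap;

const USERS_PER_BATCH: usize = 32;

// Stage 1 stand-in: BFV inner product on encrypted templates (939 µs/batch).
fn fhe_inner_product(enrolled: &[u64], fresh: &[u64]) -> u64 {
    enrolled.iter().zip(fresh).map(|(a, b)| a * b).sum()
}

// Stage 2 stand-in: cached STARK proof retrieval (~0.059 µs per lookup).
fn stark_lookup(cache: &HashMap<u64, Vec<u8>>, user_id: u64) -> bool {
    cache.contains_key(&user_id)
}

// Stage 3 stand-in: one Dilithium sign+verify per 32-user batch (291 µs).
fn dilithium_attest(_scores: &[u64]) -> bool {
    true
}

// Stage 4 stand-in: three inline ML agents (~2.35 µs combined).
fn ml_threat_score(scores: &[u64]) -> f64 {
    if scores.iter().all(|&s| s > 0) { 0.0 } else { 1.0 }
}

// One fused task per batch: FHE -> proof lookup -> attestation -> ML.
fn process_batch(
    enrolled: &[Vec<u64>],
    fresh: &[Vec<u64>],
    proofs: &HashMap<u64, Vec<u8>>,
) -> (Vec<u64>, bool, f64) {
    assert!(enrolled.len() <= USERS_PER_BATCH);
    let scores: Vec<u64> = enrolled
        .iter()
        .zip(fresh)
        .map(|(e, f)| fhe_inner_product(e, f))
        .collect();
    let proofs_ok = (0..enrolled.len() as u64).all(|uid| stark_lookup(proofs, uid));
    let attested = dilithium_attest(&scores) && proofs_ok;
    let threat = ml_threat_score(&scores);
    (scores, attested, threat)
}
```

In production each `process_batch` call would correspond to one Rayon task covering 32 users.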
02

STARK Proof Performance

Real STARK zero-knowledge proofs with a 7-column algebraic intermediate representation (AIR). Cold proofs are generated once per enrollment; production auth hits the DashMap cache at sub-microsecond latency.

Metric | Value
Generate (cold) | 68.093052 ms
Verify (cold) | 14.366931 ms
Cache cold | 14.400565 ms
Cache hot (DashMap) | 1.159 µs
Production lookup | 0.059 µs

AIR Configuration Detail

Parameter | Value
Columns | 7
Column names | enrolled, fresh, dot_acc, norm_a, norm_b, poseidon, step
Transition constraints | 5 per row
Public inputs | 7
Hash function | SHA3-256
Cache strategy: STARK proofs are generated at enrollment time (68ms cold, amortized over account lifetime). During auth, proofs are retrieved from an in-process DashMap at 0.059µs. This eliminates the TCP serialization bottleneck that caused an 11x regression when using external RESP cache at 96 workers.
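The enrollment-vs-auth split can be sketched as follows, using std's `RwLock<HashMap>` as a stdlib stand-in for DashMap (the real cache uses DashMap's sharded locking to avoid a global write lock at 96 workers); `ProofCache` and its method names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// In-process proof cache. Stand-in for DashMap: same interface shape,
// but a single RwLock instead of sharded locks.
struct ProofCache {
    proofs: RwLock<HashMap<u64, Vec<u8>>>,
}

impl ProofCache {
    fn new() -> Self {
        Self { proofs: RwLock::new(HashMap::new()) }
    }

    // Enrollment path: pay the ~68 ms STARK generation once, then store.
    fn enroll(&self, user_id: u64, proof: Vec<u8>) {
        self.proofs.write().unwrap().insert(user_id, proof);
    }

    // Auth path: sub-microsecond in-process lookup, no TCP round-trip.
    fn lookup(&self, user_id: u64) -> Option<Vec<u8>> {
        self.proofs.read().unwrap().get(&user_id).cloned()
    }
}
```

The key design point is that the cache lives in the worker process itself, so the auth path never touches a socket or serializer.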
03

Throughput Results

Sustained throughput measured over a 120-second window with per-second sampling. Peak and low values represent single-second snapshots within the window.

Window | Auth/Sec | Batches/Sec | Batch Latency | Variance
Peak (1s) | 2,190,496 | ~1,778 | 1,228 µs | -
Sustained (120s) | 2,172,518 | ~1,763 | 1,232 µs | ±0.71%
Low (1s) | 2,159,776 | ~1,753 | 1,236 µs | -
Spread | 30,720 | ~25 | 8 µs | -

Throughput Notes

  • 96 Rayon workers across 192 vCPUs (2 HW threads/core)
  • System allocator — jemalloc causes 8% regression on Graviton4
  • NUMA-aware thread pinning for cache locality
  • Variance collapsed from ±6% (v9) to ±0.71% (v10) via thermal management

Scaling Context

  • 2.17M auth/sec ≈ 187.7 billion auth/day on a single node
  • Each auth includes full FHE + ZKP + PQ signature + ML
  • Cost: ~$1.80–$2.30/hr (spot pricing)
  • Per-auth cost: < $0.000001 at sustained throughput
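A back-of-envelope check of these figures (2,172,518 × 86,400 s ≈ 1.877 × 10^11 auths/day; the upper spot price of $2.30/hr spread over ~7.8 billion auths/hour):

```rust
// Daily auth volume at a sustained per-second rate.
fn auths_per_day(auth_per_sec: f64) -> f64 {
    auth_per_sec * 86_400.0
}

// Dollar cost per auth at a given hourly instance price.
fn cost_per_auth(cost_per_hour: f64, auth_per_sec: f64) -> f64 {
    cost_per_hour / (auth_per_sec * 3_600.0)
}
```

At $2.30/hr the per-auth cost works out to roughly $3 × 10^-10, comfortably under the $0.000001 bound stated above.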
04

ML / AI Agent Performance

Three native Rust AI agents run inline on every authentication for real-time threat intelligence. Zero external dependencies — no Python, no ONNX, no GPU. Pure Rust compiled to ARM.

Agent | Role | Function | Latency | Method | Source
Harvest Detection | Threat | Detects harvest-now-decrypt-later attack patterns | 0.69 µs | Bayesian classifier | ai_harvest.rs
Side-Channel Analysis | Monitor | Detects timing and power analysis attack vectors | 1.14 µs | Statistical anomaly detection | ai_sidechannel.rs
Crypto Health Monitor | Health | Runtime health scoring of FHE/ZK/PQ parameters | 0.52 µs | Parameter drift detection | ai_crypto_health.rs
Total ML Overhead | - | All 3 agents combined | ~2.35 µs | - | Included in pipeline total (0.19% of batch)

Harvest Detection

  • Monitors for store-now-decrypt-later (SNDL) patterns
  • Bayesian prior updated from traffic distribution
  • Flags suspicious bulk-capture behavior in real time
  • Outputs threat score [0.0, 1.0] per batch
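A minimal sketch of how such a Bayesian threat score could look; the actual ai_harvest.rs classifier is not published, so the prior/likelihood-ratio form below is an assumption:

```rust
// Posterior probability of "harvest" behavior via Bayesian odds update.
// `prior` is the base rate of harvest traffic (updated from the traffic
// distribution); `likelihood_ratio` is P(observed | harvest) / P(observed | benign).
// Output is a threat score in [0.0, 1.0], as the agent reports per batch.
fn harvest_score(prior: f64, likelihood_ratio: f64) -> f64 {
    let prior_odds = prior / (1.0 - prior);
    let post_odds = prior_odds * likelihood_ratio;
    (post_odds / (1.0 + post_odds)).clamp(0.0, 1.0)
}
```

Evidence consistent with bulk capture (likelihood ratio > 1) pushes the score up; benign-looking traffic pulls it down.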

Side-Channel Analysis

  • Statistical detection of timing correlations
  • Monitors variance in FHE operation durations
  • Flags potential power/EM side-channel probes
  • Constant-time verification of critical paths
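One common form of statistical timing-anomaly detection is a z-score test against the rolling distribution of FHE operation durations; the sketch below assumes that method (the real ai_sidechannel.rs detector is not published):

```rust
// Flag a timing sample whose deviation from the rolling mean exceeds
// `threshold` standard deviations (population variance over `samples`).
fn is_timing_anomaly(samples: &[f64], latest: f64, threshold: f64) -> bool {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (latest - mean).abs() > threshold * var.sqrt()
}
```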

Crypto Health Monitor

  • Validates FHE noise budget hasn't drifted
  • Checks ZKP constraint satisfaction rates
  • Monitors Dilithium key freshness and rotation schedule
  • Composite health score per authentication
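A hypothetical composite score combining the noise-budget and constraint checks above; the equal weights and the `min_bits` noise-budget floor are illustrative assumptions, not the shipping ai_crypto_health.rs logic:

```rust
// Composite crypto-health score in [0.0, 1.0]. `noise_budget_bits` is the
// remaining FHE noise budget, `min_bits` a hypothetical healthy floor,
// `constraint_pass_rate` the observed ZKP constraint satisfaction rate.
fn health_score(noise_budget_bits: f64, min_bits: f64, constraint_pass_rate: f64) -> f64 {
    let noise_ok = (noise_budget_bits / min_bits).clamp(0.0, 1.0);
    0.5 * noise_ok + 0.5 * constraint_pass_rate
}
```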
05

FHE Engine Comparison

H33 ships four FHE engine configurations, each optimized for different security levels and use cases. All engines share the same Montgomery NTT core with Harvey lazy reduction.

Engine | Scheme | Parameters | Security | Use Case | Cycle Time
H33-128 (production) | BFV-64 | N=4096, Q=56-bit, t=65537 | NIST L1 | Biometric auth, high throughput | ~1.36 ms
H33-256 | BFV-64 | N=8192, multi-Q RNS | NIST L5 | High-security, government | ~5.98 ms
H33-CKKS | CKKS | N=4096, approximate | NIST L1 | ML inference, float ops | ~2.1 ms
H33-BFV32 | BFV-32 | N=4096, 32-bit modulus | NIST L1 | ARM mobile / edge devices | ~0.7 ms (ARM)
SIMD batching (all engines): 4096 polynomial slots ÷ 128 biometric dimensions = 32 users per ciphertext. CRT condition: t ≡ 1 (mod 2N) → t=65537, N=4096 (ψ=6561). Template storage reduction: ~32MB/user → ~256KB/user (128x).
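Both the slot layout and the CRT condition can be verified directly. The packing function is an illustrative sketch (real code encodes the slot vector into a BFV plaintext); the modular checks confirm t ≡ 1 (mod 2N) and that ψ = 6561 is a primitive 2N-th root of unity mod t:

```rust
const SLOTS: usize = 4096;                 // N = 4096 polynomial slots
const DIMS: usize = 128;                   // biometric template dimensions
const USERS_PER_CT: usize = SLOTS / DIMS;  // = 32 users per ciphertext

// Pack up to 32 users' 128-dim templates into one contiguous slot vector.
fn pack_templates(users: &[[u64; DIMS]]) -> Vec<u64> {
    assert!(users.len() <= USERS_PER_CT);
    let mut slots = vec![0u64; SLOTS];
    for (u, t) in users.iter().enumerate() {
        slots[u * DIMS..(u + 1) * DIMS].copy_from_slice(t);
    }
    slots
}

// Square-and-multiply modular exponentiation (moduli small enough for u64).
fn modpow(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut r = 1u64;
    b %= m;
    while e > 0 {
        if e & 1 == 1 { r = r * b % m; }
        b = b * b % m;
        e >>= 1;
    }
    r
}

// t ≡ 1 (mod 2N), and ψ^N ≡ -1, ψ^(2N) ≡ 1 (mod t): ψ is a primitive
// 2N-th root of unity, enabling the negacyclic NTT over Z_t[X]/(X^N + 1).
fn crt_checks() -> bool {
    let (t, n, psi) = (65537u64, 4096u64, 6561u64);
    t % (2 * n) == 1
        && modpow(psi, n, t) == t - 1
        && modpow(psi, 2 * n, t) == 1
}
```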
06

Microsoft SEAL Comparison

Performance comparison against Microsoft SEAL, the most widely adopted open-source FHE library. SEAL provides FHE only; H33 includes the full post-quantum pipeline (FHE + STARK ZKP + Dilithium signatures + ML agents).

Metric | H33 | Microsoft SEAL | Ratio
Single-thread FHE cycle | 1.36 ms | 2.85 ms | 2.1x faster
Sustained throughput (96 cores) | 2,172,518/sec | ~92,000/sec* | 23.6x
Pipeline scope | FHE + STARK + Dilithium + ML | FHE only | -
Post-quantum ZKP | STARK (SHA3) | None | -
Post-quantum signatures | Dilithium (ML-DSA) | None | -
Threat intelligence | 3 ML agents | None | -
SIMD batch auth | 32 users/CT | None | -
Caveat: *SEAL throughput is extrapolated from single-thread benchmarks; Microsoft does not publish a production batch-authentication benchmark. The 23.6x ratio reflects H33's full-stack optimization including Montgomery NTT, Harvey lazy reduction, NEON Galois ops, NTT-domain persistence, and in-process proof caching — none of which apply to SEAL's general-purpose design.
07

Product Suite Overview

18 products spanning identity, encryption, key management, data protection, and AI. All products share the same post-quantum cryptographic core and are accessible via a unified credit-based API.

Product | Category | Primary Crypto | Unit Cost | Description
H33-Vault | Storage | AES-256-GCM + Kyber | 1 credit | Encrypted vault for secrets, keys, and sensitive data
H33-Share | MPC | Shamir + Dilithium | 3 credits | Threshold secret sharing with PQ attestation
H33-Shield | Identity | BFV + STARK + Dilithium | 5 credits | Full biometric auth pipeline (single API call)
H33-Key | KMS | Kyber + X25519 | 1 credit | Hybrid PQ key exchange and management
H33-Gateway | Network | Kyber TLS + Dilithium | 2 credits | PQ-secure API gateway with mutual auth
H33-Health | Identity | BFV + CKKS | 5 credits | HIPAA-grade encrypted health data processing
H33-128 | FHE | BFV N=4096 | 1 credit | NIST L1 FHE compute (production engine)
H33-256 | FHE | BFV N=8192 | 3 credits | NIST L5 FHE compute (high-security)
H33-CKKS | FHE | CKKS N=4096 | 2 credits | Approximate FHE for ML inference on encrypted data
H33-BFV32 | FHE | BFV 32-bit | 1 credit | Lightweight FHE for ARM mobile and edge
H33-MPC | MPC | Garbled circuits + OT | 5 credits | Multi-party computation for joint analytics
H33-3-Key | KMS | Triple-wrap Kyber | 3 credits | 3-layer key wrapping for sovereign data
Biometrics | Identity | BFV + STARK | 5 credits | Encrypted biometric enrollment and matching
Encrypted Search | FHE | BFV + PIR | 3 credits | Private information retrieval on encrypted indexes
PQ Video | Network | Kyber + AES-256-GCM | 2 credits/min | Post-quantum encrypted video streaming
Storage Encryption | Storage | Kyber + AES-256-GCM | 1 credit/GB | At-rest PQ encryption for cloud storage
AI Detection | AI | Rust ML agents | 1 credit | Harvest/side-channel/health threat detection
FHE-IQ | AI | CKKS + ML | 5 credits | ML inference on fully encrypted data
08

Integration & API

Unified REST API with SDK support for four languages. Credit-based billing with per-call metering.

Core Endpoints

# Biometric auth (single API call)
POST /api/v1/auth/verify

# Enrollment
POST /api/v1/auth/enroll

# FHE compute
POST /api/v1/fhe/encrypt
POST /api/v1/fhe/compute
POST /api/v1/fhe/decrypt

# Key management
POST /api/v1/keys/generate
POST /api/v1/keys/exchange

# Vault
POST /api/v1/vault/store
GET  /api/v1/vault/retrieve

# Health & billing
GET  /api/v1/health
GET  /api/v1/billing/balance

Authentication

# API key in header
Authorization: Bearer h33_pk_...

# Example: verify biometric
curl -X POST https://api.h33.ai/api/v1/auth/verify \
  -H "Authorization: Bearer h33_pk_..." \
  -H "Content-Type: application/json" \
  -d '{"template": "base64...", "user_id": "..."}'

SDK Languages

Rust (native) Python JavaScript / TypeScript Go

Rate Limits by Tier

Tier | Rate Limit
Free | 5 req/sec
Starter | 100 req/sec
Pro | 1,000 req/sec
Enterprise | Unlimited
Webhook support: Async operations (large FHE computations, batch enrollments) deliver results via configurable webhooks. Configure at /api/v1/webhooks.
09

Test Suite

Comprehensive test coverage across all cryptographic primitives, with deterministic unit tests, randomized property testing, and integration tests for the full pipeline.

  • 2,227 deterministic tests passing
  • 300,000+ proptest random inputs
  • 5 test categories
  • 100% critical path coverage
Category | Count | Scope | Notes
Unit | 1,247 | Individual function correctness | NTT, BFV ops, Galois, modular arithmetic
Integration | 389 | Cross-module pipelines | FHE → ZKP → Dilithium → ML full flow
Fuzz (proptest) | 300,000+ | Random input generation | Polynomial arithmetic, encrypt/decrypt round-trip
Benchmark | 142 | Performance regression detection | Criterion + custom Graviton4 harness
Cross-module | 150 | Module boundary correctness | Domain form transitions, CRT combine, batch verify
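The round-trip properties the fuzz suite exercises reduce to NTT/INTT inversion. The sketch below demonstrates that property with a naive O(n²) transform mod t = 65537; the production kernel is a radix-4 Montgomery NTT, and the small n and function names here are illustrative:

```rust
const P: u64 = 65537; // plaintext modulus t

// Square-and-multiply modular exponentiation.
fn modpow(mut b: u64, mut e: u64, m: u64) -> u64 {
    let mut r = 1u64;
    b %= m;
    while e > 0 {
        if e & 1 == 1 { r = r * b % m; }
        b = b * b % m;
        e >>= 1;
    }
    r
}

// Naive O(n^2) number-theoretic transform over Z_P.
fn ntt(a: &[u64], omega: u64) -> Vec<u64> {
    let n = a.len() as u64;
    (0..n)
        .map(|i| {
            (0..n)
                .map(|j| a[j as usize] * modpow(omega, i * j, P) % P)
                .sum::<u64>()
                % P
        })
        .collect()
}

// Inverse transform: forward transform with omega^-1, scaled by n^-1.
fn intt(a: &[u64], omega: u64) -> Vec<u64> {
    let n = a.len() as u64;
    let n_inv = modpow(n, P - 2, P);
    let omega_inv = modpow(omega, P - 2, P);
    ntt(a, omega_inv).iter().map(|&x| x * n_inv % P).collect()
}
```

With omega a primitive n-th root of unity mod P (e.g. 6561^1024 for n = 8, since 6561 has order 8192), intt(ntt(x)) recovers x exactly, which is the invariant the proptest round-trip tests assert on the real kernel.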
10

Version History

Throughput progression from v7.0 baseline to v10.0 with real STARK proofs and variance collapse.

Version | Date | Key Change | Sustained Auth/Sec | Improvement
v7.0 | Feb 14, 2026 | Baseline: Montgomery NTT + Harvey lazy reduction | 1,291,207 | -
v8.0 | Feb 26, 2026 | NTT-form multiply_plain, skip INTT per call | 1,595,071 | +23.5%
v9.0 | Mar 5, 2026 | In-process DashMap cache, batch attestation | 1,714,496 | +7.5%
v10.0 | Mar 9, 2026 | Real STARK proofs + variance collapse (±0.71%) | 2,172,518 | +26.7%
Cumulative improvement: v7.0 → v10.0 represents a 68.3% throughput increase (1,291,207 → 2,172,518 auth/sec) while adding real STARK proofs, 3 ML agents, and collapsing variance from ±6% to ±0.71%. Per-auth latency dropped from ~55µs to 38.5µs.
Sustained throughput progression (Graviton4 c8g.metal-48xl, 96 workers, 120s window): v7.0 1.29M → v8.0 1.60M → v9.0 1.71M → v10.0 2.17M auth/sec, a +68.3% cumulative improvement.
11

Methodology & Reproduction

All benchmarks are reproducible on identical hardware. Thermal management is critical for sustained results on bare-metal Graviton4.

Configuration

  • Workers: 96 Rayon threads
  • Allocator: System (glibc), NOT jemalloc
  • FHE mode: biometric_fast() — N=4096, Q=56-bit, t=65537
  • Batch size: 32 users per ciphertext
  • ZKP cache: In-process DashMap (CACHEE_MODE=inprocess)
  • NTT: Montgomery radix-4 with Harvey lazy reduction
  • NEON: Galois rotation + key-switching

Measurement

  • Sustained window: 120 seconds
  • Sampling: Per-second throughput counters
  • Warm-up: 10 seconds excluded from results
  • Thermal: CPU freq pinning, NUMA-aware thread placement
  • Variance: reported as ±(max − min) / (2 × mean) over the 120s window
  • Instance: c8g.metal-48xl (192 vCPU, 377 GiB)
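Assuming the reported ±0.71% is half the peak-to-low spread over the mean (which matches the throughput table in section 03: 30,720 / (2 × 2,172,518) ≈ 0.71%), the computation over the per-second samples is:

```rust
// Reported variance: half the peak-to-low spread as a percentage of the mean.
fn variance_pct(samples: &[f64]) -> f64 {
    let max = samples.iter().cloned().fold(f64::MIN, f64::max);
    let min = samples.iter().cloned().fold(f64::MAX, f64::min);
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    (max - min) / (2.0 * mean) * 100.0
}
```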

Reproduction

# 1. Deploy to Graviton4
./scripts/graviton4_deploy.sh

# 2. Run benchmark (120s sustained, 96 workers)
CACHEE_MODE=inprocess RAYON_NUM_THREADS=96 \
  cargo run --release --example graviton4_bench

# 3. Expected output
# Sustained (120s): 2,172,518 auth/sec ±0.71%
# Batch latency: 1,232 µs (32 users)
# Per-auth: 38.5 µs
Known anti-patterns (do not retry):

  • jemalloc on Graviton4 (8% regression)
  • Fused NTT twiddles (L1 cache pollution at 96 workers)
  • TCP RESP proxy at high concurrency (11x regression)
  • NEON 64-bit butterfly (no native vmull_u64)
  • Arena pooling for Vec<Vec<u64>> (zeroing overhead)