H33 Graviton4 Benchmark Suite
=============================
Sustained: 60s, Workers: 96, Latency: 15s/rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96
Cachee RESP: redis://localhost:6380/
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
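
A minimal Python sketch of how each fused table can save one REDC per coefficient. It assumes Montgomery arithmetic with R = 2^64 (the log does not state R) and reads the two formulas literally. Storing psi^i * R^2 means REDC(a * psi^i * R^2) = a * psi^i * R, so a single REDC both twists and enters Montgomery form. Storing n^-1 * psi^-i without R means REDC(x*R * n^-1 * psi^-i) = x * n^-1 * psi^-i, so a single REDC both scales and leaves Montgomery form. The toy prime Q = 65537 and N = 8 are stand-ins for the real 56-bit modulus and N = 4096, and the NTT butterflies are omitted.

```python
# Montgomery arithmetic with R = 2^64 (assumed), illustrating the fused twists.
# Toy parameters: Q = 65537 and N = 8 stand in for the real 56-bit q and N = 4096.
R_BITS = 64
R = 1 << R_BITS
MASK = R - 1


def make_redc(q):
    """Return a REDC function computing t * R^-1 mod q for 0 <= t < q * R (q odd)."""
    neg_q_inv = (-pow(q, -1, R)) % R  # -q^-1 mod 2^64

    def redc(t):
        m = ((t & MASK) * neg_q_inv) & MASK  # makes t + m*q divisible by R
        u = (t + m * q) >> R_BITS            # u < 2q
        return u - q if u >= q else u

    return redc


def fused_tables(q, n, psi):
    """Forward: psi^i * R^2 mod q.  Inverse: n^-1 * psi^-i mod q (no R factor)."""
    r2 = (R * R) % q
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    fwd = [pow(psi, i, q) * r2 % q for i in range(n)]
    inv = [n_inv * pow(psi_inv, i, q) % q for i in range(n)]
    return fwd, inv


Q, N = 65537, 8
PSI = pow(3, (Q - 1) // (2 * N), Q)  # 3 generates Z_65537^*, so PSI has order 2N
redc = make_redc(Q)
fwd, inv = fused_tables(Q, N, PSI)

a = [1000 + i for i in range(N)]  # plain (non-Montgomery) coefficients
# Pre-twist: one REDC yields a[i] * psi^i in Montgomery form (value * R mod q).
twisted = [redc(x * w) for x, w in zip(a, fwd)]
# (forward NTT, pointwise products, inverse NTT would run here)
# Post-twist: one REDC scales by n^-1 * psi^-i and leaves Montgomery form.
untwisted = [redc(x * w) for x, w in zip(twisted, inv)]
print(all(u * N % Q == x for u, x in zip(untwisted, a)))
```

Because the NTT between the two steps is skipped here, the round trip returns a[i] * n^-1, which is why the check multiplies by N. The design point is that each twist costs one REDC instead of one for the twiddle multiply plus a separate domain conversion.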

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:         1107 µs     1112 µs     1116 µs
  ZKP raw:           3.753 µs    3.787 µs    3.804 µs
  ZKP via Cachee:    1.018 µs    1.047 µs    1.124 µs  (L1 cache hit)
  Dilithium sign:      148 µs      412 µs      601 µs
  Dilithium verify:     74 µs       74 µs       75 µs
  ─────────────────────────
  Total (single):     1330 µs  (FHE 83% + ZKP 0.08% + sign 11% + verify 6%)
  Cachee speedup:  3.7× vs raw ZKP (3.753 µs → 1.018 µs)
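
The Total and share figures can be re-derived from the median column. Adding per-component medians only approximates the median of the end-to-end path, since the median of a sum is not in general the sum of the medians. A short Python check:

```python
# Median component latencies from the table above, in microseconds.
medians_us = {
    "FHE verify": 1107.0,
    "ZKP via Cachee": 1.018,
    "Dilithium sign": 148.0,
    "Dilithium verify": 74.0,
}

total_us = sum(medians_us.values())
print(f"Total (single): {total_us:.0f} µs")
for name, us in medians_us.items():
    print(f"  {name:<17} {100 * us / total_us:6.2f}%")
print(f"Cachee speedup: {3.753 / 1.018:.2f}x")
```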

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        243 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:              8 µs  (amortized)
  vs 32× individual: 29× faster  (32 × (148 + 74) µs = 7104 µs)
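
Batch attestation replaces 32 sign/verify pairs with one pair over a single SHA3 digest of all 32 results. The log does not say how results are serialized before hashing, so the length-prefixed framing below is an assumption, and the Dilithium calls are left out because the log does not name the library in use. The digest is what the single signature would cover:

```python
import hashlib


def batch_digest(results):
    """SHA3-256 over all results, each prefixed with its 4-byte big-endian length.

    Length prefixes keep result boundaries unambiguous, so (b"ab", b"c")
    cannot collide with (b"a", b"bc"). This framing is an assumption.
    """
    h = hashlib.sha3_256()
    for r in results:
        h.update(len(r).to_bytes(4, "big"))
        h.update(r)
    return h.digest()


# Hypothetical per-user verification results for one 32-user batch.
results = [f"user{i}:match".encode() for i in range(32)]
digest = batch_digest(results)  # one Dilithium sign + one verify would cover this

# Amortization using the medians above.
SIGN_US, VERIFY_US, BATCH_US = 148, 74, 243
per_auth_us = BATCH_US / len(results)
speedup = len(results) * (SIGN_US + VERIFY_US) / BATCH_US
print(digest.hex()[:16], f"{per_auth_us:.1f} µs/auth", f"{speedup:.1f}x")
```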

  --- Full Pipeline (32-user batch, ZKP via Cachee) ---
  FHE batch:          1107 µs
  ZKP (32 Cachee):    32.6 µs  (1.018 µs/lookup)
  Batch attest:        243 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ─────────────────────────
  Total batch:        1383 µs  (43 µs/auth)
  FHE share:       80%
  ZKP share:       2.4%
  Dilithium share: 18%  (batch attest, incl. SHA3)
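
The batch totals follow from the same arithmetic. Because the 32 probes share one SIMD-packed FHE evaluation (Benchmark 1 shows 1119 µs for 32 probes vs 1130 µs for one), the FHE term is counted once per batch while the ZKP lookups scale with batch size:

```python
BATCH = 32
fhe_us = 1107.0         # one packed FHE evaluation for the whole batch
zkp_us = BATCH * 1.018  # 32 Cachee lookups
attest_us = 243.0       # SHA3 + 1 Dilithium sign + 1 verify
total_us = fhe_us + zkp_us + attest_us

print(f"Total batch: {total_us:.0f} µs ({total_us / BATCH:.0f} µs/auth)")
for name, us in [("FHE", fhe_us), ("ZKP", zkp_us), ("Attest", attest_us)]:
    print(f"  {name} share: {100 * us / total_us:.1f}%")
```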

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)
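
SIMD packing relies on BFV-style batching: with t = 65537 ≡ 1 (mod 2N) for N = 4096, each plaintext has 4096 slots, so 32 probes fit at up to 4096 / 32 = 128 slots each. The plaintext model below is a sketch, not the actual batch_verify_multi: the 128-slot template length is an assumption (the log does not give it), slots are modelled as one flat cyclic vector (real BFV slots rotate as two rows of N/2), and encryption is omitted. It shows the usual pattern of one slot-wise multiply followed by log2(128) = 7 rotate-and-add steps:

```python
import random

N_SLOTS, T = 4096, 65537
assert (T - 1) % (2 * N_SLOTS) == 0  # batching condition: t ≡ 1 (mod 2N)
PROBES = 32
D = N_SLOTS // PROBES  # assumed template length: 128 slots per probe (power of two)


def rotate(v, s):
    """Cyclic left rotation, modelling a slot rotation of a packed ciphertext."""
    return v[s:] + v[:s]


def packed_inner_products(probes, template):
    """Inner product of every probe with the template using one packed vector."""
    packed = [x for p in probes for x in p]        # probes side by side
    tiled = template * len(probes)                 # template replicated per probe
    prod = [a * b for a, b in zip(packed, tiled)]  # one slot-wise multiply
    step = 1
    while step < D:  # log2(D) rotate-and-add steps
        prod = [a + b for a, b in zip(prod, rotate(prod, step))]
        step *= 2
    # Slot p*D now holds the sum of block p; reduce mod t as BFV slots would.
    return [prod[p * D] % T for p in range(len(probes))]


random.seed(0)
template = [random.randint(0, 3) for _ in range(D)]
probes = [[random.randint(0, 3) for _ in range(D)] for _ in range(PROBES)]
scores = packed_inner_products(probes, template)
print(scores[:4])
```

Since slot arithmetic is mod t, each score must stay below t = 65537 to be read back exactly; with 128 slots and entries in 0..3 the maximum is 128 × 9 = 1152.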

verify_encrypted (chunked):
  Median: 1529 µs
  P95:    1535 µs
  P99:    1538 µs
  Min:    1521 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1130 µs
  P95:    1135 µs
  P99:    1143 µs
  Min:    1122 µs

  Speedup: 1.35× (median, 1529 µs → 1130 µs)

batch_verify_multi (SIMD, 32 probes):
  Median: 1119 µs total  (35.0 µs/auth)
  P95:    1127 µs
  P99:    1133 µs
  Single-thread throughput: 28597 auth/sec  (32 auths / 1119 µs)
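
The derived figures in this benchmark follow directly from the medians:

```python
chunked_us, simd_1_us, simd_32_us, probes = 1529, 1130, 1119, 32

print(f"speedup:    {chunked_us / simd_1_us:.2f}x")                 # chunked vs SIMD, 1 probe
print(f"per auth:   {simd_32_us / probes:.1f} µs")                  # 32-probe batch
print(f"throughput: {probes / (simd_32_us * 1e-6):.0f} auth/s")     # single thread
```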

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + Cachee ZKP + Dilithium) ===
=== 96 workers, 60 seconds ===
======================================================================
Pipeline: FHE → Cachee ZKP (32 lookups) → SHA3 → Dilithium sign+verify
Allocator: system
Cachee RESP: redis://localhost:6380/ → PONG
Setting up 96 worker contexts...
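
The `→ PONG` line is presumably the reply to a PING sent over RESP, the Redis serialization protocol that Cachee speaks at redis://localhost:6380/. A minimal Python encoder/decoder for the request and reply shapes involved; the `zkp:` key naming is hypothetical, since the log does not show Cachee's key scheme:

```python
import socket


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for a in args:
        b = a.encode() if isinstance(a, str) else a
        out.append(b"$%d\r\n%s\r\n" % (len(b), b))
    return b"".join(out)


def decode_reply(data):
    """Decode a RESP simple string, error, or bulk string reply (arrays not handled)."""
    kind, rest = data[:1], data[1:]
    line, _, tail = rest.partition(b"\r\n")
    if kind == b"+":
        return line.decode()
    if kind == b"-":
        raise RuntimeError(line.decode())
    if kind == b"$":
        n = int(line)
        return None if n < 0 else tail[:n]  # $-1 is a null (cache miss)
    raise ValueError(f"unsupported reply type {kind!r}")


def ping(host="localhost", port=6380):
    """Send PING; assumes the short reply arrives in a single recv()."""
    with socket.create_connection((host, port), timeout=1.0) as s:
        s.sendall(encode_command("PING"))
        return decode_reply(s.recv(64))


# A ZKP lookup would be a GET on some key; "zkp:example" is a hypothetical name.
request = encode_command("GET", "zkp:example")
```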
