H33 Graviton4 Benchmark Suite
=============================
Sustained: 30 s, Workers: 48, Latency: 10 s per rate level
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96
ZKP cache: In-process DashMap (zero network)
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
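
Note: the two fused tables above can be checked on a toy modulus. What follows is a minimal sketch, not the benchmarked implementation. The parameters (Q = 257, N = 8, R = 2^16) and the helpers redc, find_psi, pre_twist and post_twist are assumptions chosen for illustration; only the table names twiddles_fused and fused_inv_post come from the log. It shows that one REDC against twiddles_fused[i] both applies the psi^i pre-twist and moves the coefficient into Montgomery form, and that one REDC against fused_inv_post[i] applies n^-1 * psi^-i and moves it back out.

```python
# Toy parameters, assumed for illustration only (the run above uses N=4096, 56-bit q).
Q = 257                 # prime with Q ≡ 1 (mod 2*N)
N = 8                   # transform size
R = 1 << 16             # Montgomery radix: power of two, R > Q
Q_NEG_INV = (-pow(Q, -1, R)) % R   # -Q^-1 mod R, needed by REDC


def redc(t):
    """Montgomery reduction: return t * R^-1 mod Q for 0 <= t < Q*R."""
    m = (t * Q_NEG_INV) % R
    u = (t + m * Q) // R      # exact division: t + m*Q is a multiple of R
    return u - Q if u >= Q else u


def find_psi():
    """Return a primitive 2N-th root of unity mod Q (psi^N == -1)."""
    for g in range(2, Q):
        psi = pow(g, (Q - 1) // (2 * N), Q)
        if pow(psi, N, Q) == Q - 1:
            return psi
    raise ValueError("no primitive 2N-th root of unity found")


PSI = find_psi()
PSI_INV = pow(PSI, -1, Q)
N_INV = pow(N, -1, Q)

# Forward table: psi^i * R^2 mod Q. REDC(a * table[i]) = a * psi^i * R mod Q,
# i.e. the twisted coefficient, already in Montgomery form.
twiddles_fused = [pow(PSI, i, Q) * R * R % Q for i in range(N)]

# Inverse table: n^-1 * psi^-i mod Q, stored in normal (non-Montgomery) form.
# REDC(x_mont * table[i]) = x * n^-1 * psi^-i mod Q, already out of Montgomery form.
fused_inv_post = [N_INV * pow(PSI_INV, i, Q) % Q for i in range(N)]


def pre_twist(a):
    """One REDC per coefficient: twist by psi^i and enter Montgomery form."""
    return [redc(x * w) for x, w in zip(a, twiddles_fused)]


def post_twist(a_mont):
    """One REDC per coefficient: scale by n^-1, untwist, leave Montgomery form."""
    return [redc(x * w) for x, w in zip(a_mont, fused_inv_post)]


coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
twisted = pre_twist(coeffs)
restored = post_twist(twisted)   # no NTT in between, so this yields coeffs * n^-1
print(twisted, restored)
```

The sketch skips the butterfly passes between the two twists, so restored equals the input scaled by n^-1 rather than the input itself. In a full negacyclic NTT the forward and inverse transforms would sit between pre_twist and post_twist.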

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:          945 µs      953 µs      957 µs
  ZKP raw:           3.620 µs    3.656 µs    3.671 µs
  ZKP cached:        0.054 µs    0.054 µs    0.056 µs  (in-process DashMap)
  Dilithium sign:       71 µs       74 µs       75 µs
  Dilithium verify:    108 µs      108 µs      113 µs
  ─────────────────────────
  Total (single):     1124 µs  (FHE 84% + ZKP 0.00% + sign 6% + verify 10%)
  Cachee speedup:  67× vs raw ZKP (3.620 µs → 0.054 µs)

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        394 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:             12 µs  (amortized)
  vs 32× individual: 15× faster

  --- Full Pipeline (32-user batch, ZKP via Cachee) ---
  FHE batch:           945 µs
  ZKP (32 Cachee):     1.7 µs  (0.054 µs/lookup)
  Batch attest:        394 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ─────────────────────────
  Total batch:        1341 µs  (42 µs/auth)
  FHE share:       70%
  ZKP share:       0.1%
  Dilithium share: 29%
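
Note: the batch totals and shares above follow directly from the component medians. A minimal sketch of that arithmetic, assuming the pipeline stages run back to back with no overlap; the function name pipeline_breakdown and its dictionary keys are hypothetical, and the inputs are copied from the lines above.

```python
# Medians copied from the Full Pipeline block above (microseconds).
FHE_BATCH_US = 945.0
ZKP_LOOKUP_US = 0.054
BATCH_ATTEST_US = 394.0
BATCH_SIZE = 32


def pipeline_breakdown(fhe_us, zkp_lookup_us, attest_us, batch_size):
    """Sum the stage medians and derive per-auth cost and stage shares."""
    zkp_us = zkp_lookup_us * batch_size          # one cached lookup per user
    total_us = fhe_us + zkp_us + attest_us       # stages assumed sequential
    return {
        "zkp_us": zkp_us,
        "total_us": total_us,
        "per_auth_us": total_us / batch_size,    # amortized over the batch
        "fhe_share": fhe_us / total_us,
        "zkp_share": zkp_us / total_us,
        "attest_share": attest_us / total_us,
    }


breakdown = pipeline_breakdown(FHE_BATCH_US, ZKP_LOOKUP_US, BATCH_ATTEST_US, BATCH_SIZE)
for name, value in breakdown.items():
    print(f"{name}: {value:.4f}")
```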

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1439 µs
  P95:    1447 µs
  P99:    1453 µs
  Min:    1431 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 945 µs
  P95:    952 µs
  P99:    956 µs
  Min:    937 µs

  Speedup: 1.5× (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 934 µs total  (29.2 µs/auth)
  P95:    943 µs
  P99:    949 µs
  Single-thread throughput: 34261 auth/sec

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + ZKP + Dilithium) ===
=== 48 workers, 30 seconds, cache: In-process DashMap (zero network) ===
======================================================================
Pipeline: FHE → ZKP cache (32 lookups) → SHA3 → Dilithium sign+verify
Allocator: system
ZKP cache: in-process DashMap (zero network overhead)
Setting up 48 worker contexts...
Setup: 91203.4 ms
Warming up (populating ZKP cache)...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         31515       1008480       1008480           1.0
   2         31870       1019840       2028320           1.0
   3         31846       1019072       3047392           1.0
   4         31936       1021952       4069344           1.0
   5         31814       1018048       5087392           1.0
   6         31892       1020544       6107936           1.0
   7         31759       1016288       7124224           1.0
   8         31836       1018752       8142976           1.0
   9         31895       1020640       9163616           1.0
  10         31857       1019424      10183040           1.0
  11         32040       1025280      11208320           1.0
  12         31796       1017472      12225792           1.0
  13         31729       1015328      13241120           1.0
  14         31765       1016480      14257600           1.0
  15         31728       1015296      15272896           1.0
  16         31848       1019136      16292032           1.0
  17         31754       1016128      17308160           1.0
  18         31759       1016288      18324448           1.0
  19         31873       1019936      19344384           1.0
  20         31743       1015776      20360160           1.0
  21         31798       1017536      21377696           1.0
  22         31687       1013984      22391680           1.0
  23         31691       1014112      23405792           1.0
  24         31775       1016800      24422592           1.0
  25         31653       1012896      25435488           1.0
  26         31778       1016896      26452384           1.0
  27         31835       1018720      27471104           1.0
  28         31888       1020416      28491520           1.0
  29         31666       1013312      29504832           1.0
  30         31800       1017600      30522432           1.0

--- Sustained Throughput Summary (FHE + ZKP + Attestation) ---
Duration:         30.01s
Workers:          48
Cache mode:       In-process DashMap (zero network)
Batch ops:        954030
Effective auths:  30528960 (32 users/batch)
Batch throughput: 31791 batch/sec
Auth throughput:  1017316 auth/sec  (FHE + ZKP + Dilithium)
Per-auth cost:    1.0 µs  (amortized: 1 / auth throughput, not request latency)

ZKP cache stats:
  Cache hits:   30528960 (100.0%)
  Cache misses: 0
  DashMap entries: 1536

Pipeline: FHE → ZKP cache (in-process DashMap, zero network) → SHA3 → Dilithium sign → verify
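
Note: the summary figures above can be re-derived from the batch count and duration. A minimal sketch, assuming the logged duration of 30.01 s is rounded (so the results match the log only to within a small tolerance); the variable names are hypothetical. It also shows that the 1.0 µs per-auth figure is the inverse of aggregate throughput across all 48 workers, while each worker spends roughly 1.5 ms on a 32-user batch, in line with the 1341 µs isolated batch total from Benchmark 0.

```python
# Values copied from the Sustained Throughput Summary above.
BATCH_OPS = 954_030
DURATION_S = 30.01      # rounded in the log
WORKERS = 48
BATCH_SIZE = 32

effective_auths = BATCH_OPS * BATCH_SIZE
batch_per_sec = BATCH_OPS / DURATION_S
auth_per_sec = effective_auths / DURATION_S

# Inverse of aggregate throughput: time per auth averaged over all workers.
amortized_us_per_auth = 1e6 / auth_per_sec

# Wall-clock time one worker spends on one batch, assuming even load.
worker_batch_us = WORKERS * 1e6 / batch_per_sec

print(f"effective auths:      {effective_auths}")
print(f"batch throughput:     {batch_per_sec:.0f} batch/sec")
print(f"auth throughput:      {auth_per_sec:.0f} auth/sec")
print(f"amortized per auth:   {amortized_us_per_auth:.2f} µs")
print(f"per-worker per batch: {worker_batch_us:.0f} µs")
```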

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

    Rate/s  Achieved/s      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.016       4.023       4.069       4.073
      1000        1000       3.086       4.093       4.130       4.260
      5000        4998       3.111       4.100       4.110       4.221
     10000        9990       2.094       4.095       4.139       4.241
     25000       24944       1.975       2.120       3.108       4.196
     50000       49780       1.033       2.117       2.972       4.018

--- Fallback Path (individual verify_encrypted) ---
  P50: 1437 µs
  P95: 1446 µs
  P99: 1449 µs

--- Accumulator Metrics ---
  Batches flushed:   33902
  Requests batched:  911010
  Requests fallback: 0

======================================================================
=== BENCHMARK 4: CKKS Operations (Approximate Arithmetic) ===
======================================================================
  CKKS mode: turbo (N=8192, slots=4096)

  CKKS Latency Table (N=8192, 4096 slots)
  -------------------------------------------------------
  Operation                           Latency   per slot
  -------------------------------------------------------
  Encode + Encrypt                 11129.3 µs   2.717 µs
  Addition                           201.3 µs   0.049 µs
  Multiply + Relin                 12798.9 µs   3.125 µs
  Rescale                          13723.6 µs   3.350 µs
  Rotation (step=1)                 4023.7 µs   0.982 µs
  Dot Product (4 terms)            22318.3 µs   5.449 µs
  Bootstrap                       252253.3 µs  61.585 µs
  CKKS → BFV switch                 1205.3 µs   0.294 µs
  -------------------------------------------------------

======================================================================
=== BENCHMARK COMPLETE ===
======================================================================

Full lifecycle: FHE → ZKP (in-process DashMap, zero network) → Dilithium attestation
CKKS: encode/encrypt/add/multiply/rescale/rotate/bootstrap/scheme-switch
