H33 Graviton4 Benchmark Suite
=============================
Sustained: 60s, Workers: 96, Latency sweep: 10s per rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
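Both fused tables can be checked numerically. This is a minimal sketch under a few assumptions. It uses Montgomery arithmetic with R = 2^64; the log names REDC and R but does not give R's size. It also uses toy, hypothetical values, q = 40961 (prime, 5·8192 + 1) and psi = 3, not the real 56-bit biometric_fast modulus or root. The sketch checks two things. First, one REDC against psi^i·R^2 both twists a normal-form coefficient and lifts it into Montgomery form. Second, one REDC against n^-1·psi^-i both untwists and leaves Montgomery form. The NTT butterflies that sit between the two steps are omitted.

```rust
/// Montgomery arithmetic with R = 2^64 for an odd modulus q < 2^63.
struct Mont {
    q: u64,
    neg_qinv: u64, // -q^{-1} mod 2^64
    r_mod: u64,    // R mod q
    r2: u64,       // R^2 mod q
}

impl Mont {
    fn new(q: u64) -> Self {
        assert!(q % 2 == 1 && q < (1u64 << 63), "q must be odd and below 2^63");
        // Newton iteration for q^{-1} mod 2^64: q*q ≡ 1 (mod 8) gives 3 correct bits,
        // and each step doubles them (3 → 6 → 12 → 24 → 48 → 96 ≥ 64).
        let mut inv = q;
        for _ in 0..5 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
        }
        let r_mod = ((1u128 << 64) % q as u128) as u64;
        let r2 = (r_mod as u128 * r_mod as u128 % q as u128) as u64;
        Mont { q, neg_qinv: inv.wrapping_neg(), r_mod, r2 }
    }

    /// REDC: returns t * R^{-1} mod q, valid for t < q * R.
    fn redc(&self, t: u128) -> u64 {
        let m = (t as u64).wrapping_mul(self.neg_qinv);
        let u = ((t + m as u128 * self.q as u128) >> 64) as u64;
        if u >= self.q { u - self.q } else { u }
    }

    /// One Montgomery multiply (a single REDC): a * b * R^{-1} mod q, for a, b < q.
    fn mul(&self, a: u64, b: u64) -> u64 {
        self.redc(a as u128 * b as u128)
    }
}

fn mulmod(a: u64, b: u64, q: u64) -> u64 {
    (a as u128 * b as u128 % q as u128) as u64
}

fn pow_mod(base: u64, mut e: u64, q: u64) -> u64 {
    let (mut b, mut acc) = (base % q, 1 % q);
    while e > 0 {
        if e & 1 == 1 { acc = mulmod(acc, b, q); }
        b = mulmod(b, b, q);
        e >>= 1;
    }
    acc
}

/// Forward table as described in the log: twiddles_fused[i] = psi^i * R^2 mod q.
fn twiddles_fused(m: &Mont, psi: u64, n: u64) -> Vec<u64> {
    (0..n).map(|i| mulmod(pow_mod(psi, i, m.q), m.r2, m.q)).collect()
}

/// Inverse table as described in the log: fused_inv_post[i] = n^{-1} * psi^{-i} mod q.
/// Inverses use Fermat's little theorem, so q must be prime.
fn fused_inv_post(m: &Mont, psi: u64, n: u64) -> Vec<u64> {
    let n_inv = pow_mod(n, m.q - 2, m.q);
    let psi_inv = pow_mod(psi, m.q - 2, m.q);
    (0..n).map(|i| mulmod(n_inv, pow_mod(psi_inv, i, m.q), m.q)).collect()
}

fn main() {
    // Hypothetical toy parameters; psi = 3 is a placeholder (the identities below
    // hold for any invertible psi, it need not be a root of unity here).
    let (q, psi, n) = (40961u64, 3u64, 4096u64);
    let m = Mont::new(q);
    let (fwd, inv) = (twiddles_fused(&m, psi, n), fused_inv_post(&m, psi, n));
    let a = 12345u64;
    for i in [0usize, 1, 4095] {
        // One REDC twists a by psi^i AND lifts it into Montgomery form (a * psi^i * R).
        let twisted_mont = m.mul(a, fwd[i]);
        assert_eq!(twisted_mont, mulmod(mulmod(a, pow_mod(psi, i as u64, q), q), m.r_mod, q));
        // One REDC applies n^{-1} * psi^{-i} AND leaves Montgomery form.
        let back = m.mul(twisted_mont, inv[i]);
        assert_eq!(mulmod(back, n, q), a); // net effect without an NTT in between: a * n^{-1}
    }
    println!("fused pre-/post-twist identities hold for q = {q}");
}
```

Folding the R^2 conversion into the forward table removes a separate to-Montgomery multiply per coefficient. That is presumably where the quoted saving of 4096 REDCs/NTT comes from; the inverse table does the same on the way out.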

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:         1405 µs     1411 µs     1414 µs
  Dilithium sign:      169 µs      401 µs      614 µs
  Dilithium verify:     74 µs       74 µs       75 µs
  ─────────────────────────
  Total (single):     1648 µs  (FHE 85% + sign 10% + verify 4%)

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        247 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:              8 µs  (amortized)
  vs 32× individual sign+verify (7776 µs): 31× faster
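Batch attestation wins because the Dilithium cost is paid once per batch instead of once per result. Below is a sketch of that cost model using the Benchmark 0 medians. `AttestCost` is a hypothetical helper. The model assumes the batch figure does not depend on batch size, apart from a small digest term. Subtracting one sign and one verify from 247 µs leaves about 4 µs for the SHA3 digest and glue. That residual is only a rough estimate, because medians do not add exactly.

```rust
/// Median component costs in µs, copied from Benchmark 0.
struct AttestCost {
    sign_us: f64,
    verify_us: f64,
    batch_attest_us: f64, // SHA3 digest + 1 sign + 1 verify
}

impl AttestCost {
    /// Attesting each result separately: one sign + one verify per result.
    fn individual_us(&self, results: u32) -> f64 {
        results as f64 * (self.sign_us + self.verify_us)
    }
    /// What remains of the batch figure after one sign and one verify: a rough
    /// estimate of the SHA3 digest plus glue (medians do not add exactly).
    fn digest_residual_us(&self) -> f64 {
        self.batch_attest_us - self.sign_us - self.verify_us
    }
    /// Batch attestation cost amortized over the results it covers.
    fn per_auth_us(&self, results: u32) -> f64 {
        self.batch_attest_us / results as f64
    }
    fn speedup(&self, results: u32) -> f64 {
        self.individual_us(results) / self.batch_attest_us
    }
}

fn main() {
    let c = AttestCost { sign_us: 169.0, verify_us: 74.0, batch_attest_us: 247.0 };
    println!("32 individual: {} µs", c.individual_us(32));
    println!(
        "batch: {} µs ({:.1} µs/auth, {:.1}x faster)",
        c.batch_attest_us,
        c.per_auth_us(32),
        c.speedup(32)
    );
    println!("digest + overhead residual: ~{} µs", c.digest_residual_us());
}
```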

  --- Full Pipeline (32-user batch, batch attestation) ---
  FHE batch:          1405 µs
  Batch attest:        247 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ZKP (est):             8 µs  (single batch proof, not 32×)
  ─────────────────────────
  Total batch:        1660 µs  (52 µs/auth)
  FHE share:       85%
  Attest share:    15%  (SHA3 + Dilithium sign + verify; ZKP est. <1%)
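The pipeline total and shares can be reproduced from the three stage figures. A sketch with the values copied from above follows; the ZKP entry is the prior estimate, not a measurement from this run. Unrounded, the shares are about 84.6% FHE, 14.9% attestation and 0.5% ZKP.

```rust
const USERS_PER_BATCH: f64 = 32.0;
/// Per-batch stage costs in µs (medians from the log; ZKP is an estimate).
const STAGES: [(&str, f64); 3] = [("FHE batch", 1405.0), ("Batch attest", 247.0), ("ZKP (est)", 8.0)];

fn total_us() -> f64 {
    STAGES.iter().map(|&(_, us)| us).sum()
}

fn share_pct(stage_us: f64) -> f64 {
    100.0 * stage_us / total_us()
}

fn main() {
    let total = total_us();
    println!("total batch: {total} µs ({:.1} µs/auth)", total / USERS_PER_BATCH);
    for (name, us) in STAGES {
        println!("{name:>12}: {:5.1}%", share_pct(us));
    }
}
```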

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1565 µs
  P95:    1575 µs
  P99:    1687 µs
  Min:    1549 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1421 µs
  P95:    1429 µs
  P99:    1435 µs
  Min:    1410 µs

  Speedup: 1.1x (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 1399 µs total  (43.7 µs/auth)
  P95:    1413 µs
  P99:    1416 µs
  Single-thread throughput: 22873 auth/sec  (FHE only: 32 auths / 1399 µs)

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + Dilithium) ===
=== 96 workers, 60 seconds ===
======================================================================
Pipeline per batch: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign + 1 verify
Allocator: system
Setting up 96 worker contexts...
Setup: 189032.8ms
Warming up...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         37615       1203680       1203680           0.8
   2         37877       1212064       2415744           0.8
   3         37874       1211968       3627712           0.8
   4         37898       1212736       4840448           0.8
   5         37834       1210688       6051136           0.8
   6         37846       1211072       7262208           0.8
   7         37983       1215456       8477664           0.8
   8         37857       1211424       9689088           0.8
   9         37758       1208256      10897344           0.8
  10         37900       1212800      12110144           0.8
  11         37747       1207904      13318048           0.8
  12         37881       1212192      14530240           0.8
  13         37813       1210016      15740256           0.8
  14         37862       1211584      16951840           0.8
  15         37840       1210880      18162720           0.8
  16         37814       1210048      19372768           0.8
  17         37846       1211072      20583840           0.8
  18         37690       1206080      21789920           0.8
  19         37795       1209440      22999360           0.8
  20         37711       1206752      24206112           0.8
  21         37839       1210848      25416960           0.8
  22         37810       1209920      26626880           0.8
  23         37797       1209504      27836384           0.8
  24         37817       1210144      29046528           0.8
  25         37806       1209792      30256320           0.8
  26         37803       1209696      31466016           0.8
  27         37808       1209856      32675872           0.8
  28         37784       1209088      33884960           0.8
  29         37829       1210528      35095488           0.8
  30         37823       1210336      36305824           0.8
  31         37901       1212832      37518656           0.8
  32         37870       1211840      38730496           0.8
  33         37940       1214080      39944576           0.8
  34         37798       1209536      41154112           0.8
  35         37851       1211232      42365344           0.8
  36         37826       1210432      43575776           0.8
  37         37916       1213312      44789088           0.8
  38         37810       1209920      45999008           0.8
  39         37898       1212736      47211744           0.8
  40         37964       1214848      48426592           0.8
  41         37820       1210240      49636832           0.8
  42         37902       1212864      50849696           0.8
  43         37861       1211552      52061248           0.8
  44         37907       1213024      53274272           0.8
  45         38005       1216160      54490432           0.8
  46         37799       1209568      55700000           0.8
  47         37868       1211776      56911776           0.8
  48         37841       1210912      58122688           0.8
  49         37904       1212928      59335616           0.8
  50         37746       1207872      60543488           0.8
  51         37844       1211008      61754496           0.8
  52         37853       1211296      62965792           0.8
  53         37851       1211232      64177024           0.8
  54         37946       1214272      65391296           0.8
  55         37758       1208256      66599552           0.8
  56         37743       1207776      67807328           0.8
  57         37945       1214240      69021568           0.8
  58         37848       1211136      70232704           0.8
  59         37780       1208960      71441664           0.8
  60         37923       1213536      72655200           0.8
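The per-second table is a sampled counter. EffAuth/s is BatchOps/s × 32, and Total Auth is its running sum. Below is a minimal sketch of such a driver, using plain std threads and a shared atomic counter. This is an assumption: the actual harness is not shown, and the header lists 96 Rayon threads. `sustained` and the stand-in workload are hypothetical. Workers keep finishing batches between the last sample and the stop flag, so the final total can exceed the sum of the samples. The log shows such a gap: 72655200 at second 60 versus 72665120 in the summary, i.e. 310 batches. That gap is consistent with this mechanism, though it does not prove it.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical sustained-load driver: `workers` threads call `batch_op` in a loop,
/// a shared counter records completed batches, and the caller samples it once per `tick`.
/// Returns per-tick batch counts and the final total (which also counts batches that
/// completed after the last sample, before workers saw the stop flag).
fn sustained<F>(workers: usize, ticks: usize, tick: Duration, batch_op: F) -> (Vec<u64>, u64)
where
    F: Fn() + Send + Sync + 'static,
{
    let done = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let op = Arc::new(batch_op);
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (done, stop, op) = (Arc::clone(&done), Arc::clone(&stop), Arc::clone(&op));
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    (*op)();
                    done.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    let (mut per_tick, mut last) = (Vec::with_capacity(ticks), 0u64);
    for _ in 0..ticks {
        thread::sleep(tick);
        let now = done.load(Ordering::Relaxed);
        per_tick.push(now - last); // counter is monotonic, so this never underflows
        last = now;
    }
    stop.store(true, Ordering::Relaxed);
    for h in handles {
        h.join().expect("worker panicked");
    }
    (per_tick, done.load(Ordering::Relaxed))
}

fn main() {
    const USERS_PER_BATCH: u64 = 32;
    // Stand-in workload; a real run would do FHE + batch attestation here.
    let (per_tick, total) = sustained(4, 3, Duration::from_millis(100), || {
        std::hint::black_box((0..1_000u64).sum::<u64>());
    });
    let mut cumulative = 0;
    for (sec, batches) in per_tick.iter().enumerate() {
        cumulative += batches * USERS_PER_BATCH;
        println!("{:>4} {:>12} {:>14} {:>14}", sec + 1, batches, batches * USERS_PER_BATCH, cumulative);
    }
    println!("final effective auths: {}", total * USERS_PER_BATCH);
}
```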

--- Sustained Throughput Summary (FHE + Batch Attestation) ---
Duration:         60.01s
Workers:          96
Batch ops:        2270785
Effective auths:  72665120 (32 users/batch)
Batch throughput: 37838 batch/sec
Auth throughput:  1210820 auth/sec  (FHE + SHA3 + 1 Dilithium sign+verify)
Per-auth cost:    0.8 µs  (amortized: 1 / auth throughput; not request latency)

Pipeline: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign → 1 verify
ZKP Stark Lookup (est): single batch proof ~8 µs (negligible vs FHE batch; not included in auth throughput above)
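The summary figures relate to each other by simple arithmetic, and the sketch below uses the logged values; `Summary` is a hypothetical helper. It also gives the mean time each of the 96 workers spends per batch under load, about 2.54 ms. For comparison, 32-probe FHE plus batch attestation measured in isolation comes to roughly 1.65 ms (1399 µs + 247 µs). The log does not explain that difference, so the sketch only computes it.

```rust
/// Sustained-run summary values, as logged.
struct Summary {
    workers: u32,
    batch_ops: u64,
    users_per_batch: u64,
    batch_per_sec: f64,
}

impl Summary {
    fn effective_auths(&self) -> u64 {
        self.batch_ops * self.users_per_batch
    }
    fn auth_per_sec(&self) -> f64 {
        self.batch_per_sec * self.users_per_batch as f64
    }
    /// Amortized wall-clock cost per auth across all workers (not request latency).
    fn amortized_us_per_auth(&self) -> f64 {
        1e6 / self.auth_per_sec()
    }
    /// Mean time one worker spends per batch under load.
    fn per_worker_batch_ms(&self) -> f64 {
        1e3 * self.workers as f64 / self.batch_per_sec
    }
}

fn main() {
    let s = Summary { workers: 96, batch_ops: 2_270_785, users_per_batch: 32, batch_per_sec: 37_838.0 };
    println!("effective auths: {}", s.effective_auths());
    println!("auth/sec:        {:.0}", s.auth_per_sec());
    println!("amortized:       {:.2} µs/auth", s.amortized_us_per_auth());
    println!("per worker:      {:.2} ms/batch", s.per_worker_batch_ms());
}
```

The small difference between 37838 × 32 and the logged 1210820 auth/sec comes from batch/sec being rounded before printing.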

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

  Target/s  Achieved/s      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.495       4.504       4.548       4.557
      1000        1000       3.929       4.989       5.004       5.214
      5000        4997       3.937       5.034       5.046       5.252
     10000        9990       2.926       4.005       4.020       5.110
     25000       24936       1.877       2.914       3.455       4.582
     50000       49768       1.514       2.792       3.321       4.529
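P50 falls as the rate rises. A size-or-deadline batcher would behave this way. At low rates, requests wait out the deadline; at high rates, batches fill first. The sketch below is a minimal simulation of that policy, which is an assumption. `simulate`, `max_batch = 32` and the 2 ms deadline are hypothetical, because the log does not give BatchAccumulator's parameters. The sketch also models only queueing delay, with no FHE or attestation time, so its numbers are not comparable to the table.

```rust
/// Hypothetical flush policy: a pending batch is flushed when it reaches `max_batch`
/// requests or when its oldest request has waited `max_wait_us`. Arrivals are sorted
/// timestamps in µs. Returns each request's queueing delay (flush time minus arrival)
/// and the number of batches flushed.
fn simulate(arrivals_us: &[u64], max_batch: usize, max_wait_us: u64) -> (Vec<u64>, usize) {
    let mut waits = Vec::with_capacity(arrivals_us.len());
    let mut pending: Vec<u64> = Vec::new();
    let mut flushes = 0;
    for &t in arrivals_us {
        // Deadline flush: the oldest pending request timed out before this arrival.
        if let Some(oldest) = pending.first().copied() {
            if t >= oldest + max_wait_us {
                flush(&mut pending, oldest + max_wait_us, &mut waits, &mut flushes);
            }
        }
        pending.push(t);
        // Size flush: the batch is full.
        if pending.len() == max_batch {
            flush(&mut pending, t, &mut waits, &mut flushes);
        }
    }
    // Drain: whatever is left goes out at its deadline.
    if let Some(oldest) = pending.first().copied() {
        flush(&mut pending, oldest + max_wait_us, &mut waits, &mut flushes);
    }
    (waits, flushes)
}

fn flush(pending: &mut Vec<u64>, at_us: u64, waits: &mut Vec<u64>, flushes: &mut usize) {
    waits.extend(pending.drain(..).map(|arrival| at_us - arrival));
    *flushes += 1;
}

fn main() {
    // Hypothetical parameters: 32-request batches, 2 ms deadline, 10 s of uniform arrivals per rate.
    let (max_batch, max_wait_us) = (32, 2_000);
    for rate in [100u64, 1_000, 10_000, 50_000] {
        let gap_us = 1_000_000 / rate;
        let arrivals: Vec<u64> = (0..rate * 10).map(|i| i * gap_us).collect();
        let (mut waits, flushes) = simulate(&arrivals, max_batch, max_wait_us);
        waits.sort_unstable();
        let p50 = waits[waits.len() / 2];
        println!("rate {rate:>6}/s: {flushes:>6} batches, median queueing delay {p50} µs");
    }
}
```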

--- Fallback Path (individual verify_encrypted) ---
  P50: 1578 µs
  P95: 1587 µs
  P99: 1596 µs

--- Accumulator Metrics ---
  Batches flushed:   34389
  Requests batched:  911010
  Requests fallback: 0
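Two derived numbers follow from these counters. The sketch assumes that every request in the rate sweep went through the accumulator (fallback is 0) and that the sweep is the only source of these counts. On that basis, the mean batch size is about 26.5 requests. Dividing requests by the summed achieved rates gives about 10.03 s per rate step, consistent with the 10 s per-rate setting in the header. The 32-request capacity in the printed message is taken from Benchmark 2, not from the accumulator's configuration.

```rust
/// Achieved request rates (req/s) from the latency-distribution table.
const ACHIEVED: [u64; 6] = [100, 1_000, 4_997, 9_990, 24_936, 49_768];
const BATCHES_FLUSHED: u64 = 34_389;
const REQUESTS_BATCHED: u64 = 911_010;

fn mean_batch_size() -> f64 {
    REQUESTS_BATCHED as f64 / BATCHES_FLUSHED as f64
}

/// Seconds per rate step implied if every request in the sweep went through the accumulator.
fn implied_secs_per_rate() -> f64 {
    REQUESTS_BATCHED as f64 / ACHIEVED.iter().sum::<u64>() as f64
}

fn main() {
    println!("mean batch size: {:.1} requests (of up to 32, assumed)", mean_batch_size());
    println!("implied duration per rate step: {:.2} s", implied_secs_per_rate());
}
```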

======================================================================
=== BENCHMARK COMPLETE ===
======================================================================

H33 ZKP Stark Lookup addendum (from a prior measurement, not re-measured here):
  Single batch proof: ~8 µs (one proof attests the whole batch computation)
  Negligible vs FHE batch time (~0.5% of the 1660 µs batch pipeline); does NOT scale 32×.
