H33 Graviton4 Benchmark Suite
=============================
Sustained: 60s, Workers: 96, Latency: 10s per rate
Allocator: jemalloc
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = N^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
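The forward-path REDC saving can be checked on its own. The sketch below is a minimal model that assumes Montgomery form with R = 2^64. It compares a two-REDC pre-twist (convert x to Montgomery form, then multiply by psi^i·R) with a single REDC against psi^i·R^2 mod q, which is the form given for twiddles_fused. The modulus, psi, and all function names are placeholders chosen only to exercise the algebra. They are not the suite's 56-bit q, a real primitive 2N-th root of unity, or the suite's code. Inputs are assumed to be already reduced below q.

```rust
/// Montgomery arithmetic for an odd modulus q < 2^63 with R = 2^64.
struct Mont {
    q: u64,
    q_neg_inv: u64, // -q^{-1} mod 2^64
    r2: u64,        // R^2 mod q
}

impl Mont {
    fn new(q: u64) -> Self {
        assert!(q % 2 == 1 && q < (1u64 << 63), "q must be odd and below 2^63");
        // Newton iteration for q^{-1} mod 2^64: correct low bits double each step (1 -> 64).
        let mut inv: u64 = 1;
        for _ in 0..6 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
        }
        let r = ((1u128 << 64) % q as u128) as u64; // R mod q
        let r2 = ((r as u128 * r as u128) % q as u128) as u64;
        Mont { q, q_neg_inv: inv.wrapping_neg(), r2 }
    }

    /// REDC: returns t * R^{-1} mod q in [0, q), valid for t < q * R.
    fn redc(&self, t: u128) -> u64 {
        let m = (t as u64).wrapping_mul(self.q_neg_inv); // chosen so t + m*q ≡ 0 (mod R)
        let u = ((t + m as u128 * self.q as u128) >> 64) as u64; // u < 2q
        if u >= self.q { u - self.q } else { u }
    }
}

/// Hypothetical fused forward table: psi^i * R^2 mod q.
fn fused_twiddles(m: &Mont, psi: u64, n: usize) -> Vec<u64> {
    let q = m.q as u128;
    let mut out = Vec::with_capacity(n);
    let mut p: u128 = 1; // psi^i mod q
    for _ in 0..n {
        out.push((p * m.r2 as u128 % q) as u64);
        p = p * psi as u128 % q;
    }
    out
}

/// One REDC per coefficient: x (normal form) -> x * psi^i * R (Montgomery form).
fn pretwist_fused(m: &Mont, a: &[u64], tw: &[u64]) -> Vec<u64> {
    a.iter().zip(tw).map(|(&x, &w)| m.redc(x as u128 * w as u128)).collect()
}

/// Two REDCs per coefficient: convert x to Montgomery form, then twist.
fn pretwist_unfused(m: &Mont, a: &[u64], psi: u64) -> Vec<u64> {
    let q = m.q as u128;
    let r_mod_q = (1u128 << 64) % q;
    let mut p: u128 = 1; // psi^i mod q
    a.iter()
        .map(|&x| {
            let x_mont = m.redc(x as u128 * m.r2 as u128); // REDC #1: x -> x*R
            let psi_mont = p * r_mod_q % q; // psi^i * R mod q (a precomputed table in practice)
            p = p * psi as u128 % q;
            m.redc(x_mont as u128 * psi_mont) // REDC #2: x*R * psi^i*R -> x*psi^i*R
        })
        .collect()
}

fn main() {
    let m = Mont::new((1u64 << 55) + 1); // placeholder odd modulus, not the suite's q
    let psi = 123_456_789u64; // placeholder, not a real primitive 2N-th root of unity
    let a: Vec<u64> = (0..8u64).map(|i| i * 1_000_003 + 7).collect();
    let tw = fused_twiddles(&m, psi, a.len());
    assert_eq!(pretwist_fused(&m, &a, &tw), pretwist_unfused(&m, &a, psi));
    println!("fused and two-REDC pre-twist agree");
}
```

Because REDC(x · psi^i · R^2) = x · psi^i · R mod q, the conversion into Montgomery form is absorbed into the twiddle table, saving one REDC per coefficient (4096 per NTT at N=4096). The inverse-side fusion of the scaling factor with psi^-i follows the same idea and is not modeled here.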

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:         1435 µs     1442 µs     1450 µs
  Dilithium sign:      167 µs      400 µs      632 µs
  Dilithium verify:     74 µs       74 µs       75 µs
  ─────────────────────────
  Total (single):     1676 µs  (FHE 86% + sign 10% + verify 4%)
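The component table reports median, P95, and P99 but does not say how the percentiles are taken. A common choice is the nearest-rank method over sorted samples. The sketch below assumes that method. The helper name and sample data are hypothetical, and an interpolating implementation could differ slightly in the last digit.

```rust
/// Nearest-rank percentile: the smallest sample such that at least p% of
/// samples are <= it. Assumes a non-empty slice and p in (0, 100].
fn percentile(samples: &mut [u64], p: f64) -> u64 {
    assert!(!samples.is_empty() && p > 0.0 && p <= 100.0);
    samples.sort_unstable();
    // 1-based rank; multiplying before dividing keeps whole-number cases exact.
    let rank = (p * samples.len() as f64 / 100.0).ceil() as usize;
    samples[rank - 1]
}

fn main() {
    // Hypothetical latency samples in microseconds, not data from this suite.
    let mut lat: Vec<u64> = (1..=100).collect();
    let median = percentile(&mut lat, 50.0);
    let p95 = percentile(&mut lat, 95.0);
    let p99 = percentile(&mut lat, 99.0);
    println!("median={median} p95={p95} p99={p99}");
}
```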

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        247 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:              8 µs  (amortized)
  vs 32× individual sign+verify (32 × 241 µs = 7712 µs): 31× faster
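The batch-attestation ratio follows directly from the component medians. Thirty-two separate sign+verify pairs cost 32 × (167 + 74) = 7712 µs, compared with 247 µs for one SHA3 digest plus one sign and one verify. A small sketch of that arithmetic (the function name is hypothetical):

```rust
/// Amortizes one sign + one verify over a batch.
/// Inputs are median latencies in microseconds; returns (µs per auth, speedup).
fn attest_amortization(sign_us: f64, verify_us: f64, batch_attest_us: f64, batch: u32) -> (f64, f64) {
    let per_auth = batch_attest_us / batch as f64; // amortized attestation cost per auth
    let individual = batch as f64 * (sign_us + verify_us); // one sign+verify per result
    (per_auth, individual / batch_attest_us)
}

fn main() {
    let (per_auth, speedup) = attest_amortization(167.0, 74.0, 247.0, 32);
    println!("per auth: {per_auth:.1} µs, speedup vs individual: {speedup:.1}x");
}
```

This yields about 7.7 µs per auth and a 31.2× ratio, which round to the 8 µs and 31× shown above. The 247 µs figure already includes the SHA3 digest, so the comparison charges the batched path for hashing.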

  --- Full Pipeline (32-user batch, batch attestation) ---
  FHE batch:          1435 µs
  Batch attest:        247 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ZKP (est):             8 µs  (single batch proof, not 32×)
  ─────────────────────────
  Total batch:        1690 µs  (53 µs/auth)
  FHE share:       85%
  Dilithium share: 15%  (batch attest incl. SHA3, plus ZKP est.)
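The pipeline totals can be recomputed from the stage medians in the same way. The sketch below uses a hypothetical helper, and its ZKP value is the suite's estimate rather than a measurement. It returns the batch total, the cost per auth, and each stage's percentage.

```rust
/// Returns (total µs, µs per auth, share of each stage in percent).
fn budget(stage_us: &[f64], users: u32) -> (f64, f64, Vec<f64>) {
    let total: f64 = stage_us.iter().sum();
    let shares = stage_us.iter().map(|us| 100.0 * us / total).collect();
    (total, total / users as f64, shares)
}

fn main() {
    // FHE batch, batch attestation (SHA3 + 1 sign + 1 verify), ZKP estimate.
    let names = ["FHE batch", "Batch attest", "ZKP (est)"];
    let (total, per_auth, shares) = budget(&[1435.0, 247.0, 8.0], 32);
    println!("total {total} µs, {per_auth:.1} µs/auth");
    for (name, share) in names.iter().zip(&shares) {
        println!("{name:>12}: {share:.1}%");
    }
}
```

This gives 1690 µs per batch and 52.8 µs per auth. FHE accounts for about 84.9%, batch attestation for about 14.6%, and the ZKP estimate for about 0.5%, so the reported 15% is the non-FHE remainder.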

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1579 µs
  P95:    1587 µs
  P99:    1594 µs
  Min:    1570 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1435 µs
  P95:    1443 µs
  P99:    1451 µs
  Min:    1423 µs

  Speedup: 1.1x (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 1372 µs total  (42.9 µs/auth)
  P95:    1404 µs
  P99:    1448 µs
  Single-thread throughput: 23324 auth/sec
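batch_verify_multi is described as SIMD packed. In a typical packed design, many probes share one ciphertext, so a single slot-wise multiply followed by one rotate-and-sum pass scores all of them together. The plaintext model below sketches that pattern under several assumptions. It packs 32 probes of 128 features each into the 4096 slots (the suite does not give the feature length), replicates the template per block, and treats rotations as acting on one cyclic vector of length N. Real BFV batching at N=4096 arranges slots as two rows of 2048 with row-wise rotations, so this is a simplification. All names are hypothetical.

```rust
const T: u64 = 65537; // plaintext modulus t from the FHE mode line

/// Cyclic left rotation, standing in for a homomorphic slot rotation.
fn rotate(v: &[u64], k: usize) -> Vec<u64> {
    let n = v.len();
    (0..n).map(|i| v[(i + k) % n]).collect()
}

/// Slot-wise product followed by log2(block) rotate-and-add steps.
/// Afterwards, slot j*block holds the inner product of block j (mod T).
fn packed_inner_products(probes: &[u64], template: &[u64], block: usize) -> Vec<u64> {
    assert!(block.is_power_of_two() && probes.len() == template.len() && probes.len() % block == 0);
    let mut acc: Vec<u64> = probes
        .iter()
        .zip(template)
        .map(|(a, b)| (a % T) * (b % T) % T)
        .collect();
    let mut step = block / 2;
    while step > 0 {
        // After steps block/2, ..., 2, 1, slot i sums offsets 0..block-1 from i.
        let r = rotate(&acc, step);
        acc = acc.iter().zip(&r).map(|(a, b)| (a + b) % T).collect();
        step /= 2;
    }
    acc.iter().step_by(block).copied().collect()
}

fn main() {
    // Hypothetical layout: 32 probes x 128 features = 4096 slots (N = 4096).
    let (users, dim) = (32usize, 128usize);
    let probes: Vec<u64> = (0..users * dim).map(|i| (i as u64 * 7 + 3) % 16).collect();
    let template: Vec<u64> = (0..users * dim).map(|i| ((i % dim) as u64 * 5 + 1) % 16).collect();
    let scores = packed_inner_products(&probes, &template, dim);
    // Cross-check against a direct per-block inner product.
    for (j, s) in scores.iter().enumerate() {
        let direct = (0..dim)
            .map(|k| probes[j * dim + k] * template[j * dim + k])
            .sum::<u64>()
            % T;
        assert_eq!(*s, direct);
    }
    println!("{} packed scores match direct inner products", scores.len());
}
```

With a block length of 128, log2(128) = 7 rotate-and-add steps produce all 32 scores in this model. Slot j·128 then holds user j's inner product mod t.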

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + Dilithium) ===
=== 96 workers, 60 seconds ===
======================================================================
Pipeline per batch: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign + 1 verify
Allocator: jemalloc
Setting up 96 worker contexts...
Setup: 188674.2 ms (~189 s)
Warming up...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         35619       1139808       1139808           0.9
   2         37621       1203872       2343680           0.8
   3         34833       1114656       3458336           0.9
   4         36652       1172864       4631200           0.9
   5         36362       1163584       5794784           0.9
   6         35927       1149664       6944448           0.9
   7         35195       1126240       8070688           0.9
   8         37147       1188704       9259392           0.8
   9         36391       1164512      10423904           0.9
  10         37043       1185376      11609280           0.8
  11         38038       1217216      12826496           0.8
  12         37191       1190112      14016608           0.8
  13         37395       1196640      15213248           0.8
  14         37635       1204320      16417568           0.8
  15         37946       1214272      17631840           0.8
  16         37009       1184288      18816128           0.8
  17         36946       1182272      19998400           0.8
  18         37744       1207808      21206208           0.8
  19         37614       1203648      22409856           0.8
  20         37493       1199776      23609632           0.8
  21         37642       1204544      24814176           0.8
  22         37150       1188800      26002976           0.8
  23         37608       1203456      27206432           0.8
  24         37278       1192896      28399328           0.8
  25         37905       1212960      29612288           0.8
  26         38277       1224864      30837152           0.8
  27         37973       1215136      32052288           0.8
  28         37485       1199520      33251808           0.8
  29         37276       1192832      34444640           0.8
  30         38311       1225952      35670592           0.8
  31         37762       1208384      36878976           0.8
  32         38673       1237536      38116512           0.8
  33         36506       1168192      39284704           0.9
  34         36920       1181440      40466144           0.8
  35         37968       1214976      41681120           0.8
  36         36498       1167936      42849056           0.9
  37         36188       1158016      44007072           0.9
  38         35284       1129088      45136160           0.9
  39         36934       1181888      46318048           0.8
  40         36424       1165568      47483616           0.9
  41         37574       1202368      48685984           0.8
  42         36918       1181376      49867360           0.8
  43         36804       1177728      51045088           0.8
  44         36916       1181312      52226400           0.8
  45         37064       1186048      53412448           0.8
  46         35244       1127808      54540256           0.9
  47         36683       1173856      55714112           0.9
  48         37092       1186944      56901056           0.8
  49         36392       1164544      58065600           0.9
  50         36638       1172416      59238016           0.9
  51         37213       1190816      60428832           0.8
  52         36921       1181472      61610304           0.8
  53         36137       1156384      62766688           0.9
  54         37008       1184256      63950944           0.8
  55         37985       1215520      65166464           0.8
  56         37111       1187552      66354016           0.8
  57         36506       1168192      67522208           0.9
  58         36160       1157120      68679328           0.9
  59         36775       1176800      69856128           0.8
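A per-second table like this one can be produced by a harness along these lines. Worker threads loop over one batch operation and increment a shared counter, and the main thread samples the counter at fixed intervals. This is a hypothetical sketch that uses only std threads and atomics. The suite's actual harness, its setup and warm-up phases, and its workload are not shown, and a sleep stands in for the FHE + attestation batch.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

const USERS_PER_BATCH: u64 = 32;

/// Runs `work` on `workers` threads for `intervals` sampling periods and
/// returns the number of batch operations completed in each period.
fn sustained<F>(workers: usize, interval: Duration, intervals: usize, work: F) -> Vec<u64>
where
    F: Fn() + Send + Sync + 'static,
{
    let done = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));
    let work = Arc::new(work);
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (done, stop, work) = (done.clone(), stop.clone(), work.clone());
            thread::spawn(move || {
                while !stop.load(Ordering::Relaxed) {
                    (*work)(); // one batch operation
                    done.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    let mut per_interval = Vec::with_capacity(intervals);
    let mut last = 0;
    let start = Instant::now();
    for k in 1..=intervals {
        // Sleep until the next interval boundary so sampling does not drift.
        let target = interval * k as u32;
        if let Some(rem) = target.checked_sub(start.elapsed()) {
            thread::sleep(rem);
        }
        let now = done.load(Ordering::Relaxed);
        per_interval.push(now - last);
        last = now;
    }
    stop.store(true, Ordering::Relaxed);
    for h in handles {
        h.join().unwrap();
    }
    per_interval
}

fn main() {
    let interval = Duration::from_millis(100);
    // Stand-in workload: a 1 ms sleep instead of FHE + attestation.
    let ops = sustained(4, interval, 5, || thread::sleep(Duration::from_millis(1)));
    for (i, batches) in ops.iter().enumerate() {
        let batch_per_s = *batches as f64 / interval.as_secs_f64();
        let auth_per_s = batch_per_s * USERS_PER_BATCH as f64;
        // Reciprocal of throughput, matching the table's last column.
        let amortized_us = if auth_per_s > 0.0 { 1e6 / auth_per_s } else { f64::INFINITY };
        println!("{:>4} {:>12.0} {:>12.0} {:>10.2}", i + 1, batch_per_s, auth_per_s, amortized_us);
    }
}
```

In the table above, EffAuth/s is BatchOps/s × 32, and the last column is 1,000,000 / EffAuth/s. For example, 1e6 / 1,139,808 ≈ 0.88 µs in second 1.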

--- Sustained Throughput Summary (FHE + Batch Attestation) ---
Duration:         60.01s
Workers:          96
Batch ops:        2219858
Effective auths:  71035456 (32 users/batch)
Batch throughput: 36994 batch/sec
Auth throughput:  1183803 auth/sec  (FHE + SHA3 + 1 Dilithium sign+verify)
Per-auth time:    0.8 µs  (1 / auth throughput across 96 workers; not end-to-end latency)

Pipeline: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign → 1 verify
ZKP STARK lookup (est.): single batch proof ~8 µs (not measured; negligible vs FHE batch)
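The sub-microsecond per-auth figure is the reciprocal of aggregate throughput, not the time a single authentication takes. Converting the summary numbers both ways makes the distinction concrete. The function name is hypothetical, and the inputs are the summary's auth/sec, batch/sec, and worker count.

```rust
/// Converts aggregate throughput into two different "per-auth" views.
/// Returns (amortized µs per auth, ms one worker spends per batch).
fn views(auth_per_s: f64, batch_per_s: f64, workers: u32) -> (f64, f64) {
    let amortized_us = 1e6 / auth_per_s; // machine-wide wall-clock time per auth
    let per_worker_batch_ms = workers as f64 / batch_per_s * 1e3; // each worker's time per 32-user batch
    (amortized_us, per_worker_batch_ms)
}

fn main() {
    let (amortized, batch_ms) = views(1_183_803.0, 36_994.0, 96);
    println!("amortized: {amortized:.3} µs/auth, per-worker batch: {batch_ms:.2} ms");
}
```

This gives about 0.845 µs of machine-wide time per auth. Each worker, however, spends about 2.6 ms per 32-user batch, which is longer than the 1682 µs sum of the FHE and batch-attestation medians in Benchmark 0.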

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

    Rate/s  Achieved/s      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.537       4.556       4.594       7.779
      1000         999       3.975       5.035       5.083       5.143
      5000        4997       4.021       5.118       5.132       5.298
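The BatchAccumulator's flush policy does not appear in the log. A common design, assumed here, flushes when 32 requests have accumulated or when a deadline expires after the first request of a batch, whichever comes first. The sketch below uses std::sync::mpsc. The function name and the 5 ms deadline in the example are hypothetical, not the suite's parameters.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collects up to `max_batch` items, flushing early once `max_wait` has
/// elapsed since the batch's first item arrived. Returns None when the
/// channel is closed and nothing is pending.
fn next_batch<T>(rx: &Receiver<T>, max_batch: usize, max_wait: Duration) -> Option<Vec<T>> {
    let first = rx.recv().ok()?; // block until a batch starts
    let deadline = Instant::now() + max_wait;
    let mut batch = vec![first];
    while batch.len() < max_batch {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Deadline reached or all senders gone: flush what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}

fn main() {
    let (tx, rx) = channel();
    for i in 0..70u32 {
        tx.send(i).unwrap();
    }
    drop(tx); // close the channel so the loop terminates
    while let Some(batch) = next_batch(&rx, 32, Duration::from_millis(5)) {
        println!("flushed {} items", batch.len());
    }
}
```

With 70 queued items and a closed channel, it flushes batches of 32, 32, and 6. Under such a policy, low request rates rarely fill a batch, so per-request latency is bounded mainly by the deadline plus the batch processing time.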
