H33 Graviton4 Benchmark Suite
=============================
Sustained: 60s, Workers: 48, Latency: 5s/rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
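
Note: a minimal sketch of how a fused pre-twist can save one REDC per coefficient, assuming "fused" means the table entry already carries the R^2 factor needed for Montgomery conversion. The toy parameters (Q = 17, N = 8, PSI = 3, R = 2^5) and all function names are illustrative assumptions, not the benchmark's 56-bit modulus or its actual code.

```python
# Toy parameters (assumptions for illustration only).
Q = 17                 # prime with Q = 1 (mod 2N)
N = 8                  # ring degree
PSI = 3                # primitive 2N-th root of unity mod Q (PSI^N = -1 mod Q)
R_BITS = 5
R = 1 << R_BITS        # Montgomery radix, R > Q
Q_NEG_INV = (-pow(Q, -1, R)) % R   # -Q^{-1} mod R

def redc(t):
    """Montgomery reduction: returns t * R^{-1} mod Q for 0 <= t < Q*R."""
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)
    u = (t + m * Q) >> R_BITS
    return u - Q if u >= Q else u

R2 = (R * R) % Q
# Unfused table: psi^i in Montgomery form (psi^i * R mod Q).
PSI_MONT = [(pow(PSI, i, Q) * R) % Q for i in range(N)]
# Fused table, as in the log line: psi^i * R^2 mod Q.
TWIDDLES_FUSED = [(pow(PSI, i, Q) * R2) % Q for i in range(N)]

def pre_twist_two_step(a):
    """Convert to Montgomery form, then twist: 2 REDCs per coefficient."""
    out = []
    for i, x in enumerate(a):
        x_mont = redc(x * R2)                    # x * R mod Q
        out.append(redc(x_mont * PSI_MONT[i]))   # x * psi^i * R mod Q
    return out

def pre_twist_fused(a):
    """Conversion and twist in one step: 1 REDC per coefficient."""
    return [redc(x * TWIDDLES_FUSED[i]) for i, x in enumerate(a)]
```

Both functions return x * psi^i * R mod Q for inputs in [0, Q); the fused variant skips one REDC per coefficient, which would match the "saves 4096 REDCs/NTT" figure at N=4096.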

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:         1407 µs     1414 µs     1418 µs
  Dilithium sign:      167 µs      454 µs      586 µs
  Dilithium verify:     74 µs       74 µs       75 µs
  ─────────────────────────
  Total (single):     1648 µs  (FHE 85% + sign 10% + verify 4%)

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        247 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:              8 µs  (amortized)
  vs 32× individual: 31× faster
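
Note: the amortized figures above can be reproduced from the median latencies in the component table. The sketch below only redoes that arithmetic; the variable names are illustrative.

```python
# Median latencies reported above, in microseconds.
SIGN_US = 167
VERIFY_US = 74
BATCH_ATTEST_US = 247   # SHA3 digest + 1 sign + 1 verify for the whole batch
BATCH_SIZE = 32

# Cost of signing and verifying each of the 32 results individually.
individual_us = BATCH_SIZE * (SIGN_US + VERIFY_US)

# Speedup of one batch attestation over 32 individual sign+verify pairs.
speedup = individual_us / BATCH_ATTEST_US

# Attestation cost attributed to each authentication in the batch.
per_auth_us = BATCH_ATTEST_US / BATCH_SIZE

print(f"individual: {individual_us} us, speedup: {speedup:.1f}x, "
      f"per auth: {per_auth_us:.1f} us")
```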

  --- Full Pipeline (32-user batch, batch attestation) ---
  FHE batch:          1407 µs
  Batch attest:        247 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ZKP (est):             8 µs  (single batch proof, not 32×)
  ─────────────────────────
  Total batch:        1662 µs  (52 µs/auth)
  FHE share:       85%
  Dilithium share: 15%

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1572 µs
  P95:    1579 µs
  P99:    1584 µs
  Min:    1561 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1428 µs
  P95:    1435 µs
  P99:    1438 µs
  Min:    1418 µs

  Speedup: 1.1x (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 1407 µs total  (44.0 µs/auth)
  P95:    1424 µs
  P99:    1518 µs
  Single-thread throughput: 22743 auth/sec

======================================================================
=== WORKER SWEEP: Testing 5 worker counts ===
======================================================================

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + Dilithium) ===
=== 80 workers, 60 seconds ===
======================================================================
Pipeline per batch: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign → 1 verify
Allocator: system
Setting up 80 worker contexts...
Setup: 157496.9ms
Warming up...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         33973       1087136       1087136           0.9
   2         33925       1085600       2172736           0.9
   3         33997       1087904       3260640           0.9
   4         33943       1086176       4346816           0.9
   5         33948       1086336       5433152           0.9
   6         34016       1088512       6521664           0.9
   7         33980       1087360       7609024           0.9
   8         34042       1089344       8698368           0.9
   9         33968       1086976       9785344           0.9
  10         34016       1088512      10873856           0.9
  11         33921       1085472      11959328           0.9
  12         34025       1088800      13048128           0.9
  13         33948       1086336      14134464           0.9
  14         33977       1087264      15221728           0.9
  15         33974       1087168      16308896           0.9
  16         33979       1087328      17396224           0.9
  17         33956       1086592      18482816           0.9
  18         34016       1088512      19571328           0.9
  19         33938       1086016      20657344           0.9
  20         33937       1085984      21743328           0.9
  21         33945       1086240      22829568           0.9
  22         33897       1084704      23914272           0.9
  23         34038       1089216      25003488           0.9
  24         33988       1087616      26091104           0.9
  25         33970       1087040      27178144           0.9
  26         33939       1086048      28264192           0.9
  27         33950       1086400      29350592           0.9
  28         33951       1086432      30437024           0.9
  29         33983       1087456      31524480           0.9
  30         34013       1088416      32612896           0.9
  31         33947       1086304      33699200           0.9
  32         33963       1086816      34786016           0.9
  33         33967       1086944      35872960           0.9
  34         33947       1086304      36959264           0.9
  35         33929       1085728      38044992           0.9
  36         33923       1085536      39130528           0.9
  37         33945       1086240      40216768           0.9
  38         34010       1088320      41305088           0.9
  39         33989       1087648      42392736           0.9
  40         33918       1085376      43478112           0.9
  41         33972       1087104      44565216           0.9
  42         33922       1085504      45650720           0.9
  43         33896       1084672      46735392           0.9
  44         33983       1087456      47822848           0.9
  45         33989       1087648      48910496           0.9
  46         33966       1086912      49997408           0.9
  47         33963       1086816      51084224           0.9
  48         33943       1086176      52170400           0.9
  49         33978       1087296      53257696           0.9
  50         33983       1087456      54345152           0.9
  51         33954       1086528      55431680           0.9
  52         33941       1086112      56517792           0.9
  53         34005       1088160      57605952           0.9
  54         33960       1086720      58692672           0.9
  55         33977       1087264      59779936           0.9
  56         33943       1086176      60866112           0.9
  57         33994       1087808      61953920           0.9
  58         33951       1086432      63040352           0.9
  59         33974       1087168      64127520           0.9
  60         33995       1087840      65215360           0.9

--- Sustained Throughput Summary (FHE + Batch Attestation) ---
Duration:         60.01s
Workers:          80
Batch ops:        2038122
Effective auths:  65219904 (32 users/batch)
Batch throughput: 33964 batch/sec
Auth throughput:  1086856 auth/sec  (FHE + SHA3 + 1 Dilithium sign+verify)
Per-auth time:    0.9 µs  (amortized: 1 / auth throughput, not end-to-end latency)

Pipeline: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign → 1 verify
ZKP Stark Lookup (est): single batch proof ~8 µs (negligible vs FHE batch)
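
Note: a short sketch relating the summary figures to each other. It assumes all 80 workers were continuously busy for the full run; because the logged duration is rounded to 60.01 s, the recomputed rates differ slightly from the logged ones.

```python
# Figures from the 80-worker summary above.
DURATION_S = 60.01
WORKERS = 80
BATCH_OPS = 2_038_122
USERS_PER_BATCH = 32

effective_auths = BATCH_OPS * USERS_PER_BATCH
batch_per_sec = BATCH_OPS / DURATION_S
auth_per_sec = effective_auths / DURATION_S

# Amortized time per authentication across all workers (inverse throughput).
amortized_us_per_auth = 1e6 / auth_per_sec

# Wall-clock time one worker spends on one batch, assuming every worker
# is busy for the whole run.
per_batch_us = WORKERS / batch_per_sec * 1e6

print(f"{batch_per_sec:.0f} batch/s, {auth_per_sec:.0f} auth/s, "
      f"{amortized_us_per_auth:.2f} us/auth amortized, "
      f"{per_batch_us:.0f} us per batch per worker")
```

Under that assumption each worker spends roughly 2.4 ms per batch, compared with the 1662 µs isolated batch total in Benchmark 0, so the 0.9 µs figure reflects parallel throughput rather than the time a single request takes.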

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + Dilithium) ===
=== 96 workers, 60 seconds ===
======================================================================
Pipeline per batch: FHE batch_verify_multi → SHA3(32 results) → 1 Dilithium sign → 1 verify
Allocator: system
Setting up 96 worker contexts...
