H33 Graviton4 Benchmark Suite
=============================
Sustained: 60 s, Workers: 32, Latency test: 15 s per target rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 32
ZKP cache: In-process DashMap (zero network)
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
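
The fused forward twiddles above can be illustrated with a minimal sketch. It
assumes 64-bit Montgomery arithmetic with R = 2^64; the names Mont, redc,
fused_twiddles, pre_twist_fused and pre_twist_unfused are hypothetical and not
taken from the benchmarked code. The modulus 2^61 - 1 and psi = 3 are
placeholders chosen only to make the arithmetic checkable, not the 56-bit
modulus or root of unity used in the run.

```rust
/// Montgomery context for an odd modulus q < 2^62, with R = 2^64 (illustrative).
pub struct Mont {
    pub q: u64,
    pub q_neg_inv: u64, // -q^{-1} mod 2^64
    pub r2: u64,        // R^2 mod q
}

impl Mont {
    pub fn new(q: u64) -> Self {
        assert!(q % 2 == 1 && q < (1u64 << 62));
        // Newton iteration for q^{-1} mod 2^64; each step doubles the correct bits.
        let mut inv = q;
        for _ in 0..6 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
        }
        let r = (1u128 << 64) % q as u128;
        let r2 = ((r * r) % q as u128) as u64;
        Mont { q, q_neg_inv: inv.wrapping_neg(), r2 }
    }

    /// One REDC: returns t * R^{-1} mod q, for t < q * 2^64.
    pub fn redc(&self, t: u128) -> u64 {
        let m = (t as u64).wrapping_mul(self.q_neg_inv);
        let u = ((t + m as u128 * self.q as u128) >> 64) as u64;
        if u >= self.q { u - self.q } else { u }
    }

    /// Multiply followed by one REDC.
    pub fn mul(&self, a: u64, b: u64) -> u64 {
        self.redc(a as u128 * b as u128)
    }
}

/// Unfused pre-twist: 2 REDCs per coefficient
/// (enter Montgomery form, then multiply by psi^i).
pub fn pre_twist_unfused(m: &Mont, a: &[u64], psi: u64) -> Vec<u64> {
    let psi_mont = m.mul(psi, m.r2);
    let mut psi_pow_mont = m.mul(1, m.r2); // psi^0 in Montgomery form
    a.iter()
        .map(|&x| {
            let x_mont = m.mul(x, m.r2);         // REDC #1: x * R
            let y = m.mul(x_mont, psi_pow_mont); // REDC #2: x * psi^i * R
            psi_pow_mont = m.mul(psi_pow_mont, psi_mont);
            y
        })
        .collect()
}

/// Precomputed table: tw[i] = psi^i * R^2 mod q.
pub fn fused_twiddles(m: &Mont, psi: u64, n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    let mut p = 1u64; // psi^i in plain (non-Montgomery) form
    for _ in 0..n {
        out.push(((p as u128 * m.r2 as u128) % m.q as u128) as u64);
        p = ((p as u128 * psi as u128) % m.q as u128) as u64;
    }
    out
}

/// Fused pre-twist: 1 REDC per coefficient, REDC(x * psi^i * R^2) = x * psi^i * R.
pub fn pre_twist_fused(m: &Mont, a: &[u64], tw: &[u64]) -> Vec<u64> {
    a.iter().zip(tw).map(|(&x, &t)| m.mul(x, t)).collect()
}
```

Both functions return a[i] * psi^i in Montgomery form, so the fused table
removes one REDC per coefficient (4096 per transform at N=4096). The inverse
side follows the same idea, assuming the butterflies keep values in Montgomery
form: because fused_inv_post[i] carries no R factor, a single REDC of
(x * R) * (n^-1 * psi^-i) applies the scaling and the untwist and leaves
Montgomery form at once.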

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:         1077 µs     1083 µs     1087 µs
  ZKP raw:           3.622 µs    3.652 µs    3.667 µs
  ZKP cached:        0.054 µs    0.055 µs    0.056 µs  (in-process DashMap)
  Dilithium sign:      436 µs      441 µs      443 µs
  Dilithium verify:    108 µs      108 µs      112 µs
  ─────────────────────────
  Total (single):     1621 µs  (FHE 66% + ZKP 0.00% + sign 27% + verify 7%)
  Cachee speedup:  67× vs raw ZKP (3.622 µs → 0.054 µs)

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        260 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:              8 µs  (amortized)
  vs 32× individual: 67× faster  (32 × (436 + 108) µs = 17408 µs vs 260 µs)

  --- Full Pipeline (32-user batch, ZKP via Cachee) ---
  FHE batch:          1077 µs
  ZKP (32 Cachee):     1.7 µs  (0.054 µs/lookup)
  Batch attest:        260 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ─────────────────────────
  Total batch:        1339 µs  (42 µs/auth)
  FHE share:       80%
  ZKP share:       0.1%
  Dilithium share: 19%
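
The amortization figures above follow from the reported medians. The sketch
below redoes that arithmetic; the constant and function names are hypothetical,
and the values are the medians exactly as printed in this section.

```rust
/// Medians reported in BENCHMARK 0, in microseconds.
pub const FHE_BATCH_US: f64 = 1077.0;
pub const ZKP_CACHED_US: f64 = 0.054;
pub const SIGN_US: f64 = 436.0;
pub const VERIFY_US: f64 = 108.0;
pub const BATCH_ATTEST_US: f64 = 260.0;

/// Cost of signing and verifying each of n results individually.
pub fn individual_attest_us(n: u32) -> f64 {
    n as f64 * (SIGN_US + VERIFY_US)
}

/// Speedup of one batch attestation over n individual sign+verify pairs.
pub fn attest_speedup(n: u32) -> f64 {
    individual_attest_us(n) / BATCH_ATTEST_US
}

/// Full pipeline cost for one batch of n users:
/// one FHE batch, n cached ZKP lookups, one batch attestation.
pub fn pipeline_total_us(n: u32) -> f64 {
    FHE_BATCH_US + n as f64 * ZKP_CACHED_US + BATCH_ATTEST_US
}

/// Share of the pipeline total spent in one component, in percent.
pub fn share_pct(component_us: f64, total_us: f64) -> f64 {
    100.0 * component_us / total_us
}
```

For n = 32 this gives about 67× for the attestation speedup, 1339 µs for the
batch total (about 42 µs per auth), and an FHE share of about 80%.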

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1488 µs
  P95:    1496 µs
  P99:    1499 µs
  Min:    1480 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1090 µs
  P95:    1098 µs
  P99:    1101 µs
  Min:    1084 µs

  Speedup: 1.37× (median, 1488 µs → 1090 µs)

batch_verify_multi (SIMD, 32 probes):
  Median: 1085 µs total  (33.9 µs/auth)
  P95:    1099 µs
  P99:    1121 µs
  Single-thread throughput: 29493 auth/sec

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + ZKP + Dilithium) ===
=== 32 workers, 60 seconds, cache: In-process DashMap (zero network) ===
======================================================================
Pipeline: FHE → ZKP cache (32 lookups) → SHA3 → Dilithium sign+verify
Allocator: system
ZKP cache: in-process DashMap (zero network overhead)
Setting up 32 worker contexts...
Setup: 60800.8 ms
Warming up (populating ZKP cache)...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         16193        518176        518176           1.9
   2         16208        518656       1036832           1.9
   3         16237        519584       1556416           1.9
   4         16207        518624       2075040           1.9
   5         16184        517888       2592928           1.9
   6         16178        517696       3110624           1.9
   7         16176        517632       3628256           1.9
   8         16179        517728       4145984           1.9
   9         16177        517664       4663648           1.9
  10         16214        518848       5182496           1.9
  11         16202        518464       5700960           1.9
  12         16157        517024       6217984           1.9
  13         16164        517248       6735232           1.9
  14         16185        517920       7253152           1.9
  15         16206        518592       7771744           1.9
  16         16209        518688       8290432           1.9
  17         16252        520064       8810496           1.9
  18         16225        519200       9329696           1.9
  19         16199        518368       9848064           1.9
  20         16178        517696      10365760           1.9
  21         16181        517792      10883552           1.9
  22         16219        519008      11402560           1.9
  23         16196        518272      11920832           1.9
  24         16248        519936      12440768           1.9
  25         16251        520032      12960800           1.9
  26         16206        518592      13479392           1.9
  27         16250        520000      13999392           1.9
  28         16200        518400      14517792           1.9
  29         16227        519264      15037056           1.9
  30         16268        520576      15557632           1.9
  31         16251        520032      16077664           1.9
  32         16225        519200      16596864           1.9
  33         16285        521120      17117984           1.9
  34         16175        517600      17635584           1.9
  35         16213        518816      18154400           1.9
  36         16232        519424      18673824           1.9
  37         16189        518048      19191872           1.9
  38         16162        517184      19709056           1.9
  39         16202        518464      20227520           1.9
  40         16178        517696      20745216           1.9
  41         16207        518624      21263840           1.9
  42         16154        516928      21780768           1.9
  43         16109        515488      22296256           1.9
  44         16181        517792      22814048           1.9
  45         16167        517344      23331392           1.9
  46         16179        517728      23849120           1.9
  47         16177        517664      24366784           1.9
  48         16144        516608      24883392           1.9
  49         16169        517408      25400800           1.9
  50         16102        515264      25916064           1.9
  51         16192        518144      26434208           1.9
  52         16170        517440      26951648           1.9
  53         16184        517888      27469536           1.9
  54         16201        518432      27987968           1.9
  55         16221        519072      28507040           1.9
  56         16175        517600      29024640           1.9
  57         16236        519552      29544192           1.9
  58         16198        518336      30062528           1.9
  59         16159        517088      30579616           1.9

--- Sustained Throughput Summary (FHE + ZKP + Attestation) ---
Duration:         60.01s
Workers:          32
Cache mode:       In-process DashMap (zero network)
Batch ops:        971875
Effective auths:  31100000 (32 users/batch)
Batch throughput: 16196 batch/sec
Auth throughput:  518283 auth/sec  (FHE + ZKP + Dilithium)
Per-auth latency: 1.9 µs  (1 / auth throughput, aggregated over 32 workers; not a per-request latency)

ZKP cache stats:
  Cache hits:   31100000 (100.0%)
  Cache misses: 0
  DashMap entries: 1024

Pipeline: FHE → ZKP cache (in-process DashMap, zero network) → SHA3 → Dilithium sign → verify
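
The columns of the sustained-load table are related by simple arithmetic, which
the sketch below makes explicit. The function names are hypothetical. The
per-worker figure assumes the 32 workers run fully in parallel and share the
load evenly, which the log does not state.

```rust
/// EffAuth/s = BatchOps/s × users per batch.
pub fn eff_auth_per_sec(batch_per_sec: f64, users_per_batch: u32) -> f64 {
    batch_per_sec * users_per_batch as f64
}

/// "Per-auth µs" as printed: the inverse of aggregate auth throughput.
pub fn inverse_throughput_us(auth_per_sec: f64) -> f64 {
    1e6 / auth_per_sec
}

/// Wall time one worker spends per batch, in ms, assuming `workers`
/// workers in parallel each completing an equal share of the batches.
pub fn per_worker_batch_ms(workers: u32, batch_per_sec: f64) -> f64 {
    workers as f64 / batch_per_sec * 1e3
}
```

With the summary values, 518283 auth/sec corresponds to about 1.9 µs per auth,
and 16196 batch/sec across 32 workers corresponds to roughly 1.98 ms per batch
per worker under the stated assumption, compared with the 1339 µs isolated
batch total reported in BENCHMARK 0.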

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

    Rate/s  Achieved/s      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.158       4.165       4.211       4.213
      1000        1000       3.259       4.339       4.345       4.364
      5000        4997       3.271       4.361       4.372       4.415
     10000        9991       2.239       3.381       4.359       4.446
     25000       24943       2.163       2.273       3.162       4.326
     50000       49760       1.174       2.251       3.012       3.300

--- Fallback Path (individual verify_encrypted) ---
  P50: 1487 µs
  P95: 1494 µs
  P99: 1504 µs

--- Accumulator Metrics ---
  Batches flushed:   51651
  Requests batched:  1366510
  Requests fallback: 0

======================================================================
=== BENCHMARK 4: CKKS Operations (Approximate Arithmetic) ===
======================================================================
  CKKS mode: turbo (N=8192, slots=4096)
bash: line 1: 1054799 Aborted                 (core dumped) SUSTAINED_SECS=60 WORKERS=32 LATENCY_SECS=15 RAYON_NUM_THREADS=32 CACHEE_MODE=inprocess cargo run --release --features parallel --example graviton4_bench 2> /dev/null
