H33 Graviton4 Benchmark Suite
=============================
Sustained: 60s, Workers: 32, Latency: 15s/rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 32
ZKP cache: In-process DashMap (zero network)
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
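
Note on the fused pre-twist: the forward table above folds the conversion into Montgomery form (multiply by R^2, then REDC) and the negacyclic pre-twist (multiply by psi^i) into one table entry, so each coefficient needs one REDC instead of two. The sketch below checks that identity. It is not the benchmark's code: the toy parameters q = 17, psi = 3, N = 8, the choice R = 2^64, and the names `Mont`, `mul_mod`, and `fused_twiddles` are all assumptions made for illustration.

```rust
/// Montgomery arithmetic for an odd modulus q < 2^63, with R = 2^64 (assumed).
struct Mont {
    q: u64,
    q_neg_inv: u64, // -q^{-1} mod 2^64
    r2: u64,        // R^2 mod q
}

impl Mont {
    fn new(q: u64) -> Self {
        assert!(q % 2 == 1 && q < (1u64 << 63));
        // Newton iteration: each step doubles the number of correct low bits.
        let mut inv: u64 = 1;
        for _ in 0..6 {
            inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
        }
        let r = ((1u128 << 64) % q as u128) as u64;
        let r2 = ((r as u128 * r as u128) % q as u128) as u64;
        Mont { q, q_neg_inv: inv.wrapping_neg(), r2 }
    }

    /// REDC(t) = t * R^{-1} mod q, valid for t < q * R.
    fn redc(&self, t: u128) -> u64 {
        let m = (t as u64).wrapping_mul(self.q_neg_inv);
        let u = ((t + m as u128 * self.q as u128) >> 64) as u64;
        if u >= self.q { u - self.q } else { u }
    }
}

/// Plain (non-Montgomery) modular multiplication, used only for setup and checks.
fn mul_mod(a: u64, b: u64, q: u64) -> u64 {
    ((a as u128 * b as u128) % q as u128) as u64
}

/// twiddles_fused[i] = psi^i * R^2 mod q, as described in the log header.
fn fused_twiddles(m: &Mont, psi: u64, n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    let mut psi_pow = 1u64; // psi^i in plain form
    for _ in 0..n {
        out.push(mul_mod(psi_pow, m.r2, m.q));
        psi_pow = mul_mod(psi_pow, psi, m.q);
    }
    out
}

fn main() {
    // Toy parameters (assumption): 3 has order 16 mod 17, so psi^8 = -1 mod q.
    let (q, psi, n) = (17u64, 3u64, 8usize);
    let m = Mont::new(q);
    let fused = fused_twiddles(&m, psi, n);
    let mut psi_pow = 1u64;
    for i in 0..n {
        let psi_mont = m.redc(psi_pow as u128 * m.r2 as u128);
        for a in 0..q {
            // Unfused path: convert to Montgomery form, then twist (2 REDCs).
            let a_mont = m.redc(a as u128 * m.r2 as u128);
            let two_step = m.redc(a_mont as u128 * psi_mont as u128);
            // Fused path: a single REDC against the precomputed table.
            let one_step = m.redc(a as u128 * fused[i] as u128);
            assert_eq!(one_step, two_step);
            // Leaving Montgomery form recovers a * psi^i mod q.
            assert_eq!(m.redc(one_step as u128), mul_mod(a, psi_pow, q));
        }
        psi_pow = mul_mod(psi_pow, psi, q);
    }
    println!("fused pre-twist matches the two-REDC path for all inputs");
}
```

The inverse table works the same way in the other direction: multiplying a Montgomery-form value by the plain constant n^-1 * psi^-i and applying one REDC would scale, untwist, and leave Montgomery form in a single step. With N = 4096 coefficients, one REDC saved per coefficient is consistent with the "saves 4096 REDCs/NTT" figures above.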

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:         1079 µs     1085 µs     1090 µs
  ZKP raw:           3.622 µs    3.654 µs    3.679 µs
  ZKP cached:        0.056 µs    0.057 µs    0.058 µs  (in-process DashMap)
  Dilithium sign:      150 µs      152 µs      154 µs
  Dilithium verify:    108 µs      109 µs      112 µs
  ─────────────────────────
  Total (single):     1337 µs  (FHE 81% + ZKP 0.00% + sign 11% + verify 8%)
  Cachee speedup:  64× vs raw ZKP (3.622 µs → 0.056 µs)

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        498 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:             16 µs  (amortized)
  vs 32× individual: 17× faster

  --- Full Pipeline (32-user batch, ZKP via Cachee) ---
  FHE batch:          1079 µs
  ZKP (32 Cachee):     1.8 µs  (0.056 µs/lookup)
  Batch attest:        498 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ─────────────────────────
  Total batch:        1579 µs  (49 µs/auth)
  FHE share:       68%
  ZKP share:       0.1%
  Dilithium share: 32%
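
The totals and shares above can be reproduced from the three stage latencies. The sketch below does that arithmetic and recomputes the batch-attestation speedup from the isolated sign and verify medians. The struct and function names are hypothetical; the numbers are copied from this log.

```rust
/// Stage latencies in microseconds, copied from the Full Pipeline table.
struct Pipeline {
    fhe_us: f64,
    zkp_us: f64,
    attest_us: f64,
    batch: u32,
}

impl Pipeline {
    fn total_us(&self) -> f64 {
        self.fhe_us + self.zkp_us + self.attest_us
    }

    fn per_auth_us(&self) -> f64 {
        self.total_us() / self.batch as f64
    }

    /// Share of the batch total spent in one stage, in percent.
    fn share_pct(&self, stage_us: f64) -> f64 {
        100.0 * stage_us / self.total_us()
    }
}

/// Speedup of one batch attestation over `batch` individual sign + verify pairs.
fn attest_speedup(sign_us: f64, verify_us: f64, batch: u32, batch_attest_us: f64) -> f64 {
    batch as f64 * (sign_us + verify_us) / batch_attest_us
}

fn main() {
    let p = Pipeline { fhe_us: 1079.0, zkp_us: 1.8, attest_us: 498.0, batch: 32 };
    println!("Total batch:     {:.0} µs ({:.0} µs/auth)", p.total_us(), p.per_auth_us());
    println!("FHE share:       {:.0}%", p.share_pct(p.fhe_us));
    println!("ZKP share:       {:.1}%", p.share_pct(p.zkp_us));
    println!("Dilithium share: {:.0}%", p.share_pct(p.attest_us));
    println!("Attest speedup:  {:.0}x", attest_speedup(150.0, 108.0, 32, 498.0));
}
```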

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1489 µs
  P95:    1496 µs
  P99:    1500 µs
  Min:    1481 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1093 µs
  P95:    1101 µs
  P99:    1112 µs
  Min:    1087 µs

  Speedup: 1.4x (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 1083 µs total  (33.8 µs/auth)
  P95:    1091 µs
  P99:    1098 µs
  Single-thread throughput: 29548 auth/sec

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + ZKP + Dilithium) ===
=== 32 workers, 60 seconds, cache: In-process DashMap (zero network) ===
======================================================================
Pipeline: FHE → ZKP cache (32 lookups) → SHA3 → Dilithium sign+verify
Allocator: system
ZKP cache: in-process DashMap (zero network overhead)
Setting up 32 worker contexts...
Setup: 60803.2ms
Warming up (populating ZKP cache)...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         16411        525152        525152           1.9
   2         16408        525056       1050208           1.9
   3         16488        527616       1577824           1.9
   4         16511        528352       2106176           1.9
   5         16504        528128       2634304           1.9
   6         16551        529632       3163936           1.9
   7         16505        528160       3692096           1.9
   8         16503        528096       4220192           1.9
   9         16477        527264       4747456           1.9
  10         16485        527520       5274976           1.9
  11         16472        527104       5802080           1.9
  12         16418        525376       6327456           1.9
  13         16456        526592       6854048           1.9
  14         16470        527040       7381088           1.9
  15         16443        526176       7907264           1.9
  16         16475        527200       8434464           1.9
  17         16435        525920       8960384           1.9
  18         16394        524608       9484992           1.9
  19         16438        526016      10011008           1.9
  20         16475        527200      10538208           1.9
  21         16458        526656      11064864           1.9
  22         16523        528736      11593600           1.9
  23         16485        527520      12121120           1.9
  24         16451        526432      12647552           1.9
  25         16445        526240      13173792           1.9
  26         16477        527264      13701056           1.9
  27         16459        526688      14227744           1.9
  28         16483        527456      14755200           1.9
  29         16453        526496      15281696           1.9
  30         16473        527136      15808832           1.9
  31         16449        526368      16335200           1.9
  32         16451        526432      16861632           1.9
  33         16417        525344      17386976           1.9
  34         16469        527008      17913984           1.9
  35         16414        525248      18439232           1.9
  36         16465        526880      18966112           1.9
  37         16403        524896      19491008           1.9
  38         16439        526048      20017056           1.9
  39         16433        525856      20542912           1.9
  40         16448        526336      21069248           1.9
  41         16420        525440      21594688           1.9
  42         16476        527232      22121920           1.9
  43         16500        528000      22649920           1.9
  44         16436        525952      23175872           1.9
  45         16498        527936      23703808           1.9
  46         16446        526272      24230080           1.9
  47         16492        527744      24757824           1.9
  48         16447        526304      25284128           1.9
  49         16502        528064      25812192           1.9
  50         16473        527136      26339328           1.9
  51         16450        526400      26865728           1.9
  52         16431        525792      27391520           1.9
  53         16466        526912      27918432           1.9
  54         16476        527232      28445664           1.9
  55         16467        526944      28972608           1.9
  56         16481        527392      29500000           1.9
  57         16482        527424      30027424           1.9
  58         16450        526400      30553824           1.9
  59         16432        525824      31079648           1.9

--- Sustained Throughput Summary (FHE + ZKP + Attestation) ---
Duration:         60.00s
Workers:          32
Cache mode:       In-process DashMap (zero network)
Batch ops:        987683
Effective auths:  31605856 (32 users/batch)
Batch throughput: 16460 batch/sec
Auth throughput:  526723 auth/sec  (FHE + ZKP + Dilithium)
Per-auth latency: 1.9 µs  (amortized: 1 / auth throughput)

ZKP cache stats:
  Cache hits:   31605856 (100.0%)
  Cache misses: 0
  DashMap entries: 1024

Pipeline: FHE → ZKP cache (In-process DashMap (zero network)) → SHA3 → Dilithium sign → verify
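
Reading the summary: the 1.9 µs per-auth figure matches the inverse of the aggregate auth throughput, not the wall-clock time of a single request. The sketch below reproduces the summary arithmetic. It also estimates the per-worker batch time, assuming each of the 32 workers processes one batch at a time; that scheduling model is an assumption, and the function names are hypothetical.

```rust
/// Effective authentications = batches completed × users per batch.
fn effective_auths(batch_ops: u64, users_per_batch: u64) -> u64 {
    batch_ops * users_per_batch
}

/// Amortized cost per authentication in microseconds (inverse throughput).
fn amortized_us_per_auth(auth_per_sec: f64) -> f64 {
    1.0e6 / auth_per_sec
}

/// Estimated wall-clock time per batch on one worker, in milliseconds,
/// assuming every worker runs its batches sequentially.
fn per_worker_batch_ms(workers: u32, batch_per_sec: f64) -> f64 {
    1.0e3 * workers as f64 / batch_per_sec
}

fn main() {
    // Values copied from the Sustained Throughput Summary.
    let auths = effective_auths(987_683, 32);
    println!("Effective auths:      {}", auths);
    println!("Amortized per auth:   {:.1} µs", amortized_us_per_auth(526_723.0));
    println!("Per-worker batch time: {:.2} ms", per_worker_batch_ms(32, 16_460.0));
}
```

Under that assumption a worker spends roughly 1.94 ms on each 32-user batch, somewhat above the 1579 µs isolated batch total from Benchmark 0.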

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

      Rate    Achieved      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.160       4.168       4.216       4.219
      1000        1000       3.260       4.339       4.345       4.380
      5000        4998       3.268       4.356       4.367       4.478
     10000        9991       2.240       3.386       4.359       4.431
     25000       24944       2.161       2.273       3.161       4.337
     50000       49757       1.177       2.254       3.017       4.320

--- Fallback Path (individual verify_encrypted) ---
  P50: 1488 µs
  P95: 1496 µs
  P99: 1517 µs

--- Accumulator Metrics ---
  Batches flushed:   51676
  Requests batched:  1366510
  Requests fallback: 0

======================================================================
=== BENCHMARK 4: CKKS Operations (Approximate Arithmetic) ===
======================================================================
  CKKS mode: turbo (N=8192, slots=4096)
bash: line 1: 1053169 Aborted                 (core dumped) SUSTAINED_SECS=60 WORKERS=32 LATENCY_SECS=15 RAYON_NUM_THREADS=32 CACHEE_MODE=inprocess cargo run --release --features parallel --example graviton4_bench 2> /dev/null
