H33 Graviton4 Benchmark Suite
=============================
Sustained: 30s, Workers: 96, Latency: 10s/rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96
ZKP cache: In-process DashMap (zero network)
FHE mode: biometric_fast (N=4096, Q_bits=[56], t=65537)
NTT path: fused pre-twist (1 REDC) + fused inverse post-twist (1 REDC)
  Forward: twiddles_fused[i] = psi^i * R^2 mod q (saves 4096 REDCs/NTT)
  Inverse: fused_inv_post[i] = n^-1 * psi^-i mod q (saves 4096 REDCs/NTT)
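
The two fused tables above rest on a Montgomery identity. With REDC(t) = t * R^-1 mod q, REDC(a * (psi^i * R^2 mod q)) = a * psi^i * R mod q, so one reduction both applies the pre-twist and moves the coefficient into Montgomery form. Likewise, REDC(x*R * (n^-1 * psi^-i mod q)) = x * n^-1 * psi^-i mod q, so one reduction applies the post-twist, scales by n^-1, and leaves Montgomery form. The Python sketch below checks both identities. The toy parameters (q = 17, n = 8, psi = 3, R = 2^32) and the helper names are assumptions for illustration; this is not the benchmarked implementation, which uses a 56-bit modulus and N = 4096.

```python
# Toy check of the fused-twist identities (illustrative parameters only).
Q = 17             # toy prime with Q = 1 (mod 2N); the benchmark uses a 56-bit modulus
N = 8              # toy ring degree; the benchmark uses N = 4096
PSI = 3            # primitive 2N-th root of unity mod Q (3^8 = -1 mod 17)
R_BITS = 32
R = 1 << R_BITS    # Montgomery radix (assumed for this sketch)
Q_NEG_INV = (-pow(Q, -1, R)) % R   # -Q^-1 mod R, precomputed once


def redc(t):
    """Montgomery reduction: return t * R^-1 mod Q for 0 <= t < Q * R."""
    m = (t * Q_NEG_INV) & (R - 1)
    u = (t + m * Q) >> R_BITS      # exact division: t + m*Q is a multiple of R
    return u - Q if u >= Q else u


# Forward table: psi^i * R^2 mod Q (twist and to-Montgomery conversion fused).
twiddles_fused = [pow(PSI, i, Q) * (R * R % Q) % Q for i in range(N)]

# Inverse table: n^-1 * psi^-i mod Q, stored in plain (non-Montgomery) form.
N_INV = pow(N, -1, Q)
PSI_INV = pow(PSI, -1, Q)
fused_inv_post = [N_INV * pow(PSI_INV, i, Q) % Q for i in range(N)]


def pre_twist(a):
    """Plain coefficients in, Montgomery-form a[i] * psi^i out; one REDC each."""
    return [redc(a[i] * twiddles_fused[i]) for i in range(N)]


def post_twist(x_mont):
    """Montgomery-form x[i] in, plain x[i] * n^-1 * psi^-i out; one REDC each."""
    return [redc(x_mont[i] * fused_inv_post[i]) for i in range(N)]


if __name__ == "__main__":
    a = list(range(1, N + 1))
    twisted = pre_twist(a)
    # With no NTT in between, pre-twist then post-twist leaves a[i] * n^-1 mod Q.
    print([v * N % Q for v in post_twist(twisted)] == a)
```

The unfused alternative would spend one REDC on the Montgomery conversion and another on the twist multiply for every coefficient; folding the constants into the table removes one of the two, which is the per-coefficient saving the log lines above count.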

======================================================================
=== BENCHMARK 0: Component Latencies (isolated) ===
======================================================================

  Component          Median       P95         P99
  ---------          ------       ---         ---
  FHE verify:          958 µs      963 µs      966 µs
  ZKP raw:           3.734 µs    3.762 µs    3.777 µs
  ZKP cached:        0.060 µs    0.061 µs    0.061 µs  (in-process DashMap)
  Dilithium sign:      275 µs      275 µs      277 µs
  Dilithium verify:    112 µs      112 µs      113 µs
  ─────────────────────────
  Total (single):     1345 µs  (FHE 71% + ZKP 0.00% + sign 20% + verify 8%)
  Cachee speedup:  62× vs raw ZKP (3.734 µs → 0.060 µs)

  --- Batch Attestation (SHA3 digest + 1 sign + 1 verify) ---
  Batch attest:        732 µs  (SHA3 + 1 sign + 1 verify for 32 results)
  Per auth:             23 µs  (amortized)
  vs 32× individual: 17× faster

  --- Full Pipeline (32-user batch, ZKP via Cachee) ---
  FHE batch:           958 µs
  ZKP (32 Cachee):     1.9 µs  (0.060 µs/lookup)
  Batch attest:        732 µs  (SHA3 + 1 Dilithium sign + 1 verify)
  ─────────────────────────
  Total batch:        1692 µs  (53 µs/auth)
  FHE share:       57%
  ZKP share:       0.1%
  Dilithium share: 43%
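
The totals and shares in the full-pipeline block follow directly from the three stage latencies. A minimal Python sketch of that arithmetic, using only the figures printed above (the function and variable names are hypothetical):

```python
# Stage latencies copied from the "Full Pipeline" rows above, in µs per 32-user batch.
BATCH_SIZE = 32
stages_us = {
    "FHE batch": 958.0,
    "ZKP (32 Cachee)": 1.9,
    "Batch attest": 732.0,
}


def pipeline_summary(stages, batch_size):
    """Return (total µs, amortized µs per auth, percentage share per stage)."""
    total = sum(stages.values())
    shares = {name: 100.0 * t / total for name, t in stages.items()}
    return total, total / batch_size, shares


total_us, per_auth_us, shares = pipeline_summary(stages_us, BATCH_SIZE)

if __name__ == "__main__":
    print(f"Total batch: {total_us:.0f} µs ({per_auth_us:.0f} µs/auth)")
    for name, pct in shares.items():
        print(f"{name}: {pct:.1f}%")
```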

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1472 µs
  P95:    1478 µs
  P99:    1487 µs
  Min:    1464 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 958 µs
  P95:    964 µs
  P99:    967 µs
  Min:    952 µs

  Speedup: 1.5× (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 949 µs total  (29.7 µs/auth)
  P95:    956 µs
  P99:    960 µs
  Single-thread throughput: 33720 auth/sec

======================================================================
=== BENCHMARK 2: Sustained Throughput (FHE + ZKP + Dilithium) ===
=== 96 workers, 30 seconds, cache: In-process DashMap (zero network) ===
======================================================================
Pipeline: FHE → ZKP cache (32 lookups) → SHA3 → Dilithium sign+verify
Allocator: system
ZKP cache: in-process DashMap (zero network overhead)
Setting up 96 worker contexts...
Setup: 188995.8 ms
Warming up (populating ZKP cache)...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         62087       1986784       1986784           0.5
   2         68792       2201344       4188128           0.5
   3         68726       2199232       6387360           0.5
   4         68930       2205760       8593120           0.5
   5         68635       2196320      10789440           0.5
   6         68590       2194880      12984320           0.5
   7         68525       2192800      15177120           0.5
   8         68366       2187712      17364832           0.5
   9         68716       2198912      19563744           0.5
  10         68686       2197952      21761696           0.5
  11         68768       2200576      23962272           0.5
  12         68609       2195488      26157760           0.5
  13         68396       2188672      28346432           0.5
  14         68177       2181664      30528096           0.5
  15         68049       2177568      32705664           0.5
  16         67909       2173088      34878752           0.5
  17         67845       2171040      37049792           0.5
  18         68026       2176832      39226624           0.5
  19         68077       2178464      41405088           0.5
  20         67950       2174400      43579488           0.5
  21         67861       2171552      45751040           0.5
  22         68077       2178464      47929504           0.5
  23         68275       2184800      50114304           0.5
  24         68116       2179712      52294016           0.5
  25         68178       2181696      54475712           0.5
  26         68137       2180384      56656096           0.5
  27         68093       2178976      58835072           0.5
  28         68288       2185216      61020288           0.5
  29         68662       2197184      63217472           0.5
  30         68358       2187456      65404928           0.5

--- Sustained Throughput Summary (FHE + ZKP + Attestation) ---
Duration:         30.01s
Workers:          96
Cache mode:       In-process DashMap (zero network)
Batch ops:        2044148
Effective auths:  65412736 (32 users/batch)
Batch throughput: 68120 batch/sec
Auth throughput:  2179828 auth/sec  (FHE + ZKP + Dilithium)
Per-auth latency: 0.5 µs  (amortized: 1 / auth throughput across all 96 workers, not per-request latency)
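
The summary figures can be cross-checked from the raw counters. The Python sketch below uses only values printed in this summary (variable names are hypothetical) and shows that the 0.5 µs figure is the inverse of aggregate throughput, while each worker spends roughly 1.4 ms of wall-clock time per 32-user batch.

```python
# Values copied from the sustained-throughput summary above.
batch_ops = 2_044_148        # Batch ops
users_per_batch = 32         # users/batch
batch_per_sec = 68_120       # Batch throughput (reported)
auth_per_sec = 2_179_828     # Auth throughput (reported)
workers = 96

# Each batch op authenticates 32 users.
effective_auths = batch_ops * users_per_batch

# Amortized time per auth: inverse of aggregate throughput, in µs.
per_auth_us = 1e6 / auth_per_sec

# Wall-clock time one worker spends per batch, and per auth within that batch.
per_worker_batch_us = workers / batch_per_sec * 1e6
per_worker_auth_us = per_worker_batch_us / users_per_batch

if __name__ == "__main__":
    print(f"Effective auths:      {effective_auths}")
    print(f"Amortized per auth:   {per_auth_us:.2f} µs")
    print(f"Per-worker per batch: {per_worker_batch_us:.0f} µs")
    print(f"Per-worker per auth:  {per_worker_auth_us:.1f} µs")
```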

ZKP cache stats:
  Cache hits:   65412736 (100.0%)
  Cache misses: 0
  DashMap entries: 3072

Pipeline: FHE → ZKP cache (in-process DashMap, zero network) → SHA3 → Dilithium sign → verify

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

      Rate    Achieved      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.045       4.053       4.101       4.105
      1000        1000       3.116       4.124       4.182       4.344
      5000        4998       3.134       4.156       4.208       4.371
     10000        9990       2.120       3.211       4.165       4.215
     25000       24941       2.019       2.148       3.160       4.176
     50000       49777       1.063       2.149       2.991       3.958

--- Fallback Path (individual verify_encrypted) ---
  P50: 1489 µs
  P95: 1499 µs
  P99: 1503 µs

--- Accumulator Metrics ---
  Batches flushed:   34158
  Requests batched:  911010
  Requests fallback: 0

======================================================================
=== BENCHMARK 4: CKKS Operations (Approximate Arithmetic) ===
======================================================================
  CKKS mode: turbo (N=8192, slots=4096)

  CKKS Latency Table (N=8192, 4096 slots)
  -------------------------------------------------------
  Operation                           Latency   per slot
  -------------------------------------------------------
  Encode + Encrypt                 11684.9 µs   2.853 µs
  Addition                           213.1 µs   0.052 µs
  Multiply + Relin                 13946.4 µs   3.405 µs
  Rescale                          14836.3 µs   3.622 µs
  Rotation (step=1)                 4226.7 µs   1.032 µs
  Dot Product (4 terms)            24419.8 µs   5.962 µs
  Bootstrap                       267611.3 µs  65.335 µs
  CKKS → BFV switch                 1253.1 µs   0.306 µs
  -------------------------------------------------------

======================================================================
=== BENCHMARK COMPLETE ===
======================================================================

Full lifecycle: FHE → ZKP (in-process DashMap, zero network) → Dilithium attestation
CKKS: encode/encrypt/add/multiply/rescale/rotate/bootstrap/scheme-switch
