H33 Graviton4 Benchmark Suite
=============================
Sustained: 60s, Workers: 96, Latency: 15s/rate
Allocator: system
Architecture: aarch64 (ARM/Graviton)
Rayon threads: 96

======================================================================
=== BENCHMARK 1: SIMD Single-User Inner Product ===
======================================================================
Comparing: verify_encrypted (chunked) vs batch_verify_multi (SIMD packed)

verify_encrypted (chunked):
  Median: 1576 µs
  P95:    1586 µs
  P99:    1596 µs
  Min:    1567 µs

batch_verify_multi (SIMD, 1 probe):
  Median: 1431 µs
  P95:    1441 µs
  P99:    1465 µs
  Min:    1422 µs

  Speedup: 1.1x (median)

batch_verify_multi (SIMD, 32 probes):
  Median: 1375 µs total  (43.0 µs/auth)
  P95:    1405 µs
  P99:    1423 µs
  Single-thread throughput: 23273 auth/sec
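
Note: the per-auth and throughput figures for the 32-probe run appear to follow directly from the median batch time. A minimal Python sketch of that arithmetic, assuming the per-auth value is the batch median divided evenly by the probe count (variable names are illustrative and not taken from the benchmark source):

```python
# Values copied from the 32-probe run above.
median_batch_us = 1375   # median time for one batch_verify_multi call, in microseconds
probes = 32              # probes packed into that single call

# Assumption: per-auth cost is the batch median split evenly across the probes.
per_auth_us = median_batch_us / probes

# Single-thread throughput is the inverse of the per-auth cost.
auth_per_sec = 1_000_000 / per_auth_us

print(f"{per_auth_us:.1f} us/auth")
print(f"{auth_per_sec:.0f} auth/sec")
```

Under that assumption the sketch reproduces the 43.0 µs/auth and 23273 auth/sec values reported above.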

======================================================================
=== BENCHMARK 2: 32-User Batch Sustained Throughput ===
=== 96 workers, 60 seconds ===
======================================================================
Allocator: system
Setting up 96 worker contexts...
Setup: 189036.2ms
Warming up...

Starting sustained load...
 Sec    BatchOps/s     EffAuth/s    Total Auth   Per-auth µs
-----------------------------------------------------------------
   1         39518       1264576       1264576           0.8
   2         40056       1281792       2546368           0.8
   3         40308       1289856       3836224           0.8
   4         40425       1293600       5129824           0.8
   5         40316       1290112       6419936           0.8
   6         40185       1285920       7705856           0.8
   7         40343       1290976       8996832           0.8
   8         40210       1286720      10283552           0.8
   9         40221       1287072      11570624           0.8
  10         40274       1288768      12859392           0.8
  11         40312       1289984      14149376           0.8
  12         40130       1284160      15433536           0.8
  13         40295       1289440      16722976           0.8
  14         40279       1288928      18011904           0.8
  15         40276       1288832      19300736           0.8
  16         40221       1287072      20587808           0.8
  17         40261       1288352      21876160           0.8
  18         40382       1292224      23168384           0.8
  19         40447       1294304      24462688           0.8
  20         40425       1293600      25756288           0.8
  21         40301       1289632      27045920           0.8
  22         40420       1293440      28339360           0.8
  23         40216       1286912      29626272           0.8
  24         40279       1288928      30915200           0.8
  25         40424       1293568      32208768           0.8
  26         40335       1290720      33499488           0.8
  27         40248       1287936      34787424           0.8
  28         40532       1297024      36084448           0.8
  29         40372       1291904      37376352           0.8
  30         40390       1292480      38668832           0.8
  31         40334       1290688      39959520           0.8
  32         40330       1290560      41250080           0.8
  33         40256       1288192      42538272           0.8
  34         40458       1294656      43832928           0.8
  35         40506       1296192      45129120           0.8
  36         40417       1293344      46422464           0.8
  37         40384       1292288      47714752           0.8
  38         40442       1294144      49008896           0.8
  39         40518       1296576      50305472           0.8
  40         40653       1300896      51606368           0.8
  41         40289       1289248      52895616           0.8
  42         40377       1292064      54187680           0.8
  43         40449       1294368      55482048           0.8
  44         40293       1289376      56771424           0.8
  45         40536       1297152      58068576           0.8
  46         40335       1290720      59359296           0.8
  47         40455       1294560      60653856           0.8
  48         40428       1293696      61947552           0.8
  49         40356       1291392      63238944           0.8
  50         40484       1295488      64534432           0.8
  51         40481       1295392      65829824           0.8
  52         40516       1296512      67126336           0.8
  53         40479       1295328      68421664           0.8
  54         40432       1293824      69715488           0.8
  55         40453       1294496      71009984           0.8
  56         40319       1290208      72300192           0.8
  57         40468       1294976      73595168           0.8
  58         40358       1291456      74886624           0.8
  59         40460       1294720      76181344           0.8
  60         40553       1297696      77479040           0.8

--- Sustained Throughput Summary ---
Duration:         60.01s
Workers:          96
Batch ops:        2421502
Effective auths:  77488064 (32 users/batch)
Batch throughput: 40350 batch/sec
Auth throughput:  1291207 auth/sec
Per-auth latency: 0.8 µs (amortized: 1e6 / auth throughput, across all 96 workers)
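
Note: the summary figures can be cross-checked from the raw counts. The sketch below is a rough check in Python, not the benchmark's own code. It treats the reported 0.8 µs as the inverse of aggregate throughput rather than the time one request waits, and it assumes each worker processes one batch at a time, back to back. Variable names are illustrative.

```python
# Values copied from the Sustained Throughput Summary above.
duration_s = 60.01                  # reported wall-clock duration (rounded in the log)
workers = 96
batch_ops = 2_421_502
users_per_batch = 32
reported_auth_per_sec = 1_291_207

# Each batch op verifies users_per_batch users.
effective_auths = batch_ops * users_per_batch

# Recomputed from the rounded duration, so it can differ slightly from the log.
batch_per_sec = batch_ops / duration_s

# Amortized cost per auth across the whole machine, not a per-request latency.
amortized_us_per_auth = 1_000_000 / reported_auth_per_sec

# Assumption: one batch in flight per worker, so this approximates per-batch wall time.
per_worker_batch_us = workers / batch_per_sec * 1_000_000

print(effective_auths)
print(f"{batch_per_sec:.0f} batch/sec")
print(f"{amortized_us_per_auth:.2f} us/auth amortized")
print(f"{per_worker_batch_us:.0f} us per batch per worker")
```

The effective-auth count matches the log exactly. The recomputed batch rate lands within a few batch/sec of the reported 40350 because the duration is printed to two decimals.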

======================================================================
=== BENCHMARK 3: BatchAccumulator Latency Distribution ===
======================================================================

      Rate    Achieved      P50 ms      P95 ms      P99 ms      Max ms
----------------------------------------------------------------------
       100         100       4.522       4.536       4.579       4.752
      1000        1000       3.936       5.052       5.064       5.222
      5000        4997       3.993       5.083       5.100       5.288
     10000        9990       2.985       4.059       4.114       5.162
     25000       24941       2.509       2.981       3.466       4.699
     50000       49788       1.543       2.845       3.346       4.545

--- Fallback Path (individual verify_encrypted) ---
  P50: 1595 µs
  P95: 1603 µs
  P99: 1610 µs

--- Accumulator Metrics ---
  Batches flushed:   51450
  Requests batched:  1366510
  Requests fallback: 0
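
Note: the accumulator counters give a rough view of how full the flushed batches were. A small Python sketch, assuming the accumulator's maximum batch size is 32 (taken from the 32-user batches in Benchmark 2; this log does not state the accumulator's limit, and the names below are illustrative):

```python
# Values copied from the Accumulator Metrics above.
batches_flushed = 51_450
requests_batched = 1_366_510
requests_fallback = 0

# Assumption: batches hold at most 32 requests, as in Benchmark 2.
assumed_max_batch = 32

# Mean number of requests per flushed batch over the whole run.
avg_batch_size = requests_batched / batches_flushed

# Fraction of the assumed capacity that was used on average.
fill_ratio = avg_batch_size / assumed_max_batch

# Share of all requests that took the individual verify_encrypted fallback path.
fallback_share = requests_fallback / (requests_batched + requests_fallback)

print(f"{avg_batch_size:.2f} requests/batch")
print(f"{fill_ratio:.0%} of assumed capacity")
print(f"{fallback_share:.0%} fallback")
```

These are averages over all six rate levels combined; the log does not break the counters down per rate, so the fill level at any single rate cannot be recovered from it.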

======================================================================
=== BENCHMARK COMPLETE ===
======================================================================
