ML-DSA-65 (Dilithium) Performance

Version: 1.0.0
Status: Production
Last Updated: 2026-05-23
Hardware: AWS c8g.metal-48xl (Graviton4, 192 vCPU, 371 GiB)
NIST Standard: FIPS 204 (ML-DSA)
Canonical URL: https://h33.ai/benchmarks/dilithium/

1. Scope

This document presents production benchmark data for ML-DSA-65 (formerly CRYSTALS-Dilithium Level 3) as implemented in the H33 cryptographic pipeline. All measurements were recorded on AWS Graviton4 hardware under sustained load conditions (minimum 30 seconds per measurement). Numbers represent median latency with p99 bounds.

2. Definitions

ML-DSA-65: Module-Lattice-Based Digital Signature Algorithm at NIST security category 3 (comparable to the strength of an AES-192 key search). Standardized in NIST FIPS 204. Based on the hardness of the Module Learning With Errors (MLWE) and Module Short Integer Solution (MSIS) problems.
Sign Latency: The wall-clock time to generate a signature over a 32-byte message digest, excluding key generation and I/O.
Verify Latency: The wall-clock time to verify a signature against a 32-byte message digest and public key.
Batch Signing: The process of signing multiple message digests sequentially using the same key pair, with amortized key loading overhead.

3. Hardware and Environment

Property	Value
Instance	AWS c8g.metal-48xl
Processor	Graviton4 (Arm Neoverse V2)
vCPUs	192
Memory	371 GiB
OS	Amazon Linux 2023 (kernel 6.1)
Rust	1.78.0 (stable)
Allocator	System (glibc)
Build flags	`target-cpu=native`
SIMD	NEON + SVE2 (128-bit)

4. Core Operations

4.1. Single-Operation Latencies

Operation	Median	p99	Min	Samples
Key generation	28 us	34 us	26 us	100,000
Sign (32B message)	72 us	91 us	68 us	1,000,000
Verify (32B message)	24 us	29 us	22 us	1,000,000
Sign + Verify	96 us	118 us	92 us	1,000,000
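
The columns above (median, p99, min, samples) can be derived from raw per-operation timings. The following is a minimal sketch, not the H33 harness: `measure`, `summarize`, and the placeholder workload are hypothetical names introduced for illustration, and the nearest-rank percentile method is an assumption, since the document does not state which percentile definition was used.

```rust
use std::time::{Duration, Instant};

/// Summary statistics in the same shape as the table columns.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LatencySummary {
    median: Duration,
    p99: Duration,
    min: Duration,
    samples: usize,
}

/// Sorts the samples and reads off order statistics (nearest-rank method).
fn summarize(mut samples: Vec<Duration>) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    let n = samples.len();
    // Nearest rank: index of the smallest sample with at least p% of all
    // samples at or below it, i.e. ceil(n * p / 100) - 1.
    let rank = |p: usize| ((n * p + 99) / 100).max(1) - 1;
    Some(LatencySummary {
        median: samples[rank(50)],
        p99: samples[rank(99)],
        min: samples[0],
        samples: n,
    })
}

/// Times `op` once per iteration; `op` stands in for a sign or verify call.
fn measure<F: FnMut()>(iterations: usize, mut op: F) -> Vec<Duration> {
    (0..iterations)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect()
}

fn main() {
    // Placeholder workload; a real harness would call the signing routine here.
    let mut acc = 0u64;
    let samples = measure(10_000, || {
        for i in 0..1_000u64 {
            acc = std::hint::black_box(acc.wrapping_mul(31).wrapping_add(i));
        }
    });
    println!("{:?}", summarize(samples));
}
```

Timing each call individually adds the cost of two clock reads per sample, which may matter at the tens-of-microseconds scale; a production harness would likely account for that overhead.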

4.2. Key and Signature Sizes

Artifact	Size (bytes)
Public key	1,952
Secret key	4,032
Signature	3,309

5. Batch Signing Throughput

Batch signing amortizes key loading overhead across multiple sign operations. The following table shows throughput at different batch sizes on a single core.

Batch Size	Total Time	Per-Sign	Throughput (signs/sec)
1	72 us	72 us	13,889
32	2,210 us	69.1 us	14,480
100	6,840 us	68.4 us	14,620
1,000	68,100 us	68.1 us	14,684

At batch sizes above 32, per-sign latency converges to approximately 68 us (amortized). Single-core throughput plateaus at approximately 14,700 signs/sec.
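
The per-sign and throughput columns follow directly from the total time and batch size. The sketch below re-derives them from the table values; the function names are illustrative only.

```rust
/// Per-signature latency (us) for a batch that took `total_us` microseconds.
fn per_sign_us(total_us: f64, batch_size: u32) -> f64 {
    total_us / f64::from(batch_size)
}

/// Signatures per second on one core at the given per-signature latency.
fn signs_per_sec(per_sign_us: f64) -> f64 {
    1_000_000.0 / per_sign_us
}

fn main() {
    // (batch size, total time in us), copied from the table above.
    let rows = [(1u32, 72.0), (32, 2_210.0), (100, 6_840.0), (1_000, 68_100.0)];
    for (batch, total_us) in rows {
        let per = per_sign_us(total_us, batch);
        println!("{batch:>5}  {per:>5.1} us  {:>6.0} signs/sec", signs_per_sec(per));
    }
}
```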

5.1. Multi-Core Scaling

With independent key pairs per thread (no shared mutable state), ML-DSA-65 signing scales near-linearly across cores:

Cores	Throughput (signs/sec)	Scaling Factor
1	14,684	1.00x
16	234,200	15.95x
48	698,400	47.56x
96	1,389,000	94.60x
192	2,752,000	187.43x

Near-linear scaling (97.6% efficiency at 192 cores) is achieved because signing threads share no state and each thread's working set fits within L1 cache. The slight sub-linearity at 192 cores is attributed to memory bandwidth saturation rather than lock contention.
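
The share-nothing pattern and the efficiency figure can be sketched as follows. This is an illustration under stated assumptions, not the benchmark code: `run_workers` and `efficiency` are hypothetical names, and the integer arithmetic inside each worker is a stand-in for signing with a thread-private key pair.

```rust
use std::thread;

/// Parallel efficiency: measured speedup divided by the ideal (linear) speedup.
fn efficiency(throughput: f64, single_core: f64, cores: u32) -> f64 {
    throughput / single_core / f64::from(cores)
}

/// Runs `per_thread` operations on each of `threads` workers and returns the
/// total number completed. Every worker owns its state (standing in for a
/// private key pair), so no locks or shared mutable data are involved.
fn run_workers(threads: usize, per_thread: u64) -> u64 {
    thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|id| {
                s.spawn(move || {
                    let mut state = id as u64; // thread-local stand-in for a key pair
                    let mut done = 0u64;
                    for i in 0..per_thread {
                        // Placeholder work in place of a sign operation.
                        state = std::hint::black_box(
                            state.wrapping_mul(6_364_136_223_846_793_005).wrapping_add(i),
                        );
                        done += 1;
                    }
                    done
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .sum::<u64>()
    })
}

fn main() {
    println!("completed {} operations", run_workers(4, 100_000));
    // Efficiency at 192 cores, using the throughput figures from the table.
    let eff = efficiency(2_752_000.0, 14_684.0, 192);
    println!("efficiency at 192 cores: {:.1}%", eff * 100.0);
}
```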

6. Comparison: ML-DSA-65 vs Classical Schemes

The following table compares ML-DSA-65 against Ed25519 and RSA-2048, all measured on the same Graviton4 hardware under identical conditions.

Scheme	Security Level	Sign	Verify	Signature Size	PQ-Secure
ML-DSA-65	NIST Level 3	72 us	24 us	3,309 B	Yes
Ed25519	128-bit classical	18 us	52 us	64 B	No
RSA-2048	112-bit classical	1,420 us	38 us	256 B	No

ML-DSA-65 sign latency is 4x slower than Ed25519 but 19.7x faster than RSA-2048. ML-DSA-65 verify latency is 2.2x faster than Ed25519 verify and 1.6x faster than RSA-2048 verify. The signature size tradeoff (3,309 bytes vs 64 bytes for Ed25519) is the cost of post-quantum security.

Ed25519 and RSA-2048 are not quantum-resistant. The comparison is provided for migration planning purposes. Organizations transitioning from Ed25519 will observe a 4x sign latency increase and a 52x signature size increase, offset by a 2.2x verify latency improvement.
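
For migration planning, the ratios quoted above can be recomputed from the comparison table. The sketch below uses only the table values; the `Scheme` struct and `ratio` helper are illustrative names.

```rust
/// One row of the comparison table (latencies in microseconds, size in bytes).
struct Scheme {
    name: &'static str,
    sign_us: f64,
    verify_us: f64,
    sig_bytes: f64,
}

const ML_DSA_65: Scheme = Scheme { name: "ML-DSA-65", sign_us: 72.0, verify_us: 24.0, sig_bytes: 3309.0 };
const ED25519: Scheme = Scheme { name: "Ed25519", sign_us: 18.0, verify_us: 52.0, sig_bytes: 64.0 };
const RSA_2048: Scheme = Scheme { name: "RSA-2048", sign_us: 1420.0, verify_us: 38.0, sig_bytes: 256.0 };

/// How many times larger `a` is than `b`.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}

fn main() {
    for other in [&ED25519, &RSA_2048] {
        println!(
            "{} relative to {}: sign {:.2}x, verify {:.2}x, signature size {:.1}x",
            ML_DSA_65.name,
            other.name,
            ratio(ML_DSA_65.sign_us, other.sign_us),
            ratio(ML_DSA_65.verify_us, other.verify_us),
            ratio(ML_DSA_65.sig_bytes, other.sig_bytes),
        );
    }
}
```

A ratio above 1 means ML-DSA-65 is slower or larger on that axis; a ratio below 1 means it is faster or smaller.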

7. NIST FIPS 204 Compliance

The H33 ML-DSA-65 implementation conforms to NIST FIPS 204 (Module-Lattice-Based Digital Signature Standard). Compliance is verified through:

KAT vectors. All 100 Known Answer Test vectors from the NIST reference implementation pass. KAT validation runs as part of the CI pipeline on every commit.
Parameter set. ML-DSA-65 uses (k=6, l=5, eta=4, gamma1=2^19, gamma2=(q-1)/32, tau=49, beta=196, omega=55). These match the FIPS 204 specification exactly.
Deterministic signing. The implementation uses the deterministic variant (no hedging with additional randomness). Given identical inputs, identical signatures are produced.
Side-channel mitigation. Constant-time NTT, constant-time rejection sampling, and constant-time polynomial arithmetic. No data-dependent branches in the hot path.
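
As a consistency check, the key and signature sizes in Section 4.2 can be derived from the parameter set using the encoding-size formulas in FIPS 204. In this sketch, k, l, eta, tau, gamma1, and omega come from the list above, while q = 8380417, d = 13, and lambda = 192 are taken from FIPS 204 and are not stated in this document. The function names are illustrative.

```rust
// Parameters listed in this document for ML-DSA-65.
const K: u32 = 6;
const L: u32 = 5;
const ETA: u32 = 4;
const TAU: u32 = 49;
const GAMMA1: u32 = 1 << 19;
const OMEGA: u32 = 55;

// Values taken from FIPS 204 (assumptions relative to this document).
const Q: u32 = 8_380_417; // modulus
const D: u32 = 13; // bits dropped from t
const LAMBDA: u32 = 192; // collision strength of the commitment hash

/// Number of bits needed to represent `x`.
fn bitlen(x: u32) -> u32 {
    32 - x.leading_zeros()
}

/// Public key: 32-byte seed plus k polynomials of t1 at (bitlen(q-1) - d) bits per coefficient.
fn public_key_bytes() -> u32 {
    32 + 32 * K * (bitlen(Q - 1) - D)
}

/// Secret key: rho, K, tr, then packed s1, s2 (bitlen(2*eta) bits) and t0 (d bits).
fn secret_key_bytes() -> u32 {
    32 + 32 + 64 + 32 * ((K + L) * bitlen(2 * ETA) + D * K)
}

/// Signature: commitment hash, packed z, and the hint encoding.
fn signature_bytes() -> u32 {
    LAMBDA / 4 + L * 32 * (1 + bitlen(GAMMA1 - 1)) + OMEGA + K
}

fn main() {
    println!("beta   = tau * eta  = {}", TAU * ETA);
    println!("gamma2 = (q - 1)/32 = {}", (Q - 1) / 32);
    println!("public key: {} bytes", public_key_bytes());
    println!("secret key: {} bytes", secret_key_bytes());
    println!("signature:  {} bytes", signature_bytes());
}
```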

8. Integration in H33 Pipeline

In the H33 production authentication pipeline, ML-DSA-65 is one of three signature families applied to each governance node. The batch attestation stage (Stage 2 in the pipeline) includes SHA3-256 hashing, ML-DSA-65 sign, FALCON-512 sign, SLH-DSA-SHA2-128f sign, and triple verify. ML-DSA-65 contributes approximately 25% of the Stage 2 latency (96 us of 391 us).

Pipeline Stage	Component	Latency	ML-DSA-65 Contribution
Stage 2: Batch attest	SHA3-256 hash	2 us	--
Stage 2: Batch attest	ML-DSA-65 sign	72 us	72 us
Stage 2: Batch attest	FALCON-512 sign	89 us	--
Stage 2: Batch attest	SLH-DSA sign	204 us	--
Stage 2: Batch attest	ML-DSA-65 verify	24 us	24 us
Stage 2: Batch attest	Total	391 us	96 us (24.6%)
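
The total and the ML-DSA-65 share in the last row can be re-derived from the component rows. A minimal sketch using only the table values (the function names are illustrative):

```rust
/// (component, latency in us, attributable to ML-DSA-65), from the Stage 2 table.
fn stage2_components() -> [(&'static str, u32, bool); 5] {
    [
        ("SHA3-256 hash", 2, false),
        ("ML-DSA-65 sign", 72, true),
        ("FALCON-512 sign", 89, false),
        ("SLH-DSA sign", 204, false),
        ("ML-DSA-65 verify", 24, true),
    ]
}

/// Sum of all listed component latencies.
fn total_us() -> u32 {
    stage2_components().iter().map(|&(_, us, _)| us).sum()
}

/// Sum of the components attributable to ML-DSA-65.
fn ml_dsa_us() -> u32 {
    stage2_components()
        .iter()
        .filter(|&&(_, _, is_ml_dsa)| is_ml_dsa)
        .map(|&(_, us, _)| us)
        .sum()
}

/// ML-DSA-65 share of the Stage 2 total, in percent.
fn ml_dsa_share_percent() -> f64 {
    100.0 * f64::from(ml_dsa_us()) / f64::from(total_us())
}

fn main() {
    println!(
        "total {} us, ML-DSA-65 {} us ({:.1}%)",
        total_us(),
        ml_dsa_us(),
        ml_dsa_share_percent()
    );
}
```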

See Agent Infrastructure Benchmarks for the full pipeline breakdown.

9. Reproducibility

To reproduce these benchmarks:

# Clone and build (requires Rust 1.78+)
$ cargo build --release --example dilithium_bench --features dilithium_bench

# Run single-operation benchmarks
$ ./target/release/examples/dilithium_bench --iterations 1000000

# Run batch benchmarks
$ ./target/release/examples/dilithium_bench --batch 32 --iterations 100000

# Run multi-core scaling test
$ ./target/release/examples/dilithium_bench --threads 192 --iterations 100000

Benchmarks MUST be built with `-C target-cpu=native` set via rustflags in .cargo/config.toml. Without this flag, the compiler will not emit SVE2 instructions, and results will be approximately 35% slower.
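
One way to guard against an accidental build without the flag is a compile-time feature check at benchmark start-up. The sketch below is an assumption about how such a guard could look, not part of the H33 benchmark binary; `built_with_sve2`, `built_with_neon`, and `simd_report` are hypothetical names. It relies on the `cfg!(target_feature = ...)` macro, which reflects the features the compiler was allowed to use for this build.

```rust
/// True when the binary was compiled with SVE2 enabled, as is expected when
/// `target-cpu=native` is used on a CPU that supports SVE2.
fn built_with_sve2() -> bool {
    cfg!(target_feature = "sve2")
}

/// True when the binary was compiled with NEON enabled.
fn built_with_neon() -> bool {
    cfg!(target_feature = "neon")
}

/// One-line summary suitable for logging next to benchmark results.
fn simd_report() -> String {
    format!("neon={} sve2={}", built_with_neon(), built_with_sve2())
}

fn main() {
    println!("{}", simd_report());
    if !built_with_sve2() {
        eprintln!("warning: built without SVE2; results are not comparable to the published figures");
    }
}
```

The check is resolved at compile time, so it describes how the binary was built rather than what the host CPU supports.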

Related Benchmarks & Specifications

Benchmarks Index
BFV FHE
TFHE Gates
STARK Proofs
Agent Infrastructure
NIST FIPS 203/204
H33-3-Key