Engineering / March 17, 2026 / 5 min read

Sub-20ms ZK-STARK Proofs on Mobile: Benchmarking Post-Quantum Device Attestation on ARM Silicon

We have been asked repeatedly whether zero-knowledge proofs are fast enough for mobile. The assumption behind the question is usually that ZK proof generation is a server-side operation — too computationally expensive for the thermal and power constraints of a phone. That assumption is wrong, and we have the numbers to prove it.

This post publishes benchmark results from our ZK-STARK device attestation system running on desktop ARM silicon in release mode. The hardware uses the same instruction set architecture as Apple's A-series and M-series chips, so the results are a reasonable proxy for what an iPhone or iPad achieves in production once they are adjusted for the lower sustained clock frequencies of mobile parts (see Methodology and Reproducibility).

What We Measured

Our device attestation system generates three independent STARK proofs per API call, then aggregates them into a single proof bundle that travels in an HTTP header alongside the standard Bearer token. The three proofs cover device integrity, network jurisdiction, and endpoint binding. Each proof uses Poseidon as the in-circuit hash primitive. Poseidon is a ZK-native construction that requires roughly 8x to 80x fewer constraints than SHA-256 in R1CS, depending on parameterization, which is the difference between a sub-second mobile experience and a multi-second one.

We ran 20 iterations of each operation in release mode on ARM silicon and measured wall-clock time at the microsecond level. Here are the results.

| Operation | Median | P95 |
|---|---|---|
| Poseidon commitment (single) | 43 µs | 43 µs |
| Device integrity proof generation | 5.17 ms | 5.66 ms |
| Network jurisdiction proof generation | 2.98 ms | 3.44 ms |
| Endpoint binding proof generation | 3.10 ms | 3.23 ms |
| Full aggregated attestation | 16.15 ms | 17.10 ms |
| Aggregated proof verification | <1 µs | <1 µs |
| Session commitment (50 behavioral events) | 17 µs | 18 µs |
Headline figures: full attestation 16.15 ms (median), verification under 1 µs, proof size 192 bytes.

What These Numbers Mean

A full aggregated attestation, proving device integrity, network jurisdiction, and endpoint binding in a single bundle, completes in about 16 milliseconds on desktop ARM. On an actual iPhone, accounting for lower clock frequencies and thermal constraints compared to a desktop ARM chip, we project 25 to 35 milliseconds. Our original engineering target was 200 milliseconds. That puts the measured figure about 12x under budget and the projected iPhone figure roughly 6x to 8x under budget.
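
The headroom arithmetic can be checked with a few lines of Rust. This is a minimal sketch: the function and constant names are illustrative and not part of the SDK, and the slowdown factor is the 1.5x to 2x adjustment described under Methodology and Reproducibility.

```rust
/// Measured median for the full aggregated attestation, in milliseconds (from the results table).
const MEASURED_MS: f64 = 16.15;
/// Original engineering target, in milliseconds.
const TARGET_MS: f64 = 200.0;

/// Scale a desktop measurement by a slowdown factor to project on-device latency.
fn projected_ms(measured_ms: f64, slowdown: f64) -> f64 {
    measured_ms * slowdown
}

/// How many times a given latency fits inside the target budget.
fn headroom(target_ms: f64, latency_ms: f64) -> f64 {
    target_ms / latency_ms
}
```

Calling `projected_ms(MEASURED_MS, 1.5)` and `projected_ms(MEASURED_MS, 2.0)` gives roughly 24 ms and 32 ms, in line with the 25 to 35 ms projection. `headroom(TARGET_MS, MEASURED_MS)` is a little over 12, and the headroom at the projected figures falls between about 6 and 8.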

Verification is under one microsecond. This matters because verification runs at the API gateway on every inbound request. At sub-microsecond cost, attestation verification adds effectively zero latency to the request path. A gateway processing 100,000 requests per second spends less than 100 milliseconds of verification compute per second, which is under a tenth of one CPU core.
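
As a quick sanity check on the gateway figure, the sketch below multiplies a request rate by a per-verification cost. It assumes the 100,000 figure is a per-second rate and uses 1 microsecond as an upper bound on verification time. The function names are illustrative only.

```rust
use std::time::Duration;

/// Total verification compute spent per second of traffic,
/// given a request rate and a per-verification cost.
fn verification_compute_per_second(requests_per_second: u32, per_verification: Duration) -> Duration {
    per_verification * requests_per_second
}

/// The same cost expressed as a fraction of one fully busy CPU core.
fn core_fraction(requests_per_second: u32, per_verification: Duration) -> f64 {
    verification_compute_per_second(requests_per_second, per_verification).as_secs_f64()
}
```

With `requests_per_second = 100_000` and `Duration::from_micros(1)`, the first function returns 100 ms and the second about 0.1, so the upper bound is a tenth of a core.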

The aggregated proof is 192 bytes. Our target was under 200 bytes to remain compatible with standard HTTP header size limits without requiring custom transport. We hit 192. The proof travels alongside the Bearer token in the Authorization header. No protocol changes, no WebSocket upgrades, no custom framing.
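
Binary data has to be text-encoded before it can travel in an HTTP header. The sketch below assumes standard base64 (RFC 4648), which is an assumption on our part here since this post does not specify the wire encoding, and shows how large a 192-byte proof becomes. The encoder is written out by hand only to keep the example free of external crates.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 (RFC 4648) with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling missing bytes.
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: text form of a fixed-size 192-byte proof.
fn encoded_proof(proof: &[u8; 192]) -> String {
    base64_encode(proof)
}
```

Because 192 is a multiple of 3, the encoded proof is exactly 256 characters with no padding. That is small next to the header limits of common servers, which typically default to several kilobytes.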

Why Poseidon Changes the Mobile Calculus

The reason ZK proofs were historically impractical on mobile is SHA-256. In an R1CS constraint system, a single SHA-256 hash invocation costs roughly 25,000 constraints. A proof circuit that hashes three inputs — which is the minimum for any useful attestation — requires 75,000 constraints before you add any application logic. On a phone with a 3 to 4 second UX budget, that math does not work.

Poseidon is a hash function designed specifically for arithmetic circuits. It operates over field elements using multiplications and additions rather than bitwise operations. A single Poseidon hash costs roughly 300 to 3,000 constraints depending on the parameterization — an 8x to 80x reduction compared to SHA-256. Our configuration sits at the lower end of that range because we optimize the number of full rounds for our specific security level.
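
The constraint comparison is simple multiplication. The sketch below uses only the approximate per-hash figures quoted above (25,000 for SHA-256, 300 to 3,000 for Poseidon). The names are illustrative, and the counts are rough estimates rather than measurements of our circuits.

```rust
/// Approximate R1CS constraint counts per hash invocation, as quoted in the text.
const SHA256_CONSTRAINTS: u64 = 25_000;
const POSEIDON_MIN_CONSTRAINTS: u64 = 300;
const POSEIDON_MAX_CONSTRAINTS: u64 = 3_000;

/// Hash-only constraint cost of a circuit that performs `hashes` invocations.
fn hash_constraints(per_hash: u64, hashes: u64) -> u64 {
    per_hash * hashes
}

/// Reduction factor of a given per-hash cost relative to SHA-256.
fn reduction_vs_sha256(per_hash: u64) -> f64 {
    SHA256_CONSTRAINTS as f64 / per_hash as f64
}
```

For the three-hash circuit in the example, SHA-256 costs 75,000 constraints while Poseidon costs between 900 and 9,000. The reduction factor works out to about 8.3 at the high end of the Poseidon range and about 83 at the low end.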

We keep SHA-256 at the boundary layer where external systems expect standard hash outputs — TLS handshakes, X.509 certificate chains, blockchain transaction hashes. Inside our proof circuits, every hash is Poseidon. This division is invisible to the developer. The SDK handles the boundary translation automatically.

Continuous Authentication Without Killing Battery

The session commitment benchmark — 50 behavioral events hashed into a single commitment in 17 microseconds — is the number that unlocks continuous authentication on mobile. Our architecture accumulates behavioral signals throughout a session: keystroke timing patterns, input velocity, device orientation changes, and other signals that characterize the current user's interaction pattern. Rather than generating a proof per event, we batch an entire session window into a single commitment and prove the batch. This amortizes the proof cost across dozens or hundreds of events instead of paying it per interaction.
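
The batching pattern can be sketched in a few lines. This is not our production code: the event fields are hypothetical, and the standard library's `DefaultHasher` (SipHash, not a cryptographic commitment) stands in for Poseidon purely to show the shape of "accumulate a window, commit once".

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical behavioral event; the field names are illustrative only.
#[derive(Hash, Clone, Debug)]
struct BehavioralEvent {
    kind: u8,          // e.g. 0 = keystroke, 1 = orientation change
    timestamp_us: u64, // event time in microseconds
    value: i32,        // quantized measurement
}

/// Accumulates events for one session window.
#[derive(Default)]
struct SessionWindow {
    events: Vec<BehavioralEvent>,
}

impl SessionWindow {
    fn record(&mut self, event: BehavioralEvent) {
        self.events.push(event);
    }

    /// Fold the whole window into a single 64-bit value.
    /// DefaultHasher is a NON-cryptographic stand-in for Poseidon.
    fn commit(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        (self.events.len() as u64).hash(&mut hasher); // bind the event count
        for event in &self.events {
            event.hash(&mut hasher);
        }
        hasher.finish()
    }
}

/// Build a deterministic window of `n` synthetic events for experimentation.
fn sample_window(n: u64) -> SessionWindow {
    let mut window = SessionWindow::default();
    for i in 0..n {
        window.record(BehavioralEvent {
            kind: (i % 3) as u8,
            timestamp_us: i * 1_000,
            value: i as i32,
        });
    }
    window
}
```

One call to `commit()` covers the whole window, so the per-event work is a cheap append and the hashing cost is paid once per batch. Changing any single event changes the resulting value.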

At 17 microseconds per 50-event batch, the battery and compute impact of continuous behavioral authentication is below the noise floor of normal application operation. The user never perceives it. The security team gets a cryptographic session integrity guarantee that updates continuously.

Post-Quantum by Default

Every attestation proof in these benchmarks is signed with a CRYSTALS-Dilithium digital signature, the lattice-based scheme that NIST standardized as ML-DSA in FIPS 204. The Dilithium signing cost is included in the aggregated attestation time. When we say 16 milliseconds, that includes the post-quantum signature. There is no additional step to make the attestation quantum-resistant. It is quantum-resistant by default because we do not use any classical signature algorithm in the attestation pipeline.

This matters for the same reason it matters everywhere in our stack: data and proofs generated today need to remain valid and unforgeable for decades. An attestation proof signed with ECDSA today could be forged by a sufficiently powerful quantum computer in the future. An attestation proof signed with Dilithium is designed to resist the quantum attacks known today.

Methodology and Reproducibility

The benchmark binary is written in Rust, compiled in release mode with full optimizations, and executed on ARM silicon. Each operation is measured over 20 iterations using monotonic clock timing at microsecond precision. The benchmark records median, mean, P95, min, and max for each operation; the table above shows median and P95. The benchmark source is available in our repository for independent verification.
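
For readers who want to reproduce the shape of the measurement before looking at the repository, here is a minimal harness sketch. It is not the benchmark source. It assumes `std::time::Instant` as the monotonic clock and a nearest-rank percentile, and the `Stats`, `summarize`, and `bench` names are illustrative.

```rust
use std::time::{Duration, Instant};

#[derive(Debug)]
struct Stats {
    median: Duration,
    mean: Duration,
    p95: Duration,
    min: Duration,
    max: Duration,
}

/// Nearest-rank percentile over an ascending-sorted, non-empty slice.
fn percentile(sorted: &[Duration], p: f64) -> Duration {
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Sort the samples and compute the summary statistics.
fn summarize(mut samples: Vec<Duration>) -> Stats {
    assert!(!samples.is_empty());
    samples.sort();
    let total: Duration = samples.iter().sum();
    Stats {
        median: percentile(&samples, 50.0),
        mean: total / samples.len() as u32,
        p95: percentile(&samples, 95.0),
        min: samples[0],
        max: samples[samples.len() - 1],
    }
}

/// Time `op` once per iteration using the monotonic clock.
fn bench<F: FnMut()>(iterations: usize, mut op: F) -> Stats {
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        op();
        samples.push(start.elapsed());
    }
    summarize(samples)
}
```

A call such as `bench(20, || { /* operation under test */ })` mirrors the 20-iteration setup. With 20 samples and a nearest-rank percentile, P95 is the 19th fastest run and the median is the 10th, so a single slow outlier affects max but not P95.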

These numbers are measured on desktop ARM silicon, which shares the same instruction set architecture as Apple's A-series mobile chips but runs at higher sustained clock frequencies. We apply a conservative 1.5x to 2x adjustment factor for projected iPhone performance. We will publish on-device iPhone benchmark results separately once our iOS SDK enters external testing.
