How-To Guide

How to Add Post-Quantum Signatures to Any API

A practical engineering guide to integrating ML-DSA-65 digital signatures into REST and gRPC services using header injection and H33-74 attestation

Every API call your system makes today is signed with cryptography that a sufficiently powerful quantum computer will break. RSA-2048 signatures, ECDSA over P-256 curves, Ed25519 -- all of these rely on mathematical problems that Shor's algorithm solves in polynomial time. The question is not whether you need to migrate. The question is whether you do it now, on your schedule, or later, on someone else's. This guide walks you through adding ML-DSA-65 post-quantum signatures to any existing API without rewriting your business logic, without changing your payload formats, and without disrupting clients that have not yet upgraded.

We will cover the header injection pattern for REST APIs, metadata propagation for gRPC services, key management considerations, the integration path with H33-74 attestation, and the production performance characteristics that make this practical today rather than theoretical. Every code example in this guide reflects patterns running in production at H33, where the complete signing pipeline operates at 2,293,766 authentications per second on AWS Graviton4 hardware.

Why ML-DSA-65 and Why Now

ML-DSA-65, standardized as NIST FIPS 204, is the Module-Lattice-Based Digital Signature Algorithm at security level 3. It was formerly known as CRYSTALS-Dilithium during the NIST Post-Quantum Cryptography standardization process. ML-DSA-65 produces signatures of 3,309 bytes with public keys of 1,952 bytes. The security of ML-DSA rests on the hardness of the Module Learning With Errors (MLWE) problem, which is believed to resist both classical and quantum attacks.

The reason to start with ML-DSA-65 specifically is straightforward: it is the NIST-recommended general-purpose post-quantum signature algorithm. It has the best balance of signature size, verification speed, and key generation time among the NIST finalists. FALCON offers smaller signatures but requires careful constant-time floating-point arithmetic that is notoriously difficult to implement securely. SLH-DSA (SPHINCS+) is based on hash functions rather than lattices, providing a different hardness assumption, but its signatures are significantly larger. ML-DSA-65 is the practical starting point for most API integrations.

The urgency comes from how long signed data has to stay trustworthy. Harvest-now-decrypt-later is the familiar form of the threat: adversaries record encrypted API traffic today and decrypt it once quantum computers can run Shor's algorithm at scale. Signatures face a related exposure. Public keys are public by design, so once such a machine exists an attacker can recover a classical private key from its public key and forge signatures under it, including backdated ones. From that point on, a classical signature alone no longer proves that an API response is authentic, however long ago it was produced. For financial transactions, medical records, legal documents, and compliance audit trails that must remain verifiable for years, this exposure is not hypothetical -- it is a planning constraint, and sophisticated threat actors are widely believed to be collecting traffic today in anticipation.

The Header Injection Pattern for REST APIs

The cleanest way to add post-quantum signatures to an existing REST API is the header injection pattern. Instead of modifying request and response bodies, you compute ML-DSA-65 signatures over the payload and attach them as HTTP headers. This approach has three critical advantages: existing clients continue to work because they simply ignore headers they do not recognize; signature verification is optional during the migration period; and the signing logic lives in middleware rather than business code.

Here is the pattern in Rust, using an Axum middleware layer:

// Written against the axum 0.7 API
use axum::{
    body::{to_bytes, Body},
    extract::Request,
    http::{HeaderValue, StatusCode},
    middleware::Next,
    response::Response,
};
use h33_pqc::mldsa65::{SigningKey, sign};

async fn pq_sign_middleware(
    request: Request,
    next: Next,
) -> Result<Response, StatusCode> {
    let response = next.run(request).await;

    // Buffer the response body so the exact outgoing bytes can be signed.
    // Note: this disables streaming for responses that pass through here.
    let (mut parts, body) = response.into_parts();
    let body_bytes = to_bytes(body, usize::MAX)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

    // Sign with ML-DSA-65
    let signing_key = get_signing_key();  // From secure key store
    let signature = sign(&signing_key, &body_bytes);

    // Inject signature as HTTP header (base64-encoded)
    let sig_b64 = base64_encode(&signature);
    parts.headers.insert(
        "X-PQ-Signature",
        HeaderValue::from_str(&sig_b64)
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?,
    );

    // Include algorithm identifier for crypto-agility
    parts.headers.insert(
        "X-PQ-Algorithm",
        HeaderValue::from_static("ML-DSA-65"),
    );

    // Include key identifier for rotation support
    parts.headers.insert(
        "X-PQ-Key-Id",
        HeaderValue::from_str(&get_active_key_id())
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?,
    );

    // Reassemble the response from the buffered bytes
    Ok(Response::from_parts(parts, Body::from(body_bytes)))
}

The three headers work together. X-PQ-Signature carries the actual ML-DSA-65 signature over the response body. X-PQ-Algorithm identifies which algorithm was used, enabling crypto-agility when you later add FALCON or SLH-DSA as alternatives. X-PQ-Key-Id references which public key the verifier should use, supporting key rotation without breaking verification during rollover periods.
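On the verifying side, the algorithm and key identifier headers are what let a service route a signature to the right verifier. The following is a minimal sketch of that routing step using only the standard library. The `PqAlgorithm` enum, the `route` function, and the header strings for FALCON and SLH-DSA are assumptions made for illustration; only "ML-DSA-65" is a value this guide actually uses.

```rust
/// Signature algorithms a verifier is prepared to accept (hypothetical set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PqAlgorithm {
    MlDsa65,
    Falcon512,
    SlhDsaSha2128f,
}

impl PqAlgorithm {
    /// Parses an X-PQ-Algorithm header value; unknown values are rejected.
    /// The FALCON and SLH-DSA strings are assumed, not taken from a spec.
    pub fn from_header(value: &str) -> Option<Self> {
        match value.trim() {
            "ML-DSA-65" => Some(Self::MlDsa65),
            "FALCON-512" => Some(Self::Falcon512),
            "SLH-DSA-SHA2-128f" => Some(Self::SlhDsaSha2128f),
            _ => None,
        }
    }
}

/// Decides which algorithm and key a signature should be checked against.
/// Both headers are required; a missing or unknown value is an error.
pub fn route(
    algorithm_header: Option<&str>,
    key_id_header: Option<&str>,
) -> Result<(PqAlgorithm, String), &'static str> {
    let algorithm = algorithm_header
        .and_then(PqAlgorithm::from_header)
        .ok_or("missing or unsupported X-PQ-Algorithm")?;
    let key_id = key_id_header
        .filter(|k| !k.is_empty())
        .ok_or("missing X-PQ-Key-Id")?;
    Ok((algorithm, key_id.to_string()))
}
```

Rejecting unknown algorithm strings outright, rather than falling back to a default, keeps an attacker from steering verification toward a weaker algorithm.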

Request Signing: The Client Side

For request signing, the same pattern applies in reverse. The client computes an ML-DSA-65 signature over the request body (and optionally over critical request metadata such as the method, path, and a timestamp) and includes it in the outgoing request:

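use axum::{body::{to_bytes, Body}, http::{HeaderValue, Request}};

async fn sign_request(
    request: Request<Body>,
    signing_key: &SigningKey,
    key_id: &str,
) -> Result<Request<Body>, Box<dyn std::error::Error + Send + Sync>> {
    // Buffer the body so the exact outgoing bytes can be signed
    let (mut parts, body) = request.into_parts();
    let body = to_bytes(body, usize::MAX).await?;

    // Canonical representation: method + path + body
    let mut sign_input = Vec::new();
    sign_input.extend_from_slice(parts.method.as_str().as_bytes());
    sign_input.push(b'\n');
    sign_input.extend_from_slice(parts.uri.path().as_bytes());
    sign_input.push(b'\n');
    sign_input.extend_from_slice(&body);

    let signature = sign(signing_key, &sign_input);

    parts.headers.insert(
        "X-PQ-Signature",
        HeaderValue::from_str(&base64_encode(&signature))?,
    );
    parts.headers.insert(
        "X-PQ-Algorithm",
        HeaderValue::from_static("ML-DSA-65"),
    );
    // The server needs the key identifier to select the public key
    parts.headers.insert(
        "X-PQ-Key-Id",
        HeaderValue::from_str(key_id)?,
    );

    // Reassemble the request from the buffered bytes
    Ok(Request::from_parts(parts, Body::from(body)))
}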

The canonical representation is critical. Both client and server must agree on exactly which bytes are signed. We concatenate the HTTP method, path, and body with newline separators. This prevents cross-method replay attacks where a signed GET request is replayed as a POST. It does not stop the same request from being replayed unchanged, so in production you should also sign a timestamp and a nonce, and have the server reject stale timestamps and previously seen nonces.
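One way to extend the canonical representation with a timestamp and nonce is sketched below, using only the standard library. The field order, the function names `canonical_input` and `is_fresh`, and the use of Unix seconds are assumptions for illustration, not a format the guide prescribes. The sketch also assumes that the method, path, and nonce never contain a newline, since the newline is the field separator.

```rust
/// Builds the byte string that client and server both sign or verify.
/// Assumed layout: method, path, timestamp, nonce (each followed by '\n'),
/// then the raw body bytes.
pub fn canonical_input(
    method: &str,
    path: &str,
    timestamp_secs: u64,
    nonce: &str,
    body: &[u8],
) -> Vec<u8> {
    let mut out = Vec::with_capacity(method.len() + path.len() + nonce.len() + body.len() + 24);
    // Normalize the method so "get" and "GET" sign identically
    out.extend_from_slice(method.to_ascii_uppercase().as_bytes());
    out.push(b'\n');
    out.extend_from_slice(path.as_bytes());
    out.push(b'\n');
    out.extend_from_slice(timestamp_secs.to_string().as_bytes());
    out.push(b'\n');
    out.extend_from_slice(nonce.as_bytes());
    out.push(b'\n');
    out.extend_from_slice(body);
    out
}

/// Returns true when the signed timestamp is within the allowed clock skew.
pub fn is_fresh(timestamp_secs: u64, now_secs: u64, max_skew_secs: u64) -> bool {
    now_secs.abs_diff(timestamp_secs) <= max_skew_secs
}
```

The server would rebuild the same bytes from the received request, verify the signature over them, check `is_fresh`, and then record the nonce for at least the skew window so that a second use can be rejected.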

gRPC Integration via Metadata

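gRPC uses metadata (analogous to HTTP headers) for out-of-band information. The pattern is identical in concept, with one caveat for tonic: its interceptors see only the incoming request, so response signing belongs in the handler or in a tower layer that wraps the service. Here is a server-side helper in Rust, using tonic and prost, that a handler calls to build its signed response:

use prost::Message;
use tonic::{metadata::MetadataValue, Response};

// Wraps a message in a Response and attaches its signature as binary metadata
fn pq_signed_response<T: Message>(message: T) -> Response<T> {
    // Sign the protobuf encoding of the message. Protobuf encoding is not
    // canonical across implementations, so the verifier must check the
    // signature against the same bytes the signer produced.
    let body_bytes = message.encode_to_vec();
    let signature = sign(&get_signing_key(), &body_bytes);

    let mut response = Response::new(message);
    // insert_bin is the MetadataMap method for keys ending in -bin
    response.metadata_mut().insert_bin(
        "x-pq-signature-bin",
        MetadataValue::from_bytes(&signature),
    );

    response
}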

Note the -bin suffix on the metadata key. gRPC requires binary metadata values to use keys ending in -bin, which triggers automatic base64 encoding on the wire. This is important because ML-DSA-65 signatures are binary data, not valid ASCII strings. Using the -bin suffix avoids the double-encoding problem where you base64-encode the signature yourself and then gRPC base64-encodes it again.

Bidirectional Streaming Considerations

For gRPC bidirectional streaming, you cannot sign the entire stream as a single unit because messages arrive incrementally. The standard approach is to sign each message individually and include the message sequence number in the signed data. This prevents message reordering attacks within a stream. For high-throughput streams, consider signing every Nth message and including a running hash of all preceding messages, which amortizes the signing cost while maintaining tamper detection.
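The per-message variant can be sketched with the standard library alone. In the sketch below, the 8-byte big-endian sequence prefix and the names `stream_sign_input` and `SequenceChecker` are assumptions for illustration. The running-hash variant is omitted because it needs a cryptographic hash from an external crate.

```rust
/// Bytes to sign for one stream message: an 8-byte big-endian sequence
/// number followed by the serialized message (assumed layout).
pub fn stream_sign_input(sequence: u64, message: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + message.len());
    out.extend_from_slice(&sequence.to_be_bytes());
    out.extend_from_slice(message);
    out
}

/// Receiver-side tracker for the next expected sequence number.
pub struct SequenceChecker {
    next: u64,
}

impl SequenceChecker {
    pub fn new() -> Self {
        Self { next: 0 }
    }

    /// Accepts a message only if it carries exactly the next sequence
    /// number, which rejects replayed, dropped, and reordered messages.
    /// Call this only after the signature over `stream_sign_input` verified.
    pub fn accept(&mut self, sequence: u64) -> bool {
        if sequence == self.next {
            self.next += 1;
            true
        } else {
            false
        }
    }
}
```

Because the sequence number is inside the signed bytes, an attacker who swaps two messages in flight cannot also swap their sequence numbers without invalidating both signatures.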

Key Management for PQ Signatures

ML-DSA-65 key pairs are larger than their classical counterparts. A public key is 1,952 bytes compared to 32 bytes for Ed25519. A secret key is 4,032 bytes. This changes key management in several practical ways.

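First, key distribution. If you currently distribute public keys via DNS TXT records, JWK sets, or certificate transparency logs, you need to verify that your infrastructure handles the larger key sizes. A single character-string inside a DNS TXT record is limited to 255 bytes, so a base64-encoded ML-DSA-65 public key (2,604 characters) must be split across multiple strings, and the response will not fit in a classic 512-byte UDP DNS reply without EDNS0 or TCP fallback. JWK endpoints may have payload size limits. Test your actual infrastructure, not just the specifications.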

Second, key rotation. ML-DSA-65 key generation takes approximately 0.1 milliseconds on modern hardware, which means you can rotate keys frequently without performance concerns. We recommend a rotation period of 24 hours for high-security applications, with a 48-hour overlap window where both the old and new keys are accepted for verification. This gives clients time to fetch the new public key without experiencing verification failures during the transition.

Third, key storage. The 4,032-byte secret key must be stored in a hardware security module (HSM) or equivalent secure enclave for production deployments. Software key stores are acceptable for development and testing but introduce key extraction risks in production. H33's key management layer uses a pool architecture that pre-generates key pairs during idle periods, eliminating key generation latency from the signing hot path.

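use std::time::{Duration, Instant};

// Key rotation with overlap window.
// This sketch retains only one previous key. An overlap window longer than
// the rotation interval (such as 48 hours at a 24-hour rotation) requires
// retaining every key that is still inside the window.
struct KeyManager {
    active_key: SigningKeyPair,
    previous_key: Option<SigningKeyPair>,
    rotation_interval: Duration,
    overlap_window: Duration,
    last_rotation: Instant,
}

impl KeyManager {
    fn should_rotate(&self) -> bool {
        self.last_rotation.elapsed() >= self.rotation_interval
    }

    fn rotate(&mut self) {
        let new_key = SigningKeyPair::generate();
        // The outgoing active key becomes the previous key
        self.previous_key = Some(
            std::mem::replace(&mut self.active_key, new_key)
        );
        self.last_rotation = Instant::now();
    }

    fn verify(&self, key_id: &str, msg: &[u8], sig: &[u8]) -> bool {
        if key_id == self.active_key.id() {
            return self.active_key.verify(msg, sig);
        }
        // The previous key is honored only while the overlap window is open
        if self.last_rotation.elapsed() <= self.overlap_window {
            if let Some(ref prev) = self.previous_key {
                if key_id == prev.id() {
                    return prev.verify(msg, sig);
                }
            }
        }
        false
    }
}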
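Returning to key distribution: a DNS TXT record stores its data as character-strings of at most 255 bytes each, so a 1,952-byte public key, which is 2,604 characters once base64-encoded, has to be published as several strings and reassembled by the client. The sketch below shows that split and join with the standard library only. The names `split_for_txt` and `join_from_txt` are hypothetical, and the code assumes the key has already been base64-encoded elsewhere.

```rust
/// Maximum length of one character-string inside a DNS TXT record.
pub const TXT_STRING_MAX: usize = 255;

/// Splits an ASCII value (for example a base64-encoded key) into
/// TXT-sized character-strings, preserving order.
pub fn split_for_txt(encoded: &str) -> Vec<&str> {
    assert!(encoded.is_ascii(), "expected an ASCII-encoded key");
    encoded
        .as_bytes()
        .chunks(TXT_STRING_MAX)
        .map(|chunk| std::str::from_utf8(chunk).expect("ASCII is valid UTF-8"))
        .collect()
}

/// Reassembles the value on the client side by concatenating the
/// character-strings in the order they appear in the record.
pub fn join_from_txt(parts: &[&str]) -> String {
    parts.concat()
}
```

For an ML-DSA-65 public key this yields eleven strings: ten full 255-byte strings and a final one of 54 bytes.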

Integrating H33-74 Attestation

The header injection pattern gives you post-quantum signatures on individual API calls. H33-74 attestation goes further: it distills the full three-family signature bundle (ML-DSA-65 + FALCON-512 + SLH-DSA-SHA2-128f) into a 74-byte attestation that can be verified independently. This is not compression. Compression implies you can decompress back to the original. Distillation is a one-way process that preserves the cryptographic guarantees -- specifically, the property that the attestation is unforgeable unless MLWE lattices, NTRU lattices, and the hash functions underlying SLH-DSA are all simultaneously broken -- while reducing the on-wire footprint by orders of magnitude.

The integration is straightforward. Instead of signing with ML-DSA-65 alone, you call the H33 attestation endpoint:

// Direct ML-DSA-65 signing: 3,309 bytes per signature
// H33-74 attestation: 74 bytes per attestation

// The caller constructs the client once (H33Client::new(api_key)) and reuses it
async fn attest_response(client: &H33Client, body: &[u8]) -> AttestationResult {
    let attestation = client
        .attest(body)
        .await
        .expect("attestation failed");

    // attestation.substrate is exactly 74 bytes
    // 32 bytes on-chain hash + 42 bytes Cachee verification
    assert_eq!(attestation.substrate.len(), 74);

    attestation
}

// Inject the 74-byte attestation as a header
response.headers_mut().insert(
    "X-H33-Attestation",
    HeaderValue::from_str(
        &base64_encode(&attestation.substrate)
    ).unwrap()
);

The 74-byte attestation replaces three separate signatures (ML-DSA at 3,309 bytes, FALCON at 690 bytes, and SLH-DSA at 17,088 bytes) with a single compact proof. For APIs that handle thousands of requests per second, the bandwidth savings alone justify the integration. But the real value is that the attestation rests on three independent hardness assumptions rather than one. A break in MLWE lattices does not compromise the attestation because the NTRU and hash-based components remain intact.

Verification on the Receiving End

The verification side mirrors the signing side. A middleware extracts the signature or attestation header, retrieves the appropriate public key, and verifies before passing the request to the handler:

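// Upper bound on buffered request bodies (assumed value; tune per API)
const MAX_BODY_BYTES: usize = 1024 * 1024;

async fn pq_verify_middleware(
    request: Request<Body>,
    next: Next,
) -> Result<Response<Body>, StatusCode> {
    // Buffer the body once so it can be verified and then passed on
    let (parts, body) = request.into_parts();
    let body = axum::body::to_bytes(body, MAX_BODY_BYTES)
        .await
        .map_err(|_| StatusCode::BAD_REQUEST)?;

    // Check for H33-74 attestation first
    if let Some(attestation) = parts.headers.get("X-H33-Attestation") {
        let encoded = attestation
            .to_str()
            .map_err(|_| StatusCode::BAD_REQUEST)?;
        let substrate = base64_decode(encoded);

        if !h33_verify(&substrate, &body).await {
            return Err(StatusCode::UNAUTHORIZED);
        }
    // Fall back to direct ML-DSA-65 verification
    } else if let Some(sig_header) = parts.headers.get("X-PQ-Signature") {
        let algorithm = parts.headers
            .get("X-PQ-Algorithm")
            .and_then(|v| v.to_str().ok())
            .unwrap_or("");

        if algorithm != "ML-DSA-65" {
            return Err(StatusCode::BAD_REQUEST);
        }

        let key_id = parts.headers
            .get("X-PQ-Key-Id")
            .ok_or(StatusCode::BAD_REQUEST)?
            .to_str()
            .map_err(|_| StatusCode::BAD_REQUEST)?;

        let signature = base64_decode(
            sig_header.to_str().map_err(|_| StatusCode::BAD_REQUEST)?
        );
        let pub_key = fetch_public_key(key_id).await
            .ok_or(StatusCode::UNAUTHORIZED)?;

        // Rebuild the canonical input the client signed: method + path + body
        let mut signed = Vec::new();
        signed.extend_from_slice(parts.method.as_str().as_bytes());
        signed.push(b'\n');
        signed.extend_from_slice(parts.uri.path().as_bytes());
        signed.push(b'\n');
        signed.extend_from_slice(&body);

        if !mldsa65_verify(&pub_key, &signed, &signature) {
            return Err(StatusCode::UNAUTHORIZED);
        }
    }
    // Requests carrying neither header pass through during the migration period

    // Hand the buffered body back to the downstream handler
    Ok(next.run(Request::from_parts(parts, Body::from(body))).await)
}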

The verification middleware checks for H33-74 attestation first and falls back to direct ML-DSA-65 verification. This layered approach supports the migration path where some clients send attestations and others send raw signatures. During the migration period, you can also accept classical signatures (ECDSA, Ed25519) as a third fallback, giving clients time to upgrade at their own pace.

Performance in Production

The concern with post-quantum signatures is always size and speed. Here are the real numbers from H33's production pipeline running on AWS Graviton4 (c8g.metal-48xl, 192 vCPUs):

ML-DSA-65 sign: 38 microseconds per operation. ML-DSA-65 verify: 22 microseconds per operation. H33-74 full attestation (three-family sign + distillation): 38 microseconds per authentication in the batched pipeline. Sustained throughput: 2,293,766 authentications per second. Per-batch latency (32 users per ciphertext, including FHE + ZKP + signing): under 1,400 microseconds.

These are not microbenchmark numbers isolated from everything else. They are measured in the full production pipeline that includes BFV homomorphic encryption with 4,096 SIMD slots, STARK-based zero-knowledge proof verification, and three-family post-quantum signature generation and verification. The signing step represents approximately 29% of total pipeline time, with FHE at 70% and ZKP at less than 1%.

For an API that currently handles 10,000 requests per second with ECDSA signatures, switching to ML-DSA-65 adds approximately 16 microseconds of additional latency per request (38 microseconds for ML-DSA-65 signing minus roughly 22 microseconds for ECDSA signing). At 10,000 RPS, this is 160 milliseconds of additional compute per second spread across your available cores. On a modern 8-core server, this is invisible.
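The arithmetic above is easy to rerun with your own request rate and core count. The sketch below reproduces it; the function names are hypothetical and the timings are simply the figures quoted in this section.

```rust
/// Extra signing compute, in milliseconds per wall-clock second, from
/// replacing a classical signature with a post-quantum one.
pub fn added_compute_ms_per_sec(rps: u64, pq_sign_us: u64, classical_sign_us: u64) -> f64 {
    let extra_us = pq_sign_us.saturating_sub(classical_sign_us);
    (rps * extra_us) as f64 / 1000.0
}

/// Fraction of each core consumed by that extra work when it is spread
/// evenly across `cores` cores (0.02 means 2 percent).
pub fn per_core_utilization(added_ms_per_sec: f64, cores: u32) -> f64 {
    added_ms_per_sec / f64::from(cores) / 1000.0
}
```

With 10,000 requests per second, 38 microseconds for ML-DSA-65 and 22 microseconds for ECDSA, the first function gives 160 milliseconds per second, and on 8 cores the second gives 2 percent of each core.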

Migration Strategy: The Three-Phase Approach

Phase one is shadow signing. Run ML-DSA-65 signing alongside your existing classical signatures. Include the post-quantum signature as an additional header but do not require it for verification. This phase validates that your signing pipeline works correctly in production without risk. Log any cases where the PQ signature would have failed verification so you can debug serialization or canonicalization issues.

Phase two is dual verification. Begin verifying PQ signatures when present but continue accepting classical signatures from clients that have not upgraded. Set a deadline for migration and communicate it clearly. During this phase, monitor verification failure rates and key distribution lag. Most issues surface here: clock skew affecting timestamp validation, CDN caching of stale JWK sets, and load balancers that strip custom headers.

Phase three is PQ-only enforcement. After the migration deadline, reject requests that do not carry a valid post-quantum signature. Classical signatures are no longer accepted. This is the point of no return, and it should only happen after you have confirmed that all critical clients have migrated successfully. Keep the classical verification code in place but behind a feature flag for emergency rollback.
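The three phases can be expressed as a single decision function that the verification middleware consults, which makes moving between phases a configuration change rather than a code change. The following is a sketch under stated assumptions: the type names, the `decide` function, and the exact accept and reject rules are one reading of the phases described above, not H33's implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Shadow,
    DualVerification,
    PqOnly,
}

/// Outcome of checking one signature on a request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SigCheck {
    Missing,
    Valid,
    Invalid,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Accept,
    AcceptAndLog,
    Reject,
}

pub fn decide(phase: Phase, pq: SigCheck, classical: SigCheck) -> Decision {
    match (phase, pq, classical) {
        // Phase one: the classical check decides; PQ failures are only logged
        (Phase::Shadow, SigCheck::Invalid, SigCheck::Valid) => Decision::AcceptAndLog,
        (Phase::Shadow, _, SigCheck::Valid) => Decision::Accept,
        (Phase::Shadow, _, _) => Decision::Reject,
        // Phase two: a PQ signature is enforced when present,
        // and classical signatures are still accepted when it is absent
        (Phase::DualVerification, SigCheck::Valid, _) => Decision::Accept,
        (Phase::DualVerification, SigCheck::Invalid, _) => Decision::Reject,
        (Phase::DualVerification, SigCheck::Missing, SigCheck::Valid) => Decision::Accept,
        (Phase::DualVerification, SigCheck::Missing, _) => Decision::Reject,
        // Phase three: only a valid PQ signature is accepted
        (Phase::PqOnly, SigCheck::Valid, _) => Decision::Accept,
        (Phase::PqOnly, _, _) => Decision::Reject,
    }
}
```

Keeping the phase as a runtime value also gives you the emergency rollback path: switching from `PqOnly` back to `DualVerification` re-enables classical verification without a deploy.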

Common Migration Pitfalls

Header size limits are the most common surprise. ML-DSA-65 signatures are 3,309 bytes, which base64-encodes to 4,412 bytes. Some reverse proxies default to 4KB header limits. Nginx, for example, defaults large_client_header_buffers to four 8KB buffers, and each request header line must fit within a single buffer; for signed responses passing through an Nginx proxy, proxy_buffer_size (4KB or 8KB by default, depending on the platform) must hold the entire upstream response header block. If you have many existing headers, one 4.4KB signature header may push you over the limit. Audit your header budgets before deploying.
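A header budget audit can start as a few lines of arithmetic. The sketch below computes the padded base64 length of a signature and the HTTP/1.1 wire size of a set of headers, so the total can be compared against a proxy limit. The function names are hypothetical, and the per-header overhead of four bytes assumes the usual "name: value" form followed by CRLF.

```rust
/// Length of the padded base64 encoding of `raw_len` bytes.
pub fn base64_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Bytes one header occupies on the wire in HTTP/1.1: "name: value\r\n".
pub fn header_line_len(name: &str, value_len: usize) -> usize {
    name.len() + 2 + value_len + 2
}

/// Total wire size of a set of headers given as (name, value length) pairs.
pub fn total_header_bytes(headers: &[(&str, usize)]) -> usize {
    headers
        .iter()
        .map(|(name, value_len)| header_line_len(name, *value_len))
        .sum()
}

/// True when the headers fit inside the given limit.
pub fn fits_budget(headers: &[(&str, usize)], limit_bytes: usize) -> bool {
    total_header_bytes(headers) <= limit_bytes
}
```

For the three headers used in this guide, with an 8-character key identifier, the total is 4,480 bytes, which already exceeds a 4KB limit before any of your existing headers are counted.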

The second pitfall is canonicalization. If your API response includes fields that vary between serializations (floating-point precision, map key ordering, whitespace), the signature will not verify. Define a canonical serialization format and enforce it. JSON Canonicalization Scheme (JCS, RFC 8785) is one option. Another is to sign the raw bytes as they leave your server and verify the raw bytes as they arrive, never re-serializing between signing and verification.
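To make the map-ordering problem concrete, the sketch below serializes a flat set of string fields in sorted key order so that the signed bytes do not depend on insertion or iteration order. This is a deliberately simplified stand-in, not an implementation of RFC 8785: the "key=value" line format and the `canonical_bytes` name are assumptions, and the code assumes keys and values contain neither '=' nor newlines.

```rust
use std::collections::{BTreeMap, HashMap};

/// Serializes fields as "key=value\n" lines in sorted key order.
/// HashMap iteration order is unspecified, so the fields are first
/// moved into a BTreeMap, which iterates in ascending key order.
pub fn canonical_bytes(fields: &HashMap<String, String>) -> Vec<u8> {
    let sorted: BTreeMap<&str, &str> = fields
        .iter()
        .map(|(k, v)| (k.as_str(), v.as_str()))
        .collect();

    let mut out = Vec::new();
    for (key, value) in sorted {
        out.extend_from_slice(key.as_bytes());
        out.push(b'=');
        out.extend_from_slice(value.as_bytes());
        out.push(b'\n');
    }
    out
}
```

Two maps built in different orders produce identical bytes, so a signature computed by one side verifies on the other. For real JSON payloads, signing the raw bytes or using a full JCS implementation remains the safer choice.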

The third pitfall is certificate chain integration. If your PQ signatures need to chain to a root of trust, you will need a PQ certificate authority. Traditional X.509 CAs do not yet issue ML-DSA certificates at scale. H33-74 attestation sidesteps this problem entirely because the attestation is self-verifying against the H33 substrate rather than requiring a certificate chain.

Beyond Single Signatures: Batch Attestation

For high-throughput APIs, signing every individual response is wasteful. H33's batch signing pattern groups 32 operations into a single signing cycle. The BFV homomorphic encryption uses 4,096 SIMD slots, and the downstream ML-DSA-65 signature covers the entire batch at once. This amortizes the signing cost across 32 operations, which is why the per-authentication cost drops to 38 microseconds even though an individual ML-DSA-65 sign operation itself takes longer.

You can implement a similar batching pattern without H33 by collecting responses in a buffer, signing the batch, and distributing individual Merkle proofs to each response. The batch signature covers the Merkle root, and each individual response carries its Merkle branch as proof of inclusion. This pattern reduces signing overhead by a factor equal to the batch size while maintaining individual verifiability.
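The Merkle batching pattern can be sketched with the standard library as follows. Everything here is illustrative: the function names are hypothetical, an odd node is paired with itself as a simplification, and `toy_hash` is a non-cryptographic stand-in built on `DefaultHasher` purely so the example runs without external crates. A real deployment must use SHA-256 or SHA3-256, and would sign the returned root with ML-DSA-65.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in hash for illustration ONLY: 64 bits and not collision resistant.
fn toy_hash(data: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish().to_be_bytes()
}

// Leaves and interior nodes use different prefixes (domain separation)
fn leaf_hash(response: &[u8]) -> [u8; 8] {
    let mut buf = vec![0x00];
    buf.extend_from_slice(response);
    toy_hash(&buf)
}

fn node_hash(left: &[u8; 8], right: &[u8; 8]) -> [u8; 8] {
    let mut buf = vec![0x01];
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    toy_hash(&buf)
}

/// One step of an inclusion proof: a sibling hash and its side.
pub struct ProofStep {
    sibling: [u8; 8],
    sibling_is_left: bool,
}

/// Returns the Merkle root (the value to sign) and one proof per response.
pub fn build_batch(responses: &[&[u8]]) -> ([u8; 8], Vec<Vec<ProofStep>>) {
    assert!(!responses.is_empty(), "batch must not be empty");
    let mut level: Vec<[u8; 8]> = responses.iter().map(|r| leaf_hash(r)).collect();
    let mut proofs: Vec<Vec<ProofStep>> = (0..responses.len()).map(|_| Vec::new()).collect();
    // positions[i] is the index of response i's ancestor in the current level
    let mut positions: Vec<usize> = (0..responses.len()).collect();

    while level.len() > 1 {
        for (i, pos) in positions.iter_mut().enumerate() {
            let is_right = *pos % 2 == 1;
            // An unpaired last node uses itself as its sibling
            let sibling_idx = if is_right { *pos - 1 } else { (*pos + 1).min(level.len() - 1) };
            proofs[i].push(ProofStep { sibling: level[sibling_idx], sibling_is_left: is_right });
            *pos /= 2;
        }
        level = level
            .chunks(2)
            .map(|pair| node_hash(&pair[0], pair.get(1).unwrap_or(&pair[0])))
            .collect();
    }
    (level[0], proofs)
}

/// Recomputes the root from one response and its proof.
pub fn verify_inclusion(response: &[u8], proof: &[ProofStep], root: &[u8; 8]) -> bool {
    let mut acc = leaf_hash(response);
    for step in proof {
        acc = if step.sibling_is_left {
            node_hash(&step.sibling, &acc)
        } else {
            node_hash(&acc, &step.sibling)
        };
    }
    &acc == root
}
```

Each response then carries the batch signature, the root, and its own proof. The verifier checks the signature over the root once and runs `verify_inclusion` for the response it received, so the proof grows with the logarithm of the batch size rather than with the batch itself.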

Testing Your Integration

Post-quantum signature testing requires specific attention that classical signature testing does not. First, test with known answer tests (KATs) for FIPS 204, such as the ML-DSA test vectors NIST publishes through ACVP. KATs from the original CRYSTALS-Dilithium submission will not match, because the final standard changed details of the algorithm. These are fixed input/output pairs that validate your implementation produces correct signatures. Second, test cross-implementation verification: sign with one library and verify with another. If both produce and accept the same test vectors, your serialization is correct. Third, test performance under load. ML-DSA-65 signing allocates more memory than ECDSA signing, and garbage collection pauses (in languages that have GC) can cause latency spikes that do not appear in unit tests.

H33's testing suite runs over 3,700 tests across the full pipeline, including specific tests for header injection, key rotation overlap, canonicalization edge cases, and batch signing boundary conditions. The most revealing test is the round-trip integration test: generate a key pair, sign a payload, serialize the signature to a header, transmit through an actual HTTP stack, deserialize, and verify. This catches serialization bugs that unit tests miss.

What Comes Next

Adding ML-DSA-65 signatures to your API is the first step. The natural progression is to add a second signature family (FALCON or SLH-DSA) for defense in depth, then integrate H33-74 attestation to distill the multi-family bundle into 74 bytes. From there, you can layer in FHE for encrypted computation and STARK proofs for verifiable execution, building toward a fully post-quantum API stack.

The header injection pattern makes each step incremental. You never need to rewrite your API. You add middleware layers, each one strengthening the cryptographic guarantees without touching the business logic underneath. That is the right way to migrate: incrementally, verifiably, and on your own schedule rather than in a panic when the first cryptographically relevant quantum computer is announced.

Contact us at support@h33.ai if you need help planning your migration. We have done this across banking, healthcare, government, and AI compliance deployments, and the patterns are well-understood. The hardest part is not the cryptography. It is getting the infrastructure team to raise the header size limits.
