Tutorial · 5 min read

Optimizing for H33 Rate Limits:
Efficient API Usage

Strategies for staying within rate limits while maximizing throughput.


Update — February 2026

H33 has removed all per-tier rate limits. We decided not to discriminate against smaller teams — if you're paying for units, you should be able to use them however you want. Burst or steady, it's your call. The batching and caching strategies below are still useful for reducing unit consumption, but you no longer need to worry about 429s from rate limiting.

Why API Efficiency Still Matters

Even with per-tier rate limits removed, calling a cryptographic authentication API carelessly wastes compute, inflates your bill, and adds unnecessary latency to your user-facing flows. H33's production stack processes 1.595 million authentications per second at ~42µs per auth on Graviton4 hardware, but that throughput only benefits you if your client-side integration is designed to take advantage of it. A poorly structured integration can turn sub-millisecond server work into multi-second round trips simply through redundant calls, missing caches, and serial request chains.

This guide covers four strategies that reduce your unit consumption while maintaining the full security guarantees of BFV fully homomorphic encryption, zero-knowledge proof verification, and Dilithium post-quantum attestation. Every technique described here is in production at H33 itself.

1. Client-Side Result Caching

The most effective optimization is the simplest: do not re-verify what you have already verified. An H33 authentication response includes a signed attestation — a Dilithium signature over a SHA3-256 digest of the verification result. That attestation is cryptographically bound to a specific user, biometric template, and timestamp. Until the attestation expires (configurable, default 300 seconds), there is zero security benefit to repeating the call.

Rule of Thumb

Cache the full attestation object, not just a boolean. When downstream services need proof of verification, pass the signed attestation rather than making a second API call. The Dilithium signature is independently verifiable without contacting H33.

In practice, this means storing the attestation response in a short-lived cache keyed by user identifier. A simple TTL cache works well:

import { H33Client } from "@h33/sdk";

const client = new H33Client({ apiKey: process.env.H33_API_KEY });
const cache = new Map();

async function verifyUser(userId, biometric) {
  const cached = cache.get(userId);
  if (cached && cached.expiresAt > Date.now()) {
    return cached.attestation; // Signed Dilithium attestation
  }

  const result = await client.verify({ userId, biometric });
  cache.set(userId, {
    attestation: result.attestation,
    expiresAt: Date.now() + (result.ttl * 1000),
  });
  return result.attestation;
}

For server-side applications handling thousands of concurrent users, replace the in-memory Map with Redis or an equivalent shared store. The attestation payload is compact — typically under 4KB including the Dilithium signature — so storage overhead is negligible.
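
For the shared-store variant, the same TTL logic can sit behind a pluggable store interface, so swapping the in-memory Map for Redis touches only the constructor. A minimal sketch; the `AttestationCache` class and its method names are illustrative helpers, not part of @h33/sdk:

```javascript
// TTL cache with a pluggable store. The default in-memory Map can be
// replaced by any object exposing the same get/set interface, e.g. a
// thin Redis-backed adapter. (Illustrative sketch, not an SDK class.)
class AttestationCache {
  constructor(store = new Map()) {
    this.store = store;
  }

  // Returns the cached attestation, or null if absent or expired.
  get(userId) {
    const entry = this.store.get(userId);
    if (!entry || entry.expiresAt <= Date.now()) return null;
    return entry.attestation;
  }

  // Stores the attestation with the TTL reported by the API response.
  set(userId, attestation, ttlSeconds) {
    this.store.set(userId, {
      attestation,
      expiresAt: Date.now() + ttlSeconds * 1000,
    });
  }
}
```

With this shape, moving from a single process to a fleet of servers is a one-line change at construction time rather than a rewrite of the verification path.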

2. Request Batching

H33's internal architecture uses SIMD batching to process 32 users per ciphertext in a single BFV operation. The batch verification endpoint exposes this directly: instead of sending 32 individual requests, you send one batch request containing up to 32 user templates. The server packs them into a single ciphertext, runs one FHE inner product, one ZKP lookup, and one Dilithium attestation for the entire group.

Approach                      API Calls   Server Compute                Billed Units
32 individual requests        32          32 × ~42µs = ~1,344µs         32
1 batch request (32 users)    1           ~1,109µs (single BFV batch)   1

The savings are dramatic: one billed unit instead of thirty-two, with roughly equivalent server-side latency. Batch verification is constant-time for 1 to 32 users — filling the batch with fewer users does not reduce compute time, but it does reduce your call count and simplifies error handling.

// Batch verify up to 32 users in a single API call
const batchResult = await client.batchVerify({
  users: userBiometrics.slice(0, 32), // Max 32 per batch
});

// Each user gets an individual pass/fail result
// but the batch shares a single Dilithium attestation
for (const entry of batchResult.results) {
  console.log(`${entry.userId}: ${entry.verified}`);
}
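
When the user list exceeds 32, it needs to be split before calling the endpoint. A small helper for that, assuming the 32-user batch maximum described above (`chunkUsers` is an illustrative name, not an SDK function):

```javascript
// Split an arbitrary-length user list into batches of at most 32
// entries, the maximum the batch verification endpoint accepts.
function chunkUsers(users, batchSize = 32) {
  const batches = [];
  for (let i = 0; i < users.length; i += batchSize) {
    batches.push(users.slice(i, i + batchSize));
  }
  return batches;
}
```

Each resulting batch can then be passed to `batchVerify` as in the example above; since compute per batch is constant, filling batches as full as possible minimizes billed units.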

3. Exponential Backoff with Jitter

Although H33 no longer enforces per-tier rate limits, transient failures still occur — network timeouts, load balancer hiccups, brief maintenance windows. A retry strategy without backoff can amplify these into cascading failures. The standard approach is exponential backoff with full jitter:

async function verifyWithRetry(userId, biometric, maxRetries = 4) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await client.verify({ userId, biometric });
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Full jitter: random delay in [0, baseDelay * 2^attempt]
      const baseDelay = 100; // milliseconds
      const maxDelay = baseDelay * Math.pow(2, attempt);
      const jitter = Math.random() * maxDelay;
      await new Promise(r => setTimeout(r, jitter));
    }
  }
}

Full jitter (as opposed to equal jitter or decorrelated jitter) spreads retries across the entire delay window, which prevents the "thundering herd" effect where many clients retry at exactly the same moment. For H33's workloads, a base delay of 100ms with four retries gives delay ceilings of 100, 200, 400, and 800ms, a worst-case cumulative backoff of roughly 1.5 seconds, well within acceptable UX thresholds for authentication flows.
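
The delay calculation embedded in `verifyWithRetry` can be pulled out and compared against the equal-jitter variant mentioned above, which floors each delay at half the window. Both helpers below are illustrative sketches, not SDK functions:

```javascript
// Full jitter: uniform random delay in [0, base * 2^attempt] ms.
// Spreads retries across the whole window.
function fullJitterDelay(attempt, baseMs = 100) {
  const ceiling = baseMs * 2 ** attempt;
  return Math.random() * ceiling;
}

// Equal jitter: delay in [ceiling/2, ceiling] ms. Guarantees a
// minimum wait, but clusters retries in the upper half of the window.
function equalJitterDelay(attempt, baseMs = 100) {
  const ceiling = baseMs * 2 ** attempt;
  return ceiling / 2 + Math.random() * (ceiling / 2);
}
```

The wider spread of full jitter is what breaks up synchronized retry waves when many clients fail at once.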

4. ZKP Cache Awareness

Internally, H33 uses an in-process DashMap for ZKP proof caching, achieving 0.085µs per lookup — 44 times faster than recomputing raw STARK proofs. The API exposes this indirectly: repeated verifications of the same user within a short window benefit from warm ZKP caches on the server side. You cannot control this cache directly, but you can structure your requests to take advantage of it.

If your application re-verifies the same user population cyclically (for example, a workforce badge system checking the same 500 employees every shift), those verifications will consistently hit warm ZKP cache entries on H33's servers. The first cycle populates the cache; subsequent cycles are faster.

This means you should avoid randomizing user verification order unnecessarily. If you verify users A through Z in order, keep that order stable across cycles. Cache locality on the server side translates directly into lower latency for your requests.
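
One way to keep the order stable is to sort the population deterministically once and reuse that ordering every cycle. A minimal sketch, assuming each user record carries a `userId` string (`stableOrder` is an illustrative helper):

```javascript
// Deterministic ordering: sort a copy of the user list by userId so
// every verification cycle issues requests in the same sequence,
// lining repeat requests up with warm server-side ZKP cache entries.
function stableOrder(users) {
  return [...users].sort((a, b) => a.userId.localeCompare(b.userId));
}
```

Sorting a copy leaves the caller's array untouched, so the helper can be dropped in front of any existing verification loop.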

Putting It All Together

The optimal integration pattern combines all four strategies: cache attestations client-side to eliminate redundant calls, batch concurrent verifications into groups of up to 32, apply exponential backoff with jitter for transient failures, and maintain stable request ordering to benefit from server-side ZKP caches. For a typical deployment verifying 10,000 users per hour, batching alone reduces API calls from 10,000 to roughly 313 batch requests (⌈10,000 / 32⌉), a 97% reduction in billed units, and caching cuts the total further.
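
Caching and batching compose naturally: filter out users whose cached attestation is still live, then chunk the remainder into batches of 32. A sketch assuming cache entries shaped like `{ attestation, expiresAt }`, as in the caching example earlier (`planBatches` is illustrative, not an SDK function):

```javascript
// Combine client-side caching with batching: skip users whose cached
// attestation has not expired, then group the rest into batches of at
// most 32 for the batch verification endpoint.
function planBatches(userIds, cache, batchSize = 32) {
  const now = Date.now();
  const uncached = userIds.filter((id) => {
    const entry = cache.get(id);
    return !entry || entry.expiresAt <= now;
  });
  const batches = [];
  for (let i = 0; i < uncached.length; i += batchSize) {
    batches.push(uncached.slice(i, i + batchSize));
  }
  return batches;
}
```

Each planned batch then maps to exactly one billed unit, which is what produces the order-of-magnitude reduction described above.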

Performance Summary

H33 sustains roughly 1.6M auth/sec at ~42µs per auth (see benchmarks). Each batch processes 32 users in ~1,109µs using BFV FHE inner products, with ZKP lookups at 0.085µs via DashMap and a single Dilithium sign+verify for attestation at ~244µs. Your client-side optimizations determine how efficiently you consume that capacity.

The H33 SDK (@h33/sdk for Node.js, h33-rs for Rust) implements caching and batching out of the box. If you are building a custom integration against the REST API directly, the patterns above are the minimum recommended baseline. For further optimization guidance specific to your deployment, reach out to the H33 engineering team or consult the API documentation.

Ready to Go Quantum-Secure?

Start protecting your users with post-quantum authentication today. 1,000 free auths, no credit card required.

Get Free API Key →
