Today we're publishing measured TFHE throughput numbers from a production Graviton4 deployment. Not projections. Not single-gate extrapolations. Real encrypted comparison and equality circuits, running across 96 parallel channels, sustained for 30 seconds each, on a single ARM node with no GPU.
These numbers matter because encrypted comparison is the primitive that turns FHE from a research curiosity into a deployable product. "Is the fraud score above the threshold?" "Does the encrypted credit score qualify?" "Is this transaction amount within the approved range?" Every one of those questions is a comparison on encrypted data, and every one of them calls for TFHE, because BFV and CKKS, the other two FHE families in common use, natively evaluate only additions and multiplications. An exact comparison is not a low-degree polynomial, so those schemes can at best approximate it at high cost. They can add and multiply. They cannot branch.
Hardware: AWS c8g.metal-48xl (Graviton4, 192 vCPUs, 371 GiB). Single node, no GPU, no cluster. CACHEE_MODE=inprocess for zero-latency cache lookup.
| Operation | Bit Width | TPS (96 channels) | Per-Channel Latency |
|---|---|---|---|
| Greater-Than | 8-bit | 768 | 125 ms |
| Greater-Than | 16-bit | 372 | 258 ms |
| Greater-Than | 32-bit | 182 | 526 ms |
| Greater-Than | 64-bit | 91 | 1,058 ms |
| Equality | 16-bit | 769 | 125 ms |
The fundamental gate throughput is 11,520 AND gates per second across all 96 channels, sustained. Every throughput figure in the table above agrees, to within 1 TPS, with this single constant divided by the circuit's AND gate count. The gate is the atom; everything else is chemistry.
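The relationship between the gate constant and the table can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions, not part of H33's stack, and the gate-count formulas (2n - 1 for greater-than, n - 1 for equality) are the ones given later in this post.

```python
# Illustrative sketch; names are hypothetical, figures come from this post.
GATES_PER_SEC = 11_520  # sustained AND gates per second across 96 channels


def and_gates(op: str, bits: int) -> int:
    """AND gates per circuit: 2n - 1 for greater-than, n - 1 for equality."""
    if op == "gt":
        return 2 * bits - 1
    if op == "eq":
        return bits - 1
    raise ValueError(f"unknown operation: {op}")


def predicted_tps(op: str, bits: int) -> float:
    """Throughput predicted from the gate constant alone."""
    return GATES_PER_SEC / and_gates(op, bits)


# Published throughput from the table above.
PUBLISHED = {
    ("gt", 8): 768,
    ("gt", 16): 372,
    ("gt", 32): 182,
    ("gt", 64): 91,
    ("eq", 16): 769,
}

for (op, bits), tps in PUBLISHED.items():
    print(f"{op} {bits:>2}-bit: predicted {predicted_tps(op, bits):6.1f}, published {tps}")
```

Each predicted value lands within 1 TPS of the published figure, which is what you would expect if the bootstrap cost dominates everything else in the circuit.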
A fraud score is typically 0–255 (8-bit). "Is this transaction's encrypted fraud score above the cutoff?" is a single 8-bit comparison: 768 decisions per second on encrypted data, on one node. For a bank processing 50M transactions per day, that's ~579 TPS average — a single Graviton4 node handles it with headroom.
Credit scores range 300–850 (10-bit, but evaluated as 16-bit). "Does the applicant's encrypted score meet the threshold?" is a 16-bit comparison at 372 TPS. A large lender doing 100K applications per day needs ~1.2 TPS average. A single node handles roughly 300x that volume.
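The headroom figures in the two examples above come from dividing daily volume by the number of seconds in a day. A minimal sketch of that arithmetic, with hypothetical helper names, assuming load is spread evenly over 24 hours:

```python
# Illustrative sketch; helper names are hypothetical.
SECONDS_PER_DAY = 86_400


def average_tps(events_per_day: int) -> float:
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


def headroom(node_tps: float, events_per_day: int) -> float:
    """How many times the average load a single node can absorb."""
    return node_tps / average_tps(events_per_day)


fraud_avg = average_tps(50_000_000)          # bank with 50M transactions per day
credit_avg = average_tps(100_000)            # lender with 100K applications per day
fraud_headroom = headroom(768, 50_000_000)   # 8-bit comparison throughput
credit_headroom = headroom(372, 100_000)     # 16-bit comparison throughput

print(f"fraud:  {fraud_avg:.1f} TPS average, {fraud_headroom:.2f}x headroom")
print(f"credit: {credit_avg:.2f} TPS average, {credit_headroom:.0f}x headroom")
```

These are daily averages; sizing against peak-hour traffic would need the peak rate rather than the mean.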
"Is the encrypted transaction amount above $10,000?" expressed in cents is a 32-bit comparison. At 182 TPS, a compliance engine can evaluate every transaction from a mid-size bank's daily volume on a single node.
"Does the encrypted SSN fragment match the reference?" is a 16-bit equality test. Equality is structurally simpler than comparison — it uses an AND tree instead of a ripple chain — and runs at 769 TPS. This is the primitive for encrypted identity matching, cross-bank fraud detection, and KYC attribute verification without exposing the underlying data.
The TFHE landscape is GPU-dominated. Zama's TFHE-rs achieves sub-millisecond gate latency on NVIDIA H100 hardware. That's impressive, and we respect the engineering. But GPU deployment has costs that don't appear in the benchmark:
Our approach is different: run TFHE on ARM CPUs with massive channel parallelism. Graviton4's 192 vCPUs give us 96 independent TFHE channels, each running its own bootstrap pipeline. The per-gate latency (8.3 ms) is higher than GPU, but the per-node cost is lower and the deployment model is simpler. For workloads where you need hundreds of encrypted decisions per second, not millions, the CPU path is the right economic choice.
H33's FHE stack doesn't force you to choose between BFV and TFHE. The FHE-IQ router makes the decision automatically based on the operation you're performing.
The routing rule is clean:
- Is the operation polynomial (add, multiply, inner product)? → BFV (35 µs per auth, 2.2M auth/sec)
- Is the operation non-polynomial (compare, branch, match, sort)? → TFHE (125 ms per 8-bit comparison, 768 TPS)
In a typical fraud-detection pipeline, the score computation (weighted sum of features) runs on BFV at microsecond latency. The threshold decision ("is the score above the cutoff?") routes to TFHE. The handoff is invisible to the caller — the IQ router inspects the computation graph and selects the optimal engine for each node.
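To make the routing rule concrete, here is a toy sketch of per-node engine selection. It is not the FHE-IQ router's implementation or API: the `Engine` enum, the operation names, and the `route_graph` helper are all hypothetical, and a flat list of named operations stands in for a real computation graph.

```python
# Toy sketch of the routing rule; all names here are hypothetical.
from enum import Enum


class Engine(Enum):
    BFV = "bfv"
    TFHE = "tfhe"


# Operation classes taken from the routing rule in the text.
POLYNOMIAL_OPS = {"add", "multiply", "inner_product"}
NON_POLYNOMIAL_OPS = {"compare", "branch", "match", "sort"}


def route(op: str) -> Engine:
    """Pick an engine for one operation: polynomial -> BFV, otherwise TFHE."""
    if op in POLYNOMIAL_OPS:
        return Engine.BFV
    if op in NON_POLYNOMIAL_OPS:
        return Engine.TFHE
    raise ValueError(f"unclassified operation: {op}")


def route_graph(nodes: list[tuple[str, str]]) -> dict[str, Engine]:
    """Assign an engine to every (node_name, operation) pair."""
    return {name: route(op) for name, op in nodes}


# Fraud pipeline from the text: weighted sum on BFV, threshold check on TFHE.
pipeline = [("score", "inner_product"), ("threshold", "compare")]
plan = route_graph(pipeline)
print(plan)
```

Unclassified operations raise an error here rather than falling back to a default engine, which is one possible design choice for a sketch like this.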
This is not a theoretical architecture. It's running in production with 142 routing tests covering 100 realistic scenarios across banking, healthcare, legal, cybersecurity, insurance, IoT, and governance workloads. Every non-polynomial operation routes to TFHE. Every polynomial operation stays on BFV. No manual engine selection required.
A quick note on how encrypted comparison actually works, because the gate counts matter for understanding the numbers.
An n-bit greater-than comparison uses a ripple comparator that propagates a "greater" flag from LSB to MSB. Each bit position requires 2 AND gates (one for "a > b at this bit" and one for "propagate previous result through equality"), except the least significant bit, which has no previous result to propagate and needs only the first. Total: 2n - 1 AND gates. XOR and NOT operations are free in TFHE: they are direct LWE additions and negations with no bootstrap required.
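The gate count can be verified on plaintext bits. The sketch below models the ripple comparator described above and counts AND gates as it goes; it is one plausible construction that reaches 2n - 1, not necessarily the exact circuit in H33's stack, and the `GateCounter` class is a hypothetical helper. In a real TFHE evaluation every bit would be a ciphertext and every counted AND would be a bootstrap.

```python
# Plaintext model of a ripple greater-than comparator; names are hypothetical.
class GateCounter:
    """Counts AND gates, the only operation that would need a bootstrap."""

    def __init__(self) -> None:
        self.and_count = 0

    def AND(self, x: int, y: int) -> int:
        self.and_count += 1
        return x & y


def greater_than(a: int, b: int, bits: int, gc: GateCounter) -> int:
    """Return 1 if a > b (unsigned, `bits` wide), rippling from LSB to MSB."""
    greater = 0
    for i in range(bits):
        ai = (a >> i) & 1  # plain bit extraction, not a counted gate
        bi = (b >> i) & 1
        gt_here = gc.AND(ai, bi ^ 1)  # a_i = 1 and b_i = 0 (NOT is free)
        if i == 0:
            greater = gt_here  # LSB: nothing to propagate yet
        else:
            eq_here = ai ^ bi ^ 1  # XNOR, free
            keep = gc.AND(eq_here, greater)  # carry the lower-bit result
            # gt_here and keep are never both 1, so XOR acts as OR here.
            greater = gt_here ^ keep
    return greater


gc = GateCounter()
result = greater_than(200, 199, 8, gc)
print(result, gc.and_count)
```

The final OR is replaced by XOR because the two terms are mutually exclusive (a bit position cannot be both "greater" and "equal"), which keeps the per-bit cost at two ANDs.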
An n-bit equality test first computes per-bit XNOR (free: XOR + NOT), then reduces the n equality bits via a binary AND tree. Total: n - 1 AND gates. This is why 16-bit equality (15 AND gates) runs at the same speed as 8-bit comparison (15 AND gates) — same gate count, same throughput.
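The equality circuit can be modeled the same way. This is a plaintext sketch under the description above (per-bit XNOR, then a pairwise AND reduction); the function name and return shape are assumptions made for illustration.

```python
# Plaintext model of equality via XNOR plus a binary AND tree.
def equals(a: int, b: int, bits: int) -> tuple[int, int]:
    """Return (equal_bit, and_gate_count) for two unsigned `bits`-wide values."""
    # Per-bit XNOR: 1 where the bits agree. XOR and NOT are free in TFHE.
    layer = [((a >> i) & 1) ^ ((b >> i) & 1) ^ 1 for i in range(bits)]
    and_count = 0
    # Reduce pairwise until one bit remains.
    while len(layer) > 1:
        reduced = []
        for j in range(0, len(layer) - 1, 2):
            reduced.append(layer[j] & layer[j + 1])  # one AND gate (one bootstrap)
            and_count += 1
        if len(layer) % 2:
            reduced.append(layer[-1])  # odd element passes through unchanged
        layer = reduced
    return layer[0], and_count


print(equals(0x1234, 0x1234, 16))
print(equals(0x1234, 0x1235, 16))
```

For 16 bits the tree uses 8 + 4 + 2 + 1 = 15 AND gates, matching the 15 gates of the 8-bit comparator.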
Every AND gate requires a programmable bootstrap, the operation that refreshes noise and evaluates the gate function. At 8.3 ms per bootstrap on Graviton4, this is the fundamental throughput limiter. XOR and NOT are noise-accumulating but free; AND is the expensive reset.

SHA3-256 requires 38,400 AND gates across 24 Keccak rounds, putting it firmly outside TFHE's performance envelope on any current hardware. We measured 0.30 TPS. We are not claiming SHA3-under-TFHE is production-viable. It isn't.
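The SHA3 figure follows from the same gate arithmetic. The sketch below assumes a single Keccak-f[1600] permutation per hash (a message that fits in one block) and that only the chi step needs AND gates, one per state bit per round, with the other steps built from XORs, rotations, and bit permutations. Variable names are illustrative.

```python
# Gate arithmetic for SHA3-256 under TFHE; assumes one Keccak-f[1600] call.
KECCAK_STATE_BITS = 5 * 5 * 64  # 1600-bit state
KECCAK_ROUNDS = 24
GATES_PER_SEC = 11_520          # sustained AND gates per second, 96 channels
BOOTSTRAP_MS = 8.3              # per-gate latency from this post

# Chi computes a[x] ^= (~a[x+1]) & a[x+2] for every state bit: one AND each.
and_per_round = KECCAK_STATE_BITS
sha3_and_gates = and_per_round * KECCAK_ROUNDS

sha3_tps = GATES_PER_SEC / sha3_and_gates

# If one channel evaluated all gates one after another:
serial_seconds = sha3_and_gates * BOOTSTRAP_MS / 1000

print(sha3_and_gates, round(sha3_tps, 2), round(serial_seconds, 1))
```

At roughly five minutes of serial bootstrapping per hash on a single channel, the 0.30 TPS measurement is in line with what the gate count predicts.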
Every TFHE operation in H33's stack rests on a post-quantum hardness assumption. TFHE is lattice-based (Learning With Errors over the torus), so it draws on the same family of lattice problems that protects BFV, which uses the ring variant of LWE. The comparison result, the single encrypted bit that says "yes, the score is above the threshold", can be attested via our three-family post-quantum signature bundle and committed to the H33-74 substrate.
The full pipeline for an encrypted threshold decision:
The input value is never exposed. The threshold is never exposed. Only the yes/no decision crosses the encryption boundary, and it's immediately signed under three independent post-quantum assumptions.
We believe in reproducible benchmarks. Here's exactly how these numbers were produced:
target-cpu=neoverse-v2

The benchmark source is available for audit. We do not publish numbers we cannot reproduce.
Every comparison width in the table maps to a specific class of real-world encrypted decision. The doubling curve (768, 372, 182, 91) follows the gate count: an n-bit comparison needs 2n - 1 AND gates, so each doubling of width roughly doubles the gate count and halves throughput. That means you can predict the cost of any integer-width comparison without running another benchmark. 128-bit (255 AND gates) would be approximately 45 TPS, 256-bit (511 AND gates) approximately 23 TPS.
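The extrapolation to wider integers can be written as a small estimator. This is a sketch built only on figures from this post (11,520 AND gates per second, 96 channels, 2n - 1 gates per comparison); the function name is hypothetical, and the outputs for widths beyond 64 bits are projections, not measurements.

```python
# Projection of greater-than cost for arbitrary widths; not a measurement.
GATES_PER_SEC = 11_520
CHANNELS = 96


def gt_estimate(bits: int) -> tuple[float, float]:
    """Return (node TPS, per-channel latency in ms) for an n-bit greater-than."""
    gates = 2 * bits - 1
    tps = GATES_PER_SEC / gates
    # Each channel completes one comparison per latency period.
    latency_ms = CHANNELS / tps * 1000
    return tps, latency_ms


for width in (128, 256):
    tps, latency_ms = gt_estimate(width)
    print(f"{width}-bit: ~{tps:.0f} TPS, ~{latency_ms:.0f} ms per channel")
```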
The 8-bit/16-bit cluster is the deployable core. Fraud scores (0–255), credit risk bands (below 40, 40–70, above 70), eligibility flags, and categorical thresholds all fit in 8 or 16 bits. Cross-bank fraud matching ("does this encrypted account identifier match any entry in the encrypted watchlist?") reduces to one 16-bit equality test per watchlist entry, at 769 tests per second per node. Medical eligibility checks, insurance qualification thresholds, and compliance zone transitions are all 8-bit comparisons at 768 TPS.
These are not hypothetical use cases. They are the exact operations that regulated industries need to perform on data they are not allowed to see in plaintext. A bank evaluating fraud risk on an encrypted transaction, a hospital checking encrypted insurance eligibility, an insurer comparing an encrypted claim amount to a coverage threshold — each of these is a single TFHE comparison that returns an encrypted yes/no without ever exposing the underlying value.
Transaction amounts in cents fit in a signed 32-bit integer up to about $21.4 million (2^31 - 1 cents). "Is this encrypted wire transfer above the reporting threshold?" is a 32-bit comparison at 182 TPS, more than sufficient for any single institution's compliance pipeline. Encrypted timestamp comparisons for session-age validation, rate limiting on encrypted request counts, and full-precision credit score evaluation also land here.
The 64-bit width covers full Unix timestamps (millisecond precision), monetary amounts in micro-units (supporting sub-cent precision for high-frequency settlement), and encrypted indexed lookups on 64-bit keys. At 91 TPS, a single node handles over 7.8 million encrypted timestamp comparisons per day. For temporal access control ("has the encrypted session token expired?"), this is more than enough throughput.
These numbers establish TFHE as a production-viable engine for encrypted decision-making on CPU infrastructure. The IQ router makes it invisible — callers submit computations, the router picks the right engine, results come back attested.
Three areas we're investing in next:
The BFV pipeline handles arithmetic at 2.2 million operations per second. The TFHE pipeline handles decisions at hundreds per second. Together, through the IQ router, they cover the full spectrum of encrypted computation — from high-throughput batch processing to precise threshold logic — on commodity ARM hardware, post-quantum secured, no GPU required.
Patent pending. H33-74 substrate technology. All benchmarks measured on AWS c8g.metal-48xl Graviton4, April 2026.
© 2026 H33.ai, Inc. All rights reserved.