We Had 5 Blind Spots. Now We Have Zero.
H33 runs 4 FHE engines—BFV-128, BFV-256, CKKS, and BFV-32—with automatic routing via FHE-IQ. That covers 95% of encrypted computation workloads. Polynomial arithmetic on integers, approximate arithmetic on floats, SIMD batching for throughput, high-security parameters for sovereign data. The engine selection is automatic: you describe the computation, FHE-IQ picks the scheme, parameters, and optimization path.
But there were 5 things we couldn't do—or couldn't do well. Five gaps where a customer could reasonably say "I need this, and H33 can't deliver it." We fixed all of them in one sprint.
- Real-time FHE
- Encrypted ML
- Logic Gates
- Multi-Party
- Unlimited Depth
Here's what we built and why each one matters.
P1: Streaming Mode
Adaptive Batch Sizer for Homomorphic Encryption Streaming
The problem: BFV batches 32 users per ciphertext for maximum throughput. Our SIMD batching packs 4,096 slots into a single ciphertext—128 dimensions per user, 32 users per batch. That's the architecture that delivers 2.17 million authentications per second on a single Graviton4 instance. But batching means waiting. You can't process user 1 until users 2 through 32 arrive. For real-time use cases—live video analysis, sensor streams, real-time fraud scoring—that batching delay is unacceptable.
What we built: An adaptive batch sizer that trades throughput for latency dynamically. In streaming mode, batch size drops to 1—process immediately on arrival. As arrival rate increases, the batcher scales up to 32 automatically. The algorithm monitors the incoming request queue depth and adjusts the batch window in real time. One API flag: streaming: true.
The trade-off is explicit and measurable:
| Mode | Batch Size | Latency | Throughput |
|---|---|---|---|
| Standard | 32 | 2ms + 939µs | 2.17M/sec |
| Streaming | 1 | 0ms + 939µs | ~67K/sec |
| Adaptive | 1–32 | 0–2ms + 939µs | 67K–2.17M/sec |
The FHE cost (~939µs per batch) is constant regardless of how many users are packed into the ciphertext. Streaming mode pays the same computational cost for fewer results per batch. But for a live fraud scoring pipeline that needs answers in under 1ms, that's the right trade. The adaptive mode is the sweet spot for most production deployments: it starts in streaming mode with single-user processing, detects when arrival rates are high enough to justify batching, and scales the batch window up automatically. No configuration tuning required.
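The adaptive policy can be sketched in a few lines of Python. This is an illustrative toy, not H33's implementation: the class name and the queue-depth heuristic are our assumptions, and a production batcher would also run a batch-window timer rather than sizing purely from instantaneous queue depth.

```python
from collections import deque

class AdaptiveBatcher:
    """Toy model of an adaptive batch sizer: batch size tracks queue depth,
    clamped to the 32-user ciphertext capacity (4096 slots / 128 dims)."""
    MAX_BATCH = 32

    def __init__(self):
        self.queue = deque()

    def submit(self, request):
        self.queue.append(request)

    def next_batch(self):
        """Take min(queue depth, 32) requests: depth 1 behaves like
        streaming mode, depth >= 32 like standard batching."""
        size = min(len(self.queue), self.MAX_BATCH)
        return [self.queue.popleft() for _ in range(size)]

b = AdaptiveBatcher()
b.submit("user-1")
print(len(b.next_batch()))      # low arrival rate -> batch of 1 (streaming)

for i in range(100):
    b.submit(f"user-{i}")
print(len(b.next_batch()))      # deep queue -> full batch of 32 (standard)
```

Because the ~939µs FHE cost is per batch, this policy pays full price per result at low load and amortizes it 32 ways at high load, exactly the latency/throughput trade in the table above.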
This is the first production implementation of real-time homomorphic encryption streaming that we're aware of. Every other FHE system we've tested assumes batch processing. The gap between "encrypted computation" and "real-time encrypted computation" was an industry-wide blind spot.
P2: CKKS Activation Functions
Chebyshev Polynomial Approximations for FHE Activation Functions in ML
The problem: Neural network inference on encrypted data requires activation functions—sigmoid, ReLU, tanh, GELU. These are non-polynomial functions. CKKS can only compute polynomials natively. The gap: we could encrypt data, multiply encrypted matrices, and accumulate encrypted sums, but we couldn't run a single neural network layer because the activation step was impossible.
What we built: A Chebyshev polynomial approximation library that computes any activation function on CKKS ciphertexts. Configurable degree (7 to 31) trades accuracy for multiplicative depth. Auto-degree selection from the declared input range—H33-Compile uses this at compilation time to pick the minimum degree that satisfies the target accuracy.
The key innovation is encrypted input clamping. Before applying the polynomial approximation, we clamp the encrypted value to the valid approximation range. This costs one encrypted comparison circuit (depth +1) but guarantees the Chebyshev polynomial never receives out-of-range inputs. Without clamping, a single outlier value produces a wildly wrong polynomial evaluation—and because the data is encrypted, you can't inspect it to catch the error. Silent wrong answers from outlier data are worse than no answer at all.
Supported activation functions: sigmoid, ReLU (piecewise approximation), tanh, GELU (for transformer architectures), softmax (component-wise), and custom functions via user-provided Chebyshev coefficients. Evaluation uses Clenshaw's algorithm for numerical stability—the recurrence relation avoids catastrophic cancellation that direct polynomial evaluation would cause at high degrees.
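The pipeline above (fit Chebyshev coefficients over a declared input range, clamp, evaluate with Clenshaw's recurrence) can be demonstrated on plaintext values. This is a sketch, not H33's code: the degree and range are illustrative choices, and on CKKS ciphertexts the same recurrence would run with homomorphic additions and multiplications, with clamping done by an encrypted comparison rather than `np.clip`.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def clenshaw(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) via Clenshaw's recurrence.
    Uses only additions and multiplications, the operations a CKKS
    ciphertext supports, and avoids the cancellation of direct evaluation."""
    b1 = b2 = 0.0
    for c in coeffs[:0:-1]:             # k = n .. 1
        b1, b2 = c + 2 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
LO, HI, DEG = -6.0, 6.0, 21             # declared input range and degree

# Fit Chebyshev coefficients of sigmoid over [LO, HI], mapped to [-1, 1].
coeffs = C.chebinterpolate(
    lambda t: sigmoid(t * (HI - LO) / 2 + (HI + LO) / 2), DEG)

def approx_sigmoid(v):
    v = np.clip(v, LO, HI)              # the clamping step from the text
    t = (2 * v - LO - HI) / (HI - LO)   # map back to [-1, 1]
    return clenshaw(coeffs, t)

xs = np.linspace(LO, HI, 1000)
err = max(abs(approx_sigmoid(v) - sigmoid(v)) for v in xs)
print(f"max error at degree {DEG}: {err:.2e}")
```

Raising `DEG` toward 31 shrinks the error geometrically but spends more multiplicative depth, which is exactly the accuracy-versus-depth trade the auto-degree selector resolves at compile time.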
With this module, H33-CKKS can run encrypted inference on any feedforward, convolutional, or transformer neural network. The depth cost scales with the number of layers times the activation degree, and BFV bootstrapping (P5 below) removes the depth ceiling entirely when needed. FHE activation functions for ML inference are no longer a research problem—they're a production capability.
P3: Boolean Evaluator
Boolean Gates on BFV Encrypted Bits with Comparison Circuits
The problem: BFV does polynomial arithmetic—add, multiply, rotate. It doesn't natively do boolean logic: "if bit 3 is set AND bit 7 is clear." Compliance rules are boolean: "Is this person on the sanctions list AND the transfer exceeds $10K AND the destination is a restricted country?" We could do the encrypted math but couldn't express the encrypted logic.
What we built: A complete boolean gate library operating on BFV ciphertexts encrypting {0, 1} values, plus encrypted comparison circuits for integer operands, plus pre-compiled compliance templates.
The four fundamental gates, expressed as polynomial operations on encrypted bits in {0, 1}:
- NOT(a) = 1 − a
- AND(a, b) = a · b
- OR(a, b) = a + b − a · b
- XOR(a, b) = a + b − 2 · a · b
NOT is a free subtraction; AND, OR, and XOR each cost one ciphertext multiplication.
Multi-input gates use balanced binary trees: AND(a, b, c, d, ..., n) evaluates at multiplicative depth ⌈log₂ n⌉ instead of n − 1 for a linear chain. This matters for compliance chains that evaluate 10 or 20 conditions simultaneously: a linear chain would blow through the noise budget, while the tree keeps depth manageable.
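The gate identities and the tree reduction can be simulated on plaintext bits. Over encrypted bits every `*` below would be a ciphertext multiplication costing one level of depth; the helper names are ours, not H33's API.

```python
from math import ceil, log2

# Polynomial encodings of the four gates for bits in {0, 1}:
NOT = lambda a: 1 - a                   # free: no multiplication
AND = lambda a, b: a * b                # one ciphertext multiplication
OR  = lambda a, b: a + b - a * b
XOR = lambda a, b: a + b - 2 * a * b

def tree_and(bits):
    """Multi-input AND as a balanced binary tree: ceil(log2 n) levels of
    multiplication instead of n - 1 in a linear chain."""
    while len(bits) > 1:
        paired = [AND(a, b) for a, b in zip(bits[0::2], bits[1::2])]
        if len(bits) % 2:               # odd element carries to next level
            paired.append(bits[-1])
        bits = paired
    return bits[0]

# A 20-condition compliance chain needs depth ceil(log2 20) = 5, not 19.
print(tree_and([1] * 20), ceil(log2(20)))   # prints "1 5"
```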
Encrypted comparison circuits extend the boolean gates to integer operands: equality, greater-than, less-than, range check (is x in [min, max]?), list membership (is x in the set?), and threshold gates (do at least k of n conditions hold?). These use bit decomposition to extract individual encrypted bits from an encrypted integer via the modular inverse of 2^i modulo the plaintext modulus t. Each bit becomes its own ciphertext that the boolean gates operate on.
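Two of those circuits, equality and list membership, can be sketched on plaintext bits using only the polynomial gate identities. The decomposition here uses plain shifts for clarity; the encrypted version extracts bits homomorphically as described above, and every arithmetic operation below would be a ciphertext operation.

```python
def to_bits(x, width=8):
    """Bit decomposition: in the encrypted setting each entry would be
    its own ciphertext extracted from the packed integer."""
    return [(x >> i) & 1 for i in range(width)]

def bit_eq(a_bits, b_bits):
    """Equality circuit: XNOR each bit pair, then AND-reduce.
    XNOR(a, b) = 1 - a - b + 2ab, a polynomial over {0, 1} bits."""
    xnors = [1 - a - b + 2 * a * b for a, b in zip(a_bits, b_bits)]
    out = 1
    for bit in xnors:       # linear chain for brevity; a real circuit
        out *= bit          # would use the log-depth tree reduction
    return out

def member(x_bits, plain_set, width=8):
    """List membership: OR over equality checks against each set element."""
    out = 0
    for s in plain_set:
        e = bit_eq(x_bits, to_bits(s, width))
        out = out + e - out * e          # OR(out, e)
    return out

print(bit_eq(to_bits(42), to_bits(42)))     # 1
print(member(to_bits(7), {3, 7, 11}))       # 1
print(member(to_bits(8), {3, 7, 11}))       # 0
```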
Pre-compiled compliance templates combine these primitives into ready-to-use encrypted decision circuits: range check templates for AML transaction thresholds, sanctions list membership for OFAC screening, and multi-condition chains for composite compliance rules like (A AND B) OR (C AND D). The templates are parameterized—you provide the encrypted data and the plaintext thresholds, and the circuit evaluates entirely on encrypted inputs.
P4: Proxy Re-Encryption
Proxy Re-Encryption for Multi-Party FHE Computation
The problem: BFV operates on data encrypted under one key. Two banks sharing encrypted fraud signals can't compute on each other's ciphertexts because they use different keys. We had H33-MPC for threshold signing but no way to transform ciphertexts between keys without decrypting. The data had to be decrypted, transferred, and re-encrypted—defeating the entire purpose of homomorphic encryption in multi-party scenarios.
What we built: Proxy re-encryption that transforms EncA(m) into EncB(m) without any party seeing m. The algebra is identical to our existing Galois key-switching—same digit decomposition, same polynomial multiplication—but targeting a different party's key instead of a rotated version of the same key.
The protocol works in three steps. First, Party A and Party B jointly generate a re-encryption key rkA→B. Neither party shares their secret key during this process—the re-encryption key is constructed from public components using a two-round protocol. Second, a proxy (H33's server, or any untrusted intermediary) applies rkA→B to transform ciphertexts encrypted under A's key into ciphertexts encrypted under B's key. Third, Party B decrypts with their own secret key and gets the original plaintext.
The proxy never sees plaintext at any point in this process. The re-encryption key rkA→B only enables the specific transformation from A's key to B's key—it cannot be used to decrypt, and it cannot be reversed to derive either party's secret key. SHA3-256 integrity verification on all key material ensures tampering is detected.
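The three-step flow is easiest to see in a toy ElGamal-style re-encryption scheme (in the spirit of BBS98) rather than BFV key-switching. Everything below is an illustrative assumption: the group is tiny and insecure, and the re-encryption key is built directly from both secrets for brevity, whereas the two-round protocol in the text constructs it without either party sharing theirs. The point is only the data flow: the proxy raises one ciphertext component to the re-encryption key and never holds anything that decrypts.

```python
import random

# Toy group: Z_23^* with primitive root g = 5 (insecure demo parameters).
P, G = 23, 5
ORD = P - 1                               # group order 22

inv = lambda x, m: pow(x, -1, m)          # modular inverse (Python 3.8+)

def encrypt(m, sk):
    """Enc_sk(m) = (m * g^r, g^{sk*r}) for random r."""
    r = random.randrange(1, ORD)
    return (m * pow(G, r, P) % P, pow(G, sk * r % ORD, P))

def decrypt(ct, sk):
    c1, c2 = ct
    g_r = pow(c2, inv(sk, ORD), P)        # recover g^r from g^{sk*r}
    return c1 * inv(g_r, P) % P

def rekey(sk_a, sk_b):
    """rk_{A->B} = sk_b * sk_a^{-1} mod ORD (demo shortcut; see lead-in)."""
    return sk_b * inv(sk_a, ORD) % ORD

def reencrypt(ct, rk):
    """Proxy step: turns g^{a*r} into g^{b*r} without touching plaintext."""
    c1, c2 = ct
    return (c1, pow(c2, rk, P))

sk_a, sk_b = 3, 7                         # demo keys, coprime to ORD
ct_a = encrypt(9, sk_a)                   # Enc_A(9)
ct_b = reencrypt(ct_a, rekey(sk_a, sk_b)) # untrusted proxy transform
print(decrypt(ct_b, sk_b))                # 9
```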
This unblocks two major product capabilities. H33-Share enables cross-bank encrypted intelligence sharing—multiple financial institutions can compute on each other's encrypted fraud data without any institution exposing raw customer records. Multi-institution tokenization allows multiple custodians to operate on the same encrypted asset data under different keys, with proxy re-encryption handling the key translation transparently.
P5: BFV Bootstrapping
BFV Bootstrapping for Unlimited Depth Production Circuits
The problem: Every BFV multiplication adds noise to the ciphertext. After approximately 15 multiplications, the accumulated noise exceeds the decryption threshold and the ciphertext becomes undecryptable. Our production pipeline stays at depth 2–3 (biometric matching requires one inner product), but deep circuits—10-layer neural networks, recursive algorithms, 20-stage compliance chains—hit the ceiling. Without bootstrapping, BFV is a leveled scheme: powerful but depth-limited.
What we built: The Ducas-Micciancio bootstrapping approach adapted for our BFV parameter sets, with automatic noise monitoring and on-demand refresh.
The bootstrapping procedure has three phases:
- Bootstrapping key generation: Encrypt the secret key bits under the public key. This produces a bootstrapping key of approximately 500MB, computed once at setup time and reused for all subsequent bootstrapping operations.
- Homomorphic decryption: When the noise budget drops below a configurable threshold, homomorphically evaluate the BFV decryption circuit using the bootstrapping key. This is the expensive step—it's computing decryption on encrypted data, which requires evaluating modular arithmetic homomorphically.
- Output: A fresh ciphertext encrypting the same plaintext with a fully restored noise budget. The computation can continue with the same depth capacity as a freshly encrypted value.
The bootstrapper includes auto-bootstrap: a noise monitor tracks the remaining noise budget after each operation and triggers bootstrapping only when the budget approaches the floor. Shallow circuits—which represent 99% of our production workload—never pay the bootstrapping cost. Deep circuits bootstrap automatically and transparently.
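The auto-bootstrap policy reduces to a small state machine, sketched below. This is a toy model under our own assumptions: the budget is counted in multiplication levels rather than bits of noise headroom, and the numbers (~15 multiplications, a floor of 2) are taken from or loosely inspired by the text, not H33's parameters.

```python
class NoiseMonitor:
    """Toy model of auto-bootstrap: track remaining multiplicative depth
    and refresh only when the budget approaches the floor."""
    FRESH_BUDGET = 15    # ~15 multiplications before decryption fails

    def __init__(self, floor=2):
        self.budget = self.FRESH_BUDGET
        self.floor = floor
        self.bootstraps = 0

    def multiply(self):
        if self.budget <= self.floor:     # budget near the floor:
            self.bootstraps += 1          # refresh (~500ms-1s each)
            self.budget = self.FRESH_BUDGET
        self.budget -= 1                  # each multiplication costs a level

shallow = NoiseMonitor()
for _ in range(3):                        # depth-3 biometric pipeline
    shallow.multiply()
print(shallow.bootstraps)                 # 0: shallow circuits never pay

deep = NoiseMonitor()
for _ in range(40):                       # depth-40 circuit
    deep.multiply()
print(deep.bootstraps)                    # refreshes only as needed
```

The asymmetry is the whole point: the 99% of workloads at depth 2 to 3 never trigger a refresh, while deep circuits pay the bootstrap cost only at the few points where the budget actually runs out.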
Cost: approximately 500ms to 1 second per bootstrap operation. That's expensive compared to our 939µs batch cost. But it removes the depth ceiling entirely. Any computation that's expressible in plaintext is now expressible on encrypted data, regardless of multiplicative depth. BFV bootstrapping in production transforms a leveled FHE scheme into a fully homomorphic one—the "fully" in Fully Homomorphic Encryption was previously aspirational for BFV. Now it's operational.
Why This Matters
Before This Sprint
- Zama's TFHE could compute arbitrary boolean functions on encrypted bits. We couldn't.
- Zama's TFHE could run unlimited-depth circuits via programmable bootstrapping. We couldn't.
- Nobody had streaming FHE at production scale. Nobody.
- Encrypted neural network inference required external libraries. We didn't ship one.
- Multi-party FHE required decryption round-trips. We had no proxy re-encryption.
After This Sprint
- Boolean gates on encrypted bits with log₂(n) depth tree reduction.
- Unlimited-depth circuits via BFV bootstrapping with auto-refresh.
- Streaming mode with adaptive batch sizing from 1 to 32.
- Chebyshev activation functions for encrypted neural network inference.
- Proxy re-encryption for multi-party encrypted computation.
The difference between H33 and TFHE is no longer about feature coverage. Both systems can now compute arbitrary boolean functions on encrypted data. Both can run circuits of unlimited depth. The difference is performance and breadth.
TFHE bootstraps on every gate—that's how it achieves arbitrary computation. Each boolean gate takes approximately 13ms on their fastest implementation. A 64-bit addition (which chains ~192 gates) takes 124ms. H33 does polynomial arithmetic natively without bootstrapping, reserves bootstrapping for when it's actually needed, and runs 4 different FHE schemes with automatic routing. Our per-authentication cost is 38.5µs. Zama's equivalent operation takes 3,000x longer.
H33 is the only platform that combines all of these capabilities under one API:
- 4 FHE engines (BFV-128, BFV-256, CKKS, BFV-32) with FHE-IQ automatic routing
- FHE boolean gates on encrypted bits with encrypted comparison circuits
- BFV bootstrapping for unlimited-depth circuits
- Homomorphic encryption streaming with adaptive batch sizing
- CKKS activation functions for encrypted ML inference
- Proxy re-encryption for multi-party FHE computation
- 2.17M authentications per second on a single CPU instance
- Post-quantum security across the entire stack (Dilithium attestation, Kyber key exchange, STARK proofs)
Universal AND fast. That's the combination nobody else has.
Build Encrypted Applications With Zero Blind Spots
Get your API key and access all 5 new FHE modules. Boolean gates, bootstrapping, proxy re-encryption, streaming, and ML activations—all through one API.
Get Free API Key