Production CKKS fully homomorphic encryption on Graviton4 metal (c8g.metal-48xl, 192 vCPUs). Encrypted addition in 0.68 milliseconds. Full multiply pipeline (multiply + relinearization + rescale) in 61 milliseconds. 4,096 slots per ciphertext. How approximate FHE fits into a three-engine stack that handles arithmetic, inference, and decisions — all on encrypted data.
These are measured numbers from a production CKKS implementation running on AWS Graviton4 (c8g.metal-48xl, 192 vCPUs, ARM Neoverse V2). The RNS-native pipeline eliminates the BigInt path entirely — all modular arithmetic runs in native 64-bit residue channels. The scheme is Cheon-Kim-Kim-Song — the approximate arithmetic variant of fully homomorphic encryption designed for real-number computation on encrypted data.
| Operation | Latency | Ops/sec |
|---|---|---|
| Encrypted Addition | 0.68ms | 141,216 |
| Encrypted Multiply (full pipeline: multiply + relin + rescale) | 61ms | 1,574 |
| Encrypt | ~10ms | ~100 |
| Decrypt | <1ms | >1,000 |
Parameters: N=8192, multiplicative depth 4, 128-bit security per the Homomorphic Encryption Standard v1.1. Each ciphertext packs 4,096 complex slots — one operation on a single ciphertext computes across 4,096 independent values simultaneously.
CKKS is the FHE scheme built for real numbers. Unlike BFV (exact integers) and TFHE (encrypted bits), CKKS operates on approximate arithmetic — encrypted floating-point values with bounded precision loss per operation. This makes it the natural choice for machine learning inference, statistical computation, and any workload where the inputs are continuous values rather than discrete integers or Boolean flags.
The core use cases for CKKS in production privacy systems:
The 4,096-slot advantage: Each CKKS ciphertext packs 4,096 independent values into a single encrypted polynomial. One encrypted addition operates on all 4,096 values simultaneously in 0.68 ms, an effective per-element cost of 166 nanoseconds. This SIMD-style batching is what makes CKKS practical for vector and matrix operations.
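The per-element figure is plain amortization, and it only holds when the slots are actually filled. A minimal sketch of that arithmetic, using the latency and slot count quoted above (the function name and the 64-slot example are illustrative assumptions, not part of the H33 API):

```python
# Figures quoted in the text.
ADD_LATENCY_S = 0.68e-3   # one encrypted addition, in seconds
SLOTS = 4096              # values packed per ciphertext


def per_element_cost_ns(latency_s: float, used_slots: int) -> float:
    """Amortized latency per useful packed value, in nanoseconds."""
    return latency_s / used_slots * 1e9


# Fully packed ciphertext: every slot carries a useful value.
full_cost_ns = per_element_cost_ns(ADD_LATENCY_S, SLOTS)

# Hypothetical sparse case: only 64 of the 4,096 slots carry data.
sparse_cost_ns = per_element_cost_ns(ADD_LATENCY_S, 64)

print(f"full packing:   {full_cost_ns:.0f} ns per element")
print(f"64 slots used:  {sparse_cost_ns:.0f} ns per element")
```

The ciphertext operation costs the same regardless of how many slots hold real data, so a workload that fills only a fraction of the slots pays proportionally more per element.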
Every CKKS operation introduces a small amount of precision loss. An encrypted addition preserves high precision. An encrypted multiplication preserves working precision but consumes one level of the modulus chain — a finite resource that determines the total multiplicative depth available before the ciphertext must be refreshed via bootstrapping.
With our production parameters, the system supports 4 sequential multiplications before bootstrapping. Rescaling after each multiplication restores the working precision. This depth covers most ML inference and statistical workloads.
In practice, 4 levels of multiplicative depth is sufficient for most ML inference tasks: a two-layer neural network with polynomial activations, or a degree-4 polynomial approximation of a sigmoid or ReLU function. Deeper networks can be evaluated by interleaving bootstrapping operations at the cost of additional latency.
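How much depth a polynomial consumes depends on how it is evaluated, not only on its degree. The sketch below tracks depth on plain floats, with no encryption involved. It assumes that every multiplication, including multiplication by a plaintext coefficient, consumes one level, which is the usual situation in CKKS when each product is rescaled. The `Tracked` class, the coefficients, and both evaluation functions are hypothetical illustrations.

```python
from dataclasses import dataclass

DEPTH_BUDGET = 4  # multiplicative depth quoted in the text


def _lift(x):
    """Wrap a plain number as a depth-0 value."""
    return x if isinstance(x, Tracked) else Tracked(float(x), 0)


@dataclass(frozen=True)
class Tracked:
    """A plain float that also records how many levels it has consumed."""
    value: float
    depth: int = 0

    def __add__(self, other):
        other = _lift(other)
        # Addition does not consume depth.
        return Tracked(self.value + other.value, max(self.depth, other.depth))

    def __mul__(self, other):
        other = _lift(other)
        depth = max(self.depth, other.depth) + 1
        if depth > DEPTH_BUDGET:
            raise RuntimeError("depth budget exhausted; bootstrapping required")
        return Tracked(self.value * other.value, depth)

    __radd__ = __add__
    __rmul__ = __mul__


def horner(coeffs, x):
    """Horner's rule; coeffs[i] multiplies x**i. Depth equals the degree."""
    acc = Tracked(float(coeffs[-1]))
    for c in reversed(coeffs[:-1]):
        acc = acc * x + c
    return acc


def power_basis_degree4(coeffs, x):
    """Degree-4 evaluation via precomputed powers; uses depth 3."""
    x2 = x * x      # depth 1
    x3 = x2 * x     # depth 2
    x4 = x2 * x2    # depth 2
    return x4 * coeffs[4] + x3 * coeffs[3] + x2 * coeffs[2] + x * coeffs[1] + coeffs[0]


coeffs = [1.0, 2.0, 3.0, 4.0, 5.0]   # arbitrary illustrative coefficients
x = Tracked(0.5)
by_horner = horner(coeffs, x)
by_powers = power_basis_degree4(coeffs, x)
print(by_horner, by_powers)
```

Under these assumptions both strategies fit a degree-4 polynomial inside the 4-level budget, but the power-basis form leaves one level spare, while Horner's rule on a degree-5 polynomial would already exceed it.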
CKKS encodes multiple values into a single ciphertext using the algebraic structure of the encryption scheme. With our parameters, each ciphertext provides 4,096 independent slots. Each slot holds one value. Operations on the ciphertext — addition, multiplication, rotation — apply to all slots simultaneously.
This means a single encrypted matrix-vector product with a 64-dimensional feature vector can be computed using approximately 64 encrypted rotations, each paired with a slot-wise multiplication by one diagonal of the matrix and an addition, with every operation acting on all 4,096 slots in parallel. The effective per-element throughput is orders of magnitude higher than evaluating one element at a time.
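One common way to reach roughly n rotations for an n-dimensional product is the diagonal method, which the text does not name explicitly, so treat it as an assumption about the approach. The sketch below runs it on plain Python lists: `rotate` stands in for an encrypted slot rotation and the list comprehension stands in for slot-wise multiply and add. All names are illustrative.

```python
def rotate(slots, k):
    """Cyclic left rotation by k positions (stands in for a CKKS slot rotation)."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def diagonal(matrix, i):
    """i-th generalized diagonal: entry j is matrix[j][(j + i) % n]."""
    n = len(matrix)
    return [matrix[j][(j + i) % n] for j in range(n)]


def matvec_diagonal(matrix, vec):
    """Return (matrix @ vec, number of non-trivial rotations used)."""
    n = len(vec)
    acc = [0.0] * n
    rotations = 0
    for i in range(n):
        rotated = rotate(vec, i)
        if i:  # rotation by 0 is the identity and costs nothing
            rotations += 1
        diag = diagonal(matrix, i)
        # Slot-wise multiply by the diagonal, then slot-wise accumulate.
        acc = [a + d * r for a, d, r in zip(acc, diag, rotated)]
    return acc, rotations


matrix = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
vec = [1.0, 0.5, -1.0, 2.0]
result, rotations = matvec_diagonal(matrix, vec)
expected = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
print(result, rotations)
```

For n = 64 this loop performs 63 non-trivial rotations. In a real 4,096-slot ciphertext the rotation is cyclic over all slots, so the 64-entry vector would typically be replicated across the slots for the wrap-around to behave as it does here.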
CKKS uses a leveled approach: each multiplication consumes one level of a finite precision budget. After each multiplication, rescaling restores working precision by consuming the next modulus in the chain. After 4 multiplications, the depth budget is exhausted and bootstrapping is required to continue.
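The effect of rescaling can be seen in a fixed-point toy model with no encryption at all. Values are stored as integers scaled by a factor Δ; multiplying two of them yields scale Δ², and rescaling divides by Δ to return to the working scale. The 40-bit scale is an assumption chosen to roughly match 12 decimal digits, and real CKKS divides by a prime in the modulus chain that is close to Δ rather than by Δ itself.

```python
SCALE_BITS = 40            # assumed scale; 2**40 is about 1.1e12
DELTA = 1 << SCALE_BITS


def encode(x: float) -> int:
    """Fixed-point encoding at scale DELTA."""
    return round(x * DELTA)


def decode(m: int, scale: int) -> float:
    """Recover the real value from an integer held at the given scale."""
    return m / scale


def multiply(a: int, b: int) -> int:
    """Product of two scale-DELTA integers; the result has scale DELTA**2."""
    return a * b


def rescale(m: int) -> int:
    """Rounded division by DELTA: brings a scale-DELTA**2 value back to DELTA."""
    return (m + DELTA // 2) // DELTA


a, b = 3.14159, 2.71828
product_raw = multiply(encode(a), encode(b))     # scale DELTA**2
product = rescale(product_raw)                   # scale DELTA again
error = abs(decode(product, DELTA) - a * b)
print(f"decoded product: {decode(product, DELTA):.12f}, error: {error:.2e}")
```

Without the rescale step the scale would square on every multiplication and quickly outgrow the modulus, which is why each multiplication spends one level of the chain.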
This is the fundamental tradeoff of leveled CKKS: more depth levels enable deeper computation but require larger polynomial degrees for the same security level. Our parameters sit at the HE Standard v1.1 boundary for 128-bit security.
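The depth-versus-degree tradeoff can be made concrete with the modulus-size limits tabulated in the Homomorphic Encryption Standard for 128-bit classical security with ternary secrets. The values below are those limits as we understand them (they are also the defaults used by Microsoft SEAL); verify them against the published tables before relying on them. The example chain sizes are hypothetical, since the actual prime sizes of the production chain are not stated here.

```python
# Maximum total log2(q) per ring dimension N for 128-bit classical security
# (ternary secret), as tabulated in the Homomorphic Encryption Standard.
MAX_LOG_Q_128 = {
    1024: 27,
    2048: 54,
    4096: 109,
    8192: 218,
    16384: 438,
    32768: 881,
}


def min_ring_dimension(total_modulus_bits: int) -> int:
    """Smallest tabulated N whose modulus limit covers the requested bits."""
    for n in sorted(MAX_LOG_Q_128):
        if total_modulus_bits <= MAX_LOG_Q_128[n]:
            return n
    raise ValueError("modulus too large for the tabulated ring dimensions")


# Hypothetical chains: total bits must include any key-switching prime.
shallow_chain_bits = sum([40] * 5)   # 200 bits
deeper_chain_bits = sum([40] * 8)    # 320 bits

print(min_ring_dimension(shallow_chain_bits))   # fits under the N=8192 limit
print(min_ring_dimension(deeper_chain_bits))    # needs the next ring size
```

Doubling N roughly doubles the available modulus budget but also doubles the size of every polynomial, which is where the extra latency of deeper parameter sets comes from.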
CKKS is not a standalone system. In H33's architecture, it is one of three FHE engines, each optimized for a different class of computation. The IQ routing engine automatically selects the appropriate engine based on the operation requested:
| Engine | Best For | Measured Throughput |
|---|---|---|
| BFV (exact integer) | Biometric matching, inner products | 2,209,429 auths/sec |
| CKKS (approximate real) | ML inference, statistics, scoring | 1,574 TPS multiply pipeline · 141,216 TPS add |
| TFHE (Boolean gates) | Comparisons, thresholds, decisions | 768 TPS 8-bit GT, 96 channels |
These engines are not interchangeable. BFV cannot efficiently compute polynomial activations on real-valued features. CKKS cannot efficiently evaluate comparison operations (greater-than, less-than). TFHE can evaluate arbitrary Boolean circuits but operates on individual encrypted bits, not packed vectors. Each engine does what it does best, and the routing layer selects automatically.
A typical encrypted ML inference pipeline flows through multiple engines:
The IQ routing engine manages these transitions automatically. The developer submits an operation; the system determines the engine, manages scheme transitions, and attests the result. No manual engine selection. No scheme-switching code. One API.
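To make the routing idea concrete, here is a deliberately simplified sketch of a lookup-based router built from the "Best For" column of the table above. It is not the IQ routing engine or the H33 API: the operation names, the `ROUTES` table, and the `plan` function are all hypothetical.

```python
from enum import Enum


class Engine(Enum):
    BFV = "exact integer"
    CKKS = "approximate real"
    TFHE = "Boolean gates"


# Hypothetical operation names mapped to engines per the table above.
ROUTES = {
    "biometric_match": Engine.BFV,
    "inner_product": Engine.BFV,
    "ml_inference": Engine.CKKS,
    "statistics": Engine.CKKS,
    "scoring": Engine.CKKS,
    "comparison": Engine.TFHE,
    "threshold": Engine.TFHE,
}


def route(op: str) -> Engine:
    """Pick the engine for one operation, or fail loudly if it is unknown."""
    try:
        return ROUTES[op]
    except KeyError:
        raise ValueError(f"no engine registered for {op!r}") from None


def plan(ops):
    """Return the engine for each step and the number of scheme transitions."""
    engines = [route(op) for op in ops]
    transitions = sum(1 for a, b in zip(engines, engines[1:]) if a is not b)
    return engines, transitions


engines, transitions = plan(["ml_inference", "scoring", "threshold"])
print([e.name for e in engines], transitions)
```

Counting transitions matters because each switch between schemes is a real cryptographic conversion with its own cost, so a planner would generally try to keep consecutive steps on the same engine.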
Addition is the cheapest CKKS operation. Two ciphertexts are added directly with no additional cryptographic overhead. No rescaling needed. The result has the same level and precision as the inputs.
At 0.68ms per addition with 4,096 slots, the effective per-element addition cost is 166 nanoseconds. At 96-core scale on Graviton4 metal, this yields 141,216 additions per second. Addition does not consume any multiplicative depth.
The full multiply pipeline includes three stages: the polynomial multiplication itself, relinearization (key-switching to reduce the ciphertext back to standard form), and rescaling to restore precision. The 61ms figure measures all three stages end-to-end on Graviton4 metal — this is the honest cost of one complete CKKS multiplication.
Relinearization dominates the pipeline cost. This is typical of CKKS implementations: relinearization is a key-switching step that converts the three-component ciphertext produced by multiplication back to the standard two-component form, and it requires far more polynomial arithmetic than the multiplication itself.
With independent multiplications running in parallel across the 192-vCPU instance on the RNS-native pipeline, the system sustains 1,574 full multiply operations per second, which at 61 ms each corresponds to about 96 multiplications in flight. Each multiplication operates on all 4,096 slots simultaneously, giving an effective throughput of over 6.4 million encrypted element-multiplications per second.
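The relationship between the latency and throughput figures can be checked with Little's law (average operations in flight = throughput × latency). The sketch below uses only numbers quoted in this article; the function name is illustrative.

```python
SLOTS = 4096


def in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Little's law: average number of operations executing concurrently."""
    return throughput_per_s * latency_s


mul_parallel = in_flight(1_574, 61e-3)       # full multiply pipeline
add_parallel = in_flight(141_216, 0.68e-3)   # encrypted addition
element_muls_per_s = 1_574 * SLOTS           # slot-level multiply throughput

print(f"multiplies in flight: {mul_parallel:.1f}")
print(f"additions in flight:  {add_parallel:.1f}")
print(f"element multiplications per second: {element_muls_per_s:,}")
```

Both operations work out to roughly 96 concurrent operations, consistent with the "96-core scale" mentioned for addition.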
The implementation uses parameters that satisfy the Homomorphic Encryption Standard v1.1 for 128-bit security:
| Parameter | Value | Purpose |
|---|---|---|
| Polynomial degree (N) | 8,192 | Ring dimension for security |
| Modulus chain depth | 5 levels | 4 multiplications before bootstrap |
| Precision | ~12 decimal digits | Per operation working precision |
| Security level | 128-bit | HE Standard v1.1 compliant |
| Multiplicative depth | 4 | Sequential multiplications before bootstrap |
| Slots | 4,096 complex / 8,192 real | SIMD parallel values |
The security guarantee rests on the Ring Learning With Errors (RLWE) problem. No known classical or quantum algorithm can break RLWE at these parameters in polynomial time. The scheme is lattice-based — the same mathematical foundation as the NIST-standardized ML-KEM and ML-DSA post-quantum algorithms.
Every CKKS computation in the H33 stack produces a post-quantum attestation. The computation result, the routing decision that selected CKKS as the engine, and the authorization metadata are committed to a 74-byte H33-74 primitive signed under three independent post-quantum signature families: ML-DSA-65 (module-lattice-based), FALCON-512 (NTRU-lattice-based), and SLH-DSA-SHA2-128f (hash-based).
The attestation cost is negligible: less than 1 millisecond for commitment construction and triple signing. Against a 61ms full multiply pipeline, attestation adds less than 2% overhead. The result is a quantum-resistant proof that the computation was performed correctly, on the claimed data, by an authorized party, using the specified engine and parameters.
CKKS is powerful for continuous computation but has fundamental limitations that the multi-engine architecture addresses:
These limitations are not weaknesses — they are the design boundaries that make CKKS fast. Exact arithmetic, comparison, and unlimited depth would require a different scheme (BFV or TFHE), and those schemes cannot do efficient real-number computation. The three-engine architecture exists precisely because no single FHE scheme does everything well.
The CKKS implementation runs on the same Graviton4 infrastructure as the rest of the H33 FHE stack:
The CKKS context initialization (key generation and precomputation) takes approximately 4 seconds. This is a one-time startup cost, amortized across all subsequent operations. Key material persists for the session lifetime.
Most FHE deployments use a single scheme and accept its limitations. H33 runs three schemes simultaneously with automatic routing between them. The practical consequence: an application can encrypt a feature vector, run ML inference (CKKS), quantize the result to an integer (scheme transition), compare against a threshold (TFHE), and attest the entire chain (H33-74) — in a single API call.
No other production system provides this. Individual FHE libraries (SEAL, OpenFHE, Lattigo) each implement several schemes but leave engine routing and attestation, along with most of the scheme-transition plumbing, to the application developer. H33 integrates these at the infrastructure level, so the developer writes "compare score against threshold" and the system handles CKKS → BFV → TFHE → attestation automatically.
The complete stack, measured: BFV at 2,209,429 auths/sec for exact matching. CKKS at 141,216 adds/sec and 1,574 TPS full multiply pipeline for approximate computation. TFHE at 768 TPS (8-bit GT, 96 channels) for encrypted decisions. All post-quantum attested. All on the same Graviton4 metal instance. All measured, not projected.
CKKS scales when it runs alongside engines that cover its blind spots. Approximate arithmetic is powerful for ML inference, statistical aggregation, and risk scoring — but a production privacy system also needs exact matching and encrypted decisions. The three-engine architecture delivers all three, with automatic routing and post-quantum attestation on every result.
The numbers are real. The stack is production. The attestation is quantum-resistant.