April 28, 2026 · Engineering

Encrypted Machine Learning: CKKS vs TFHE for Inference

ML inference requires seeing the data. Medical diagnosis, credit scoring, fraud detection — the model needs the input. But the input is sensitive. Two FHE schemes divide the work: CKKS handles the forward pass, TFHE handles the decision. Here is how they fit together, measured on production hardware.


The Privacy Problem in ML

A hospital sends a chest X-ray to a cloud-hosted diagnostic model. A bank submits a loan application to an underwriting algorithm. A fraud detection system ingests a transaction stream. In every case, the model must see the input to produce an output. The input is sensitive. The model operator is a third party. The data subject has no technical guarantee that the input is not logged, leaked, or repurposed.

Regulation tries to solve this with contracts and policies. HIPAA requires a BAA. GDPR requires a legal basis. CCPA requires disclosure. But contracts are not cryptography. A contract says "you must not look at the data." Fully homomorphic encryption says "you cannot look at the data." The model computes on ciphertext. The input is never decrypted during inference. The output is encrypted under the data owner's key. The model operator sees nothing — not the input, not the output, not the intermediate activations.

This is not theoretical. Encrypted inference is running in production today. The question is not whether it is possible, but what it costs and which FHE scheme handles which part of the pipeline.

Two Schemes, Two Jobs

Fully homomorphic encryption is not a single algorithm. It is a family of schemes, each optimized for different operations on encrypted data. For ML inference, two schemes matter:

CKKS (Cheon-Kim-Kim-Song) is designed for approximate arithmetic on real numbers. It encodes floating-point vectors into polynomial ciphertexts and supports addition, multiplication, and rotation. This is the scheme that handles matrix-vector products, dot products, and polynomial activation functions — the linear algebra that constitutes a neural network forward pass.
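
These three primitives (slot-wise addition, slot-wise multiplication, cyclic rotation) are the entire instruction set. Here is a plaintext sketch of their semantics, with ordinary vectors standing in for encrypted slot vectors; the real operations act on ciphertexts and consume noise budget:

```rust
// Plaintext simulation of CKKS slot semantics: a ciphertext packs a
// vector of reals into "slots", and the scheme exposes exactly these
// three primitives on the packed vector (plus ct-pt variants).
fn slot_add(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

fn slot_mul(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

// Cyclic left rotation by k slots: slot i receives the value of slot i+k.
fn rotate(a: &[f64], k: usize) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|i| a[(i + k) % n]).collect()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0, 4.0];
    let w = vec![0.5, 0.5, 0.5, 0.5];
    println!("{:?}", slot_mul(&x, &w));             // [0.5, 1.0, 1.5, 2.0]
    println!("{:?}", rotate(&x, 1));                // [2.0, 3.0, 4.0, 1.0]
    println!("{:?}", slot_add(&x, &rotate(&x, 2))); // [4.0, 6.0, 4.0, 6.0]
}
```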

TFHE (Fully Homomorphic Encryption over the Torus) operates on encrypted bits. It supports Boolean gates — AND, OR, NOT — and by composing them, arbitrary Boolean circuits. This gives it something CKKS fundamentally cannot do: non-polynomial operations. Greater-than comparisons. Equality checks. Threshold decisions. The comparison "is this score above 0.7?" is non-polynomial. CKKS cannot evaluate it. TFHE can.

The division is clean. CKKS computes the forward pass: weighted sums, activations, scoring. TFHE makes the decision: "is the encrypted risk score above the encrypted threshold?" Together, they cover the full inference pipeline from input to classification.

This is not a limitation — it is architecture. Each scheme does what it does best. The system routes operations to the correct engine automatically.

CKKS for the Forward Pass

A neural network forward pass is matrix-vector multiplication interleaved with activation functions. The input is a feature vector. The weights are matrices. Each layer multiplies the weight matrix by the input vector, applies an activation function, and passes the result to the next layer.

In encrypted inference, the feature vector is a CKKS ciphertext. The weight matrix is plaintext — the model owner provides the weights in the clear. The encrypted input is multiplied by plaintext weights to produce an encrypted output. The model never sees the input. The data owner never sees the weights (in the common split-inference setting).

The operations decompose as follows:

1.56s · Encrypted dense layer (64 inputs → 4 outputs) · Graviton4 c8g.metal-48xl · N=8192 · 128-bit security

A 64-dimensional encrypted dot product — the core primitive of the forward pass — completes in 333ms. A full dense layer with 64 inputs and 4 outputs completes in 1,555ms. These are measured numbers on production cloud hardware, not projections from isolated primitive benchmarks.
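
To make the decomposition concrete, here is a plaintext sketch of one standard way the dot product and dense layer reduce to slot primitives: a single slot-wise multiply followed by a logarithmic rotate-and-sum fold. This mirrors the shape of the computation, not necessarily the exact kernel behind the 333ms figure; the encrypted version runs the same steps on ciphertexts, paying one rotation key-switch per fold level.

```rust
// Plaintext simulation of the CKKS dot-product primitive: one
// elementwise multiply, then log2(n) rotate-and-add steps that fold
// the partial products so every slot holds the full sum.
fn rotate(a: &[f64], k: usize) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|i| a[(i + k) % n]).collect()
}

fn dot_product(x: &[f64], w: &[f64]) -> f64 {
    assert!(x.len().is_power_of_two());
    // Slot-wise multiply: one multiplicative level in CKKS.
    let mut acc: Vec<f64> = x.iter().zip(w).map(|(a, b)| a * b).collect();
    // Rotate-and-sum tree: rotations cost key-switches, not levels.
    let mut step = x.len() / 2;
    while step >= 1 {
        let rot = rotate(&acc, step);
        for (a, r) in acc.iter_mut().zip(&rot) {
            *a += *r;
        }
        step /= 2;
    }
    acc[0] // every slot now holds the total; read slot 0
}

// A 64 -> 4 dense layer is four such dot products, one per output neuron.
fn dense_layer(x: &[f64], weights: &[Vec<f64>]) -> Vec<f64> {
    weights.iter().map(|row| dot_product(x, row)).collect()
}

fn main() {
    let x = vec![1.0; 64];
    let weights = vec![vec![0.25; 64]; 4];
    println!("{:?}", dense_layer(&x, &weights)); // [16.0, 16.0, 16.0, 16.0]
}
```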

TFHE for the Decision

The forward pass produces an encrypted score. A risk score. A probability. A confidence value. Now the system needs to make a decision: "Is this score above 0.7?" or "Does this patient's predicted risk exceed the treatment threshold?"

CKKS cannot answer this question. Comparison is a non-polynomial operation. You cannot express "greater than" as a polynomial over encrypted data — it requires evaluating a step function, which no polynomial of finite degree can represent exactly. This is a fundamental mathematical limitation, not an implementation gap.

TFHE operates on encrypted bits and evaluates Boolean circuits. A greater-than comparison on two 8-bit encrypted values decomposes into a sequence of Boolean gates operating bit-by-bit from the most significant bit downward. The result is a single encrypted bit: 1 if the score exceeds the threshold, 0 if it does not.

The score stays encrypted throughout. The threshold stays encrypted. The comparison result is an encrypted bit. No party sees any plaintext value at any point in the pipeline.
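
In circuit form, the MSB-first comparison is a handful of gates per bit. Here is a plaintext model of that circuit, where each Boolean operator stands in for one homomorphic gate:

```rust
// Plaintext model of the TFHE greater-than circuit. Every Boolean
// operator below (&, |, ^, !) stands in for one homomorphic gate;
// TFHE evaluates the same circuit on encrypted bits.
fn greater_than(a: &[bool; 8], b: &[bool; 8]) -> bool {
    // Bits are indexed MSB-first: a[0] is the most significant bit.
    let mut gt = false; // "a > b was decided at a higher bit"
    let mut eq = true;  // "all higher bits were equal"
    for i in 0..8 {
        // a wins at bit i iff all higher bits matched, a[i]=1, b[i]=0.
        gt = gt | (eq & a[i] & !b[i]);
        // Equality survives iff this bit matches too (XNOR).
        eq = eq & !(a[i] ^ b[i]);
    }
    gt
}

fn to_bits_msb(x: u8) -> [bool; 8] {
    core::array::from_fn(|i| (x >> (7 - i)) & 1 == 1)
}

fn main() {
    let score = to_bits_msb(186);     // quantized risk score
    let threshold = to_bits_msb(179); // 0.7 quantized to 8 bits
    println!("{}", greater_than(&score, &threshold)); // true
}
```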

768 TPS · TFHE 8-bit greater-than comparison · Graviton4 c8g.metal-48xl · 96 channels

Operation      Bit Width   Throughput
Greater-than   8-bit       768 TPS
Greater-than   16-bit      372 TPS
Greater-than   32-bit      182 TPS
Greater-than   64-bit      91 TPS
Equality       16-bit      769 TPS

For most ML classification decisions, 8-bit precision is sufficient. A risk score quantized to 256 levels provides more than enough resolution for a binary classification threshold. At 768 TPS, the decision layer adds approximately 1.3ms per inference — negligible compared to the CKKS forward pass.
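
Here is a sketch of that quantization step; the [0, 1] score range and the rounding rule are illustrative choices, not fixed parts of the pipeline:

```rust
// Quantize a CKKS-side score in [0.0, 1.0] onto the 256-level grid
// the 8-bit TFHE comparator consumes.
fn quantize_u8(score: f64) -> u8 {
    (score.clamp(0.0, 1.0) * 255.0).round() as u8
}

fn main() {
    println!("score 0.7312 -> {}", quantize_u8(0.7312)); // 186
    println!("threshold 0.7 -> {}", quantize_u8(0.7));   // 179
    // At 768 comparisons per second, each decision costs ~1.3ms:
    println!("decision latency ≈ {:.1} ms", 1000.0 / 768.0);
}
```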

The Complete Pipeline

A complete encrypted ML inference pipeline combines both schemes with an attestation layer. The flow looks like this:

  1. CKKS forward pass. Encrypted feature vector enters. Weight matrices applied. Polynomial activations evaluated. Output: encrypted score vector. (~1.56s for a 64→4 dense layer)
  2. Scheme transition. The encrypted CKKS score is converted to an encrypted TFHE representation. The score is quantized to the required bit width and re-encrypted under TFHE parameters. (~10ms)
  3. TFHE threshold decision. The encrypted score is compared against an encrypted threshold. Output: encrypted classification bit. (~1.3ms for 8-bit)
  4. H33-74 attestation. The entire computation — inputs, routing decisions, scheme transitions, output — is committed to a 74-byte post-quantum attestation (ML-DSA + FALCON + SLH-DSA). Permanently verifiable.

The FHE-IQ routing engine manages this automatically. The developer submits a workload — "run this model on this encrypted input and compare the output against this threshold" — and the system handles engine selection, scheme transitions, and attestation. One API call. The developer does not need to know which FHE scheme handles which operation.

Total pipeline latency for a single-layer encrypted inference with threshold decision: approximately 1.6 seconds. For a two-layer network (64→4→1) with polynomial activations and a final TFHE threshold: approximately 3 seconds. On production cloud hardware you can deploy today.
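
As a sanity check, the single-layer total is just the measured stage costs summed:

```rust
// Latency budget for one single-layer encrypted inference, summing
// the measured stage costs from this post.
fn main() {
    let ckks_dense_ms = 1_555.0;  // CKKS dense layer, 64 -> 4
    let scheme_switch_ms = 10.0;  // CKKS -> TFHE transition
    let tfhe_decision_ms = 1.3;   // 8-bit greater-than at 768 TPS
    let total_ms = ckks_dense_ms + scheme_switch_ms + tfhe_decision_ms;
    println!("pipeline total ≈ {:.2} s", total_ms / 1000.0); // ≈ 1.57 s
}
```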

Depth Budget and Bootstrapping

Every CKKS ciphertext carries a chain of moduli. Each multiplication consumes one modulus (via rescale). When the chain is exhausted, the ciphertext can no longer support multiplication. This is the depth budget — the number of multiplicative levels available before bootstrapping is required.

At our production parameters (N=8192, 128-bit security), the depth budget is approximately 4 multiplicative levels. One dense layer with a degree-2 polynomial activation consumes 2 levels: one for the matrix-vector product and one for the activation. A two-layer network consumes all 4 levels.
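
The bookkeeping is simple enough to write down. A minimal sketch using the level costs above; integer division gives the conservative end of each range, and a final layer that skips its activation (one level instead of two) would account for the upper end quoted in the table below:

```rust
// Depth-budget bookkeeping at the stated parameters. One dense layer
// costs 2 levels: 1 for the ciphertext-plaintext matrix-vector
// product (consumed by the rescale) and 1 for the degree-2 activation.
const LEVELS_PER_LAYER: u32 = 2;

fn layers_without_bootstrap(level_budget: u32) -> u32 {
    level_budget / LEVELS_PER_LAYER
}

fn main() {
    for (n, budget) in [(8192u32, 4u32), (16384, 9), (32768, 15)] {
        println!(
            "N={:5}: {:2} levels -> {} full layers before bootstrap",
            n,
            budget,
            layers_without_bootstrap(budget)
        );
    }
}
```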

For deeper networks, three options:

Approach                     Depth       Tradeoff
N=8192 + bootstrap           Unlimited   ~100ms per refresh. Bootstrap after every 4 levels.
N=16384 (larger params)      ~9 levels   2x slower per operation. 4-5 layer networks without bootstrap.
N=32768 (largest practical)  ~15 levels  4x slower per operation. 7-8 layer networks without bootstrap.

The depth-throughput tradeoff is the central design decision in encrypted ML. Larger parameters give more levels but slower operations. Bootstrapping gives unlimited depth but adds latency at each refresh point. The optimal choice depends on the network architecture. A shallow classifier (2-3 layers) runs best at N=8192 with no bootstrapping. A deeper network benefits from N=16384 to avoid frequent bootstrap cycles.

This is why the FHE-IQ router exists. It selects parameters based on the workload, not a static configuration. Submit a 2-layer model and it selects N=8192. Submit a 6-layer model and it selects N=16384. The developer specifies the computation; the system manages the cryptographic parameters.
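
Here is a minimal sketch of that selection logic, mirroring only the two examples in this post; the real router weighs more than layer count, and these thresholds are illustrative assumptions:

```rust
// Hypothetical parameter selection in the style of the FHE-IQ routing
// examples above. Thresholds are illustrative, not the router's policy.
#[derive(Debug)]
enum RingDim {
    N8192,  // ~4 levels, fastest per operation
    N16384, // ~9 levels, ~2x slower, fewer bootstrap cycles
    N32768, // ~15 levels, ~4x slower
}

fn select_ring_dim(layers: u32) -> RingDim {
    match layers {
        0..=2 => RingDim::N8192,  // fits the 4-level budget, no bootstrap
        3..=6 => RingDim::N16384, // deeper: more levels, fewer refreshes
        _ => RingDim::N32768,     // deepest nets without constant refreshes
    }
}

fn main() {
    println!("{:?}", select_ring_dim(2)); // N8192
    println!("{:?}", select_ring_dim(6)); // N16384
}
```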

What About Training?

This question comes up in every conversation about encrypted ML. The answer is straightforward: FHE inference is production-ready. FHE training is not practical at scale.

The reason is depth. A single forward pass through a 3-layer network requires 6 multiplicative levels. Backpropagation through the same network requires computing gradients at every layer, which roughly doubles the depth requirement to 12+ levels. A full training epoch over a dataset repeats this thousands of times, and because each update depends on the weights produced by the one before it, the required multiplicative depth grows with every sequential gradient update.

Even with bootstrapping, the overhead is prohibitive. Each bootstrap adds ~100ms. Training a small network for 1,000 epochs with bootstrapping every 4 levels would add approximately 25,000 bootstraps — 42 minutes of pure bootstrapping overhead, plus the actual computation. For any network of practical size, FHE training takes days where plaintext training takes minutes.
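
The arithmetic behind that 42-minute figure, as a quick check:

```rust
// Back-of-envelope bootstrap overhead for encrypted training, using
// the figures quoted above (~25,000 refreshes at ~100ms each).
fn main() {
    let bootstraps = 25_000.0;
    let bootstrap_ms = 100.0;
    let overhead_min = bootstraps * bootstrap_ms / 1000.0 / 60.0;
    println!("pure bootstrapping overhead ≈ {:.0} minutes", overhead_min); // ≈ 42
}
```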

The practical pattern is clear and widely adopted: train in plaintext, on data the training party is entitled to see, then deploy the trained model behind FHE inference so that user inputs stay encrypted.

This is not a workaround. It is the correct architecture. The threat model for most applications is: "the model operator should not see user inputs during inference." FHE inference solves this exactly. FHE training solves a different problem (hiding training data from the training infrastructure) that has better solutions in secure enclaves and federated learning.

Performance Reality

Every number in this post was measured on AWS Graviton4 c8g.metal-48xl (192 vCPUs, 96 physical Neoverse V2 cores). All CKKS operations use N=8192, 128-bit security, RNS-native representation with Montgomery NTT. All TFHE operations use 96-channel parallelism.

CKKS Operations

Operation              Latency    Per-Core TPS   96-Core TPS
Add                    0.68ms     1,471          141,216
Multiply pipeline      61ms       16.4           1,574
Polynomial eval (x²)   133ms      7.5            720
Slot sum (64 slots)    293ms      3.4            327
Dot product (64-dim)   333ms      3.0            288
Dense layer (64→4)     1,555ms    0.64           61

TFHE Operations

Operation      Bit Width   Throughput (96-ch)
Greater-than   8-bit       768 TPS
Greater-than   16-bit      372 TPS
Greater-than   32-bit      182 TPS
Greater-than   64-bit      91 TPS
Equality       16-bit      769 TPS

All numbers measured, not projected. All correctness-verified against plaintext computation. All results post-quantum attested via H33-74. The full benchmarks page publishes these numbers with methodology details and correctness bounds.

The Right Scheme for the Right Job

The FHE industry sometimes frames CKKS and TFHE as competitors. They are not. They solve different problems. Comparing them is like comparing matrix multiplication to a conditional branch — they are different operations that appear in the same program.

For encrypted ML inference, the architecture is: CKKS computes the forward pass, a scheme-switching step carries the encrypted score across, TFHE evaluates the threshold decision, and H33-74 attests the complete computation graph.

A single encrypted ML inference call touches three FHE engines and an attestation layer. The developer sees one API call. The system sees a computation graph with typed edges, and routes each node to the engine that handles it best.

This is not complexity for its own sake. It is the minimum architecture required to run a neural network on encrypted data and produce a verifiable, post-quantum-attested classification decision without ever decrypting the input. Every component exists because the math requires it.


What Comes Next

The current pipeline handles single-layer and shallow multi-layer networks at interactive latencies. The next phase targets three improvements: fused CKKS-to-TFHE scheme switching that eliminates the explicit transition step, batched inference across multiple inputs sharing the same model (amortizing key-switch costs across a batch of 32+ inputs), and an operation planner that minimizes total key-switches across the computation graph.

The goal is sub-second encrypted inference for a two-layer classifier on production cloud hardware. The math says it is achievable. The engineering is underway.


Eric Beans
CEO, H33.ai, Inc.
Patent pending. U.S. Patent Application Nos. 19/309,560 and 19/645,499. Additional applications pending.
All benchmarks measured on AWS c8g.metal-48xl (Graviton4, 192 vCPUs, Neoverse V2), April 2026. Rust 1.94.0.
All NIST security tests passed: FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), FIPS 205 (SLH-DSA). FIPS 140-3 KATs operational. 20,000+ tests across the platform.
H33-74 is a trademark of H33.ai, Inc. AWS and Graviton4 are trademarks of Amazon Web Services, Inc.