April 28, 2026 · Engineering

Encrypted Machine Learning: CKKS vs TFHE for Inference

ML inference requires seeing the data. Medical diagnosis, credit scoring, fraud detection — the model needs the input. But the input is sensitive. Two FHE schemes divide the work: CKKS handles the forward pass, TFHE handles the decision. Here is how they fit together, measured on production hardware.


The Privacy Problem in ML

A hospital sends a chest X-ray to a cloud-hosted diagnostic model. A bank submits a loan application to an underwriting algorithm. A fraud detection system ingests a transaction stream. In every case, the model must see the input to produce an output. The input is sensitive. The model operator is a third party. The data subject has no technical guarantee that the input is not logged, leaked, or repurposed.

Regulation tries to solve this with contracts and policies. HIPAA requires a BAA. GDPR requires a legal basis. CCPA requires disclosure. But contracts are not cryptography. A contract says "you must not look at the data." Fully homomorphic encryption says "you cannot look at the data." The model computes on ciphertext. The input is never decrypted during inference. The output is encrypted under the data owner's key. The model operator sees nothing — not the input, not the output, not the intermediate activations.

This is not theoretical. Encrypted inference is running in production today. The question is not whether it is possible, but what it costs and which FHE scheme handles which part of the pipeline.

Two Schemes, Two Jobs

Fully homomorphic encryption is not a single algorithm. It is a family of schemes, each optimized for different operations on encrypted data. For ML inference, two schemes matter:

CKKS (Cheon-Kim-Kim-Song) is designed for approximate arithmetic on real numbers. It encodes floating-point vectors into polynomial ciphertexts and supports addition, multiplication, and rotation. This is the scheme that handles matrix-vector products, dot products, and polynomial activation functions — the linear algebra that constitutes a neural network forward pass.
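
These three primitives (slot-wise addition, slot-wise multiplication, cyclic rotation) are the entire instruction set. Here is a plaintext sketch of their semantics, with ordinary vectors standing in for encrypted slot vectors; the real operations act on ciphertexts and consume noise budget:

```rust
// Plaintext simulation of CKKS slot semantics: a ciphertext packs a
// vector of reals into "slots", and the scheme exposes exactly these
// three primitives on the packed vector (plus ct-pt variants).
fn slot_add(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

fn slot_mul(a: &[f64], b: &[f64]) -> Vec<f64> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

// Cyclic left rotation by k slots: slot i receives the value of slot i+k.
fn rotate(a: &[f64], k: usize) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|i| a[(i + k) % n]).collect()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0, 4.0];
    let w = vec![0.5, 0.5, 0.5, 0.5];
    println!("{:?}", slot_mul(&x, &w));             // [0.5, 1.0, 1.5, 2.0]
    println!("{:?}", rotate(&x, 1));                // [2.0, 3.0, 4.0, 1.0]
    println!("{:?}", slot_add(&x, &rotate(&x, 2))); // [4.0, 6.0, 4.0, 6.0]
}
```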

TFHE (Fully Homomorphic Encryption over the Torus) operates on encrypted bits. It supports Boolean gates — AND, OR, NOT — and by composing them, arbitrary Boolean circuits. This gives it something CKKS fundamentally cannot do: non-polynomial operations. Greater-than comparisons. Equality checks. Threshold decisions. The comparison "is this score above 0.7?" is non-polynomial. CKKS cannot evaluate it. TFHE can.

The division is clean. CKKS computes the forward pass: weighted sums, activations, scoring. TFHE makes the decision: "is the encrypted risk score above the encrypted threshold?" Together, they cover the full inference pipeline from input to classification.

This is not a limitation — it is architecture. Each scheme does what it does best. The system routes operations to the correct engine automatically.

CKKS for the Forward Pass

A neural network forward pass is matrix-vector multiplication interleaved with activation functions. The input is a feature vector. The weights are matrices. Each layer multiplies the weight matrix by the input vector, applies an activation function, and passes the result to the next layer.

In encrypted inference, the feature vector is a CKKS ciphertext. The weight matrix is plaintext — the model owner provides the weights in the clear. The encrypted input is multiplied by plaintext weights to produce an encrypted output. The model never sees the input. The data owner never sees the weights (in the common split-inference setting).

The operations decompose as follows:

1.56s · Encrypted dense layer (64 inputs → 4 outputs) · Graviton4 c8g.metal-48xl · N=8192 · 128-bit security

A 64-dimensional encrypted dot product — the core primitive of the forward pass — completes in 333ms. A full dense layer with 64 inputs and 4 outputs completes in 1,555ms. These are measured numbers on production cloud hardware, not projections from isolated primitive benchmarks.
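
To make the decomposition concrete, here is a plaintext sketch of one standard way the dot product and dense layer reduce to slot primitives: a single slot-wise multiply followed by a logarithmic rotate-and-sum fold. This mirrors the shape of the computation, not necessarily the exact kernel behind the 333ms figure; the encrypted version runs the same steps on ciphertexts, paying one rotation key-switch per fold level.

```rust
// Plaintext simulation of the CKKS dot-product primitive: one
// elementwise multiply, then log2(n) rotate-and-add steps that fold
// the partial products so every slot holds the full sum.
fn rotate(a: &[f64], k: usize) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|i| a[(i + k) % n]).collect()
}

fn dot_product(x: &[f64], w: &[f64]) -> f64 {
    assert!(x.len().is_power_of_two());
    // Slot-wise multiply: one multiplicative level in CKKS.
    let mut acc: Vec<f64> = x.iter().zip(w).map(|(a, b)| a * b).collect();
    // Rotate-and-sum tree: rotations cost key-switches, not levels.
    let mut step = x.len() / 2;
    while step >= 1 {
        let rot = rotate(&acc, step);
        for (a, r) in acc.iter_mut().zip(&rot) {
            *a += *r;
        }
        step /= 2;
    }
    acc[0] // every slot now holds the total; read slot 0
}

// A 64 -> 4 dense layer is four such dot products, one per output neuron.
fn dense_layer(x: &[f64], weights: &[Vec<f64>]) -> Vec<f64> {
    weights.iter().map(|row| dot_product(x, row)).collect()
}

fn main() {
    let x = vec![1.0; 64];
    let weights = vec![vec![0.25; 64]; 4];
    println!("{:?}", dense_layer(&x, &weights)); // [16.0, 16.0, 16.0, 16.0]
}
```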

TFHE for the Decision

The forward pass produces an encrypted score. A risk score. A probability. A confidence value. Now the system needs to make a decision: "Is this score above 0.7?" or "Does this patient's predicted risk exceed the treatment threshold?"

CKKS cannot answer this question. Comparison is a non-polynomial operation. You cannot express "greater than" as a polynomial over encrypted data — it requires evaluating a step function, which no polynomial of finite degree can represent exactly. This is a fundamental mathematical limitation, not an implementation gap.

TFHE operates on encrypted bits and evaluates Boolean circuits. A greater-than comparison on two 8-bit encrypted values decomposes into a sequence of Boolean gates operating bit-by-bit from the most significant bit downward. The result is a single encrypted bit: 1 if the score exceeds the threshold, 0 if it does not.

The score stays encrypted throughout. The threshold stays encrypted. The comparison result is an encrypted bit. No party sees any plaintext value at any point in the pipeline.
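
In circuit form, the MSB-first comparison is a handful of gates per bit. Here is a plaintext model of that circuit, where each Boolean operator stands in for one homomorphic gate:

```rust
// Plaintext model of the TFHE greater-than circuit. Every Boolean
// operator below (&, |, ^, !) stands in for one homomorphic gate;
// TFHE evaluates the same circuit on encrypted bits.
fn greater_than(a: &[bool; 8], b: &[bool; 8]) -> bool {
    // Bits are indexed MSB-first: a[0] is the most significant bit.
    let mut gt = false; // "a > b was decided at a higher bit"
    let mut eq = true;  // "all higher bits were equal"
    for i in 0..8 {
        // a wins at bit i iff all higher bits matched, a[i]=1, b[i]=0.
        gt = gt | (eq & a[i] & !b[i]);
        // Equality survives iff this bit matches too (XNOR).
        eq = eq & !(a[i] ^ b[i]);
    }
    gt
}

fn to_bits_msb(x: u8) -> [bool; 8] {
    core::array::from_fn(|i| (x >> (7 - i)) & 1 == 1)
}

fn main() {
    let score = to_bits_msb(186);     // quantized risk score
    let threshold = to_bits_msb(179); // 0.7 quantized to 8 bits
    println!("{}", greater_than(&score, &threshold)); // true
}
```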

768 TPS · TFHE 8-bit greater-than comparison · Graviton4 c8g.metal-48xl · 96 channels

Operation      Bit Width   Throughput
Greater-than   8-bit       768 TPS
Greater-than   16-bit      372 TPS
Greater-than   32-bit      182 TPS
Greater-than   64-bit      91 TPS
Equality       16-bit      769 TPS

For most ML classification decisions, 8-bit precision is sufficient. A risk score quantized to 256 levels provides more than enough resolution for a binary classification threshold. At 768 TPS, the decision layer adds approximately 1.3ms per inference — negligible compared to the CKKS forward pass.
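
Here is a sketch of that quantization step; the [0, 1] score range and the rounding rule are illustrative choices, not fixed parts of the pipeline:

```rust
// Quantize a CKKS-side score in [0.0, 1.0] onto the 256-level grid
// the 8-bit TFHE comparator consumes.
fn quantize_u8(score: f64) -> u8 {
    (score.clamp(0.0, 1.0) * 255.0).round() as u8
}

fn main() {
    println!("score 0.7312 -> {}", quantize_u8(0.7312)); // 186
    println!("threshold 0.7 -> {}", quantize_u8(0.7));   // 179
    // At 768 comparisons per second, each decision costs ~1.3ms:
    println!("decision latency ≈ {:.1} ms", 1000.0 / 768.0);
}
```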

The Complete Pipeline

A complete encrypted ML inference pipeline combines both schemes with an attestation layer. The flow looks like this:

  1. CKKS forward pass. Encrypted feature vector enters. Weight matrices applied. Polynomial activations evaluated. Output: encrypted score vector. (~1.56s for a 64→4 dense layer)
  2. Scheme transition. The encrypted CKKS score is converted to an encrypted TFHE representation. The score is quantized to the required bit width and re-encrypted under TFHE parameters. (~10ms)
  3. TFHE threshold decision. The encrypted score is compared against an encrypted threshold. Output: encrypted classification bit. (~1.3ms for 8-bit)
  4. H33-74 attestation. The entire computation — inputs, routing decisions, scheme transitions, output — is committed to a 74-byte post-quantum attestation (ML-DSA + FALCON + SLH-DSA). Permanently verifiable.

The FHE-IQ routing engine manages this automatically. The developer submits a workload — "run this model on this encrypted input and compare the output against this threshold" — and the system handles engine selection, scheme transitions, and attestation. One API call. The developer does not need to know which FHE scheme handles which operation.

Total pipeline latency for a single-layer encrypted inference with threshold decision: approximately 1.6 seconds. For a two-layer network (64→4→1) with polynomial activations and a final TFHE threshold: approximately 3 seconds. On production cloud hardware you can deploy today.
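
As a sanity check, the single-layer total is just the measured stage costs summed:

```rust
// Latency budget for one single-layer encrypted inference, summing
// the measured stage costs from this post.
fn main() {
    let ckks_dense_ms = 1_555.0;  // CKKS dense layer, 64 -> 4
    let scheme_switch_ms = 10.0;  // CKKS -> TFHE transition
    let tfhe_decision_ms = 1.3;   // 8-bit greater-than at 768 TPS
    let total_ms = ckks_dense_ms + scheme_switch_ms + tfhe_decision_ms;
    println!("pipeline total ≈ {:.2} s", total_ms / 1000.0); // ≈ 1.57 s
}
```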

Depth Budget and Bootstrapping

Every CKKS ciphertext carries a chain of moduli. Each multiplication consumes one modulus (via rescale). When the chain is exhausted, the ciphertext can no longer support multiplication. This is the depth budget — the number of multiplicative levels available before bootstrapping is required.

At our production parameters (N=8192, 128-bit security), the depth budget is approximately 4 multiplicative levels. One dense layer with a degree-2 polynomial activation consumes 2 levels: one for the matrix-vector product and one for the activation. A two-layer network consumes all 4 levels.
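
The bookkeeping is simple enough to write down. A minimal sketch using the level costs above; integer division gives the conservative end of each range, and a final layer that skips its activation (one level instead of two) would account for the upper end quoted in the table below:

```rust
// Depth-budget bookkeeping at the stated parameters. One dense layer
// costs 2 levels: 1 for the ciphertext-plaintext matrix-vector
// product (consumed by the rescale) and 1 for the degree-2 activation.
const LEVELS_PER_LAYER: u32 = 2;

fn layers_without_bootstrap(level_budget: u32) -> u32 {
    level_budget / LEVELS_PER_LAYER
}

fn main() {
    for (n, budget) in [(8192u32, 4u32), (16384, 9), (32768, 15)] {
        println!(
            "N={:5}: {:2} levels -> {} full layers before bootstrap",
            n,
            budget,
            layers_without_bootstrap(budget)
        );
    }
}
```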

For deeper networks, three options:

Approach                     Depth       Tradeoff
N=8192 + bootstrap           Unlimited   ~100ms per refresh. Bootstrap after every 4 levels.
N=16384 (larger params)      ~9 levels   2x slower per operation. 4-5 layer networks without bootstrap.
N=32768 (largest practical)  ~15 levels  4x slower per operation. 7-8 layer networks without bootstrap.

The depth-throughput tradeoff is the central design decision in encrypted ML. Larger parameters give more levels but slower operations. Bootstrapping gives unlimited depth but adds latency at each refresh point. The optimal choice depends on the network architecture. A shallow classifier (2-3 layers) runs best at N=8192 with no bootstrapping. A deeper network benefits from N=16384 to avoid frequent bootstrap cycles.

This is why the FHE-IQ router exists. It selects parameters based on the workload, not a static configuration. Submit a 2-layer model and it selects N=8192. Submit a 6-layer model and it selects N=16384. The developer specifies the computation; the system manages the cryptographic parameters.
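
Here is a minimal sketch of that selection logic, mirroring only the two examples in this post; the real router weighs more than layer count, and these thresholds are illustrative assumptions:

```rust
// Hypothetical parameter selection in the style of the FHE-IQ routing
// examples above. Thresholds are illustrative, not the router's policy.
#[derive(Debug)]
enum RingDim {
    N8192,  // ~4 levels, fastest per operation
    N16384, // ~9 levels, ~2x slower, fewer bootstrap cycles
    N32768, // ~15 levels, ~4x slower
}

fn select_ring_dim(layers: u32) -> RingDim {
    match layers {
        0..=2 => RingDim::N8192,  // fits the 4-level budget, no bootstrap
        3..=6 => RingDim::N16384, // deeper: more levels, fewer refreshes
        _ => RingDim::N32768,     // deepest nets without constant refreshes
    }
}

fn main() {
    println!("{:?}", select_ring_dim(2)); // N8192
    println!("{:?}", select_ring_dim(6)); // N16384
}
```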

What About Training?

This question comes up in every conversation about encrypted ML. The answer is straightforward: FHE inference is production-ready. FHE training is not practical at scale.

The reason is depth. A single forward pass through a 3-layer network requires 6 multiplicative levels. Backpropagation through the same network requires computing gradients at every layer, which roughly doubles the depth requirement to 12+ levels. A full training epoch over a dataset repeats this thousands of times, and because each update depends on the weights produced by the one before it, the required multiplicative depth grows with every sequential gradient update.

Even with bootstrapping, the overhead is prohibitive. Each bootstrap adds ~100ms. Training a small network for 1,000 epochs with bootstrapping every 4 levels would add approximately 25,000 bootstraps — 42 minutes of pure bootstrapping overhead, plus the actual computation. For any network of practical size, FHE training takes days where plaintext training takes minutes.
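
The arithmetic behind that 42-minute figure, as a quick check:

```rust
// Back-of-envelope bootstrap overhead for encrypted training, using
// the figures quoted above (~25,000 refreshes at ~100ms each).
fn main() {
    let bootstraps = 25_000.0;
    let bootstrap_ms = 100.0;
    let overhead_min = bootstraps * bootstrap_ms / 1000.0 / 60.0;
    println!("pure bootstrapping overhead ≈ {:.0} minutes", overhead_min); // ≈ 42
}
```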

The practical pattern is clear and widely adopted: train in plaintext, on data the training party is entitled to see, then deploy the trained model behind FHE inference so that user inputs stay encrypted.

This is not a workaround. It is the correct architecture. The threat model for most applications is: "the model operator should not see user inputs during inference." FHE inference solves this exactly. FHE training solves a different problem (hiding training data from the training infrastructure) that has better solutions in secure enclaves and federated learning.

Performance Reality

Every number in this post was measured on AWS Graviton4 c8g.metal-48xl (192 vCPUs, 96 physical Neoverse V2 cores). All CKKS operations use N=8192, 128-bit security, RNS-native representation with Montgomery NTT. All TFHE operations use 96-channel parallelism.

CKKS Operations

Operation              Latency    Per-Core TPS   96-Core TPS
Add                    0.68ms     1,471          141,216
Multiply pipeline      61ms       16.4           1,574
Polynomial eval (x²)   133ms      7.5            720
Slot sum (64 slots)    293ms      3.4            327
Dot product (64-dim)   333ms      3.0            288
Dense layer (64→4)     1,555ms    0.64           61

TFHE Operations

Operation      Bit Width   Throughput (96-ch)
Greater-than   8-bit       768 TPS
Greater-than   16-bit      372 TPS
Greater-than   32-bit      182 TPS
Greater-than   64-bit      91 TPS
Equality       16-bit      769 TPS

All numbers measured, not projected. All correctness-verified against plaintext computation. All results post-quantum attested via H33-74. The full benchmarks page publishes these numbers with methodology details and correctness bounds.

The Right Scheme for the Right Job

The FHE industry sometimes frames CKKS and TFHE as competitors. They are not. They solve different problems. Comparing them is like comparing matrix multiplication to a conditional branch — they are different operations that appear in the same program.

For encrypted ML inference, the architecture is: CKKS computes the forward pass, a scheme-switching step carries the encrypted score across, TFHE evaluates the threshold decision, and H33-74 attests the complete computation graph.

A single encrypted ML inference call touches three FHE engines and an attestation layer. The developer sees one API call. The system sees a computation graph with typed edges, and routes each node to the engine that handles it best.

This is not complexity for its own sake. It is the minimum architecture required to run a neural network on encrypted data and produce a verifiable, post-quantum-attested classification decision without ever decrypting the input. Every component exists because the math requires it.


What Comes Next

The current pipeline handles single-layer and shallow multi-layer networks at interactive latencies. The next phase targets three improvements: fused CKKS-to-TFHE scheme switching that eliminates the explicit transition step, batched inference across multiple inputs sharing the same model (amortizing key-switch costs across a batch of 32+ inputs), and an operation planner that minimizes total key-switches across the computation graph.

The goal is sub-second encrypted inference for a two-layer classifier on production cloud hardware. The math says it is achievable. The engineering is underway.


Eric Beans
CEO, H33.ai, Inc.
Patent pending. U.S. Patent Application Nos. 19/309,560 and 19/645,499. Additional applications pending.
All benchmarks measured on AWS c8g.metal-48xl (Graviton4, 192 vCPUs, Neoverse V2), April 2026. Rust 1.94.0.
All NIST security tests passed: FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), FIPS 205 (SLH-DSA). FIPS 140-3 KATs operational. 20,000+ tests across the platform.
H33-74 is a trademark of H33.ai, Inc. AWS and Graviton4 are trademarks of Amazon Web Services, Inc.