FHE Engineering

Encrypted ML: CKKS vs TFHE for AI Inference

Approximate arithmetic or boolean gates? Choosing the right FHE scheme for your ML workload.

Encrypted machine learning inference requires choosing a fully homomorphic encryption scheme, and the choice matters more than most practitioners realize. CKKS and TFHE represent fundamentally different approaches to encrypted computation, each optimized for different types of operations. Choosing the wrong scheme for your workload does not just reduce performance; it can make the computation entirely impractical.

CKKS is designed for approximate arithmetic on real and complex numbers. It encodes real values into ciphertext slots at a chosen scale and performs addition and multiplication on those values with controllable precision loss. A CKKS multiplication of two encrypted values produces an encrypted result that is correct to a precision set by the encryption parameters. This maps naturally to neural network inference, where every operation is already approximate and floating-point rounding is the norm.

TFHE is designed for exact boolean operations on individual encrypted bits. It provides encrypted AND, OR, NOT, and XOR gates that operate on single encrypted bits with perfect accuracy. There is no precision loss because the operations are exact. The trade-off is granularity: in this gate-by-gate model, adding two 32-bit encrypted integers means evaluating a full adder circuit (for example, a 32-bit ripple-carry adder) one boolean gate at a time.
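To make the granularity cost concrete, here is a minimal plaintext sketch of a ripple-carry adder built only from XOR, AND, and OR. It is not TFHE code: `GateCounter` is a hypothetical stand-in that works on ordinary 0/1 integers and simply counts how many gates an encrypted evaluation would have to perform.

```python
class GateCounter:
    """Plaintext stand-in for encrypted gates; counts gate evaluations."""

    def __init__(self):
        self.count = 0

    def xor(self, a, b):
        self.count += 1
        return a ^ b

    def and_(self, a, b):
        self.count += 1
        return a & b

    def or_(self, a, b):
        self.count += 1
        return a | b


def to_bits(value, width=32):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(gates, a_bits, b_bits):
    """Add two equal-width bit vectors; the final carry is dropped (wraps modulo 2**width)."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        partial = gates.xor(a, b)
        out.append(gates.xor(partial, carry))
        # Carry out is set if both inputs are 1, or if exactly one is 1 and carry in is 1.
        carry = gates.or_(gates.and_(a, b), gates.and_(partial, carry))
    return out


gates = GateCounter()
total = from_bits(ripple_carry_add(gates, to_bits(123456789), to_bits(987654321)))
print(total, gates.count)
```

With this textbook full adder, each bit position costs five gates, so one 32-bit addition costs 160 gate evaluations. In a real deployment each of those would be an encrypted gate.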

The choice between these schemes determines everything about your encrypted ML pipeline: the encoding of inputs, the representation of model weights, the implementation of activation functions, the memory footprint, and the end-to-end latency. Getting this choice right is the first and most consequential architectural decision in encrypted ML.

CKKS for Neural Networks

Neural network inference is a sequence of matrix multiplications and element-wise nonlinear activation functions. Each layer takes an input vector, multiplies it by a weight matrix, adds a bias vector, and applies an activation function. The input and output of every layer are real-valued vectors, and every intermediate computation involves floating-point arithmetic.

CKKS handles this workflow naturally. Input feature vectors are encoded into CKKS ciphertext slots as real numbers. Weight matrices are encoded as plaintext constants (assuming the model weights are not secret) or as additional ciphertexts (for private models). Matrix-vector multiplication uses homomorphic inner products: multiply corresponding slots and rotate-and-sum to accumulate the dot product.
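The rotate-and-sum pattern can be sketched on plain Python lists. This is only a simulation of the slot layout, assuming a power-of-two slot count; `rotate`, `slotwise_mul`, and `slotwise_add` are illustrative stand-ins for the homomorphic rotation, multiplication, and addition that a CKKS library would provide.

```python
def rotate(slots, k):
    """Cyclically rotate a slot vector left by k positions."""
    k %= len(slots)
    return slots[k:] + slots[:k]


def slotwise_mul(a, b):
    return [x * y for x, y in zip(a, b)]


def slotwise_add(a, b):
    return [x + y for x, y in zip(a, b)]


def rotate_and_sum(slots):
    """Sum all slots in log2(n) rotate-and-add steps; every slot ends up holding the total."""
    n = len(slots)
    if n & (n - 1) != 0:
        raise ValueError("slot count must be a power of two")
    acc = slots
    step = 1
    while step < n:
        acc = slotwise_add(acc, rotate(acc, step))
        step *= 2
    return acc


def dot_product(x, w):
    """Multiply corresponding slots, then accumulate with rotate-and-sum."""
    return rotate_and_sum(slotwise_mul(x, w))[0]


# Four real features padded with zeros to fill eight slots.
x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
w = [0.5, -1.0, 2.0, 0.25, 0.0, 0.0, 0.0, 0.0]
result = dot_product(x, w)
print(result)
```

The point of the doubling step is that an n-slot sum needs only log2(n) rotations and additions instead of n - 1.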

A CKKS ciphertext with ring dimension N provides N/2 SIMD slots; this article assumes 4096 slots (N = 8192). Those slots enable two parallelization strategies. The first packs multiple neurons of the same layer into different slots, computing an entire layer's output in a single ciphertext operation sequence. The second packs the same neuron across multiple input samples, processing 4096 input samples simultaneously. The optimal strategy depends on the layer dimensions and the batch size.
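As a rough illustration of the second strategy, the sketch below packs a batch so that one slot vector holds the same feature for every sample. The function names are hypothetical and the lists stand in for ciphertexts; the thing to notice is that a neuron is then computed with slot-wise multiply-and-add only, with no rotations.

```python
def pack_batch(samples):
    """Transpose a batch so that 'ciphertext' j holds feature j of every sample."""
    return [list(column) for column in zip(*samples)]


def neuron_batched(feature_cts, weights, bias):
    """Evaluate one neuron for all samples at once: slot i holds the result for sample i."""
    acc = [bias] * len(feature_cts[0])
    for ct, weight in zip(feature_cts, weights):
        # Plaintext-weight times ciphertext, added slot by slot.
        acc = [a + weight * x for a, x in zip(acc, ct)]
    return acc


samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
packed = pack_batch(samples)
outputs = neuron_batched(packed, weights=[0.5, 0.25], bias=1.0)
print(outputs)
```

Batch packing uses one ciphertext per feature, so it pays off when many samples are available at once. For a single request, packing neurons of a layer into slots is usually the better fit.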

Activation functions are the primary challenge for CKKS. ReLU, the most common activation, is a piecewise linear function that is not a polynomial. CKKS can only compute polynomials homomorphically, so ReLU must be approximated by a polynomial. Low-degree approximations (degree 2 or 3) are fast but diverge from true ReLU for large input values. High-degree approximations (degree 10 or higher) are accurate but consume more multiplicative levels (a degree-d polynomial needs at least ⌈log2 d⌉ of them), requiring larger encryption parameters.

H33 uses Chebyshev polynomial approximations tailored to the expected input range of each activation function. By analyzing the model's weight distribution and input characteristics, the system determines the effective input range and selects the minimum-degree polynomial that achieves the required accuracy within that range. This model-specific optimization is critical because a generic high-degree approximation wastes multiplicative depth on accuracy that the model does not need.
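The idea of picking the smallest degree that meets an accuracy target on a known input range can be sketched in plain Python. This is a generic Chebyshev interpolation, not H33's actual procedure; the function names, the sampling-based error estimate, and the example range and tolerance are all assumptions made for illustration.

```python
import math


def relu(x):
    return max(0.0, x)


def chebyshev_fit(f, lo, hi, degree):
    """Chebyshev coefficients of the polynomial interpolating f at degree+1 Chebyshev nodes."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    # Map each node from [-1, 1] to [lo, hi] before sampling f.
    values = [f(0.5 * (hi - lo) * t + 0.5 * (hi + lo)) for t in nodes]
    coeffs = []
    for j in range(n):
        s = sum(v * math.cos(j * math.pi * (k + 0.5) / n) for k, v in enumerate(values))
        coeffs.append((2.0 / n) * s)
    coeffs[0] *= 0.5
    return coeffs


def chebyshev_eval(coeffs, lo, hi, x):
    """Evaluate sum_j coeffs[j] * T_j(t) using the three-term recurrence for T_j."""
    t = (2.0 * x - lo - hi) / (hi - lo)
    t_prev, t_curr = 1.0, t
    total = coeffs[0]
    if len(coeffs) > 1:
        total += coeffs[1] * t
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * t * t_curr - t_prev
        total += c * t_curr
    return total


def max_error(coeffs, lo, hi, samples=1001):
    """Largest absolute deviation from ReLU on an evenly spaced grid over [lo, hi]."""
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(chebyshev_eval(coeffs, lo, hi, x) - relu(x)))
    return worst


def min_degree(lo, hi, tolerance, max_degree=64):
    """Smallest degree (starting at 2) whose sampled error is within tolerance, or None."""
    for degree in range(2, max_degree + 1):
        if max_error(chebyshev_fit(relu, lo, hi, degree), lo, hi) <= tolerance:
            return degree
    return None


err_low = max_error(chebyshev_fit(relu, -4.0, 4.0, 2), -4.0, 4.0)
err_high = max_error(chebyshev_fit(relu, -4.0, 4.0, 16), -4.0, 4.0)
print(err_low, err_high, min_degree(-4.0, 4.0, 0.5))
```

Narrowing the range helps directly. The approximation error of ReLU scales with the width of the interval, so a layer whose inputs stay within a small range can meet the same tolerance at a lower degree and lower multiplicative depth.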

TFHE for Decision Logic

Not all ML models are neural networks. Decision trees, random forests, gradient-boosted trees, and rule-based systems are widely deployed in production for tabular data classification. These models operate on exact comparisons: is this value greater than this threshold? If yes, go left; if no, go right. The output is a discrete class label, not a continuous probability.

TFHE handles decision logic natively. A comparison between an encrypted integer and a threshold is a sequence of boolean operations on the encrypted bit representation. The result is an encrypted boolean that selects the next branch. The entire tree traversal can be expressed as a boolean circuit and evaluated gate by gate.
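A small plaintext sketch shows how a comparison and a tree traversal reduce to a fixed boolean circuit. The tree, thresholds, and function names here are made up for illustration. The bit operations act on ordinary 0/1 integers, and `mux` uses integer arithmetic where a TFHE implementation would select each label bit with gates.

```python
def to_bits(value, width=8):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


def greater_than(a_bits, b_bits):
    """Return 1 if a > b, scanning from the most significant bit down."""
    gt, eq = 0, 1
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        gt = gt | (eq & a & (1 - b))   # first differing bit decides, if a has the 1
        eq = eq & (1 - (a ^ b))        # stay "equal so far" only while bits match
    return gt


def mux(selector, if_one, if_zero):
    """Oblivious select: no branching on the selector value."""
    return selector * if_one + (1 - selector) * if_zero


def classify(features):
    """Depth-2 tree evaluated obliviously: every node is computed, then leaves are selected."""
    f0, f1 = to_bits(features[0]), to_bits(features[1])
    at_root = greater_than(f0, to_bits(50))
    at_left = greater_than(f1, to_bits(10))
    at_right = greater_than(f1, to_bits(200))
    left_label = mux(at_left, 1, 0)
    right_label = mux(at_right, 3, 2)
    return mux(at_root, right_label, left_label)


print(classify([40, 11]), classify([60, 201]))
```

Because the server cannot see which branch was taken, all comparisons are evaluated and the encrypted results select the leaf. The cost therefore grows with the number of nodes in the tree rather than with its depth.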

TFHE's programmable bootstrapping adds another dimension. Bootstrapping refreshes the noise in a ciphertext, and in TFHE it can apply a lookup table to the encrypted value in the same step. This means that complex nonlinear functions can be evaluated as table lookups during bootstrapping, without additional gates. For decision tree models, the entire branch evaluation can be fused into the bootstrapping step, significantly reducing the total gate count.
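The effect of programmable bootstrapping can be mimicked with a toy integer encoding. Nothing below is cryptographic: `SCALE`, the 4-bit message space, and the function names are assumptions chosen to show the two things the operation does at once, removing noise and applying a table.

```python
SCALE = 16          # gap between adjacent encoded messages (toy value)
MESSAGE_SPACE = 16  # 4-bit messages


def encode(message, noise=0):
    """Place the message on a coarse grid and add a small noise term."""
    return message * SCALE + noise


def decode(value):
    return round(value / SCALE) % MESSAGE_SPACE


def programmable_bootstrap(noisy_value, table):
    """Round away the noise, look the message up in the table, and re-encode it cleanly."""
    message = round(noisy_value / SCALE) % MESSAGE_SPACE
    return encode(table[message])


def signed(message):
    """Interpret a 4-bit message as a two's-complement integer in [-8, 7]."""
    return message - MESSAGE_SPACE if message >= MESSAGE_SPACE // 2 else message


# Lookup table for ReLU over signed 4-bit inputs.
RELU_TABLE = [max(0, signed(m)) for m in range(MESSAGE_SPACE)]

positive = programmable_bootstrap(encode(5, noise=3), RELU_TABLE)
negative = programmable_bootstrap(encode(13, noise=-4), RELU_TABLE)  # 13 encodes -3
print(decode(positive), decode(negative))
```

The sketch only works while the noise stays below half of `SCALE`. That mirrors the real constraint that bootstrapping must run before accumulated noise corrupts the message.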

The trade-off is throughput. Each TFHE gate operates on a single encrypted bit, while each CKKS operation processes 4096 encrypted values simultaneously. For models with many parameters and wide feature vectors, CKKS achieves much higher throughput. For models with few comparisons and branching logic, TFHE can be faster because it avoids the polynomial approximation overhead that CKKS requires for comparisons.

Hybrid Approaches

Many production ML pipelines combine neural networks with decision logic. A common pattern is a neural network feature extractor followed by a gradient-boosted tree classifier. The neural network processes raw input data (images, text, signals) into a feature vector, and the tree model classifies the feature vector into a decision.

For encrypted inference on such pipelines, the optimal approach uses CKKS for the neural network feature extraction (leveraging SIMD batching and native floating-point support) and TFHE for the tree-based classification (leveraging native comparison and branching). The transition between schemes requires converting CKKS ciphertext slots into TFHE ciphertext bits, a scheme-switching operation that H33 handles transparently.

Scheme switching has a cost: it involves decrypting CKKS values to an intermediate representation and re-encrypting them in TFHE format. This does not expose plaintext data because the intermediate representation is protected by a shared key or a multi-party protocol. The latency overhead is proportional to the number of values being switched, but it occurs only once at the boundary between the CKKS and TFHE stages.

H33's compiler, H33-Compile, analyzes the full inference pipeline and automatically determines the optimal placement of scheme boundaries. For some models, pure CKKS is optimal. For others, pure TFHE is optimal. For many production models, a hybrid approach with one or two scheme switches achieves better performance than either pure approach.

Precision and Accuracy Trade-offs

CKKS introduces precision loss that TFHE does not. This precision loss must be evaluated in the context of the specific ML model to determine whether it affects prediction accuracy. A model that is robust to small perturbations in intermediate values will tolerate CKKS precision loss without degradation. A model that is sensitive to exact intermediate values may produce different predictions under CKKS than under plaintext evaluation.

H33 provides a model validation tool that compares CKKS-encrypted inference results against plaintext inference results on a test dataset. The tool measures the distribution of prediction differences and flags any cases where CKKS precision loss causes a different classification. For most neural network models trained with standard regularization, the CKKS precision loss has no measurable effect on prediction accuracy. For models with sharp decision boundaries or extreme weight magnitudes, the tool recommends either increasing CKKS precision parameters or switching to TFHE for the sensitive layers.
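The kind of check such a tool performs can be sketched as follows. This is not H33's validation tool: the linear scorer, the bounded uniform noise standing in for CKKS error, and the names `validate` and `eps` are assumptions for illustration.

```python
import random


def plaintext_score(x, weights, bias):
    return sum(xi * wi for xi, wi in zip(x, weights)) + bias


def simulated_ckks_score(x, weights, bias, eps, rng):
    """Plaintext score plus bounded noise, a crude stand-in for CKKS approximation error."""
    return plaintext_score(x, weights, bias) + rng.uniform(-eps, eps)


def validate(samples, weights, bias, eps, seed=0):
    """Compare exact and noisy predictions; report the worst deviation and any flipped labels."""
    rng = random.Random(seed)
    worst = 0.0
    flipped = []
    for index, x in enumerate(samples):
        exact = plaintext_score(x, weights, bias)
        approx = simulated_ckks_score(x, weights, bias, eps, rng)
        worst = max(worst, abs(approx - exact))
        if (exact > 0) != (approx > 0):
            flipped.append(index)
    return {"max_abs_diff": worst, "flipped": flipped}


samples = [[2.0, 1.0], [-3.0, 0.5], [0.0001, 0.0]]
report = validate(samples, weights=[1.0, 1.0], bias=0.0, eps=1e-3)
print(report)
```

Only the third sample sits close enough to the decision boundary for noise of this size to change its label. That is the pattern to look for in a real report: flips concentrate where the plaintext margin is smaller than the precision loss.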

TFHE's exact arithmetic eliminates precision concerns entirely but introduces a different kind of overhead. Representing a 32-bit floating-point number in TFHE requires 32 encrypted bits, and operations on those bits require circuits with hundreds to thousands of gates. The computational cost per floating-point operation is orders of magnitude higher in TFHE than in CKKS. This is why TFHE is practical for models with few operations per prediction (decision trees with tens of comparisons) but impractical for models with many operations per prediction (neural networks with millions of multiplications).

Memory and Bandwidth Considerations

CKKS ciphertexts are compact relative to the data they encode. A single CKKS ciphertext with 4096 SIMD slots might consume 32 KB while encoding 4096 real values. That works out to approximately 8 bytes per value, comparable to the plaintext size of a double-precision float.

TFHE ciphertexts are larger relative to the data they encode. A single encrypted bit in TFHE might consume several hundred bytes. A 32-bit encrypted integer requires 32 such ciphertexts, totaling several kilobytes for a single integer value. For models with many features, the total ciphertext size can be substantial.

This difference affects both storage and network bandwidth. Submitting an encrypted inference request with 100 features requires approximately one CKKS ciphertext (100 values packed into 4096 slots) versus 3200 TFHE ciphertexts (100 features times 32 bits each). At the sizes quoted above, that is about 32 KB against roughly a megabyte, a gap that directly impacts request latency in network-bound deployments.
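The arithmetic behind that comparison is easy to reproduce. The CKKS figures below come from the text. The 512-byte size for one encrypted TFHE bit is an assumed value within the "several hundred bytes" range, not a measured one, so treat the resulting ratio as illustrative.

```python
import math

CKKS_SLOTS = 4096
CKKS_CIPHERTEXT_BYTES = 32 * 1024   # 32 KB per ciphertext, as quoted in the text
TFHE_BIT_CIPHERTEXT_BYTES = 512     # assumption: "several hundred bytes" per encrypted bit


def ckks_request_bytes(num_features):
    """Features are packed into slots, so most requests fit in one ciphertext."""
    return math.ceil(num_features / CKKS_SLOTS) * CKKS_CIPHERTEXT_BYTES


def tfhe_ciphertext_count(num_features, bit_width=32):
    """One ciphertext per bit of every feature."""
    return num_features * bit_width


def tfhe_request_bytes(num_features, bit_width=32):
    return tfhe_ciphertext_count(num_features, bit_width) * TFHE_BIT_CIPHERTEXT_BYTES


features = 100
ckks_bytes = ckks_request_bytes(features)
tfhe_bytes = tfhe_request_bytes(features)
print(ckks_bytes, tfhe_ciphertext_count(features), tfhe_bytes, tfhe_bytes / ckks_bytes)
```

Reducing the feature bit width is the most direct lever on the TFHE side. Quantizing features to 8 bits cuts the ciphertext count by a factor of four under the same assumptions.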

H33 mitigates TFHE bandwidth through ciphertext packing techniques that encode multiple bits into a single TFHE ciphertext and evaluate multi-bit operations in batched fashion. These techniques reduce the effective ciphertext count but add computational overhead. The optimal packing strategy depends on the number of features, the bit width of each feature, and the network latency between client and server.

Making the Choice

The decision framework is straightforward once you understand the trade-offs.

Use CKKS when your model operates on real-valued data, is a neural network or linear model, needs high throughput through SIMD batching, and tolerates the precision loss inherent in approximate arithmetic.

Use TFHE when your model operates on discrete comparisons, is a decision tree or rule-based system, needs exact results with no precision loss, and has a small number of operations per prediction.

Use a hybrid approach when your pipeline combines neural network feature extraction with tree-based classification, or when specific layers require exact arithmetic (like argmax for classification) while other layers tolerate approximation.
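The rules above can be written down as a small function. This is a simplified reading of the framework, not the logic H33 uses to recommend a scheme; the `Workload` fields and `recommend_scheme` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    has_neural_layers: bool          # neural network or other real-valued layers
    has_tree_or_rule_logic: bool     # decision trees, thresholds, rule systems
    tolerates_approximation: bool    # predictions survive small numeric error
    needs_exact_steps: bool = False  # e.g. an exact argmax after approximate layers


def recommend_scheme(workload):
    """Map workload traits to 'CKKS', 'TFHE', or 'hybrid' following the article's rules of thumb."""
    if workload.has_neural_layers and (
        workload.has_tree_or_rule_logic or workload.needs_exact_steps
    ):
        return "hybrid"
    if workload.has_neural_layers and workload.tolerates_approximation:
        return "CKKS"
    if workload.has_tree_or_rule_logic or not workload.tolerates_approximation:
        return "TFHE"
    # Real-valued, approximation-tolerant models without neural layers (e.g. linear models).
    return "CKKS"


print(recommend_scheme(Workload(True, False, True)))
print(recommend_scheme(Workload(False, True, False)))
print(recommend_scheme(Workload(True, True, True)))
```

A real decision would also weigh operation counts, latency targets, and bandwidth, which this sketch ignores.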

H33 supports all three approaches through the same API. Specify your model architecture, and the system recommends the optimal scheme. Or specify the scheme explicitly if you have already made the determination. Either way, every inference includes post-quantum attestation through H33-74, ensuring that the encrypted computation was performed correctly regardless of which FHE scheme is used.

The choice between CKKS and TFHE is not a judgment about which scheme is better. It is an engineering decision about which scheme matches your workload. Both are powerful tools. Both are production-ready on H33. The right choice is the one that matches what you are computing.

Deploy Encrypted ML

H33 supports both CKKS and TFHE for encrypted ML inference. Choose the right scheme for your workload.
