FHE · 27 min read

BFV vs CKKS:
Choosing the Right FHE Scheme for Your Application

A comprehensive engineering guide to the two dominant FHE schemes. When exact integer arithmetic matters, when approximate is good enough, and the production parameters behind H33's 1.2M auth/sec pipeline.

~50µs per auth (BFV) · 1.2M/s throughput · 128-bit security · 32 users/batch

Fully Homomorphic Encryption lets you compute on encrypted data without ever decrypting it. That single sentence sounds simple, but the moment you sit down to build something real, the first fork in the road stops you cold: which FHE scheme should I use?

The answer is not academic. Choosing the wrong scheme means either sacrificing correctness (using approximate arithmetic where exact results are required) or leaving performance on the table (using exact arithmetic for workloads that are naturally approximate). For H33, the choice directly determines whether we can authenticate 1.2 million users per second or not.

The modern FHE landscape includes four major scheme families: BGV and BFV (exact integer arithmetic), CKKS (approximate real arithmetic), and TFHE (Boolean circuits with fast programmable bootstrapping).

This guide focuses on BFV and CKKS because they are the two most widely deployed schemes for SIMD-batched workloads. They share the same underlying Ring Learning With Errors (RLWE) hardness assumption and similar parameter structures, but they encode data in fundamentally different ways, which makes them optimal for fundamentally different tasks.

The One-Sentence Decision

If your computation requires exact results — biometric matching, database equality checks, vote counting, financial ledgers — use BFV. If your computation tolerates controlled approximation — ML inference, statistical aggregation, signal processing — use CKKS.

The Fundamental Distinction: Exact vs. Approximate Arithmetic

Every FHE scheme encrypts data by adding carefully calibrated noise to a mathematical structure. This noise is what provides security — without it, the ciphertext would be trivially invertible. The defining question for any FHE scheme is: what happens to this noise during computation, and how does it affect the result?

BFV: Noise is an Obstacle to Remove

In BFV, the plaintext lives in a discrete integer ring Z_t[x] / (x^N + 1), where t is the plaintext modulus and N is the polynomial degree. The encryption process embeds the plaintext into a much larger ciphertext modulus Q, with noise added on top. During decryption, the noise is stripped away and the exact plaintext is recovered — provided the noise has not grown beyond the capacity of the scheme.

This means BFV computations are exact. If you encrypt the integer 42, perform a series of additions and multiplications, and decrypt the result, you get the mathematically precise answer modulo t. No rounding. No truncation. No precision loss. The noise is entirely internal to the scheme and invisible to the application, as long as you stay within the noise budget.

CKKS: Noise is Part of the Answer

CKKS takes a radically different approach. Instead of treating noise as something to be eliminated, CKKS treats noise as controlled approximation error that is tolerable for the application. Real numbers are encoded by scaling them up to integers, encrypting those integers, and accepting that each homomorphic operation introduces a small amount of additional imprecision — analogous to floating-point rounding error in conventional computing.

After a CKKS computation, the decrypted result is an approximation of the true answer. If you encrypt 3.14159 and multiply it by 2.0, you might get back 6.28318 plus or minus some small epsilon (say, 6.283180000000003). For ML inference, statistical aggregation, and signal processing, this is perfectly acceptable. For biometric matching where a single bit flip could change a match verdict, it is not.

Why This Matters in Practice

Consider a biometric cosine similarity threshold of 0.85. With BFV, the encrypted inner product yields an exact integer result that maps deterministically to a match or no-match decision. With CKKS, the result might be 0.8499997 or 0.8500003 — and the difference between those two values is the difference between granting and denying access. For authentication, exact arithmetic is not a luxury. It is a correctness requirement.

BFV Deep Dive

Mathematical Foundation

BFV is built on the Ring Learning With Errors (RLWE) problem, which operates in the polynomial ring R_Q = Z_Q[x] / (x^N + 1), where N is a power of two and Q is the ciphertext modulus. The security of the scheme reduces to the hardness of distinguishing RLWE samples from uniform random samples — a problem that is believed to be hard even for quantum computers, since it reduces to worst-case lattice problems (specifically, the Shortest Vector Problem on ideal lattices).

The key generation, encryption, and decryption procedures are as follows:

Key Generation

Sample a secret key s from a ternary distribution over R_Q (coefficients in {-1, 0, 1}). Sample a uniformly random polynomial a from R_Q and a small error polynomial e from a discrete Gaussian (or centered binomial) distribution. The public key is the pair pk = (pk0, pk1) = (-a*s + e, a). The secret key is s.

Encryption

To encrypt a plaintext polynomial m in R_t, sample a random polynomial u from the ternary distribution and two small error polynomials e1, e2. Compute:

ct = (pk0*u + e1 + floor(Q/t)*m, pk1*u + e2)

The term floor(Q/t)*m is the scaled plaintext. The errors e1 and e2 provide the RLWE security guarantee. The result is a pair of polynomials (c0, c1) in R_Q.

Decryption

Compute c0 + c1*s (mod Q), then scale by t/Q and round to the nearest integer modulo t. If the accumulated noise is below the threshold Q/(2t), the rounding recovers the exact plaintext m. This is the invariant that makes BFV exact: the noise occupies a fraction of the ciphertext space, and the scaling plus rounding eliminates it completely.
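The exactness invariant can be seen in a toy scalar model. This sketch drops the polynomial ring and secret key entirely and keeps only the floor(Q/t) scaling plus rounding; the modulus, noise, and plaintext values are illustrative, not real parameters.

```rust
// Toy scalar model of BFV's decode step: decryption is exact as long as
// the accumulated noise stays below Q/(2t). No polynomials, no security —
// just the scaling-and-rounding arithmetic.
fn main() {
    let q: u128 = 1 << 56;      // stand-in ciphertext modulus (power of two for simplicity)
    let t: u128 = 65537;        // plaintext modulus
    let delta = q / t;          // floor(Q/t) scaling factor
    let m: u128 = 42;           // plaintext integer

    // Noise comfortably below the Q/(2t) threshold (~2^39 here).
    let noise: u128 = 1 << 30;
    let payload = delta * m + noise; // what c0 + c1*s looks like after computation

    // Decryption: scale by t/Q and round to the nearest integer, mod t.
    let recovered = ((t * payload + q / 2) / q) % t;
    assert_eq!(recovered, m);   // exact, despite the noise
    println!("recovered = {recovered}");
}
```

Pushing `noise` above roughly 2^39 in this model makes the rounding land on the wrong integer, which is exactly the "noise budget exhausted" failure mode.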

Noise Growth Analysis

Understanding noise growth is critical for parameter selection. In BFV, the noise budget is consumed at different rates depending on the operation:

  1. Additions grow noise only additively — hundreds of additions cost roughly as much budget as a single multiplication.
  2. Plaintext-ciphertext multiplications grow noise moderately and require no relinearization.
  3. Ciphertext-ciphertext multiplications are the dominant consumer: each one uses roughly a full level of the budget and requires relinearization.

H33 Production Insight

For biometric matching, H33 needs exactly one level of plaintext-ciphertext multiplication (the inner product between the encrypted probe template and the stored enrolled template) followed by accumulation (additions). This single multiplicative depth means we can use a minimal ciphertext modulus — a single 56-bit prime Q — which dramatically reduces computation time. Deeper circuits would require a larger Q or modulus chain, increasing latency proportionally.

Plaintext Modulus Selection: Why t = 65537

The plaintext modulus t determines the range of integers you can represent and the structure of the SIMD batching slots. For SIMD batching to work, t must satisfy a specific algebraic condition: t must be a prime such that t ≡ 1 (mod 2N). This ensures that the polynomial x^N + 1 splits completely modulo t, giving you N independent plaintext slots (or N divided by the order of the Galois group, depending on the splitting pattern).

For N = 4096, we need t ≡ 1 (mod 8192). The smallest such prime is t = 65537 = 2^16 + 1, which is also a Fermat prime. This choice gives us:

  1. The full 4,096 SIMD slots, since x^N + 1 splits completely modulo t.
  2. 16 bits of dynamic range per slot — more than enough for quantized biometric features.
  3. A minimal t, which keeps the Q/(2t) noise threshold, and therefore the noise budget, as large as possible.

SIMD Batching via CRT

SIMD (Single Instruction, Multiple Data) batching is what makes BFV practical at scale. Rather than encrypting one integer per ciphertext, we exploit the Chinese Remainder Theorem (CRT) isomorphism to pack N independent values into a single ciphertext, and every homomorphic operation applies simultaneously to all N slots.

When x^N + 1 splits modulo t into N linear factors, the plaintext ring Z_t[x]/(x^N + 1) is isomorphic to Z_t^N via the CRT map. In concrete terms: a single polynomial encodes 4096 independent integers, and a single BFV addition or multiplication operates on all 4096 in parallel.
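The algebraic condition above is easy to sanity-check in code. The trial-division primality test here is illustrative (fine for 65537, not for production-size primes):

```rust
// Check the batching condition from the text: t = 65537 is prime and
// satisfies t ≡ 1 (mod 2N) for N = 4096, so x^N + 1 splits into N linear
// factors mod t and every ciphertext carries N independent slots.
fn is_prime(x: u64) -> bool {
    x > 1 && (2..).take_while(|d| d * d <= x).all(|d| x % d != 0)
}

fn main() {
    let n: u64 = 4096;
    let t: u64 = 65537;
    assert!(is_prime(t));
    assert_eq!(t % (2 * n), 1); // 65537 = 8 * 8192 + 1
    println!("t = {t} yields {n} SIMD slots");
}
```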

For biometric authentication, each user's template is a 128-dimensional vector. We pack 4096 / 128 = 32 user templates into a single ciphertext. One encrypted inner-product operation processes all 32 users simultaneously, which is why H33 measures performance in batches of 32:

SIMD Batching Architecture

Polynomial degree (N): 4,096
SIMD slots: 4,096
Template dimensions: 128
Users per ciphertext: 32
Template storage reduction: 128x

The key insight is that SIMD batching is constant-time with respect to batch size. Processing 1 user or 32 users takes essentially the same ~1,375 microseconds for the FHE batch operation. Below 32 users, some slots are simply unused. This is why throughput scales linearly with worker count but is constant per batch — and why H33 achieves 1.2 million authentications per second on a 96-core Graviton4 instance.
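The packing arithmetic above can be sketched as a simple index mapping. The contiguous layout here is illustrative — the production pipeline may interleave slots differently — but the capacity math is the same:

```rust
// Illustrative slot layout: 32 user templates of 128 dimensions each fill
// the 4096 SIMD slots of one ciphertext exactly.
const DIMS: usize = 128;
const SLOTS: usize = 4096;
const USERS_PER_CT: usize = SLOTS / DIMS; // = 32

fn slot_index(user: usize, dim: usize) -> usize {
    assert!(user < USERS_PER_CT && dim < DIMS);
    user * DIMS + dim
}

fn main() {
    assert_eq!(USERS_PER_CT, 32);
    assert_eq!(slot_index(0, 0), 0);
    assert_eq!(slot_index(31, 127), SLOTS - 1); // last user, last dim is slot 4095
    println!("{USERS_PER_CT} users per ciphertext");
}
```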

CKKS Deep Dive

The Approximate Arithmetic Model

CKKS, introduced by Cheon, Kim, Kim, and Song in 2017, was designed from the ground up for approximate computation on real and complex numbers. Instead of the integer encoding used by BFV, CKKS uses a scaling factor (often called Δ) to map floating-point values into the integer polynomial ring.

To encode a vector of real numbers (r_1, r_2, ..., r_{N/2}), CKKS first maps them to complex numbers on the canonical embedding (using the inverse DFT), then scales the result by Δ and rounds to the nearest integer polynomial. The scaling factor Δ determines the precision: a larger Δ means more bits of precision but also a larger ciphertext modulus requirement.

The encoding is:

  1. Start with a vector of N/2 complex values (real values are embedded as complex with zero imaginary part)
  2. Apply the inverse canonical embedding (essentially an inverse DFT specialized to the cyclotomic structure)
  3. Scale by Δ and round to the nearest integer polynomial in R_Q
  4. Encrypt using standard RLWE encryption (same as BFV structurally)

The Rescaling Mechanism

The critical innovation in CKKS is rescaling. When you multiply two CKKS ciphertexts, the scaling factor squares: if both inputs have scale Δ, the product has scale Δ^2. Left unchecked, the scale would grow exponentially with multiplicative depth, quickly exceeding the ciphertext modulus.

Rescaling divides the ciphertext by one factor of Δ (by dividing both polynomials by the smallest prime in the modulus chain and dropping that prime). This brings the scale back down to Δ and simultaneously reduces the ciphertext modulus from Q = q_1 * q_2 * ... * q_L to Q' = q_1 * q_2 * ... * q_{L-1}.

Each rescaling operation consumes one level of the modulus chain. The total number of available levels L determines the maximum multiplicative depth of the computation. This is fundamentally similar to how BFV manages noise through modulus switching, but in CKKS the rescaling is mandatory after every multiplication to maintain numerical stability.

Precision vs. Depth Trade-Off

Every CKKS rescaling step introduces a small additional approximation error, roughly 1/Δ. After L levels of multiplication, the accumulated error is approximately L/Δ. For deep circuits (many sequential multiplications), you need a larger Δ to maintain precision, which requires a larger modulus Q, which increases ciphertext sizes and computation time. This is the fundamental precision-depth trade-off in CKKS.
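To make the trade-off concrete, here is back-of-envelope modulus sizing for a hypothetical depth-4 circuit. The depth, scale, and special-prime sizes are illustrative, not a validated parameter set:

```rust
// Back-of-envelope CKKS modulus sizing: one ~log2(Δ)-bit prime per
// computation level plus a larger special prime for fresh encryptions.
// Real parameters must be validated against the security tables and
// the lattice estimator.
fn main() {
    let depth = 4u32;          // multiplicative depth L
    let log_delta = 40u32;     // bits of scale per level
    let special_prime = 60u32; // first/special prime
    let log_q = depth * log_delta + special_prime;
    assert_eq!(log_q, 220);
    // A ~220-bit Q is too large for N = 4096 at 128-bit security, which is
    // why moderate-depth CKKS typically needs N = 8192 or larger.
    println!("total log2(Q) ≈ {log_q} bits");
}
```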

Encoding Real Numbers to Polynomial Rings

The CKKS encoding leverages a beautiful algebraic structure. The ring Z[x]/(x^N+1) has N complex roots of unity (the primitive 2N-th roots of unity), and the canonical embedding maps a polynomial to its evaluation at these roots. For CKKS, we use only N/2 of these roots (the others are conjugates), giving us N/2 independent complex-valued slots.

In practice, this means:

  1. A single ciphertext with N = 4096 carries 2,048 independent complex-valued slots.
  2. Purely real data can occupy both real and imaginary parts, effectively packing 4,096 real values.
  3. Encoding and decoding are themselves DFT-like transforms, adding a small fixed cost per ciphertext.

When Approximate Is Good Enough

CKKS shines in domains where the input data itself is inherently approximate. Neural network weights are typically stored as float32 or float16. Sensor measurements have noise floors. Statistical aggregations are reported to a fixed number of significant digits. In all these cases, the controlled approximation error introduced by CKKS is smaller than the inherent uncertainty in the data — making exact arithmetic unnecessarily expensive.

Specific scenarios where CKKS's approximate model is a natural fit:

  1. ML inference — weights and activations are already float32/float16 approximations.
  2. Statistical aggregation — means, variances, and correlations reported to a few significant digits.
  3. Signal processing — sampled signals with inherent noise floors.
  4. Genomic analysis — GWAS statistics that are meaningful only to a few significant figures.

Head-to-Head Comparison

The following table compares BFV and CKKS across the dimensions that matter for production deployment. Parameters are normalized to N = 4096 and 128-bit security for a direct comparison.

Dimension | BFV | CKKS
Arithmetic type | Exact integers mod t | Approximate real/complex
Plaintext encoding | CRT (integer slots) | Canonical embedding (complex slots)
SIMD slots (N=4096) | 4,096 | 2,048 (complex) or 4,096 (real)
Noise management | Modulus switching (optional) | Rescaling (mandatory after multiply)
Noise model | Noise removed at decrypt | Noise is approximation error
Addition cost | ~1µs (trivial) | ~1µs (trivial)
Multiplication cost | ~50-200µs (depends on relin) | ~40-180µs + rescale
Rotation cost | ~100-300µs (key-switch) | ~100-300µs (key-switch)
Ciphertext size (1 level) | ~32 KB (N=4096, Q=56-bit) | ~64 KB (N=4096, Q=109-bit)
Bootstrapping | Supported but rarely needed | Supported, more practical
Bootstrap overhead | ~100ms+ (high cost) | ~10-50ms (more efficient)
Comparison operations | Native (integer comparison) | Requires polynomial approximation
Quantum security | Yes (RLWE / lattice) | Yes (RLWE / lattice)

Key Takeaway

The performance differences between BFV and CKKS for basic operations (add, multiply, rotate) are relatively small. The major differences are in what those operations mean — exact vs. approximate — and in the parameter efficiency for different workload types. BFV with a single-level modulus is extremely compact. CKKS with deep modulus chains can grow large.

Use Case Decision Matrix

The choice between BFV and CKKS should be driven by the nature of the computation, not by performance benchmarks alone. Here is a detailed decision matrix with rationale for each domain.

Biometric Matching → BFV

Biometric authentication computes cosine similarity or Euclidean distance between encrypted templates. The match/no-match decision is binary and depends on an exact threshold comparison. Even a tiny approximation error could flip the decision. H33 uses BFV for this exact reason.

ML Inference → CKKS

Neural network inference involves matrix multiplications and polynomial-approximated activation functions. Model weights are inherently approximate (float32 or quantized int8), and inference outputs are probabilities that tolerate small errors. CKKS's native floating-point semantics map directly to this workload.

Database Queries → BFV

Private database queries (equality checks, range queries, keyword search) require exact matching. A query for "age = 25" must return exactly the records where age equals 25, not records where age is approximately 25. BFV's exact integer arithmetic handles this naturally.

Statistical Analysis → CKKS

Computing mean, variance, standard deviation, or correlation coefficients across encrypted datasets. These are inherently approximate computations — reporting a mean to 8 significant digits is more than sufficient. CKKS avoids the overhead of exact arithmetic for this class of workload.

Electronic Voting → BFV

Vote tallying must be exact. A vote is either cast or not cast. There is no "approximately 1 vote." BFV's integer arithmetic maps perfectly to binary or multi-candidate ballot encoding, and the exact addition guarantees a correct tally.

Signal Processing → CKKS

FFT, filtering, convolution, and spectral analysis operate on sampled real-valued signals that are inherently band-limited and noisy. CKKS's canonical embedding is structurally related to the DFT, making the encoding and computation particularly natural and efficient.

Financial Ledgers → BFV

Account balances, transaction amounts, and audit trails must be exact to the cent. Approximate arithmetic on financial data would introduce rounding discrepancies that compound across millions of transactions. BFV's modular arithmetic guarantees exact accounting.

Genomic Analysis → CKKS

Genome-wide association studies (GWAS) compute statistical correlations across hundreds of thousands of SNPs. The input data has inherent measurement noise, and the results are p-values and effect sizes that are meaningful only to a few significant figures. CKKS is the natural choice.

H33's Production BFV Configuration

H33's entire authentication pipeline runs on a single BFV configuration, tuned for minimum latency at 128-bit security. Every parameter choice has a specific rationale, and the configuration has been locked since February 2026 after extensive benchmarking on AWS Graviton4 (c8g.metal-48xl, 96 Neoverse V2 cores).

Why BFV for Authentication

The core operation in biometric template matching is an inner product between an encrypted probe template and a stored enrolled template. This inner product is then compared against a threshold to produce a match/no-match decision. The requirements are:

  1. Exact inner product — the distance computation must be bit-perfect to ensure deterministic match verdicts
  2. Integer-domain computation — biometric feature vectors are quantized to 8-10 bit integers during enrollment
  3. Single multiplicative depth — the inner product is a series of plaintext-ciphertext multiplications followed by additions, requiring only depth 1
  4. Maximum SIMD throughput — batching as many users as possible per ciphertext to amortize the NTT and key-switching overhead

CKKS could theoretically compute an inner product, but the approximation error would require a larger security margin in the threshold comparison, effectively wasting bits of precision to compensate for noise that BFV simply does not have. When the computation is naturally integer-valued and shallow, BFV is strictly superior.
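A plaintext model of the depth-1 computation makes the point. The feature values and threshold below are made up for illustration; in production the probe vector is encrypted and the arithmetic happens slot-wise mod t, but the integer math is identical and the result is bit-exact:

```rust
// Plaintext model of the matching computation: one plaintext-ciphertext
// multiply per dimension, then additions, then a deterministic compare.
fn main() {
    let probe: Vec<i64> = vec![100; 128];    // quantized 8-10 bit features (illustrative)
    let enrolled: Vec<i64> = vec![101; 128];
    let score: i64 = probe.iter().zip(&enrolled).map(|(a, b)| a * b).sum();
    let threshold: i64 = 1_000_000;          // illustrative integer threshold
    let is_match = score >= threshold;       // bit-perfect: no epsilon, no margin
    assert_eq!(score, 1_292_800);
    assert!(is_match);
    println!("score = {score}, match = {is_match}");
}
```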

Parameter Choices and Rationale

Production Parameters

Polynomial degree (N): 4,096
Ciphertext modulus (Q): 56-bit prime
Plaintext modulus (t): 65,537
Security level: 128-bit
Multiplicative depth: 1
Number of moduli: 1 (single Q)
SIMD slots: 4,096
Users per ciphertext: 32

Why N = 4096? This is the minimum polynomial degree that provides 128-bit security with a 56-bit modulus. Increasing to N = 8192 would give us more slots and a deeper noise budget, but would double the NTT cost for no benefit — we only need depth 1.

Why a single 56-bit modulus? With multiplicative depth 1, we do not need a modulus chain. A single prime Q means no modulus switching overhead, no RNS (Residue Number System) decomposition, and ciphertext sizes that fit comfortably in L2 cache. This is the smallest Q that provides 128-bit security at N = 4096 while leaving enough noise budget for one multiplication plus a handful of additions.

Why t = 65537? As discussed above, this is the smallest prime satisfying t ≡ 1 (mod 8192) for full SIMD batching with N = 4096. It provides 16 bits of dynamic range per slot, more than enough for quantized biometric features.
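A rough budget calculation shows why these parameters leave ample room for depth 1. This is a simplified model — real budgets come from the library's exact noise tracking:

```rust
// Simplified noise-budget arithmetic for the production parameters
// (56-bit Q, t = 65537 ≈ 2^16). Decryption succeeds while noise < Q/(2t),
// so the initial budget is roughly log2(Q) - log2(t) - 1 bits.
fn main() {
    let log_q = 56i32;
    let log_t = 16i32;
    let budget_bits = log_q - log_t - 1;
    assert_eq!(budget_bits, 39);
    println!("initial noise budget ≈ {budget_bits} bits");
}
```

Roughly 39 bits of budget against a circuit that consumes one multiplication plus a handful of additions is a comfortable margin, which is what lets the single-prime configuration work.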

Montgomery NTT Optimization

The Number Theoretic Transform is the computational bottleneck of any BFV implementation. Every encryption, decryption, and multiplication requires forward and inverse NTTs on degree-4096 polynomials. H33's NTT implementation uses several techniques that collectively eliminate all division from the hot path:

Rust ntt.rs (simplified)
/// Forward NTT with Montgomery arithmetic and Harvey lazy reduction.
/// Twiddles are pre-stored in Montgomery form. No division in the hot path.
pub fn forward_ntt_mont(
    data: &mut [u64],
    twiddles_mont: &[u64],
    q: u64,
    q_inv: u64,   // Montgomery inverse: -q^{-1} mod 2^64
    two_q: u64,   // 2*q for lazy reduction bound checks
) {
    let n = data.len();
    let mut t = n;
    let mut tw_idx = 0;

    while t > 1 {
        t >>= 1;
        for i in (0..n).step_by(2 * t) {
            let w = twiddles_mont[tw_idx];
            tw_idx += 1;
            for j in i..i + t {
                // Harvey butterfly: inputs and outputs stay in [0, 2q).
                // mont_mul and mask_if_ge are branchless helpers defined
                // elsewhere in the module.
                let u = data[j];
                let v = mont_mul(data[j + t], w, q, q_inv);
                let s = u + v;                 // in [0, 4q)
                data[j] = s - (two_q & mask_if_ge(s, two_q));
                let d = u + two_q - v;         // stays non-negative: no u64 underflow
                data[j + t] = d - (two_q & mask_if_ge(d, two_q));
            }
        }
    }
    // Final reduction: bring all values from [0, 2q) to [0, q)
    for x in data.iter_mut() {
        *x -= q & mask_if_ge(*x, q);
    }
}
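The snippet above leans on two helpers, `mont_mul` and `mask_if_ge`, that it does not define. A plausible implementation looks like the following — my sketch, not H33's actual code:

```rust
/// All-ones mask when a >= b, else zero — branchless.
#[inline]
fn mask_if_ge(a: u64, b: u64) -> u64 {
    ((a >= b) as u64).wrapping_neg()
}

/// Montgomery multiplication (REDC): returns a*b*R^{-1} mod q for R = 2^64.
/// `q_inv` must be -q^{-1} mod 2^64; q must be odd and well below 2^63.
#[inline]
fn mont_mul(a: u64, b: u64, q: u64, q_inv: u64) -> u64 {
    let t = (a as u128) * (b as u128);
    let m = (t as u64).wrapping_mul(q_inv);                 // m = (t mod R) * (-q^{-1}) mod R
    let u = ((t + (m as u128) * (q as u128)) >> 64) as u64; // exact division by R
    if u >= q { u - q } else { u }                          // reduce from [0, 2q) to [0, q)
}

/// Newton's iteration for -q^{-1} mod 2^64 (q odd): each step doubles the
/// number of correct low bits, so six steps reach 64 bits.
fn neg_inv_mod_r(q: u64) -> u64 {
    let mut inv: u64 = 1;
    for _ in 0..6 {
        inv = inv.wrapping_mul(2u64.wrapping_sub(q.wrapping_mul(inv)));
    }
    inv.wrapping_neg()
}

fn main() {
    let q: u64 = 65537; // small prime for a quick sanity check
    let q_inv = neg_inv_mod_r(q);
    let r = ((1u128 << 64) % q as u128) as u64;             // R mod q
    let r2 = ((r as u128 * r as u128) % q as u128) as u64;  // R^2 mod q
    let to_mont = |x: u64| mont_mul(x, r2, q, q_inv);       // x  -> xR mod q
    let from_mont = |x: u64| mont_mul(x, 1, q, q_inv);      // xR -> x  mod q
    let (a, b) = (12345u64, 6789u64);
    let prod = from_mont(mont_mul(to_mont(a), to_mont(b), q, q_inv));
    assert_eq!(prod, (a * b) % q);
    assert_eq!(mask_if_ge(5, 5), u64::MAX);
    assert_eq!(mask_if_ge(4, 5), 0);
    println!("mont_mul round-trip OK");
}
```

For the production transform, `q_inv` and the Montgomery-form twiddles would be precomputed once at key setup, so the hot path contains only multiplies, shifts, and masked subtracts.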

Benchmark Data

All measurements on c8g.metal-48xl (96 cores, AWS Graviton4, Neoverse V2), system allocator, Criterion.rs v0.5, February 2026.

Production Throughput

FHE batch (32 users): ~1,375µs
ZKP STARK lookup: ~0.067µs
Dilithium attestation: ~240µs
Total 32-user batch: ~1,615µs
Per authentication: ~50µs
Sustained throughput (96 workers): ~1.2M auth/sec

The FHE batch dominates the pipeline at ~85% of total latency. The ZKP and Dilithium attestation stages are amortized across the batch (one proof and one signature per 32 users), making them negligible per-authentication. This is why BFV parameter optimization — and specifically NTT optimization — is the single most important performance lever in the entire H33 stack. See FHE Performance Optimization for the full optimization journey.

Parameter Selection Guidelines

Choosing FHE parameters is an exercise in balancing four competing constraints: security level, computation depth, performance, and ciphertext size. The FHE Parameter Selection Guide covers this in exhaustive detail. Here we summarize the key considerations for BFV and CKKS.

BFV Parameter Selection

  1. Determine your multiplicative depth. Count the maximum number of sequential ciphertext-ciphertext multiplications in your circuit. For inner products with plaintext operands, the depth is 1. For matrix multiplications, it may be 2-4.
  2. Choose N. Start with the smallest power of two that provides your target security level (typically 128 bits) given the modulus size you will need. Use the Homomorphic Encryption Standard security tables.
  3. Choose Q. The ciphertext modulus must be large enough to accommodate the initial noise plus the noise growth from your computation. For depth 1 with N=4096, a single 56-bit prime suffices. For depth 4+, you will need a modulus chain of 3-8 primes totaling 200-400 bits.
  4. Choose t. For SIMD batching, pick the smallest prime satisfying t ≡ 1 (mod 2N) that provides sufficient dynamic range for your plaintext values.
  5. Validate security. Use the lattice-estimator tool to confirm that your (N, Q) pair achieves the target security level against known lattice attacks (primal uSVP, dual, hybrid).

CKKS Parameter Selection

  1. Determine your precision requirement. How many bits of precision do you need in the final result? This determines the minimum scale Δ, typically 2^{30} to 2^{60}.
  2. Determine your multiplicative depth. Same as BFV, but remember that each multiplication requires a mandatory rescaling step that consumes one modulus level.
  3. Build the modulus chain. You need L + 1 primes: L primes of size ~log2(Δ) bits for the computation levels, plus one special prime for the initial encryption level. The total modulus Q is the product of all primes.
  4. Choose N. The polynomial degree must provide 128-bit security for the total modulus size. Since CKKS modulus chains tend to be larger than BFV single-level moduli, CKKS typically requires N = 8192 or 16384 for moderate computation depths.
  5. Validate precision. Run your computation on test data and measure the actual output error against the expected result. Adjust Δ upward if precision is insufficient.

Common Mistake

Do not over-provision parameters. A common error is choosing N = 16384 "just to be safe" when N = 4096 would suffice. The NTT cost scales as O(N log N), so doubling N roughly doubles the latency. Similarly, using a modulus chain of 8 primes when your circuit only needs depth 2 wastes memory and bandwidth on unused ciphertext capacity. Profile your actual computation depth, then choose the minimum parameters that satisfy it.

Performance Optimization Techniques

Both BFV and CKKS share the same computational bottlenecks: NTT, polynomial multiplication, and key-switching. The FHE Performance Optimization guide covers these in detail. Here we highlight the techniques most relevant to scheme selection.

NTT-Domain Operations

The Number Theoretic Transform converts polynomial multiplication from O(N^2) to O(N log N). But beyond this asymptotic improvement, keeping operands in the NTT domain between operations avoids redundant forward/inverse transforms. In H33's BFV pipeline:

  1. Enrolled templates are stored pre-transformed in the NTT domain, so they never need a forward transform at query time.
  2. The encrypted probe is transformed once on arrival.
  3. The inner product is computed entirely in the NTT domain as a pointwise multiply-accumulate.
  4. A single inverse NTT recovers the final result.

This NTT-domain persistence strategy reduces the number of NTT operations per authentication from dozens to the absolute minimum: one forward NTT for the encrypted probe, pointwise multiply-accumulate in NTT domain, one inverse NTT for the final result.
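Once everything lives in the NTT domain, the accumulation step is just a coefficient-wise multiply-accumulate. A minimal sketch of that kernel, using plain u128 reduction rather than the Montgomery arithmetic a production kernel would use:

```rust
// Pointwise multiply-accumulate in the NTT domain: polynomial multiplication
// becomes coefficient-wise multiplication, so accumulating many products
// never leaves the transform domain.
fn pointwise_mac(acc: &mut [u64], a: &[u64], b: &[u64], q: u64) {
    for ((acc_i, &a_i), &b_i) in acc.iter_mut().zip(a).zip(b) {
        let prod = (a_i as u128 * b_i as u128 % q as u128) as u64;
        *acc_i = ((*acc_i as u128 + prod as u128) % q as u128) as u64;
    }
}

fn main() {
    let q: u64 = 65537;
    let mut acc = vec![0u64; 4];
    // Two operand pairs already in the NTT domain (values illustrative).
    pointwise_mac(&mut acc, &[1, 2, 3, 4], &[10, 20, 30, 40], q);
    pointwise_mac(&mut acc, &[5, 6, 7, 8], &[50, 60, 70, 80], q);
    assert_eq!(acc, vec![260, 400, 580, 800]);
    println!("{acc:?}");
}
```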

Lazy Reduction (Harvey Method)

Standard modular arithmetic reduces every intermediate result to the range [0, Q). The Harvey lazy reduction technique relaxes this constraint, allowing intermediate values to remain in [0, 2Q) between NTT butterfly stages. The benefits are twofold:

  1. Fewer reductions — the full reduction to [0, Q) is deferred to a single pass at the end of the transform instead of happening inside every butterfly.
  2. Branchless inner loops — the relaxed bound check compiles to masked add/subtract operations with no data-dependent branches, keeping the CPU pipeline full.

Batch Processing

SIMD batching (packing 32 users per ciphertext) is the single largest throughput multiplier. But there is a second level of batching: pipeline batching, where multiple ciphertexts are processed concurrently across worker threads. H33 runs 96 parallel workers on Graviton4, each processing independent 32-user batches. The workers share no mutable state — all key material is read-only and shared via Arc — so there is zero contention.

Hardware-Specific Tuning

Different CPU architectures favor different optimization strategies:

ARM NEON (Graviton4)

128-bit SIMD registers. Excellent for branchless permutations (Galois rotation), vectorized key-switching, and add/subtract/compare operations. NEON lacks native 64x64-bit multiply-to-128-bit, so Montgomery multiplication stays scalar. H33 uses NEON for Galois operations and batch CBD sampling.

x86 AVX-512

512-bit SIMD registers with 52-bit integer multiply (IFMA). AVX-512 uses Shoup's method (precomputed quotient) rather than Montgomery REDC for modular multiplication. H33's Montgomery-based NTT is ARM-optimized; an AVX-512 port would require a structural rewrite to Shoup's method for maximum throughput.

Quantum Security

Both BFV and CKKS are built on the Ring Learning With Errors (RLWE) problem, which is a specific instance of the general lattice-based hardness assumption. RLWE's security reduces to the worst-case hardness of finding short vectors in ideal lattices — a problem for which no efficient quantum algorithm is known.

This is a critical advantage of FHE over classical encryption schemes. RSA and ECC will be broken by Shor's algorithm on a sufficiently large quantum computer. AES will have its effective security halved by Grover's algorithm. But RLWE-based schemes — including both BFV and CKKS — are inherently post-quantum.

Post-Quantum by Construction

You do not need to add a separate post-quantum layer to an FHE computation. The data is already encrypted under a lattice-based scheme that resists both classical and quantum attacks. This is why H33's full-stack pipeline (FHE + ZKP + Dilithium attestation) achieves end-to-end post-quantum security: the FHE layer protects the biometric data, and the Dilithium attestation protects the authentication verdict.

The best known quantum attack against RLWE is based on quantum lattice sieving, which provides at most a polynomial speedup over classical lattice sieving — far less than the exponential speedup Shor's algorithm provides against RSA. The Homomorphic Encryption Standard committee has published security estimates that account for quantum attacks, and H33's parameter choice (N = 4096, Q = 56-bit) meets the 128-bit post-quantum security threshold according to these estimates.

Other FHE Schemes Worth Knowing

BFV and CKKS dominate SIMD-batched workloads, but two other schemes are important for completeness.

BGV (Brakerski-Gentry-Vaikuntanathan)

BGV is the intellectual predecessor to BFV and shares the same exact-integer-arithmetic model. The key difference is noise management: BGV uses modulus switching as a mandatory step after every multiplication to reduce noise, while BFV uses a scale-invariant approach where modulus switching is optional.

In practice, BFV and BGV perform similarly for shallow circuits. BGV can be slightly more efficient for deep circuits because its modulus switching is more tightly integrated with the noise growth analysis. However, BFV's simpler noise model makes it easier to reason about and implement correctly. Most modern FHE libraries (Microsoft SEAL, OpenFHE, Lattigo) support both, and the choice between them is often an implementation detail rather than an architectural decision.

Dimension | BGV | BFV | CKKS | TFHE
Arithmetic | Exact integer | Exact integer | Approximate real | Boolean / small integer
Batching | SIMD (CRT) | SIMD (CRT) | SIMD (canonical emb.) | No native SIMD
Noise mgmt | Modulus switching | Scale-invariant | Rescaling | Bootstrapping (fast)
Best for | Deep integer circuits | Shallow integer circuits | ML, statistics | Comparisons, Boolean
Bootstrap cost | High (~100ms+) | High (~100ms+) | Moderate (~10-50ms) | Low (~10-100µs)
Quantum-safe | Yes (RLWE) | Yes (RLWE) | Yes (RLWE) | Yes (LWE)

TFHE (Torus FHE)

TFHE operates on individual bits or small integers (typically 2-8 bits) and evaluates arbitrary Boolean circuits gate by gate. Its distinguishing feature is programmable bootstrapping: a bootstrapping operation that not only refreshes the noise but also evaluates a lookup table in the process. This makes TFHE uniquely suited for:

  1. Comparisons and threshold functions, evaluated exactly via lookup tables.
  2. Branching and control-flow-heavy logic expressed as Boolean circuits.
  3. Encrypted database filters with complex predicates.
  4. Arbitrary non-polynomial functions on small integers.

The trade-off is throughput. TFHE processes one bit at a time (or small integers), while BFV and CKKS process thousands of values in parallel via SIMD batching. For data-parallel workloads like biometric matching or ML inference, BFV and CKKS are orders of magnitude faster. For serial logic-heavy workloads like encrypted database filters with complex predicates, TFHE can be the better choice.

Decision Flowchart

Use this flowchart to determine the right FHE scheme for your workload:

Does your computation require exact integer results?
YES → BFV
↓ NO
Does your computation operate on real/floating-point numbers?
YES → CKKS
↓ NO
Does your computation involve complex comparisons or branching?
YES → TFHE
↓ NO
Is your circuit depth > 10 with exact integer arithmetic?
YES → BGV
↓ NO
Default for shallow integer circuits with SIMD batching
BFV

Conclusion

BFV and CKKS are not competing schemes — they are complementary tools designed for fundamentally different computational models. BFV provides exact integer arithmetic with zero precision loss, making it the only correct choice for applications where a single bit of error can change the outcome: biometric authentication, database equality checks, financial accounting, and electronic voting. CKKS provides efficient approximate arithmetic on real numbers, making it the natural choice for machine learning inference, statistical analysis, signal processing, and any workload where the input data is itself approximate.

H33 chose BFV for its production authentication pipeline because biometric matching is inherently an exact computation. The encrypted inner product between a probe template and an enrolled template must yield a bit-perfect result to ensure deterministic match/no-match verdicts. With BFV parameters tuned to the minimum necessary for this workload (N = 4096, single 56-bit Q, t = 65537), the entire pipeline runs at ~50 microseconds per authentication — fast enough to sustain 1.2 million authentications per second on a single Graviton4 instance.

For workloads that require approximate arithmetic, H33 also supports CKKS through the same API. The parameter selection, encoding, and computation mechanics differ, but the underlying RLWE security guarantee is identical. Both schemes are post-quantum by construction, and both benefit from the same NTT and key-switching optimizations.

The right scheme is the one that matches your computation's precision requirements. Everything else — performance, parameter sizes, implementation complexity — follows from that fundamental choice.

Further Reading

What Is Fully Homomorphic Encryption? — start here if you are new to FHE.
FHE Parameter Selection Guide — detailed walkthrough of choosing N, Q, and t.
FHE Performance Optimization — the full optimization journey from milliseconds to microseconds.
NTT: Number Theoretic Transform — deep dive into the computational engine behind all FHE schemes.
Biometric Authentication Guide — how H33 uses BFV for encrypted biometric matching.
Biometric Template Protection — why encrypted templates outperform hashed templates.
Introduction to Lattice Cryptography — the mathematical foundation underlying both BFV and CKKS.
What Is Post-Quantum Cryptography? — the quantum threat and how lattice-based schemes address it.

Ready to Go Quantum-Secure?

Start protecting your users with post-quantum authentication today. 1,000 free auths, no credit card required.

Get Free API Key →
