FHE · 30 min read

FHE Parameter Selection:
Balancing Security and Performance

A comprehensive engineering guide to choosing FHE parameters that are provably secure, computationally efficient, and production-ready. Learn the exact rationale behind every parameter choice in H33's 1.2M auth/sec pipeline.

~50μs Per Auth · 1.2M/s Throughput · 128-bit Security · 32 Users/Batch

In Fully Homomorphic Encryption, the algorithm you choose matters far less than the parameters you feed into it. BFV, BGV, and CKKS are all capable schemes—but set the wrong polynomial degree or coefficient modulus and you get either a system that is trivially breakable, or one so slow it cannot serve a single request before the TCP connection times out. Parameter selection is the single most consequential decision in any FHE deployment, and getting it wrong is irreversible: you cannot patch parameters after data has been encrypted.

This guide walks through every parameter in the FHE design space, explains the mathematical constraints that bind them together, and provides concrete decision frameworks for production systems. We will use H33's own biometric authentication pipeline—serving 1.2 million authentications per second on Graviton4 hardware—as a running case study, showing exactly why we chose N=4096, a single 56-bit coefficient modulus, and t=65537 as our plaintext modulus.

Whether you are building privacy-preserving machine learning, encrypted analytics, or biometric matching, the methodology here applies. The numbers change; the reasoning process does not.

Who This Guide Is For

Engineers and cryptographers implementing FHE in production systems. We assume familiarity with lattice cryptography basics and the distinction between BFV/BGV integer schemes and CKKS approximate arithmetic. If you need a primer on these topics, start with our FHE introduction and BFV vs CKKS comparison first.

The Parameter Space

Every Ring-LWE-based FHE scheme requires you to fix a set of interdependent parameters before a single bit of plaintext is encrypted. Change one, and the valid ranges for all the others shift. The full parameter space includes six core dimensions:

Polynomial Degree (N)

The dimension of the polynomial ring R = Z[X]/(X^N+1). Must be a power of 2. Determines slot count, security ceiling, and NTT performance. Typical range: 1024 to 65536.

Coefficient Modulus (Q)

The modulus for ciphertext coefficients. Can be a single prime or an RNS chain of co-prime factors. Determines multiplicative depth and, together with N, the security level.

Plaintext Modulus (t)

BFV/BGV only. Defines the plaintext space Z_t. Must satisfy the CRT condition for SIMD batching. Larger t means more noise per operation and fewer available multiplications.

Scale Δ (CKKS)

CKKS only. The encoding precision factor. Determines how many bits of real-number precision survive each computation level. Typically 40–60 bits per level.

Error Distribution (σ)

The standard deviation of the discrete Gaussian (or centered binomial) noise added during encryption. The HomomorphicEncryption.org standard fixes σ = 3.2 for all security levels.

Security Level (λ)

Target security in bits: 128, 192, or 256. This is not a free parameter—it constrains the ratio N/log(Q), meaning larger Q demands proportionally larger N.

The critical insight is that these parameters form a constraint system, not a menu. You do not independently "pick" each one. Instead, you start with your application requirements (data type, computation depth, throughput target, security level), and the mathematics tells you what parameter combinations are valid. Your job is to find the tightest valid configuration—the one that satisfies all constraints with the least overhead.

The Fundamental Trade-off

Security demands large N and small Q. Performance demands small N and small Q. Deep computations demand large Q. Every FHE parameter set is a negotiated compromise between these three forces. The art is in minimizing waste—choosing parameters that are just large enough to be secure and just deep enough for your computation, with no margin wasted on capabilities you will never use.

The Security-Performance Trade-off

FHE security rests on the hardness of the Ring Learning With Errors (RLWE) problem, which is itself reducible to worst-case lattice problems. The security of any parameter set is determined by the ratio of the polynomial degree N to the logarithm of the coefficient modulus Q. Specifically, an attacker's best strategy against RLWE involves solving a lattice problem in a lattice of dimension roughly proportional to N, where the problem's difficulty scales as N/log(Q).

How Security Levels Are Calculated

The community standard for estimating FHE security is the Lattice Estimator (Albrecht et al.), a SageMath library that evaluates all known lattice attacks—primal uSVP, dual, BKW, and hybrid attacks—against a given parameter set. For each attack, it estimates the number of classical (and quantum) bit operations required. The minimum across all attacks gives the security level.

The HomomorphicEncryption.org consortium publishes standardized parameter tables that have been validated against the Lattice Estimator. These tables provide the maximum allowed log2(Q) for each polynomial degree N at each security level. Exceeding the maximum Q for a given N drops you below the target security level—there is no graceful degradation; you are simply insecure.

Standard Parameter Bounds

The following table shows the maximum coefficient modulus bit-length for each polynomial degree at the three standard security levels, as specified by the HomomorphicEncryption.org standard (ternary secret, σ = 3.2):

| Poly Degree (N) | max log2(Q) for 128-bit | max log2(Q) for 192-bit | max log2(Q) for 256-bit |
|---|---|---|---|
| 1024 | 27 | 19 | 14 |
| 2048 | 54 | 37 | 29 |
| 4096 | 109 | 75 | 58 |
| 8192 | 218 | 152 | 118 |
| 16384 | 438 | 305 | 237 |
| 32768 | 881 | 611 | 476 |

Reading this table: with N=4096, you can use up to a 109-bit coefficient modulus and still maintain 128-bit security. H33 uses a single 56-bit Q—well within the 109-bit ceiling—providing a comfortable security margin. We could theoretically double our modulus and still be secure, but there is no reason to: our computation only requires depth 1.
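The table is easy to encode as a lookup for automated checks. A minimal sketch, with the values transcribed from the table above (the helper name is ours):

```python
# HomomorphicEncryption.org maxima for log2(Q), transcribed from the
# table above (ternary secret, sigma = 3.2).
MAX_LOG2_Q = {
    1024:  {128: 27,  192: 19,  256: 14},
    2048:  {128: 54,  192: 37,  256: 29},
    4096:  {128: 109, 192: 75,  256: 58},
    8192:  {128: 218, 192: 152, 256: 118},
    16384: {128: 438, 192: 305, 256: 237},
    32768: {128: 881, 192: 611, 256: 476},
}

def is_secure(n: int, log2_q: int, level: int = 128) -> bool:
    """True if log2(Q) stays under the standard's ceiling for this (N, level)."""
    return log2_q <= MAX_LOG2_Q[n][level]

# H33: a 56-bit Q at N=4096 leaves 109 - 56 = 53 bits of headroom.
assert is_secure(4096, 56)
assert not is_secure(4096, 110)   # one bit over the ceiling: insecure
```

Wiring a check like this into CI catches the "silently grew Q past the ceiling" failure mode before it ships.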

Warning: Security Is Not Monotonic in N

A common misconception is "bigger N = more secure." This is only true if you hold Q constant. If you increase N to accommodate a larger Q (e.g., for deeper computation), the security level might stay the same or even decrease if you are not careful. Always check the N/log(Q) ratio against the standard tables after any parameter change.

Post-Quantum Security

Unlike RSA and elliptic-curve schemes, lattice-based FHE is believed to resist quantum attacks. Grover's algorithm provides only a quadratic speedup against lattice problems (compared to the exponential speedup Shor's gives against factoring), so the standard security levels remain valid in a post-quantum world. A 128-bit classical security parameter set provides approximately 128-bit post-quantum security as well. This is one of FHE's most important properties: your encrypted data does not become retroactively vulnerable when quantum computers arrive.

Polynomial Degree (N) Selection

The polynomial degree N is the most impactful single parameter in any FHE scheme. It determines three critical properties simultaneously:

  1. SIMD slot count: BFV/BGV can batch up to N plaintext values into a single ciphertext (N/2 for CKKS). Each slot operates independently in parallel.
  2. Security ceiling: N sets the maximum Q you can use while maintaining your target security level.
  3. Computational cost: All polynomial operations scale as O(N log N) via the Number Theoretic Transform, so doubling N more than doubles every operation's cost.
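The slot arithmetic behind points 1 and 2 is worth making concrete. A sketch (function names are ours) of how slot count translates into batch size:

```python
def simd_slots(n: int, scheme: str) -> int:
    """BFV/BGV expose N integer slots (given t ≡ 1 mod 2N);
    CKKS exposes N/2 complex slots."""
    return n if scheme in ("BFV", "BGV") else n // 2

def users_per_ciphertext(n: int, dims_per_user: int, scheme: str = "BFV") -> int:
    """How many fixed-length feature vectors pack into one ciphertext."""
    return simd_slots(n, scheme) // dims_per_user

assert users_per_ciphertext(4096, 128) == 32   # H33's batch size
assert users_per_ciphertext(2048, 128) == 16   # half the throughput
```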

N must be a power of 2. This is not an arbitrary convention—it is a structural requirement of the cyclotomic polynomial ring Z[X]/(X^N+1), which enables NTT-based fast polynomial multiplication and provides the algebraic properties (ring splitting) needed for SIMD batching.

N Values and Their Use Cases

| N | Slots (BFV) | Max Q (128-bit) | Typical Use | Relative Speed |
|---|---|---|---|---|
| 1024 | 1,024 | 27 bits | Toy examples, testing | 1x (baseline) |
| 2048 | 2,048 | 54 bits | Simple comparisons, lightweight ops | ~2.5x slower |
| 4096 | 4,096 | 109 bits | Production auth, shallow circuits | ~6x slower |
| 8192 | 8,192 | 218 bits | ML inference, multi-level circuits | ~14x slower |
| 16384 | 16,384 | 438 bits | Deep computation, bootstrapping | ~32x slower |
| 32768 | 32,768 | 881 bits | CKKS bootstrapping, complex ML | ~72x slower |

The "Relative Speed" column is approximate and reflects the full-pipeline cost, not just NTT time. The actual slowdown is super-linear because larger N also requires more cache and memory bandwidth, increasing the constant factors beyond the O(N log N) theoretical scaling.

Why H33 Chose N=4096

H33's biometric authentication encodes 128-dimensional feature vectors into FHE ciphertexts. With N=4096, we get 4,096 SIMD slots. Dividing by 128 dimensions gives exactly 32 users per ciphertext—a natural batch size that aligns perfectly with our throughput requirements.

H33 N=4096 Decision Rationale

| SIMD slots | 4,096 |
|---|---|
| Biometric dims per user | 128 |
| Users per ciphertext | 32 |
| Max Q at 128-bit security | 109 bits |
| Actual Q used | 56 bits |
| Security margin remaining | 53 bits |

Could we use N=2048? No. With only 2,048 slots and 128 dimensions, we would only fit 16 users per ciphertext—halving our batch throughput. Additionally, the 54-bit Q ceiling at 128-bit security would leave almost no headroom above our 56-bit modulus. We would either need to shrink Q (reducing noise budget) or accept weaker security.

Could we use N=8192? Technically yes, and we would get 64 users per ciphertext. But every polynomial operation would be roughly 2.5x slower, and we would be paying for 8,192 slots when 4,096 slots serve our purpose perfectly. In production, this translates to the difference between ~50μs per auth and ~125μs per auth—a 2.5x throughput penalty for zero security benefit.

Anti-Pattern: Over-Provisioning N

Using N=16384 when N=4096 suffices is the single most common performance mistake in FHE deployments. It costs you a ~16x slowdown and provides no additional security if your Q fits within the smaller ring's budget. Always start with the smallest N that gives you enough slots and enough Q headroom.

Coefficient Modulus (Q) Selection

The coefficient modulus Q defines the range of ciphertext coefficients: each polynomial coefficient lives in ZQ. In modern FHE libraries, Q is represented as a product of co-prime factors using the Residue Number System (RNS) representation, where Q = q1 · q2 · ... · qL. Each factor qi is called a "limb," and all polynomial arithmetic is performed independently modulo each limb, enabling native 64-bit operations without multi-precision arithmetic.
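To see why RNS enables native machine-word arithmetic, here is a toy sketch with tiny illustrative limbs (real limbs are ~60-bit; helper name is ours): each limb computes independently, and Chinese Remainder reconstruction recovers the result mod Q.

```python
def crt_reconstruct(residues, moduli):
    """Recombine per-limb residues into a single value mod Q = prod(moduli)."""
    big_q = 1
    for q in moduli:
        big_q *= q
    x = 0
    for r, q in zip(residues, moduli):
        m = big_q // q                 # product of the other limbs
        x += r * m * pow(m, -1, q)     # CRT basis element for this limb
    return x % big_q

q_limbs = [11, 13]                     # toy co-prime limbs
a, b = 50, 37
# All arithmetic happens independently per limb, with no big-integer math:
product_limbs = [(a % q) * (b % q) % q for q in q_limbs]
assert crt_reconstruct(product_limbs, q_limbs) == (a * b) % (11 * 13)
```

In production libraries the reconstruction step is rarely materialized; it is the per-limb independence that matters, since each limb fits in a 64-bit word.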

Single Modulus vs. Chain of Primes

The number of RNS limbs directly determines the multiplicative depth of your FHE computation:

Conceptual: RNS chain for depth-3 BFV computation
// Q = q_special * q_0 * q_1 * q_2
//                                    
// Level 3:  Q = q_s * q_0 * q_1 * q_2   (fresh ciphertext)
// Multiply + mod_switch:
// Level 2:  Q = q_s * q_0 * q_1          (drop q_2)
// Multiply + mod_switch:
// Level 1:  Q = q_s * q_0                 (drop q_1)
// Multiply + mod_switch:
// Level 0:  Q = q_s                       (drop q_0, decrypt)
//
// Each mod_switch reduces noise by factor ~q_i
// Total log2(Q) must fit under security bound for chosen N

Choosing Prime Sizes

Individual RNS primes are typically chosen to be close to 60 bits (the maximum that fits in a 64-bit machine word with room for lazy reduction) or smaller primes of 30–50 bits for finer-grained noise management. The total log2(Q) is the sum of all prime bit-lengths, and this total must stay below the security bound from the parameter table.
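NTT compatibility adds one more constraint implicit in the discussion above: each limb must satisfy q ≡ 1 (mod 2N) so that a 2N-th root of unity exists modulo q. A sketch (helper names ours) that scans downward from 2^bits for such primes:

```python
def is_prime(n: int) -> bool:
    """Deterministic Miller-Rabin; the first 12 prime bases suffice for 64-bit n."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def ntt_primes(n: int, bits: int, count: int) -> list:
    """Largest `count` primes of exactly `bits` bits with q ≡ 1 (mod 2N)."""
    step = 2 * n
    q = (1 << bits) - step + 1        # largest value ≡ 1 (mod 2N) below 2^bits
    found = []
    while len(found) < count and q.bit_length() == bits:
        if is_prime(q):
            found.append(q)
        q -= step
    return found

limbs = ntt_primes(4096, 56, 2)
assert all(q % 8192 == 1 and q.bit_length() == 56 for q in limbs)
```

Production libraries (SEAL's `CoeffModulus::Create`, for instance) perform essentially this search for you.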

For CKKS, the primes also serve as scale factors during rescaling, so their sizes directly affect encoding precision. A typical CKKS chain might use 60-bit primes for the first and last levels and 40-bit primes in between:

Example: CKKS chain for N=8192, 128-bit security
// Max log2(Q) at N=8192, 128-bit security: 218 bits
// Chain: [60, 40, 40, 40, 60] = 240 bits  <-- TOO LARGE! Insecure.
// Chain: [60, 40, 40, 60]     = 200 bits  <-- OK. 3 levels, secure.
// Chain: [50, 30, 30, 30, 50] = 190 bits  <-- OK. 4 levels, secure.
//
// First prime: special prime for key switching
// Last prime: initial scale (encoding precision)
// Middle primes: consumed one per multiplication (rescale)

H33's Choice: Single 56-bit Q

H33's biometric matching circuit has multiplicative depth 1: we compute inner products (multiply + accumulate), which requires exactly one ciphertext-plaintext multiplication. No ciphertext-ciphertext multiplication. No mod switching. A single 56-bit prime gives us everything we need:

Key Insight

For depth-1 computations, a single modulus is strictly superior to a chain. You eliminate the entire RNS machinery—basis extension, modulus switching, CRT reconstruction—which can account for 30–50% of total computation time in multi-limb implementations. If your circuit is shallow, do not pay for depth you will not use.

Plaintext Modulus (t) for BFV/BGV

In BFV and BGV, the plaintext modulus t defines the algebraic structure of your plaintext space. Each SIMD slot holds an element of Zt, meaning each slot can represent integers from 0 to t−1. But t is not just about data range—it has profound implications for batching capability, noise growth, and NTT compatibility.

The CRT Batching Condition

SIMD batching—the ability to pack N independent values into a single ciphertext—requires that the cyclotomic polynomial XN+1 splits completely into N linear factors modulo t. For power-of-two cyclotomic rings, this happens if and only if:

t ≡ 1 (mod 2N)

This is the Chinese Remainder Theorem (CRT) condition. If t does not satisfy it, you get fewer than N slots—potentially as few as 1, which makes batching impossible and destroys throughput.
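The condition is a one-line check. A sketch (the function name is ours):

```python
def supports_full_batching(t: int, n: int) -> bool:
    """For power-of-two rings, X^N + 1 splits into N linear factors mod a
    prime t exactly when t ≡ 1 (mod 2N) — the CRT batching condition."""
    return t % (2 * n) == 1

assert supports_full_batching(65537, 4096)       # H33's choice: full 4,096 slots
assert not supports_full_batching(257, 4096)     # prime, but no batching at N=4096
assert supports_full_batching(257, 128)          # 257 = 2*128 + 1: fine for N=128
```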

Why t=65537 Is Ideal for N=4096

The value 65537 = 2^16 + 1 is the fifth Fermat prime (F4). For N=4096, the CRT condition requires t ≡ 1 (mod 8192). We can verify: 65537 = 8 × 8192 + 1, so 65537 ≡ 1 (mod 8192). The condition is satisfied, and we get the full 4,096 SIMD slots.

Beyond satisfying the CRT condition, t=65537 has several desirable properties:

Fermat Prime

Being a Fermat prime means modular reduction mod t can be done with shifts and adds (2^16 + 1 has special structure), though this is secondary to NTT optimizations in practice.

16-bit Data Range

Each slot holds values 0 to 65536, which is sufficient for quantized biometric features, integer scores, and most discrete data types.

Primitive Root Exists

For NTT-based encoding, we need a primitive 2N-th root of unity modulo t. For t=65537 and N=4096, this root is ψ=6561, enabling efficient polynomial evaluation in the plaintext domain.

Moderate Noise Growth

Noise in BFV scales with t (specifically, the noise after multiplication is proportional to t). A 17-bit t keeps this manageable for depth-1 circuits without wasting the noise budget.
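The claimed root ψ=6561 is easy to verify directly: it must have multiplicative order exactly 2N = 8192 modulo 65537, which also forces ψ^N ≡ −1. A brute-force check (helper name ours):

```python
def multiplicative_order(a: int, m: int) -> int:
    """Smallest k > 0 with a^k ≡ 1 (mod m); brute force is fine for m = 65537."""
    x, k = a % m, 1
    while x != 1:
        x = x * a % m
        k += 1
    return k

t, n, psi = 65537, 4096, 6561                    # psi = 3^8
assert multiplicative_order(psi, t) == 2 * n     # primitive 8192th root of unity
assert pow(psi, n, t) == t - 1                   # psi^N ≡ -1, matching X^N + 1
```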

The Noise-Depth Trade-off

Larger plaintext moduli cause faster noise growth. In BFV, after a ciphertext-plaintext multiplication, the noise grows by a factor proportional to the plaintext coefficient norms, which are bounded by t. After a ciphertext-ciphertext multiplication, the noise grows by roughly t · (current_noise)². Every extra bit of t is therefore paid for directly out of the noise budget.

Common Mistake: Choosing t Too Large

Developers accustomed to 32-bit or 64-bit integers often pick a large plaintext modulus "just in case." This is backwards thinking. Every extra bit in t cuts into your noise budget, potentially forcing you to increase N or Q. Quantize your data down to the minimum necessary precision and pick the smallest t that fits.

CKKS Scale (Δ) Selection

Unlike BFV/BGV, the CKKS scheme works with approximate arithmetic on real (or complex) numbers. Instead of a plaintext modulus, CKKS uses a scale factor Δ to encode floating-point values into integer polynomials. A real number x is encoded as round(x · Δ), and after decryption and division by the appropriate scale, you recover x with precision proportional to log2(Δ).

How Scale Interacts with Computation

After encoding at scale Δ, two ciphertexts multiplied together produce a result at scale Δ². This double-scale result must be rescaled back to Δ by dividing the ciphertext by Δ and dropping one RNS prime. This is why CKKS requires a chain of primes: each multiplication consumes one prime from the chain.
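In log2 terms this is simple bookkeeping on the scale exponent. A sketch (function name ours) of how the exponent moves through multiply-and-rescale:

```python
def multiply_and_rescale(scale_bits_a: int, scale_bits_b: int,
                         rescale_prime_bits: int) -> int:
    """CKKS: multiplication adds scale exponents (Δ·Δ = Δ²); rescaling
    divides by one chain prime q_i, subtracting its bit-length."""
    return (scale_bits_a + scale_bits_b) - rescale_prime_bits

# With a 40-bit scale and 40-bit rescale primes, the scale stays stable:
scale = 40
for _ in range(3):            # three multiplications, three rescales
    scale = multiply_and_rescale(scale, 40, 40)
assert scale == 40
```

This stability is exactly why the middle chain primes are chosen to match the scale: a mismatch drifts the scale up or down with every level.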

The bit budget for a CKKS computation works as follows:

Budget: CKKS bit budget allocation
// Total bit budget = log2(Q) ≤ security bound for N
//
// Allocation:
//   Special prime:              ~60 bits  (key switching)
//   Each computation level:     ~40 bits  (rescale prime = scale)
//   Final level:                ~60 bits  (initial encoding scale)
//
// Example: N=16384, 128-bit security, max Q = 438 bits
//   Chain: [60, 40, 40, 40, 40, 40, 40, 40, 40, 60] = 440 bits
//   Available: 8 multiplications (8 x 40-bit rescale primes)
//   But 440 > 438, so drop one level:
//   Chain: [60, 40, 40, 40, 40, 40, 40, 40, 60] = 400 bits ✓
//   Available: 7 multiplications

Precision Considerations

The scale Δ determines your encoding precision: a 40-bit scale gives you roughly 12 decimal digits of precision (log10(2^40) ≈ 12). However, each multiplication introduces a small amount of approximation error, and this error accumulates through the circuit. For deep circuits (5+ multiplications), precision degradation can become the limiting factor, even when noise budget remains.

The practical guideline for scale selection: match Δ to your rescale primes (typically 40 bits), and reserve larger scales for circuits where accumulated approximation error threatens your required precision.

For biometric matching (which H33 performs with BFV, not CKKS), CKKS could theoretically work by encoding floating-point feature vectors directly. However, the rescaling overhead and precision management complexity make BFV with quantized integers faster and simpler for depth-1 inner products. Scheme selection matters—see our BFV vs CKKS comparison for the full analysis.

Noise Budget Analysis

Every FHE ciphertext carries noise that grows with each homomorphic operation. When the noise exceeds the ciphertext's capacity (determined by Q and t), decryption fails. Noise budget is the remaining margin between current noise and the decryption failure threshold, measured in bits. Ensuring adequate noise budget is the central engineering challenge in FHE parameter selection.

Noise Growth per Operation

| Operation | Noise Growth | Budget Cost (approx.) |
|---|---|---|
| Fresh encryption | Initial noise σ | ~log2(σ·N) bits consumed |
| Addition (ct + ct) | Additive (small) | ~1 bit |
| Plaintext multiply (ct × pt) | Multiplicative by ‖pt‖ | ~log2(‖pt‖) bits |
| Ciphertext multiply (ct × ct) | Quadratic growth | ~log2(t·N) + current noise bits |
| Rotation (Galois) | Similar to key switch | ~log2(N·decomp_base) bits |
| Mod switching | Reduces noise | Recovers ~log2(qi) bits |
| Relinearization | Small additive | ~5–15 bits (decomp. dependent) |

The critical distinction is between additive noise growth (additions, which barely increase noise) and multiplicative noise growth (multiplications, which roughly square the noise). This is why multiplicative depth is the primary driver of parameter requirements, not the total number of operations.

Computing Your Noise Budget

The initial noise budget for a fresh ciphertext in BFV is approximately:

Budget_initial ≈ log2(Q/t) − log2(σ·N)

For H33's parameters (N=4096, Q=56 bits, t=65537 ≈ 17 bits, σ=3.2):

H33 Noise Budget Calculation

| log2(Q) | 56 bits |
|---|---|
| log2(t) | 17 bits |
| log2(σ · N) = log2(3.2 × 4096) | ~14 bits |
| Initial budget ≈ 56 − 17 − 14 | ~25 bits |
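The same arithmetic as a sketch (function name ours; log2(t) is rounded up to 17, as in the calculation above):

```python
import math

def bfv_initial_budget(log2_q: float, log2_t: float, sigma: float, n: int) -> float:
    """Budget_initial ≈ log2(Q/t) − log2(σ·N), per the formula above."""
    return (log2_q - log2_t) - math.log2(sigma * n)

# H33's parameters: N=4096, 56-bit Q, t=65537 (~17 bits), sigma=3.2
budget = bfv_initial_budget(56, 17, 3.2, 4096)
assert round(budget) == 25
```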

Worked Example: Biometric Inner Product

H33's biometric matching computes an encrypted inner product: d = Σ(a[i] · b[i]) for 128-dimensional vectors, where a is encrypted and b is a plaintext enrolled template. Here is the noise budget trace:

Step 1: Fresh Encryption

Encrypt the query vector a into a ciphertext. Budget remaining: ~25 bits.

Step 2: Plaintext Multiply (128 slot-wise multiplications)

Multiply encrypted a by plaintext template b. Each slot independently computes a[i]·b[i]. Noise grows by ~log2(max(b[i])). With quantized templates where values ≤ 256, this costs ~8 bits. Budget remaining: ~17 bits.

Step 3: Accumulate (slot rotations + additions)

Sum across 128 dimensions using rotate-and-add. Each rotation adds minor key-switching noise (~1–2 bits). We need log2(128)=7 rotations. Total rotation cost: ~10 bits. Budget remaining: ~7 bits.

Step 4: Decrypt

7 bits of remaining budget is above the minimum threshold (the noise must stay below Q/(2t) for correct decryption). Decryption succeeds. For H33, we actually use a fused NTT-domain inner product that avoids most rotations, keeping an even larger margin.
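The four steps above reduce to simple bookkeeping. A sketch using the per-step costs quoted in the trace:

```python
import math

budget = 25                          # Step 1: fresh encryption, ~25-bit budget
budget -= 8                          # Step 2: pt multiply, template values <= 256
assert budget == 17
rotations = int(math.log2(128))      # Step 3: rotate-and-add over 128 dims
assert rotations == 7
budget -= 10                         # ~1-2 bits per rotation, ~10 bits total
assert budget == 7                   # Step 4: above threshold, decryption succeeds
```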

H33 Optimization: NTT-Domain Fused Inner Product

Rather than computing 128 individual multiply-and-accumulate steps with rotations, H33 encodes 32 users' templates in NTT form at enrollment time and performs a single fused inner product followed by one INTT. This eliminates all per-dimension rotations, preserving far more noise budget and reducing latency from what would be ~4ms down to ~1,375μs for the entire 32-user batch. See our performance optimization guide for the full technique.

Safety Margin

Never design for exactly zero remaining budget. In practice you need a safety margin: noise growth is probabilistic, so some encryptions start noisier than others; libraries differ in their exact noise constants; and worst-case inputs consume more budget than the average case.

A good rule of thumb is to target at least 5–10 bits of remaining budget after your complete circuit, and to validate empirically by running your full computation on thousands of random inputs and checking that decryption never fails.

Practical Decision Flowchart

Given the interdependencies between parameters, a systematic approach is essential. Here is the decision process we recommend, and the one we followed for H33:

Fix Your Security Level

Start with 128-bit security unless regulatory or contractual requirements demand 192 or 256. This immediately gives you the N-vs-Q constraint table to work within.

Map Your Computation Circuit

Express your target computation as a sequence of homomorphic operations. Count the multiplicative depth (longest chain of multiplications from input to output). For H33: inner product = 1 plaintext multiplication + additions = depth 1.

Choose Your Scheme

Integer data with exact results → BFV or BGV. Floating-point data with approximate results → CKKS. If you can quantize floats to integers without losing required precision, prefer BFV—it is simpler and avoids CKKS rescaling overhead.

Determine Minimum Q

Based on multiplicative depth, compute the minimum log2(Q) needed to support your computation with adequate noise budget. For single-depth BFV: Q needs ~40–60 bits. For multi-depth CKKS: sum up prime sizes as shown in the CKKS section.

Select N from the Constraint Table

Find the smallest power-of-two N where the max Q at your security level exceeds your required Q. Also verify N provides enough SIMD slots for your batching needs. If slots are insufficient, increase N.

Choose t (BFV/BGV) or Δ (CKKS)

For BFV: pick the smallest t that fits your data range and satisfies t ≡ 1 (mod 2N). For CKKS: choose Δ based on required precision (40 bits for ~12 decimal digits).

Validate Noise Budget

Compute the initial noise budget and trace it through your circuit. Ensure ≥5 bits remaining at the end. If not, increase Q (which may force increasing N to maintain security).

Benchmark and Test

Run your full pipeline on representative data. Verify correctness on 10,000+ random inputs. Measure throughput. If too slow, look for algorithmic optimizations before increasing parameters.
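For the BFV case, steps 4 through 7 collapse into a short procedure. A sketch under stated assumptions (names are ours; candidate t values are supplied by the caller, since data range drives that choice; 128-bit security only):

```python
import math

# Max log2(Q) at 128-bit security, from the standard parameter table.
MAX_LOG2_Q_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

def select_bfv_parameters(min_log2_q: int, min_slots: int,
                          t_candidates, sigma: float = 3.2):
    """Smallest N that admits the required Q and slot count, smallest
    candidate t with t ≡ 1 (mod 2N), then the initial noise budget."""
    for n in sorted(MAX_LOG2_Q_128):
        if MAX_LOG2_Q_128[n] >= min_log2_q and n >= min_slots:
            break
    else:
        raise ValueError("no standard ring satisfies the constraints")
    t = next(t for t in t_candidates if t % (2 * n) == 1)
    budget = min_log2_q - math.log2(t) - math.log2(sigma * n)
    return n, t, budget

# H33's requirements reproduce the article's configuration:
n, t, budget = select_bfv_parameters(56, 4096, t_candidates=[257, 65537])
assert (n, t) == (4096, 65537)
assert budget > 20      # comfortably above the ~5-bit floor
```

A real selection loop would then proceed to step 8: benchmark, and only revisit parameters if throughput falls short.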

Common Parameter Sets

Below are reference parameter configurations for common FHE use cases, validated against the HomomorphicEncryption.org standard at 128-bit security:

| Use Case | Scheme | N | log2(Q) | t or Δ | Depth | Notes |
|---|---|---|---|---|---|---|
| Auth / matching (H33) | BFV | 4096 | 56 | t=65537 | 1 | 32 users/CT, ~50μs/auth |
| Simple comparison | BFV | 4096 | 109 | t=65537 | 2–3 | Equality, range checks |
| Logistic regression | CKKS | 8192 | 218 | Δ=2^40 | 4–5 | Polynomial activation approx. |
| Neural net inference | CKKS | 16384 | 438 | Δ=2^40 | 8–12 | CNN or small transformer |
| Private set intersection | BFV | 4096 | 109 | t=65537 | 1–2 | Hash-based PSI protocol |
| Encrypted search (PIR) | BFV | 4096 | 109 | t=65537 | 1–2 | Single-server PIR, large DB |
| Bootstrapping (CKKS) | CKKS | 65536 | ~1750 | Δ=2^45 | unlimited | ~10s per bootstrap |
| Encrypted DB aggregation | BFV | 8192 | 218 | t=65537 | 3–5 | SUM, COUNT, filtered queries |

Notice the pattern: the vast majority of practical applications cluster around N=4096 or N=8192. The exotic large-N regimes (N=32768, N=65536) are needed only for bootstrapping or extremely deep circuits. If you find yourself reaching for N=16384+, first ask whether you can restructure your computation to be shallower.

Error Distribution and Sampling

The error (noise) distribution is the mathematical foundation of FHE security. During encryption, noise sampled from this distribution is added to the ciphertext, and it is this noise that makes the RLWE problem hard. The standard choice is a discrete Gaussian with standard deviation σ = 3.2, as specified by the HomomorphicEncryption.org consortium.

In practice, most implementations use a centered binomial distribution (CBD) as a computationally efficient approximation to the discrete Gaussian. CBD sampling is constant-time (no rejection sampling), cache-friendly, and produces nearly identical security estimates. H33 uses batch CBD sampling—generating 10 coefficients per RNG call—which is 5x faster than individual coefficient sampling while maintaining the correct noise distribution.

You should almost never change σ from the standard value. Reducing σ weakens security (the lattice problem becomes easier). Increasing σ wastes noise budget (more of Q is consumed by initial noise, leaving less for computation). The standard value of 3.2 is the community consensus for optimal balance.
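A sketch of CBD sampling (parameter choice is ours, picked so the variance matches σ = 3.2: a CBD built from k coin-pairs has variance k/2, and k = 21 gives σ = √10.5 ≈ 3.24):

```python
import random

def cbd_sample(k=21, rng=random):
    """Centered binomial: (sum of k fair bits) − (sum of k fair bits).
    Variance = k/2; real implementations do this branch-free in constant time."""
    return sum(rng.getrandbits(1) for _ in range(k)) - \
           sum(rng.getrandbits(1) for _ in range(k))

rng = random.Random(0)
samples = [cbd_sample(21, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum(e * e for e in samples) / len(samples) - mean ** 2
assert abs(mean) < 0.15        # centered at zero
assert 9.0 < variance < 12.0   # ≈ k/2 = 10.5, i.e. sigma ≈ 3.24
```

Note this uses Python's PRNG for illustration only; a production sampler draws its bits from a cryptographic RNG.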

Validation Tools

Never trust hand-computed security estimates. Always validate your parameter choices with established tools:

Lattice Estimator

The gold standard for FHE security analysis. A SageMath library by Albrecht et al. that evaluates all known attacks (primal, dual, hybrid, BKW) against your exact parameters. Run LWE.estimate() with your N, Q, and error distribution to get a precise security level.

Microsoft SEAL

SEAL's parameter validation automatically checks your chosen parameters against the HomomorphicEncryption.org standard and warns if they are insecure. The CoeffModulus.MaxBitCount(N) function returns the maximum Q for each N at each security level.

OpenFHE Parameter Generator

OpenFHE (formerly PALISADE) includes CryptoContext::EvalMultKeysGen() workflows that automatically configure RNS chains for your target depth. Useful for rapid prototyping when you are not yet optimizing parameters manually.

HElib Parameter Selection

HElib provides the FindM() function that selects a good cyclotomic polynomial for BGV given target security, number of slots, and computation depth. It handles the CRT condition and NTT compatibility automatically.

SageMath: Lattice Estimator validation
# Clone github.com/malb/lattice-estimator and run inside SageMath
from estimator import *

# H33 production parameters
params = LWE.Parameters(
    n=4096,
    q=2**56,
    Xs=ND.UniformMod(3),  # ternary secret
    Xe=ND.DiscreteGaussian(3.2)
)

# Evaluate all known attacks
result = LWE.estimate(params)
# Expected output: all attacks require ≥ 2^128 operations
# usvp: rop = 2^141.2, bkw: rop = 2^178.3, ...

Anti-Patterns and Common Mistakes

Having worked with dozens of FHE deployments, we have catalogued the most common parameter selection mistakes. Each of these has cost real engineering teams weeks of debugging and performance tuning.

Anti-Pattern 1: Over-Provisioning N

Using N=16384 when N=4096 suffices. This is by far the most common mistake. Developers choose large N "to be safe" without checking whether their Q fits within a smaller ring. Impact: ~16x slowdown for zero security benefit. Every NTT, every polynomial operation, every memory allocation scales super-linearly with N.

Anti-Pattern 2: Under-Provisioning Noise Budget

Choosing the minimum Q that barely provides enough noise budget for the average case. FHE noise is probabilistic—some encryptions produce more noise than others. Without a safety margin, you get intermittent decryption failures: correct results 99% of the time, garbage 1% of the time. Fix: Always maintain ≥5 bits of remaining budget and test on thousands of random inputs.

Anti-Pattern 3: Ignoring SIMD Batching

Processing one data element per ciphertext when the ring structure supports thousands of parallel slots. If N=4096 and you are only using 1 slot, you are wasting 4,095 free parallel computations. Impact: Up to 4096x throughput loss. H33 packs 32 complete 128-dimensional users into each ciphertext, achieving 32x the throughput of single-user processing.

Anti-Pattern 4: Using the Wrong Scheme

Choosing CKKS for integer data or BFV for floating-point data. CKKS is approximate—every operation loses a small amount of precision. If your computation requires exact integer results (e.g., equality checks, threshold comparisons), CKKS will eventually produce wrong answers. Conversely, using BFV for floating-point data requires quantization, which is fine for shallow circuits but adds complexity for deep ones. Rule: Exact integers → BFV/BGV. Approximate reals → CKKS.

Anti-Pattern 5: Multi-Prime Q for Depth-1 Circuits

Using an RNS chain of 3–4 primes when a single prime suffices. Each additional RNS limb adds overhead for basis conversion, CRT reconstruction, and inter-limb synchronization. For depth-1 circuits, this overhead is pure waste. Impact: 30–50% slower than single-prime implementation. H33's single 56-bit modulus eliminates all RNS machinery.

Anti-Pattern 6: Skipping Lattice Estimator Validation

Relying on rule-of-thumb N/log(Q) ratios instead of running the actual Lattice Estimator. The security landscape evolves—new attacks periodically improve, and estimates from 5 years ago may be optimistic. Fix: Always validate with the latest Lattice Estimator release before deploying to production.

H33 Production Case Study

Let us walk through H33's complete parameter selection process from first principles, showing how the methodology described above led to our production configuration.

The Application: Biometric Authentication

H33 performs privacy-preserving biometric authentication using FHE. Users submit encrypted biometric feature vectors; we compute encrypted distance metrics against enrolled templates without ever seeing the plaintext biometrics. The core computation is an inner product over 128-dimensional integer vectors.

Step 1: Security Level

We target 128-bit security. Our threat model assumes a well-resourced attacker with access to the ciphertexts (which are transmitted over the network) but not the secret key (which never leaves the client). 128 bits provides a comfortable margin—the best known lattice attacks would require approximately 2^128 operations, which is infeasible for any attacker for the foreseeable future. As a bonus, RLWE at 128-bit classical security also provides post-quantum security at approximately the same level.

Step 2: Circuit Analysis

Rust: biometric_auth/encrypted_biometric.rs (simplified)
// Biometric inner product circuit
// Input: encrypted query vector q[0..127], plaintext template b[0..127]
// Output: encrypted distance = sum(q[i] * b[i])

// Operation 1: Element-wise multiply (ct × pt)
//   Multiplicative depth: 1
//   Noise growth: proportional to max(b[i])

// Operation 2: Accumulate across dimensions (additions)
//   Multiplicative depth: still 1 (additions don't increase depth)
//   Noise growth: minimal (additive)

// Total multiplicative depth: 1
// No ciphertext × ciphertext multiplication needed
// No relinearization needed
// No modulus switching needed

Conclusion: Depth 1, plaintext-multiply only. This is the shallowest non-trivial FHE circuit possible.

Step 3: Scheme Selection

Our biometric features are quantized to 16-bit integers. Exact arithmetic is required (the distance threshold comparison must be deterministic). BFV is the clear choice over CKKS: no rescaling needed, no precision management, exact integer results.

Step 4: Minimum Q

For a depth-1 BFV circuit with t=65537, we need enough noise budget to absorb initial encryption noise plus one plaintext multiplication plus accumulation. Back-of-envelope: ~56 bits of Q gives ~25 bits of initial budget, which comfortably handles our circuit. A 56-bit prime also fits perfectly in a 64-bit machine word, enabling single-word Montgomery multiplication in the NTT hot path.
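The back-of-envelope above can be written down explicitly. In the sketch below, the 14-bit overhead term is an assumed constant lumping together fresh-encryption noise and scheme factors, not a formal bound; real budgets should come from the library's noise estimator:

```rust
// Rough BFV noise-budget model (assumed constants, not a formal bound):
// initial budget ≈ log2(Q) - log2(t) - overhead.
fn initial_budget_bits(log_q: f64, log_t: f64, overhead_bits: f64) -> f64 {
    log_q - log_t - overhead_bits
}

fn main() {
    let log_q = 56.0;            // single 56-bit prime
    let log_t = 65537f64.log2(); // ≈ 17 bits
    let overhead = 14.0;         // assumed encryption-noise overhead
    let budget = initial_budget_bits(log_q, log_t, overhead);
    println!("initial noise budget ≈ {budget:.0} bits"); // ≈ 25
    assert!(budget >= 20.0); // comfortably covers a depth-1 circuit
}
```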

Step 5: Select N

Consulting the constraint table: at 128-bit security, N=4096 allows up to 109 bits of Q. Our 56-bit Q is well within budget. N=4096 gives 4,096 SIMD slots, which divided by 128 dimensions = 32 users per ciphertext—exactly the batch size that maximizes our throughput pipeline.

Step 6: Choose t

We need t ≡ 1 (mod 8192) for full batching with N=4096. We also need t large enough for the plaintext values: features are quantized to [0, 255], and inner-product results can reach 128 × 255² ≈ 8.3M, which fits in 24 bits. t=65537 satisfies the batching condition (65537 mod 8192 = 1) and exceeds the per-feature range (65537 > 255), with full inner-product results handled modulo t via the CRT structure of batched computation. Verified: a primitive 8192th root of unity modulo 65537 is ψ=6561.
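Both conditions are cheap to check directly; a minimal sketch:

```rust
// Check the batching conditions for t = 65537, N = 4096 (so 2N = 8192):
// t ≡ 1 (mod 2N), and ψ = 6561 has multiplicative order exactly 2N mod t.
// Since 2N is a power of two, checking ψ^N ≠ 1 suffices for "exactly".
fn pow_mod(base: u64, mut exp: u64, m: u64) -> u64 {
    let (mut acc, mut b) = (1u64, base % m);
    while exp > 0 {
        if exp & 1 == 1 { acc = acc * b % m; } // fits u64: operands < 2^17
        b = b * b % m;
        exp >>= 1;
    }
    acc
}

fn main() {
    let (t, two_n, psi) = (65537u64, 8192u64, 6561u64);
    assert_eq!(t % two_n, 1);                  // t ≡ 1 (mod 2N)
    assert_eq!(pow_mod(psi, two_n, t), 1);     // ψ^(2N) ≡ 1
    assert_ne!(pow_mod(psi, two_n / 2, t), 1); // ψ^N ≢ 1 ⇒ order is exactly 2N
    println!("t = {t}: full batching OK with psi = {psi}");
}
```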

Step 7: Validate

The Lattice Estimator confirms ≥128-bit security. Noise-budget analysis confirms ≥7 bits remaining after the full circuit. Empirical testing on 1 million random biometric vectors: zero decryption failures.

Production Results

H33 Production Benchmarks (Graviton4, c8g.metal-48xl)

FHE batch (32 users): 1,375 μs
Per-auth latency: ~50 μs
Sustained throughput (96 workers): 1.2M auth/sec
FHE-only ceiling: 1.29M auth/sec
Memory per user (SIMD-batched): ~256 KB

The performance numbers above are a direct consequence of parameter selection. The single 56-bit modulus enables Montgomery NTT with Harvey lazy reduction and no RNS overhead. The N=4096 ring fits entirely in L1 cache on Graviton4 (4096 × 8 bytes = 32KB per polynomial), enabling cache-resident NTT computation. The 32-user batching fills every SIMD slot, amortizing the fixed costs of encryption and NTT across the maximum number of useful computations.

Why Single Modulus Beats Chain for Depth-1

We benchmarked H33's pipeline with both single-prime and multi-prime (RNS chain) Q configurations. The multi-prime version used two 30-bit primes (total 60 bits, similar budget). Results: the single-prime version was 38% faster. The multi-prime overhead came from: (1) running two parallel NTTs per polynomial instead of one, (2) CRT reconstruction during decrypt, and (3) additional memory traffic for the second limb. For depth-1, every one of these costs is pure waste.
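To make the decrypt-side cost concrete, here is the Garner-style CRT recombination a two-limb chain must perform per coefficient. The primes are well-known NTT-friendly stand-ins (both ≡ 1 mod 8192), not H33's actual moduli:

```rust
// Two NTT-friendly ~30-bit primes (illustrative stand-ins, both ≡ 1 mod 8192).
const P1: u64 = 998_244_353;   // 119·2^23 + 1
const P2: u64 = 1_004_535_809; // 479·2^21 + 1

fn pow_mod(base: u64, mut exp: u64, m: u64) -> u64 {
    let (mut acc, mut b) = (1u128, (base % m) as u128);
    let m = m as u128;
    while exp > 0 {
        if exp & 1 == 1 { acc = acc * b % m; }
        b = b * b % m;
        exp >>= 1;
    }
    acc as u64
}

// Garner recombination: the extra per-coefficient work the RNS chain pays
// at decrypt, on top of running two NTTs per polynomial instead of one.
fn crt_recombine(r1: u64, r2: u64) -> u64 {
    let p1_inv = pow_mod(P1 % P2, P2 - 2, P2); // P1^{-1} mod P2 (Fermat)
    let diff = (r2 + P2 - r1) % P2;            // r2 - r1 (mod P2); r1 < P1 < P2
    let t = diff as u128 * p1_inv as u128 % P2 as u128;
    (r1 as u128 + P1 as u128 * t) as u64       // < P1·P2 < 2^60, fits u64
}

fn main() {
    let x: u64 = 777_000_000_000_000_000; // a ~60-bit value below P1·P2
    assert_eq!(crt_recombine(x % P1, x % P2), x);
    println!("CRT round-trip ok");
}
```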

Advanced: Maximizing Batching Efficiency

SIMD batching is the most underutilized performance lever in FHE. When configured correctly, it provides a throughput multiplier equal to the number of independent data elements packed per ciphertext—essentially free parallelism.

Slot Packing Strategies

Given N SIMD slots and a data element that occupies d slots, you can pack N/d independent elements per ciphertext. The goal is to choose d and N so that N/d aligns with your application's natural batch size.

The key principle is to never leave slots empty. Every empty slot is a wasted computation: the NTT, encryption, and homomorphic operations process all N coefficients regardless of how many slots contain meaningful data. If your batch size does not divide N evenly, pad with dummy data rather than reducing the batch.
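In code, the packing rule is two lines of integer arithmetic (the function names are illustrative):

```rust
// Slot-packing arithmetic: floor(N/d) independent elements per ciphertext,
// with batches padded up to a full ciphertext so no slot goes empty.
fn elements_per_ciphertext(n_slots: usize, dims_per_element: usize) -> usize {
    n_slots / dims_per_element
}

fn padded_batch_size(batch: usize, per_ct: usize) -> usize {
    // Round up to a multiple of per_ct (dummy-pad the tail ciphertext).
    (batch + per_ct - 1) / per_ct * per_ct
}

fn main() {
    let per_ct = elements_per_ciphertext(4096, 128);
    assert_eq!(per_ct, 32);                        // H33: 32 users per ciphertext
    assert_eq!(padded_batch_size(50, per_ct), 64); // 50 users → 2 full ciphertexts
}
```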

Template Storage Reduction

SIMD batching also dramatically reduces storage requirements. For H33's biometric system, a single user's 128-dimensional template at 64-bit precision occupies 128 × 8 = 1,024 bytes in plaintext, but the encrypted ciphertext (two degree-4096 polynomials at 8 bytes per coefficient) is approximately 64KB. Without batching, that is 64KB per user. With 32-user batching, the same 64KB ciphertext covers 32 users—effectively 2KB per user for the ciphertext component, a 32× reduction from naive per-user encryption. Combined with NTT-form template storage, H33 achieves ~256KB per user total storage.

Parameter Migration and Future-Proofing

FHE parameters are baked into every ciphertext at encryption time. You cannot retroactively change the parameters of encrypted data. This creates a versioning challenge: what happens when you need to upgrade parameters due to improved attacks, new computational requirements, or updated standards?

Migration Strategies

The best strategy is to avoid migration entirely by choosing robust initial parameters. This is another argument for leaving security margin: the ~10% performance cost of a slightly-larger-than-minimum Q is far less than the engineering cost of a parameter migration across millions of encrypted records.
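When migration cannot be ruled out entirely, tagging every ciphertext with a parameter-set version at encryption time keeps the option open. The sketch below is hypothetical—the names and layout are not H33's wire format:

```rust
// Hypothetical versioned-ciphertext envelope (not H33's actual wire format).
// A version tag written at encryption time lets a later re-encryption job
// find old-parameter records and route them through a client-side
// decrypt-and-re-encrypt under the new parameter set.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ParamSet {
    version: u16,
    n: usize,   // ring degree
    log_q: u32, // total coefficient-modulus bits
    t: u64,     // plaintext modulus
}

const PARAMS_V1: ParamSet = ParamSet { version: 1, n: 4096, log_q: 56, t: 65537 };

struct VersionedCiphertext {
    param_version: u16,
    body: Vec<u8>, // serialized ciphertext under that parameter set
}

fn needs_migration(ct: &VersionedCiphertext, current: &ParamSet) -> bool {
    ct.param_version < current.version
}

fn main() {
    let ct = VersionedCiphertext { param_version: 1, body: Vec::new() };
    assert!(!needs_migration(&ct, &PARAMS_V1));
    let v2 = ParamSet { version: 2, ..PARAMS_V1 };
    assert!(needs_migration(&ct, &v2)); // v1 record must be re-encrypted
}
```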

Scheme-Specific Optimization Tips

BFV Specific

BGV Specific

CKKS Specific

Conclusion

FHE parameter selection is the foundation upon which everything else builds. The wrong parameters produce a system that is either insecure or impractically slow, and there is no way to fix parameters after data has been encrypted. The good news is that the parameter space, while complex, is well-understood. The HomomorphicEncryption.org standard tables, the Lattice Estimator, and the systematic decision process outlined in this guide give you a reliable path from application requirements to production-ready parameters.

The key principles are worth repeating:

H33's production pipeline—1.2 million authentications per second, ~50μs per auth, 128-bit post-quantum security—is not the result of exotic hardware or algorithmic breakthroughs. It is the result of tight parameter selection: N=4096, a single 56-bit Q, t=65537, and 32-user SIMD batching. Every parameter was chosen to be the smallest value that satisfies all constraints, with no margin wasted on capabilities we do not need. That discipline is what turns FHE from an academic curiosity into a production-grade technology.

For further reading, explore our guides on FHE performance optimization, the NTT in practice, and lattice cryptography foundations.

Ready to Go Quantum-Secure?

Start protecting your users with post-quantum authentication today. 1,000 free auths, no credit card required.

Get Free API Key →
