In Fully Homomorphic Encryption, the algorithm you choose matters far less than the parameters you feed into it. BFV, BGV, and CKKS are all capable schemes—but set the wrong polynomial degree or coefficient modulus and you get either a system that is trivially breakable, or one so slow it cannot serve a single request before the TCP connection times out. Parameter selection is the single most consequential decision in any FHE deployment, and getting it wrong is irreversible: you cannot patch parameters after data has been encrypted.
This guide walks through every parameter in the FHE design space, explains the mathematical constraints that bind them together, and provides concrete decision frameworks for production systems. We will use H33's own biometric authentication pipeline—serving 1.2 million authentications per second on Graviton4 hardware—as a running case study, showing exactly why we chose N=4096, a single 56-bit coefficient modulus, and t=65537 as our plaintext modulus.
Whether you are building privacy-preserving machine learning, encrypted analytics, or biometric matching, the methodology here applies. The numbers change; the reasoning process does not.
This guide is written for engineers and cryptographers implementing FHE in production systems. We assume familiarity with lattice cryptography basics and with the distinction between the BFV/BGV integer schemes and CKKS approximate arithmetic. If you need a primer on these topics, start with our FHE introduction and BFV vs CKKS comparison first.
The Parameter Space
Every Ring-LWE-based FHE scheme requires you to fix a set of interdependent parameters before a single bit of plaintext is encrypted. Change one, and the valid ranges for all the others shift. The full parameter space includes six core dimensions:
Polynomial Degree (N)
The dimension of the polynomial ring R = Z[X]/(X^N + 1). Must be a power of 2. Determines slot count, security ceiling, and NTT performance. Typical range: 1024 to 65536.
Coefficient Modulus (Q)
The modulus for ciphertext coefficients. Can be a single prime or an RNS chain of co-prime factors. Determines multiplicative depth and, together with N, the security level.
Plaintext Modulus (t)
BFV/BGV only. Defines the plaintext space Z_t. Must satisfy the CRT condition for SIMD batching. Larger t means more noise per operation and fewer available multiplications.
Scale Δ (CKKS)
CKKS only. The encoding precision factor. Determines how many bits of real-number precision survive each computation level. Typically 40–60 bits per level.
Error Distribution (σ)
The standard deviation of the discrete Gaussian (or centered binomial) noise added during encryption. The HomomorphicEncryption.org standard fixes σ = 3.2 for all security levels.
Security Level (λ)
Target security in bits: 128, 192, or 256. This is not a free parameter—it constrains the ratio N/log(Q), meaning larger Q demands proportionally larger N.
The critical insight is that these parameters form a constraint system, not a menu. You do not independently "pick" each one. Instead, you start with your application requirements (data type, computation depth, throughput target, security level), and the mathematics tells you what parameter combinations are valid. Your job is to find the tightest valid configuration—the one that satisfies all constraints with the least overhead.
The Fundamental Trade-off
Security demands large N and small Q. Performance demands small N and small Q. Deep computations demand large Q. Every FHE parameter set is a negotiated compromise between these three forces. The art is in minimizing waste—choosing parameters that are just large enough to be secure and just deep enough for your computation, with no margin wasted on capabilities you will never use.
The Security-Performance Trade-off
FHE security rests on the hardness of the Ring Learning With Errors (RLWE) problem, which is itself reducible to worst-case lattice problems. The security of any parameter set is determined by the ratio of the polynomial degree N to the logarithm of the coefficient modulus Q. Specifically, an attacker's best strategy against RLWE involves solving a lattice problem in a lattice of dimension roughly proportional to N, where the problem's difficulty scales as N/log(Q).
How Security Levels Are Calculated
The community standard for estimating FHE security is the Lattice Estimator (Albrecht et al.), a SageMath library that evaluates all known lattice attacks—primal uSVP, dual, BKW, and hybrid attacks—against a given parameter set. For each attack, it estimates the number of classical (and quantum) bit operations required. The minimum across all attacks gives the security level.
The HomomorphicEncryption.org consortium publishes standardized parameter tables that have been validated against the Lattice Estimator. These tables provide the maximum allowed log2(Q) for each polynomial degree N at each security level. Exceeding the maximum Q for a given N drops you below the target security level—there is no graceful degradation; you are simply insecure.
Standard Parameter Bounds
The following table shows the maximum coefficient modulus bit-length for each polynomial degree at the three standard security levels, as specified by the HomomorphicEncryption.org standard (ternary secret, σ = 3.2):
| Poly Degree (N) | max log2(Q) for 128-bit | max log2(Q) for 192-bit | max log2(Q) for 256-bit |
|---|---|---|---|
| 1024 | 27 | 19 | 14 |
| 2048 | 54 | 37 | 29 |
| 4096 | 109 | 75 | 58 |
| 8192 | 218 | 152 | 118 |
| 16384 | 438 | 305 | 237 |
| 32768 | 881 | 611 | 476 |
Reading this table: with N=4096, you can use up to a 109-bit coefficient modulus and still maintain 128-bit security. H33 uses a single 56-bit Q—well within the 109-bit ceiling—providing a comfortable security margin. We could theoretically double our modulus and still be secure, but there is no reason to: our computation only requires depth 1.
A common misconception is "bigger N = more secure." This is only true if you hold Q constant. If you increase N to accommodate a larger Q (e.g., for deeper computation), the security level might stay the same or even decrease if you are not careful. Always check the N/log(Q) ratio against the standard tables after any parameter change.
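Checking the ratio is mechanical enough to automate. A minimal sketch, with the 128-bit column of the table above hardcoded, that flags any (N, log2 Q) pair exceeding its ceiling:

```python
# Max log2(Q) per polynomial degree N at 128-bit security
# (HomomorphicEncryption.org standard, ternary secret, sigma = 3.2).
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

def is_secure_128(n: int, log_q: int) -> bool:
    """True if (n, log_q) stays at or under the 128-bit ceiling."""
    return log_q <= MAX_LOGQ_128.get(n, 0)

# H33's production choice: N=4096 with a 56-bit Q, well under the 109-bit cap.
assert is_secure_128(4096, 56)
# A 240-bit modulus at N=8192 would blow past the 218-bit cap.
assert not is_secure_128(8192, 240)
```

Re-run a check like this after every parameter change, not just at initial design time.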
Post-Quantum Security
Unlike RSA and elliptic-curve schemes, lattice-based FHE is believed to resist quantum attacks. Grover's algorithm provides only a quadratic speedup against lattice problems (compared to the exponential speedup Shor's gives against factoring), so the standard security levels remain valid in a post-quantum world. A 128-bit classical security parameter set provides approximately 128-bit post-quantum security as well. This is one of FHE's most important properties: your encrypted data does not become retroactively vulnerable when quantum computers arrive.
Polynomial Degree (N) Selection
The polynomial degree N is the most impactful single parameter in any FHE scheme. It determines three critical properties simultaneously:
- SIMD slot count: BFV/BGV can batch up to N plaintext values into a single ciphertext (N/2 for CKKS). Each slot operates independently in parallel.
- Security ceiling: N sets the maximum Q you can use while maintaining your target security level.
- Computational cost: All polynomial operations scale as O(N log N) via the Number Theoretic Transform, so doubling N more than doubles every operation's cost.
N must be a power of 2. This is not an arbitrary convention—it is a structural requirement of the cyclotomic polynomial ring Z[X]/(X^N + 1), which enables NTT-based fast polynomial multiplication and provides the algebraic properties (ring splitting) needed for SIMD batching.
N Values and Their Use Cases
| N | Slots (BFV) | Max Q (128-bit) | Typical Use | Relative Speed |
|---|---|---|---|---|
| 1024 | 1,024 | 27 bits | Toy examples, testing | 1x (baseline) |
| 2048 | 2,048 | 54 bits | Simple comparisons, lightweight ops | ~2.5x slower |
| 4096 | 4,096 | 109 bits | Production auth, shallow circuits | ~6x slower |
| 8192 | 8,192 | 218 bits | ML inference, multi-level circuits | ~14x slower |
| 16384 | 16,384 | 438 bits | Deep computation, bootstrapping | ~32x slower |
| 32768 | 32,768 | 881 bits | CKKS bootstrapping, complex ML | ~72x slower |
The "Relative Speed" column is approximate and reflects the full-pipeline cost, not just NTT time. The actual slowdown is super-linear because larger N also requires more cache and memory bandwidth, increasing the constant factors beyond the O(N log N) theoretical scaling.
Why H33 Chose N=4096
H33's biometric authentication encodes 128-dimensional feature vectors into FHE ciphertexts. With N=4096, we get 4,096 SIMD slots. Dividing by 128 dimensions gives exactly 32 users per ciphertext—a natural batch size that aligns perfectly with our throughput requirements.
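The packing arithmetic can be written down directly. A minimal sketch—the contiguous slot layout shown here (user u's feature d at slot u·DIMS + d) is illustrative, not necessarily H33's actual encoder:

```python
N = 4096      # polynomial degree = SIMD slot count in BFV
DIMS = 128    # biometric feature vector dimension

users_per_ct = N // DIMS
assert users_per_ct == 32   # 32 users batched per ciphertext

# Hypothetical slot layout: user u's feature d sits at slot u*DIMS + d.
slots = [(u, d) for u in range(users_per_ct) for d in range(DIMS)]
assert len(slots) == N      # every slot carries useful data; none wasted
```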
H33 N=4096 Decision Rationale
Could we use N=2048? No. With only 2,048 slots and 128 dimensions, we would only fit 16 users per ciphertext—halving our batch throughput. Additionally, the 54-bit Q ceiling at 128-bit security would leave almost no headroom above our 56-bit modulus. We would either need to shrink Q (reducing noise budget) or accept weaker security.
Could we use N=8192? Technically yes, and we would get 64 users per ciphertext. But every polynomial operation would be roughly 2.5x slower, and we would be paying for 8,192 slots when 4,096 slots serve our purpose perfectly. In production, this translates to the difference between ~50μs per auth and ~125μs per auth—a 2.5x throughput penalty for zero security benefit.
Using N=16384 when N=4096 suffices is the single most common performance mistake in FHE deployments. It costs you a ~16x slowdown and provides no additional security if your Q fits within the smaller ring's budget. Always start with the smallest N that gives you enough slots and enough Q headroom.
Coefficient Modulus (Q) Selection
The coefficient modulus Q defines the range of ciphertext coefficients: each polynomial coefficient lives in Z_Q. In modern FHE libraries, Q is represented as a product of co-prime factors using the Residue Number System (RNS) representation, where Q = q_1 · q_2 · ... · q_L. Each factor q_i is called a "limb," and all polynomial arithmetic is performed independently modulo each limb, enabling native 64-bit operations without multi-precision arithmetic.
Single Modulus vs. Chain of Primes
The number of RNS limbs directly determines the multiplicative depth of your FHE computation:
- Depth 0 (additions only): A single prime suffices. No modulus switching needed.
- Depth 1 (one multiplication): A single prime can work if the noise budget is sufficient. This is H33's operating point.
- Depth d ≥ 2: You need a chain of at least d+1 primes (d data primes plus one special prime for key switching). After each multiplication, you "mod switch" by dividing out one prime, reducing noise at the cost of shrinking Q.
```
// Q = q_special * q_0 * q_1 * q_2
//
// Level 3: Q = q_s * q_0 * q_1 * q_2 (fresh ciphertext)
//   Multiply + mod_switch:
// Level 2: Q = q_s * q_0 * q_1 (drop q_2)
//   Multiply + mod_switch:
// Level 1: Q = q_s * q_0 (drop q_1)
//   Multiply + mod_switch:
// Level 0: Q = q_s (drop q_0, decrypt)
//
// Each mod_switch reduces noise by factor ~q_i
// Total log2(Q) must fit under security bound for chosen N
```
Choosing Prime Sizes
Individual RNS primes are typically chosen to be close to 60 bits (the maximum that fits in a 64-bit machine word with room for lazy reduction) or smaller primes of 30–50 bits for finer-grained noise management. The total log2(Q) is the sum of all prime bit-lengths, and this total must stay below the security bound from the parameter table.
For CKKS, the primes also serve as scale factors during rescaling, so their sizes directly affect encoding precision. A typical CKKS chain might use 60-bit primes for the first and last levels and 40-bit primes in between:
```
// Max log2(Q) at N=8192, 128-bit security: 218 bits
// Chain: [60, 40, 40, 40, 60] = 240 bits <-- TOO LARGE! Insecure.
// Chain: [60, 40, 40, 60]     = 200 bits <-- OK. 3 levels, secure.
// Chain: [50, 30, 30, 30, 50] = 190 bits <-- OK. 4 levels, secure.
//
// First prime:   special prime for key switching
// Last prime:    initial scale (encoding precision)
// Middle primes: consumed one per multiplication (rescale)
```
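The chain check is a one-line sum against the security bound. A hedged sketch mirroring the examples above (ceilings hardcoded from the standard table):

```python
MAX_LOGQ_128 = {4096: 109, 8192: 218, 16384: 438}  # 128-bit security ceilings

def chain_is_secure(n: int, chain: list[int]) -> bool:
    """Total bit-length of the RNS prime chain must fit under the bound for N."""
    return sum(chain) <= MAX_LOGQ_128[n]

assert not chain_is_secure(8192, [60, 40, 40, 40, 60])  # 240 bits: too large
assert chain_is_secure(8192, [60, 40, 40, 60])          # 200 bits: OK
assert chain_is_secure(8192, [50, 30, 30, 30, 50])      # 190 bits: OK
```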
H33's Choice: Single 56-bit Q
H33's biometric matching circuit has multiplicative depth 1: we compute inner products (multiply + accumulate), which requires exactly one ciphertext-plaintext multiplication. No ciphertext-ciphertext multiplication. No mod switching. A single 56-bit prime gives us everything we need:
- Full 64-bit native arithmetic: 56-bit coefficients fit comfortably in u64, with 8 bits of headroom for lazy reduction in NTT butterfly operations.
- No RNS overhead: A single limb means zero CRT reconstruction, zero RNS-basis conversion, zero inter-limb synchronization.
- Massive security margin: 56 bits is barely half the 109-bit ceiling for N=4096 at 128-bit security.
- Montgomery-friendly: A single prime enables a fixed Montgomery constant, which our NTT uses for division-free modular reduction throughout the hot path.
For depth-1 computations, a single modulus is strictly superior to a chain. You eliminate the entire RNS machinery—basis extension, modulus switching, CRT reconstruction—which can account for 30–50% of total computation time in multi-limb implementations. If your circuit is shallow, do not pay for depth you will not use.
Plaintext Modulus (t) for BFV/BGV
In BFV and BGV, the plaintext modulus t defines the algebraic structure of your plaintext space. Each SIMD slot holds an element of Z_t, meaning each slot can represent integers from 0 to t−1. But t is not just about data range—it has profound implications for batching capability, noise growth, and NTT compatibility.
The CRT Batching Condition
SIMD batching—the ability to pack N independent values into a single ciphertext—requires that the cyclotomic polynomial X^N + 1 splits completely into N linear factors modulo t. For power-of-two cyclotomic rings, this happens if and only if:
t ≡ 1 (mod 2N)
This is the Chinese Remainder Theorem (CRT) condition. If t does not satisfy it, you get fewer than N slots—potentially as few as 1, which makes batching impossible and destroys throughput.
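The condition itself is a one-line check (this sketch assumes t is prime, as in all the moduli discussed here):

```python
def supports_full_batching(t: int, n: int) -> bool:
    """CRT condition for power-of-two N: X^N + 1 splits into N linear
    factors mod a prime t iff t = 1 (mod 2N), giving all N SIMD slots."""
    return t % (2 * n) == 1

assert supports_full_batching(65537, 4096)      # 65537 = 1 (mod 8192)
assert not supports_full_batching(65536, 4096)  # not even prime; no batching
```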
Why t=65537 Is Ideal for N=4096
The value 65537 = 2^16 + 1 is the fifth Fermat prime (F_4). For N=4096, the CRT condition requires t ≡ 1 (mod 8192). We can verify: 65537 = 8 × 8192 + 1, so 65537 ≡ 1 (mod 8192). The condition is satisfied, and we get the full 4,096 SIMD slots.
Beyond satisfying the CRT condition, t=65537 has several desirable properties:
Fermat Prime
Being a Fermat prime means modular reduction mod t can be done with shifts and adds (2^16 + 1 has special structure), though this is secondary to NTT optimizations in practice.
16-bit Data Range
Each slot holds values 0 to 65536, which is sufficient for quantized biometric features, integer scores, and most discrete data types.
Primitive Root Exists
For NTT-based encoding, we need a primitive 2N-th root of unity modulo t. For t=65537 and N=4096, this root is ψ=6561, enabling efficient polynomial evaluation in the plaintext domain.
Moderate Noise Growth
Noise in BFV scales with t (specifically, the noise after multiplication is proportional to t). A 17-bit t keeps this manageable for depth-1 circuits without wasting the noise budget.
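All of these structural claims can be verified in a few lines of modular arithmetic:

```python
t, n, psi = 65537, 4096, 6561

assert t == 2**16 + 1          # Fermat prime F_4
assert t % (2 * n) == 1        # CRT batching condition: t = 1 (mod 2N)
# psi = 6561 is a primitive 2N-th (8192nd) root of unity mod t:
assert pow(psi, 2 * n, t) == 1       # psi^8192 = 1
assert pow(psi, n, t) == t - 1       # psi^4096 = -1, so the order is exactly 8192
```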
The Noise-Depth Trade-off
Larger plaintext moduli cause faster noise growth. In BFV, after a ciphertext-plaintext multiplication, the noise grows by a factor proportional to the plaintext coefficient norms, which are bounded by t. After a ciphertext-ciphertext multiplication, the noise grows by roughly t · (current_noise)^2. This means:
- t = 256 (8-bit): Minimal noise growth. Supports deeper circuits but only 8-bit integer slots.
- t = 65537 (17-bit): Moderate noise growth. Supports depth 1–2 with standard Q. Wide enough for most data.
- t = 786433 (20-bit): Faster noise growth. May require larger Q or N to compensate. Used when wider integer range is needed.
- t = 2^32 − k: Aggressive noise growth. Typically requires N ≥ 8192 and multi-limb Q even for shallow circuits.
Common Mistake: Choosing t Too Large
Developers accustomed to 32-bit or 64-bit integers often pick a large plaintext modulus "just in case." This is backwards thinking. Every extra bit in t cuts into your noise budget, potentially forcing you to increase N or Q. Quantize your data down to the minimum necessary precision and pick the smallest t that fits.
CKKS Scale (Δ) Selection
Unlike BFV/BGV, the CKKS scheme works with approximate arithmetic on real (or complex) numbers. Instead of a plaintext modulus, CKKS uses a scale factor Δ to encode floating-point values into integer polynomials. A real number x is encoded as round(x · Δ), and after decryption and division by the appropriate scale, you recover x with precision proportional to log2(Δ).
How Scale Interacts with Computation
After encoding at scale Δ, two ciphertexts multiplied together produce a result at scale Δ^2. This double-scale result must be rescaled back to Δ by dividing the ciphertext by Δ and dropping one RNS prime. This is why CKKS requires a chain of primes: each multiplication consumes one prime from the chain.
The bit budget for a CKKS computation works as follows:
```
// Total bit budget = log2(Q) <= security bound for N
//
// Allocation:
//   Special prime:          ~60 bits (key switching)
//   Each computation level: ~40 bits (rescale prime = scale)
//   Final level:            ~60 bits (initial encoding scale)
//
// Example: N=16384, 128-bit security, max Q = 438 bits
// Chain: [60, 40, 40, 40, 40, 40, 40, 40, 40, 60] = 440 bits
// Available: 8 multiplications (8 x 40-bit rescale primes)
// But 440 > 438, so drop one level:
// Chain: [60, 40, 40, 40, 40, 40, 40, 40, 60] = 400 bits ✓
// Available: 7 multiplications
```
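The budget arithmetic above generalizes into a small planner. A sketch under the stated assumptions (60-bit special and encoding primes, 40-bit rescale primes; these are the typical values from this section, not a library API):

```python
def ckks_levels(max_log_q: int, special: int = 60, first: int = 60,
                rescale: int = 40) -> int:
    """How many rescale primes (i.e., multiplications) fit under the
    security bound after reserving the special and encoding primes."""
    return (max_log_q - special - first) // rescale

# N=16384 at 128-bit security allows log2(Q) <= 438 bits:
assert ckks_levels(438) == 7   # [60, 40*7, 60] = 400 bits, fits
```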
Precision Considerations
The scale Δ determines your encoding precision: a 40-bit scale gives you roughly 12 decimal digits of precision (log10(2^40) ≈ 12). However, each multiplication introduces a small amount of approximation error, and this error accumulates through the circuit. For deep circuits (5+ multiplications), precision degradation can become the limiting factor, even when noise budget remains.
Practical guidelines for scale selection:
- 40-bit scale: ~12 decimal digits. Sufficient for most ML inference tasks, financial rounding, and statistical aggregation.
- 50-bit scale: ~15 decimal digits. Needed for high-precision scientific computation or when chaining many operations.
- 60-bit scale: ~18 decimal digits. Maximum practical precision. Limits computation depth due to large budget per level.
For biometric matching (which H33 performs with BFV, not CKKS), CKKS could theoretically work by encoding floating-point feature vectors directly. However, the rescaling overhead and precision management complexity make BFV with quantized integers faster and simpler for depth-1 inner products. Scheme selection matters—see our BFV vs CKKS comparison for the full analysis.
Noise Budget Analysis
Every FHE ciphertext carries noise that grows with each homomorphic operation. When the noise exceeds the ciphertext's capacity (determined by Q and t), decryption fails. Noise budget is the remaining margin between current noise and the decryption failure threshold, measured in bits. Ensuring adequate noise budget is the central engineering challenge in FHE parameter selection.
Noise Growth per Operation
| Operation | Noise Growth | Budget Cost (approx.) |
|---|---|---|
| Fresh encryption | Initial noise σ | ~log2(σ·N) bits consumed |
| Addition (ct + ct) | Additive (small) | ~1 bit |
| Plaintext multiply (ct × pt) | Multiplicative by ||pt|| | ~log2(||pt||∞) bits |
| Ciphertext multiply (ct × ct) | Quadratic growth | ~log2(t·N) + current noise bits |
| Rotation (Galois) | Similar to key switch | ~log2(N·decomp_base) bits |
| Mod switching | Reduces noise | Recovers ~log2(q_i) bits |
| Relinearization | Small additive | ~5–15 bits (decomp. dependent) |
The critical distinction is between additive noise growth (additions, which barely increase noise) and multiplicative noise growth (multiplications, which roughly square the noise). This is why multiplicative depth is the primary driver of parameter requirements, not the total number of operations.
Computing Your Noise Budget
The initial noise budget for a fresh ciphertext in BFV is approximately:
Budget_initial ≈ log2(Q/t) − log2(σ·N)
For H33's parameters (N=4096, Q=56 bits, t=65537 ≈ 17 bits, σ=3.2):
H33 Noise Budget Calculation
Budget_initial ≈ (56 − 17) − log2(3.2 · 4096) ≈ 39 − 13.7 ≈ 25 bits
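The same back-of-envelope estimate in code (a sketch of the approximation above, not a library noise estimator):

```python
import math

def initial_budget_bits(log_q: float, log_t: float, sigma: float, n: int) -> float:
    """Approximate fresh-ciphertext noise budget in BFV:
    log2(Q/t) - log2(sigma * N)."""
    return (log_q - log_t) - math.log2(sigma * n)

# H33's parameters: N=4096, 56-bit Q, t=65537 (~17 bits), sigma=3.2
budget = initial_budget_bits(56, 17, 3.2, 4096)
assert round(budget) == 25   # ~25 bits of starting budget
```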
Worked Example: Biometric Inner Product
H33's biometric matching computes an encrypted inner product: d = Σ(a[i] · b[i]) for 128-dimensional vectors, where a is encrypted and b is a plaintext enrolled template. Here is the noise budget trace:
Step 1: Fresh Encryption
Encrypt the query vector a into a ciphertext. Budget remaining: ~25 bits.
Step 2: Plaintext Multiply (128 slot-wise multiplications)
Multiply encrypted a by plaintext template b. Each slot independently computes a[i]·b[i]. Noise grows by ~log2(max(b[i])). With quantized templates where values ≤ 256, this costs ~8 bits. Budget remaining: ~17 bits.
Step 3: Accumulate (slot rotations + additions)
Sum across 128 dimensions using rotate-and-add. Each rotation adds minor key-switching noise (~1–2 bits). We need log2(128)=7 rotations. Total rotation cost: ~10 bits. Budget remaining: ~7 bits.
Step 4: Decrypt
7 bits of remaining budget is above the minimum threshold (noise must stay below Q/(2t) for correct decryption). Decryption succeeds. For H33, we actually use a fused NTT-domain inner product that avoids most rotations, keeping an even larger margin.
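The whole trace reduces to a few subtractions, using the approximate per-step costs from this walkthrough:

```python
budget = 25          # Step 1: fresh encryption (~25-bit initial budget)
budget -= 8          # Step 2: plaintext multiply, template values <= 256
budget -= 10         # Step 3: 7 rotate-and-add steps (log2(128)) at ~1-2 bits each
assert budget == 7   # Step 4: 7 bits remain above the failure threshold
```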
Rather than computing 128 individual multiply-and-accumulate steps with rotations, H33 encodes 32 users' templates in NTT form at enrollment time and performs a single fused inner product followed by one INTT. This eliminates all per-dimension rotations, preserving far more noise budget and reducing latency from what would be ~4ms down to ~1,375μs for the entire 32-user batch. See our performance optimization guide for the full technique.
Safety Margin
Never design for exactly zero remaining budget. In practice, you need a safety margin for several reasons:
- Probabilistic noise: Error sampling is random; worst-case noise exceeds average by several standard deviations.
- Implementation precision: Floating-point estimates of noise growth are approximations.
- Edge-case inputs: Some input values may produce larger plaintext coefficients than expected.
A good rule of thumb is to target at least 5–10 bits of remaining budget after your complete circuit, and to validate empirically by running your full computation on thousands of random inputs and checking that decryption never fails.
Practical Decision Flowchart
Given the interdependencies between parameters, a systematic approach is essential. Here is the decision process we recommend, and the one we followed for H33:
Fix Your Security Level
Start with 128-bit security unless regulatory or contractual requirements demand 192 or 256. This immediately gives you the N-vs-Q constraint table to work within.
Map Your Computation Circuit
Express your target computation as a sequence of homomorphic operations. Count the multiplicative depth (longest chain of multiplications from input to output). For H33: inner product = 1 plaintext multiplication + additions = depth 1.
Choose Your Scheme
Integer data with exact results → BFV or BGV. Floating-point data with approximate results → CKKS. If you can quantize floats to integers without losing required precision, prefer BFV—it is simpler and avoids CKKS rescaling overhead.
Determine Minimum Q
Based on multiplicative depth, compute the minimum log2(Q) needed to support your computation with adequate noise budget. For single-depth BFV: Q needs ~40–60 bits. For multi-depth CKKS: sum up prime sizes as shown in the CKKS section.
Select N from the Constraint Table
Find the smallest power-of-two N where the max Q at your security level exceeds your required Q. Also verify N provides enough SIMD slots for your batching needs. If slots are insufficient, increase N.
Choose t (BFV/BGV) or Δ (CKKS)
For BFV: pick the smallest t that fits your data range and satisfies t ≡ 1 (mod 2N). For CKKS: choose Δ based on required precision (40 bits for ~12 decimal digits).
Validate Noise Budget
Compute the initial noise budget and trace it through your circuit. Ensure ≥5 bits remaining at the end. If not, increase Q (which may force increasing N to maintain security).
Benchmark and Test
Run your full pipeline on representative data. Verify correctness on 10,000+ random inputs. Measure throughput. If too slow, look for algorithmic optimizations before increasing parameters.
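Steps 4 and 5 of this process can be sketched as a search over the standard table. A minimal illustration under stated assumptions (128-bit table values; `smallest_n` is a helper invented here, not a library function):

```python
# Max log2(Q) per N at 128-bit security (HomomorphicEncryption.org standard).
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

def smallest_n(required_log_q: int, required_slots: int) -> int:
    """Step 5: smallest power-of-two N with enough Q headroom and SIMD slots."""
    for n in sorted(MAX_LOGQ_128):
        if MAX_LOGQ_128[n] >= required_log_q and n >= required_slots:
            return n
    raise ValueError("no standard N fits; restructure the computation")

# H33: depth-1 BFV needs ~56 bits of Q and 4096 slots -> N=4096.
assert smallest_n(56, 4096) == 4096
```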
Common Parameter Sets
Below are reference parameter configurations for common FHE use cases, validated against the HomomorphicEncryption.org standard at 128-bit security:
| Use Case | Scheme | N | log2(Q) | t or Δ | Depth | Notes |
|---|---|---|---|---|---|---|
| Auth / matching (H33) | BFV | 4096 | 56 | t=65537 | 1 | 32 users/CT, ~50μs/auth |
| Simple comparison | BFV | 4096 | 109 | t=65537 | 2–3 | Equality, range checks |
| Logistic regression | CKKS | 8192 | 218 | Δ=2^40 | 4–5 | Polynomial activation approx. |
| Neural net inference | CKKS | 16384 | 438 | Δ=2^40 | 8–12 | CNN or small transformer |
| Private set intersection | BFV | 4096 | 109 | t=65537 | 1–2 | Hash-based PSI protocol |
| Encrypted search (PIR) | BFV | 4096 | 109 | t=65537 | 1–2 | Single-server PIR, large DB |
| Bootstrapping (CKKS) | CKKS | 65536 | ~1750 | Δ=2^45 | unlimited | ~10s per bootstrap |
| Encrypted DB aggregation | BFV | 8192 | 218 | t=65537 | 3–5 | SUM, COUNT, filtered queries |
Notice the pattern: the vast majority of practical applications cluster around N=4096 or N=8192. The exotic large-N regimes (N=32768, N=65536) are needed only for bootstrapping or extremely deep circuits. If you find yourself reaching for N=16384+, first ask whether you can restructure your computation to be shallower.
Error Distribution and Sampling
The error (noise) distribution is the mathematical foundation of FHE security. During encryption, noise sampled from this distribution is added to the ciphertext, and it is this noise that makes the RLWE problem hard. The standard choice is a discrete Gaussian with standard deviation σ = 3.2, as specified by the HomomorphicEncryption.org consortium.
In practice, most implementations use a centered binomial distribution (CBD) as a computationally efficient approximation to the discrete Gaussian. CBD sampling is constant-time (no rejection sampling), cache-friendly, and produces nearly identical security estimates. H33 uses batch CBD sampling—generating 10 coefficients per RNG call—which is 5x faster than individual coefficient sampling while maintaining the correct noise distribution.
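A minimal sketch of centered binomial sampling, for illustration only (the parameter η = 20 is chosen here so the variance η/2 = 10 gives σ ≈ 3.16 ≈ 3.2; H33's batched production sampler is not shown):

```python
import random

def cbd_sample(eta: int = 20) -> int:
    """Centered binomial: difference of two eta-bit popcounts.
    Variance = eta/2, so eta=20 gives sigma = sqrt(10) ~ 3.16."""
    a = bin(random.getrandbits(eta)).count("1")
    b = bin(random.getrandbits(eta)).count("1")
    return a - b

samples = [cbd_sample() for _ in range(10_000)]
assert all(-20 <= s <= 20 for s in samples)        # bounded support
assert abs(sum(samples) / len(samples)) < 0.5      # centered near zero
```

Note this sketch uses Python's non-cryptographic RNG for clarity; a real implementation draws the bits from a CSPRNG and samples in constant time.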
You should almost never change σ from the standard value. Reducing σ weakens security (the lattice problem becomes easier). Increasing σ wastes noise budget (more of Q is consumed by initial noise, leaving less for computation). The standard value of 3.2 is the community consensus for optimal balance.
Validation Tools
Never trust hand-computed security estimates. Always validate your parameter choices with established tools:
Lattice Estimator
The gold standard for FHE security analysis. A SageMath library by Albrecht et al. that evaluates all known attacks (primal, dual, hybrid, BKW) against your exact parameters. Run LWE.estimate() with your N, Q, and error distribution to get a precise security level.
Microsoft SEAL
SEAL's parameter validation automatically checks your chosen parameters against the HomomorphicEncryption.org standard and warns if they are insecure. The CoeffModulus.MaxBitCount(N) function returns the maximum Q for each N at each security level.
OpenFHE Parameter Generator
OpenFHE (formerly PALISADE) includes CryptoContext::EvalMultKeysGen() workflows that automatically configure RNS chains for your target depth. Useful for rapid prototyping when you are not yet optimizing parameters manually.
HElib Parameter Selection
HElib provides the FindM() function that selects a good cyclotomic polynomial for BGV given target security, number of slots, and computation depth. It handles the CRT condition and NTT compatibility automatically.
```python
# Install: sage -pip install lattice-estimator
from estimator import *

# H33 production parameters
params = LWE.Parameters(
    n=4096,
    q=2**56,
    Xs=ND.UniformMod(3),          # ternary secret
    Xe=ND.DiscreteGaussian(3.2)
)

# Evaluate all known attacks
result = LWE.estimate(params)
# Expected output: all attacks require >= 2^128 operations
#   usvp: rop = 2^141.2, bkw: rop = 2^178.3, ...
```
Anti-Patterns and Common Mistakes
Having worked with dozens of FHE deployments, we have catalogued the most common parameter selection mistakes. Each of these has cost real engineering teams weeks of debugging and performance tuning.
Using N=16384 when N=4096 suffices. This is by far the most common mistake. Developers choose large N "to be safe" without checking whether their Q fits within a smaller ring. Impact: ~16x slowdown for zero security benefit. Every NTT, every polynomial operation, every memory allocation scales super-linearly with N.
Choosing the minimum Q that barely provides enough noise budget for the average case. FHE noise is probabilistic—some encryptions produce more noise than others. Without a safety margin, you get intermittent decryption failures: correct results 99% of the time, garbage 1% of the time. Fix: Always maintain ≥5 bits of remaining budget and test on thousands of random inputs.
Processing one data element per ciphertext when the ring structure supports thousands of parallel slots. If N=4096 and you are only using 1 slot, you are wasting 4,095 free parallel computations. Impact: Up to 4096x throughput loss. H33 packs 32 complete 128-dimensional users into each ciphertext, achieving 32x the throughput of single-user processing.
Choosing CKKS for integer data or BFV for floating-point data. CKKS is approximate—every operation loses a small amount of precision. If your computation requires exact integer results (e.g., equality checks, threshold comparisons), CKKS will eventually produce wrong answers. Conversely, using BFV for floating-point data requires quantization, which is fine for shallow circuits but adds complexity for deep ones. Rule: Exact integers → BFV/BGV. Approximate reals → CKKS.
Using an RNS chain of 3–4 primes when a single prime suffices. Each additional RNS limb adds overhead for basis conversion, CRT reconstruction, and inter-limb synchronization. For depth-1 circuits, this overhead is pure waste. Impact: 30–50% slower than single-prime implementation. H33's single 56-bit modulus eliminates all RNS machinery.
Relying on rule-of-thumb N/log(Q) ratios instead of running the actual Lattice Estimator. The security landscape evolves—new attacks periodically improve, and estimates from 5 years ago may be optimistic. Fix: Always validate with the latest Lattice Estimator release before deploying to production.
H33 Production Case Study
Let us walk through H33's complete parameter selection process from first principles, showing how the methodology described above led to our production configuration.
The Application: Biometric Authentication
H33 performs privacy-preserving biometric authentication using FHE. Users submit encrypted biometric feature vectors; we compute encrypted distance metrics against enrolled templates without ever seeing the plaintext biometrics. The core computation is an inner product over 128-dimensional integer vectors.
Step 1: Security Level
We target 128-bit security. Our threat model assumes a well-resourced attacker with access to the ciphertexts (which are transmitted over the network) but not the secret key (which never leaves the client). 128 bits provides a comfortable margin—the best known lattice attacks would require approximately 2^128 operations, which is infeasible for any attacker for the foreseeable future. As a bonus, RLWE at 128-bit classical security also provides post-quantum security at approximately the same level.
Step 2: Circuit Analysis
// Biometric inner product circuit
// Input: encrypted query vector q[0..127], plaintext template t[0..127]
// Output: encrypted distance = sum(q[i] * t[i])
// Operation 1: Element-wise multiply (ct × pt)
// Multiplicative depth: 1
// Noise growth: proportional to max(t[i])
// Operation 2: Accumulate across dimensions (additions)
// Multiplicative depth: still 1 (additions don't increase depth)
// Noise growth: minimal (additive)
// Total multiplicative depth: 1
// No ciphertext × ciphertext multiplication needed
// No relinearization needed
// No modulus switching needed
Conclusion: Depth 1, plaintext-multiply only. This is the shallowest non-trivial FHE circuit possible.
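Before touching any FHE library, the whole circuit can be sanity-checked in plain modular arithmetic. The sketch below is a reference model only (the helper name is ours, and no encryption is involved); it mirrors what each SIMD slot computes under BFV's mod-t plaintext arithmetic:

```python
# Plain-integer reference of the depth-1 circuit: slot-wise multiply by the
# plaintext template, then accumulate. Results reduce mod t, exactly as BFV
# plaintext arithmetic reduces them. No encryption here; this is a reference
# model to validate the encrypted pipeline against.
t = 65537  # plaintext modulus

def inner_product_mod_t(query: list[int], template: list[int]) -> int:
    assert len(query) == len(template) == 128
    return sum(q * w for q, w in zip(query, template)) % t

# Worst case: maximal 8-bit features wrap past t, illustrating why the result
# range (up to 128 * 255^2) must be accounted for modulo t.
print(inner_product_mod_t([255] * 128, [255] * 128))  # prints 1 (8,323,200 mod 65537)
```

Running the encrypted circuit against a model like this on random inputs is the cheapest way to catch encoding and reduction mistakes early.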
Step 3: Scheme Selection
Our biometric features are quantized to 16-bit integers. Exact arithmetic is required (the distance threshold comparison must be deterministic). BFV is the clear choice over CKKS: no rescaling needed, no precision management, exact integer results.
Step 4: Minimum Q
For a depth-1 BFV circuit with t=65537, we need enough noise budget to absorb initial encryption noise plus one plaintext multiplication plus accumulation. Back-of-envelope: ~56 bits of Q gives ~25 bits of initial budget, which comfortably handles our circuit. A 56-bit prime also fits perfectly in a 64-bit machine word, enabling single-word Montgomery multiplication in the NTT hot path.
Step 5: Select N
Consulting the constraint table: at 128-bit security, N=4096 allows up to 109 bits of Q. Our 56-bit Q is well within budget. N=4096 gives 4,096 SIMD slots, which divided by 128 dimensions = 32 users per ciphertext—exactly the batch size that maximizes our throughput pipeline.
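The table lookup is mechanical enough to encode directly. The ceilings below are the commonly cited HomomorphicEncryption.org values for 128-bit classical security with a ternary secret; verify them against the current standard release before relying on them:

```python
# Max log2(Q) permitted at 128-bit classical security (ternary secret),
# per the HomomorphicEncryption.org standard tables.
MAX_LOGQ_128 = {1024: 27, 2048: 54, 4096: 109, 8192: 218, 16384: 438, 32768: 881}

def smallest_secure_n(log_q: int) -> int:
    """Return the smallest power-of-2 ring degree whose Q ceiling covers log_q."""
    for n in sorted(MAX_LOGQ_128):
        if MAX_LOGQ_128[n] >= log_q:
            return n
    raise ValueError(f"log_q={log_q} exceeds the largest tabulated ring degree")

print(smallest_secure_n(56))  # H33's 56-bit Q -> 4096
```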
Step 6: Choose t
We need t ≡ 1 (mod 8192) for full batching with N=4096. We need t > max feature value (features are quantized to [0, 255], but inner product results can reach 128 × 255^2 ≈ 8.3M, which fits in a 24-bit range). t=65537 satisfies both conditions: 65537 mod 8192 = 1, and 65537 > 255 (with inner product results handled modulo t via the CRT structure of batched computation). Verified: a primitive 8192nd root of unity modulo 65537 is ψ=6561.
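All three conditions in this step can be checked with a few modular exponentiations:

```python
# Verify t = 65537 supports full batching for N = 4096: t ≡ 1 (mod 2N),
# and psi = 6561 is a primitive 2N-th root of unity mod t.
t, N = 65537, 4096
assert t % (2 * N) == 1            # 65537 = 8 * 8192 + 1

psi = 6561                         # 3^8: 3 generates F_65537*, so 3^8 has order 65536/8
assert pow(psi, 2 * N, t) == 1     # psi^8192 = 1: an 8192nd root of unity
assert pow(psi, N, t) == t - 1     # psi^4096 = -1: order is exactly 8192, so primitive
print("psi =", psi, "is a primitive", 2 * N, "th root of unity mod", t)
```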
Step 7: Validate
Lattice Estimator confirms ≥128-bit security. Noise budget analysis confirms ≥7 bits remaining after full circuit. Empirical testing on 1 million random biometric vectors: zero decryption failures.
Production Results
H33 Production Benchmarks (Graviton4, c8g.metal-48xl)
The performance numbers above are a direct consequence of parameter selection. The single 56-bit modulus enables Montgomery NTT with Harvey lazy reduction and no RNS overhead. The N=4096 ring fits entirely in L1 cache on Graviton4 (4096 × 8 bytes = 32KB per polynomial), enabling cache-resident NTT computation. The 32-user batching fills every SIMD slot, amortizing the fixed costs of encryption and NTT across the maximum number of useful computations.
Why Single Modulus Beats Chain for Depth-1
We benchmarked H33's pipeline with both single-prime and multi-prime (RNS chain) Q configurations. The multi-prime version used two 30-bit primes (total 60 bits, similar budget). Results: the single-prime version was 38% faster. The multi-prime overhead came from: (1) running two parallel NTTs per polynomial instead of one, (2) CRT reconstruction during decrypt, and (3) additional memory traffic for the second limb. For depth-1, every one of these costs is pure waste.
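The second limb's bookkeeping is easy to see in miniature. The sketch below uses two well-known ~30-bit NTT-friendly primes as stand-ins for the benchmarked configuration (the actual primes in the comparison are not published here):

```python
# Two-limb RNS sketch: a wide value is carried as residues mod two ~30-bit
# primes. Every homomorphic operation runs once per limb, and CRT
# reconstruction is extra work at decrypt time that a single-prime Q avoids.
P1, P2 = 998244353, 1004535809     # illustrative NTT-friendly primes

def to_rns(x: int) -> tuple[int, int]:
    return x % P1, x % P2          # two limbs -> two NTTs, two multiply passes

def from_rns(r1: int, r2: int) -> int:
    """CRT reconstruction (Garner's form): recover x mod P1*P2 from its residues."""
    m = pow(P1, -1, P2)
    return (r1 + P1 * (((r2 - r1) * m) % P2)) % (P1 * P2)

x = 123456789012345678 % (P1 * P2)
assert from_rns(*to_rns(x)) == x   # round-trip sanity check
```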
Advanced: Maximizing Batching Efficiency
SIMD batching is the most underutilized performance lever in FHE. When configured correctly, it provides a throughput multiplier equal to the number of independent data elements packed per ciphertext—essentially free parallelism.
Slot Packing Strategies
Given N SIMD slots and a data element that occupies d slots, you can pack N/d independent elements per ciphertext. The goal is to choose d and N so that N/d aligns with your application's natural batch size:
- Dense packing (d=1): Each slot holds one scalar. Used for encrypted database columns, where each row contributes one value. N=4096 gives 4,096 parallel rows.
- Vector packing (d=dimensions): Each "user" occupies d contiguous slots. H33 uses d=128, giving 4096/128 = 32 users per ciphertext.
- Matrix packing: For ML workloads, matrices can be encoded using diagonal or row-major packing, with rotations enabling matrix-vector multiplication. Requires careful planning of rotation keys.
The key principle is to never leave slots empty. Every empty slot is a wasted computation: the NTT, encryption, and homomorphic operations process all N coefficients regardless of how many slots contain meaningful data. If your batch size does not divide N evenly, pad with dummy data rather than reducing the batch.
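A small helper (ours, not from any FHE library) makes the no-empty-slots rule concrete, including how much dummy padding a given batch size needs:

```python
# Slot-budget helper: how many d-slot elements fit per N-slot ciphertext, and
# how many dummy elements a batch needs so the last ciphertext is still full.
def packing(N: int, d: int, batch: int) -> tuple[int, int]:
    capacity = N // d              # independent elements per ciphertext
    pad = (-batch) % capacity      # dummies required to fill the final ciphertext
    return capacity, pad

print(packing(4096, 128, 32))  # H33: (32, 0) -> 32 users/ciphertext, zero padding
print(packing(4096, 128, 50))  # (32, 14) -> pad 14 dummies to fill 2 ciphertexts
```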
Template Storage Reduction
SIMD batching also dramatically reduces storage requirements. For H33's biometric system, a single user's 128-dimensional template at 64-bit precision occupies 128 × 8 = 1,024 bytes in plaintext, but a ciphertext (two polynomials of degree 4096, each coefficient stored in a 64-bit word) is approximately 2 × 4096 × 8 bytes = 64KB. Without batching, that is 64KB per user. With 32-user batching, the same 64KB ciphertext covers 32 users, an effective 2KB of ciphertext per user and a 32x reduction over naive per-user encryption. Combined with NTT-form template storage, this keeps H33's total per-user footprint to a few kilobytes.
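The sizing arithmetic is worth spelling out. This sketch assumes each coefficient occupies a full 64-bit word; a packed 56-bit representation would be roughly 12% smaller:

```python
# Ciphertext size and per-user amortization under the 64-bit-word assumption.
N, polys, bytes_per_coeff, users = 4096, 2, 8, 32

ct_bytes = polys * N * bytes_per_coeff   # two degree-4096 polynomials
per_user = ct_bytes // users             # amortized across the 32-user batch
print(ct_bytes // 1024, "KiB per ciphertext,", per_user // 1024, "KiB per user")
# prints: 64 KiB per ciphertext, 2 KiB per user
```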
Parameter Migration and Future-Proofing
FHE parameters are baked into every ciphertext at encryption time. You cannot retroactively change the parameters of encrypted data. This creates a versioning challenge: what happens when you need to upgrade parameters due to improved attacks, new computational requirements, or updated standards?
Migration Strategies
- Re-encryption windows: Periodically decrypt and re-encrypt stored data under new parameters. Requires access to the secret key and creates a vulnerability window.
- Dual-parameter operation: Run old-parameter and new-parameter pipelines in parallel during migration. New data uses new parameters; old data is served under old parameters until re-encrypted.
- Conservative initial selection: Choose parameters with enough headroom that you will not need to change them for years. H33's 53 bits of coefficient-modulus headroom (a 56-bit Q against the 109-bit ceiling for N=4096) means we could absorb significantly stricter standards without parameter changes.
The best strategy is to avoid migration entirely by choosing robust initial parameters. This is another argument for leaving security margin: the ~10% performance cost of a slightly-larger-than-minimum Q is far less than the engineering cost of a parameter migration across millions of encrypted records.
Scheme-Specific Optimization Tips
BFV Specific
- Pre-NTT public keys: Store the public key in NTT form at keygen time. This eliminates a forward NTT per encryption, saving ~15% of encrypt time. H33 uses this optimization.
- NTT-form templates: For applications like biometric matching where plaintext templates are reused across many encryptions, pre-compute the NTT of the template at enrollment time. Eliminates the forward NTT during multiply_plain.
- Montgomery domain persistence: Keep ciphertext polynomials in Montgomery form throughout computation, converting out only at the final decryption step. Eliminates redundant domain conversions.
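The Montgomery-domain idea can be sketched in a few lines. The modulus below is an illustrative well-known 30-bit prime, not H33's production 56-bit prime, and R = 2^64 stands in for the machine word; the reduction logic is what matters:

```python
# Montgomery reduction sketch: staying in the Montgomery domain replaces the
# division in each modular multiply with shifts, masks, and one multiply.
q = 998244353                      # illustrative odd modulus (a known 30-bit prime)
R = 1 << 64                        # Montgomery radix, one machine word
q_inv = (-pow(q, -1, R)) % R       # -q^{-1} mod R, precomputed once at keygen

def redc(x: int) -> int:
    """Montgomery reduction: returns x * R^{-1} mod q for 0 <= x < q*R."""
    m = (x * q_inv) % R            # chosen so x + m*q is divisible by R
    t = (x + m * q) >> 64
    return t - q if t >= q else t  # single conditional subtract, no division

def to_mont(a: int) -> int:
    return (a * R) % q             # enter the domain once, stay there

def mont_mul(a_m: int, b_m: int) -> int:
    return redc(a_m * b_m)         # product remains in the Montgomery domain

a, b = 123456789, 987654321
assert redc(mont_mul(to_mont(a), to_mont(b))) == (a * b) % q
```

Converting in at encryption and out only at decryption, as the bullet above describes, means every intermediate NTT butterfly and multiply uses `mont_mul`-style arithmetic with no per-operation domain conversions.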
BGV Specific
- Mod-switch early: Unlike BFV (whose scale-invariant design rarely needs it), BGV benefits from aggressive mod-switching after each multiplication to keep noise small.
- Coefficient modulus primes should decrease: In BGV, the largest prime should be at the bottom of the chain (used first) to maximize noise reduction efficiency.
CKKS Specific
- Bootstrapping threshold: If your circuit depth exceeds ~15–20 for CKKS, bootstrapping becomes necessary. This requires N ≥ 32768 and adds ~10 seconds per bootstrap. Design circuits to minimize depth below this threshold.
- Scale stabilization: After a sequence of multiplications and rescalings, accumulated precision loss can become significant. Insert "stabilization" operations (multiply by 1 at full scale) to recalibrate.
- Complex-number encoding: CKKS encodes N/2 complex values (or N/2 real values) per ciphertext, not N. Plan your slot budget accordingly.
Conclusion
FHE parameter selection is the foundation upon which everything else builds. The wrong parameters produce a system that is either insecure or impractically slow, and there is no way to fix parameters after data has been encrypted. The good news is that the parameter space, while complex, is well-understood. The HomomorphicEncryption.org standard tables, the Lattice Estimator, and the systematic decision process outlined in this guide give you a reliable path from application requirements to production-ready parameters.
The key principles are worth repeating:
- Start from your computation. Count multiplicative depth first. Everything else follows.
- Use the smallest N that satisfies your constraints. Over-provisioning N is the most expensive mistake.
- Match Q to your depth. Single modulus for depth-1. RNS chain for deeper circuits. Never use a chain when a single prime suffices.
- Validate with the Lattice Estimator. Never trust hand-computed security estimates.
- Leave a safety margin. Both in security (N/log(Q) ratio) and in noise budget (remaining bits after circuit).
- Maximize batching. Every empty SIMD slot is a wasted computation.
H33's production pipeline—1.2 million authentications per second, ~50μs per auth, 128-bit post-quantum security—is not the result of exotic hardware or algorithmic breakthroughs. It is the result of tight parameter selection: N=4096, a single 56-bit Q, t=65537, and 32-user SIMD batching. Every parameter was chosen to be the smallest value that satisfies all constraints, with no margin wasted on capabilities we do not need. That discipline is what turns FHE from an academic curiosity into a production-grade technology.
For further reading, explore our guides on FHE performance optimization, the NTT in practice, and lattice cryptography foundations.
Ready to Go Quantum-Secure?
Start protecting your users with post-quantum authentication today. 1,000 free auths, no credit card required.
Get Free API Key →