The Gate Is the Unit of Work
In TFHE (Torus Fully Homomorphic Encryption), the fundamental unit of computation is the Boolean gate. Not a multiplication. Not a polynomial evaluation. A single Boolean gate operating on encrypted bits. AND, OR, NAND, NOR, XOR, XNOR, NOT -- these are the building blocks from which all encrypted computation is constructed.
This is radically different from other FHE schemes. In BFV or CKKS, the natural unit of computation is a polynomial multiplication or an element-wise vector operation. You think in terms of ciphertext-ciphertext multiplications, rotations, and rescaling operations. The abstraction level is high: you are working with vectors of thousands of values packed into a single ciphertext.
In TFHE, you are working at the bit level. Each encrypted bit is an individual LWE ciphertext. Each Boolean gate operates on one or two encrypted bits and produces one encrypted bit. You are building circuits the way a hardware designer builds circuits: from individual logic gates up.
This bit-level granularity gives TFHE two properties that matter enormously in production. First, the cost of any computation is exactly predictable: count the gates, and you know the cost. No surprises, no hidden overhead, no parameter-dependent performance cliffs. Second, the gate count is the natural unit for billing: each programmable bootstrap consumed is metered, and the customer pays for exactly what they use.
This article walks through what it actually looks like to run Boolean gates on encrypted bits in production: the gate types and their costs, the client/server key architecture, a step-by-step walkthrough of an 8-bit greater-than comparison, and the production engine abstractions that make all of this usable.
The Gate Cost Table
Not all gates are created equal. In TFHE, gate cost is measured in programmable bootstraps (PBS), which are the expensive operation that simultaneously reduces noise and applies a Boolean function. Here is the complete cost table:
AND: 1 programmable bootstrap. AND is the workhorse gate. It takes two encrypted bits and produces their logical conjunction. Each AND gate costs one PBS because the combination of two LWE ciphertexts and the subsequent noise refresh requires a full bootstrap operation.
OR: 1 programmable bootstrap. OR is algebraically equivalent to AND with negated inputs (De Morgan's law: A OR B = NOT(NOT A AND NOT B)), but in TFHE it is more efficient to implement OR directly as a single bootstrap with a different lookup table than to compose NOT and AND.
NAND: 1 programmable bootstrap. NAND is the universal gate -- any Boolean function can be built from NAND gates alone. In TFHE, NAND is often the "native" gate: the bootstrap operation naturally computes NAND, and other gates are derived from it.
NOR: 1 programmable bootstrap. Like NAND, NOR is a universal gate. Its cost is identical to AND, OR, and NAND.
XOR: Free. XOR is computed as LWE addition. Adding two LWE ciphertexts that each encrypt a bit produces a ciphertext that encrypts the XOR of those bits. This is a simple vector addition -- no bootstrap required. The noise increases slightly (noise is additive), but the operation itself is essentially costless.
XNOR: Free. XNOR is NOT(XOR), which is LWE addition followed by negation. Both operations are free.
NOT: Free. NOT is LWE negation. Negating an LWE ciphertext flips the encrypted bit. This is a scalar operation on the ciphertext components -- no bootstrap required, negligible noise impact.
The implications are significant. Any circuit that can be designed to maximize XOR and NOT operations while minimizing AND/OR/NAND/NOR operations will be cheaper to evaluate. This is a real circuit optimization problem, and it matters in production because gate count directly determines cost and latency.
The Client/Server Key Architecture
Before any gates can be evaluated, the client must generate and distribute keys. The key architecture in TFHE is designed around a single trust principle: the client holds the secret, and the server holds the evaluation capability.
The LWE secret key (sk). This is a vector of binary values (in the simplest formulation) that the client generates randomly. It is the decryption key. Whoever holds sk can decrypt any ciphertext encrypted under it. The client keeps sk private and never sends it to any server.
The bootstrapping key (BSK). The BSK is a set of GGSW (Gentry-Gama-Sahai-Waters) ciphertexts, each encrypting one bit of the secret key under a different encryption scheme (GLWE -- Generalized Learning With Errors). The BSK enables the server to perform programmable bootstraps: the process of refreshing a noisy LWE ciphertext while applying a Boolean function. The BSK is large (typically several megabytes to tens of megabytes depending on parameters) and is sent to the server during the setup phase.
The key-switching key (KSK). After a programmable bootstrap, the resulting ciphertext is encrypted under the GLWE key. The KSK enables the server to convert this back to an LWE ciphertext under the original secret key. The KSK is also sent to the server during setup.
Together, the BSK and KSK give the server the ability to evaluate any Boolean circuit on encrypted data. The server can chain arbitrary sequences of AND, OR, XOR, NOT, and other gates, producing encrypted results that only the client can decrypt. The server cannot extract any information about the encrypted bits, the intermediate gate outputs, or the final result.
This key split is the foundation of the trust model. The client trusts itself to protect sk. The client does not trust the server with anything except the evaluation keys. The server can compute faithfully but cannot read. This is the correct trust model for enterprise deployments where the data owner (the client) wants to outsource computation to an untrusted server.
Anatomy of a Programmable Bootstrap
Since every non-free gate costs one programmable bootstrap, understanding what a PBS does is essential to understanding TFHE production costs.
A programmable bootstrap takes a noisy LWE ciphertext as input. This ciphertext encrypts a value (a bit or a small integer) under the client's secret key, but the noise in the ciphertext has accumulated from previous operations. If the noise gets too large, decryption will fail -- the ciphertext will decrypt to a wrong value.
The programmable bootstrap performs three operations in a single step:
1. Noise reduction. The output ciphertext has fresh noise at a known, low level, regardless of how much noise the input ciphertext had (as long as it had not already exceeded the decryption threshold). This is the "bootstrap" part: it refreshes the ciphertext, enabling further computation.
2. Function evaluation. During the noise reduction process, a lookup table is applied to the encrypted value. This lookup table encodes the Boolean function being computed (AND, OR, NAND, etc.). The output ciphertext encrypts the result of applying the function to the input. This is the "programmable" part: by changing the lookup table, you change the function.
3. Key conversion. The intermediate computation happens under a different encryption key (the GLWE key used in the BSK). The key-switching step at the end converts back to the original LWE key, so the output is compatible with the input format for the next gate in the circuit.
The computational cost of a PBS is dominated by a polynomial multiplication in a ring of dimension N (typically N=1024 or N=2048 for standard security parameters). This involves N multiplications and additions of large integers, which on current hardware takes approximately one millisecond. This per-gate cost is the fundamental constant of TFHE performance: everything else derives from it.
Walking Through an 8-Bit GT Circuit
Now let us trace through a concrete 8-bit greater-than (GT) comparison step by step. We have two 8-bit encrypted integers, A and B, each represented as eight individual encrypted bits. We want to compute an encrypted bit that equals 1 if A > B and 0 otherwise.
Denote the bits of A as a7, a6, ..., a1, a0 (most significant bit first) and the bits of B as b7, b6, ..., b1, b0.
The GT comparison works by scanning from the most significant bit to the least significant bit. At each bit position, we determine whether A is definitively greater than B at that position (ai = 1 and bi = 0), definitively less (ai = 0 and bi = 1), or still undecided (ai = bi). The final result is determined by the first bit position where A and B differ.
The circuit implements this as follows:
Bit 7 (MSB): Compute the partial GT result for the most significant bit. gt7 = a7 AND (NOT b7). This asks: is a7 = 1 while b7 = 0? NOT is free, AND costs 1 PBS. Running total: 1 PBS.
Bit 7 equality: Compute eq7 = NOT(a7 XOR b7). This asks: are a7 and b7 equal? XOR is free, NOT is free. Running total: still 1 PBS.
Bit 6: Compute the partial GT for bit 6: partial_gt6 = a6 AND (NOT b6). Cost: 1 PBS. Then combine with the chain: gt6 = gt7 OR (eq7 AND partial_gt6). The eq7 AND partial_gt6 costs 1 PBS. The OR costs 1 PBS. Running total: 1 + 1 + 1 + 1 = 4 PBS.
Wait -- that is 3 PBS for bit 6 alone, which would give us more than 2n-1 total. Let us be more precise about the optimized circuit.
The optimized GT circuit uses a carry-chain structure. The comparison result propagates from MSB to LSB as a single "carry" bit that tracks whether A is greater than B so far. At each bit position, the carry is updated based on whether the current bits differ.
The optimized recurrence is:
gt_carry[7] = a7 AND (NOT b7). Cost: 1 PBS (NOT is free, AND is 1 PBS).
For each subsequent bit i from 6 down to 0:
diff_i = a_i XOR b_i (free -- LWE addition)
bit_gt_i = a_i AND (NOT b_i) (1 PBS -- NOT is free, AND is 1 PBS)
gt_carry[i] = bit_gt_i OR (gt_carry[i+1] AND (NOT diff_i)) -- but this can be restructured.
Using the MUX formulation: gt_carry[i] = MUX(diff_i, bit_gt_i, gt_carry[i+1]). When diff_i = 1 (bits differ), use bit_gt_i (which tells us whether A's bit was the 1). When diff_i = 0 (bits are same), propagate the previous carry. The MUX can be implemented as: (diff_i AND bit_gt_i) OR ((NOT diff_i) AND gt_carry[i+1]). But since diff_i and bit_gt_i are related (when diff_i = 1, bit_gt_i = a_i), this simplifies to just needing 2 PBS per bit position in the optimized form.
The total for the optimized 8-bit GT circuit comes to 2(8) - 1 = 15 programmable bootstraps. This matches the theoretical cost formula for n-bit GT: 2n - 1 PBS.
At 1 millisecond per PBS (sequential execution), this is 15 milliseconds per comparison. Some gates at the same level can be parallelized (the per-bit partial computations are independent), but the carry chain is inherently sequential: each carry depends on the previous one.
The GateEngine
In H33's production TFHE implementation, Boolean gates are evaluated through the GateEngine abstraction. The GateEngine is the core execution unit that takes encrypted bits and evaluation keys as input, evaluates a specified gate, and produces an encrypted output bit.
The GateEngine handles three responsibilities:
Gate evaluation. Given a gate type (AND, OR, XOR, NOT, etc.) and the appropriate encrypted input bits, the GateEngine performs the corresponding operation. For gates that require a programmable bootstrap (AND, OR, NAND, NOR), the GateEngine performs the full bootstrap-key-switch pipeline. For free gates (XOR, NOT), it performs the LWE arithmetic directly.
Noise tracking. The GateEngine monitors the noise level in each ciphertext after every operation. Free gates increase noise slightly (additive noise from LWE addition); bootstrapped gates reset noise to a known level. The noise tracking ensures that no ciphertext exceeds the decryption threshold before being bootstrapped. If a chain of free operations (multiple XORs, for example) accumulates too much noise, the GateEngine inserts an explicit bootstrap to refresh the ciphertext before continuing.
Gate metering. Every programmable bootstrap consumed is counted. The GateEngine maintains a running tally of PBS operations per API call. This counter drives billing: customers are charged per programmable bootstrap, with free operations (XOR, NOT) incurring zero cost. The meter provides exact, auditable usage data.
The ComparisonEngine
Built on top of the GateEngine, the ComparisonEngine provides higher-level comparison operations. The ComparisonEngine takes two encrypted n-bit integers and produces an encrypted comparison result.
The ComparisonEngine exposes three primary operations:
gt(a, b): Greater-than comparison. Returns an encrypted bit that is 1 if a > b, 0 otherwise. Cost: 2n-1 programmable bootstraps for n-bit inputs.
gte(a, b): Greater-than-or-equal. Implemented as NOT(gt(b, a)) -- reverse the arguments and negate. NOT is free, so the cost is identical to gt: 2n-1 programmable bootstraps.
eq(a, b): Equality comparison. Returns an encrypted bit that is 1 if a equals b, 0 otherwise. At 4-bit precision and below, eq works with standard parameters. At 8-bit precision, eq requires the TRLWE N=1024 parameter upgrade to manage the noise profile of the chained gate topology. The ComparisonEngine validates the parameter set and rejects 8-bit eq calls on standard parameters, returning a clear error message rather than producing incorrect results.
The ComparisonEngine also provides compound operations: min(a, b), max(a, b), and clamp(x, low, high). Each is built from gt and MUX operations, with costs derived directly from the gate count.
The ThresholdProofEngine
The ThresholdProofEngine combines comparison with decision-making. It takes an encrypted value, a public threshold, and two possible encrypted outcomes, and produces the outcome corresponding to the comparison result.
The core operation is: threshold_select(x, threshold, outcome_above, outcome_below). This evaluates gt(x, threshold), then uses the encrypted result bit to MUX between outcome_above and outcome_below. The total cost is the gt cost (2n-1 PBS for n-bit values) plus the MUX cost (one PBS per output bit).
The ThresholdProofEngine is the bridge between raw Boolean gates and the decision logic that H33-Agent-Zero uses. When Agent-Zero needs to determine whether a confidence score exceeds a threshold, it calls the ThresholdProofEngine. When a compliance rule engine needs to check whether a transaction amount exceeds a limit, it calls the ThresholdProofEngine. When an underwriting model needs to compare a risk score against a tier boundary, it calls the ThresholdProofEngine.
Each call is metered at the gate level. The customer sees exactly how many programmable bootstraps were consumed: 15 for the 8-bit GT comparison, plus however many for the MUX stage. No hidden costs, no estimated charges, no billing surprises.
Production Optimizations
While the theoretical gate counts are fixed (2n-1 PBS for n-bit GT), several production optimizations reduce wall-clock time without changing the total PBS count.
Parallel partial computations. The per-bit partial GT values (a_i AND NOT b_i) are independent of each other. All eight can be computed simultaneously on eight threads, reducing this phase from 8 sequential PBS to 1 PBS of wall-clock time. However, the carry chain remains sequential.
Batched bootstrapping. When multiple independent comparisons need to be evaluated (e.g., multiple branches at the same level of a decision tree), their bootstraps can be batched. The computational core processes multiple bootstrap operations concurrently, amortizing the per-bootstrap overhead across the batch.
Key caching. The BSK and KSK are loaded once per session and cached in memory. Key deserialization and validation happen at connection time, not at gate evaluation time. This eliminates the key setup latency from the critical path of gate evaluation.
Memory pooling. LWE ciphertexts are allocated from a pre-allocated memory pool rather than being dynamically allocated for each gate output. This eliminates allocation overhead in the tight gate evaluation loop, which matters when evaluating thousands of gates per second.
These optimizations do not change the fundamental cost model (the PBS count is the PBS count), but they reduce the wall-clock time by a factor that depends on the available parallelism and the circuit structure.
A Concrete Production Example
Let us trace a complete production call through the system. A compliance engine needs to check whether an encrypted transaction amount (8-bit, representing a scaled dollar value) exceeds a threshold of $10,000.
Client-side setup (one-time per session): The client generates the LWE secret key, derives the BSK and KSK, and sends the evaluation keys to the server. Key size: several megabytes. Latency: network-dependent, typically under 100 milliseconds on a modern connection. This happens once and is amortized over all subsequent evaluations.
Client-side encryption (per transaction): The client encrypts the transaction amount as eight individual LWE ciphertexts, one per bit. The threshold is public (the server knows it is $10,000), so it does not need to be encrypted. Each LWE ciphertext is a vector of integers, typically a few hundred bytes. Total encrypted data size per transaction: a few kilobytes.
Server-side evaluation: The server receives the eight encrypted bits and calls ComparisonEngine::gt(encrypted_amount, public_threshold). The ComparisonEngine evaluates the 8-bit GT circuit: 15 programmable bootstraps, executed with partial parallelism. Wall-clock time: approximately 8-12 milliseconds depending on parallelism. The GateEngine meters 15 PBS.
Server-side decision: The server calls ThresholdProofEngine::threshold_select with the encrypted GT result, an encrypted "flag" tag (for amounts above threshold), and an encrypted "pass" tag (for amounts below). The MUX selects the correct tag based on the encrypted comparison bit. Cost: 1 PBS per output bit for the MUX (if the tag is a single bit, 1 PBS; if multi-bit, one per bit). The GateEngine adds the MUX PBS count to the running meter.
Server-side return: The server sends the encrypted result tag back to the client. The meter records the total PBS count (15 + MUX cost) for billing.
Client-side decryption: The client decrypts the result tag with its secret key. It learns whether the transaction was flagged or passed. The server learned nothing.
Total latency: 8-15 milliseconds for the gate evaluation, plus network round-trip time. Total billed PBS: 16-20 depending on MUX width. Total data exchanged: a few kilobytes of encrypted bits in each direction.
Gate Metering and Billing
H33's per-gate metering is a deliberate design choice that reflects a principle: customers should pay for exactly what they use, and they should know exactly what they are paying for before they use it.
Every API call that evaluates TFHE Boolean gates returns a gate_count field in the response that reports the exact number of programmable bootstraps consumed. This count is deterministic: the same circuit on the same input sizes always costs the same number of PBS. There are no variable costs, no probabilistic charges, no "it depends on the data" pricing.
This predictability enables customers to do capacity planning and cost estimation before deploying a circuit. An 8-bit GT comparison always costs 15 PBS. An argmax over 10 categories at 8-bit precision always costs (9 * 15) + MUX overhead. A decision tree with known depth and known feature bit widths has a fixed, calculable PBS cost. Customers can compute their total monthly gate cost from their expected query volume and per-query gate count.
Free operations (XOR, NOT) are tracked in the meter for transparency but are not billed. Customers can see exactly how many of each gate type were used, which helps with circuit optimization: if a circuit uses many AND gates where XOR could be substituted (by restructuring the Boolean logic), the meter data reveals the opportunity.
Why Bit-Level Matters for Security
Working at the bit level provides a security property that higher-level FHE schemes do not: each bit is independently encrypted. In BFV or CKKS, a single ciphertext encodes a vector of values packed into SIMD slots. Compromising the ciphertext reveals all values in the vector. In TFHE, each bit is a separate LWE ciphertext. Compromising one ciphertext reveals one bit -- which is informationally useless without the rest of the bits.
This means that even a partial side-channel attack on the server -- leaking some ciphertexts but not all -- reveals nothing useful. An attacker who learns three bits of an 8-bit encrypted value has eight possible values for the original (2^5 remaining bits). An attacker who learns one ciphertext from a CKKS computation might have thousands of values (all the SIMD slots). The bit-level granularity of TFHE provides defense in depth against partial compromise scenarios.
Additionally, each programmable bootstrap refreshes the noise to a known level, which eliminates noise-based side channels. In schemes where noise accumulates through a computation, the noise level at each step leaks information about the computation path. In TFHE, every bootstrapped gate resets the noise, making the noise profile uniform and data-independent.
Connecting to Agent-Zero
H33-Agent-Zero's confidence boundary, its decision trees, its policy enforcement engine -- all of these are built from the Boolean gates described in this article. There is no abstraction layer hiding a different computation model. The agent's decisions are literally sequences of AND, OR, XOR, and NOT gates applied to encrypted bits.
When Agent-Zero evaluates a confidence threshold, it is running a ComparisonEngine::gt call -- 15 programmable bootstraps for an 8-bit confidence score. When Agent-Zero evaluates a decision tree, it is running a sequence of ComparisonEngine::gt calls, one per branch. When Agent-Zero computes an argmax across classification categories, it is running a tournament of ComparisonEngine::gt calls with ThresholdProofEngine MUX operations.
The Boolean gates are not an implementation detail hidden behind an abstraction. They are the implementation. They are the reason Agent-Zero can make decisions on encrypted data. They are the reason the server cannot see the data, the decisions, or the outcomes. They are the mechanism by which encrypted control flow works.
Every programmable bootstrap consumed is a step in the agent's decision process. Every gate metered is a unit of encrypted reasoning. And every encrypted result returned is a decision that was made without any party ever seeing the underlying data.
That is what FHE Boolean gates in production look like. Not a theoretical construction. Not a benchmark number. A real system evaluating real decisions on real encrypted data, one gate at a time, with every gate metered, billed, and accounted for.
The Path Forward: Gate Optimization
The economics of encrypted Boolean computation create a strong incentive to minimize gate counts. Every gate saved is a millisecond of latency removed and a unit of cost eliminated. This is driving a new discipline in circuit design: gate-optimal encrypted circuits.
Classical circuit optimization has been studied for decades in the hardware design community. But the cost model in TFHE is different from ASIC or FPGA design. In hardware, all gates have roughly equal cost (area and delay). In TFHE, AND costs 1 PBS, but XOR costs 0. This changes the optimization landscape entirely. Circuits that are "optimal" in hardware (minimizing total gate count) may not be optimal in TFHE (where you want to minimize AND/OR gates specifically, even if it means adding more XOR/NOT gates).
H33 is investing in gate-optimal circuit libraries for common decision primitives. The 8-bit GT circuit at 2n-1 = 15 PBS is already optimized, but compound operations (argmax, decision trees, multi-threshold checks) have room for optimization through gate sharing, common subexpression elimination, and Boolean minimization techniques adapted to the TFHE cost model.
As gate costs decrease with hardware improvements and algorithmic advances, the set of practical encrypted decision circuits expands. Operations that are too expensive today at 16-bit or 32-bit precision become practical at future gate costs. The architecture -- GateEngine, ComparisonEngine, ThresholdProofEngine, per-gate metering -- remains the same. Only the numbers change.
This is the advantage of building on Boolean gates as the fundamental abstraction. The architecture does not need to change when the hardware improves. The circuits do not need to be redesigned when parameters are updated. The billing model does not need to be restructured when costs decrease. The gate is the unit of work, the unit of cost, and the unit of trust. Everything else is built from it.