Fully Homomorphic Encryption for AI Inference

The Model Never Sees Your Data. The Results Are Proven.

Every AI API call processes your data in plaintext. The model sees the patient record. The provider logs the financial document. The inference server stores the privileged communication. Every single time.

H33 wraps AI inference in Fully Homomorphic Encryption. The model computes on ciphertext and produces ciphertext. It is mathematically incapable of seeing what it processes. H33-74 attestation proves correct execution.

Try Encrypted Inference · How It Works
2.29M
Auth/sec on Graviton4
38µs
Per-operation latency
3
Purpose-built FHE engines
74 bytes
Attestation per inference
The Problem

Every AI API call is a data exposure event

You send plaintext to the model. The model processes it. The provider logs it. The inference server caches it. Your data has now been exposed to every layer of the stack. Compliance says "encrypted in transit." That's TLS. The model still sees everything.

📡

API Providers See Everything

When you call GPT-4, Claude, or any hosted model, your prompt arrives in plaintext at the provider's inference server. TLS protects the wire. It does not protect the endpoint. The model, the logging system, and every middleware layer between you and the GPU have full access to your data.

🔍

Self-Hosted Doesn't Fix It

Running models on your own infrastructure moves the problem — it doesn't solve it. The model still processes plaintext. Your inference servers become a target. A breach of the GPU cluster exposes every input ever processed. The data surface is the same.

No Proof It Was Private

Even if you trust the provider, you cannot prove to an auditor, regulator, or court that the model never accessed your data in plaintext. Trust is not evidence. Compliance requires proof. Today, nobody has it.

Three Engines

The right encryption for every AI workload

Different AI models need different arithmetic. Neural networks need floating-point. Decision trees need exact integers. Classifiers need boolean gates. H33 provides a purpose-built FHE engine for each.

CKKS

Neural Network Inference

Approximate arithmetic with SIMD slot packing

CKKS encodes floating-point vectors into polynomial rings with SIMD slots, enabling parallel computation across thousands of values in a single ciphertext. Neural network layers — matrix multiplications, activations, normalization — execute on encrypted data with controlled precision loss.

SIMD slots for parallel float ops
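
To make the slot-packing idea concrete, here is a minimal sketch using the open-source TenSEAL library rather than H33's SDK; the parameters and feature values are illustrative only.

import tenseal as ts

# CKKS context: ring dimension and coefficient modulus sizes (illustrative parameters)
ctx = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

# Pack a whole feature vector into the SIMD slots of a single ciphertext
features = [0.5, 1.2, -0.3, 2.1]
enc_features = ts.ckks_vector(ctx, features)

# One encrypted dot product: a dense-layer neuron evaluated on ciphertext
weights = [0.1, -0.2, 0.4, 0.05]
enc_score = enc_features.dot(weights)

# Only the secret-key holder can decrypt; the result is approximate by design
print(enc_score.decrypt())  # roughly [-0.205]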
BFV

Exact Integer Inference

Exact arithmetic on encrypted integers

BFV encrypts integers and computes on them without approximation. Decision trees, scoring models, and rule-based classifiers evaluate on ciphertext and return bit-perfect results, identical to the plaintext computation.

Exact integer ops, no precision loss
TFHE

Boolean Classification

Gate-level operations at 768 TPS

TFHE evaluates boolean circuits on encrypted bits. Binary classification, pass/fail determinations, flag-or-clear decisions, and bitwise comparisons run at gate level with programmable bootstrapping. Each gate refreshes noise, enabling arbitrary circuit depth.

768 TPS (16-bit equality)
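
For a feel of gate-level FHE in practice, here is a minimal sketch using Zama's open-source Concrete compiler rather than H33's engine; it compiles an encrypted 16-bit equality check like the benchmark above (the inputset and example values are illustrative).

from concrete import fhe

# Compile an encrypted equality check into a TFHE circuit
@fhe.compiler({"x": "encrypted", "y": "encrypted"})
def equal(x, y):
    return x == y

# Example inputs spanning the 16-bit range so the compiler sizes the circuit
inputset = [(0, 0), (65535, 65535), (1234, 4321)]
circuit = equal.compile(inputset)

# Both operands stay encrypted end to end; only the 0/1 verdict is decrypted
assert circuit.encrypt_run_decrypt(4242, 4242) == 1
assert circuit.encrypt_run_decrypt(4242, 4243) == 0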
Flagship Application

Agent-Zero: Encrypted document classification

Agent-Zero classifies documents — contracts, medical records, financial statements, legal filings — without ever seeing the plaintext. The document is FHE-encrypted before it reaches the classification model. The model processes ciphertext, returns an encrypted classification, and the client decrypts locally.

📄

Document In

Client encrypts the document using H33's FHE SDK. The plaintext never leaves the client's boundary. The encrypted representation is a lattice ciphertext indistinguishable from random noise.

🧠

Classification on Ciphertext

Agent-Zero's classification model processes the encrypted document. Feature extraction, embedding computation, and classification scoring all execute on ciphertext. The model is mathematically incapable of seeing the document content.

Proven Result

The encrypted classification result returns to the client for local decryption. An H33-74 attestation proves the computation was correct: input hash committed, model version committed, output hash committed, authority signed with Dilithium.
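
As a rough illustration of what checking such an attestation could involve on the client, the sketch below assumes a hypothetical field layout and a caller-supplied verify_signature helper; it is not H33's published attestation format.

import hashlib

def verify_attestation(att, encrypted_input, encrypted_output,
                       expected_model_hash, authority_key, verify_signature):
    # Recompute the commitments the attestation claims to bind
    # (field names on `att` are illustrative assumptions)
    input_commit = hashlib.sha256(encrypted_input).digest()
    output_commit = hashlib.sha256(encrypted_output).digest()
    signed_message = input_commit + expected_model_hash + output_commit
    # Check the committed hashes and the authority's Dilithium signature over them
    return (att.input_commitment == input_commit
            and att.model_version == expected_model_hash
            and att.output_commitment == output_commit
            and verify_signature(authority_key, signed_message, att.signature))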

How It Works

Five steps. Zero plaintext exposure.

From client encryption to verified result, the plaintext never exists outside the client's boundary.

Step 01
Client Encrypts
Data is FHE-encrypted on the client using H33's SDK. The ciphertext is a lattice polynomial indistinguishable from random noise.
Client-side
Step 02
Server Computes
The AI model processes encrypted ciphertext. Homomorphic operations (add, multiply, rotate) execute the model's computation graph on encrypted data.
943µs / 32-user batch
Step 03
Client Decrypts
The encrypted result returns to the client. Only the client's private key can decrypt. H33, the model, and the infrastructure never see the plaintext result.
Client-side
Step 04
H33-74 Attests
A 74-byte attestation is generated: input commitment, model version, output commitment, and Dilithium signature. Proves correct execution without revealing data.
391µs attestation
Step 05
Cachee Caches
The attestation and encrypted result are cached in Cachee for sub-microsecond replay. Repeated queries skip FHE computation entirely.
0.358µs cached lookup
Integration

Encrypt your inference in three lines

Your existing AI call, wrapped. FHE encrypts inputs before they touch the model.

inference.py — encrypted AI inference
from h33 import EncryptedInference

# Initialize with your preferred FHE engine
engine = EncryptedInference(engine="bfv")  # or "ckks", "tfhe"

# Your data never leaves your boundary in plaintext
encrypted_input = engine.encrypt(patient_record)

# Model computes on ciphertext — never sees plaintext
encrypted_result = engine.infer(
    model="classification-v3",
    input=encrypted_input
)

# Decrypt locally — only your key can read the result
result = engine.decrypt(encrypted_result)

# result.classification  — the AI's output
# result.attestation     — 74-byte H33-74 proof of correct execution
# result.model_version   — committed model hash
# result.verify_url      — h33.ai/verify/<proof_id>

Production performance. Not a research prototype.

2,293,766
Auth/sec (Graviton4)
38µs
Per operation
0.358µs
Cached (Cachee)
Why Not Alternatives

The model still sees your data. Unless it's FHE.

TEEs, differential privacy, and federated learning each address a piece of the problem. None of them prevent the model from processing plaintext.

Traditional AI Inference
Client data (plaintext) → API → Model (sees plaintext) → Result (logged plaintext)
Provider retains data access
  • Model processes plaintext inputs
  • Provider infrastructure has full data access
  • Breach exposes all historical inputs
  • No cryptographic proof of privacy
H33 Encrypted Inference
Client data → FHE encrypt (client-side) → Ciphertext → Model (computes on noise) → Encrypted result → Client decrypt
H33-74 attestation proves it
  • Model processes ciphertext only
  • Breach exposes random noise, not data
  • Auditor verifies without system access
  • 74-byte proof per inference
Alternatives Compared

Why each alternative falls short

Each approach has legitimate uses. None of them solve the core problem: the model processing plaintext data.

Trusted Execution Environments

VERDICT: Side-channel vulnerable

TEEs (Intel SGX, AMD SEV, ARM TrustZone) create hardware enclaves where code runs in isolation. But the data is still plaintext inside the enclave. Spectre, Meltdown, PLATYPUS, and LVI have repeatedly demonstrated that side-channel attacks can extract secrets from enclaves. TEEs protect against software attacks. They do not protect against hardware-level side channels.

Differential Privacy

VERDICT: Accuracy loss, no data separation

Differential privacy adds calibrated noise to outputs to prevent reconstruction of individual inputs. This is a statistical guarantee, not a cryptographic one. The model still processes plaintext data — it just perturbs the output. Accuracy degrades with stronger privacy guarantees. And there is no proof that specific data was never accessed.

Federated Learning

VERDICT: Model still sees local data

Federated learning distributes training across devices without centralizing raw data. But each local model still processes local plaintext data during training. Gradient attacks can reconstruct training inputs. And at inference time, the model processes plaintext regardless — federated learning is a training technique, not an inference protection.

Use Cases

Encrypted inference across industries

Every industry that uses AI on sensitive data needs encrypted inference. Here's where it matters most.

Healthcare

Encrypted Diagnostic AI

Medical imaging analysis, diagnostic classification, and treatment recommendation models run on FHE-encrypted patient records. The AI produces results without accessing PHI. HIPAA compliance is cryptographic, not contractual.

Finance

Encrypted Credit Scoring

Credit models, risk assessments, and fraud detection run on encrypted financial data using BFV exact arithmetic. The model scores applicants without seeing income, debt ratios, or account balances. Results are bit-perfect.

Legal

Encrypted Document Review

Contract analysis, due diligence, and litigation support AI processes encrypted privileged documents. Attorney-client privilege is maintained because the model is cryptographically incapable of reading the documents it classifies.

Defense

Encrypted Intelligence Analysis

Classification models process encrypted intelligence reports. Analysts receive classifications without exposing source material to the AI system. Compartmentalization is enforced by mathematics, not by policy.

Insurance

Encrypted Claims Triage

Claims adjudication AI processes encrypted policyholder data. The model triages claims, flags anomalies, and recommends actions without accessing personal health information or financial details in plaintext.

Sanctions

Encrypted Sanctions Screening

Transaction screening against sanctions lists runs on encrypted transaction data. The model returns match/no-match on ciphertext. Wire transfer details, beneficiary names, and account numbers are never exposed to the screening system.

Explore the H33 AI platform

See encrypted inference in 10 minutes

Connect your model endpoint. H33 wraps it in FHE. The model processes encrypted data and returns proven results. No refactoring required.

Try Encrypted Inference · Schedule Demo