Pillar Guide

Privacy-Preserving AI: Compute on Data Without Exposure

Every AI model processes plaintext. Every inference exposes data. Every training run leaks information. There is a better way. Process sensitive data with AI while it remains fully encrypted. No decryption. No exposure. No compromise.

38.5µs
Per operation
2.17M/sec
Sustained throughput
0 bytes
Plaintext exposed
Post-Quantum
NIST FIPS 203/204

The Core Problem

Artificial intelligence requires access to data. That statement is so obvious it barely registers. But it contains an assumption that creates the single largest vulnerability in modern computing: to process data, you must first expose it.

Every machine learning model, every LLM, every fraud detection engine, every biometric matcher operates on plaintext. The data must be decrypted before the model can touch it. It sits in memory, in GPU VRAM, in inference caches, in log files, in training pipelines. Unencrypted. Readable. Exfiltrable.

This creates a fundamental tension. Organizations need AI to derive insights from their most sensitive data: medical records, financial transactions, biometric templates, classified intelligence. But the act of processing that data with AI is the act of exposing it.

Traditional security addresses this with perimeter defenses: firewalls, access control lists, DLP tools, network segmentation. These approaches protect the boundary. They do nothing for the data itself once it crosses into the processing environment. A compromised insider, a memory-scraping exploit, a misconfigured cloud bucket, a rogue model training pipeline — any of these bypass every perimeter control.

The question is not whether to use AI on sensitive data. Organizations that do not will fall behind. The question is how to use AI without the exposure that has historically been inseparable from computation.

The exposure window is the attack surface. Every millisecond data exists in plaintext is a millisecond it can be intercepted, copied, or exfiltrated. Encryption at rest and in transit protect storage and movement. They do nothing for the moment of computation.

Why Encryption Alone Is Not Enough

Modern encryption is excellent at two things: protecting data at rest (AES-256 on disk) and protecting data in transit (TLS 1.3 over the wire). These are solved problems. A properly encrypted database and a properly configured TLS connection are, for practical purposes, unbreakable.

But neither addresses the third state of data: data in use.

The moment an AI model needs to run inference, the data is decrypted into memory. A fraud detection model analyzing a transaction must see the account number, the amount, the merchant, the behavioral pattern — all in plaintext. A medical AI processing a radiology image must see the pixels. An LLM answering a question about confidential documents must see the text.

This is the decryption gap. Every traditional AI system has it. The gap exists in cloud inference APIs, in self-hosted models, in edge deployments. It exists in OpenAI's servers when you send a prompt. It exists in your own data center when you run a fine-tuned model. It is not a failure of any particular vendor. It is a structural limitation of how computation has worked since the invention of the computer.

The consequences are well-documented:

  • Memory-scraping attacks extract plaintext from running processes. Cold boot attacks, Spectre/Meltdown side-channels, and DRAM forensics all target data in use.
  • Insider access to model inference infrastructure provides access to every piece of data the model processes. A single compromised admin credential exposes everything.
  • Training data leakage allows models to memorize and regurgitate sensitive data. Model inversion attacks can reconstruct training inputs from model outputs.
  • Compliance violations occur the moment regulated data (PHI, PCI, classified) exists in plaintext in an unauthorized environment — even temporarily, even in memory.

Secure enclaves (SGX, SEV, TrustZone) attempt to solve this by creating trusted execution environments. But they rely on hardware trust assumptions that have been broken repeatedly. Side-channel attacks against SGX alone have produced over 20 academic papers demonstrating data extraction. The security is not cryptographic. It is physical. And physical assumptions fail.

The only way to eliminate the decryption gap is to eliminate decryption.

What Is Privacy-Preserving Computation

Privacy-preserving computation is a family of cryptographic techniques that allow computation on data without revealing the data itself. Three technologies form the foundation:

Fully Homomorphic Encryption (FHE) allows mathematical operations to be performed directly on ciphertext. You encrypt data, send the ciphertext to a server, the server computes on the ciphertext, and returns an encrypted result. Decrypting the result gives the same answer as if the computation had been performed on the original plaintext. The server never sees the data. The computation is provably correct. The security is based on the hardness of lattice problems — the same mathematical foundation that NIST has selected as quantum-resistant.
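The idea of computing on ciphertext can be seen in a few lines of code. The sketch below is a toy textbook Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. It is illustrative only (tiny demo primes, additive-only, and not the lattice-based BFV scheme described in this guide), but the core property is the same: the server-side addition happens without ever decrypting.

```python
# Toy textbook Paillier: additively homomorphic encryption.
# Illustrative only -- demo-sized primes, additive-only homomorphism;
# BFV (used in production FHE systems) is a different, lattice-based scheme.
import math
import secrets

p, q = 1_000_003, 1_000_033          # demo primes, far too small for real use
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)         # Carmichael lambda for n = p*q
mu = pow(lam, -1, n)                 # valid because we fix g = n + 1

def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m = 1 + m*n (mod n^2), so this stays cheap.
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

# Homomorphic addition: multiplying ciphertexts adds the plaintexts.
ct = (encrypt(7) * encrypt(35)) % n2
assert decrypt(ct) == 42             # server never saw 7 or 35
```

Raising a ciphertext to a power likewise multiplies the plaintext by a constant, so linear functions (sums, weighted scores) can be evaluated entirely on encrypted data; fully homomorphic schemes like BFV extend this to arbitrary arithmetic circuits.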

Zero-Knowledge Proofs (ZKPs) allow one party to prove a fact about data without revealing the data itself. A ZK proof can demonstrate that a biometric matched, a transaction was valid, or a compliance check passed — without exposing the biometric template, the transaction details, or the compliance data. The verifier learns nothing except that the statement is true. ZK-STARKs, specifically, are post-quantum secure because they rely on hash functions rather than elliptic curves.
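The prove-without-revealing idea predates STARKs. The sketch below is a classic Schnorr sigma protocol: the prover demonstrates knowledge of a secret exponent x with y = g^x mod p without ever transmitting x. It is a toy (interactive, discrete-log based, demo-sized group), whereas ZK-STARKs are non-interactive and hash-based, but the verifier's position is identical: convinced the statement is true, learning nothing else.

```python
# Toy Schnorr sigma protocol: prove knowledge of x with y = g^x (mod p)
# without revealing x. Demo-sized group -- a ZK-STARK is hash-based and
# non-interactive, but gives the verifier the same guarantee.
import secrets

# Tiny Schnorr group: p = 2q + 1, and g = 2 has prime order q mod p.
p, q, g = 23, 11, 2

# Prover's secret x and public value y = g^x mod p.
x = secrets.randbelow(q - 1) + 1
y = pow(g, x, p)

# 1. Prover commits to a random nonce r.
r = secrets.randbelow(q)
t = pow(g, r, p)

# 2. Verifier sends a random challenge.
c = secrets.randbelow(q)

# 3. Prover responds; without r, the response s leaks nothing about x.
s = (r + c * x) % q

# 4. Verifier checks g^s == t * y^c (mod p). It learns only that the
#    prover knows x -- never x itself.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof verified")
```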

Secure Multiparty Computation (MPC) allows multiple parties to jointly compute a function over their inputs while keeping those inputs private. Two banks can determine whether a transaction appears in both their fraud databases without either bank revealing its database to the other. The result is computed collaboratively. No party ever has access to another party's raw data.
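The simplest MPC building block is additive secret sharing. In the sketch below, three parties split their private inputs into random shares, each party sums the shares it holds, and only the combined total is ever reconstructed. This is a minimal illustration of the principle, not the protocol any particular production platform runs.

```python
# Toy additive secret sharing: three parties jointly compute the sum of
# their private inputs; no single party ever sees another party's input.
# Minimal sketch of the MPC principle, not a production protocol.
import secrets

P = 2**61 - 1  # shares live in Z_P (a Mersenne prime, for convenience)

def share(value: int, n_parties: int) -> list[int]:
    """Split value into n_parties random shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

inputs = [1200, 3400, 560]                 # each party's private value
all_shares = [share(v, 3) for v in inputs]

# Party i locally sums the i-th share of every input...
local_sums = [sum(col) % P for col in zip(*all_shares)]

# ...and only the combined result is ever reconstructed.
total = sum(local_sums) % P
assert total == sum(inputs)
print(total)  # 5160
```

Each individual share is uniformly random, so any party looking at the shares it holds learns nothing about the other inputs; only the final aggregate is revealed.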

These are not theoretical. H33 runs all three in production. FHE processes data at 38.5 microseconds per operation. ZK-STARK proofs verify in under 0.06 microseconds (cached). Dilithium signatures attest results in 291 microseconds per batch. The full pipeline handles 2.17 million operations per second on commodity ARM hardware.

How It Works

Data enters encrypted. It is processed encrypted. It leaves encrypted. The infrastructure never sees plaintext. Not in memory. Not in cache. Not in logs. Not in GPU VRAM.

Step 1
🔒

Client-Side Encrypt

Data encrypted with BFV FHE before leaving the client

Step 2
📡

Encrypted Transit

Ciphertext transmitted over TLS. Double-encrypted in motion

Step 3
⚙️

FHE Processing

BFV homomorphic operations on ciphertext. Inner products, matching, scoring

Step 4
📐

ZK-STARK Proof

Operation correctness proven without revealing data

Step 5

Dilithium Attestation

Post-quantum signature on result. Tamper-proof audit trail

Step 6
🔒

Encrypted Return

Result returned encrypted. Only the client can decrypt

The critical difference from traditional architectures: there is no decryption step on the server. The entire computation happens on ciphertext. The FHE scheme guarantees that the encrypted result, when decrypted by the client, equals the result of computing on plaintext. This is not an approximation. It is a mathematical proof.

The ZK-STARK proof provides a second guarantee: the computation was performed correctly. An auditor, a regulator, or a counterparty can verify the proof without accessing the data. The Dilithium signature provides a third guarantee: the result has not been tampered with, and the attestation is secure against quantum computers.

Three ML agents run alongside the pipeline in real time: a harvest detection agent monitoring for quantum harvest attacks (0.69 microseconds), a side-channel detection agent watching for timing and power analysis anomalies (1.14 microseconds), and a crypto health agent validating parameter integrity (0.52 microseconds). Total ML overhead: 2.35 microseconds.

All of this happens in a single API call. Total latency: 38.5 microseconds per operation.

Performance Without Compromise

The historical objection to FHE is performance. Early FHE implementations were 100,000x to 1,000,000x slower than plaintext computation. This is no longer the case.

H33's BFV engine uses Montgomery-form NTT with Harvey lazy reduction, pre-computed twiddle factors in Montgomery domain, and NTT-form persistence that eliminates redundant transforms. The result: 38.5 microseconds per operation. That is faster than most unoptimized plaintext authentication systems.
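The NTT is the workhorse behind these numbers: it turns expensive polynomial multiplication into cheap pointwise multiplication. The sketch below is a deliberately naive O(n²) NTT over a tiny modulus, showing the algebra only; the optimizations named above (Montgomery form, Harvey lazy reduction, precomputed twiddles, NTT-form persistence) are layered on top of exactly this transform.

```python
# Minimal number-theoretic transform over Z_17 with n = 8, omega = 2.
# Unoptimized O(n^2) sketch of the algebra only -- production engines add
# Montgomery-form arithmetic, lazy reduction, and precomputed twiddles.
P, N, W = 17, 8, 2   # prime modulus, transform size, primitive N-th root of 1

def ntt(a, w):
    return [sum(a[j] * pow(w, j * k, P) for j in range(N)) % P
            for k in range(N)]

def intt(A):
    n_inv = pow(N, -1, P)
    return [(x * n_inv) % P for x in ntt(A, pow(W, -1, P))]

# Multiply two polynomials mod (x^8 - 1): transform, multiply pointwise,
# inverse transform. Pointwise work in the NTT domain replaces an O(n^2)
# convolution -- and operands kept in NTT form skip repeat transforms.
a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
A, B = ntt(a, W), ntt(b, W)
prod = intt([(x * y) % P for x, y in zip(A, B)])

# Check against the schoolbook cyclic convolution.
ref = [sum(a[j] * b[(k - j) % N] for j in range(N)) % P for k in range(N)]
assert prod == ref
```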

H33

Per operation 38.5 µs
Sustained throughput 2.17M/sec
Hardware ARM CPU
GPU required No
Per-operation cost <$0.000001

Alternatives

Zama (H100 GPU) ~800 µs
Generic FHE (cloud) 4–7 ms
Academic FHE 50–500 ms
Traditional DLP overhead 50–200 ms
Build in-house (estimate) 875x cost

The performance gap exists because H33 is a purpose-built cryptographic engine, not a wrapper around a general-purpose FHE library. Every component — the NTT, the polynomial arithmetic, the key-switching, the noise management — is optimized for the specific computation patterns required by real-world privacy-preserving AI workloads.

No GPU cluster. No specialized hardware. A single ARM-based cloud instance (Graviton4, c8g.metal-48xl) sustains 2.17 million operations per second with less than 1% variance over 120 seconds. The per-operation cost is under $0.000001.

Performance is no longer a reason to accept data exposure.

Use Cases by Industry

Privacy-preserving AI applies wherever sensitive data meets computation. Four industries face the most acute version of this problem.

Go Deeper

Resources

Technical deep dives, industry analysis, and implementation guides.

Blog

Protect Sensitive Data from AI

How to use AI on sensitive data without creating new attack surfaces. Practical approaches from encryption to architecture.

Read article →
Blog

LLM Data Privacy Risks

The specific privacy risks of large language models: memorization, extraction, prompt injection, and training data leakage.

Read article →
Blog

Is ChatGPT HIPAA Compliant?

Analysis of LLM HIPAA compliance: BAA requirements, PHI handling, and what encrypted inference changes.

Read article →
Blog

How Banks Secure AI Models

Financial services AI security: fraud model protection, regulatory requirements, and encrypted computation architectures.

Read article →
Blog

Build vs. Buy Post-Quantum Encryption

The real cost of implementing PQC in-house vs. using a hardened API. Engineering time, cryptographic expertise, and maintenance burden.

Read article →
Blog

FHE Companies

A comprehensive landscape of companies building with fully homomorphic encryption, from research labs to production platforms.

Read article →
Blog

ZK Companies

The zero-knowledge proof ecosystem: who is building what, and where the technology is being deployed at scale.

Read article →
Solution

Compute on Encrypted Data

Technical deep dive into FHE-powered computation. BFV parameters, integration paths, and production benchmarks.

View solution →
Blog

What Is Fully Homomorphic Encryption?

A technical introduction to FHE: lattice math, noise growth, bootstrapping, and the path from theory to production.

Read article →

Your Data Is Never Exposed in the First Place

Not protected after the fact. Not monitored for breaches. Not subject to access controls that can be bypassed. Encrypted from the moment it leaves your system to the moment the result returns. One API call.

Get Free API Key · How It Works

1,000 free units/month · No credit card required · Zero plaintext exposure

Verify It Yourself