Security Risk · AI Privacy · 5 min read

Is AI Safe for Sensitive Data?

No — not by default. Every AI model processes input data in plaintext during inference. That plaintext exists in GPU memory, caches, logs, and context windows. A breach exposes everything. Privacy-preserving AI using FHE eliminates this by processing data while it remains encrypted.

The default answer to whether AI is safe for sensitive data is no. Not because AI models are malicious, but because the architecture of AI inference requires plaintext access to data. Every input is decoded, tokenized, and processed as unencrypted numerical representations. At each stage, the data is vulnerable to extraction through multiple attack surfaces.

The Five Exposure Vectors

Understanding why standard AI is unsafe requires examining the specific points where sensitive data becomes vulnerable during inference.

1. GPU Memory Exposure

When an AI model processes data, the input is loaded into GPU VRAM as unencrypted tensors. These tensors persist in memory after inference completes, remaining readable until overwritten by a subsequent workload. Research from Trail of Bits (2024) demonstrated that residual GPU memory can be read by subsequent processes on shared infrastructure, recovering full prompts and responses from prior sessions. On multi-tenant cloud GPU instances, this means one customer's sensitive data can be extracted by another customer's workload.

2. KV Cache Leakage

Transformer models maintain key-value (KV) caches that store the attention context of every token processed. These caches contain complete representations of input data and grow linearly with context length. A compromised cache can be used to reconstruct the original prompt. In multi-turn conversations, the KV cache accumulates every message — including sensitive data from earlier in the conversation that the user may believe has been "forgotten."
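A minimal sketch makes the retention visible. This is not a real transformer (real caches hold attention key/value tensors, and the `KVCache` class here is purely illustrative); it only shows how session-scoped state accumulates every token across turns:

```python
# Conceptual sketch, not a real transformer: the per-session cache
# accumulates an entry for every token ever processed.
class KVCache:
    def __init__(self):
        self.entries = []  # one entry per token, for the whole session

    def append_turn(self, tokens):
        # A real model would store attention key/value tensors here;
        # raw tokens are used to make the retention obvious.
        for tok in tokens:
            self.entries.append(tok)

cache = KVCache()
cache.append_turn(["My", "SSN", "is", "123-45-6789"])  # turn 1
cache.append_turn(["What's", "the", "weather", "?"])   # turn 2

# The "forgotten" sensitive data from turn 1 is still resident,
# and the cache grows linearly with context length:
print("123-45-6789" in cache.entries)  # True
print(len(cache.entries))              # 8
```

Clearing the conversation in the UI does not necessarily clear this state on the server; the data lives until the cache is evicted.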

3. Log and Observability Capture

Production AI systems generate telemetry. Prompts, responses, latency measurements, token counts, and error traces flow into logging pipelines. These logs routinely contain the full text of user inputs. An organization that routes medical records or financial data through an AI system may find that data replicated across Datadog, Splunk, CloudWatch, or any other observability platform — often with weaker access controls than the primary system.
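The leak path is often nothing more exotic than a routine debug log. A short sketch (the `run_inference` wrapper and logger name are hypothetical, and the model call is elided) shows how a standard logging call replicates the full prompt into the observability stream:

```python
# Sketch: a typical inference wrapper that logs prompts for debugging.
# Anything sensitive in the prompt is replicated into the log stream.
import io
import logging

log_stream = io.StringIO()  # stands in for Datadog/Splunk/CloudWatch
logging.basicConfig(stream=log_stream, level=logging.INFO, force=True)
logger = logging.getLogger("inference")

def run_inference(prompt: str) -> str:
    logger.info("prompt=%s", prompt)  # routine telemetry
    return "ok"                        # model call elided

run_inference("Patient John Doe, DOB 1980-01-01, diagnosis follows")

# The medical record now lives in the logging pipeline:
print("John Doe" in log_stream.getvalue())  # True
```

The same replication happens for error traces and request/response dumps, which is why log stores frequently hold more sensitive data than the primary system they observe.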

4. Training Data Memorization

Large language models memorize portions of their training data verbatim. Carlini et al. (2023) demonstrated that GPT-3.5 could reproduce exact passages from training data when prompted with partial matches. If a model was trained on or fine-tuned with sensitive data — customer records, internal documents, PII — that data can be extracted through targeted prompting. This is not a bug; it is a fundamental property of how neural networks store information.

5. Cross-Tenant Memory Sharing

Multi-tenant GPU inference — the default deployment model for cloud AI services — shares physical hardware between customers. Without hardware-level memory isolation (which most GPU architectures do not provide), residual data from one inference request can leak into another. NVIDIA's Confidential Computing initiative acknowledges this risk, but hardware TEE support for GPUs remains limited and imposes significant performance penalties.

Why Access Controls Are Insufficient

The standard response to these risks is access controls: encryption at rest, encryption in transit, role-based access, audit logs. These measures protect data before and after processing. They do nothing during processing. The moment data enters the inference pipeline, it exists as plaintext in memory. Every access control is bypassed by the fundamental requirement that the model must see the data to process it.

This is not a configuration problem. It is an architectural constraint of plaintext computation. You cannot fix it with better policies, stricter permissions, or more rigorous auditing. The data must be unencrypted for the model to use it — unless the model can compute on encrypted data directly.

The FHE Solution

Fully homomorphic encryption eliminates every exposure vector listed above by ensuring data never exists as plaintext on the inference server. The data arrives encrypted. The model computes on ciphertext. The result is returned encrypted. The server never possesses the decryption key.

GPU memory contains only ciphertext — computationally indistinguishable from random noise. KV caches store encrypted representations. Logs capture encrypted inputs and outputs. There is no plaintext to memorize during training. Cross-tenant sharing is irrelevant because the shared data is encrypted with keys the infrastructure provider does not hold.
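The shape of the protocol can be shown with a toy. To be clear about the assumptions: this is NOT real FHE — production schemes such as CKKS or TFHE rest on lattice cryptography and support multiplication as well as addition — it is a secret-key additive mask that merely illustrates the key property, a server summing ciphertexts it cannot read:

```python
# Toy illustration only, NOT real FHE. Real schemes (CKKS, TFHE, BGV)
# use lattice cryptography; this additive mask just shows the protocol
# shape: the server computes on ciphertext and never holds the key.
import secrets

N = 2**61 - 1  # modulus for the toy scheme

def keygen():
    return secrets.randbelow(N)       # secret mask, stays with the client

def encrypt(key, m):
    return (m + key) % N              # ciphertext alone reveals nothing

def decrypt(key, c, num_terms=1):
    return (c - num_terms * key) % N  # remove one mask per summed term

# Client side: encrypt two salary figures
key = keygen()
c1, c2 = encrypt(key, 90_000), encrypt(key, 110_000)

# Server side: adds ciphertexts without the key or any plaintext
c_sum = (c1 + c2) % N

# Client side: only the key holder recovers the result
print(decrypt(key, c_sum, num_terms=2))  # 200000
```

Real FHE extends this idea to arbitrary arithmetic circuits under rigorous security assumptions, but the trust model is the same: the party doing the compute never needs the decryption key.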

H33 implements this architecture in production. Biometric templates, identity documents, and authentication data are processed entirely under FHE encryption. The end-to-end pipeline (FHE inner product, ZK-STARK proof generation, Dilithium attestation) completes in 38.5 microseconds per authentication, sustaining over 2.1 million authentications per second on a single instance.

For organizations subject to HIPAA, GDPR, PCI DSS, SOC 2, or any framework that requires data protection during processing, FHE is the only approach that provides mathematical guarantees rather than policy-based assurances. The data is protected because the mathematics make exposure impossible — not because a policy says it should be.

Make AI Safe for Your Data

Process sensitive data through AI without exposing it. FHE ensures your data never exists as plaintext on any server.

Explore AI Compliance → · Read the Docs · Live Demo
Free tier · 1,000 operations/month · No credit card required