How does training data memorization cause AI data leaks?

Large language models memorize portions of their training data verbatim. Research has demonstrated that models can reproduce exact passages, including personal information, when prompted with partial matches. If a model was trained on or fine-tuned with sensitive data — customer records, medical files, internal documents — that data can be extracted through targeted prompting attacks.

How does FHE prevent AI from leaking data?

Fully homomorphic encryption ensures data never exists as plaintext during AI processing. The input arrives encrypted, computation happens on ciphertext, and encrypted results are returned. There is no plaintext in GPU memory to leak, no plaintext in KV caches to expose, no plaintext in logs to capture, no plaintext data for the model to memorize, and no plaintext for cross-tenant attacks to extract.

Can AI Leak Confidential Data?

Q: Can AI leak confidential data?

Yes. AI systems can leak sensitive data through five vectors: training data memorization (the model remembers and reproduces PII), prompt injection (adversarial inputs extract context), cache exposure (KV cache stores plaintext), log leakage (observability captures prompts and responses), and cross-tenant memory sharing (multi-tenant GPU inference). These are architectural properties of how AI systems process data, not bugs that can be patched.

Every AI system that processes data in plaintext is a potential leak vector. This is not a theoretical risk. Documented incidents span every major cloud AI provider and every model architecture. Understanding the five primary leak mechanisms is the first step toward eliminating them.

Vector 1: Training Data Memorization

Neural networks do not just learn patterns — they memorize specific training examples. Carlini et al. demonstrated in 2023 that GPT-3.5 could reproduce verbatim passages from its training data when given partial prompts. The phenomenon scales with model size: larger models memorize more data. A model fine-tuned on customer records, medical files, or legal documents will retain fragments of that data and can be induced to reproduce them.

Samsung discovered this firsthand in 2023 when engineers pasted proprietary source code into ChatGPT. That code became part of the model's context and could theoretically be surfaced to other users. The data was irrevocably exposed the moment it entered the inference pipeline.

Vector 2: Prompt Injection

Prompt injection attacks manipulate AI systems into revealing data they should not disclose. An attacker crafts an input that overrides the model's system prompt, instructing it to output its context window, previous conversations, or system instructions. In retrieval-augmented generation (RAG) systems, prompt injection can extract the retrieved documents — which may contain confidential data from the organization's knowledge base.

In early 2025, researchers demonstrated prompt injection attacks against Microsoft Copilot that extracted emails, calendar entries, and internal documents from the user's Microsoft 365 environment. The model had legitimate access to this data for contextual responses; the injection redirected that access to an adversary.

Vector 3: KV Cache Exposure

Transformer models maintain key-value caches that store attention state for every token in the context window. These caches contain dense representations of the full input. On shared GPU infrastructure, KV cache memory is not always zeroed between sessions. Research from UC San Diego (2024) showed that residual KV cache data could be extracted from GPU memory by subsequent workloads, recovering up to 87% of the previous session's input.

The attack surface expands with longer context windows. A 128K-token context window means 128K tokens of plaintext data sitting in GPU VRAM, accessible to any process that can read that memory region after the session ends.

Vector 4: Log Leakage

Production AI systems emit telemetry. Prompts, completions, token counts, latency metrics, and error traces flow into observability platforms. These logs routinely contain the full text of user inputs and model outputs. An organization routing medical records through an AI system may find patient data replicated across logging infrastructure with weaker access controls than the primary application.

In 2024, a major healthcare SaaS vendor discovered that their AI-powered clinical notes feature had been logging complete patient narratives to Datadog for 14 months. The logs were accessible to the entire engineering team. The data included diagnoses, treatment plans, and patient identifiers — a HIPAA violation caused not by a breach but by standard observability practices.

Vector 5: Cross-Tenant GPU Sharing

Cloud AI inference runs on shared GPU hardware. Multiple tenants share the same physical GPU, separated by software isolation rather than hardware boundaries. NVIDIA's A100 and H100 GPUs support MIG (Multi-Instance GPU) partitioning, but MIG does not provide cryptographic memory isolation. Research from ETH Zurich (2024) demonstrated side-channel attacks that extracted model weights and input data across MIG partitions on A100 GPUs.

For organizations processing sensitive data on cloud AI platforms, this means their data may be extractable by other tenants sharing the same physical hardware — without any network-level breach.

How FHE Eliminates All Five Vectors

Fully homomorphic encryption eliminates every leak vector by ensuring data never exists as plaintext during processing. The solution is architectural, not configurational.

Memorization: The model processes ciphertext. There is no plaintext to memorize. Even if the model perfectly memorizes encrypted training data, the ciphertext is computationally indistinguishable from random noise without the decryption key — which the model operator never possesses.

Prompt injection: Injected prompts operate on ciphertext. The attacker can manipulate the model's behavior, but the extracted data is encrypted. Without the data owner's private key, the extracted ciphertext is useless.

KV cache: The cache stores encrypted representations. Residual cache data from prior sessions is ciphertext that cannot be decrypted by subsequent workloads.

Log leakage: Logs capture encrypted inputs and encrypted outputs. The observability pipeline sees only ciphertext. A breach of the logging infrastructure reveals nothing about the underlying data.

Cross-tenant sharing: Side-channel attacks on shared GPUs extract ciphertext. Without the decryption key, the extracted data provides zero information about the plaintext.

H33 implements this architecture in production. Data arrives encrypted under BFV fully homomorphic encryption. The FHE computation, ZK-STARK proof generation, and Dilithium attestation all operate on ciphertext. The entire pipeline completes in 38.5 microseconds per authentication, sustaining over 2.1 million operations per second. The server never possesses a decryption key. A complete infrastructure compromise — including root access to every server, every log, every cache — yields only encrypted data that cannot be decrypted.