How to Audit AI Without Seeing the Data
Using fully homomorphic encryption and STARK proofs to verify AI decisions on encrypted data, with post-quantum attestation chains for regulatory compliance
A hospital deploys an AI model to flag potential drug interactions. A bank uses a machine learning system to approve or deny loans. An insurance company runs an algorithm that sets premiums based on behavioral data. In every case, regulators and auditors need to verify that the AI is making correct, unbiased, compliant decisions. And in every case, the data the AI operates on -- patient records, financial histories, behavioral profiles -- is too sensitive to show to an auditor. This is the fundamental tension in AI governance: you cannot audit what you cannot see, but you cannot show what you are legally required to protect.
Fully homomorphic encryption resolves this tension by allowing computation on encrypted data. The auditor verifies the AI's logic and results without ever decrypting the underlying data. STARK proofs go further by providing a cryptographic guarantee that the computation was performed correctly -- not merely a promise, but a mathematical proof that the AI executed the exact algorithm it claimed to execute on the exact data it claimed to process. And three-family post-quantum attestation chains ensure that the audit trail itself cannot be forged, even by an adversary with a quantum computer.
This is not a future capability. H33 runs this pipeline in production today at 2,293,766 authentications per second, with each operation completing in 38 microseconds. The same pipeline that secures biometric authentication can secure AI audit trails, because the underlying primitives -- FHE for encrypted computation, STARK for verifiable execution, and post-quantum signatures for tamper-proof attestation -- are the same regardless of whether you are matching fingerprints or auditing neural networks.
The Compliance Problem: You Must Audit What You Cannot See
HIPAA requires covered entities to implement audit controls that record and examine activity in systems containing electronic protected health information (ePHI). SOX Section 404 requires management to assess the effectiveness of internal controls over financial reporting, which now includes AI systems that make financial decisions. The EU AI Act classifies many AI applications as high-risk and mandates conformity assessments that include reviewing training data, model architecture, and decision outputs.
Every one of these frameworks assumes the auditor will have access to the data. HIPAA's audit trail provisions assume someone can read the logs. SOX's internal control assessments assume someone can examine the financial records. The EU AI Act's conformity assessments assume someone can review the training data. But the data itself is often more sensitive than the audit finding. Showing an auditor a patient's medical records to verify an AI drug interaction check creates a new HIPAA exposure. Giving an auditor access to raw financial data to verify a loan decision creates a new insider threat vector.
The traditional solution is contractual: the auditor signs an NDA, gets a background check, and receives access. This is a legal fiction that satisfies the letter of compliance requirements while ignoring the reality that every access point is an attack surface. The auditor's laptop gets compromised. The audit firm's cloud storage gets breached. The auditor themselves turns malicious. These are not theoretical risks -- they are documented incidents that have resulted in massive data exposures.
FHE: Computing on Encrypted Data
Fully homomorphic encryption allows arbitrary computation on ciphertexts. Encrypt data, perform additions and multiplications on the encrypted values, decrypt the result, and the result matches what you would have gotten by computing on the plaintext. This property means an auditor can run verification logic on encrypted data and get a correct answer about whether the AI complied with policy, without ever seeing the underlying data in plaintext.
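The "compute on ciphertexts, decrypt the answer" property can be tried out with a much simpler relative of FHE. The sketch below is a toy Paillier cryptosystem, which is only additively homomorphic (it supports adding ciphertexts and multiplying by public constants, not arbitrary computation), and it is not one of the schemes H33 uses. The primes are deliberately tiny and all function names are illustrative assumptions.

```python
import math
import secrets

# Toy parameters: two well-known Mersenne primes. Far too small for real security.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                      # public key
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private: valid because the generator is g = n + 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAM and MU."""
    return ((pow(c, LAM, N_SQ) - 1) // N) * MU % N


def add(c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % N_SQ


def scale(c: int, k: int) -> int:
    """Homomorphic multiplication of the plaintext by a public constant k."""
    return pow(c, k, N_SQ)


# The party holding only ciphertexts sums two encrypted incomes.
encrypted_total = add(encrypt(52_000), encrypt(48_000))
# Only the key holder can read the result.
total = decrypt(encrypted_total)
```

A full FHE scheme adds ciphertext-by-ciphertext multiplication on top of this, which is what allows arbitrary circuits rather than just sums.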
H33 implements three FHE schemes, each suited to different audit workloads. BFV (Brakerski-Fan-Vercauteren) handles exact integer arithmetic, which is ideal for auditing decision trees, threshold checks, and discrete classification outputs. CKKS (Cheon-Kim-Kim-Song) handles approximate arithmetic on real numbers, encoded in fixed-point form, which is necessary for auditing neural network inference where weights and activations are real-valued. TFHE (Fast Fully Homomorphic Encryption over the Torus) handles boolean gate-by-gate computation, which is needed for auditing bit-level operations like encrypted comparisons and conditional logic.
For AI auditing, the typical workflow uses CKKS for the model inference step and BFV for the compliance check step. The neural network runs on encrypted inputs using CKKS approximate arithmetic. The output of the neural network -- still encrypted -- is then fed into a BFV compliance checker that performs exact threshold comparisons: is the loan-to-income ratio above the regulatory limit? Is the drug interaction severity above the alert threshold? Is the insurance premium within the approved range? Because CKKS and BFV ciphertexts use different encodings, this handoff requires a conversion step between the two schemes. The compliance checker produces an encrypted boolean result that, when decrypted, tells the auditor "compliant" or "non-compliant" without revealing the input data, the model weights, or the intermediate computations.
Batched FHE for Audit Throughput
A single audit engagement might need to verify millions of AI decisions. Processing them one at a time through FHE would be impractical. H33's BFV implementation uses 4,096 SIMD (Single Instruction Multiple Data) slots per ciphertext, which means a single FHE operation processes 4,096 values simultaneously. For audit workloads, this means you can verify 4,096 loan decisions, drug interaction checks, or premium calculations in a single encrypted computation cycle.
The batch processing capability transforms FHE auditing from a theoretical exercise into a practical tool. Instead of taking days to verify a quarter's worth of AI decisions, the batched pipeline can process millions of decisions per hour. H33's production pipeline processes 32 users per ciphertext batch through the full pipeline (FHE + ZKP + PQ signing) in under 1,400 microseconds. Pure audit workloads run different pipeline stages and can fill all 4,096 SIMD slots per ciphertext, which puts the throughput within reach of even the largest financial institutions' quarterly audit requirements.
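The batching arithmetic can be sketched in plain Python. The code below is a plaintext stand-in, not FHE: lists play the role of slot vectors, and the helper names are illustrative assumptions. It shows how many 4,096-slot ciphertexts a given number of decisions would need and what a single slot-wise check looks like.

```python
from math import ceil

SLOTS = 4096  # slot count quoted in the text


def ciphertexts_needed(n_decisions: int, slots: int = SLOTS) -> int:
    """Number of slot vectors required to hold n_decisions values."""
    return ceil(n_decisions / slots)


def pack(values, slots: int = SLOTS, pad: int = 0):
    """Split values into slot-sized batches, padding the final batch.

    Padded slots carry no real decision, so a caller must ignore them
    when reading results (for example by tracking len(values)).
    """
    batches = []
    for start in range(0, len(values), slots):
        chunk = list(values[start:start + slots])
        chunk += [pad] * (slots - len(chunk))
        batches.append(chunk)
    return batches


def slotwise_within_limit(batch, limit: int):
    """Plaintext stand-in for one SIMD operation: every slot is checked in one pass."""
    return [1 if value <= limit else 0 for value in batch]


# One million decisions fit in a few hundred slot vectors.
needed = ciphertexts_needed(1_000_000)
```

Under these assumptions, the per-operation FHE cost is paid once per batch rather than once per decision, which is where the amortization comes from.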
STARK Proofs: Verifying Correct Execution
FHE ensures the auditor cannot see the data. But how does the auditor know the AI actually ran the correct algorithm? What prevents the AI operator from encrypting fake results and claiming they came from the real model? This is where STARK proofs enter the picture.
A STARK (Scalable Transparent ARgument of Knowledge) proof is a cryptographic construction that proves a computation was performed correctly without revealing the inputs or intermediate values. The prover (the AI operator) executes the computation and generates a proof. The verifier (the auditor) checks the proof in time that is polylogarithmic in the size of the computation. This means verifying that a million-step neural network inference was performed correctly takes only milliseconds, even though re-running the computation would take much longer.
STARKs have two critical properties for AI auditing. First, they are transparent: they do not require a trusted setup ceremony. Pairing-based SNARKs such as Groth16 require generating a common reference string, and anyone who retains the secret randomness from that setup can generate fake proofs. STARKs eliminate this risk entirely. The auditor does not need to trust anyone except the mathematics. Second, STARKs are believed to be quantum-resistant because they rely on collision-resistant hash functions rather than elliptic curve pairings. No known quantum algorithm breaks those hash functions outright, so forging a proof is expected to remain infeasible even for an adversary with a quantum computer.
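A full STARK is well beyond a short example, but its hash-based foundation is easy to show. The sketch below is a Merkle tree commitment with inclusion proofs, a building block that STARK constructions use to commit to an execution trace. It is not a STARK, and the function names are illustrative assumptions. It does show the key scaling behavior: checking one of 1,024 committed values takes 10 hashes, not 1,024.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all tree levels, leaf hashes first and the root level last."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("leaf count must be a power of two")
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [h(b"\x00" + leaf) for leaf in leaves]
    tree = [level]
    while len(level) > 1:
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree


def prove(tree, index: int):
    """Collect the sibling hash at each level on the way to the root."""
    path = []
    for level in tree[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root: bytes, leaf: bytes, index: int, path) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    node = h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node == root


leaves = [f"decision-{i}".encode() for i in range(1024)]
tree = build_tree(leaves)
root = tree[-1][0]
path = prove(tree, 37)
```

Security here depends only on the collision resistance of SHA-256, with no setup secret for anyone to leak.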
The Audit Proof Pipeline
In H33's pipeline, the STARK proof covers the entire computation chain: encryption of the input, execution of the AI model on the encrypted data, extraction of the compliance check result, and generation of the output. The proof attests that every step followed the declared algorithm. If the AI operator substitutes a different model, changes a threshold, or tampers with an intermediate result, the proof will not verify.
The verification step uses cached lookups through H33's CacheeEngine, which brings verification latency down to sub-microsecond levels for previously verified proofs. This caching is sound because verification is deterministic: a given proof checked against the same public inputs always yields the same accept-or-reject answer. If a proof and its public inputs match a cached entry byte for byte, the cached result is correct. This optimization is what makes STARK-verified auditing practical for high-volume workloads rather than a one-off compliance exercise.
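One way to keep such a cache safe is to key it on a collision-resistant hash of the exact proof bytes and public inputs, so a hit can only occur for an input the verifier has already judged. The sketch below illustrates that idea. It is not H33's CacheeEngine: the class, its methods, and the `toy_verify` placeholder are all hypothetical.

```python
import hashlib


class VerificationCache:
    """Memoizes a deterministic verifier, keyed by hash(proof, public inputs)."""

    def __init__(self, verify_fn):
        self._verify = verify_fn
        self._results = {}
        self.misses = 0  # number of times the real verifier ran

    @staticmethod
    def _key(proof: bytes, public_inputs: bytes) -> bytes:
        hasher = hashlib.sha256()
        # The length prefix keeps (proof, inputs) pairs unambiguous.
        hasher.update(len(proof).to_bytes(8, "big"))
        hasher.update(proof)
        hasher.update(public_inputs)
        return hasher.digest()

    def verify(self, proof: bytes, public_inputs: bytes) -> bool:
        key = self._key(proof, public_inputs)
        if key not in self._results:
            self.misses += 1
            self._results[key] = bool(self._verify(proof, public_inputs))
        return self._results[key]


def toy_verify(proof: bytes, public_inputs: bytes) -> bool:
    """Placeholder for a real STARK verifier: accepts iff proof == SHA-256(inputs)."""
    return proof == hashlib.sha256(public_inputs).digest()


cache = VerificationCache(toy_verify)
public_inputs = b"model=v7;threshold=43"
valid_proof = hashlib.sha256(public_inputs).digest()
first = cache.verify(valid_proof, public_inputs)   # runs the verifier
second = cache.verify(valid_proof, public_inputs)  # served from the cache
```

Rejections are cached too, so a replayed forgery costs one hash and one lookup rather than a full verification.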
Three-Family Attestation: Quantum-Proof Audit Trails
The audit trail must survive longer than the audit itself. A SOX audit trail must be retained for seven years. HIPAA audit logs must be retained for six years. Some regulatory frameworks require retention for the life of the organization plus additional years. Any audit trail signed with classical cryptography (RSA, ECDSA) will be forgeable within that retention period if a cryptographically relevant quantum computer becomes operational.
H33's three-family post-quantum attestation addresses this by signing every audit result with three independent signature algorithms: ML-DSA-65 (based on the module lattice problems MLWE and MSIS), FALCON-512 (based on the short integer solution problem over NTRU lattices), and SLH-DSA-SHA2-128f (a stateless hash-based scheme whose security rests on the underlying hash function). Each family rests on a different mathematical hardness assumption. An adversary must break all three simultaneously to forge an audit record. This is not merely defense in depth at the implementation level -- it is defense in depth at the mathematical level.
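The "break all three" property comes from the acceptance rule rather than from any single algorithm: a record is accepted only if every signature verifies. A minimal sketch of that rule follows. The verifier callables are assumed to be supplied by real ML-DSA, FALCON, and SLH-DSA implementations, which are not shown; `verify_bundle` and the dictionary layout are illustrative assumptions, not H33's API.

```python
from typing import Callable, Dict

# A verifier takes (message, signature) and reports whether the signature is valid.
Verifier = Callable[[bytes, bytes], bool]

REQUIRED_ALGORITHMS = ("ML-DSA-65", "FALCON-512", "SLH-DSA-SHA2-128f")


def verify_bundle(message: bytes,
                  signatures: Dict[str, bytes],
                  verifiers: Dict[str, Verifier]) -> bool:
    """Accept only if every required algorithm has a valid signature.

    A missing signature or missing verifier counts as a failure, so an
    attacker cannot succeed by simply omitting the family they cannot forge.
    """
    for name in REQUIRED_ALGORITHMS:
        signature = signatures.get(name)
        verify = verifiers.get(name)
        if signature is None or verify is None:
            return False
        if not verify(message, signature):
            return False
    return True
```

Treating omissions as failures is the important design choice here: a rule that accepted any one valid signature would be only as strong as the weakest family.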
The H33-74 substrate distills the three-family signature bundle into 74 bytes: 32 bytes committed on-chain and 42 bytes stored in Cachee for verification. This distillation -- not compression, because the original signatures cannot be recovered from the 74 bytes -- preserves the property that the attestation fails only if all three hardness assumptions fail simultaneously. For audit trail storage, the difference between storing 21,087 bytes of raw signatures per audit record versus 74 bytes per record is the difference between a viable archive and an impractical one when you are retaining millions of records over a multi-year compliance horizon.
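The storage difference is easy to work out. The sketch below uses the per-record sizes quoted above; the archive size of ten million records is a hypothetical figure chosen only for illustration.

```python
RAW_BYTES = 21_087       # raw three-signature bundle, per the text
DISTILLED_BYTES = 74     # H33-74 attestation, per the text
ON_CHAIN_BYTES = 32
CACHED_BYTES = 42


def archive_size(records: int, bytes_per_record: int) -> int:
    """Total bytes needed to retain the given number of records."""
    return records * bytes_per_record


records = 10_000_000  # hypothetical archive size

raw_total = archive_size(records, RAW_BYTES)              # 210.87 GB
distilled_total = archive_size(records, DISTILLED_BYTES)  # 0.74 GB
reduction = RAW_BYTES / DISTILLED_BYTES                   # roughly 285x

print(f"raw:       {raw_total / 1e9:.2f} GB")
print(f"distilled: {distilled_total / 1e9:.2f} GB")
print(f"reduction: {reduction:.0f}x")
```

Of the 74 bytes, only the 32-byte commitment incurs on-chain storage cost, so the on-chain footprint for the same hypothetical archive would be 320 MB.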
HIPAA: Encrypted AI in Healthcare
HIPAA's Privacy Rule requires that covered entities limit uses and disclosures of PHI to the minimum necessary. The Security Rule lists encryption of ePHI in transit and at rest as an addressable safeguard, which covered entities must implement wherever it is reasonable and appropriate. The Breach Notification Rule requires reporting when unsecured PHI is accessed by an unauthorized person. FHE-based AI auditing addresses all three requirements simultaneously.
Consider a concrete scenario. A hospital's AI system reviews medication orders and flags potential drug interactions. The system processes patient medication lists, allergy data, genomic markers, and current diagnoses. A compliance auditor needs to verify that the AI correctly flagged interactions above a severity threshold and that it did not produce false negatives that could harm patients.
With FHE-based auditing, the patient data is encrypted before it enters the audit pipeline. The auditor runs the compliance verification logic on the encrypted data. The verification checks whether the AI's flagging decisions matched the expected outcomes based on the drug interaction database. The output is an encrypted compliance result that, when decrypted by the hospital (not the auditor), produces a report showing which records passed and which failed the compliance check. The auditor never sees any PHI. There is no breach notification obligation because no unsecured PHI was accessed. The minimum necessary standard is trivially satisfied because the auditor accessed exactly zero patient records.
SOX: Financial AI Under Internal Control
SOX Section 404 applies to any material financial process, and AI systems that approve loans, detect fraud, set prices, or allocate capital are increasingly material. The auditor must assess whether the AI's internal controls are effective, which means verifying that the model operates as documented, that changes are properly authorized, and that outputs fall within expected parameters.
FHE-based auditing allows the financial institution to demonstrate control effectiveness without exposing customer financial data to the auditor. The encrypted verification checks model version hashes (proving the production model matches the approved model), input validation rules (proving the model rejected out-of-range inputs), output bounds (proving no loan exceeded the authorized limit), and fairness metrics (proving protected-class outcomes fall within acceptable disparity ranges). Each check produces a cryptographic attestation that the control was effective, signed with three-family post-quantum signatures that will remain valid for the duration of the retention period.
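Before a check can be expressed as an encrypted circuit, it helps to pin down its plaintext meaning. The sketch below is a plaintext reference for three of the controls named above (model version hash, output bounds, and a fairness metric). It performs no encryption, and the `Decision` fields, function names, and the choice of an approval-rate ratio as the fairness metric are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Decision:
    amount: int     # loan amount requested
    approved: bool
    group: str      # hypothetical protected-class label


def model_matches(model_bytes: bytes, approved_sha256: str) -> bool:
    """Control 1: the production model is byte-identical to the approved one."""
    return hashlib.sha256(model_bytes).hexdigest() == approved_sha256


def outputs_within_limit(decisions, limit: int) -> bool:
    """Control 2: no approved loan exceeds the authorized limit."""
    return all(d.amount <= limit for d in decisions if d.approved)


def approval_rate(decisions, group: str) -> float:
    members = [d for d in decisions if d.group == group]
    if not members:
        raise ValueError(f"no decisions for group {group!r}")
    return sum(d.approved for d in members) / len(members)


def disparity_ratio(decisions, group_a: str, group_b: str) -> float:
    """Control 3: lower approval rate divided by the higher one (1.0 = parity)."""
    rate_a = approval_rate(decisions, group_a)
    rate_b = approval_rate(decisions, group_b)
    high = max(rate_a, rate_b)
    return 1.0 if high == 0 else min(rate_a, rate_b) / high
```

An encrypted implementation would need to produce the same answers as these functions on the same inputs, which makes a reference like this useful as a test oracle.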
The EU AI Act: Conformity Assessment Without Data Exposure
The EU AI Act requires conformity assessments for high-risk AI systems that include reviewing training data quality, testing model accuracy and robustness, and verifying bias mitigation measures. These assessments traditionally require the conformity assessment body to access the training data, which may include personal data of EU residents subject to GDPR protection.
FHE-based conformity assessment resolves the tension between the AI Act's transparency requirements and GDPR's data minimization principle. The assessment body runs statistical tests on encrypted training data: distribution checks, class balance verification, representativeness measures, and outlier detection. The results are encrypted compliance scores that, when decrypted by the AI provider, demonstrate conformity without the assessment body ever accessing personal data. The STARK proof attached to each assessment result proves the statistical tests were computed correctly, preventing the AI provider from fabricating conformity scores.
Building the Audit Pipeline: Architecture
The complete encrypted AI audit pipeline has five stages. Stage one is encryption: the data owner encrypts the AI model's input-output pairs under their FHE public key. Stage two is audit logic deployment: the auditor defines compliance checks as FHE-compatible circuits and deploys them to the computation environment. Stage three is encrypted execution: the compliance checks run on the encrypted data, producing encrypted results. Stage four is proof generation: a STARK proof is generated covering the entire computation, attesting to correct execution. Stage five is attestation: the encrypted results and STARK proof are signed with three-family post-quantum signatures and distilled into a 74-byte H33-74 attestation.
The data owner decrypts the results to see the compliance report. The auditor receives the STARK proof and the attestation. The auditor can verify that the computation was correct (via the STARK proof) and that the result has not been tampered with (via the attestation) without ever holding the decryption key. This is the critical architectural property: the auditor is a verifier, not a data holder. The auditor's compromise exposes proof verification metadata, not patient records or financial data.
Handling Model Opacity
Some AI audit scenarios require verifying model behavior without revealing the model itself. The AI model may be proprietary intellectual property that the operator does not want to share with the auditor. FHE handles this too: encrypt the model weights alongside the data. The compliance checks run on encrypted model outputs without the auditor seeing the weights, architecture, or hyperparameters. The auditor verifies behavioral properties (accuracy, fairness, robustness) rather than structural properties (layer count, activation functions, training procedure). This is often what regulators actually care about: does the system produce acceptable outcomes, regardless of how it achieves them?
Practical Limitations and Honest Tradeoffs
FHE-based auditing is not free. The computational overhead of FHE operations compared to plaintext is substantial -- typically 1,000x to 1,000,000x depending on the operation and parameter set. This is why H33's batch processing with 4,096 SIMD slots is essential: it amortizes the FHE overhead across thousands of simultaneous computations. For offline audit workloads where latency is measured in hours rather than milliseconds, this overhead is acceptable. For real-time audit of streaming AI decisions, it requires careful pipeline design and sufficient compute capacity.
The circuit complexity of the compliance checks also matters. Simple threshold comparisons evaluate efficiently under encryption. Complex statistical tests with branching logic are harder, because an encrypted circuit cannot branch on a value it cannot see. TFHE handles arbitrary boolean circuits but at lower throughput than BFV or CKKS. The practical approach is to decompose complex audits into simple sub-checks that each operate within the efficient regime of their respective FHE scheme, then combine the results in a final aggregation step.
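The aggregation step can be written without any branching once each sub-check yields a 0 or 1. The sketch below works on plaintext bits, but it uses only addition, subtraction, and multiplication, which are the operations an arithmetic FHE scheme provides. The function names are illustrative assumptions.

```python
def bit_and(a: int, b: int) -> int:
    return a * b


def bit_or(a: int, b: int) -> int:
    return a + b - a * b


def bit_not(a: int) -> int:
    return 1 - a


def aggregate_all(bits) -> int:
    """AND of all sub-check bits, computed as a balanced product tree.

    Pairing values level by level keeps the multiplicative depth near
    log2(len(bits)) instead of len(bits) - 1, which matters because
    depth drives FHE noise growth and parameter size.
    """
    level = list(bits) or [1]
    while len(level) > 1:
        paired = [bit_and(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            paired.append(level[-1])  # odd element carries to the next level
        level = paired
    return level[0]


def count_failures(bits) -> int:
    """Number of failed sub-checks, using additions only."""
    return sum(bit_not(b) for b in bits)


# Hypothetical sub-check results for one audited decision.
sub_checks = {"ratio_within_limit": 1, "model_hash_ok": 1, "input_in_range": 0}
compliant = aggregate_all(sub_checks.values())
```

Reporting a failure count alongside the single pass/fail bit is a cheap addition, since sums cost far less than products under encryption.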
Finally, FHE-based auditing does not replace human judgment in the audit process. It replaces data access. The auditor still needs domain expertise to design the right compliance checks, interpret the results, and make recommendations. FHE makes the auditor's job possible without making the auditor a data breach risk. That is a meaningful improvement over the status quo, even if it is not a complete automation of the audit function.
The Attestation Chain: From AI Decision to Immutable Record
Every audit result passes through H33's attestation chain. The FHE computation result is verified by the STARK proof. The STARK proof is signed with three-family post-quantum signatures. The signatures are distilled into a 74-byte H33-74 substrate. The substrate is anchored on-chain for immutability. This chain creates an unbroken cryptographic link from the original AI decision to an immutable, quantum-resistant audit record. Seven years from now, when the SOX retention period is still in effect and quantum computers may be operational, the audit trail will still verify.
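The linking idea can be sketched with hashes alone. In the example below each stage's record commits to the digest of the previous one, so altering any earlier stage invalidates everything after it. This is a simplified illustration: the signatures, the 74-byte distillation, and the on-chain anchor are omitted, and the record layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first link


def record_digest(body: dict) -> str:
    """Hash a canonical JSON encoding so the digest is reproducible."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append_link(chain: list, stage: str, payload_hex: str) -> list:
    """Add a stage record that commits to the previous record's digest."""
    prev = chain[-1]["digest"] if chain else GENESIS
    body = {"stage": stage, "payload": payload_hex, "prev": prev}
    chain.append({**body, "digest": record_digest(body)})
    return chain


def verify_chain(chain: list) -> bool:
    """Recompute every digest and check each link points at its predecessor."""
    prev = GENESIS
    for entry in chain:
        body = {"stage": entry["stage"],
                "payload": entry["payload"],
                "prev": entry["prev"]}
        if entry["prev"] != prev or record_digest(body) != entry["digest"]:
            return False
        prev = entry["digest"]
    return True


chain = []
for stage in ("fhe_result", "stark_proof", "pq_signatures", "h33_74_substrate"):
    # Each payload here is a stand-in: the hash of the stage name.
    append_link(chain, stage, hashlib.sha256(stage.encode()).hexdigest())
```

Anchoring only the final digest is enough to pin the whole sequence, because that digest transitively depends on every earlier record.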
This is the full value proposition of encrypted AI auditing: compliance today that remains valid tomorrow. Not merely meeting the current regulatory standard, but building audit infrastructure that survives the cryptographic transition to a post-quantum world. Organizations that build this capability now will have seven-year audit trails that are intact when quantum computers arrive. Organizations that wait will hold seven-year audit trails signed with classical algorithms, and every record in them, back to year one, becomes forgeable once such a machine exists.
Contact support@h33.ai to discuss encrypted AI audit pipeline architecture for your compliance requirements.