Every enterprise procurement team faces the same problem: how do you evaluate a vendor’s software quality before you buy it? The traditional answer is code review. The real answer is zero-knowledge evaluation. HICI makes it possible — and we’re publishing the formula because a standard isn’t a standard if only one company controls it.
Why We Built HICI
Every enterprise procurement team faces the same problem: how do you evaluate a vendor’s software quality before you buy it?
The traditional answer is code review. Send your engineers into the vendor’s codebase for two weeks. Cost: $50K–$100K. Timeline: weeks. And it requires the vendor to hand over their source code — their most valuable intellectual property — to a potential customer who might not buy.
Most vendors refuse. Most buyers skip the review. Both sides lose.
HICI solves this by making code evaluation zero-knowledge. The vendor runs the evaluation locally. The code never leaves their machine. The output is a cryptographic proof that the evaluation ran correctly and a scored grade across six dimensions. The buyer sees the grade. The vendor keeps their code. The math proves it’s honest.
The Formula
Six dimensions. Six weights. One score.
HICI = Σ(wᵢ × Dᵢ) for i ∈ {1..6}
| Dimension | Symbol | Weight | Why This Weight |
|---|---|---|---|
| Code Quality | D₁ | 0.20 | The foundation. Poor code quality compounds into every other dimension. |
| Security Posture | D₂ | 0.25 | The highest weight. Security failures don’t degrade gracefully — they cascade. A single unpatched CVE or exposed secret can invalidate everything else. |
| Architecture | D₃ | 0.15 | Good architecture makes everything else easier. Bad architecture makes everything else harder. But it’s fixable with effort. |
| Performance | D₄ | 0.15 | Performance problems are real but bounded. A slow system is still a working system. |
| Compliance | D₅ | 0.15 | Regulatory alignment matters for enterprise buyers. License compatibility, PII handling, audit trails. |
| Maintenance Risk | D₆ | 0.10 | The lowest weight because it measures future risk, not current state. Important for long-term procurement decisions but shouldn’t dominate the score. |
The full computation:
Each dimension produces a score from 0 to 100. The weighted sum produces the final HICI score. The weights sum to 1.00. The output is a single number on a 0–100 scale that maps to a letter grade.
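As a concrete sketch, the weighted sum fits in a few lines of Python. The dimension names and example scores below are illustrative; the authoritative implementation is the open-source circuit in the HICI repository.

```python
# Illustrative sketch of the HICI weighted sum (not the published circuit).
WEIGHTS = {
    "code_quality": 0.20,      # D1
    "security": 0.25,          # D2
    "architecture": 0.15,      # D3
    "performance": 0.15,       # D4
    "compliance": 0.15,        # D5
    "maintenance_risk": 0.10,  # D6
}

def hici_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of the six dimension scores (each 0-100)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights sum to 1.00
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)

# Example scores (made up for illustration):
scores = {
    "code_quality": 88, "security": 92, "architecture": 85,
    "performance": 80, "compliance": 90, "maintenance_risk": 75,
}
print(round(hici_score(scores), 2))  # → 86.35
```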
The Security Cap
There’s one override rule: Security Posture (D₂) can cap the entire score.
Security Override Rules
- If D₂ < 70: overall grade capped at C+ (max 69), regardless of other scores
- If D₂ < 50: automatic F (Failed)
Why? Because a codebase with 95/100 on everything except security is not a good codebase. It’s a well-architected, high-performance, compliant liability. Security is the one dimension where “good enough everywhere else” doesn’t compensate.
Consider a vendor whose code is beautifully structured, thoroughly tested, well-documented, and lightning fast — but ships with hardcoded API keys, three unpatched critical CVEs, and no input validation on user-facing endpoints. That software is a breach waiting to happen. The security cap ensures the HICI score reflects reality: no amount of architectural elegance compensates for an open front door.
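One way to implement the override, sketched in Python. The function name is illustrative; the cap values of 69 and 49 follow the rules above and the grade scale published in this post (C+ tops out at 69, F covers 0–49):

```python
# Illustrative implementation of the security override (not the
# published circuit). Assumes C+ max = 69 and F = 0-49.
def apply_security_cap(raw_score: float, d2_security: float) -> float:
    if d2_security < 50:   # automatic F
        return min(raw_score, 49.0)
    if d2_security < 70:   # capped at C+ (max 69)
        return min(raw_score, 69.0)
    return raw_score

print(apply_security_cap(95.0, 64.0))  # → 69.0 (C+ cap)
print(apply_security_cap(95.0, 42.0))  # → 49.0 (automatic F)
```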
What Each Dimension Measures
Each dimension runs as a deterministic evaluation circuit. Same code, same score, every time. The circuits are open-source and hash-pinned — you can verify that the circuit that produced a score is the same circuit published in the HICI repository.
D₁ — Code Quality (w = 0.20)
The foundation dimension. Measures the baseline health of the codebase through static analysis metrics:
- Cyclomatic complexity distribution — functions above threshold, weighted by call frequency
- Test coverage — line coverage and branch coverage, both measured
- Documentation coverage — public API surface documentation ratio
- Linting conformance — language-specific rulesets (ESLint, Clippy, pylint, etc.)
- Dead code ratio — unreachable paths, unused exports, orphaned functions
- Dependency freshness — outdated vs current, weighted by severity of available updates
D₂ — Security Posture (w = 0.25)
The heaviest dimension. Security failures cascade in ways that other problems don’t:
- Known vulnerability count — CVE database cross-reference against all dependencies
- Secret exposure detection — API keys, tokens, credentials in code or config files
- Authentication strength — algorithm currency, key sizes, password hashing algorithms
- Encryption algorithm currency — is the crypto post-quantum ready? Are deprecated algorithms still in use?
- Input validation coverage — injection surface area across all entry points
- Dependency supply chain risk — typosquatting detection, abandoned maintainers, known-compromised packages
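To make the secret-exposure check concrete, here is a minimal regex-based scanner. The patterns are illustrative assumptions; production scanners, and the actual HICI circuit, use far richer rule sets:

```python
import re

# Illustrative secret-detection patterns (assumptions, not the
# circuit's actual rules).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]

def scan_for_secrets(text: str) -> list[str]:
    """Return matched substrings for any secret-like pattern."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

sample = 'aws_key = "AKIAABCDEFGHIJKLMNOP"\napi_key = "sk_live_abcdef123456"'
print(len(scan_for_secrets(sample)))  # → 2
```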
The encryption currency sub-metric is where H33’s expertise is unique. We know which algorithms survive quantum and which don’t — because we built the replacements.
D₃ — Architecture (w = 0.15)
Measures structural quality — the decisions that are expensive to change later:
- Separation of concerns — module boundary clarity and coupling metrics
- API surface consistency — naming conventions, versioning, error handling patterns
- Error handling patterns — catch-all vs specific, logged vs silent, propagation depth
- Logging hygiene — sensitive data in logs, structured vs unstructured output
- Configuration management — hardcoded values, environment separation, secret injection
- Circular dependency count — module-level and package-level cycles
D₄ — Performance (w = 0.15)
Measures efficiency characteristics that affect production behavior:
- Algorithmic complexity hotspots — O(n²)+ patterns in hot paths identified via call graph analysis
- N+1 query patterns — database, API, and file I/O loop patterns
- Memory allocation patterns — leak potential, unnecessary copies, allocation in hot loops
- Concurrency safety — race conditions, deadlock potential, lock contention patterns
- Response time characteristics — blocking I/O in request paths, synchronous calls in async contexts
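The N+1 query pattern is easiest to see in code. The sketch below uses a hypothetical `DB` stand-in that only counts round trips; the SQL and function names are illustrative:

```python
# Hypothetical stub that counts database round trips.
class DB:
    def __init__(self):
        self.round_trips = 0
    def query(self, sql, *params):
        self.round_trips += 1
        return []

def fetch_orders_n_plus_one(db, user_ids):
    # One query per user: N extra round trips inside the loop.
    return [db.query("SELECT * FROM orders WHERE user_id = ?", uid)
            for uid in user_ids]

def fetch_orders_batched(db, user_ids):
    # A single IN (...) query replaces the loop.
    marks = ",".join("?" * len(user_ids))
    return db.query(f"SELECT * FROM orders WHERE user_id IN ({marks})",
                    *user_ids)

db = DB(); fetch_orders_n_plus_one(db, [1, 2, 3])
print(db.round_trips)  # → 3
db = DB(); fetch_orders_batched(db, [1, 2, 3])
print(db.round_trips)  # → 1
```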
D₅ — Compliance (w = 0.15)
Measures regulatory readiness — critical for enterprise procurement decisions:
- License compatibility — GPL contamination, commercial restrictions, license chain analysis
- Data handling patterns — PII detection using the same pattern library H33 uses in FHE PII encryption
- Regulatory framework markers — SOC 2 control points, HIPAA safeguards, GDPR data flow patterns
- Audit trail completeness — are decisions logged? Are logs tamper-evident? Is there a retention policy?
D₆ — Maintenance Risk (w = 0.10)
Measures long-term sustainability — the lowest weight because it measures future risk, not current state:
- Bus factor — contributor concentration. Is one person responsible for everything?
- Dependency age and abandonment risk — last update date, maintainer activity, fork ratio
- Breaking change frequency — how often does the API surface change?
- Migration debt — deprecated dependencies, unsupported runtimes, EOL frameworks
- Test brittleness — flaky test detection, snapshot over-reliance, time-dependent tests
The Grade Scale
The weighted score maps to a letter grade that procurement teams can compare across vendors:
| Score | Grade | What It Means |
|---|---|---|
| 95–100 | A+ | Exceptional — best-in-class across all dimensions |
| 90–94 | A | Excellent — strong in every dimension, no significant gaps |
| 85–89 | A- | Strong — minor items noted, none blocking |
| 80–84 | B+ | Good — production-ready with noted items |
| 75–79 | B | Acceptable — meets requirements with room for improvement |
| 70–74 | B- | Adequate — meets minimums, improvement recommended |
| 65–69 | C+ | Below average — notable gaps in multiple dimensions |
| 60–64 | C | Marginal — significant concerns for enterprise deployment |
| 55–59 | C- | Poor — substantial remediation required |
| 50–54 | D | Failing — critical issues across multiple dimensions |
| 0–49 | F | Failed — not suitable for production deployment |
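The band boundaries above translate directly into a lookup, sketched here in Python (the function is illustrative, not the circuit's code):

```python
# Score-to-grade mapping, transcribed from the grade table above.
GRADE_BANDS = [
    (95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"),
    (70, "B-"), (65, "C+"), (60, "C"), (55, "C-"), (50, "D"),
]

def to_grade(score: float) -> str:
    for floor, grade in GRADE_BANDS:
        if score >= floor:
            return grade
    return "F"  # everything below 50

print(to_grade(86.35))  # → A-
print(to_grade(42.0))   # → F
```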
Why It’s Zero-Knowledge
The HICI CLI runs entirely on the vendor’s machine. The code never leaves it. No cloud upload, no API call that transmits source code, no temporary storage on third-party infrastructure. The evaluation happens locally, and exactly three things are transmitted to the buyer:
- Merkle root — a single SHA3-256 hash committing to the exact repository state. Change one line of code and the hash changes. But the hash reveals nothing about the code itself. It’s a commitment, not a disclosure.
- STARK proof — proves that the evaluation circuit (which is open-source and hash-pinned) ran correctly on the committed codebase. The proof is valid if and only if the circuit produced the claimed scores from the claimed codebase. SHA3-256 hash-based, no trusted setup, post-quantum secure.
- H33-3-Key signature — the grade is signed with three independent signature families: Ed25519 (elliptic curve), Dilithium (module lattice), and FALCON (NTRU lattice). Breaking the signature requires breaking elliptic-curve cryptography AND module-lattice cryptography AND NTRU-lattice cryptography simultaneously. This is the same nested hybrid signature scheme used in H33’s production authentication pipeline.
The buyer receives: a grade, a proof, and a signature. Not code. Not snippets. Not metadata about file names, directory structures, or implementation details. Math.
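A minimal sketch of the Merkle-root commitment, using Python's standard-library SHA3-256. The leaf ordering and odd-level padding rules here are assumptions; the HICI CLI's exact tree layout may differ:

```python
import hashlib

def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Pairwise-hash leaf digests up to a single 32-byte root."""
    level = [sha3(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha3(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Illustrative file contents; changing one byte changes the root.
files = [b"src/main.rs contents", b"Cargo.toml contents"]
print(merkle_root(files).hex())
```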
What this means in practice
A vendor can prove their code scores 91/100 (A grade) without revealing a single line of source code. The buyer can verify the proof is valid — that the open-source evaluation circuit actually produced that score from the committed codebase — without any access to the code. The Merkle root pins the exact codebase version. The STARK proof pins the evaluation. The 3-Key signature pins the attestation. If any component is tampered with, the verification fails.
This is why HICI changes procurement. The old model forced a binary choice: either the vendor exposes their IP or the buyer skips due diligence. HICI creates a third option where both sides get what they need. The vendor keeps their code private. The buyer gets a cryptographically verified quality assessment. The math eliminates the trust problem.
Why We Open-Sourced It
A scoring standard controlled by a single company isn’t a standard — it’s a product feature. We want HICI to be the S&P rating for software. That requires transparency.
The formula is published. The evaluation circuits are open-source. The CLI is Apache 2.0. Anyone can:
- Audit the scoring logic — every metric, every weight, every threshold is inspectable
- Propose improvements via pull request — the standard evolves with community input
- Run evaluations locally without H33 infrastructure — the CLI is self-contained
- Fork the methodology for domain-specific needs — healthcare, fintech, government, defense
What H33 provides on top of the open standard: hosted proof pages, STARK attestation infrastructure, 3-Key signing, and the ZK-Procure procurement platform. The standard is free. The infrastructure is a product.
The incentive structure
Open-sourcing the formula aligns incentives. Vendors trust the evaluation because they can read the code that evaluates them. Buyers trust the grade because the circuit is auditable and the proof is verifiable. H33 benefits because adoption of the standard drives demand for the attestation infrastructure. Everyone wins when the methodology is transparent.
The Relationship to HATS
HATS (H33 AI Trust Standard) and HICI serve different but complementary purposes. HATS is a publicly available technical conformance standard for continuous AI trustworthiness; certification under HATS provides independently verifiable evidence that a system satisfies the standard’s defined controls.
HICI evaluates whether the code itself is well-built.
- HATS asks: “Are the security controls running correctly in production?”
- HICI asks: “Is the code good enough to deploy in the first place?”
They’re complementary. A system can be HATS-certified (controls are operating correctly) with a poor HICI score (the underlying code has technical debt). Or it can have a perfect HICI score (clean code) but fail HATS certification (controls aren’t properly instrumented).
Enterprise procurement teams should look at both. HICI tells you whether the software is well-engineered. HATS tells you whether it’s well-operated. Together, they give you the complete picture: code quality and operational trustworthiness, both cryptographically verified.
Get Your HICI Score
Run a HICI assessment through ZK-Procure. Your code stays on your machine. The evaluation circuit runs locally. The proof is generated locally. The only things transmitted are the grade, the proof, and the signature.
The math speaks for itself.
Resources
- ZK-Procure — Run your assessment
- HICI Standard — Full specification
- GitHub: h33ai-postquantum/hici — Open-source evaluation circuits