Every enterprise procurement team faces the same problem: how do you evaluate a vendor’s software quality before you buy it? The traditional answer is code review. The real answer is zero-knowledge evaluation. HICI makes it possible — and we’re publishing the formula because a standard isn’t a standard if only one company controls it.
Why We Built HICI
Every enterprise procurement team faces the same problem: how do you evaluate a vendor’s software quality before you buy it?
The traditional answer is code review. Send your engineers into the vendor’s codebase for two weeks. Cost: $50K–$100K. Timeline: weeks. And it requires the vendor to hand over their source code — their most valuable intellectual property — to a potential customer who might not buy.
Most vendors refuse. Most buyers skip the review. Both sides lose.
HICI solves this by making code evaluation zero-knowledge. The vendor runs the evaluation locally. The code never leaves their machine. The output is a cryptographic proof that the evaluation ran correctly and a scored grade across six dimensions. The buyer sees the grade. The vendor keeps their code. The math proves it’s honest.
The Formula
Six dimensions. Six weights. One score.
HICI = Σ(wᵢ × Dᵢ) for i ∈ {1..6}
| Dimension | Symbol | Weight | Why This Weight |
|---|---|---|---|
| Code Quality | D₁ | 0.20 | The foundation. Poor code quality compounds into every other dimension. |
| Security Posture | D₂ | 0.25 | The highest weight. Security failures don’t degrade gracefully — they cascade. A single unpatched CVE or exposed secret can invalidate everything else. |
| Architecture | D₃ | 0.15 | Good architecture makes everything else easier. Bad architecture makes everything else harder. But it’s fixable with effort. |
| Performance | D₄ | 0.15 | Performance problems are real but bounded. A slow system is still a working system. |
| Compliance | D₅ | 0.15 | Regulatory alignment matters for enterprise buyers. License compatibility, PII handling, audit trails. |
| Maintenance Risk | D₆ | 0.10 | The lowest weight because it measures future risk, not current state. Important for long-term procurement decisions but shouldn’t dominate the score. |
The full computation:
Each dimension produces a score from 0 to 100. The weighted sum produces the final HICI score. The weights sum to 1.00. The output is a single number on a 0–100 scale that maps to a letter grade.
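As a concrete sketch, the weighted sum fits in a few lines of Python. The dimension names and example scores below are illustrative; the authoritative implementation is the open-source circuit in the HICI repository.

```python
# Illustrative sketch of the HICI weighted sum (not the published circuit).
WEIGHTS = {
    "code_quality": 0.20,      # D1
    "security": 0.25,          # D2
    "architecture": 0.15,      # D3
    "performance": 0.15,       # D4
    "compliance": 0.15,        # D5
    "maintenance_risk": 0.10,  # D6
}

def hici_score(dimensions: dict[str, float]) -> float:
    """Weighted sum of the six dimension scores (each 0-100)."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights sum to 1.00
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)

# Example scores (made up for illustration):
scores = {
    "code_quality": 88, "security": 92, "architecture": 85,
    "performance": 80, "compliance": 90, "maintenance_risk": 75,
}
print(round(hici_score(scores), 2))  # → 86.35
```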
The Security Cap
There’s one override rule: Security Posture (D₂) can cap the entire score.
Security Override Rules
- If D₂ < 70: overall grade capped at C+ (max 69), regardless of other scores
- If D₂ < 50: automatic F (Failed)
Why? Because a codebase with 95/100 on everything except security is not a good codebase. It’s a well-architected, high-performance, compliant liability. Security is the one dimension where “good enough everywhere else” doesn’t compensate.
Consider a vendor whose code is beautifully structured, thoroughly tested, well-documented, and lightning fast — but ships with hardcoded API keys, three unpatched critical CVEs, and no input validation on user-facing endpoints. That software is a breach waiting to happen. The security cap ensures the HICI score reflects reality: no amount of architectural elegance compensates for an open front door.
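One way to implement the override, sketched in Python. The function name is illustrative; the cap values of 69 and 49 follow the rules above and the grade scale published in this post (C+ tops out at 69, F covers 0–49):

```python
# Illustrative implementation of the security override (not the
# published circuit). Assumes C+ max = 69 and F = 0-49.
def apply_security_cap(raw_score: float, d2_security: float) -> float:
    if d2_security < 50:   # automatic F
        return min(raw_score, 49.0)
    if d2_security < 70:   # capped at C+ (max 69)
        return min(raw_score, 69.0)
    return raw_score

print(apply_security_cap(95.0, 64.0))  # → 69.0 (C+ cap)
print(apply_security_cap(95.0, 42.0))  # → 49.0 (automatic F)
```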
What Each Dimension Measures
Each dimension runs as a deterministic evaluation circuit. Same code, same score, every time. The circuits are open-source and hash-pinned — you can verify that the circuit that produced a score is the same circuit published in the HICI repository.
D₁ — Code Quality (w = 0.20)
The foundation dimension. Measures the baseline health of the codebase through static analysis metrics:
- Cyclomatic complexity distribution — functions above threshold, weighted by call frequency
- Test coverage — line coverage and branch coverage, both measured
- Documentation coverage — public API surface documentation ratio
- Linting conformance — language-specific rulesets (ESLint, Clippy, pylint, etc.)
- Dead code ratio — unreachable paths, unused exports, orphaned functions
- Dependency freshness — outdated vs current, weighted by severity of available updates
D₂ — Security Posture (w = 0.25)
The heaviest dimension. Security failures cascade in ways that other problems don’t:
- Known vulnerability count — CVE database cross-reference against all dependencies
- Secret exposure detection — API keys, tokens, credentials in code or config files
- Authentication strength — algorithm currency, key sizes, password hashing algorithms
- Encryption algorithm currency — is the crypto post-quantum ready? Are deprecated algorithms still in use?
- Input validation coverage — injection surface area across all entry points
- Dependency supply chain risk — typosquatting detection, abandoned maintainers, known-compromised packages
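To make the secret-exposure check concrete, here is a minimal regex-based scanner. The patterns are illustrative assumptions; production scanners, and the actual HICI circuit, use far richer rule sets:

```python
import re

# Illustrative secret-detection patterns (assumptions, not the
# circuit's actual rules).
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|token|secret)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]

def scan_for_secrets(text: str) -> list[str]:
    """Return matched substrings for any secret-like pattern."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

sample = 'aws_key = "AKIAABCDEFGHIJKLMNOP"\napi_key = "sk_live_abcdef123456"'
print(len(scan_for_secrets(sample)))  # → 2
```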
The encryption currency sub-metric is where H33’s expertise is unique. We know which algorithms survive quantum and which don’t — because we built the replacements.
D₃ — Architecture (w = 0.15)
Measures structural quality — the decisions that are expensive to change later:
- Separation of concerns — module boundary clarity and coupling metrics
- API surface consistency — naming conventions, versioning, error handling patterns
- Error handling patterns — catch-all vs specific, logged vs silent, propagation depth
- Logging hygiene — sensitive data in logs, structured vs unstructured output
- Configuration management — hardcoded values, environment separation, secret injection
- Circular dependency count — module-level and package-level cycles
D₄ — Performance (w = 0.15)
Measures efficiency characteristics that affect production behavior:
- Algorithmic complexity hotspots — O(n²)+ patterns in hot paths identified via call graph analysis
- N+1 query patterns — database, API, and file I/O loop patterns
- Memory allocation patterns — leak potential, unnecessary copies, allocation in hot loops
- Concurrency safety — race conditions, deadlock potential, lock contention patterns
- Response time characteristics — blocking I/O in request paths, synchronous calls in async contexts
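The N+1 query pattern is easiest to see in code. The sketch below uses a hypothetical `DB` stand-in that only counts round trips; the SQL and function names are illustrative:

```python
# Hypothetical stub that counts database round trips.
class DB:
    def __init__(self):
        self.round_trips = 0
    def query(self, sql, *params):
        self.round_trips += 1
        return []

def fetch_orders_n_plus_one(db, user_ids):
    # One query per user: N extra round trips inside the loop.
    return [db.query("SELECT * FROM orders WHERE user_id = ?", uid)
            for uid in user_ids]

def fetch_orders_batched(db, user_ids):
    # A single IN (...) query replaces the loop.
    marks = ",".join("?" * len(user_ids))
    return db.query(f"SELECT * FROM orders WHERE user_id IN ({marks})",
                    *user_ids)

db = DB(); fetch_orders_n_plus_one(db, [1, 2, 3])
print(db.round_trips)  # → 3
db = DB(); fetch_orders_batched(db, [1, 2, 3])
print(db.round_trips)  # → 1
```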
D₅ — Compliance (w = 0.15)
Measures regulatory readiness — critical for enterprise procurement decisions:
- License compatibility — GPL contamination, commercial restrictions, license chain analysis
- Data handling patterns — PII detection using the same pattern library H33 uses in FHE PII encryption
- Regulatory framework markers — SOC 2 control points, HIPAA safeguards, GDPR data flow patterns
- Audit trail completeness — are decisions logged? Are logs tamper-evident? Is there a retention policy?
D₆ — Maintenance Risk (w = 0.10)
Measures long-term sustainability — the lowest weight because it measures future risk, not current state:
- Bus factor — contributor concentration. Is one person responsible for everything?
- Dependency age and abandonment risk — last update date, maintainer activity, fork ratio
- Breaking change frequency — how often does the API surface change?
- Migration debt — deprecated dependencies, unsupported runtimes, EOL frameworks
- Test brittleness — flaky test detection, snapshot over-reliance, time-dependent tests
The Grade Scale
The weighted score maps to a letter grade that procurement teams can compare across vendors:
| Score | Grade | What It Means |
|---|---|---|
| 95–100 | A+ | Exceptional — best-in-class across all dimensions |
| 90–94 | A | Excellent — strong in every dimension, no significant gaps |
| 85–89 | A- | Strong — minor items noted, none blocking |
| 80–84 | B+ | Good — production-ready with noted items |
| 75–79 | B | Acceptable — meets requirements with room for improvement |
| 70–74 | B- | Adequate — meets minimums, improvement recommended |
| 65–69 | C+ | Below average — notable gaps in multiple dimensions |
| 60–64 | C | Marginal — significant concerns for enterprise deployment |
| 55–59 | C- | Poor — substantial remediation required |
| 50–54 | D | Failing — critical issues across multiple dimensions |
| 0–49 | F | Failed — not suitable for production deployment |
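The band boundaries above translate directly into a lookup, sketched here in Python (the function is illustrative, not the circuit's code):

```python
# Score-to-grade mapping, transcribed from the grade table above.
GRADE_BANDS = [
    (95, "A+"), (90, "A"), (85, "A-"), (80, "B+"), (75, "B"),
    (70, "B-"), (65, "C+"), (60, "C"), (55, "C-"), (50, "D"),
]

def to_grade(score: float) -> str:
    for floor, grade in GRADE_BANDS:
        if score >= floor:
            return grade
    return "F"  # everything below 50

print(to_grade(86.35))  # → A-
print(to_grade(42.0))   # → F
```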
Why It’s Zero-Knowledge
The HICI CLI runs entirely on the vendor’s machine. The code never leaves it. No cloud upload, no API call that transmits source code, no temporary storage on third-party infrastructure. The evaluation happens locally, and exactly three things are transmitted to the buyer:
- Merkle root — a single SHA3-256 hash committing to the exact repository state. Change one line of code and the hash changes. But the hash reveals nothing about the code itself. It’s a commitment, not a disclosure.
- STARK proof — proves that the evaluation circuit (which is open-source and hash-pinned) ran correctly on the committed codebase. The proof is valid if and only if the circuit produced the claimed scores from the claimed codebase. SHA3-256 hash-based, no trusted setup, post-quantum secure.
- H33-3-Key signature — the grade is signed with three independent signature families: Ed25519 (elliptic curve), Dilithium (module lattice), and FALCON (NTRU lattice). Breaking the signature requires breaking elliptic-curve cryptography AND module-lattice cryptography AND NTRU-lattice cryptography simultaneously. This is the same nested hybrid signature scheme used in H33’s production authentication pipeline.
The buyer receives: a grade, a proof, and a signature. Not code. Not snippets. Not metadata about file names, directory structures, or implementation details. Math.
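A minimal sketch of the Merkle-root commitment, using Python's standard-library SHA3-256. The leaf ordering and odd-level padding rules here are assumptions; the HICI CLI's exact tree layout may differ:

```python
import hashlib

def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Pairwise-hash leaf digests up to a single 32-byte root."""
    level = [sha3(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:          # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha3(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Illustrative file contents; changing one byte changes the root.
files = [b"src/main.rs contents", b"Cargo.toml contents"]
print(merkle_root(files).hex())
```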
What this means in practice
A vendor can prove their code scores 91/100 (A grade) without revealing a single line of source code. The buyer can verify the proof is valid — that the open-source evaluation circuit actually produced that score from the committed codebase — without any access to the code. The Merkle root pins the exact codebase version. The STARK proof pins the evaluation. The 3-Key signature pins the attestation. If any component is tampered with, the verification fails.
This is why HICI changes procurement. The old model forced a binary choice: either the vendor exposes their IP or the buyer skips due diligence. HICI creates a third option where both sides get what they need. The vendor keeps their code private. The buyer gets a cryptographically verified quality assessment. The math eliminates the trust problem.
Why We Open-Sourced It
A scoring standard controlled by a single company isn’t a standard — it’s a product feature. We want HICI to be the S&P rating for software. That requires transparency.
The formula is published. The evaluation circuits are open-source. The CLI is Apache 2.0. Anyone can:
- Audit the scoring logic — every metric, every weight, every threshold is inspectable
- Propose improvements via pull request — the standard evolves with community input
- Run evaluations locally without H33 infrastructure — the CLI is self-contained
- Fork the methodology for domain-specific needs — healthcare, fintech, government, defense
What H33 provides on top of the open standard: hosted proof pages, STARK attestation infrastructure, 3-Key signing, and the ZK-Procure procurement platform. The standard is free. The infrastructure is a product.
The incentive structure
Open-sourcing the formula aligns incentives. Vendors trust the evaluation because they can read the code that evaluates them. Buyers trust the grade because the circuit is auditable and the proof is verifiable. H33 benefits because adoption of the standard drives demand for the attestation infrastructure. Everyone wins when the methodology is transparent.
The Relationship to HATS
HATS (H33 AI Trust Standard) and HICI serve different but complementary purposes. HATS is a publicly available technical conformance standard for continuous AI trustworthiness; certification under HATS provides independently verifiable evidence that a system satisfies the standard’s defined controls.
HICI evaluates whether the code itself is well-built.
- HATS asks: “Are the security controls running correctly in production?”
- HICI asks: “Is the code good enough to deploy in the first place?”
They’re complementary. A system can be HATS-certified (controls are operating correctly) with a poor HICI score (the underlying code has technical debt). Or it can have a perfect HICI score (clean code) but fail HATS certification (controls aren’t properly instrumented).
Enterprise procurement teams should look at both. HICI tells you whether the software is well-engineered. HATS tells you whether it’s well-operated. Together, they give you the complete picture: code quality and operational trustworthiness, both cryptographically verified.
Get Your HICI Score
Run a HICI assessment through ZK-Procure. Your code stays on your machine. The evaluation circuit runs locally. The proof is generated locally. The only things transmitted are the grade, the proof, and the signature.
The math speaks for itself.
Resources
- ZK-Procure — Run your assessment
- HICI Standard — Full specification
- GitHub: h33ai-postquantum/hici — Open-source evaluation circuits