# Production Readiness Report — First Time Travel Replay (L5)

**Proof ID:** `first-time-travel-replay`
**Subject:** Replay at distinct T values reconstructs substantively *different* state — different active policy versions, different active model versions, different decisions present — and each replay at the same T is byte-identical. Decisions carry causal lineage via `parent_decision_ids`. Replay confidence is scored 0–100 with honest acknowledgment of the open Phase E signature-verification gap.
**Date:** 2026-06-02
**Determination:** PROVEN IN OPERATION (scope: one time-travel tenant with 11 signed events including 2 model versions, 2 policy versions, 4 decisions with lineage; five distinct state_ids at five distinct T values; every replay byte-deterministic; replay confidence reported honestly)
**Version:** 1.0 (Final)

---

## Strict wording

L5: Time Travel Replay. The L5 question is *"What did it look like on March 14, 2027 at 10:42:17 UTC when this decision was made?"*, not "what does it look like now." This proof extends the canonical event model with three new event kinds (`policy_amend`, `model_register`, `decision`), extends the replay engine to track policy versions, model versions, and decisions over time, and demonstrates that replay at different T values returns substantively different state while replay at the same T stays byte-identical. Two cross-cutting additions that Eric named on June 2, 2026 are folded in: **decision lineage** (causal chains via `parent_decision_ids`) and **replay confidence score** (0–100 completeness metric).

Backward compatibility is verified: replaying each prior tenant listed in Section 08 (V101 first proof, Tokenized Transfer, and L1 through L4) under the extended engine produces state_ids byte-identical to the originally published values.

---

## Three claims (the 10-second read)

1. **Five replays at five distinct T values produce five distinct state_ids** — each capturing the *exact* model versions, policy versions, and decisions in effect at that moment.
2. **Replays at the same T are byte-deterministic** — the time-travel snapshot is reproducible forever.
3. **Decisions carry causal lineage and replays carry a confidence score** — the human's approval (decision_002) references the AI's recommendation (decision_001) in `parent_decision_ids`; the replay confidence is honestly reported as 72/100, with the only failing check being signature verification (Phase E lock, L9 closes it).

---

## 01 — Problem

Static logs are not enough. A regulator three years from now will ask:

> *"What did the authority graph look like the moment this decision was made? Which model produced the recommendation? What was the policy text in effect? Who was the actor? Who approved?"*

Most platforms can't answer because the policy was amended last year, the model was retrained two months ago, and the audit log doesn't link the decision to either. **H33 answers it because the canonical event log preserves every state transition and replay reconstructs the exact state at any T.**

This proof exercises the capability against a small underwriting tenant: one policy that gets amended, one model that gets retrained, four decisions across two model+policy generations.

---

## 02 — Environment

| Component | Detail |
|---|---|
| Reconstruction harness | `tests/time_travel_replay_001.rs` in `scif-backend` at SHA `d29cc7c33` |
| Storage | `PostgresEventLogSource` against `h33_production.canonical_auth_events` |
| Replay | `h33_xeon_api::agent_zero::astate_replay::replay_until` (extended to process `policy_amend`, `model_register`, `decision`) |
| New snapshot fields | `active_policy_versions` · `active_model_versions` · `decisions_up_to` (all `skip_serializing_if = "Vec::is_empty"` for backward compat) |
| New replay result field | `replay_confidence: Option<ReplayConfidence>` |
| Signing | Production PQ keys at `h33/production/canonical-event-signer` |
| DB CHECK constraint | extended to allow `policy_amend`, `model_register`, `decision` |

---

## 03 — The schema extensions (the L5 substrate)

### New canonical event kinds

| Event kind | Records |
|---|---|
| `policy_amend` | `(policy_id, prev_version, new_version, content_hash, amended_by, at_ms)` |
| `model_register` | `(model_id, version, weight_hash, training_fingerprint, registered_by, at_ms)` |
| `decision` | `(decision_id, actor_principal, capability, subject, model_version_ref, policy_version_ref, outcome, parent_decision_ids, at_ms)` |
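
As a reading aid, the three records above could be modeled as a Rust enum along the following lines. This is a minimal sketch and not the engine's actual definition: the enum name `NewEvent`, the helper methods, and every concrete type (for example `Option<u32>` for `prev_version`) are assumptions; only the field names and the `decision_002` values come from this report.

```rust
/// Illustrative mirror of the three new event kinds. Field names follow the
/// table above; every concrete Rust type here is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum NewEvent {
    PolicyAmend {
        policy_id: String,
        prev_version: Option<u32>, // assumed None when the first version is pinned
        new_version: u32,
        content_hash: String,
        amended_by: String,
        at_ms: u64,
    },
    ModelRegister {
        model_id: String,
        version: u32,
        weight_hash: String,
        training_fingerprint: String,
        registered_by: String,
        at_ms: u64,
    },
    Decision {
        decision_id: String,
        actor_principal: String,
        capability: String,
        subject: String,
        model_version_ref: (String, u32),
        policy_version_ref: (String, u32),
        outcome: String,
        parent_decision_ids: Vec<String>,
        at_ms: u64,
    },
}

impl NewEvent {
    /// Kind string as listed in the event-kind table.
    pub fn kind(&self) -> &'static str {
        match self {
            NewEvent::PolicyAmend { .. } => "policy_amend",
            NewEvent::ModelRegister { .. } => "model_register",
            NewEvent::Decision { .. } => "decision",
        }
    }

    /// Event timestamp in milliseconds, regardless of kind.
    pub fn at_ms(&self) -> u64 {
        match self {
            NewEvent::PolicyAmend { at_ms, .. }
            | NewEvent::ModelRegister { at_ms, .. }
            | NewEvent::Decision { at_ms, .. } => *at_ms,
        }
    }
}

/// decision_002 as published in this report's lineage section.
pub fn sample_decision_002() -> NewEvent {
    NewEvent::Decision {
        decision_id: "decision_002".to_string(),
        actor_principal: "princ_customer_9".to_string(),
        capability: "approve_underwriting".to_string(),
        subject: "application_001".to_string(),
        model_version_ref: ("model_underwriting".to_string(), 1),
        policy_version_ref: ("pol_underwriting".to_string(), 1),
        outcome: "approved".to_string(),
        parent_decision_ids: vec!["decision_001".to_string()],
        at_ms: 1_780_440_006_000,
    }
}
```

Keeping `at_ms` on every variant lets a replay loop filter events by time without knowing the kind.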

### New snapshot fields (extending `AuthorityStateSnapshot`)

| Field | Type | Sort key |
|---|---|---|
| `active_policy_versions` | `Vec<PolicyVersionSnapshot>` | `(policy_id, version)` |
| `active_model_versions` | `Vec<ModelVersionSnapshot>` | `(model_id, version)` |
| `decisions_up_to` | `Vec<DecisionSnapshot>` | `(at_ms, decision_id)` |

All three carry `#[serde(default, skip_serializing_if = "Vec::is_empty")]`. Tenants that don't emit the new event kinds get empty vectors, which are omitted from the serialized snapshot, so the canonical bytes, and therefore the state_id, of every prior proof are unchanged.
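
The effect of skipping empty vectors can be illustrated without serde. The sketch below hand-rolls a tiny canonical encoder as a stand-in for the real one: the `Snapshot` struct, the `grants` field, and `canonical_json` are hypothetical names invented for illustration, and the real snapshot has more fields. It shows only the mechanism: a new field that is empty contributes no bytes, and a non-empty one is emitted in `(at_ms, decision_id)` order.

```rust
/// Hypothetical, trimmed-down snapshot: `grants` stands in for the pre-L5
/// fields, `decisions_up_to` for one of the three new vectors.
pub struct Snapshot {
    pub timestamp_t_ms: u64,
    pub grants: Vec<String>,
    pub decisions_up_to: Vec<(u64, String)>, // (at_ms, decision_id)
}

/// Hand-rolled canonical rendering (a stand-in for the real serializer).
/// The new field is emitted only when non-empty, mirroring
/// `skip_serializing_if = "Vec::is_empty"`.
pub fn canonical_json(s: &Snapshot) -> String {
    let grants: Vec<String> = s.grants.iter().map(|g| format!("\"{}\"", g)).collect();
    let mut out = format!(
        "{{\"timestamp_t_ms\":{},\"grants\":[{}]",
        s.timestamp_t_ms,
        grants.join(",")
    );
    if !s.decisions_up_to.is_empty() {
        let mut sorted = s.decisions_up_to.clone();
        sorted.sort(); // tuple ordering yields the (at_ms, decision_id) sort key
        let items: Vec<String> = sorted
            .iter()
            .map(|(at, id)| format!("{{\"at_ms\":{},\"decision_id\":\"{}\"}}", at, id))
            .collect();
        out.push_str(&format!(",\"decisions_up_to\":[{}]", items.join(",")));
    }
    out.push('}');
    out
}
```

A snapshot with an empty `decisions_up_to` renders exactly as it would have before the field existed, which is the property the backward-compatibility check relies on.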

### New `ReplayResult` field

`replay_confidence: Option<ReplayConfidence>` — score 0–100 + named checks with severity (Critical / Warning / Info). Weights: Critical = 3×, Warning = 2×, Info = 1×.

---

## 04 — The L5 scenario (11 events across one tenant)

| T (ms) | Event | What it does |
|---|---|---|
| 1780440000000 | `policy_register` | `pol_underwriting` declared |
| 1780440001000 | `model_register` | `model_underwriting v1` registered (`weight_hash`, `training_fingerprint`) |
| 1780440002000 | `grant` | root → `princ_customer_9` (approve_underwriting + delegate) |
| 1780440003000 | `grant` | human → `princ_ai_underwriter_001` (recommend_underwriting) |
| 1780440004000 | `policy_amend` | `pol_underwriting` v1 pinned with `content_hash` |
| **1780440005000** | **`decision_001`** | AI recommends `application_001` (bound to model v1, policy v1) |
| **1780440006000** | **`decision_002`** | Human approves `application_001`, consumes `decision_001` (lineage) |
| 1780440007000 | `policy_amend` | `pol_underwriting` v1→v2 (tightened threshold, +EU jurisdictions) |
| 1780440008000 | `model_register` | `model_underwriting v2` registered (retrained) |
| **1780440009000** | **`decision_003`** | AI recommends `application_002` (bound to model v2, policy v2) |
| **1780440010000** | **`decision_004`** | Human approves `application_002`, consumes `decision_003` (lineage) |

---

## 05 — Five replays · five distinct state_ids (the time travel itself)

| T | State_id | Policy versions | Model versions | Decisions present |
|---|---|---|---|---|
| **T₅** (after `decision_001`) | `1890b20c61daa91ed6079b0215f3c99c5b61d1e7031f9d8fc3ddeb91b72b0025` | 1 (v1) | 1 (v1) | 1 (decision_001) |
| **T₆** (after `decision_002`) | `70fdc855447623918c2ee1b5cdfd3550ca20273a4116a3cd4bd33892508b91e8` | 1 (v1) | 1 (v1) | 2 (incl. lineage) |
| **T₈** (after model v2 register) | `deb7f04a0e6bbf48cf3e68817a1aece75c7849da0380b3cc992eaa62f928eb60` | 2 (v1+v2) | 2 (v1+v2) | 2 |
| **T₁₀** (after `decision_004`) | `b07974aed797856dc47ca07f423124804a1096cb892294c57fb902db149cde50` | 2 | 2 | 4 |
| **T∞** (1800000000000, far future) | `0f0e51dd8c35d13d53de9b49c7e72f1926160b19f8f5d5e1b55f0c7cd1770c97` | 2 | 2 | 4 |

All five state_ids are byte-distinct. Replays at the same T are byte-identical across runs.

**Why T₁₀ and T∞ differ in state_id even though their content is identical:** `timestamp_t_ms` is a field in the snapshot. The CONTENT (decisions, model versions, policy versions) matches; the snapshot's *self-reported moment* differs. That's correct behavior — a regulator asking "what was true at T₁₀" gets a different artifact than a regulator asking "what is true now" even when nothing else has changed.
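
The counts in the table, and the T₁₀ versus T∞ distinction, can be reproduced with a toy replay loop. This is a simplified sketch, not the production `replay_until`: the `Event` and `Snapshot` types are hypothetical, grants and `policy_register` are ignored, the cutoff is assumed to be inclusive (`at_ms <= T`), and no state_id hash is computed. Timestamps are taken from the Section 04 scenario.

```rust
use std::collections::BTreeSet;

pub const BASE: u64 = 1_780_440_000_000; // first event timestamp in Section 04

pub enum Event {
    PolicyAmend { policy_id: &'static str, new_version: u32, at_ms: u64 },
    ModelRegister { model_id: &'static str, version: u32, at_ms: u64 },
    Decision { decision_id: &'static str, at_ms: u64 },
    Other { at_ms: u64 }, // policy_register and grant: not tracked in this sketch
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Snapshot {
    pub timestamp_t_ms: u64, // the snapshot's self-reported moment
    pub policy_versions: BTreeSet<(&'static str, u32)>,
    pub model_versions: BTreeSet<(&'static str, u32)>,
    pub decisions: BTreeSet<(u64, &'static str)>, // ordered by (at_ms, decision_id)
}

/// Fold every event with at_ms <= t_ms into a snapshot (inclusive cutoff assumed).
pub fn replay_until(events: &[Event], t_ms: u64) -> Snapshot {
    let mut snap = Snapshot {
        timestamp_t_ms: t_ms,
        policy_versions: BTreeSet::new(),
        model_versions: BTreeSet::new(),
        decisions: BTreeSet::new(),
    };
    for ev in events {
        match ev {
            Event::PolicyAmend { policy_id, new_version, at_ms } if *at_ms <= t_ms => {
                snap.policy_versions.insert((*policy_id, *new_version));
            }
            Event::ModelRegister { model_id, version, at_ms } if *at_ms <= t_ms => {
                snap.model_versions.insert((*model_id, *version));
            }
            Event::Decision { decision_id, at_ms } if *at_ms <= t_ms => {
                snap.decisions.insert((*at_ms, *decision_id));
            }
            _ => {} // later than T, or an untracked kind
        }
    }
    snap
}

/// The 11 events of the L5 scenario, reduced to what this sketch tracks.
pub fn scenario() -> Vec<Event> {
    vec![
        Event::Other { at_ms: BASE },
        Event::ModelRegister { model_id: "model_underwriting", version: 1, at_ms: BASE + 1_000 },
        Event::Other { at_ms: BASE + 2_000 },
        Event::Other { at_ms: BASE + 3_000 },
        Event::PolicyAmend { policy_id: "pol_underwriting", new_version: 1, at_ms: BASE + 4_000 },
        Event::Decision { decision_id: "decision_001", at_ms: BASE + 5_000 },
        Event::Decision { decision_id: "decision_002", at_ms: BASE + 6_000 },
        Event::PolicyAmend { policy_id: "pol_underwriting", new_version: 2, at_ms: BASE + 7_000 },
        Event::ModelRegister { model_id: "model_underwriting", version: 2, at_ms: BASE + 8_000 },
        Event::Decision { decision_id: "decision_003", at_ms: BASE + 9_000 },
        Event::Decision { decision_id: "decision_004", at_ms: BASE + 10_000 },
    ]
}
```

Replaying at `BASE + 10_000` and at `1_800_000_000_000` yields the same three sets but unequal snapshots, because `timestamp_t_ms` is part of the value being compared.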

---

## 06 — Decision lineage (causal chains)

`decision_002.parent_decision_ids == ["decision_001"]` — the human's approval **consumed** the AI's recommendation.

`decision_004.parent_decision_ids == ["decision_003"]` — same pattern, second generation (model v2 / policy v2).

```json
{
  "decision_id": "decision_002",
  "actor_principal": "princ_customer_9",
  "capability": "approve_underwriting",
  "subject": "application_001",
  "model_version_ref": ["model_underwriting", 1],
  "policy_version_ref": ["pol_underwriting", 1],
  "outcome": "approved",
  "parent_decision_ids": ["decision_001"],
  "at_ms": 1780440006000
}
```

Replay walks the lineage automatically. Three years from now a regulator asks "why was application_001 approved?" — the chain is reconstructable: human approved (decision_002) ← AI recommended (decision_001) ← (model v1 + policy v1 in effect at T) ← (grant chain to root).
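
A lineage walk of this kind might look like the sketch below. It is illustrative only: the `Decision` struct and the `lineage` function are hypothetical, the two AI recommendations are assumed to have no parents, and the walk stops at the decision level rather than continuing into model, policy, and grant state.

```rust
use std::collections::HashMap;

/// Hypothetical reduced decision record: just what the walk needs.
pub struct Decision {
    pub decision_id: &'static str,
    pub parent_decision_ids: Vec<&'static str>,
    pub at_ms: u64,
}

/// Walk from `start` back through `parent_decision_ids`.
/// Returns None if an id is unknown or a parent is not strictly earlier
/// than its child, mirroring the `decision_lineage_resolves` check.
pub fn lineage(decisions: &[Decision], start: &str) -> Option<Vec<&'static str>> {
    let by_id: HashMap<&str, &Decision> =
        decisions.iter().map(|d| (d.decision_id, d)).collect();
    let mut chain: Vec<&'static str> = Vec::new();
    let mut stack = vec![*by_id.get(start)?];
    while let Some(d) = stack.pop() {
        if chain.contains(&d.decision_id) {
            continue; // already visited via another parent
        }
        chain.push(d.decision_id);
        for pid in &d.parent_decision_ids {
            let parent = *by_id.get(pid)?;
            if parent.at_ms >= d.at_ms {
                return None; // a parent must precede its child
            }
            stack.push(parent);
        }
    }
    Some(chain)
}

/// The four decisions of the L5 scenario (recommendations assumed parentless).
pub fn scenario_decisions() -> Vec<Decision> {
    vec![
        Decision { decision_id: "decision_001", parent_decision_ids: vec![], at_ms: 1_780_440_005_000 },
        Decision { decision_id: "decision_002", parent_decision_ids: vec!["decision_001"], at_ms: 1_780_440_006_000 },
        Decision { decision_id: "decision_003", parent_decision_ids: vec![], at_ms: 1_780_440_009_000 },
        Decision { decision_id: "decision_004", parent_decision_ids: vec!["decision_003"], at_ms: 1_780_440_010_000 },
    ]
}
```

Starting from `decision_002`, the walk returns the approval followed by the recommendation it consumed; an unknown id returns `None` rather than a partial chain.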

---

## 07 — Replay Confidence Score

The replay result emits a 0–100 completeness score with named checks:

```
Replay Confidence at T₁₀: 72/100

✓ authority_chain               (Critical)  All grants traced to root.
✗ signatures_verified_at_replay (Critical)  Phase E lock: AuthEvent.signature is
                                            stored but not verified at replay
                                            ingestion. Standalone L9 verifier
                                            will close this.
✓ policy_versions_present       (Warning)   Every decision's policy_version_ref
                                            resolves to a registered policy version.
✓ model_versions_present        (Warning)   Every decision's model_version_ref
                                            resolves to a registered model version.
✓ decision_lineage_resolves     (Info)      Every decision's parent_decision_ids
                                            resolve to earlier decisions.
```

**The honesty is the feature.** A regulator/insurer/auditor needs to know HOW COMPLETE a replay is, not just that it ran. The 72/100 score reflects the one open critical gap (Phase E signature verification) honestly. L9 — Evidence Survivability — closes that gap with a standalone verifier.

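Weights: Critical = 3× · Warning = 2× · Info = 1×. Score = passed_weight × 100 / total_weight. At T₁₀ the passed weight is 3 + 2 + 2 + 1 = 8 out of a total of 3 + 3 + 2 + 2 + 1 = 11, so 8 × 100 / 11 ≈ 72.7, reported as 72 (consistent with integer division).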
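
The scoring rule can be sketched in a few lines of Rust. The types and function names here are hypothetical, the integer division is inferred from the published 72 and is not a confirmed implementation detail, and returning 0 for an empty check list is an assumption.

```rust
#[derive(Clone, Copy)]
pub enum Severity {
    Critical,
    Warning,
    Info,
}

impl Severity {
    /// Weights as stated in this report: Critical = 3, Warning = 2, Info = 1.
    pub fn weight(self) -> u32 {
        match self {
            Severity::Critical => 3,
            Severity::Warning => 2,
            Severity::Info => 1,
        }
    }
}

pub struct Check {
    pub name: &'static str,
    pub severity: Severity,
    pub passed: bool,
}

/// Score = passed_weight * 100 / total_weight, using integer division.
pub fn confidence(checks: &[Check]) -> u32 {
    let total: u32 = checks.iter().map(|c| c.severity.weight()).sum();
    let passed: u32 = checks
        .iter()
        .filter(|c| c.passed)
        .map(|c| c.severity.weight())
        .sum();
    if total == 0 {
        return 0; // assumption: no checks means no confidence
    }
    passed * 100 / total
}

/// The five checks reported at T10 in this section.
pub fn t10_checks() -> Vec<Check> {
    vec![
        Check { name: "authority_chain", severity: Severity::Critical, passed: true },
        Check { name: "signatures_verified_at_replay", severity: Severity::Critical, passed: false },
        Check { name: "policy_versions_present", severity: Severity::Warning, passed: true },
        Check { name: "model_versions_present", severity: Severity::Warning, passed: true },
        Check { name: "decision_lineage_resolves", severity: Severity::Info, passed: true },
    ]
}
```

With the five checks above, `confidence` evaluates 800 / 11 and returns 72; if the signature check passed as well, the same function would return 100.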

---

## 08 — Backward compatibility verified

Every prior proof's published `state_id` was reverified under the extended engine and matched byte-for-byte:

| Proof | Published state_id (abbreviated: head … tail) | Reverified |
|---|---|---|
| V101 first proof (regulator-replay tenant) | `96a29047…be4a` | ✓ matches |
| Tokenized Transfer | `cc0d4369…b9b3` | ✓ matches |
| AI-Assisted Transfer (L1) | `1cbd6979…36840` | ✓ matches |
| Agent Authority Envelope (L2) | `b52fe565…dae66` | ✓ matches |
| Agent Supervisor Chain (L3) | `5aefda52…5026d` | ✓ matches |
| Tenant Hierarchy + ASL (L4) | `2a4bf5f6…6217` | ✓ matches |

The `skip_serializing_if = "Vec::is_empty"` and `skip_serializing_if = "Option::is_none"` attributes on the new fields keep canonical JSON byte-identical for tenants that don't emit the new event kinds.

---

## 09 — Known limitations

1. **Phase E lock: signatures are stored but not verified at replay ingestion.** Reported honestly as a failing Critical check in the Replay Confidence score. L9 (Evidence Survivability) closes this with a standalone verifier that signs and verifies independently of the H33 backend.
2. **Policy content is referenced by hash, not stored.** `policy_amend` carries `content_hash`. The full policy text must live in a content-addressable store the verifier can pull from. Not in scope for L5.
3. **Model weights are referenced by hash, not stored.** Same pattern.
4. **Reconstruction-only, not live decision issuance.** Decisions in this proof were seeded via the signing CLI for demonstration. A live deployment will issue decisions through the receipt-issuing service.
5. **ASL time-travel syntax (`descendants(p) AS OF T_past`) is not yet in v1.** The engine supports time-travel today; the ASL surface gets the syntax in v1.1.

---

## 10 — Evidence appendix

| Field | Value |
|---|---|
| Tenant ID | `tenant_time_travel_44962d9b-25f5-5622-bd9a-98d5580bb8a2` |
| Tenant root | `princ_root_time_travel_44962d9b-…` |
| Human actor | `princ_customer_9` |
| AI actor | `princ_ai_underwriter_001` |
| Policy | `pol_underwriting` (v1 → v2, content hashes published in `reconstruction.json`) |
| Model | `model_underwriting` (v1 → v2, weight + training hashes published) |
| Event count | 11 |
| Time-travel snapshots demonstrated | 5 |
| Decisions emitted | 4 (with lineage: `decision_002` → `decision_001`, `decision_004` → `decision_003`) |
| Reconstruction artifact | [`reconstruction.json`](reconstruction.json) |
| Harness | `tests/time_travel_replay_001.rs` (scif-backend @ `d29cc7c33`) |

---

## Independent reconstruction inputs

```bash
H33_TEST_PG_URL='postgres://…?sslmode=require' \
  cargo test --test time_travel_replay_001 -- --ignored --nocapture
```

Expected: all 5 listed state_ids match byte-for-byte; replay confidence at T₁₀ = 72; decision_002 lineage = `[decision_001]`; decision_004 lineage = `[decision_003]`.

---

## Readiness determination

> **First Time Travel Replay (L5): PROVEN IN OPERATION** for one underwriting tenant with 11 signed events, 2 model versions, 2 policy versions, 4 decisions with causal lineage, 5 distinct deterministic state_ids at 5 distinct T values, Replay Confidence honestly reported.

What this unlocks:
- L6 Counterfactual Replay (authority-counterfactual, on the same substrate)
- L7 Authority Drift Detection (graph diffs across T)
- L10 Regulator Mode (portal asks "what was true at T?" and gets a byte-identical answer)
- L11 Organizational Memory (`decisions_by(model_v4)` over years)
- **Proof #12 First Replayable Organization**, the meta-proof Eric named, which reframes the corpus as describing a company as a reconstructable graph

What this does **not** unlock: live decision issuance through the receipt-issuing service (deferred); the ASL time-travel syntax (`AS OF T`); policy content payload storage; standalone independent verification (L9).

---

## Where this proof sits in the ladder

| Level | Proof | Status |
|---|---|---|
| L1 — Agent Recommendation | [first-ai-assisted-transfer](/proofs/first-ai-assisted-transfer/) | proven |
| L2 — Agent Authority Envelope | [first-agent-authority-envelope](/proofs/first-agent-authority-envelope/) | proven |
| L3 — Agent Supervisor Chain | [first-agent-supervisor-chain](/proofs/first-agent-supervisor-chain/) | proven |
| L4 — Tenant-Scoped Agent Hierarchy + ASL v1 | [first-tenant-agent-hierarchy](/proofs/first-tenant-agent-hierarchy/) | proven |
| **L5 — Time Travel Replay** | **this proof** | **proven now** |
| L6 — Counterfactual Replay | TBD | next horizon |
| L7 — Authority Drift Detection | TBD | roadmap |
| L8 — Blast Radius Live API | TBD | partial in ASL v1 |
| L9 — Evidence Survivability | TBD | the moat |
| L10 — Regulator Mode | TBD | depends on L5 |
| L11 — Organizational Memory | TBD | depends on L5 |

The next proof after L5 is **#12 First Replayable Organization** — the meta-proof reframing the entire corpus per Eric's positioning lock.

---

## Version

| Field | Value |
|---|---|
| Report version | v1.0 (Final) |
| Frozen | 2026-06-02 |
| Supersedes | None |
| Superseded by (planned) | `first-counterfactual-replay` (L6) · `first-replayable-organization` (#12 meta) |

---

*Issued by H33, Inc. — Eric Beans, CEO. Independently reconstructable per Section 10.*
