Chapter 7 — State, Idempotency & Fault Tolerance (Transactions / Retry / Replay)
I. Purpose & Scope
- Unify modeling and validation conventions for state machines, idempotency, and fault tolerance (transactions/retry/replay) to ensure recoverability and auditability under failures/jitter/redundant triggers.
- For stages involving path quantities (arrival time/phase), the text must explicitly show gamma(ell) and d ell, with delta_form ∈ {general, factored} recorded on the data side; publication requires p_dim = 1.0.
II. Prerequisites & Inputs
- Contracts complete: inbound/outbound schemas aligned with TARR, units/dimensions clear and passed I70-dim_check.
- Sync & timebase: Chapter 5 satisfied (clock_state="locked", |ts_start − calib.timestamp| ≤ τ_calib).
- Citations & versions: use “volume + version + anchor (P/S/M/I)”, with anchor coverage ≥ 90%.
- Path consistency: len(gamma_ell)=len(d_ell)=len(n_eff)≥2, Δell compliant; missing items are rejected.
III. State Machine
- States: state ∈ {pending, running, succeeded, failed, rolled_back}.
- Basic transitions:
  - pending → running: all deps succeeded and gates G1–G3 passed.
  - running → succeeded: stage gates G1–G8 all passed.
  - running → failed: any of S1–S5 (dimensional/freshness/path/covariance/citation failure).
  - failed → rolled_back: checkpoint or compensation exists and audit is complete.
- Invariants: transitions must not violate upstream artifact consistency; rollback must match checkpoints and be replayable.
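The transition rules above can be sketched as a small guard function. This is a minimal illustration, not a reference implementation: the function name next_state and its keyword arguments are hypothetical, gates and stops are assumed to arrive as sets of labels such as "G1" or "S3", and a triggered stop is assumed to take precedence over gate results.

```python
from enum import Enum


class State(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ROLLED_BACK = "rolled_back"


G1_G3 = {"G1", "G2", "G3"}
G1_G8 = {f"G{i}" for i in range(1, 9)}


def next_state(state, *, deps_succeeded=False, gates_passed=frozenset(),
               stops_triggered=frozenset(), has_checkpoint=False,
               has_compensation=False, audit_complete=False):
    """Return the state reached from `state`, or `state` itself if no guard holds."""
    passed = set(gates_passed)
    if state is State.PENDING and deps_succeeded and G1_G3 <= passed:
        return State.RUNNING
    if state is State.RUNNING:
        if stops_triggered:
            # Assumption: any of S1..S5 wins over gate results.
            return State.FAILED
        if G1_G8 <= passed:
            return State.SUCCEEDED
    if (state is State.FAILED and (has_checkpoint or has_compensation)
            and audit_complete):
        return State.ROLLED_BACK
    return state
```

Because the function returns the unchanged state when no guard holds, a caller can apply it repeatedly without risking a transition that is not in the list.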
IV. Transactions & Errors
- Transactional boundary: input read → compute → output write is one atomic commit; external side effects must be compensable.
- Error classes:
  - E_INPUT (contract/type/window errors)
  - E_DIM (dimensional failure)
  - E_GATE (quality gate failure)
  - E_SYNC (unlock/offset over threshold)
  - E_UQ (non-PD covariance/coverage mismatch)
  - E_INTERNAL (internal exception)
- Handling: compensable errors prefer rollback + retry; non-compensable errors mark failed and route to human review.
V. Idempotency
- Idempotency key: idempotency_key = f(run_id, partition, window, …); repeated triggers with the same key must not change outputs.
- Exactly-Once vs At-Least-Once: default At-Least-Once; declaring Exactly-Once requires dedupe evidence and idempotent writes (primary key/merge-write).
- Input snapshot: persist input_hashes[] to guarantee determinism under retry/replay.
- Side-effect isolation: external calls (API/DB) must be idempotent or have compensation logs (undo messages).
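One possible reading of the key, snapshot, and write rules is sketched below. Everything here is an assumption made for illustration: the key is derived from only the three named fields (run_id, partition, window), SHA-256 is chosen arbitrarily as the digest, and IdempotentStore is a hypothetical in-memory stand-in for a primary-key or merge-write sink.

```python
import hashlib
import json


def idempotency_key(run_id, partition, window):
    """Stable key from (run_id, partition, window); field order is fixed."""
    payload = json.dumps([run_id, partition, window], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def input_hashes(inputs):
    """SHA-256 digest per input blob, ordered by input name for determinism."""
    return [hashlib.sha256(inputs[name]).hexdigest() for name in sorted(inputs)]


class IdempotentStore:
    """Hypothetical in-memory sink keyed by idempotency_key."""

    def __init__(self):
        self._rows = {}
        self.conflicts = 0

    def write(self, key, hashes, output):
        existing = self._rows.get(key)
        if existing is None:
            self._rows[key] = (hashes, output)
            return "written"
        if existing[0] != hashes:
            # Same key but a different input snapshot: refuse to overwrite.
            self.conflicts += 1
            return "conflict"
        # Repeated trigger with the same key: stored output stays unchanged.
        return "duplicate"

    def read(self, key):
        return self._rows[key][1]
```

A repeated trigger with the same key and snapshot returns "duplicate" and leaves the first output in place; the conflicts counter is one way to feed an idempotency_conflicts metric.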
VI. Retry Strategy
- Policy fields: retry_policy = { max_retries, backoff ∈ {const, exp, jitter}, deadline }.
- Invariants: retries do not change idempotency_key or the input snapshot; exceeding deadline marks failed.
- Backoff guidance: exp + jitter to protect upstream/shared resources; record attempt count and intervals.
- With gates: for E_GATE, branch to degraded paths per thresholds; for E_DIM, forbid blind retries: fix the dimensional error first, then re-trigger.
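The retry policy can be illustrated with a short sketch. It assumes "exp + jitter" means exponential backoff with full jitter (a delay drawn uniformly between 0 and the exponential bound), that the deadline is expressed in seconds, and that stage errors carry one of the error class names from Section IV. StageError, run_with_retry, and the base and cap parameters are hypothetical names, not part of the template.

```python
import random
import time


class StageError(Exception):
    def __init__(self, error_class, message=""):
        super().__init__(message or error_class)
        self.error_class = error_class


NO_BLIND_RETRY = {"E_DIM"}  # fix first, then re-trigger


def backoff_delays(max_retries, base=1.0, cap=60.0, rng=random):
    """Yield one delay per retry: uniform(0, min(cap, base * 2**attempt))."""
    for attempt in range(max_retries):
        yield rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def run_with_retry(stage, *, max_retries=3, deadline_s=600.0, base=1.0,
                   sleep=time.sleep, clock=time.monotonic, rng=random):
    """Call stage() until it succeeds, retries run out, or the deadline would pass.

    Returns (result, delays); delays records the interval slept before each retry.
    On exhaustion the last StageError propagates so the caller can mark `failed`.
    """
    start = clock()
    delays = []
    schedule = backoff_delays(max_retries, base=base, rng=rng)
    while True:
        try:
            return stage(), delays
        except StageError as err:
            if err.error_class in NO_BLIND_RETRY:
                raise
            delay = next(schedule, None)
            if delay is None or clock() - start + delay > deadline_s:
                raise
            delays.append(delay)
            sleep(delay)
```

The stage callable is expected to close over its own idempotency_key and input snapshot, so a retry re-runs exactly the same unit of work. The returned delays list covers the "record attempt count and intervals" guidance.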
VII. Replay & Checkpoint
- Checkpointing: enable checkpoint: true on critical stages, saving minimal sufficient state and output digest.
- Replay constraints: replay in reverse topological order; if side effects exist, compensate before replay; replay still must pass G1–G8.
- Consistency: replay outputs (hash/primary-key sets) must match the original; any divergence ends in rolled_back and is reported.
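The replay ordering and consistency check could look like the following sketch. It assumes the stage graph is given as a mapping from each stage to the set of stages it depends on, and that each run leaves a record with a "hash" digest and a "primary_keys" collection; those record field names and the function names are illustrative only. The ordering relies on graphlib from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def replay_order(deps):
    """Stage order for replay: the reverse of a topological order of `deps`.

    `deps` maps each stage to the set of stages it depends on.
    """
    return list(reversed(list(TopologicalSorter(deps).static_order())))


def verify_replay(original, replayed):
    """Compare output digest and primary-key set; return the divergent fields."""
    problems = []
    if original["hash"] != replayed["hash"]:
        problems.append("hash")
    if set(original["primary_keys"]) != set(replayed["primary_keys"]):
        problems.append("primary_keys")
    return problems


def replay_outcome(original, replayed):
    """Map the verification result to a final state plus the divergences to report."""
    problems = verify_replay(original, replayed)
    return ("rolled_back", problems) if problems else ("succeeded", [])
```

Compensation of side effects and the G1–G8 checks are outside this sketch; it only covers the ordering and the divergence test.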
VIII. Path-Aware Requirements
- Required inputs: gamma(ell), d ell, n_eff(ell), c_ref, and (for phase) λ_ref; record delta_form.
- Unified forms:
  T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell );
  T_arr = ( ∫ ( n_eff / c_ref ) d ell );
  Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell ).
- Alignment order: time alignment → path alignment (gamma_ell/d_ell/n_eff sync) → phase alignment (reference window).
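As a numerical illustration of the unified forms, the sketch below approximates each path integral by a plain sum of n_eff[i] * d_ell[i] over sampled path elements. That discretization is an assumption, as is the mapping of delta_form values: "factored" is taken to mean the form with 1 / c_ref outside the integral, and "general" the form with n_eff / c_ref inside it. The function names are hypothetical.

```python
import math


def check_path(gamma_ell, d_ell, n_eff):
    """Reject paths violating len(gamma_ell) = len(d_ell) = len(n_eff) >= 2."""
    if not (len(gamma_ell) == len(d_ell) == len(n_eff) >= 2):
        raise ValueError("path arrays must have equal length >= 2")


def t_arr(n_eff, d_ell, c_ref, delta_form="factored"):
    """Arrival time as a discrete sum over path elements."""
    if delta_form == "factored":
        return (1.0 / c_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))
    if delta_form == "general":
        return sum((n / c_ref) * dl for n, dl in zip(n_eff, d_ell))
    raise ValueError(f"unknown delta_form: {delta_form!r}")


def phi(n_eff, d_ell, lambda_ref):
    """Phase as (2*pi / lambda_ref) times the same discrete path sum."""
    return (2.0 * math.pi / lambda_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))
```

With a constant c_ref the two T_arr forms agree up to floating-point rounding, which gives a cheap consistency check between the recorded delta_form and the computed value.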
IX. Gates & Stops
- G1 Schema completeness | G2 Citation compliance | G3 Path conventions | G4 Dimensional closure | G5 Freshness | G6 Coverage | G7 Covariance consistency | G8 Uniqueness.
- Triggering any of S1–S5 (dimensional/freshness/path/covariance/citation failure) stops the stage and rolls it back; enter [Restricted] when necessary.
X. Machine-Readable Configs
A. state_machine.yaml (extended)
version: "1.0.0"
states: [pending, running, succeeded, failed, rolled_back]
transitions:
- { from: pending, to: running, when: "deps_succeeded && G1..G3" }
- { from: running, to: succeeded, when: "G1..G8 && !any(S1..S5)" }
- { from: running, to: failed, when: "any(S1..S5)" }
- { from: failed, to: rolled_back, when: "has_checkpoint && do_compensate" }
retry_policy: { max_retries: 3, backoff: "exp+jitter", deadline: "10m" }
idempotency_key: "run_id+partition+window"
B. replay_plan.yaml
version: "1.0.0"
replay:
  enabled: true
  order: "reverse_topology"
  require_checkpoint: true
  compensate_before_replay: true
  verify:
    hash: true
    primary_keys: true
alerts:
  on_divergence: ["page_ops","open_ticket"]
C. Audit event audit.jsonl (sample line)
XI. Validation & Monitoring
- /validate: return current stage state, gate pass rates, stops_triggered, and retry/replay stats; dimensional closure per check_dim_report.json.
- Online KPIs: Latency_P50/P95, Throughput, retry_count, replay_count, idempotency_conflicts, σ_y(τ), δt_abs, Q_res, p_dim.
- Alerts: state flapping, retry storms, non-PD covariance, path desync, unlock, and gate breaches; support suppression and escalation.
XII. Anti-Patterns & Fixes
- Anti: retry changes idempotency_key → Fix: keep the key constant across retries and preserve the input snapshot.
- Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
- Anti: writing when clock_state!="locked" → Fix: reject the write, or fall back and tag the output [Restricted].
- Anti: replay without checkpoints → Fix: add checkpoints & compensation; or restrict to read-only dry runs.
XIII. Release & Layout
PTN_EXPORT/
  configs/
    state_machine.yaml
    replay_plan.yaml
  reports/
    check_dim_report.json
    validate_report.json
    audit.jsonl
  figs/
    state_transitions.svg
    retry_replay_timeline.pdf
  report_manifest.yaml
  SIGNATURE.asc
XIV. Cross-References
- Architecture & graph: Ch. 3; Inbound contracts: Ch. 4; Timebase/Sync/Buffering: Ch. 5; Stage control: Ch. 6; Gates/monitoring: Ch. 9; UQ loop: Ch. 10.
- Parameter / Error / Experimental Protocol templates: see respective chapters.
XV. Checklist
- State machine complete, transition conditions machine-verifiable; retry_policy, idempotency_key, checkpoint configured.
- Transaction boundary defined; external side effects compensable or idempotent; audit events complete.
- For path stages: explicit gamma/measure/delta_form; Δell & f_s constraints satisfied; phase aligned in reference window.
- I70-dim_check passed, p_dim = 1.0; clock_state="locked", τ_calib compliant.
- /validate passes G1–G8; on S1–S5, fallback & alerts defined; release bundle signed with checksums; citation anchor coverage ≥ 90%.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/