Reproducibility Checklist Template v1.0
Chapter 9 — Metrics, Intervals & Gates (Alignment Decisions)
I. Purpose & Scope
- Define the metric set, the coverage (interval) conventions, the quality gates, and the alignment decision rules for reproduction. The chapter covers point estimates & intervals, tolerances & equivalence rules, gate–threshold mapping, and reporting format, so that results are comparable, auditable, and releasable.
- For path quantities (arrival/phase), explicitly show gamma(ell) and d ell in text; record delta_form ∈ {general, factored} on the data side; parenthesize all expressions; publication requires p_dim = 1.0 with check_dim_report.json.
II. Inputs & Dependencies
- Depends on: Ch. 4 (Environment Lock), Ch. 5 (Data Snapshot), Ch. 6 (Weights/Params/Freshness), Ch. 7 (Scripts & Commands), Ch. 8 (Seeds/Randomness/Determinism).
- Cross-volume alignment: Error Budget Card (intervals/thresholds), Model Card Ch.7/Ch.8, Dataset Card Ch.11, Pipeline Card Ch.9.
- Citations use the form “volume + version + anchor (P/S/M/I)”; anchor coverage must be ≥ 90%.
III. Metric Set
- Primary metrics: MAE, RMSE, task-specific AUC/ACC, Latency_P95, Throughput, Q_res (robust residual), p_dim (=1).
- Path-related (if applicable): ΔT_arr (s), r_phi (1), ε_flux (1).
- Statistical windows: explicitly annotate @window=… and strata (batch/device/region/slice_k), unify sampling & aggregation conventions.
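As a rough illustration of the point metrics and the per-stratum convention above, the sketch below computes MAE, RMSE, and a percentile (usable for Latency_P95), plus MAE grouped by stratum. The function names, the (stratum, ref, pred) record layout, and the linear-interpolation percentile rule are assumptions made for this example; the chapter itself does not fix them.

```python
import math
from collections import defaultdict
from statistics import mean


def mae(ref, pred):
    """Mean absolute error between paired reference and predicted values."""
    return mean(abs(p - r) for r, p in zip(ref, pred))


def rmse(ref, pred):
    """Root-mean-square error between paired values."""
    return math.sqrt(mean((p - r) ** 2 for r, p in zip(ref, pred)))


def percentile(values, p):
    """Percentile (p in [0, 100]) with linear interpolation (assumed rule)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("percentile of empty data")
    rank = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


def mae_by_stratum(records):
    """MAE per stratum; records are (stratum, ref, pred) tuples (assumed layout)."""
    groups = defaultdict(lambda: ([], []))
    for stratum, ref, pred in records:
        groups[stratum][0].append(ref)
        groups[stratum][1].append(pred)
    return {s: mae(r, p) for s, (r, p) in groups.items()}
```

Percentile conventions differ between tools, so whichever rule is used should be recorded next to the @window annotation.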
IV. Intervals & Coverage
- Choose exactly one mode across the volume:
  - k coverage (expanded): U = k·u_c;
  - alpha confidence: t_{ν,1−α/2} or normal approximation;
  - quantile [p_lo, p_hi] (e.g., [0.025, 0.975]).
- Reporting: every key metric must include point estimate + interval; figures use error bars/bands and caption the coverage mode & parameters.
- Small-sample DOF: for difference quantities (Delta), use Welch–Satterthwaite degrees of freedom.
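The three coverage modes and the Welch–Satterthwaite rule can be sketched as follows. This is a minimal illustration under stated assumptions: all function names are hypothetical, the alpha mode uses only the normal approximation (the t quantile is not available in the Python standard library), and the quantile mode assumes linear interpolation between order statistics.

```python
import math
from statistics import NormalDist, mean, stdev


def interval_k(estimate, u_c, k=2.0):
    """k-coverage mode: expanded uncertainty U = k * u_c around the estimate."""
    big_u = k * u_c
    return (estimate - big_u, estimate + big_u)


def interval_alpha_normal(samples, alpha=0.05):
    """alpha-confidence mode for a mean, using the normal approximation."""
    m = mean(samples)
    se = stdev(samples) / math.sqrt(len(samples))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (m - z * se, m + z * se)


def interval_quantile(samples, p_lo=0.025, p_hi=0.975):
    """quantile mode: empirical [p_lo, p_hi] band (linear interpolation, assumed)."""
    xs = sorted(samples)

    def q(p):
        rank = (len(xs) - 1) * p
        lo, hi = math.floor(rank), math.ceil(rank)
        return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)

    return (q(p_lo), q(p_hi))


def welch_satterthwaite_dof(s1, n1, s2, n2):
    """Effective degrees of freedom for a difference of two sample means."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
```

With equal sample sizes and equal standard deviations the effective degrees of freedom reduce to n1 + n2 − 2, which is a quick sanity check on any implementation.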
V. Alignment Decisions & Tolerances
- Numeric tolerances: define τ_mae, τ_rmse, τ_auc, τ_lat, τ_thr, etc.; accept if |m_repro − m_ref| ≤ τ_m.
- Interval overlap: reproduction intervals must overlap the reference intervals or lie within the same coverage band; non-overlap ⇒ mismatch.
- Curve consistency: convergence/power/performance curves must lie within tolerance bands; provide Hausdorff/MAD or band-width deltas.
- Platform consistency: on the same platform and libraries, results must match bitwise or within ULP ≤ N; across platforms, small numeric drift is allowed, but results must stay in the same coverage interval.
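The numeric-tolerance, interval-overlap, and ULP rules above might be checked along these lines. The helper names are hypothetical, intervals are assumed to be closed (lo, hi) pairs, and the ULP distance assumes finite IEEE 754 float64 values.

```python
import struct


def within_tolerance(m_repro, m_ref, tau):
    """Accept if |m_repro - m_ref| <= tau."""
    return abs(m_repro - m_ref) <= tau


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def ulp_distance(a, b):
    """Count of representable float64 steps between two finite values."""

    def ordered(x):
        # Reinterpret the float64 bits as a signed 64-bit integer, then map
        # negative floats so that integer order matches numeric order.
        (i,) = struct.unpack("<q", struct.pack("<d", x))
        return i if i >= 0 else -(i & 0x7FFFFFFFFFFFFFFF)

    return abs(ordered(a) - ordered(b))
```

A same-platform check would then read `ulp_distance(x_repro, x_ref) <= N`, with N taken from the project's own tolerance settings. Differences that sit exactly at τ_m can flip under floating-point rounding, so it may be worth recording the computed difference alongside the verdict.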
VI. Normative Path Forms
- Arrival (two equivalent forms):
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
T_arr = ( ∫ ( n_eff / c_ref ) d ell )
- Phase accumulation:
Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell )
- Before decisions, align time → path → phase; ensure len(gamma_ell)=len(d_ell)=len(n_eff)≥2; echo delta_form; require p_dim = 1.0.
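A discrete version of the path forms above can be sketched as below. It assumes that d_ell holds one path element per sample, so each integral becomes the sum of n_eff[i] * d_ell[i]; it also assumes that "factored" names the form with 1 / c_ref pulled outside the integral and "general" the form with it inside. Both are illustrative readings, and the function names are hypothetical.

```python
import math


def check_path_arrays(gamma_ell, d_ell, n_eff):
    """Enforce len(gamma_ell) = len(d_ell) = len(n_eff) >= 2."""
    n = len(gamma_ell)
    if not (n == len(d_ell) == len(n_eff)):
        raise ValueError("path arrays must have equal length")
    if n < 2:
        raise ValueError("path needs at least 2 samples")


def t_arr_factored(n_eff, d_ell, c_ref):
    """T_arr = ( 1 / c_ref ) * ( sum of n_eff * d_ell )."""
    return (1.0 / c_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))


def t_arr_general(n_eff, d_ell, c_ref):
    """T_arr = ( sum of ( n_eff / c_ref ) * d_ell )."""
    return sum((n / c_ref) * dl for n, dl in zip(n_eff, d_ell))


def phase(n_eff, d_ell, lambda_ref):
    """Phi = ( 2*pi / lambda_ref ) * ( sum of n_eff * d_ell )."""
    return (2.0 * math.pi / lambda_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))
```

The two arrival forms agree only up to floating-point rounding, so a comparison between them should use a tolerance rather than exact equality.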
VII. Gate Mapping & Decision
- G1 Schema completeness | G2 Citation compliance | G3 Path conventions | G4 Dimensional closure | G5 Freshness | G6 Coverage consistency | G7 Covariance consistency | G8 Uniqueness & acyclicity.
- Threshold examples (aligned with Error Budget / Model Card):
  - |ΔT_arr| + U(T_arr) ≤ τ_T;
  - LB(r_phi) ≥ r_phi_min;
  - P95(ε_flux) ≤ ε_flux_guard;
  - Latency_P95 ≤ SLA, Throughput ≥ SLO.
- Release rule: core gates pass and all key metrics (point + interval) meet thresholds → Pass; else Fail / [Restricted] (qualitative only).
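The threshold examples and the release rule could be combined roughly as follows. The dictionary keys and function names are hypothetical, and the sketch returns only "pass" or "fail"; choosing between Fail and [Restricted] is left to the caller.

```python
def threshold_checks(m, th):
    """Evaluate the example thresholds; m holds measured values, th holds limits."""
    return {
        "T_arr": abs(m["delta_t_arr"]) + m["U_t_arr"] <= th["tau_T"],
        "r_phi": m["r_phi_lb"] >= th["r_phi_min"],
        "eps_flux": m["eps_flux_p95"] <= th["eps_flux_guard"],
        "latency": m["latency_p95"] <= th["sla"],
        "throughput": m["throughput"] >= th["slo"],
    }


def release_decision(gates, checks):
    """Pass only if every core gate and every threshold check holds."""
    return "pass" if all(gates.values()) and all(checks.values()) else "fail"
```

Keeping the per-check booleans, rather than only the final verdict, makes it easier to report which threshold caused a failure.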
VIII. Machine-Readable Specs
A. eval/compare_spec.yaml
version: "1.0.0"
coverage: { mode: "k", k: 2 } # k|alpha|quantile
metrics:
  mae: { tolerance: 1.0e-4 }
  auc: { tolerance: 2.0e-3 }
  r_phi: { lb95_min: 0.60 }
  delta_t_arr_s: { guard: "tau_T_s" }
  epsilon_flux_p95: { guard: 0.02 }
  latency_p95_s: { guard: 0.200 }
rules:
  interval_overlap_required: true
  same_coverage_band_required: true
B. reports/validate_report.json (excerpt)
{
  "gates": {"G1": true, "G2": 0.94, "G3": true, "G4": true, "G5": true, "G6": true, "G7": true, "G8": true},
  "metrics": {
    "MAE": {"ref": 0.0123, "repro": 0.0124, "within_tol": true},
    "Latency_P95_s": {"ref": 0.182, "repro": 0.188, "within_guard": true}
  },
  "intervals": {
    "r_phi": {"ref": [0.61, 0.80], "repro": [0.62, 0.79], "overlap": true}
  },
  "decision": "pass"
}
C. Figure exports: figs/metric_curves.{pdf,png}, figs/interval_bands.{svg,png}—captions include units & coverage mode.
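A report shaped like the excerpt in B can be re-checked independently of its recorded "decision" field. The sketch below is one possible cross-check, not the validator itself: it assumes that a numeric gate value (such as G2) is a coverage fraction compared against 0.90, following the anchor-coverage rule in Section II, and it recomputes interval overlap from the bounds instead of trusting the stored flag. The function name and the g2_min parameter are hypothetical.

```python
import json

REPORT = """
{
  "gates": {"G1": true, "G2": 0.94, "G3": true, "G4": true,
            "G5": true, "G6": true, "G7": true, "G8": true},
  "metrics": {
    "MAE": {"ref": 0.0123, "repro": 0.0124, "within_tol": true},
    "Latency_P95_s": {"ref": 0.182, "repro": 0.188, "within_guard": true}
  },
  "intervals": {
    "r_phi": {"ref": [0.61, 0.80], "repro": [0.62, 0.79], "overlap": true}
  },
  "decision": "pass"
}
"""


def recheck(report, g2_min=0.90):
    """Recompute pass/fail from gates, metric flags, and interval bounds."""
    # Booleans are checked first because bool is a subclass of int in Python.
    gates_ok = all(
        v if isinstance(v, bool) else v >= g2_min
        for v in report["gates"].values()
    )
    metrics_ok = all(
        flag
        for entry in report["metrics"].values()
        for key, flag in entry.items()
        if key.startswith("within_")
    )
    overlap_ok = all(
        e["ref"][0] <= e["repro"][1] and e["repro"][0] <= e["ref"][1]
        for e in report["intervals"].values()
    )
    return "pass" if gates_ok and metrics_ok and overlap_ok else "fail"


report = json.loads(REPORT)
consistent = recheck(report) == report["decision"]
```

A mismatch between the recomputed verdict and the stored "decision" would indicate a stale or hand-edited report.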
IX. Anti-Patterns & Fixes
- Anti: reporting means only, no intervals → Fix: add U = k·u_c or quantile bands with convergence diagnostics.
- Anti: T_arr = ∫ n_eff / c_ref d ell (no parentheses) → Fix: parenthesized unified form.
- Anti: cross-volume coverage mode mismatch → Fix: unify single mode and declare in manifests & captions.
- Anti: cross-platform results fall in different coverage bands → Fix: tighten tolerances or adopt stable/high-precision algorithms until aligned.
- Anti: unequal path array lengths or missing delta_form → Fix: equalize lengths and echo metadata.
X. Cross-References
- Ch. 3 (Layout & Artifacts), Ch. 5 (Data Snapshot), Ch. 6 (Weights/Params), Ch. 7 (Scripts), Ch. 8 (Seeds/Determinism), Ch. 10 (Reproduction Flow).
- Model Card Ch.7/Ch.8; Error Budget Card Ch.8/Ch.9; Dataset Card Ch.11; Pipeline Card Ch.9.
XI. Checklist
- compare_spec.yaml aligned with cross-volume conventions; coverage mode locked.
- All key metrics provide point + interval; tolerances & thresholds explicit; convergence diagnostics complete.
- Path alignment states gamma(ell), the measure d ell, and delta_form explicitly; len(path) ≥ 2 and Δell compliant.
- check_dim_report.json passed, p_dim = 1.0; /validate passed G1–G8.
- Figures dual-exported with units, see[]/version, and coverage notes; non-compliances tagged [Restricted] and handled.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/