Home / Docs-Technical WhitePaper / 52-Dataset Card Template v1.0
Chapter 8 — Uncertainty & Covariance (Dataset Level)
I. Purpose & Scope
- Standardize modeling, registration, composition, and publication conventions for dataset-level uncertainty and covariance, across Splits/Versioning/Freshness scenarios, ensuring consistent statistical intervals, covariance blocks, and external release conventions.
- For path quantities (arrival time/phase), the text must explicitly show gamma(ell) and d ell, and the data side records delta_form ∈ {general, factored}; publication requires p_dim = 1.0 with check_dim_report.json attached.
II. Prerequisites & Inputs
- Structure & contract: schema.json/contract.yaml (Ch. 4) aligned with TARR, with units/dimensions defined.
- Splits/Versioning/Freshness: split.yaml/split_manifest.json (Ch. 6) and freshness.policy ready; expired samples isolated.
- Metrology & parameters: align cov_group/Σ and coverage ∈ {k, alpha, quantile} with the Error Budget; version/freshness aligned with the Parameter Card.
- Citations & versions: “volume + version + anchor (P/S/M/I)”, anchor coverage ≥ 90%.
III. Dataset-Level UQ Modeling
- Delta (first-order): for field vector y=f(x), u^2(y) ≈ J · Σ · Jᵀ, with J=∂f/∂x|_{x̂}; use when near-linear and residuals near-Gaussian.
- MC/Bootstrap: B ≥ 10^4; for heavy tails/heteroscedasticity, use robust quantiles (e.g., P2.5–P97.5) or Huber surrogates; report convergence diagnostics.
- Composition:
- Uncorrelated: u_c = √(∑ u_i^2);
- Correlated: u_c^2 = ∑ u_i^2 + 2∑_{i<j} ρ_{ij}u_i u_j, or compose via covariance blocks.
- Coverage harmonization: choose one across data & publication — k coverage / alpha significance / quantile[p_lo,p_hi].
IV. Covariance Modeling
- Intra-group: within a cov_group, use block structures or kernels (exp(−|Δx|/L_c), AR(1), Matérn, constant ρ).
- Inter-group: default independent; if coupling exists (e.g., temperature coefficient vs refractive index), register cross-covariance explicitly and align with the Error Budget.
- Path functions: for n_eff(ell), build Σ(ℓ_i,ℓ_j) via path kernel K(Δℓ), specify σ^2/L_c, and honor step & alignment constraints from Ch. 4/5.
- Numerical stability: ensure Σ positive-definite; add jitter Σ ← Σ + εI (ε ≈ 1e−6~1e−3) if needed.
V. Normative Path Forms
- Arrival time (two equivalent):
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ); T_arr = ( ∫ ( n_eff / c_ref ) d ell ). - Phase:
Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell ).
In text, explicitly show gamma(ell) and d ell; record delta_form; arrays satisfy len(gamma_ell)=len(d_ell)=len(n_eff)≥2.
VI. Coupling with Splits/Versioning/Freshness
- Per-split estimation: estimate u/Σ within train/val/test/holdout/slice_k separately to avoid cross-split leakage/bias.
- Version migration: for MAJOR/MINOR/PATCH upgrades, provide u/Σ change notes and migration/rollback; log deltas in audit.jsonl.
- Freshness impact: when clock_state="locked" and |ts_start − calib.timestamp| ≤ τ_calib, use baseline u/Σ; otherwise increase uncertainty or isolate as [Restricted].
VII. Field-Level Registration
Attach in data:- uncertainty{ type(A|B|A/B), estimate, distribution, coverage };
- cov_group (e.g., timing|optics|medium|algo|env|geo);
- (optional) cov_model{ kernel, params } and references[].
Field example (snippet)
obs:
T_arr:
unit: "s"
uq:
type: "A"
estimate: 1.5e-9
distribution: "normal"
coverage: { k: 2 }
cov_group: "medium"
Phi:
unit: "rad"
uq:
type: "A/B"
estimate: 0.012
distribution: "student"
coverage: { quantile: [0.025, 0.975] }
cov_group: "medium"
VIII. Gate Mapping (dataset level)
- G4 | Dimensional closure: pass I70-dim_check, p_dim = 1.0.
- G6 | Coverage: coverage ∈ {k, alpha, quantile} aligned with publication.
- G7 | Covariance consistency: cov_group/Σ aligned with Error Budget; Σ PD.
- G1/G3/G5/G8: accompany structure/path/freshness/uniqueness checks (Ch. 7).
- Trigger S1–S5 (dimensional/path/freshness/covariance/citation failures) to reject or [Restricted].
IX. Machine-Readable Configs
A. dataset_uq.yaml
version: "1.0.0"
targets: ["T_arr","Phi","epsilon_flux","Q_res","p_dim"]
methods:
T_arr: { type: "delta", jacobian: "auto", cov_group: "medium" }
Phi: { type: "mc", draws: 10000, coverage: { quantile: [0.025, 0.975] } }
covariance:
medium: { kernel: "exp", params: { sigma2: 9.0e-6, L_c_m: 25.0 } }
coverage:
mode: "k" # k|alpha|quantile
k: 2
split_scope: "per_split" # per_split|global|per_slice
freshness:
policy: { tau_calib_s_max: 86400, clock_state: "locked" }
outputs:
attach: ["uq_summary.json","cov_blocks.json"]
B. uq_summary.json (example)
{
"split": "test",
"T_arr": { "point": 1.23e-8, "U_k2": 1.5e-9 },
"Phi": { "median": 0.035, "q025": 0.028, "q975": 0.043 },
"epsilon_flux": { "p95": 0.011 },
"Q_res": 0.13
}
X. Validation & Monitoring
- /validate: output per-split summaries of u/Σ, coverage mode vs gate thresholds, PD checks, and stops_triggered.
- Online KPIs: u(T_arr), U = k·u_c, interval_overlap, spectral radius of Σ, Q_res, p_dim.
- Alerts: PD failure, interval instability, coverage mismatch, gate breaches; support suppression & escalation.
XI. Anti-Patterns & Fixes
- Anti: reporting means without intervals → Fix: add k/alpha/quantile intervals with convergence diagnostics.
- Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: parenthesize to the normative form.
- Anti: Σ not PD or inconsistent with cov_group → Fix: adjust kernel/params or switch to robust surrogate; align with Error Budget.
- Anti: inconsistent coverage between data & publication → Fix: unify mode & params.
XII. Cross-References
- Structure & Schema: Ch. 4; Splits/Versioning/Freshness: Ch. 6; Gates & Integrity: Ch. 7.
- Pipeline Card: UQ coupling (Ch. 10).
- Error Budget Card: covariance & propagation (Ch. 5/6), intervals & thresholds (Ch. 8/9).
XIII. Checklist
- dataset_uq.yaml declares methods (Delta/MC/Bootstrap), coverage, and split scope; field-level uncertainty and cov_group registered.
- For path quantities, explicit gamma/measure/delta_form; I70-dim_check passed, p_dim = 1.0.
- Σ PD and aligned with Error Budget; uq_summary.json passes gate comparisons.
- /validate lists intervals, PD check, and stops_triggered; online monitoring covers u/Σ/Q_res/p_dim.
- Release bundle includes uq_summary.json/cov_blocks.json/check_dim_report.json with signatures; citations & versions compliant (anchor coverage ≥ 90%).
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11|Current version:v5.1
License link:https://creativecommons.org/licenses/by/4.0/