Dataset Card Template v1.0
Chapter 11 — Visualization, Benchmarks & Comparative Scoring (Bench/Score)
I. Purpose & Scope
- Standardize the fields, charts, and release conventions used for visualizations, benchmarks, and comparative scoring, so that scale, distribution, quality, and path-profile plots are consistent, benchmark tasks are reproducible, and scores and their gate mappings are transparent and auditable.
- For visualizations and scoring that involve path quantities (arrival time, phase), the text must explicitly show gamma(ell) and d ell, and the data side must record delta_form ∈ {general, factored}. All expressions use parenthesized forms, and publication requires p_dim = 1.0.
II. Prerequisites & Inputs
- Structure & contract: schema.json and contract.yaml (Ch. 4) are consistent and have passed I70-dim_check.
- Splits/Versioning/Freshness: split.yaml and split_manifest.json (Ch. 6) are ready; freshness.policy is active.
- Gate status: /validate has passed G1–G8 (Ch. 7); non-compliant items are tagged [Restricted] where necessary.
- Metrology & coverage: aligned with the Error Budget (cov_group/Σ; coverage mode ∈ {k, alpha, quantile}).
- Citations & versions: all figures and scoring manifests cite as “volume + version + anchor (P/S/M/I)”, with anchor coverage ≥ 90%.
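The anchor-coverage requirement can be checked mechanically. Below is a minimal sketch, assuming each citation is a string of the form `<volume> v<version>:<anchor>` like the `see` entries in scorecard.json (Section VII). The regular expression and the helper names are hypothetical, and the sketch does not enforce the P/S/M/I anchor prefixes.

```python
import re

# Hypothetical citation grammar: "<volume> v<major.minor[.patch]>:<anchor>",
# e.g. "EFT.WP.Core.Equations v1.1:S20-1".
CITATION = re.compile(r"^(?P<volume>\S+) v(?P<version>\d+(?:\.\d+)*):(?P<anchor>\S+)$")


def parse_citation(text):
    """Return (volume, version, anchor), or None if any part is missing."""
    m = CITATION.match(text.strip())
    if m is None:
        return None
    return m.group("volume"), m.group("version"), m.group("anchor")


def anchor_coverage(citations):
    """Fraction of citations that carry volume, version, and anchor."""
    if not citations:
        return 0.0
    complete = sum(1 for c in citations if parse_citation(c) is not None)
    return complete / len(citations)


def citations_pass(citations, threshold=0.90):
    """Apply the anchor coverage >= 90% requirement."""
    return anchor_coverage(citations) >= threshold


see = ["EFT.WP.Core.Equations v1.1:S20-1", "Data.Benchmarks v1.0:PROTO"]
print(anchor_coverage(see))
print(citations_pass(see + ["Data.Benchmarks v1.0"]))  # anchor missing: 2/3 < 0.90
```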
III. Visualization Standards
- Formats: export each figure twice, as vector (PDF/SVG) and as bitmap (PNG/JPG) at ≥ 300 DPI; axes show explicit units (s, rad, 1, m, m/s, MB/s, etc.).
- Caption elements: see[]/version, dataset split, and coverage mode (k/alpha/quantile); path plots must also annotate Δell and delta_form.
- Minimal figure set:
  - Scale & distribution: N/M overview, field hist/KDE, missingness heatmap.
  - Time & freshness: timeline/watermark with clock_state, σ_y(τ).
  - Path profiles: n_eff(ell) vs ell with T_arr/Phi interval bands.
  - Quality & uncertainty: Q_res trends, U = k·u_c or quantile bands.
  - Benchmarks & scoring: per-task bars/radars, total score Q with intervals.
Parentheses required: any division/integral/composite expression must use parentheses; path plots must explicitly show gamma(ell) and d ell.
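As an illustration of the caption rules, the sketch below assembles a caption and refuses to render it when a required element is missing. The `Caption` record, its field names, and the `Δell = 0.5 m` value in the usage line are hypothetical; only the required elements themselves come from this section.

```python
from dataclasses import dataclass
from typing import List, Optional

COVERAGE_MODES = {"k", "alpha", "quantile"}
DELTA_FORMS = {"general", "factored"}


@dataclass
class Caption:
    """Hypothetical caption record; fields mirror the caption elements above."""
    title: str
    see: List[str]                      # volume + version + anchor citations
    version: str
    split: str
    coverage_mode: str                  # "k", "alpha" or "quantile"
    is_path_plot: bool = False
    delta_ell: Optional[str] = None     # step annotation with unit, path plots only
    delta_form: Optional[str] = None    # "general" or "factored", path plots only

    def problems(self):
        issues = []
        if not self.see:
            issues.append("missing see[] citations")
        if not self.version:
            issues.append("missing version")
        if not self.split:
            issues.append("missing dataset split")
        if self.coverage_mode not in COVERAGE_MODES:
            issues.append("coverage mode must be k, alpha or quantile")
        if self.is_path_plot:
            if not self.delta_ell:
                issues.append("path plot must annotate Δell")
            if self.delta_form not in DELTA_FORMS:
                issues.append("path plot must annotate delta_form")
        return issues

    def render(self):
        issues = self.problems()
        if issues:
            raise ValueError("; ".join(issues))
        parts = [self.title, "see: " + ", ".join(self.see), "version " + self.version,
                 "split: " + self.split, "coverage: " + self.coverage_mode]
        if self.is_path_plot:
            parts += [self.delta_ell, "delta_form: " + self.delta_form]
        return " | ".join(parts)


cap = Caption(title="n_eff(ell) vs ell along gamma(ell)",
              see=["EFT.WP.Core.Equations v1.1:S20-1"], version="1.0.0",
              split="test", coverage_mode="k", is_path_plot=True,
              delta_ell="Δell = 0.5 m", delta_form="general")
print(cap.render())
```

Raising on `render()` rather than silently dropping elements keeps an incomplete caption from reaching a published figure.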
IV. Benchmarks
- Tasks & data protocol: declare the tasks (classification/regression/time-series/path/multimodal), the split/sampling strategy, and the evaluation fields with their units/dimensions.
- Comparability: align contracts and versions with public/internal benchmarks; when public tasks are used, list the mappings and differences.
- Statistical conventions: report a point estimate and an interval (k/alpha/quantile) for each metric, plus convergence diagnostics for repeats/bootstrap.
- Leakage prevention: keep time/entity/path consistency aligned with split.yaml; no entity may appear in more than one split.
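A minimal sketch of two of these conventions follows. It assumes the point estimate is a sample mean, the interval uses the quantile mode (percentile bootstrap), and entity IDs are available per split. The convergence rule (doubling the resample count must move each endpoint by less than 10% of the interval width) is an illustrative choice, not a threshold defined by this template.

```python
import random
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of an already sorted list, with p in [0, 1]."""
    if not sorted_vals:
        raise ValueError("empty sample")
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bootstrap_interval(sample, n_boot=2000, p=(0.025, 0.975), seed=0):
    """Mean as point estimate plus a percentile-bootstrap interval (quantile mode)."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot))
    return statistics.fmean(sample), (percentile(means, p[0]), percentile(means, p[1]))


def interval_converged(sample, p=(0.025, 0.975), sizes=(2000, 4000), tol=0.1):
    """Crude diagnostic: doubling the resample count should move each interval
    endpoint by less than tol * (interval width)."""
    _, (a_lo, a_hi) = bootstrap_interval(sample, n_boot=sizes[0], p=p, seed=1)
    _, (b_lo, b_hi) = bootstrap_interval(sample, n_boot=sizes[1], p=p, seed=2)
    width = b_hi - b_lo
    return abs(a_lo - b_lo) <= tol * width and abs(a_hi - b_hi) <= tol * width


def shared_entities(ids_by_split):
    """Map each entity ID found in more than one split to the splits containing it."""
    splits_of = {}
    for split, ids in ids_by_split.items():
        for eid in set(ids):
            splits_of.setdefault(eid, set()).add(split)
    return {eid: sorted(s) for eid, s in splits_of.items() if len(s) > 1}


sample = [float(i) for i in range(50)]
print(bootstrap_interval(sample), interval_converged(sample))
print(shared_entities({"train": ["a", "b"], "test": ["b", "c"]}))
```

Any non-empty result from `shared_entities` violates the cross-split rule and should block the benchmark run.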
V. Comparative Scoring
- Primary metrics (minimal): ΔT_arr (s), r_phi (1), ε_flux (1), p_dim (1), Q_res (1); scale/missingness and bias metrics may be added.
- Normalization & mapping:
  - Normalize: z_m = ( m − m_baseline ) / σ_baseline.
  - Sigmoid: q_m = 1 / ( 1 + exp( a z_m + b ) ) (defaults a = 1, b = 0, so lower is better; flip the sign of a when “higher is better”).
  - Aggregate score: Q = ( ∑_i w_i q_{m_i} ) / ( ∑_i w_i ); declare the weights w_i and their sources.
- Decision thresholds (aligned with Ch. 7/8 and Pipeline Ch. 12):
  - Positive: all core gates pass (e.g., |ΔT_arr| + U(T_arr) ≤ τ_T, LB(r_phi) ≥ r_phi_min, p_dim = 1.0, P95(ε_flux) ≤ guard) and Q ≥ Q_base + δQ_min.
  - Negative/Restricted: otherwise, tag the result [Restricted] and publish only qualitative plots and diagnostics.
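The normalization, sigmoid mapping, weighted aggregate, and decision rule above can be sketched as follows. The metric values in the usage lines come from the scorecard.json example in Section VII and Q_base is that example's baseline Q, but τ_T, r_phi_min, guard, and δQ_min are placeholder numbers chosen for illustration. Reading “flip sign” as negating a is an assumption.

```python
import math


def normalize(m, m_baseline, sigma_baseline):
    """z_m = ( m - m_baseline ) / sigma_baseline"""
    return (m - m_baseline) / sigma_baseline


def sigmoid_score(z, a=1.0, b=0.0, higher_is_better=False):
    """q_m = 1 / ( 1 + exp( a z_m + b ) ); a is negated when higher is better."""
    if higher_is_better:
        a = -a
    x = a * z + b
    if x > 700.0:          # math.exp would overflow; the score is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def aggregate(q, w):
    """Q = ( sum_i w_i q_i ) / ( sum_i w_i ); every weighted metric must have a score."""
    missing = set(w) - set(q)
    if missing:
        raise KeyError(f"no score for weighted metrics: {sorted(missing)}")
    return sum(w[k] * q[k] for k in w) / sum(w.values())


def decide(delta_t, u_t, r_phi_lb, p_dim, eps_flux_p95, Q, *,
           tau_t, r_phi_min, guard, Q_base, dQ_min):
    """'pass' only if every core gate holds and Q clears Q_base + dQ_min."""
    gates_ok = (
        abs(delta_t) + u_t <= tau_t          # |DeltaT_arr| + U(T_arr) <= tau_T
        and r_phi_lb >= r_phi_min            # LB(r_phi) >= r_phi_min
        and p_dim == 1.0                     # full dimensional closure
        and eps_flux_p95 <= guard            # P95(epsilon_flux) <= guard
    )
    return "pass" if gates_ok and Q >= Q_base + dQ_min else "restricted"


# Scorecard values from Section VII; tau_t, r_phi_min, guard, dQ_min are placeholders.
verdict = decide(-2.3e-9, 1.5e-9, 0.61, 1.0, 0.011, 0.78,
                 tau_t=5e-9, r_phi_min=0.5, guard=0.02, Q_base=0.62, dQ_min=0.1)
print(verdict)
```

With a = 1, a metric that is worse than baseline (z_m > 0) maps to q_m < 0.5, so Q stays in [0, 1] and is directly comparable between method and baseline.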
VI. Normative Path Forms
- Arrival time (two equivalent forms):
  T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
  T_arr = ( ∫ ( n_eff / c_ref ) d ell )
- Phase accumulation:
  Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell )
All integrals run along the path gamma(ell) with line element d ell. Record delta_form ∈ {general, factored} on the data side; the path arrays must satisfy len(gamma_ell) = len(d_ell) = len(n_eff) ≥ 2.
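A discretized version of these integrals helps when checking published values. The sketch below assumes a rectangle rule in which each sample contributes n_eff_i · Δell_i. It also assumes that `factored` denotes the form with 1 / c_ref pulled outside the integral while `general` keeps it inside; the template does not define that mapping, so treat it as hypothetical. Here gamma_ell is only used for the length check.

```python
import math


def check_path_arrays(gamma_ell, d_ell, n_eff):
    """len(gamma_ell) = len(d_ell) = len(n_eff) >= 2, as required on the data side."""
    n = len(gamma_ell)
    if not (n == len(d_ell) == len(n_eff) and n >= 2):
        raise ValueError("path arrays must have equal length >= 2")


def arrival_time(gamma_ell, d_ell, n_eff, c_ref, delta_form="factored"):
    """Discretized T_arr; gamma_ell (sample positions) is only length-checked here."""
    check_path_arrays(gamma_ell, d_ell, n_eff)
    if delta_form == "factored":
        # T_arr = ( 1 / c_ref ) * ( sum_i n_eff_i * d_ell_i )
        return (1.0 / c_ref) * math.fsum(n * dl for n, dl in zip(n_eff, d_ell))
    if delta_form == "general":
        # T_arr = ( sum_i ( n_eff_i / c_ref ) * d_ell_i )
        return math.fsum((n / c_ref) * dl for n, dl in zip(n_eff, d_ell))
    raise ValueError("delta_form must be 'general' or 'factored'")


def phase(gamma_ell, d_ell, n_eff, lambda_ref):
    """Phi = ( 2*pi / lambda_ref ) * ( sum_i n_eff_i * d_ell_i )"""
    check_path_arrays(gamma_ell, d_ell, n_eff)
    return (2.0 * math.pi / lambda_ref) * math.fsum(n * dl for n, dl in zip(n_eff, d_ell))


# Hypothetical 1 km path sampled every 100 m with a constant n_eff.
c_ref = 299_792_458.0                       # m/s
gamma_ell = [100.0 * i for i in range(10)]  # m, sample positions along the path
d_ell = [100.0] * 10                        # m, Δell per sample
n_eff = [1.0003] * 10
t_f = arrival_time(gamma_ell, d_ell, n_eff, c_ref, "factored")
t_g = arrival_time(gamma_ell, d_ell, n_eff, c_ref, "general")
print(t_f, math.isclose(t_f, t_g, rel_tol=1e-12))
```

When c_ref is constant the two forms agree up to rounding, which is a cheap self-check before a path profile is published.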
VII. Machine-Readable Configs
A. bench_plan.yaml
version: "1.0.0"
tasks:
  - id: "bench-arrival"
    split: "test"
    metrics: ["DeltaT_arr_s","Q_res","p_dim"]
    coverage: { mode: "k", k: 2 }
  - id: "bench-phase"
    split: "test"
    metrics: ["r_phi","epsilon_flux"]
    coverage: { mode: "quantile", p: [0.025,0.975] }
baseline:
  id: "base-001"
  version: "1.2.3"
weights: { DeltaT_arr_s: 0.35, r_phi: 0.25, epsilon_flux: 0.15, p_dim: 0.15, Q_res: 0.10 }
B. scorecard.json (example)
{
  "version": "1.0.0",
  "baseline": { "id": "base-001", "Q": 0.62 },
  "method": { "id": "ds-core", "Q": 0.78 },
  "weights": { "DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15, "p_dim": 0.15, "Q_res": 0.10 },
  "metrics": {
    "DeltaT_arr_s": { "mean": -2.3e-9, "std": 4.8e-9, "U_k2": 1.5e-9 },
    "r_phi": { "value": 0.72, "lb95": 0.61, "ub95": 0.80 },
    "epsilon_flux": { "median": 0.004, "p95": 0.011 },
    "p_dim": 1.0,
    "Q_res": 0.13
  },
  "decision": "pass",
  "see": ["EFT.WP.Core.Equations v1.1:S20-1","Data.Benchmarks v1.0:PROTO"]
}
C. kpi_summary.csv (header and example row)
split,DeltaT_arr_s_mean,DeltaT_arr_s_Uk2,r_phi_lb95,r_phi_ub95,epsilon_flux_p95,p_dim,Q_res
test,-2.3e-9,1.5e-9,0.61,0.80,0.011,1.0,0.13
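One way to enforce “bench_plan.yaml consistent with scorecard.csv/json” (Section XII) is to compare the declared weights and the reported KPIs programmatically. In this sketch the bench_plan weights are transcribed into a Python dict to avoid a YAML dependency, and the scorecard is trimmed to the fields being compared; the function names are hypothetical.

```python
import csv
import io
import json
import math

# Weights declared in bench_plan.yaml, transcribed to avoid a YAML dependency.
PLAN_WEIGHTS = {"DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15,
                "p_dim": 0.15, "Q_res": 0.10}

# Trimmed copy of the scorecard.json example (only the fields compared here).
SCORECARD = json.loads("""{
  "weights": {"DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15, "p_dim": 0.15, "Q_res": 0.10},
  "metrics": {
    "DeltaT_arr_s": {"mean": -2.3e-9, "std": 4.8e-9, "U_k2": 1.5e-9},
    "r_phi": {"value": 0.72, "lb95": 0.61, "ub95": 0.80},
    "epsilon_flux": {"median": 0.004, "p95": 0.011},
    "p_dim": 1.0,
    "Q_res": 0.13
  }
}""")

KPI_CSV = (
    "split,DeltaT_arr_s_mean,DeltaT_arr_s_Uk2,r_phi_lb95,r_phi_ub95,"
    "epsilon_flux_p95,p_dim,Q_res\n"
    "test,-2.3e-9,1.5e-9,0.61,0.80,0.011,1.0,0.13\n"
)


def check_weights(plan_weights, scorecard):
    """Weights must sum to 1 and match between bench_plan and scorecard."""
    issues = []
    if not math.isclose(sum(plan_weights.values()), 1.0, rel_tol=1e-9):
        issues.append("bench_plan weights do not sum to 1")
    if scorecard.get("weights") != plan_weights:
        issues.append("scorecard weights differ from bench_plan")
    return issues


def check_kpi_row(scorecard, kpi_csv_text, split="test"):
    """Return the kpi_summary.csv columns that disagree with scorecard metrics."""
    rows = {r["split"]: r for r in csv.DictReader(io.StringIO(kpi_csv_text))}
    row = rows[split]
    m = scorecard["metrics"]
    expected = {
        "DeltaT_arr_s_mean": m["DeltaT_arr_s"]["mean"],
        "DeltaT_arr_s_Uk2": m["DeltaT_arr_s"]["U_k2"],
        "r_phi_lb95": m["r_phi"]["lb95"],
        "r_phi_ub95": m["r_phi"]["ub95"],
        "epsilon_flux_p95": m["epsilon_flux"]["p95"],
        "p_dim": m["p_dim"],
        "Q_res": m["Q_res"],
    }
    return [col for col, val in expected.items()
            if not math.isclose(float(row[col]), val, rel_tol=1e-9)]


print(check_weights(PLAN_WEIGHTS, SCORECARD))   # []
print(check_kpi_row(SCORECARD, KPI_CSV))        # []
```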
VIII. Gate Mapping
- G1 Schema completeness: fields for visualizations & scoring present.
- G2 Citation compliance: figure/table anchors with coverage ≥ 90%.
- G3 Path conventions: the path arrays used in plots and scoring are complete, and the step Δell is compliant.
- G4 Dimensional closure: check_dim_report.json passed.
- G6 Coverage: same mode as data (k/alpha/quantile).
- G7 Covariance consistency: scoring assumptions align with the Error Budget; Σ is positive definite (PD).
- G8 Uniqueness: artifacts carry checksum & signature; versions match manifests.
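For G7, positive definiteness of Σ can be tested with a Cholesky factorization, which succeeds exactly when a symmetric matrix is positive definite. The pure-Python sketch below is meant for small covariance blocks; the function name and the example block are hypothetical. With NumPy available, `numpy.linalg.cholesky` raising `LinAlgError` serves the same purpose.

```python
import math


def is_positive_definite(sigma, tol=0.0):
    """True if sigma is square, symmetric, and admits a Cholesky factorization."""
    n = len(sigma)
    for i in range(n):
        if len(sigma[i]) != n:
            return False
        for j in range(i):
            if not math.isclose(sigma[i][j], sigma[j][i], rel_tol=1e-9):
                return False
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:          # non-positive pivot: not positive definite
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


# Hypothetical 2x2 block for (DeltaT_arr_s, r_phi): variances on the diagonal.
sigma = [[(4.8e-9) ** 2, 0.0],
         [0.0, 0.01]]
print(is_positive_definite(sigma))
```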
IX. Anti-Patterns & Fixes
- Anti: reporting means without intervals → Fix: add U = k·u_c or quantile bands with convergence diagnostics.
- Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: parenthesize into the normative form (Section VI).
- Anti: undisclosed weights/mappings → Fix: declare w_i and coverage mode in bench_plan.yaml/scorecard.json.
- Anti: path plots without delta_form/Δell → Fix: complete captions and align with n_eff.
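The parenthesization anti-pattern can be caught with a simple lint pass over plain-text formulas. This is a heuristic sketch, assuming integrals are written as `∫ <integrand> d ell` in this document's convention. It flags a composite integrand that is not wrapped in one outer pair of parentheses, and an integral that is not itself enclosed in parentheses; the function names are hypothetical.

```python
import re

# Integral written as "∫ <integrand> d ell", the document's plain-text convention.
INTEGRAL = re.compile(r"∫\s*(.*?)\s*d ell")
OPERATORS = set("+-*/")


def fully_wrapped(s):
    """True if s is enclosed by one matching outer pair of parentheses."""
    if not (s.startswith("(") and s.endswith(")")):
        return False
    depth = 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0 and i != len(s) - 1:
                return False   # outer pair closes early, e.g. "(a) / (b)"
    return depth == 0


def lint_expression(expr):
    """Flag integrals whose composite integrand or whole integral lacks parentheses."""
    problems = []
    for m in INTEGRAL.finditer(expr):
        integrand = m.group(1)
        if OPERATORS & set(integrand) and not fully_wrapped(integrand):
            problems.append(f"unparenthesized integrand: {integrand!r}")
        before = expr[:m.start()].rstrip()
        after = expr[m.end():].lstrip()
        if not (before.endswith("(") and after.startswith(")")):
            problems.append(f"integral not enclosed in parentheses: {m.group(0)!r}")
    return problems


for e in ["T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )",
          "T_arr = ∫ n_eff / c_ref d ell"]:
    print(e, "->", lint_expression(e) or "ok")
```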
X. Release & Layout
DS_EXPORT/
  figs/
    scale_dist.pdf
    missing_heatmap.svg
    sync_health.pdf
    path_profile.pdf
    scorecard_bar.pdf
  tables/
    kpi_summary.csv
    scorecard.csv
  reports/
    check_dim_report.json
    validate_report.json
    audit.jsonl
  manifests/
    report_manifest.yaml
  SIGNATURE.asc
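G8 asks that released artifacts carry checksums. Below is a minimal sketch that records a SHA-256 digest for every file under an export root and later reports files whose digest changed. Excluding SIGNATURE.asc from the list is an assumption (it presumably signs the manifest rather than being listed in it), the function names are hypothetical, and the signing step itself is not shown.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large exports need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_checksums(root, exclude=("SIGNATURE.asc",)):
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in sorted(root.rglob("*"))
            if p.is_file() and p.name not in exclude}


def verify_checksums(root, expected):
    """Return the recorded paths that are missing or whose digest has changed."""
    actual = build_checksums(root)
    return sorted(k for k in expected if actual.get(k) != expected[k])


# Demonstration on a throwaway directory shaped like a tiny DS_EXPORT/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tables").mkdir()
    (Path(tmp) / "tables" / "kpi_summary.csv").write_bytes(b"split\ntest\n")
    (Path(tmp) / "SIGNATURE.asc").write_bytes(b"placeholder")
    manifest = build_checksums(tmp)
    before = verify_checksums(tmp, manifest)
    (Path(tmp) / "tables" / "kpi_summary.csv").write_bytes(b"tampered")
    after = verify_checksums(tmp, manifest)

print(manifest)
print(before, after)
```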
XI. Cross-References
- Structure & Schema: Ch. 4; Splits/Versioning/Freshness: Ch. 6; Gates: Ch. 7; Uncertainty & Covariance: Ch. 8.
- Pipeline Card: outputs & release (Ch. 12), gates & monitoring (Ch. 9).
- Error Budget Card: scoring conventions & thresholds (Ch. 8/Ch. 9).
XII. Checklist
- Dual exports for figures with axis units and see[]/version in captions; path plots include Δell and delta_form.
- bench_plan.yaml is consistent with scorecard.csv/json; weights, intervals, and gate comparisons are stated clearly.
- Data splits & versions for scoring/benchmarks explicit; coverage.mode consistent with data.
- check_dim_report.json/validate_report.json/audit.jsonl/report_manifest.yaml and signatures complete.
- /validate passes with no S1–S5 issues; in Restricted mode, outputs are tagged [Restricted] and carry qualitative statements only.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/