52-Dataset Card Template v1.0 | Chapter 11 — Visualization, Benchmarks & Comparative Scoring (Bench/Score)


I. Purpose & Scope

Standardize the fields, charts, and release conventions for visualizations, benchmarks, and comparative scoring, so that scale, distribution, quality, and path profiles are presented consistently, benchmark tasks are reproducible, and scoring and its gate mapping are transparent and auditable.
For visualizations and scoring that involve path quantities (arrival time or phase): the text must explicitly show gamma(ell) and d ell; delta_form ∈ {general, factored} must be recorded on the data side; expressions must use parenthesized forms; and publication requires p_dim = 1.0.

II. Prerequisites & Inputs

Structure & contract: schema.json and contract.yaml (Ch. 4) are consistent and have passed I70-dim_check.
Splits/Versioning/Freshness: split.yaml/split_manifest.json (Ch. 6) ready; freshness.policy active.
Gate status: /validate passed G1–G8 (Ch. 7); tag non-compliant items [Restricted] where necessary.
Metrology & coverage: aligned with Error Budget (cov_group/Σ, coverage ∈ {k, alpha, quantile}).
Citations & versions: all figures and scoring manifests use “volume + version + anchor (P/S/M/I)”, anchor coverage ≥ 90%.

III. Visualization Standards

Formats: dual export for each figure (vector PDF/SVG and bitmap PNG/JPG), DPI ≥ 300; axes show explicit units (s, rad, 1, m, m/s, MB/s, etc.).
Caption elements: see[]/version, dataset split, coverage mode (k/alpha/quantile); for path plots annotate Δell and delta_form.
Minimal figure set:
- Scale & distribution: N/M overview, field hist/KDE, missingness heatmap.
- Time & freshness: timeline/watermark with clock_state, σ_y(τ).
- Path profiles: n_eff(ell) vs ell with T_arr/Phi interval bands.
- Quality & uncertainty: Q_res trends, U = k·u_c or quantile bands.
- Benchmarks & scoring: per-task bars/radars, total score Q with intervals.

Parentheses required: any division/integral/composite expression must use parentheses; path plots must explicitly show gamma(ell) and d ell.

IV. Benchmarks

Tasks & data protocol: declare tasks (classification/regression/time-series/path/multimodal), split/sampling strategy, evaluation fields with units/dimensions.
Comparability: align contracts & versions with public/internal benchmarks; if using public tasks, list mappings & differences.
Statistical conventions: report point estimates and intervals (k/alpha/quantile) for each metric, plus convergence diagnostics for repeats/bootstrap.
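As an illustration of reporting a point estimate together with a quantile interval, the following is a minimal sketch of a percentile bootstrap on the mean. The function name, the resample count, the seed, and the sample values are assumptions made for this example only; the document does not prescribe a specific bootstrap procedure.

```python
import random
import statistics

def bootstrap_quantile_interval(samples, n_boot=2000, p=(0.025, 0.975), seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    n = len(samples)
    # Resample with replacement n_boot times and sort the resampled means.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_boot)
    )
    lower = means[int(p[0] * (n_boot - 1))]
    upper = means[int(p[1] * (n_boot - 1))]
    return statistics.fmean(samples), lower, upper

# Hypothetical per-repeat metric values (illustrative only).
samples = [0.70, 0.74, 0.69, 0.75, 0.72, 0.71, 0.73, 0.68]
mean, lower, upper = bootstrap_quantile_interval(samples)
```

One crude convergence check is to repeat the call with a larger n_boot and confirm that the interval endpoints move only slightly.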
Leakage prevention: align time/entity/path consistency with split.yaml; forbid cross-split entity sharing.
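The rule against cross-split entity sharing can be checked mechanically. The sketch below assumes each record carries an entity_id and a split field; both field names are hypothetical and should be replaced with whatever split.yaml actually declares.

```python
from collections import defaultdict

def find_cross_split_entities(records):
    """Return {entity_id: sorted list of splits} for entities in more than one split."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec["entity_id"]].add(rec["split"])
    return {ent: sorted(splits) for ent, splits in seen.items() if len(splits) > 1}

# Hypothetical records (illustrative only).
records = [
    {"entity_id": "e1", "split": "train"},
    {"entity_id": "e2", "split": "train"},
    {"entity_id": "e2", "split": "test"},
]
leaks = find_cross_split_entities(records)
```

An empty result means no entity appears in more than one split; any non-empty result lists the offending entities and the splits they appear in.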

V. Comparative Scoring

Primary metrics (minimal): ΔT_arr (s), r_phi (1), ε_flux (1), p_dim (1), Q_res (1); optionally add scale/missingness and bias metrics.
Normalization & mapping:
- Normalize: z_m = ( m − m_baseline ) / σ_baseline.
- Sigmoid: q_m = 1 / ( 1 + exp( a z_m + b ) ) (defaults a = 1, b = 0, so a lower z_m yields a higher q_m; flip the sign of a, or equivalently of z_m, when “higher is better”).
Aggregate score: Q = ( ∑_i w_i q_{m_i} ) / ( ∑_i w_i ); specify weights w_i and sources.
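The normalization, sigmoid mapping, and weighted aggregate above can be sketched as follows. The function names and the higher_is_better flag are assumptions for illustration; the sign flip is applied to z_m, which is equivalent to flipping a.

```python
import math

def normalize(m, m_baseline, sigma_baseline):
    """z_m = ( m - m_baseline ) / sigma_baseline."""
    return (m - m_baseline) / sigma_baseline

def sigmoid_score(z, a=1.0, b=0.0, higher_is_better=False):
    """q_m = 1 / ( 1 + exp( a z + b ) ); z is negated when higher is better."""
    if higher_is_better:
        z = -z
    return 1.0 / (1.0 + math.exp(a * z + b))

def aggregate(scores, weights):
    """Q = ( sum_i w_i q_i ) / ( sum_i w_i ), over the metrics present in scores."""
    total_w = sum(weights[name] for name in scores)
    return sum(weights[name] * q for name, q in scores.items()) / total_w
```

With the defaults, a metric equal to its baseline (z_m = 0) maps to q_m = 0.5, so Q = 0.5 marks parity with the baseline when all metrics sit at their baseline values.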
Decision thresholds (aligned with Ch. 7/8 and Pipeline Ch. 12):
- Positive: all core gates pass (e.g., |ΔT_arr| + U(T_arr) ≤ τ_T, LB(r_phi) ≥ r_phi_min, p_dim = 1.0, P95(ε_flux) ≤ guard) and Q ≥ Q_base + δQ_min.
- Negative/Restricted: otherwise tag [Restricted] and publish qualitative plots & diagnostics only.
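The decision rule can be expressed as a set of named boolean gates. In this sketch the metric field names follow the scorecard.json example in Section VII, U_k2 is assumed to play the role of U(T_arr), and all threshold values (tau_T, r_phi_min, guard, dQ_min) are hypothetical placeholders, not values taken from this document.

```python
def decide(metrics, q_method, q_base, *, tau_T, r_phi_min, guard, dQ_min):
    """Return (decision, gates); decision is 'pass' only if every gate holds."""
    dt = metrics["DeltaT_arr_s"]
    gates = {
        "arrival_time": abs(dt["mean"]) + dt["U_k2"] <= tau_T,
        "phase_lower_bound": metrics["r_phi"]["lb95"] >= r_phi_min,
        "dimension": metrics["p_dim"] == 1.0,
        "flux_p95": metrics["epsilon_flux"]["p95"] <= guard,
        "score_gain": q_method >= q_base + dQ_min,
    }
    return ("pass" if all(gates.values()) else "restricted"), gates

# Metric values copied from the scorecard.json example; thresholds are hypothetical.
metrics = {
    "DeltaT_arr_s": {"mean": -2.3e-9, "U_k2": 1.5e-9},
    "r_phi": {"lb95": 0.61},
    "epsilon_flux": {"p95": 0.011},
    "p_dim": 1.0,
}
decision, gates = decide(metrics, 0.78, 0.62,
                         tau_T=5e-9, r_phi_min=0.6, guard=0.02, dQ_min=0.05)
```

Returning the per-gate results alongside the decision keeps the outcome auditable: a restricted result shows which gate failed.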

VI. Normative Path Forms

Arrival time (two equivalent forms):
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
T_arr = ( ∫ ( n_eff / c_ref ) d ell )
Phase accumulation:
Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell )

Record delta_form ∈ {general, factored} on the data side; the path arrays must satisfy len(gamma_ell) = len(d_ell) = len(n_eff) ≥ 2.
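A discrete sketch of the path forms above, under stated assumptions: d_ell is treated as a per-sample step length so the integral becomes a sum of n_eff[i] * d_ell[i]; c_ref is taken as the vacuum speed of light; and the sample arrays are invented for illustration. The function names are hypothetical, and no claim is made here about which form corresponds to which delta_form value.

```python
import math

C_REF = 299_792_458.0  # m/s; assumed reference speed for this sketch

def _check(n_eff, d_ell):
    # Array-length rule from the text: equal lengths, at least two samples.
    if not (len(n_eff) == len(d_ell) >= 2):
        raise ValueError("n_eff and d_ell must have equal length >= 2")

def t_arr_constant_outside(n_eff, d_ell, c_ref=C_REF):
    """T_arr = ( 1 / c_ref ) * ( sum n_eff * d_ell )."""
    _check(n_eff, d_ell)
    return (1.0 / c_ref) * math.fsum(n * d for n, d in zip(n_eff, d_ell))

def t_arr_constant_inside(n_eff, d_ell, c_ref=C_REF):
    """T_arr = ( sum ( n_eff / c_ref ) * d_ell )."""
    _check(n_eff, d_ell)
    return math.fsum((n / c_ref) * d for n, d in zip(n_eff, d_ell))

def phase(n_eff, d_ell, lambda_ref):
    """Phi = ( 2 pi / lambda_ref ) * ( sum n_eff * d_ell )."""
    _check(n_eff, d_ell)
    return (2.0 * math.pi / lambda_ref) * math.fsum(
        n * d for n, d in zip(n_eff, d_ell)
    )

# Hypothetical path samples (illustrative only).
n_eff = [1.0003, 1.0002, 1.0001]
d_ell = [100.0, 100.0, 100.0]  # metres
t1 = t_arr_constant_outside(n_eff, d_ell)
t2 = t_arr_constant_inside(n_eff, d_ell)
```

The two arrival-time results agree up to floating-point rounding, which is the numerical counterpart of the equivalence of the two forms when c_ref is constant along the path.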

VII. Machine-Readable Configs
A. bench_plan.yaml

version: "1.0.0"
tasks:
  - id: "bench-arrival"
    split: "test"
    metrics: ["DeltaT_arr_s", "Q_res", "p_dim"]
    coverage: { mode: "k", k: 2 }
  - id: "bench-phase"
    split: "test"
    metrics: ["r_phi", "epsilon_flux"]
    coverage: { mode: "quantile", p: [0.025, 0.975] }
baseline:
  id: "base-001"
  version: "1.2.3"
weights: { DeltaT_arr_s: 0.35, r_phi: 0.25, epsilon_flux: 0.15, p_dim: 0.15, Q_res: 0.10 }
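A plan like the one above can be sanity-checked before a benchmark run. The sketch below works on an already-parsed plan (a plain dict, so no YAML library is needed) and applies three checks that are assumptions of this example rather than rules stated in the document: every task's coverage mode is one of k, alpha, or quantile; every task metric has a weight; and weights are non-negative with a positive sum.

```python
ALLOWED_MODES = {"k", "alpha", "quantile"}

def check_plan(plan):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    weights = plan.get("weights", {})
    for task in plan.get("tasks", []):
        tid = task.get("id", "<no id>")
        mode = task.get("coverage", {}).get("mode")
        if mode not in ALLOWED_MODES:
            problems.append(f"{tid}: coverage mode {mode!r} is not allowed")
        for metric in task.get("metrics", []):
            if metric not in weights:
                problems.append(f"{tid}: metric {metric!r} has no weight")
    if any(w < 0 for w in weights.values()):
        problems.append("weights must be non-negative")
    if sum(weights.values()) <= 0:
        problems.append("weights must have a positive sum")
    return problems

# Parsed form of the bench_plan.yaml example.
plan = {
    "version": "1.0.0",
    "tasks": [
        {"id": "bench-arrival", "split": "test",
         "metrics": ["DeltaT_arr_s", "Q_res", "p_dim"],
         "coverage": {"mode": "k", "k": 2}},
        {"id": "bench-phase", "split": "test",
         "metrics": ["r_phi", "epsilon_flux"],
         "coverage": {"mode": "quantile", "p": [0.025, 0.975]}},
    ],
    "baseline": {"id": "base-001", "version": "1.2.3"},
    "weights": {"DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15,
                "p_dim": 0.15, "Q_res": 0.10},
}
problems = check_plan(plan)
```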

B. scorecard.json (example)

{
  "version": "1.0.0",
  "baseline": { "id": "base-001", "Q": 0.62 },
  "method": { "id": "ds-core", "Q": 0.78 },
  "weights": { "DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15, "p_dim": 0.15, "Q_res": 0.10 },
  "metrics": {
    "DeltaT_arr_s": { "mean": -2.3e-9, "std": 4.8e-9, "U_k2": 1.5e-9 },
    "r_phi": { "value": 0.72, "lb95": 0.61, "ub95": 0.80 },
    "epsilon_flux": { "median": 0.004, "p95": 0.011 },
    "p_dim": 1.0,
    "Q_res": 0.13
  },
  "decision": "pass",
  "see": ["EFT.WP.Core.Equations v1.1:S20-1", "Data.Benchmarks v1.0:PROTO"]
}

C. kpi_summary.csv (headers and example row)

split,DeltaT_arr_s_mean,DeltaT_arr_s_Uk2,r_phi_lb95,r_phi_ub95,epsilon_flux_p95,p_dim,Q_res
test,-2.3e-9,1.5e-9,0.61,0.80,0.011,1.0,0.13
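To keep kpi_summary.csv consistent with scorecard.json, the row can be derived from the scorecard's metrics block rather than typed by hand. The sketch below assumes the column-to-field mapping implied by the two examples; the helper names are hypothetical.

```python
import csv
import io

HEADERS = [
    "split", "DeltaT_arr_s_mean", "DeltaT_arr_s_Uk2", "r_phi_lb95",
    "r_phi_ub95", "epsilon_flux_p95", "p_dim", "Q_res",
]

def kpi_row(split, metrics):
    """Flatten a scorecard 'metrics' block into one kpi_summary row."""
    return {
        "split": split,
        "DeltaT_arr_s_mean": metrics["DeltaT_arr_s"]["mean"],
        "DeltaT_arr_s_Uk2": metrics["DeltaT_arr_s"]["U_k2"],
        "r_phi_lb95": metrics["r_phi"]["lb95"],
        "r_phi_ub95": metrics["r_phi"]["ub95"],
        "epsilon_flux_p95": metrics["epsilon_flux"]["p95"],
        "p_dim": metrics["p_dim"],
        "Q_res": metrics["Q_res"],
    }

def to_csv(rows):
    """Serialize rows to CSV text with the fixed header order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADERS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Metric values copied from the scorecard.json example.
metrics = {
    "DeltaT_arr_s": {"mean": -2.3e-9, "std": 4.8e-9, "U_k2": 1.5e-9},
    "r_phi": {"value": 0.72, "lb95": 0.61, "ub95": 0.80},
    "epsilon_flux": {"median": 0.004, "p95": 0.011},
    "p_dim": 1.0,
    "Q_res": 0.13,
}
text = to_csv([kpi_row("test", metrics)])
```

Python's default float formatting may render exponents differently from the example row; the values are numerically identical.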

VIII. Gate Mapping

G1 Schema completeness: fields for visualizations & scoring present.
G2 Citation compliance: figure/table anchors with coverage ≥ 90%.
G3 Path conventions: path arrays used in plots and scoring complete; step compliant.
G4 Dimensional closure: check_dim_report.json passed.
G6 Coverage: same mode as data (k/alpha/quantile).
G7 Covariance consistency: scoring assumptions align with Error Budget; Σ is positive definite (PD).
G8 Uniqueness: artifacts carry checksum & signature; versions match manifests.
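For the checksum part of G8, one possible approach is to hash each artifact and compare against the digest recorded in a manifest. SHA-256 and the manifest shape used here (a mapping from file path to expected hex digest) are assumptions of this sketch; the document does not specify the checksum algorithm or the manifest layout, and signature verification is out of scope.

```python
import hashlib

def sha256_bytes(data):
    """Hex SHA-256 digest of an in-memory byte string."""
    return hashlib.sha256(data).hexdigest()

def sha256_file(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifacts(expected):
    """Map each path in {path: expected_hex} to True/False for a digest match."""
    return {path: sha256_file(path) == want for path, want in expected.items()}
```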

IX. Anti-Patterns & Fixes

Anti: reporting means without intervals → Fix: add U = k·u_c or quantile bands with convergence diagnostics.
Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: parenthesize to the normative form, T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
Anti: undisclosed weights/mappings → Fix: declare w_i and coverage mode in bench_plan.yaml/scorecard.json.
Anti: path plots without delta_form/Δell → Fix: complete captions and align with n_eff.

X. Release & Layout

DS_EXPORT/
  figs/
    scale_dist.pdf
    missing_heatmap.svg
    sync_health.pdf
    path_profile.pdf
    scorecard_bar.pdf
  tables/
    kpi_summary.csv
    scorecard.csv
  reports/
    check_dim_report.json
    validate_report.json
    audit.jsonl
  manifests/
    report_manifest.yaml
    SIGNATURE.asc

XI. Cross-References

Structure & Schema: Ch. 4; Splits/Versioning/Freshness: Ch. 6; Gates: Ch. 7; Uncertainty & Covariance: Ch. 8.
Pipeline Card: outputs & release (Ch. 12), gates & monitoring (Ch. 9).
Error Budget Card: scoring conventions & thresholds (Ch. 8/Ch. 9).

XII. Checklist

Dual exports for figures with axis units and see[]/version in captions; path plots include Δell and delta_form.
bench_plan.yaml consistent with scorecard.csv/json; weights, intervals, and gate comparisons clear.
Data splits & versions for scoring/benchmarks explicit; coverage.mode consistent with data.
check_dim_report.json/validate_report.json/audit.jsonl/report_manifest.yaml and signatures complete.
/validate passed with no S1–S5; for Restricted mode, outputs tagged [Restricted] with qualitative statements only.

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05