Model Card Template v1.0

Chapter 8 — Benchmarks & Comparative Scoring (Bench/Score)


I. Purpose & Scope


II. Prerequisites & Inputs


III. Bench Tasks & Comparability


IV. Leakage Prevention & Consistency


V. Metrics & Intervals

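  1. Primary metrics (examples): AUC, ACC, MAE, RMSE, r_phi, ε_flux, Q_res, and Latency_P95/Throughput (if performance-constrained).
  2. Interval rules:
    • k coverage: expanded uncertainty U = k·u_c, where u_c is the combined standard uncertainty;
    • alpha: two-sided interval at level 1−α, using t_{ν,1−α/2} (ν degrees of freedom) or the normal approximation;
    • quantile: e.g., [0.025, 0.975]; choose one mode and use it consistently across the volume.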
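The three interval modes above can be sketched in a few lines of Python. This is a minimal illustration, not the volume's reference implementation: the function names are hypothetical, the alpha mode uses the normal approximation only (the Python standard library has no Student-t quantile), and the quantile mode assumes linear interpolation between order statistics.

```python
import math
import statistics
from statistics import NormalDist


def k_coverage(value, u_c, k=2.0):
    """k mode: expanded uncertainty U = k * u_c, returned as (lower, upper)."""
    U = k * u_c
    return (value - U, value + U)


def alpha_normal(samples, alpha=0.05):
    """alpha mode (normal approximation): two-sided interval for the mean."""
    n = len(samples)
    mean = statistics.fmean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (mean - z * sem, mean + z * sem)


def quantile_interval(samples, p=(0.025, 0.975)):
    """quantile mode: empirical quantiles with linear interpolation (assumed rule)."""
    xs = sorted(samples)

    def q(prob):
        pos = prob * (len(xs) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return (q(p[0]), q(p[1]))
```

For small samples the t quantile t_{ν,1−α/2} is wider than the normal z value, so the normal approximation shown here would understate the interval in that regime.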

VI. Scoring Mapping


VII. Gate Mapping & Decisions

  1. Align thresholds with Error Budget:
    • |ΔT_arr| + U(T_arr) ≤ τ_T;
    • LB(r_phi) ≥ r_phi_min;
    • P95(ε_flux) ≤ ε_flux_guard;
    • p_dim = 1.0, Σ PD.
  2. Release decision: if all core gates pass and Q ≥ Q_base + δQ_min, the result is Pass; otherwise Fail or [Restricted] (qualitative plots and diagnostics only).
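The gate checks and the release rule can be expressed as a small function. The sketch below is illustrative only: the function name, dictionary keys, and the threshold values in the usage example (tau_T, r_phi_min, eps_flux_guard, dq_min) are assumptions, since the actual limits come from the Error Budget. The Σ PD item is not modeled here, and the choice between Fail and [Restricted] is left to the reviewer.

```python
def release_decision(m, thresholds, q_method, q_base, dq_min):
    """Evaluate the core gates and the Q improvement rule.

    Returns (decision, gates), where gates maps each gate name to True/False.
    """
    gates = {
        # |dT_arr| + U(T_arr) <= tau_T
        "T_arr": abs(m["dT_arr"]) + m["U_T_arr"] <= thresholds["tau_T"],
        # LB(r_phi) >= r_phi_min
        "r_phi": m["r_phi_lb"] >= thresholds["r_phi_min"],
        # P95(eps_flux) <= eps_flux_guard
        "eps_flux": m["eps_flux_p95"] <= thresholds["eps_flux_guard"],
        # p_dim = 1.0
        "p_dim": m["p_dim"] == 1.0,
    }
    core_ok = all(gates.values())
    q_ok = q_method >= q_base + dq_min
    decision = "pass" if (core_ok and q_ok) else "fail_or_restricted"
    return decision, gates


# Metric values taken from the scorecard.json example in Section IX;
# every threshold below is a hypothetical placeholder.
metrics = {
    "dT_arr": -2.3e-9,
    "U_T_arr": 1.5e-9,
    "r_phi_lb": 0.61,
    "eps_flux_p95": 0.011,
    "p_dim": 1.0,
}
thresholds = {"tau_T": 5e-9, "r_phi_min": 0.60, "eps_flux_guard": 0.02}
decision, gates = release_decision(metrics, thresholds, 0.78, 0.62, dq_min=0.05)
```

Returning the per-gate results alongside the decision makes it straightforward to list which gate failed in the evaluation report.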

VIII. Normative Path Forms

Explicitly show path & measure; record delta_form; all expressions parenthesized.


IX. Machine-Readable
A. bench_plan.yaml

version: "1.0.0"
tasks:
  - id: "bench-arrival"
    split: "test"
    metrics: ["DeltaT_arr_s", "Q_res", "p_dim"]
    coverage: { mode: "k", k: 2 }
  - id: "bench-phase"
    split: "test"
    metrics: ["r_phi", "epsilon_flux"]
    coverage: { mode: "quantile", p: [0.025, 0.975] }
baseline: { id: "base-001", version: "1.2.3" }
weights: { DeltaT_arr_s: 0.35, r_phi: 0.25, epsilon_flux: 0.15, p_dim: 0.15, Q_res: 0.10 }
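A plan like this can be sanity-checked before a benchmark run. The following sketch is an assumption-laden illustration: the function name is hypothetical, the plan is written as a Python dict mirroring the YAML (so no YAML parser is needed), and the rules that weights sum to 1.0 and that every task metric carries a weight are inferred from the example rather than stated by the template.

```python
import math

# Coverage modes named in Section V.
ALLOWED_MODES = {"k", "alpha", "quantile"}

# Python mirror of the bench_plan.yaml example.
PLAN = {
    "version": "1.0.0",
    "tasks": [
        {"id": "bench-arrival", "split": "test",
         "metrics": ["DeltaT_arr_s", "Q_res", "p_dim"],
         "coverage": {"mode": "k", "k": 2}},
        {"id": "bench-phase", "split": "test",
         "metrics": ["r_phi", "epsilon_flux"],
         "coverage": {"mode": "quantile", "p": [0.025, 0.975]}},
    ],
    "baseline": {"id": "base-001", "version": "1.2.3"},
    "weights": {"DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15,
                "p_dim": 0.15, "Q_res": 0.10},
}


def validate_plan(plan):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    weights = plan.get("weights", {})
    # Assumed rule: weights form a convex combination.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        problems.append("weights do not sum to 1.0")
    for task in plan.get("tasks", []):
        mode = task.get("coverage", {}).get("mode")
        if mode not in ALLOWED_MODES:
            problems.append(f"{task['id']}: unknown coverage mode {mode!r}")
        # Assumed rule: every reported metric contributes to the score.
        for metric in task.get("metrics", []):
            if metric not in weights:
                problems.append(f"{task['id']}: metric {metric!r} has no weight")
    return problems
```

Collecting all problems in a list, rather than raising on the first one, lets a reviewer fix a plan in a single pass.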

B. scorecard.json (example)

{
  "version": "1.0.0",
  "baseline": { "id": "base-001", "Q": 0.62 },
  "method": { "id": "mdl-core", "Q": 0.78 },
  "weights": { "DeltaT_arr_s": 0.35, "r_phi": 0.25, "epsilon_flux": 0.15, "p_dim": 0.15, "Q_res": 0.10 },
  "metrics": {
    "DeltaT_arr_s": { "mean": -2.3e-9, "Uk2": 1.5e-9 },
    "r_phi": { "value": 0.72, "lb95": 0.61, "ub95": 0.80 },
    "epsilon_flux": { "median": 0.004, "p95": 0.011 },
    "p_dim": 1.0,
    "Q_res": 0.13
  },
  "decision": "pass",
  "see": ["EFT.WP.Core.Equations v1.1:S20-1", "Error Budget Card v1.0:Ch.8"]
}


C. eval_report.md (outline)

# Evaluation Report

- Tasks, splits, seeds
- Metrics with intervals & convergence
- Score mapping, weights, final Q
- Gate comparison & decision


X. Anti-Patterns & Fixes


XI. Cross-References


XII. Checklist


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/