56-Report-Level Methods Appendix Template v1.0
Chapter 8 Evaluation Protocol & Metrics
I. Chapter Goals & Scope (Mandatory)
- Fix a unified caliber (convention) for the evaluation protocol, metric definitions, gates, statistics, observation windows, and pass lines, so that results are comparable, replayable, and auditable.
- This chapter is aligned with Ch.5 (Data & Experimental Design), Ch.6 (Math & Pseudocode), Ch.7 (Metrology & Calibration), and Ch.11 (Implementation Binding).
II. Protocol Structure & General Requirements (Mandatory)
- Minimal elements: target under test, data partitions, metric list, gate thresholds, statistical method & intervals, script locators (script@commit), exported results & hashes.
- Observation window: ISO 8601 timestamps in UTC; rolling evaluations must specify step and overlap.
- Repeatability: fix random seeds; reruns across environments must give consistent results (pin the container image and lock dependencies).
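The rolling-window requirement above can be sketched as follows. This is a minimal illustration, not part of the template: the helper names `rolling_windows` and `iso_utc` are hypothetical, and the 3-day length with a 2-day step (1-day overlap) is an assumed example.

```python
from datetime import datetime, timedelta, timezone


def rolling_windows(start, end, length, step):
    """Yield (window_start, window_end) pairs that fit inside [start, end].

    Consecutive windows overlap by `length - step` when step < length.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware (UTC)")
    if length <= timedelta(0) or step <= timedelta(0):
        raise ValueError("length and step must be positive")
    cursor = start
    while cursor + length <= end:
        yield cursor, cursor + length
        cursor += step


def iso_utc(ts):
    """Format a timezone-aware datetime as ISO 8601 in UTC with a trailing Z."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Assumed example: one week of data, 3-day windows, 2-day step (1-day overlap).
start = datetime(2025, 9, 1, tzinfo=timezone.utc)
end = datetime(2025, 9, 8, tzinfo=timezone.utc)
windows = list(rolling_windows(start, end, timedelta(days=3), timedelta(days=2)))
labels = [(iso_utc(a), iso_utc(b)) for a, b in windows]
```

Recording `length` and `step` explicitly, rather than only the resulting window list, lets a rerun regenerate exactly the same windows.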
III. Metric Definitions & Direction (Mandatory)
- Naming: metric_name; direction uses arrows: ↑ higher-is-better, ↓ lower-is-better, ≈ closer-is-better.
- Common metrics (examples):
  - gate_accuracy (↑): vs analytic or fine-grid baseline;
  - gate_latency (↓): end-to-end latency for batch/stream;
  - compat_rate (↑): replay/production compatibility pass rate;
  - error_rate (↓): error/exception ratio;
  - cal_residuals (↓): calibration residuals (see Ch.7).
- If arrival-time criteria are involved, restate the caliber in the same paragraph:
  - Factored: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
  - General: T_arr = ( ∫ ( n_eff / c_ref ) d ell )
  Explicitly declare the path gamma(ell) and the measure d ell, with check_dim=true.
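The two arrival-time forms above can be compared numerically. The sketch below is illustrative only and rests on assumptions not stated in the template: c_ref is taken as a constant speed in m/s, n_eff is dimensionless, ell is in metres, and the path is a hypothetical 1 km segment with n_eff rising linearly from 1.0 to 1.5. The function names and the trapezoid rule are choices made here. With a constant c_ref the two forms agree; they would differ if c_ref varied along the path.

```python
import math

C_REF = 299_792_458.0  # m/s; assumed constant reference speed (illustrative)


def trapezoid(ys, xs):
    """Composite trapezoid rule for samples ys taken at positions xs."""
    total = 0.0
    for i in range(1, len(xs)):
        total += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
    return total


def t_arr_factored(n_eff, ell, c_ref=C_REF):
    """Factored form: (1 / c_ref) * integral of n_eff d ell."""
    return (1.0 / c_ref) * trapezoid(n_eff, ell)


def t_arr_general(n_eff, ell, c_ref=C_REF):
    """General form: integral of (n_eff / c_ref) d ell."""
    return trapezoid([n / c_ref for n in n_eff], ell)


# Hypothetical path gamma(ell): 0..1000 m sampled every metre (Δell = 1 m).
ell = [float(i) for i in range(1001)]
n_eff = [1.0 + 0.5 * (x / 1000.0) for x in ell]

t_factored = t_arr_factored(n_eff, ell)
t_general = t_arr_general(n_eff, ell)
```

Under these assumptions the units work out as (dimensionless × m) / (m/s) = s, which is the kind of consistency a dimension check would confirm.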
IV. Gates & Pass Lines (Mandatory)
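- Naming rule: gate_<metric><comparator><threshold>@<window>, e.g., gate_accuracy>=0.99@7d, gate_latency<=2h@7d. Expressions on metrics without the gate_ prefix, e.g., compat_rate>=0.995@replay, follow the same <comparator><threshold>@<window> shape.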
- Layers: hard gates (failure blocks) and soft gates (may pass with signed clauses per Decision & Sign-off).
- Statistical caliber: specify method (e.g., bootstrap), CI_95%, sample size, sampling strategy; record raw and aggregated measures.
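As one way to read the statistical caliber above, here is a minimal percentile-bootstrap sketch for a 95% interval on a mean. It is an illustration under assumptions: the function name `bootstrap_ci` is hypothetical, 1000 is taken as the number of resamples, and the pass/fail outcomes are made-up data. It records the raw measure (the sample mean) alongside the aggregated interval.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    if not values:
        raise ValueError("values must be non-empty")
    rng = random.Random(seed)  # fixed seed keeps reruns repeatable
    n = len(values)
    # Resample with replacement and collect the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    lo = means[int(alpha * (n_resamples - 1))]
    hi = means[int(round((1.0 - alpha) * (n_resamples - 1)))]
    return lo, hi


# Made-up example data: 495 passes and 5 failures out of 500 checks.
outcomes = [1] * 495 + [0] * 5
raw_mean = statistics.fmean(outcomes)
ci_low, ci_high = bootstrap_ci(outcomes)
report = {
    "method": "bootstrap",
    "ci": "95%",
    "n_resamples": 1000,
    "sample_size": len(outcomes),
    "raw_mean": raw_mean,
    "ci_bounds": [ci_low, ci_high],
}
```

Whether a gate is decided on the point estimate or on an interval bound is a policy choice that the protocol should state; the sketch does not assume either.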
V. Execution & Recording (Mandatory)
- Inputs to record: versioned data/model/config, run parameters, random seeds, numeric accuracy params such as Δell/ε_int, environment hashes.
- Exports: yaml/json/pdf artifacts containing raw result summaries + gate decisions + confidence intervals + script locators + artifact hashes.
- Baseline policy: name the baseline "Option Base"; tabulate per-metric and overall comparisons against each candidate.
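The export requirement above (result summary, gate decision, confidence interval, script locator, artifact hash) might be sketched as below. This is a minimal illustration: the function name `export_artifact`, the field names, and every value in `summary` are hypothetical, and SHA-256 over canonical JSON is an assumed choice of hash.

```python
import hashlib
import json


def export_artifact(summary):
    """Serialize `summary` deterministically; return (payload_bytes, sha256_hex)."""
    payload = json.dumps(
        summary,
        sort_keys=True,          # key order must not affect the hash
        ensure_ascii=False,
        separators=(",", ":"),   # no incidental whitespace
    ).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()


# Hypothetical result summary; values are placeholders, not measured results.
summary = {
    "gate": "gate_accuracy>=0.99@7d",
    "value": 0.993,
    "ci_95": [0.990, 0.996],
    "decision": "pass",
    "script": "eval.py@a1b2c3",
    "seed": 0,
}
payload, digest = export_artifact(summary)
```

Sorting keys and fixing separators means two runs that produce the same content also produce the same hash, which is what makes the hash useful for audits.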
VI. Benchmarks & Controls (Mandatory)
- Benchmark sources: analytic solutions, ultra-fine-grid computations (smaller Δell), authoritative public benchmarks; state applicability and max deviation.
- Control dimensions: performance/cost/time/risk/dependency/reproducibility/compatibility; default weights and discount policy per Ch.5 & Ch.6.
VII. Evaluation Checklist Template (copy-ready)
| Dimension | Metric | Dir. | Gate | Window | Stats/CI | Script | Notes |
|---|---|---|---|---|---|---|---|
| Performance | gate_accuracy | ↑ | >=0.99 | @7d | bootstrap/95% | eval.py@a1b2c3 | Analytic or fine-grid baseline |
| Latency | gate_latency | ↓ | <=2h | @7d | quantiles/mean | runner.py@9f8e7d | Batch |
| Compatibility | compat_rate | ↑ | >=0.995 | @replay | binomial/95% | replay.sh@d4e5f6 | Prod replay |
| Metrology | cal_residuals | ↓ | <=3σ | @validation | normal assumption | calib.py@c0ffee | See Ch.7 |
| Errors | error_rate | ↓ | <=1e-3 | @24h | Poisson/Binomial | monitor@abcd12 | Online |
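For the binomial/95% entry in the checklist, one possible interval is the Wilson score interval, sketched below. The template does not say which binomial interval to use, so Wilson is an assumption made here, as are the function name and the example counts.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def wilson_interval(successes, trials, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p * (1.0 - p) / trials + z2 / (4.0 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Made-up example: 995 compatible replays out of 1000.
low, high = wilson_interval(995, 1000)
```

With these made-up counts the point estimate equals the 0.995 threshold while the lower bound falls below it, so the checklist notes should state which quantity the gate is compared against.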
VIII. Machine Structure (YAML; JSON-equivalent, Mandatory)
evaluation:
  window: { start: "2025-09-01T00:00:00Z", end: "2025-09-07T23:59:59Z", timezone: "UTC" }
  metrics:
    - { name: "gate_accuracy", direction: "↑", desc: "vs. analytic or fine-grid baseline" }
    - { name: "gate_latency", direction: "↓", desc: "E2E latency for batch/stream" }
    - { name: "compat_rate", direction: "↑", desc: "replay/prod compatibility pass rate" }
    - { name: "cal_residuals", direction: "↓", desc: "calibration residuals" }
    - { name: "error_rate", direction: "↓", desc: "operational error rate" }
  gates:
    hard: ["gate_accuracy>=0.99@7d","compat_rate>=0.995@replay","gate_latency<=2h@7d"]
    soft: ["unit_cost<=1.0x@30d"]
  stats:
    method: "bootstrap"
    ci: "95%"
    samples: 1000
  baseline:
    name: "Option Base"
    reference:
      type: "analytic|fine_grid"
      script: "fine_grid.py@bead55"
  artifacts: ["yaml","json","pdf"]
  scripts:
    eval: "eval.py@a1b2c3"
    runner: "runner.py@9f8e7d"
    replay: "replay.sh@d4e5f6"
arrival_time:
  caliber:
    forms:
      - { name: "general", expr: "( ∫ ( n_eff / c_ref ) d ell )" }
      - { name: "factored", expr: "( 1 / c_ref ) * ( ∫ n_eff d ell )" }
    path: "gamma(ell)"
    measure: "d ell"
    check_dim: true
IX. Human × Machine Mapping (Mandatory)
| Human Section | Machine Field | Validation Focus |
|---|---|---|
| Protocol elements | evaluation.window, evaluation.scripts.* | Window/scripts/environment explicit |
| Metrics & direction | evaluation.metrics[] | ↑/↓/≈ consistent, clear descriptions |
| Gates & thresholds | evaluation.gates.hard/soft | Naming/comparator/window unified |
| Statistics & intervals | evaluation.stats.* | Method/CI/sample size complete |
| Benchmark & controls | evaluation.baseline.* | Benchmark type and script locator |
| Arrival-time caliber | arrival_time.caliber.* | Two forms + path/measure + check_dim |
X. Validation Rules (regex/consistency, Mandatory)
- Gate expression: ^gate_[a-z0-9_]+(>=|<=|==)[^@\s]+@[^\s]+$; expressions without the gate_ prefix, such as compat_rate>=0.995@replay, are accepted equivalently.
- Time window: start ≤ end and timezone="UTC".
- Metric direction: direction ∈ {↑, ↓, ≈}.
- Arrival-time: if evaluation involves T_arr, arrival_time.caliber.path/measure must exist and check_dim=true.
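The four rules above could be checked with a small validator like the sketch below. It is illustrative and makes several assumptions: the machine structure is already loaded into a Python dict, the function name `validate` is hypothetical, "accepted equivalently" is read as the same expression shape without the gate_ prefix, and the presence of an arrival_time block is taken to mean the evaluation involves T_arr. The gate pattern is written as a Python raw string, so the whitespace class is `\s`.

```python
import re
from datetime import datetime

# Pattern from the gate-expression rule, as a Python raw string.
GATE_STRICT = re.compile(r"^gate_[a-z0-9_]+(>=|<=|==)[^@\s]+@[^\s]+$")
# Assumption: un-prefixed metrics use the same shape without "gate_".
GATE_RELAXED = re.compile(r"^[a-z][a-z0-9_]*(>=|<=|==)[^@\s]+@[^\s]+$")
DIRECTIONS = {"↑", "↓", "≈"}


def _parse_ts(text):
    # Older Python versions reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def validate(doc):
    """Return a list of rule violations; an empty list means the doc passes."""
    errors = []
    ev = doc.get("evaluation", {})

    window = ev.get("window", {})
    try:
        if _parse_ts(window["start"]) > _parse_ts(window["end"]):
            errors.append("window: start must be <= end")
    except (KeyError, TypeError, ValueError, AttributeError):
        errors.append("window: start/end missing or not comparable timestamps")
    if window.get("timezone") != "UTC":
        errors.append('window: timezone must be "UTC"')

    for metric in ev.get("metrics", []):
        if metric.get("direction") not in DIRECTIONS:
            errors.append(f"metric {metric.get('name')!r}: invalid direction")

    gates = ev.get("gates", {})
    for expr in list(gates.get("hard", [])) + list(gates.get("soft", [])):
        if not (GATE_STRICT.match(expr) or GATE_RELAXED.match(expr)):
            errors.append(f"gate {expr!r}: does not match the gate pattern")

    # Assumption: an arrival_time block means the evaluation involves T_arr.
    if "arrival_time" in doc:
        caliber = doc["arrival_time"].get("caliber", {})
        for key in ("path", "measure"):
            if not caliber.get(key):
                errors.append(f"arrival_time.caliber.{key} is required")
        if caliber.get("check_dim") is not True:
            errors.append("arrival_time.caliber.check_dim must be true")
    return errors


good = {
    "evaluation": {
        "window": {"start": "2025-09-01T00:00:00Z",
                   "end": "2025-09-07T23:59:59Z", "timezone": "UTC"},
        "metrics": [{"name": "gate_accuracy", "direction": "↑"}],
        "gates": {"hard": ["gate_accuracy>=0.99@7d", "compat_rate>=0.995@replay"],
                  "soft": ["unit_cost<=1.0x@30d"]},
    },
    "arrival_time": {"caliber": {"path": "gamma(ell)", "measure": "d ell",
                                 "check_dim": True}},
}
good_errors = validate(good)
```

Returning a list of messages rather than raising on the first failure lets a single run report every violation, which suits an audit log.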
XI. Citation & Cross-Reference Style (Mandatory)
- Fixed format: “See 《 vX.Y》 Ch.x S/P/M/I…”; all EFT.WP.* citations must include an explicit version and anchor, with a machine-readable list in references.see[].
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/