EFT.WP.Data.Benchmarks v1.0

Chapter 1 Overview & Scope


I. Chapter Purpose & Scope


II. Definitions & Terms

  1. Benchmark: a reproducible comparison of targets under given data and protocol.
  2. Suite: an organizational unit composed of tasks and subtasks, with shared protocol, aggregation, and governance rules.
  3. Task/Subtask: an evaluation unit specifying io_mode, input assumptions, constraints, and target metrics.
  4. Track: a branch under a task that differs in resources, tools, or openness (e.g., “closed-book/open-book”, “no-tools/tools-allowed”).
  5. Submission: an accepted evaluation run and its artifacts (with run_id, environment lock, and metric report).
  6. Artifact: a verifiable file/object (bound by sha256).
  7. Frozen splits: index‑level immutable sets S_train/S_val/S_test preventing leakage.
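  8. Statistical significance: a statistical decision on whether metric differences are distinguishable from noise; report the p-value p, the 95% confidence interval CI_95, and the multiple-comparison correction method.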
  9. Path quantities (e.g., arrival time): if T_arr appears, use
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
      and declare the path gamma(ell) and the measure d ell, with dimensional consistency checks.
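
As a minimal numerical sketch of item 9, the two forms of T_arr can be evaluated side by side. Everything concrete here is an assumption made for illustration: a straight path parameterized by arc length ell in meters, a hypothetical n_eff profile, c_ref set to the vacuum speed of light, and a hand-written trapezoid rule.

```python
import math

C_REF = 299_792_458.0  # m/s; assumed reference speed (vacuum speed of light)

def n_eff(ell: float) -> float:
    """Hypothetical dimensionless effective index along the path."""
    return 1.0 + 1.0e-3 * math.exp(-ell / 1000.0)

def trapezoid(f, a: float, b: float, steps: int = 10_000) -> float:
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def t_arr_factored(length_m: float) -> float:
    # Form 1: constant c_ref pulled outside the integral.
    return trapezoid(n_eff, 0.0, length_m) / C_REF

def t_arr_inline(length_m: float) -> float:
    # Form 2: c_ref kept inside the integrand.
    return trapezoid(lambda ell: n_eff(ell) / C_REF, 0.0, length_m)

length_m = 5000.0  # assumed path length in meters
t1 = t_arr_factored(length_m)
t2 = t_arr_inline(length_m)

# Dimensional check: [n_eff] = 1, [d ell] = m, [c_ref] = m/s, so [T_arr] = s.
print(f"T_arr (form 1) = {t1:.9e} s")
print(f"T_arr (form 2) = {t2:.9e} s")
```

The two results agree to floating-point precision because c_ref is treated as a constant along the path in this sketch.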

III. Background & Motivation


IV. Design Principles (P01–P05)


V. In Scope & Out of Scope

  1. In scope: classification/regression/ranking/retrieval/generation/multimodal; offline batch, online A/B, streaming, and interactive evaluation.
  2. Out of scope: optimization of training recipes per se; non‑public protocols bound to sensitive commercial data; datasets that cannot be frozen at index level.
  3. Cross‑volume dependencies:
    • Data: see EFT.WP.Data.DatasetCards v1.0.
    • Models: see EFT.WP.Data.ModelCards v1.0.
    • Pipelines: see EFT.WP.Data.Pipeline v1.0.

VI. Deliverables & Release Gates

  1. Mandatory exports: benchmark.yaml/json, protocol.yaml, metrics.yaml, env.lock, splits/*.index, reports/*.jsonl, each with sha256.
  2. Gates:
    • Frozen splits and leakage guardrails enabled;
    • SI metrology and dimensional checks pass;
    • Significance and uncertainty reports included;
    • Privacy, residency, and third‑party processing registered.
  3. Leaderboard governance: stability line, shadow comparisons, submission cooldown, and arbitration process.
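
Two of the checks above lend themselves to a short sketch: binding each exported artifact to a sha256 digest, and confirming that frozen split indices do not overlap. The helper names (`sha256_file`, `build_manifest`, `splits_disjoint`) and the one-ID-per-line index layout are assumptions for illustration, not part of the specification.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path, rel_paths: list[str]) -> list[dict]:
    """Pair each artifact path with its digest."""
    return [{"path": p, "sha256": sha256_file(root / p)} for p in rel_paths]

def splits_disjoint(splits: dict[str, set[str]]) -> bool:
    """True if no index entry appears in more than one split."""
    seen: set[str] = set()
    for ids in splits.values():
        if seen & ids:
            return False
        seen |= ids
    return True

# Demo on throwaway files; a real run would point at the exported artifacts.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "splits").mkdir()
    contents = {
        "splits/train.index": b"a\nb\nc\n",
        "splits/val.index": b"d\n",
        "splits/test.index": b"e\nf\n",
    }
    for rel, data in contents.items():
        (root / rel).write_bytes(data)
    manifest = build_manifest(root, sorted(contents))
    splits = {
        rel: set((root / rel).read_text(encoding="utf-8").split())
        for rel in contents
    }
    ok = splits_disjoint(splits)

for entry in manifest:
    print(entry["path"], entry["sha256"][:12])
print("splits disjoint:", ok)
```

Disjoint indices are only one leakage guard; grouping constraints such as per-object or per-scene separation would need the grouping keys, which this sketch does not model.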

VII. Cross‑References & Dependencies


VIII. Machine‑Readable Overview (Normative)

suite:
  id: "eift.benchmarks.core"
  title: "EIFT Core Benchmarks"
  version: "v1.0.0"
  modalities: ["text", "image", "audio"]
  risks: ["leakage", "bias", "spurious_correlation"]
tasks:
  - id: "cls.binary"
    io_mode: "offline"
    tracks: ["closed-book"]
    dataset_ref: "datasets/core_cls@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "label"}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "<hex>"}
      val: {frozen: true, index: "splits/val.index", sha256: "<hex>"}
      test: {frozen: true, index: "splits/test.index", sha256: "<hex>"}
    leakage_guard: ["per-object", "per-scene"]
    protocol:
      seed: 1701
      repeats: 5
      temperature: 0.0
      tools_allowed: false
      runtime_limits: {timeout_s: 3600}
    metrics:
      - {name: "Acc", unit: "—", higher_is_better: true, agg: "macro"}
      - {name: "ECE", unit: "—", higher_is_better: false}
aggregation:
  levels: ["task", "suite"]
  weights: {task: "uniform"}
  normalize: {scheme: "zscore", anchors: ["baseline.logreg", "baseline.rf"]}
significance:
  method: "bootstrap"
  B: 10000
  alpha: 0.05
  correction: "Holm-Bonferroni"
env:
  hardware: {cpu: "16c", mem_gb: 64, gpu: 0}
  os: "ubuntu-22.04"
  containers: ["ghcr.io/eift/runner@sha256:<hex>"]
  deps_lock: "env.lock"
baselines:
  - {id: "baseline.logreg", impl: "I15-1.logreg", params: {C: 1.0}}
  - {id: "baseline.rf", impl: "I15-2.rf", params: {n_trees: 200}}
export_manifest:
  version: "v1.0"
  artifacts:
    - {path: "benchmark.yaml", sha256: "<hex>"}
    - {path: "splits/train.index", sha256: "<hex>"}
    - {path: "reports/summary.json", sha256: "<hex>"}
  references:
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.DatasetCards v1.0:Ch.11"
    - "EFT.WP.Data.ModelCards v1.0:Ch.11"
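
The significance settings above (bootstrap, B = 10000, alpha = 0.05, Holm-Bonferroni) can be illustrated with a small standard-library sketch. The specification does not say how the bootstrap p-value is formed; the version below uses a paired bootstrap over per-example score differences and centers the bootstrap distribution on zero, which is one common choice and an assumption here. The difference vectors are hypothetical data, not results.

```python
import random

def paired_bootstrap(diffs, B=10_000, alpha=0.05, seed=1701):
    """Percentile CI and a two-sided p-value for the mean paired difference."""
    rng = random.Random(seed)
    n = len(diffs)
    observed = sum(diffs) / n
    boots = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(B))
    lo = boots[int((alpha / 2) * B)]
    hi = boots[min(B - 1, int((1 - alpha / 2) * B))]
    # Center the bootstrap distribution on zero to approximate the null.
    extreme = sum(1 for b in boots if abs(b - observed) >= abs(observed))
    p_value = (extreme + 1) / (B + 1)
    return observed, (lo, hi), p_value

def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure; returns a reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Hypothetical per-example differences (submission minus baseline), n = 200.
vs_logreg = [1] * 30 + [-1] * 10 + [0] * 160  # mean +0.10
vs_rf = [1] * 22 + [-1] * 18 + [0] * 160      # mean +0.02

results = [paired_bootstrap(d) for d in (vs_logreg, vs_rf)]
reject = holm_bonferroni([p for _, _, p in results])

for name, (mean, ci, p), r in zip(("baseline.logreg", "baseline.rf"), results, reject):
    print(f"vs {name}: diff={mean:+.3f} CI_95=[{ci[0]:+.3f}, {ci[1]:+.3f}] p={p:.4f} reject={r}")
```

Fixing the seed makes the resampling reproducible, in line with the seed and repeats fields of the protocol block.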


IX. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SUITE.ID_FORMAT
    when: "$.suite.id"
    assert: "matches('^[a-z0-9_.\\-]+$')"
    level: error
  - id: SPLITS.FROZEN_REQUIRED
    when: "$..splits"
    assert: "train.frozen == true and val.frozen == true and test.frozen == true"
    level: error
  - id: LEAKAGE.GUARDS
    when: "$..leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
  - id: METRICS.UNITS_SI
    when: "$..metrics[*].unit"
    assert: "all_units_in_SI(value) or value == '—'"
    level: error
  - id: PROTOCOL.SEED_AND_REPEATS
    when: "$..protocol"
    assert: "has_keys(seed, repeats)"
    level: error
  - id: SIGNIFICANCE.PARAMS
    when: "$..significance"
    assert: "has_keys(method, B, alpha)"
    level: error
  - id: EXPORT.REFERENCES_FORMAT
    when: "$.export_manifest.references[*]"
    assert: "matches('^[^:]+ v\\d+\\.\\d+:[A-Z].+$')"
    level: error
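
The rule engine that evaluates these `when`/`assert` pairs is not specified in this chapter. As a rough sketch of the intent, four of the rules can be hand-coded against an already-parsed document; the `find_key` helper stands in for the `$..key` recursive-descent selector, and both it and `lint` are hypothetical names introduced for illustration.

```python
import re
from typing import Any, Iterator

ALLOWED_GUARDS = {"per-object", "per-timewindow", "per-scene"}

def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` at any depth (emulates `$..key`)."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

def lint(doc: dict) -> list[str]:
    """Return the ids of violated rules, in evaluation order."""
    errors: list[str] = []
    # SUITE.ID_FORMAT: fullmatch is slightly stricter than '^...$' in Python.
    if not re.fullmatch(r"[a-z0-9_.\-]+", doc.get("suite", {}).get("id", "")):
        errors.append("SUITE.ID_FORMAT")
    for splits in find_key(doc, "splits"):
        frozen = (splits.get(s, {}).get("frozen") is True for s in ("train", "val", "test"))
        if not all(frozen):
            errors.append("SPLITS.FROZEN_REQUIRED")
    for guards in find_key(doc, "leakage_guard"):
        if not ALLOWED_GUARDS & set(guards):
            errors.append("LEAKAGE.GUARDS")
    for protocol in find_key(doc, "protocol"):
        if not {"seed", "repeats"} <= protocol.keys():
            errors.append("PROTOCOL.SEED_AND_REPEATS")
    return errors

good = {
    "suite": {"id": "eift.benchmarks.core"},
    "tasks": [{
        "splits": {s: {"frozen": True} for s in ("train", "val", "test")},
        "leakage_guard": ["per-object", "per-scene"],
        "protocol": {"seed": 1701, "repeats": 5},
    }],
}
bad = {
    "suite": {"id": "EIFT Core"},
    "tasks": [{
        "splits": {"train": {"frozen": True}, "val": {"frozen": False}, "test": {"frozen": True}},
        "leakage_guard": ["none"],
        "protocol": {"seed": 1701},
    }],
}

print("good:", lint(good))
print("bad: ", lint(bad))
```

A production linter would more likely interpret the `when` and `assert` strings generically rather than hard-coding each rule as done here.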


X. Chapter Compliance Checklist


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/