EFT.WP.Data.Benchmarks v1.0

Chapter 3 Suite Layering & Overview


I. Chapter Purpose & Scope

Fix the benchmark hierarchy suite → task → subtask → item, the coverage matrix, and risk blind spots; provide machine-readable fields and validation posture; ensure consistency with Dataset/Model Cards/Pipeline, metrology, and citation anchors.

II. Layering & Object Relationships (Normative)

  1. Hierarchy:
    • suite: overall definition, coverage, and governance;
    • task: scenario & I/O mode, protocol & metrics;
    • subtask: finer dimensions (modality/domain/locale/resource track);
    • item: minimal evaluation unit (question/sample/clip/query).
  2. Relationship constraints: suite.tasks[*].subtasks[*].items[*] is a directed containment chain; each task must have dataset_ref and splits; each subtask must declare a track or slice; each item must bind split ∈ {train,val,test}.
  3. Coverage matrix: coverage_matrix[dimension][bucket] = count/ratio; dimensions include at least modality/locale/domain/difficulty.
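
As a minimal sketch of how the coverage matrix in item 3 could be derived from item metadata, the example below tallies items per bucket and optionally converts the counts to percentages. The item layout (one dictionary per item with one value per dimension) and the helper name build_coverage_matrix are illustrative assumptions, not part of the specification.

```python
from collections import Counter

# Dimensions the text requires at minimum.
REQUIRED_DIMENSIONS = ("modality", "locale", "domain", "difficulty")


def build_coverage_matrix(items, dimensions=REQUIRED_DIMENSIONS, as_ratio=False):
    """Tally items per bucket for each dimension.

    `items` is assumed to be a list of dicts carrying one value per
    dimension (a hypothetical item layout). With as_ratio=True the counts
    are converted to percentages of the items seen for that dimension.
    """
    matrix = {}
    for dim in dimensions:
        counts = Counter(item[dim] for item in items if dim in item)
        if as_ratio:
            total = sum(counts.values())
            matrix[dim] = (
                {bucket: 100.0 * n / total for bucket, n in counts.items()}
                if total
                else {}
            )
        else:
            matrix[dim] = dict(counts)
    return matrix


# Illustrative items only; real items would come from the suite's index files.
items = [
    {"modality": "text", "locale": "en", "domain": "news", "difficulty": "easy"},
    {"modality": "text", "locale": "zh", "domain": "science", "difficulty": "hard"},
    {"modality": "image", "locale": "en", "domain": "open", "difficulty": "easy"},
    {"modality": "text", "locale": "en", "domain": "news", "difficulty": "med"},
]

counts = build_coverage_matrix(items)
ratios = build_coverage_matrix(items, as_ratio=True)
print(counts["modality"])  # {'text': 3, 'image': 1}
print(ratios["locale"])    # {'en': 75.0, 'zh': 25.0}
```

Keeping counts and ratios as two views of the same tally mirrors the count/ratio wording above; which view a given dimension uses would still need to be declared alongside the matrix.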

III. Fields & Structure (Normative)

suite:
  id: "eift.benchmarks.core"
  title: "EIFT Core Benchmarks"
  version: "v1.0.0"
  modalities: ["text","image","audio"]
  risks: ["leakage","bias","spurious_correlation"]
  coverage_matrix:
    modality: {"text": 12000, "image": 8000, "audio": 3000}
    locale: {"en": 60, "zh": 20, "es": 20} # unit: %
    domain: {"news": 40, "science": 30, "open": 30} # unit: %
tasks:
  - id: "qa.extractive"
    io_mode: "offline|stream|interactive"
    dataset_ref: "datasets/qa_core@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "difficulty", buckets: {"easy": 40, "med": 40, "hard": 20}}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "<hex>"}
      val: {frozen: true, index: "splits/val.index", sha256: "<hex>"}
      test: {frozen: true, index: "splits/test.index", sha256: "<hex>"}
    leakage_guard: ["per-object","per-scene"]
    protocol:
      seed: 1701
      repeats: 5
      tools_allowed: false
      runtime_limits: {timeout_s: 3600}
    metrics:
      - {name: "F1_macro", unit: "—", higher_is_better: true}
      - {name: "ECE", unit: "—", higher_is_better: false}
    subtasks:
      - id: "qa.extractive.zh"
        track: "closed-book"
        slice: {locale: ["zh"]}
        items_ref: "lists/qa_zh_test.index"
      - id: "qa.extractive.en.open"
        track: "open-book"
        slice: {locale: ["en"], retrieval: true}
        items_ref: "lists/qa_en_open.index"
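
The frozen splits above each carry an index path and a sha256 value. The following sketch shows one way such an entry might be checked before a run. It assumes the digest covers the raw bytes of the index file, which the fields suggest but the text does not state. The helper names sha256_of_file and verify_frozen_split are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_frozen_split(split, base_dir="."):
    """Check one split entry of the form {frozen, index, sha256}.

    Returns True only if the split is marked frozen and the index file
    hashes to the recorded digest (assumption: digest of the file bytes).
    """
    if not split.get("frozen", False):
        return False
    actual = sha256_of_file(os.path.join(base_dir, split["index"]))
    return actual == split["sha256"].lower()


# Demo with a temporary file standing in for splits/test.index.
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "test.index")
    with open(index_path, "wb") as fh:
        fh.write(b"item-0001\nitem-0002\n")
    recorded = sha256_of_file(index_path)
    split = {"frozen": True, "index": "test.index", "sha256": recorded}
    ok_before = verify_frozen_split(split, base_dir=tmp)
    with open(index_path, "ab") as fh:
        fh.write(b"item-0003\n")  # simulate an unintended change to the index
    ok_after = verify_frozen_split(split, base_dir=tmp)

print(ok_before, ok_after)  # True False
```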


IV. Coverage & Risk Posture


V. Protocol & Aggregation Mapping


VI. Metrology & Units (SI)

  1. Metrics & resources use SI: QPS(1/s), T_inf(ms), ρ(—), size_bytes, net_mbps; metrology:{units:"SI", check_dim:true} is mandatory.
  2. If tasks or features involve path quantity T_arr, record on the object: delta_form, path="gamma(ell)", measure="d ell", and use:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
      with check_dim.
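
The two forms of T_arr above agree whenever c_ref is constant along the path, since the constant factors out of the integral. A small numerical sketch can make this concrete. The sample values, the trapezoidal rule, and the choice of c_ref = 299792458 m/s are illustrative assumptions; the text does not fix c_ref or n_eff.

```python
def trapezoid(ys, xs):
    """Composite trapezoidal rule for samples ys taken at positions xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_factored(n_eff, ell, c_ref):
    """T_arr = (1 / c_ref) * ( ∫ n_eff d ell )."""
    return (1.0 / c_ref) * trapezoid(n_eff, ell)


def t_arr_inline(n_eff, ell, c_ref):
    """T_arr = ∫ (n_eff / c_ref) d ell."""
    return trapezoid([n / c_ref for n in n_eff], ell)


C_REF = 299_792_458.0                         # m/s, assumed reference speed
ell = [i * 100.0 for i in range(11)]          # path samples in metres, 0..1000
n_eff = [1.0 + 1e-4 * i for i in range(11)]   # dimensionless, illustrative

t1 = t_arr_factored(n_eff, ell, C_REF)
t2 = t_arr_inline(n_eff, ell, C_REF)
print(t1, t2)  # both values are in seconds
```

The dimensional check is the same for both forms: n_eff is dimensionless, d ell carries metres, and c_ref carries metres per second, so T_arr comes out in seconds.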

VII. Machine-Readable Fragment (Drop-in)

suite:
  id: "eift.bench.core"
  title: "EIFT Core"
  version: "v1.0.0"
  modalities: ["text","image"]
  risks: ["leakage","bias"]
  coverage_matrix:
    modality: {"text": 9000, "image": 6000}
    locale: {"en": 70, "zh": 30}
tasks:
  - id: "cls.multiclass"
    io_mode: "offline"
    dataset_ref: "datasets/core_cls@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "label"}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "..."}
      val: {frozen: true, index: "splits/val.index", sha256: "..."}
      test: {frozen: true, index: "splits/test.index", sha256: "..."}
    leakage_guard: ["per-object"]
    protocol: {seed: 1701, repeats: 5, tools_allowed: false, runtime_limits: {timeout_s: 3600}}
    metrics: [{name: "Acc", unit: "—", higher_is_better: true}, {name: "ECE", unit: "—", higher_is_better: false}]
    subtasks:
      - {id: "cls.multiclass.en", track: "closed-book", slice: {locale: ["en"]}, items_ref: "lists/cls_en.index"}
      - {id: "cls.multiclass.zh", track: "closed-book", slice: {locale: ["zh"]}, items_ref: "lists/cls_zh.index"}
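
The fragment declares stratified sampling by label together with a fixed protocol seed. A minimal sketch of what a reproducible stratified draw could look like follows. The fraction parameter, the item layout, and the helper name stratified_sample are assumptions for illustration; the fragment itself does not specify a sample size.

```python
import random
from collections import defaultdict


def stratified_sample(items, by, fraction, seed):
    """Draw the same fraction from every stratum defined by item[by].

    A fixed seed makes the draw reproducible; at least one item is kept
    from each non-empty stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[item[by]].append(item)
    sample = []
    for key in sorted(strata):  # sorted so the draw order is deterministic
        group = strata[key]
        k = max(1, round(fraction * len(group)))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical pool: 60 "cat", 30 "dog", and 10 "bird" items.
pool = (
    [{"id": f"c{i}", "label": "cat"} for i in range(60)]
    + [{"id": f"d{i}", "label": "dog"} for i in range(30)]
    + [{"id": f"b{i}", "label": "bird"} for i in range(10)]
)

picked = stratified_sample(pool, by="label", fraction=0.2, seed=1701)
per_label = {}
for item in picked:
    per_label[item["label"]] = per_label.get(item["label"], 0) + 1
print(per_label)  # {'bird': 2, 'cat': 12, 'dog': 6}
```

Because the generator is seeded with the protocol seed, calling the function again with the same arguments returns the same items, which is what makes a sampled subset auditable.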


VIII. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SUITE.ID_FORMAT
    when: "$.suite.id"
    assert: "matches('^[a-z0-9_.\\-]+$')"
    level: error
  - id: SUITE.COVERAGE_DIM_REQUIRED
    when: "$.suite.coverage_matrix"
    assert: "has_keys(modality)"
    level: error
  - id: TASK.DATASET_AND_SPLITS
    when: "$.tasks[*]"
    assert: "has_key(dataset_ref) and has_key(splits) and splits.train.frozen and splits.val.frozen and splits.test.frozen"
    level: error
  - id: TASK.LEAKAGE_GUARD
    when: "$.tasks[*].leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
  - id: SUBTASK.TRACK_OR_SLICE
    when: "$.tasks[*].subtasks[*]"
    assert: "has_key(track) or has_key(slice)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
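
The excerpt does not define the expression language behind when and assert, so the sketch below hand-translates the six rules into plain Python over an already parsed document. It assumes a rule only fires when its when path exists, and that tasks and metrology sit at the top level as the rule paths suggest. The function name lint and the sample documents are hypothetical.

```python
import re

ID_PATTERN = re.compile(r"^[a-z0-9_.\-]+$")
GUARDS = {"per-object", "per-timewindow", "per-scene"}


def lint(doc):
    """Return a list of (rule_id, location) pairs for every failed rule."""
    errors = []

    suite = doc.get("suite", {})
    if "id" in suite and not ID_PATTERN.match(suite["id"]):
        errors.append(("SUITE.ID_FORMAT", "$.suite.id"))
    if "coverage_matrix" in suite and "modality" not in suite["coverage_matrix"]:
        errors.append(("SUITE.COVERAGE_DIM_REQUIRED", "$.suite.coverage_matrix"))

    for t, task in enumerate(doc.get("tasks", [])):
        splits = task.get("splits", {})
        frozen = all(
            splits.get(name, {}).get("frozen") is True
            for name in ("train", "val", "test")
        )
        if "dataset_ref" not in task or "splits" not in task or not frozen:
            errors.append(("TASK.DATASET_AND_SPLITS", f"$.tasks[{t}]"))
        if "leakage_guard" in task and not GUARDS & set(task["leakage_guard"]):
            errors.append(("TASK.LEAKAGE_GUARD", f"$.tasks[{t}].leakage_guard"))
        for s, sub in enumerate(task.get("subtasks", [])):
            if "track" not in sub and "slice" not in sub:
                errors.append(
                    ("SUBTASK.TRACK_OR_SLICE", f"$.tasks[{t}].subtasks[{s}]")
                )

    if "metrology" in doc:
        m = doc["metrology"]
        if not (m.get("units") == "SI" and m.get("check_dim") is True):
            errors.append(("METROLOGY.SI_AND_CHECKDIM", "$.metrology"))
    return errors


good = {
    "suite": {"id": "eift.bench.core", "coverage_matrix": {"modality": {"text": 9000}}},
    "tasks": [{
        "dataset_ref": "datasets/core_cls@v1.0",
        "splits": {name: {"frozen": True} for name in ("train", "val", "test")},
        "leakage_guard": ["per-object"],
        "subtasks": [{"id": "cls.multiclass.en", "track": "closed-book"}],
    }],
    "metrology": {"units": "SI", "check_dim": True},
}
bad = {
    "suite": {"id": "EIFT Core", "coverage_matrix": {"locale": {"en": 70}}},
    "tasks": [{
        "splits": {"train": {"frozen": True}},
        "leakage_guard": ["none"],
        "subtasks": [{"id": "x"}],
    }],
    "metrology": {"units": "SI", "check_dim": False},
}

good_errors = lint(good)
bad_rules = [rule for rule, _ in lint(bad)]
print(good_errors)  # []
print(bad_rules)
```

Returning the JSONPath-style location with each rule id keeps the output close to the when paths in the rule table, so a failing entry can be traced back to the offending object.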


IX. Cross-References


X. Chapter Compliance Checklist


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/