EFT.WP.Data.Benchmarks v1.0
Chapter 3 Suite Layering & Overview
I. Chapter Purpose & Scope
Fix the benchmark hierarchy suite → task → subtask → item, the coverage matrix, and risk blind spots; provide machine-readable fields and validation posture; ensure consistency with Dataset/Model Cards/Pipeline, metrology, and citation anchors.
II. Layering & Object Relationships (Normative)
- Hierarchy:
  - suite: overall definition, coverage, and governance;
  - task: scenario & I/O mode, protocol & metrics;
  - subtask: finer dimensions (modality/domain/locale/resource track);
  - item: minimal evaluation unit (question/sample/clip/query).
- Relationship constraints: suite.tasks[*].subtasks[*].items[*] is a directed containment chain; each task must have dataset_ref and splits; each subtask must declare a track or slice; each item must bind split ∈ {train,val,test}.
- Coverage matrix: coverage_matrix[dimension][bucket] = count/ratio; dimensions include at least modality/locale/domain/difficulty.
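The containment chain and the item-level split binding can be checked mechanically. The following is a minimal Python sketch, not a normative implementation. It assumes the suite has already been loaded into plain dictionaries and that items are available inline under a hypothetical `items` key with `id` and `split` fields (the fragments in this chapter reference items through `items_ref` index files instead). The helper names `walk_items` and `unbound_items` are illustrative only.

```python
VALID_SPLITS = {"train", "val", "test"}


def walk_items(tasks):
    """Yield (task_id, subtask_id, item) along the task -> subtask -> item chain."""
    for task in tasks:
        for sub in task.get("subtasks", []):
            # "items" is a hypothetical inline list; real suites may use items_ref.
            for item in sub.get("items", []):
                yield task["id"], sub["id"], item


def unbound_items(tasks):
    """Return chain paths of items whose split is missing or outside {train,val,test}."""
    return [
        f"{task_id}/{sub_id}/{item.get('id', '?')}"
        for task_id, sub_id, item in walk_items(tasks)
        if item.get("split") not in VALID_SPLITS
    ]


tasks = [{
    "id": "qa.extractive",
    "subtasks": [{
        "id": "qa.extractive.zh",
        "items": [
            {"id": "q1", "split": "test"},
            {"id": "q2", "split": "dev"},   # not an allowed split name
            {"id": "q3"},                   # split missing entirely
        ],
    }],
}]

bad = unbound_items(tasks)
print(bad)
```

Reporting the full chain path for each offending item keeps the message traceable back to its subtask and task.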
III. Fields & Structure (Normative)
suite:
  id: "eift.benchmarks.core"
  title: "EIFT Core Benchmarks"
  version: "v1.0.0"
  modalities: ["text", "image", "audio"]
  risks: ["leakage", "bias", "spurious_correlation"]
  coverage_matrix:
    modality: {"text": 12000, "image": 8000, "audio": 3000}
    locale: {"en": 60, "zh": 20, "es": 20}  # unit: %
    domain: {"news": 40, "science": 30, "open": 30}  # unit: %
tasks:
  - id: "qa.extractive"
    io_mode: "offline|stream|interactive"
    dataset_ref: "datasets/qa_core@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "difficulty", buckets: {"easy": 40, "med": 40, "hard": 20}}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "<hex>"}
      val: {frozen: true, index: "splits/val.index", sha256: "<hex>"}
      test: {frozen: true, index: "splits/test.index", sha256: "<hex>"}
    leakage_guard: ["per-object", "per-scene"]
    protocol:
      seed: 1701
      repeats: 5
      tools_allowed: false
      runtime_limits: {timeout_s: 3600}
    metrics:
      - {name: "F1_macro", unit: "—", higher_is_better: true}
      - {name: "ECE", unit: "—", higher_is_better: false}
    subtasks:
      - id: "qa.extractive.zh"
        track: "closed-book"
        slice: {locale: ["zh"]}
        items_ref: "lists/qa_zh_test.index"
      - id: "qa.extractive.en.open"
        track: "open-book"
        slice: {locale: ["en"], retrieval: true}
        items_ref: "lists/qa_en_open.index"
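Each split entry pairs an index file with a sha256 digest, which suggests that "frozen" can be verified by recomputing the digest. The sketch below shows one way to do that in Python with the standard library. It assumes the digest is taken over the raw bytes of the index file and that `index` paths are relative to a suite root directory. Neither detail is stated in this chapter, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_splits(splits, root="."):
    """Map each split name to True when it is frozen and its digest matches."""
    results = {}
    for name, spec in splits.items():
        actual = sha256_of(Path(root) / spec["index"])
        results[name] = bool(spec.get("frozen")) and actual == spec["sha256"].lower()
    return results


# Demo on a throwaway directory: freeze an index, then simulate drift.
with tempfile.TemporaryDirectory() as root:
    index = Path(root) / "splits" / "test.index"
    index.parent.mkdir()
    index.write_bytes(b"item-001\nitem-002\n")
    splits = {"test": {"frozen": True, "index": "splits/test.index",
                       "sha256": sha256_of(index)}}
    ok = verify_splits(splits, root=root)
    index.write_bytes(b"item-001\nitem-002\nitem-003\n")  # index changed after freeze
    drifted = verify_splits(splits, root=root)

print(ok, drifted)
```

Any change to the index file after freezing alters the digest, so the second check fails.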
IV. Coverage & Risk Posture
- Coverage: report counts and ratios for modality/locale/domain/difficulty; ratios use % (dimensionless).
- Risks: risks[] must include at least leakage|bias|spurious_correlation; provide detection rules and thresholds for each (e.g., shift ψ<=0.2).
- Frozen consistency: all coverage reports are computed on frozen S_val/S_test; training coverage is forbidden.
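The coverage rules above can be illustrated with a short Python sketch that computes counts and percentage ratios per dimension while excluding training items. The item records (dictionaries carrying `split` plus one key per dimension) and the output layout with `count` and `ratio_pct` are assumptions made for this example. The chapter itself only fixes the shape coverage_matrix[dimension][bucket] = count/ratio.

```python
from collections import Counter


def coverage_matrix(items, dimensions, allowed_splits=("val", "test")):
    """Count items per bucket for each dimension, using val/test items only."""
    frozen = [it for it in items if it["split"] in allowed_splits]
    matrix = {}
    for dim in dimensions:
        counts = Counter(it[dim] for it in frozen)
        total = sum(counts.values())
        matrix[dim] = {
            bucket: {"count": n, "ratio_pct": 100.0 * n / total}
            for bucket, n in counts.items()
        }
    return matrix


items = [
    {"split": "test", "locale": "en", "modality": "text"},
    {"split": "test", "locale": "en", "modality": "text"},
    {"split": "val", "locale": "zh", "modality": "image"},
    {"split": "test", "locale": "en", "modality": "image"},
    {"split": "train", "locale": "es", "modality": "text"},  # excluded from coverage
]

matrix = coverage_matrix(items, ["locale", "modality"])
print(matrix["locale"])
```

Because the training item is filtered out first, the "es" bucket never appears in the report, which matches the rule that training coverage is forbidden.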
V. Protocol & Aggregation Mapping
- Protocol mapping: task.protocol aligns with Model Cards Ch.11 (seed/repeats/tools/runtime_limits).
- Aggregation mapping: suite-level summaries follow Chapter 8 aggregation rules macro|micro|weighted; the origin of cross-task weights w_i must be explicit (uniform/sample-share/expert).
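To make the role of the cross-task weights w_i concrete, here is a hedged Python sketch of the three aggregation modes. Chapter 8 is not reproduced here, so the definitions below are assumptions: macro is taken as a uniform mean over tasks, micro as a mean weighted by each task's sample count (exact only for metrics that are per-item averages, such as accuracy), and weighted as a mean under explicitly supplied weights. The function name and arguments are hypothetical.

```python
def aggregate(scores, counts, mode="macro", weights=None):
    """Combine per-task scores into one suite-level score.

    scores  -- {task_id: metric value}
    counts  -- {task_id: number of evaluated items}
    weights -- {task_id: w_i}, required when mode == "weighted"
    """
    if mode == "macro":
        w = {t: 1.0 for t in scores}                # uniform weights
    elif mode == "micro":
        w = {t: float(counts[t]) for t in scores}   # sample-share weights
    elif mode == "weighted":
        if weights is None:
            raise ValueError("weighted mode needs explicit weights w_i")
        w = {t: float(weights[t]) for t in scores}  # e.g. expert-assigned
    else:
        raise ValueError(f"unknown aggregation mode: {mode}")
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w[t] * scores[t] for t in scores) / total


scores = {"qa.extractive": 0.80, "cls.multiclass": 0.60}
counts = {"qa.extractive": 1000, "cls.multiclass": 3000}

macro = aggregate(scores, counts, "macro")
micro = aggregate(scores, counts, "micro")
expert = aggregate(scores, counts, "weighted",
                   weights={"qa.extractive": 3, "cls.multiclass": 1})
print(macro, micro, expert)
```

The three results differ for the same per-task scores, which is why the origin of the weights needs to be recorded alongside the summary.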
VI. Metrology & Units (SI)
- Metrics & resources use SI: QPS(1/s), T_inf(ms), ρ(—), size_bytes, net_mbps; metrology:{units:"SI", check_dim:true} is mandatory.
- If tasks or features involve the path quantity T_arr, record on the object: delta_form, path="gamma(ell)", measure="d ell", and compute T_arr in one of the following forms, validated with check_dim:
  - T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
  - T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
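The two forms of T_arr can be compared numerically. The Python sketch below approximates the path integral with the trapezoidal rule on sampled values. It assumes n_eff is dimensionless, ell is in metres, and c_ref is a constant speed in metres per second, so that T_arr comes out in seconds. These unit choices, the sample values, and the function names are illustrative assumptions, not part of the specification.

```python
import math


def trapezoid(ys, xs):
    """Trapezoidal approximation of the integral of y over x."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_factored(n_eff, ell, c_ref):
    """T_arr = (1 / c_ref) * integral of n_eff d ell."""
    return trapezoid(n_eff, ell) / c_ref


def t_arr_inside(n_eff, ell, c_ref):
    """T_arr = integral of (n_eff / c_ref) d ell."""
    return trapezoid([n / c_ref for n in n_eff], ell)


ell = [0.0, 1000.0, 2000.0]   # path samples in metres (assumed)
n_eff = [1.0, 1.5, 1.0]       # dimensionless samples along the path (assumed)
c_ref = 299_792_458.0         # assumed constant reference speed in m/s

a = t_arr_factored(n_eff, ell, c_ref)
b = t_arr_inside(n_eff, ell, c_ref)
agree = math.isclose(a, b, rel_tol=1e-12)
print(a, b, agree)
```

With a constant c_ref the two forms agree up to floating-point rounding. If c_ref varied along the path, only the second form would remain meaningful, so the choice of form is worth recording in delta_form.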
VII. Machine-Readable Fragment (Drop-in)
suite:
  id: "eift.bench.core"
  title: "EIFT Core"
  version: "v1.0.0"
  modalities: ["text", "image"]
  risks: ["leakage", "bias"]
  coverage_matrix:
    modality: {"text": 9000, "image": 6000}
    locale: {"en": 70, "zh": 30}
tasks:
  - id: "cls.multiclass"
    io_mode: "offline"
    dataset_ref: "datasets/core_cls@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "label"}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "..."}
      val: {frozen: true, index: "splits/val.index", sha256: "..."}
      test: {frozen: true, index: "splits/test.index", sha256: "..."}
    leakage_guard: ["per-object"]
    protocol: {seed: 1701, repeats: 5, tools_allowed: false, runtime_limits: {timeout_s: 3600}}
    metrics: [{name: "Acc", unit: "—", higher_is_better: true}, {name: "ECE", unit: "—", higher_is_better: false}]
    subtasks:
      - {id: "cls.multiclass.en", track: "closed-book", slice: {locale: ["en"]}, items_ref: "lists/cls_en.index"}
      - {id: "cls.multiclass.zh", track: "closed-book", slice: {locale: ["zh"]}, items_ref: "lists/cls_zh.index"}
VIII. Lint Rules (Excerpt, Normative)
lint_rules:
  - id: SUITE.ID_FORMAT
    when: "$.suite.id"
    assert: "matches('^[a-z0-9_.\\-]+$')"
    level: error
  - id: SUITE.COVERAGE_DIM_REQUIRED
    when: "$.suite.coverage_matrix"
    assert: "has_keys(modality)"
    level: error
  - id: TASK.DATASET_AND_SPLITS
    when: "$.tasks[*]"
    assert: "has_key(dataset_ref) and has_key(splits) and splits.train.frozen and splits.val.frozen and splits.test.frozen"
    level: error
  - id: TASK.LEAKAGE_GUARD
    when: "$.tasks[*].leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
  - id: SUBTASK.TRACK_OR_SLICE
    when: "$.tasks[*].subtasks[*]"
    assert: "has_key(track) or has_key(slice)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
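The rule engine behind `when`/`assert` is not specified in this chapter, so the following Python sketch hand-codes the six rules above against an already-parsed document instead of interpreting the expressions. It assumes `tasks` and `metrology` sit at the document root, as the `$.tasks[*]` and `$.metrology` paths suggest, and that a rule is skipped when its `when` path is absent. The function name `lint` and the error string format are hypothetical.

```python
import copy
import re

ID_PATTERN = re.compile(r"^[a-z0-9_.\-]+$")
GUARDS = {"per-object", "per-timewindow", "per-scene"}


def lint(doc):
    """Return the ids of violated rules (empty list means the document passes)."""
    errors = []
    suite = doc.get("suite", {})
    if "id" in suite and not ID_PATTERN.match(suite["id"]):
        errors.append("SUITE.ID_FORMAT")
    if "coverage_matrix" in suite and "modality" not in suite["coverage_matrix"]:
        errors.append("SUITE.COVERAGE_DIM_REQUIRED")
    for task in doc.get("tasks", []):
        tid = task.get("id", "?")
        splits = task.get("splits", {})
        frozen = all(splits.get(s, {}).get("frozen") is True
                     for s in ("train", "val", "test"))
        if "dataset_ref" not in task or "splits" not in task or not frozen:
            errors.append(f"TASK.DATASET_AND_SPLITS:{tid}")
        if "leakage_guard" in task and not GUARDS & set(task["leakage_guard"]):
            errors.append(f"TASK.LEAKAGE_GUARD:{tid}")
        for sub in task.get("subtasks", []):
            if "track" not in sub and "slice" not in sub:
                errors.append(f"SUBTASK.TRACK_OR_SLICE:{sub.get('id', '?')}")
    if "metrology" in doc:
        m = doc["metrology"]
        if not (m.get("units") == "SI" and m.get("check_dim") is True):
            errors.append("METROLOGY.SI_AND_CHECKDIM")
    return errors


good = {
    "suite": {"id": "eift.bench.core", "coverage_matrix": {"modality": {"text": 9000}}},
    "tasks": [{
        "id": "cls.multiclass",
        "dataset_ref": "datasets/core_cls@v1.0",
        "splits": {s: {"frozen": True} for s in ("train", "val", "test")},
        "leakage_guard": ["per-object"],
        "subtasks": [{"id": "cls.multiclass.en", "track": "closed-book"}],
    }],
    "metrology": {"units": "SI", "check_dim": True},
}

bad = copy.deepcopy(good)
bad["suite"]["id"] = "EIFT Core"                       # uppercase and a space
bad["tasks"][0]["splits"]["val"]["frozen"] = False     # split not frozen
bad["tasks"][0]["leakage_guard"] = ["per-user"]        # no recognised guard

good_errors = lint(good)
bad_errors = lint(bad)
print(good_errors, bad_errors)
```

Hard-coding the rules keeps the sketch short. A production linter would more likely evaluate the `when` paths and `assert` expressions generically.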
IX. Cross-References
- Data splits & distribution: EFT.WP.Data.DatasetCards v1.0, Ch.11.
- Evaluation protocol & metrics: EFT.WP.Data.ModelCards v1.0, Ch.11.
- Coverage/monitoring metrology: EFT.WP.Data.Pipeline v1.0, Ch.12.
- Unit & dimension checks: EFT.WP.Core.Metrology v1.0:check_dim.
X. Chapter Compliance Checklist
- Suite/task/subtask hierarchy complete; dataset_ref/splits/leakage_guard and coverage matrix present.
- protocol/metrics aligned with Model Cards; aggregation rules from Chapter 8 referenced and applied.
- Frozen splits & leakage guardrails active; coverage computed on val/test.
- SI metrology active with check_dim=true; if T_arr appears, delta_form/path/measure registered and validated.
- export_manifest.references[] use “Volume vX.Y:Anchor”; the machine-readable fragment is drop-in and passes lint.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/