46-EFT.WP.Data.Benchmarks v1.0 | Chapter 3 Suite Layering & Overview

Chapter 3 Suite Layering & Overview

I. Chapter Purpose & Scope

Fix the benchmark hierarchy suite → task → subtask → item, the coverage matrix, and risk blind spots; provide machine-readable fields and validation posture; ensure consistency with Dataset/Model Cards/Pipeline, metrology, and citation anchors.

II. Layering & Object Relationships (Normative)

Hierarchy:
- suite: overall definition, coverage, and governance;
- task: scenario & I/O mode, protocol & metrics;
- subtask: finer dimensions (modality/domain/locale/resource track);
- item: minimal evaluation unit (question/sample/clip/query).
Relationship constraints: suite.tasks[*].subtasks[*].items[*] is a directed containment chain; each task must have dataset_ref and splits; each subtask must declare a track or slice; each item must bind split ∈ {train,val,test}.
Coverage matrix: coverage_matrix[dimension][bucket] = count/ratio; dimensions include at least modality/locale/domain/difficulty.
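
The relationship constraints above can be sketched as a small in-memory model. This is a minimal illustration, not the normative schema: the class names (Suite, Task, Subtask, Item) and the helper check_constraints are hypothetical, and items are held inline here even though the fragments in this chapter reference them through items_ref.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

VALID_SPLITS = {"train", "val", "test"}


@dataclass
class Item:
    id: str
    split: str  # must be one of VALID_SPLITS


@dataclass
class Subtask:
    id: str
    track: Optional[str] = None
    slice: Optional[Dict] = None
    items: List[Item] = field(default_factory=list)


@dataclass
class Task:
    id: str
    dataset_ref: Optional[str] = None
    splits: Optional[Dict] = None
    subtasks: List[Subtask] = field(default_factory=list)


@dataclass
class Suite:
    id: str
    tasks: List[Task] = field(default_factory=list)


def check_constraints(suite: Suite) -> List[str]:
    """Walk suite -> task -> subtask -> item and collect constraint violations."""
    errors = []
    for task in suite.tasks:
        if not task.dataset_ref or not task.splits:
            errors.append(f"{task.id}: task needs dataset_ref and splits")
        for sub in task.subtasks:
            if sub.track is None and sub.slice is None:
                errors.append(f"{sub.id}: subtask needs track or slice")
            for item in sub.items:
                if item.split not in VALID_SPLITS:
                    errors.append(f"{item.id}: split must be train, val, or test")
    return errors
```

An empty list from check_constraints means the three constraints hold for every object reachable from the suite; it says nothing about coverage or leakage, which are checked separately.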

III. Fields & Structure (Normative)

suite:
  id: "eift.benchmarks.core"
  title: "EIFT Core Benchmarks"
  version: "v1.0.0"
  modalities: ["text", "image", "audio"]
  risks: ["leakage", "bias", "spurious_correlation"]
  coverage_matrix:
    modality: {"text": 12000, "image": 8000, "audio": 3000}
    locale: {"en": 60, "zh": 20, "es": 20}  # unit: %
    domain: {"news": 40, "science": 30, "open": 30}  # unit: %
tasks:
  - id: "qa.extractive"
    io_mode: "offline|stream|interactive"
    dataset_ref: "datasets/qa_core@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "difficulty", buckets: {"easy": 40, "med": 40, "hard": 20}}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "<hex>"}
      val: {frozen: true, index: "splits/val.index", sha256: "<hex>"}
      test: {frozen: true, index: "splits/test.index", sha256: "<hex>"}
    leakage_guard: ["per-object", "per-scene"]
    protocol:
      seed: 1701
      repeats: 5
      tools_allowed: false
      runtime_limits: {timeout_s: 3600}
    metrics:
      - {name: "F1_macro", unit: "—", higher_is_better: true}
      - {name: "ECE", unit: "—", higher_is_better: false}
    subtasks:
      - id: "qa.extractive.zh"
        track: "closed-book"
        slice: {locale: ["zh"]}
        items_ref: "lists/qa_zh_test.index"
      - id: "qa.extractive.en.open"
        track: "open-book"
        slice: {locale: ["en"], retrieval: true}
        items_ref: "lists/qa_en_open.index"
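
Each split entry above carries a frozen flag and a sha256 digest of its index file. One way a consumer might verify those digests is sketched below. The helper names (sha256_of_file, verify_splits) and the root parameter are assumptions for illustration; the source does not define how the digest is computed, so hashing the raw bytes of the index file is itself an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large index files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_splits(splits, root="."):
    """Return {split_name: bool}; True only if the split is frozen and its digest matches.

    `splits` mirrors the task.splits mapping, e.g.
    {"test": {"frozen": True, "index": "splits/test.index", "sha256": "<hex>"}}.
    """
    results = {}
    for name, spec in splits.items():
        actual = sha256_of_file(Path(root) / spec["index"])
        results[name] = bool(spec.get("frozen")) and actual == spec["sha256"].lower()
    return results
```

A False entry indicates either an unfrozen split or an index file whose content no longer matches the recorded digest; the sketch does not distinguish the two cases.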

IV. Coverage & Risk Posture

Coverage: report counts and ratios for modality/locale/domain/difficulty; ratios use % (dimensionless).
Risks: risks[] must include at least leakage|bias|spurious_correlation; provide detection rules and thresholds for each (e.g., shift ψ<=0.2).
Frozen consistency: all coverage reports are computed on the frozen S_val/S_test splits; computing coverage on the training split is forbidden.
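
The coverage rules above can be illustrated with a short sketch that reports counts and percentage ratios per dimension while excluding training items. The function name coverage_matrix, the item layout (a dict with a split key plus one key per dimension), and the output keys count and ratio_pct are assumptions made for this example.

```python
from collections import Counter

# Only frozen validation and test items may contribute to coverage.
ALLOWED_SPLITS = {"val", "test"}


def coverage_matrix(items, dimensions):
    """Build {dimension: {bucket: {"count": n, "ratio_pct": p}}} from val/test items.

    `items` is an iterable of dicts, each with a "split" key and one key
    per requested dimension (for example "locale" or "modality").
    """
    eligible = [it for it in items if it["split"] in ALLOWED_SPLITS]
    matrix = {}
    for dim in dimensions:
        counts = Counter(it[dim] for it in eligible)
        total = sum(counts.values())
        matrix[dim] = {
            bucket: {"count": n, "ratio_pct": 100.0 * n / total}
            for bucket, n in counts.items()
        }
    return matrix
```

Buckets that occur only in the training split do not appear in the result at all, which makes accidental training coverage visible as a missing bucket rather than a silently inflated count.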

V. Protocol & Aggregation Mapping

Protocol mapping: task.protocol aligns with Model Cards Ch.11 (seed/repeats/tools/runtime_limits).
Aggregation mapping: suite-level summaries follow Chapter 8 aggregation rules macro|micro|weighted; the origin of cross-task weights w_i must be explicit (uniform/sample-share/expert).
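
As a rough illustration of the aggregation mapping, the sketch below combines per-task scores under the three named modes. Chapter 8 is not reproduced here, so the definitions used are assumptions: macro is the unweighted mean over tasks, micro weights each task by its item count n (appropriate only for metrics that are per-item means, such as accuracy), and weighted uses caller-supplied w_i together with a declared origin. The function name aggregate and the input layout are hypothetical.

```python
WEIGHT_ORIGINS = {"uniform", "sample-share", "expert"}


def aggregate(task_scores, mode="macro", weights=None, weight_origin=None):
    """Combine per-task scores into one suite-level number.

    `task_scores` maps task id -> {"score": float, "n": int}.
    """
    ids = sorted(task_scores)
    if mode == "macro":
        w = {t: 1.0 for t in ids}
    elif mode == "micro":
        w = {t: float(task_scores[t]["n"]) for t in ids}
    elif mode == "weighted":
        # The origin of cross-task weights must be stated explicitly.
        if weights is None or weight_origin not in WEIGHT_ORIGINS:
            raise ValueError("weighted mode needs weights and an explicit weight_origin")
        w = {t: float(weights[t]) for t in ids}
    else:
        raise ValueError(f"unknown mode: {mode}")
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w[t] * task_scores[t]["score"] for t in ids) / total
```

Refusing to run weighted mode without a weight_origin is a design choice that makes the "origin must be explicit" requirement hard to skip by accident.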

VI. Metrology & Units (SI)

Metrics & resources use SI: QPS(1/s), T_inf(ms), ρ(—), size_bytes, net_mbps; metrology:{units:"SI", check_dim:true} is mandatory.
If tasks or features involve path quantity T_arr, record on the object: delta_form, path="gamma(ell)", measure="d ell", and use:
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
- T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
  with check_dim.
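
The two T_arr forms can be compared numerically with a discretized path. The sketch below assumes that n_eff is dimensionless, that c_ref is constant along the path (the two forms coincide only in that case), and that the integral is approximated by a sum over path segments; none of these assumptions is stated in this chapter. The function names and the (length, time) exponent encoding used for the dimension check are hypothetical.

```python
def t_arr_factored(n_eff, d_ell, c_ref):
    """T_arr = (1 / c_ref) * sum(n_eff * d_ell): c_ref pulled outside the sum."""
    return (1.0 / c_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))


def t_arr_integrand(n_eff, d_ell, c_ref):
    """T_arr = sum((n_eff / c_ref) * d_ell): c_ref kept inside the sum."""
    return sum((n / c_ref) * dl for n, dl in zip(n_eff, d_ell))


def check_dim(n_eff_dim=(0, 0), ell_dim=(1, 0), c_ref_dim=(1, -1)):
    """Dimensions as (length, time) exponents; n_eff * ell / c_ref must be a time."""
    result = tuple(n + l - c for n, l, c in zip(n_eff_dim, ell_dim, c_ref_dim))
    return result == (0, 1)
```

For example, with segment values n_eff = [1.0, 1.5, 2.0], d_ell = [10.0, 20.0, 30.0] and c_ref = 5.0, both functions agree up to floating-point rounding, and check_dim() confirms that the result carries the dimension of time.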

VII. Machine-Readable Fragment (Drop-in)

suite:
  id: "eift.bench.core"
  title: "EIFT Core"
  version: "v1.0.0"
  modalities: ["text", "image"]
  risks: ["leakage", "bias"]
  coverage_matrix:
    modality: {"text": 9000, "image": 6000}
    locale: {"en": 70, "zh": 30}
tasks:
  - id: "cls.multiclass"
    io_mode: "offline"
    dataset_ref: "datasets/core_cls@v1.0"
    sampling: {strategy: "stratified", strata: [{by: "label"}]}
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "..."}
      val: {frozen: true, index: "splits/val.index", sha256: "..."}
      test: {frozen: true, index: "splits/test.index", sha256: "..."}
    leakage_guard: ["per-object"]
    protocol: {seed: 1701, repeats: 5, tools_allowed: false, runtime_limits: {timeout_s: 3600}}
    metrics: [{name: "Acc", unit: "—", higher_is_better: true}, {name: "ECE", unit: "—", higher_is_better: false}]
    subtasks:
      - {id: "cls.multiclass.en", track: "closed-book", slice: {locale: ["en"]}, items_ref: "lists/cls_en.index"}
      - {id: "cls.multiclass.zh", track: "closed-book", slice: {locale: ["zh"]}, items_ref: "lists/cls_zh.index"}

VIII. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SUITE.ID_FORMAT
    when: "$.suite.id"
    assert: "matches('^[a-z0-9_.\\-]+$')"
    level: error
  - id: SUITE.COVERAGE_DIM_REQUIRED
    when: "$.suite.coverage_matrix"
    assert: "has_keys(modality)"
    level: error
  - id: TASK.DATASET_AND_SPLITS
    when: "$.tasks[*]"
    assert: "has_key(dataset_ref) and has_key(splits) and splits.train.frozen and splits.val.frozen and splits.test.frozen"
    level: error
  - id: TASK.LEAKAGE_GUARD
    when: "$.tasks[*].leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
  - id: SUBTASK.TRACK_OR_SLICE
    when: "$.tasks[*].subtasks[*]"
    assert: "has_key(track) or has_key(slice)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
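
The rules above are written in an assertion language this chapter does not specify, so the sketch below re-expresses them as plain Python predicates rather than parsing the assert strings. It is an illustration only: select handles just the path subset used here ('$', '.key', '[*]'), and the names select, RULES, and lint are hypothetical.

```python
import re

GUARDS = {"per-object", "per-timewindow", "per-scene"}


def select(doc, path):
    """Resolve a small JSONPath subset against nested dicts and lists."""
    nodes = [doc]
    for key, star in re.findall(r"\.([A-Za-z_][A-Za-z0-9_]*)|(\[\*\])", path):
        nxt = []
        for node in nodes:
            if key and isinstance(node, dict) and key in node:
                nxt.append(node[key])
            elif star and isinstance(node, list):
                nxt.extend(node)
        nodes = nxt
    return nodes


def _splits_frozen(task):
    splits = task.get("splits") or {}
    return all((splits.get(s) or {}).get("frozen") is True for s in ("train", "val", "test"))


# (rule id, "when" path, predicate standing in for the "assert" expression)
RULES = [
    ("SUITE.ID_FORMAT", "$.suite.id",
     lambda v: isinstance(v, str) and re.fullmatch(r"[a-z0-9_.\-]+", v) is not None),
    ("SUITE.COVERAGE_DIM_REQUIRED", "$.suite.coverage_matrix", lambda v: "modality" in v),
    ("TASK.DATASET_AND_SPLITS", "$.tasks[*]",
     lambda v: "dataset_ref" in v and _splits_frozen(v)),
    ("TASK.LEAKAGE_GUARD", "$.tasks[*].leakage_guard", lambda v: bool(GUARDS & set(v))),
    ("SUBTASK.TRACK_OR_SLICE", "$.tasks[*].subtasks[*]",
     lambda v: "track" in v or "slice" in v),
    ("METROLOGY.SI_AND_CHECKDIM", "$.metrology",
     lambda v: v.get("units") == "SI" and v.get("check_dim") is True),
]


def lint(doc):
    """Return the ids of failed rules, in rule order, one entry per failing node."""
    failures = []
    for rule_id, when, check in RULES:
        for node in select(doc, when):
            if not check(node):
                failures.append(rule_id)
    return failures
```

Note that a rule fires only when its when path selects something, so in this sketch a task with no leakage_guard key or a document with no metrology block passes silently. An implementation that treats those fields as mandatory would need an additional presence check.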

IX. Cross-References

Data splits & distribution: EFT.WP.Data.DatasetCards v1.0, Ch.11.
Evaluation protocol & metrics: EFT.WP.Data.ModelCards v1.0, Ch.11.
Coverage/monitoring metrology: EFT.WP.Data.Pipeline v1.0, Ch.12.
Unit & dimension checks: EFT.WP.Core.Metrology v1.0:check_dim.

X. Chapter Compliance Checklist

Suite/task/subtask hierarchy complete; dataset_ref/splits/leakage_guard and coverage matrix present.
protocol/metrics aligned with Model Cards; aggregation rules from Chapter 8 referenced and applied.
Frozen splits & leakage guardrails active; coverage computed on val/test.
SI metrology active with check_dim=true; if T_arr appears, delta_form/path/measure registered and validated.
export_manifest.references[] use “Volume vX.Y:Anchor”; the machine-readable fragment is drop-in and passes lint.
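
The "Volume vX.Y:Anchor" reference format in the last checklist item could be checked with a regular expression such as the one sketched below. The pattern is an assumption inferred from the one example in Section IX that uses this form (EFT.WP.Core.Metrology v1.0:check_dim); in particular, it assumes that neither the volume name nor the anchor contains whitespace. The names REF_PATTERN and parse_reference are hypothetical.

```python
import re

# Assumed shape: "<volume> v<major>.<minor>:<anchor>", no whitespace in volume or anchor.
REF_PATTERN = re.compile(r"^(?P<volume>\S+) v(?P<version>\d+\.\d+):(?P<anchor>\S+)$")


def parse_reference(ref):
    """Return {"volume", "version", "anchor"} for a well-formed reference, else None."""
    match = REF_PATTERN.match(ref)
    return match.groupdict() if match else None
```

Prose cross-references of the form "Volume vX.Y, Ch.N" do not match this pattern and yield None, so they would need converting to the anchor form before being placed in export_manifest.references[].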

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05