EFT.WP.Data.Benchmarks v1.0
Chapter 15 Machine-readable Schema & Lint
I. Chapter Purpose & Scope
Provide the normative JSON Schema and Lint ruleset for benchmark suites, covering structure, types, regexes, dependencies, cross-volume citation anchors, dimensional checks, frozen splits and leakage guardrails, scoring/normalization/significance minima, and compliance minima; used for pre-release blocking and portal/CI auto-validation. Keys use snake_case; cross-volume citations use “Volume vX.Y:Anchor”; math uses backticks with parentheses and no Chinese characters.
II. Normative Artifacts (Release-Critical)
artifacts:
- path: "schema/benchmark.schema.json"
- path: "schema/lint_rules.yaml"
- path: "schema/examples/minimal.yaml"
- path: "schema/examples/full.yaml"
These artifacts must be listed in export_manifest.artifacts[] with sha256; citation anchors follow this volume’s conventions.
III. Normative JSON Schema (Core Excerpt)
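As a rough illustration of the constraints this section names, the sketch below writes a schema fragment as a Python dict and checks it with the standard library only. The draft-07 style keywords, the `SCHEMA_SKETCH` and `check_core` names, and the anchor pattern (which admits a lowercase first letter so that anchors such as `check_dim` pass) are assumptions, not the normative benchmark.schema.json.

```python
import re

# Hypothetical fragment in JSON Schema style; not the normative schema file.
SCHEMA_SKETCH = {
    "type": "object",
    "required": ["suite", "tasks", "metrology", "export_manifest"],
    "properties": {
        "metrology": {
            "type": "object",
            "required": ["units", "check_dim"],
            "properties": {
                "units": {"const": "SI"},
                "check_dim": {"const": True},
            },
        },
        "export_manifest": {
            "type": "object",
            "properties": {
                "references": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        # "Volume vX.Y:Anchor"; the anchor class is an assumption.
                        "pattern": r"^[^:]+ v\d+\.\d+:[A-Za-z].+$",
                    },
                },
            },
        },
    },
}


def check_core(spec: dict) -> list:
    """Hand-rolled check of only the constraints sketched above."""
    errors = []
    for key in SCHEMA_SKETCH["required"]:
        if key not in spec:
            errors.append(f"missing key: {key}")
    metrology = spec.get("metrology", {})
    if metrology.get("units") != "SI" or metrology.get("check_dim") is not True:
        errors.append("metrology must set units='SI' and check_dim=true")
    props = SCHEMA_SKETCH["properties"]["export_manifest"]["properties"]
    pattern = re.compile(props["references"]["items"]["pattern"])
    for ref in spec.get("export_manifest", {}).get("references", []):
        if not pattern.match(ref):
            errors.append(f"bad reference: {ref}")
    return errors
```

In practice a full JSON Schema validator would replace `check_core`; the hand-rolled version is used here only to keep the example dependency-free.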
The references[] regex enforces “Volume vX.Y:Anchor”; metrology.units="SI" and check_dim=true are mandatory.
IV. Lint Rules (Normative)
version: "v1.0"
rules:
  - id: STRUCT.REQUIRED
    when: "$"
    assert: "has_keys(suite,tasks,metrology,export_manifest)"
    level: error
  - id: SUITE.VERSION.SEMVER
    when: "$.suite.version"
    assert: "matches('^v\\d+\\.\\d+(\\.\\d+)?$')"
    level: error
  - id: SUITE.ID_FORMAT
    when: "$.suite.id"
    assert: "matches('^[a-z0-9_.\\-]+$')"
    level: error
  - id: TASK.REQUIRED_KEYS
    when: "$.tasks[*]"
    assert: "has_keys(id, io_mode, dataset_ref, splits, protocol, metrics, leakage_guard)"
    level: error
  - id: DATASET.REF_FORMAT
    when: "$.tasks[*].dataset_ref"
    assert: "matches('^datasets/[a-z0-9_\\-]+@v\\d+\\.\\d+$')"
    level: error
  - id: SPLITS.FROZEN_REQUIRED
    when: "$.tasks[*].splits"
    assert: "splits.train.frozen and splits.val.frozen and splits.test.frozen and splits.freeze_indices == true"
    level: error
  - id: SPLITS.RATIO_SUM
    when: "$.tasks[*].splits.ratio"
    assert: "abs(value.train + value.val + value.test - 1) <= 1e-6"
    level: error
  - id: LEAKAGE.GUARD_ALLOWED
    when: "$.tasks[*].leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
  - id: METRICS.FAMILY_UNIT
    when: "$.tasks[*].metrics[*]"
    assert: "has_keys(name, family, unit, higher_is_better)"
    level: error
  - id: METRICS.UNIT_SI_OR_DIMLESS
    when: "$.tasks[*].metrics[*].unit"
    assert: "all_units_in_SI(value) or value in ['—','%']"
    level: error
  - id: PROTOCOL.MODE_ALLOWED
    when: "$.tasks[*].protocol.mode"
    assert: "value in ['offline','online','stream','interactive']"
    level: error
  - id: SIG.PARAMS
    when: "$.tasks[*].significance"
    assert: "has_keys(method, alpha)"
    level: error
  - id: SCORE.AGG_LEVELS
    when: "$.tasks[*].aggregation.levels"
    assert: "contains_all(['task'])"
    level: error
  - id: SCORE.NORM_SCHEME
    when: "$.scoring.normalization.scheme"
    assert: "value in ['zscore','minmax','fixed-anchor']"
    level: warn
  - id: ROBUST.THRESHOLDS_MIN
    when: "$.robustness.thresholds"
    assert: "has_keys(drop_rel_max, acc_robust_min)"
    level: warn
  - id: FAIR.THRESHOLDS_MIN
    when: "$.fairness_ethics.thresholds"
    assert: "has_keys(fairness_warn, fairness_block)"
    level: warn
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
  - id: REFERENCES.FORMAT
    when: "$.export_manifest.references[*]"
    assert: "matches('^[^:]+ v\\d+\\.\\d+:[A-Za-z].+$')"
    level: error
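To make the rule semantics concrete, here is a minimal sketch of how three of the rules above might be evaluated in Python. The `run_rules` function and its finding format are hypothetical; a real engine would presumably load lint_rules.yaml and resolve the `when` paths generically rather than hard-coding them as done here.

```python
import re

# Patterns copied from SUITE.VERSION.SEMVER and DATASET.REF_FORMAT.
SEMVER = re.compile(r"^v\d+\.\d+(\.\d+)?$")
DATASET_REF = re.compile(r"^datasets/[a-z0-9_\-]+@v\d+\.\d+$")


def run_rules(spec: dict) -> list:
    """Return (rule_id, path, level) findings for three hard-coded rules."""
    findings = []
    version = spec.get("suite", {}).get("version", "")
    if not SEMVER.match(version):
        findings.append(("SUITE.VERSION.SEMVER", "$.suite.version", "error"))
    for i, task in enumerate(spec.get("tasks", [])):
        ref = task.get("dataset_ref", "")
        if not DATASET_REF.match(ref):
            findings.append(
                ("DATASET.REF_FORMAT", f"$.tasks[{i}].dataset_ref", "error")
            )
        ratio = task.get("splits", {}).get("ratio")
        if ratio is not None:
            total = ratio.get("train", 0) + ratio.get("val", 0) + ratio.get("test", 0)
            if abs(total - 1) > 1e-6:  # tolerance from SPLITS.RATIO_SUM
                findings.append(
                    ("SPLITS.RATIO_SUM", f"$.tasks[{i}].splits.ratio", "error")
                )
    return findings
```

Calling `run_rules` on a spec whose suite version is "1.0" (no leading "v") would report SUITE.VERSION.SEMVER, while a ratio of 0.8/0.1/0.1 passes the sum check within the stated tolerance.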
Blocking: STRUCT.REQUIRED, SUITE.VERSION.SEMVER, SUITE.ID_FORMAT, TASK.REQUIRED_KEYS, DATASET.REF_FORMAT, SPLITS.FROZEN_REQUIRED, SPLITS.RATIO_SUM, LEAKAGE.GUARD_ALLOWED, METRICS.FAMILY_UNIT, METRICS.UNIT_SI_OR_DIMLESS, PROTOCOL.MODE_ALLOWED, METROLOGY.SI_AND_CHECKDIM, REFERENCES.FORMAT.
V. Failure Examples & Diagnostics (Excerpt)
fail_examples:
  - case: "bad reference"
    input: {export_manifest: {references: ["Core.DataSpec:EXPORT"]}}
    expect: {rule: "REFERENCES.FORMAT", level: "error",
             fix: "Use 'EFT.WP.Core.DataSpec v1.0:EXPORT'"}
  - case: "splits not frozen"
    input: {tasks: [{id: "cls", io_mode: "offline", dataset_ref: "datasets/core@v1.0",
                     splits: {train: {frozen: false, index: "..."}, val: {frozen: true, index: "..."}, test: {frozen: true, index: "..."}, freeze_indices: false},
                     protocol: {mode: "offline"}, metrics: [], leakage_guard: ["per-object"]}]}
    expect: {rule: "SPLITS.FROZEN_REQUIRED", level: "error",
             fix: "Set all splits to frozen and freeze_indices=true"}
  - case: "metric without unit"
    input: {tasks: [{id: "cls", io_mode: "offline", dataset_ref: "datasets/core@v1.0",
                     splits: {train: {frozen: true, index: "..."}, val: {frozen: true, index: "..."}, test: {frozen: true, index: "..."}, freeze_indices: true},
                     protocol: {mode: "offline"}, metrics: [{name: "F1_macro"}], leakage_guard: ["per-object"]}]}
    expect: {rule: "METRICS.FAMILY_UNIT", level: "error",
             fix: "Provide family/unit/higher_is_better for each metric"}
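A diagnostic for the "splits not frozen" case might be assembled as in the sketch below. The `check_splits_frozen` name, the `path` value, and the message wording are illustrative assumptions; only the rule id, level, and fix text come from the failure example above.

```python
def check_splits_frozen(spec: dict) -> list:
    """Emit one diagnostic per task that violates SPLITS.FROZEN_REQUIRED."""
    diagnostics = []
    for i, task in enumerate(spec.get("tasks", [])):
        splits = task.get("splits", {})
        # Collect the names of splits that are missing or not marked frozen.
        unfrozen = [
            name for name in ("train", "val", "test")
            if not splits.get(name, {}).get("frozen", False)
        ]
        if unfrozen or splits.get("freeze_indices") is not True:
            diagnostics.append({
                "rule": "SPLITS.FROZEN_REQUIRED",
                "level": "error",
                "path": f"$.tasks[{i}].splits",
                # Message wording is hypothetical.
                "message": f"unfrozen splits: {unfrozen or 'none'}; "
                           f"freeze_indices={splits.get('freeze_indices')}",
                "fix": "Set all splits to frozen and freeze_indices=true",
            })
    return diagnostics
```

Keeping `fix` as a plain, actionable string is one simple way to let a portal surface the remediation next to the offending path.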
Lint outputs must include rule/path/message/fix to enable one-click remediation.
VI. Minimal Working Example (Validates)
suite:
  id: "eift.bench.core"
  title: "EIFT Core Benchmarks"
  version: "v1.0"
  modalities: ["text"]
tasks:
  - id: "cls.binary"
    io_mode: "offline"
    dataset_ref: "datasets/core_cls@v1.0"
    splits:
      train: {frozen: true, index: "splits/train.index", sha256: "..."}
      val: {frozen: true, index: "splits/val.index", sha256: "..."}
      test: {frozen: true, index: "splits/test.index", sha256: "..."}
      freeze_indices: true
      ratio: {train: 0.8, val: 0.1, test: 0.1}
    leakage_guard: ["per-object"]
    protocol: {mode: "offline", seed: 1701, repeats: 5}
    metrics:
      - {name: "F1_macro", family: "classification", unit: "—", higher_is_better: true, agg: "macro"}
    aggregation: {levels: ["task"], weights: {scheme: "uniform"}}
    significance: {method: "bootstrap", alpha: 0.05}
metrology: {units: "SI", check_dim: true}
export_manifest:
  version: "v1.0"
  artifacts: [{path: "benchmark.yaml", sha256: "..."}]
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
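Two of the task-level rules can be checked against a task like the one in this example with a few lines of Python. This is only a sketch: `check_task` is a hypothetical helper, and the task dict copies a subset of the example's fields by hand rather than parsing the YAML.

```python
# Allowed values from LEAKAGE.GUARD_ALLOWED and keys from METRICS.FAMILY_UNIT.
ALLOWED_GUARDS = {"per-object", "per-timewindow", "per-scene"}
METRIC_KEYS = {"name", "family", "unit", "higher_is_better"}


def check_task(task: dict) -> list:
    """Return the ids of the two rules sketched here that a task violates."""
    violated = []
    # contains_any: at least one declared guard must be an allowed value.
    if not ALLOWED_GUARDS.intersection(task.get("leakage_guard", [])):
        violated.append("LEAKAGE.GUARD_ALLOWED")
    # has_keys: every metric must carry all four required keys.
    if any(not METRIC_KEYS.issubset(metric) for metric in task.get("metrics", [])):
        violated.append("METRICS.FAMILY_UNIT")
    return violated


# Task fields copied from the minimal example (other keys omitted).
task = {
    "id": "cls.binary",
    "leakage_guard": ["per-object"],
    "metrics": [{"name": "F1_macro", "family": "classification", "unit": "—",
                 "higher_is_better": True, "agg": "macro"}],
}
violations = check_task(task)  # empty list for this task
```

A metric given only as `{"name": "F1_macro"}` would instead be reported under METRICS.FAMILY_UNIT, matching the "metric without unit" failure case.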
VII. Coupling with Export Manifest (Normative)
export_manifest:
  artifacts:
    - {path: "schema/benchmark.schema.json", sha256: "..."}
    - {path: "schema/lint_rules.yaml", sha256: "..."}
    - {path: "schema/examples/minimal.yaml", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.ModelCards v1.0:Ch.11"
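Verifying the listed artifacts could look like the following sketch, which recomputes each file's SHA-256 with Python's hashlib and compares it to the manifest entry. The `verify_artifacts` helper and its `root` parameter are assumptions for illustration; the source does not define how verification is implemented.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(manifest: dict, root: Path) -> list:
    """Return error strings for missing files or digest mismatches."""
    errors = []
    for entry in manifest.get("artifacts", []):
        target = root / entry["path"]
        if not target.is_file():
            errors.append(f"missing artifact: {entry['path']}")
        elif sha256_of(target) != entry.get("sha256"):
            errors.append(f"sha256 mismatch: {entry['path']}")
    return errors
```

An empty return list would mean every artifact exists under `root` and matches its recorded digest.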
Schema & Lint are blocking and must be listed and verifiable; references carry “Volume vX.Y:Anchor”.
VIII. Validation Interfaces (Ixx-?; Unified Return)
def validate_benchmark(spec: dict) -> dict: ...
def lint_benchmark(spec: dict, rules: dict) -> dict: ...
def check_units(spec: dict) -> dict: ...  # uses Core.Metrology v1.0:check_dim
def verify_references(spec: dict) -> dict: ...  # regex + anchor reachability
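One way these interfaces could share a unified return structure is sketched below. The `make_result` and `merge_results` helpers are hypothetical names introduced here, and the convention that `ok` is true only when there are no errors is an assumption; the stubs above leave the bodies unspecified.

```python
def make_result(errors=None, warnings=None, metrics=None) -> dict:
    """Build a unified return; ok is True only when no errors were found."""
    errors = list(errors or [])
    return {
        "ok": not errors,
        "errors": errors,
        "warnings": list(warnings or []),
        "metrics": dict(metrics or {}),
    }


def merge_results(*results: dict) -> dict:
    """Combine several unified returns into one, e.g. for a single CI gate."""
    errors, warnings, metrics = [], [], {}
    for result in results:
        errors.extend(result["errors"])
        warnings.extend(result["warnings"])
        metrics.update(result["metrics"])
    return make_result(errors, warnings, metrics)
```

With this shape, a CI job could call each interface, merge the results, and fail the build whenever the merged `ok` is false.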
Return {"ok": bool, "errors":[...], "warnings":[...], "metrics":{...}} for portal/CI use.
IX. Chapter Compliance Checklist
- benchmark.schema.json and lint_rules.yaml produced and registered in export_manifest with sha256.
- Schema enforces metrology.units="SI" and check_dim=true, plus the anchor regex in export_manifest.references[]; Lint blocks unfrozen splits, missing leakage guardrails, metrics without units, and invalid protocol/references.
- Each task includes dataset_ref/splits/protocol/metrics/leakage_guard, with ratios summing to 1 ± 1e-6.
- Scoring/normalization/significance/fairness & robustness minima enabled; cross-volume citations resolvable.
- Minimal example validates once under Schema & Lint; validation interfaces integrated and returning the unified structure.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/