43-EFT.WP.Data.DatasetCards v1.0 | Chapter 12 Quality & Baselines

Chapter 12 Quality & Baselines

I. Chapter Purpose & Scope

Fix quality gates (pass criteria), coverage metrics, and a unified posture for baseline tasks/metrics; define the evaluation protocol, statistical significance, and reproducibility requirements; keep consistency with splits, labels/ontology, metrology, and uncertainty. Keys use snake_case; cross-volume citations follow “Volume+Version+Anchor”; math uses backticks and parentheses, with no Chinese.

II. Terminology & Dependencies

Terminology source: EFT.WP.Core.Terms v1.0; this chapter only adds fields related to quality and baselines.
Dependent volumes: Data contract/export: Core.DataSpec v1.0; metrology/units & dimensional checks: Core.Metrology v1.0; splits & distribution: Chapter 11; labels & ontology: Chapter 8; uncertainty & error budget: Chapter 10; citation/version posture: Citation Spec v0.1.

III. Fields & Structure (Normative)

quality:
  gates:  # Quality gates (must all pass before release)
    - {name: "label_consistency", threshold: 0.98, metric: "kappa"}
    - {name: "leakage", threshold: 0.0, metric: "leakage_rate"}
    - {name: "coverage_min", threshold: 0.99, metric: "split_coverage"}
    - {name: "checksum_integrity", threshold: 1.0, metric: "sha256_ok_ratio"}
  coverage:  # Coverage & distribution monitoring
    samples: 0  # replace with actual count at release
    per_class: {}  # {"FRB": 520, "RFI": 2100, ...}
    per_region: {}  # space/site/channel dimensions, etc.
    ci_method: "bootstrap-bca"
    target_ci: 0.95
baseline:
  tasks:  # Baseline task list (cls/retrieval/regression/detection…)
    - {name: "cls_frb_vs_rfi", type: "classification", split: "test"}
  metrics:  # Metrics & definitions
    - {name: "accuracy"}
    - {name: "f1_macro"}
    - {name: "roc_auc"}
    - {name: "pr_auc"}
    - {name: "ece"}  # Expected Calibration Error
    - {name: "brier"}
    - {name: "rmse"}  # regression/time-series
    - {name: "map"}  # detection/retrieval
  eval_protocol:  # Evaluation protocol
    splits: "frozen"  # must use frozen splits
    seeds: [0, 1, 2, 3, 4]
    repeats: 5
    ci: {method: "bootstrap-bca", level: 0.95}
    significance: {test: "permutation", alpha: 0.05}
    fairness: {by: ["class", "region"], gap_metric: "abs_diff"}
    robustness: {shift_tests: ["snr_drop", "time_jitter", "spec_notch"]}
reports:  # Deliverables & traceability
  tables: ["quality/summary.csv", "quality/per_class.csv"]
  plots: ["quality/roc.png", "quality/pr.png", "quality/calibration.png"]
see:
  - "EFT.WP.Core.DataSpec v1.0:EXPORT"
  - "EFT.WP.Core.Metrology v1.0:check_dim"

(Consistent with Chapter 11 frozen splits, Chapter 8 labels/ontology, and Chapters 9–10 metrology/uncertainty.)

IV. Quality Gates (Definitions)

Consistency: Label consistency via kappa or f1_agreement ≥ threshold; conflicting samples go to rework.
Leakage: Cross-split object/time-window leakage rate = 0; sourced from the Chapter 6–7 audits.
Coverage: Per-class/region/modality coverage ≥ targets; report with target_ci.
Integrity: Package and shard sha256 verification ratio = 1.0; cross-check with export manifest.
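The four gate definitions above amount to a pass/fail check per metric. The following is a minimal sketch under stated assumptions, not the normative implementation: the helper name `evaluate_gates`, the `measured` dict, and the rule that `leakage_rate` passes when the value is at most the threshold (while the other metrics pass when the value is at least the threshold) are all hypothetical, inferred from the thresholds in Section III.

```python
# Hypothetical helper: the names and the direction rule below are assumptions,
# not part of the specification.
LOWER_IS_BETTER = {"leakage_rate"}  # assumed: leakage must not exceed its threshold


def evaluate_gates(gates, measured):
    """Return {gate_name: passed} for each gate, given measured metric values."""
    results = {}
    for gate in gates:
        value = measured[gate["metric"]]
        if gate["metric"] in LOWER_IS_BETTER:
            results[gate["name"]] = value <= gate["threshold"]
        else:
            results[gate["name"]] = value >= gate["threshold"]
    return results


gates = [
    {"name": "label_consistency", "metric": "kappa", "threshold": 0.98},
    {"name": "leakage", "metric": "leakage_rate", "threshold": 0.0},
    {"name": "coverage_min", "metric": "split_coverage", "threshold": 0.99},
    {"name": "checksum_integrity", "metric": "sha256_ok_ratio", "threshold": 1.0},
]

# Illustrative measurements only (not real results).
measured = {
    "kappa": 0.985,
    "leakage_rate": 0.0,
    "split_coverage": 0.97,
    "sha256_ok_ratio": 1.0,
}

results = evaluate_gates(gates, measured)
release_ok = all(results.values())  # every gate must pass before release
```

With these illustrative numbers the coverage gate fails, so `release_ok` is false and the release would be blocked.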

V. Coverage & Distribution Monitoring

Statistical posture: Report proportions, counts, and 95% CIs for per_class / per_region / per_modality; default Bootstrap-BCa.
Drift monitoring: Compare train/val/test via KL/JS/KS; if over threshold, flag “distribution shift” in the report.
Metrology consistency: All numeric units/dimensions validated via Chapter 9 metrology.
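As an illustration of the drift check above, the sketch below compares per-class distributions of two splits with the Jensen-Shannon divergence. It assumes base-2 logarithms (so the value lies in [0, 1]); the function names, the class counts, and the threshold `JS_THRESHOLD` are hypothetical, since this chapter does not fix a threshold value.

```python
import math


def normalize(counts, classes):
    """Turn per-class counts into proportions over a fixed class order."""
    total = sum(counts.get(c, 0) for c in classes)
    return [counts.get(c, 0) / total for c in classes]


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs; 0 = identical, 1 = disjoint."""
    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 whenever a > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


JS_THRESHOLD = 0.01  # hypothetical value; set per project policy

# Illustrative counts only (not real data).
classes = ["FRB", "RFI"]
train_counts = {"FRB": 400, "RFI": 1600}
test_counts = {"FRB": 120, "RFI": 500}

js = js_divergence(normalize(train_counts, classes), normalize(test_counts, classes))
drift_flag = js > JS_THRESHOLD  # True would be reported as "distribution shift"
```

The same pattern extends to `per_region` or `per_modality` counts; KL and KS would need their own functions.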

VI. Baseline Tasks & Metrics

Classification: accuracy, f1_macro, roc_auc, pr_auc; Calibration: ece, brier.
Retrieval/Detection: map, mAP@[IoU], top-k recall.
Regression/Time-series: rmse, mae, mape, nll.
Confidence & significance: Provide intervals with point estimates; for baseline comparisons, provide permutation or paired bootstrap p-values.
Physical consistency: Expressions like `SNR = ( signal / noise )` must use backticks/parentheses and pass check_dim.
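For the calibration metrics listed above, the sketch below computes `brier` and a binned `ece` for a binary task. It is one common variant, given as an assumption rather than the chapter's definition: predictions are binned by positive-class probability into equal-width bins, and the bin count `n_bins=10` is a hypothetical default.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean of |mean predicted prob - observed positive rate|."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - positive_rate)
    return ece


# Illustrative predictions only (not real results).
probs = [0.9, 0.9, 0.1, 0.1]
labels = [1, 1, 0, 0]
brier = brier_score(probs, labels)
ece = expected_calibration_error(probs, labels)
```

Both values should be reported with intervals alongside the point estimates, as required above.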

VII. Evaluation Protocol

Frozen splits: Only Chapter 11 frozen indices allowed; custom splits are forbidden.
Randomness: Fix seeds and repeats; report mean ± CI.
Robustness: Define synthetic shifts (snr_drop, time_jitter, spec_notch) and report relative degradation.
Fairness: For sensitive axes (e.g., class/region), report performance gaps via gap_metric; explain when exceeding thresholds.
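The protocol items above can be sketched as follows. This is a minimal illustration under several assumptions: a percentile bootstrap stands in for `bootstrap-bca` (it is simpler, not equivalent), the permutation test is an exact sign-flip test on paired per-seed scores, `abs_diff` is taken as the max-minus-min gap across groups, and all function names and scores are hypothetical.

```python
import itertools
import random
import statistics


def mean_with_ci(scores, level=0.95, n_boot=2000, seed=0):
    """Mean and percentile-bootstrap CI (a simpler stand-in for bootstrap-bca)."""
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(scores)
    means = sorted(statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lo = means[int((1 - level) / 2 * n_boot)]
    hi = means[int((1 + level) / 2 * n_boot) - 1]
    return statistics.fmean(scores), lo, hi


def paired_permutation_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


def relative_degradation(clean, shifted):
    """Fractional drop of a higher-is-better metric under a synthetic shift."""
    return (clean - shifted) / clean


def abs_diff_gap(per_group):
    """Assumed reading of gap_metric "abs_diff": largest minus smallest group score."""
    return max(per_group.values()) - min(per_group.values())


# Illustrative per-seed scores for seeds [0, 1, 2, 3, 4]; not real results.
model_a = [0.91, 0.92, 0.90, 0.93, 0.92]
model_b = [0.88, 0.90, 0.89, 0.90, 0.91]

mean_a, lo_a, hi_a = mean_with_ci(model_a)
p_value = paired_permutation_pvalue(model_a, model_b)
```

One consequence of permuting at the seed level: with five paired values there are only 32 sign patterns, so the smallest achievable two-sided p-value is 2/32 = 0.0625, above `alpha: 0.05`. Permuting per test example, or using more seeds, would be needed for the test to be able to reject at that level.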

VIII. Coupling with Uncertainty & Metrology

When reporting metrological uncertainty (resampling/bootstrap) and statistical uncertainty (Chapter 10), first normalize units/dimensions per Chapter 9, then combine and report. For path-dependent metrics like T_arr, register delta_form, path="gamma(ell)", measure="d ell", and pass check_dim.

IX. Reporting & Traceability

Tables: Overall and stratified metrics; Plots: ROC, PR, calibration, and coverage waterfall.
Artifacts: Register all tables/plots under reports.tables/plots and list them in the export manifest with sha256.
Narrative posture: Explicit metric definitions, CI semantics (confidence/coverage), and significance test methods.
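Registering artifacts with sha256 can be sketched with the Python standard library. The helper names `sha256_of` and `build_artifacts` are hypothetical, and the demo file content is a placeholder rather than a real report; only the entry shape (`path`, `sha256`) follows the export manifest.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex sha256 of a file, read in chunks so large shards fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_artifacts(root, rel_paths):
    """Build manifest-style entries for paths given relative to root."""
    return [{"path": p, "sha256": sha256_of(Path(root) / p)} for p in rel_paths]


# Demo on a throwaway directory; the file content is a placeholder.
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "quality" / "summary.csv"
    target.parent.mkdir(parents=True)
    target.write_bytes(b"metric,value\n")
    artifacts = build_artifacts(root, ["quality/summary.csv"])
```

The resulting list has the same shape as `export_manifest.artifacts`, so it can be serialized directly into the manifest.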

X. Machine-Readable Fragment (Drop-in)

quality:
  gates:
    - {name: "label_consistency", metric: "kappa", threshold: 0.98}
    - {name: "leakage", metric: "leakage_rate", threshold: 0.0}
    - {name: "coverage_min", metric: "split_coverage", threshold: 0.99}
  coverage:
    samples: 15000
    per_class: {"FRB": 520, "RFI": 2100, "Noise": 12380}
    ci_method: "bootstrap-bca"
    target_ci: 0.95
baseline:
  tasks:
    - {name: "cls_frb_vs_rfi", type: "classification", split: "test"}
  metrics: [{name: "f1_macro"}, {name: "roc_auc"}, {name: "ece"}, {name: "brier"}]
  eval_protocol:
    splits: "frozen"
    seeds: [0, 1, 2, 3, 4]
    repeats: 5
    ci: {method: "bootstrap-bca", level: 0.95}
    significance: {test: "permutation", alpha: 0.05}
    robustness: {shift_tests: ["snr_drop", "time_jitter", "spec_notch"]}
reports:
  tables: ["quality/summary.csv", "quality/per_class.csv"]
  plots: ["quality/roc.png", "quality/pr.png", "quality/calibration.png"]
see:
  - "EFT.WP.Core.DataSpec v1.0:EXPORT"
  - "EFT.WP.Core.Metrology v1.0:check_dim"

(Align with export_manifest.artifacts[]/references[].)

XI. Coupling with Export Manifest (Normative)

export_manifest:
  artifacts:
    - {path: "quality/summary.csv", sha256: "..."}
    - {path: "quality/per_class.csv", sha256: "..."}
    - {path: "quality/roc.png", sha256: "..."}
    - {path: "quality/pr.png", sha256: "..."}
    - {path: "quality/calibration.png", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"

(Artifacts must be verifiable and carry anchors; no shortcodes/aliases.)
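The coupling rule can be checked mechanically: every path under `reports.tables` and `reports.plots` should appear in `export_manifest.artifacts`, and the share of artifacts whose recomputed hash matches feeds the `sha256_ok_ratio` gate. The sketch below assumes the fragments are already loaded as Python dicts; the function names and the hash strings are hypothetical placeholders.

```python
def missing_from_manifest(reports, manifest):
    """Report paths that are registered under reports but absent from the manifest."""
    listed = {artifact["path"] for artifact in manifest["artifacts"]}
    expected = reports["tables"] + reports["plots"]
    return [path for path in expected if path not in listed]


def sha256_ok_ratio(manifest, actual_hashes):
    """Fraction of manifest artifacts whose recomputed sha256 matches the manifest."""
    artifacts = manifest["artifacts"]
    ok = sum(1 for a in artifacts if actual_hashes.get(a["path"]) == a["sha256"])
    return ok / len(artifacts)


reports = {
    "tables": ["quality/summary.csv", "quality/per_class.csv"],
    "plots": ["quality/roc.png", "quality/pr.png", "quality/calibration.png"],
}

# Hypothetical manifest that omits one plot; hashes are placeholder strings.
manifest = {
    "artifacts": [
        {"path": "quality/summary.csv", "sha256": "hash-a"},
        {"path": "quality/per_class.csv", "sha256": "hash-b"},
        {"path": "quality/roc.png", "sha256": "hash-c"},
        {"path": "quality/calibration.png", "sha256": "hash-d"},
    ]
}

# Hypothetical recomputed hashes; one file no longer matches.
actual_hashes = {
    "quality/summary.csv": "hash-a",
    "quality/per_class.csv": "hash-b",
    "quality/roc.png": "hash-c",
    "quality/calibration.png": "hash-x",
}

missing = missing_from_manifest(reports, manifest)
ratio = sha256_ok_ratio(manifest, actual_hashes)
```

In this hypothetical case the check reports one unlisted plot and a ratio below 1.0, so the integrity gate would fail until both are corrected.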

XII. Chapter Compliance Checklist

All gates set and passed: consistency, leakage, coverage, integrity.
Baseline tasks/metrics and evaluation protocol complete; splits frozen; randomness and significance methods fixed.
Metrics and numbers honor unit/dimension consistency; combine with Chapter 10 uncertainty when needed; for T_arr, path/measure registered and check_dim passed.
Report tables/plots listed in the export manifest with sha256; references carry “Volume+Version+Anchor.”

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05