EFT.WP.Data.DatasetCards v1.0
I. Chapter Purpose & Audience
- Purpose: Establish the role of the Dataset Card within the EFT system, the minimum compliance requirements, usage boundaries, and cross-volume citation posture.
- Audience: Data providers, pipeline/platform engineers, metrology and quality owners, report authors, audit and reproducibility operators.
II. Terminology & Citation Posture
- Terminology Source: General terms follow EFT.WP.Core.Terms v1.0. This volume only adds incremental field names and constraints. Citations must carry volume name + version and preferably point to clause-level anchors P/S/M/I. Example: See "EFT.WP.Core.Equations v1.1" Ch.2 S20-1.
- Inline Symbols: Always wrap symbols in backticks, e.g., `T_arr`, `c_ref`, `n_eff`; any expression with division, integral, or composite operators uses parentheses and explicitly declares the path gamma(ell) and the measure d ell.
- Two Forms of Arrival Time (cross-volume unified examples):
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- T_arr = ( ∫ ( n_eff / c_ref ) d ell )
Whenever T_arr appears, metadata must register delta_form, path="gamma(ell)", measure="d ell".
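As a rough numerical illustration, the two forms can be compared on a sampled path. This is a minimal sketch: the trapezoidal rule, the sample values, the function names, and the delta_form labels "factored" and "integrand" are all illustrative assumptions, not values defined by this volume. The two forms agree when c_ref is constant along the path, which is the case shown here.

```python
def trapezoid(ys, xs):
    """Trapezoidal approximation of the integral of ys over the sample points xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_factored(n_eff, ell, c_ref):
    # T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), with c_ref a scalar constant
    return (1.0 / c_ref) * trapezoid(n_eff, ell)


def t_arr_integrand(n_eff, ell, c_ref_samples):
    # T_arr = ( ∫ ( n_eff / c_ref ) d ell ), with c_ref given at every sample
    ratio = [n / c for n, c in zip(n_eff, c_ref_samples)]
    return trapezoid(ratio, ell)


def t_arr_metadata(delta_form):
    # The delta_form labels used below are hypothetical placeholders.
    return {"delta_form": delta_form, "path": "gamma(ell)", "measure": "d ell"}


# Illustrative samples along gamma(ell); not taken from any real dataset.
ell = [0.0, 1.0, 2.0, 3.0]
n_eff = [1.0, 1.2, 1.1, 1.0]
c_ref = 2.0

t_factored = t_arr_factored(n_eff, ell, c_ref)
t_integrand = t_arr_integrand(n_eff, ell, [c_ref] * len(ell))
meta = t_arr_metadata("factored")
```

If c_ref varies along the path, only the second form applies, which is one reason to record which form was used next to the value.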
III. In Scope
- Objects: Metadata and documentation requirements for any dataset used in EFT research, engineering, or releases, including field sets, value constraints, examples, validation, and export.
- Process Coverage: Provenance & sampling, cleaning & preprocessing, labels & ontologies, metrology & units, uncertainty & error budget, splits & distribution, quality & baselines, privacy/ethics/compliance, release & versioning, machine-readable Schema & Lint, implementation binding & validation API, template & best practices — mapping to Ch.3–Ch.18.
- Dependency Citations: Data contracts and file organization follow EFT.WP.Core.DataSpec v1.0; units/dimensions and uncertainty follow EFT.WP.Core.Metrology v1.0; equations with path dependence follow EFT.WP.Core.Equations v1.1.
IV. Out of Scope
- Excludes: Model training details and tuning, algorithm implementations and performance optimization, theory derivations unrelated to data, institutional procedures for cross-disciplinary ethics review. See Methods/ModelCards/Benchmarks/domain protocols for such content.
V. Deliverables & Compliance Gate
- Deliverables:
- dataset_card.yaml (or JSON) — compliant with this volume’s Schema and required fields;
- export_manifest — includes version and references[];
- Validation report (Lint, Schema checks, metrology checks, and uncertainty composition).
- Minimum Compliance (must pass before release):
- Required fields complete; types and regex constraints satisfied;
- The units/dimensions check (check_dim) passes;
- Every T_arr-related field records delta_form, path, and measure;
- References use the fixed “volume+version+anchor” style, no shortcodes or aliases.
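Part of the gate above could be sketched as a pre-release check. This is only an illustration: the required field names in REQUIRED_FIELDS and the t_arr_fields key are hypothetical placeholders (the actual required fields come from this volume's Schema), and the check_dim and reference-style checks are not modeled here.

```python
# Hypothetical required fields; the real list is defined by the Schema.
REQUIRED_FIELDS = ("name", "version", "references")
# Keys that must accompany every T_arr-related entry, per the gate above.
T_ARR_KEYS = ("delta_form", "path", "measure")


def compliance_errors(card):
    """Return a list of human-readable gate violations; empty means pass."""
    errors = []
    for key in REQUIRED_FIELDS:
        if key not in card:
            errors.append(f"missing required field: {key}")
    # "t_arr_fields" is an assumed container name for T_arr-related entries.
    for i, entry in enumerate(card.get("t_arr_fields", [])):
        for key in T_ARR_KEYS:
            if not entry.get(key):
                errors.append(f"t_arr_fields[{i}] missing {key}")
    return errors


card = {
    "name": "example_dataset",
    "references": ["EFT.WP.Core.Equations v1.1:S20-1"],
    "t_arr_fields": [
        {"delta_form": "factored", "path": "gamma(ell)"},  # measure omitted
    ],
}
errors = compliance_errors(card)
```

Returning every violation at once, rather than stopping at the first, lets the validation report list all of them in a single run.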
VI. Document Structure & Cross-Volume Dependency Map
- Structure Map:
- Ch.3–Ch.5 define field inventory and layers (required vs. optional extensions);
- Ch.6–Ch.8 cover provenance/sampling, cleaning/preprocessing, labels/ontologies;
- Ch.9–Ch.10 implement metrology/units and uncertainty/error budget;
- Ch.11–Ch.12 cover splits/distribution and quality/baselines;
- Ch.13–Ch.14 cover compliance and versioning;
- Ch.15–Ch.16 provide machine-readable Schema, Lint, and validation API;
- Ch.17–Ch.18 provide examples and templates.
- Dependency Constraints:
- Terms always reference Core.Terms v1.0;
- Dimensions/units and uncertainty reference Core.Metrology v1.0;
- Path-dependent equations reference Core.Equations v1.1.
VII. Field Hierarchy & Naming Conventions
- Naming Style: Keys use snake_case; arrays are denoted by []; reserved names and enums are explicit in the Schema.
- Name-Conflict Enforcement: T_fil and T_trans must not be conflated; n and n_eff must be strictly distinguished; no Chinese characters in formulas, symbols, or definitions.
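The snake_case rule for keys lends itself to a simple lint pass. The sketch below is an assumption about how such a rule might be written, not the content of lint_rules.yaml; the regular expression and the function name are illustrative. It checks keys only and does not apply to symbols such as T_arr.

```python
import re

# Assumed definition of snake_case: lowercase words joined by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_keys(obj, prefix=""):
    """Walk nested dicts and lists, returning the paths of keys that are not snake_case."""
    bad = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if not SNAKE_CASE.match(str(key)):
                bad.append(path)
            bad.extend(non_snake_case_keys(value, path))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            bad.extend(non_snake_case_keys(item, f"{prefix}[{i}]"))
    return bad


# Illustrative card fragment with two deliberately misnamed keys.
sample = {
    "dataset_name": "x",
    "ExportManifest": {"references": [{"See": "a"}]},
}
violations = non_snake_case_keys(sample)
```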
VIII. Machine-Readable & Validation Interfaces (Overview)
- Schema & Lint: This volume provides a minimal dataset_card.schema.json and lint_rules.yaml covering required/type/regex/dependency rules.
- Implementation Binding Essentials:
- The see field uses the form "Volume vX.Y:Anchor" (e.g., "EFT.WP.Core.Equations v1.1:S20-1");
- export_manifest includes version and references[];
- Validation must support dimensional consistency (check_dim) and joint checks on path/measure fields.
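The "Volume vX.Y:Anchor" form can be checked mechanically. The following is a minimal sketch: the anchor pattern (one of P/S/M/I followed by digits and an optional "-digits" suffix) is inferred from the single example S20-1 and may be narrower than the real rule, and parse_see and build_export_manifest are hypothetical helper names, not part of any published validation API.

```python
import re

# Assumed grammar for "Volume vX.Y:Anchor"; the anchor shape is a guess from "S20-1".
SEE_PATTERN = re.compile(
    r"^(?P<volume>[A-Za-z][A-Za-z0-9.]*) "
    r"v(?P<major>\d+)\.(?P<minor>\d+):"
    r"(?P<anchor>[PSMI]\d+(?:-\d+)?)$"
)


def parse_see(see):
    """Split a see string into volume, version, and anchor, or raise ValueError."""
    match = SEE_PATTERN.match(see)
    if match is None:
        raise ValueError(f"not in 'Volume vX.Y:Anchor' form: {see!r}")
    return {
        "volume": match["volume"],
        "version": f"{match['major']}.{match['minor']}",
        "anchor": match["anchor"],
    }


def build_export_manifest(version, see_values):
    """Validate each reference, then assemble a manifest with version and references[]."""
    for see in see_values:
        parse_see(see)  # raises on shortcodes, aliases, or missing versions
    return {"version": version, "references": list(see_values)}


parsed = parse_see("EFT.WP.Core.Equations v1.1:S20-1")
manifest = build_export_manifest("1.0", ["EFT.WP.Core.Equations v1.1:S20-1"])
```

Rejecting malformed references when the manifest is built keeps shortcodes and aliases out of exported artifacts.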
IX. Quality, Reproducibility & Audit
- Quality Gates: Align with Ch.12’s quality and baseline requirements; provide thresholds and coverage metrics.
- Reproducibility: Provenance, cleaning steps, randomness control, environment and dependencies must be replayable.
- Audit Trail: Versioning, citation anchors, and metrology posture are traceable in the exported artifacts.
X. Usage & Maintenance
- Usage: Before release, complete the card and its validations, and publish them alongside the dataset; external materials should cite only stabilized minor versions (e.g., v1.*).
- Maintenance: When field postures or dependency entries change, follow Ch.14’s versioning strategy to publish a new version, and reflect reference changes in the export manifest.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/