43-EFT.WP.Data.DatasetCards v1.0
I. Chapter Purpose & Applicability
- Fix the required fields with semantics, types, constraints, and examples; provide drop-in Schema fragments and validation points.
- Naming uses snake_case; clause-level citations use “Volume+Version+Anchor”.
II. Master Table (Required Fields)
| Key | Type | Constraint/Regex | Semantic Definition | Cross-Ref/Anchor |
|---|---|---|---|---|
| dataset_id | string | `^[a-z0-9_\-.]+$` | Unique dataset identifier (root key for public release & lineage) | File org & release: Core.DataSpec v1.0 Ch.1–3. |
| title | string | length ≥ 3 | Human-readable title | - |
| version | string | `^v\d+\.\d+(\.\d+)?$` | Semantic version (public materials cite the stable minor line) | Version carrying: Citation Spec v0.1. |
| summary | string | 100–300 words | Purpose, coverage, and limits | - |
| modality | string[] | enum: `radio`, `optical`, `image`, `time_series`, `text`, `tabular` | Observation/data modality classification; multi-valued allowed | Terms: Core.Terms v1.0. |
| sources | string[] | URL/identifier | Upstream sources or dataset_id@version | Release posture: Core.DataSpec v1.0. |
| license | string | enum | License label (SPDX-compatible) | Public posture: Core.DataSpec v1.0. |
| access | string | enum: `open`, `restricted`, `closed` | Access level | - |
| provenance | object | schema | Collection setup, spatiotemporal coverage, source chain | Aligned to Methods/Cleaning. |
| splits | object | required: train, validation, test | Split definitions and ratios | Exports must include hashes: Core.DataSpec v1.0. |
| checksums | object | sha256 | Package & shard integrity | Export policy: Core.DataSpec v1.0. |
| metrology | object | schema | Units/dimensions baseline; enable check_dim | Dimensional check: Core.Metrology v1.0. |
| quality | object | schema | Quality gates & coverage metrics | Quality/Baselines: Data.Benchmarks. |
| export_manifest | object | schema | Export manifest with version, references[], artifacts | Machine-readable citations: Citation Spec v0.1. |
III. Field Definitions & Examples
1) dataset_id
- Semantics: Stable primary key across versions; recommend org prefix (e.g., org.project.dataset).
- Constraint: ^[a-z0-9_\-.]+$; case sensitive: No.
- Example: eift.obs.frb_catalog, labx_radio.arraysim.v1.
- Ref: Bound to export structure, see Core.DataSpec v1.0.
2) version
- Semantics: Release version; public materials cite only stable minor line v1.*.
- Constraint: ^v\d+\.\d+(\.\d+)?$.
- Example: v1.0, v1.2.3.
- Ref: Version carrying is mandatory, Citation Spec v0.1.
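The dataset_id and version constraints above can be checked mechanically. The following is a minimal sketch using Python's standard `re` module; the function names `is_valid_dataset_id` and `is_valid_version` are illustrative assumptions, not part of the specification.

```python
import re

# Patterns copied from the dataset_id and version constraints above.
DATASET_ID_RE = re.compile(r"^[a-z0-9_\-.]+$")
VERSION_RE = re.compile(r"^v\d+\.\d+(\.\d+)?$")


def is_valid_dataset_id(value: str) -> bool:
    """Return True if value satisfies the dataset_id pattern."""
    # fullmatch also rejects a trailing newline, which a bare `$` would tolerate.
    return DATASET_ID_RE.fullmatch(value) is not None


def is_valid_version(value: str) -> bool:
    """Return True if value looks like vX.Y or vX.Y.Z."""
    return VERSION_RE.fullmatch(value) is not None


# The examples given in the field definitions pass; malformed values do not.
assert is_valid_dataset_id("eift.obs.frb_catalog")
assert is_valid_version("v1.2.3")
assert not is_valid_version("1.0")
```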
3) modality
- Semantics: Observation/data modality classification.
- Constraint: Enum; multi-valued allowed.
- Example: ["radio","time_series"].
- Ref: Terms per Core.Terms v1.0.
4) sources
- Semantics: Upstream origins (URL, DOI, or dataset_id@version).
- Constraint: At least one resolvable entry.
- Example: ["doi:10.1234/abcd","eift.surveys.sky@v1.1"].
- Ref: File org & public posture, Core.DataSpec v1.0.
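As an illustration of the three accepted source forms, the sketch below sorts entries into URL, DOI, or dataset_id@version. It assumes that DOIs carry a `doi:` prefix as in the example above and that URLs start with `http://` or `https://`; the helper name `classify_source` is hypothetical, and the sketch does not test whether an entry is actually resolvable.

```python
import re

# dataset_id@version, reusing the dataset_id and version patterns of this chapter.
DATASET_REF_RE = re.compile(r"^[a-z0-9_\-.]+@v\d+\.\d+(\.\d+)?$")


def classify_source(entry: str) -> str:
    """Return 'url', 'doi', 'dataset', or 'unknown' for one sources[] entry."""
    if entry.startswith(("http://", "https://")):
        return "url"
    if entry.startswith("doi:"):
        return "doi"
    if DATASET_REF_RE.fullmatch(entry):
        return "dataset"
    return "unknown"


def has_recognized_source(sources: list) -> bool:
    """True if at least one entry has a recognized form (syntax only)."""
    return any(classify_source(s) != "unknown" for s in sources)
```

A card whose sources contain only `unknown` entries would need manual review under this sketch.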
5) provenance
- Semantics: Collection method, instruments/stations, temporal/spatial coverage, selection bias.
- Structure:
    provenance:
      collection_method: "beamformed-array"
      instruments: [{name: "LOFAR", station: "DE601"}]
      time_coverage: "2019-01-01..2024-12-31"
      spatial_coverage: "RA/Dec ranges or tiles"
      selection_bias: "flux-limited, SNR>7"
- Ref: Align with Methods.Cleaning/Repro.
6) splits
- Semantics: Train/validation/test partitioning.
- Structure & Constraints:
    splits:
      train: {count: 12000, ratio: 0.8}
      validation: {count: 1500, ratio: 0.1}
      test: {count: 1500, ratio: 0.1}
- Validity: ratios sum to 1 ± 1e-6; each count is a non-negative integer.
- Ref: Export & hash checks, Core.DataSpec v1.0.
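The validity rule for splits (ratios summing to 1 within 1e-6, non-negative integer counts) can be sketched as below. The function name `check_splits` and the wording of the messages are assumptions made for illustration.

```python
import math

REQUIRED_SPLITS = ("train", "validation", "test")


def check_splits(splits: dict, tol: float = 1e-6) -> list:
    """Return a list of problems; an empty list means the splits block passes."""
    problems = []
    ratios = []
    for name in REQUIRED_SPLITS:
        if name not in splits:
            problems.append(f"missing split: {name}")
            continue
        count = splits[name].get("count")
        ratio = splits[name].get("ratio")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(count, bool) or not isinstance(count, int) or count < 0:
            problems.append(f"{name}.count must be a non-negative integer")
        if isinstance(ratio, bool) or not isinstance(ratio, (int, float)):
            problems.append(f"{name}.ratio must be a number")
        else:
            ratios.append(ratio)
    # Only test the sum when all three ratios are present and numeric.
    if len(ratios) == len(REQUIRED_SPLITS) and abs(math.fsum(ratios) - 1.0) > tol:
        problems.append("ratios must sum to 1 within tolerance")
    return problems
```

Applied to the example above (12000/1500/1500 with ratios 0.8/0.1/0.1), the function returns an empty list.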
7) checksums
- Semantics: Artifact integrity.
- Structure:
    checksums:
      package: {sha256: "…"}
      shards:
        - {path: "train-000.tgz", sha256: "…"}
        - {path: "train-001.tgz", sha256: "…"}
- Ref: Delivered with export manifest.
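A minimal sketch of how the shard digests in a checksums block could be computed and compared, using only Python's standard `hashlib`. The helper names `sha256_of_file` and `verify_shards`, and the idea of resolving shard paths against a root directory, are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_shards(checksums: dict, root) -> list:
    """Return the paths of shards whose digest differs from the recorded one."""
    mismatched = []
    for shard in checksums.get("shards", []):
        actual = sha256_of_file(Path(root) / shard["path"])
        if actual != shard["sha256"].lower():
            mismatched.append(shard["path"])
    return mismatched
```

An empty return value from `verify_shards` means every listed shard matched its recorded digest.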
8) metrology
- Semantics: Unit system and dimensional consistency.
- Structure:
    metrology:
      units: "SI"
      c_ref: 299792458  # m/s
      check_dim: true
- Rules: No Chinese in formulas; symbols wrapped in backticks; any division/integral/composite operator must use parentheses and explicitly declare gamma(ell) and d ell.
9) quality
- Semantics: Quality gates (pass criteria) and coverage metrics.
- Structure:
    quality:
      gates:
        - {name: "label_consistency", threshold: 0.98}
        - {name: "snr_min", threshold: 7.0}
      coverage:
        samples: 15000
        classes: {"FRB": 520, "RFI": 2100, "Noise": 12380}
- Ref: Aligned with baselines/evaluation volumes.
10) export_manifest
- Semantics: Exported artifacts & reference list (audit trail).
- Minimal Fragment:
    export_manifest:
      version: "v1.0"
      artifacts:
        - path: "datasets/foo/train-000.tgz"
          sha256: "…"
      references:
        - "EFT.WP.Core.DataSpec v1.0:EXPORT"
        - "EFT.WP.Core.Equations v1.1:S20-1"
        - "EFT.WP.Core.Metrology v1.0:check_dim"
- Rule: version and references[] are mandatory; references[] uses "Volume vX.Y:Anchor"; no shortcodes/aliases.
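The "Volume vX.Y:Anchor" form lends itself to a small parser. The sketch below is illustrative only: the regular expression accepts any non-empty anchor and is an assumption of this example, while the normative pattern is the one given in the schema of Section V. The names `Reference` and `parse_reference` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "<volume> v<major>.<minor>:<anchor>", anchor non-empty.
REFERENCE_RE = re.compile(
    r"^(?P<volume>[^:]+) v(?P<major>\d+)\.(?P<minor>\d+):(?P<anchor>.+)$"
)


class Reference(NamedTuple):
    volume: str
    version: str
    anchor: str


def parse_reference(text: str) -> Optional[Reference]:
    """Split a reference into volume, version, and anchor; None if malformed."""
    match = REFERENCE_RE.fullmatch(text)
    if match is None:
        return None
    version = f"v{match['major']}.{match['minor']}"
    return Reference(match["volume"], version, match["anchor"])


ref = parse_reference("EFT.WP.Core.DataSpec v1.0:EXPORT")
assert ref == Reference("EFT.WP.Core.DataSpec", "v1.0", "EXPORT")
```

A shortcode such as "DataSpec:EXPORT" carries no version and is rejected, in line with the rule above.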
IV. Mandatory Registration for Path/Arrival-Time Data (if applicable)
- If data contains path-dependent quantities (e.g., T_arr), register at minimum:
    path_dependence:
      applies_to: ["T_arr"]
      delta_form: "const-factor"  # or "general"
      path: "gamma(ell)"
      measure: "d ell"
      see:
        - "EFT.WP.Core.Equations v1.1:S20-1"
        - "EFT.WP.Core.Metrology v1.0:check_dim"
- Two coexisting forms of arrival time:
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- T_arr = ( ∫ ( n_eff / c_ref ) d ell )
Record delta_form, path, measure, and pass dimensional checks.
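For a constant c_ref the two forms above give the same value, which a short numerical sketch can confirm. The example below uses a plain trapezoidal rule over a hypothetical sampled path; the sample values, the function names, and the mapping of "const-factor" to the first form and "general" to the second are assumptions made for illustration.

```python
import math

C_REF = 299_792_458.0  # m/s, same value as metrology.c_ref


def trapezoid(ys, xs):
    """Integrate samples ys over positions xs with the trapezoidal rule."""
    if len(ys) != len(xs) or len(xs) < 2:
        raise ValueError("need at least two samples of equal length")
    return math.fsum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_const_factor(n_eff, ell):
    """T_arr = ( 1 / c_ref ) * ( integral of n_eff d ell ), in seconds."""
    return (1.0 / C_REF) * trapezoid(n_eff, ell)


def t_arr_general(n_eff, ell):
    """T_arr = ( integral of ( n_eff / c_ref ) d ell ), in seconds."""
    return trapezoid([n / C_REF for n in n_eff], ell)


# Hypothetical path gamma(ell): 3 km sampled at four points.
ell = [0.0, 1000.0, 2000.0, 3000.0]   # metres
n_eff = [1.0, 1.0003, 1.0002, 1.0]    # dimensionless
assert math.isclose(
    t_arr_const_factor(n_eff, ell), t_arr_general(n_eff, ell), rel_tol=1e-12
)
```

Both functions take metres in and return seconds, since the integrand is either dimensionless (first form) or in s/m (second form); this is the kind of bookkeeping a dimensional check is meant to verify.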
V. Machine-Readable Schema (Excerpt, Normative)
# I15-1 Dataset Card Schema (required subset)
type: object
required: [dataset_id, title, version, summary, modality, sources, license, access, provenance, splits, checksums, metrology, quality, export_manifest]
properties:
  dataset_id: {type: string, pattern: "^[a-z0-9_\\-\\.]+$"}
  title: {type: string, minLength: 3}
  version: {type: string, pattern: "^v\\d+\\.\\d+(\\.\\d+)?$"}
  summary: {type: string, minLength: 100, maxLength: 600}
  modality: {type: array, items: {type: string, enum: [radio, optical, image, time_series, text, tabular]}, minItems: 1}
  sources: {type: array, items: {type: string}, minItems: 1}
  license: {type: string}
  access: {type: string, enum: [open, restricted, closed]}
  provenance:
    type: object
    required: [collection_method, time_coverage]
    properties:
      collection_method: {type: string}
      instruments: {type: array, items: {type: object}}
      time_coverage: {type: string}
      spatial_coverage: {type: string}
      selection_bias: {type: string}
  splits:
    type: object
    required: [train, validation, test]
    properties:
      train: {type: object, required: [count, ratio]}
      validation: {type: object, required: [count, ratio]}
      test: {type: object, required: [count, ratio]}
  checksums:
    type: object
    properties:
      package: {type: object, properties: {sha256: {type: string}}}
      shards: {type: array, items: {type: object, properties: {path: {type: string}, sha256: {type: string}}}}
  metrology:
    type: object
    required: [units, c_ref, check_dim]
    properties:
      units: {type: string, const: "SI"}
      c_ref: {type: number}
      check_dim: {type: boolean, const: true}
  quality:
    type: object
    properties:
      gates: {type: array}
      coverage: {type: object}
  export_manifest:
    type: object
    required: [version, artifacts, references]
    properties:
      version: {type: string}
      artifacts: {type: array, items: {type: object}}
      # Anchors may start with a lowercase letter, as in "check_dim".
      references: {type: array, items: {type: string, pattern: "^[^:]+ v\\d+\\.\\d+:[A-Za-z].+$"}}
see:
  - "EFT.WP.Core.DataSpec v1.0:EXPORT"
  - "EFT.WP.Core.Equations v1.1:S20-1"
  - "EFT.WP.Core.Metrology v1.0:check_dim"
(Citation/anchor format, unit & dimension checks, “no Chinese in math”, and inline-symbol rules follow Comprehensive Template v0.1 and Citation Spec v0.1.)
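As a first-pass check before full schema validation, the required-key lists of the schema can be tested directly. The sketch below covers key presence only (no types, patterns, or enums) and uses a hypothetical helper name, `missing_required_keys`; a complete check would run the schema through a JSON Schema validator.

```python
# Required keys copied from the schema excerpt above.
REQUIRED_KEYS = [
    "dataset_id", "title", "version", "summary", "modality", "sources",
    "license", "access", "provenance", "splits", "checksums", "metrology",
    "quality", "export_manifest",
]

# Required keys of the nested objects that declare a `required` list.
NESTED_REQUIRED = {
    "provenance": ["collection_method", "time_coverage"],
    "splits": ["train", "validation", "test"],
    "metrology": ["units", "c_ref", "check_dim"],
    "export_manifest": ["version", "artifacts", "references"],
}


def missing_required_keys(card: dict) -> list:
    """Return missing keys, with nested ones written as 'parent.child'."""
    missing = [key for key in REQUIRED_KEYS if key not in card]
    for parent, children in NESTED_REQUIRED.items():
        block = card.get(parent)
        if isinstance(block, dict):
            missing.extend(f"{parent}.{c}" for c in children if c not in block)
    return missing
```

An empty list means every required key is present; it says nothing yet about whether the values are well-formed.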
VI. Name-Conflict Rules & Prohibitions (applies to all fields)
- T_fil (tension) vs. T_trans (transmission coefficient) must not be mixed.
- n (number density) vs. n_eff (effective refractive index) are strictly distinguished.
- No Chinese in formulas/symbols/definitions; any division/integral/composite operator must use parentheses and declare gamma(ell) and d ell.
VII. Chapter Compliance Checklist
- All required keys in Section II exist and pass type/regex checks; reserved names are not repurposed.
- export_manifest contains version and references[]; references[]/see[] use "Volume vX.Y:Anchor"; no shortcodes/aliases.
- Any T_arr data registers delta_form, path, measure, and passes check_dim.
- Math complies with backticks/parentheses and no-Chinese rules; T_fil/T_trans and n/n_eff never conflated.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/