Chapter 4 — Structure & Schema (Fields / Units / Dimensions)
I. Purpose & Scope
- Standardize authoring, validation, and release conventions for dataset structure and data contracts (Schema/Contract), ensuring fields, types, units, and dimensions are traceable and auditable.
- For path quantities (arrival time/phase), the text must explicitly show gamma(ell) and d ell, and the data side records delta_form ∈ {general, factored}; publication requires p_dim = 1.0 with check_dim_report.json attached.
II. Inputs & Dependencies
- Contract baseline: schema.json and contract.yaml; aligned with TARR (data spec).
- Citations & versions: use “volume + version + anchor (P/S/M/I)”, anchor coverage ≥ 90%.
- Metrology & parameters: align covariance and coverage conventions with the Error Budget Card, and version/freshness policies with the Parameter Card.
III. Field Table (minimal template — fields/units/dimensions)
| field | type | unit | dim | domain/shape | nullable | description | see |
|---|---|---|---|---|---|---|---|
| record_id | string | 1 | 1 | ULID/UUIDv4 | no | primary key | - |
| acq.ts_start/ts_end | string | 1 | 1 | ISO-8601 | no | acquisition time | - |
| instrument.id/mode | string | 1 | 1 | enum | no | instrument/mode | Metrology.* |
| path.gamma_ell | array | m | L | N≥2 | no | path parameter | Core.DataSpec:TARR |
| path.d_ell | array | m | L | N≥2 | no | path measure | ibid. |
| medium.n_eff_profile | array | 1 | 1 | N≥2 | no | effective index | S20-1 |
| ref.c_ref | number | m/s | L·T^-1 | (2.9e8,3.1e8) | no | reference limit | Terms P10-* |
| ref.lambda_ref | number | m | L | >0 | opt. | reference wavelength | S21-2 |
| obs.T_arr | number | s | T | - | opt. | arrival time | S20-1 |
| obs.Phi | number | rad | 1 | - | opt. | phase | S21-2 |
| quality.flags | array | 1 | 1 | - | yes | quality flags | - |
| quality.score_Q | number | 1 | 1 | [0,1] | no | robust quality | - |
| see/references/version | array/string | 1 | 1 | - | no | citations/version | - |
Mandatory: any expression containing division, integrals, or composite terms must be fully parenthesized; the path-quantity arrays must satisfy len(path.gamma_ell) = len(path.d_ell) = len(medium.n_eff_profile) ≥ 2.
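The array-length rule above can be checked mechanically. The following is a minimal sketch, assuming a record is already loaded as a nested Python dict shaped like the field table; the function name `check_path_lengths` is hypothetical and not part of the template.

```python
def check_path_lengths(record):
    """Return a list of violations of the equal-length, at-least-two-samples rule."""
    lengths = {
        "path.gamma_ell": len(record["path"]["gamma_ell"]),
        "path.d_ell": len(record["path"]["d_ell"]),
        "medium.n_eff_profile": len(record["medium"]["n_eff_profile"]),
    }
    problems = []
    # All three arrays must describe the same set of path samples.
    if len(set(lengths.values())) != 1:
        problems.append(f"length mismatch: {lengths}")
    # Each array needs at least two samples (N >= 2 in the field table).
    for name, n in lengths.items():
        if n < 2:
            problems.append(f"{name} has {n} sample(s); at least 2 required")
    return problems


good = {
    "path": {"gamma_ell": [0.0, 1.0], "d_ell": [1.0, 1.0]},
    "medium": {"n_eff_profile": [1.0, 1.5]},
}
bad = {
    "path": {"gamma_ell": [0.0, 1.0], "d_ell": [1.0]},
    "medium": {"n_eff_profile": [1.0, 1.5]},
}
```

Returning a list of messages rather than raising on the first failure lets a validator report every violation in one pass.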
IV. Normative Path Forms
- Arrival time (two equivalent forms):
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell );
T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
- Phase accumulation:
Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell ).
In text, explicitly show gamma(ell) and d ell; record delta_form (general|factored) on the data side.
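As a numerical illustration of the forms above, the sketch below approximates each integral by a plain sum over samples. It assumes that d_ell holds per-sample path increments in metres, that c_ref is constant along the path (which is what makes the two arrival-time forms agree), and that "factored" refers to the form with 1 / c_ref outside the integral and "general" to the form with it inside. That mapping and the function names are assumptions for illustration, not definitions from the template.

```python
import math


def arrival_time_factored(n_eff, d_ell, c_ref):
    # T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), with the integral as a discrete sum.
    return (1.0 / c_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))


def arrival_time_general(n_eff, d_ell, c_ref):
    # T_arr = ( ∫ ( n_eff / c_ref ) d ell ), dividing by c_ref inside the sum.
    return sum((n / c_ref) * dl for n, dl in zip(n_eff, d_ell))


def phase(n_eff, d_ell, lambda_ref):
    # Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell ).
    return (2.0 * math.pi / lambda_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))


n_eff = [1.0, 1.5]   # dimensionless effective index per sample
d_ell = [1.0, 2.0]   # path increments in metres (assumed meaning of d_ell)
c_ref = 3.0e8        # m/s, inside the (2.9e8, 3.1e8) domain of the field table

t_factored = arrival_time_factored(n_eff, d_ell, c_ref)
t_general = arrival_time_general(n_eff, d_ell, c_ref)
```

The two results differ only by floating-point rounding, so a comparison should use a tolerance such as `math.isclose` rather than exact equality.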
V. Schema Rules (missing/enum/consistency)
- Missingness: represent a missing numeric value as null or omit the field; the text strings NaN/Inf are forbidden; record the cause in quality.flags.
- Enums: declare instrument.mode, clock_state, etc.; encode mutex/dependency in constraints.
- Consistency: schemas (fields/units/dimensions) at both ends of edges must match or have explicit mapping; implicit unit conversion is forbidden.
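The missingness rule lends itself to a simple recursive scan. The sketch below walks a decoded record and reports every location where a missing value is spelled as NaN or Inf text. Also flagging float NaN/Inf values is an added assumption (decoders such as Python's `json` can produce them), and the name `find_bad_missing` is hypothetical.

```python
import math

# Case-insensitive spellings treated as forbidden textual markers (assumed list).
FORBIDDEN_TEXT = {"nan", "inf", "+inf", "-inf", "infinity", "-infinity"}


def find_bad_missing(value, path=""):
    """Yield the location of each value that encodes missingness as NaN/Inf."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from find_bad_missing(item, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for i, item in enumerate(value):
            yield from find_bad_missing(item, f"{path}[{i}]")
    elif isinstance(value, str):
        if value.strip().lower() in FORBIDDEN_TEXT:
            yield path
    elif isinstance(value, float):
        if math.isnan(value) or math.isinf(value):
            yield path
    # None (JSON null) is the permitted encoding and is never reported.


record = {
    "obs": {"T_arr": "NaN", "Phi": None},
    "path": {"d_ell": [1.0, float("inf")]},
}
violations = list(find_bad_missing(record))
```

Here `violations` lists `obs.T_arr` and `path.d_ell[1]`, while the null `obs.Phi` passes.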
VI. Units & Dimensions / Coverage
- SI/international symbols (m, s, rad, 1, m/s, 1/m, Pa, N, J, Hz).
- Before release run I70-dim_check, require p_dim = 1.0; attach check_dim_report.json.
- Coverage mode consistent between data and publication: coverage.mode ∈ {k, alpha, quantile}.
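To illustrate what dimensional closure means for the two path quantities, the sketch below tracks dimensions as exponent maps over L and T and confirms that T_arr reduces to T and Phi to dimensionless. It is not the I70-dim_check tool, whose internals are not described here, and reading p_dim as the fraction of checks that pass is an assumption made for this example.

```python
def dim_mul(a, b):
    """Multiply two quantities by adding their dimension exponents."""
    out = {k: a.get(k, 0) + b.get(k, 0) for k in set(a) | set(b)}
    return {k: v for k, v in out.items() if v != 0}


def dim_inv(a):
    """Invert a quantity by negating its dimension exponents."""
    return {k: -v for k, v in a.items()}


DIMENSIONLESS = {}           # dim "1" in the field table
LENGTH = {"L": 1}            # d_ell, lambda_ref
TIME = {"T": 1}              # T_arr
SPEED = {"L": 1, "T": -1}    # c_ref, L·T^-1

# ∫ n_eff d ell carries (dimensionless) x (length).
optical_path = dim_mul(DIMENSIONLESS, LENGTH)

# T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
t_arr_dim = dim_mul(dim_inv(SPEED), optical_path)

# Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell )
phi_dim = dim_mul(dim_inv(LENGTH), optical_path)

checks = [t_arr_dim == TIME, phi_dim == DIMENSIONLESS]
p_dim = sum(checks) / len(checks)   # assumed meaning: fraction of checks passing
```

Both expressions close, so `p_dim` evaluates to 1.0 under this reading.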
VII. Machine-Readable Contracts (excerpts)
A. schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Dataset v1.0.0 (structure)",
  "type": "object",
  "required": ["record_id", "acq", "path", "medium", "ref", "see", "version"],
  "properties": {
    "record_id": {"type": "string"},
    "acq": {
      "type": "object",
      "required": ["ts_start", "ts_end"],
      "properties": {
        "ts_start": {"type": "string", "format": "date-time"},
        "ts_end": {"type": "string", "format": "date-time"}
      }
    },
    "instrument": {"type": "object", "properties": {"id": {"type": "string"}, "mode": {"type": "string"}}},
    "path": {
      "type": "object",
      "required": ["gamma_ell", "d_ell"],
      "properties": {
        "gamma_ell": {"type": "array", "items": {"type": "number"}, "minItems": 2},
        "d_ell": {"type": "array", "items": {"type": "number"}, "minItems": 2}
      }
    },
    "medium": {
      "type": "object",
      "required": ["n_eff_profile"],
      "properties": {
        "n_eff_profile": {"type": "array", "items": {"type": "number"}, "minItems": 2}
      }
    },
    "ref": {"type": "object", "properties": {"c_ref": {"type": "number"}, "lambda_ref": {"type": "number"}}},
    "see": {"type": "array", "items": {"type": "string"}, "minItems": 1},
    "version": {"type": "string"}
  }
}
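To show how the keywords in this excerpt behave, the sketch below implements a deliberately small validator covering only `type`, `required`, `properties`, `items`, and `minItems`. It uses only the standard library, does not check `format`, and is no substitute for a full JSON Schema 2020-12 validator; the `validate` helper and the `PATH_SCHEMA` fragment are illustrative names.

```python
def validate(instance, schema, path="$"):
    """Return error strings for the subset of keywords used in the excerpt."""
    errors = []
    kind = schema.get("type")
    if kind == "object":
        if not isinstance(instance, dict):
            return [f"{path}: expected object"]
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}.{key}: required field missing")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    elif kind == "array":
        if not isinstance(instance, list):
            return [f"{path}: expected array"]
        if len(instance) < schema.get("minItems", 0):
            errors.append(f"{path}: fewer than {schema['minItems']} items")
        item_schema = schema.get("items")
        if item_schema:
            for i, item in enumerate(instance):
                errors.extend(validate(item, item_schema, f"{path}[{i}]"))
    elif kind == "string":
        if not isinstance(instance, str):
            errors.append(f"{path}: expected string")
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(instance, bool) or not isinstance(instance, (int, float)):
            errors.append(f"{path}: expected number")
    return errors


# The "path" fragment of the schema excerpt, copied as a Python dict.
PATH_SCHEMA = {
    "type": "object",
    "required": ["gamma_ell", "d_ell"],
    "properties": {
        "gamma_ell": {"type": "array", "items": {"type": "number"}, "minItems": 2},
        "d_ell": {"type": "array", "items": {"type": "number"}, "minItems": 2},
    },
}

errors = validate({"gamma_ell": [0.0]}, PATH_SCHEMA)
```

For the sample instance, `errors` holds two entries: `d_ell` is missing and `gamma_ell` has fewer than two items.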
B. contract.yaml
version: "1.0.0"
units:
  T_arr: "s"
  Phi: "rad"
  c_ref: "m/s"
  lambda_ref: "m"
path:
  required: true
  gamma: "gamma(ell)"
  measure: "d ell"
  delta_form: "general"  # or "factored"
constraints:
  enum:
    clock_state: ["locked", "holdover", "free"]
  mutex:
    - of: ["locked", "free"]
      rule: "not_both"
missing:
  numeric: "null"
  reason_to: "quality.flags"
coverage:
  mode: "k"  # k | alpha | quantile
  k: 2
VIII. Gates & Validation
- G1 | Schema completeness: fields/types/index/window aligned; contract/data aligned.
- G3 | Path conventions: gamma/measure/delta_form present; len(path)≥2; Δell ≤ ( c_ref / f_s ) / max(n_eff).
- G4 | Dimensional closure: pass I70-dim_check, p_dim = 1.0.
- G6 | Coverage: coverage.mode ∈ {k, alpha, quantile} aligned with publication.
- Stops (S1–S5): dimensional failure/path missing/citation non-compliance, etc., must reject and block release; tag [Restricted] when needed.
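The step bound in G3, Δell ≤ ( c_ref / f_s ) / max(n_eff), can be checked per sample. The sketch below assumes that f_s is a sampling frequency in Hz (the symbol is not defined in this chapter) and that Δell corresponds to each entry of path.d_ell; both are assumptions, and the function names are hypothetical.

```python
def max_step_allowed(c_ref, f_s, n_eff):
    # Upper bound on a path step: ( c_ref / f_s ) / max(n_eff).
    return (c_ref / f_s) / max(n_eff)


def oversized_steps(d_ell, c_ref, f_s, n_eff):
    """Return the indices of path steps that exceed the G3 bound."""
    limit = max_step_allowed(c_ref, f_s, n_eff)
    return [i for i, step in enumerate(d_ell) if step > limit]


c_ref = 3.0e8         # m/s
f_s = 1.0e9           # Hz, assumed meaning of f_s
n_eff = [1.0, 1.5]    # dimensionless effective index per sample
d_ell = [0.1, 0.25]   # metres

limit = max_step_allowed(c_ref, f_s, n_eff)
failing = oversized_steps(d_ell, c_ref, f_s, n_eff)
```

With these illustrative values the bound is about 0.2 m, so only the second step fails the gate.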
IX. Anti-Patterns & Fixes
- Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
- Anti: declaring gamma(ell) without d ell/delta_form → Fix: complete and align with n_eff.
- Anti: unit % as text → Fix: unit 1, note “percent” in comments.
- Anti: ingest schema differs from contract → Fix: sync schema.json/contract.yaml and backfill data.
X. Cross-References
- Source & lineage: Ch. 3; Splits & versioning: Ch. 6; Gates & integrity: Ch. 7; Uncertainty & covariance: Ch. 8.
- Pipeline Card: inbound contract (Ch. 4), stage control (Ch. 6).
- Error Budget Card: covariance & propagation (Ch. 5/6), intervals & conventions (Ch. 8).
XI. Checklist
- schema.json/contract.yaml complete & consistent; field table units & dimensions present.
- For path quantities, explicit gamma(ell)/d ell with delta_form recorded; len(path) ≥ 2, Δell compliant.
- Unified arrival/phase forms used; I70-dim_check passed, p_dim = 1.0.
- coverage.mode consistent with publication; see[]/references[]/version compliant with anchor coverage ≥ 90%.
- /validate passes G1/G3/G4/G6; anti-patterns fixed or appropriately tagged [Restricted].
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/