06-EFT.WP.Core.DataSpec v1.0
Chapter 2 — Schemas and the Field Dictionary
I. Type System and Fundamental Constraints
- Scalar types: int32, int64, float32, float64, decimal(p,s), bool, string, bytes.
- Time and identifiers: timestamp(UTC) (abbreviated ts), date, time, rid, uid, sid, tid, pid.
- Structured types: array<T>, vector<T,k>, matrix<T,m,n>, struct{...}, geom{Point|LineString} (must declare CRS).
- Physical quantities and metrology binding: every physical field binds via unit(field_i) and dim(field_i); unit="1" denotes dimensionless.
- Constraint primitives: nullable(field_i) ∈ {True, False}, default(field_i), range[min,max], regex, enum{...}, unique, foreign_key.
- Ordering and windows: time series must be ORDER BY ts non-decreasing; windows are [t0, t1) with Delta_t = t1 - t0.
- Path consistency: path data are parameterized by gamma(ell) with non-decreasing ell; L_gamma = ( ∫_gamma 1 d ell ) is used for interval checks.
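The ordering, window, and path-length rules above can be sketched as three small checks. This is a minimal illustration, not part of the specification: the function names is_non_decreasing, in_window, and path_length are hypothetical, and L_gamma is approximated here as the sum of increments of sampled ell values.

```python
from typing import Sequence


def is_non_decreasing(values: Sequence[float]) -> bool:
    """True when every element is >= its predecessor (ties are allowed)."""
    return all(a <= b for a, b in zip(values, values[1:]))


def in_window(t: float, t0: float, t1: float) -> bool:
    """Membership test for the half-open window [t0, t1)."""
    return t0 <= t < t1


def path_length(ell: Sequence[float]) -> float:
    """L_gamma as the sum of increments of a non-decreasing path coordinate."""
    if not is_non_decreasing(ell):
        raise ValueError("ell must be non-decreasing")
    return sum(b - a for a, b in zip(ell, ell[1:]))


# Hypothetical sample data: a repeated timestamp is allowed (non-decreasing).
ts = [0.0, 1.0, 1.0, 2.5]
ell = [0.0, 10.0, 25.0, 40.0]
ordered = is_non_decreasing(ts)
l_gamma = path_length(ell)
```

Because the window is half-open, in_window(1.0, 0.0, 1.0) is False: the right endpoint belongs to the next window, so adjacent windows never double-count a sample.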
II. Field-Entry Structure and Registration Workflow
- Minimal field-entry set:
name, desc, type, unit, dim, nullable, default, aliases, roles, see.
- Entry naming rules: use snake_case, ASCII, lowercase; do not place units or dimensions in name; declare them only in unit and dim.
- Role annotation: roles ⊆ {key,time,geom,measure,quality,mask,meta,index}.
- Registration workflow (I60):
- Create entries with register_field(name, type, unit, dim, desc, aliases).
- Assemble schema via register_schema(name, version, fields, constraints, units, pk, idx, see).
- Validate persisted data with validate_dataset(schema, ds, strict=True).
- Publish and freeze with export_schema(SRef,"yaml").
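A minimal in-memory sketch of the first three workflow steps follows. It is an assumption-laden illustration, not the I60 interface itself: the signatures are reduced (register_schema here omits constraints, units, idx, and see), the nullable and roles keyword arguments on register_field are added for the example, and the only check validate_dataset performs is nullability.

```python
import re
from dataclasses import dataclass

NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")  # snake_case, ASCII, lowercase
ALLOWED_ROLES = {"key", "time", "geom", "measure", "quality", "mask", "meta", "index"}


@dataclass(frozen=True)
class FieldEntry:
    name: str
    type: str
    unit: str
    dim: str
    desc: str = ""
    aliases: tuple = ()
    nullable: bool = True
    roles: frozenset = frozenset()


_FIELDS = {}  # hypothetical process-local registry


def register_field(name, type, unit, dim, desc="", aliases=(), nullable=True, roles=()):
    """Create and store one field entry after checking name and roles."""
    if not NAME_RE.match(name):
        raise ValueError(f"invalid field name: {name!r}")
    bad_roles = set(roles) - ALLOWED_ROLES
    if bad_roles:
        raise ValueError(f"unknown roles: {sorted(bad_roles)}")
    entry = FieldEntry(name, type, unit, dim, desc, tuple(aliases), nullable, frozenset(roles))
    _FIELDS[name] = entry
    return entry


def register_schema(name, version, fields, pk=()):
    """Assemble a schema from already-registered field names."""
    unknown = [f for f in fields if f not in _FIELDS]
    if unknown:
        raise KeyError(f"unregistered fields: {unknown}")
    if any(k not in fields for k in pk):
        raise ValueError("primary key must be a subset of fields")
    return {"name": name, "version": version,
            "fields": {f: _FIELDS[f] for f in fields}, "pk": tuple(pk)}


def validate_dataset(schema, ds, strict=True):
    """Report rows where a non-nullable field is None; raise when strict."""
    errors = []
    for i, row in enumerate(ds):
        for fname, entry in schema["fields"].items():
            if row.get(fname) is None and not entry.nullable:
                errors.append(f"row {i}: {fname} must not be null")
    if strict and errors:
        raise ValueError("; ".join(errors))
    return errors


register_field("ts", "timestamp(UTC)", "s", "[T]", "event time", nullable=False, roles={"time"})
register_field("n_eff", "float64", "1", "1", "effective refractive index", roles={"measure"})
schema = register_schema("arrival_demo", "1.0.0", ["ts", "n_eff"], pk=["ts"])
problems = validate_dataset(
    schema, [{"ts": 0.0, "n_eff": 1.0}, {"ts": None, "n_eff": None}], strict=False
)
```

With strict=False the sketch collects problems instead of raising, which can be convenient when auditing a dataset before publication; strict=True mirrors the call shown in the workflow.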
III. Unit and Dimension Binding
- Binding requirements:
- Any field that participates in equations must specify unit(•) and dim(•) and pass check_dim( y - f(x; theta) ).
- Only affine unit conversions are allowed: v_to = a * v_from + b, with b ≠ 0 only for unit pairs whose zero points are offset (e.g., degC ↔ K).
- Under the convention t0 = L0 / c_ref, the integrand ( n_eff / c_ref ) * d ell can be written as t0 * bar_n_eff * d bar_ell, where bar_n_eff = n_eff and d bar_ell = d ell / L0; the product bar_n_eff * d bar_ell is dimensionless and t0 carries the dimension [T].
- Arrival-time dual binding:
- Constant-factored: T_arr = ( 1 / c_ref ) * ( ∫_gamma n_eff d ell ).
- General form: T_arr = ( ∫_gamma ( n_eff / c_ref ) d ell ).
- Difference metric:
delta_form = | ( 1 / c_ref ) * ( ∫_gamma n_eff d ell ) - ( ∫_gamma ( n_eff / c_ref ) d ell ) |.
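The two arrival-time forms and the difference metric can be illustrated numerically. The sketch below assumes sampled n_eff along ell, a trapezoidal quadrature (the document does not prescribe a quadrature rule), and a c_ref that is constant along the path; the sample values are hypothetical.

```python
from typing import Sequence


def trapezoid(y: Sequence[float], x: Sequence[float]) -> float:
    """Trapezoidal approximation of the integral of y over x."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for y0, y1, x0, x1 in zip(y, y[1:], x, x[1:]))


def t_arr_constant_factored(n_eff, ell, c_ref):
    """T_arr = (1 / c_ref) * integral of n_eff d ell."""
    return trapezoid(n_eff, ell) / c_ref


def t_arr_general(n_eff, ell, c_ref):
    """T_arr = integral of (n_eff / c_ref) d ell."""
    return trapezoid([n / c_ref for n in n_eff], ell)


def delta_form(n_eff, ell, c_ref):
    """Absolute difference between the two formulations, in seconds."""
    return abs(t_arr_constant_factored(n_eff, ell, c_ref)
               - t_arr_general(n_eff, ell, c_ref))


# Hypothetical path: 300 m long, n_eff rising from 1.0 to 1.5 and back.
ell = [0.0, 100.0, 200.0, 300.0]       # m
n_eff = [1.0, 1.5, 1.5, 1.0]           # dimensionless
c_ref = 299_792_458.0                  # m s^-1
t_arr = t_arr_constant_factored(n_eff, ell, c_ref)
d_form = delta_form(n_eff, ell, c_ref)
```

With a constant c_ref the two forms agree up to floating-point rounding, so delta_form acts as a numerical sanity check; a value well above rounding error would suggest that c_ref was not actually constant along the path or that the two computations used different inputs.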
IV. Nullability, Defaults, and Missingness
- Missingness: denote with m ∈ {0,1} where m=0 means missing; never substitute with dummy values.
- Defaults: only for fields that are semantically derivable or do not affect dimensional closure; defaults must honor unit and dim.
- Imputation records: when imputing or applying environmental corrections, write derived fields as corr_env(x; RefCond) and record RefCond = { p_ref, Temp_ref, humidity_ref } and u(•) in the manifest.
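The missingness convention (m=0 means missing, and missing values are never replaced by dummy numbers) can be sketched as follows. The helper names apply_mask and mask_is_consistent are hypothetical, and None stands in for NULL.

```python
from typing import List, Optional, Sequence


def apply_mask(values: Sequence[float], m: Sequence[int]) -> List[Optional[float]]:
    """Return values with None wherever m == 0, discarding any dummy placeholder."""
    if len(values) != len(m):
        raise ValueError("values and mask must have the same length")
    out = []
    for v, flag in zip(values, m):
        if flag not in (0, 1):
            raise ValueError(f"mask entries must be 0 or 1, got {flag!r}")
        out.append(v if flag == 1 else None)
    return out


def mask_is_consistent(values: Sequence[Optional[float]], m: Sequence[int]) -> bool:
    """Check that m is in {0,1} and that m == 0 implies the value is None."""
    return all(flag in (0, 1) and (flag == 1 or v is None)
               for v, flag in zip(values, m))


raw = [1.2, -9999.0, 3.4]   # -9999.0 is a dummy placeholder from an upstream source
m = [1, 0, 1]
clean = apply_mask(raw, m)
```

The raw column fails the consistency check because a dummy value sits where the mask says the sample is missing; the cleaned column passes.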
V. Standard Field Dictionary (45 Examples)
- FD.core.uid — type=string, unit="1", dim="1", nullable=False, roles={key}, desc=universal id.
- FD.core.rid — type=string, unit="1", dim="1", nullable=False, roles={key,meta}, desc=record id.
- FD.core.sid — type=string, unit="1", dim="1", nullable=True, roles={key}, desc=site id.
- FD.core.tid — type=string, unit="1", dim="1", nullable=True, roles={key}, desc=trajectory id.
- FD.path.pid — type=string, unit="1", dim="1", nullable=False, roles={key}, desc=path id (= gamma id).
- FD.time.ts — type=timestamp(UTC), unit="s", dim="[T]", nullable=False, roles={time}, desc=event time.
- FD.time.ts_start — type=timestamp(UTC), unit="s", dim="[T]", nullable=True, roles={time}, desc=window start.
- FD.time.ts_end — type=timestamp(UTC), unit="s", dim="[T]", nullable=True, roles={time}, desc=window end.
- FD.time.delta_t — type=float64, unit="s", dim="[T]", nullable=True, roles={measure}, desc=window width.
- FD.time.fs — type=float64, unit="Hz", dim="[T]^-1", nullable=True, roles={measure}, desc=sample rate.
- FD.geo.lon — type=float64, unit="deg", dim="1", nullable=True, roles={geom}, desc=WGS84 longitude.
- FD.geo.lat — type=float64, unit="deg", dim="1", nullable=True, roles={geom}, desc=WGS84 latitude.
- FD.geo.alt — type=float64, unit="m", dim="[L]", nullable=True, roles={geom}, desc=altitude AMSL.
- FD.cart.x — type=float64, unit="m", dim="[L]", nullable=True, roles={geom}, desc=Cartesian x.
- FD.cart.y — type=float64, unit="m", dim="[L]", nullable=True, roles={geom}, desc=Cartesian y.
- FD.cart.z — type=float64, unit="m", dim="[L]", nullable=True, roles={geom}, desc=Cartesian z.
- FD.geo.crs — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=coordinate reference system.
- FD.path.ell — type=float64, unit="m", dim="[L]", nullable=False, roles={key,measure}, desc=path coordinate.
- FD.path.l_gamma — type=float64, unit="m", dim="[L]", nullable=True, roles={measure}, desc=L_gamma = ( ∫_gamma 1 d ell ).
- FD.optics.n_eff — type=float64, unit="1", dim="1", nullable=True, roles={measure}, desc=effective refractive index.
- FD.optics.c_ref — type=float64, unit="m s^-1", dim="[L][T]^-1", nullable=False, roles={measure,meta}, desc=reference speed.
- FD.arrival.t_arr — type=float64, unit="s", dim="[T]", nullable=True, roles={measure}, desc=arrival time.
- FD.arrival.delta_form — type=float64, unit="s", dim="[T]", nullable=True, roles={quality}, desc=formulation difference.
- FD.env.temp_ref — type=float64, unit="K", dim="[Temp]", nullable=True, roles={meta}, desc=reference temperature.
- FD.env.p_ref — type=float64, unit="Pa", dim="[M][L]^-1[T]^-2", nullable=True, roles={meta}, desc=reference pressure.
- FD.env.humidity_ref — type=float64, unit="1", dim="1", nullable=True, roles={meta}, desc=relative humidity.
- FD.meas.u_x — type=float64, unit="same_as(x)", dim="dim(x)", nullable=True, roles={quality}, desc=standard uncertainty of x.
- FD.meas.U_x — type=float64, unit="same_as(x)", dim="dim(x)", nullable=True, roles={quality}, desc=expanded uncertainty of x.
- FD.quality.m — type=int8, unit="1", dim="1", nullable=False, roles={mask}, desc=missingness mask ∈ {0,1}.
- FD.quality.q_score — type=float32, unit="1", dim="1", nullable=True, roles={quality}, desc=quality score ∈ [0,1].
- FD.quality.drift — type=float32, unit="1", dim="1", nullable=True, roles={quality}, desc=drift indicator.
- FD.trace.source — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=origin source.
- FD.trace.method — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=process method.
- FD.trace.artifact — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=produced artifact.
- FD.trace.checksum_sha256 — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=hash_sha256(blob).
- FD.trace.signature — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=signature.
- FD.stats.r — type=float64, unit="same_as(y)", dim="dim(y)", nullable=True, roles={measure}, desc=residual r = y - f(x; theta).
- FD.stats.r_bar — type=float64, unit="1", dim="1", nullable=True, roles={quality}, desc=normalized residual r_bar = r / sigma.
- FD.stats.w — type=float64, unit="1", dim="1", nullable=True, roles={quality}, desc=weight w.
- FD.stats.chi2 — type=float64, unit="1", dim="1", nullable=True, roles={quality}, desc=chi2 = r^T R r.
- FD.stats.R2 — type=float32, unit="1", dim="1", nullable=True, roles={quality}, desc=coefficient of determination.
- FD.stats.SNR_dB — type=float32, unit="dB", dim="1", nullable=True, roles={quality}, desc=signal-to-noise ratio.
- FD.labels.tag — type=string, unit="1", dim="1", nullable=True, roles={meta}, desc=free-form tag.
- FD.release.schema_version — type=string, unit="1", dim="1", nullable=False, roles={meta}, desc=semantic version.
- FD.release.fmt — type=string, unit="1", dim="1", nullable=False, roles={meta}, desc=serialization format.
VI. Composite Types and Structured Fields
- Structured vectors:
Expand tri-axial acceleration into scalars ax, ay, az (rather than persisting vector<float64,3>); each declares unit="m s^-2", dim="[L][T]^-2".
- Matrices and Jacobians:
Flatten Jacobian entries, e.g., J_y_xi for ∂y/∂x_i; bind with unit(J_y_xi) = unit(y) / unit(x_i) and dim(J_y_xi) = dim(y) * dim(x_i)^-1.
- Geometries:
Use geom{Point} for indexing/spatial queries but persist redundant lon,lat,alt for cross-system interop; CRS is mandatory.
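The flattening and Jacobian binding rules can be sketched with a small dimension algebra. This is an illustration under assumptions: dimensions are represented as dictionaries of exponents over the bases M, L, T, Temp, and the helpers flatten_vector, dim_div, and dim_str are hypothetical names, not part of the interface.

```python
BASE_ORDER = ("M", "L", "T", "Temp")  # rendering order for dimension strings


def flatten_vector(prefix, values, axes="xyz"):
    """Expand a k-vector into named scalars, e.g. a -> ax, ay, az."""
    if len(values) != len(axes):
        raise ValueError("one axis label is needed per component")
    return {f"{prefix}{axis}": v for axis, v in zip(axes, values)}


def dim_div(num, den):
    """dim(num) * dim(den)^-1 on exponent dictionaries; zero exponents are dropped."""
    out = dict(num)
    for base, exp in den.items():
        out[base] = out.get(base, 0) - exp
    return {b: e for b, e in out.items() if e != 0}


def dim_str(d):
    """Render an exponent dictionary in the document's notation, '1' if dimensionless."""
    parts = [f"[{b}]" if d[b] == 1 else f"[{b}]^{d[b]}" for b in BASE_ORDER if b in d]
    return "".join(parts) or "1"


accel = flatten_vector("a", (0.1, 0.0, 9.81))      # each component in m s^-2
dim_accel = dim_str({"L": 1, "T": -2})

# Jacobian of an arrival time y (dim [T]) with respect to a path coordinate x_i (dim [L]).
dim_jacobian = dim_str(dim_div({"T": 1}, {"L": 1}))
```

A Jacobian of a quantity with respect to another of the same dimension reduces to "1", matching the unit="1" convention for dimensionless fields.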
VII. Contracts and Validation Mapping
Typical contracts (map to assert_contract):
- Uniqueness: unique(pk), unique(pid, ell), unique(uid, ts).
- Monotonicity: non_decreasing(ts), non_decreasing(ell).
- Valid ranges: range(lon, -180, 180), range(lat, -90, 90), range(q_score, 0, 1).
- Dimensional closure: check_dim( y - f(x; theta) ).
- Arrival-time consistency: delta_form ≤ tol_Tarr.
- Missingness consistency: m ∈ {0,1} and m=0 ⇒ value is NULL.
- Explicit RefCond: fields involving corr_env(•; RefCond) require RefCond to be present.
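Several of these contracts can be expressed as plain predicates over rows. The sketch below is a hypothetical illustration: rows are dictionaries, None stands for NULL, and failed_contracts is an invented helper; it does not reproduce the signature of assert_contract, which this chapter does not specify.

```python
def unique(rows, keys):
    """True when no two rows share the same tuple of key values."""
    seen = set()
    for row in rows:
        k = tuple(row[name] for name in keys)
        if k in seen:
            return False
        seen.add(k)
    return True


def non_decreasing(rows, key):
    """True when the column never decreases from one row to the next."""
    col = [row[key] for row in rows]
    return all(a <= b for a, b in zip(col, col[1:]))


def in_range(rows, key, lo, hi):
    """True when every non-null value lies in [lo, hi]."""
    return all(lo <= row[key] <= hi for row in rows if row[key] is not None)


def failed_contracts(contracts):
    """Names of the contracts whose result is False, in declaration order."""
    return [name for name, ok in contracts.items() if not ok]


rows = [
    {"pid": "p1", "ell": 0.0, "lon": 10.0, "lat": 45.0, "q_score": 0.9},
    {"pid": "p1", "ell": 5.0, "lon": 10.1, "lat": 45.0, "q_score": None},
    {"pid": "p1", "ell": 5.0, "lon": 190.0, "lat": 45.1, "q_score": 0.7},
]
failed = failed_contracts({
    "unique(pid, ell)": unique(rows, ("pid", "ell")),
    "non_decreasing(ell)": non_decreasing(rows, "ell"),
    "range(lon, -180, 180)": in_range(rows, "lon", -180, 180),
    "range(lat, -90, 90)": in_range(rows, "lat", -90, 90),
    "range(q_score, 0, 1)": in_range(rows, "q_score", 0, 1),
})
```

In this sample the duplicate (pid, ell) pair and the out-of-range longitude are reported, while the repeated ell value still satisfies the non-decreasing rule.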
VIII. Indexing and Ordering
- Required indexes:
- Time series: idx(ts) and composite idx(uid, ts).
- Path series: idx(pid, ell); optional spatial index idx(lon, lat).
- Order preservation: persisted/exported order follows the primary key or (pid, ell) / (uid, ts) to guarantee idempotent replay.
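Order preservation can be illustrated by sorting on the composite key before persisting or exporting. This is a minimal sketch with a hypothetical helper name, canonical_order; it assumes the key columns are already unique, so the resulting order does not depend on the input order.

```python
def canonical_order(rows, keys=("pid", "ell")):
    """Return rows sorted by the composite key, leaving the input untouched."""
    return sorted(rows, key=lambda row: tuple(row[k] for k in keys))


rows = [
    {"pid": "p2", "ell": 0.0},
    {"pid": "p1", "ell": 5.0},
    {"pid": "p1", "ell": 0.0},
]
ordered = canonical_order(rows)
replayed = canonical_order(list(reversed(ordered)))
```

Replaying the export in any input order yields the same sequence, which is the property the idempotent-replay guarantee relies on.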
IX. Binding Essentials for Arrival-Time Datasets
- Required fields: pid, ell, n_eff (if available), c_ref, ts (if time-varying) plus geometries and CRS.
- Compute and verify:
- Compute T_arr under the chosen form;
- Compute delta_form and compare to threshold;
- Validate T_arr via check_dim against "[T]";
- Write RefCond and the path measure d ell into the manifest.
X. Schema Evolution and Compatibility
- Semantic versioning: major=breaking, minor=backward compatible, patch=docs/metadata fixes.
- Evolution rules:
- Field rename: keep the old name in aliases for at least one release cycle; supply a map via diff_datasets.
- New fields: must be declared nullable=True or supply a default.
- Field removal: increment the major version (major+1) and provide a compatibility layer in import_manifest.
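The evolution rules can be sketched as a classifier that compares two field maps. Everything here is illustrative: resolve and classify_change are hypothetical helpers, each field is a dictionary with optional aliases, nullable, and default keys, and treating an aliased rename as a minor (non-breaking) change is an assumption, since the rules above do not state which version component a rename bumps.

```python
def resolve(name, fields):
    """Map a current or aliased name to its canonical name, or None if absent."""
    if name in fields:
        return name
    for canonical, spec in fields.items():
        if name in spec.get("aliases", ()):
            return canonical
    return None


def classify_change(old, new):
    """Return 'major', 'minor', or 'patch' for a transition from old to new fields."""
    # A field that cannot be found, even through aliases, was removed: breaking.
    if any(resolve(name, new) is None for name in old):
        return "major"
    carried = {resolve(name, new) for name in old}
    added = [name for name in new if name not in carried]
    for name in added:
        spec = new[name]
        if not spec.get("nullable", False) and spec.get("default") is None:
            raise ValueError(f"new field {name!r} must be nullable or have a default")
    renamed = [name for name in old if name not in new]
    return "minor" if (added or renamed) else "patch"


v1 = {"t_arrival": {"nullable": True}}
v2_renamed = {"t_arr": {"nullable": True, "aliases": ["t_arrival"]}}
v2_added = {"t_arrival": {"nullable": True}, "q_score": {"nullable": True}}
v2_removed = {"q_score": {"nullable": True}}

bump_renamed = classify_change(v1, v2_renamed)
bump_added = classify_change(v1, v2_added)
bump_removed = classify_change(v1, v2_removed)
```

Keeping the old name in aliases is what lets the rename resolve; dropping the alias in a later release would make the same comparison report a major change.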
XI. Field-Entry Template (Text)
name=<snake_case>; type=<dtype>; unit=<SI|1|compound>; dim=<dimstr>; nullable=<True|False>; default=<value|None>; roles={<...>}; aliases=[...]; desc=<one-line>; see=[<Sxx-?>,<I60-?>,<DS.*>]
XII. Unit/Dimension Binding Examples (20 cases)
- ts -> unit="s" | dim="[T]"
- ell -> unit="m" | dim="[L]"
- lon -> unit="deg" | dim="1"
- lat -> unit="deg" | dim="1"
- alt -> unit="m" | dim="[L]"
- fs -> unit="Hz" | dim="[T]^-1"
- delta_t -> unit="s" | dim="[T]"
- n_eff -> unit="1" | dim="1"
- c_ref -> unit="m s^-1" | dim="[L][T]^-1"
- t_arr -> unit="s" | dim="[T]"
- l_gamma -> unit="m" | dim="[L]"
- q_score -> unit="1" | dim="1"
- m -> unit="1" | dim="1"
- temp_ref -> unit="K" | dim="[Temp]"
- p_ref -> unit="Pa" | dim="[M][L]^-1[T]^-2"
- r -> unit="same_as(y)" | dim="dim(y)"
- r_bar -> unit="1" | dim="1"
- chi2 -> unit="1" | dim="1"
- SNR_dB -> unit="dB" | dim="1"
- checksum_sha256 -> unit="1" | dim="1"
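For bound fields such as temp_ref, the affine conversion rule from §III (v_to = a * v_from + b) can be sketched with a small lookup table. The table contents and the convert helper are hypothetical; only the degC ↔ K pair carries a nonzero offset b. A logarithmic unit such as dB is deliberately absent, because converting it to a linear ratio is not an affine map.

```python
# (source unit, target unit) -> (a, b) such that v_to = a * v_from + b
AFFINE = {
    ("degC", "K"): (1.0, 273.15),
    ("K", "degC"): (1.0, -273.15),
    ("km", "m"): (1000.0, 0.0),
    ("m", "km"): (0.001, 0.0),
}


def convert(value, src, dst):
    """Apply the registered affine conversion; identical units pass through."""
    if src == dst:
        return value
    try:
        a, b = AFFINE[(src, dst)]
    except KeyError:
        raise ValueError(f"no affine conversion registered for {src} -> {dst}") from None
    return a * value + b


temp_ref_k = convert(25.0, "degC", "K")
ell_m = convert(1.5, "km", "m")
```

Rejecting unknown pairs outright, rather than guessing a factor, keeps a unit mismatch from silently passing through to the dimensional checks.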
XIII. Interface Crosswalk (I60-*)
- register_field maps 1:1 to this chapter’s entry template;
- register_schema composes fields, pk, idx, constraints and freezes units;
- validate_dataset implements the contract set in §VII;
- bind_to_equations traces bindings via see=[Sxx-?, Pxx-?];
- enforce_arrival_time_convention generates and validates delta_form;
- export_schema, export_manifest produce auditable release artifacts.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/