06-EFT.WP.Core.DataSpec v1.0
Appendix A — Schema Registry Model
I. Purpose and Scope
- Establish the object model, field catalog, and required constraints for the Schema Registry so that register_schema and export_schema behave consistently across volumes.
- Provide a complete registration example for DS.TARR.PathIntegral v1, covering units/dimensions, keys and indexes, contracts, provenance, and governance.
II. Object Model Overview
- SchemaRegistryRecord (SRef)
  - Semantics: a publishable schema registration record.
  - Relations: an SRef references a set of FieldSpec, ConstraintSpec, and IndexSpec entries, plus one GovernanceSpec, PrivacySpec, ProvenanceSpec, and QualityGateSpec.
  - Cross-volume handle: SRef.id is the cross-volume reference key; register_schema(...) -> SRef.
III. SRef Top-Level Fields (Required Items and Constraints)
- name : str (required)
  Pattern: ^DS\.[A-Z0-9]+(\.[A-Za-z0-9]+)+$, e.g., DS.TARR.PathIntegral.
- version : str (required)
  Semantic version MAJOR.MINOR[.PATCH], e.g., 1.0.
- title : str (required)
  Human-readable title, e.g., Arrival-time along path integrals.
- description : str (required)
  Purpose, sources, and outputs of the dataset.
- fields : list[FieldSpec] (required, len ≥ 1)
  Field names must be unique; name pattern ^[A-Za-z][A-Za-z0-9_]*$ (mixed case is allowed so that names such as T_arr_const and CRS are valid).
- pk : list[str] (required, len ≥ 1)
  Constraint: pk ⊆ { f.name : f ∈ fields }; must be provably unique.
- idx : list[IndexSpec] (optional)
  Secondary index set.
- constraints : list[ConstraintSpec] (required, len ≥ 1)
  Must include primary key uniqueness and core physical constraints (e.g., monotonicity, dimensional closure).
- units : dict[str, str] (optional)
  units[field] = unit(field), e.g., "c_ref_value":"m/s".
- dims : dict[str, str] (optional)
  dims[field] = dim(field), e.g., "T_arr_const":"T".
- equations : list[str] (optional)
  References to minimal equations or postulates, e.g., ["S610-1","S610-2"].
- parameters : list[str] (optional)
  Parameter bindings, e.g., ["c_ref_ref","n_eff_model_ref"].
- governance : GovernanceSpec (required)
  Data ownership, retention, SLA, and release policy.
- privacy : PrivacySpec (required)
  Field classification, de-identification/masking strategies, and exceptions.
- provenance : ProvenanceSpec (required)
  Trace = [source -> method -> artifact], fingerprints, and signature controls.
- quality_gates : QualityGateSpec (required)
  Release thresholds, such as q_score_min, delta_form_max.
- manifests : list[ManifestHook] (optional)
  Export hooks and manifest templates.
- see : list[str] (optional)
  Cross-volume references, e.g., ["Core.Equations §S610","Core.Parameters §P3x"].
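The required-item rules above can be checked mechanically before registration. The following is a minimal sketch, not the registry's actual validator: the function name check_sref_header and the idea of returning a list of problem strings are assumptions made for illustration, and the record is assumed to be a plain dict with the top-level keys listed in this section.

```python
import re

# Patterns copied from the field catalog above.
NAME_RE = re.compile(r"^DS\.[A-Z0-9]+(\.[A-Za-z0-9]+)+$")
VERSION_RE = re.compile(r"^\d+\.\d+(\.\d+)?$")  # MAJOR.MINOR[.PATCH]


def check_sref_header(sref: dict) -> list[str]:
    """Hypothetical helper: return readable problems; an empty list means all checks passed."""
    problems = []
    if not NAME_RE.match(sref.get("name", "")):
        problems.append("name does not match the DS.* pattern")
    if not VERSION_RE.match(sref.get("version", "")):
        problems.append("version is not MAJOR.MINOR[.PATCH]")

    field_names = [f["name"] for f in sref.get("fields", [])]
    if not field_names:
        problems.append("fields must contain at least one FieldSpec")
    if len(field_names) != len(set(field_names)):
        problems.append("field names must be unique")

    pk = sref.get("pk", [])
    if not pk:
        problems.append("pk must contain at least one field")
    missing = [k for k in pk if k not in field_names]
    if missing:
        problems.append(f"pk references unknown fields: {missing}")

    if not sref.get("constraints"):
        problems.append("constraints must contain at least one ConstraintSpec")
    return problems
```

Proving that pk is actually unique requires the data itself, so this sketch only covers the structural part of the pk rule.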
IV. FieldSpec (Field Dictionary)
- name : str (required)
- type : str (required)
  Allowed: {"int32","int64","float32","float64","decimal(p,s)","bool","string","bytes","timestamp(UTC)","date","struct","list<T>","map<K,V>","categorical","geometry"}.
- unit : str|None (optional)
  SI or derived unit text, e.g., "m", "s", "m/s".
- dim : str|None (optional)
  Dimension string, e.g., "L", "T", "L T^-1", "1".
- nullable : bool (required)
- default : any|None (optional)
- pii_level : str (required)
  Allowed: {"none","low","moderate","high"}.
- desc : str (required)
- aliases : list[str]|None (optional)
- enum : list[any]|None (optional)
- tags : list[str]|None (optional)
- quality_weight : float|None (optional, in [0,1])
- Constraints:
  - If unit is present, then dim must be present and consistent with check_dim(expr).
  - timestamp(UTC) fields must explicitly be UTC.
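As an illustration of the field dictionary, the sketch below models a subset of FieldSpec as a Python dataclass and checks three of the rules stated above (allowed pii_level values, unit implies dim, quality_weight in [0,1]). The class layout and the problems() method are assumptions for this example; the consistency check against check_dim(expr) is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Optional

PII_LEVELS = {"none", "low", "moderate", "high"}


@dataclass
class FieldSpec:
    # Required items
    name: str
    type: str
    nullable: bool
    pii_level: str
    desc: str
    # Optional items (only a subset is modelled in this sketch)
    unit: Optional[str] = None
    dim: Optional[str] = None
    default: Any = None
    quality_weight: Optional[float] = None

    def problems(self) -> list[str]:
        """Hypothetical helper: list rule violations for this field."""
        out = []
        if self.pii_level not in PII_LEVELS:
            out.append(f"{self.name}: pii_level {self.pii_level!r} not allowed")
        if self.unit is not None and self.dim is None:
            out.append(f"{self.name}: unit given without dim")
        if self.quality_weight is not None and not 0.0 <= self.quality_weight <= 1.0:
            out.append(f"{self.name}: quality_weight outside [0, 1]")
        return out
```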
V. IndexSpec (Secondary Indexes)
- keys : list[str] (required, len ≥ 1)
- kind : str (required)
  Allowed: {"btree","hash","geo","inverted","composite"}.
- unique : bool (required)
- desc : str (optional)
VI. ConstraintSpec (Contract Templates)
- kind : str (required)
  Allowed: {"unique","not_null","range","regex","enum_set","cross_field","referential","monotonic","dim_check","arrivaltime_dualform","custom"}.
- expr : str (required)
  Examples: "ell_end >= ell_start", "delta_form <= tol_Tarr", "check_dim(T_arr_const)=='T'".
- params : dict (optional)
  Example: {"tol_Tarr":"1e-9 s","fields":["T_arr_const","T_arr_integrand"]}.
- severity : str (required)
  Allowed: {"ERROR","WARN","INFO"}.
- message : str (required)
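To show how expr and params might work together, here is a small sketch that evaluates the two comparison examples above against a single row. It is not the contract engine: eval_comparison and parse_quantity are hypothetical names, only "lhs OP rhs" with field or parameter names on each side is supported, and the unit in a parameter such as "1e-9 s" is assumed to match the unit of the field it is compared with.

```python
import operator

# Only the comparison operators used in the examples are supported.
OPS = {"<=": operator.le, ">=": operator.ge}


def parse_quantity(text):
    """Split '1e-9 s' into (1e-09, 's'); a bare number gets an empty unit."""
    parts = str(text).split()
    return float(parts[0]), (parts[1] if len(parts) > 1 else "")


def eval_comparison(expr, row, params=None):
    """Evaluate 'lhs OP rhs' where each side names a row field or a params entry."""
    params = params or {}
    for symbol, fn in OPS.items():
        if symbol in expr:
            lhs, rhs = (part.strip() for part in expr.split(symbol, 1))
            break
    else:
        raise ValueError(f"unsupported expression: {expr!r}")

    def resolve(name):
        if name in row:
            return row[name]
        if name in params:
            # Assumption: the parameter's unit equals the field's unit.
            return parse_quantity(params[name])[0]
        raise KeyError(name)

    return fn(resolve(lhs), resolve(rhs))
```

A failed evaluation would then be reported with the constraint's severity and message; how ERROR, WARN, and INFO are acted on is left to the caller.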
VII. GovernanceSpec (Governance and Release)
- owner : str (required)
- steward : str (required)
- retention_days : int (required)
- sla : dict (required)
  Example: {"freshness_max":"P1D","availability_target":"99.9%"}.
- release : dict (required)
  Example: {"freeze_policy":"immutable","signing_key":"key://k1"}.
VIII. PrivacySpec (Classification and Policies)
- classification : dict[str,str] (required)
  classification[field] = pii_level.
- anonymization : dict (optional)
  Example: {"gamma_path":"geohash_r6","ts":"bucket_P1M"}.
- masking : dict[str,str] (optional)
  Example: {"uid":"hash","sid":"salted_hash"}.
- exceptions : list[str] (optional)
  Fields exempted for legal or research purposes (with justification).
IX. ProvenanceSpec (Trace and Fingerprints)
- trace : list[str] (required)
  Example: ["sensor.S1","method.integrate_path","artifact.T_arr_v1.parquet"].
- checksum : dict (required)
  Example: {"algo":"sha256","field":"hash_sha256"}.
- signature : dict (required)
  Example: {"keyref":"key://k1","field":"signature"}.
X. QualityGateSpec (Quality Gates)
- q_score_min : float (required, in [0,1])
- delta_form_max : str (required, time text, e.g., "1e-9 s")
- completeness_min : float (required)
- drift_method : str (required, e.g., "KL")
- drift_max : float (required)
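A release gate of this shape can be evaluated with a few comparisons. The sketch below assumes the measured values arrive in a dict with the hypothetical keys q_score, delta_form, completeness, and drift, that delta_form is already in seconds, and that drift was computed with the declared drift_method; none of these names come from the specification.

```python
def passes_quality_gates(metrics: dict, gates: dict) -> dict:
    """Hypothetical helper: compare measured metrics with a QualityGateSpec-like dict.

    Returns one boolean per gate so a caller can report which gate failed.
    """
    # "1e-9 s" -> 1e-9; the unit is assumed to be seconds.
    delta_max = float(gates["delta_form_max"].split()[0])
    return {
        "q_score_min": metrics["q_score"] >= gates["q_score_min"],
        "delta_form_max": metrics["delta_form"] <= delta_max,
        "completeness_min": metrics["completeness"] >= gates["completeness_min"],
        "drift_max": metrics["drift"] <= gates["drift_max"],
    }
```

A release would proceed only if every value in the returned dict is True.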
XI. Units and Dimensional Mapping Rules
- If equations involve T_arr, both formulations must be declared and validated:
  - T_arr_const = ( 1 / c_ref_value ) * ( ∫_gamma n_eff d ell ).
  - T_arr_integrand = ( ∫_gamma ( n_eff / c_ref_value ) d ell ).
- dim(n_eff) = 1, dim(c_ref_value) = L T^-1, dim( ∫_gamma • d ell ) = L, hence dim(T_arr_*) = L / (L T^-1) = T.
- delta_form = | T_arr_const - T_arr_integrand |, with unit "s".
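The dimensional closure stated above can be reproduced with simple exponent bookkeeping. This is a sketch of the idea only, not the check_dim implementation: parse_dim, mul, and inv are hypothetical helpers that treat a dimension string such as "L T^-1" as a map from base symbol to integer exponent.

```python
def parse_dim(text: str) -> dict:
    """Parse 'L T^-1' into {'L': 1, 'T': -1}; '1' means dimensionless."""
    exps = {}
    for token in text.split():
        if token == "1":
            continue
        base, _, power = token.partition("^")
        exps[base] = exps.get(base, 0) + (int(power) if power else 1)
    return {b: e for b, e in exps.items() if e != 0}


def mul(a: dict, b: dict) -> dict:
    """Multiply two dimensions by adding exponents; zero exponents are dropped."""
    out = dict(a)
    for base, e in b.items():
        out[base] = out.get(base, 0) + e
    return {b: e for b, e in out.items() if e != 0}


def inv(a: dict) -> dict:
    """Invert a dimension by negating exponents."""
    return {b: -e for b, e in a.items()}


n_eff = parse_dim("1")
c_ref_value = parse_dim("L T^-1")
d_ell = parse_dim("L")

# dim( (1 / c_ref_value) * ∫ n_eff d ell ) should reduce to T.
t_arr = mul(inv(c_ref_value), mul(n_eff, d_ell))
print(t_arr == parse_dim("T"))
```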
XII. Registration Example (YAML, Minimal and Usable)
name: DS.TARR.PathIntegral
version: "1.0"
title: Arrival-time along path integrals
description: Arrival time T_arr computed along gamma(ell) with dual-form check.
fields:
- { name: pid, type: string, unit: null, dim: null, nullable: false, pii_level: "none", desc: "path id" }
- { name: seg_id, type: int32, unit: null, dim: null, nullable: false, pii_level: "none", desc: "segment id" }
- { name: ts, type: timestamp(UTC), unit: "s", dim: "T", nullable: false, pii_level: "none", desc: "UTC time" }
- { name: CRS, type: string, unit: null, dim: null, nullable: false, pii_level: "none", desc: "coord ref sys" }
- { name: ell_start, type: float64, unit: "m", dim: "L", nullable: false, pii_level: "none", desc: "path coord start" }
- { name: ell_end, type: float64, unit: "m", dim: "L", nullable: false, pii_level: "none", desc: "path coord end" }
- { name: n_eff_mean, type: float64, unit: "1", dim: "1", nullable: false, pii_level: "none", desc: "mean effective index" }
- { name: c_ref_ref, type: string, unit: null, dim: null, nullable: false, pii_level: "none", desc: "parameter ref" }
- { name: c_ref_value, type: float64, unit: "m/s", dim: "L T^-1", nullable: false, pii_level: "none", desc: "resolved c_ref" }
- { name: T_arr_const, type: float64, unit: "s", dim: "T", nullable: false, pii_level: "none", desc: "const-pulled form" }
- { name: T_arr_integrand, type: float64, unit: "s", dim: "T", nullable: false, pii_level: "none", desc: "general integrand form" }
- { name: delta_form, type: float64, unit: "s", dim: "T", nullable: false, pii_level: "none", desc: "dual-form gap" }
- { name: q_score, type: float64, unit: "1", dim: "1", nullable: false, pii_level: "none", desc: "quality score" }
- { name: hash_sha256, type: string, unit: null, dim: null, nullable: false, pii_level: "none", desc: "checksum" }
- { name: signature, type: string, unit: null, dim: null, nullable: true, pii_level: "none", desc: "signature" }
pk: ["pid","seg_id"]
idx:
- { keys: ["ts"], kind: "btree", unique: false, desc: "time scan" }
- { keys: ["pid","seg_id"], kind: "btree", unique: true, desc: "segment lookup" }
constraints:
- { kind: "unique", expr: "unique(pid,seg_id)", severity: "ERROR", message: "pk must be unique" }
- { kind: "monotonic", expr: "ell_end >= ell_start", severity: "ERROR", message: "ell non-decreasing" }
- { kind: "dim_check", expr: "check_dim(T_arr_const)=='T'", severity: "ERROR", message: "dim(T_arr_const)=T" }
- { kind: "dim_check", expr: "check_dim(T_arr_integrand)=='T'", severity: "ERROR", message: "dim(T_arr_integrand)=T" }
- { kind: "arrivaltime_dualform", expr: "delta_form <= tol_Tarr", params: { tol_Tarr: "1e-9 s" }, severity: "WARN", message: "dual form mismatch" }
equations: ["S610-1","S610-2"]
parameters: ["c_ref_ref","n_eff_model_ref"]
governance:
  owner: "team.eft-data"
  steward: "user:alice"
  retention_days: 3650
  sla: { freshness_max: "P1D", availability_target: "99.9%" }
  release: { freeze_policy: "immutable", signing_key: "key://k1" }
privacy:
  classification: { pid: "none", seg_id: "none", ts: "none", CRS: "none" }
  anonymization: { }
  masking: { }
  exceptions: [ ]
provenance:
  trace: ["sensor.S1","method.integrate_path","artifact.T_arr_v1.parquet"]
  checksum: { algo: "sha256", field: "hash_sha256" }
  signature: { keyref: "key://k1", field: "signature" }
quality_gates:
  q_score_min: 0.80
  delta_form_max: "1e-9 s"
  completeness_min: 0.98
  drift_method: "KL"
  drift_max: 0.02
see: ["Core.Equations §S610","Core.Parameters §P3x","Core.Metrology §Mx-?","Core.Errors §I50"]
XIII. Registration and Export (I60 Bindings)
- register_schema(name:str, version:str, fields:list[dict], constraints:list[str], units:dict, pk:list[str], idx:list[list[str]], see:list[str]) -> SRef
  Notes: fields must include pii_level, unit, dim, and nullable; constraints must include unique(pk) and core physical contracts.
- export_schema(SRef, format:str="yaml") -> str
  Emits a YAML equivalent to §XII; guarantees lossless round-trip.
- register_field(...) -> FRef
  Prefer reusing entries from the field dictionary to ensure cross-schema consistency.
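The register/export behaviour described above can be pictured with a toy in-memory registry. Everything here is an assumption made for illustration: the class InMemoryRegistry, the "name@version" id format, and the VersionConflict exception are hypothetical, and the export uses JSON from the standard library instead of YAML so that the sketch has no third-party dependency.

```python
import json


class VersionConflict(Exception):
    """Stand-in for E.SCHEMA.VERSION.CONFLICT."""


class InMemoryRegistry:
    """Toy registry keeping records in a dict; not the real I60 binding."""

    def __init__(self):
        self._records = {}

    def register_schema(self, name, version, fields, constraints, units, pk,
                        idx=None, see=None):
        sref_id = f"{name}@{version}"  # hypothetical id format
        if sref_id in self._records:
            raise VersionConflict(sref_id)
        known = {f["name"] for f in fields}
        if not set(pk) <= known:
            raise ValueError("pk must be a subset of the field names")
        self._records[sref_id] = {
            "name": name, "version": version, "fields": fields,
            "constraints": constraints, "units": units, "pk": pk,
            "idx": idx or [], "see": see or [],
        }
        return sref_id

    def export_schema(self, sref_id, format="json"):
        if format != "json":
            raise NotImplementedError("this sketch only serializes to JSON")
        return json.dumps(self._records[sref_id], sort_keys=True)
```

Parsing the exported text and comparing it with the stored record is one way to exercise the lossless round-trip guarantee in a test.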
XIV. Validation Focus (Pre-Release Checklist)
- Name & version: name matches the pattern; version is semver and not already taken.
- Keys & indexes: pk ⊆ fields; unique(pk) is provable via validate_dataset.
- Units & dimensions: units/dims consistent with equations; all check_dim(expr) pass.
- Contract closure: constraints cover uniqueness, not-null, range/regex, cross-field, monotonicity, dimensional checks, and dual-form consistency.
- Privacy & governance: pii_level classified; retention_days and release.freeze_policy ready.
- Provenance: checksum/signature/trace present and verifiable on samples.
- Interoperability: parameters and equations resolvable via bind_to_parameters, bind_to_equations.
XV. Common Errors and Remedies (Linked to Core.Errors)
- E.SCHEMA.NAME.INVALID: name pattern violation → fix naming and retry.
- E.SCHEMA.VERSION.CONFLICT: duplicate version → bump_version then register.
- E.SCHEMA.FIELD.DIM.MISMATCH: inconsistency between unit/dim and check_dim → fix mappings or equation references.
- E.SCHEMA.CONSTRAINT.UNCOVERED: missing critical contract (e.g., dual-form consistency) → add the relevant ConstraintSpec.
- E.SCHEMA.PRIVACY.UNCLASSIFIED: unclassified fields detected → complete pii_level classification and review.
XVI. Arrival-Time Dual-Form Specialized Constraints
- Definitions:
  T_arr_const = ( 1 / c_ref_value ) * ( ∫_gamma n_eff d ell ).
  T_arr_integrand = ( ∫_gamma ( n_eff / c_ref_value ) d ell ).
  delta_form = | T_arr_const - T_arr_integrand |.
- Contract:
  kind="arrivaltime_dualform", expr="delta_form <= tol_Tarr", params={"tol_Tarr":"<time>"}.
- Release gate: write delta_form_max into quality_gates and enforce in assert_contract.
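A numerical sketch of the dual-form check follows. It assumes n_eff is sampled at discrete path coordinates ell and integrates with the trapezoidal rule; the sample values, the function names, and the choice of 299 792 458 m/s for c_ref_value are placeholders for illustration only. With a constant c_ref_value the two forms differ only by floating-point rounding, so delta_form should sit far below a tolerance such as 1e-9 s.

```python
def trapezoid(ys, xs):
    """Trapezoidal rule for samples ys taken at coordinates xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def dual_form(ell, n_eff, c_ref_value):
    """Return (T_arr_const, T_arr_integrand, delta_form) for one sampled path."""
    t_const = trapezoid(n_eff, ell) / c_ref_value
    t_integrand = trapezoid([n / c_ref_value for n in n_eff], ell)
    return t_const, t_integrand, abs(t_const - t_integrand)


# Illustrative samples: path coordinate in metres, dimensionless n_eff.
ell = [0.0, 100.0, 200.0, 300.0]
n_eff = [1.00, 1.01, 1.02, 1.01]
t_const, t_integrand, delta_form = dual_form(ell, n_eff, 299_792_458.0)

tol_Tarr = 1e-9  # seconds, matching the "1e-9 s" example tolerance
print(delta_form <= tol_Tarr)
```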
XVII. Compatibility and Change Log Fragments (for Release Notes)
- change_log : list[ChangeSpec] (optional)
  ChangeSpec = { since:"1.0", type:"add|modify|deprecate|remove", path:"fields.T_arr_const", note:"..." }.
- Breaking changes (e.g., pk change or field removal) require a major version increment (MAJOR+1) and a migration pointer in see.
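The breaking-change rule can be turned into a small version helper. This is a sketch under two assumptions that are not stated in the text: a change is treated as breaking when its type is "remove" or its path targets pk, and any non-breaking change set increments MINOR. The function name next_version is hypothetical.

```python
def next_version(current: str, changes: list[dict]) -> str:
    """Hypothetical helper: pick the next MAJOR.MINOR from a list of ChangeSpec dicts."""
    major, minor = (int(part) for part in current.split(".")[:2])
    breaking = any(
        c["type"] == "remove" or c["path"] == "pk" or c["path"].startswith("pk.")
        for c in changes
    )
    # Assumption: non-breaking change sets bump MINOR.
    return f"{major + 1}.0" if breaking else f"{major}.{minor + 1}"
```

For a breaking release the caller would still need to add the migration pointer to see by hand.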
XVIII. Minimal Viable Template (YAML, Placeholders)
name: DS.<DOMAIN>.<Subject>
version: "X.Y"
title: <human-readable title>
description: <what this dataset is for>
fields: [ { name: <f>, type: <t>, unit: <u|null>, dim: <d|null>, nullable: <bool>, pii_level: <level>, desc: <text> }, ... ]
pk: [ <field1>, <field2> ]
idx: [ { keys: [<f1>,<f2>], kind: "btree", unique: false } ]
constraints: [ { kind: "unique", expr: "unique(<k1>,<k2>)", severity: "ERROR", message: "pk unique" } ]
equations: [ ]
parameters: [ ]
units: { }
dims: { }
governance: { owner: "<team>", steward: "<user>", retention_days: <int>, sla: { freshness_max: "P?D", availability_target: "99.9%" }, release: { freeze_policy: "immutable" } }
privacy: { classification: { }, anonymization: { }, masking: { }, exceptions: [ ] }
provenance: { trace: [ ], checksum: { algo: "sha256", field: "hash_sha256" }, signature: { keyref: "key://...", field: "signature" } }
quality_gates: { q_score_min: 0.8, delta_form_max: "1e-9 s", completeness_min: 0.98, drift_method: "KL", drift_max: 0.02 }
see: [ ]
XIX. Summary
- This Schema Registry model centers on SRef and mandates five elements: primary keys, units/dimensions, contracts, privacy, and governance.
- For cross-volume measures like T_arr, it embeds dual-form consistency and release gating to ensure end-to-end alignment and traceability across metrology, equations, and data realization.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/