06-EFT.WP.Core.DataSpec v1.0
Chapter 3 — Metadata and the Trace Chain
I. Objectives and Scope
- Establish a metadata system centered on the manifest and the Trace, ensuring unique dataset identity, auditable provenance, arrival-time two-form consistency, and verifiable units and dimensions.
- Provide content addressing and non-repudiation via hash_sha256(blob) and signature; construct the evidence chain by linking Trace = [source -> method -> artifact].
- Maintain anchor alignment with Core.Equations, Core.Parameters, and Core.Metrology—in particular gamma(ell), d ell, n_eff(x,t), c_ref, T_arr.
II. Core Definitions and Symbols
- manifest def= { dataset_id, schema_ref, schema_version, pk, idx, units, dim, created_ts, author, tool_rev, env, lineage, checksum, signature }.
- Trace def= [source -> method -> artifact], where source, method, and artifact are nodes; edges represent processing or transfer.
- checksum def= hash_sha256(canon(ds)), signature def= Sign(checksum, keyref).
- EvidenceChain def= <Trace, manifest, checksum, signature>.
- canon(ds) def= stable serialization of ds with order(pk) and normalized units.
- TraceID def= hash_sha256(canon(Trace)), where canon(Trace) is a stable serialization of the Trace's nodes and edges.
III. Minimal Required Manifest Set
- Identity and schema
dataset_id, schema_ref, schema_version, pk, idx, fmt.
- Metrology and units
units : { field -> unit(field) }, dim : { field -> dim(field) }, check_dim_status ∈ {"pass","fail"}.
- Generation environment
created_ts, author, tool_rev, env = { os, cpu, gpu, libs, locale }.
- Lineage and fingerprints
lineage = { parents : [checksum_i], TraceID }, checksum, signature, keyref.
- Arrival-time specifics
path = { pid, CRS, orientation, L_gamma }, integrand = { n_eff, c_ref }, measure = "d ell", formulation = {"factored"|"general"}, delta_form.
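The minimal key set above lends itself to a mechanical completeness check. The following is a minimal sketch, assuming the manifest is held as a flat Python dict keyed by the names listed in this section; the helper name missing_manifest_keys and the group labels are hypothetical and not part of the specification.

```python
from __future__ import annotations

# Key names follow Section III; the group labels are illustrative only.
REQUIRED_KEYS = {
    "identity": ["dataset_id", "schema_ref", "schema_version", "pk", "idx", "fmt"],
    "metrology": ["units", "dim", "check_dim_status"],
    "environment": ["created_ts", "author", "tool_rev", "env"],
    "lineage": ["lineage", "checksum", "signature", "keyref"],
}
# Only required for datasets that carry arrival-time content.
ARRIVAL_KEYS = ["path", "integrand", "measure", "formulation", "delta_form"]


def missing_manifest_keys(manifest: dict, arrival: bool = False) -> list[str]:
    """Return the required keys absent from a flat manifest dict, sorted."""
    required = [key for group in REQUIRED_KEYS.values() for key in group]
    if arrival:
        required += ARRIVAL_KEYS
    return sorted(key for key in required if key not in manifest)
```

An empty return value means the minimal set is present; it says nothing about whether the values themselves are valid.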
IV. Trace Model and Evidence-Chain Structure
- Node types
- source: artifacts from raw acquisition or external provision;
- method: deterministic or stochastic processing step, recording version and params;
- artifact: a processing output bound to checksum and schema_ref.
- Normalization requirements
- Every method node records code_rev and params;
- A given artifact’s checksum uniquely determines its content; signature binds keyref;
- The Trace must be a DAG; compute TraceID via hash_sha256.
- Minimal closure of the evidence chain
At any point, EvidenceChain must include parent fingerprints, the current checksum, signature, and TraceID, and it must be replayable to any ancestor.
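The DAG requirement and the TraceID computation can be sketched as follows. This is an illustration under stated assumptions: nodes are a dict of name to attributes, edges are (from, to) pairs, and sorted-key JSON stands in for the stable serialization of the Trace, which the specification does not fix. The function name trace_id is hypothetical.

```python
import hashlib
import json
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def trace_id(nodes: dict, edges: list) -> str:
    """Check that the Trace is a DAG, then hash a stable serialization of it.

    nodes: {name: {"type": "source" | "method" | "artifact", ...}}
    edges: [(from_name, to_name), ...]
    """
    graph = {name: set() for name in nodes}
    for src, dst in edges:
        # TopologicalSorter expects a mapping of node -> predecessors.
        graph.setdefault(dst, set()).add(src)
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError("Trace must be a DAG") from exc
    # Assumed canonical form: sorted keys, sorted edge list, compact separators.
    payload = {"nodes": nodes, "edges": sorted(list(edge) for edge in edges)}
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Because the edge list and the keys are sorted before hashing, the same Trace yields the same TraceID regardless of the order in which nodes and edges were recorded.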
V. Fingerprints, Signatures, and Reproducibility
- Fingerprint workflow (Mx-1)
- Produce canon(ds) by ordering with order(pk) and conforming to field specifications;
- Compute checksum = hash_sha256(canon(ds));
- Generate signature = Sign(checksum, keyref);
- Write checksum, signature, and keyref to the manifest, appending parents.
- The reproducibility triple
ReproTriple def= <checksum, schema_version, code_rev>; only when all three are present do we claim strong reproducibility.
- Verification steps
- Verify signature against keyref;
- Recompute hash_sha256(canon(ds)) and compare with checksum;
- Check schema_version compatibility with the local schema;
- Replay method steps in the Trace, expecting identical checksum.
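The fingerprint workflow and the first two verification steps can be sketched with the Python standard library. Several assumptions apply: the dataset is a list of dict rows, canon is approximated by primary-key ordering plus sorted-key JSON (unit normalization is omitted), and HMAC-SHA256 stands in for Sign. HMAC is symmetric and therefore does not by itself provide non-repudiation; an asymmetric signature scheme would be needed for that. The names fingerprint and verify are hypothetical.

```python
import hashlib
import hmac
import json


def canon(rows: list, pk: list) -> bytes:
    """Assumed stable serialization: rows ordered by primary key, keys sorted."""
    ordered = sorted(rows, key=lambda row: tuple(row[k] for k in pk))
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")


def fingerprint(rows: list, pk: list, key: bytes, keyref: str, parents=()) -> dict:
    """Compute checksum and signature and return the manifest fragment."""
    checksum = hashlib.sha256(canon(rows, pk)).hexdigest()
    # HMAC is a symmetric stand-in for Sign(checksum, keyref).
    signature = hmac.new(key, checksum.encode("ascii"), hashlib.sha256).hexdigest()
    return {"checksum": checksum, "signature": signature,
            "keyref": keyref, "parents": list(parents)}


def verify(rows: list, pk: list, manifest: dict, key: bytes) -> bool:
    """Check the signature first, then recompute and compare the checksum."""
    expected = hmac.new(key, manifest["checksum"].encode("ascii"),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    return hashlib.sha256(canon(rows, pk)).hexdigest() == manifest["checksum"]
```

Row order does not affect the checksum here because canon sorts by pk before serializing; any change to a row's content does.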
VI. Metadata Namespaces and Field Dictionary
- MD.core.*
- MD.core.dataset_id : string
- MD.core.schema_ref : string
- MD.core.schema_version : string
- MD.core.pk : array<string>
- MD.core.idx : array<array<string>>
- MD.env.*
- MD.env.os : string, MD.env.cpu : string, MD.env.gpu : string, MD.env.libs : array<string>, MD.env.locale : string
- MD.trace.*
- MD.trace.parents : array<string>, MD.trace.TraceID : string, MD.trace.code_rev : string, MD.trace.params : string
- MD.sec.*
- MD.sec.checksum_sha256 : string, MD.sec.signature : string, MD.sec.keyref : string
- MD.quality.*
- MD.quality.q_score : float, MD.quality.drift : float, MD.quality.completeness : float
- MD.arrival.*
- MD.arrival.pid : string, MD.arrival.CRS : string, MD.arrival.orientation : {"forward"|"reverse"}
- MD.arrival.L_gamma : float, MD.arrival.formulation : {"factored"|"general"}, MD.arrival.delta_form : float
VII. Metadata for Arrival-Time Two-Form Consistency
- Formulation declaration
- formulation="factored" means T_arr = ( 1 / c_ref ) * ( ∫_gamma n_eff d ell );
- formulation="general" means T_arr = ( ∫_gamma ( n_eff / c_ref ) d ell ).
- Discrepancy recording
- delta_form = | ( 1 / c_ref ) * ( ∫_gamma n_eff d ell ) - ( ∫_gamma ( n_eff / c_ref ) d ell ) |;
- The manifest must include delta_form and threshold tol_Tarr, and the contract must assert delta_form ≤ tol_Tarr.
- Path consistency
Persist pid; ensure non-decreasing ell; declare CRS; record L_gamma = ( ∫_gamma 1 d ell ).
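A numerical sketch of the two formulations on a sampled path is given below. It assumes c_ref is a scalar constant along the path and uses a composite trapezoid rule; under that assumption the two forms are analytically equal, so delta_form reflects floating-point roundoff only. The function names and the choice of quadrature are illustrative, not prescribed by this chapter.

```python
def trapz(y: list, x: list) -> float:
    """Composite trapezoid rule for samples y taken at abscissae x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def arrival_two_forms(ell: list, n_eff: list, c_ref: float) -> tuple:
    """Return (T_factored, T_general, delta_form, L_gamma) for a sampled path.

    ell: arc-length samples along gamma; n_eff: values at those samples;
    c_ref: assumed constant along the path.
    """
    if any(b < a for a, b in zip(ell, ell[1:])):
        raise ValueError("ell must be non-decreasing")
    t_factored = trapz(n_eff, ell) / c_ref                # (1 / c_ref) * ∫ n_eff d ell
    t_general = trapz([n / c_ref for n in n_eff], ell)    # ∫ (n_eff / c_ref) d ell
    l_gamma = trapz([1.0] * len(ell), ell)                # ∫ 1 d ell
    return t_factored, t_general, abs(t_factored - t_general), l_gamma
```

The returned delta_form and L_gamma are the values the manifest would persist under MD.arrival.*, with the contract asserting delta_form ≤ tol_Tarr.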
VIII. Auditable Manifest Template (Text)
- dataset_id=<string>
- schema_ref=<string>; schema_version=<semver>
- pk=[<field>]; idx=[[<field>...]]; fmt=<jsonl|csv|parquet|nc|tfrecord>
- units={ field:unit(...) }; dim={ field:dim(...) }; check_dim_status=<pass|fail>
- created_ts=<ISO8601>; author=<string>; tool_rev=<string>
- env={ os, cpu, gpu, libs, locale }
- lineage={ parents:[<checksum>...], TraceID=<hash> }
- security={ checksum_sha256=<hash>, signature=<sig>, keyref=<kid> }
- arrival={ pid=<id>, CRS=<epsg>, orientation=<forward|reverse>, L_gamma=<float>, formulation=<factored|general>, delta_form=<float>, tol_Tarr=<float> }
IX. Contract Mapping and Validation Interfaces
- assert_contract typical assertions
- unique(dataset_id);
- non_decreasing(ts) and non_decreasing(ell);
- check_dim( y - f(x; theta) );
- range(q_score, 0, 1);
- delta_form ≤ tol_Tarr;
- exists(MD.sec.checksum_sha256) and verify(signature, keyref).
- Interface crosswalk
- attach_provenance(ds, trace) → writes MD.trace.* and TraceID;
- compute_checksum(ds,"sha256") → produces checksum;
- sign_data(ds,keyref) → produces signature;
- export_manifest(ds) → emits the key set in this chapter’s template.
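A few of the typical assertions can be sketched as a single function that reports violations by name. This is an illustration only: the nested manifest layout (quality, arrival, security sub-dicts), the seen_ids registry, and the signature of assert_contract shown here are assumptions, and check_dim and verify(signature, keyref) are left out because they depend on Core.Metrology and on the signing scheme.

```python
def assert_contract(manifest: dict, ts: list, ell: list, seen_ids: set) -> list:
    """Return the names of violated assertions (an empty list means pass)."""
    def non_decreasing(seq):
        return all(a <= b for a, b in zip(seq, seq[1:]))

    quality = manifest.get("quality", {})
    arrival = manifest.get("arrival", {})
    security = manifest.get("security", {})
    checks = {
        "unique(dataset_id)": manifest.get("dataset_id") not in seen_ids,
        "non_decreasing(ts)": non_decreasing(ts),
        "non_decreasing(ell)": non_decreasing(ell),
        "range(q_score, 0, 1)": 0.0 <= quality.get("q_score", -1.0) <= 1.0,
        # Missing values fail closed: inf <= -inf is False.
        "delta_form <= tol_Tarr": arrival.get("delta_form", float("inf"))
                                  <= arrival.get("tol_Tarr", float("-inf")),
        "exists(MD.sec.checksum_sha256)": bool(security.get("checksum_sha256")),
    }
    return [name for name, ok in checks.items() if not ok]
```

Returning the list of failures rather than raising on the first one lets a release gate report every violation in a single pass.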
X. Incorporating Drift and Quality Metadata
- Quality dimensions
completeness = N_observed / N_expected, validity = N_valid / N_observed, consistency ∈ [0,1], timeliness = now - created_ts.
- Drift recording
- drift = monitor_drift(ds_ref, ds_new, fields, method="KL")["score"];
- Record under MD.quality.* in the manifest, including ref_window and threshold.
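The completeness ratio and a KL-based drift score can be sketched as follows. This does not reproduce monitor_drift, whose internals this chapter does not specify; the function kl_drift_score, the direction KL(new || ref), the restriction to a single categorical field, and the additive smoothing constant eps are all assumptions made for illustration.

```python
import math
from collections import Counter


def completeness(n_observed: int, n_expected: int) -> float:
    """completeness = N_observed / N_expected."""
    return n_observed / n_expected


def kl_drift_score(ref: list, new: list, eps: float = 1e-9) -> float:
    """KL(new || ref) over the categories seen in either sample.

    eps is additive smoothing so that a category missing from one sample
    does not produce a division by zero or log(0).
    """
    cats = set(ref) | set(new)
    ref_counts, new_counts = Counter(ref), Counter(new)
    score = 0.0
    for cat in cats:
        p = (new_counts[cat] + eps) / (len(new) + eps * len(cats))
        q = (ref_counts[cat] + eps) / (len(ref) + eps * len(cats))
        score += p * math.log(p / q)
    return score
```

A score of zero indicates identical category frequencies in the reference window and the new data; the value recorded in MD.quality.drift would then be compared against the configured threshold.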
XI. Arrival-Time Use Case: End-to-End Traceability
Steps
- Acquire and produce artifact_0, recording MD.core.* and MD.env.*;
- Generate canon(artifact_0), write checksum_0 and signature_0;
- Compute T_arr from gamma(ell) and n_eff (per declared formulation) to obtain artifact_1;
- Compute delta_form and assert delta_form ≤ tol_Tarr;
- Build Trace = [artifact_0 -> method_compute_Tarr -> artifact_1], compute TraceID;
- In artifact_1’s manifest, update parents=[checksum_0], add checksum_1, signature_1, and MD.arrival.*.
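Once parents=[checksum_0] is written into artifact_1's manifest, an auditor can walk the lineage back to any ancestor. The sketch below assumes manifests are available in a dict keyed by checksum and that each carries lineage.parents as in Section III; the function name ancestors and the in-memory store are hypothetical.

```python
def ancestors(checksum: str, manifests: dict) -> list:
    """Walk lineage.parents breadth-first and return ancestor checksums.

    manifests maps checksum -> manifest dict; a parent without a stored
    manifest breaks the evidence chain and raises KeyError.
    """
    seen = []
    queue = [checksum]
    while queue:
        current = queue.pop(0)
        for parent in manifests[current]["lineage"]["parents"]:
            if parent not in manifests:
                raise KeyError(f"missing parent manifest: {parent}")
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen
```

For the two-artifact use case above, the walk from checksum_1 returns only checksum_0; a longer pipeline would return every upstream checksum, nearest first.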
XII. Governance, Postulates, and Compliance Essentials
- P63-1 (Content-addressing): checksum uniquely identifies a data entity; a changed checksum is a new entity.
- P63-2 (Reproducibility): ReproTriple = <checksum, schema_version, code_rev> must be complete; if any one element is missing, reproducibility must not be claimed.
- P63-3 (Non-repudiation): any externally published artifact must carry a signature and a traceable keyref.
XIII. Implementation Checklist for Cross-Volume Binding
- bind_to_equations(ds, eqn_refs): list Sxx-? and Pxx-? under manifest.see;
- bind_to_parameters(ds, params): record parameter version and provenance;
- enforce_arrival_time_convention(ds): generate and validate delta_form, persist under MD.arrival.*;
- Core.Errors integration: when check_dim_status="fail" or verify(signature, keyref) fails, return E.DataSpec.ContractViolation and block release.
XIV. Publication and Freeze
Freeze workflow
- export_schema(SRef,"yaml") and export_manifest(ds) to publish;
- freeze_release(ds, tag) to lock the version;
- Publish the ReproTriple and TraceID and archive for audit.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/