Appendix B — Data Specification and I/O
I. One-Sentence Goal
Anchor all data objects and I/O for Early Objects to Template v0.1 (EFT Technical Whitepaper & Engineering Memos — Complete Checklist v0.1). Define schemas, units, serialization, directory layout, I/O contracts, and error semantics so that Catalog/Seeds/Trajectory, Phi_T/grad_Phi_T, L_nu/LC, n_eff, { ell_i }, Delta_T_sigma, {R_env,T_trans,A_sigma}, and both arrival-time forms T_arr/Delta_T_arr are operational, reproducible, and auditable.
II. Scope & Non-Goals
- Covered: object model & primary keys, field+unit rules, serialization & directory layout, I/O contracts, DQ checks & consistency tests, Template-family alignment, JSONL examples, workflow mapping.
- Not covered: physics/numerics re-derivations; instrument/pipeline-specific formats; opaque or unverifiable formats.
III. Global Constraints & Conventions
- Coords/metric/units are mandatory: coords_spec, metric_spec, units_spec must be present; normalize ingress to SI. If inputs arrive in km/ms, map to m/s and log the mapping.
- Inline symbols: always use backticks for T_arr, Delta_T_arr, n_eff, c_ref, gamma(ell), Sigma_env, Delta_T_sigma, etc.
- Naming isolation: T_fil ≠ T_trans; n ≠ n_eff.
- Dimensionality & lower bound: ingress must pass check_dimension. Enforce dim(T_arr)=[T], dim(n_eff)=1, dim(c_ref)=[L][T^-1]. Outputs must satisfy the lower bound T_arr ≥ L_path / c_ref; for the general form the equivalent bound is T_arr ≥ ∫ ( 1 / c_ref ) d ell, since n_eff ≥ 1.
- Energy consistency at interfaces: every event satisfies R_env + T_trans + A_sigma = 1, and must produce in-band curves with residuals.
- Two-form arrival time:
- Constant pull-out: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- General form: T_arr = ( ∫ ( n_eff / c_ref ) d ell )
Record the form used in the Contract as mode ∈ {constant, general}.
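The two forms can be compared numerically on a discretized path. The following is a minimal sketch, assuming n_eff is piecewise constant on each segment. The function names, the sample n_eff values, and the relative-difference formula used here for eta_T are illustrative assumptions, not definitions taken from this specification; the segment lengths and c_ref are borrowed from the examples in Section X.

```python
from typing import Sequence


def t_arr_constant(n_eff: Sequence[float], d_ell: Sequence[float], c_ref: float) -> float:
    """Constant pull-out: T_arr = (1 / c_ref) * sum_k n_eff[k] * d_ell[k]."""
    if len(n_eff) != len(d_ell):
        raise ValueError("n_eff and d_ell need one entry per segment")
    return sum(n * d for n, d in zip(n_eff, d_ell)) / c_ref


def t_arr_general(n_eff: Sequence[float], d_ell: Sequence[float],
                  c_ref: Sequence[float]) -> float:
    """General form: T_arr = sum_k (n_eff[k] / c_ref[k]) * d_ell[k]."""
    if not (len(n_eff) == len(d_ell) == len(c_ref)):
        raise ValueError("n_eff, d_ell and c_ref need one entry per segment")
    return sum((n / c) * d for n, c, d in zip(n_eff, c_ref, d_ell))


def lower_bound(d_ell: Sequence[float], c_ref: float) -> float:
    """Lower bound L_path / c_ref with L_path = sum of line elements."""
    return sum(d_ell) / c_ref


d_ell = [2.0e2, 1.0e3]       # line elements in m (Path example p001)
n_eff = [1.0002, 1.0001]     # hypothetical per-segment values, not from the spec
c_ref = 2.99792458e8         # m/s (CalibCref example)

t_const = t_arr_constant(n_eff, d_ell, c_ref)
t_gen = t_arr_general(n_eff, d_ell, [c_ref] * len(d_ell))

# Assumed definition: relative difference between the two forms.
eta_T = abs(t_const - t_gen) / t_gen

print(f"T_arr(constant) = {t_const:.6e} s")
print(f"T_arr(general)  = {t_gen:.6e} s")
print(f"lower bound     = {lower_bound(d_ell, c_ref):.6e} s")
```

With a uniform c_ref the two forms should agree to floating-point rounding, so any eta_T well above that level would point to a discretization or unit mismatch rather than to physics.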
IV. Data Objects & Primary Keys (minimal fields)
Contract (measurement contract)
- Required: id, spec_version, coords_spec, units_spec, metric_spec, mode, gauge:{x_ref,t_ref}, boundary_config, tolerances:{eps_T,eta_T,eta_w,tau_switch}
- Dependencies: n_eff_dependencies (e.g., F(Phi_T, grad_Phi_T, rho, f))
- Hashes: hash(Catalog), hash(Seeds), hash(Trajectory), hash(SeaProfile) (if coupled), hash(Phi_T), hash(grad_Phi_T), hash(n_eff), hash(gamma), hash(code)
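As a rough illustration of how the required Contract fields could be checked at ingress, the sketch below lists missing keys and rejects an unknown mode. The helper names (missing_contract_fields, validate_contract) and the dotted notation for nested keys are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# Required fields as listed for Contract above.
REQUIRED_TOP: Tuple[str, ...] = (
    "id", "spec_version", "coords_spec", "units_spec", "metric_spec",
    "mode", "gauge", "boundary_config", "tolerances",
)
REQUIRED_NESTED: Dict[str, Tuple[str, ...]] = {
    "gauge": ("x_ref", "t_ref"),
    "tolerances": ("eps_T", "eta_T", "eta_w", "tau_switch"),
}
ALLOWED_MODES = {"constant", "general"}


def missing_contract_fields(contract: dict) -> List[str]:
    """Return required keys that are absent (nested keys as 'parent.child')."""
    missing = [key for key in REQUIRED_TOP if key not in contract]
    for parent, children in REQUIRED_NESTED.items():
        block = contract.get(parent)
        if isinstance(block, dict):
            missing += [f"{parent}.{c}" for c in children if c not in block]
    return missing


def validate_contract(contract: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {name}" for name in missing_contract_fields(contract)]
    if "mode" in contract and contract["mode"] not in ALLOWED_MODES:
        problems.append(f"invalid mode: {contract['mode']!r}")
    return problems
```

Returning a list of problems instead of raising on the first one keeps the full set of gaps available for the audit log.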
Catalog (object directory)
- Required: { id, type, z_form, z_obs, env_ref, seed_ref }, plus hash(Catalog)
Seeds/Triggers
- Required: priors, seed_samples (incl. seed_rng), triggers:[{event,type,time}], hash(Seeds)
Trajectory (state series)
- Required: state_series:[{t, M, R, J, a_bh, SFR, Z, …}], events:[…], hash(Trajectory)
Field (fields & refractive index)
- Required: name ∈ {Phi_T, grad_Phi_T, n_eff}, storage ∈ {grid, trajectory}, coords_spec, units_spec
- Grid: grid_axes:{x:[],y:[],z:[]}; Trajectory: samples:{path_id:[…]}
SeaProfile / Interfaces (optional)
- SeaProfile: layers:[{model, chi_k, Delta_k, sigma_k, …}], eta_w, hash(SeaProfile)
- Interfaces: sigma_id, type ∈ {continuous, jump_phi, jump_flux, anisotropic}, location (implicit function or grid)
- Optional events: C_sigma, J_sigma, R_env, T_trans, A_sigma
Path
- Required: path_id, gamma:[…] (coordinates), Δell:[…] (line elements, one per segment between consecutive gamma points, so len(Δell) = len(gamma) − 1, as in the Path example in Section X)
- Optional: t_hat:[…]
- Interfaces: interface_marks:[idx…] (discrete indices/interpolation locations for { ell_i })
Spectral/Obs
- L_nu(f) (intrinsic spectrum), F_nu(f) (observed spectrum), LC(t) (light curve)
- Observations:{ T_arr_obs_s, Delta_T_arr_obs_s, F_nu_obs, LC_obs } with uncertainties and ISO-8601 UTC timestamps
RTParams (energy triplet)
- Required: in-band curves & clamped intervals for R_env(f), T_trans(f), A_sigma(f)
CalibCref (reference speed calibration)
- Required: gamma_ref_id, T_arr_ref_s, n_eff_ref_hash, c_ref_est, u_stat, u_sys, env_block
Report/Log
- Required: run_id, contract_id, hashes, metrics:{eps_T,eta_T,eta_c,eta_w,tau_switch,GB,u_c}, notes
V. Serialization & Directory Layout
- Formats: static data in JSONL/Parquet; large grid fields in Zarr/NetCDF (field names still follow this spec).
- Suggested layout:
- /contracts/ *.contract.json
- /catalog/ *.catalog.json
- /seeds/ *.seeds.json
- /traj/ *.trajectory.jsonl
- /fields/ phi_t.*, grad_phi_t.*, neff.*
- /seaprofile/ *.sea.json
- /interfaces/ sigma_env.*
- /paths/ *.path.jsonl
- /spectra/ Lnu.*, Fnu.*, LC.*
- /obs/ *.obs.jsonl
- /rtparams/ rt.*
- /calib/ c_ref.*
- /artifacts/ reports, logs, hash manifests, replay scripts
- Naming: <object>-<id>-<hash8>.<ext> where hash is content-hash (first 8 chars).
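The naming rule can be sketched as follows. This example assumes SHA-256 as the content hash, which matches the integrity checks in Section XIII but is not stated explicitly for file naming; the function names are hypothetical.

```python
import hashlib


def content_hash8(payload: bytes) -> str:
    """First 8 hex characters of the SHA-256 digest of the file content.

    Only the content bytes are hashed, so the result does not depend on
    the filename or on timestamps.
    """
    return hashlib.sha256(payload).hexdigest()[:8]


def object_filename(obj: str, obj_id: str, payload: bytes, ext: str) -> str:
    """Build '<object>-<id>-<hash8>.<ext>' from the naming rule above."""
    return f"{obj}-{obj_id}-{content_hash8(payload)}.{ext}"


if __name__ == "__main__":
    record = b'{"path_id":"p001"}\n'
    print(object_filename("path", "p001", record, "jsonl"))
```

Because the hash covers content only, re-serializing the same object with different key order or whitespace changes the name; a canonical serialization would be needed if that is undesirable.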
VI. Field & Unit Rules (key fields)
- f_hz: Hz = s^-1; T_arr_obs_s / Delta_T_arr_obs_s: s; Δell: m; c_ref: m•s^-1
- n_eff, R_env, T_trans, A_sigma: dimensionless
- Phi_T may be non-dimensionalized; otherwise Phi_ref must be declared in Contract; grad_Phi_T unit is dim(Phi_T)[L^-1]
- L_nu: W•Hz^-1 (or Contract photometric system); F_nu: W•m^-2•Hz^-1; LC: declared in Contract
- Delta_T_sigma, tau_switch: s
- All coordinates/metric/units must match the Contract; cross-system data must include explicit mapping and logs.
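One possible shape for SI normalization with a mapping log is sketched below. The unit table, the to_si helper, and the layout of the log entries are assumptions for illustration; the specification only requires that the mapping be explicit and logged.

```python
from typing import Dict, List, Tuple

# Hypothetical conversion table: input unit -> (SI unit, multiplicative factor).
UNIT_TABLE: Dict[str, Tuple[str, float]] = {
    "m": ("m", 1.0),
    "km": ("m", 1.0e3),
    "s": ("s", 1.0),
    "ms": ("s", 1.0e-3),
    "Hz": ("Hz", 1.0),
    "GHz": ("Hz", 1.0e9),
    "m/s": ("m/s", 1.0),
    "km/s": ("m/s", 1.0e3),
}


def to_si(field: str, value: float, unit: str, mapping_log: List[dict]) -> float:
    """Convert value to SI and append a log entry whenever a remap happens."""
    if unit not in UNIT_TABLE:
        # Treated here as a units problem in the sense of E-DIM-001 (assumption).
        raise ValueError(f"E-DIM-001: unknown unit {unit!r} for field {field!r}")
    si_unit, factor = UNIT_TABLE[unit]
    if unit != si_unit:
        mapping_log.append(
            {"field": field, "from": unit, "to": si_unit, "factor": factor}
        )
    return value * factor


if __name__ == "__main__":
    log: List[dict] = []
    print(to_si("Δell", 2.0, "km", log), log)
```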
VII. I/O Contracts (aligned to Template family)
This section anchors Template APIs (not the volume’s implementation). Engineering mappings may be appended as “Template → I70-*”.
End-to-end (object → spectrum → propagation)
- Input: Catalog/Seeds/Trajectory, Phi_T/grad_Phi_T or T_fil+G(•), optional SeaProfile/Sigma_env, Path, f_grid, c_ref or CalibCref
- API family: I.Build.*, I.Path.Capture|Segment, I.Arrival.Constant|General|Delta, (optional) I.Interface.ApplyMatching, I.Report.*
- Output: L_nu/F_nu/LC, T_arr/Delta_T_arr, and audit logs for consistency/energy/switching
Causation & triggers
- Input: priors, environmental slices (Phi_T/SeaProfile)
- API family: I.Build.* (seed sampling, trigger process)
- Output: Seeds/Triggers (with seed_rng and hashes)
Energy consistency & interface audit
- Input: Sigma_env/SeaProfile, Path, observations or simulation outputs
- API family: I.Interface.ApplyMatching, I.RT.Estimate, I.Report.Log
- Output: RTParams and residual curves
VIII. Data-Quality Checks (DQC, automated)
- DQC-1 Dimension check: check_dimension covers both arrival-time forms, discrete segmentation, and layer/interface terms (see Appendix A).
- DQC-2 Unit coherence: Δell, gamma, c_ref share consistent units; if remapped at ingress, record mapping.
- DQC-3 Lower bound: T_arr_obs ≥ L_path / c_ref; near-margin samples within −k•u_c must be flagged.
- DQC-4 Two-form consistency: if both forms are available, eta_T ≤ threshold.
- DQC-5 Energy consistency: for every interface/band, ensure R_env + T_trans + A_sigma = 1.
- DQC-6 Thin/thick coherence: tau_switch = | T_arr^{thick} − (T_arr^{thin}+Delta_T_sigma) | ≤ limit.
- DQC-7 Differential coherence: same gamma[k], Δell[k] and Delta_T_sigma setup for all frequency pairs on the same path.
- DQC-8 Clamping statistics: record how often n_eff is clamped to [1,n_max] and the impact of that clamping.
- DQC-9 Reproducibility: SolverCfg, random seed, hash(*), and replay commands are present.
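Two of these checks, DQC-3 and DQC-5, are simple enough to sketch directly. The code below is illustrative only: the function names and the default k are hypothetical, u_c is assumed to be the root-sum-square of u_stat and u_sys, and the flagging rule is one reading of "within −k•u_c" (samples below the bound by at most k·u_c are flagged rather than failed).

```python
import math
from typing import Dict, List, Tuple


def energy_residuals(rt: Dict[str, list]) -> List[Tuple[float, float]]:
    """DQC-5: return (frequency, R_env + T_trans + A_sigma - 1) per band."""
    out = []
    for (f, r), (f_t, t), (f_a, a) in zip(rt["R_env"], rt["T_trans"], rt["A_sigma"]):
        if not (float(f) == float(f_t) == float(f_a)):
            raise ValueError("frequency grids differ between the three curves")
        out.append((float(f), r + t + a - 1.0))
    return out


def combined_uncertainty(u_stat: float, u_sys: float) -> float:
    """Assumed combination rule: u_c = sqrt(u_stat^2 + u_sys^2)."""
    return math.hypot(u_stat, u_sys)


def lower_bound_status(t_arr_obs: float, l_path: float, c_ref: float,
                       u_c: float, k: float = 3.0) -> str:
    """DQC-3: classify a sample against T_arr_obs >= L_path / c_ref."""
    margin = t_arr_obs - l_path / c_ref
    if margin >= 0.0:
        return "pass"
    if margin >= -k * u_c:
        return "flag"   # near-margin sample
    return "fail"


# Values copied from the RTParams example in Section X.
rt_example = {
    "R_env": [["9.5e8", 0.18], ["1.0e9", 0.20], ["1.05e9", 0.19]],
    "T_trans": [["9.5e8", 0.77], ["1.0e9", 0.76], ["1.05e9", 0.78]],
    "A_sigma": [["9.5e8", 0.05], ["1.0e9", 0.04], ["1.05e9", 0.03]],
}

if __name__ == "__main__":
    for freq, res in energy_residuals(rt_example):
        print(f"{freq:.3e} Hz residual {res:+.2e}")
    u_c = combined_uncertainty(2.0e-6, 3.0e-6)
    print(lower_bound_status(6.2001e-3, 1.2e3, 2.99792458e8, u_c))
```

In practice the energy residual would be compared against a tolerance declared in the Contract instead of exact equality, since the curves are stored as finite-precision values.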
IX. Error Semantics (aligned to Template error family)
- E-DIM-001: dimensional inconsistency or missing units (reject)
- E-GAUGE-002: unspecified/ambiguous gauge (request gauge completion)
- E-NEFF-003: n_eff < 1 or assembly failure (reject and log falsification sample)
- E-PATH-004: illegal path discretization or measure mismatch (request {gamma, Δell} rebuild)
- E-INTF-005: interface matching failure or parameter out of bounds (reject; attach Sigma_env/SeaProfile tags)
- E-QAD-006: quadrature non-convergence or unmet eps_T (return local error breakdown)
- E-CREF-007: c_ref calibration unsolved/unstable (return environment block)
- E-CONSIST-008: two-form consistency failure
- E-EO-010: thin/thick inconsistency or Delta_T_sigma vs. volume integral gap beyond threshold
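A small lookup table is one way to carry these codes through a pipeline. The sketch below copies the codes and their stated handling from the list above; the EOError class, the check_n_eff helper, and the use of None where no handling is stated are assumptions of this example.

```python
from typing import Dict, Iterable, Optional, Tuple

# code -> (summary, handling); handling is None where the list above states none.
ERROR_TABLE: Dict[str, Tuple[str, Optional[str]]] = {
    "E-DIM-001": ("dimensional inconsistency or missing units", "reject"),
    "E-GAUGE-002": ("unspecified/ambiguous gauge", "request gauge completion"),
    "E-NEFF-003": ("n_eff < 1 or assembly failure",
                   "reject and log falsification sample"),
    "E-PATH-004": ("illegal path discretization or measure mismatch",
                   "request {gamma, Δell} rebuild"),
    "E-INTF-005": ("interface matching failure or parameter out of bounds",
                   "reject; attach Sigma_env/SeaProfile tags"),
    "E-QAD-006": ("quadrature non-convergence or unmet eps_T",
                  "return local error breakdown"),
    "E-CREF-007": ("c_ref calibration unsolved/unstable",
                   "return environment block"),
    "E-CONSIST-008": ("two-form consistency failure", None),
    "E-EO-010": ("thin/thick inconsistency or Delta_T_sigma vs. volume "
                 "integral gap beyond threshold", None),
}


class EOError(Exception):
    """Exception carrying a spec error code and its stated handling."""

    def __init__(self, code: str, detail: str = "") -> None:
        summary, action = ERROR_TABLE[code]
        self.code = code
        self.action = action
        suffix = f" ({detail})" if detail else ""
        super().__init__(f"{code}: {summary}{suffix}")


def check_n_eff(values: Iterable[float]) -> None:
    """Raise E-NEFF-003 on the first sample with n_eff < 1."""
    for k, n in enumerate(values):
        if n < 1.0:
            raise EOError("E-NEFF-003", f"n_eff[{k}]={n}")
```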
X. JSONL Examples (minimal viable)
Contract (/contracts/eo.contract.json)
{
"id": "ct-eo-001",
"spec_version": "EFT.WP.Cosmo.EarlyObjects v1.0",
"coords_spec": "Comoving-Spherical",
"units_spec": {"length":"m","time":"s","speed":"m•s^-1","frequency":"Hz"},
"metric_spec": {"type":"FLRW-like","S_k":"sin","a_ref":1.0},
"mode": "constant",
"gauge": {"x_ref":[0,0,0], "t_ref":"2025-01-01T00:00:00Z"},
"boundary_config": {"type":"Dirichlet","Phi_T_far":0},
"tolerances": {"eps_T":1e-9,"eta_T":5e-10,"eta_w":0.03,"tau_switch":5e-12},
"n_eff_dependencies": "F(Phi_T, grad_Phi_T, rho, f)",
"hashes": {
"hash(Catalog)":"aa22bb33",
"hash(SeaProfile)":"77cc11dd",
"hash(Phi_T)":"ab12cd34",
"hash(grad_Phi_T)":"de98fa76",
"hash(gamma)":"ef56ab78",
"hash(code)":"aa11bb22"
}
}
Catalog (/catalog/eo.catalog.json)
{"objects":[{"id":"obj001","type":"BHSeed","z_form":18.2,"z_obs":12.7,"env_ref":"sea_v1","seed_ref":"sd001"}]}
Seeds (/seeds/sd001.seeds.json)
{"id":"sd001","priors":{"M0":{"dist":"lognormal","mu":2e4,"sigma":0.3}},"seed_samples":[{"M0":2.3e4,"R0":1.5e15,"J0":1.0e50}],"seed_rng":20250905}
SeaProfile (/seaprofile/sea.v1.json)
{"layers":[{"model":"tanh","chi_k":1.2e3,"Delta_k":2.0e2,"sigma_k":1.0e2}],"eta_w":0.03,"hash(SeaProfile)":"77cc11dd"}
Path (/paths/p001.path.jsonl)
{"path_id":"p001","gamma":[[0,0,1.1e3],[0,0,1.3e3],[0,0,2.3e3]],"Δell":[2.0e2,1.0e3],"t_hat":[[0,0,1],[0,0,1]],"interface_marks":[1]}
Observations (/obs/p001.obs.jsonl)
{"obs_id":"o001","path_id":"p001","f_hz":1.0e9,"T_arr_obs_s":6.2001e-3,"Delta_T_arr_obs_s":-7.0e-7,"u_stat_s":2.0e-6,"u_sys_s":3.0e-6,"timestamp":"2025-01-01T00:00:00Z"}
{"obs_id":"o002","path_id":"p001","f_hz":1.05e9,"T_arr_obs_s":6.2008e-3,"Delta_T_arr_obs_s":0.0,"u_stat_s":2.0e-6,"u_sys_s":3.0e-6,"timestamp":"2025-01-01T00:00:01Z"}
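The two observation records above can be read and cross-checked with a few lines of Python. This sketch assumes that Delta_T_arr_obs_s is measured relative to the record on the same path whose Delta_T_arr_obs_s is 0; that convention, the helper names, and the tolerance are assumptions for illustration and are not fixed by this appendix.

```python
import json
from typing import Dict, List, Tuple

OBS_JSONL = """\
{"obs_id":"o001","path_id":"p001","f_hz":1.0e9,"T_arr_obs_s":6.2001e-3,"Delta_T_arr_obs_s":-7.0e-7,"u_stat_s":2.0e-6,"u_sys_s":3.0e-6,"timestamp":"2025-01-01T00:00:00Z"}
{"obs_id":"o002","path_id":"p001","f_hz":1.05e9,"T_arr_obs_s":6.2008e-3,"Delta_T_arr_obs_s":0.0,"u_stat_s":2.0e-6,"u_sys_s":3.0e-6,"timestamp":"2025-01-01T00:00:01Z"}
"""


def load_jsonl(text: str) -> List[dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def delta_residuals(records: List[dict], tol_s: float = 1e-12) -> List[Tuple[str, float]]:
    """Return (obs_id, residual) where the stored delta disagrees with
    T_arr_obs_s minus the reference T_arr_obs_s on the same path."""
    # Assumed convention: the reference record has Delta_T_arr_obs_s == 0.
    refs: Dict[str, float] = {
        r["path_id"]: r["T_arr_obs_s"]
        for r in records
        if r["Delta_T_arr_obs_s"] == 0.0
    }
    bad = []
    for r in records:
        ref = refs.get(r["path_id"])
        if ref is None:
            continue  # no reference on this path, nothing to compare against
        residual = (r["T_arr_obs_s"] - ref) - r["Delta_T_arr_obs_s"]
        if abs(residual) > tol_s:
            bad.append((r["obs_id"], residual))
    return bad


if __name__ == "__main__":
    obs = load_jsonl(OBS_JSONL)
    print(len(obs), "records;", "inconsistent:", delta_residuals(obs))
```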
RTParams (/rtparams/rt.p001.json)
{"R_env":[["9.5e8",0.18],["1.0e9",0.20],["1.05e9",0.19]],
"T_trans":[["9.5e8",0.77],["1.0e9",0.76],["1.05e9",0.78]],
"A_sigma":[["9.5e8",0.05],["1.0e9",0.04],["1.05e9",0.03]]}
CalibCref (/calib/c_ref.json)
{"gamma_ref_id":"p_ref","T_arr_ref_s":6.2000e-3,"n_eff_ref_hash":"99aa33bb",
"c_ref_est":2.99792458e8,"u_stat":5.0e3,"u_sys":1.0e3,
"env_block":{"temp_C":20.0,"clock":"UTC"}}
XI. Typical I/O Workflow Alignment (Template family)
The Template family is authoritative; engineering may add a “Template → I70-*” mapping.
A. Object → spectrum → propagation (E2E)
- I.Build.Catalog|Seeds|Trajectory → produce Catalog/Seeds/Trajectory
- I.Build.Phi|Neff → assemble Phi_T/grad_Phi_T/n_eff (optionally with SeaProfile)
- I.Path.Capture|Segment → { gamma[k], Δell[k] }, { ell_i }
- I.Arrival.Constant|General|Delta → T_arr/Delta_T_arr
- I.Report.Log|Emit → persist hashes, thresholds, falsification samples, replay entrypoints
B. Energy consistency & interface audit
- I.Interface.ApplyMatching (if coupled to SeaProfile/Sigma_env)
- I.RT.Estimate → { R_env, T_trans, A_sigma }
- I.Report.Log → residual curves & side-limit checks
C. Causation & triggers
- I.Build.Seeds|Triggers → sampling & registry
- I.Report.Log → priors, random seeds, parameter hashes
XII. Data Quality & Audit Checklist (pre-publish self-check)
- DimReport present; Δell / c_ref units consistent; metric_spec explicit.
- { ell_i } endpoints explicit in integrals; no cross-interface interpolation.
- eta_T, tau_switch, lower-bound and energy-consistency margins pass.
- Differential measurements reuse the same path discretization & corrections; out-of-band leakage recorded.
- Clamping rate logged; hash(*), SolverCfg, seed, and replay command present.
XIII. Security & Integrity
- Read-only mounts: recommend read-only for /contracts, /obs, /interfaces.
- Content hashing: content-hash (excluding filename/timestamp) for cross-environment invariance.
- Minimal metadata: logs retain only necessary indicators & hashes to avoid exposing sensitive path info.
- Integrity checks: write SHA-256 and file length for critical objects; re-verify on import.
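The integrity check in the last item can be sketched with the standard library. The record layout ({"sha256", "length"}) and the function names are assumptions of this example; only the use of SHA-256 plus file length comes from the text above.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def integrity_record(path: Union[str, Path]) -> Dict[str, Union[str, int]]:
    """Compute the SHA-256 digest and byte length of a file."""
    data = Path(path).read_bytes()
    return {"sha256": hashlib.sha256(data).hexdigest(), "length": len(data)}


def verify_on_import(path: Union[str, Path],
                     record: Dict[str, Union[str, int]]) -> bool:
    """Re-verify a file against a previously written integrity record."""
    return integrity_record(path) == record
```

A writer would store integrity_record(path) in the hash manifest under /artifacts/, and an importer would call verify_on_import before accepting the object. For very large grid fields, hashing in chunks instead of reading the whole file at once would be preferable.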
XIV. Cross-Volume Alignment (data side)
- With Propagation.TensionPotential v1.0: two-form fields, Path/Field names & units.
- With Cosmo.LayeredSea v1.0: SeaProfile/Interfaces fields and tau_switch semantics.
- With Core.Metrology v1.0: units_spec/coords_spec/metric_spec/traceability.
- With Core.Errors v1.0: naming and reporting for u_stat/u_sys/u_c.
XV. Deliverables
- Data architecture compendium: schemas + exemplars for Contract/Catalog/Seeds/Trajectory/Field/SeaProfile/Interfaces/Path/Spectral/Observations/RTParams/CalibCref/Report.
- I/O contract boilerplates: I/O fields, units, requiredness, and error-semantics mapping (per Template family).
- Audit-bundle template: hash manifest, DimReport, SolverCfg, run logs, and falsification sample list.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/