Chapter 5 — Data Specification & Pipeline Contract
I. Field Dictionary
Conventions: snake_case field names; wrap inline symbols in backticks; any path-dependent quantity must explicitly declare its path `gamma(ell)` and measure `d ell`; every quantitative field must state its unit, either in the field itself or in metadata. Entries read: field | label or symbol | type | notes (optional).
- Core observations & path
  - record_id | primary key | string | globally unique ID (ULID/UUIDv4).
  - acq.ts_start | start time | string (ISO-8601 with timezone).
  - acq.ts_end | end time | string (ISO-8601 with timezone).
  - path.gamma_ell | path parameterization | array[number] | discretized path coordinates; pairs with path.d_ell.
  - path.d_ell | measure step | array[number] | same length as path.gamma_ell; unit m.
  - medium.n_eff_profile | n_eff sequence | array[number] | effective refractive index along path; dimensionless.
  - ref.c_ref | c_ref | number | reference propagation limit; unit m/s.
  - obs.T_arr | T_arr | number | arrival time; unit s; record the delta_form used.
  - obs.Phi | Phi | number | phase accumulation; unit rad; requires lambda_ref.
  - lambda_ref | reference wavelength | number | unit m; required whenever obs.Phi is present.
- Instrument & calibration
  - instrument.id | device ID | string.
  - instrument.mode | observing mode | string (enum: imaging|spectral|timing|mixed).
  - calib.version | calibration version | string (SemVer).
  - calib.timestamp | calibration time | string (ISO-8601).
  - calib.theta | calibration parameters | object (key–value with units).
- Uncertainty & quality
  - uncertainty.obs_T_arr | u(T_arr) | number | standard uncertainty; unit s.
  - uncertainty.obs_Phi | u(Phi) | number | standard uncertainty; unit rad.
  - noise.model | noise model | string (enum: gaussian|student|huber|custom).
  - quality.flags | quality flags | array[string] (e.g., geometry_ok, calib_fresh).
  - quality.score_Q | Q | number (0–1).
- Dependencies, citations & versioning
  - see | inline citations | array[string] (e.g., "EFT.WP.Core.Equations v1.1:S20-1").
  - references | external references | array[string] (for release manifests).
  - version | data-object version | string (SemVer).
  - checksum.sha256 | checksum | string (64 hex).
  - signature | release signature | string (optional, CMS/PGP).
II. Domains, Units & Constraints
- Basic constraints
  - obs.T_arr must co-exist with path.d_ell, medium.n_eff_profile, and ref.c_ref. Units: n_eff dimensionless; d_ell in m; c_ref in m/s; T_arr in s.
  - path.gamma_ell.length = path.d_ell.length = medium.n_eff_profile.length ≥ 2.
  - Timestamps use ISO-8601 (e.g., 2025-09-24T14:08:00Z or with offset).
- Value ranges
  - ref.c_ref ∈ (2.9e8, 3.1e8); medium.n_eff_profile[i] ∈ (0.8, 2.5) (engineering guardrails, overridable by domain).
  - quality.score_Q ∈ [0,1]; noise.model ∈ {gaussian,student,huber,custom}.
  - lambda_ref > 0; if obs.Phi is present, lambda_ref is required.
- Missingness & anomalies
  - Numerical missingness: null (JSON) or field omitted; never use textual NaN/Inf.
  - Mark anomalies via quality.flags; do not delete records instead of flagging them; pre-register exclusion rules in the analysis plan.
- Dimensional checks
  - Computed fields must carry unit metadata or a see anchor to a dimension check; include check_dim_report on delivery.
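The constraints above lend themselves to a mechanical check. The following is a minimal sketch, assuming a record is a plain nested dict keyed as in the field dictionary; the function name `check_constraints` and the error strings are illustrative and not part of the specification.

```python
import math

def check_constraints(rec):
    """Return a list of Section II violations for one record (empty if none)."""
    errors = []
    path = rec.get("path", {})
    gamma = path.get("gamma_ell") or []
    d_ell = path.get("d_ell") or []
    n_eff = rec.get("medium", {}).get("n_eff_profile") or []
    c_ref = rec.get("ref", {}).get("c_ref")
    obs = rec.get("obs", {})

    # Path, measure and medium arrays must be paired, with at least 2 samples.
    if not (len(gamma) == len(d_ell) == len(n_eff) >= 2):
        errors.append("path/medium arrays must share one length >= 2")

    # Engineering guardrails (open intervals; a domain may override them).
    if c_ref is None or not (2.9e8 < c_ref < 3.1e8):
        errors.append("ref.c_ref outside (2.9e8, 3.1e8)")
    if any(not (0.8 < n < 2.5) for n in n_eff):
        errors.append("medium.n_eff_profile outside (0.8, 2.5)")

    # obs.Phi requires a positive lambda_ref.
    lam = rec.get("lambda_ref")
    if obs.get("Phi") is not None and (lam is None or lam <= 0):
        errors.append("obs.Phi present without lambda_ref > 0")

    # Missing values are null or omitted; NaN and Inf are never valid.
    for name in ("T_arr", "Phi"):
        value = obs.get(name)
        if isinstance(value, float) and not math.isfinite(value):
            errors.append("obs." + name + " must be null or finite")
    return errors
```

Returning every violation, rather than stopping at the first, fits the rule that anomalies are flagged and not silently dropped.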
III. Quality Gates & Audit Trail
- Gates (execution order)
  - G1 | Schema: presence, types, required fields.
  - G2 | Citation compliance: see/references use “volume + version + anchor (P/S/M/I)”; coverage ≥ 90%.
  - G3 | Path–measure integrity: paired gamma(ell)/d ell, length ≥ 2, synchronized with n_eff_profile.
  - G4 | Dimensional closure: T_arr = ( ∫ ( n_eff / c_ref ) d ell ) has unit s; phase in rad.
  - G5 | Calibration freshness: acq.ts_start − calib.timestamp ≤ τ_calib (domain-defined).
  - G6 | Noise-residual gate: residual metric Q_res within admissible band; robust schemes report a second-order surrogate.
  - G7 | Conservation checks: ε_flux meets O(θ^2); ΔM ≤ τ_M (if applicable).
  - G8 | Uniqueness: unique record_id; non-duplicated checksum.sha256.
- Audit trail
  - audit.run_id (ULID), audit.started_at / audit.ended_at (ISO-8601), audit.tools (versions), audit.random_seeds, audit.input_hashes[], audit.operator.
  - Produce audit.jsonl per run with parameter snapshots, references[], version, and gate pass/fail statuses.
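Gates G3 to G5 can be sketched in a few lines. The example below assumes a plain Riemann sum over the discretized path (the text does not prescribe a quadrature rule) and takes τ_calib as a `timedelta` supplied by the caller; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def arrival_time(n_eff_profile, d_ell, c_ref):
    """Riemann-sum approximation of T_arr = ∫ (n_eff / c_ref) d ell, in seconds."""
    # G3: the medium profile and the measure steps must be paired.
    if len(n_eff_profile) != len(d_ell) or len(d_ell) < 2:
        raise ValueError("n_eff_profile and d_ell must share one length >= 2")
    # G4: [1] / [m/s] * [m] = [s]
    return sum(n * dl for n, dl in zip(n_eff_profile, d_ell)) / c_ref

def parse_ts(ts):
    """Parse an ISO-8601 timestamp; a trailing 'Z' is read as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def calibration_fresh(ts_start, calib_timestamp, tau_calib):
    """G5: acq.ts_start - calib.timestamp <= tau_calib."""
    return parse_ts(ts_start) - parse_ts(calib_timestamp) <= tau_calib
```

Both timestamps need a UTC offset (or `Z`) for the G5 subtraction; Python raises `TypeError` when an offset-aware datetime is subtracted from a naive one.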
IV. Export & Release
- Required artifacts
  - manifest.yaml: dataset_id, version, references[], see[], created_at, producer, checksum, licenses (if any).
  - schema.json: JSON Schema (see example).
  - check_dim_report.json: dimension-check results.
  - quality_report.json: gate outcomes and metrics.
- Directory layout (recommended)
  PTN_EXPORT/
    manifest.yaml
    data/
      observations.parquet
      paths.parquet
    schema/
      schema.json
    reports/
      check_dim_report.json
      quality_report.json
    audit.jsonl
    README.md
    SIGNATURE.asc
- Filenames
  - dataset_id-version-date.ext (e.g., ptn-demo-1.0.0-20250924.parquet); optional hash suffix +sha.<8>.
- Release tiers
  - internal (internal circulation only); public (external release, cite only v1.*).
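The filename convention above can be generated programmatically. This is a small sketch under two assumptions the text does not state: the optional `+sha.<8>` suffix sits before the extension, and `<8>` means the first eight hex digits of the SHA-256 checksum. The helper name `export_filename` is hypothetical.

```python
from datetime import date

def export_filename(dataset_id, version, day, ext, sha256_hex=None):
    """Build dataset_id-version-date.ext, optionally with a +sha.<8> suffix."""
    stem = f"{dataset_id}-{version}-{day:%Y%m%d}"
    if sha256_hex is not None:
        # Assumption: suffix precedes the extension and uses the first 8 hex digits.
        stem += f"+sha.{sha256_hex[:8]}"
    return f"{stem}.{ext}"

print(export_filename("ptn-demo", "1.0.0", date(2025, 9, 24), "parquet"))
```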
V. Machine-Readable Contract Examples (drop-in)
A. manifest.yaml
dataset_id: "ptn-demo"
version: "1.0.0"
created_at: "2025-09-24T16:00:00Z"
producer: "PTN.Workgroup.Core"
see:
  - "EFT.WP.Core.Equations v1.1:S20-1"
  - "EFT.WP.Core.Metrology v1.0:check_dim"
  - "EFT.WP.Core.DataSpec v1.0:TARR"
references:
  - "EFT.WP.Core.Terms v1.0:P10-3"
checksum:
  algo: "sha256"
  value: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
licenses:
  - "CC-BY-4.0"
release_tier: "public"
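A consumer would typically recompute the checksum before trusting a release. The sketch below uses Python's standard `hashlib`. Which bytes the manifest checksum covers is not stated here, so the `data` argument is an assumption, and `verify_checksum` is a hypothetical helper.

```python
import hashlib

def verify_checksum(data: bytes, manifest_checksum: dict) -> bool:
    """Compare the SHA-256 of `data` with the manifest's checksum entry."""
    if manifest_checksum.get("algo") != "sha256":
        raise ValueError("unsupported checksum algorithm")
    digest = hashlib.sha256(data).hexdigest()
    return digest == manifest_checksum["value"].lower()

example = {
    "algo": "sha256",
    "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}
print(verify_checksum(b"", example))
```

The value in the example manifest is the SHA-256 digest of empty input, so it serves as a placeholder; a real release carries the digest of its actual payload.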
B. schema.json (excerpt)
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "PTN Data Object v1.0.0",
"type": "object",
"required": ["record_id","acq","path","medium","ref","obs","version","see","references"],
"properties": {
"record_id": { "type": "string", "pattern": "^[0-9A-HJKMNP-TV-Z]{26}$|^[0-9a-fA-F-]{36}$" },
"acq": {
"type": "object",
"required": ["ts_start","ts_end"],
"properties": {
"ts_start": { "type": "string", "format": "date-time" },
"ts_end": { "type": "string", "format": "date-time" }
}
},
"path": {
"type": "object",
"required": ["gamma_ell","d_ell"],
"properties": {
"gamma_ell": { "type": "array", "items": { "type": "number" }, "minItems": 2 },
"d_ell": { "type": "array", "items": { "type": "number" }, "minItems": 2 }
}
},
"medium": {
"type": "object",
"required": ["n_eff_profile"],
"properties": {
"n_eff_profile": { "type": "array", "items": { "type": "number" }, "minItems": 2 }
}
},
"ref": {
"type": "object",
"required": ["c_ref"],
"properties": {
"c_ref": { "type": "number", "minimum": 2.9e8, "maximum": 3.1e8 }
}
},
"obs": {
"type": "object",
"properties": {
"T_arr": { "type": "number" },
"Phi": { "type": "number" }
}
},
"see": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
"references": { "type": "array", "items": { "type": "string" }, "minItems": 1 },
"version": { "type": "string", "pattern": "^(0|[1-9]\\d*)\\.(0|[1-9]\\d*)\\.(0|[1-9]\\d*)(?:[-+].*)?$" }
}
}
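The two `pattern` constraints in the excerpt can be exercised without a full JSON Schema validator. This is a minimal sketch using Python's standard `re` module with the patterns copied from the schema; `re.ASCII` keeps `\d` limited to 0-9, which is how ECMA-262 regular expressions (the dialect JSON Schema recommends) treat it. The helper names are hypothetical.

```python
import re

# Patterns copied from the schema excerpt.
RECORD_ID = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$|^[0-9a-fA-F-]{36}$", re.ASCII)
SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:[-+].*)?$", re.ASCII)

def is_record_id(value: str) -> bool:
    """True for a 26-character ULID or a 36-character UUID-shaped string."""
    return bool(RECORD_ID.fullmatch(value))

def is_semver(value: str) -> bool:
    """True for MAJOR.MINOR.PATCH with an optional pre-release or build suffix."""
    return bool(SEMVER.fullmatch(value))
```

The UUID branch is loose: it accepts any 36 characters drawn from hex digits and hyphens, so a stricter check may be warranted where UUIDv4 conformance matters.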
C. pipeline.yaml (processing contract)
version: "1.0.0"
pipeline:
  - id: step-10-ingest
    in: ["raw/*.parquet"]
    out: ["stage/ingested.parquet"]
    checks: ["G1","G8"]
  - id: step-20-calibrate
    in: ["stage/ingested.parquet"]
    out: ["stage/calibrated.parquet"]
    checks: ["G5"]
  - id: step-30-arrival
    in: ["stage/calibrated.parquet"]
    out: ["stage/arrival.parquet"]
    compute:
      form: "T_arr = ( ∫ ( n_eff / c_ref ) d ell )"
      requires: ["path.gamma_ell","path.d_ell","medium.n_eff_profile","ref.c_ref"]
      delta_form: "general"
    checks: ["G3","G4"]
    see:
      - "EFT.WP.Core.Equations v1.1:S20-1"
  - id: step-40-noisefit
    in: ["stage/arrival.parquet"]
    out: ["stage/denoised.parquet","reports/noise.json"]
    model: "huber"
    checks: ["G6"]
  - id: step-50-exports
    in: ["stage/denoised.parquet"]
    out: ["PTN_EXPORT/"]
    checks: ["G2","G4","G7"]
audit:
  run_id: "01JXYZABCD0EFG7H8JK9MN0PQ"
  seeds: [20250924]
  tools:
    - name: "ptn-cli"
      version: "1.4.2"
exports:
  must_include: ["manifest.yaml","schema.json","check_dim_report.json","quality_report.json","audit.jsonl"]
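The `must_include` list can be enforced with a simple presence check before release. This sketch assumes the export is described by a list of relative paths and that artifacts are matched by base name, wherever they sit in the directory layout; `missing_artifacts` is a hypothetical helper.

```python
from pathlib import PurePosixPath

MUST_INCLUDE = ["manifest.yaml", "schema.json", "check_dim_report.json",
                "quality_report.json", "audit.jsonl"]

def missing_artifacts(exported_paths, must_include=MUST_INCLUDE):
    """Return required artifact names that no exported path ends with."""
    present = {PurePosixPath(p).name for p in exported_paths}
    return [name for name in must_include if name not in present]

paths = ["PTN_EXPORT/manifest.yaml", "PTN_EXPORT/schema/schema.json"]
print(missing_artifacts(paths))
```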
D. Dimensional check (example)
Check: T_arr = ∫ ( n_eff / c_ref ) d ell
Dims : [1] / [m·s^-1] * [m] = [s] ✅
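The same check can be automated by tracking unit exponents. This is a minimal sketch that represents a dimension as a dict from base unit to exponent; the representation and the helper `dim_mul` are illustrative and do not reflect the format of check_dim_report.json.

```python
def dim_mul(a, b, sign=1):
    """Combine two dimension vectors (unit -> exponent); sign=-1 divides a by b."""
    out = dict(a)
    for unit, exp in b.items():
        out[unit] = out.get(unit, 0) + sign * exp
    # Drop units whose exponents cancelled to zero.
    return {u: e for u, e in out.items() if e != 0}

DIMENSIONLESS = {}
METRE = {"m": 1}
METRE_PER_SECOND = {"m": 1, "s": -1}
SECOND = {"s": 1}

# T_arr = ∫ (n_eff / c_ref) d ell  ->  [1] / [m·s^-1] * [m]
integrand = dim_mul(DIMENSIONLESS, METRE_PER_SECOND, sign=-1)
t_arr_dim = dim_mul(integrand, METRE)
assert t_arr_dim == SECOND
```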
VI. Results Export Page (minimal publication set)
- Project name, dataset_id, version, generation time, producer.
- Metric overview: record count, mean path length, T_arr summary stats, Q distribution.
- Quality & checks: gate pass rates, p_dim, Q_res.
- Citations & dependencies: see[], references[] (volume + version + anchor).
- Retrieval & verification: checksum, SIGNATURE.asc, directory layout, file sizes.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/