Home / Docs-Technical WhitePaper / 51-Pipeline Card Template v1.0
Chapter 4 — Inbound Data & Contracts (Source / Schema / Versioning)
I. Purpose & Scope
- Standardize inbound data sources (Source), data contracts (Schema/Contract), and version evolution for authoring, validation, and release, ensuring cross-pipeline alignment, dimensional closure, and traceability.
- For inbound fields involving path quantities (arrival time/phase), the text must explicitly show gamma(ell) and d ell, and the data side must record delta_form ∈ {general, factored}; publication requires p_dim = 1.0 with check_dim_report.json attached.
II. Source Definition
- Source types: file/object (partitioned batch files), stream (events/message queues), near-rt (shared memory/micro-batch), api (HTTP/gRPC).
- Minimal metadata: source_id, producer, created_at (ISO-8601), partition/window, checksum.sha256, optional signature.
- Compliance prerequisite: upstream artifacts must list see[]/references[]/version (volume + version + anchor) in manifest.yaml, with anchor coverage ≥ 90%, and no external links/aliases.
III. Inbound Contract / Schema
- Align with TARR: field names, units, and dimensions match TARR (data spec). Key path fields:
- path.gamma_ell: array<m>; path.d_ell: array<m>; medium.n_eff_profile: array<1>;
- ref.c_ref: m/s; (for phase) lambda_ref: m; obs.T_arr: s, obs.Phi: rad (if present).
- Field spec (minimal set):
- Identity: record_id (ULID/UUIDv4), acq.ts_start/ts_end (ISO-8601), instrument.id/mode.
- Quality: quality.flags[], quality.score_Q (0..1), uncertainty.* (if available).
- Citations & version: see[], references[], version (SemVer), checksum.
- Units & dimensions: all numeric fields carry unit or are defined in the contract; any expression with division/integrals/composites must use parentheses.
- Missingness: use null or field omission; textual NaN/Inf is forbidden; encode reasons in quality.flags.
IV. Schema Evolution
- Compatibility matrix:
- Backward-compatible addition (MINOR): add fields only; old consumers remain valid.
- Breaking changes (MAJOR): change/delete semantics/units/dimensions.
- Fixes (PATCH): defaults/descriptions; no semantic change.
- Version stamping: inbound bundle and schema.json both carry version (SemVer); keep in sync in manifest.yaml.
- Migration: for MAJOR changes provide explicit mapping/conversion (e.g., rad = deg * π/180) and rollback plan.
V. Validation Pipeline (Mx-?)
- Schema check (G1): field existence/types/required; partition/window alignment and index consistency.
- Citation compliance (G2): see[]/references[] with anchor coverage ≥ 90%; no external links/aliases.
- Path conventions (G3): gamma(ell)/d ell/delta_form present; len(gamma_ell)=len(d_ell)=len(n_eff)≥2, Δell satisfies sampling constraints.
- Dimensional closure (G4): pass I70-dim_check; export check_dim_report.json; p_dim = 1.0.
- Freshness (G5): clock_state="locked", |ts_start − calib.timestamp| ≤ τ_calib.
- Coverage (G6): if uncertainty fields exist, coverage ∈ {k, alpha, quantile} and aligned with publication.
- Covariance consistency (G7): cov_group and kernel params consistent with Error Budget Ch. 5; Σ PD.
- Uniqueness (G8): non-duplicated record_id and checksum; audit trail complete.
VI. Machine-Readable Contracts
A. schema.json (excerpt)
{
"$schema":"https://json-schema.org/draft/2020-12/schema",
"title":"Inbound v1.0.0",
"type":"object",
"required":["record_id","acq","path","medium","ref","version","see"],
"properties":{
"record_id":{"type":"string"},
"acq":{"type":"object","required":["ts_start","ts_end"],
"properties":{"ts_start":{"type":"string","format":"date-time"},"ts_end":{"type":"string","format":"date-time"}}},
"path":{"type":"object","required":["gamma_ell","d_ell"],
"properties":{"gamma_ell":{"type":"array","items":{"type":"number"},"minItems":2},
"d_ell":{"type":"array","items":{"type":"number"},"minItems":2}}},
"medium":{"type":"object","required":["n_eff_profile"],
"properties":{"n_eff_profile":{"type":"array","items":{"type":"number"},"minItems":2}}},
"ref":{"type":"object","required":["c_ref"],"properties":{"c_ref":{"type":"number"}}},
"see":{"type":"array","items":{"type":"string"},"minItems":1},
"version":{"type":"string"}
}
}
B. contract.yaml (inbound contract)
version: "1.0.0"
source:
id: "SRC-telemetry-rt"
mode: "streaming" # file|streaming|near-rt|api
schema: "schemas/inbound/schema.json"
units:
c_ref: "m/s"
T_arr: "s"
Phi: "rad"
path:
required: true
delta_form: "general"
quality_gates: ["G1","G2","G3","G4","G5","G6","G7","G8"]
C. manifest.yaml (inbound artifact list)
dataset_id: "ptn-ingest-202509"
version: "1.0.0"
created_at: "2025-09-24T16:00:00Z"
producer: "pipeline.ingest"
see:
- "EFT.WP.Core.Equations v1.1:S20-1"
- "EFT.WP.Core.Metrology v1.0:check_dim"
checksum: { algo: "sha256", value: "<64-hex>" }
VII. Anti-Patterns & Fixes
- Anti: path data missing d ell or delta_form; Fix: add gamma/measure/delta_form and align length with n_eff.
- Anti: units encoded as text %; Fix: use unit 1 and note “percent” in comments.
- Anti: T_arr = ∫ n_eff / c_ref d ell (no parentheses); Fix: T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
- Anti: references lack version/anchor; Fix: See "EFT.WP.Core.Equations v1.1" Ch.2 S20-1.
VIII. Validation & Alerts
- /validate returns G1–G8 results per batch/window and stops (S1–S5); dimensional closure follows check_dim_report.json.
- Alerts: schema violations, low anchor coverage, p_dim < 1, clock_state != locked, non-PD covariance, path inconsistency.
- Audit: audit.jsonl logs timestamp, operator, input hashes, seed, decisions, signatures.
IX. Release & Layout
PTN_EXPORT/
inbound/
contract.yaml
schema.json
data/
*.parquet
reports/
check_dim_report.json
validate_report.json
audit.jsonl
manifest.yaml
SIGNATURE.asc
X. Cross-References
- Architecture & graph: Ch. 3.
- Timebase, sync & buffering: Ch. 5.
- Stage specs & control equations: Ch. 6.
- Gates & monitoring: Ch. 9.
- Parameter/Error/Protocol templates: Parameter Card Ch. 9, Error Budget Ch. 5/6/8/9, Experimental Protocol Ch. 5.
XI. Checklist
- contract.yaml / schema.json / manifest.yaml present and consistent; see[]/references[] compliant with anchor coverage ≥ 90%.
- Path fields explicit gamma(ell)/d ell, delta_form recorded; len(path) ≥ 2, Δell compliant.
- I70-dim_check passed, p_dim = 1.0; freshness clock_state="locked", τ_calib compliant.
- Coverage matches U = k·u_c/quantiles; cov_group consistent with Error Budget, Σ PD.
- /validate passes G1–G8, no S1–S5; inbound artifacts include checksum and signature.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11|Current version:v5.1
License link:https://creativecommons.org/licenses/by/4.0/