34-EFT.WP.Astro.Acceleration v1.0 | Chapter 14 Data, Pipelines & Benchmarks


Chapter 14 Data, Pipelines & Benchmarks

I. Abstract & Scope
This chapter defines the unified specifications and release workflows I75-* / M75-* for data, pipelines, and benchmarks. They cover JSON Schemas and registries for dataset, model, and pipeline cards; data validation and dimensional consistency; pipeline execution in reproducible environments; benchmark suites and acceptance gates; and externally released, reproducible bundles with audit trails. All symbols use English notation wrapped in backticks, and all quantities are in SI units. Every ToA-related field must record both forms (T_arr^A and T_arr^B) and state the path gamma(ell) and the measure d ell explicitly.

II. Dependencies & References

Unified symbols & units: Chapter 2 Tab. 2-1 and P12-*.
Kinematics & channels: Chapter 3 S20-*; reconnection/shear: Chapter 4 S30-*, Chapter 5 S40-*; comparator: Chapter 6 S45-*.
Spectrum formation & transport: Chapter 7 S50-*, Chapter 8 S52-*.
Domain branches: GRB (Chapter 10 M62-*), FRB (Chapter 11 M64-*).
Simulation stack: Chapter 12 M70-* (products & metrics).
Inference & falsification: Chapter 13 M72-* (evidence, masks, deliverables).

III. Normative Anchors (added in this chapter, I75-*/M75-*)

I75-0 (Card Schemas & Registry): establish JSON Schemas for three cards (DatasetCard, ModelCard, PipelineCard) and unify their required fields, Unit/Dim annotations, see: anchors, {code_hash, data_hash}, and versioning.
I75-1 (DataSpec & Field Constraints): every numeric column must carry unit and dim; ToA fields must store T_arr^A, T_arr^B, and delta_form in parallel.
I75-2 (PipelineSpec & DAG): pipelines are directed acyclic graphs G=(V,E) with node types {ingest, calibrate, simulate, fit, validate, export}; nodes declare inputs/outputs/env/seed.
I75-3 (Product Layout & Naming): standard layout with the directories products/, masks/, repro/, and cards/ and the files metrics.json and delta_form.log; filenames include {sim_id|run_id|stamp}.
I75-4 (Interface Prototypes):
- export_dataset_card(ds: DataSpec) -> DatasetCard
- run_pipeline(p: PipelineCard, cfg: SimCfg) -> ArtifactBundle
- register_benchmark(bundle) -> BenchmarkID
M75-1 (Ingest & Validation): validate fields/units/dimensions per DataSpec; verify hashes and integrity; emit DatasetCard.
M75-2 (Pipeline Execution & Reproducibility): lock environment (container/dep versions/RNG seeds) and execute per PipelineCard; produce ArtifactBundle and metrics.json.
M75-3 (Benchmarks & Acceptance): apply the Chapter 12 metrics and thresholds; run regression tests against the prior release; if all metrics meet their gates, enqueue the bundle for release.
M75-4 (Audit & Archival): archive {code_hash, data_hash, rng_state, SimCfg, cards, masks, delta_form}; emit an audit manifest.
M75-5 (Release & Versioning): use semantic versioning MAJOR.MINOR.PATCH; MAJOR changes ship with compatibility notes and migration scripts; the release bundle is repro_bundle.
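M75-5 fixes the version format but leaves the increment rules implicit. The sketch below is minimal and assumes the conventional semantic-versioning rules: a MAJOR bump resets MINOR and PATCH to 0, and a MINOR bump resets PATCH. `bump_version` is a hypothetical helper and is not part of the I75-4 interface.

```python
import re

# Hypothetical helper for M75-5; only the MAJOR.MINOR.PATCH format comes from the text.
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, level: str) -> str:
    """Return the next version string for level 'major', 'minor', or 'patch'."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if level == "major":
        # Per M75-5, a MAJOR bump must also ship compatibility notes and migration scripts.
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level!r}")


print(bump_version("1.4.2", "minor"))  # 1.5.0
```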

IV. Body Structure

I. DatasetCard

Purpose & scope: describe origin, conventions, units, and covariance of raw/processed datasets.
Required fields:
- meta: {dataset_id, version, instrument, band, time_span}
- spec: {columns:[{name, unit, dim, description, see}], sampling, calibration}
- quality: {systematics, covariance, masks}
- integrals: {path:"gamma(ell)", measure:"d ell"} (for ToA-related columns)
- hash: {data_hash, card_hash}
- see: anchors to volumes/sections
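The field list above, together with I75-1, can be checked mechanically before M75-1 emits a card. The sketch below is minimal and assumes the card is held as a plain Python dict that mirrors the required fields. `check_dataset_card`, the message strings, and the demo values are hypothetical. A registry built on I75-0 would more likely validate against the published JSON Schema.

```python
# Hypothetical structural check for a DatasetCard stored as a dict (I75-0, I75-1, M75-1).
REQUIRED_TOP = ("meta", "spec", "quality", "hash", "see")
REQUIRED_META = ("dataset_id", "version", "instrument", "band", "time_span")
REQUIRED_HASH = ("data_hash", "card_hash")
TOA_COLUMNS = {"T_arr^A", "T_arr^B", "delta_form"}


def check_dataset_card(card: dict) -> list:
    """Return a list of problems; an empty list means the card passes."""
    problems = [f"missing field: {k}" for k in REQUIRED_TOP if k not in card]
    problems += [f"missing meta.{k}" for k in REQUIRED_META if k not in card.get("meta", {})]
    problems += [f"missing hash.{k}" for k in REQUIRED_HASH if k not in card.get("hash", {})]
    columns = card.get("spec", {}).get("columns", [])
    for col in columns:
        for key in ("unit", "dim"):
            if not col.get(key):  # every column must state unit and dim
                problems.append(f"column {col.get('name')!r} lacks {key}")
    names = {col.get("name") for col in columns}
    if names & TOA_COLUMNS:
        # ToA data must be stored in both forms, with delta_form alongside.
        problems += [f"ToA dual form incomplete: no {k}" for k in sorted(TOA_COLUMNS - names)]
        integrals = card.get("integrals", {})
        if integrals.get("path") != "gamma(ell)" or integrals.get("measure") != "d ell":
            problems.append("ToA columns need integrals {path: gamma(ell), measure: d ell}")
    return problems


example = {
    "meta": {"dataset_id": "demo", "version": "1.0.0", "instrument": "demo",
             "band": "demo", "time_span": [0.0, 1.0]},
    "spec": {"columns": [{"name": "T_arr^A", "unit": "s", "dim": "T"},
                         {"name": "T_arr^B", "unit": "s", "dim": "T"}]},
    "quality": {}, "hash": {"data_hash": "sha256:…", "card_hash": "sha256:…"}, "see": [],
}
print(check_dataset_card(example))
```

On the demo card the check reports two problems: the missing delta_form column and the missing integrals block.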
Dual-form ToA: store both forms side by side, together with delta_form, with both integrals taken along gamma(ell):
T_arr^A = ( 1 / c_ref ) * ( ∫ n_eff d ell ) and T_arr^B = ( ∫ ( n_eff / c_ref ) d ell ).
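With a constant c_ref the two forms are analytically identical. Any difference in the stored values therefore reflects numerical or convention issues rather than physics. The sketch below is minimal and written in plain Python. It assumes that gamma(ell) is sampled at arc-length points `ell` (m), that `n_eff` is dimensionless, and that c_ref is a constant, set here to the vacuum speed of light. The helper names, the toy profile, and the `abs_diff_s` key are hypothetical. The chapter does not define delta_form as this difference.

```python
# Hypothetical sketch: evaluate both ToA forms along a sampled path gamma(ell).
C_REF = 299_792_458.0  # m/s; assumed constant reference speed


def trapezoid(y, x):
    """Trapezoidal rule for samples y on an increasing grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def toa_dual_form(ell, n_eff, c_ref=C_REF):
    """Return both forms in seconds; ell in m, n_eff dimensionless."""
    t_a = (1.0 / c_ref) * trapezoid(n_eff, ell)        # Form A: integrate, then scale
    t_b = trapezoid([n / c_ref for n in n_eff], ell)  # Form B: scale inside the integral
    return {
        "T_arr^A": t_a,
        "T_arr^B": t_b,
        "path": "gamma(ell)",
        "measure": "d ell",
        "abs_diff_s": abs(t_a - t_b),  # illustrative diagnostic, not the defined delta_form
    }


ell = [i * 1.0e5 for i in range(11)]                 # 0 to 1000 km in 100 km steps
n_eff = [1.0 + 1.0e-4 * (i % 3) for i in range(11)]  # toy profile
record = toa_dual_form(ell, n_eff)
print(record["T_arr^A"], record["T_arr^B"], record["abs_diff_s"])
```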

II. ModelCard

Purpose & scope: describe model/parameterization and priors, versioning, and compatibility.
Required fields:
- model_id, version, family (S30/S40/S50/S52/…)
- params: {name, transform, prior, bounds, unit, dim}
- hyper: hierarchical priors and shared hyperparameters
- channels: switches and default weights for {A_rec, A_shear, A_dsa, A_turb}
- diagnostics: summaries of evidence and information criteria from training/fits
- hash: {code_hash, card_hash}
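A ModelCard's parameter table can be screened before any fit runs. The sketch below is minimal and assumes that `params` is a list of dicts with the fields listed above and that channel switches are keyed by the four names in `channels`. The helper name is hypothetical. The demo card is deliberately invalid: its bounds are inverted, and `A_visc` is a made-up channel name.

```python
# Hypothetical pre-fit screen for a ModelCard stored as a dict.
CHANNELS = {"A_rec", "A_shear", "A_dsa", "A_turb"}
PARAM_FIELDS = ("name", "transform", "prior", "bounds", "unit", "dim")


def check_model_card(card: dict) -> list:
    """Return a list of problems; an empty list means the card passes."""
    problems = [f"missing field: {k}" for k in ("model_id", "version", "family", "params", "hash")
                if k not in card]
    seen = set()
    for p in card.get("params", []):
        name = p.get("name", "?")
        problems += [f"param {name}: missing {f}" for f in PARAM_FIELDS if f not in p]
        if name in seen:
            problems.append(f"param {name}: duplicate name")
        seen.add(name)
        bounds = p.get("bounds")
        if bounds is not None and not (len(bounds) == 2 and bounds[0] < bounds[1]):
            problems.append(f"param {name}: bounds must be [low, high] with low < high")
    # Channel switches may only refer to the four documented channels.
    problems += [f"unknown channel: {c}" for c in sorted(set(card.get("channels", {})) - CHANNELS)]
    return problems


model = {
    "model_id": "demo", "version": "0.1.0", "family": "S30", "hash": {"code_hash": "sha256:…"},
    "params": [{"name": "p", "transform": "log", "prior": "uniform",
                "bounds": [1.0, 0.5], "unit": "1", "dim": "1"}],
    "channels": {"A_rec": True, "A_visc": False},
}
print(check_model_card(model))
```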

III. PipelineCard

Purpose & scope: standardize a reproducible execution graph.
Required fields:
- pipeline_id, version; graph: nodes/edges
- node[i]: {type, inputs, outputs, image/env, seed, resources}
- acceptance: thresholds mapped to Chapter 12 gates
- exports: {products/, metrics.json, masks/, delta_form.log, repro/}
- provenance: {who, when, where} aligned with {code_hash, data_hash}
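I75-2 requires an acyclic graph whose nodes come from a fixed type set and declare inputs/outputs/env/seed. The sketch below is minimal. It checks both conditions and then derives an execution order with Kahn's topological sort. It assumes that nodes are a dict keyed by node id and that edges are (source, target) pairs. `execution_order`, `node`, and the demo graph are hypothetical.

```python
from collections import deque

# Node types allowed by I75-2 and the fields every node must declare.
NODE_TYPES = {"ingest", "calibrate", "simulate", "fit", "validate", "export"}
NODE_FIELDS = ("inputs", "outputs", "env", "seed")


def execution_order(nodes: dict, edges: list) -> list:
    """Validate a PipelineCard graph and return a topological order (Kahn's algorithm)."""
    for nid, spec in nodes.items():
        if spec.get("type") not in NODE_TYPES:
            raise ValueError(f"node {nid}: unknown type {spec.get('type')!r}")
        missing = [f for f in NODE_FIELDS if f not in spec]
        if missing:
            raise ValueError(f"node {nid}: missing {missing}")
    indegree = {nid: 0 for nid in nodes}
    children = {nid: [] for nid in nodes}
    for src, dst in edges:
        if src not in nodes or dst not in nodes:
            raise ValueError(f"edge ({src}, {dst}) references an unknown node")
        children[src].append(dst)
        indegree[dst] += 1
    ready = deque(sorted(nid for nid, deg in indegree.items() if deg == 0))
    order = []
    while ready:
        nid = ready.popleft()
        order.append(nid)
        for child in children[nid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        # Nodes left with nonzero in-degree lie on or behind a cycle.
        raise ValueError("graph has a cycle; a PipelineCard graph must be a DAG")
    return order


def node(kind):
    """Demo node with placeholder fields (hypothetical values)."""
    return {"type": kind, "inputs": [], "outputs": [], "env": "demo-image", "seed": 0}


nodes = {k: node(k) for k in ("ingest", "calibrate", "simulate", "fit", "validate", "export")}
edges = [("ingest", "calibrate"), ("calibrate", "fit"), ("simulate", "validate"),
         ("fit", "validate"), ("validate", "export")]
order = execution_order(nodes, edges)
print(order)
```

Sorting the initially ready nodes makes the order deterministic. That helps when execution logs from independent reproductions are compared.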

IV. Validation, Execution & Release

Data validation (M75-1): enforce schema and Unit/Dim audits; ensure ToA columns state path and measure explicitly.
Pipeline execution (M75-2): fix seed and environment; produce artifacts and metrics.json; failing nodes must return a minimal replayable state.
Acceptance (M75-3): compare against Chapter 12 metrics; produce pass/fail and diffs.
Release (M75-5): pack ArtifactBundle, all three cards, and repro_bundle into the registry; generate indices and retrieval keys.
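M75-1 verifies hashes and M75-2 fixes seeds. The positive criteria then require that an independent reproduction yield matching hashes. The sketch below is minimal and uses only the standard library. It assumes products are serialized to canonical JSON before hashing and that hashes carry the `sha256:` prefix shown in Tab. 14-3. `run_node` is a hypothetical stand-in for a simulate node.

```python
import hashlib
import json
import random


def content_hash(obj) -> str:
    """sha256 over canonical JSON (sorted keys, no spaces), as 'sha256:<hex>'."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def run_node(seed: int, n: int = 5) -> dict:
    """Hypothetical stand-in for a simulate node whose output depends only on seed."""
    rng = random.Random(seed)  # private generator, so global RNG state cannot leak in
    return {"seed": seed, "samples": [rng.random() for _ in range(n)]}


original = content_hash(run_node(seed=42))
replay = content_hash(run_node(seed=42))  # second run standing in for an independent reproduction
print(original == replay, original)
```

Canonical serialization matters here. Without sorted keys, two dicts with identical content but different insertion order would produce different hashes. Binary products would be hashed over their raw bytes instead.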

V. Cross-References within/beyond this Volume

Metrics & gates: Chapter 12 (SpecMAE/LagRMS/PA_RMS/ToAΔ).
Evidence & masks: Chapter 13 (posterior, evidence, masks, falsification_line).
ToA fields: Chapters 7–8 (spectrum/transport mapping); Chapters 10–11 (timebase & path corrections).
Model families & params: Chapters 4–6 (S30/S40/S45) and Chapters 7–8 (S50/S52).

VI. Validation, Criteria & Counterexamples

Positive criteria:
- DatasetCard/ModelCard/PipelineCard pass schema and Unit/Dim checks.
- All metrics.json indicators stay within their thresholds (Tab. 14-4).
- Reproduction in an independent environment succeeds with matching hashes.
Negative criteria:
- Dimensional closure fails, ToA is not stored in dual form, or its path is not explicit.
- Regression against the prior release shows degradation beyond thresholds.
- The audit manifest lacks critical {hash/seed/SimCfg} fields.
Contrasts:
- Minimal-change regressions that alter one card at a time: {data-card only, model-card only, pipeline-card only}.
- Compare the impact of ToA {Form A, Form B, A+B} on products and evidence.

VII. Summary & Handoff
This chapter standardizes data–pipeline–benchmark schemas, execution, and release via I75-* / M75-*, ensuring dimensional consistency, verifiable gates, and full-chain reproducibility, aligned with the metrics and evidence systems of Chapters 12–13. Chapter 15 proceeds to “Implementation Bindings & APIs” (I80-*) for external interfaces and acceptance use cases.

V. Figures & Tables (this chapter)

Tab. 14-1 Minimal required fields for the three cards

Card	Required fields (subset)
DatasetCard	dataset_id, version, columns{name,unit,dim}, covariance, masks, data_hash, see
ModelCard	model_id, version, params{name,prior,bounds,unit,dim}, hyper, code_hash, family
PipelineCard	pipeline_id, version, graph{nodes,edges}, env, seed, acceptance, exports

Tab. 14-2 Pipeline node types & fields

type	required inputs	outputs	notes
ingest	uri, schema	staged data	validation/standardization
calibrate	calib, masks	calib data	systematics correction
simulate	SimCfg	products/	see Chapter 12
fit	ModelCard	posterior, evidence	see Chapter 13
validate	thresholds	metrics.json	acceptance gates
export	targets	bundle	release artifacts

Tab. 14-3 Registry keys & audit items

key	example	purpose
sim_id	ASTROACC_GRB_M_v1	global index
code_hash	sha256:…	provenance
data_hash	sha256:…	integrity
rng_state	JSON	reproduction
delta_form	A/B	ToA form flag
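Three items jointly define what a complete audit manifest must contain: Tab. 14-3, the archive list of M75-4, and the negative criterion on missing {hash/seed/SimCfg} fields. The sketch below is minimal. It assumes a flat dict manifest and hashes in the `sha256:<64 hex digits>` form suggested by the table's examples. The function names and demo values are hypothetical. The demo deliberately omits delta_form and leaves data_hash elided.

```python
import hashlib

# Fields M75-4 archives plus the registry keys of Tab. 14-3.
REQUIRED_AUDIT_KEYS = ("sim_id", "code_hash", "data_hash", "rng_state",
                       "SimCfg", "cards", "masks", "delta_form")


def is_sha256_key(value) -> bool:
    """True for strings of the form 'sha256:' followed by 64 lowercase hex digits."""
    if not isinstance(value, str) or not value.startswith("sha256:"):
        return False
    digest = value[len("sha256:"):]
    return len(digest) == 64 and all(c in "0123456789abcdef" for c in digest)


def audit_gaps(manifest: dict) -> list:
    """List missing or malformed entries; an empty list means the manifest is complete."""
    gaps = [f"missing: {k}" for k in REQUIRED_AUDIT_KEYS if k not in manifest]
    for k in ("code_hash", "data_hash"):
        if k in manifest and not is_sha256_key(manifest[k]):
            gaps.append(f"malformed: {k}")
    return gaps


demo_hash = "sha256:" + hashlib.sha256(b"demo").hexdigest()
manifest = {"sim_id": "ASTROACC_GRB_M_v1", "code_hash": demo_hash, "data_hash": "sha256:…",
            "rng_state": {"seed": 0}, "SimCfg": {}, "cards": [], "masks": []}
print(audit_gaps(manifest))
```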

Tab. 14-4 Acceptance thresholds (map to Chapter 12)

Metric	Threshold	Gate
SpecMAE	≤ 3%	pass/fail
IndexErr	≤ 0.05	pass/fail
LagRMS	≤ 5%	pass/fail
PA_RMS	≤ 3°	pass/fail
ToAΔ	≤ 0.1 ms	pass/fail
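Every gate in Tab. 14-4 is an upper bound on a lower-is-better metric, so the validate node reduces to one comparison per metric. The sketch below is minimal. It assumes metrics.json is a flat object keyed by the metric names in the table, with percentages stored as fractions, PA_RMS in degrees, and ToAΔ in seconds (0.1 ms = 1e-4 s). A missing metric is treated as a failure; this is a conservative choice, not a rule the table specifies. The metric values are illustrative input, not results.

```python
import json

# Upper bounds from Tab. 14-4: fractions for %, degrees for PA_RMS, seconds for ToAΔ.
GATES = {"SpecMAE": 0.03, "IndexErr": 0.05, "LagRMS": 0.05, "PA_RMS": 3.0, "ToAΔ": 1.0e-4}


def apply_gates(metrics: dict) -> dict:
    """Per-metric pass/fail with margin (limit - value); a missing metric fails."""
    gates = {}
    for name, limit in GATES.items():
        value = metrics.get(name)
        passed = value is not None and value <= limit
        margin = None if value is None else limit - value
        gates[name] = {"value": value, "limit": limit, "pass": passed, "margin": margin}
    return {"gates": gates, "all_pass": all(g["pass"] for g in gates.values())}


# Illustrative metrics.json content (not real results); LagRMS is set to fail.
metrics = json.loads('{"SpecMAE": 0.021, "IndexErr": 0.04, "LagRMS": 0.062,'
                     ' "PA_RMS": 2.1, "ToAΔ": 5e-05}')
report = apply_gates(metrics)
print(report["all_pass"], [n for n, g in report["gates"].items() if not g["pass"]])
```

The margin field is one way to report the diffs that Acceptance (M75-3) produces alongside pass/fail.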

Tab. 14-5 Release-bundle layout

path	content
cards/	DatasetCard/ModelCard/PipelineCard
products/	synthetic & fitted products
metrics.json	metrics & gate results
masks/	dominant energy/time masks
delta_form.log	ToA dual-form records
repro/	environment lock & scripts
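Before a bundle is registered, its layout can be checked against Tab. 14-5. The sketch below is minimal, uses pathlib, and assumes the bundle is an unpacked directory on disk. It checks only that entries are present, not the contents of cards/ or metrics.json. `missing_bundle_entries` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Entries from Tab. 14-5: four directories and two files.
REQUIRED_DIRS = ("cards", "products", "masks", "repro")
REQUIRED_FILES = ("metrics.json", "delta_form.log")


def missing_bundle_entries(root) -> list:
    """Return required entries absent from the bundle directory at root."""
    root = Path(root)
    missing = [d + "/" for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


with tempfile.TemporaryDirectory() as tmp:
    for d in REQUIRED_DIRS:
        (Path(tmp) / d).mkdir()
    (Path(tmp) / "metrics.json").write_text("{}", encoding="utf-8")
    example_missing = missing_bundle_entries(tmp)  # delta_form.log deliberately omitted

print(example_missing)  # ['delta_form.log']
```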

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05