Chapter 16 Machine-readable Schema & Lint
I. Chapter Purpose & Scope
This chapter provides the normative JSON Schema and Lint ruleset for pipelines, covering structure/type/regex/dependencies/citation anchors/dimensional checks/idempotency & retries/frozen splits & leakage guardrails/minimal security & compliance checks; artifacts are used for pre-release blocking checks and portal auto-validation. Keys use snake_case; cross-volume citations use “Volume vX.Y:Anchor”; math uses backticks with parentheses and no Chinese.
II. Normative Artifacts (Release-Critical)
artifacts:
- path: "schema/pipeline.schema.json"
- path: "schema/lint_rules.yaml"
- path: "schema/examples/minimal.yaml"
- path: "schema/examples/full.yaml"
These artifacts must be listed in export_manifest.artifacts[] with sha256; citation anchors follow this volume’s posture.
III. Normative JSON Schema (Core Excerpt)
The references[] regex enforces “Volume vX.Y:Anchor”; metrology.units="SI" and check_dim=true are mandatory.
IV. Lint Rules (Normative)
version: "v1.0"
rules:
  # Structure & versioning
  - id: STRUCT.REQUIRED
    when: "$"
    assert: "has_keys(pipeline, metrology, export_manifest)"
    level: error
  - id: VERSION.SEMVER
    when: "$.pipeline.version"
    assert: "matches('^v\\d+\\.\\d+(\\.\\d+)?$')"
    level: error
  # Topology & contracts
  - id: LAYERS.NOT_EMPTY
    when: "$.pipeline.layers"
    assert: "len(value) > 0"
    level: error
  - id: EDGES.COMPAT_SCHEMA
    when: "$.pipeline.edges[*]"
    assert: "schema_compat(edge.from.Σ_out, edge.to.Σ_in)"
    level: error
  # Sampling & splits
  - id: SPLIT.RATIO_SUM
    when: "$..stages[?(@.type=='export.splits')].splits"
    assert: "abs(train.ratio + validation.ratio + test.ratio - 1) <= 1e-6"
    level: error
  - id: SPLIT.FREEZE_REQUIRED
    when: "$..stages[?(@.type=='export.splits')].policy.freeze_indices"
    assert: "value == true"
    level: error
  - id: LEAKAGE.GUARDS_PRESENT
    when: "$..stages[?(@.type=='export.splits')].policy.leakage_guard"
    assert: "contains_any(['per-object','per-timewindow','per-scene'])"
    level: error
  # Validation & DQ
  - id: DQ.SCHEMA_REF_REQUIRED
    when: "$..stages[?(@.type=='validate.dq')]"
    assert: "has_key('schema_ref')"
    level: error
  - id: DQ.SAMPLE_DEFINED
    when: "$..stages[?(@.type=='validate.dq')].dq.sample"
    assert: "value.rows > 0 and value.strategy in ['head','random','stratified']"
    level: error
  # Transform & feature
  - id: TF.IDEMPOTENT_REQUIRED
    when: "$..stages[?(@.type^='transform.')]"
    assert: "idempotent == true"
    level: error
  - id: FEAT.FS_REQUIRED
    when: "$..stages[?(@.type^='feature.')]"
    assert: "has_key('feature_space')"
    level: error
  # Security & compliance minimal checks
  - id: SEC.CREDENTIALS_REF
    when: "$..stages[?(@.type^='source.')].params"
    assert: "has_key('credentials_ref') and not has_key('plain_secret')"
    level: error
  - id: PRIV.MINIMIZATION_ON
    when: "$.privacy.data_minimization"
    assert: "value == true"
    level: error
  # Metrology
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
  # Citation anchors (anchors may start with a lowercase letter, e.g. check_dim)
  - id: REFERENCES.FORMAT
    when: "$.export_manifest.references[*]"
    assert: "matches('^[^:]+ v\\d+\\.\\d+:[A-Za-z].+$')"
    level: error
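The three split rules above can be sketched in plain Python. This is only an illustration, not the normative Lint engine: the function name lint_split_stage is hypothetical, and it assumes that a missing policy.freeze_indices or policy.leakage_guard counts as a violation (the JSONPath in `when` alone would simply not match such a stage).

```python
# Guard names are taken from the LEAKAGE.GUARDS_PRESENT rule.
LEAKAGE_GUARDS = {"per-object", "per-timewindow", "per-scene"}


def lint_split_stage(stage: dict) -> list:
    """Return the ids of the split rules violated by one export.splits stage.

    Hypothetical helper; assumes absent policy keys are violations.
    """
    failed = []

    # SPLIT.RATIO_SUM: train + validation + test must equal 1 within 1e-6.
    splits = stage.get("splits", {})
    total = sum(splits.get(name, {}).get("ratio", 0.0)
                for name in ("train", "validation", "test"))
    if abs(total - 1) > 1e-6:
        failed.append("SPLIT.RATIO_SUM")

    # SPLIT.FREEZE_REQUIRED: indices must be frozen.
    policy = stage.get("policy", {})
    if policy.get("freeze_indices") is not True:
        failed.append("SPLIT.FREEZE_REQUIRED")

    # LEAKAGE.GUARDS_PRESENT: at least one recognized guard is declared.
    guard = policy.get("leakage_guard", [])
    declared = {guard} if isinstance(guard, str) else set(guard)
    if not declared & LEAKAGE_GUARDS:
        failed.append("LEAKAGE.GUARDS_PRESENT")

    return failed


stage = {
    "type": "export.splits",
    "splits": {"train": {"ratio": 0.7}, "validation": {"ratio": 0.2}, "test": {"ratio": 0.1}},
    "policy": {"freeze_indices": True, "leakage_guard": ["per-object"]},
}
print(lint_split_stage(stage))
```

Comparing against a tolerance rather than testing for exact equality matters here because 0.7 + 0.2 + 0.1 is not exactly 1 in binary floating point.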
Blocking rules include STRUCT.REQUIRED, VERSION.SEMVER, EDGES.COMPAT_SCHEMA, SPLIT.*, TF.IDEMPOTENT_REQUIRED, FEAT.FS_REQUIRED, SEC.CREDENTIALS_REF, METROLOGY.SI_AND_CHECKDIM, REFERENCES.FORMAT.
V. Failure Examples & Diagnostics (Excerpt)
fail_examples:
  - case: "bad reference format"
    input: {export_manifest: {references: ["Core.DataSpec:EXPORT"]}}
    expect: {rule: "REFERENCES.FORMAT", level: "error",
             fix: "Use 'EFT.WP.Core.DataSpec v1.0:EXPORT'"}
  - case: "split ratios sum != 1"
    input: {stages: [{type: "export.splits", splits: {train: {ratio: 0.7}, validation: {ratio: 0.2}, test: {ratio: 0.2}}}]}
    expect: {rule: "SPLIT.RATIO_SUM", level: "error",
             fix: "Normalize ratios so they sum to 1±1e-6"}
  - case: "no credentials_ref"
    input: {stages: [{type: "source.s3", params: {endpoint: "...", plain_secret: "abc"}}]}
    expect: {rule: "SEC.CREDENTIALS_REF", level: "error",
             fix: "Remove plaintext secret; reference a secrets manager via credentials_ref"}
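As an illustration of the "no credentials_ref" case, the sketch below walks a spec, finds every stages[] entry at any depth (mirroring the `$..stages` path), and emits one diagnostic record carrying rule, path, message, and fix. The helper names iter_stages and lint_source_credentials, the path notation, and the message wording are assumptions made for this example only.

```python
def iter_stages(node, path="$"):
    """Yield (path, stage) for every item of any 'stages' list, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "stages" and isinstance(value, list):
                for i, stage in enumerate(value):
                    yield f"{child}[{i}]", stage
            yield from iter_stages(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_stages(item, f"{path}[{i}]")


def lint_source_credentials(spec: dict) -> list:
    """Apply SEC.CREDENTIALS_REF to every stage whose type starts with 'source.'."""
    findings = []
    for path, stage in iter_stages(spec):
        if not isinstance(stage, dict):
            continue
        if not str(stage.get("type", "")).startswith("source."):
            continue
        params = stage.get("params", {})
        if "credentials_ref" in params and "plain_secret" not in params:
            continue  # rule satisfied
        findings.append({
            "rule": "SEC.CREDENTIALS_REF",
            "level": "error",
            "path": f"{path}.params",
            "message": "source stage needs credentials_ref and must not carry plain_secret",
            "fix": "Remove plaintext secret; reference a secrets manager via credentials_ref",
        })
    return findings


spec = {"stages": [{"type": "source.s3", "params": {"endpoint": "...", "plain_secret": "abc"}}]}
for finding in lint_source_credentials(spec):
    print(finding["rule"], finding["path"])
```

Reporting the path alongside the rule id lets a portal or CI log point at the offending stage without re-parsing the spec.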
Lint outputs must include rule/path/message/fix.
VI. Minimal Working Example (Validates under Schema & Lint)
pipeline:
  id: "eift.ingest-validate-transform-export"
  version: "v1.0"
  layers:
    - name: "ingest"
      stages:
        - name: "src.s3.pull"
          type: "source.s3"
          impl: "I16-1.s3_pull"
          params: {endpoint: "https://s3.amazonaws.com", bucket_or_db: "eift-data",
                   prefix_or_table: "raw/2025/09/", query_or_pattern: "*.jsonl",
                   credentials_ref: "secrets://aws/ingest_ro", format: "json"}
          outputs: ["raw_blob"]
          idempotent: true
          retries: {max: 3, backoff: "expo", jitter_ms: 200}
          timeout_s: 1800
    - name: "validate"
      stages:
        - name: "dq.scan"
          type: "validate.dq"
          impl: "I16-7.dq_scan"
          inputs: ["raw_blob"]
          outputs: ["dq_report"]
          schema_ref: "contracts/raw_json@v1.2"
          dq: {sample: {rows: 100000, strategy: "stratified"}, significance: {alpha: 0.05},
               gates: [{id: "DQ_001", kind: "not_null", cols: ["id", "ts"], level: "block"}]}
  edges:
    - {from: "src.s3.pull:raw_blob", to: "dq.scan:raw_blob"}
metrology: {units: "SI", check_dim: true}
export_manifest:
  version: "v1.0"
  artifacts: [{path: "pipeline.yaml", sha256: "..."}]
  references: ["EFT.WP.Core.DataSpec v1.0:EXPORT", "EFT.WP.Core.Metrology v1.0:check_dim"]
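To show how the structural rules apply to this example, the sketch below checks an abbreviated, already-parsed copy of it against STRUCT.REQUIRED, VERSION.SEMVER, LAYERS.NOT_EMPTY, and METROLOGY.SI_AND_CHECKDIM. It assumes pipeline, metrology, and export_manifest are top-level siblings (as the `$.metrology` and `$.export_manifest` rule paths suggest); the function name check_structure is hypothetical and stage bodies are omitted for brevity.

```python
import re

# Same pattern as the VERSION.SEMVER rule.
SEMVER = re.compile(r"^v\d+\.\d+(\.\d+)?$")


def check_structure(spec: dict) -> list:
    """Return ids of the structural rules violated by a parsed pipeline spec."""
    failed = []
    if not all(key in spec for key in ("pipeline", "metrology", "export_manifest")):
        failed.append("STRUCT.REQUIRED")

    pipeline = spec.get("pipeline", {})
    if not SEMVER.match(str(pipeline.get("version", ""))):
        failed.append("VERSION.SEMVER")
    if len(pipeline.get("layers", [])) == 0:
        failed.append("LAYERS.NOT_EMPTY")

    metrology = spec.get("metrology", {})
    if not (metrology.get("units") == "SI" and metrology.get("check_dim") is True):
        failed.append("METROLOGY.SI_AND_CHECKDIM")
    return failed


# Abbreviated copy of the minimal example (stages omitted).
minimal = {
    "pipeline": {
        "id": "eift.ingest-validate-transform-export",
        "version": "v1.0",
        "layers": [{"name": "ingest", "stages": []}, {"name": "validate", "stages": []}],
    },
    "metrology": {"units": "SI", "check_dim": True},
    "export_manifest": {"version": "v1.0"},
}
print(check_structure(minimal))
```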
VII. Coupling with Export Manifest (Normative)
export_manifest:
  artifacts:
    - {path: "schema/pipeline.schema.json", sha256: "..."}
    - {path: "schema/lint_rules.yaml", sha256: "..."}
    - {path: "schema/examples/minimal.yaml", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
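One way to make the listed artifacts verifiable is to recompute each digest and compare it with the manifest entry. The sketch below does this with the standard library only. It assumes that sha256 holds the lowercase hex digest of the file bytes and that each path is relative to a release root; the function names sha256_of and verify_artifacts are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large artifacts stay cheap."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(manifest: dict, root: Path) -> list:
    """Return the paths of artifacts that are missing or whose digest differs."""
    bad = []
    for entry in manifest.get("artifacts", []):
        target = root / entry["path"]
        if not target.is_file() or sha256_of(target) != entry.get("sha256"):
            bad.append(entry["path"])
    return bad
```

A call such as verify_artifacts(spec["export_manifest"], Path(".")) returns an empty list when every listed file is present and matches its digest.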
Schema and Lint are blocking and must be listed and verifiable; references carry “Volume vX.Y:Anchor”.
VIII. Validation Interfaces (Implementation Binding Ixx-?; Unified Return)
def validate_pipeline(spec: dict) -> dict: ...
def lint_pipeline(spec: dict, rules: dict) -> dict: ...
def check_units(spec: dict) -> dict: ...  # uses Core.Metrology v1.0:check_dim
def verify_references(spec: dict) -> dict: ...  # regex + anchor reachability
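A minimal sketch of how verify_references might be filled in, returning a result with ok, errors, warnings, and metrics keys. Several parts are assumptions rather than the bound implementation: the pattern is modeled on REFERENCES.FORMAT but lets the anchor start with a lowercase letter so that anchors such as check_dim pass; the optional known_anchors argument is a hypothetical stand-in for a real anchor registry; and reporting an unreachable anchor as a warning rather than an error is a choice made only for this example.

```python
import re

# Assumption: "Volume vX.Y:Anchor", anchor may start with upper or lower case.
REFERENCE_PATTERN = re.compile(r"^[^:]+ v\d+\.\d+:[A-Za-z].+$")


def verify_references(spec: dict, known_anchors=None) -> dict:
    """Check export_manifest.references[] format and, optionally, reachability.

    known_anchors is a hypothetical set of resolvable references; when it is
    None the reachability step is skipped.
    """
    refs = spec.get("export_manifest", {}).get("references", [])
    errors, warnings = [], []
    for i, ref in enumerate(refs):
        path = f"$.export_manifest.references[{i}]"
        if not REFERENCE_PATTERN.match(str(ref)):
            errors.append({"rule": "REFERENCES.FORMAT", "path": path,
                           "message": f"not in 'Volume vX.Y:Anchor' form: {ref!r}",
                           "fix": "Use the full volume name, version, and anchor"})
        elif known_anchors is not None and ref not in known_anchors:
            warnings.append({"path": path, "message": f"anchor not reachable: {ref!r}"})
    return {"ok": not errors, "errors": errors, "warnings": warnings,
            "metrics": {"references_checked": len(refs)}}


result = verify_references({"export_manifest": {"references": ["Core.DataSpec:EXPORT"]}})
print(result["ok"], [e["rule"] for e in result["errors"]])
```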
Return shape: {"ok": bool, "errors":[...], "warnings":[...], "metrics":{...}} for portal/CI.
IX. Chapter Compliance Checklist
- pipeline.schema.json and lint_rules.yaml produced and registered in export_manifest with sha256.
- Schema enforces metrology.units="SI" & check_dim=true and the anchor regex in references[]; Lint blocks topology incompatibility, unfrozen splits, missing leakage guardrails, missing idempotency, and plaintext secrets.
- Sampling/splits/distribution aligns with Dataset Cards; feature & I/O contracts and units align with metrology.
- Minimal example validates once under Schema & Lint; validation interfaces integrated and returning the unified structure.
- All citations use “Volume vX.Y:Anchor”; no shortcodes/aliases/missing-version refs.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/