EFT.WP.Data.Pipeline v1.0

Chapter 8 Feature Pipelines & Reuse


I. Chapter Purpose & Scope

Fix the feature pipeline specifications: feature extraction, aggregation, and alignment; dictionary and embedding management; materialization and caching; cross-task and multi-modal reuse; and versioning and dependency mapping. Ensure consistency with the data contracts, the Model Card feature space and task I/O, the Metrology chapter, and citation anchors.

II. Terminology & Dependencies


III. Fields & Structure (Normative)

stage:
  name: "<feat.map|feat.aggregate|feat.join|feat.encode|feat.embed|feat.materialize>"
  type: "feature.<op>"
  impl: "I16-4.<impl_id>"
  inputs: ["<Σ_in>"]
  outputs: ["<Σ_out>"]
  params:
    key: ["<entity_id>", "<ts?>"]
    point_in_time:
      enabled: true
      lookback: "P7D|P30D|N/A"
      tolerance: "PT5M"
    dict_ref: "dicts/<name>@vX.Y"
    embed:
      store: "faiss|annoy|milvus|custom"
      dim: 768
      metric: "cosine|l2"
      index_ref: "embeddings/<name>@vX.Y"
    aggregate:
      window: "PT1H|P1D"
      funcs: ["mean","max","count","std"]
      fillna: {"method":"pad|zero|drop"}
    join:
      on: ["<entity_id>","<ts?>"]
      how: "left|inner|asof"
    materialize:
      mode: "none|cache|persist"
      cache: {ttl: "P7D", max_gb: 128}
  idempotent: true
  schema_ref: "contracts/feat_<name>@vX.Y"
  feature_space:
    type: "<tabular|sequence|image|audio_spec|embedding>"
    shape: "<(…)>"
    dtype: "<float32|int32|...>"
    normalization: "<zscore|minmax|robust|unit-norm|none>"
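The lookback, tolerance, window, and ttl fields above carry ISO 8601 duration strings. The following is a minimal sketch of how a loader might turn them into time deltas. The function name parse_duration and the choice to accept only days, hours, minutes, and seconds (no years or months, whose length depends on the calendar) are assumptions for illustration, not part of the specification.

```python
import re
from datetime import timedelta

# Assumed subset of ISO 8601 durations: P[n]D followed by an optional
# T[n]H[n]M[n]S part. Years and months are deliberately not supported.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Return a timedelta for strings such as 'PT5M'; return None for 'N/A'."""
    if text == "N/A":
        return None
    match = _DURATION.match(text)
    parts = {}
    if match:
        parts = {
            unit: int(value)
            for unit, value in match.groupdict().items()
            if value is not None
        }
    # Reject strings with no components ('P', 'PT') or a dangling 'T'.
    if not parts or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**parts)


if __name__ == "__main__":
    for raw in ("PT5M", "PT1H", "P1D", "P30D", "N/A"):
        print(raw, "->", parse_duration(raw))
```

Parsing the strings once at load time lets later stages compare timestamps against plain timedelta values instead of re-reading the text form.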


IV. Feature Operators & Postures


V. Reuse & Dependency Mapping


VI. Consistency & Point-in-Time (PIT) Alignment


VII. Dictionary & Embedding Management


VIII. Metrology & Units (SI)

  1. Performance: QPS (1/s), T_inf (ms {p50,p95,p99}), ρ (—); bandwidth net_mbps; storage/index volume size_bytes.
  2. metrology:{units:"SI", check_dim:true} is mandatory; normalize units before any composition or aggregation.
  3. For path-quantity features (e.g., T_arr), register delta_form, path="gamma(ell)", measure="d ell", use one of the equivalences below, and pass check_dim:
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
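The two forms of T_arr agree whenever c_ref is constant along the path, because a constant factor can move in or out of the integral. The sketch below checks this numerically with a trapezoidal rule. The value of C_REF, the path length, and the linear n_eff profile are hypothetical inputs chosen only for illustration; n_eff is dimensionless, ell is in metres, and c_ref is in m/s, so both results are in seconds.

```python
import math


def trapezoid(ys, xs):
    """Integrate samples ys over abscissae xs with the trapezoidal rule."""
    total = 0.0
    for i in range(1, len(xs)):
        total += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1])
    return total


# Hypothetical inputs (not taken from the specification).
C_REF = 299_792_458.0                      # m/s, assumed reference speed
ell = [i * 10.0 for i in range(101)]       # path samples from 0 m to 1000 m
n_eff = [1.0 + 0.0005 * (x / 1000.0) for x in ell]  # dimensionless profile

# Form 1: constant factor outside the integral.
t_outside = (1.0 / C_REF) * trapezoid(n_eff, ell)

# Form 2: constant factor inside the integrand.
t_inside = trapezoid([n / C_REF for n in n_eff], ell)

# Both forms carry units of m / (m/s) = s and should agree to rounding error.
agree = math.isclose(t_outside, t_inside, rel_tol=1e-12)

if __name__ == "__main__":
    print("T_arr (factor outside):", t_outside, "s")
    print("T_arr (factor inside): ", t_inside, "s")
    print("forms agree:", agree)
```

If c_ref varied along the path, only the second form would remain meaningful, so a registered feature should record which form it uses.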

IX. Machine-Readable Fragment (Drop-in)

layers:
  - name: "feature"
    stages:
      - name: "feat.map.stats"
        type: "feature.map"
        impl: "I16-4.feature_map"
        inputs: ["std_rows"]
        outputs: ["feat_rows"]
        params:
          key: ["entity_id", "ts"]
          point_in_time: {enabled: true, lookback: "P30D", tolerance: "PT5M"}
          aggregate: {window: "P1D", funcs: ["mean", "std", "count"], fillna: {method: "pad"}}
        idempotent: true
        schema_ref: "contracts/feat_stats@v1.1"
        feature_space: {type: "tabular", shape: "(N,D)", dtype: "float32", normalization: "zscore"}
      - name: "feat.encode.cat"
        type: "feature.encode"
        impl: "I16-4.encode"
        inputs: ["feat_rows"]
        outputs: ["feat_enc"]
        params:
          dict_ref: "dicts/category_voc@v2.0"
          encode: {vocab_ref: "dicts/category_voc@v2.0", unk: "<UNK>", pad: "<PAD>"}
        idempotent: true
        schema_ref: "contracts/feat_enc@v1.0"
      - name: "feat.materialize"
        type: "feature.materialize"
        impl: "I16-4.materialize"
        inputs: ["feat_enc"]
        outputs: ["feat_pkg"]
        params:
          materialize: {mode: "cache", cache: {ttl: "P7D", max_gb: 256}}
        idempotent: true
        schema_ref: "contracts/feat_pkg@v1.0"
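The three stages in this fragment form a chain: each stage consumes what the previous one produces. A minimal sketch of a wiring check follows. It copies only the name, inputs, and outputs fields from the fragment; the function unresolved_inputs and the assumption that std_rows is supplied by an upstream layer are illustrative and not defined by this chapter.

```python
# Stage wiring copied from the fragment (names, inputs, outputs only).
STAGES = [
    {"name": "feat.map.stats", "inputs": ["std_rows"], "outputs": ["feat_rows"]},
    {"name": "feat.encode.cat", "inputs": ["feat_rows"], "outputs": ["feat_enc"]},
    {"name": "feat.materialize", "inputs": ["feat_enc"], "outputs": ["feat_pkg"]},
]


def unresolved_inputs(stages, external):
    """Return (stage, input) pairs not produced upstream or supplied externally.

    Stages are checked in list order, so an input must come from `external`
    or from the outputs of an earlier stage.
    """
    available = set(external)
    missing = []
    for stage in stages:
        for name in stage["inputs"]:
            if name not in available:
                missing.append((stage["name"], name))
        available.update(stage["outputs"])
    return missing


if __name__ == "__main__":
    # Assumption: "std_rows" is delivered by an upstream layer.
    print(unresolved_inputs(STAGES, external={"std_rows"}))
    # Without that assumption the first stage has a dangling input.
    print(unresolved_inputs(STAGES, external=set()))
```

A check of this kind catches renamed or dropped intermediate outputs before any data is processed.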


X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: FEAT.FS_REQUIRED
    when: "$.layers[*].stages[?(@.type^='feature.')]"
    assert: "has_key('feature_space')"
    level: error
  - id: FEAT.DICT_VERSIONED
    when: "$.layers[*].stages[?(@.type=='feature.encode')].params.dict_ref"
    assert: "matches('^dicts/[a-z0-9_\\-]+@v\\d+\\.\\d+$')"
    level: error
  - id: FEAT.PIT_PARAMS
    when: "$.layers[*].stages[*].params.point_in_time"
    assert: "value.enabled == true -> (has_key('lookback') and has_key('tolerance'))"
    level: error
  - id: FEAT.MATERIALIZE_POLICY
    when: "$.layers[*].stages[?(@.type=='feature.materialize')].params.materialize"
    assert: "value.mode in ['none','cache','persist']"
    level: error
  - id: FEAT.UNITS_CHECKDIM
    when: "$.pipeline.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
  - id: FEAT.LEAKAGE_GUARDS_FOR_TRAIN_EXPORT
    when: "$.layers[*].stages[*].outputs"
    assert: "produces_train_eval(outputs) -> has_leakage_guards()"
    level: error
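Four of the rules above can be evaluated on a single stage without any pipeline-level context. The sketch below hand-codes them in Python rather than interpreting the when/assert expressions, so it is an illustration of the intended checks, not the lint engine itself. It assumes the `^=` operator means "starts with"; the function lint_stage and the two sample stages are hypothetical.

```python
import re

# Pattern taken from FEAT.DICT_VERSIONED.
DICT_REF = re.compile(r"^dicts/[a-z0-9_\-]+@v\d+\.\d+$")
MATERIALIZE_MODES = {"none", "cache", "persist"}


def lint_stage(stage):
    """Return the ids of the stage-level rules that this stage violates."""
    failed = []
    params = stage.get("params", {})
    stage_type = stage.get("type", "")

    # FEAT.FS_REQUIRED: assumes `^=` is a prefix match on the type.
    if stage_type.startswith("feature.") and "feature_space" not in stage:
        failed.append("FEAT.FS_REQUIRED")

    # FEAT.DICT_VERSIONED: applies only when an encode stage sets dict_ref.
    if stage_type == "feature.encode" and "dict_ref" in params:
        if not DICT_REF.match(params["dict_ref"]):
            failed.append("FEAT.DICT_VERSIONED")

    # FEAT.PIT_PARAMS: enabled implies both lookback and tolerance.
    pit = params.get("point_in_time")
    if pit is not None and pit.get("enabled") is True:
        if not ("lookback" in pit and "tolerance" in pit):
            failed.append("FEAT.PIT_PARAMS")

    # FEAT.MATERIALIZE_POLICY: mode must be one of the three allowed values.
    if stage_type == "feature.materialize" and "materialize" in params:
        if params["materialize"].get("mode") not in MATERIALIZE_MODES:
            failed.append("FEAT.MATERIALIZE_POLICY")

    return failed


# Hypothetical stages for illustration.
GOOD_STAGE = {
    "type": "feature.map",
    "params": {"point_in_time": {"enabled": True, "lookback": "P30D", "tolerance": "PT5M"}},
    "feature_space": {"type": "tabular"},
}
BAD_STAGE = {
    "type": "feature.encode",
    "params": {"dict_ref": "dicts/category_voc"},  # missing @vX.Y suffix
}

if __name__ == "__main__":
    print(lint_stage(GOOD_STAGE))
    print(lint_stage(BAD_STAGE))
```

FEAT.UNITS_CHECKDIM and FEAT.LEAKAGE_GUARDS_FOR_TRAIN_EXPORT are left out because they depend on pipeline-level metadata and on predicates that this excerpt does not define.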


XI. Export Manifest & Audit

export_manifest:
  version: "v1.0"
  artifacts:
    - {path: "features/feat_view.yaml", sha256: "..."}
    - {path: "features/dict_category_v2.hash", sha256: "..."}
    - {path: "features/feat_pkg.manifest.json", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.ModelCards v1.0:Ch.6"
    - "EFT.WP.Data.ModelCards v1.0:Ch.9"
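An audit of the manifest can recompute each artifact's SHA-256 digest and compare it with the recorded value. The following is a minimal sketch using only the Python standard library. The function names sha256_of and verify_manifest, and the convention that artifact paths are relative to an export root directory, are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, root):
    """Return artifact paths that are missing or whose digest differs.

    `manifest` is the parsed export_manifest mapping; `root` is the assumed
    directory against which artifact paths are resolved.
    """
    failures = []
    for artifact in manifest["artifacts"]:
        target = Path(root) / artifact["path"]
        if not target.is_file() or sha256_of(target) != artifact["sha256"]:
            failures.append(artifact["path"])
    return failures
```

An empty result means every listed artifact exists and matches its recorded digest; any returned path should block the export until the artifact is regenerated or the manifest is updated.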


XII. Chapter Compliance Checklist


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/