EFT.WP.Methods.Inference v1.0

Chapter 4: Data & Feature Interfaces


I. Scope & Objectives

  1. Establish a unified convention for inference data and feature interfaces, covering input modalities, window alignment, feature processing, dimensional checks, lineage, and drift monitoring. The goal is training–inference consistency and cross-domain portability.
  2. Target outputs
    • Field and version requirements for FeatureCard and InferPipelineCard.
    • Minimal equations S42-* and postulates P41-* that govern feature processing (additions introduced in this chapter).
    • Offline/online feature-parity metric delta_fp and the gate gate.inf.feature.
  3. Intended readers: feature engineering, data platform, model, and runtime teams. Pass criteria link into the audit trail shared with Chapter 3 and Chapter 6 through delta_offon and ECE/NLL.

II. Terms & Symbols

  1. Data & time
    • x_raw (raw input stream), x_feat (processed feature vector), y (label), id (entity/session id), ts_event (event time), ts_proc (processing time), window = [t0, t1], lookback, stride, lag_k.
    • tau_mono, ts, alpha, beta (time-base mapping).
  2. Processing & checks
    z (standardized value), mu, sigma, epsilon, winsor(a,b), clip(a,b), impute, onehot, emb(W), norm(•), check_dim(expr).
  3. Lineage & signatures
    hash(•), fingerprint, EnvLock, anchor, FeatureCard, Lineage.
  4. Parity & drift
    delta_fp = ( norm( x_feat_off - x_feat_on ) / norm( x_feat_off ) ), R_infer = 1 - delta_offon, D_KL(p || q) = Σ p_i * ln( p_i / q_i ), W1 (first-order Wasserstein).

III. Postulates & Minimal Equations

  1. P41-4 Reproducible features (locked environment & spec)
    With FeatureCard, Graph(theta), and EnvLock fixed, and given the same anchor and input x_raw, offline and online features must satisfy x_feat_off = x_feat_on. If randomized operators are present, equality must still hold with nondet_guard = true.
  2. P41-5 No-leakage postulate (lookahead = 0)
    Any feature may depend only on information available within window = [t0, t1] with ts_event <= t1. Disallow lead_k (k > 0) and any use of y or its proxies as inputs.
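
    A minimal sketch of a leakage scan for P41-5, assuming each record carries a ts_event and a list of input names. The record layout, the "inputs" key, and the function name scan_leakage are hypothetical. A name-based scan like this can flag direct use of y but cannot detect proxies of y on its own.

```python
def scan_leakage(records, t1, label_fields=("y",)):
    """Return (index, reason) pairs for records that violate P41-5.

    Assumed (hypothetical) record layout:
    {"ts_event": number, "inputs": [field names]}.
    """
    violations = []
    for i, rec in enumerate(records):
        # Lookahead check: nothing later than the window end t1 may be used.
        if rec["ts_event"] > t1:
            violations.append((i, "ts_event > t1"))
        # Label check: y (or any declared label field) must not be an input.
        used = set(rec.get("inputs", ()))
        for name in label_fields:
            if name in used:
                violations.append((i, f"uses {name}"))
    return violations


records = [
    {"ts_event": 10, "inputs": ["clicks"]},
    {"ts_event": 25, "inputs": ["clicks"]},       # later than t1
    {"ts_event": 12, "inputs": ["clicks", "y"]},  # label used as input
]
violations = scan_leakage(records, t1=20)
```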
  3. S42-5 Standardization & robustness
    z = ( x - mu ) / ( sigma + epsilon ); for the robust form, mu = median(x), sigma = mad(x). Optional winsor(a,b) or clip(a,b) may limit outlier influence.
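
    The two forms of S42-5 can be compared with a short sketch. It assumes the population standard deviation for the classical form and the unscaled MAD for the robust form; the helper names zscore and clip_values are illustrative. Only clip(a,b) with fixed bounds is shown, since the chapter does not say whether winsor(a,b) takes values or quantiles.

```python
import statistics


def zscore(xs, robust=False, epsilon=1e-9):
    """S42-5: z = (x - mu) / (sigma + epsilon)."""
    if robust:
        mu = statistics.median(xs)
        # Unscaled MAD: median of absolute deviations from the median.
        sigma = statistics.median([abs(x - mu) for x in xs])
    else:
        mu = statistics.fmean(xs)
        sigma = statistics.pstdev(xs)  # population form (an assumption)
    return [(x - mu) / (sigma + epsilon) for x in xs]


def clip_values(xs, a, b):
    """clip(a, b): bound every value to the closed interval [a, b]."""
    return [min(max(x, a), b) for x in xs]


xs = [1.0, 2.0, 3.0, 4.0, 100.0]  # one injected outlier
z_classic = zscore(xs)
z_robust = zscore(xs, robust=True)
clipped = clip_values(xs, 0.0, 10.0)
```

    With the outlier present, the classical form compresses the four inliers into a narrow band, while the robust form keeps them spread and leaves the outlier far from the rest.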
  4. S42-6 Window aggregation & measures
    • Mean: agg_mean = ( 1 / |W| ) * Σ_{t ∈ W} x(t).
    • Exponential moving average: agg_ema = ( Σ w_t * x(t) ) / ( Σ w_t ), with w_t = exp( - lambda * ( t1 - t ) ).
    • Frequency-domain energy (aligned with spectral convention): var( x ) ≈ ( ∫ S_xx(f) df ). Always declare the window and its equivalent noise bandwidth (ENBW).
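
    The mean and exponential aggregates of S42-6 can be sketched as follows, assuming the window W is given as a list of (t, x) pairs with t ≤ t1. The function signatures are illustrative, and the empty-window behavior (raising an error) is an assumption rather than a stated policy.

```python
import math


def agg_mean(samples):
    """agg_mean = (1 / |W|) * sum of x(t) over t in W."""
    if not samples:
        raise ValueError("empty window")  # policy assumed for this sketch
    return sum(x for _, x in samples) / len(samples)


def agg_ema(samples, t1, lam):
    """agg_ema = sum(w_t * x(t)) / sum(w_t), w_t = exp(-lam * (t1 - t))."""
    if not samples:
        raise ValueError("empty window")
    weights = [math.exp(-lam * (t1 - t)) for t, _ in samples]
    weighted = sum(w * x for w, (_, x) in zip(weights, samples))
    return weighted / sum(weights)


window = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
mean_value = agg_mean(window)
ema_flat = agg_ema(window, t1=2.0, lam=0.0)    # lam = 0 reduces to the mean
ema_recent = agg_ema(window, t1=2.0, lam=1.0)  # recent samples weigh more
```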
  5. S42-7 Feature parity & drift metrics
    • Sample-level parity: delta_fp = ( norm( x_feat_off - x_feat_on ) / norm( x_feat_off ) ).
    • Distribution drift: D_KL( p_off || p_on ), W1( p_off, p_on ), and moment gaps | mu_off - mu_on |, | sigma_off - sigma_on |.
    • Example gates: delta_fp ≤ tau_fp, D_KL ≤ tau_kl, W1 ≤ tau_w1.
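
    A sketch of the S42-7 metrics and gates under stated assumptions: norm(•) is taken as the Euclidean norm, p and q are discrete distributions over the same bins, and W1 is computed for two equal-size one-dimensional samples by matching sorted values. The threshold values in the usage lines are placeholders, not recommended settings.

```python
import math


def delta_fp(x_off, x_on):
    """delta_fp = norm(x_off - x_on) / norm(x_off), Euclidean norm assumed."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_off, x_on)))
    den = math.sqrt(sum(a * a for a in x_off))
    if den == 0.0:
        raise ValueError("norm(x_feat_off) is zero; delta_fp is undefined")
    return num / den


def kl_divergence(p, q):
    """D_KL(p || q) = sum p_i * ln(p_i / q_i); bins with p_i = 0 add 0.

    A bin with q_i = 0 and p_i > 0 raises ZeroDivisionError here; in
    practice such bins need smoothing before comparison.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def w1_empirical(xs, ys):
    """W1 between two equal-size 1-D samples: mean gap of sorted values."""
    if len(xs) != len(ys):
        raise ValueError("this sketch requires equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def gates_pass(d_fp, d_kl, d_w1, tau_fp, tau_kl, tau_w1):
    """Example gates: every metric must stay at or below its threshold."""
    return d_fp <= tau_fp and d_kl <= tau_kl and d_w1 <= tau_w1


d_fp = delta_fp([3.0, 4.0], [3.0, 4.5])
d_kl = kl_divergence([0.5, 0.5], [0.25, 0.75])
d_w1 = w1_empirical([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
ok = gates_pass(d_fp, d_kl, d_w1, tau_fp=0.2, tau_kl=0.2, tau_w1=1.5)
```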
  6. S42-8 Processing signature & traceability
    fingerprint = hash( FeatureCard || code_rev || anchor || {alpha,beta} || schema ). Every feature release publishes fingerprint and Lineage.
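
    One way to realize S42-8 is sketched below. It assumes SHA-256 as hash(•) and replaces raw concatenation ( || ) with a canonical JSON encoding, so that key order and whitespace cannot change the result. Both choices, the payload key names, and the sample card contents are assumptions for illustration.

```python
import hashlib
import json


def fingerprint(feature_card, code_rev, anchor, alpha, beta, schema):
    """Hash the S42-8 inputs through a canonical JSON encoding."""
    payload = {
        "feature_card": feature_card,
        "code_rev": code_rev,
        "anchor": anchor,
        "timebase": {"alpha": alpha, "beta": beta},
        "schema": schema,
    }
    # sort_keys and fixed separators make the byte string deterministic.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


card_a = {"ops": ["impute", "standardize"], "window_spec": {"lookback": 60}}
card_b = {"window_spec": {"lookback": 60}, "ops": ["impute", "standardize"]}
schema = {"x": "float"}

fp_a = fingerprint(card_a, "rev-1", "anchor-0", 0.0, 1.0, schema)
fp_b = fingerprint(card_b, "rev-1", "anchor-0", 0.0, 1.0, schema)
fp_c = fingerprint(card_a, "rev-1", "anchor-0", 0.0, 1.001, schema)
```

    Here fp_a and fp_b agree because only the key order differs, while fp_c differs because beta changed.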

IV. Data & Manifest Conventions


V. Algorithms & Implementation Bindings

  1. New prototypes
    • I40-11 build_features(stream:any, card:dict) -> {x_feat:any, qc:dict}
    • I40-12 validate_features(x_feat:any, card:dict) -> {pass:bool, report:dict}
    • I40-13 align_windows(records:any, alpha:float, beta:float, spec:dict) -> records
    • I40-14 compare_feature_parity(off:any, on:any, policy:dict) -> {delta_fp:float, pass:bool}
    • I40-15 monitor_feature_drift(dist_off:any, dist_on:any, metrics:list) -> DriftReport
  2. Pseudocode (abridged)
    • Align: records ← align_timebase(records, {alpha,beta}).
    • Window: slice by window_spec and construct context/history.
    • Process: execute impute → transform → aggregate → standardize in the ops order.
    • Validate: validate_features runs check_dim/range/missing and leakage scans.
    • Trace: generate fingerprint and Lineage.
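
    The Window and Process steps above can be sketched as a single function. This is a minimal illustration, assuming time-base alignment has already been applied, a closed window [t0, t1], constant-value imputation, log1p as the transform, and a mean aggregate. The card field names (window, impute_value, mu, sigma, epsilon) are hypothetical and not the FeatureCard schema.

```python
import math


def build_features(stream, card):
    """Sketch of I40-11: returns {"x_feat": [...], "qc": {...}}.

    stream: list of (ts, value or None), already on the aligned time base.
    """
    t0, t1 = card["window"]
    # Window: keep events inside the closed interval [t0, t1].
    window = [(t, x) for t, x in stream if t0 <= t <= t1]
    if not window:
        raise ValueError("empty window")  # policy assumed for this sketch
    missing = sum(1 for _, x in window if x is None)

    # Impute: replace missing values with a constant from the card.
    values = [card["impute_value"] if x is None else x for _, x in window]
    # Transform: log1p, chosen only as an example operator.
    values = [math.log1p(v) for v in values]
    # Aggregate: window mean.
    agg = sum(values) / len(values)
    # Standardize: z = (x - mu) / (sigma + epsilon), parameters from the card.
    z = (agg - card["mu"]) / (card["sigma"] + card["epsilon"])

    qc = {"n": len(window), "missing_rate": missing / len(window)}
    return {"x_feat": [z], "qc": qc}


card = {"window": (0, 2), "impute_value": 0.0,
        "mu": 0.0, "sigma": 1.0, "epsilon": 0.0}
stream = [(0, 1.0), (1, None), (2, 3.0), (5, 9.0)]
out = build_features(stream, card)
```

    Keeping the operator order fixed in one place makes it easier to run the same code path offline and online, which is what the parity check depends on.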

VI. Metrology Flows & Run Diagram


VII. Verification & Test Matrix

  1. Minimum required cases
    • Unit consistency: pass random samples through check_dim(expr); assert that all pass.
    • Leakage guard: for tasks with y, run feature scans; assert no lead_k (k>0) and no functional dependence on y.
    • Robust standardization: inject outliers; compare z from mean-variance vs. median-MAD.
    • Window endpoints: at t1 boundaries, verify include/exclude policy, timezone, and DST consistency.
    • Offline/online parity: compute delta_fp; assert delta_fp ≤ tau_fp.
    • Spectral convention: for time-series features, verify var( x ) ≈ ( ∫ S_xx(f) df ) and that ENBW configuration matches.
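
    The spectral-convention case can be exercised with a direct DFT, as sketched below. It assumes a rectangular window (whose ENBW is one frequency bin, fs / N), a two-sided periodogram, and mean removal before the transform. Under those assumptions the sum of S_xx over all bins times df equals the population variance by Parseval's theorem. The function name and the test signal are illustrative.

```python
import cmath
import math
import statistics


def periodogram_two_sided(x, fs):
    """Two-sided periodogram S_xx[k] = |X_k|^2 / (N * fs), rectangular window.

    Uses a direct O(N^2) DFT, which is adequate for short test signals.
    Returns (psd, df) with df = fs / N.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    psd = []
    for k in range(n):
        x_k = sum(centered[m] * cmath.exp(-2j * math.pi * k * m / n)
                  for m in range(n))
        psd.append(abs(x_k) ** 2 / (n * fs))
    return psd, fs / n


n, fs = 64, 64.0
x = [math.sin(2 * math.pi * 5 * m / n) + 0.5 * math.cos(2 * math.pi * 12 * m / n)
     for m in range(n)]
psd, df = periodogram_two_sided(x, fs)
integral = sum(psd) * df            # discrete form of the integral of S_xx
variance = statistics.pvariance(x)  # population variance of the signal
```

    With a tapered window the normalization changes, which is why the window and ENBW must be declared alongside the feature.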
  2. Boundary & extreme scenarios
    Late/out-of-order events, batch replays, empty windows, full missingness, ultra-low cardinality classes, cold-start for emb(W), cross-device time stretching with beta ≠ 1.

VIII. Cross-References & Dependencies


IX. Risks, Limitations & Open Questions


X. Deliverables & Versioning

  1. Deliverables
    • FeatureCard/*.json (containing window_spec/ops/params/unit/timebase).
    • FeatureLineage.md (sources, lineage, conventions, fingerprint).
    • FeatureParityReport (delta_fp, D_KL, W1, mu/sigma).
    • QCReport (missingness rates, range violations, dimensional audits).
  2. Versioning policy
    • Any change to ops, window_spec, impute_policy, {mu,sigma}, {alpha,beta}, or external schema bumps the minor version and triggers full Mx-47 parity regression.
    • Documentation-only or visual-only updates do not trigger regression, but they must still roll the fingerprint and append change records (see Appendix C).

Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/