EFT.WP.Methods.Cleaning v1.0

Appendix A Interface Reference (I10 Full Set)


One-Sentence Goal
Summarize the I10-* interfaces in this volume (their signatures, parameters, returns, and invariants) and define how they map to the Core volumes and the conventions for their call sequences.


I. Unified Conventions & Naming

  1. Notation & types
    • ds: dataset (sequence of records)
    • rec: a single record
    • SRef: standard schema registry & constraints
    • policy.*: policy bundle (units, time, anomalies, SLOs, etc.)
    • tests.*: contract test suites
    • tags: set of labels and alerts
    • TS.sli.*: service-layer indicators
    • manifest: manifest and signature artifacts
    • TraceID: end-to-end trace identifier
  2. Preset invariants (apply to all interfaces)
    • Units & dimensions: every field entering an expression carries unit(x) and dim(x); check_dim(expr) = 0.
    • Time base: compute on tau_mono, publish on ts; sync metadata includes offset/skew/J.
    • Arrival-time two forms: whenever T_arr is computed, produce both forms in parallel (c_ref outside the path integral and c_ref inside it, as defined in I10-6) and satisfy delta_form ≤ tol_Tarr.
    • Path: non_decreasing(ell); L_gamma = ( ∫_gamma 1 d ell ) is computable.
    • Traceability: outputs admit hash_sha256(blob) and are verifiable by signature.

II. Core Data Structures (Summary)


III. I10-3 Standard Inputs & Schema Binding

  1. register_schema(SRef) -> registry_id
    • Effect: register the standard schema together with its aliases, keys, and a contract subset.
    • Invariant: unique(pk); no conflicts in alias_map.
  2. standardize_names(ds, registry) -> ds', report
    • Role: unify field names per alias_map and complete the minimal required field set.
    • Report keys: added, renamed, dropped, missing_required.
  3. validate_dataset(ds, SRef, strict) -> assert_report
    • Checks: field presence, types, pk uniqueness, resolvable foreign_key.
    • Pass condition: assert_report.fail = 0.
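The two I10-3 steps can be sketched as follows. This is a minimal illustration, not the volume's implementation: records are assumed to be plain dicts, and the SCHEMA layout (fields, required, pk, alias_map) is a hypothetical stand-in for SRef. The foreign_key check is left to I10-9, and the meaning of strict (also flag unknown fields) is an assumption.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical schema layout standing in for SRef (an assumption, not the SRef format).
SCHEMA: dict[str, Any] = {
    "fields": {"sensor_id": str, "ts": float, "value": float},
    "required": {"sensor_id", "ts", "value"},
    "pk": ["sensor_id", "ts"],
    "alias_map": {"id": "sensor_id", "timestamp": "ts", "val": "value"},
}


def standardize_names(ds: list[Record], schema: dict[str, Any]):
    """Rename aliases to canonical names and complete the known field set."""
    alias_map, fields = schema["alias_map"], schema["fields"]
    report = {"added": [], "renamed": [], "dropped": [], "missing_required": []}
    out = []
    for i, rec in enumerate(ds):
        new: Record = {}
        for name, value in rec.items():
            canon = alias_map.get(name, name)
            if canon not in fields:
                report["dropped"].append((i, name))  # unknown field: not carried over
                continue
            if canon != name:
                report["renamed"].append((i, name, canon))
            new[canon] = value
        for name in fields:
            if name not in new:
                new[name] = None  # placeholder keeps the field set complete
                report["added"].append((i, name))
                if name in schema["required"]:
                    report["missing_required"].append((i, name))
        out.append(new)
    return out, report


def validate_dataset(ds: list[Record], schema: dict[str, Any], strict: bool = True):
    """Check presence, types, and pk uniqueness; strict also flags unknown fields."""
    fails = []
    seen_pk = set()
    for i, rec in enumerate(ds):
        for name, typ in schema["fields"].items():
            value = rec.get(name)
            if value is None:
                if name in schema["required"]:
                    fails.append((i, name, "missing"))
            elif not isinstance(value, typ):
                fails.append((i, name, "type"))
        pk = tuple(rec.get(k) for k in schema["pk"])
        if pk in seen_pk:
            fails.append((i, "pk", "duplicate"))
        seen_pk.add(pk)
        if strict:
            for name in set(rec) - set(schema["fields"]):
                fails.append((i, name, "unknown"))
    return {"fail": len(fails), "details": fails}
```

Running standardize_names before validate_dataset means alias variants never reach the validator, so assert_report.fail counts only genuine presence, type, and key problems.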

IV. I10-4 Units, Dimensions & Metrological Harmonization

  1. repair_units(ds, policy.units) -> ds', report
    • Role: unit normalization, dimensional checks, numeric conversions.
    • Invariant: for any expression y - f(x), check_dim( y - f(x) ) = 0.
    • Report keys: converted, coerced, rejected.
  2. check_dim_expr(ds, exprs[]) -> report
    Use: batch-validate dimensional consistency of key expressions.
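The invariant check_dim( y - f(x) ) = 0 can be illustrated by representing each dimension as a vector of base-dimension exponents, here (L, M, T). This is a sketch under stated assumptions: the UNITS table, the `<field>_unit` column convention, and the split between "coerced" (numeric string to float) and "rejected" (unknown unit or wrong dimension) are hypothetical choices, not policy.units as defined by the volume.

```python
Dim = tuple[int, int, int]  # exponents of the base dimensions (L, M, T)

# Hypothetical unit table: unit -> (factor to SI, dimension). Not a real registry.
UNITS: dict[str, tuple[float, Dim]] = {
    "m": (1.0, (1, 0, 0)),
    "km": (1000.0, (1, 0, 0)),
    "s": (1.0, (0, 0, 1)),
    "ms": (1e-3, (0, 0, 1)),
    "m/s": (1.0, (1, 0, -1)),
}


def dim_mul(a: Dim, b: Dim) -> Dim:
    return tuple(x + y for x, y in zip(a, b))


def dim_div(a: Dim, b: Dim) -> Dim:
    return tuple(x - y for x, y in zip(a, b))


def check_dim(y_dim: Dim, fx_dim: Dim) -> int:
    """Return 0 when y - f(x) is dimensionally consistent, 1 otherwise."""
    return 0 if y_dim == fx_dim else 1


def repair_units(ds, field, target_unit):
    """Convert rec[field] (unit stored in rec[field + '_unit']) to target_unit."""
    factor_t, dim_t = UNITS[target_unit]
    report = {"converted": 0, "coerced": 0, "rejected": 0}
    out = []
    for rec in ds:
        value, unit = rec[field], rec.get(field + "_unit")
        if unit not in UNITS or UNITS[unit][1] != dim_t:
            report["rejected"] += 1  # unknown unit or wrong dimension
            continue
        if isinstance(value, str):
            try:
                value = float(value)
            except ValueError:
                report["rejected"] += 1
                continue
            report["coerced"] += 1  # numeric string coerced to float
        if unit != target_unit:
            value = value * UNITS[unit][0] / factor_t
            report["converted"] += 1
        out.append({**rec, field: value, field + "_unit": target_unit})
    return out, report
```

For example, an arrival time computed as path length over speed has dimension dim_div((1, 0, 0), (1, 0, -1)) = (0, 0, 1), matching [T], so check_dim returns 0.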

V. I10-5 Time Axis & Synchronization Cleansing

  1. align_timebase(ds, sync_ref) -> ds', timing_report
    • Role: establish tau_mono ↔ ts mapping; estimate and record offset/skew/J.
    • Invariant: non_decreasing(tau_mono).
    • Report keys: offset, skew, J, u(offset), dropped_out_of_window.
  2. resample_window(ds, Delta_t, mode) -> ds'
    Use: windowed alignment and aggregation while preserving time semantics.
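One way to establish the tau_mono ↔ ts mapping is an ordinary least-squares fit ts ≈ offset + (1 + skew) · tau_mono. The sketch below assumes that model, takes sync_ref to be a plain (lo, hi) window over tau_mono, uses the intercept's standard error as u(offset), and interprets J as the RMS fit residual; all of these are assumptions for illustration, and the field name ts_aligned is hypothetical.

```python
import math


def align_timebase(ds, window):
    """Fit ts ~ offset + (1 + skew) * tau_mono by ordinary least squares."""
    lo, hi = window
    kept = sorted((r for r in ds if lo <= r["tau_mono"] <= hi), key=lambda r: r["tau_mono"])
    n = len(kept)
    if n < 3:
        raise ValueError("need at least 3 in-window samples")
    taus = [r["tau_mono"] for r in kept]
    tss = [r["ts"] for r in kept]
    m_tau, m_ts = sum(taus) / n, sum(tss) / n
    sxx = sum((t - m_tau) ** 2 for t in taus)
    if sxx == 0:
        raise ValueError("tau_mono has no spread")
    sxy = sum((t - m_tau) * (s - m_ts) for t, s in zip(taus, tss))
    slope = sxy / sxx
    offset = m_ts - slope * m_tau
    resid = [s - (offset + slope * t) for t, s in zip(taus, tss)]
    sse = sum(e * e for e in resid)
    # Standard error of the OLS intercept, used here as u(offset).
    u_offset = math.sqrt(sse / (n - 2) * (1.0 / n + m_tau ** 2 / sxx))
    jitter = math.sqrt(sse / n)  # RMS residual: an assumed meaning of J
    assert all(a <= b for a, b in zip(taus, taus[1:]))  # non_decreasing(tau_mono)
    aligned = [{**r, "ts_aligned": offset + slope * r["tau_mono"]} for r in kept]
    timing_report = {
        "offset": offset,
        "skew": slope - 1.0,
        "J": jitter,
        "u(offset)": u_offset,
        "dropped_out_of_window": len(ds) - n,
    }
    return aligned, timing_report
```

Computation stays on tau_mono; ts_aligned is only the publishing view, which keeps the "compute on tau_mono, publish on ts" convention intact.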

VI. I10-6 Path & Arrival-Time Cleansing

  1. enforce_arrival_time_convention(ds, c_ref, tol_Tarr) -> ds', delta_report
    • Computations:
      1. T_arr_1 = ( 1 / c_ref ) * ( ∫_{gamma(ell)} n_eff d ell )
      2. T_arr_2 = ( ∫_{gamma(ell)} ( n_eff / c_ref ) d ell )
      3. delta_form = | T_arr_1 - T_arr_2 |
    • Assertion: delta_form ≤ tol_Tarr; persist both forms and delta_form.
    • Report keys: count, violations, P99(delta_form).
  2. check_path_monotonicity(ds, ell_field) -> report
    Checks: non_decreasing(ell) and computability of L_gamma.
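A discretized version of the I10-6 computations is sketched below, assuming each record carries sampled arrays ell and n_eff along gamma and that the trapezoidal rule is an acceptable quadrature (an assumption; the volume does not fix one). With constant c_ref the two forms are algebraically equal, so delta_form here only exposes floating-point differences between the two code paths. P99 uses the nearest-rank convention, which is one choice among several.

```python
def trapz(y, x):
    """Trapezoidal rule for the integral of y d x over sampled points."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def p99(values):
    """Nearest-rank 99th percentile with exact integer ranks (values non-empty)."""
    s = sorted(values)
    rank = -(-99 * len(s) // 100)  # ceil(0.99 * n) without float error
    return s[rank - 1]


def check_path_monotonicity(ell):
    ok = all(a <= b for a, b in zip(ell, ell[1:]))
    # L_gamma = integral of 1 d ell, computed only on a monotone path
    L_gamma = trapz([1.0] * len(ell), ell) if ok and len(ell) >= 2 else None
    return {"non_decreasing": ok, "L_gamma": L_gamma}


def enforce_arrival_time_convention(ds, c_ref, tol_Tarr):
    out, deltas, violations = [], [], 0
    for rec in ds:
        ell, n_eff = rec["ell"], rec["n_eff"]
        t_arr_1 = (1.0 / c_ref) * trapz(n_eff, ell)       # form 1: c_ref outside
        t_arr_2 = trapz([n / c_ref for n in n_eff], ell)  # form 2: c_ref inside
        delta_form = abs(t_arr_1 - t_arr_2)
        if delta_form > tol_Tarr:
            violations += 1
        deltas.append(delta_form)
        # persist both forms and delta_form alongside the record
        out.append({**rec, "T_arr_1": t_arr_1, "T_arr_2": t_arr_2, "delta_form": delta_form})
    delta_report = {
        "count": len(ds),
        "violations": violations,
        "P99(delta_form)": p99(deltas) if deltas else None,
    }
    return out, delta_report
```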

VII. I10-7 Missingness, Mask & Imputation Governance

  1. handle_missing(ds, strategy.missing) -> ds', manifest_missing
    • Behavior: generate mask m ∈ {0,1}; drop/impute per rules; record RefCond and uncertainty.
    • Invariant: imputed fields carry explicit provenance and method.
    • Manifest keys: mask_coverage, impute_method, u(imputed).
  2. mark_quality(ds, rules) -> ds'
    Use: create or update the components that form q_score ∈ [0,1] (coverage, integrity, timeliness).
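The mask-then-impute behavior of handle_missing can be sketched for a single field. Assumptions: strategy is reduced to "drop" or "median", u(imputed) is taken as the sample standard deviation of the observed values, the column names mask_<field> and <field>_provenance are hypothetical, and RefCond recording is omitted.

```python
import statistics


def handle_missing(ds, field, strategy="median"):
    """Build mask m in {0,1} for `field`, then drop or median-impute."""
    mask = [0 if r.get(field) is None else 1 for r in ds]
    observed = [r[field] for r, m in zip(ds, mask) if m]
    coverage = sum(mask) / len(ds) if ds else 0.0
    if strategy == "drop":
        out = [dict(r) for r, m in zip(ds, mask) if m]
        u_imputed = None
    elif strategy == "median":
        if len(observed) < 2:
            raise ValueError("need at least 2 observed values to impute")
        fill = statistics.median(observed)
        # Assumption: u(imputed) is the sample std of the observed values.
        u_imputed = statistics.stdev(observed)
        out = []
        for r, m in zip(ds, mask):
            rec = {**r, "mask_" + field: m}  # copy: the input ds is left untouched
            if not m:
                rec[field] = fill
                # explicit provenance and method for every imputed value
                rec[field + "_provenance"] = {"method": "median", "u": u_imputed}
            out.append(rec)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    manifest_missing = {"mask_coverage": coverage, "impute_method": strategy, "u(imputed)": u_imputed}
    return out, manifest_missing
```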

VIII. I10-8 Anomalies, Drift & Outlier Governance

  1. detect_outlier(ds, method, fields, Delta_t, params) -> tags, report
    • Method examples: zscore, MAD, IQR, robust_lof.
    • Report keys: rate, by_field, suppressed.
  2. monitor_drift(ds, ref, method, Delta_t) -> drift, report
    • Use: distribution drift detection (e.g., KS, PSI, ADWIN, CUSUM).
    • Invariant: on threshold breach, do not modify source data; only label and alert.
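As an illustration of the MAD method with the label-only invariant, the sketch below uses the modified z-score 0.6745 · (x - median) / MAD with the commonly cited 3.5 cut-off. Windowing by Delta_t is omitted, and using "suppressed" for fields whose MAD is zero (so the score is undefined) is an assumed interpretation of that report key.

```python
import statistics


def detect_outlier_mad(ds, fields, threshold=3.5):
    """Label outliers with the modified z-score 0.6745 * (x - median) / MAD.

    ds is never modified: results are returned as tags only.
    """
    tags, by_field, suppressed = [], {}, []
    checked = 0
    for f in fields:
        vals = [(i, r[f]) for i, r in enumerate(ds) if r.get(f) is not None]
        xs = [x for _, x in vals]
        if not xs:
            suppressed.append(f)
            continue
        med = statistics.median(xs)
        mad = statistics.median([abs(x - med) for x in xs])
        if mad == 0:
            suppressed.append(f)  # zero scale, so the score is undefined
            continue
        checked += len(xs)
        hits = 0
        for i, x in vals:
            score = 0.6745 * (x - med) / mad
            if abs(score) > threshold:
                tags.append({"index": i, "field": f, "score": score, "label": "outlier"})
                hits += 1
        by_field[f] = hits
    report = {"rate": len(tags) / checked if checked else 0.0,
              "by_field": by_field, "suppressed": suppressed}
    return tags, report
```

Median and MAD are used instead of mean and standard deviation because a single extreme value barely moves them, so the outlier cannot mask itself.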

IX. I10-9 Deduplication, Association & Referential Integrity

  1. deduplicate(ds, keys, semantics, tiebreaker) -> ds', dup_report
    • Semantics examples: exact, time_window, fuzzy.
    • Invariant: unique(keys) holds post-op.
    • Report keys: groups, resolved, conflicts.
  2. drop_orphan(ds, foreign_key) -> ds', fk_report
    • Role: remove foreign_key orphans and count them.
    • Invariant: foreign_key relations are fully resolvable.
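The "exact" semantics of deduplicate and the behavior of drop_orphan can be sketched as follows. Assumptions for illustration: the tiebreaker keeps the record with the largest value of a named field, "groups" counts duplicate groups, "resolved" counts removed records, and a "conflict" is a group whose members disagree on non-key payload; ref_keys is a hypothetical set of valid parent keys.

```python
def deduplicate(ds, keys, tiebreaker):
    """Exact-key dedup: keep the record with the largest `tiebreaker` per key."""
    groups = {}
    for rec in ds:
        groups.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    out, dup_groups, removed, conflicts = [], 0, 0, 0
    for recs in groups.values():
        if len(recs) > 1:
            dup_groups += 1
            removed += len(recs) - 1
            # conflict: duplicates disagree on fields other than keys and tiebreaker
            payloads = [{k: v for k, v in r.items() if k not in keys and k != tiebreaker}
                        for r in recs]
            if any(p != payloads[0] for p in payloads[1:]):
                conflicts += 1
        out.append(max(recs, key=lambda r: r[tiebreaker]))
    assert len({tuple(r[k] for k in keys) for r in out}) == len(out)  # unique(keys)
    return out, {"groups": dup_groups, "resolved": removed, "conflicts": conflicts}


def drop_orphan(ds, foreign_key, ref_keys):
    """Keep only records whose foreign_key value exists in ref_keys."""
    out = [r for r in ds if r.get(foreign_key) in ref_keys]
    return out, {"orphans": len(ds) - len(out)}
```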

X. I10-10 Compliance, Contracts & Release Freeze

  1. assert_contract(ds, tests.*) -> assert_report, pass
    • Coverage: unique, monotone, range, foreign_key, dim, arrival_forms.
    • Invariant: pass = ( violations = 0 ).
  2. export_manifest(ds, context) -> manifest
    Contents: version, SRef, timing, units, arrival_forms, asserts, hash, signature.
  3. freeze_release(ds, tag) -> manifest
    Role: freeze artifacts, sign, tag, and produce an auditable snapshot.
  4. emit_audit(event) -> audit_head
    Use: extend the audit hash chain; event includes who/when/what/hash_prev.
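A minimal audit hash chain in the spirit of emit_audit is sketched below using SHA-256 over canonical JSON. The all-zero genesis value, the canonical serialization, and the in-memory list used as the chain are assumptions; signatures are out of scope for this sketch, and export_manifest is reduced to version plus hash.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # assumed hash_prev for the first event


def hash_sha256(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


def canonical(obj) -> bytes:
    """Deterministic JSON bytes, so equal content always hashes equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def export_manifest(ds, version):
    return {"version": version, "hash": hash_sha256(canonical(ds))}


def emit_audit(chain, who, what, when=None):
    """Append an event linked to the previous head; return the new audit_head."""
    hash_prev = chain[-1]["hash"] if chain else GENESIS
    event = {"who": who, "when": time.time() if when is None else when,
             "what": what, "hash_prev": hash_prev}
    event["hash"] = hash_sha256(canonical(event))  # hash covers who/when/what/hash_prev
    chain.append(event)
    return event["hash"]


def verify_chain(chain):
    prev = GENESIS
    for ev in chain:
        body = {k: v for k, v in ev.items() if k != "hash"}
        if ev["hash_prev"] != prev or hash_sha256(canonical(body)) != ev["hash"]:
            return False
        prev = ev["hash"]
    return True
```

Because each event embeds the previous head, editing any past event changes its hash and breaks every later link, which is what makes the chain verifiable end to end.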

XI. I10-11 Streaming Cleansing & Backpressure Nodes


XII. I10-12 Environmental Correction & Arrival-Time Harmonization

  1. apply_env_correction(ds, RefCond, model) -> ds', corr_report

XIII. I10-13 Density, Probability & Normalization Cleansing


XIV. I10-14 Quality Scoring, SLO & Audit


XV. I10-15 Composite Use-Case Interfaces


XVI. Cross-Interface Invariants & Assertion Checklist

  1. Keys & path
    • unique(pk); resolvable foreign_key; non_decreasing(ts|ell).
    • L_gamma = ( ∫_gamma 1 d ell ) is computable.
  2. Units & dimensions
    unit(T_arr) = "s", dim(T_arr) = "[T]"; check_dim( y - f(x) ) = 0.
  3. Arrival-time two forms
    delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |; delta_form ≤ tol_Tarr.
  4. Quality & SLO
    P99(TS.sli.err_rate) ≤ E_target; P99(TS.sli.lat_ms) ≤ L_target; error-budget burn is traceable.
  5. Audit & signatures
    Every release carries hash_sha256(blob) and a signature, and is verifiable along the audit hash chain.
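The Quality & SLO item can be checked mechanically. The sketch assumes the SLI series are plain lists of samples and uses the nearest-rank P99 with exact integer ranks; other percentile conventions (e.g. linear interpolation) give slightly different values, so the convention should be fixed alongside E_target and L_target.

```python
def p99(values):
    """Nearest-rank 99th percentile (one convention among several)."""
    if not values:
        raise ValueError("p99 of empty sample")
    s = sorted(values)
    return s[-(-99 * len(s) // 100) - 1]  # index ceil(0.99 * n) - 1, integer-exact


def check_slo(err_rate, lat_ms, E_target, L_target):
    """Evaluate P99(err_rate) <= E_target and P99(lat_ms) <= L_target."""
    p_err, p_lat = p99(err_rate), p99(lat_ms)
    return {"P99(err_rate)": p_err, "P99(lat_ms)": p_lat,
            "err_ok": p_err <= E_target, "lat_ok": p_lat <= L_target}
```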

XVII. Mapping to Core Volumes


XVIII. Versioning & Compatibility Strategy


XIX. Typical Call Sequences (Summary)


Summary
This appendix provides unified signatures, I/O, report keys, and cross-interface invariants for the I10-* interfaces, and records how each maps to the Core volumes. With these, practitioners can reuse the interfaces across batch, online, and streaming scenarios while keeping the end-to-end path from schema binding to release freeze computable, auditable, and revertible.


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/