Chapter 8 Anomalies, Drift, and Outlier Governance
One-Sentence Goal
Detect, label, and handle anomalies, drift, and outliers on the tau_mono timeline, keep dimensions consistent and causality intact, minimize false alarms and regression risk, and wire decisions into the manifest and the SLO dashboard.
I. Scope & Objects
- Applicable targets
  - Dataset D_clean.pre_anom, produced after Chapter 5 time alignment, Chapter 6 path and arrival-time checks, and Chapter 7 missingness handling and imputation.
  - Key fields: ts, tau_mono, ell; both T_arr forms and delta_form; metric fields x; quality and uncertainty fields q_score, u(x), u_imp; channel indicators chan/cap/q_len.
- Target artifacts
  - Produce D_anom = D_clean.pre_anom ⊕ tags, where each tag ∈ {point_anom, contextual_anom, collective_anom, drift_segment, saturation, stuck, spike}.
  - Generate report_anom and manifest.anomaly; update quality and publication policy.
II. Terms & Variables (Memory Anchors)
- Anomalies & outliers
  - Point anomaly: point_anom (a single observation deviates strongly from baseline).
  - Contextual anomaly: contextual_anom (deviation relative to a given RefCond or season).
  - Collective anomaly: collective_anom (a segmental regime change).
  - Outlier label: outlier; not identical to “anomaly”, and its treatment depends on policy.
- Drift
  - Distribution drift: drift (deviation of P_t(x) from reference P_ref(x)).
  - Concept drift: change in E[ y | x ] (in modeling contexts).
  - Channel drift: systematic changes in q_len, rho, W_q.
- Baselines & statistics
  - Mean and standard deviation: mu, sigma; median and MAD: med, MAD; quantiles Q1/Q3, IQR = Q3 - Q1.
  - Indicators: P99, KS, D_KL, PSI.
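The baseline statistics listed above can be sketched for a single window as follows, using only the Python standard library. The function name baseline_stats, the dictionary keys, and the choice of the inclusive quartile method are illustrative assumptions rather than part of this chapter's specification.

```python
import statistics


def baseline_stats(xs):
    """Return mu, sigma, med, MAD, Q1, Q3 and IQR for one window of values."""
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    med = statistics.median(xs)
    # MAD: median of absolute deviations from the median.
    mad = statistics.median(abs(x - med) for x in xs)
    # Quartiles; the 'inclusive' method is an assumption, other conventions exist.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "mu": statistics.fmean(xs),
        "sigma": statistics.stdev(xs),  # sample standard deviation
        "med": med,
        "MAD": mad,
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }


# One extreme value moves mu a long way but leaves med, MAD and IQR almost unchanged.
window = [1.0, 2.0, 3.0, 4.0, 100.0]
stats = baseline_stats(window)
```

In this toy window the mean is pulled to 22.0 by the single extreme value while the median stays at 3.0, which is the reason the robust statistics are preferred as baselines.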
III. Axioms (P108-*)
- P108-01 Causality and time-base axiom
  All detection runs on tau_mono; do not violate non_decreasing(ts) or non_decreasing(ell).
- P108-02 Explicit-labeling axiom
  Every anomaly, outlier, and drift must be explicitly recorded via tags and fields; silent deletion or implicit correction is forbidden.
- P108-03 Dimension-consistency axiom
  Criteria and thresholds must pass check_dim, avoiding unit-conversion pseudo-anomalies.
- P108-04 Two-form priority axiom
  For arrival-time related issues, treat delta_form as a strong signal; delta_form > tol_Tarr directly triggers the arrival_forms assertion.
- P108-05 Uncertainty-accompaniment axiom
  Labeling and mitigation must update u(x) or weights w_imp and propagate them downstream.
- P108-06 Back-pressure safety axiom
  In online settings, retries and rate-limits must not deadlock or blow up q_len; detection operators themselves are subject to cap.
IV. Minimal Equations (S108-*)
- S108-01 Z-score and robust Z-score
  z = ( x - mu ) / sigma
  z_robust = 0.6745 * ( x - med ) / MAD
- S108-02 IQR fences
  outlier = ( x < Q1 - k * IQR ) ∨ ( x > Q3 + k * IQR ) (typically k ∈ [1.5, 3])
- S108-03 Spike and saturation detection
  spike = ( |x_k - x_{k-1}| > thr_grad ) ∧ ( |x_{k+1} - x_k| > thr_grad )
  saturation = ( x ∈ {x_min_sat, x_max_sat} )
- S108-04 Chi-square or residual gating (model residuals)
  r = y - f(x), anom = ( |r| / u(r) > thr_resid )
- S108-05 Change points and CUSUM
  CUSUM^+_k = max( 0 , CUSUM^+_{k-1} + ( x_k - mu_0 - kappa ) )
  CUSUM^-_k = max( 0 , CUSUM^-_{k-1} + ( mu_0 - x_k - kappa ) )
  Drift alarm: CUSUM^+_k > h ∨ CUSUM^-_k > h
- S108-06 Distribution-drift metrics
  D_KL( P || Q ) = sum_i p_i * ln( p_i / q_i )
  KS = sup_x | F_P(x) - F_Q(x) |
  PSI = sum_i ( p_i - q_i ) * ln( ( p_i + eps ) / ( q_i + eps ) )
- S108-07 Arrival-time two-form anomaly
  arrival_anom = ( delta_form > tol_Tarr ), where
  delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |
- S108-08 Channel & back-pressure monitoring
  rho = lambda / mu (arrival rate over service rate; mu here is the service rate, not the baseline mean of S108-01);
  SLO_violation = ( P99( latency ) > SLO.p99 )
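The point-wise and sequential criteria S108-01, S108-02, S108-03 (spike), and S108-05 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, quartiles use the standard library's inclusive method, and the lower-side CUSUM branch is taken to be the mirror image of the upper-side recursion.

```python
import statistics


def robust_z(xs):
    """S108-01: z_robust = 0.6745 * (x - med) / MAD for every x in the window."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-score is undefined for this window")
    return [0.6745 * (x - med) / mad for x in xs]


def iqr_outliers(xs, k=1.5):
    """S108-02: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x < low or x > high for x in xs]


def spikes(xs, thr_grad):
    """S108-03: flag x_k when the steps into and out of it both exceed thr_grad.

    The first and last samples have only one neighbour and are never flagged.
    """
    flags = [False] * len(xs)
    for k in range(1, len(xs) - 1):
        step_in = abs(xs[k] - xs[k - 1])
        step_out = abs(xs[k + 1] - xs[k])
        flags[k] = step_in > thr_grad and step_out > thr_grad
    return flags


def cusum_alarm(xs, mu_0, kappa, h):
    """S108-05: index of the first sample where either CUSUM branch exceeds h, else None."""
    pos = neg = 0.0
    for k, x in enumerate(xs):
        pos = max(0.0, pos + (x - mu_0 - kappa))
        neg = max(0.0, neg + (mu_0 - x - kappa))  # assumed mirror of the upper branch
        if pos > h or neg > h:
            return k
    return None


series = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8]
z_scores = robust_z(series)
fence_flags = iqr_outliers(series, k=1.5)
spike_flags = spikes(series, thr_grad=5.0)

shifted = [0.0, 0.1, -0.1, 1.0, 1.2, 0.9, 1.1]
alarm_index = cusum_alarm(shifted, mu_0=0.0, kappa=0.5, h=1.0)
```

In the first series all three point-wise criteria single out the value 25.0 at index 4; in the second, the level shift starting at index 3 raises the CUSUM alarm one sample later. The thresholds used here are arbitrary illustration values and would still need to pass check_dim in practice.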
V. Cleaning Process (M10-8 Anomalies, Drift, and Outliers)
- Baseline and windowing
  Choose rolling or seasonal window W; on tau_mono, estimate mu/sigma/med/MAD/Q1/Q3; for distributions, use binning or KDE.
- Candidate detection
  Run in parallel: z_robust, IQR, spike/saturation, CUSUM/change-point, delta_form, and drift metrics (D_KL/KS/PSI).
- Tag fusion and de-noising
  Fuse multi-criteria votes or confidences to produce tags; apply morphological closing for minimum duration and minimum spacing.
- Mitigation policy
  - quarantine, downweight, repair (only for reversible issues such as duplicate frames), pass_through (label then publish).
  - Update q_score and w_imp: q_score' = q_score * g(tags).
- Contracting and rollback
  On contract failures, degrade publication (summary or delayed), switch to robust baselines, or trigger manual audit.
- Persistence and traceability
  Write manifest.anomaly = { methods, params, windows, thresholds, drift_ref, vote_rule, latency_p99, FP/FN_est }; update signature and hash_sha256(blob).
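The tag-fusion, de-noising, and q_score steps of this process can be sketched as below. Everything named here is an assumption made for illustration: a simple majority-style vote stands in for the fusion rule, gap filling and short-run removal stand in for the minimum-spacing and minimum-duration clean-up, and the penalty table behind g(tags) is hypothetical.

```python
def fuse_votes(votes, min_votes=2):
    """Flag sample i when at least min_votes detectors flagged it.

    votes maps a detector name to a list of booleans, one per sample.
    """
    columns = list(votes.values())
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all detectors must cover the same samples")
    return [sum(c[i] for c in columns) >= min_votes for i in range(n)]


def _runs(flags, value):
    """Yield (start, end) pairs, end exclusive, for maximal runs equal to value."""
    i, n = 0, len(flags)
    while i < n:
        if flags[i] == value:
            j = i
            while j < n and flags[j] == value:
                j += 1
            yield i, j
            i = j
        else:
            i += 1


def close_gaps(flags, max_gap):
    """Minimum spacing: fill unflagged gaps of at most max_gap samples between flagged runs."""
    out = list(flags)
    for start, end in _runs(flags, False):
        interior = start > 0 and end < len(flags)
        if interior and end - start <= max_gap:
            out[start:end] = [True] * (end - start)
    return out


def drop_short_runs(flags, min_len):
    """Minimum duration: clear flagged runs shorter than min_len samples."""
    out = list(flags)
    for start, end in _runs(flags, True):
        if end - start < min_len:
            out[start:end] = [False] * (end - start)
    return out


# Hypothetical per-tag penalties; the chapter only states q_score' = q_score * g(tags).
TAG_PENALTY = {"spike": 0.5, "saturation": 0.2, "stuck": 0.2}


def g(tags):
    factor = 1.0
    for tag in tags:
        factor *= TAG_PENALTY.get(tag, 1.0)
    return factor


def update_q_score(q_score, tags):
    return q_score * g(tags)


votes = {
    "z_robust": [False, False, True, False, True, False, False],
    "iqr": [False, False, True, False, True, False, False],
    "spike": [False, False, True, False, False, False, False],
}
fused = fuse_votes(votes, min_votes=2)
closed = close_gaps(fused, max_gap=1)
segments = drop_short_runs(closed, min_len=2)
new_q = update_q_score(0.8, {"spike"})
```

Here two isolated flags one sample apart are merged into a single three-sample segment, and a record tagged spike has its q_score halved. A production fusion rule could equally weight detectors by confidence, as the text allows.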
VI. Contracts & Assertions (Chapter Must-Pass Items)
- Anomaly labels exist: tags are present and cover all target fields.
- No silent drops: sum( flags.silent_drop ) = 0.
- Two-form hard guard: delta_form ≤ tol_Tarr; otherwise trigger arrival_anom and the publication gate.
- Dimension conservation: check_dim( thresholds ) = true.
- Controlled latency: P99( detect_latency ) ≤ SLO.detect_p99.
- Drift guard: drift_metric ≤ tol_drift; otherwise enter the degraded path.
- Dashboard completeness: exists(report_anom) and manifest.anomaly has all fields.
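One way to evaluate these must-pass items in a single place is sketched below. The function check_contracts, the summary dictionary and its keys, and the contract names returned are all hypothetical; only the conditions themselves and the manifest.anomaly field names are taken from this chapter.

```python
# Field names of manifest.anomaly as listed in the cleaning process section.
MANIFEST_FIELDS = (
    "methods", "params", "windows", "thresholds",
    "drift_ref", "vote_rule", "latency_p99", "FP/FN_est",
)


def check_contracts(summary, manifest_anomaly, tol, slo):
    """Return the names of failed contracts; an empty list means the chapter passes."""
    failures = []
    # Anomaly labels exist and cover the target fields (both are sets of field names).
    if not summary["target_fields"] <= summary["tagged_fields"]:
        failures.append("anomaly_labels_exist")
    if summary["silent_drop_count"] != 0:
        failures.append("no_silent_drops")
    # A breach here also raises arrival_anom and closes the publication gate.
    if summary["delta_form_max"] > tol["Tarr"]:
        failures.append("two_form_hard_guard")
    if not summary["thresholds_dim_ok"]:
        failures.append("dimension_conservation")
    if summary["detect_latency_p99"] > slo["detect_p99"]:
        failures.append("controlled_latency")
    # A breach here means the caller should enter the degraded path.
    if summary["drift_metric"] > tol["drift"]:
        failures.append("drift_guard")
    missing = [f for f in MANIFEST_FIELDS if f not in manifest_anomaly]
    if missing or not summary["report_anom_exists"]:
        failures.append("dashboard_completeness")
    return failures


summary_ok = {
    "target_fields": {"x"},
    "tagged_fields": {"x", "T_arr"},
    "silent_drop_count": 0,
    "delta_form_max": 1e-12,
    "thresholds_dim_ok": True,
    "detect_latency_p99": 0.8,
    "drift_metric": 0.02,
    "report_anom_exists": True,
}
manifest_ok = {name: None for name in MANIFEST_FIELDS}
tolerances = {"Tarr": 1e-9, "drift": 0.1}
slos = {"detect_p99": 1.0}

passed = check_contracts(summary_ok, manifest_ok, tolerances, slos)
summary_bad = dict(summary_ok, silent_drop_count=1, delta_form_max=2e-9)
failed = check_contracts(summary_bad, manifest_ok, tolerances, slos)
```

Returning the full list of failures, rather than stopping at the first, keeps every breach visible on the dashboard; the numeric tolerances above are placeholders.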
VII. Implementation Binding (I10-8)
- Interface prototypes
  - detect_outlier(ds, method, fields) -> tags
  - detect_drift(ds_ref, ds_cur, metrics) -> drift_report
  - fuse_anomaly_tags(tags_list, rule) -> tags_fused
  - mitigate_anomaly(ds, tags, policy) -> ds', effects
  - audit_anomaly(ds, tags) -> report_anom
- Preconditions
  Chapter 4 units and dimensions consistent; Chapter 5 time-base aligned; Chapter 6 path and arrival time compliant; Chapter 7 explicit missingness and constrained imputation.
- Invariants & postconditions
  Do not alter the monotonicity of ts/ell; all actions are replayable via the manifest; q_score and uncertainties are updated in sync.
- Failure semantics
  E_DRIFT_REF_MISSING, E_DIM_THRESHOLD_INVALID, E_LATENCY_SLO_BREACH, E_RULE_CONFLICT.
VIII. Cross-References
- Arrival time & path (delta_form, gamma(ell)): Chapter 6.
- Missingness & imputation (downweight, w_imp, u_imp): Chapter 7.
- Contracting and release gate: Chapter 10.
- Streaming back-pressure and execution-graph coordination: Chapter 11.
- Quality scoring and audit dashboard: Chapter 14.
IX. Quality Metrics & Risk Control
- Core indicators
  - Detection latency: detect_latency_p50/p95/p99
  - Label intensity: anom_rate = mean( 1_{tags ≠ ∅} )
  - Drift amplitude: D_KL, KS, PSI
  - Impact surface: affected_share = fraction_of_downstream_ops_using_tagged
  - False-positive/false-negative estimates: FP_hat, FN_hat (via gold sets or post-hoc checks)
  - Channel health: rho, W_q, drop_rate, retry_rate
- Alert suggestions
  - If anom_rate > tol_anom → enable strong down-weighting or quarantine.
  - If D_KL > tol_kl ∨ KS > tol_ks → switch to robust baselines or roll back a version.
  - If detect_latency_p99 > SLO.detect_p99 → reduce detector complexity or scale out.
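The drift-amplitude indicators and the second alert rule can be sketched with the standard library as follows. The function names are hypothetical, the binned probabilities are made-up inputs, D_KL is computed as D_KL(P_t || P_ref) by assumption, and KS is evaluated on two raw samples through their empirical CDFs.

```python
import math
from bisect import bisect_right


def kl_divergence(p, q):
    """D_KL(P || Q) over matching bins; terms with p_i = 0 contribute nothing."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf  # P has mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


def psi(p, q, eps=1e-6):
    """PSI = sum_i (p_i - q_i) * ln((p_i + eps) / (q_i + eps))."""
    return sum((p_i - q_i) * math.log((p_i + eps) / (q_i + eps)) for p_i, q_i in zip(p, q))


def ks_statistic(sample_p, sample_q):
    """Largest gap between the two empirical CDFs, checked at every observed value."""
    sp, sq = sorted(sample_p), sorted(sample_q)
    points = sorted(set(sp) | set(sq))
    return max(
        abs(bisect_right(sp, x) / len(sp) - bisect_right(sq, x) / len(sq))
        for x in points
    )


def drift_alert(d_kl, ks, tol_kl, tol_ks):
    """Alert rule from the text: D_KL > tol_kl or KS > tol_ks."""
    return d_kl > tol_kl or ks > tol_ks


p_ref = [0.25, 0.25, 0.25, 0.25]   # reference window, binned
p_cur = [0.10, 0.20, 0.30, 0.40]   # current window, same bins
d_kl = kl_divergence(p_cur, p_ref)
psi_value = psi(p_cur, p_ref)
ks = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
alert = drift_alert(d_kl, ks, tol_kl=0.05, tol_ks=0.3)
```

Note that D_KL is asymmetric in its arguments while PSI is symmetric, so the direction chosen for D_KL should be recorded alongside drift_ref in the manifest. The tolerances 0.05 and 0.3 are arbitrary illustration values.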
X. Boundaries & Special Cases
- Duplicate frames and stuck sequences
  stuck = ( x_k = x_{k-1} = ... = const ) persisting beyond a threshold → tag stuck and quarantine.
- Saturation and range overflow
  Treat saturation as an anomaly rather than a mere outlier; do not impute over it; trace the metrology chain.
- Arrival-time anomalies
  If arrival_anom, revisit Chapter 6 sources of n_eff and c_ref before deciding publication strategy.
- Seasonality and context
  Build segmented baselines within RefCond or seasonal cycles to avoid mis-flagging seasonality as drift.
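The stuck and saturation cases can be detected with a run-length scan and a rail comparison, sketched below. The function names, the min_run parameter (the "threshold" on persistence, counted in samples), and the example rail values 0.0 and 4095.0 are assumptions for illustration; exact equality is used for simplicity, and real sensor data may need a tolerance.

```python
def stuck_segments(xs, min_run):
    """Return (start, end) pairs, end exclusive, for runs of identical values
    that last at least min_run samples."""
    segments = []
    i, n = 0, len(xs)
    while i < n:
        j = i + 1
        while j < n and xs[j] == xs[i]:
            j += 1
        if j - i >= min_run:
            segments.append((i, j))
        i = j
    return segments


def saturation_flags(xs, x_min_sat, x_max_sat):
    """Flag samples sitting exactly on either saturation rail."""
    return [x == x_min_sat or x == x_max_sat for x in xs]


readings = [1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4095.0, 4095.0]
stuck = stuck_segments(readings, min_run=4)
saturated = saturation_flags(readings, x_min_sat=0.0, x_max_sat=4095.0)
```

With these inputs the four repeated 2.0 readings form one stuck segment, while the two readings at the upper rail are tagged saturation and are too short to count as stuck. Keeping the two tags separate matters because saturated samples must not be imputed over.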
XI. Audit & Panel Fields
- Minimal panel
  anom_rate, drift_metric.{D_KL,KS,PSI}, detect_latency_p99, delta_form_violations, spike_count, saturation_count, stuck_segments, quarantine_share, downweight_share
- Traceability fields
  methods, params, windows, thresholds, vote_rule, drift_ref_window, seed, version, signature, hash_sha256(blob).
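A minimal sketch of how the traceability fields might be turned into a reproducible hash_sha256(blob) value is given below. The chapter does not define how blob is serialized, so the canonical JSON encoding (sorted keys, compact separators, UTF-8) is an assumption, as are the function name canonical_blob and the example record; signing is left out.

```python
import hashlib
import json

# Traceability fields from this section that feed the hash (signature and the hash itself excluded).
TRACE_FIELDS = (
    "methods", "params", "windows", "thresholds",
    "vote_rule", "drift_ref_window", "seed", "version",
)


def canonical_blob(record):
    """Serialize the traceability fields deterministically (assumed encoding)."""
    missing = [f for f in TRACE_FIELDS if f not in record]
    if missing:
        raise KeyError(f"missing traceability fields: {missing}")
    payload = {f: record[f] for f in TRACE_FIELDS}
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def hash_sha256(blob):
    """Hex digest of the serialized record."""
    return hashlib.sha256(blob).hexdigest()


record = {
    "methods": ["z_robust", "iqr", "cusum"],
    "params": {"k": 1.5, "kappa": 0.5, "h": 1.0},
    "windows": {"W": 256},
    "thresholds": {"thr_grad": 5.0},
    "vote_rule": "min_votes=2",
    "drift_ref_window": "2025-01",
    "seed": 0,
    "version": "1.0",
}
digest = hash_sha256(canonical_blob(record))
```

Because the keys are sorted before hashing, two runs that record the same settings in a different order produce the same digest, while any change to a threshold or window produces a different one, which is what makes the manifest usable for replay.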
Summary
This chapter provides a governance framework for anomalies, drift, and outliers on a unified time base and dimension convention: robust statistics and change-point methods for multi-channel detection, delta_form as a strong arrival-time signal, uncertainty-aware labeling and mitigation, and seamless integration with the manifest, SLO dashboards, and back-pressure guards. The result is a publishable, auditable quality baseline that holds under both sudden shocks and slow drift.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/