05-EFT.WP.Core.Errors v1.0 | Appendix B — Quick Reference for Robust Losses and Weight Functions

Appendix B — Quick Reference for Robust Losses and Weight Functions

I. Usage Conventions and Notation

Residual definition: r def= y - f(x; theta); pointwise residual e_i def= r_i.
Scale and standardization: s is the robust scale, default s approx 1.4826 * MAD(r); standardized residual t_i def= e_i / s.
Weight matrix and weighted criterion: R = diag(w_i), chi2 = r^T R r = sum_i w_i * e_i^2.
Robust objective: min_theta sum_i rho( t_i ; hyper ); influence function psi(t) = d rho / d t; weight function w(t) = psi(t) / t for t != 0; at t = 0, take the limit t -> 0.
IRLS update skeleton (aligned with I50-2)
- Compute t_i = e_i / s; from the chosen loss obtain w_i = w(t_i; hyper).
- Solve weighted least squares with R = diag(w_i) to update theta.
- If adaptive scale is enabled, re-estimate s, e.g., s_new = 1.4826 * MAD( r_new ); iterate to convergence.
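The scale and standardization steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the I50-2 implementation: the function names `mad_scale` and `standardize`, the `floor` guard against s = 0, and the sample residuals are assumptions made for the example, and MAD is taken about the median of the residuals.

```python
from statistics import median

def mad_scale(residuals):
    """Robust scale s = 1.4826 * MAD(r), with MAD taken about the median."""
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    return 1.4826 * mad

def standardize(residuals, s, floor=1e-12):
    """Standardized residuals t_i = e_i / s; `floor` (an assumption) guards s = 0."""
    s = max(s, floor)
    return [e / s for e in residuals]

# Illustrative residuals; the last point is a deliberate outlier.
residuals = [0.1, -0.2, 0.05, 0.3, -0.1, 8.0]
s = mad_scale(residuals)
t = standardize(residuals, s)
print(f"s = {s:.5f}, largest |t| = {max(abs(v) for v in t):.1f}")
```

The single outlier barely moves s, which is why its standardized residual stands out so clearly.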

II. Common Loss Families (rho / psi / w) — Quick Look

Convention: all use t = e / s as the standardized residual; c > 0 is a tuning constant (used by Huber, Tukey, Cauchy, Fair, Geman–McClure, Andrews); nu > 0 is the degrees of freedom for StudentT(nu).
Output interface alignment: loss_rho(kind, hyper) and psi_weight(kind, hyper), where hyper includes the relevant keys among {"c":..., "nu":...}.
L2 (Quadratic)
- rho(t) = 0.5 * t^2
- psi(t) = t
- w(t) = 1
- Traits: high efficiency, weak outlier resistance, no down-weighting.
L1 (Absolute)
- rho(t) = |t|
- psi(t) = sign(t) (undefined at t = 0, take 0)
- w(t) = 1 / |t| (implement numerically as 1 / max(|t|, eps))
- Traits: robust to spikes; solutions tend toward sparse residuals; less efficient than L2 under small noise.
Huber (c)
- rho(t) = 0.5 * t^2 if |t| <= c; else rho(t) = c * ( |t| - 0.5 * c )
- psi(t) = t if |t| <= c; else psi(t) = c * sign(t)
- w(t) = 1 if |t| <= c; else w(t) = c / |t|
- Traits: transitional—balances efficiency and robustness; commonly c ~ 1.345 (≈95% efficiency under Normal).
Tukey (Bisquare, c)
- rho(t) = (c^2 / 6) * ( 1 - ( 1 - (t^2 / c^2) )^3 ) if |t| <= c; else rho(t) = c^2 / 6
- psi(t) = t * ( 1 - (t^2 / c^2) )^2 if |t| <= c; else psi(t) = 0
- w(t) = ( 1 - (t^2 / c^2) )^2 if |t| <= c; else w(t) = 0
- Traits: redescending, strongly suppresses outliers; commonly c ~ 4.685 (95% efficiency under Normal).
Cauchy (c)
- rho(t) = (c^2 / 2) * ln( 1 + (t^2 / c^2) )
- psi(t) = t / ( 1 + (t^2 / c^2) )
- w(t) = 1 / ( 1 + (t^2 / c^2) )
- Traits: heavy-tailed, soft down-weighting, smooth and differentiable.
StudentT (nu)
- rho(t) = 0.5 * (nu + 1) * ln( 1 + t^2 / nu )
- psi(t) = ( (nu + 1) * t ) / ( nu + t^2 )
- w(t) = (nu + 1) / ( nu + t^2 )
- Traits: tail heaviness controlled by nu; smaller nu is more robust; nu -> ∞ approaches L2.
Fair (c)
- rho(t) = c^2 * ( |t|/c - ln( 1 + |t|/c ) )
- psi(t) = t / ( 1 + |t|/c )
- w(t) = 1 / ( 1 + |t|/c )
- Traits: smooth transition; gentler down-weighting than Huber.
Geman–McClure (c)
- rho(t) = 0.5 * ( t^2 / ( 1 + t^2 / c^2 ) )
- psi(t) = t / ( 1 + t^2 / c^2 )^2
- w(t) = 1 / ( 1 + t^2 / c^2 )^2
- Traits: redescending, strong suppression of large outliers; sensitive to initialization.
Andrews (Sine, c)
- rho(t) = c^2 * ( 1 - cos( t / c ) ) if |t| <= pi * c; else rho(t) = 2 * c^2
- psi(t) = c * sin( t / c ) if |t| <= pi * c; else psi(t) = 0
- w(t) = sin( t / c ) / ( t / c ) if |t| <= pi * c; else w(t) = 0
- Traits: redescending, smooth but truncated at pi c.
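The weight functions listed in this section can be collected into one lookup, sketched below. The helper name `weight_fn`, its argument order, and the `eps` default are assumptions for illustration; it takes an already standardized residual t and is not the psi_weight interface itself.

```python
import math

def weight_fn(t, kind, hyper=None, eps=1e-8):
    """Return w(t) = psi(t) / t for one standardized residual t (hypothetical helper)."""
    hyper = hyper or {}
    a = abs(t)
    if kind == "L2":
        return 1.0
    if kind == "L1":
        return 1.0 / max(a, eps)          # eps avoids division by zero at t = 0
    if kind == "StudentT":
        nu = hyper["nu"]
        return (nu + 1.0) / (nu + t * t)
    c = hyper["c"]                         # all remaining kinds use the constant c
    u = t / c
    if kind == "Huber":
        return 1.0 if a <= c else c / a
    if kind == "Tukey":
        return (1.0 - u * u) ** 2 if a <= c else 0.0
    if kind == "Cauchy":
        return 1.0 / (1.0 + u * u)
    if kind == "Fair":
        return 1.0 / (1.0 + a / c)
    if kind == "GemanMcClure":
        return 1.0 / (1.0 + u * u) ** 2
    if kind == "Andrews":
        if a > math.pi * c:
            return 0.0
        return 1.0 if a < eps else math.sin(u) / u   # limit of sin(u)/u at u = 0
    raise ValueError(f"unknown loss kind: {kind}")

# Weights for a moderately large standardized residual under three losses.
for kind, hyper in [("Huber", {"c": 1.345}), ("Tukey", {"c": 4.685}), ("Cauchy", {"c": 2.385})]:
    print(kind, round(weight_fn(3.0, kind, hyper), 4))
```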

III. Choosing Parameters and Scale

Scale estimation s
- Baseline: s0 = 1.4826 * MAD(r); during iterations use s_new = 1.4826 * MAD( r_new ) or adjust via chi2/dof.
- With significant weighting or missingness: compute MAD over unmasked points only (mask m_i = 1, m ∈ {0,1}) and account for the weights w, e.g., via a weighted median.
Typical tuning constants (≈95% efficiency under Normal)
- Huber: c ~ 1.345; Tukey: c ~ 4.685; Cauchy: c ~ 2.385; Fair: c ~ 1.399.
- StudentT: nu in [3, 8] is common; nu = 4 offers stronger tail robustness.
Selection guidance
- Few moderate outliers: Huber.
- Clear heavy tails or strong outliers: Tukey / Cauchy / StudentT(nu<=5).
- Need smooth derivatives with robustness: Cauchy / Fair.
- Massive, far-out outliers: Tukey / Geman–McClure / Andrews (mind initialization).

IV. IRLS Implementation Details (aligned with I50-2)

Weight update: w_i <- w( t_i ; hyper ), t_i = e_i / s.
Jacobian for linearized models: J = ∂f/∂theta; one-step update solves
J^T R J * delta_theta = J^T R r.
Convergence: ||delta_theta|| / max(1, ||theta||) < tol and |chi2_new - chi2_old| / max(1, chi2_old) < tol.
Numerical safeguards: for L1/Fair near |t| < eps, use linear/quadratic expansions; for redescending losses (Tukey, Andrews, Geman–McClure), initialize with an outlier mask.
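As an illustration of the update, convergence test, and scale re-estimation described in this section, the sketch below fits a straight line y ~ a + b*x with Huber weights. The names `irls_line`, `huber_weight`, and `mad_scale`, the synthetic data, and the defaults for `tol` and `max_iter` are assumptions for the example. Because the model is linear in theta, the normal equations are solved for theta directly, which is equivalent to the delta_theta step.

```python
import math
from statistics import median

def huber_weight(t, c=1.345):
    a = abs(t)
    return 1.0 if a <= c else c / a

def mad_scale(r):
    med = median(r)
    return 1.4826 * median(abs(v - med) for v in r)

def irls_line(x, y, c=1.345, tol=1e-10, max_iter=100):
    """Fit y ~ a + b*x by IRLS with Huber weights; returns (a, b, weights)."""
    a = b = 0.0
    w = [1.0] * len(x)
    chi2_old = None
    for _ in range(max_iter):
        # Weighted normal equations (J^T R J) theta = J^T R y, with J rows (1, x_i).
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, x))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
        b0 = sum(wi * yi for wi, yi in zip(w, y))
        b1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = s0 * s2 - s1 * s1
        a_new = (s2 * b0 - s1 * b1) / det
        b_new = (s0 * b1 - s1 * b0) / det
        r = [yi - (a_new + b_new * xi) for xi, yi in zip(x, y)]
        chi2 = sum(wi * ri * ri for wi, ri in zip(w, r))
        # Both stopping criteria: relative step and relative chi2 change.
        step = math.hypot(a_new - a, b_new - b)
        small_step = step / max(1.0, math.hypot(a_new, b_new)) < tol
        small_change = (chi2_old is not None
                        and abs(chi2 - chi2_old) / max(1.0, chi2_old) < tol)
        a, b, chi2_old = a_new, b_new, chi2
        if small_step and small_change:
            break
        s = max(mad_scale(r), 1e-12)   # adaptive scale, floored to avoid s = 0
        w = [huber_weight(ri / s, c) for ri in r]
    return a, b, w

# Synthetic data: y = 1 + 2x plus small deterministic noise and one gross outlier.
x = [float(i) for i in range(10)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [1.0 + 2.0 * xi + ni for xi, ni in zip(x, noise)]
y[7] += 30.0
a, b, w = irls_line(x, y)
print(f"a = {a:.3f}, b = {b:.3f}, weight of outlier = {w[7]:.4f}")
```

Huber weights never reach zero, so the 2x2 system stays well conditioned here; a redescending loss would need the mask-based initialization mentioned above.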

V. Linkage to Error Propagation and Outlier Screening

With I50-3: if zscore_detect or hampel_filter flags mask_outlier_i = 1, map it to w_i = 0 or lower c one notch for soft shielding.
With I50-4: after robust fitting, adjust input covariance Cov_x from the empirical variance of weighted standardized residuals r_bar = r / s, and propagate via Cov_y approx J * Cov_x * J^T.
Quality metrics: include chi2/dof, r_bar_max, pass_rate in reports and drift monitoring.
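A small sketch of how these report metrics might be computed follows. The function name `quality_metrics`, the threshold `t_pass`, and the reading of pass_rate as the fraction of points with |r_bar| <= t_pass are assumptions, since the text does not define them; chi2 follows the weighted form sum_i w_i * e_i^2 from Section I.

```python
def quality_metrics(residuals, weights, s, n_params, t_pass=3.0):
    """Return chi2/dof, r_bar_max, and pass_rate (definitions partly assumed)."""
    r_bar = [e / s for e in residuals]
    chi2 = sum(w * e * e for w, e in zip(weights, residuals))
    dof = len(residuals) - n_params
    return {
        "chi2_dof": chi2 / dof,
        "r_bar_max": max(abs(t) for t in r_bar),
        # Assumed definition: share of points within t_pass robust sigmas.
        "pass_rate": sum(abs(t) <= t_pass for t in r_bar) / len(r_bar),
    }

# Illustrative values; the last point is fully masked (w = 0).
metrics = quality_metrics(
    residuals=[0.1, -0.2, 0.3, 4.0],
    weights=[1.0, 1.0, 1.0, 0.0],
    s=0.2,
    n_params=2,
)
print(metrics)
```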

VI. Robust Treatment of Arrival-Time Error T_arr (Cross-Volume Anchor)

Observation model: r_T def= T_meas - ( ∫_gamma ( n_eff / c_ref ) d ell ).
Robust fitting: use rho_T( r_T / s ; hyper ) as the loss per arrival; weights w_T = w( r_T / s ) enter R.
Two-form consistency check:
- Constant-factored: T_arr = ( 1 / c_ref ) * ( ∫_gamma n_eff d ell )
- General form: T_arr = ( ∫_gamma ( n_eff / c_ref ) d ell )
- Record delta = |T_arr(form-1) - T_arr(form-2)| in reports; if delta exceeds threshold, trigger E.T_ARR.CONSISTENCY.DUAL_FORM_MISMATCH and recommend enforce_arrival_time_convention().
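The two-form check can be sketched numerically as below. The trapezoid rule, the sampled path, the function names, and the `threshold` default are all assumptions for illustration; the text does not specify a quadrature scheme or a threshold value.

```python
def trapezoid(values, ell):
    """Trapezoid-rule integral of sampled values along path coordinate ell."""
    return sum(
        0.5 * (values[i] + values[i + 1]) * (ell[i + 1] - ell[i])
        for i in range(len(ell) - 1)
    )

def arrival_time_forms(n_eff, ell, c_ref, threshold=1e-18):
    """Evaluate both forms of T_arr and flag a mismatch (threshold is assumed)."""
    form1 = trapezoid(n_eff, ell) / c_ref                 # constant-factored form
    form2 = trapezoid([n / c_ref for n in n_eff], ell)    # general form
    delta = abs(form1 - form2)
    return form1, form2, delta, delta > threshold

# Illustrative path samples; with a constant c_ref both forms agree up to round-off.
ell = [0.0, 1.0, 2.0, 3.0]
n_eff = [1.0, 1.1, 1.2, 1.1]
form1, form2, delta, mismatch = arrival_time_forms(n_eff, ell, c_ref=299792458.0)
print(form1, form2, delta, mismatch)
```

When `mismatch` is true, a caller would raise E.T_ARR.CONSISTENCY.DUAL_FORM_MISMATCH as described above.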

VII. Quick Selection Checklist (Implementation & Choice)

kind="L2": high efficiency; w(t)=1; no robustness.
kind="L1": strong outlier resistance; non-smooth; IRLS needs eps.
kind="Huber", c~1.345: balanced; industrial default.
kind="Tukey", c~4.685: redescending; strong suppression of distant outliers.
kind="Cauchy", c~2.385: smooth heavy-tail; numerically stable.
kind="StudentT", nu~4: models heavy tails; nu can be learned.
kind="Fair", c~1.399: soft down-weight; smooth.
kind="GemanMcClure", c: strong redescending; sensitive to initialization.
kind="Andrews", c: redescending; truncated at pi*c.

VIII. Reporting and Reproducibility Fields (aligned with Appendix A)

loss_kind, hyper={"c":..., "nu":...}, s, r_bar_max, chi2, pass_rate.
If path integrals are involved: record gamma_spec, L_gamma, c_ref_version, refcond_id.
Failure-code mappings: e.g., E.MODEL.FIT.NON_CONVERGENCE, E.DATA_QUALITY.OUTLIER.DETECTED, E.T_ARR.CONSISTENCY.DUAL_FORM_MISMATCH.

IX. Numerical and Implementation Notes

Smoothness: L1/Huber require subgradients or smoothing at the kink; redescending losses (Tukey/Andrews/Geman–McClure) may yield rank deficiency where w=0—use regularization or trust-region methods.
Overflow & degeneration: for large |t|, prefer the w(t) form to avoid rho overflow; for very small |t|, stabilize w with Taylor expansion.
Stopping: monitor both parameter step size and relative objective change to avoid stalling in weight oscillation zones.
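The small-|t| safeguards mentioned above might look like the following sketch. The function names, the switch-over point `small`, and the use of `math.log1p` for the Fair loss are assumptions chosen for the example rather than prescribed choices.

```python
import math

def andrews_weight(t, c, small=1e-4):
    """Andrews weight sin(u)/u with u = t/c, using a Taylor series near u = 0."""
    u = t / c
    if abs(u) > math.pi:
        return 0.0
    if abs(u) < small:
        u2 = u * u
        return 1.0 - u2 / 6.0 + u2 * u2 / 120.0   # sin(u)/u = 1 - u^2/6 + u^4/120 - ...
    return math.sin(u) / u

def l1_weight(t, eps=1e-8):
    """L1 weight 1/|t|, capped at 1/eps so it stays finite at t = 0."""
    return 1.0 / max(abs(t), eps)

def fair_rho(t, c):
    """Fair loss; log1p limits cancellation error when |t|/c is tiny."""
    x = abs(t) / c
    return c * c * (x - math.log1p(x))

print(andrews_weight(0.0, 1.0), andrews_weight(10.0, 1.0))
print(l1_weight(0.0))
print(fair_rho(1e-3, 1.0))   # close to t^2 / 2 for small t
```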

X. Minimal Operational Pattern with I50-**

loss_rho(e, "Huber", {"c":1.345, "s":s}) → returns scalar loss for monitoring and visualization.
psi_weight(e, "Tukey", {"c":4.685, "s":s}) → returns per-point w_i; use R = diag(w) in the solver.
compute_residual(model, data, params) → get e; mad_scale(e) → get s; iterate w → theta → s to convergence.
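For monitoring, a scalar loss with the call shape shown above could be sketched as follows. The body is hypothetical: it covers only Huber and Tukey, reads s and c from `hyper`, and sums rho over all points, which is one plausible reading of "scalar loss" rather than the documented behavior of the I50 interface.

```python
def loss_rho(e, kind, hyper):
    """Sum of rho(e_i / s) over all residuals (hypothetical body, two kinds only)."""
    s, c = hyper["s"], hyper["c"]
    total = 0.0
    for ei in e:
        a = abs(ei / s)
        if kind == "Huber":
            total += 0.5 * a * a if a <= c else c * (a - 0.5 * c)
        elif kind == "Tukey":
            if a <= c:
                total += (c * c / 6.0) * (1.0 - (1.0 - (a / c) ** 2) ** 3)
            else:
                total += c * c / 6.0      # rho saturates beyond |t| = c
        else:
            raise ValueError(f"kind not covered in this sketch: {kind}")
    return total

e = [0.1, -0.2, 3.0]
huber_total = loss_rho(e, "Huber", {"c": 1.345, "s": 0.2})
tukey_far = loss_rho([100.0], "Tukey", {"c": 4.685, "s": 1.0})
print(huber_total, tukey_far)
```

The Tukey value for a far-out point equals c^2 / 6 regardless of how large the residual is, which is the saturation that makes the loss redescending.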

XI. Reference Selection Flow (Compact Decision Tree)

Noise approximately Normal and outliers rare: choose L2 or Huber(c=1.345).
Moderate outlier rate (<10%): choose Huber or Cauchy(c≈2.4).
Heavy tails / high outlier rate: choose Tukey(c=4.685) or StudentT(nu∈[3,5]).
Need smooth derivatives with soft down-weighting: choose Fair(c≈1.4) or Cauchy.
Extreme, far-out outliers and truncation acceptable: choose Tukey / Geman–McClure / Andrews with mask-based initialization.

Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/