EFT.WP.Data.Benchmarks v1.0

Chapter 12 Robustness, Shift & Adversarial


I. Chapter Purpose & Scope

Specify robustness, distribution-shift, and adversarial evaluation in benchmarks: shift types and severity scales, adversarial threat models and parameters, evaluation protocol and thresholds, reporting format and statistical significance, and linkage with scoring/ranking/gates; ensure consistency with task definitions, metric system, evaluation protocol, metrology, and citation anchors.

II. Terminology & Dependencies

  1. Terms: synthetic_shift, natural_shift, severity, Δ_rel (relative drop), adv.threat_model (whitebox|blackbox|transfer), ‖δ‖_p ≤ ε, attack_steps/restarts/targeted, robust_accuracy, auc_robust.
  2. Dependencies: metrics & units (Ch.6), evaluation protocol (Ch.7), runtime environment (Ch.10), scoring & gates (Ch.8), units & dimensions (Core.Metrology v1.0:check_dim).
  3. Math & symbols: wrap inline symbols; any division/integral/composite operator must use parentheses; for T_arr use
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
      declaring gamma(ell) and d ell. No Chinese in formulas/symbols/definitions.
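The perturbation budget ‖δ‖_p ≤ ε listed in the terms above can be illustrated with a minimal Python sketch. The helper names (project_linf, project_l2, within_budget) are hypothetical and not part of this specification. Only the Linf and L2 projections are shown; the L1 case is checked by within_budget but not projected.

```python
import math

def project_linf(delta, eps):
    """Clip each coordinate to [-eps, eps] so that max|delta_i| <= eps."""
    return [max(-eps, min(eps, d)) for d in delta]

def project_l2(delta, eps):
    """Rescale delta so that its Euclidean norm is at most eps."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]

def within_budget(delta, eps, norm="Linf", tol=1e-12):
    """Check ||delta||_p <= eps for p in {Linf, L2, L1} (hypothetical helper)."""
    if norm == "Linf":
        size = max((abs(d) for d in delta), default=0.0)
    elif norm == "L2":
        size = math.sqrt(sum(d * d for d in delta))
    elif norm == "L1":
        size = sum(abs(d) for d in delta)
    else:
        raise ValueError(f"unsupported norm: {norm}")
    return size <= eps + tol
```

An iterative attack would typically apply such a projection after every one of its attack_steps, so that the accumulated perturbation never leaves the declared budget.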

III. Fields & Structure (Normative)

robustness:
  shift_tests:
    - {name: "snr_drop", severity: [3, 6, 9], unit: "dB", policy: "additive-noise"}
    - {name: "time_jitter", ms: [5, 10, 20], policy: "shuffle-window"}
    - {name: "spec_notch", bands: [["0.3", "0.5"], ["0.6", "0.7"]], unit: "fraction"}
  natural_shifts:
    axes: ["device", "region", "season", "domain", "locale"]
    splits: ["val", "test"]
  adversarial:
    enabled: false
    threat_model: "whitebox|blackbox|transfer"
    norm: "Linf|L2|L1"
    epsilon: 0.01
    steps: 10
    restarts: 1
    targeted: false
  metrics:
    primary: ["Δ_rel", "acc_robust", "auc_robust"]
    curves: ["acc-vs-ε", "acc-vs-SNR", "acc-vs-mask"]
  thresholds:
    drop_rel_max: 0.10
    acc_robust_min: 0.80
    ece_max_under_shift: 0.05
  reporting:
    table_axes: ["shift", "severity", "metric"]
    include_ci: true
    significance: {method: "bootstrap", B: 10000, alpha: 0.05}
  online_consistency:
    shadow_mode: true
    window: "7d"
    drift_monitors: ["drift_kl", "psi"]
    alert_rules:
      - {name: "robust_drop", rule: "Δ_rel>0.10 for 60m", severity: "high"}
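The thresholds block can be read as a simple gate. The sketch below is one possible reading, not the normative definition: it assumes Δ_rel = (clean − shifted) / clean, and it infers the comparison directions from the _max and _min suffixes of the field names. The function names are hypothetical.

```python
def relative_drop(clean, shifted):
    """Assumed definition of Δ_rel: fraction of the clean score lost under shift."""
    if clean <= 0:
        raise ValueError("clean score must be positive")
    return (clean - shifted) / clean

def check_thresholds(clean_acc, shifted_acc, acc_robust, ece_under_shift, thresholds):
    """Return Δ_rel and a pass/fail flag per threshold key (hypothetical helper)."""
    delta_rel = relative_drop(clean_acc, shifted_acc)
    checks = {
        "drop_rel_max": delta_rel <= thresholds["drop_rel_max"],
        "acc_robust_min": acc_robust >= thresholds["acc_robust_min"],
        "ece_max_under_shift": ece_under_shift <= thresholds["ece_max_under_shift"],
    }
    return delta_rel, checks

# Threshold values copied from the fields above.
THRESHOLDS = {"drop_rel_max": 0.10, "acc_robust_min": 0.80, "ece_max_under_shift": 0.05}
```

For example, a clean accuracy of 0.90 falling to 0.84 under shift gives a Δ_rel of about 0.067, which passes drop_rel_max; an acc_robust of 0.78 would fail acc_robust_min.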


IV. Shift Types & Severity Scales

  1. Synthetic shifts:
    • snr_drop: additive noise with severity in dB; declare noise type (Gaussian/colored), random seed, and injection point (pre/post normalization).
    • time_jitter: window reshuffle/jitter; specify ms window and boundary handling.
    • spec_notch: spectral band notches; declare normalized band ranges and mask policy (zero/median).
  2. Natural shifts: the axes are device, region, season, domain, and locale; report coverage and sample counts, and check that they are consistent with the coverage stated in the Dataset Card.
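As an illustration of the snr_drop shift, here is a minimal sketch of seeded additive Gaussian noise using only the Python standard library. It treats the dB value as the target signal-to-noise ratio of the perturbed signal; the chapter does not say whether severity means a target SNR or a reduction from a baseline, so that reading is an assumption. The function names are hypothetical, and the injection point (pre/post normalization) is left to the caller.

```python
import math
import random

def mean_power(x):
    """Average squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(signal, snr_db, seed=0):
    """Add Gaussian noise scaled so the realized SNR equals snr_db (assumed reading)."""
    rng = random.Random(seed)  # declared seed keeps the shift reproducible
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    # Rescale the drawn noise so its empirical power matches the target exactly.
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB computed from a clean signal and its perturbed copy."""
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))
```

Rescaling to the empirical noise power makes each severity level exact for every sample rather than exact only in expectation. Colored noise would need a different generator.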

V. Adversarial Evaluation (Threat Models & Parameters)


VI. Metrics, Thresholds & Coupling


VII. Statistics & Reporting


VIII. Metrology & Units (SI)


IX. Machine-Readable Fragment (Drop-in)

robustness:
  shift_tests:
    - {name: "snr_drop", severity: [3, 6, 9], unit: "dB", policy: "additive-noise"}
    - {name: "time_jitter", ms: [5, 10, 20], policy: "shuffle-window"}
    - {name: "spec_notch", bands: [["0.3", "0.5"], ["0.6", "0.7"]], unit: "fraction"}
  natural_shifts: {axes: ["device", "region"], splits: ["val", "test"]}
  adversarial: {enabled: false, threat_model: "whitebox", norm: "Linf", epsilon: 0.01, steps: 10, restarts: 1, targeted: false}
  metrics: {primary: ["Δ_rel", "acc_robust"], curves: ["acc-vs-ε", "acc-vs-SNR"]}
  thresholds: {drop_rel_max: 0.10, acc_robust_min: 0.80, ece_max_under_shift: 0.05}
  reporting: {table_axes: ["shift", "severity", "metric"], include_ci: true, significance: {method: "bootstrap", B: 10000, alpha: 0.05}}
  online_consistency:
    shadow_mode: true
    window: "7d"
    drift_monitors: ["drift_kl", "psi"]
    alert_rules: [{name: "robust_drop", rule: "Δ_rel>0.10 for 60m", severity: "high"}]
metrology: {units: "SI", check_dim: true}
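The reporting.significance entry names a bootstrap with B resamples and level alpha. One way to read it is a percentile bootstrap over paired per-sample outcomes, sketched below. The percentile variant, the pairing of clean and shifted predictions, and the function names are assumptions; the fragment only fixes method, B, and alpha.

```python
import random

def bootstrap_ci(values, B=10000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(B):
        # Resample n items with replacement and record the resample mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lo = means[int((alpha / 2) * B)]
    hi = means[min(B - 1, int((1 - alpha / 2) * B))]
    return lo, hi

def paired_drop_ci(correct_clean, correct_shifted, B=10000, alpha=0.05, seed=0):
    """CI for the absolute accuracy drop from paired 0/1 correctness flags."""
    diffs = [c - s for c, s in zip(correct_clean, correct_shifted)]
    return bootstrap_ci(diffs, B=B, alpha=alpha, seed=seed)
```

Under this reading, a drop is reported as significant at level alpha when the interval returned by paired_drop_ci excludes zero.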


X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SHIFT.SPEC_DEFINED
    when: "$.robustness.shift_tests[*]"
    assert: "has_keys(name) and (has_key(severity) or has_key(ms) or has_key(bands))"
    level: error
  - id: ADV.THREAT_ALLOWED
    when: "$.robustness.adversarial.threat_model"
    assert: "value in ['whitebox','blackbox','transfer']"
    level: error
  - id: ADV.PARAMS_VALID
    when: "$.robustness.adversarial"
    assert: "value.enabled == false or (has_keys(norm, epsilon, steps) and epsilon > 0 and steps >= 1)"
    level: error
  - id: METRIC.THRESHOLDS_DEFINED
    when: "$.robustness.thresholds"
    assert: "has_keys(drop_rel_max, acc_robust_min)"
    level: error
  - id: REPORT.CI_REQUIRED
    when: "$.robustness.reporting"
    assert: "value.include_ci == true and has_keys(significance.method, significance.alpha)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
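The rules above can be hand-translated into plain Python checks over an already-parsed configuration, as sketched below. This is not an interpreter for the when/assert expression language, whose grammar this excerpt does not define. It assumes each rule applies only when its `when` path is present, and the function names are hypothetical.

```python
ALLOWED_THREAT_MODELS = ("whitebox", "blackbox", "transfer")

def has_keys(mapping, *keys):
    """True if `mapping` is a dict containing every listed key."""
    return isinstance(mapping, dict) and all(k in mapping for k in keys)

def lint(doc):
    """Return the ids of violated rules, in rule order (hypothetical helper)."""
    failures = []
    rob = doc.get("robustness", {})

    for test in rob.get("shift_tests", []):
        has_scale = any(k in test for k in ("severity", "ms", "bands"))
        if not (has_keys(test, "name") and has_scale):
            failures.append("SHIFT.SPEC_DEFINED")

    adv = rob.get("adversarial")
    if isinstance(adv, dict):
        if "threat_model" in adv and adv["threat_model"] not in ALLOWED_THREAT_MODELS:
            failures.append("ADV.THREAT_ALLOWED")
        # Parameters are required unless `enabled` is explicitly false.
        if adv.get("enabled") is not False:
            ok = (has_keys(adv, "norm", "epsilon", "steps")
                  and adv["epsilon"] > 0 and adv["steps"] >= 1)
            if not ok:
                failures.append("ADV.PARAMS_VALID")

    thr = rob.get("thresholds")
    if thr is not None and not has_keys(thr, "drop_rel_max", "acc_robust_min"):
        failures.append("METRIC.THRESHOLDS_DEFINED")

    rep = rob.get("reporting")
    if isinstance(rep, dict):
        ok = (rep.get("include_ci") is True
              and has_keys(rep.get("significance"), "method", "alpha"))
        if not ok:
            failures.append("REPORT.CI_REQUIRED")

    met = doc.get("metrology")
    if isinstance(met, dict):
        if not (met.get("units") == "SI" and met.get("check_dim") is True):
            failures.append("METROLOGY.SI_AND_CHECKDIM")

    return failures
```

Because every rule in the excerpt has level error, a caller can treat any non-empty result as a failed lint.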


XI. Cross-Reference Anchors


XII. Chapter Compliance Checklist


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/