46-EFT.WP.Data.Benchmarks v1.0 | Chapter 12 Robustness, Shift & Adversarial

Chapter 12 Robustness, Shift & Adversarial

I. Chapter Purpose & Scope

Fix specifications for robustness, distribution shift, and adversarial evaluation in benchmarks: shift types and severity scales, adversarial threat models and parameters, evaluation protocol and thresholds, reporting format and statistical significance, and linkage with scoring/ranking/gates. Ensure consistency with task definitions, the metric system, the evaluation protocol, metrology, and citation anchors.

II. Terminology & Dependencies

Terms: synthetic_shift, natural_shift, severity, Δ_rel (relative drop), adv.threat_model (whitebox|blackbox|transfer), ‖δ‖_p ≤ ε, attack_steps/restarts/targeted, robust_accuracy, auc_robust.
Dependencies: metrics & units (Ch.6), evaluation protocol (Ch.7), runtime environment (Ch.10), scoring & gates (Ch.8), units & dimensions (Core.Metrology v1.0:check_dim).
Math & symbols: wrap inline symbols; any division/integral/composite operator must use parentheses; for T_arr use
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
- T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
  declaring gamma(ell) and d ell. No Chinese in formulas/symbols/definitions.
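The two T_arr forms above should agree whenever c_ref is constant along the path. The following is a minimal numerical sketch of that check, not a normative procedure: the value of c_ref, the sampled path, the n_eff profile, and the helper name `trapezoid` are all assumptions made for illustration.

```python
import math

def trapezoid(ys, xs):
    """Composite trapezoidal rule for samples ys taken at positions xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

# Hypothetical inputs for illustration only (not values from this chapter).
c_ref = 299_792_458.0                        # assumed reference speed, m/s
ell = [i * 10.0 for i in range(101)]         # samples along gamma(ell), in m
n_eff = [1.0 + 1e-3 * math.sin(x / 200.0) for x in ell]  # dimensionless

# Form 1: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
t_arr_outside = (1.0 / c_ref) * trapezoid(n_eff, ell)

# Form 2: T_arr = ( ∫ ( n_eff / c_ref ) d ell )
t_arr_inside = trapezoid([n / c_ref for n in n_eff], ell)

# With a constant c_ref the two forms agree up to floating-point error.
forms_agree = math.isclose(t_arr_outside, t_arr_inside, rel_tol=1e-9)
print(t_arr_outside, t_arr_inside, forms_agree)
```

If c_ref varied along the path, only the second form would remain meaningful, which is one reason to declare gamma(ell) and d ell explicitly.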

III. Fields & Structure (Normative)

robustness:
  shift_tests:
    - {name: "snr_drop", severity: [3,6,9], unit: "dB", policy: "additive-noise"}
    - {name: "time_jitter", ms: [5,10,20], policy: "shuffle-window"}
    - {name: "spec_notch", bands: [["0.3","0.5"],["0.6","0.7"]], unit: "fraction"}
  natural_shifts:
    axes: ["device","region","season","domain","locale"]
    splits: ["val","test"]
  adversarial:
    enabled: false
    threat_model: "whitebox|blackbox|transfer"
    norm: "Linf|L2|L1"
    epsilon: 0.01
    steps: 10
    restarts: 1
    targeted: false
  metrics:
    primary: ["Δ_rel","acc_robust","auc_robust"]
    curves: ["acc-vs-ε","acc-vs-SNR","acc-vs-mask"]
  thresholds:
    drop_rel_max: 0.10
    acc_robust_min: 0.80
    ece_max_under_shift: 0.05
  reporting:
    table_axes: ["shift","severity","metric"]
    include_ci: true
    significance: {method: "bootstrap", B: 10000, alpha: 0.05}
  online_consistency:
    shadow_mode: true
    window: "7d"
    drift_monitors: ["drift_kl","psi"]
    alert_rules:
      - {name: "robust_drop", rule: "Δ_rel>0.10 for 60m", severity: "high"}
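The online_consistency fields name a psi drift monitor and a sustained alert rule without defining either computation. As a rough sketch only: `psi` below uses the commonly cited Population Stability Index formula over pre-binned proportions, and `sustained_breach` reads "Δ_rel>0.10 for 60m" as "the value stayed above 0.10 continuously for at least 60 minutes". Both function names, the bin floor, and that reading of the rule are assumptions, not definitions from this chapter.

```python
import math

def psi(expected, actual, floor=1e-6):
    """Population Stability Index between two binned distributions.

    `expected` and `actual` are per-bin proportions; `floor` (an assumed
    guard) avoids log(0) for empty bins.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, floor), max(a, floor)
        total += (a - e) * math.log(a / e)
    return total

def sustained_breach(samples, threshold=0.10, duration_min=60):
    """True if value > threshold held continuously for >= duration_min.

    `samples` is a list of (minute, value) pairs sorted by minute.
    """
    start = None
    for minute, value in samples:
        if value > threshold:
            if start is None:
                start = minute          # breach window opens here
            if minute - start >= duration_min:
                return True
        else:
            start = None                # any recovery resets the window
    return False

stable = psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
shifted = psi([0.5, 0.5], [0.7, 0.3])
continuous = [(m, 0.12) for m in range(0, 61, 10)]
interrupted = [(0, 0.12), (10, 0.12), (20, 0.12), (30, 0.05)] + \
              [(m, 0.12) for m in range(40, 91, 10)]
print(stable, shifted, sustained_breach(continuous), sustained_breach(interrupted))
```

Resetting the window on any recovery is the stricter of two plausible readings; a deployment that tolerates brief dips would need a different rule.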

IV. Shift Types & Severity Scales

Synthetic shifts:
- snr_drop: additive noise with severity in dB; declare noise type (Gaussian/colored), random seed, and injection point (pre/post normalization).
- time_jitter: window reshuffle/jitter; specify ms window and boundary handling.
- spec_notch: spectral band notches; declare normalized band ranges and mask policy (zero/median).
Natural shifts: axes device/region/season/domain/locale; report coverage and sample counts and check consistency with Dataset Card coverage.
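To make the spec_notch declaration concrete, here is a minimal sketch of masking normalized band ranges with a zero or median policy. The function name `apply_spec_notch`, the half-open band convention [lo, hi), the rounding of band edges to bin indices, and the choice to take the median over the full input spectrum are assumptions; the chapter only requires that band ranges and mask policy be declared.

```python
from statistics import median

def apply_spec_notch(spectrum, bands, mask_policy="zero"):
    """Return a copy of `spectrum` with each normalized band [lo, hi) masked.

    `bands` holds fractions of the spectrum length, e.g. ("0.3", "0.5").
    `mask_policy` is "zero" or "median" (median of the full input spectrum).
    """
    if mask_policy not in ("zero", "median"):
        raise ValueError(f"unknown mask_policy: {mask_policy}")
    n = len(spectrum)
    fill = 0.0 if mask_policy == "zero" else float(median(spectrum))
    out = list(spectrum)
    for lo, hi in bands:
        lo, hi = float(lo), float(hi)    # the config stores edges as strings
        if not 0.0 <= lo < hi <= 1.0:
            raise ValueError(f"band out of range: {(lo, hi)}")
        start, stop = int(round(lo * n)), int(round(hi * n))
        for i in range(start, stop):
            out[i] = fill
    return out

spectrum = [float(i) for i in range(1, 11)]      # 10 hypothetical bins
notched = apply_spec_notch(spectrum, [("0.3", "0.5"), ("0.6", "0.7")])
median_filled = apply_spec_notch(spectrum, [(0.0, 0.2)], "median")
print(notched)
print(median_filled)
```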

V. Adversarial Evaluation (Threat Models & Parameters)

Threat models: whitebox (e.g., PGD), blackbox (score/decision-based), transfer.
Constraints: enforce ‖δ‖_p ≤ ε; provide steps/restarts/targeted.
Safety guardrails: adversarial samples used offline or as shadow traffic only; never deploy unisolated into production paths.
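The constraint ‖δ‖_p ≤ ε is typically enforced by projecting the perturbation back onto the norm ball after each attack step. The sketch below shows that projection for Linf and L2 only; the helper names `project_perturbation` and `lp_norm` are hypothetical, and L1 projection is deliberately left out because it needs a more involved algorithm.

```python
import math

def lp_norm(delta, norm):
    """Norm of a perturbation vector for the supported norm names."""
    if norm == "Linf":
        return max(abs(d) for d in delta)
    if norm == "L2":
        return math.sqrt(sum(d * d for d in delta))
    raise NotImplementedError(norm)

def project_perturbation(delta, epsilon, norm="Linf"):
    """Project `delta` onto the ball { d : ‖d‖_p ≤ epsilon }."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if norm == "Linf":
        # Clip every coordinate independently to [-epsilon, epsilon].
        return [max(-epsilon, min(epsilon, d)) for d in delta]
    if norm == "L2":
        length = lp_norm(delta, "L2")
        if length <= epsilon:
            return list(delta)           # already inside the ball
        scale = epsilon / length         # shrink radially onto the surface
        return [d * scale for d in delta]
    raise NotImplementedError("L1 projection is omitted from this sketch")

delta = [0.03, -0.002, 0.015, -0.04]     # hypothetical raw perturbation
linf = project_perturbation(delta, 0.01, "Linf")
l2 = project_perturbation(delta, 0.01, "L2")
print(linf, lp_norm(l2, "L2"))
```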

VI. Metrics, Thresholds & Coupling

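Relative drop: Δ_rel = ( baseline - under_shift ) / max( baseline, ε ), where ε here is a small positive floor that guards against division by zero, not the adversarial budget in ‖δ‖_p ≤ ε.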
Robust accuracy: acc_robust at a given severity or worst-case over a set.
Area metrics: auc_robust over ε/SNR/mask spans.
Calibration drift: report ECE/Brier under shift and compare with ece_max_under_shift.
Gates: if Δ_rel > drop_rel_max, acc_robust < acc_robust_min, or ECE under shift > ece_max_under_shift, the release is blocked; align with Ch.8 scoring gates.
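The metric and gate definitions in this section can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the denominator guard is passed as `floor`, acc_robust is taken as the worst case over severities, and auc_robust is computed with the trapezoidal rule and normalized by the span (the chapter does not specify a normalization).

```python
import math

def delta_rel(baseline, under_shift, floor=1e-12):
    """Relative drop: ( baseline - under_shift ) / max( baseline, floor )."""
    return (baseline - under_shift) / max(baseline, floor)

def worst_case_accuracy(acc_by_severity):
    """acc_robust taken as the minimum accuracy over a set of severities."""
    return min(acc_by_severity.values())

def auc_robust(xs, accs):
    """Trapezoidal area under the accuracy curve, divided by the x span."""
    area = sum(0.5 * (accs[i] + accs[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))
    return area / (xs[-1] - xs[0])

def gate_violations(baseline, acc_by_severity, ece_under_shift, thresholds):
    """Return the names of the thresholds that would block a release."""
    acc_robust = worst_case_accuracy(acc_by_severity)
    violations = []
    if delta_rel(baseline, acc_robust) > thresholds["drop_rel_max"]:
        violations.append("drop_rel_max")
    if acc_robust < thresholds["acc_robust_min"]:
        violations.append("acc_robust_min")
    if ece_under_shift > thresholds["ece_max_under_shift"]:
        violations.append("ece_max_under_shift")
    return violations

# Hypothetical measurements for an snr_drop sweep at 3, 6, and 9 dB.
thresholds = {"drop_rel_max": 0.10, "acc_robust_min": 0.80,
              "ece_max_under_shift": 0.05}
acc = {3: 0.90, 6: 0.86, 9: 0.81}
violations = gate_violations(0.92, acc, 0.04, thresholds)
area = auc_robust([3, 6, 9], [0.90, 0.86, 0.81])
print(violations, area)
```

In this example the worst-case accuracy of 0.81 clears acc_robust_min, yet the relative drop from 0.92 is about 0.12, so the release is still blocked on drop_rel_max. The two accuracy gates are therefore not redundant.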

VII. Statistics & Reporting

Significance: default bootstrap (B≥10k, α=0.05) with CI_95; apply Holm–Bonferroni for multi-model/multi-axis comparisons.
Format: tables keyed by shift/severity/metric; include curves (acc-vs-ε/SNR/mask) and key point estimates.
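A minimal sketch of the statistics above, assuming a percentile bootstrap over per-sample correctness and the standard Holm step-down procedure. The chapter names the bootstrap and Holm–Bonferroni but not the CI variant, so the percentile method, the function names, and the fixed seed are assumptions.

```python
import random

def bootstrap_ci(values, B=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(B))
    lo = means[int((alpha / 2) * B)]
    hi = means[min(B - 1, int((1 - alpha / 2) * B))]
    return lo, hi

def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where it is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                        # step-down: stop at first failure
    return reject

# Hypothetical per-sample correctness under one shift (85 of 100 correct).
correct = [1] * 85 + [0] * 15
ci_lo, ci_hi = bootstrap_ci(correct)

# Hypothetical p-values from four model-vs-baseline comparisons.
decisions = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(ci_lo, ci_hi, decisions)
```

Holm's procedure controls the family-wise error rate at α while rejecting at least as many hypotheses as plain Bonferroni, which matters when many shift axes are compared at once.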

VIII. Metrology & Units (SI)

Performance & resources: QPS(1/s), latency_ms.{p50,p95,p99}, ρ(—), net_mbps, size_bytes.
Mandatory: metrology:{units:"SI", check_dim:true}; normalize units first before composition/comparison.
Path quantities: if robustness experiments involve T_arr-related processing or metrics, register delta_form/path/measure and validate against the two equivalent T_arr forms given in Section II.

IX. Machine-Readable Fragment (Drop-in)

robustness:
  shift_tests:
    - {name: "snr_drop", severity: [3,6,9], unit: "dB", policy: "additive-noise"}
    - {name: "time_jitter", ms: [5,10,20], policy: "shuffle-window"}
    - {name: "spec_notch", bands: [["0.3","0.5"],["0.6","0.7"]], unit: "fraction"}
  natural_shifts: {axes: ["device","region"], splits: ["val","test"]}
  adversarial: {enabled: false, threat_model: "whitebox", norm: "Linf", epsilon: 0.01, steps: 10, restarts: 1, targeted: false}
  metrics: {primary: ["Δ_rel","acc_robust"], curves: ["acc-vs-ε","acc-vs-SNR"]}
  thresholds: {drop_rel_max: 0.10, acc_robust_min: 0.80, ece_max_under_shift: 0.05}
  reporting: {table_axes: ["shift","severity","metric"], include_ci: true, significance: {method: "bootstrap", B: 10000, alpha: 0.05}}
  online_consistency:
    shadow_mode: true
    window: "7d"
    drift_monitors: ["drift_kl","psi"]
    alert_rules: [{name: "robust_drop", rule: "Δ_rel>0.10 for 60m", severity: "high"}]
metrology: {units: "SI", check_dim: true}

X. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SHIFT.SPEC_DEFINED
    when: "$.robustness.shift_tests[*]"
    assert: "has_keys(name) and (has_key(severity) or has_key(ms) or has_key(bands))"
    level: error
  - id: ADV.THREAT_ALLOWED
    when: "$.robustness.adversarial.threat_model"
    assert: "value in ['whitebox','blackbox','transfer']"
    level: error
  - id: ADV.PARAMS_VALID
    when: "$.robustness.adversarial"
    assert: "value.enabled == false or (has_keys(norm, epsilon, steps) and epsilon > 0 and steps >= 1)"
    level: error
  - id: METRIC.THRESHOLDS_DEFINED
    when: "$.robustness.thresholds"
    assert: "has_keys(drop_rel_max, acc_robust_min)"
    level: error
  - id: REPORT.CI_REQUIRED
    when: "$.robustness.reporting"
    assert: "value.include_ci == true and has_keys(significance.method, significance.alpha)"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
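The lint rules above can be approximated with direct checks on a parsed configuration. The sketch below hand-codes each assertion in Python instead of interpreting the `when`/`assert` expressions, so it is an illustration of the intended semantics and not the reference linter; the function name `lint` and the reading that a rule applies only when its `when` path exists are assumptions.

```python
def lint(config):
    """Return the ids of the rules that `config` (a parsed dict) violates."""
    failed = []
    rob = config.get("robustness", {})

    # SHIFT.SPEC_DEFINED: every shift test has a name and a severity scale.
    for test in rob.get("shift_tests", []):
        has_scale = any(k in test for k in ("severity", "ms", "bands"))
        if "name" not in test or not has_scale:
            failed.append("SHIFT.SPEC_DEFINED")
            break

    adv = rob.get("adversarial")
    if adv is not None:
        # ADV.THREAT_ALLOWED applies only when threat_model is present.
        allowed = ("whitebox", "blackbox", "transfer")
        if "threat_model" in adv and adv["threat_model"] not in allowed:
            failed.append("ADV.THREAT_ALLOWED")
        # ADV.PARAMS_VALID: disabled, or norm/epsilon/steps present and sane.
        params_ok = (all(k in adv for k in ("norm", "epsilon", "steps"))
                     and adv["epsilon"] > 0 and adv["steps"] >= 1)
        if not (adv.get("enabled") is False or params_ok):
            failed.append("ADV.PARAMS_VALID")

    thr = rob.get("thresholds")
    if thr is not None and not all(
            k in thr for k in ("drop_rel_max", "acc_robust_min")):
        failed.append("METRIC.THRESHOLDS_DEFINED")

    rep = rob.get("reporting")
    if rep is not None:
        sig = rep.get("significance", {})
        if rep.get("include_ci") is not True or not all(
                k in sig for k in ("method", "alpha")):
            failed.append("REPORT.CI_REQUIRED")

    met = config.get("metrology")
    if met is not None and not (
            met.get("units") == "SI" and met.get("check_dim") is True):
        failed.append("METROLOGY.SI_AND_CHECKDIM")
    return failed

# An abbreviated configuration in the shape of the drop-in fragment.
config = {
    "robustness": {
        "shift_tests": [{"name": "snr_drop", "severity": [3, 6, 9]}],
        "adversarial": {"enabled": False, "threat_model": "whitebox"},
        "thresholds": {"drop_rel_max": 0.10, "acc_robust_min": 0.80},
        "reporting": {"include_ci": True,
                      "significance": {"method": "bootstrap", "alpha": 0.05}},
    },
    "metrology": {"units": "SI", "check_dim": True},
}
print(lint(config))
```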

XI. Cross-Reference Anchors

Metric system & units: EFT.WP.Data.Benchmarks v1.0, Ch.6.
Scoring, normalization & gates: Ch.8.
Evaluation protocol & runtime environment: EFT.WP.Data.ModelCards v1.0, Ch.11; this volume, Ch.10.
Unit & dimension checks: EFT.WP.Core.Metrology v1.0:check_dim.

XII. Chapter Compliance Checklist

Synthetic/natural shift and adversarial settings complete; threat model, norm, and ε/steps/restarts/targeted explicit.
Metrics & thresholds present; jointly report Δ_rel/acc_robust/auc_robust and calibration drift; gates aligned with Ch.8.
Significance & CI configuration (with multiple-comparison correction) active; reports include tables and curves.
SI metrology with check_dim=true; if T_arr appears, delta_form/path/measure registered and validated.
Machine-readable fragment is drop-in and lint-clean; online consistency (if applicable) includes shadow/drift monitors and alert rules.

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05