Technical White Paper 15: EFT.WP.Methods.Falsification v1.0

Chapter 4: Data & Adversarial Sample Conventions


I. Scope & Objectives

  1. Define unified conventions for the data assets and adversarial sample families required for falsification. The conventions cover collection, normalization, generation, filtering, fingerprinting, lineage, and signed publication, and they ensure reproducibility, forensic traceability, and quantitative comparability under EnvLock and a shared time base ts = alpha + beta * tau_mono.
  2. Goals & pass criteria
    • Establish the sample family D_family = { D_base, D_neg, D_boundary, D_ood, D_adv, D_mr } with generation and acceptance rules. Any adversarial perturbation must satisfy the budgeted feasible-set constraint
      x_adv = proj_X( x + delta ), with || delta ||_p ≤ epsilon_lp.
    • The sample bank is evaluated by spec coverage and mutant kill rate:
      cov_spec = ( |C_hit| / |C_total| ), kill_rate = ( |mut_killed| / |mut_all| ). Both must meet their thresholds (cov_spec ≥ tau_cov, kill_rate ≥ tau_kill; see Section VII), and statistical tests must pass FDR/FWER control (see Chapter 3 S52-*).
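The two ratios above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it assumes C_hit and C_total are sets of spec-clause identifiers and mut_killed and mut_all are sets of mutant identifiers, and the function name bank_passes and the example identifiers are hypothetical.

```python
def cov_spec(hit_clauses, all_clauses):
    """Spec coverage |C_hit| / |C_total|; hits outside C_total are ignored."""
    total = set(all_clauses)
    if not total:
        raise ValueError("C_total must be non-empty")
    return len(set(hit_clauses) & total) / len(total)


def kill_rate(killed_mutants, all_mutants):
    """Kill rate |mut_killed| / |mut_all|; kills outside mut_all are ignored."""
    total = set(all_mutants)
    if not total:
        raise ValueError("mut_all must be non-empty")
    return len(set(killed_mutants) & total) / len(total)


def bank_passes(cov, kill, tau_cov, tau_kill):
    """Hypothetical gate: both ratios must reach their thresholds."""
    return cov >= tau_cov and kill >= tau_kill


# Illustrative identifiers only.
cov = cov_spec({"C1", "C2", "C3"}, {"C1", "C2", "C3", "C4"})
kill = kill_rate({"m1", "m2"}, {"m1", "m2", "m3", "m4", "m5"})
passed = bank_passes(cov, kill, tau_cov=0.7, tau_kill=0.5)
```

In this example the coverage gate is met but the kill-rate gate is not, so the bank as a whole fails.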

II. Terms & Symbols


III. Postulates & Minimal Equations

  1. P51-4 (Feasible-set & dimensional-conservation postulate)
    Any generated sample x_gen must lie in the feasible set X and pass dimensional checks check_dim(expr). If x + delta exits X, project back via proj_X(•) and record the overflow magnitude and correction path.
  2. P51-5 (Metamorphic invariance postulate)
    For any MR_k, there exists an invariant Inv_k such that
    Inv_k( x ) = Inv_k( MR_k(x) ).
    A violation constitutes an assertion failure and routes to falsification (see Chapter 5).
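A sample check of P51-5 could look like the sketch below. The metamorphic relation (list reversal) and both invariants are hypothetical stand-ins chosen only to show one relation that holds and one that fails; the real MR_k and Inv_k come from the project's cards.

```python
def violation_rate(samples, mr, inv):
    """Return (rate, violations) for the check Inv(x) == Inv(MR(x))."""
    if not samples:
        raise ValueError("need at least one sample")
    violations = [x for x in samples if inv(x) != inv(mr(x))]
    return len(violations) / len(samples), violations


def mr_reverse(x):
    """Hypothetical metamorphic relation: reverse the element order."""
    return list(reversed(x))


def inv_sum(x):
    """Order-independent quantity, so it is invariant under mr_reverse."""
    return sum(x)


def inv_first(x):
    """Order-dependent quantity, deliberately NOT invariant under mr_reverse."""
    return x[0]


samples = [[1, 2, 3], [4, 4], [5, 6]]
rate_ok, _ = violation_rate(samples, mr_reverse, inv_sum)
rate_bad, failing = violation_rate(samples, mr_reverse, inv_first)
```

Each entry in failing is a candidate counterexample to hand to the falsification flow; the comparison of rate_bad against a pre-registered threshold is left to the caller.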
  3. S52-5 (Adversarial budget & success criteria)
    Budget: || delta ||_p ≤ epsilon_lp, x_adv = proj_X( x + delta ).
    Success (choose one):
    • Label flip: argmax f( x_adv ) ≠ argmax f( x )
    • Loss delta: ΔL = L( y, f( x_adv ) ) - L( y, f( x ) ) ≥ tau_attack
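The budget, projection, and success tests of P51-4 and S52-5 can be sketched as follows. Several choices here are assumptions made for illustration: the feasible set X is taken to be a box [lo, hi]^n, the perturbation is rescaled radially onto the L_p ball (an exact projection only for p = 2 and p = inf), and all function names are hypothetical.

```python
import math


def lp_norm(delta, p):
    """L_p norm of a perturbation; p may be math.inf."""
    if p == math.inf:
        return max(abs(d) for d in delta)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)


def clip_to_budget(delta, p, eps):
    """Bring delta inside the ball ||delta||_p <= eps."""
    if p == math.inf:
        return [max(-eps, min(eps, d)) for d in delta]
    norm = lp_norm(delta, p)
    return list(delta) if norm <= eps else [d * eps / norm for d in delta]


def proj_box(x, lo, hi):
    """proj_X for a box X; also returns the overflow magnitude to record."""
    projected = [max(lo, min(hi, v)) for v in x]
    overflow = max(abs(a - b) for a, b in zip(x, projected))
    return projected, overflow


def make_adversarial(x, delta, p, eps, lo=0.0, hi=1.0):
    """x_adv = proj_X(x + delta) with ||delta||_p <= eps."""
    delta = clip_to_budget(delta, p, eps)
    return proj_box([a + d for a, d in zip(x, delta)], lo, hi)


def label_flip(scores_clean, scores_adv):
    """Success criterion 1: argmax f(x_adv) != argmax f(x)."""
    def argmax(s):
        return max(range(len(s)), key=s.__getitem__)
    return argmax(scores_adv) != argmax(scores_clean)


def loss_delta_success(loss_clean, loss_adv, tau_attack):
    """Success criterion 2: L(y, f(x_adv)) - L(y, f(x)) >= tau_attack."""
    return loss_adv - loss_clean >= tau_attack


x = [0.9, 0.5]
x_adv, overflow = make_adversarial(x, [0.3, -0.4], p=2, eps=0.25)
effective = lp_norm([a - b for a, b in zip(x_adv, x)], 2)
```

For a box and a starting point x inside it, clipping can only shrink each coordinate of the perturbation, so the effective perturbation x_adv - x stays within the budget after projection. Other feasible sets would need their own argument.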
  4. S52-6 (OOD distance & gate)
    With embedding z(•) and metric dist(•,•),
    d_ood(x) = min_{x' ∈ D_train} dist( z(x), z(x') ).
    Flag x as OOD when d_ood(x) ≥ eta_ood, where eta_ood is chosen on a development-set ROC curve to meet the target FPR/TPR.
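The gate in S52-6 can be illustrated with a brute-force nearest-neighbor search. This sketch assumes embeddings are already computed, takes dist to be Euclidean distance, and uses hypothetical names (split_by_ood, tune_eta); it is not the I50-16 implementation.

```python
import math


def d_ood(z_x, train_embeddings):
    """Minimum distance from z(x) to any training embedding."""
    return min(math.dist(z_x, z_t) for z_t in train_embeddings)


def split_by_ood(embeddings, train_embeddings, eta_ood):
    """Return sample indices split into in-distribution and OOD."""
    result = {"in": [], "ood": []}
    for i, z in enumerate(embeddings):
        key = "ood" if d_ood(z, train_embeddings) >= eta_ood else "in"
        result[key].append(i)
    return result


def tune_eta(dev_distances, dev_is_ood, target_fpr):
    """Smallest threshold whose FPR on the labeled dev set is <= target_fpr.

    FPR falls as eta rises, so the smallest admissible eta keeps TPR highest.
    """
    in_dist = [d for d, ood in zip(dev_distances, dev_is_ood) if not ood]
    if not in_dist:
        raise ValueError("dev set needs in-distribution samples")
    for eta in sorted(set(dev_distances)):
        fpr = sum(d >= eta for d in in_dist) / len(in_dist)
        if fpr <= target_fpr:
            return eta
    return math.inf  # no finite threshold meets the target


eta = tune_eta([0.1, 0.2, 0.3, 0.9, 1.2],
               [False, False, False, True, True], target_fpr=0.0)
split = split_by_ood([(0.1, 0.0), (3.0, 4.0)], [(0.0, 0.0), (1.0, 0.0)], eta)
```

The linear scan over D_train is only practical for small banks; an approximate nearest-neighbor index would be the usual substitute at scale.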
  5. S52-7 (Family sampling quotas)
    The sampling distribution pi_family satisfies Σ_j pi_family(j) = 1. During execution, the expected per-family quota is
    E[ n_j ] = pi_family(j) * N_total,
    and the mixture-quality gate is
    || pi_hat - pi_family ||_1 ≤ tau_mix, where pi_hat is the empirical family mix observed in the executed batch.
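The quota and mixture gate of S52-7 can be sketched as below. The family weights, N_total, and seed are illustrative values, and the helper names are hypothetical; the empirical mix is computed as the per-family share of the drawn batch.

```python
import random
from collections import Counter


def expected_quotas(pi_family, n_total):
    """E[n_j] = pi_family(j) * N_total, after checking the weights sum to 1."""
    if abs(sum(pi_family.values()) - 1.0) > 1e-9:
        raise ValueError("pi_family must sum to 1")
    return {j: p * n_total for j, p in pi_family.items()}


def sample_families(pi_family, n_total, seed):
    """Draw a family label per slot; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    names = sorted(pi_family)  # fixed order so the seed fully determines output
    weights = [pi_family[j] for j in names]
    return rng.choices(names, weights=weights, k=n_total)


def mixture_l1(drawn, pi_family):
    """L1 distance between the empirical mix and pi_family."""
    counts = Counter(drawn)
    n = len(drawn)
    return sum(abs(counts.get(j, 0) / n - p) for j, p in pi_family.items())


pi_family = {"D_base": 0.5, "D_adv": 0.3, "D_ood": 0.2}
quotas = expected_quotas(pi_family, 1000)
batch = sample_families(pi_family, 1000, seed=42)
gap = mixture_l1(batch, pi_family)
```

A caller would compare gap against tau_mix and resample or rebalance on failure; that policy is not specified in this chapter.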

IV. Data & Manifest Conventions


V. Algorithms & Implementation Bindings

  1. Prototypes (extending I50-*)
    • I50-3 generate_counterexamples(runtime:any, hypothesis:Hypothesis, ops:list, budget:dict) -> CEReport
    • I50-4 metamorphic_transform(x:any, MR:dict) -> x_prime:any
    • I50-5 adversarial_attack(runtime:any, x:any, method:str, eps:dict) -> AttackReport
    • I50-15 build_sample_bank(cards:dict, data:any) -> Bank (construct D_family and return indices)
    • I50-16 filter_ood(data:any, z:any, eta_ood:float) -> {in:list, ood:list}
    • I50-17 balance_sampler(bank:Bank, pi_family:dict, N:int, seed:int) -> Batch
    • I50-18 sign_bank(bank:Bank, anchor:str) -> {fingerprint:str, sig:str}
    • I50-19 verify_budget(batch:Batch, constraints:dict) -> BudgetReport
  2. Idempotency & exceptions
    With EnvLock, cards, seed, and anchor fixed, build_sample_bank must be idempotent: repeated runs yield the same fingerprint.
    Possible exceptions: E_SCHEMA_MISMATCH, E_DIMENSION_MISMATCH, E_PRIVACY_VIOLATION, E_RESOURCE_EXCEEDED, E_NONDETERMINISM.
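One way to make the idempotency requirement testable is to derive the fingerprint from a canonical serialization of the inputs. The sketch below is an assumption-laden illustration: the payload fields, the choice of SHA-256, and the use of HMAC as a stand-in signature are not specified by this chapter, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def bank_fingerprint(env_lock, cards, seed, anchor, sample_ids):
    """SHA-256 over a canonical JSON form, independent of key and sample order."""
    payload = {
        "env_lock": env_lock,
        "cards": cards,
        "seed": seed,
        "anchor": anchor,
        "samples": sorted(sample_ids),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign_fingerprint(fingerprint, key):
    """Stand-in signature: HMAC-SHA256 of the fingerprint with a secret key."""
    return hmac.new(key, fingerprint.encode("utf-8"), hashlib.sha256).hexdigest()


def check_idempotent(build, runs=3):
    """Call build() several times; raise if the fingerprints differ."""
    prints = {build() for _ in range(runs)}
    if len(prints) != 1:
        raise RuntimeError("E_NONDETERMINISM: fingerprint changed across runs")
    return prints.pop()


fp_a = bank_fingerprint("env-v1", {"a": 1, "b": 2}, 7, "anchor-0", ["s2", "s1"])
fp_b = bank_fingerprint("env-v1", {"b": 2, "a": 1}, 7, "anchor-0", ["s1", "s2"])
fp_c = bank_fingerprint("env-v1", {"a": 1, "b": 2}, 8, "anchor-0", ["s1", "s2"])
sig = sign_fingerprint(fp_a, b"demo-key")
```

Sorting keys and sample identifiers before hashing removes incidental ordering as a source of fingerprint drift, while any change to EnvLock, cards, seed, or anchor still changes the digest.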

VI. Metrology Flows & Run Diagram


VII. Verification & Test Matrix

  1. Compliance basics
    • Budget check: for randomly sampled x, verify that || delta ||_p ≤ epsilon_lp holds for every configured norm p and that x_adv ∈ X.
    • Metamorphic invariance: for each MR_k, sample-check Inv_k(x) = Inv_k(MR_k(x)); failure rate ≤ pre-registered threshold.
    • OOD threshold: on a labeled dev set, eta_ood achieves the target TPR at FPR = tau_fpr.
  2. Coverage & kill
    • Spec coverage: cov_spec ≥ tau_cov.
    • Kill rate: kill_rate ≥ tau_kill, reported with a confidence interval or bootstrap band at level 1 - delta (here delta denotes the miscoverage level, not a perturbation).
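A percentile bootstrap is one simple way to attach a band to kill_rate. The sketch below assumes per-mutant outcomes are available as 0/1 values (1 = killed); the resample count, seed, and function name are illustrative choices, and whether the gate applies to the point estimate or to the lower band edge is left to the test plan.

```python
import random


def bootstrap_band(outcomes, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap band for the mean of 0/1 kill outcomes."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample with replacement and recompute the kill rate each time.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - tail) * n_boot))
    return rates[lo_idx], rates[hi_idx]


outcomes = [1] * 80 + [0] * 20          # 80 of 100 mutants killed
point = sum(outcomes) / len(outcomes)   # kill_rate point estimate
lower, upper = bootstrap_band(outcomes, level=0.95)
```

Fixing the seed keeps the band itself reproducible, which matters when the report is part of a signed deliverable.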
  3. Stability & replay
    Across R = 3 replays, the fingerprint must be unchanged and the coefficient of variation of attack_success_rate must stay below tau_cv; after ts alignment, offline and online generation must satisfy delta_offon ≤ tau_offon.
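The replay check can be sketched as follows. The sketch assumes the coefficient of variation is the sample standard deviation (n - 1 denominator) divided by the mean, which this chapter does not specify, and the function names and example values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean


def replay_stable(fingerprints, success_rates, tau_cv):
    """True when all fingerprints match and the CV stays below tau_cv."""
    same_fingerprint = len(set(fingerprints)) == 1
    cv = coefficient_of_variation(success_rates)
    return same_fingerprint and cv < tau_cv, cv


# Three replays (R = 3) with illustrative values.
stable, cv = replay_stable(
    fingerprints=["abc123", "abc123", "abc123"],
    success_rates=[0.40, 0.42, 0.41],
    tau_cv=0.05,
)
```

With only three replays the CV estimate is coarse, so tau_cv is best read as a smoke-test bound rather than a precise stability measure.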

VIII. Cross-References & Dependencies

EFT.WP.Methods.Inference Chapters 4 (data & feature interfaces), 6 (online/offline consistency), 7 (uncertainty & calibration), 12 (acceptance & release).
Core.DataSpec (fields, dimensions, privacy & licensing), Core.Metrology (coverage, spectral & window conventions), Core.Errors (exception severity), Core.Threads (execution & concurrency).

IX. Risks, Limitations & Open Questions


X. Deliverables & Versioning

  1. Deliverables
    DataCard.yaml, AdvOpsCard.yaml, Bank.index.json (family & quota stats), Coverage.report, KillRate.report, OOD.eval, BudgetReport.json, Bank.sig.
  2. Versioning & changes
    • Any change to schema_id / preprocess / ops / epsilon / pi_family / eta_ood must increment at least the minor component of major.minor.patch; the fingerprint and anchor must be updated at the same time.
    • Link to TestPlan.version for backreference during acceptance and continuous falsification (see Chapter 12).
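The versioning rule can be enforced mechanically by diffing the tracked fields between two manifests. In this sketch the manifest is assumed to be a flat dictionary, and resetting patch to 0 on a minor bump follows common semantic-versioning practice rather than anything stated here; the function names are hypothetical.

```python
# Fields whose change requires at least a minor version increment.
TRACKED = ("schema_id", "preprocess", "ops", "epsilon", "pi_family", "eta_ood")


def changed_fields(old, new):
    """List the tracked fields whose values differ between two manifests."""
    return [k for k in TRACKED if old.get(k) != new.get(k)]


def bump_minor(version):
    """Increment minor in major.minor.patch and reset patch to 0."""
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def next_version(version, old, new):
    """Return the minimum acceptable version for the new manifest."""
    return bump_minor(version) if changed_fields(old, new) else version


old_manifest = {"schema_id": "s1", "epsilon": 0.1, "eta_ood": 0.9}
new_manifest = {"schema_id": "s1", "epsilon": 0.2, "eta_ood": 0.9}
version = next_version("1.4.2", old_manifest, new_manifest)
```

A release check could refuse to sign a bank whose declared version is lower than the value returned here.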

Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/