15-EFT.WP.Methods.Falsification v1.0
Chapter 9: Online Falsification & Gating
I. Scope & Objectives
- Define the runtime mechanics and gating strategy for online falsification, covering shadow evaluation, red-team streams and canary negative-sample injection, sequential/adaptive test coupling, OOD detection, online consistency via delta_offon and R_infer monitoring, and the state machine with rollback playbooks for GateDecision ∈ {pass, hold, block}.
- All online metrics and offline evidence are bound to the shared time base ts = alpha + beta * tau_mono and the locked environment EnvLock to guarantee traceability and forensic reproducibility.
- Gating must respect the risk budget rho_budget. It combines Chapter 8’s violation probability r = P( violation | D ) with the coverage budget delta_cov, and it minimizes both false blocks and missed blocks subject to alpha_sig, beta_err, and the online FDR target.
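The shared time base ts = alpha + beta * tau_mono is an affine map from a monotonic clock onto a common timeline. The text does not say how alpha and beta are obtained. The sketch below assumes they are fitted by ordinary least squares from paired anchor readings, each pairing a monotonic reading with a reference timestamp for the same event. The helper names fit_time_base and to_shared_time are hypothetical.

```python
from typing import Sequence, Tuple


def fit_time_base(tau_mono: Sequence[float], ts_ref: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ts_ref ~ alpha + beta * tau_mono from paired anchor readings.

    Hypothetical helper: the document defines the affine form, not the fitting method.
    """
    n = len(tau_mono)
    if n != len(ts_ref) or n < 2:
        raise ValueError("need at least two paired anchor readings")
    mean_t = sum(tau_mono) / n
    mean_s = sum(ts_ref) / n
    sxx = sum((t - mean_t) ** 2 for t in tau_mono)
    if sxx == 0.0:
        raise ValueError("tau_mono readings must not all be equal")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(tau_mono, ts_ref))
    beta = sxy / sxx                # clock-rate correction
    alpha = mean_s - beta * mean_t  # offset of the shared time base
    return alpha, beta


def to_shared_time(tau: float, alpha: float, beta: float) -> float:
    """Map a monotonic reading tau onto ts = alpha + beta * tau_mono."""
    return alpha + beta * tau
```

With anchors (0, 100.0), (10, 110.01) and (20, 120.02), the fit gives beta ≈ 1.001 (a 0.1% rate correction) and alpha ≈ 100.0.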
II. Terms & Symbols
- Streams & windows
- Observation sequence: {(x_t, y_hat_t, y_t, meta_t)}_{t=1..}; sliding window: W_t = {t - w + 1, ..., t}.
- Violation indicator: V_t ∈ {0,1}; windowed violation rate:
r_hat_t = ( 1/|W_t| ) * Σ_{i ∈ W_t} V_i.
- Online metrics
- TS.latency_t, TS.thrpt_t, TS.error_t; thresholds tau_latency, tau_thrpt_low, tau_error.
- Consistency:
delta_offon_t = ( norm( y_hat_off_t - y_hat_on_t ) / max( norm( y_hat_off_t ), eps ) );
R_infer_t = 1 - delta_offon_t.
- Risk & budgets
Violation probability r_t = P( violation | D_t ), or a conservative surrogate based on the windowed estimate r_hat_t (e.g., an upper confidence bound on it); global budget rho_budget; online significance allocation alpha_spend(t); online FDR target q_star.
- OOD & calibration
OOD(x_t) with threshold tau_ood; calibration errors ECE_t, MCE_t, NLL_t.
- Gating strategy
- Decision function:
GateDecision_t = g( r_t, TS.latency_t, TS.error_t, OOD(x_t), ECE_t ; policy ).
- State set and order: pass → hold → block. Decisions are monotone non-decreasing in this order, and rollback playbooks unwind them.
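The windowed rate r_hat_t and the consistency pair (delta_offon_t, R_infer_t) can both be computed incrementally, one observation at a time. This is a minimal sketch. It assumes |W_t| = min(t, w) during warm-up and the Euclidean norm for norm(•); the text fixes neither. The names WindowedViolationRate and offon_consistency are hypothetical.

```python
import math
from collections import deque
from typing import Sequence, Tuple


class WindowedViolationRate:
    """r_hat_t = (1/|W_t|) * sum_{i in W_t} V_i over W_t = {t - w + 1, ..., t}.

    Assumption: during warm-up (t < w), |W_t| is the number of observations seen so far.
    """

    def __init__(self, w: int) -> None:
        if w < 1:
            raise ValueError("window length w must be >= 1")
        self.window: deque = deque(maxlen=w)
        self.total = 0  # running sum of V_i inside the window

    def update(self, v_t: int) -> float:
        if v_t not in (0, 1):
            raise ValueError("V_t must be 0 or 1")
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # the oldest V_i is about to be evicted
        self.window.append(v_t)
        self.total += v_t
        return self.total / len(self.window)


def offon_consistency(y_off: Sequence[float], y_on: Sequence[float],
                      eps: float = 1e-12) -> Tuple[float, float]:
    """Return (delta_offon_t, R_infer_t), assuming norm(.) is the Euclidean norm."""
    if len(y_off) != len(y_on):
        raise ValueError("offline and online outputs must have the same length")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_off, y_on)))
    scale = max(math.sqrt(sum(a * a for a in y_off)), eps)  # guard against zero output
    delta = diff / scale
    return delta, 1.0 - delta
```

delta_offon_t is unbounded above, so R_infer_t becomes negative once delta_offon_t > 1. Thresholding delta_offon_t directly against tau_offon, as P51-16 does, avoids that edge case.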
III. Postulates & Minimal Equations
- P51-15 (Monotone states & minimal intervention)
For a fixed policy, GateDecision is monotone non-decreasing in the triggering evidence; among feasible interventions under rho_budget, select the lowest-cost action first (observe → rate-limit → block).
- P51-16 (Consistency-first postulate)
If delta_offon_t > tau_offon, prioritize consistency restoration over performance tuning and escalate GateDecision by at least one level (e.g., pass → hold).
- P51-17 (Additive risk & budget conservation)
For independent sub-streams k = 1..K, the per-stream risk budgets satisfy Σ_k rho_budget_k ≤ rho_budget; online significance spending obeys Σ_{i=1..t} alpha_i ≤ alpha_total.
- S52-37 (EWMA violation rate)
Z_t = lambda * V_t + ( 1 - lambda ) * Z_{t-1}, with lambda ∈ (0,1]; trigger an alarm if Z_t ≥ h.
- S52-38 (Sequential Probability Ratio Test online)
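With H0: r ≤ r0 and H1: r ≥ r1 (r0 < r1), the likelihood ratio is updated recursively:
LR_t = Π_{i=1..t} ( p_1(V_i) / p_0(V_i) ) = LR_{t-1} * p_1(V_t) / p_0(V_t), where p_j(v) = r_j^v * ( 1 - r_j )^(1 - v) for the Bernoulli indicator V_t. Decisions:
- LR_t ≥ A → reject H0
- LR_t ≤ B → accept H0
- B < LR_t < A → continue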
where A = (1 - beta_err) / alpha_sig, B = beta_err / (1 - alpha_sig).
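- S52-39 (Online significance allocation)
alpha_spent(t) = Σ_{i=1..t} alpha_i ≤ alpha_total; examples:
Type I: alpha_i = w_i * alpha_total with Σ_i w_i ≤ 1;
Type II: alpha_i = min( alpha_cap, c / (i + d) ), additionally capped at the remaining budget alpha_total - alpha_spent(i - 1). The cap is required because Σ_i c / (i + d) diverges and would otherwise exhaust any finite alpha_total.
- S52-40 (Online FDR constraint)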
With cumulative rejections R_t and false rejections F_t (distinct from the violation indicator V_t), enforce
FDR_t = E[ F_t / max( R_t, 1 ) ] ≤ q_star
via adaptive gating that updates alpha_i ← f(history).
- S52-41 (Gating thresholds & decisions)
Based on Chapter 8 S52-36:
- r_t ≥ tau_block → GateDecision_t = block
- tau_hold ≤ r_t < tau_block → GateDecision_t = hold
- r_t < tau_hold → GateDecision_t = pass
Augmentation: if TS.error_t ≥ tau_error or TS.latency_t ≥ tau_latency or OOD(x_t) ≥ tau_ood, then
GateDecision_t ← max( GateDecision_t, hold ).
- S52-42 (Online coverage & audit consistency)
With coverage estimator
cov_hat_t = ( 1/|W_t| ) * Σ_{i∈W_t} 1[ y_i ∈ Pi(x_i) ],
trigger hold/block when
cov_hat_t < 1 - delta_cov - tau_cov.
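The SPRT recursion of S52-38 and the spending rules of S52-39 can be sketched together, under the following assumptions:
- The names SprtState, sprt_step and next_alpha are hypothetical. Their signatures are flattened and do not follow the I50-24 and I50-22 prototypes.
- The likelihood ratio is accumulated as log LR_t to avoid overflow.
- The defaults for alpha_cap, c and d are illustrative only.
- Each alpha_i is clipped at the remaining budget, so alpha_spent(t) ≤ alpha_total holds even for the Type II schedule.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SprtState:
    log_lr: float = 0.0  # log LR_t, accumulated recursively
    n: int = 0           # observations consumed since the last reset


def sprt_step(v_t: int, state: SprtState, r0: float, r1: float,
              alpha_sig: float, beta_err: float) -> str:
    """One Wald SPRT update on a Bernoulli violation indicator V_t, in log space."""
    if not 0.0 < r0 < r1 < 1.0:
        raise ValueError("require 0 < r0 < r1 < 1")
    log_a = math.log((1.0 - beta_err) / alpha_sig)  # log A
    log_b = math.log(beta_err / (1.0 - alpha_sig))  # log B
    # log(p_1(V_t) / p_0(V_t)) with p_j(v) = r_j**v * (1 - r_j)**(1 - v)
    if v_t == 1:
        state.log_lr += math.log(r1 / r0)
    else:
        state.log_lr += math.log((1.0 - r1) / (1.0 - r0))
    state.n += 1
    if state.log_lr >= log_a:
        return "reject_H0"
    if state.log_lr <= log_b:
        return "accept_H0"
    return "continue"


def next_alpha(i: int, alpha_spent_prev: float, alpha_total: float, scheme: str,
               weights: Sequence[float] = (), alpha_cap: float = 0.01,
               c: float = 0.01, d: float = 1.0) -> float:
    """alpha_i for test i (1-based) under the Type I / Type II schemes."""
    remaining = max(alpha_total - alpha_spent_prev, 0.0)
    if scheme == "type_i":
        if sum(weights) > 1.0 + 1e-12:
            raise ValueError("Type I weights must sum to at most 1")
        raw = weights[i - 1] * alpha_total if i <= len(weights) else 0.0
    elif scheme == "type_ii":
        raw = min(alpha_cap, c / (i + d))
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    # Clipping keeps alpha_spent(t) <= alpha_total even though sum c/(i+d) diverges.
    return min(raw, remaining)
```

After a terminal action ("reject_H0" or "accept_H0"), the caller would reset SprtState before monitoring the next segment. With r0 = 0.01, r1 = 0.05, alpha_sig = 0.05 and beta_err = 0.1, two consecutive violations reject H0, while 55 consecutive clean observations accept it.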
IV. Data & Manifest Conventions
- OnlineProbe.card
{topic, source, sample_rate, window:w, lambda, h, tau_error, tau_latency, tau_offon, tau_ood, alpha_total, q_star, anchor, EnvLock}.
- Shadow.card
{shadow_graph, traffic_fraction, golden_set_hash, oracle, Cal.sig, Gate.policy}.
- Canary.card
{mixing_rate, budget.cpu/gpu/mem, mut_ops, adversarial(eps), schedule, safety_constraints}.
- Gate.policy
{tau_hold, tau_block, tau_error, tau_latency, tau_thrpt_low, tau_offon, tau_ood, alpha_spend, rollback_playbook}.
- Audit outputs
{gate_audit.log, decisions.parquet, lr_trace.csv, ewma.csv, coverage_online.csv, alarms.json, fingerprint, hash(•)}.
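The Gate.policy manifest can be mirrored by a typed record so that malformed policies fail at load time rather than at gating time. This sketch assumes:
- thresholds are numeric;
- alpha_spend is given as a scheme name;
- rollback_playbook maps a level (hold or block) to an ordered list of action names.

The class name GatePolicy and the validation rules in __post_init__ are also assumptions, not part of the manifest specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class GatePolicy:
    """Typed mirror of the Gate.policy card; field names follow the manifest."""
    tau_hold: float
    tau_block: float
    tau_error: float
    tau_latency: float
    tau_thrpt_low: float
    tau_offon: float
    tau_ood: float
    alpha_spend: str  # assumed: name of the spending scheme, e.g. "type_ii"
    rollback_playbook: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed sanity rules; the manifest itself does not state them.
        if not 0.0 <= self.tau_hold <= self.tau_block <= 1.0:
            raise ValueError("require 0 <= tau_hold <= tau_block <= 1")
        if self.tau_offon < 0.0:
            raise ValueError("tau_offon must be non-negative")
        unknown = set(self.rollback_playbook) - {"hold", "block"}
        if unknown:
            raise ValueError(f"playbook levels must be 'hold' or 'block', got {sorted(unknown)}")
```

Freezing the record keeps a loaded policy immutable. A retuned threshold then produces a new object, which fits the versioning rule that threshold retuning is a distinct (minor) change.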
V. Algorithms & Implementation Bindings
- Prototypes (extending I50-*)
- I50-19 gate_decide(r:float, ts:dict, ood:float, calib:dict, policy:dict) -> {decision:str, reason:dict}
- I50-20 shadow_eval(runtime:any, stream:any, oracle:any, policy:dict) -> ShadowReport
- I50-21 canary_inject(stream:any, ce_source:any, rate:float, budget:dict) -> CanaryRun
- I50-22 alpha_spend_scheduler(history:any, scheme:str, params:dict) -> alpha_i
- I50-23 ewma_drift(V_t:int, lambda:float, Z_prev:float) -> {Z_t:float, alarm:bool}
- I50-24 stream_sprt(V_t:int, state:dict, r0:float, r1:float, alpha:float, beta:float) -> {state, action:str}
- I50-25 rollback_execute(playbook:dict, level:str) -> Result
- State machine (overview)
- pass — monitor & log only.
- hold — degrade & rate-limit; expand shadow evaluation and canary coverage; tighten alpha_i and raise delta_cov adjustment.
- block — block high-risk transactions; switch to safe baseline or read-only mode; invoke rollback_execute.
- Decision explanation fields
{trigger ∈ {risk, latency, error, ood, offon, coverage}, metric_value, threshold, alpha_spent, lr_or_ewma_state}.
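The state machine and the explanation fields can be combined in one decision function. This is a minimal sketch loosely modeled on the I50-19 prototype. The name decide_gate, its flattened arguments (including delta_offon) and the dictionary keys are hypothetical. The function applies:
- the S52-41 risk thresholds;
- the hold-level augmentation;
- the one-level offon escalation of P51-16.

It reports only the last check that raised the decision level, and it omits coverage and calibration triggers for brevity.

```python
from enum import IntEnum
from typing import Any, Dict, Optional, Tuple


class Gate(IntEnum):
    # Integer order encodes pass < hold < block, so comparisons implement escalation.
    PASS = 0
    HOLD = 1
    BLOCK = 2


def decide_gate(r: float, ts: Dict[str, float], ood: float, delta_offon: float,
                policy: Dict[str, float], alpha_spent: float = 0.0,
                stat_state: Optional[Dict[str, Any]] = None) -> Tuple[Gate, Dict[str, Any]]:
    """Return (GateDecision_t, reason) following S52-41 and P51-16."""
    trigger: Optional[Tuple[str, float, float]] = None  # (trigger, metric_value, threshold)

    # S52-41: risk thresholds.
    if r >= policy["tau_block"]:
        decision, trigger = Gate.BLOCK, ("risk", r, policy["tau_block"])
    elif r >= policy["tau_hold"]:
        decision, trigger = Gate.HOLD, ("risk", r, policy["tau_hold"])
    else:
        decision = Gate.PASS

    # S52-41 augmentation: error, latency or OOD raise the decision to at least hold.
    for name, value, threshold in (("error", ts["error"], policy["tau_error"]),
                                   ("latency", ts["latency"], policy["tau_latency"]),
                                   ("ood", ood, policy["tau_ood"])):
        if value >= threshold and decision < Gate.HOLD:
            decision, trigger = Gate.HOLD, (name, value, threshold)

    # P51-16: an offon breach escalates by one level, capped at block.
    if delta_offon > policy["tau_offon"]:
        escalated = Gate(min(decision + 1, Gate.BLOCK))
        if escalated > decision:
            decision, trigger = escalated, ("offon", delta_offon, policy["tau_offon"])

    reason = {
        "trigger": trigger[0] if trigger else None,
        "metric_value": trigger[1] if trigger else None,
        "threshold": trigger[2] if trigger else None,
        "alpha_spent": alpha_spent,
        "lr_or_ewma_state": stat_state or {},
    }
    return decision, reason
```

For example, r_t = 0.10 with tau_hold = 0.05 yields hold on its own. Adding an offon breach escalates that decision to block and reports offon as the trigger.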
VI. Metrology Flows & Run Diagram
- Mx-66 Online probes & violation aggregation
Collect V_t and TS.*; compute r_hat_t, Z_t, ECE_t, and OOD(x_t); update alpha_spend(t) and GateDecision_t; persist audit logs.
- Mx-67 Shadow evaluation & canary negatives
Route traffic_fraction of traffic to shadow_graph; inject canary and adversarial samples at mixing_rate; periodically replay the golden set identified by golden_set_hash; emit a synchronized ShadowReport.
- Mx-68 Online consistency & rollback
Compute delta_offon_t and R_infer_t; when thresholds are exceeded, escalate to hold/block and call rollback_execute(playbook); after rollback, verify that cov_hat_t and TS.* have recovered.
- Mx-69 Gating–audit closed loop
For each decision, record the reason and snapshot the active thresholds; compute the daily rolling false-block rate, missed-block rate, detection delay, and alpha_spent(t); bundle them into the Evidence.bundle.
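The Mx-69 audit metrics need explicit definitions before they can be computed. This sketch assumes retrospective ground-truth violation labels for each decision and uses these definitions:
- false-block rate: the fraction of non-violating items that were blocked;
- missed-block rate: the fraction of violating items that were not blocked;
- detection delay: the number of steps from a known change point to the first hold or block.

These definitions and the function names are assumptions. A daily rolling computation would call them on each day's slice of decisions.

```python
from typing import Dict, Optional, Sequence


def gating_error_rates(decisions: Sequence[str], violated: Sequence[bool]) -> Dict[str, float]:
    """False-block and missed-block rates for one audit window (assumed definitions)."""
    if len(decisions) != len(violated):
        raise ValueError("decisions and labels must align")
    benign = [d for d, v in zip(decisions, violated) if not v]
    harmful = [d for d, v in zip(decisions, violated) if v]
    # false-block: blocked although no violation occurred
    false_block = sum(d == "block" for d in benign) / len(benign) if benign else 0.0
    # missed-block: a violation occurred but the item was not blocked
    missed_block = sum(d != "block" for d in harmful) / len(harmful) if harmful else 0.0
    return {"false_block": false_block, "missed_block": missed_block}


def detection_delay(decisions: Sequence[str], change_index: int) -> Optional[int]:
    """Steps from a known change point to the first hold or block; None if none occurs."""
    for t in range(change_index, len(decisions)):
        if decisions[t] in ("hold", "block"):
            return t - change_index
    return None
```

Counting hold as a miss in gating_error_rates is a deliberate strict choice. A policy that treats hold as partial mitigation would need a weighted variant.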
VII. Verification & Test Matrix
- Power & latency
- Simulate a step change in the violation rate: measure the SPRT mean stopping time and realized power; require realized power ≥ 1 - beta_err.
- For EWMA, the detection delay for small drifts must be below T_detect.
- FDR & significance budgets
- On live streams, estimate an upper confidence bound for FDR_t and require that bound to be at most q_star + margin.
- Ensure that alpha_spent(t) stays within budget: Σ_{i=1..t} alpha_i ≤ alpha_total.
- Reliability & coverage
- Real-time ECE_t ≤ ECE_target and cov_hat_t ≥ 1 - delta_cov - tau_cov.
- On OOD sub-streams, the escalation rate of GateDecision matches policy expectations, and the false-block rate for critical traffic is below rho_budget.
- Performance & rollback
After hold/block, TS.latency and TS.error return below their thresholds within T_recover; rollbacks are repeatable and idempotent.
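The EWMA detection-delay requirement under Power & latency can be checked by simulation. This is a minimal sketch of S52-37 plus a step-change harness, with these assumptions:
- The names ewma_step and ewma_detection_delay are hypothetical.
- Z_0 is initialized at the pre-change rate, since the text does not specify Z_0.
- False alarms before the change point are ignored rather than counted.

```python
import random
from typing import Optional, Tuple


def ewma_step(v_t: int, z_prev: float, lam: float, h: float) -> Tuple[float, bool]:
    """S52-37: Z_t = lam * V_t + (1 - lam) * Z_{t-1}; alarm when Z_t >= h."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lambda must lie in (0, 1]")
    z_t = lam * v_t + (1.0 - lam) * z_prev
    return z_t, z_t >= h


def ewma_detection_delay(r_before: float, r_after: float, change_at: int, horizon: int,
                         lam: float, h: float, seed: int = 0) -> Optional[int]:
    """Simulate Bernoulli violations with a step change at change_at.

    Returns the number of steps from the change to the first alarm, or None if no
    alarm fires before the horizon. Z_0 = r_before is an assumption; alarms raised
    before change_at (false alarms) are ignored here.
    """
    rng = random.Random(seed)
    z = r_before
    for t in range(horizon):
        rate = r_before if t < change_at else r_after
        v_t = 1 if rng.random() < rate else 0
        z, alarm = ewma_step(v_t, z, lam, h)
        if alarm and t >= change_at:
            return t - change_at
    return None
```

Take r_before = 0, r_after = 1, lambda = 0.2 and h = 0.5. Z climbs 0.2, 0.36, 0.488, 0.5904, so the alarm fires on the fourth post-change step (delay 3). For stochastic rates, average the delay over many seeds and compare the result with T_detect.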
VIII. Cross-References & Dependencies
- Depends on: Chapter 7 (SPRT, alpha-spending, FDR), Chapter 8 (r, Pi(x), cov_hat, extended gating), Chapter 6 (injection & operator orchestration).
- References: Core.Threads (routing & orchestration), Core.Metrology (online metrics), Core.Errors (error types & thresholds).
IX. Risks, Limitations & Open Questions
- Risks & limitations
Non-exchangeable streams bias online statistics; interference between canary and shadow paths; OOD false alarms cause over-gating; rate-limit and blocking perturb TS.thrpt; misallocated alpha and coverage budgets lead to budget exhaustion.
- Open questions
Game-theoretic optimization of multi-policy gating (cost–risk–latency); migration and adaptation of Gate.policy across domains; unified schedulers for online FDR and coverage budgets; self-bootstrapping adversarial streams with closed-loop learning.
X. Deliverables & Versioning
- Deliverables
OnlineProbe.card, Shadow.card, Canary.card, Gate.policy, gate_audit.log, decisions.parquet, lr_trace.csv, ewma.csv, coverage_online.csv, Evidence.bundle (with hash(•), fingerprint).
- Versioning policy
- Threshold retuning and budget reallocation → minor bump.
- State machine or decision function g(•) changes → major bump.
- Any change to an audit-field schema requires a signature refresh and registration in the Appendix C registry.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/