18-EFT.WP.Methods.CrossStats v1.0 | Chapter 15 — Use Cases & Reference Implementations

Chapter 15 — Use Cases & Reference Implementations

One-line objective: Demonstrate three end-to-end paradigms—offline evaluation, online A/B (sequential/multiplicity), and cross-domain calibration transfer—to operationalize P30x/S30x/M30/I30, forming an auditable, reproducible, and rollback-capable statistical practice loop.

I. Scope & Targets

Scope
Offline model evaluation and release gates; online experimentation (sequential/multiple) with closed-loop decisioning; cross-domain calibration transfer and baseline updates.
Targets
- Inputs: D_clean, manifest.*, ref, slo_policy, alpha_budget, bins, sync_ref.
- Outputs: eval_report, ab_decision, calibration_map, drift_report, slo_attainment, manifest.stats.*.
- Constraints: check_dim(expr) passes; metric windows computed on tau_mono; if T_arr is involved, record both formulations in parallel and persist delta_form.

II. Terms & Symbols

Data & weights: (x_i, y_i), w_i = 1 / pi(i), W_norm = ( ∑ w_i ) / N.
Intervals & power: alpha, beta, power = 1 - beta, MDE.
Calibration: f_cal(z), ECE, Brier.
Drift: W1, KL, psi, drift_level, drift_slope.
Arrival time: T_arr, c_ref, gamma(ell), d ell, delta_form.
Latency: latency_ms_p50/p95/p99, staleness.

III. Axioms P315-*

P315-1 (Reproducibility Chain): Every step of evaluation/experimentation/transfer must be reproducible from repro_hash and the corresponding manifest.
P315-2 (Weighted Consistency): Whenever sampling/exposure bias exists, use weighting or propensity adjustments and declare W_norm.
P315-3 (Time-Base Alignment): Compute within tau_mono; publish on ts with offset/skew/J.
P315-4 (Dual Formulations in Parallel): When T_arr appears, record both formulations and delta_form.
P315-5 (Contracts First): All deliverables must pass C30-* contracts to clear release gates.
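
To make P315-1 concrete, a repro_hash could be derived from a canonical serialization of the manifest. The sketch below is an assumption, not the scheme defined by this volume: the source does not specify a hash algorithm or a manifest schema, so SHA-256 over sorted-key JSON and the example field names are hypothetical.

```python
import hashlib
import json


def repro_hash(manifest: dict) -> str:
    """Hash a manifest deterministically (assumed scheme: SHA-256 of canonical JSON)."""
    # sort_keys and fixed separators make the bytes independent of dict ordering.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest fields, for illustration only.
m1 = {"dataset": "D_clean", "seed": 7, "bins": 10}
m2 = {"bins": 10, "seed": 7, "dataset": "D_clean"}
print(repro_hash(m1) == repro_hash(m2))  # True: key order does not change the hash
```

Any change to a recorded value (for example a different seed) yields a different hash, which is what lets an auditor tie a result back to exactly one manifest.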

IV. Minimal Equations S315-*

S315-1 (Weighted Mean/Variance)
- hat{mu}_w = ( ∑ w_i y_i ) / ( ∑ w_i )
- hat{sigma}_w^2 = ( ∑ w_i ( y_i - hat{mu}_w )^2 ) / ( ∑ w_i )
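
S315-1 and the W_norm definition from Section II can be checked with a few lines of plain Python. This is a minimal sketch; the function names are illustrative and are not the I30-* bindings.

```python
from typing import Sequence, Tuple


def weighted_mean_var(y: Sequence[float], w: Sequence[float]) -> Tuple[float, float]:
    """Return (hat{mu}_w, hat{sigma}_w^2) as defined in S315-1."""
    if len(y) != len(w) or len(y) == 0:
        raise ValueError("y and w must be non-empty and of equal length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mu = sum(wi * yi for wi, yi in zip(w, y)) / total
    var = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / total
    return mu, var


def w_norm(w: Sequence[float]) -> float:
    """W_norm = (sum of w_i) / N."""
    return sum(w) / len(w)


# Inverse-probability weights w_i = 1 / pi(i) for illustrative inclusion probabilities.
pi = [0.5, 1.0, 0.5]
w = [1.0 / p for p in pi]
print(weighted_mean_var([1.0, 2.0, 3.0], w), w_norm(w))
```

In this example W_norm is 5/3, so the raw weights would need rescaling (dividing each w_i by W_norm) before a check such as | W_norm - 1 | ≤ 0.01 could pass; the rescaling leaves hat{mu}_w and hat{sigma}_w^2 unchanged.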
S315-2 (Two-Arm Sample Size Approximation)
n_per_arm ≈ ( ( z_{1 - alpha/2} + z_{power} )^2 * 2 * sigma^2 ) / MDE^2
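
The approximation in S315-2 can be evaluated with the standard library alone. In this sketch z_{power} is taken as the standard normal quantile at probability power = 1 - beta, and the result is rounded up to a whole sample; the function name mirrors the symbol and is illustrative.

```python
import math
from statistics import NormalDist


def n_per_arm(alpha: float, power: float, sigma: float, mde: float) -> int:
    """Two-arm sample size per S315-2, rounded up to an integer."""
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie in (0, 1)")
    if sigma <= 0 or mde <= 0:
        raise ValueError("sigma and MDE must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    z_power = NormalDist().inv_cdf(power)          # z_{power}
    return math.ceil((z_alpha + z_power) ** 2 * 2 * sigma ** 2 / mde ** 2)


print(n_per_arm(alpha=0.05, power=0.80, sigma=1.0, mde=0.1))  # 1570
```

Because MDE enters as a squared denominator, halving the MDE roughly quadruples the required sample per arm.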
S315-3 (ECE)
ECE = ∑_{b=1..B} ( n_b / N ) * | acc_b - conf_b |
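
A direct reading of S315-3 is sketched below. The binning rule is an assumption: the source only names bins, so this example uses B equal-width bins on [0, 1], takes conf_b as the mean confidence in bin b and acc_b as the fraction of correct predictions in bin b, and skips empty bins.

```python
from typing import Sequence


def ece(conf: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """ECE = sum over bins of (n_b / N) * |acc_b - conf_b|, equal-width bins assumed."""
    if len(conf) != len(correct) or len(conf) == 0:
        raise ValueError("conf and correct must be non-empty and of equal length")
    n = len(conf)
    sum_conf = [0.0] * n_bins
    sum_acc = [0.0] * n_bins
    counts = [0] * n_bins
    for c, ok in zip(conf, correct):
        b = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 falls in the last bin
        sum_conf[b] += c
        sum_acc[b] += ok
        counts[b] += 1
    total = 0.0
    for b in range(n_bins):
        if counts[b] == 0:
            continue
        acc_b = sum_acc[b] / counts[b]
        conf_b = sum_conf[b] / counts[b]
        total += (counts[b] / n) * abs(acc_b - conf_b)
    return total


# Four predictions at 0.9 confidence, three of them correct: gap |0.75 - 0.9|.
print(ece([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```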
S315-4 (Sequential Stopping)
tau = inf { t : S_t ≥ h_upper or S_t ≤ h_lower } (with S_t the cumulative log-likelihood ratio).
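
The stopping rule in S315-4 can be illustrated with a Bernoulli metric. The source does not say how h_upper and h_lower are chosen; the sketch below assumes Wald's classical SPRT boundaries, h_upper = ln((1 - beta) / alpha) and h_lower = ln(beta / (1 - alpha)), and the function name is hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_bernoulli(stream: Iterable[int], p0: float, p1: float,
                   alpha: float = 0.05, beta: float = 0.20
                   ) -> Tuple[Optional[int], str, float]:
    """Return (tau, decision, S_tau); tau is None if no boundary was crossed."""
    h_upper = math.log((1 - beta) / alpha)   # assumed Wald boundary
    h_lower = math.log(beta / (1 - alpha))   # assumed Wald boundary
    step_success = math.log(p1 / p0)
    step_failure = math.log((1 - p1) / (1 - p0))
    s_t = 0.0  # cumulative log-likelihood ratio S_t
    for t, x in enumerate(stream, start=1):
        s_t += step_success if x else step_failure
        if s_t >= h_upper:
            return t, "accept_H1", s_t
        if s_t <= h_lower:
            return t, "accept_H0", s_t
    return None, "continue", s_t


# A run of successes crosses the upper boundary at the 16th observation.
print(sprt_bernoulli([1] * 100, p0=0.5, p1=0.6))
```

Returning "continue" when the stream ends without a crossing keeps the decision explicit, so a caller cannot mistake an exhausted stream for acceptance of either hypothesis.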
S315-5 (Drift Metrics)
W1(p,q), KL(p||q), psi = ∑ ( (q_i - p_i) * ln( q_i / p_i ) ) (binned).
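
For binned distributions the three drift metrics in S315-5 reduce to short sums. The sketch assumes equal-width bins (so W1 is the bin width times the summed absolute CDF difference) and adds a small smoothing constant eps to avoid log(0); both choices are assumptions, not requirements stated in the source.

```python
import math
from itertools import accumulate
from typing import List, Sequence


def to_probs(counts: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Convert bin counts to probabilities; eps is an assumed smoothing constant."""
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p||q) for strictly positive binned probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def psi(p: Sequence[float], q: Sequence[float]) -> float:
    """psi = sum (q_i - p_i) * ln(q_i / p_i), as in S315-5."""
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def w1_binned(p: Sequence[float], q: Sequence[float], bin_width: float = 1.0) -> float:
    """W1 on equal-width bins: area between the two cumulative distributions."""
    return bin_width * sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


ref = to_probs([50, 30, 20])
cur = to_probs([20, 30, 50])
print({"W1": w1_binned(ref, cur), "KL": kl(ref, cur), "psi": psi(ref, cur)})
```

A useful sanity check is that psi equals KL(p||q) + KL(q||p), which follows from expanding the sum term by term.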
S315-6 (Arrival-Time Gap)
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |
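
With a constant c_ref the two formulations in S315-6 are algebraically identical, so a nonzero delta_form reflects numerical effects only (quadrature and floating-point rounding). The sketch below evaluates both with a trapezoidal rule on a sampled path; the sample values, the quadrature rule, and the function names are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def trapezoid(values: Sequence[float], grid: Sequence[float]) -> float:
    """Trapezoidal approximation of the integral of values over grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))


def delta_form(n_eff: Sequence[float], ell: Sequence[float],
               c_ref: float) -> Tuple[float, float, float]:
    """Return (delta_form, form_a, form_b) per S315-6."""
    if len(n_eff) != len(ell) or len(ell) < 2:
        raise ValueError("need at least two matching samples of n_eff and ell")
    form_a = (1.0 / c_ref) * trapezoid(n_eff, ell)          # constant pulled outside
    form_b = trapezoid([n / c_ref for n in n_eff], ell)     # constant kept inside
    return abs(form_a - form_b), form_a, form_b


# Illustrative path samples; c_ref = 2.0 is a placeholder, not a physical value.
gap, a, b = delta_form([1.0, 1.1, 1.2], [0.0, 1.0, 2.0], c_ref=2.0)
print(gap, a, b)
```

Persisting both form_a and form_b alongside the gap, rather than the gap alone, keeps the record useful if the tolerance is revised later.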

V. Metrology Flow M30-15 (Three End-to-End Use Cases)

Use Case A: Offline Evaluation & Release Gate (Batch → Freeze)
- Ready
  1. Run standardize_names and repair_units (see Methods.Cleaning).
  2. time_align_for_stats(ds, sync_ref); compute_weights(ds, scheme).
- Estimate
  1. fit_glm(ds, formula, family) or import model scores.
  2. bootstrap_metric(fn, ds, B) to produce {est, CI}; calibration_report(pred, obs, bins).
- Check
  detect_drift(ref, cur, metrics); evaluate_stat_contracts(metrics, rules); if needed, backtest_coverage(ds, plan).
- Persist
  emit_stats_manifest(results, policy); coordinate with Methods.Cleaning freeze_release(ds, tag) for artifact release.
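
The Estimate step of Use Case A produces {est, CI} by bootstrapping. The following is a minimal stand-in under stated assumptions, not the bootstrap_metric binding itself: it uses a percentile interval, a fixed seed for reproducibility, and a hypothetical function name and return shape.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence


def bootstrap_ci(fn: Callable[[Sequence[float]], float], data: Sequence[float],
                 B: int = 1000, level: float = 0.95, seed: int = 0) -> Dict[str, object]:
    """Percentile bootstrap: resample with replacement B times and take quantiles."""
    if len(data) == 0 or B < 2:
        raise ValueError("need non-empty data and B >= 2")
    rng = random.Random(seed)  # a fixed seed keeps the result reproducible
    n = len(data)
    stats = sorted(fn([data[rng.randrange(n)] for _ in range(n)]) for _ in range(B))
    lo_idx = int((1 - level) / 2 * (B - 1))
    hi_idx = int((1 + level) / 2 * (B - 1))
    return {"est": fn(data), "ci": (stats[lo_idx], stats[hi_idx]), "B": B, "seed": seed}


result = bootstrap_ci(mean, [float(v) for v in range(100)], B=1000)
print(result)
```

Recording B and the seed in the returned dictionary is one way to let the manifest reproduce the interval exactly, in the spirit of P315-1.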
Use Case B: Online A/B & Sequential Decisions (Stream → Decide → Rollback/Ship)
- Ready
  design_experiment(pop, constraints, alpha, power); register alpha_budget and slo_policy.
- Run
  run_ab_test(stream, metric, alpha_spending) to emit real-time S_t and interim decisions; track_alpha_spending(seq_tests).
- Guard
  drift_monitor(ref, cur, methods); latency_summary(traces); when exposure bias is detected, estimate_ate(ds, method=DR).
- Close
  compute_slo_attainment(metrics, slo); audit_decision(trace, manifest); on violation, execute rollback plan and re-experiment.
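
Use Case B passes an alpha_spending argument and tracks spend against alpha_budget, but the source does not name a spending function. As one possible choice, the sketch below uses the Lan-DeMets O'Brien-Fleming-type function, alpha(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t))), where t is the information fraction in (0, 1]; the function names are hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def obf_alpha_spent(t: float, alpha_budget: float) -> float:
    """Cumulative alpha spent at information fraction t (O'Brien-Fleming-type)."""
    if not 0 < t <= 1:
        raise ValueError("information fraction t must lie in (0, 1]")
    z = NormalDist().inv_cdf(1 - alpha_budget / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))


def spending_schedule(looks: Sequence[float],
                      alpha_budget: float) -> List[Tuple[float, float, float]]:
    """Return (t, cumulative alpha, incremental alpha) for each planned look."""
    cumulative = [obf_alpha_spent(t, alpha_budget) for t in looks]
    increments = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    return list(zip(looks, cumulative, increments))


for t, cum, inc in spending_schedule([0.25, 0.5, 0.75, 1.0], alpha_budget=0.05):
    print(f"t={t:.2f} cumulative={cum:.5f} increment={inc:.5f}")
```

This function spends very little alpha at early looks and reaches exactly alpha_budget at t = 1, so the invariant alpha_spent ≤ alpha_budget holds at every interim analysis by construction.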
Use Case C: Cross-Domain Calibration Transfer & Baseline Update (Domain A → Domain B)
- Ready
  Collect samples from A/B; harmonize units via repair_units and align time via time_align_for_stats.
- Transfer
  calibration_transfer(src=A, dst=B, method ∈ {Platt, Isotonic, BBQ}) -> map; enforce monotonicity and guard against overfitting.
- Validate
  On domain B, evaluate ECE, Brier before/after; detect_drift to ensure W1/KL/psi stay within thresholds.
- Publish
  emit_stats_manifest to manifest.stats.calibration.*, including map.version, bins, ECE_before/after, then sign and archive.
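
Of the methods listed for the Transfer step, isotonic calibration enforces monotonicity directly. The sketch below fits a non-decreasing step map with the pool-adjacent-violators algorithm on labelled samples from the destination domain; it is an illustrative stand-in for calibration_transfer, and the function names and the step-function lookup are assumptions.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def pava(y: Sequence[float], w: Optional[Sequence[float]] = None) -> List[float]:
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit to y."""
    weights = [1.0] * len(y) if w is None else list(w)
    blocks: List[List[float]] = []  # each block is [mean, weight, count]
    for yi, wi in zip(y, weights):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: List[float] = []
    for m, _, c in blocks:
        fitted.extend([m] * int(c))
    return fitted


def fit_isotonic_map(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Fit f_cal on (score, label) pairs; returns sorted knots and fitted values."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    knots = [scores[i] for i in order]
    return knots, pava([float(labels[i]) for i in order])


def apply_map(knots: Sequence[float], fitted: Sequence[float], z: float) -> float:
    """Step-function lookup: value at the last knot <= z, clamped at the left edge."""
    idx = bisect.bisect_right(knots, z) - 1
    return fitted[max(idx, 0)]


knots, fitted = fit_isotonic_map([0.1, 0.4, 0.6, 0.9], [0, 1, 0, 1])
print(fitted, apply_map(knots, fitted, 0.5))
```

A step map fitted on few samples can overfit, which is why the Validate step compares ECE and Brier before and after on held-out domain B data rather than on the fitting set.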

VI. Contracts & Assertions (Use-Case Mapping C30-151x)

C30-1511 (Weight Normalization): | W_norm - 1 | ≤ 0.01.
C30-1512 (Coverage): coverage_rate ≥ SLO.coverage_min (Offline A).
C30-1513 (Calibration): ECE_after ≤ ECE_before - delta_min and Brier ≤ SLO.Brier_max (Cross-domain C).
C30-1514 (Sequential Error Control): alpha_spent ≤ alpha_budget; FDR ≤ SLO.FDR_max (Online B).
C30-1515 (Power Attainment): A trial may be terminated only when n ≥ n_per_arm and MDE is met, so that it is never stopped while still underpowered (Online B).
C30-1516 (Drift Thresholds): Promotion is prohibited unless W1 ≤ W1_max ∧ KL ≤ KL_max ∧ psi ≤ psi_max (A/C).
C30-1517 (Dual-Form Gap): If T_arr exists, assert delta_form ≤ tol_Tarr (any use case).
C30-1518 (Latency): latency_ms_p99 ≤ SLO.latency_p99_max (Online B).
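
Several of these contracts are plain threshold comparisons and can be evaluated as a table of named checks. The sketch below covers C30-1511, C30-1514, C30-1516, and C30-1518 only; the dictionary keys and function names are hypothetical and do not represent the evaluate_stat_contracts interface.

```python
from typing import Dict, List, Tuple


def evaluate_contracts(m: Dict[str, float], slo: Dict[str, float]) -> List[Tuple[str, bool]]:
    """Return (contract id, passed) pairs for a subset of the C30-151x contracts."""
    return [
        ("C30-1511", abs(m["W_norm"] - 1.0) <= 0.01),
        ("C30-1514", m["alpha_spent"] <= slo["alpha_budget"] and m["FDR"] <= slo["FDR_max"]),
        ("C30-1516", m["W1"] <= slo["W1_max"] and m["KL"] <= slo["KL_max"]
                     and m["psi"] <= slo["psi_max"]),
        ("C30-1518", m["latency_ms_p99"] <= slo["latency_p99_max"]),
    ]


def gate_passes(checks: List[Tuple[str, bool]]) -> bool:
    """A release gate clears only if every contract passed (P315-5)."""
    return all(ok for _, ok in checks)


# Illustrative numbers only; real thresholds come from slo_policy.
metrics = {"W_norm": 1.004, "alpha_spent": 0.03, "FDR": 0.04,
           "W1": 0.05, "KL": 0.01, "psi": 0.08, "latency_ms_p99": 180.0}
slo = {"alpha_budget": 0.05, "FDR_max": 0.05, "W1_max": 0.10,
       "KL_max": 0.05, "psi_max": 0.10, "latency_p99_max": 250.0}
checks = evaluate_contracts(metrics, slo)
print(checks, gate_passes(checks))
```

Returning the per-contract results rather than a single boolean lets the audit trail show which contract blocked a release.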

VII. Implementation Bindings I30-* (Use-Case Subsets)

Evaluation chain: compute_weights → fit_glm → bootstrap_metric → calibration_report → evaluate_stat_contracts → emit_stats_manifest.
Experiment chain: design_experiment → run_ab_test → track_alpha_spending → drift_monitor → latency_summary → audit_decision.
Transfer chain: calibration_transfer → calibration_report → detect_drift → emit_stats_manifest.
Invariants: alpha_spent ≤ alpha_budget; sum(w_i)/N ≈ 1; metrics.window == Delta_t; signature verifiable.

VIII. Cross-References

Dimensions & units: Methods.Cleaning v1.0, Chapter 4.
Timelines & synchronization: Methods.Cleaning v1.0, Chapter 5.
Multiple comparisons: this volume, Chapter 6.
Drift monitoring: this volume, Chapter 7.
A/B design & stopping: this volume, Chapter 8.
Calibration transfer: this volume, Chapter 9.
Compliance & publication: Methods.Cleaning v1.0, Chapter 10, and this volume, Chapter 14.
Execution graphs & backpressure: EFT.WP.Core.Threads v1.0.

IX. Quality & Risk Control

Use Case A (Offline)
- SLIs: coverage_rate, ECE, Brier, W1/KL/psi.
- Rollback: switch to more conservative intervals (bootstrap/Bayesian quantiles), increase B, or roll back to ref.
Use Case B (Online)
- SLIs: latency_ms_p99, alpha_spent, FDR, decision_sign_stability.
- Rollback: gray-rollback with traffic reduction, freeze stopping boundaries, withdraw variants while conserving alpha_budget.
Use Case C (Cross-Domain)
- SLIs: ECE_after - ECE_before, Brier_after, drift_level.
- Rollback: disable map, revert to in-domain calibration, or trigger resampling.

Summary

The three use cases cover the critical paths for offline evaluation, online decisioning, and cross-domain transfer. Each adopts P315-* as non-negotiable premises, S315-* as computational baselines, M30-15 as the process spine, and C30-151x as release gates. Via I30-* interfaces, statistical gauges are unified with cleaning/time-base/audit systems into an integrated, fully traceable production practice.

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05