EFT.WP.Methods.CrossStats v1.0
Chapter 15 — Use Cases & Reference Implementations
One-line objective: Demonstrate three end-to-end paradigms—offline evaluation, online A/B (sequential/multiplicity), and cross-domain calibration transfer—to operationalize P30x/S30x/M30/I30, forming an auditable, reproducible, and rollback-capable statistical practice loop.
I. Scope & Targets
- Scope
Offline model evaluation and release gates; online experimentation (sequential/multiple) with closed-loop decisioning; cross-domain calibration transfer and baseline updates.
- Targets
- Inputs: D_clean, manifest.*, ref, slo_policy, alpha_budget, bins, sync_ref.
- Outputs: eval_report, ab_decision, calibration_map, drift_report, slo_attainment, manifest.stats.*.
- Constraints: check_dim(expr) passes; metric windows computed on tau_mono; if T_arr is involved, record both formulations in parallel and persist delta_form.
II. Terms & Symbols
- Data & weights: (x_i, y_i), w_i = 1 / pi(i), W_norm = ( ∑ w_i ) / N.
- Intervals & power: alpha, beta, power = 1 - beta, MDE.
- Calibration: f_cal(z), ECE, Brier.
- Drift: W1, KL, psi, drift_level, drift_slope.
- Arrival time: T_arr, c_ref, gamma(ell), d ell, delta_form.
- Latency: latency_ms_p50/p95/p99, staleness.
III. Axioms P315-*
- P315-1 (Reproducibility Chain): Every step of evaluation/experimentation/transfer must be reproducible from repro_hash and the corresponding manifest.
- P315-2 (Weighted Consistency): Whenever sampling/exposure bias exists, use weighting or propensity adjustments and declare W_norm.
- P315-3 (Time-Base Alignment): Compute within tau_mono; publish on ts with offset/skew/J.
- P315-4 (Dual Formulations in Parallel): When T_arr appears, record both formulations and delta_form.
- P315-5 (Contracts First): All deliverables must pass C30-* contracts to clear release gates.
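P315-1 does not specify how repro_hash is derived. The sketch below shows one possible approach, assuming the manifest is a JSON-serializable mapping and that SHA-256 over a canonical serialization is acceptable. The helper name compute_repro_hash and the manifest fields are hypothetical and are not part of the I30-* bindings.

```python
import hashlib
import json

def compute_repro_hash(manifest: dict) -> str:
    """Hash a manifest deterministically (hypothetical helper, an assumption)."""
    # Sorted keys and fixed separators make the serialization independent of
    # dict insertion order, so manifests with equal content hash equally.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Illustrative manifests: same content, different key order.
manifest_a = {"dataset": "D_clean", "bins": 10, "alpha": 0.05}
manifest_b = {"alpha": 0.05, "bins": 10, "dataset": "D_clean"}
hash_a = compute_repro_hash(manifest_a)
hash_b = compute_repro_hash(manifest_b)
```

Under these assumptions, hash_a and hash_b are identical, while any change to a manifest value yields a different hash, which is the property a reproducibility chain needs.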
IV. Minimal Equations S315-*
- S315-1 (Weighted Mean/Variance)
- hat{mu}_w = ( ∑ w_i y_i ) / ( ∑ w_i )
- hat{sigma}_w^2 = ( ∑ w_i ( y_i - hat{mu}_w )^2 ) / ( ∑ w_i )
- S315-2 (Two-Arm Sample Size Approximation)
n_per_arm ≈ ( ( z_{1 - alpha/2} + z_{power} )^2 * 2 * sigma^2 ) / MDE^2
- S315-3 (ECE)
ECE = ∑_{b=1..B} ( n_b / N ) * | acc_b - conf_b |
- S315-4 (Sequential Stopping)
tau = inf { t : S_t ≥ h_upper or S_t ≤ h_lower } (with S_t the cumulative log-likelihood ratio).
- S315-5 (Drift Metrics)
W1(p,q), KL(p||q), psi = ∑ ( (q_i - p_i) * ln( q_i / p_i ) ) (binned).
- S315-6 (Arrival-Time Gap)
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |
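As a worked illustration of S315-1, S315-2, and the psi term of S315-5, the following minimal sketch evaluates the formulas with the Python standard library only. The function names, the eps guard for empty bins, and the toy inputs are assumptions made for illustration. z_{power} is read here as the standard normal quantile at the target power.

```python
import math
from statistics import NormalDist

def weighted_mean_var(y, w):
    """S315-1: weighted mean and weighted variance, both normalized by sum(w)."""
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    var = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / sw
    return mu, var

def n_per_arm(alpha, power, sigma, mde):
    """S315-2: two-sided, two-arm normal approximation, rounded up."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil((z_alpha + z_power) ** 2 * 2 * sigma ** 2 / mde ** 2)

def psi(p, q, eps=1e-12):
    """S315-5: binned psi; eps (an assumption) avoids log(0) on empty bins."""
    return sum((qi - pi) * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))

# Toy inputs, chosen only to exercise the formulas.
mu, var = weighted_mean_var([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
n = n_per_arm(alpha=0.05, power=0.8, sigma=1.0, mde=0.1)
drift = psi([0.5, 0.5], [0.4, 0.6])
```

Each psi term is non-negative because (q_i - p_i) and ln(q_i / p_i) always share a sign, so psi is zero only when the binned distributions coincide.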
V. Metrology Flow M30-15 (Three End-to-End Use Cases)
- Use Case A: Offline Evaluation & Release Gate (Batch → Freeze)
- Ready
- Run standardize_names and repair_units (see Methods.Cleaning).
- time_align_for_stats(ds, sync_ref); compute_weights(ds, scheme).
- Estimate
- fit_glm(ds, formula, family) or import model scores.
- bootstrap_metric(fn, ds, B) to produce {est, CI}; calibration_report(pred, obs, bins).
- Check
detect_drift(ref, cur, metrics); evaluate_stat_contracts(metrics, rules); if needed, backtest_coverage(ds, plan).
- Persist
emit_stats_manifest(results, policy); coordinate with Methods.Cleaning freeze_release(ds, tag) for artifact release.
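The Estimate step of Use Case A names bootstrap_metric(fn, ds, B) and its {est, CI} output but not its internals. The sketch below assumes a percentile bootstrap over a flat list of observations. The alpha and seed parameters are additions made here so that the example is reproducible, and the actual binding may differ.

```python
import random
from statistics import mean

def bootstrap_metric(fn, ds, B, alpha=0.05, seed=0):
    """Percentile-bootstrap sketch returning {'est', 'CI'}.

    The (fn, ds, B) signature follows the text; alpha and seed are assumptions.
    """
    rng = random.Random(seed)
    n = len(ds)
    # Resample with replacement B times and evaluate the metric on each resample.
    stats = sorted(fn([ds[rng.randrange(n)] for _ in range(n)]) for _ in range(B))
    lo = stats[int((alpha / 2) * (B - 1))]
    hi = stats[int((1 - alpha / 2) * (B - 1))]
    return {"est": fn(ds), "CI": (lo, hi)}

data = [0.1 * i for i in range(50)]   # toy metric observations
result = bootstrap_metric(mean, data, B=500)
```

Fixing the seed keeps repeated runs identical, which is in the spirit of P315-1. A production version would likely record the seed and B in the manifest.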
- Use Case B: Online A/B & Sequential Decisions (Stream → Decide → Rollback/Ship)
- Ready
design_experiment(pop, constraints, alpha, power); register alpha_budget and slo_policy.
- Run
run_ab_test(stream, metric, alpha_spending) to emit real-time S_t and interim decisions; track_alpha_spending(seq_tests).
- Guard
drift_monitor(ref, cur, methods); latency_summary(traces); when exposure bias is detected, estimate_ate(ds, method=DR).
- Close
compute_slo_attainment(metrics, slo); audit_decision(trace, manifest); on violation, execute rollback plan and re-experiment.
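The stopping rule S315-4 that drives the Run step can be illustrated with Wald's sequential probability ratio test on a single Bernoulli stream. This is a deliberately simplified stand-in: it is not the two-arm run_ab_test with alpha spending described above, and the function name, hypotheses, and boundary formulas (Wald's approximations) are assumptions chosen for the sketch.

```python
import math

def sprt_bernoulli(stream, p0, p1, alpha=0.05, beta=0.2):
    """Wald SPRT for Bernoulli outcomes; returns (decision, tau, S_tau)."""
    h_upper = math.log((1 - beta) / alpha)   # crossing above rejects H0: p = p0
    h_lower = math.log(beta / (1 - alpha))   # crossing below accepts H0
    s_t = 0.0
    t = 0
    for t, x in enumerate(stream, start=1):
        # Add this observation's log-likelihood ratio of p1 versus p0.
        s_t += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if s_t >= h_upper:
            return "reject_H0", t, s_t
        if s_t <= h_lower:
            return "accept_H0", t, s_t
    # Stream exhausted before either boundary was crossed.
    return "continue", t, s_t

up_decision, up_tau, _ = sprt_bernoulli([1] * 20, p0=0.1, p1=0.3)
down_decision, down_tau, _ = sprt_bernoulli([0] * 20, p0=0.1, p1=0.3)
```

With these toy streams, the all-success stream crosses h_upper at the third observation and the all-failure stream crosses h_lower at the seventh. An interim look that crosses neither boundary returns "continue".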
- Use Case C: Cross-Domain Calibration Transfer & Baseline Update (Domain A → Domain B)
- Ready
Collect samples from A/B; harmonize units via repair_units and align time via time_align_for_stats.
- Transfer
calibration_transfer(src=A, dst=B, method ∈ {Platt, Isotonic, BBQ}) -> map; enforce monotonicity and guard against overfitting.
- Validate
On domain B, evaluate ECE, Brier before/after; detect_drift to ensure W1/KL/psi stay within thresholds.
- Publish
emit_stats_manifest to manifest.stats.calibration.*, including map.version, bins, ECE_before/after, then sign and archive.
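The Validate step compares ECE and Brier on domain B before and after the map is applied. The sketch below computes both with equal-width bins and then applies the C30-1513 inequality. The toy map stands in for the output of calibration_transfer, and delta_min and brier_max are illustrative values, not thresholds taken from this volume.

```python
def ece(conf, obs, bins=10):
    """S315-3 with equal-width confidence bins over [0, 1]."""
    n = len(conf)
    total = 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        # The last bin is closed on the right so that conf == 1.0 is counted.
        idx = [i for i, c in enumerate(conf)
               if lo <= c < hi or (b == bins - 1 and c == 1.0)]
        if not idx:
            continue
        acc_b = sum(obs[i] for i in idx) / len(idx)
        conf_b = sum(conf[i] for i in idx) / len(idx)
        total += (len(idx) / n) * abs(acc_b - conf_b)
    return total

def brier(conf, obs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((c - y) ** 2 for c, y in zip(conf, obs)) / len(conf)

# Toy domain-B scores and a toy monotone map (stand-in for calibration_transfer).
conf_before = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
obs = [1, 1, 0, 0, 0, 0]
toy_map = {0.9: 0.5, 0.1: 0.0}
conf_after = [toy_map[c] for c in conf_before]

ece_before, ece_after = ece(conf_before, obs), ece(conf_after, obs)
brier_after = brier(conf_after, obs)
delta_min, brier_max = 0.01, 0.3   # illustrative thresholds (assumptions)
passes_c30_1513 = ece_after <= ece_before - delta_min and brier_after <= brier_max
```

In this toy case the map removes the overconfidence in the 0.9 bin, so ECE drops from 0.3 to 0 and the contract check passes. A map fitted and evaluated on the same small sample would overstate the gain, which is why the Transfer step asks for guards against overfitting.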
VI. Contracts & Assertions (Use-Case Mapping C30-151x)
- C30-1511 (Weight Normalization): | W_norm - 1 | ≤ 0.01.
- C30-1512 (Coverage): coverage_rate ≥ SLO.coverage_min (Offline A).
- C30-1513 (Calibration): ECE_after ≤ ECE_before - delta_min and Brier ≤ SLO.Brier_max (Cross-domain C).
- C30-1514 (Sequential Error Control): alpha_spent ≤ alpha_budget; FDR ≤ SLO.FDR_max (Online B).
- C30-1515 (Power Attainment): A trial may be terminated without a significant result only once n ≥ n_per_arm, so that the design MDE is detectable at the planned power (Online B).
- C30-1516 (Drift Thresholds): Promotion is prohibited unless W1 ≤ W1_max ∧ KL ≤ KL_max ∧ psi ≤ psi_max all hold (A/C).
- C30-1517 (Dual-Form Gap): If T_arr exists, assert delta_form ≤ tol_Tarr (any use case).
- C30-1518 (Latency): latency_ms_p99 ≤ SLO.latency_p99_max (Online B).
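The contracts above can be encoded as named predicates over a metrics mapping. The following sketch assumes that evaluate_stat_contracts(metrics, rules) returns a pass/fail flag per contract. The rule encoding, the metric keys, and every numeric value are hypothetical and serve only to show how a release gate might combine the results.

```python
def evaluate_stat_contracts(metrics, rules):
    """Return {contract_id: passed}; signature follows the text, body is a sketch."""
    return {cid: bool(check(metrics)) for cid, check in rules.items()}

# Each rule is a predicate over the metrics mapping (hypothetical encoding).
rules = {
    "C30-1511": lambda m: abs(m["W_norm"] - 1) <= 0.01,
    "C30-1514": lambda m: m["alpha_spent"] <= m["alpha_budget"],
    "C30-1516": lambda m: (m["W1"] <= m["W1_max"]
                           and m["KL"] <= m["KL_max"]
                           and m["psi"] <= m["psi_max"]),
    "C30-1518": lambda m: m["latency_ms_p99"] <= m["latency_p99_max"],
}

# Illustrative values only; psi exceeds its threshold on purpose.
metrics = {
    "W_norm": 1.004,
    "alpha_spent": 0.03, "alpha_budget": 0.05,
    "W1": 0.02, "W1_max": 0.05,
    "KL": 0.01, "KL_max": 0.05,
    "psi": 0.31, "psi_max": 0.25,
    "latency_ms_p99": 180.0, "latency_p99_max": 250.0,
}

results = evaluate_stat_contracts(metrics, rules)
promote = all(results.values())   # a single failed contract blocks promotion
```

Here C30-1516 fails because psi is above psi_max, so promote is False even though the other three contracts pass, which matches the "Contracts First" premise of P315-5.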
VII. Implementation Bindings I30-* (Use-Case Subsets)
- Evaluation chain: compute_weights → fit_glm → bootstrap_metric → calibration_report → evaluate_stat_contracts → emit_stats_manifest.
- Experiment chain: design_experiment → run_ab_test → track_alpha_spending → drift_monitor → latency_summary → audit_decision.
- Transfer chain: calibration_transfer → calibration_report → detect_drift → emit_stats_manifest.
- Invariants: alpha_spent ≤ alpha_budget; sum(w_i)/N ≈ 1; metrics.window == Delta_t; signature verifiable.
VIII. Cross-References
- Dimensions & units: Methods.Cleaning v1.0, Chapter 4.
- Timelines & synchronization: Methods.Cleaning v1.0, Chapter 5.
- Multiple comparisons: this volume, Chapter 6.
- Drift monitoring: this volume, Chapter 7.
- A/B design & stopping: this volume, Chapter 8.
- Calibration transfer: this volume, Chapter 9.
- Compliance & publication: Methods.Cleaning v1.0, Chapter 10, and this volume, Chapter 14.
- Execution graphs & backpressure: EFT.WP.Core.Threads v1.0.
IX. Quality & Risk Control
- Use Case A (Offline)
- SLIs: coverage_rate, ECE, Brier, W1/KL/psi.
- Rollback: switch to more conservative intervals (bootstrap/Bayesian quantiles), increase B, or roll back to ref.
- Use Case B (Online)
- SLIs: latency_ms_p99, alpha_spent, FDR, decision_sign_stability.
- Rollback: gray-rollback with traffic reduction, freeze stopping boundaries, withdraw variants while conserving alpha_budget.
- Use Case C (Cross-Domain)
- SLIs: ECE_after - ECE_before, Brier_after, drift_level.
- Rollback: disable map, revert to in-domain calibration, or trigger resampling.
Summary
The three use cases cover the critical paths for offline evaluation, online decisioning, and cross-domain transfer. Each adopts P315-* as non-negotiable premises, S315-* as computational baselines, M30-15 as the process spine, and C30-151x as release gates. Via I30-* interfaces, statistical gauges are unified with cleaning/time-base/audit systems into an integrated, fully traceable production practice.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/