18-EFT.WP.Methods.CrossStats v1.0
Chapter 2 — Axioms and Minimal Equations (Statistical Baseline)
One-Line Objective
Establish a harmonized baseline across frequentist and Bayesian practice; standardize the minimal axioms P302-* and a reusable family of equations S302-* for sampling, estimation, and uncertainty, serving as anchor references for the entire volume.
I. Scope and Objects
- Scope
- Applies to descriptive statistics, parameter estimation, intervals/posteriors, robust variance, sequential decisions, and arrival-time–aware windowed statistics for both batch data and event streams.
- Compute on tau_mono, publish on ts; whenever T_arr is involved, compute both conventions in parallel and record delta_form.
- Objects
Data D = { (x_i, y_i, t_i, w_i, m_i) }, model parameters theta, prior p(theta), likelihood L(theta; D), statistical window Delta_t, and the contract policy, denoted policy.
II. Terms and Variables
- Sampling and weights: pi(i) (inclusion probability), w_i = 1 / pi(i), W_norm = ( ∑ w_i ) / N, n_eff = ( ∑ w_i )^2 / ( ∑ w_i^2 ).
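- Estimation and uncertainty: hat{theta}, SE, CI, U = k * u_c (expanded uncertainty: coverage factor k times combined standard uncertainty u_c; this U and k are distinct from the score U(theta) in S302-1 and the normalizing constant k in S302-5).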
- Timebase and time of arrival: tau_mono, ts, offset/skew/J; two T_arr conventions:
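T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) and T_arr = ( ∫ ( n_eff / c_ref ) d ell ), with discrepancy delta_form between the two. In these integrals n_eff is the quantity integrated along the path ell, not the effective sample size n_eff defined under sampling and weights.
- Information and diagnostics: I(theta) (Fisher information), H (Hessian), s_i (per-observation score), ppc (posterior predictive check).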
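The sampling and weight quantities above can be illustrated with a minimal Python sketch. It assumes N is the number of sampled records; the function name weight_summary and the inclusion probabilities are made up for illustration and are not part of this volume's bindings.

```python
def weight_summary(pi):
    """Design weights, normalization factor, and effective sample size.

    Assumption: N in W_norm is the number of sampled records.
    """
    if any(p <= 0.0 or p > 1.0 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    w = [1.0 / p for p in pi]                      # w_i = 1 / pi(i)
    w_norm = sum(w) / len(w)                       # W_norm = (sum w_i) / N
    n_eff = sum(w) ** 2 / sum(x * x for x in w)    # n_eff = (sum w_i)^2 / (sum w_i^2)
    return w, w_norm, n_eff


pi = [0.5, 0.5, 0.25, 0.25]                        # made-up inclusion probabilities
w, w_norm, n_eff = weight_summary(pi)
w_scaled = [x / w_norm for x in w]                 # rescaled weights average to 1
print(w_norm, n_eff)
```

Dividing every weight by W_norm leaves n_eff unchanged, because the numerator and denominator of n_eff scale by the same factor.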
III. Axioms P302-*
- P302-1 (Exchangeability / hierarchical exchangeability): within a design stratum, treat samples as exchangeable; across strata, use weights or hierarchical models.
- P302-2 (Consistent sampling frame & explicit weights): any weighted analysis must document how pi(i) was generated and versioned, and must retain weight metadata and the normalization factor.
- P302-3 (Dimensional conservation): before cross-modal aggregation, run check_dim(expr); unit-repair failures constitute contract violations.
- P302-4 (Unified timebase): evaluate windows on tau_mono; publish on ts with offset/skew/J.
- P302-5 (Parallel arrival-time conventions): for any statistic involving arrival time or propagation delay, compute both conventions and assert delta_form ≤ tol_Tarr.
- P302-6 (Explicit and interpretable priors): Bayesian analyses must state the motivation for p(theta) and include sensitivity analysis.
- P302-7 (Robustness and diagnostics): provide sandwich (robust) variance and posterior predictive checks by default; account for sequential/multiple-comparison alpha consumption in a unified ledger.
- P302-8 (Declared missingness mechanism): encode missingness via m_i ∈ {0,1}; explicitly document the assumed mechanism (MCAR/MAR/MNAR) and imputation strategy; implicit fills are prohibited.
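As an illustration of P302-5, the following Python sketch evaluates both T_arr conventions on a sampled path and asserts the tolerance. Several things here are assumptions rather than definitions from this chapter: delta_form is taken as the absolute difference of the two results, the integral is approximated with a trapezoid rule, and the path profile, the value used for c_ref, and tol_Tarr are invented for the example. When c_ref is constant along the path the two conventions agree algebraically, so in this sketch delta_form captures only numerical discrepancy.

```python
def trapezoid(values, ell):
    """Composite trapezoid rule for samples values[k] at path positions ell[k]."""
    return sum(0.5 * (values[k] + values[k + 1]) * (ell[k + 1] - ell[k])
               for k in range(len(ell) - 1))


def arrival_times(n_path, ell, c_ref):
    """Both T_arr conventions plus their discrepancy (assumed: absolute difference)."""
    t_factored = trapezoid(n_path, ell) / c_ref               # (1 / c_ref) * integral of n_eff
    t_inside = trapezoid([n / c_ref for n in n_path], ell)    # integral of (n_eff / c_ref)
    return t_factored, t_inside, abs(t_factored - t_inside)


c_ref = 299_792_458.0                          # illustrative value in m/s
ell = [0.0, 250.0, 500.0, 750.0, 1000.0]       # made-up path positions in metres
n_path = [1.0, 1.0002, 1.0003, 1.0002, 1.0]    # made-up integrand samples along the path
tol_Tarr = 1e-12                               # hypothetical tolerance in seconds

t1, t2, delta_form = arrival_times(n_path, ell, c_ref)
assert delta_form <= tol_Tarr, "P302-5 violated: conventions disagree beyond tol_Tarr"
print(t1, t2, delta_form)
```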
IV. Minimal Equations S302-*
- S302-1 (Weighted log-likelihood and score):
ell(theta) = ∑_i w_i * log p( y_i | x_i; theta );
U(theta) = ∂ ell(theta) / ∂ theta = ∑_i w_i * ∂ log p( y_i | x_i; theta ) / ∂ theta;
hat{theta} satisfies U(hat{theta}) = 0.
- S302-2 (Fisher information and asymptotic normality):
I(theta) = - E[ ∂^2 ell(theta) / ∂ theta ∂ theta^T ];
hat{theta} ~ approx Normal( theta, I(theta)^{-1} ) (with design- or replicate-based correction under weighting).
- S302-3 (Sandwich robust covariance):
H = - ∑_i w_i * ∂^2 log p_i / ∂ theta ∂ theta^T;
S = ∑_i w_i^2 * s_i * s_i^T, where s_i = ∂ log p_i / ∂ theta;
Var_robust( hat{theta} ) = H^{-1} * S * ( H^{-1} )^T.
- S302-4 (GLM score equations, unified notation):
g( mu_i ) = x_i^T beta, mu_i = E[ Y_i | x_i ];
U(beta) = X^T W ( y - mu ) = 0 (matrix W incorporates w_i, the variance function, and the link-derivative factor).
- S302-5 (Bayesian posterior and posterior predictive):
p(theta | D) = k * L(theta; D) * p(theta), with normalizing constant k = 1 / ( ∫ L(theta; D) * p(theta) d theta );
p( y_new | x_new, D ) = ( ∫ p( y_new | x_new, theta ) * p( theta | D ) d theta ).
- S302-6 (Delta method):
Var( g( hat{theta} ) ) ≈ ( ∇g( theta ) )^T Var( hat{theta} ) ( ∇g( theta ) ).
- S302-7 (Intervals and coverage):
Frequentist: CI_{1-α} = hat{theta} ± z_{1-α/2} * SE( hat{theta} );
Bayesian (equal-tailed credible interval): CI_{1-α} = [ q_{α/2}( p(theta|D) ), q_{1-α/2}( p(theta|D) ) ].
- S302-8 (Windowed statistics):
hat{mu}_w( t; Delta_t ) = ( ∑_{i: |tau_i - t| ≤ Delta_t/2} w_i y_i ) / ( ∑_{i: |tau_i - t| ≤ Delta_t/2} w_i ).
- S302-9 (Two-sample effects and sample-size approximation):
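Mean difference delta = mu_A - mu_B, estimated by hat{delta} = bar{y}_A - bar{y}_B with variance Var( hat{delta} ) = sigma_A^2 / n_A + sigma_B^2 / n_B;
for equal variances and equal per-arm sample sizes,
n_per_arm ≈ ( 2 * ( z_{1-α/2} + z_{power} )^2 * sigma^2 ) / delta^2, where z_{power} is the standard normal quantile at the target power (that is, z_{1-β} for power 1-β).
- S302-10 (Sequential GLR statistic and stopping):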
Lambda_t = ( sup_{theta ∈ H1} L_t( theta ) ) / ( sup_{theta ∈ H0} L_t( theta ) );
tau = inf { t : log Lambda_t ≥ h_1 or log Lambda_t ≤ - h_0 }.
- S302-11 (Foundational drift metrics):
KL( P_ref || P_cur ) = ( ∫ p_ref * log( p_ref / p_cur ) dx );
W1( P_ref, P_cur ) = ( ∫ | F_ref(x) - F_cur(x) | dx );
psi = ( ∑ bins ( p_cur - p_ref ) * log( p_cur / p_ref ) ) (discrete approximation).
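The drift metrics of S302-11 can be sketched on binned distributions as follows. This is a discrete approximation under stated assumptions: both distributions share the same equal-width bins, every bin probability is strictly positive for psi (in practice this usually requires some smoothing, which is not shown), and W1 is approximated by a step-function CDF difference. The function names and example histograms are illustrative only.

```python
import math


def kl_divergence(p_ref, p_cur):
    """Discrete KL(P_ref || P_cur); bins with p_ref == 0 contribute nothing."""
    return sum(p * math.log(p / q) for p, q in zip(p_ref, p_cur) if p > 0.0)


def psi(p_ref, p_cur):
    """psi = sum over bins of (p_cur - p_ref) * log(p_cur / p_ref)."""
    return sum((q - p) * math.log(q / p) for p, q in zip(p_ref, p_cur))


def wasserstein1(p_ref, p_cur, bin_width):
    """W1 approximated as sum of |F_ref - F_cur| * bin_width on a shared grid."""
    total = f_ref = f_cur = 0.0
    for p, q in zip(p_ref, p_cur):
        f_ref += p                     # running CDF of the reference
        f_cur += q                     # running CDF of the current sample
        total += abs(f_ref - f_cur) * bin_width
    return total


p_ref = [0.25, 0.25, 0.25, 0.25]       # made-up reference histogram
p_cur = [0.10, 0.20, 0.30, 0.40]       # made-up current histogram

print(kl_divergence(p_ref, p_cur), psi(p_ref, p_cur), wasserstein1(p_ref, p_cur, 1.0))
```

Expanding the psi sum term by term shows that it equals KL( P_ref || P_cur ) + KL( P_cur || P_ref ), so psi is symmetric in its arguments while KL is not.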
V. Statistical Process M30-2 (Baseline → Diagnostics → Release)
- Readiness
Verify units and dimensions; align to tau_mono; load the weighting scheme and normalize with W_norm ≈ 1; define H0/H1 and alpha/power, or the prior p(theta).
- Estimation
Solve U( hat{theta} ) = 0 or draw posterior samples; compute SE, Var_robust, or posterior quantiles; evaluate windowed metrics over Delta_t.
- Diagnostics
Inspect H condition number and near-zero scores; compare robust vs. model-based variance; run ppc and convergence checks; record alpha consumption for sequential or multiple comparisons.
- Release
Produce CI/posterior, decisions, and logs; record both T_arr conventions and delta_form in parallel; persist manifest.stats.* and sign for freeze.
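Two pieces of this process lend themselves to a short Python sketch: the alpha/power planning in Readiness (via the S302-9 approximation) and the windowed metric in Estimation (S302-8). The function names, default alpha and power, and sample data are assumptions for illustration; z_{power} is interpreted as the standard normal quantile at the target power, and the result is rounded up to a whole number of observations.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.8):
    """S302-9 approximation for equal variances and equal arm sizes, rounded up."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{1 - alpha/2}
    z_power = NormalDist().inv_cdf(power)               # z_{power}
    return math.ceil(2.0 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)


def windowed_mean(t, delta_t, tau, y, w):
    """S302-8: weighted mean over records with |tau_i - t| <= delta_t / 2.

    Returns None when the window holds no weight, rather than dividing by zero.
    """
    num = den = 0.0
    for tau_i, y_i, w_i in zip(tau, y, w):
        if abs(tau_i - t) <= delta_t / 2.0:
            num += w_i * y_i
            den += w_i
    return num / den if den > 0.0 else None


# Made-up planning inputs: detect a half-sigma difference at alpha = 0.05, power = 0.8.
print(n_per_arm(delta=0.5, sigma=1.0))

# Made-up records on the monotonic timebase tau_mono.
tau = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
w = [1.0, 1.0, 2.0, 1.0, 1.0]
print(windowed_mean(t=2.0, delta_t=2.0, tau=tau, y=y, w=w))
```

Returning None for an empty window is one possible design choice; a pipeline bound by the window-coverage contract might prefer to raise instead.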
VI. Contracts and Assertions
- C30-21 (Weighting convention): | W_norm - 1 | ≤ tol_w_norm; max(w_i) / median(w_i) ≤ cap_w_max.
- C30-22 (Hessian and identifiability): H positive definite with cond(H) ≤ cap_cond; || U( hat{theta} ) || ≤ tol_score.
- C30-23 (Coverage/power): offline playback coverage |cov - target_cov| ≤ tol_cov; online alpha_spent ≤ alpha_budget.
- C30-24 (Bayesian diagnostics): Rhat ≤ cap_rhat, ESS/N ≥ min_ess_ratio, ppc_fail_rate ≤ tol_ppc.
- C30-25 (Arrival-time and windows): delta_form ≤ tol_Tarr; window coverage cov_rate ≥ tol_window_cover.
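The scalar contracts above can be evaluated mechanically once their inputs are collected. The sketch below is one hypothetical way to do so in Python; the function names, dictionary keys, and every threshold value are assumptions chosen for the example. It covers C30-21, the online part of C30-23, C30-24, and C30-25; C30-22 and the offline coverage check need model-level inputs and are left out.

```python
from statistics import median


def weighting_metrics(w):
    """Inputs for C30-21 computed from a list of weights."""
    return {"W_norm": sum(w) / len(w), "w_max_over_median": max(w) / median(w)}


def evaluate_contracts(m, t):
    """Return {contract_id: passed} given metrics m and thresholds t."""
    return {
        "C30-21": abs(m["W_norm"] - 1.0) <= t["tol_w_norm"]
                  and m["w_max_over_median"] <= t["cap_w_max"],
        "C30-23-online": m["alpha_spent"] <= t["alpha_budget"],
        "C30-24": m["Rhat"] <= t["cap_rhat"]
                  and m["ess_ratio"] >= t["min_ess_ratio"]
                  and m["ppc_fail_rate"] <= t["tol_ppc"],
        "C30-25": m["delta_form"] <= t["tol_Tarr"]
                  and m["cov_rate"] >= t["tol_window_cover"],
    }


# All numbers below are invented for illustration.
metrics = {
    **weighting_metrics([0.8, 0.9, 1.0, 1.1, 1.2]),
    "alpha_spent": 0.02, "Rhat": 1.004, "ess_ratio": 0.4,
    "ppc_fail_rate": 0.02, "delta_form": 1e-18, "cov_rate": 0.97,
}
thresholds = {
    "tol_w_norm": 0.01, "cap_w_max": 10.0, "alpha_budget": 0.05,
    "cap_rhat": 1.01, "min_ess_ratio": 0.1, "tol_ppc": 0.05,
    "tol_Tarr": 1e-12, "tol_window_cover": 0.95,
}

outcomes = evaluate_contracts(metrics, thresholds)
failed = [cid for cid, ok in outcomes.items() if not ok]
print(outcomes, failed)
```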
VII. Implementation Bindings I30-*
- I30-41 fit_glm(ds, formula, family) -> model: returns hat{theta}, SE, Var_robust, diagnostics, and residuals; invariant cond(H) ≤ cap_cond.
- I30-42 fit_bayes(ds, model_spec, priors) -> posterior: returns samples, summaries, ppc, and diagnostics; invariant Rhat ≤ cap_rhat.
- I30-43 bootstrap_metric(fn, ds, B) -> {est, CI}: BCa or percentile; invariant B ≥ B_min.
- I30-44 sequential_glrt(stream, H0, H1, h) -> stop_time: returns tau, decision, and alpha_spent.
- I30-45 emit_stats_manifest(results, policy) -> manifest.stats: writes contract outcomes, thresholds, TraceID, and signature.
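To make the outputs listed for I30-41 concrete, here is a minimal NumPy sketch of a weighted logistic fit that returns hat{theta}, SE, Var_robust, and basic diagnostics. It is not the fit_glm binding itself: the function fit_weighted_logistic, its array-based signature, the default caps, and the simulated data are all assumptions. The score and H follow S302-1 and S302-3 for the canonical logit link, where s_i = ( y_i - mu_i ) * x_i, and the interval follows the frequentist form of S302-7.

```python
import numpy as np
from statistics import NormalDist


def fit_weighted_logistic(X, y, w, cap_cond=1e8, tol_score=1e-8, max_iter=50):
    """Newton-Raphson on the weighted score, then sandwich covariance."""
    X, y, w = (np.asarray(a, dtype=float) for a in (X, y, w))
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (w * (y - mu))                       # U(beta) = sum w_i s_i
        H = (X * (w * mu * (1.0 - mu))[:, None]).T @ X     # H = -sum w_i d2 log p_i
        if np.linalg.norm(score) <= tol_score:
            break                                          # mu, score, H match beta
        beta = beta + np.linalg.solve(H, score)            # Newton step
    else:
        raise RuntimeError("Newton-Raphson did not reach tol_score")

    cond_H = float(np.linalg.cond(H))
    if cond_H > cap_cond:                                  # invariant cond(H) <= cap_cond
        raise ValueError(f"cond(H) = {cond_H:.3g} exceeds cap_cond")

    s_w = X * (w * (y - mu))[:, None]                      # row i is w_i * s_i
    S = s_w.T @ s_w                                        # S = sum w_i^2 s_i s_i^T
    H_inv = np.linalg.inv(H)
    var_robust = H_inv @ S @ H_inv.T                       # sandwich, S302-3
    se = np.sqrt(np.diag(var_robust))
    z = NormalDist().inv_cdf(0.975)
    return {
        "beta": beta, "se": se, "var_model": H_inv, "var_robust": var_robust,
        "ci95": np.column_stack([beta - z * se, beta + z * se]),
        "cond_H": cond_H, "score_norm": float(np.linalg.norm(score)),
    }


# Simulated data (made up): intercept -0.5, slope 1.0, weights in [0.5, 2.0].
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
y = (rng.uniform(size=n) < p_true).astype(float)
w = rng.uniform(0.5, 2.0, size=n)

model = fit_weighted_logistic(X, y, w)
print(model["beta"], model["se"], model["cond_H"])
```

Comparing the diagonals of var_robust and var_model gives the robust-to-model variance ratio that the Diagnostics step asks for.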
VIII. Cross-References
- Schema binding and unit repair: see Methods.Cleaning v1.0, Ch. 3/4.
- Timeline and arrival time: see Methods.Cleaning v1.0, Ch. 5/6.
- Drift monitoring and alerts: see this volume, Ch. 7; experimental design: Ch. 8; causal inference: Ch. 10.
- Imaging quality mapping and metrics: see Methods.Imaging v1.0, Ch. 14.
IX. Quality and Risk Control
- SLI/SLO
latency_ms_p99 for estimation, coverage deviation, robust-to-model variance ratio, alpha_spent/alpha_budget, and Rhat shortfall rate.
- Rollback
Contract failures trigger rollback to the previous signed manifest.stats; retain TraceID and the runtime environment digest.
Summary
This chapter codifies the common baseline for sampling, estimation, and uncertainty propagation via P302-* and S302-*. Subsequent chapters build on this foundation to cover complex sampling, resampling and cross-validation, error control, drift, A/B testing, and causal analysis, reusing this chapter’s contracts and implementation bindings.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/