EFT.WP.Methods.CrossStats v1.0

Chapter 3 — Sampling Design and Survey Weights (SRS/STRAT/CLUSTER)


One-Line Objective

Establish a unified convention for weights, variance estimation, and calibration across SRS, stratified, and cluster (including PPS) sampling, and provide an executable, end-to-end specification from design through release.

I. Scope and Objects

  1. Scope
    • Applies to probability sampling on finite and streaming populations, weighted estimation, design-based variance, and weight calibration.
    • Supports multistage designs (PSU/SSU), post-stratification and iterative proportional fitting (raking/IPF), and is compatible with event streams (Poisson/Bernoulli/reservoir) and time windows Delta_t.
  2. Objects
    • Population size N (possibly unknown), sample size n, stratum h ∈ {1..H}, cluster c ∈ {1..C}, inclusion probability pi(i), survey weight w_i = 1 / pi(i), normalization factor W_norm = ( ∑ w_i ) / N_hat.
    • Calibration matrix (control totals) X_cal, constraint vector t_cal (e.g., marginal totals).
    • Time semantics: perform windowing and online sampling on tau_mono; publish on ts. For any statistic involving arrival time, record both T_arr conventions and delta_form in parallel.
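
As a minimal sketch of the objects above, the following Python shows Bernoulli sampling over a stream, the weight w_i = 1 / pi(i), and the normalization factor W_norm. The function names `bernoulli_sample` and `w_norm`, the fixed seed, and the toy population are illustrative assumptions and are not part of this specification.

```python
import random


def bernoulli_sample(stream, p, seed=0):
    """Keep each item independently with probability p (so pi(i) = p).

    Returns the retained items and their design weights w_i = 1 / pi(i).
    Hypothetical helper for illustration only.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    sample, weights = [], []
    for item in stream:
        if rng.random() < p:
            sample.append(item)
            weights.append(1.0 / p)
    return sample, weights


def w_norm(weights, n_hat):
    """Normalization factor W_norm = (sum of w_i) / N_hat."""
    return sum(weights) / n_hat


# Toy population of known size N = 1000, sampled at p = 0.1.
population = list(range(1000))
sample, w = bernoulli_sample(population, p=0.1, seed=42)

# With N known, W_norm should be close to 1; the gap is sampling noise.
print(len(sample), w_norm(w, n_hat=len(population)))
```

When N is unknown, sum(w) is itself the Horvitz–Thompson estimate of N, so W_norm computed against that N_hat equals 1 by construction and carries no diagnostic information.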

II. Terms and Variables

  1. Estimators
    • Horvitz–Thompson: hat{T}_HT = ∑_i ( y_i / pi(i) ); Hájek ratio: hat{Y}_Hajek = ( ∑ w_i y_i ) / ( ∑ w_i ).
    • Stratified mean: hat{Y} = ∑_h ( N_h / N ) * hat{Y}_h, where hat{Y}_h = ( ∑_{i ∈ h} w_i y_i ) / ( ∑_{i ∈ h} w_i ).
  2. Variance and design effects
    • DEFF = Var_complex( hat{Y} ) / Var_SRS( hat{Y} ); weight-induced DEFF_w ≈ 1 + CV(w)^2.
    • Intra-cluster correlation rho_icc; cluster effect DEFF_c ≈ 1 + ( m_bar - 1 ) * rho_icc, where m_bar is the mean number of sampled units per cluster.
  3. Weight calibration
    Calibrated weights w_i* satisfy ∑ w_i* x_i = t_cal; raking applies multiplicative updates over multiple margins.
  4. Replicate weights
    Replicate index r ∈ {1..R_rep} under a replication scheme (JK, BRR, Bootstrap), with replicate weights w_i^(r).
  5. Units and dimensions
    unit(w_i) = 1, dim(w_i) = []; estimators inherit the unit of y; run check_dim before release.
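
The definitions in this section can be sketched in plain Python as follows. All function names are hypothetical, the toy numbers are invented for illustration, and the jackknife shown is the simple delete-one variant (JK1), which assumes each sampled unit is its own PSU in a single stratum; it is one possible reading of "JK", not the prescribed scheme.

```python
import math


def ht_total(y, pi):
    """Horvitz-Thompson total: sum of y_i / pi(i) over the sample."""
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_mean(y, w):
    """Hajek ratio mean: sum(w_i * y_i) / sum(w_i)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def stratified_mean(strata, n_total):
    """Stratified mean: sum over h of (N_h / N) * Hajek mean of stratum h.

    strata is a list of (N_h, y_h, w_h) tuples.
    """
    return sum(n_h / n_total * hajek_mean(y_h, w_h) for n_h, y_h, w_h in strata)


def kish_deff_w(w):
    """DEFF_w ~= 1 + CV(w)^2, computed as n * sum(w^2) / (sum w)^2."""
    return len(w) * sum(wi * wi for wi in w) / sum(w) ** 2


def cluster_deff(m_bar, rho_icc):
    """DEFF_c ~= 1 + (m_bar - 1) * rho_icc."""
    return 1.0 + (m_bar - 1.0) * rho_icc


def jk1_se(y, w):
    """Delete-one jackknife SE of the Hajek mean.

    Replicate r sets w_r = 0 and rescales the remaining weights by n / (n - 1),
    giving replicate weights w_i^(r); the variance is
    (n - 1) / n * sum_r (theta_r - theta)^2.
    """
    n = len(y)
    theta = hajek_mean(y, w)
    sq_dev = 0.0
    for r in range(n):
        w_rep = [0.0 if i == r else wi * n / (n - 1) for i, wi in enumerate(w)]
        sq_dev += (hajek_mean(y, w_rep) - theta) ** 2
    return math.sqrt((n - 1) / n * sq_dev)


y = [10.0, 20.0, 30.0]
pi = [0.5, 0.25, 0.25]
w = [1.0 / p for p in pi]  # w_i = 1 / pi(i)
print(ht_total(y, pi), hajek_mean(y, w), kish_deff_w(w), jk1_se(y, w))
```

With equal weights, `jk1_se` reduces to the familiar s / sqrt(n), which gives a quick sanity check before applying it to unequal-weight data.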

III. Axioms P303-*


IV. Minimal Equations S303-*


V. Statistical Process M30-3 (Design → Sampling → Weights → Variance → Calibration → Release)


VI. Contracts and Assertions (Examples C30-31x)


VII. Implementation Bindings I30-*

  1. I30-31 compute_weights(ds, scheme) -> w
    • scheme ∈ {SRSWOR, STRAT, CLUSTER_PPS, POISSON, BERNOULLI, RESERVOIR}; outputs w, W_norm, logs, and extremum report.
    • Invariants: sum(w)/N_hat ≈ 1, seed/version persisted.
  2. I30-32 estimate_variance(ds, method) -> var_report
    method ∈ {LINEARIZATION, JK, BRR, BOOT}; outputs SE, DEFF, rho_icc (if applicable).
  3. I30-33 calibrate_weights(w, X_cal, t_cal, method) -> w*
    method ∈ {RAKING, QP_CAL}; outputs convergence curves and residuals.
  4. I30-34 stream_sampler(stream, policy) -> sample, w
    policy ∈ {POISSON(p), RESERVOIR(K), WINDOW(Delta_t)}; outputs sample and time metadata.
  5. I30-35 emit_sampling_manifest(design, weights, variance) -> manifest.stats.sampling
    Writes TraceID, design summary, parameters, contract evaluation, and signature.
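
To illustrate the RAKING option of calibrate_weights, the sketch below applies multiplicative updates over several categorical margins and records a residual per sweep, in the spirit of the "convergence curves and residuals" output. The function name `rake`, its argument layout, the tolerance, and the toy margins are assumptions made for this example; the binding's actual signature uses X_cal and t_cal and is not reproduced here.

```python
def rake(w, margins, targets, tol=1e-8, max_iter=200):
    """Iterative proportional fitting over categorical margins.

    w        initial weights, one per sampled unit
    margins  list of margins; each is a list giving every unit's category
    targets  list of dicts mapping category -> control total, one per margin
    Returns (calibrated weights, residual history per sweep).
    Assumes every target category has at least one sampled unit and that
    the control totals of all margins sum to the same grand total.
    """
    w = list(w)
    history = []
    for _ in range(max_iter):
        # One sweep: rescale weights so each margin in turn hits its totals.
        for labels, target in zip(margins, targets):
            totals = {}
            for wi, g in zip(w, labels):
                totals[g] = totals.get(g, 0.0) + wi
            w = [wi * target[g] / totals[g] for wi, g in zip(w, labels)]
        # Largest absolute gap between weighted and control totals.
        resid = max(
            abs(sum(wi for wi, g in zip(w, labels) if g == k) - t)
            for labels, target in zip(margins, targets)
            for k, t in target.items()
        )
        history.append(resid)
        if resid < tol:
            break
    return w, history


# Toy sample of four units with two margins (hypothetical control totals).
sex = ["m", "m", "m", "f"]
age = ["y", "y", "o", "o"]
w_star, history = rake(
    [1.0, 1.0, 1.0, 1.0],
    margins=[sex, age],
    targets=[{"m": 60.0, "f": 40.0}, {"y": 50.0, "o": 50.0}],
)
print(len(history), history[-1], w_star)
```

Because the updates are purely multiplicative, calibrated weights stay positive whenever the starting weights are; a production implementation would also want bounds on the adjustment factors and a check for empty categories, both of which this sketch omits.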

VIII. Cross-References


IX. Quality Metrics and Risk Control


Summary

This chapter delivers a closed loop for sampling and weight governance from design to release: P303-* constrains probability conventions and timebase, S303-* covers inclusion probabilities, variance, and calibration primitives, M30-3 prescribes the workflow, and I30-* supplies the binding interfaces—establishing a reusable sampling substrate for coverage assessment, A/B testing, and causal inference.

Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/