Appendix E — Error & Uncertainty Propagation (Synthesis Edition)
I. Scope & Objectives
- Provide a unified definition for the sources, combination, and publication of uncertainty across the synthetic-data lifecycle—design → training → generation → evaluation → release. Specify coverage reporting U = k * u_c, compatible with both offline batches and streaming increments.
- Deliverables: an uncertainty ledger err_budget.*, method tags method ∈ {analytic, bootstrap, posterior}, window & time-base fields, and mappings to contracts.* and manifest.synth.*.
II. Terms & Symbols
- Models & parameters: theta (generator parameters), hat_theta (estimate), Sigma_theta = Cov(hat_theta).
- Samples & weights: D_real, D_syn, n_real, n_syn, w_i, n_eff = ( ( ∑ w_i )^2 ) / ( ∑ w_i^2 ).
- Metrics & estimators: y = g(x) (metric functional), u(x) (standard uncertainty), u_c(y) (combined standard uncertainty), U = k * u_c (coverage uncertainty).
- Mechanism noise: sigma_dp (DP mechanism std or equivalent), eps_total, delta_total (privacy budget).
- Time base & arrival: tau_mono, ts, T_arr, delta_form, offset/skew/J.
- Uncertainty components: u_model (model estimation), u_sampling (finite sampling), u_dp (privacy), u_align (timebase/path), u_eval (evaluator/embedding), u_env (scenario/domain randomization).
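The effective sample size defined above can be illustrated with a minimal Python sketch. The function name `effective_sample_size` and the example weights are assumptions made for illustration; they are not part of this specification.

```python
def effective_sample_size(weights):
    """n_eff = (sum w_i)^2 / (sum w_i^2), as defined in Section II."""
    w = [float(x) for x in weights]
    if not w or any(x < 0 for x in w) or sum(w) == 0:
        raise ValueError("weights must be non-negative with a positive sum")
    return sum(w) ** 2 / sum(x * x for x in w)


# Equal weights recover the plain sample count.
n_eff_equal = effective_sample_size([1.0] * 100)

# One dominant weight shrinks n_eff well below the nominal n = 4.
n_eff_skewed = effective_sample_size([10.0, 1.0, 1.0, 1.0])
```

With one weight ten times larger than the rest, n_eff falls below 2 even though four samples are present, which is the situation the n_eff_min gate is meant to catch.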
III. Axioms P40E-*
- P40E-1 (Explicit Measure): Declare domains and measures for all expectations, variances, and integrals; fix the embedding/preprocessing gauges and version them.
- P40E-2 (Coverage Publication): Publish u_c and U = k * u_c alongside every external metric, stating k and method provenance.
- P40E-3 (Traceable Components): The uncertainty ledger must separate {u_model, u_sampling, u_dp, u_align, u_eval, u_env}—no aggregation without lineage.
- P40E-4 (Time-Base Consistency): Compute windows on tau_mono, publish on ts, and record offset/skew/J plus both T_arr formulations with delta_form.
- P40E-5 (Dimensional Conservation): Run check_dim( y - f(x) ) before propagation; log any unit conversions explicitly.
- P40E-6 (Privacy-Noise Independence): Prefer the independence assumption between DP noise and data; if dependence exists, publish the covariance term or a valid upper bound.
- P40E-7 (Reproducibility): Publish the seeds/RNGs and bootstrap size B used to reproduce the uncertainty evaluation.
IV. Minimal Equations S40E-*
- S40E-1 (Delta linearization): u_c^2( y ) = grad_x g(x)^T * Cov(x) * grad_x g(x).
- S40E-2 (Parameter propagation): u_model^2( y ) = J_theta * Sigma_theta * J_theta^T, where J_theta = ∂g/∂theta |_{hat_theta}.
- S40E-3 (Finite sampling): u_sampling^2( y ) ≈ Var_hat( y | theta ) / n_eff.
- S40E-4 (Privacy-noise propagation): if y = g(x + n_dp) with n_dp ~ (0, Sigma_dp), then u_dp^2( y ) = J_x * Sigma_dp * J_x^T, where J_x = ∂g/∂x.
- S40E-5 (Arrival-time contribution):
u_align^2( y ) = ( ∂g/∂T_arr )^2 * u^2( T_arr ), where
u^2( T_arr ) = u_jitter^2 + ( delta_form^2 ) / 3 (uniform upper-bound approximation).
- S40E-6 (Synthesis combination): if components are approximately independent,
u_c^2 = u_model^2 + u_sampling^2 + u_dp^2 + u_align^2 + u_eval^2 + u_env^2;
otherwise add 2 * ∑_{i<j} Cov_{i,j} for correlated parts.
- S40E-7 (Bootstrap): u(y) = std( { y^(b) }_{b=1..B} ), CI = quantile( { y^(b) }, [alpha/2, 1-alpha/2] ).
- S40E-8 (Bayesian): U = k * sd( { g(theta^(s)) }_{s=1..S} ), or publish the quantile band q_{alpha/2}, q_{1-alpha/2}.
- S40E-9 (Weighted SE): SE( mean_w )^2 = ( ∑ w_i^2 * (x_i - mean_w)^2 ) / ( ( ∑ w_i )^2 ).
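To make S40E-1 and S40E-6 concrete, the following is a minimal Python sketch under stated assumptions: the gradient is approximated by central differences, the metric is a toy ratio y = x0 / x1, and the helper names (`numerical_gradient`, `delta_method_variance`, `combine_components`) are hypothetical rather than part of the I40-* bindings.

```python
import math


def numerical_gradient(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g at point x."""
    grad = []
    for i in range(len(x)):
        step = h * max(1.0, abs(x[i]))
        up, dn = list(x), list(x)
        up[i] += step
        dn[i] -= step
        grad.append((g(up) - g(dn)) / (2.0 * step))
    return grad


def delta_method_variance(g, x, cov):
    """S40E-1: u_c^2(y) = grad^T * Cov(x) * grad."""
    grad = numerical_gradient(g, x)
    n = len(x)
    return sum(grad[i] * cov[i][j] * grad[j] for i in range(n) for j in range(n))


def combine_components(components, covariances=None):
    """S40E-6: quadrature sum, plus 2 * Cov for each listed correlated pair."""
    total = sum(u * u for u in components.values())
    for cov_ij in (covariances or {}).values():
        total += 2.0 * cov_ij
    return math.sqrt(total)


# Toy metric (assumption): ratio of two uncorrelated inputs.
def ratio(x):
    return x[0] / x[1]


u_y = math.sqrt(delta_method_variance(ratio, [2.0, 4.0],
                                      [[0.01, 0.0], [0.0, 0.04]]))

# Independent components combine in quadrature; U = k * u_c with k = 2.
u_c = combine_components({"model": 0.03, "sampling": 0.04})
U = 2.0 * u_c

# A positive covariance between two components widens the result.
u_c_corr = combine_components({"model": 0.03, "sampling": 0.04},
                              covariances={("model", "sampling"): 0.0005})
```

For strongly nonlinear g the first-order linearization may understate the uncertainty; in that case the bootstrap or posterior routes (S40E-7, S40E-8) are the natural fallback.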
V. Propagation Paths & Ledger Structure
- Canonical chain (declare each u_*)
- Design & calibration: hat_theta, Sigma_theta → u_model.
- Sampling & generation: n_syn, w_i → u_sampling (including n_eff).
- Constraints & alignment: enforce_constraints, align_timepath → u_align.
- Privacy & watermark: DP mechanism + accounting → u_dp with {eps_total, delta_total}.
- Evaluation & embeddings: FID/KID/MMD/W1/utility_gap → u_eval.
- Domain randomization: scene-parameter variance → u_env.
- Combine: u_c^2 = ∑ u_*^2 (+ correlations), publish U = k * u_c.
- Ledger keys (suggested)
- err_budget.model/sampling/dp/align/eval/env = {method, value, details}.
- details must include at least B|S, kernel|backbone|layer, seed/rng, window, unit/dim.
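One possible shape for the ledger is sketched below in Python. The concrete key names inside `details` (`seed`, `window`, `unit`, `B`), the helper names, and the numeric values are assumptions for illustration only; the specification fixes only the {method, value, details} triple and the minimum content of `details`. The combination step assumes independent components.

```python
import json
import math

METHODS = {"analytic", "bootstrap", "posterior"}
COMPONENTS = ("model", "sampling", "dp", "align", "eval", "env")
REQUIRED_DETAILS = ("seed", "window", "unit")  # assumed subset of the required fields


def make_entry(method, value, **details):
    """Build one err_budget.<component> record and validate its provenance."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    missing = [key for key in REQUIRED_DETAILS if key not in details]
    if missing:
        raise ValueError(f"details missing: {missing}")
    return {"method": method, "value": float(value), "details": details}


def build_err_budget(entries, k=2.0):
    """Combine all six components (assumed independent) into u_c and U."""
    absent = [name for name in COMPONENTS if name not in entries]
    if absent:
        raise ValueError(f"components missing: {absent}")
    u_c = math.sqrt(sum(entries[name]["value"] ** 2 for name in COMPONENTS))
    return {"components": entries, "u_c": u_c, "k": k, "U": k * u_c}


common = {"seed": 1234, "window": "Delta_t=60s", "unit": "dimensionless"}
entries = {name: make_entry("analytic", 0.0, **common) for name in COMPONENTS}
entries["model"] = make_entry("analytic", 0.03, **common)
entries["sampling"] = make_entry("bootstrap", 0.04, B=1000, **common)

budget = build_err_budget(entries)
ledger_json = json.dumps(budget, indent=2, sort_keys=True)
```

Keeping every component present, even when its value is zero, preserves the lineage required by P40E-3: a reader can distinguish "evaluated and negligible" from "not evaluated".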
VI. Synthesis-Specific Notes & Formulae
- Embedding metrics (e.g., FID)
Approximation: u_eval^2( FID ) ≈ grad_{mu,Sigma} FID^T * Cov( mu,Sigma ) * grad_{mu,Sigma} FID; obtain Cov( mu,Sigma ) via asymptotics or bootstrap.
- Kernel-parameter sensitivity (e.g., MMD_RBF)
u_eval^2 ≈ ( ∂MMD/∂h )^2 * u^2( h ) + a Delta-method sample term, where h is the bandwidth.
- DP synthesis (counts/histogram constraints)
Gaussian mechanism: if c = true_count + n, n ~ N(0, sigma_dp^2), then u_dp^2 = sigma_dp^2; push through y=g(c) with J_c linearization.
- Conditional/controllable generation
Condition uncertainty: u_env^2 = J_c * Cov(c) * J_c^T; if rejection sampling reduces n_eff, update u_sampling.
- Time-series / event synthesis
Arrival-rate variance contributes to u_sampling; use block bootstrap for W1(inter_arrival) to avoid underestimating dependence.
- Multimodal consistency
When aggregating across modalities, prefer median-of-means and publish u_agg; alternatively provide per-modality intervals and an upper bound on cross-metric correlations.
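The DP-synthesis note can be illustrated with a short Python sketch. It assumes independent Gaussian noise of equal standard deviation sigma_dp on every histogram bin and takes the metric to be the proportion of one bin, y = c_k / ∑ c_i; the metric choice and the function names are assumptions, not part of the specification.

```python
import math


def proportion_jacobian(counts, index):
    """Jacobian of y = c[index] / sum(c) with respect to each count."""
    total = float(sum(counts))
    target = float(counts[index])
    return [((total - target) if i == index else -target) / total ** 2
            for i in range(len(counts))]


def dp_variance_gaussian(jacobian, sigma_dp):
    """u_dp^2 = J * Sigma_dp * J^T with Sigma_dp = sigma_dp^2 * I (assumed)."""
    return sigma_dp ** 2 * sum(j * j for j in jacobian)


noisy_counts = [30.0, 70.0]  # released counts, true_count + n
J_c = proportion_jacobian(noisy_counts, index=0)
u_dp = math.sqrt(dp_variance_gaussian(J_c, sigma_dp=2.0))
```

Here the per-count noise of 2 translates into roughly 0.015 on the released proportion. If the mechanism adds correlated noise, the full Sigma_dp would be needed in place of the diagonal form.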
VII. Windows & Time-Base Alignment
- Windowing policy
Fixed span Delta_t with sliding step; require n_eff ≥ n_eff_min before publication; otherwise delay or widen the window.
- Alignment requirements
Record offset/skew/J; for path-involved gauges, publish both T_arr forms and delta_form with its u(T_arr).
- Streaming recursion (mean & variance), where n = t + 1 is the sample count after x_{t+1} is included and S_0 = 0
- mu_{t+1} = mu_t + ( x_{t+1} - mu_t ) / n;
- S_{t+1} = S_t + ( x_{t+1} - mu_t ) * ( x_{t+1} - mu_{t+1} );
- u = sqrt( S / ( n - 1 ) ) for n ≥ 2; use n_eff for weighted variants.
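The streaming recursion is the standard Welford update. A minimal Python sketch follows, with n counted after the new sample is included; the class name `RunningStats` is hypothetical and the unweighted case is assumed.

```python
import math


class RunningStats:
    """Single-pass mean and standard uncertainty for an unweighted stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.s = 0.0  # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean             # x_{t+1} - mu_t
        self.mean += delta / self.n       # mu_{t+1}
        self.s += delta * (x - self.mean)  # uses both mu_t and mu_{t+1}

    @property
    def std(self):
        """u = sqrt(S / (n - 1)); undefined (NaN) for fewer than two samples."""
        if self.n < 2:
            return float("nan")
        return math.sqrt(self.s / (self.n - 1))


stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(value)
```

The update needs only three stored numbers per metric, so it fits streaming increments where the window contents are not retained.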
VIII. Contracts & Assertions C40E-*
- C40E-1 (Coverage Gating): assert( y + k * u_c(y) ≤ tol_y ) or require the two-sided band to cover the target interval.
- C40E-2 (Effective Sample Size): assert( n_eff ≥ n_eff_min ) (default recommendation n_eff_min = 128).
- C40E-3 (DP Budget): assert( eps_total ≤ eps_cap ∧ delta_total ≤ delta_cap ), recording the accounting method.
- C40E-4 (Arrival Consistency): assert( delta_form ≤ tol_Tarr ), add its contribution to u_align.
- C40E-5 (Method Disclosure): method ∈ {analytic, bootstrap, posterior} with all key parameters persisted; otherwise mark as non-compliant.
- C40E-6 (Dimensional Check): assert( check_dim( y - f(x) ) = pass ).
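The assertions above could be evaluated together as in the following Python sketch. The `Limits` dataclass, the report dictionary keys, and all numeric thresholds other than n_eff_min = 128 are assumptions for illustration; the one-sided form of C40E-1 is used, and the dimensional check is taken as a precomputed "pass"/"fail" string.

```python
from dataclasses import dataclass

ALLOWED_METHODS = {"analytic", "bootstrap", "posterior"}


@dataclass
class Limits:
    tol_y: float
    eps_cap: float
    delta_cap: float
    tol_Tarr: float
    n_eff_min: float = 128.0  # default recommendation from C40E-2
    k: float = 2.0            # coverage factor (assumed value)


def evaluate_contracts(report, limits):
    """Return a pass/fail flag per contract instead of raising on the first failure."""
    return {
        "C40E-1": report["y"] + limits.k * report["u_c"] <= limits.tol_y,
        "C40E-2": report["n_eff"] >= limits.n_eff_min,
        "C40E-3": (report["eps_total"] <= limits.eps_cap
                   and report["delta_total"] <= limits.delta_cap),
        "C40E-4": report["delta_form"] <= limits.tol_Tarr,
        "C40E-5": report["method"] in ALLOWED_METHODS,
        "C40E-6": report["check_dim"] == "pass",
    }


limits = Limits(tol_y=0.15, eps_cap=2.0, delta_cap=1e-5, tol_Tarr=0.005)
report = {
    "y": 0.10, "u_c": 0.02, "n_eff": 200.0,
    "eps_total": 1.0, "delta_total": 1e-6,
    "delta_form": 0.001, "method": "bootstrap", "check_dim": "pass",
}
results = evaluate_contracts(report, limits)

# The same report with too few effective samples fails only C40E-2.
results_small = evaluate_contracts({**report, "n_eff": 64.0}, limits)
```

Returning all flags at once lets a release gate log every failing contract in a single pass rather than stopping at the first.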
IX. I40-* Implementation Bindings (Uncertainty)
- propagate_uncertainty_synth(report_in) -> err_budget
- Inputs: hat_theta, Sigma_theta, metrics_raw, dp_config, align_info, env_cov.
- Outputs: component u_*, u_c, U, and details.
- bootstrap_metrics(ds_syn, metrics, B, seed) -> {u, CI, samples}
Persist bootstrap resamples and intervals.
- posterior_pushforward(posterior, g, S) -> {u, CI}
Sample posterior theta^(s) and push through y=g(theta).
- dp_accounting_and_variance(steps) -> {eps_total, delta_total, Sigma_dp}
Return budgets and equivalent covariance from mechanism + accounting.
- align_timepath_for_uncertainty(ds, sync_ref) -> {T_arr_form1, T_arr_form2, delta_form, u(T_arr)}
Harmonize with I40-81 align_timepath and emit alignment uncertainty.
- emit_uncertainty_manifest(err_budget) -> manifest.synth.metrics[*].u
Write into manifest.synth.
- Invariants: reproducible(seed); delta_form ≤ tol_Tarr; budgets eps_total, delta_total within limits; check_dim = pass; method and parameters recorded.
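As a rough illustration of the bootstrap binding, the Python sketch below implements a simplified single-metric variant. The name `bootstrap_metric`, the `alpha` parameter, and the quantile indexing rule are assumptions; the binding listed above takes a dataset and a set of metrics, which this sketch does not attempt to reproduce. Resampling is i.i.d. with replacement, so it is not suitable for dependent series, where a block bootstrap would be needed.

```python
import math
import random
import statistics


def bootstrap_metric(data, metric, B, seed, alpha=0.05):
    """Return {u, CI, samples} following S40E-7, reproducible for a given seed."""
    if B < 2:
        raise ValueError("B must be at least 2")
    rng = random.Random(seed)  # local RNG so the seed fully determines the result
    n = len(data)
    samples = []
    for _ in range(B):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        samples.append(metric(resample))
    ordered = sorted(samples)
    # Simple order-statistic quantiles (assumed rule; others are possible).
    lo = ordered[math.floor((alpha / 2) * (B - 1))]
    hi = ordered[math.ceil((1 - alpha / 2) * (B - 1))]
    return {"u": statistics.stdev(samples), "CI": (lo, hi), "samples": samples}


data = [float(v) for v in range(1, 21)]
result = bootstrap_metric(data, statistics.fmean, B=500, seed=42)
rerun = bootstrap_metric(data, statistics.fmean, B=500, seed=42)
```

Running the function twice with the same seed and B yields identical resampled metrics, which is the property the reproducible(seed) invariant asks for.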
X. Cross-References
- Time-base & dual arrival forms: Methods.Cleaning v1.0 Chapters 6/10 and Appendices B/C.
- Imaging & embedding-metric uncertainty: Methods.Imaging v1.0 Chapter 14 and Appendices D/E.
- Statistical propagation & coverage conventions: Methods.CrossStats v1.0 Chapters 2/4/5 and Appendix E.
XI. Summary
This appendix specifies the layered components of uncertainty for synthetic data, dual routes for computation (linearization and resampling), standard combination of DP and arrival-time contributions, and the integration of u_c and U = k * u_c into contracts and manifests. The published err_budget.* supports cross-volume reuse, cross-version audits, and replayable reproduction with consistent comparability.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/