Chapter 2 — Axioms & Minimal Equations (Generation Baseline)
I. Scope & Targets
- Goals
- Establish a unified generation baseline for bringing p_model(x; theta) close to p_data(x), with common conventions for divergence measures, time/path consistency, dimensional conservation, and privacy budgeting.
- Define the necessary equations and threshold mappings required by the publication gate, serving as foundations for later implementation and evaluation chapters.
- Inputs
- Reference distribution & samples: D_real ~ p_data(x); schema & constraints: SRef, Rules; time & path anchors: tau_mono, ts, gamma(ell).
- Budget & policy: Privacy = {eps, delta, accounting}; evaluation metrics: Metrics = {W1, KL, JS, MMD, FID, KID, covg}.
- Outputs
Baseline report {divergence, coverage, uncertainty, privacy}; threshold recommendations {tol_*}; contract mapping C40-2xx.
II. Terms & Variables
- Distributions & parameters: p_data(x), p_model(x; theta), theta, z ~ p(z).
- Divergences & kernels: KL, JS, W1, MMD(k), FID(mu,Sigma).
- Time & arrival: T_arr, c_ref, n_eff (effective refractive index—distinct from weight-based n_eff), gamma(ell), d ell.
- Uncertainty: u(x) (standard uncertainty), U = k * u_c (expanded uncertainty, with coverage factor k and combined standard uncertainty u_c), alpha (significance level; the coverage/confidence level is 1 - alpha).
- Privacy budget: DP(eps, delta), eps_total, delta_total.
- Coverage & representativeness: covg, supp(•), n_eff_weights = ( (∑ w)^2 ) / ( ∑ w^2 ).
III. Axioms P402-* (Non-Negotiables for the Baseline)
- P402-1 (Explicit Measures): Any integral or expectation must specify domain and measure, e.g., ( ∫_Omega p(x) dx ) = 1.
- P402-2 (Declared Divergences): Fidelity is defined only via explicit divergence/distance families—no visual heuristics as substitutes.
- P402-3 (Dimensional Conservation): Any physical quantity entering models or metrics must satisfy check_dim(expr); compute only after unit harmonization.
- P402-4 (Unified Time Base): Evaluate windows on tau_mono; publish on ts with offset, skew, and jitter bound J recorded.
- P402-5 (Dual Arrival-Time Forms): When propagation/paths are involved, compute both T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) and T_arr = ( ∫ ( n_eff / c_ref ) d ell ), and publish delta_form.
- P402-6 (Explicit Privacy Budgeting): Before any synthetic release, provide DP(eps, delta) accounting and cumulative budgets eps_total, delta_total.
- P402-7 (Minimum Publication Criterion): pass = fidelity_ok ∧ privacy_ok ∧ contract_ok ∧ manifest_signed.
- P402-8 (Reproducibility): With identical SynthSpec, seed/rng, and version, repeated runs must produce statistically equivalent D_syn.
- P402-9 (Coverage Lower Bounds): Coverage covg for key strata/enumerations must meet policy minima.
- P402-10 (Steady State & Backpressure): Streaming generation on G=(V,E) satisfies stability and backpressure constraints (see Core.Threads v1.0).
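The coverage requirement in P402-9 can be illustrated with the discrete approximation covg = | supp(D_syn) ∩ supp(D_real) | / | supp(D_real) |, computed overall and per stratum. This is a minimal sketch under stated assumptions: the helper names `coverage` and `coverage_by_stratum`, the (stratum, value) row layout, and the threshold value are hypothetical and do not come from the source.

```python
from collections import defaultdict


def coverage(real_values, syn_values):
    """Share of distinct real values that also occur in the synthetic data."""
    real_support = set(real_values)
    if not real_support:
        raise ValueError("real data has empty support")
    return len(real_support & set(syn_values)) / len(real_support)


def coverage_by_stratum(real_rows, syn_rows):
    """Per-stratum coverage for rows given as (stratum, value) pairs."""
    real_groups, syn_groups = defaultdict(list), defaultdict(list)
    for stratum, value in real_rows:
        real_groups[stratum].append(value)
    for stratum, value in syn_rows:
        syn_groups[stratum].append(value)
    # A stratum that is missing from the synthetic data gets coverage 0.0.
    return {s: coverage(vals, syn_groups.get(s, [])) for s, vals in real_groups.items()}


# Toy rows (illustrative only).
real = [("A", "red"), ("A", "blue"), ("B", "red"), ("B", "green")]
syn = [("A", "red"), ("A", "blue"), ("B", "red"), ("B", "red")]

overall = coverage([v for _, v in real], [v for _, v in syn])
per_group = coverage_by_stratum(real, syn)

covg_min_group = 0.75  # hypothetical policy minimum
failing = sorted(s for s, c in per_group.items() if c < covg_min_group)
print(overall, per_group, failing)
```

Reporting the per-stratum values alongside the overall figure matters because a high overall covg can hide a stratum whose rare categories were never generated, as stratum "B" shows here.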
IV. Minimal Equations S402-* (Necessary Baseline Formulae)
- S402-1 (Fitting Objective)
- theta* = argmin_theta D( p_model(•; theta ) || p_data(•) ), where D ∈ { KL, JS, W1, MMD }.
- Constrained form: min_theta ( D + λ * R(theta) ), where R encodes rule/physical/geometry/referential-integrity penalties.
- S402-2 (Wasserstein-1 Distance)
W1(P,Q) = inf_{pi ∈ Π(P,Q)} ( ∫ c(x,y) d pi(x,y) ), with typical c(x,y)=||x-y||_1 or ||x-y||_2.
- S402-3 (MMD)
MMD^2(P,Q;k) = || μ_P - μ_Q ||_H^2 = E_{x,x'} k(x,x') - 2 E_{x,y} k(x,y) + E_{y,y'} k(y,y') (declare kernel k and bandwidth).
- S402-4 (FID/KID in Image/Embedding Space)
- FID = || mu_r - mu_s ||_2^2 + Tr( Sigma_r + Sigma_s - 2 * ( Sigma_r * Sigma_s )^{1/2} ).
- KID uses an unbiased kernel (MMD^2) estimator averaged over multiple subsamples; declare the feature-extraction protocol.
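The distances in S402-2 and S402-3 can be sketched for one-dimensional samples as follows. Two simplifications are assumed here and are not prescribed by the source: W1 is computed for equal-size samples with c(x,y)=|x-y|, where the optimal coupling pairs the sorted values, and MMD^2 uses the biased (V-statistic) estimator with a Gaussian kernel. The function names and the default bandwidth are hypothetical.

```python
import math


def w1_1d(xs, ys):
    """Empirical W1 for equal-size 1-D samples: mean gap between sorted values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def gaussian_kernel(x, y, bandwidth):
    """k(x, y) = exp(-(x - y)^2 / (2 * bandwidth^2))."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of E k(x,x') - 2 E k(x,y) + E k(y,y')."""
    def mean_k(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_k(xs, xs) - 2.0 * mean_k(xs, ys) + mean_k(ys, ys)


real = [0.0, 1.0, 2.0, 3.0]
syn_shifted = [x + 0.5 for x in real]  # synthetic sample shifted by 0.5

w1_val = w1_1d(real, syn_shifted)            # a pure shift of 0.5 gives W1 = 0.5
mmd_same = mmd2_biased(real, real)           # identical samples give 0
mmd_shift = mmd2_biased(real, syn_shifted)   # strictly positive for the shift
print(w1_val, mmd_same, mmd_shift)
```

The biased estimator is used because it is non-negative and exactly zero for identical samples, which makes it easy to sanity-check; the unbiased variant drops the diagonal terms of the within-sample sums.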
- S402-5 (Coverage & Support Sets)
Discrete approximation: covg = | supp(D_syn) ∩ supp(D_real) | / | supp(D_real) |; in continuous space, approximate via grids or kernel mass ratios.
- S402-6 (Uncertainty Publication)
- Expanded uncertainty: U = k * u_c, with k set by target coverage 1 - alpha; for bootstrap, publish percentile bands {q_{alpha/2}, q_{1-alpha/2}}.
- Delta method (short form): var( g( hat{theta} ) ) ≈ ( ∇g )^T * cov( hat{theta} ) * ( ∇g ).
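Both publication forms in S402-6 can be sketched for a metric evaluated over repeated runs. The sketch assumes that u_c is the standard error of the mean (a Type A evaluation only), that k = 2 is an acceptable coverage factor for roughly 95% coverage under approximate normality, and that a simple nearest-rank rule is good enough for the percentile band. The function names, defaults, and sample values are hypothetical.

```python
import math
import random
import statistics


def expanded_uncertainty(values, k=2.0):
    """U = k * u_c, taking u_c as the standard error of the mean (assumption)."""
    u_c = statistics.stdev(values) / math.sqrt(len(values))
    return k * u_c


def bootstrap_percentile_band(values, alpha=0.05, n_boot=2000, seed=0):
    """Percentile band {q_{alpha/2}, q_{1-alpha/2}} for the sample mean."""
    rng = random.Random(seed)  # fixed seed so the band is reproducible
    n = len(values)
    reps = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = math.floor((alpha / 2) * (n_boot - 1))
    hi_idx = math.ceil((1 - alpha / 2) * (n_boot - 1))
    return reps[lo_idx], reps[hi_idx]


metric_runs = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.0]  # illustrative values
U = expanded_uncertainty(metric_runs)
band = bootstrap_percentile_band(metric_runs)
print(f"mean={statistics.fmean(metric_runs):.3f}  U={U:.3f}  band={band}")
```

Passing the seed explicitly keeps the published band consistent with the reproducibility axiom P402-8.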
- S402-7 (Privacy Accounting, Simple Composition)
eps_total = ( ∑_{r=1}^R eps_r ), delta_total = ( ∑_{r=1}^R delta_r ); for advanced accounting, publish the accountant and its parameters.
- S402-8 (Dual Arrival-Form Discrepancy)
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |, and assert delta_form ≤ tol_Tarr.
- S402-9 (Time Mapping & Jitter)
ts = map_tau_to_ts( tau_mono; offset, skew ); publish jitter bound J under manifest.synth.timing.
- S402-10 (Weight-Effective Sample Size, if Reweighting Applied)
n_eff_weights = ( (∑ w_i)^2 ) / ( ∑ w_i^2 ), and require W_norm = ( ∑ w_i ) / N ≈ 1.
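The weight-effective sample size in S402-10 is a direct computation. The sketch below borrows the name and return fields of binding I40-205, but the body, the input validation, and the example weights are assumptions for illustration.

```python
def evaluate_weight_effective(weights):
    """Return n_eff_weights = (sum w)^2 / sum w^2 and W_norm = sum w / N."""
    if not weights or any(w < 0 for w in weights):
        raise ValueError("weights must be non-empty and non-negative")
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    if s2 == 0:
        raise ValueError("at least one weight must be positive")
    return {"n_eff_weights": s1 * s1 / s2, "W_norm": s1 / len(weights)}


uniform = evaluate_weight_effective([1.0, 1.0, 1.0, 1.0])
skewed = evaluate_weight_effective([2.5, 0.5, 0.5, 0.5])
print(uniform)  # n_eff_weights equals N when all weights are equal
print(skewed)   # same W_norm, but fewer effective samples (16 / 7)
```

Both weight vectors satisfy W_norm = 1, so normalization alone does not reveal the loss of effective samples; that is why the ratio n_eff_weights / N is checked separately.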
V. Synthesis Flow M40-2 (Baseline Verification)
- Readiness
Confirm SRef/Rules; units & dimensions pass check_dim; set Metrics and preliminary thresholds {tol_W1, tol_MMD, tol_FID, covg_min}.
- Training/Fitting
Solve theta* = argmin_theta ( D + λ * R(theta) ); record theta_ref and seed/rng.
- Path/Time Consistency
Compute both T_arr forms and delta_form; run map_tau_to_ts and measure offset/skew/J.
- Metrics & Uncertainty
Evaluate {W1, KL, JS, MMD, FID, KID, covg}; produce U or CI.
- Privacy Accounting
Aggregate eps_total, delta_total; if over budget, roll back or retrain with down-sampling/noise.
- Contracts & Persistence
Execute C40-2xx contracts; on pass, write manifest.synth and sign; otherwise, issue remediation guidance and rollback points.
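The Path/Time Consistency step of the flow above can be sketched numerically. The example assumes a path discretized into sample points along ell, a constant c_ref, and trapezoidal integration; under those assumptions the two T_arr forms agree analytically, so delta_form mainly exposes floating-point or implementation differences. The function names, the sample profile, the c_ref value (vacuum light speed in m/s), and the tolerance are all hypothetical.

```python
def trapezoid(ys, xs):
    """Trapezoidal rule for samples ys taken at increasing positions xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))


def arrival_forms(ell, n_eff, c_ref):
    """Return both T_arr forms and their absolute discrepancy delta_form."""
    t_outside = (1.0 / c_ref) * trapezoid(n_eff, ell)        # (1 / c_ref) * ∫ n_eff d ell
    t_inside = trapezoid([n / c_ref for n in n_eff], ell)    # ∫ (n_eff / c_ref) d ell
    return t_outside, t_inside, abs(t_outside - t_inside)


c_ref = 299_792_458.0                               # assumed reference speed, m/s
ell = [0.0, 250.0, 500.0, 750.0, 1000.0]            # path positions, m
n_eff = [1.0003, 1.0002, 1.0002, 1.0001, 1.0001]    # illustrative index profile

t_a, t_b, delta_form = arrival_forms(ell, n_eff, c_ref)
tol_Tarr = 1e-15                                    # illustrative tolerance, s
print(t_a, t_b, delta_form, delta_form <= tol_Tarr)
```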
VI. Contracts & Assertions C40-2xx (Generation Baseline)
- C40-201 (Divergence Thresholds): W1 ≤ tol_W1 ∧ MMD ≤ tol_MMD (or FID ≤ tol_FID for imaging).
- C40-202 (Coverage Lower Bound): covg ≥ covg_min, and per-stratum covg_group ≥ covg_min_group.
- C40-203 (Arrival Consistency): delta_form ≤ tol_Tarr.
- C40-204 (Time-Base Jitter): J ≤ J_max; |offset| ≤ off_max; |skew| ≤ skew_max.
- C40-205 (Privacy Budget): eps_total ≤ bud_eps ∧ delta_total ≤ bud_delta; disclose accounting method.
- C40-206 (Dimensional Conservation): check_dim(expr)=true; publish the unit table with the manifest.
- C40-207 (Effective Sample Size, if Reweighted): n_eff_weights / N ≥ r_min and W_norm ≈ 1.
- C40-208 (Reproducibility): repeat-run deviation |metric' - metric| ≤ tol_repro.
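A subset of the contracts above can be evaluated mechanically once the baseline report and thresholds are available. The sketch reuses the name of binding I40-206, but the flat dictionary layout, the field names `metric` and `metric_repeat`, and all numeric values are assumptions; C40-206 and C40-207 are left out for brevity.

```python
def assert_synth_contract(report, rules):
    """Evaluate selected C40-2xx contracts and return a decision record."""
    checks = {
        "C40-201": report["W1"] <= rules["tol_W1"] and report["MMD"] <= rules["tol_MMD"],
        "C40-202": report["covg"] >= rules["covg_min"],
        "C40-203": report["delta_form"] <= rules["tol_Tarr"],
        "C40-204": (
            report["J"] <= rules["J_max"]
            and abs(report["offset"]) <= rules["off_max"]
            and abs(report["skew"]) <= rules["skew_max"]
        ),
        "C40-205": (
            report["eps_total"] <= rules["bud_eps"]
            and report["delta_total"] <= rules["bud_delta"]
        ),
        "C40-208": abs(report["metric_repeat"] - report["metric"]) <= rules["tol_repro"],
    }
    failed = sorted(name for name, ok in checks.items() if not ok)
    return {"pass": not failed, "failed": failed, "checks": checks}


rules = {
    "tol_W1": 0.05, "tol_MMD": 0.02, "covg_min": 0.95, "tol_Tarr": 1e-15,
    "J_max": 1e-3, "off_max": 1e-3, "skew_max": 1e-6,
    "bud_eps": 1.0, "bud_delta": 1e-5, "tol_repro": 0.005,
}
report = {
    "W1": 0.03, "MMD": 0.01, "covg": 0.97, "delta_form": 1e-18,
    "J": 2e-4, "offset": -1e-4, "skew": 1e-7,
    "eps_total": 0.8, "delta_total": 1e-6,
    "metric": 0.030, "metric_repeat": 0.031,
}

decision_ok = assert_synth_contract(report, rules)
decision_bad = assert_synth_contract(dict(report, covg=0.90), rules)
print(decision_ok["pass"], decision_bad["failed"])
```

Returning the list of failed contract IDs, rather than a bare boolean, gives the remediation step something concrete to act on.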
VII. Implementation Bindings I40-* (Chapter Anchors)
- I40-201 measure_fidelity(real, syn, metrics) -> report (implements W1/KL/JS/MMD/FID/KID/covg with CI or U)
- I40-202 privacy_accounting(steps, mechanism) -> {eps_total, delta_total}
- I40-203 propagate_uncertainty(estimates, method) -> {U or CI} where method ∈ {delta, bootstrap, bayes}
- I40-204 enforce_timepath_baseline(ds_syn, ref) -> {delta_form, offset, skew, J}
- I40-205 evaluate_weight_effective(w) -> {n_eff_weights, W_norm}
- I40-206 assert_synth_contract(report, rules) -> decision
- Invariants: sum(w)/N ≈ 1; eps_total within budget; delta_form ≤ tol_Tarr; unit/dimension checks pass; identical seed reproduces equivalent outputs.
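The simple-composition rule behind I40-202 (S402-7) can be sketched as below. The representation of steps as (eps_r, delta_r) pairs, the mechanism label "basic", the `within_budget` helper, and the budget values are assumptions; an advanced accountant would replace the plain sums and is not attempted here.

```python
import math


def privacy_accounting(steps, mechanism="basic"):
    """Sum per-release (eps_r, delta_r) pairs under basic sequential composition."""
    if mechanism != "basic":
        raise NotImplementedError("only basic composition is sketched here")
    eps_total = math.fsum(eps for eps, _ in steps)
    delta_total = math.fsum(delta for _, delta in steps)
    return {"eps_total": eps_total, "delta_total": delta_total}


def within_budget(totals, bud_eps, bud_delta):
    """Budget check in the form used by contract C40-205."""
    return totals["eps_total"] <= bud_eps and totals["delta_total"] <= bud_delta


steps = [(0.5, 1e-6), (0.25, 1e-6), (0.25, 2e-6)]  # illustrative releases
totals = privacy_accounting(steps)
ok = within_budget(totals, bud_eps=1.0, bud_delta=1e-5)
print(totals, ok)
```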
VIII. Cross-References
- Cleaning: units/dimensions (Ch.4), timeline & synchronization (Ch.5), paths & arrival time (Ch.6), contracts & release (Ch.10).
- Imaging: metric conventions & embedding spaces (PSF→metric interpretation, Ch.5; quality & audit, Ch.14).
- Cross-Statistics: uncertainty propagation (Chs.4/5), multiplicity (Ch.6), drift & alignment (Ch.7).
IX. Quality Metrics & Risk Control
- Baseline panel fields: metrics.{W1,MMD,FID,KID,KL,JS,covg}, uncertainty.{U,CI}, timing.{offset,skew,J}, arrival.delta_form, privacy.{eps_total,delta_total}.
- Risk policy: when W1 or MMD exceeds its threshold, prioritize reweighting/mapping or retraining; when eps_total nears its budget, switch to low-fidelity mode or delay release; when delta_form exceeds tol_Tarr, mandate a review of path/medium parameters.
Summary
- This chapter codifies the baseline’s non-negotiables P402-* and supplies the necessary equations S402-* for divergence, coverage, arrival time, time-base, privacy, and uncertainty.
- Through the verification flow M40-2 and contracts C40-2xx, it provides a unified gate and implementation anchors for subsequent generator families, conditional control, and release governance.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/