Technical WhitePaper 19-EFT.WP.Methods.SynthData v1.0

Chapter 2 — Axioms & Minimal Equations (Generation Baseline)


I. Scope & Targets

  1. Goals
    • Establish a unified generation baseline for bringing p_model(x; theta) close to p_data(x), with common conventions for divergence measures, time/path consistency, dimensional conservation, and privacy budgeting.
    • Define the necessary equations and threshold mappings required by the publication gate, serving as foundations for later implementation and evaluation chapters.
  2. Inputs
    • Reference distribution & samples: D_real ~ p_data(x); schema & constraints: SRef, Rules; time & path anchors: tau_mono, ts, gamma(ell).
    • Budget & policy: Privacy = {eps, delta, accounting}; evaluation metrics: Metrics = {W1, KL, JS, MMD, FID, KID, covg}.
  3. Outputs
    Baseline report {divergence, coverage, uncertainty, privacy}; threshold recommendations {tol_*}; contract mapping C40-2xx.

II. Terms & Variables


III. Axioms P402-* (Non-Negotiables for the Baseline)


IV. Minimal Equations S402-* (Necessary Baseline Formulae)

  1. S402-1 (Fitting Objective)
    • theta* = argmin_theta D( p_model(•; theta) || p_data(•) ), where D ∈ { KL, JS, W1, MMD }.
    • Constrained form: min_theta ( D + λ * R(theta) ), where R encodes rule/physical/geometry/referential-integrity penalties.
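As a minimal sketch of the unconstrained objective, the snippet below fits a one-parameter Bernoulli model to a two-bin reference distribution by grid search over theta, using discrete KL and JS divergences. The Bernoulli model, the grid search, and the helper names (kl_divergence, js_divergence, fit_bernoulli) are illustrative assumptions, not part of the baseline.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi == 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), symmetric in p and q."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def fit_bernoulli(p_data, divergence, grid_size=99):
    """Grid-search theta in (0, 1) minimizing divergence(p_model || p_data)."""
    best_theta, best_value = None, math.inf
    for i in range(1, grid_size + 1):
        theta = i / (grid_size + 1)
        p_model = [1.0 - theta, theta]  # hypothetical model: Bernoulli(theta)
        value = divergence(p_model, p_data)
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value

theta_star, d_min = fit_bernoulli([0.3, 0.7], kl_divergence)
```

The constrained form would add λ * R(theta) to `value` inside the loop; R is left out here because the baseline does not fix a particular penalty.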
  2. S402-2 (Wasserstein-1 Distance)
    W1(P,Q) = inf_{pi ∈ Π(P,Q)} ( ∫ c(x,y) d pi(x,y) ), with typical c(x,y)=||x-y||_1 or ||x-y||_2.
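For intuition, in one dimension with two equal-size samples the optimal coupling pairs the sorted values, so the empirical W1 reduces to a mean absolute difference. The sketch below covers only that special case (1-D, equal sample sizes, c(x,y)=|x-y|); the function name is an assumption, and multivariate data needs a general optimal-transport solver.

```python
def w1_empirical_1d(xs, ys):
    """Empirical W1 between two equal-size 1-D samples with cost |x - y|."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty samples of equal size")
    # In 1-D the optimal transport plan matches order statistics.
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(x - y) for x, y in pairs) / len(xs)
```

For example, shifting every point of a sample by 1.0 yields a distance of 1.0.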
  3. S402-3 (MMD)
    MMD^2(P,Q;k) = || μ_P - μ_Q ||_H^2 = E_{x,x'} k(x,x') - 2 E_{x,y} k(x,y) + E_{y,y'} k(y,y') (declare kernel k and bandwidth).
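The three expectations above can be estimated from samples. The following is a sketch of the standard unbiased estimator (diagonal terms excluded from the within-sample sums), assuming a Gaussian RBF kernel; the kernel choice, the bandwidth value, and the function names are assumptions to be replaced by the declared kernel.

```python
import math

def rbf_kernel(x, y, bandwidth):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of MMD^2; can be slightly negative when P is close to Q."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two points per sample")
    # Within-sample terms skip i == j so the estimator stays unbiased.
    k_xx = sum(rbf_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(rbf_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return k_xx - 2.0 * k_xy + k_yy
```

Points are passed as tuples of coordinates, e.g. `mmd2_unbiased([(0.0,), (0.1,)], [(5.0,), (5.1,)])`.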
  4. S402-4 (FID/KID in Image/Embedding Space)
    • FID = || mu_r - mu_s ||_2^2 + Tr( Sigma_r + Sigma_s - 2 * ( Sigma_r * Sigma_s )^{1/2} ).
    • KID uses an unbiased kernel (MMD^2) estimator averaged over multiple subsamples; declare the feature-extraction protocol.
  5. S402-5 (Coverage & Support Sets)
    Discrete approximation: covg = | supp(D_syn) ∩ supp(D_real) | / | supp(D_real) |; in continuous space, approximate via grids or kernel mass ratios.
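The discrete approximation can be sketched directly with sets. The function name is an assumption, and the continuous case (grids or kernel mass ratios) is not covered here.

```python
def coverage(d_real, d_syn):
    """covg = |supp(D_syn) ∩ supp(D_real)| / |supp(D_real)| for discrete values."""
    supp_real = set(d_real)
    supp_syn = set(d_syn)
    if not supp_real:
        raise ValueError("D_real has empty support")
    return len(supp_syn & supp_real) / len(supp_real)
```

Synthetic values outside supp(D_real) do not raise covg; they only matter for other checks such as rule violations.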
  6. S402-6 (Uncertainty Publication)
    • Expanded uncertainty: U = k * u_c, with k set by target coverage 1 - alpha; for bootstrap, publish percentile bands {q_{alpha/2}, q_{1-alpha/2}}.
    • Delta method (short form): var( g( hat{theta} ) ) ≈ ( ∇g )^T * cov( hat{theta} ) * ( ∇g ).
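A sketch of the first bullet, under two stated assumptions: the coverage factor k is taken from a normal distribution (k ≈ 1.96 for alpha = 0.05), and the bootstrap band uses a simple index-based percentile rule. The function names, the resample count, and the seed are illustrative.

```python
import random
from statistics import NormalDist, mean

def coverage_factor(alpha):
    """Two-sided coverage factor k for target coverage 1 - alpha (normal assumption)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def expanded_uncertainty(u_c, alpha=0.05):
    """U = k * u_c."""
    return coverage_factor(alpha) * u_c

def bootstrap_band(data, stat=mean, alpha=0.05, n_boot=2000, seed=0):
    """Percentile band {q_{alpha/2}, q_{1-alpha/2}} of a statistic via resampling."""
    rng = random.Random(seed)  # fixed seed so the published band is reproducible
    n = len(data)
    reps = sorted(stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot))
    lo = reps[int((alpha / 2.0) * (n_boot - 1))]
    hi = reps[int((1.0 - alpha / 2.0) * (n_boot - 1))]
    return lo, hi

band = bootstrap_band([float(v) for v in range(1, 11)])
```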
  7. S402-7 (Privacy Accounting, Simple Composition)
    eps_total = ( ∑_{r=1}^R eps_r ), delta_total = ( ∑_{r=1}^R delta_r ); for advanced accounting, publish the accountant and its parameters.
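Simple composition is a pair of sums over release rounds. The sketch below adds a budget check; the function names and the example budget values are hypothetical.

```python
def compose_simple(rounds):
    """Sum per-round (eps_r, delta_r) pairs into (eps_total, delta_total)."""
    eps_total = sum(eps for eps, _ in rounds)
    delta_total = sum(delta for _, delta in rounds)
    return eps_total, delta_total

def within_budget(rounds, eps_budget, delta_budget):
    """True if the simply composed totals do not exceed the declared budget."""
    eps_total, delta_total = compose_simple(rounds)
    return eps_total <= eps_budget and delta_total <= delta_budget

# Illustrative values only: three rounds against a budget of (1.0, 1e-5).
rounds = [(0.5, 1e-6), (0.25, 1e-6), (0.25, 2e-6)]
ok = within_budget(rounds, eps_budget=1.0, delta_budget=1e-5)
```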
  8. S402-8 (Dual Arrival-Form Discrepancy)
    delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |, and assert delta_form ≤ tol_Tarr.
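One way to evaluate the assertion numerically is to sample n_eff along the path and integrate both forms with the trapezoid rule. The sketch assumes c_ref is constant along the path, in which case the two forms differ only by floating-point rounding. The path samples, the c_ref value, and the tol_Tarr value are illustrative assumptions.

```python
import math

def trapezoid(ys, xs):
    """Trapezoid-rule integral of samples ys over abscissae xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

def delta_form(ell, n_eff, c_ref):
    """| (1/c_ref) * ∫ n_eff d ell - ∫ (n_eff / c_ref) d ell | on a sampled path."""
    t_constant_pulled = trapezoid(n_eff, ell) / c_ref
    t_general = trapezoid([n / c_ref for n in n_eff], ell)
    return abs(t_constant_pulled - t_general)

# Hypothetical path: 1 km sampled every 10 m with a slowly varying n_eff.
ell = [10.0 * i for i in range(101)]
n_eff = [1.0003 + 1e-7 * math.sin(s / 100.0) for s in ell]
c_ref = 299792458.0   # assumed reference speed, m/s
tol_Tarr = 1e-15      # assumed tolerance, seconds

d = delta_form(ell, n_eff, c_ref)
passed = d <= tol_Tarr
```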
  9. S402-9 (Time Mapping & Jitter)
    ts = map_tau_to_ts( tau_mono; offset, skew ); publish jitter bound J under manifest.synth.timing.
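The functional form of map_tau_to_ts is not specified above. The sketch below assumes an affine clock model, ts = offset + (1 + skew) * tau_mono, and takes the jitter bound J as the largest absolute residual between observed and mapped timestamps; both choices are assumptions.

```python
def map_tau_to_ts(tau_mono, offset, skew):
    """Assumed affine mapping from monotonic time to timestamp."""
    return offset + (1.0 + skew) * tau_mono

def jitter_bound(tau_samples, ts_observed, offset, skew):
    """J = max |ts_observed - map_tau_to_ts(tau)| over paired samples."""
    return max(abs(ts - map_tau_to_ts(tau, offset, skew))
               for tau, ts in zip(tau_samples, ts_observed))

# Illustrative values only.
J = jitter_bound([0.0, 1.0, 2.0], [100.0, 101.5, 102.0], offset=100.0, skew=0.0)
```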
  10. S402-10 (Weight-Effective Sample Size, if Reweighting Applied)
    n_eff_weights = ( (∑ w_i)^2 ) / ( ∑ w_i^2 ), and require W_norm = ( ∑ w_i ) / N ≈ 1.
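Both quantities follow directly from the weights. A minimal sketch, with assumed function names:

```python
def effective_sample_size(weights):
    """n_eff_weights = (sum w_i)^2 / (sum w_i^2)."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    if s2 == 0.0:
        raise ValueError("all weights are zero")
    return s1 * s1 / s2

def weight_norm(weights):
    """W_norm = (sum w_i) / N, expected to be close to 1."""
    return sum(weights) / len(weights)
```

Uniform weights give n_eff_weights = N, while a single nonzero weight gives n_eff_weights = 1, so a small value signals that a few samples dominate the reweighted set.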

V. Synthesis Flow M40-2 (Baseline Verification)


VI. Contracts & Assertions C40-2xx (Generation Baseline)


VII. Implementation Bindings I40-* (Chapter Anchors)


VIII. Cross-References


IX. Quality Metrics & Risk Control


Summary


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/