Chapter 12 — Fidelity & Utility Evaluation (FID/KID/W1/MMD/Downstream)
I. Scope & Targets
- Goals
- Evaluate synthetic data D_syn against reference real data D_real using a multi-metric suite for statistical fidelity (distributional closeness) and downstream utility (task performance), establishing auditable, reproducible, and regression-ready release gates.
- Metric family includes FID, KID, W1, MMD, precision/recall for generative models, plus downstream task deltas and power.
- Output manifest.synth.metrics.* together with intervals/uncertainty U = k * u_c.
- Applies to
Images, speech, text, tabular, time series, and multimodal settings; offline evaluation and streaming rolling evaluation.
- Outputs
Per-window and full-corpus metrics, confidence intervals or posterior quantiles, pass/fail decisions, rollback guidance, and signature.
II. Terms & Variables
- Data & embeddings: D_real = {x_i}, D_syn = {x'_j}, phi(•) (frozen feature extractor), Z = phi(D).
- Statistics: mu_r, Sigma_r, mu_s, Sigma_s, kernel K(•,•), coupling pi (OT), cost matrix C.
- Distances & measures: FID, KID, MMD, W1, PR_gen (generative precision/recall), covg (coverage).
- Downstream: metric_real, metric_syn, delta_down = metric_real - metric_syn, power.
- Time & arrival: tau_mono, ts, Delta_t, T_arr, delta_form, offset/skew/J.
- Manifest keys: manifest.synth.metrics.{fid,kid,w1,mmd,pr,covg,delta_down,U,phi_spec,windows,signature}.
III. Axioms P412-*
- P412-1 (Frozen Feature Spec): The architecture, weights, preprocessing, and tensor shapes of phi must be fixed and declared in the manifest.
- P412-2 (Explicit Measure & Domain): All integrals/distances must declare measures, domains, and kernel parameters; do not mix embedding-space metrics with pixel/raw-space metrics.
- P412-3 (Window Consistency): Compute every metric on the same Delta_t windows over tau_mono; publish on ts.
- P412-4 (Uncertainty Required): Each metric must carry an interval or quantiles U = k * u_c (bootstrap or posterior).
- P412-5 (Dual Arrival Forms): For metrics involving temporal/path components, record both T_arr formulations and delta_form before/after computation.
- P412-6 (Dimensions & Units): Treat FID/KID/MMD/W1 as dimensionless or self-consistent under their definition domains; execute check_dim(expr).
- P412-7 (No Privacy Degradation): Evaluation is post-processing and must not weaken DP(eps,delta) guarantees.
IV. Minimal Equations S412-*
- S412-1 (FID)
- With Z_r ~ N( mu_r, Sigma_r ), Z_s ~ N( mu_s, Sigma_s ):
FID = || mu_r - mu_s ||_2^2 + Tr( Sigma_r + Sigma_s - 2 * ( Sigma_r^(1/2) * Sigma_s * Sigma_r^(1/2) )^(1/2) ).
- mu_r = (1/n) * ∑_i phi(x_i), Sigma_r = Cov( phi(x_i) ); analogously for the synthetic set.
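As an illustration of S412-1, the following is a minimal NumPy sketch, not a reference implementation. The function names are chosen to mirror the compute_fid binding but are otherwise hypothetical, the embeddings are assumed to be arrays of shape (n_samples, d) with d >= 2, and the covariance uses NumPy's default n-1 normalization, which the text does not specify.

```python
import numpy as np


def _sqrt_psd(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    eigvals = np.clip(eigvals, 0.0, None)  # clamp tiny negative round-off
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def compute_fid(z_r: np.ndarray, z_s: np.ndarray) -> float:
    """FID following S412-1 for embeddings of shape (n_samples, d), d >= 2."""
    mu_r, mu_s = z_r.mean(axis=0), z_s.mean(axis=0)
    # Assumption: sample covariance with the n-1 denominator (np.cov default).
    sigma_r = np.cov(z_r, rowvar=False)
    sigma_s = np.cov(z_s, rowvar=False)

    # Inner matrix Sigma_r^(1/2) * Sigma_s * Sigma_r^(1/2) is symmetric PSD,
    # so the trace of its square root is the sum of sqrt(eigenvalues).
    root_r = _sqrt_psd(sigma_r)
    inner = root_r @ sigma_s @ root_r
    inner = 0.5 * (inner + inner.T)  # re-symmetrize against round-off
    eigvals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    trace_sqrt = np.sqrt(eigvals).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_s) - 2.0 * trace_sqrt)
```

Working with the symmetric inner matrix keeps the computation in real arithmetic and follows the formula term by term. A pure translation of the synthetic set leaves the covariance term at zero, so only the squared mean gap remains.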
- S412-2 (KID, unbiased MMD^2 with a polynomial kernel)
Let K(u,v) = ( (u^T v) / d + 1 )^3, d = dim(phi), n = |Z_r|, m = |Z_s|:
KID = MMD_unbiased^2 = ( 1 / (n*(n-1)) ) * ∑_{i != j} K( z_i, z_j ) + ( 1 / (m*(m-1)) ) * ∑_{i != j} K( z'_i, z'_j ) - ( 2 / (n*m) ) * ∑_{i,j} K( z_i, z'_j ).
- S412-3 (MMD, general kernel)
MMD^2( P, Q ) = || E_P[ phi_k(x) ] - E_Q[ phi_k(y) ] ||_H^2; use the unbiased empirical estimator as above; kernel and bandwidth must be explicit in the manifest.
- S412-4 (Wasserstein-1 distance)
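- W1( P, Q ) = inf_{pi ∈ Π(P,Q)} E_{(x,y)~pi}[ c(x,y) ], commonly with c(x,y) = ||x-y||_2.
- Empirical OT: π* = argmin_{π ∈ Π(P,Q)} ⟨π, C⟩ + λ * H(π), W1 ≈ ⟨π*, C⟩, where H(π) = ∑_{i,j} π_ij * log π_ij is the negative entropy; for λ > 0 this is an entropic approximation that approaches the exact W1 as λ → 0.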
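The entropic empirical OT step in S412-4 can be sketched with log-domain Sinkhorn iterations. This is a minimal sketch under stated assumptions rather than a prescribed solver: uniform sample weights, Euclidean cost, a fixed iteration count, and the parameter names reg (standing in for λ) and n_iter are all illustrative choices. The return shape mirrors the compute_w1 binding.

```python
import numpy as np
from scipy.special import logsumexp


def compute_w1(z_r, z_s, reg=0.05, n_iter=500):
    """Entropic-OT estimate of W1 between two embedding sets (rows = samples)."""
    z_r = np.asarray(z_r, dtype=float)
    z_s = np.asarray(z_s, dtype=float)
    n, m = len(z_r), len(z_s)

    # Cost matrix C with c(x, y) = ||x - y||_2.
    cost = np.linalg.norm(z_r[:, None, :] - z_s[None, :, :], axis=-1)

    # Assumption: uniform marginals over both sample sets.
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))

    # Dual potentials updated in the log domain for numerical stability.
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        f = reg * (log_a - logsumexp((g[None, :] - cost) / reg, axis=1))
        g = reg * (log_b - logsumexp((f[:, None] - cost) / reg, axis=0))

    # Transport plan implied by the potentials, then the transport cost.
    plan = np.exp((f[:, None] + g[None, :] - cost) / reg)
    return {"w1": float(np.sum(plan * cost)), "reg_used": reg}
```

Because the regularized plan is slightly blurred, the returned value overestimates the exact distance for larger reg; recording reg_used alongside w1 keeps that choice auditable.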
- S412-5 (Precision/Recall for generative models, PR_gen)
Estimate manifold coverage and sample quality via k-NN ball neighborhood graphs in embedding space:
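precision = P_{z'~Q}( z' ∈ M_P ), recall = P_{z~P}( z ∈ M_Q ), where M_P and M_Q are the manifolds estimated from the k-NN balls of the real and synthetic embeddings.
- S412-6 (Downstream utility delta)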
For downstream metric metric ∈ {AUC, mAP, F1, RMSE, BLEU, WER, ACC}:
delta_down = metric_real - metric_syn, with power = 1 - beta (statistical power) and minimum detectable effect MDE.
- S412-7 (Uncertainty & intervals)
Bootstrap: compute {FID}_b, {KID}_b via resampling b=1..B, CI_q = quantile( {metric}_b, q );
Delta method: SE( g( hat{theta} ) ) ≈ sqrt( g'( hat{theta} )^T Var( hat{theta} ) g'( hat{theta} ) ).
- S412-8 (Arrival-time consistency)
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ); T_arr = ( ∫ ( n_eff / c_ref ) d ell );
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |.
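Returning to S412-2 and S412-3, the unbiased MMD^2 estimator can be sketched as follows. This is an illustrative sketch only: poly_kernel, rbf_kernel, and mmd2_unbiased are hypothetical names, and the Gaussian kernel's bandwidth parametrization exp( -||u-v||^2 / (2 * bandwidth^2) ) is an assumption, since the text only requires that the kernel and bandwidth be declared.

```python
import numpy as np


def poly_kernel(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Cubic polynomial kernel of S412-2: ((u^T v) / d + 1)^3."""
    d = u.shape[1]
    return (u @ v.T / d + 1.0) ** 3


def rbf_kernel(bandwidth: float):
    """Gaussian kernel factory (assumed parametrization, see lead-in)."""
    def kernel(u: np.ndarray, v: np.ndarray) -> np.ndarray:
        sq_dist = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return kernel


def mmd2_unbiased(z_r: np.ndarray, z_s: np.ndarray, kernel=poly_kernel) -> float:
    """Unbiased MMD^2; with the default kernel this is the KID of S412-2."""
    n, m = len(z_r), len(z_s)
    k_rr = kernel(z_r, z_r)
    k_ss = kernel(z_s, z_s)
    k_rs = kernel(z_r, z_s)
    # Within-set sums exclude the diagonal (i != j); the cross term keeps all pairs.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (n * (n - 1))
    term_ss = (k_ss.sum() - np.trace(k_ss)) / (m * (m - 1))
    term_rs = 2.0 * k_rs.sum() / (n * m)
    return float(term_rr + term_ss - term_rs)
```

For example, with one-dimensional embeddings z_r = [[0],[0]] and z_s = [[1],[1]], the three terms are 1, 8, and 2, giving a KID of 7. The estimator can be slightly negative when the two sets are drawn from the same distribution, which is expected for an unbiased estimate.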
V. Metrology Flow M40-12 (Fidelity & Utility Closed Loop)
- Readiness
Freeze phi and preprocessing; lock the reference set D_real_ref and window Delta_t; declare kernel/bandwidth/OT regularization.
- Embedding & Sampling
Compute Z_r, Z_s; for streaming, use rolling windows on tau_mono and stratified sampling to keep W_norm close to 1.
- Metric Computation
Compute FID/KID/MMD/W1/PR_gen/covg, and simultaneously generate bootstrap distributions {metric}_b and U.
- Downstream Evaluation
Fix a training/evaluation protocol: train_on = {real|syn|mix}, eval_on = {real_holdout}; output delta_down and power.
- Arrival-Time & Time-Base Checks
For time/path data, write offset/skew/J, T_arr, delta_form; assert windowed upper bounds.
- Contract Decision
Decide pass/fail and rollback guidance under C40-12xx; if needed, trigger reweighting or mapping (see Chapter 11).
- Persist & Sign
Emit manifest.synth.metrics.*, audit logs, and signature; archive phi_spec and random seeds.
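The bootstrap distributions {metric}_b used in the Metric Computation step, and the seed archiving in Persist & Sign, can be sketched as a percentile bootstrap. This is a hedged illustration, not the flow's mandated procedure: bootstrap_metric, mean_gap, n_boot, and the returned dictionary layout are hypothetical, and both sets are resampled independently with replacement.

```python
import numpy as np


def mean_gap(z_r: np.ndarray, z_s: np.ndarray) -> float:
    """Toy metric for the example: squared distance between embedding means."""
    return float(np.sum((z_r.mean(axis=0) - z_s.mean(axis=0)) ** 2))


def bootstrap_metric(metric_fn, z_r, z_s, n_boot=200, q=(0.025, 0.975), seed=0):
    """Percentile bootstrap interval CI_q = quantile({metric}_b, q)."""
    rng = np.random.default_rng(seed)  # the seed is returned so it can be archived
    n, m = len(z_r), len(z_s)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each set with replacement, independently of the other.
        idx_r = rng.integers(0, n, size=n)
        idx_s = rng.integers(0, m, size=m)
        draws[b] = metric_fn(z_r[idx_r], z_s[idx_s])
    lo, hi = np.quantile(draws, q)
    return {
        "point": float(metric_fn(z_r, z_s)),
        "ci": (float(lo), float(hi)),
        "seed": seed,
    }
```

Any metric with the signature metric_fn(Z_r, Z_s) -> float can be passed in. One caveat worth noting: resampling with replacement creates duplicate points, which can bias pairwise estimators such as the unbiased MMD^2, so subsampling without replacement may be preferable for those metrics.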
VI. Contracts & Assertions C40-12xx
- C40-1201 (FID/KID thresholds): FID ≤ tol_fid and KID ≤ tol_kid (report CI_95).
- C40-1202 (Kernel spec & MMD): MMD^2 ≤ tol_mmd; kernel and bandwidth must match the manifest.
- C40-1203 (W1 & stability): W1 ≤ tol_w1; entropic regularization λ within allowed range.
- C40-1204 (PR_gen & coverage): precision ≥ p_min and recall ≥ r_min and covg ≥ covg_min.
- C40-1205 (Downstream utility): | delta_down | ≤ tol_down, or power ≥ power_min; when a difference is detected, roll back per policy.
- C40-1206 (Arrival-time consistency): delta_form ≤ tol_Tarr; |offset| ≤ off_max, J ≤ J_max.
- C40-1207 (Reproducibility): reproducible(seed)=true; cross-run metric drift ≤ tol_reprod.
- C40-1208 (Dimensional checks): check_dim(expr)=true (especially for tabular/time-series domains).
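The threshold contracts above lend themselves to a small declarative check. The sketch below is illustrative only: Gate and evaluate_gates are hypothetical names, the numeric thresholds in the usage example are made up, treating a missing metric as a failure is an assumption, and the text does not say whether the point estimate or a CI bound should be compared against each tolerance, so the caller decides which value to pass in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One threshold assertion, e.g. fid <= tol_fid or recall >= r_min."""
    key: str             # metric name as it appears in the metrics dict
    bound: float         # tolerance or minimum
    upper: bool = True   # True: value <= bound; False: value >= bound


def evaluate_gates(metrics: dict, gates: list) -> dict:
    """Return the overall pass/fail decision and the list of failed keys."""
    failures = []
    for gate in gates:
        value = metrics.get(gate.key)
        if value is None:
            # Assumption: an unreported metric fails its gate.
            failures.append(gate.key)
            continue
        ok = value <= gate.bound if gate.upper else value >= gate.bound
        if not ok:
            failures.append(gate.key)
    return {"passed": not failures, "failures": failures}


# Usage with illustrative values (not recommended thresholds).
example_gates = [
    Gate("fid", 15.0),                     # C40-1201: FID <= tol_fid
    Gate("kid", 0.02),                     # C40-1201: KID <= tol_kid
    Gate("precision", 0.7, upper=False),   # C40-1204: precision >= p_min
    Gate("recall", 0.6, upper=False),      # C40-1204: recall >= r_min
]
example_metrics = {"fid": 12.0, "kid": 0.01, "precision": 0.8, "recall": 0.5}
decision = evaluate_gates(example_metrics, example_gates)
```

In this example only the recall gate fails, so the decision is a fail with recall listed as the cause. For C40-1205, the caller would pass the absolute value of delta_down.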
VII. Implementation Bindings I40-12*
- compute_fid(Z_r, Z_s) -> {fid, CI}
- compute_kid(Z_r, Z_s, kernel_spec) -> {kid, CI}
- compute_mmd(Z_r, Z_s, kernel_spec) -> {mmd2, CI}
- compute_w1(Z_r, Z_s, cost, reg) -> {w1, reg_used}
- estimate_pr_gen(Z_r, Z_s, k) -> {precision, recall, covg}
- evaluate_downstream(protocol, datasets) -> {metric_real, metric_syn, delta_down, power}
- bootstrap_metrics(fn_list, Z_r, Z_s, B) -> U_bundle
- slice_and_window(ds, Delta_t, strata) -> {windows}
- timepath_hardening(ds, sync_ref) -> ds' (writes offset/skew/J, T_arr, delta_form)
- emit_metrics_manifest(results, policy) -> manifest.synth.metrics
- Invariants: phi_spec immutable; sum(weights)/N ≈ 1; alpha/thresholds and kernel params match the manifest; delta_form ≤ tol_Tarr.
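As one example of how a binding might look, estimate_pr_gen can be sketched with k-NN balls in embedding space, in the spirit of S412-5. This is a minimal sketch under assumptions: each manifold is taken as the union of balls centered on a set's points with radius equal to the distance to the k-th nearest neighbor within that set, distances are Euclidean, the helper names are hypothetical, and covg is omitted because the text does not define it. The dense pairwise distances are only practical for modest sample sizes.

```python
import numpy as np


def _pairwise(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix of shape (len(a), len(b))."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)


def _knn_radii(z: np.ndarray, k: int) -> np.ndarray:
    """Distance from each point to its k-th nearest neighbor within z."""
    dist = _pairwise(z, z)
    # After sorting each row, index 0 is the self-distance, index k the k-th neighbor.
    return np.sort(dist, axis=1)[:, k]


def _fraction_covered(queries: np.ndarray, support: np.ndarray, radii: np.ndarray) -> float:
    """Share of query points that fall inside at least one support ball."""
    dist = _pairwise(queries, support)
    return float(np.mean((dist <= radii[None, :]).any(axis=1)))


def estimate_pr_gen(z_r: np.ndarray, z_s: np.ndarray, k: int = 3) -> dict:
    """Precision: synthetic points inside M_P; recall: real points inside M_Q."""
    precision = _fraction_covered(z_s, z_r, _knn_radii(z_r, k))
    recall = _fraction_covered(z_r, z_s, _knn_radii(z_s, k))
    return {"precision": precision, "recall": recall}
```

Both sets need more than k points. Identical sets score 1.0 on both measures, and a synthetic set placed far from the real one scores 0.0 on both.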
VIII. Cross-References
- This volume: Chapter 5 (stability in deep generation), Chapter 11 (reweighting/mapping), Chapter 13 (release process).
- Methods.CrossStats v1.0: Chapter 5 (resampling & intervals), Chapter 7 (drift detection), Chapter 14 (statistical SLOs).
- Methods.Cleaning v1.0: Chapter 10 (compliance, contracts & freeze) and Appendix B (contract library).
- Methods.Imaging v1.0: Chapter 14 (imaging quality metrics) as references for multimodal alignment.
IX. Quality SLIs & Risk Control
- SLIs
fid, kid, mmd2, w1, precision, recall, covg, |delta_down|, latency_ms_p99, delta_form, telemetry.drop_rate.
- Risk Strategies
- FID/KID over threshold: audit phi_spec, batch normalization, kernel/bandwidth; if needed, roll back to earlier model weights.
- Large W1 with low PR_gen: prioritize structural alignment (OT/monotone mapping) or retrain with discriminator constraints.
- Large delta_down: switch train_on={mix}, adjust loss or representativeness (see Chapter 11).
- Streaming drift: re-estimate metrics on sliding windows and link alerts to freeze_release_synth rollback tags.
Summary
This chapter fixes evaluation conventions (P412-*), provides computable definitions and uncertainty reporting for FID/KID/W1/MMD/PR_gen/delta_down (S412-*), and operationalizes the closed loop (M40-12) from readiness → embedding → metrics → downstream → arrival-time checks → decision → persistence. C40-12xx anchors release gates and SLOs, while I40-12* defines engineering interfaces and invariants, culminating in auditable publication via manifest.synth.metrics.*.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/