19-EFT.WP.Methods.SynthData v1.0
Chapter 11 — Bias, Fairness & Representativeness (Reweight / Mapping)
I. Scope & Targets
- Goals
- Govern the representativeness of D_syn relative to p_target and improve group fairness via reweighting and distribution mapping (including optimal transport and monotone mappings), thereby reducing selection bias and coverage gaps.
- Under intact physical/business constraints and privacy budgets, align both marginal and conditional distributions of the synthetic data with specified policy and compliance targets.
- Publish auditable manifest.synth.fairness.* along with window-consistent operational metrics.
- Applies to
Tabular, time-series, graph, and multimodal data; offline and streaming synthesis; scenarios coupled to downstream training or evaluation.
- Outputs
A weight vector w(i) or a mapping T(x), plus an audit report, contract assertions, and a release manifest.
II. Terms & Variables
- Data & attributes: X (features), Y (label), A (protected attribute), C (condition/prompt), p_ref, p_syn, p_target.
- Reweighting: w(x), r(x) = p_target(x) / p_syn(x), w_clip, n_eff = ( (∑ w)^2 ) / ( ∑ w^2 ).
- Mapping & distances: T: X -> X', W1, MMD, KL, psi (population stability index).
- Fairness metrics: SPD = P(hat{Y}=1|A=1) - P(hat{Y}=1|A=0), EOD = (TPR_1-TPR_0, FPR_1-FPR_0), PPD (predictive parity diff).
- Time & arrival: tau_mono, ts, offset/skew/J, T_arr, delta_form.
- Manifest keys: manifest.synth.fairness.{method, targets, W1, MMD, psi, SPD, EOD, n_eff, windows, signature}.
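The group-fairness definitions above can be illustrated with a minimal sketch. It assumes a binary protected attribute A in {0, 1}, binary labels and predictions, and optional per-sample weights; the function name group_fairness and the toy arrays are hypothetical and are not part of the I40-11 bindings.

```python
import numpy as np

def group_fairness(y_true, y_pred, a, w=None):
    """Return SPD and EOD = (TPR gap, FPR gap) for a binary attribute A."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    a = np.asarray(a)
    w = np.ones(len(a)) if w is None else np.asarray(w, dtype=float)

    def rate(mask):
        # Weighted P(hat{Y}=1 | mask); NaN if the conditioning set is empty.
        total = w[mask].sum()
        if total <= 0:
            return float("nan")
        return float((w[mask] * y_pred[mask]).sum() / total)

    pos = {g: rate(a == g) for g in (0, 1)}               # P(hat{Y}=1 | A=g)
    tpr = {g: rate((a == g) & y_true) for g in (0, 1)}    # true-positive rate
    fpr = {g: rate((a == g) & ~y_true) for g in (0, 1)}   # false-positive rate
    return {
        "SPD": pos[1] - pos[0],
        "EOD": (tpr[1] - tpr[0], fpr[1] - fpr[0]),
    }

# Toy data (illustrative only): four samples per group.
a      = [1, 1, 1, 1, 0, 0, 0, 0]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
result = group_fairness(y_true, y_pred, a)
```

Passing w_norm as w gives the weighted rates P_w, TPR_w, and FPR_w that appear in the fairness constraints.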
III. Axioms P411-*
- P411-1 (Explicit Targets): Before release, specify p_target (or a vector of group proportions) together with tolerances.
- P411-2 (Method Equivalence): Reweighting and mapping are equivalent at publication scope; provide impact analyses and rollback paths for both.
- P411-3 (Constraints First): Physical/business constraints and referential integrity (foreign keys) take priority over balancing actions.
- P411-4 (Unified Time Base): Evaluate representativeness and fairness on tau_mono; publish on ts with explicit windowing.
- P411-5 (Robustness): Any reweighting must ensure n_eff / N ≥ rho_min to prevent variance blow-up.
- P411-6 (No Privacy Degradation): Post-processing must not weaken DP(eps,delta); if retraining is required, re-account the budget.
- P411-7 (Dual Arrival Forms): When adjusting time/path fields, record both T_arr formulations and delta_form.
- P411-8 (Dimensional Conservation): Numerical mappings must pass check_dim(expr); no unit conflict may be introduced.
- P411-9 (Multimodal Consistency): Align multimodal bundles on the joint view; isolated per-modal balancing that breaks cross-modal coherence is disallowed.
IV. Minimal Equations S411-*
- S411-1 (Density-Ratio Reweighting)
- r(x) = p_target(x) / p_syn(x), w(x) = clip( r(x), 0, w_clip ), w_norm = w / ( (1/N) * ∑_i w_i ).
- Effective sample size: n_eff = ( (∑_i w_i)^2 ) / ( ∑_i w_i^2 ).
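A minimal NumPy sketch of S411-1, assuming the density-ratio estimates r_hat are already available. The function name reweight_sketch, the toy ratios, and the value rho_min = 0.5 are hypothetical; this is not the compute_reweight binding itself.

```python
import numpy as np

def reweight_sketch(r_hat, w_clip):
    """Clip density ratios, normalize to mean 1, and report n_eff."""
    r_hat = np.asarray(r_hat, dtype=float)
    w = np.clip(r_hat, 0.0, w_clip)           # w(x) = clip(r(x), 0, w_clip)
    mean_w = w.mean()                         # (1/N) * sum_i w_i
    if mean_w <= 0.0:
        raise ValueError("all weights are zero after clipping")
    w_norm = w / mean_w                       # normalized so that mean(w_norm) = 1
    n_eff = w.sum() ** 2 / (w ** 2).sum()     # effective sample size
    return w_norm, float(n_eff)

# Toy ratios (illustrative only): one sample is heavily over-weighted.
r_hat = np.array([0.5, 1.0, 1.0, 1.5, 8.0])
w_norm, n_eff = reweight_sketch(r_hat, w_clip=3.0)

rho_min = 0.5                                 # hypothetical threshold
passes_rho_min = n_eff / len(r_hat) >= rho_min
```

Because n_eff is scale-invariant, it takes the same value whether computed from w or from w_norm.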
- S411-2 (Weighted Risk with Fairness Constraints)
min_f E_{(x,y)~p_syn}[ w(x) * L( f(x), y ) ]
s.t. | P_w( hat{Y}=1 | A=a ) - P_w( hat{Y}=1 | A=b ) | ≤ tol_spd,
| TPR_w(a) - TPR_w(b) | ≤ tol_eod_TPR, | FPR_w(a) - FPR_w(b) | ≤ tol_eod_FPR.
- S411-3 (Wasserstein Mapping)
- T# p_syn = p_target, T = argmin_T E_{x~p_syn}[ c( x, T(x) ) ], with c(x,z) = ||x - z||_2 commonly used.
- Entropic OT: π* = argmin_π ⟨π, C⟩ + λ * H(π), with H(π) = ∑ π log π (negative entropy, so the penalty smooths π); the barycentric map is T(x_i) = ( ∑_j π*(x_i,x_j') * x_j' ) / ( ∑_j π*(x_i,x_j') ).
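The entropic OT step in S411-3 can be sketched with plain Sinkhorn iterations. The sketch assumes uniform sample weights, the Euclidean cost c(x,z) = ||x - z||_2, and a regularizer that rewards higher-entropy plans (that is, H(π) read as negative entropy); the function name, the value of lam, and the toy point clouds are hypothetical. The barycentric map is normalized by each row's mass so that T(x_i) is a weighted average of target points.

```python
import numpy as np

def sinkhorn_barycentric_map(x_syn, x_tgt, lam=1.0, max_iter=5000, tol=1e-10):
    """Entropic OT plan between two point clouds plus its barycentric map."""
    n, m = len(x_syn), len(x_tgt)
    a = np.full(n, 1.0 / n)                   # uniform mass on synthetic points
    b = np.full(m, 1.0 / m)                   # uniform mass on target points
    cost = np.linalg.norm(x_syn[:, None, :] - x_tgt[None, :, :], axis=-1)
    kernel = np.exp(-cost / lam)              # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(max_iter):
        u = a / (kernel @ v)                  # rescale rows
        v = b / (kernel.T @ u)                # rescale columns
        plan = u[:, None] * kernel * v[None, :]
        if np.abs(plan.sum(axis=1) - a).sum() < tol:
            break                             # row marginals have converged
    # Barycentric projection: weighted average of targets, per source point.
    mapped = (plan @ x_tgt) / plan.sum(axis=1, keepdims=True)
    return mapped, plan

rng = np.random.default_rng(0)
x_syn = rng.normal(0.0, 1.0, size=(40, 2))
x_tgt = rng.normal(0.0, 1.0, size=(50, 2)) + np.array([2.0, 1.0])
mapped, plan = sinkhorn_barycentric_map(x_syn, x_tgt)
```

The barycentric projection averages target points, so it tends to contract the spread of the mapped data as lam grows; small lam values may need a log-domain implementation to avoid underflow.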
- S411-4 (MMD Alignment)
MMD^2 = || (1/N)∑ φ(x_i) - (1/M)∑ φ(x_j') ||_H^2 ≤ tol_mmd; declare kernel and bandwidth in the manifest.
- S411-5 (Representativeness Ratio & Coverage)
repr_ratio(a) = p_syn(A=a) / p_target(A=a), covg = |supp(p_target) ∩ supp(p_syn)| / |supp(p_target)|.
- S411-6 (Arrival-Time Consistency)
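T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ); T_arr = ( ∫ ( n_eff / c_ref ) d ell );
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |.
Here n_eff is the path-dependent integrand along ell, a different quantity from the effective sample size n_eff of S411-1.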
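For the MMD criterion in S411-4, a minimal sketch is given below. It assumes a Gaussian (RBF) kernel and uses the plug-in estimate mean(K_xx) + mean(K_yy) - 2 * mean(K_xy), which equals the squared RKHS distance between the two empirical mean embeddings. The bandwidth value and the sample arrays are hypothetical.

```python
import numpy as np

def mmd2_rbf(x, y, bandwidth):
    """Plug-in estimate of MMD^2 between samples x and y with an RBF kernel."""
    def gram(p, q):
        # Pairwise squared Euclidean distances, then the Gaussian kernel.
        sq_dist = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return float(gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean())

rng = np.random.default_rng(1)
x_syn = rng.normal(size=(200, 2))             # stand-in for synthetic samples
x_close = rng.normal(size=(300, 2))           # same distribution as x_syn
x_far = rng.normal(size=(300, 2)) + 3.0       # shifted distribution

mmd2_self = mmd2_rbf(x_syn, x_syn, bandwidth=1.0)
mmd2_close = mmd2_rbf(x_syn, x_close, bandwidth=1.0)
mmd2_far = mmd2_rbf(x_syn, x_far, bandwidth=1.0)
```

The result depends strongly on the kernel and bandwidth, which is why both need to be declared alongside the reported value.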
V. Metrology Flow M40-11 (Representativeness & Fairness Loop)
- Target Setting
Define p_target (from a reference set or policy vector), protected attribute(s) A, windows, and thresholds.
- Baseline Assessment
Compute repr_ratio(a), W1, MMD, psi, and covg on tau_mono windows.
- Method Selection
If marginal proportion shifts dominate, begin with reweighting; for structural bias, use OT/monotone mapping or a hybrid.
- Parameter Solving
- Estimate r(x) (logistic-ratio | KLIEP | Bregman), obtain w_norm, and verify n_eff;
- or solve for π* / T(x) while preserving foreign keys, units, and physical constraints.
- Audit & Rollback
Re-evaluate W1/MMD/psi/SPD/EOD; if any exceed thresholds, increase regularization, tighten w_clip, stratify alignment, or roll back to a coarser p_target.
- Arrival-Time & Time-Base Handling
Apply timepath_hardening; write offset/skew/J, T_arr, and delta_form.
- Persist & Freeze
Emit manifest.synth.fairness.* with signature; record methods, parameters, thresholds, and windowed results.
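The baseline-assessment step of the flow can be sketched for a single window and a categorical protected attribute as follows. It assumes p_target is given as a vector of group proportions and uses the common definition psi = ∑ (p_syn - p_target) * ln(p_syn / p_target) with a small floor eps to avoid log(0). The function name, eps, and the toy data are hypothetical.

```python
import math
from collections import Counter

def baseline_assessment(syn_groups, target_props, eps=1e-6):
    """Compute repr_ratio(a), covg, and psi for one window of group labels."""
    counts = Counter(syn_groups)
    n = len(syn_groups)
    p_syn = {a: counts.get(a, 0) / n for a in target_props}

    # repr_ratio(a) = p_syn(A=a) / p_target(A=a), for groups the target covers.
    repr_ratio = {a: p_syn[a] / p for a, p in target_props.items() if p > 0}

    # covg = |supp(p_target) ∩ supp(p_syn)| / |supp(p_target)|
    supp_target = {a for a, p in target_props.items() if p > 0}
    supp_syn = {a for a, c in counts.items() if c > 0}
    covg = len(supp_target & supp_syn) / len(supp_target)

    # Population stability index over the target's groups, floored at eps.
    psi = 0.0
    for a, p in target_props.items():
        ps, pt = max(p_syn[a], eps), max(p, eps)
        psi += (ps - pt) * math.log(ps / pt)

    return {"repr_ratio": repr_ratio, "covg": covg, "psi": psi}

# Toy window (illustrative only): group "c" is missing from the synthetic data.
syn_groups = ["a"] * 6 + ["b"] * 4
target_props = {"a": 0.5, "b": 0.3, "c": 0.2}
report = baseline_assessment(syn_groups, target_props)
```

In this toy window the missing group shows up both as repr_ratio = 0 and as reduced covg, which would point to a coverage gap rather than a pure proportion shift.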
VI. Contracts & Assertions C40-11xx
- C40-1101 (Proportional Representativeness): For all a, |repr_ratio(a) - 1| ≤ tol_repr.
- C40-1102 (Structural Alignment): W1 ≤ tol_W1 and MMD ≤ tol_MMD and psi ≤ tol_psi.
- C40-1103 (Effective Sample Size): n_eff / N ≥ rho_min.
- C40-1104 (Fairness Constraints): |SPD| ≤ tol_spd, |TPR_1-TPR_0| ≤ tol_eod_TPR, |FPR_1-FPR_0| ≤ tol_eod_FPR.
- C40-1105 (Constraints & Units): assert_foreign_key = true and check_dim( T(x) - x ) valid.
- C40-1106 (Arrival Consistency): delta_form ≤ tol_Tarr; |offset| ≤ off_max, J ≤ J_max.
- C40-1107 (Multimodal Consistency): Joint-view alignment metrics must pass; single-modality pass is insufficient.
- C40-1108 (No Privacy Degradation): eps_total_after ≤ eps_total_before (post-processing does not weaken privacy guarantees).
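One way to evaluate these contracts as a release gate is sketched below. It assumes the metrics have already been computed for one window and collected in a dictionary; the dictionary keys, the precomputed boolean flags standing in for the C40-1105 and C40-1107 checks, and all numeric values are hypothetical.

```python
def evaluate_contracts(m, tol):
    """Return a pass/fail flag per contract ID for one window of metrics."""
    tpr_gap, fpr_gap = m["EOD"]
    return {
        "C40-1101": all(abs(r - 1.0) <= tol["tol_repr"]
                        for r in m["repr_ratio"].values()),
        "C40-1102": (m["W1"] <= tol["tol_W1"] and m["MMD"] <= tol["tol_MMD"]
                     and m["psi"] <= tol["tol_psi"]),
        "C40-1103": m["n_eff"] / m["N"] >= tol["rho_min"],
        "C40-1104": (abs(m["SPD"]) <= tol["tol_spd"]
                     and abs(tpr_gap) <= tol["tol_eod_TPR"]
                     and abs(fpr_gap) <= tol["tol_eod_FPR"]),
        # Dataset-level checks are assumed to be computed upstream.
        "C40-1105": m["foreign_key_ok"] and m["check_dim_ok"],
        "C40-1106": (m["delta_form"] <= tol["tol_Tarr"]
                     and abs(m["offset"]) <= tol["off_max"]
                     and m["J"] <= tol["J_max"]),
        "C40-1107": m["joint_view_ok"],
        "C40-1108": m["eps_total_after"] <= m["eps_total_before"],
    }

# Hypothetical window metrics and tolerances.
metrics = {
    "repr_ratio": {"a": 1.04, "b": 0.97}, "W1": 0.08, "MMD": 0.02, "psi": 0.05,
    "n_eff": 300.0, "N": 1000, "SPD": 0.03, "EOD": (0.02, -0.01),
    "foreign_key_ok": True, "check_dim_ok": True, "joint_view_ok": True,
    "delta_form": 1e-9, "offset": 0.002, "J": 0.001,
    "eps_total_before": 1.0, "eps_total_after": 1.0,
}
tolerances = {
    "tol_repr": 0.05, "tol_W1": 0.1, "tol_MMD": 0.05, "tol_psi": 0.1,
    "rho_min": 0.5, "tol_spd": 0.05, "tol_eod_TPR": 0.05, "tol_eod_FPR": 0.05,
    "tol_Tarr": 1e-6, "off_max": 0.01, "J_max": 0.01,
}
gate = evaluate_contracts(metrics, tolerances)
failed = sorted(cid for cid, ok in gate.items() if not ok)
```

Here only the effective-sample-size contract fails (n_eff / N = 0.3 against rho_min = 0.5), so the release would be blocked until the weights are tightened.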
VII. Implementation Bindings I40-11*
- estimate_density_ratio(ref, syn, method) -> r_hat
- compute_reweight(r_hat, clip, normalize) -> w_norm, n_eff
- fit_ot_map(syn, target, cost, reg) -> T, π*
- apply_mapping(ds_syn, T, constraints) -> ds_syn'
- audit_representativeness(ds_syn', ref, metrics) -> {W1, MMD, psi, repr_ratio, covg}
- audit_group_fairness(ds_syn', model_spec|metric) -> {SPD, EOD, PPD}
- timepath_hardening(ds_syn', sync_ref) -> ds_syn_t (writes T_arr, delta_form, offset/skew/J)
- emit_fairness_manifest(results, policy) -> manifest.synth.fairness
- Invariants: sum(w_norm)/N ≈ 1; n_eff is non-increasing in w_clip (tightening w_clip never lowers n_eff); foreign keys preserved; delta_form ≤ tol_Tarr; unit/dimension checks pass.
VIII. Cross-References
- Methods.CrossStats v1.0: Chapter 7 (drift & alignment), Chapter 9 (calibration transfer).
- This volume: Chapter 5 (robustness in deep generation), Chapter 9 (multimodal balancing), Chapter 10 (privacy constraints), and Chapter 12 (fidelity & utility evaluation).
- Methods.Cleaning v1.0: Chapter 10 (release freeze) and Appendix B (contract library).
IX. Quality SLIs & Risk Control
- SLIs
W1, MMD, psi, repr_ratio_p95, n_eff/N, |SPD|, |EOD|, latency_ms_p99 (alignment pipeline), delta_form.
- Strategies
- Divergent reweighting: tighten w_clip, stratify estimation, add regularization/smoothing.
- Residual structural bias: switch to OT or segmented monotone mapping; coarsen p_target if needed.
- Fairness–utility conflict: multi-objective trade-offs or relaxed constraints; cost-sensitive learning with stability validation.
- Streaming drift: update w/T on sliding windows and integrate with alert/rollback.
Summary
This chapter defines a governance loop for representativeness and fairness centered on mapping and reweighting: non-negotiable axioms P411-*; minimal equations S411-* for density ratios, OT/MMD alignment, and fairness constraints; the flow M40-11 from readiness to alignment, auditing, and freeze; release gates C40-11xx; and delivery interfaces I40-11* ensuring engineering invariants. Deliverables are published via manifest.synth.fairness.* for auditable external use.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/