EFT.WP.Methods.SynthData v1.0
Chapter 7 — Conditional & Controllable Generation (Prompt/CFG/Rules)
I. Scope & Targets
- Goals
- Define a unified convention for conditional and controllable generation: x ~ p_model(x | c; theta), where c may be a text prompt, structured conditions, numeric bounds, or a set of rules.
- Specify cooperative mechanisms among CFG (classifier-free guidance), hard constraints, and soft penalties; achieve interpretable control without violating units/dimensions or physical constraints.
- Bring condition alignment, rule satisfaction, and downstream utility under contracts and manifests: manifest.synth.cond.*.
- Inputs
Generation engine engine(theta) (see Chapter 5), condition set Cset, rules & constraints Rules = { g_j(x,c) ≤ 0 }, a reference distribution or panel ref, time-base and arrival-time conventions, SLOs and thresholds.
- Outputs
Conditional samples D_syn(c), alignment reports and acceptance rate acc_rate(c), control strengths and schedules w_cfg(t), contract evaluation report, and manifest.synth.cond.
- Applicability
Applies to tabular, time series, image/audio/text, and multimodal settings; when physical chains are involved, execute jointly with Chapter 6 (record T_arr, delta_form).
II. Terms & Symbols
- Conditions & control: c (condition/prompt), enc(c) (condition encoding), w_cfg ∈ R_+ (CFG strength), lambda_j ≥ 0 (Lagrange multipliers for rules), schedule w_cfg(t) (diffusion timestep schedule), policy (sampling & rules policy).
- Rules & acceptance: g_j(x,c) ≤ 0 (j-th hard constraint), m_acc ∈ {0,1} (accept indicator), acc_rate = ( ∑ m_acc ) / N.
- Alignment & utility: sim_embed(x,c) (condition–sample similarity, e.g., embedding cosine), util(x,c) (downstream utility score), penalty(x,c) (rule penalty).
- Time & arrival: tau_mono, ts, T_arr, gamma(ell), delta_form, offset/skew/J.
- Metrology & units: unit(•), dim(•), check_dim(expr).
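As a minimal sketch of two of the symbols defined above, assume embeddings are plain Python lists of floats and that sim_embed is the cosine similarity (the text offers cosine only as an example). The function bodies and the zero-vector convention are illustrative assumptions, not this volume's reference implementation.

```python
import math

def sim_embed(x_vec, c_vec):
    """Cosine similarity between a sample embedding and a condition embedding."""
    dot = sum(a * b for a, b in zip(x_vec, c_vec))
    norm_x = math.sqrt(sum(a * a for a in x_vec))
    norm_c = math.sqrt(sum(b * b for b in c_vec))
    if norm_x == 0.0 or norm_c == 0.0:
        return 0.0  # assumption: a zero vector counts as "no alignment"
    return dot / (norm_x * norm_c)

def acc_rate(m_acc):
    """acc_rate = (sum of accept indicators m_acc) / N."""
    if not m_acc:
        raise ValueError("m_acc must contain at least one indicator")
    return sum(m_acc) / len(m_acc)
```

For example, acc_rate([1, 0, 1, 1]) evaluates three accepted samples out of four.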
III. Axioms P407-*
- P407-1 (Explicit Conditions): The semantics, domain, and units of c must be explicit and persisted; implicit defaults are prohibited.
- P407-2 (Minimal Distortion): Control must preserve the statistical fidelity of p_model(x|c); any rule or post-processing should minimize distortion to p_model.
- P407-3 (Hard-Constraint Priority): Use hard constraints (reject or project) for safety, physics, and referential integrity; use soft penalties or resampling for the rest.
- P407-4 (Auditable Schedules): w_cfg(t) and related schedules (sampling temperature, truncation, etc.) must be functional and reproducible.
- P407-5 (Measurable Alignment): Evaluate condition–sample alignment with repeatable metrics sim_embed(x,c) and bring them into contracts.
- P407-6 (Units & Dimensions): Conditional control must not violate unit/dim; check_dim(expr) must pass.
- P407-7 (Time Base & Arrival): When time/path propagation is present, record both T_arr formulations and delta_form; evaluate windows on tau_mono.
- P407-8 (Fairness & Bias): Publish coverage and disparity metrics across important subgroups for p(•|c); provide debiasing or reweighting paths.
- P407-9 (Reproducibility & Trace): Persist seed/rng/enc(c)/w_cfg(t)/Rules in the manifest and sign it.
- P407-10 (Multimodal Consistency): Declare the shared embedding or alignment mapping for multimodal conditions to avoid cross-modal ambiguity.
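To illustrate P407-6, the following is one possible sketch of a dimension check. It assumes dimensions are tracked as exponent tuples over a (length, mass, time) basis; the Quantity class, the mul/add helpers, and the signature of check_dim are hypothetical, since the text names check_dim(expr) without defining its interface.

```python
from dataclasses import dataclass

# Assumed dimension basis: exponents over (length, mass, time).
LENGTH = (1, 0, 0)
TIME = (0, 0, 1)
SPEED = (1, 0, -1)

@dataclass(frozen=True)
class Quantity:
    value: float
    dim: tuple  # exponent tuple in the assumed basis

def mul(a, b):
    """Multiply two quantities; dimension exponents add."""
    return Quantity(a.value * b.value, tuple(p + q for p, q in zip(a.dim, b.dim)))

def add(a, b):
    """Add two quantities; their dimensions must match exactly."""
    if a.dim != b.dim:
        raise ValueError(f"dimension mismatch: {a.dim} vs {b.dim}")
    return Quantity(a.value + b.value, a.dim)

def check_dim(q, expected_dim):
    """Return True when the quantity carries the expected dimension."""
    return q.dim == expected_dim
```

Under these assumptions, multiplying a speed by a time yields a quantity that passes check_dim against LENGTH, while adding a length to a time raises an error instead of silently producing a value.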
IV. Minimal Equations S407-*
- S407-1 (Conditional Generation Base Form)
x ~ p_model( x | c; theta ), with the objective of minimizing D( p_model(x|c) || p_ref(x|c) ), where D ∈ {W1, KL, MMD}.
- S407-2 (Generic CFG Form)
With the conditional field s_cond(z,t) = s_theta(z,t|c) and the unconditional field s_uncond(z,t) = s_theta(z,t|∅), define
s_guided(z,t) = s_uncond(z,t) + w_cfg(t) * ( s_cond(z,t) - s_uncond(z,t) ).
- S407-3 (Soft Penalties with Lagrange Multipliers)
min_theta E_{c} E_{x~p_theta(•|c)} [ L_fid(x,c) + ( ∑_j lambda_j * g_j^+(x,c) ) ], where g_j^+(x,c) = max( 0, g_j(x,c) ).
- S407-4 (Hard-Constraint Accept/Reject or Projection)
- m_acc(x,c) = 1 if g_j(x,c) ≤ 0 for all j, otherwise 0; the acceptance rate is acc_rate = ( ∑ m_acc ) / N.
- Constraint projection: x' = Pi_C(x) = argmin_{z ∈ C} d(z,x) with residual res_cons = d(x',x).
- S407-5 (Alignment Thresholds & Utility)
sim_embed(x,c) ≥ sim_min and util(x,c) ≥ util_min; samples that fall short trigger resampling or gain control.
- S407-6 (Sequential KL Regularization)
max_theta E_{x~p_theta(•|c)}[ R(x,c) ] - beta * KL( p_theta(•|c) || p_ref(•|c) ), with beta ≥ 0.
- S407-7 (Dual Arrival Forms)
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) and T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
delta_form = | ( 1 / c_ref ) * ( ∫ n_eff d ell ) - ( ∫ ( n_eff / c_ref ) d ell ) |.
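The forms S407-2 through S407-4 can be sketched element-wise as follows. This is a hedged illustration only: it assumes fields and samples are flat lists of floats, that the constraint set C is an axis-aligned box, and that d is the Euclidean distance. The function names guide, soft_penalty, accept, and project_box are hypothetical and do not appear in the bindings of this chapter.

```python
import math

def guide(s_uncond, s_cond, w_cfg):
    """S407-2: s_guided = s_uncond + w_cfg * (s_cond - s_uncond), element-wise."""
    return [u + w_cfg * (c - u) for u, c in zip(s_uncond, s_cond)]

def soft_penalty(g_values, lambdas):
    """S407-3 penalty term: sum_j lambda_j * max(0, g_j)."""
    return sum(lam * max(0.0, g) for lam, g in zip(lambdas, g_values))

def accept(g_values):
    """S407-4: m_acc = 1 if every g_j <= 0, otherwise 0."""
    return 1 if all(g <= 0 for g in g_values) else 0

def project_box(x, lo, hi):
    """S407-4 projection onto an assumed box C = [lo, hi], with Euclidean residual."""
    x_proj = [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]
    res_cons = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_proj, x)))
    return x_proj, res_cons
```

With w_cfg = 0 the guided field reduces to s_uncond, and with w_cfg = 1 it equals s_cond; values above 1 extrapolate past the conditional field, which is the regime where the schedule w_cfg(t) matters.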
V. Metrology Flow M40-7 (Conditional & Controllable Loop)
- Ready
Fix Cset, the encoding enc(c), alignment metric sim_embed, rules Rules, schedule w_cfg(t), thresholds sim_min/util_min, acc_min, SLOs, and manifest keys.
- Condition Encoding
Parse c into enc(c) (text/structured/numeric); normalize units and run dimensional checks check_dim.
- Candidate Sampling
Have the engine produce initial states z_t or x0; record seed/rng.
- Guidance & Stepping
Compute s_cond and s_uncond, combine to s_guided, and step with w_cfg(t) to propose x.
- Rule Checking
Evaluate g_j(x,c); for violations, either reject or project x ← Pi_C(x); record res_cons and m_acc.
- Alignment & Utility
Measure sim_embed(x,c) and util(x,c); if below thresholds, adaptively raise w_cfg or resample.
- Reweighting & Debiasing (optional)
Compute a map or weights w(i) so p_syn(•|c) aligns with ref; persist weights and n_eff.
- Time Base & Arrival (if applicable)
Write tau_mono/ts, the two T_arr formulations with delta_form, and offset/skew/J.
- Persist & Publish
Emit manifest.synth.cond including enc(c), w_cfg(t), Rules, thresholds, metrics, acc_rate, and signature; freeze release or roll back.
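The loop above can be sketched end to end with a deliberately toy setup. Everything here is an assumption made for illustration: the condition c is a scalar target, the "engine" is a pair of Gaussian draws standing in for unconditional and conditional proposals, guidance is applied in a single step rather than over diffusion timesteps, and the only rule is the box constraint lo ≤ x ≤ hi. The function name run_conditional_loop and the returned keys are hypothetical.

```python
import random

def run_conditional_loop(c, n, seed, w_cfg=1.5, lo=0.0, hi=1.0):
    """Toy one-step stand-in for M40-7: sample, guide, check rules, project, record."""
    rng = random.Random(seed)  # seeded generator so the run can be replayed
    samples, m_acc, res_cons = [], [], []
    for _ in range(n):
        s_uncond = rng.gauss(0.5, 0.3)   # toy unconditional proposal
        s_cond = rng.gauss(c, 0.05)      # toy conditional proposal near c
        x = s_uncond + w_cfg * (s_cond - s_uncond)  # guidance in the S407-2 form
        m_acc.append(1 if lo <= x <= hi else 0)     # hard-rule accept indicator
        x_proj = min(max(x, lo), hi)                # projection onto [lo, hi]
        res_cons.append(abs(x_proj - x))            # projection residual
        samples.append(x_proj)
    return {
        "seed": seed,
        "w_cfg": w_cfg,
        "samples": samples,
        "acc_rate": sum(m_acc) / n,
        "res_cons_max": max(res_cons),
    }
```

Calling the function twice with the same arguments returns identical records, which is the property the reproducibility contract relies on; a real engine would replace the two Gaussian draws and add the alignment, reweighting, and arrival steps.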
VI. Contracts & Assertions C40-7xx
- C40-701 (Valid Encoding): enc(c) lies within the registered domain; syntax and units are valid.
- C40-702 (Hard Constraints): P( g_j(x,c) ≤ 0 ) ≥ p_pass_min, or res_cons ≤ tol_cons.
- C40-703 (Alignment): sim_embed_p50 ≥ sim_min and sim_embed_p05 ≥ sim_floor.
- C40-704 (Acceptance Rate): acc_rate ≥ acc_min; if not, publish the retry rounds R used.
- C40-705 (Units & Dimensions): check_dim(expr)=true.
- C40-706 (Time/Arrival): non_decreasing(tau_mono), J ≤ J_max, delta_form ≤ tol_Tarr.
- C40-707 (Fairness & Coverage): For key subgroup g, | metric_g - metric_all | ≤ gap_max, or publish the reweighting map.
- C40-708 (SLO): latency_ms_p99 ≤ SLO_cond, oom_rate ≤ oom_max.
- C40-709 (Reproducibility): seed/rng/w_cfg(t)/Rules replayable; hash signatures consistent.
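A few of these contracts can be expressed as executable checks. The sketch below covers C40-703, C40-704, and part of C40-706; it assumes nearest-rank percentiles, trapezoidal integration for both T_arr forms, and a thresholds dictionary whose layout is hypothetical. The J ≤ J_max clause is omitted because the chapter does not define how J is computed.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile for q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

def non_decreasing(seq):
    """True when each element is >= its predecessor."""
    return all(a <= b for a, b in zip(seq, seq[1:]))

def trapz(y, x):
    """Trapezoidal integral of samples y over grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))

def delta_form(n_eff, ell, c_ref):
    """|T_arr (constant pulled out) - T_arr (constant inside the integral)|."""
    t_out = trapz(n_eff, ell) / c_ref
    t_in = trapz([n / c_ref for n in n_eff], ell)
    return abs(t_out - t_in)

def check_contracts(sims, m_acc, tau_mono, n_eff, ell, c_ref, th):
    """Evaluate a subset of the C40-7xx contracts; th holds the thresholds."""
    return {
        "C40-703": percentile(sims, 50) >= th["sim_min"]
        and percentile(sims, 5) >= th["sim_floor"],
        "C40-704": sum(m_acc) / len(m_acc) >= th["acc_min"],
        # Partial C40-706: the J <= J_max clause is not evaluated here.
        "C40-706": non_decreasing(tau_mono)
        and delta_form(n_eff, ell, c_ref) <= th["tol_Tarr"],
    }
```

With a constant c_ref the two arrival forms agree analytically, so in this sketch delta_form measures only numerical discrepancy between the two evaluation orders.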
VII. Implementation Bindings I40-7*
- encode_condition(c, registry) -> enc(c)
- compose_guidance(engine, method, w_schedule) -> engine' (where method ∈ {CFG, classifier, energy})
- sample_conditional(engine', n, enc(c), seed) -> ds_syn
- evaluate_rules(ds_syn, Rules) -> {m_acc, res_cons, report}
- accept_or_project(ds_syn, Rules, projector) -> ds_syn'
- measure_alignment(ds_syn', enc(c), metric) -> {sim_stats, util_stats}
- rebalance_conditional(ref, ds_syn', method) -> {map|w}
- annotate_time_arrival(ds_syn', ref_path) -> ds_syn'' (write T_arr, delta_form, offset/skew/J)
- emit_conditional_manifest(artifacts) -> manifest.synth.cond
- Invariants: reproducible(seed); acc_rate ≥ acc_min; sim_embed_p50 ≥ sim_min; delta_form ≤ tol_Tarr; units/dimensions pass checks.
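As one possible reading of emit_conditional_manifest, the sketch below serializes the artifacts canonically and attaches a SHA-256 content hash. This is an assumption-laden illustration: the chapter does not specify the manifest layout or the signing scheme, the artifact keys in the usage example are hypothetical, and a content hash by itself is not a cryptographic signature.

```python
import hashlib
import json

def emit_conditional_manifest(artifacts):
    """Serialize artifacts with sorted keys and attach a SHA-256 content hash."""
    # Sorted keys and fixed separators make the byte stream independent of dict order.
    payload = json.dumps(artifacts, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"manifest.synth.cond": artifacts, "sha256": digest}

# Usage with hypothetical keys; w_cfg(t) is persisted here as sampled schedule values.
example = emit_conditional_manifest({
    "seed": 42,
    "enc_c": [0.1, 0.2],
    "w_cfg": [1.0, 1.5, 2.0],
    "rules": ["x >= 0", "x <= 1"],
    "acc_rate": 0.93,
})
```

Because the serialization is canonical, replaying a run with the same seed, schedule, and rules yields the same hash, which gives C40-709 something concrete to compare.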
VIII. Cross-References
- This volume: Chapter 5 (CFG implementations in deep generators), Chapter 6 (physics/simulation & constraint projection), Chapter 12 (fidelity & utility evaluation), Chapter 13 (release & manifests).
- Methods.Cleaning v1.0: Chapter 10 (contracts & release freeze).
- Methods.CrossStats v1.0: Chapters 7/9/14 (distribution alignment, calibration transfer, statistical SLOs).
- Methods.Imaging v1.0: Chapter 13 (time/path-gated arrival-time consistency).
IX. Quality Metrics & Risk Control
- Core SLIs
sim_embed_p50/p05/p95, acc_rate, res_cons, n_eff, latency_ms_p99, oom_rate, fairness_gap, delta_form, J.
- Common risks & mitigations
- Excessive CFG strength causing mode collapse → use increasing w_cfg(t) schedules with early stopping; introduce KL regularization.
- Overtight rules causing low acceptance → relax to soft penalties, switch to projection, or stage constraints.
- Condition–sample mismatch → improve encoders or embeddings; apply adaptive sim_min.
- Unit/dimension violations → run check_dim and range clamping in both encoding and post-processing.
- Subgroup bias → enable reweighting or mapping alignment; publish gaps and corrective evidence.
- Time/arrival drift → re-run annotate_time_arrival; audit delta_form and J.
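The first mitigation mentions increasing w_cfg(t) schedules. A minimal sketch of one such schedule follows, assuming that "increasing" means guidance strength grows linearly with sampling progress (step 0 is the first sampling step); the chapter does not fix the direction or shape, and the default strengths are illustrative values only.

```python
def w_cfg_linear(step, n_steps, w_start=1.0, w_end=5.0):
    """Linearly increasing guidance strength over sampling steps 0..n_steps-1."""
    if n_steps < 2:
        return w_end  # degenerate schedule: a single step uses the final strength
    frac = step / (n_steps - 1)  # sampling progress in [0, 1]
    return w_start + (w_end - w_start) * frac

# The schedule is a pure function of (step, n_steps, w_start, w_end), so persisting
# those four values is enough to replay it.
schedule = [w_cfg_linear(k, 11) for k in range(11)]
```

Keeping the schedule as a pure function of a few persisted parameters is one way to satisfy the auditability requirement of P407-4.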
Summary
This chapter establishes an executable specification for conditional and controllable generation: axioms P407-* for explicit conditions, auditable control, and minimal distortion; equations S407-* for CFG, Lagrangian penalties, and accept–project mechanics; the process flow M40-7 to close the loop across encoding, guidance, checking, alignment, and manifest publication; and contracts C40-7xx plus interfaces I40-7* to ensure engineering implementation and cross-volume consistency. Deliverables and metrics populate manifest.synth.cond, supporting downstream evaluation and release freeze.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/