EFT.WP.Methods.SynthData v1.0

Appendix C — Manifest Templates & Examples (synth manifest)


I. Scope & Objectives


II. Keyspace & Naming Convention

  1. Root object: manifest.synth
  2. Naming hierarchy
    • trace.*: provenance & versioning
    • dataset.*: dataset and schema binding
    • engine.*: generator & reproducibility
    • generation.*: conditioning, controls, and timepath
    • metrics.*: fidelity/utility/drift metrics and uncertainty
    • privacy.*: differential privacy & attack evaluations
    • contracts.*: contract evaluations & dispositions
    • runtime.*: streaming runtime & SLOs
    • sign.*: verification & signature

III. Minimal Keys & Metrology Conventions

  1. version: manifest version (semantic).
  2. trace.TraceID: global provenance ID; trace.build, trace.commit, trace.timestamp.
  3. dataset.name, dataset.tag, dataset.modality ∈ {tabular,image,text,audio,graph,timeseries}, dataset.SRef, dataset.sref_hash.
  4. dataset.n_real, dataset.n_syn, dataset.split ∈ {train,valid,test,release}.
  5. engine.type ∈ {copula,glm,vae,gan,flow,diffusion,scm}, engine.version, engine.seed, engine.rng, engine.spec_uri, engine.train_data_ref.
  6. generation.condition (declared schema for c), generation.controls (e.g., cfg_scale, top_p, temperature), generation.schedule.
  7. generation.timepath.*:
    • ts_timezone, tau_mono_origin, offset, skew, J.
    • c_ref, T_arr_form1 = ( 1 / c_ref ) * ( ∫ n_eff d ell ), T_arr_form2 = ( ∫ ( n_eff / c_ref ) d ell ), delta_form = | T_arr_form1 − T_arr_form2 |.
  8. metrics.*: name, value, u(value), unit(value), window, details (kernel/embedding/bandwidth, etc.).
  9. privacy.dp.eps_total, privacy.dp.delta_total, privacy.dp.accounting_method, privacy.attack_suite = {membership,linkability,attribute}, privacy.MI_risk.
  10. contracts[]: id, expr, tol, severity ∈ {info,warn,block}, result ∈ {pass,fail}, evidence_ref, action_plan.
  11. runtime.rho, runtime.latency_ms_p99, runtime.drop_rate, runtime.window.
  12. provenance.source_hash, provenance.blob_hash = hash_sha256(blob).
  13. sign.method, sign.signer, sign.signature, sign.timestamp.
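
The key list above lends itself to a mechanical pre-check. The following is a minimal sketch, assuming the manifest has already been parsed into nested Python dicts; the helper names (`lookup`, `validate_minimal_keys`) and the particular subset of paths treated as required are illustrative assumptions, not part of the specification. The enumerations are copied from items 3, 4, 5, and 10.

```python
# Hypothetical pre-check for the minimal keys; not a normative validator.
# REQUIRED_PATHS is an illustrative subset of the keys listed in Section III.
REQUIRED_PATHS = [
    "version", "trace.TraceID", "dataset.name", "dataset.modality",
    "dataset.split", "engine.type", "engine.seed",
    "privacy.dp.eps_total", "privacy.dp.delta_total",
    "sign.method", "sign.signature",
]

# Enumerations taken from items 3, 4, and 5.
ENUMS = {
    "dataset.modality": {"tabular", "image", "text", "audio", "graph", "timeseries"},
    "dataset.split": {"train", "valid", "test", "release"},
    "engine.type": {"copula", "glm", "vae", "gan", "flow", "diffusion", "scm"},
}
SEVERITIES = {"info", "warn", "block"}   # item 10
RESULTS = {"pass", "fail"}               # item 10

_MISSING = object()


def lookup(manifest, dotted_path):
    """Walk a dotted path such as 'privacy.dp.eps_total' through nested dicts."""
    node = manifest
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def validate_minimal_keys(manifest):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    for path in REQUIRED_PATHS:
        if lookup(manifest, path) is _MISSING:
            problems.append(f"missing: {path}")
    for path, allowed in ENUMS.items():
        value = lookup(manifest, path)
        if value is not _MISSING and (not isinstance(value, str) or value not in allowed):
            problems.append(f"invalid {path}: {value!r}")
    for i, contract in enumerate(manifest.get("contracts", [])):
        if contract.get("severity") not in SEVERITIES:
            problems.append(f"contracts[{i}].severity invalid")
        if contract.get("result") not in RESULTS:
            problems.append(f"contracts[{i}].result invalid")
    return problems


example = {
    "version": "1.0.0",
    "trace": {"TraceID": "trc-xxxxxxxx"},
    "dataset": {"name": "synth-demo", "modality": "tabular", "split": "release"},
    "engine": {"type": "copula", "seed": 42},
    "privacy": {"dp": {"eps_total": 1.5, "delta_total": 1.0e-6}},
    "contracts": [{"id": "C40-141", "severity": "block", "result": "pass"}],
    "sign": {"method": "ed25519", "signature": "base64:..."},
}
print(validate_minimal_keys(example))
```

Returning a list of findings rather than raising on the first one is a design choice for this sketch: it lets a release tool report every gap in a single pass.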

IV. Units, Dimensions & Uncertainty

  1. Explicitly declare unit(x) and dim(x) for any field entering formulas. Examples:
    • unit(W1)="feature_unit", dim(W1)="-"
    • unit(T_arr)="s", dim(T_arr)="[T]"
  2. Uncertainty publication: provide u(metric) or interval {lo, hi}; derive via bootstrap or posterior quantiles and record metrics.details.source ∈ {bootstrap,posterior,analytic}.
  3. Dimensional check: publish only after check_dim( y - f(x) ) = pass.
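
As an illustration of item 2, the sketch below derives u(W1) and an interval {lo, hi} by bootstrap and records metrics.details.source. It assumes a single numeric feature and equal sample sizes, for which the Wasserstein-1 distance reduces to the mean absolute difference of the sorted samples; the function names, the Gaussian stand-in data, and the choice of 200 resamples are hypothetical.

```python
import random
import statistics


def w1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def bootstrap_w1(real, syn, n_boot=200, seed=0):
    """Return (u, lo, hi): bootstrap standard deviation and a rough 95% interval."""
    rng = random.Random(seed)
    n = len(real)
    stats = []
    for _ in range(n_boot):
        # Resample each side with replacement, then recompute the metric.
        stats.append(w1_1d(rng.choices(real, k=n), rng.choices(syn, k=n)))
    stats.sort()
    lo = stats[int(0.025 * n_boot)]
    hi = stats[int(0.975 * n_boot) - 1]
    return statistics.stdev(stats), lo, hi


# Stand-in data: NOT real measurements, only here to make the sketch runnable.
data_rng = random.Random(123)
real = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
syn = [data_rng.gauss(0.1, 1.1) for _ in range(500)]

u, lo, hi = bootstrap_w1(real, syn)
metric_entry = {
    "name": "W1",
    "value": w1_1d(real, syn),
    "u": u,
    "unit": "feature_unit",
    "window": "all",
    "details": {"source": "bootstrap", "interval": {"lo": lo, "hi": hi}},
}
print(metric_entry)
```

For multivariate or unequal-size samples a different W1 estimator would be needed; only the resampling loop carries over unchanged.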

V. Template (YAML minimal skeleton)

version: "1.0.0"
trace:
  TraceID: "trc-xxxxxxxx"
  build: "2025.09.01"
  commit: "abcdef12"
  timestamp: "2025-09-01T12:00:00Z"
dataset:
  name: "synth-demo"
  tag: "r1"
  modality: "tabular"
  SRef: "SRef-2025A"
  sref_hash: "sha256:..."
  n_real: 120000
  n_syn: 120000
  split: "release"
engine:
  type: "diffusion"
  version: "2.1.0"
  seed: 20250901
  rng: "pcg64"
  spec_uri: "s3://specs/eng.json"
  train_data_ref: "lake://real/train@sha256:..."
generation:
  condition:
    c_schema: "prompt|policy|conditioning-keys"
    c_payload: {}
  controls:
    cfg_scale: 6.0
    temperature: 1.0
  schedule:
    batches: 240
    batch_size: 512
  timepath:
    ts_timezone: "UTC"
    tau_mono_origin: "2025-09-01T00:00:00Z"
    offset: 0.001
    skew: 1.0e-6
    J: 0.0005
    c_ref: 2.99792458e8
    T_arr_form1: "( 1 / c_ref ) * ( ∫ n_eff d ell )"
    T_arr_form2: "( ∫ ( n_eff / c_ref ) d ell )"
    delta_form: 1.2e-9
metrics:
  - name: "W1"
    value: 0.034
    u: 0.006
    unit: "feature_unit"
    window: "all"
    details: {distance: "Wasserstein-1", feature_space: "scaled-numeric"}
  - name: "MMD_RBF"
    value: 0.012
    u: 0.004
    unit: "-"
    window: "all"
    details: {kernel: "rbf", bandwidth: 1.0}
  - name: "utility_gap_auc"
    value: -0.005
    u: 0.003
    unit: "-"
    window: "valid"
    details: {model: "xgb", metric: "AUC"}
privacy:
  dp:
    eps_total: 2.0
    delta_total: 1.0e-6
    accounting_method: "moments"
  attack_suite: ["membership","linkability"]
  MI_risk: 0.03
contracts:
  - id: "C40-121"
    expr: "FID ≤ tol_FID ∧ KID ≤ tol_KID"
    tol: {FID: 15.0, KID: 0.02}
    severity: "warn"
    result: "pass"
    evidence_ref: ["metrics:FID","metrics:KID"]
    action_plan: "none"
  - id: "C40-141"
    expr: "eps_total ≤ eps_budget ∧ delta_total ≤ delta_budget"
    tol: {eps_budget: 3.0, delta_budget: 1.0e-5}
    severity: "block"
    result: "pass"
    evidence_ref: ["privacy.dp"]
    action_plan: "none"
runtime:
  rho: 0.73
  latency_ms_p99: 420
  drop_rate: 0.002
  window: "1h"
provenance:
  source_hash: "sha256:..."
  blob_hash: "sha256:..."
sign:
  method: "ed25519"
  signer: "release-bot@org"
  signature: "base64:..."
  timestamp: "2025-09-01T12:01:00Z"
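
The provenance block of the skeleton carries blob_hash = hash_sha256(blob). A minimal sketch of how that value might be produced with the Python standard library follows; it assumes the blob is the released data artifact and that the "sha256:" prefix shown in the template precedes the hex digest. The streaming variant and the stand-in blob contents are hypothetical. Producing sign.signature is not shown here, since it depends on the key-management setup behind sign.method.

```python
import hashlib
import io


def hash_sha256(blob: bytes) -> str:
    """Hash an in-memory blob; returns the 'sha256:<hex>' form (prefix is assumed)."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def hash_sha256_stream(stream, chunk_size=1 << 20) -> str:
    """Same digest for a binary file-like object, read in chunks to bound memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha256:" + digest.hexdigest()


# Stand-in for the released synthetic data file (hypothetical contents).
blob = b"col_a,col_b\n1,2\n"

provenance = {
    "source_hash": "sha256:...",      # left elided, as in the template
    "blob_hash": hash_sha256(blob),
}

# Both code paths must agree on the same bytes.
assert provenance["blob_hash"] == hash_sha256_stream(io.BytesIO(blob))
print(provenance["blob_hash"])
```

For large releases, the streaming form applied to a file opened in binary mode avoids loading the whole artifact into memory.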


VI. Example A (Tabular: DP Synthesis, Release Build)

  1. Setup
    • modality="tabular", engine.type="copula", privacy.dp=(eps_total=1.5, delta_total=1.0e-6).
    • Fidelity thresholds: W1 ≤ 0.05, MMD_RBF ≤ 0.02; utility non-inferiority: utility_gap_auc ≥ -0.01.
  2. Key differences on disk
    • engine.type="copula", engine.version="1.4.2", controls empty.
    • If reweighting is used, publish n_eff = ( (∑ w)^2 ) / ( ∑ w^2 ) in metrics.details.
  3. Fragment

engine: {type: "copula", version: "1.4.2", seed: 42, rng: "pcg64", spec_uri: "s3://specs/copula.json"}
metrics:
  - {name: "W1", value: 0.028, u: 0.005, unit: "feature_unit", window: "all", details: {space: "num+onehot"}}
  - {name: "MMD_RBF", value: 0.010, u: 0.003, unit: "-", window: "all", details: {kernel: "rbf", bandwidth: 0.8}}
  - {name: "utility_gap_auc", value: -0.006, u: 0.002, unit: "-", window: "valid", details: {model: "logit"}}
privacy:
  dp: {eps_total: 1.5, delta_total: 1.0e-6, accounting_method: "advanced-composition"}
contracts:
  - {id: "C40-121", expr: "W1 ≤ 0.05 ∧ MMD_RBF ≤ 0.02", tol: {W1: 0.05, MMD_RBF: 0.02}, severity: "warn", result: "pass", evidence_ref: ["metrics:W1","metrics:MMD_RBF"], action_plan: "none"}
  - {id: "C40-141", expr: "eps_total ≤ 2.0 ∧ delta_total ≤ 1.0e-5", tol: {eps_budget: 2.0, delta_budget: 1.0e-5}, severity: "block", result: "pass", evidence_ref: ["privacy.dp"], action_plan: "none"}
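
Item 2 of this example asks for n_eff = ( (∑ w)^2 ) / ( ∑ w^2 ) whenever reweighting is used. That expression is the Kish effective sample size, and a minimal sketch of it follows. The function name and the example weights are hypothetical. Note that this n_eff is a sample-size quantity and appears to be distinct from the n_eff integrated along ell in generation.timepath.

```python
def kish_n_eff(weights):
    """Effective sample size n_eff = (sum w)^2 / (sum w^2) for non-negative weights."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be positive")
    return (total * total) / sum_sq


# Uniform weights keep every row: n_eff equals the row count.
print(kish_n_eff([1, 1, 1, 1]))   # 4.0
# A single dominant row collapses the effective size to 1.
print(kish_n_eff([1, 0, 0, 0]))   # 1.0

# One way the value might be published alongside a metric (layout is illustrative).
weights = [0.5, 1.0, 1.5, 1.0]
details = {"space": "num+onehot", "n_eff": kish_n_eff(weights)}
print(details)
```

The value always lies between 1 and the number of rows, so a small n_eff relative to dataset.n_syn signals that a few rows dominate the reweighted metrics.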


VII. Example B (Imaging: Diffusion Generation + Arrival Consistency)

  1. Setup
    • modality="image", engine.type="diffusion"; fidelity via FID/KID with declared feature extractor and layer.
    • Record both T_arr formulations and ensure delta_form ≤ tol_Tarr.
  2. Fragment

dataset: {name: "synth-imaging", tag: "r2", modality: "image", SRef: "SRef-IMG-2025B", sref_hash: "sha256:..."}
engine: {type: "diffusion", version: "2.3.0", seed: 1337, rng: "philox", spec_uri: "s3://specs/diff.json"}
generation:
  condition: {c_schema: "text-prompt", c_payload: {"scene": "factory floor", "illum": "D65"}}
  controls: {cfg_scale: 7.5, sampler: "ddim", steps: 30}
  timepath:
    ts_timezone: "UTC"
    tau_mono_origin: "2025-09-01T00:00:00Z"
    offset: 0.0007
    skew: 7.0e-7
    J: 0.0003
    c_ref: 2.99792458e8
    T_arr_form1: "( 1 / c_ref ) * ( ∫ n_eff d ell )"
    T_arr_form2: "( ∫ ( n_eff / c_ref ) d ell )"
    delta_form: 9.0e-10
metrics:
  - {name: "FID", value: 11.8, u: 1.4, unit: "-", window: "all", details: {embed_net: "InceptionV3", layer: "pool3"}}
  - {name: "KID", value: 0.013, u: 0.003, unit: "-", window: "all", details: {estimator: "poly-kernel"}}
  - {name: "coverage", value: 0.92, u: 0.02, unit: "-", window: "all", details: {bins: 64}}
contracts:
  - {id: "C40-022", expr: "delta_form ≤ tol_Tarr", tol: {tol_Tarr: 1.0e-9}, severity: "block", result: "pass", evidence_ref: ["generation.timepath"], action_plan: "none"}
  - {id: "C40-121", expr: "FID ≤ 15 ∧ KID ≤ 0.02", tol: {FID: 15.0, KID: 0.02}, severity: "warn", result: "pass", evidence_ref: ["metrics:FID","metrics:KID"], action_plan: "none"}
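
The arrival-consistency check in this example compares the two T_arr formulations numerically. The sketch below evaluates both with a composite trapezoid rule and derives delta_form; it assumes ell is sampled in metres and c_ref is in m/s (so T_arr is in seconds), and the sampled n_eff profile is purely hypothetical. With a constant c_ref the two forms are algebraically identical, so in this setting delta_form only captures numerical discrepancy between the two evaluation orders.

```python
def trapezoid(ys, xs):
    """Composite trapezoid rule for samples ys taken at positions xs."""
    if len(ys) != len(xs) or len(xs) < 2:
        raise ValueError("need at least two matching samples")
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def t_arr_forms(n_eff, ell, c_ref):
    """Return (T_arr_form1, T_arr_form2, delta_form) for a sampled path."""
    # Form 1: integrate n_eff first, then scale by 1 / c_ref.
    form1 = (1.0 / c_ref) * trapezoid(n_eff, ell)
    # Form 2: scale the integrand by 1 / c_ref, then integrate.
    form2 = trapezoid([n / c_ref for n in n_eff], ell)
    return form1, form2, abs(form1 - form2)


c_ref = 2.99792458e8                               # m/s, as in the fragment above
ell = [0.0, 250.0, 500.0, 750.0, 1000.0]           # path positions in metres (assumed)
n_eff = [1.0003, 1.0002, 1.45, 1.45, 1.0003]       # hypothetical profile along the path

form1, form2, delta_form = t_arr_forms(n_eff, ell, c_ref)
tol_Tarr = 1.0e-9                                  # tolerance from contract C40-022
result = "pass" if delta_form <= tol_Tarr else "fail"
print(form1, form2, delta_form, result)
```

Storing the numeric delta_form next to the two formula strings, as the fragment does, lets the contract be re-evaluated later without recomputing the integrals.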


VIII. Incremental Manifest for Streaming (Windowed)

runtime:
  window: "5m"
  rho: 0.81
  latency_ms_p99: 510
  drop_rate: 0.004
metrics_windowed:
  - {name: "W1_cur", value: 0.041, u: 0.008, unit: "feature_unit", window: "2025-09-01T12:00Z/12:05Z"}
  - {name: "psi_cur", value: 0.07, u: 0.01, unit: "-", window: "2025-09-01T12:00Z/12:05Z"}
contracts_windowed:
  - {id: "C40-184", expr: "W1_cur ≤ 0.06 ∧ psi_cur ≤ 0.1", tol: {W1_cur: 0.06, psi_cur: 0.1}, severity: "warn", result: "pass", evidence_ref: ["metrics_windowed:W1_cur","metrics_windowed:psi_cur"], action_plan: "none"}
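
A windowed contract such as C40-184 can be re-evaluated each time a new window closes. The sketch below is one possible evaluator under two stated assumptions: the contract is a conjunction of upper bounds, and each key in tol names a metric in metrics_windowed. It does not parse expr, and the function name is hypothetical.

```python
def evaluate_upper_bound_contract(contract, metrics):
    """Return (updated_contract, failures) for a conjunction of 'metric ≤ tol' terms."""
    values = {m["name"]: m["value"] for m in metrics}
    failures = []
    for name, limit in contract["tol"].items():
        if name not in values:
            failures.append(f"{name}: metric missing from window")
        elif values[name] > limit:
            failures.append(f"{name}: {values[name]} > {limit}")
    # Copy the contract so the caller's record is not mutated in place.
    updated = dict(contract, result="fail" if failures else "pass")
    return updated, failures


metrics_windowed = [
    {"name": "W1_cur", "value": 0.041, "u": 0.008, "unit": "feature_unit"},
    {"name": "psi_cur", "value": 0.07, "u": 0.01, "unit": "-"},
]
contract = {
    "id": "C40-184",
    "expr": "W1_cur ≤ 0.06 ∧ psi_cur ≤ 0.1",
    "tol": {"W1_cur": 0.06, "psi_cur": 0.1},
    "severity": "warn",
}

updated, failures = evaluate_upper_bound_contract(contract, metrics_windowed)
print(updated["result"], failures)
```

Contracts with lower bounds or mixed operators (for example utility_gap_auc ≥ -0.01) would need an expression parser or an explicit operator field, which this sketch deliberately leaves out.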


IX. Validation, Signature & Release Essentials

  1. Before freezing
    • Contracts: run assert_synth_contract(ds_syn, rules) -> report, then populate contracts.* with each result.
    • Provenance & signature: provenance.blob_hash = hash_sha256(blob); sign.signature produced per sign.method.
    • Dimensional & formula checks: check_dim( y - f(x) ) = pass; delta_form ≤ tol_Tarr.
  2. Release admission
    pass = ( ∧_i contracts[i].result = pass ) ∧ ( metrics valid ) ∧ ( privacy budget sufficient ) ∧ ( sign completed ).
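
The admission rule above is a four-way conjunction, which can be sketched as a single gate function. The interpretation of each conjunct below is an assumption made for illustration: "metrics valid" is read as finite value with finite, non-negative u; "privacy budget sufficient" as eps_total and delta_total within caller-supplied budgets; and "sign completed" as the presence of sign.method and sign.signature. The sketch does not verify the signature cryptographically.

```python
import math


def _metric_is_valid(metric):
    """Assumed validity rule: finite numeric value and finite, non-negative u."""
    value, u = metric.get("value"), metric.get("u")
    return (
        isinstance(value, (int, float)) and math.isfinite(value)
        and isinstance(u, (int, float)) and math.isfinite(u) and u >= 0
    )


def release_admission(manifest, eps_budget, delta_budget):
    """Return (admitted, checks), one boolean per conjunct of the admission rule."""
    dp = manifest.get("privacy", {}).get("dp", {})
    sign = manifest.get("sign", {})
    checks = {
        # Note: all() is True for an empty list, so a manifest without
        # contracts passes this conjunct vacuously.
        "contracts_pass": all(
            c.get("result") == "pass" for c in manifest.get("contracts", [])
        ),
        "metrics_valid": all(_metric_is_valid(m) for m in manifest.get("metrics", [])),
        "privacy_budget_ok": (
            dp.get("eps_total", math.inf) <= eps_budget
            and dp.get("delta_total", math.inf) <= delta_budget
        ),
        "sign_completed": bool(sign.get("method")) and bool(sign.get("signature")),
    }
    return all(checks.values()), checks


manifest = {
    "metrics": [{"name": "W1", "value": 0.028, "u": 0.005}],
    "privacy": {"dp": {"eps_total": 1.5, "delta_total": 1.0e-6}},
    "contracts": [
        {"id": "C40-121", "severity": "warn", "result": "pass"},
        {"id": "C40-141", "severity": "block", "result": "pass"},
    ],
    "sign": {"method": "ed25519", "signature": "base64:..."},
}

admitted, checks = release_admission(manifest, eps_budget=2.0, delta_budget=1.0e-5)
print(admitted, checks)
```

Returning the per-conjunct booleans alongside the overall verdict makes a rejected release easier to diagnose than a bare True or False.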

X. Cross-References


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/