52-Dataset Card Template v1.0
Chapter 7 — Quality Gates & Integrity (QC Gates)
I. Purpose & Scope
- Define dataset-level Quality Gates (G1–G8) and integrity checks, including decision criteria, Stops/Fallbacks (S1–S5), /validate report format, and release compliance, to ensure consistent and auditable structure, dimensions, path, versioning, and freshness.
- For path quantities (arrival time/phase), the text must explicitly show gamma(ell) and d ell, with delta_form ∈ {general, factored} recorded on the data side; publication requires p_dim = 1.0 with check_dim_report.json attached.
II. Prerequisites & Inputs
- Contract & structure: schema.json/contract.yaml per Ch. 4, fields/units/dimensions aligned with TARR.
- Splits & versioning: split.yaml/split_manifest.json complete (Ch. 6), SemVer tagging clear.
- Source & lineage: provenance.yaml/lineage_graph.json complete (Ch. 3), acyclic.
- Metrology & coverage: aligned with Error Budget (cov_group/Σ, coverage ∈ {k, alpha, quantile}).
- Citations & versions: “volume + version + anchor (P/S/M/I)”, anchor coverage ≥ 90%.
III. Gates G1–G8
- G1 | Schema completeness: required fields present; types/index/window and units/dimensions match the contract; primary key/time/path blocks complete.
- G2 | Citation compliance: see[]/references[] are anchor-direct; coverage ≥ 90%; no external links/aliases.
- G3 | Path conventions: gamma/measure/delta_form present; len(gamma_ell)=len(d_ell)=len(n_eff)≥2; step Δell ≤ ( c_ref / f_s ) / max(n_eff).
- G4 | Dimensional closure: pass I70-dim_check, p_dim = 1.0; unified parenthesized forms:
T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ) or T_arr = ( ∫ ( n_eff / c_ref ) d ell );
Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell ).
- G5 | Freshness: clock_state="locked"; |ts_start − calib.timestamp| ≤ τ_calib; expired samples isolated.
- G6 | Coverage: statistical intervals consistent between data and publication (k/alpha/quantile).
- G7 | Covariance consistency: cov_group/Σ aligned with Error Budget; Σ PD (jitter if needed); stratification/slice assumptions consistent.
- G8 | Uniqueness: unique record_id/checksum; lineage DAG acyclic; no cross-split confusion (unless explicitly shared via slice_k).
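The G3 checks and the first G4 form of T_arr can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names are hypothetical, d_ell is assumed to hold the per-sample step sizes in metres, the integral is approximated by a plain sum, and c_ref is assumed to be the vacuum speed of light.

```python
C_REF = 299_792_458.0  # m/s; assumed value for c_ref (not fixed by this chapter)


def g3_violations(gamma_ell, d_ell, n_eff, f_s, c_ref=C_REF):
    """Return G3 violations for one path block (an empty list means pass)."""
    # len(gamma_ell) = len(d_ell) = len(n_eff) >= 2
    if not (len(gamma_ell) == len(d_ell) == len(n_eff)):
        return ["length_mismatch"]
    if len(n_eff) < 2:
        return ["too_few_samples"]
    # Step guard: delta_ell <= ( c_ref / f_s ) / max(n_eff)
    max_step = (c_ref / f_s) / max(n_eff)
    if any(step > max_step for step in d_ell):
        return ["delta_ell_violate"]
    return []


def arrival_time(d_ell, n_eff, c_ref=C_REF):
    """T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), discretized as a sum."""
    return (1.0 / c_ref) * sum(n * d for n, d in zip(n_eff, d_ell))
```

With f_s = 1e9 Hz and max(n_eff) = 1.5, the guard allows steps of roughly 0.2 m, so a block with 0.1 m steps passes while a 0.3 m step is flagged.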
IV. Stops & Fallbacks (S1–S5)
- S1: dimensional failure or p_dim < 1 → reject/rollback; tag [Restricted] if necessary.
- S2: freshness failure or clock_state != locked → isolate or recalibrate.
- S3: missing path block or invalid step → reject and backfill/resample.
- S4: non-PD/inconsistent covariance → fix kernel params or switch robust surrogate.
- S5: citation non-compliance/insufficient anchors → block release and correct references.
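One way to read S1–S5 is as a lookup from a failed gate to the stop it raises. The sketch below assumes that pairing (G4→S1, G5→S2, G3→S3, G7→S4, G2→S5), which is inferred from the stop descriptions rather than stated by the template. It also assumes that G2 may arrive either as a boolean or as a coverage fraction, and that gates absent from the result are simply skipped. All names are hypothetical.

```python
# Assumed gate-to-stop pairing, inferred from the S1-S5 descriptions.
STOP_FOR_GATE = {"G4": "S1", "G5": "S2", "G3": "S3", "G7": "S4", "G2": "S5"}


def gate_passed(value, anchor_coverage_min=0.90):
    """Interpret a gate result: booleans as-is, numbers as anchor coverage."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value >= anchor_coverage_min
    return False  # unrecognized result types are treated as failures


def stops_triggered(gate_results, anchor_coverage_min=0.90):
    """Return the sorted list of stops raised by the failed gates."""
    stops = set()
    for gate, stop in STOP_FOR_GATE.items():
        if gate not in gate_results:
            continue  # gate not evaluated for this scope
        if not gate_passed(gate_results[gate], anchor_coverage_min):
            stops.add(stop)
    return sorted(stops)
```

For example, a result with G4 false and G2 at 0.85 yields ["S1", "S5"], which blocks release until the dimensional check and the references are fixed.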
V. /validate Report Spec
- Input: gates[] (default ["G1".."G8"]), optional stops[].
- Output: global and per-split gate results, stops_triggered, KPI snapshot, and links{check_dim_report, audit}.
validate_report.json (sample)
{
  "dataset_id": "ds-core",
  "timestamp": "2025-09-24T16:00:00Z",
  "global": { "G1": true, "G2": 0.94, "G3": true, "G4": true, "G5": true, "G6": true, "G7": true, "G8": true },
  "splits": {
    "train": { "G": { "G1": true, "G3": true, "G4": true, "G6": true, "G8": true }, "count": 120345 },
    "val": { "G": { "G1": true, "G3": true, "G4": true, "G6": true, "G8": true }, "count": 25780 },
    "test": { "G": { "G1": true, "G3": true, "G4": true, "G6": true, "G8": true }, "count": 25812 }
  },
  "stops_triggered": [],
  "links": { "check_dim_report": "reports/check_dim_report.json", "audit": "reports/audit.jsonl" }
}
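A consumer of validate_report.json might reduce it to a list of release blockers. The sketch below is one possible reading of the sample above, not a defined API: release_blockers is a hypothetical name, numeric gate values are assumed to be coverage fractions compared against 0.90, and both links are assumed to be mandatory.

```python
import json


def _passes(value, anchor_coverage_min):
    # Booleans are taken as-is; numbers are read as anchor coverage (assumption).
    if isinstance(value, bool):
        return value
    return value >= anchor_coverage_min


def release_blockers(report, anchor_coverage_min=0.90):
    """Return a list of reasons the report blocks release (empty means clear)."""
    blockers = []
    for gate, value in report["global"].items():
        if not _passes(value, anchor_coverage_min):
            blockers.append(f"global:{gate}")
    for split, body in report["splits"].items():
        for gate, value in body["G"].items():
            if not _passes(value, anchor_coverage_min):
                blockers.append(f"{split}:{gate}")
    for stop in report["stops_triggered"]:
        blockers.append(f"stop:{stop}")
    for key in ("check_dim_report", "audit"):
        if not report["links"].get(key):
            blockers.append(f"missing_link:{key}")
    return blockers


def load_report(path):
    """Read a validate report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Applied to the sample report, the function returns an empty list; a false gate in any split or a non-empty stops_triggered produces one entry per problem.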
VI. Machine-Readable Rules
A. gate_rules.yaml
version: "1.0.0"
gates:
  G1: { schema_required: true }
  G2: { anchor_coverage_min: 0.90, forbid_external_links: true }
  G3: { path_required: true, min_samples: 2, delta_form: ["general","factored"], delta_ell_guard: "c_ref/fs/max(n_eff)" }
  G4: { require_dim_check: true, p_dim: 1.0 }
  G5: { tau_calib_s_max: 86400, clock_state: "locked" }
  G6: { coverage_allowed: ["k","alpha","quantile"] }
  G7: { cov_pd: true, kernel_allowed: ["exp","matern","ar1","const"] }
  G8: { unique_record_id: true, unique_checksum: true, lineage_acyclic: true }
stops:
  S1: "dim_check_fail or p_dim<1"
  S2: "freshness_expired or clock_state!=locked"
  S3: "path_block_missing or delta_ell_violate"
  S4: "covariance_not_pd or cov_model_mismatch"
  S5: "anchor_coverage_below_min or external_link_found"
labels: { restricted: "[Restricted]" }
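As an illustration of how one rule could be applied, the G5 entry (tau_calib_s_max: 86400, clock_state: "locked") maps onto a small freshness predicate. This is a sketch under assumptions: is_fresh is a hypothetical name, and ts_start and the calibration timestamp are assumed to be timezone-aware datetimes.

```python
from datetime import datetime, timezone


def is_fresh(ts_start, calib_timestamp, clock_state, tau_calib_s_max=86400):
    """G5: clock locked and |ts_start - calib.timestamp| <= tau_calib."""
    if clock_state != "locked":
        return False
    age_s = abs((ts_start - calib_timestamp).total_seconds())
    return age_s <= tau_calib_s_max


# Example: a sample taken 16 hours after calibration is within the 24 h limit.
calib = datetime(2025, 9, 24, 0, 0, tzinfo=timezone.utc)
sample = datetime(2025, 9, 24, 16, 0, tzinfo=timezone.utc)
fresh = is_fresh(sample, calib, "locked")
```

A sample that fails this predicate would raise S2 and be isolated or recalibrated.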
B. compliance_table.csv (headers and sample rows)
split,G1,G2(G-coverage),G3,G4,G5,G6,G7,G8,stops
train,true,0.94,true,true,true,true,true,true,""
val,true,0.95,true,true,true,true,true,true,""
test,true,0.93,true,true,true,true,true,true,""
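The table above can be screened with Python's standard csv module. The sketch below is illustrative only: failing_splits is a hypothetical helper, the column names are taken from the header row as printed, and the 0.90 threshold mirrors anchor_coverage_min.

```python
import csv
import io

BOOLEAN_GATES = ("G1", "G3", "G4", "G5", "G6", "G7", "G8")


def failing_splits(csv_text, anchor_coverage_min=0.90):
    """Return the splits with a failed gate, low coverage, or recorded stops."""
    failing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        coverage_ok = float(row["G2(G-coverage)"]) >= anchor_coverage_min
        gates_ok = all(row[gate] == "true" for gate in BOOLEAN_GATES)
        no_stops = row["stops"] == ""
        if not (coverage_ok and gates_ok and no_stops):
            failing.append(row["split"])
    return failing


SAMPLE = '''split,G1,G2(G-coverage),G3,G4,G5,G6,G7,G8,stops
train,true,0.94,true,true,true,true,true,true,""
val,true,0.95,true,true,true,true,true,true,""
test,true,0.93,true,true,true,true,true,true,""
'''
```

For the sample rows every split passes, so failing_splits(SAMPLE) returns an empty list.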
VII. Monitoring & Alerts
- Online KPIs: Latency_P50/P95, Throughput, p_dim, σ_y(τ), δt_abs, Δτ_ch, loss_rate, Q_res.
- Triggers: gate breaches (G1–G8), S1–S5, lock loss, path desync, non-PD covariance; support silence windows & alert merging.
- Actions: isolate/rollback split, backfill path block, recalibrate, switch robust surrogate, tag [Restricted].
VIII. Anti-Patterns & Fixes
- Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
- Anti: only gamma(ell) provided, missing d ell/delta_form → Fix: complete and align with n_eff.
- Anti: expired samples mixed into splits → Fix: filter per freshness.policy or isolate and tag.
- Anti: coverage mode differs between data and publication → Fix: unify coverage.mode and params.
- Anti: lineage cycles or missing version/checksum → Fix: break cycles and complete version/checksum.
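The lineage-cycle anti-pattern, and the G8 requirement that the lineage DAG be acyclic, can be checked with Kahn's topological-sort algorithm. The layout of lineage_graph.json is not specified here, so this sketch assumes the graph has already been reduced to a list of (parent, child) pairs; is_acyclic is a hypothetical name.

```python
from collections import defaultdict, deque


def is_acyclic(edges):
    """Return True if the directed graph given by (parent, child) pairs has no cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any node left unvisited sits on a cycle or downstream of one.
    return visited == len(nodes)
```

A chain such as raw → clean → train passes; adding an edge from train back to raw makes the check fail, which is the case the fix "break cycles" addresses.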
IX. Cross-References
- Splits/Versioning/Freshness: Ch. 6; Structure & Schema: Ch. 4; Provenance: Ch. 3; UQ: Ch. 8.
- Pipeline Card: gates/monitoring (Ch. 9), inbound contracts (Ch. 4), outputs & release (Ch. 12).
- Error Budget Card: threshold mapping (Ch. 8), intervals & coverage (Ch. 8).
X. Checklist
- gate_rules.yaml consistent with /validate; compliance_table.csv generated.
- For path quantities, explicit gamma/measure/delta_form; len(path) ≥ 2, Δell compliant; I70-dim_check passed, p_dim = 1.0.
- clock_state="locked", τ_calib valid; expired samples isolated or [Restricted].
- Coverage unified (k/alpha/quantile); cov_group/Σ aligned with Error Budget and PD.
- see[]/references[]/version compliant with anchor coverage ≥ 90%; lineage acyclic; record_id/checksum unique.
- Release bundle contains check_dim_report.json, validate_report.json, compliance_table.csv, and signature.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/