Chapter 5 — Annotation, Labeling & Review
I. Purpose & Scope
- Standardize annotation processes, labeling schema, and review/audit mechanisms, covering protocols, personnel & tools, label contracts, quality control & consistency assessment, privacy & minimization, and publication conventions.
- For labels involving path quantities (arrival time/phase), the text must explicitly show gamma(ell) and d ell, and the data side records delta_form ∈ {general, factored}; publication requires p_dim = 1.0 with check_dim_report.json attached.
II. Inputs & Dependencies
- Contracts & structure: schema.json/contract.yaml (Ch. 4); label fields declare type, unit, and dimension.
- Provenance & lineage: provenance.yaml/lineage_graph.json (Ch. 3); annotation batches included in the lineage DAG.
- Metrology & coverage: align coverage ∈ {k, alpha, quantile} and cov_group/Σ with the Error Budget; version/freshness with the Parameter Card.
- Citations & versions: “volume + version + anchor (P/S/M/I)”, anchor coverage ≥ 90%, no external links/aliases.
III. Annotation Process
- Protocol: define task (classification/regression/time-series/path/multimodal), guide (guidelines/examples/anti-patterns), criteria, and conflict names.
- Personnel & tools: annotator_id (anonymized), tool versions, training & calibration items.
- Batches & jobs: batch_id, sample allocation (random/stratified/hard-first), minimum repeat-label count k_rep.
- Intake & review: majority vote/experience-weighted/adjudication; disagreement thresholds trigger re-review.
- Privacy & minimization: follow privacy_policy.yaml; prohibit logging raw sensitive content; mask/de-identify where required.
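The intake step above (majority vote with a disagreement threshold that triggers re-review) can be sketched as follows. This is a minimal illustration, not the chapter's reference implementation: the function name `aggregate_labels` and the threshold `min_agreement` are hypothetical, and experience weighting and adjudication are left out.

```python
from collections import Counter


def aggregate_labels(votes, min_agreement=0.7):
    """Majority-vote intake for one sample.

    votes: label values from the repeated annotations of one sample.
    min_agreement: hypothetical disagreement threshold (not from the source).
    Returns (winning_label, agreement_rate, needs_review).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    # A tie for first place or low agreement sends the sample to re-review.
    tied = sum(1 for c in counts.values() if c == top_count) > 1
    needs_review = tied or agreement < min_agreement
    return top_label, agreement, needs_review
```

With three annotators and a 0.7 threshold, a 2-to-1 split is flagged for re-review while a unanimous vote is accepted.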
IV. Labeling Schema
- Fields:
  - label.value, label.type ∈ {class, span, bbox, point, numeric, path}
  - label.unit (e.g., rad/s, 1, s), label.dim
  - label.confidence (0..1), label.rationale (optional)
  - label.coverage (k/alpha/quantile, choose one)
- Path labels (if applicable): label.path{ gamma_ell[], d_ell[], delta_form, target ∈ {T_arr,Phi} }.
- Hierarchy & multitask: task_id, label.group, dependencies[]; inter-task consistency in constraints.
- Validity & mutex: enum.values[], mutex, requires, and implies rules.
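One way the mutex and requires rules above might be enforced on a record's class labels is sketched below. The rule semantics are assumptions made for illustration: a mutex group allows at most one of its members, and a requires entry lists labels that must accompany a given label. The function name and the example label values are hypothetical.

```python
def check_label_rules(labels, mutex=(), requires=None):
    """Return a list of rule violations for one record's set of class labels.

    labels: label values assigned to the record.
    mutex: iterable of groups; at most one member of each group may appear.
    requires: dict mapping a label to the labels that must accompany it.
    """
    labels = set(labels)
    violations = []
    for group in mutex:
        present = sorted(labels & set(group))
        if len(present) > 1:
            violations.append(f"mutex violated: {present}")
    for label, needed in (requires or {}).items():
        if label in labels:
            missing = sorted(set(needed) - labels)
            if missing:
                violations.append(f"{label} requires {missing}")
    return violations
```

An empty list means the record satisfies every configured rule; any entry can be logged and routed to re-annotation.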
V. Review & Consistency
- Metrics:
  - Classification: κ (Cohen/Fleiss), agreement rate, macro/micro F1 (if reference exists).
  - Numeric: MAE/MSE, CCC (concordance), interval alignment (k/alpha/quantile).
  - Path: DTW/Hausdorff/segmented RMSE; step and alignment window consistent.
- Thresholds: κ ≥ κ_min, MAE ≤ τ_mae, interval_overlap ≥ τ_overlap; below thresholds triggers adjudication or re-annotation.
- Sampled re-review: stratified by batch/annotator/difficulty with rate r_review; discrepancies logged in quality.flags.
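As an illustration of the classification metric and its threshold, the sketch below computes Cohen's κ for two annotators and compares it with κ_min. The function names are hypothetical, and the default of 0.75 mirrors the `kappa_min` value in the contract excerpt; Fleiss' κ for more than two annotators is not covered.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of samples with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical class throughout
    return (p_o - p_e) / (1.0 - p_e)


def passes_kappa(labels_a, labels_b, kappa_min=0.75):
    """True when agreement meets the threshold; otherwise adjudicate."""
    return cohen_kappa(labels_a, labels_b) >= kappa_min
```

A batch that returns False here would be sent to adjudication or re-annotation, as the thresholds bullet describes.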
VI. Gates & Validation
- G1 | Schema completeness: label fields defined in schema.json/contract.yaml with types/units/dimensions.
- G2 | Citation compliance: see[]/references[] anchor coverage ≥ 90%.
- G3 | Path conventions: for path labels, gamma/measure/delta_form present; len(path) ≥ 2, Δell compliant.
- G4 | Dimensional closure: pass I70-dim_check, p_dim = 1.0.
- G6 | Coverage: label.coverage ∈ {k, alpha, quantile} aligned with publication.
- G8 | Uniqueness: unique record_id+task_id; audit events complete.
- Trigger S1–S5 (dimensional/path/citation failures, etc.) to reject or re-annotate; tag [Restricted] when necessary.
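Gates G3 and G8 lend themselves to simple automated checks. The sketch below is one possible reading, with several assumptions: the path is a dict shaped like label.path in Section IV, "Δell compliant" is interpreted as strictly positive steps, and the function names are hypothetical.

```python
ALLOWED_DELTA_FORMS = ("general", "factored")


def check_g3_path(path):
    """Return G3 violations for one label.path dict (empty list means pass)."""
    gamma = path.get("gamma_ell")
    d_ell = path.get("d_ell")
    if gamma is None or d_ell is None:
        return ["gamma_ell and d_ell must both be present"]
    errors = []
    if len(gamma) < 2:
        errors.append("len(path) must be >= 2")
    if len(gamma) != len(d_ell):
        errors.append("gamma_ell and d_ell must have equal length")
    # Assumption: "Δell compliant" is read here as strictly positive steps.
    if any(step <= 0 for step in d_ell):
        errors.append("every d_ell step must be > 0")
    if path.get("delta_form") not in ALLOWED_DELTA_FORMS:
        errors.append("delta_form must be 'general' or 'factored'")
    return errors


def check_g8_unique(records):
    """Return the (record_id, task_id) pairs that occur more than once."""
    seen, duplicates = set(), []
    for rec in records:
        key = (rec["record_id"], rec["task_id"])
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

Both functions return collections of problems instead of raising, so a gate runner can report every failure for a batch in one pass.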
VII. Field Representation (minimal template)
| field | type | unit | dim | description | constraints | see |
|---|---|---|---|---|---|---|
| task_id | string | 1 | 1 | annotation task ID | unique | — |
| label.value | string/number/array | per field | per field | label value | matches label.type | Contract |
| label.type | enum | 1 | 1 | class/span/bbox/point/numeric/path | required | Contract |
| label.unit | string | SI | per field | unit | consistent with field | — |
| label.dim | string | — | 1 | dimension | consistent with unit | — |
| label.confidence | number | 1 | 1 | confidence [0,1] | ≥0 ∧ ≤1 | — |
| label.coverage | object | 1 | 1 | k / alpha / quantile | choose one | — |
| annotator_id | string | 1 | 1 | anonymized ID | de-identified | Privacy |
| label.rationale | string | 1 | 1 | notes/criteria | optional | — |
VIII. Machine-Readable Contracts
A. label_schema.json (excerpt)
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Labeling v1.0.0",
  "type": "object",
  "required": ["task_id", "label"],
  "properties": {
    "task_id": {"type": "string"},
    "label": {
      "type": "object",
      "required": ["type", "value"],
      "properties": {
        "type": {"enum": ["class", "span", "bbox", "point", "numeric", "path"]},
        "value": {},
        "unit": {"type": "string"},
        "dim": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        "coverage": {"type": "object"}
      }
    },
    "annotator_id": {"type": "string"}
  }
}
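To show what the excerpt enforces, the following standard-library sketch hand-checks the same required fields, the label.type enum, and the confidence range. It is an illustration only: the function name is hypothetical, it covers just the constraints visible in the excerpt, and a full JSON Schema validator would normally be preferable in a pipeline.

```python
LABEL_TYPES = ("class", "span", "bbox", "point", "numeric", "path")


def validate_record(record):
    """Check a record against the constraints visible in the schema excerpt."""
    if not isinstance(record, dict):
        return ["record must be an object"]
    errors = []
    for key in ("task_id", "label"):
        if key not in record:
            errors.append(f"missing required field: {key}")
    if "task_id" in record and not isinstance(record["task_id"], str):
        errors.append("task_id must be a string")
    if "label" not in record:
        return errors
    label = record["label"]
    if not isinstance(label, dict):
        return errors + ["label must be an object"]
    for key in ("type", "value"):
        if key not in label:
            errors.append(f"missing required field: label.{key}")
    if "type" in label and label["type"] not in LABEL_TYPES:
        errors.append("label.type is not an allowed value")
    conf = label.get("confidence")
    if conf is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0 <= conf <= 1:
            errors.append("label.confidence must be a number in [0, 1]")
    return errors
```

A record such as `{"task_id": "cls-01", "label": {"type": "class", "value": "cat"}}` yields an empty error list under these checks.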
B. annotation_contract.yaml (process & thresholds)
version: "1.0.0"
tasks:
  - id: "cls-01"
    type: "class"
    guide: "docs/guidelines_cls01.md"
    k_rep: 2
    kappa_min: 0.75
  - id: "path-01"
    type: "path"
    guide: "docs/guidelines_path01.md"
    path:
      required: true
      gamma: "gamma(ell)"
      measure: "d ell"
      delta_form: "general"
    metrics:
      dtw_max: 0.15
      overlap_min: 0.80
coverage:
  mode: "k"
  k: 2
review:
  r_review: 0.1
  adjudication: true
privacy:
  deid_policy: "privacy_policy.yaml"
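The r_review rate in the contract could drive a stratified re-review draw along the lines of the sketch below. The details are assumptions: the function name, stratifying by annotator_id by default, rounding each stratum up so that small strata are never skipped, and the fixed seed for reproducibility.

```python
import math
import random
from collections import defaultdict


def sample_for_review(records, r_review=0.1, strata_key="annotator_id", seed=0):
    """Pick a stratified sample of records for re-review.

    Each stratum contributes ceil(r_review * size) records, so small strata
    are never skipped. The seed makes the draw reproducible.
    """
    if not 0 < r_review <= 1:
        raise ValueError("r_review must be in (0, 1]")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[strata_key]].append(rec)
    selected = []
    for key in sorted(strata):
        group = strata[key]
        n_pick = math.ceil(r_review * len(group))
        selected.extend(rng.sample(group, n_pick))
    return selected
```

Passing `strata_key="batch_id"` (or a difficulty field, if records carry one) gives the other stratifications named in Section V.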
C. Audit event audit.jsonl (sample line)
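A line in audit.jsonl might be produced as in the sketch below. The audit schema is not specified in this chapter, so the field names `ts` and `event` and the example values are assumptions; record_id, task_id, annotator_id, and batch_id reuse identifiers that appear elsewhere in this chapter.

```python
import json
from datetime import datetime, timezone


def audit_line(event, record_id, task_id, annotator_id, batch_id, ts=None):
    """Serialize one audit event as a single JSON Lines record."""
    ts = ts or datetime.now(timezone.utc)
    entry = {
        "ts": ts.isoformat(),          # hypothetical field name
        "event": event,                # hypothetical field name
        "record_id": record_id,
        "task_id": task_id,
        "annotator_id": annotator_id,  # anonymized ID only, never raw identity
        "batch_id": batch_id,
    }
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(entry, sort_keys=True, ensure_ascii=False)
```

Appending each returned string plus a newline to the file keeps one event per line, which is what makes the log easy to stream and audit.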
IX. Normative Path Forms
- Arrival: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ); T_arr = ( ∫ ( n_eff / c_ref ) d ell ).
- Phase: Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell ).
- In text, explicitly show the path and the measure; the gamma_ell[] and d_ell[] arrays must be equal in length; step and alignment must match the Ch. 4/5 constraints.
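The two arrival-time forms and the phase form can be checked numerically with a discrete sum over path segments. The sketch below makes several assumptions: n_eff is sampled once per segment, the integral is approximated by a plain Riemann sum, the delta_form values map to the two T_arr forms as commented, and all quantities use mutually consistent units. The function names are hypothetical.

```python
import math


def arrival_time(n_eff, d_ell, c_ref, delta_form="general"):
    """Discrete T_arr over path segments with step lengths d_ell."""
    if len(n_eff) != len(d_ell):
        raise ValueError("n_eff and d_ell must have equal length")
    if delta_form == "factored":
        # T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
        return (1.0 / c_ref) * sum(n * dl for n, dl in zip(n_eff, d_ell))
    if delta_form == "general":
        # T_arr = ( ∫ ( n_eff / c_ref ) d ell )
        return sum((n / c_ref) * dl for n, dl in zip(n_eff, d_ell))
    raise ValueError("delta_form must be 'general' or 'factored'")


def phase(n_eff, d_ell, lambda_ref):
    """Discrete Phi = ( 2π / λ_ref ) * ( ∫ n_eff d ell )."""
    if len(n_eff) != len(d_ell):
        raise ValueError("n_eff and d_ell must have equal length")
    optical_path = sum(n * dl for n, dl in zip(n_eff, d_ell))
    return (2.0 * math.pi / lambda_ref) * optical_path
```

For a constant c_ref the two forms agree up to floating-point rounding, which gives a quick consistency test when a dataset mixes delta_form values.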
X. Anti-Patterns & Fixes
- Anti: provide gamma(ell) only, missing d ell/delta_form → Fix: add and align with n_eff.
- Anti: T_arr = ∫ n_eff / c_ref d ell (missing parentheses) → Fix: parenthesize to normative forms.
- Anti: lack of consistency/review records → Fix: configure k_rep/r_review and adjudication and log audits.
- Anti: missing unit/dim → Fix: add units/dimensions and pass I70-dim_check.
XI. Cross-References
- Structure & Schema: Ch. 4; Splits & Versioning: Ch. 6; Gates & Integrity: Ch. 7; Uncertainty & Covariance: Ch. 8.
- Pipeline Card: stage control & path alignment (Ch. 6/Ch. 5).
- Error Budget Card: intervals/coverage & covariance (Ch. 8/Ch. 5/Ch. 6).
XII. Checklist
- label_schema.json/annotation_contract.yaml complete and aligned with Ch. 4; label units & dimensions complete.
- For path labels: explicit gamma/measure/delta_form; len(path) ≥ 2, Δell compliant.
- Consistency metrics met (κ/MAE/interval_overlap, etc.); sampled re-review & adjudication recorded.
- Coverage aligned with publication; I70-dim_check passed, p_dim = 1.0.
- Audit events complete; privacy minimization & de-identification enforced; citations & versions compliant (anchor coverage ≥ 90%).
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/