EFT.WP.Data.Benchmarks v1.0

Chapter 9 Significance & Uncertainty


I. Chapter Purpose & Scope

Fix specifications for uncertainty and significance in benchmarks: test methods & parameters, confidence/coverage intervals, effect size & power, multiple-comparison correction, metrological composition & unit posture, and linkage with scoring/ranking/gates. Ensure consistency with the metric system, evaluation protocol, frozen splits, metrology, and citation anchors.

II. Terminology & Dependencies

  1. Terms: p-value, CI_95 (95% confidence interval), Δ (effect size), MDE (minimum detectable effect), power, coverage (coverage interval), u_c (combined standard uncertainty), U=k·u_c (expanded uncertainty), B (bootstrap reps).
  2. Dependencies: metrics & units (Ch.6), evaluation protocol (ModelCards v1.0, Ch.11), metrological composition (Core.Metrology v1.0:check_dim), online windows & monitoring (Pipeline v1.0, Ch.12).
  3. Math & symbols: wrap inline symbols; parenthesize any division, integral, or composite operator; for T_arr use either
    • T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
    • T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
      and declare gamma(ell) and d ell explicitly. Formulas, symbols, and definitions must not contain Chinese.
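The two accepted forms of T_arr can be compared numerically. The sketch below is illustrative only: it assumes n_eff is sampled at points ell along gamma(ell), that c_ref is a constant, and that a trapezoid rule is an acceptable discretization of d ell. The function names are hypothetical and not part of this specification.

```python
def trapezoid(xs, ys):
    """Trapezoid-rule integral of samples ys over sample points xs."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


def t_arr_both_forms(ell, n_eff, c_ref):
    """Return T_arr computed with each of the two parenthesized forms.

    ell    : sample positions along the path gamma(ell) (assumed ordered)
    n_eff  : n_eff sampled at those positions
    c_ref  : reference speed, assumed constant along the path
    """
    # Form 1: ( 1 / c_ref ) * ( integral of n_eff d ell )
    form_outside = (1.0 / c_ref) * trapezoid(ell, n_eff)
    # Form 2: ( integral of ( n_eff / c_ref ) d ell )
    form_inside = trapezoid(ell, [n / c_ref for n in n_eff])
    return form_outside, form_inside


# Hypothetical path: ell in [0, 1], n_eff rising linearly from 1.0 to 1.1.
ell = [i / 100 for i in range(101)]
n_eff = [1.0 + 0.1 * x for x in ell]
outside, inside = t_arr_both_forms(ell, n_eff, c_ref=2.0)
```

With a constant c_ref the two forms agree up to floating-point rounding; they would differ only if c_ref were allowed to vary along the path, which is why the integration variable and path are declared explicitly.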

III. Fields & Structure (Normative)

significance:
  method: "bootstrap|permutation|t|bayes"
  B: 10000
  alpha: 0.05
  effect_size: "delta|cohens_d|cliffs_delta"
  mde: null
  tails: "two|one"
  correction: "Holm-Bonferroni|BH|none"
  strata: ["task|locale|domain?"]
  seed: 1701
uncertainty:
  model: "GUM|linear|montecarlo|bayes"
  components:
    - {name: "stat", type: "random", value: null, unit: "—", distribution: "bootstrap", coverage: {level: 0.95}}
    - {name: "system", type: "systematic", value: null, unit: "<SI>", distribution: "normal", coverage: {k: 2.0}}
  correlation:
    posture: "independent|groups|covariance"
    groups: [{name: "instrument", rho: 0.6}]
  propagation:
    rule: "rss|linear|montecarlo|bayes"
    samples: 0
  coverage_policy:
    target: "CI_95|coverage_95"
    k: 2.0
  report:
    significant_figures: 3
    unit_consistency: true


IV. Significance Tests & Effect Size

  1. Method selection:
    • bootstrap: default; recommend B≥10,000; report CI_95 and Δ.
    • permutation: robust to distribution/variance differences.
    • t: use under normality and homoscedasticity.
    • bayes: report posterior intervals and P(Δ>0).
  2. Effect size Δ: harmonize with metric direction (e.g., negate ECE before comparison); optionally report cohens_d or cliffs_delta.
  3. Power & MDE: for online A/B, provide power≥0.8 or set mde with sample-size calculations.
  4. Multiple comparisons: default Holm–Bonferroni; apply to stratified (task/locale/domain) and cross-task comparisons.
  5. Gating linkage: candidates beating baselines without p<α must not be promoted; leaderboard updates require significance gates.
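The default path through the items above (bootstrap, Holm-Bonferroni, gate) can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes paired per-example scores on the same frozen split, a percentile interval, and one common convention for a two-sided bootstrap p-value. The function names and the gate predicate are hypothetical.

```python
import random
from statistics import fmean


def paired_bootstrap(cand, base, B=10_000, alpha=0.05, seed=1701):
    """Percentile bootstrap for the mean paired difference (delta).

    cand/base are per-example scores, already oriented so that larger
    is better (for example, ECE negated before comparison).
    """
    rng = random.Random(seed)
    diffs = [c - b for c, b in zip(cand, base)]
    n = len(diffs)
    delta = fmean(diffs)
    # Resample examples with replacement B times and recompute the mean.
    reps = sorted(fmean(rng.choices(diffs, k=n)) for _ in range(B))
    lo = reps[int((alpha / 2) * (B - 1))]
    hi = reps[int((1 - alpha / 2) * (B - 1))]
    # Two-sided p-value from the share of replicates on each side of zero;
    # the +1 terms keep p strictly positive (an assumed convention).
    le = (sum(r <= 0 for r in reps) + 1) / (B + 1)
    ge = (sum(r >= 0 for r in reps) + 1) / (B + 1)
    p = min(1.0, 2 * min(le, ge))
    return delta, (lo, hi), p


def holm_bonferroni(pvals, alpha=0.05):
    """Step-down Holm procedure; returns reject flags in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is tested at alpha / (m - k + 1).
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one hypothesis survives, all larger p-values survive
    return reject


def may_promote(delta, rejected):
    """Hypothetical gate: promote only a significant improvement."""
    return rejected and delta > 0


# Three strata with hypothetical p-values; only the first clears Holm.
flags = holm_bonferroni([0.01, 0.04, 0.03], alpha=0.05)  # [True, False, False]
```

Applying the correction across strata before the gate means a candidate that wins on one task by chance does not get promoted on the strength of that single uncorrected comparison.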

V. Uncertainty Modeling & Composition

  1. Components: statistical (sampling) vs. systematic (calibration/device/environment); record units and distributions.
  2. Composition rules:
    • rss: independent standard uncertainties, u_c = ( sqrt( Σ u_i^2 ) );
    • linear: first-order linearization, u_c = ( sqrt( J Σ J^T ) ), J=( ∂f / ∂x ), where Σ here denotes the covariance matrix of the inputs (not a summation);
    • montecarlo|bayes: specify samples or prior/likelihood and report coverage intervals.
  3. Expanded uncertainty: U = ( k * u_c ); under normal assumptions, k≈2 corresponds to approximately 95% coverage.
  4. Dimensional consistency: normalize units before composition; express in SI and pass check_dim.
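The rss and linear rules, and the expanded uncertainty, can be illustrated with a short sketch. It assumes all inputs are already standard uncertainties in the same unit, and it models the "groups" correlation posture as a single shared rho between every pair in the group; that reading of rho, and all function names, are assumptions made for illustration.

```python
import math


def u_rss(us):
    """rss rule: u_c = sqrt(sum(u_i^2)), valid for independent components."""
    return math.sqrt(sum(u * u for u in us))


def equicorrelated_cov(us, rho):
    """Covariance matrix with the same correlation rho between all pairs."""
    n = len(us)
    return [[us[i] * us[j] * (1.0 if i == j else rho) for j in range(n)]
            for i in range(n)]


def u_linear(J, cov):
    """linear rule: u_c = sqrt(J cov J^T) with sensitivity row vector J."""
    n = len(J)
    var = sum(J[i] * cov[i][j] * J[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


def expanded(u_c, k=2.0):
    """Expanded uncertainty U = k * u_c."""
    return k * u_c


us = [3.0, 4.0]                      # hypothetical standard uncertainties
independent = u_rss(us)              # 5.0
correlated = u_linear([1.0, 1.0], equicorrelated_cov(us, rho=0.6))
U = expanded(independent, k=2.0)     # 10.0
```

With rho = 0 the linear rule reduces to rss for unit sensitivities; a positive rho makes the combined uncertainty larger than the rss value, which is why the correlation posture has to be declared rather than assumed.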

VI. Coupling with Scoring/Ranking/Gates


VII. Machine-Readable Fragment (Drop-in)

significance:
  method: "bootstrap"
  B: 10000
  alpha: 0.05
  effect_size: "delta"
  correction: "Holm-Bonferroni"
  strata: ["task"]
  seed: 1701
uncertainty:
  model: "linear"
  components:
    - {name: "stat", type: "random", value: null, unit: "—", distribution: "bootstrap", coverage: {level: 0.95}}
    - {name: "device", type: "systematic", value: 0.8, unit: "%", distribution: "normal", coverage: {k: 2.0}}
  correlation: {posture: "groups", groups: [{name: "device", rho: 0.6}]}
  propagation: {rule: "linear", samples: 0}
  coverage_policy: {target: "CI_95", k: 2.0}
  report: {significant_figures: 3, unit_consistency: true}
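One way the coverage_policy and report fields of the fragment might be applied when printing a result is sketched below. It assumes significant_figures applies to both the value and the expanded uncertainty, and that the combined standard uncertainty u_c has already been computed; the numbers and function names are hypothetical.

```python
import math


def round_sig(x, sig=3):
    """Round x to `sig` significant figures."""
    if x == 0:
        return 0.0
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)


def format_result(value, u_c, k=2.0, unit="%", sig=3):
    """Render value and expanded uncertainty U = k * u_c with the unit."""
    U = k * u_c
    return f"{round_sig(value, sig)} ± {round_sig(U, sig)} {unit} (k={k})"


# Hypothetical score of 71.2345 % with u_c = 0.4 %, reported at k = 2.
line = format_result(71.2345, 0.4, k=2.0, unit="%", sig=3)
# line == "71.2 ± 0.8 % (k=2.0)"
```

Stating k next to the interval lets a reader recover u_c from the reported value, which matters when results from different reports are later combined.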


VIII. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SIG.METHOD_ALLOWED
    when: "$.significance.method"
    assert: "value in ['bootstrap','permutation','t','bayes']"
    level: error
  - id: SIG.PARAMS_COMPLETE
    when: "$.significance"
    assert: "has_keys(B, alpha)"
    level: error
  - id: SIG.CORRECTION_ALLOWED
    when: "$.significance.correction"
    assert: "value in ['Holm-Bonferroni','BH','none']"
    level: error
  - id: UNC.MODEL_ALLOWED
    when: "$.uncertainty.model"
    assert: "value in ['GUM','linear','montecarlo','bayes']"
    level: error
  - id: UNC.COMPONENTS_DEFINED
    when: "$.uncertainty.components"
    assert: "len(value) >= 1"
    level: error
  - id: UNC.PROP_RULE_ALLOWED
    when: "$.uncertainty.propagation.rule"
    assert: "value in ['rss','linear','montecarlo','bayes']"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
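The lint rules above can be approximated in a few lines of Python. This sketch makes two assumptions that the excerpt does not state: a rule only fires when its "when" path exists in the document, and the assert expressions are hand-translated into Python predicates rather than parsed. The helper names are hypothetical.

```python
_MISSING = object()  # sentinel so that explicit nulls are not treated as absent


def resolve(doc, path):
    """Resolve a simple '$.a.b.c' path against nested dicts."""
    node = doc
    for key in path.split(".")[1:]:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


RULES = [
    ("SIG.METHOD_ALLOWED", "$.significance.method",
     lambda v: v in ("bootstrap", "permutation", "t", "bayes")),
    ("SIG.PARAMS_COMPLETE", "$.significance",
     lambda v: isinstance(v, dict) and {"B", "alpha"} <= v.keys()),
    ("SIG.CORRECTION_ALLOWED", "$.significance.correction",
     lambda v: v in ("Holm-Bonferroni", "BH", "none")),
    ("UNC.MODEL_ALLOWED", "$.uncertainty.model",
     lambda v: v in ("GUM", "linear", "montecarlo", "bayes")),
    ("UNC.COMPONENTS_DEFINED", "$.uncertainty.components",
     lambda v: isinstance(v, list) and len(v) >= 1),
    ("UNC.PROP_RULE_ALLOWED", "$.uncertainty.propagation.rule",
     lambda v: v in ("rss", "linear", "montecarlo", "bayes")),
    ("METROLOGY.SI_AND_CHECKDIM", "$.metrology",
     lambda v: isinstance(v, dict)
     and v.get("units") == "SI" and v.get("check_dim") is True),
]


def lint(doc):
    """Return the ids of all rules whose path exists and whose check fails."""
    failures = []
    for rule_id, path, check in RULES:
        value = resolve(doc, path)
        if value is not _MISSING and not check(value):
            failures.append(rule_id)
    return failures


# A shortened, hypothetical config in the shape of the drop-in fragment.
config = {
    "significance": {"method": "bootstrap", "B": 10000, "alpha": 0.05,
                     "correction": "Holm-Bonferroni"},
    "uncertainty": {"model": "linear",
                    "components": [{"name": "stat"}, {"name": "device"}],
                    "propagation": {"rule": "linear", "samples": 0}},
}
errors = lint(config)  # empty list: every applicable rule passes
```

Because every rule in the excerpt has level "error", the sketch returns a flat list of failing ids; a fuller linter would also carry the level so warnings and errors can be handled differently.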


IX. Cross-Reference Anchors


X. Chapter Compliance Checklist


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/