46-EFT.WP.Data.Benchmarks v1.0 | Chapter 9 Significance & Uncertainty

Chapter 9 Significance & Uncertainty

I. Chapter Purpose & Scope

Fix specifications for uncertainty and significance in benchmarks: test methods & parameters, confidence/coverage intervals, effect size & power, multiple-comparison correction, metrological composition & unit posture, and linkage with scoring/ranking/gates; ensure consistency with the metric system, evaluation protocol, frozen splits, metrology, and citation anchors.

II. Terminology & Dependencies

Terms: p-value, CI_95 (95% confidence interval), Δ (effect size), MDE (minimum detectable effect), power, coverage (coverage interval), u_c (combined standard uncertainty), U=k·u_c (expanded uncertainty), B (bootstrap reps).
Dependencies: metrics & units (Ch.6), evaluation protocol (ModelCards v1.0, Ch.11), metrological composition (Core.Metrology v1.0:check_dim), online windows & monitoring (Pipeline v1.0, Ch.12).
Math & symbols: wrap inline symbols; any division/integral/composite operator must use parentheses; for T_arr use
- T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell ), or
- T_arr = ( ∫ ( n_eff / c_ref ) d ell ),
  declaring gamma(ell) and d ell. No Chinese in formulas/symbols/definitions.
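
As a small numerical illustration of the two T_arr forms above, the sketch below integrates along a sampled path with the trapezoid rule and checks that both forms agree when c_ref is constant along the path. The path samples, the n_eff values, the c_ref value, and the helper name trapezoid are all hypothetical placeholders, not values taken from this specification.

```python
import math

def trapezoid(ys, xs):
    """Composite trapezoid rule for samples ys taken at path positions xs."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )

# Hypothetical inputs: path positions along gamma(ell) in metres,
# a dimensionless n_eff sampled at those positions, and a constant c_ref in m/s.
ell = [0.0, 1.0, 2.0, 3.0]
n_eff = [1.0, 1.2, 1.1, 1.0]
c_ref = 299_792_458.0

# Form 1: T_arr = ( 1 / c_ref ) * ( integral of n_eff d ell )
t_outside = (1.0 / c_ref) * trapezoid(n_eff, ell)

# Form 2: T_arr = ( integral of ( n_eff / c_ref ) d ell )
t_inside = trapezoid([n / c_ref for n in n_eff], ell)

# With a constant c_ref the two forms differ only by floating-point rounding.
forms_agree = math.isclose(t_outside, t_inside, rel_tol=1e-12)
```

The two forms stop being interchangeable if c_ref varies along the path, which is one reason to declare gamma(ell) and d ell explicitly.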

III. Fields & Structure (Normative)

significance:
  method: "bootstrap|permutation|t|bayes"
  B: 10000
  alpha: 0.05
  effect_size: "delta|cohens_d|cliffs_delta"
  mde: null
  tails: "two|one"
  correction: "Holm-Bonferroni|BH|none"
  strata: ["task|locale|domain?"]
  seed: 1701
uncertainty:
  model: "GUM|linear|montecarlo|bayes"
  components:
    - {name: "stat", type: "random", value: null, unit: "—", distribution: "bootstrap", coverage: {level: 0.95}}
    - {name: "system", type: "systematic", value: null, unit: "<SI>", distribution: "normal", coverage: {k: 2.0}}
  correlation:
    posture: "independent|groups|covariance"
    groups: [{name: "instrument", rho: 0.6}]
  propagation:
    rule: "rss|linear|montecarlo|bayes"
    samples: 0
  coverage_policy:
    target: "CI_95|coverage_95"
    k: 2.0
  report:
    significant_figures: 3
    unit_consistency: true

IV. Significance Tests & Effect Size

Method selection:
- bootstrap: default; recommend B≥10,000; report CI_95 and Δ.
- permutation: robust to distribution/variance differences.
- t: use under normality and homoscedasticity.
- bayes: report posterior intervals and P(Δ>0).
Effect size Δ: harmonize with metric direction (e.g., negate ECE before comparison); optionally report cohens_d or cliffs_delta.
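
The default bootstrap path with a direction-harmonized Δ can be sketched as follows. This is a minimal illustration, assuming paired per-item scores for candidate and baseline and a percentile interval; the function names harmonize and paired_bootstrap_delta and the sample ECE values are hypothetical, and other interval constructions (for example BCa) are equally compatible with the text above.

```python
import random

def harmonize(values, higher_is_better):
    """Negate lower-is-better metrics (e.g. ECE) so that larger always means better."""
    return list(values) if higher_is_better else [-v for v in values]

def paired_bootstrap_delta(cand, base, B=10000, alpha=0.05, seed=1701):
    """Percentile bootstrap for delta = mean(cand - base) on paired items.

    Returns (delta, (lo, hi)) where (lo, hi) is the 1 - alpha interval.
    """
    if len(cand) != len(base) or not cand:
        raise ValueError("cand and base must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed for reproducibility
    diffs = [c - b for c, b in zip(cand, base)]
    n = len(diffs)
    delta = sum(diffs) / n
    # Resample the paired differences with replacement, B times.
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(B))
    lo = means[int((alpha / 2) * (B - 1))]
    hi = means[int((1 - alpha / 2) * (B - 1))]
    return delta, (lo, hi)

# Hypothetical per-item ECE values (lower is better), negated before comparison.
base_ece = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13]
cand_ece = [0.07, 0.10, 0.07, 0.08, 0.08, 0.09]
cand = harmonize(cand_ece, higher_is_better=False)
base = harmonize(base_ece, higher_is_better=False)

# B is reduced here only to keep the example fast; the chapter recommends B >= 10,000.
delta, (lo, hi) = paired_bootstrap_delta(cand, base, B=2000)
```

After harmonization a positive delta always means the candidate is better, so the same gate logic can be reused for every metric regardless of its native direction.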
Power & MDE: for online A/B, provide power≥0.8 or set mde with sample-size calculations.
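
One common way to tie power, alpha, and mde together is the normal-approximation sample-size formula for comparing two means. The sketch below assumes a known per-observation standard deviation sigma that is equal in both arms; the function names n_per_arm and mde_for_n are hypothetical and the formula is a standard textbook approximation rather than something prescribed by this chapter.

```python
import math
from statistics import NormalDist

def _z_sum(alpha, power, tails):
    """z_(1 - alpha/2) + z_power for two tails, z_(1 - alpha) + z_power for one."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if tails == "two" else z.inv_cdf(1 - alpha)
    return z_alpha + z.inv_cdf(power)

def n_per_arm(mde, sigma, alpha=0.05, power=0.8, tails="two"):
    """Samples per arm needed to detect a mean difference of size mde."""
    return math.ceil(2 * (_z_sum(alpha, power, tails) * sigma / mde) ** 2)

def mde_for_n(n, sigma, alpha=0.05, power=0.8, tails="two"):
    """Smallest mean difference detectable with n samples per arm."""
    return _z_sum(alpha, power, tails) * sigma * math.sqrt(2 / n)

# Hypothetical A/B setting: sigma = 1.0, target mde = 0.5.
n_needed = n_per_arm(mde=0.5, sigma=1.0)
achieved_mde = mde_for_n(n_needed, sigma=1.0)
```

For small samples a t-based calculation gives slightly larger n; the normal approximation is used here only to keep the example dependency-free.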
Multiple comparisons: default Holm–Bonferroni; apply to stratified (task/locale/domain) and cross-task comparisons.
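
The Holm-Bonferroni step-down procedure can be sketched in a few lines. The function name holm_bonferroni and the stratum labels and p-values are hypothetical; the procedure itself is the standard one: sort p-values ascending, compare the i-th smallest against alpha / (m - i) with i starting at 0, and stop at the first failure.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return {name: reject_flag} for a dict of raw p-values."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    reject = {name: False for name in pvalues}
    for rank, (name, p) in enumerate(ordered):
        # The threshold loosens as rank grows: alpha/m, alpha/(m-1), ..., alpha.
        if p <= alpha / (m - rank):
            reject[name] = True
        else:
            break  # every larger p-value is also not rejected
    return reject

# Hypothetical raw p-values from stratified comparisons.
raw = {
    "task:qa": 0.004,
    "task:summ": 0.030,
    "locale:de": 0.020,
    "domain:legal": 0.200,
}
decisions = holm_bonferroni(raw, alpha=0.05)
```

In this example three strata pass an uncorrected 0.05 threshold, but only task:qa survives the correction, which is the behaviour the gate relies on.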
Gating linkage: candidates beating baselines without p<α must not be promoted; leaderboard updates require significance gates.

V. Uncertainty Modeling & Composition

Components: statistical (sampling) vs. systematic (calibration/device/environment); record units and distributions.
Composition rules:
- rss: independent standard uncertainties, u_c = ( sqrt( Σ u_i^2 ) );
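- linear: first-order linearization, u_c = ( sqrt( J Σ J^T ) ), where J = ( ∂f / ∂x ) is the Jacobian and Σ here denotes the input covariance matrix (not the summation used in rss);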
- montecarlo|bayes: specify samples or prior/likelihood and report coverage intervals.
Expanded uncertainty: U = ( k * u_c ); under a normal assumption, k = 2 gives approximately 95% coverage (k ≈ 1.96 for exactly 95%).
Dimensional consistency: normalize units before composition; express in SI and pass check_dim.
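
The rss and linear rules and the expanded uncertainty can be illustrated with a short sketch. It assumes a scalar output, components already normalized to the same unit, and a single pairwise correlation rho for the correlated case; the function names and the numeric values are hypothetical.

```python
import math
from statistics import NormalDist

def u_rss(u):
    """Root-sum-of-squares for independent standard uncertainties."""
    return math.sqrt(sum(x * x for x in u))

def cov_from_rho(u, rho):
    """Covariance matrix when every pair of components shares correlation rho."""
    n = len(u)
    return [[u[i] * u[j] * (1.0 if i == j else rho) for j in range(n)] for i in range(n)]

def u_linear(J, cov):
    """First-order propagation u_c = sqrt(J cov J^T) for a scalar output."""
    n = len(J)
    return math.sqrt(sum(J[i] * cov[i][j] * J[j] for i in range(n) for j in range(n)))

def coverage_factor(level=0.95):
    """Two-sided coverage factor k under a normal assumption."""
    return NormalDist().inv_cdf(0.5 + level / 2)

# Hypothetical standard uncertainties for two components in the same unit.
u = [0.3, 0.4]
u_c_indep = u_rss(u)                                    # independent components
u_c_corr = u_linear([1.0, 1.0], cov_from_rho(u, 0.6))   # correlated, rho = 0.6
U = 2.0 * u_c_indep                                     # expanded uncertainty with k = 2
k_95 = coverage_factor(0.95)
```

With rho = 0 the linear rule reduces to rss; a positive correlation inflates u_c, which is why the correlation posture has to be declared rather than assumed.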

VI. Coupling with Scoring/Ranking/Gates

Output CI_95 or coverage intervals along with scoring; ranking is based on score_norm, while Δ/CI_95/p gate promotions only.
Gates (Ch.8) must reference this chapter’s alpha/B/correction to enforce consistent leaderboard governance.
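
The separation between ranking and gating can be sketched as below. The Result record, its field names, and the example numbers are hypothetical; the only rules taken from the text are that ranking uses score_norm and that promotion requires a positive Δ with p < alpha after correction.

```python
from dataclasses import dataclass

@dataclass
class Result:
    name: str
    score_norm: float  # normalized score used for ranking
    delta: float       # direction-harmonized effect size versus baseline
    p_adj: float       # p-value after multiple-comparison correction

def rank(results):
    """Order candidates by score_norm only; significance plays no role here."""
    return sorted(results, key=lambda r: r.score_norm, reverse=True)

def may_promote(result, alpha=0.05):
    """Gate: promote only if the candidate beats the baseline with p < alpha."""
    return result.delta > 0 and result.p_adj < alpha

# Hypothetical leaderboard entries.
results = [
    Result("cand_a", score_norm=0.81, delta=0.012, p_adj=0.20),
    Result("cand_b", score_norm=0.79, delta=0.009, p_adj=0.01),
]
ranking = [r.name for r in rank(results)]
promoted = [r.name for r in rank(results) if may_promote(r)]
```

Here cand_a ranks first on score_norm but is not promoted, because its improvement is not significant at alpha = 0.05.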

VII. Machine-Readable Fragment (Drop-in)

significance:
  method: "bootstrap"
  B: 10000
  alpha: 0.05
  effect_size: "delta"
  correction: "Holm-Bonferroni"
  strata: ["task"]
  seed: 1701
uncertainty:
  model: "linear"
  components:
    - {name: "stat", type: "random", value: null, unit: "—", distribution: "bootstrap", coverage: {level: 0.95}}
    - {name: "device", type: "systematic", value: 0.8, unit: "%", distribution: "normal", coverage: {k: 2.0}}
  correlation: {posture: "groups", groups: [{name: "device", rho: 0.6}]}
  propagation: {rule: "linear", samples: 0}
  coverage_policy: {target: "CI_95", k: 2.0}
  report: {significant_figures: 3, unit_consistency: true}

VIII. Lint Rules (Excerpt, Normative)

lint_rules:
  - id: SIG.METHOD_ALLOWED
    when: "$.significance.method"
    assert: "value in ['bootstrap','permutation','t','bayes']"
    level: error
  - id: SIG.PARAMS_COMPLETE
    when: "$.significance"
    assert: "has_keys(B, alpha)"
    level: error
  - id: SIG.CORRECTION_ALLOWED
    when: "$.significance.correction"
    assert: "value in ['Holm-Bonferroni','BH','none']"
    level: error
  - id: UNC.MODEL_ALLOWED
    when: "$.uncertainty.model"
    assert: "value in ['GUM','linear','montecarlo','bayes']"
    level: error
  - id: UNC.COMPONENTS_DEFINED
    when: "$.uncertainty.components"
    assert: "len(value) >= 1"
    level: error
  - id: UNC.PROP_RULE_ALLOWED
    when: "$.uncertainty.propagation.rule"
    assert: "value in ['rss','linear','montecarlo','bayes']"
    level: error
  - id: METROLOGY.SI_AND_CHECKDIM
    when: "$.metrology"
    assert: "units == 'SI' and check_dim == true"
    level: error
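
A checker for these rules might look like the sketch below. It does not parse the when/assert expression strings; instead it re-implements each rule by hand in Python, and it assumes that a rule only fires when its when path is present in the document. The function names get_path and lint and the sample configurations are hypothetical.

```python
ENUM_RULES = {
    "significance.method": ("SIG.METHOD_ALLOWED", {"bootstrap", "permutation", "t", "bayes"}),
    "significance.correction": ("SIG.CORRECTION_ALLOWED", {"Holm-Bonferroni", "BH", "none"}),
    "uncertainty.model": ("UNC.MODEL_ALLOWED", {"GUM", "linear", "montecarlo", "bayes"}),
    "uncertainty.propagation.rule": ("UNC.PROP_RULE_ALLOWED", {"rss", "linear", "montecarlo", "bayes"}),
}

def get_path(doc, dotted):
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def lint(doc):
    """Return the ids of violated rules, in a fixed order."""
    errors = []
    for path, (rule_id, allowed) in ENUM_RULES.items():
        value = get_path(doc, path)
        if value is not None and value not in allowed:
            errors.append(rule_id)
    sig = doc.get("significance")
    if isinstance(sig, dict) and not all(k in sig for k in ("B", "alpha")):
        errors.append("SIG.PARAMS_COMPLETE")
    comps = get_path(doc, "uncertainty.components")
    if comps is not None and len(comps) < 1:
        errors.append("UNC.COMPONENTS_DEFINED")
    met = doc.get("metrology")
    if isinstance(met, dict) and not (met.get("units") == "SI" and met.get("check_dim") is True):
        errors.append("METROLOGY.SI_AND_CHECKDIM")
    return errors

# Hypothetical configurations, already loaded into Python dicts.
good = {
    "significance": {"method": "bootstrap", "B": 10000, "alpha": 0.05, "correction": "Holm-Bonferroni"},
    "uncertainty": {"model": "linear", "components": [{"name": "stat"}], "propagation": {"rule": "linear"}},
    "metrology": {"units": "SI", "check_dim": True},
}
bad = {
    "significance": {"method": "wilcoxon", "B": 10000},
    "uncertainty": {"components": []},
}
good_errors = lint(good)
bad_errors = lint(bad)
```

Keeping the allowed values in one table makes it straightforward to keep the checker in step with the rule list when a method or correction is added.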

IX. Cross-Reference Anchors

Metric system & units: EFT.WP.Data.Benchmarks v1.0, Ch.6.
Scoring, normalization & ranking: Ch.8.
Evaluation protocol & significance settings: EFT.WP.Data.ModelCards v1.0, Ch.11.
Unit & dimensional composition: EFT.WP.Core.Metrology v1.0:check_dim.

X. Chapter Compliance Checklist

Significance configuration includes method/B/alpha/correction; effect size Δ and (if needed) power/MDE are explicit.
Uncertainty model/components/correlation/propagation/coverage are complete; SI units with check_dim=true.
Scoring/ranking/gates are coupled to significance outcomes; no promotion without significance.
If T_arr appears, delta_form/path/measure are registered and validated.
Machine-readable fragment is drop-in and lint-clean; export_manifest.references[] use “Volume vX.Y:Anchor.”

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05