14-EFT.WP.Methods.Inference v1.0 | Chapter 12: Acceptance & Scoring Release

Chapter 12: Acceptance & Scoring Release

I. Scope & Objectives

Define the final acceptance standard, score synthesis, and announcement & release workflow for an inference system under a locked environment EnvLock. Ensure that accuracy, calibration, SLOs, consistency, and compliance evidence are reproducible, traceable, and forensically verifiable, covering the path from offline validation to the last gate before online ramp-up.
Deliver the end-to-end workflow Mx-41 → Mx-44, the scoring convention, confidence qualification, and third-party verification artifacts. Integrate with Chapter 6 (online/offline consistency), Chapter 7 (uncertainty & calibration), Chapter 8 (performance & SLO), and Chapter 11 (cross-domain/device certificate CertEq).

II. Terms & Symbols

Gate: a Boolean acceptance outcome; Gate = true means admitted.
Score & weights: score = Σ w_k * s_k, with Σ w_k = 1; s_k are normalized sub-scores (see Chapter 8).
Confidence interval: CI_{1-delta}(m) = [ LCB_{1-delta}(m) , UCB_{1-delta}(m) ], where delta is the significance level.
Non-inferiority baseline: score_base (the previous release or control-anchor score); tolerance tau_noninf.
Unified gate vector: tau = { tau_acc, tau_cal, tau_slo, tau_cons, tau_dev, tau_cost }.
Sample size & stratification: N_total, strata set H = {h_1,...,h_J}, per-stratum proportion pi_h and minimum sample N_min(h).
Evidence anchors: anchor, fingerprint, signature, IPC/PC (Chapter 9); time mapping ts = alpha + beta * tau_mono (Chapter 3).

III. Postulates & Minimal Equations

P41-81 Deployability postulate (acceptance→production portability)
Under EnvLock, a fixed graph, and locked IPC/PC, if Gate = true and CertEq is valid, then under same-distribution conditions or controlled shift the online expected risk is non-inferior to the baseline, up to a term that scales with the shift:
R_exp^{online} <= R_exp^{base} + O( shift ).
P41-82 Verifiable publication postulate
The release bundle binds artifacts/metrics/plans/logs via fingerprint and signature. Any third party, given the same anchor and inputs, can replay to obtain statistically equivalent scores and gate outcomes.
S42-81 Score synthesis
score = Σ w_k * s_k with canonical components, e.g., s_acc (accuracy), s_cal (calibration), s_slo (latency/throughput/stability), s_cons (offline/online consistency), s_dev (cross-device consistency), s_cost (resource/cost).
Confidence lower bound:
score_LCB = score - z_{1-delta} * sqrt( Var(score) / N_total )
or bootstrap lower quantile:
score_LCB = quantile_{delta}( score^{*} ).
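The two lower bounds in S42-81 can be sketched in Python as follows. This is a minimal illustration, not the reference I40-62 implementation. It assumes that a composite score is available per evaluation sample (`per_sample_scores`), so that Var(score) is the sample variance and score is the sample mean. The function names, the replicate count, and the seed are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def compose_score(sub_scores, weights):
    """Weighted sum score = sum_k w_k * s_k; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * sub_scores[k] for k in weights)


def score_lcb_normal(per_sample_scores, delta):
    """Normal-approximation bound: score - z_{1-delta} * sqrt(Var(score) / N_total)."""
    n_total = len(per_sample_scores)
    z = NormalDist().inv_cdf(1.0 - delta)
    spread = math.sqrt(variance(per_sample_scores) / n_total)
    return mean(per_sample_scores) - z * spread


def score_lcb_bootstrap(per_sample_scores, delta, replicates=2000, seed=0):
    """Bootstrap bound: the delta-quantile of resampled mean scores."""
    rng = random.Random(seed)  # fixed seed so a replay gives the same bound
    n_total = len(per_sample_scores)
    resampled = sorted(
        mean(rng.choices(per_sample_scores, k=n_total))
        for _ in range(replicates)
    )
    return resampled[int(delta * (replicates - 1))]
```

The plain bootstrap above resamples individual samples independently; stratified or time-blocked data would need a resampling scheme that respects that structure.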
S42-82 Non-inferiority & gates
Non-inferiority: score_LCB >= ( score_base - tau_noninf ).
Conjunctive thresholds:
Gate = ( s_acc >= tau_acc ) AND ( s_cal >= tau_cal ) AND ( s_slo >= tau_slo ) AND ( s_cons >= tau_cons ) AND ( s_dev >= tau_dev ) AND ( s_cost >= tau_cost ).
S42-83 Calibration confidence with ECE/MCE
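With buckets B and weights w_b (typically the fraction of samples falling in bucket b), where acc_b is the accuracy and conf_b the mean confidence within bucket b,
ECE = Σ w_b * | acc_b - conf_b |,
MCE = max_b | acc_b - conf_b |.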
After temperature scaling yields ECE_T, acceptance requires both the point estimate and the interval upper bound to pass:
ECE_T <= tau_cal, and
UCB_{1-delta}(ECE_T) <= tau_cal + eps, where eps >= 0 is a small slack allowed on the upper bound.
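As an illustration of S42-83, the sketch below computes ECE with equal-width confidence buckets and applies the two acceptance conditions. The bucketing scheme, the choice w_b = (samples in bucket) / (all samples), and the function names are assumptions made for the example; the chapter does not fix them.

```python
def expected_calibration_error(confidences, correct, n_buckets=10):
    """ECE = sum_b w_b * |acc_b - conf_b| over equal-width confidence buckets."""
    buckets = [[] for _ in range(n_buckets)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bucket.
        idx = min(int(conf * n_buckets), n_buckets - 1)
        buckets[idx].append((conf, ok))

    n_total = len(confidences)
    ece = 0.0
    for bucket in buckets:
        if not bucket:
            continue  # empty buckets carry zero weight
        w_b = len(bucket) / n_total
        acc_b = sum(ok for _, ok in bucket) / len(bucket)
        conf_b = sum(conf for conf, _ in bucket) / len(bucket)
        ece += w_b * abs(acc_b - conf_b)
    return ece


def calibration_accepts(ece_t, ucb_ece_t, tau_cal, eps):
    """Both conditions: ECE_T <= tau_cal and UCB(ECE_T) <= tau_cal + eps."""
    return ece_t <= tau_cal and ucb_ece_t <= tau_cal + eps
```

The upper bound `ucb_ece_t` is taken as an input here; it would come from whichever interval method the acceptance plan specifies.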
S42-84 Consistency & regression control
Offline/online difference:
delta_offon = ( norm( y_hat_off - y_hat_on ) / norm( y_hat_off ) ); require delta_offon <= tau_cons (Chapter 6).
Cross-device difference:
delta_dev = ( norm( y_hat_A - y_hat_B ) / norm( y_hat_A ) ); require delta_dev <= tau_dev and a valid CertEq (Chapter 11).
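Both differences in S42-84 share one form, a norm of the prediction gap divided by the norm of the reference predictions. The sketch below assumes the Euclidean (L2) norm and flat lists of predictions; the chapter does not state which norm is intended, and the function name is hypothetical.

```python
import math


def relative_l2_difference(reference, other):
    """norm(reference - other) / norm(reference), using the L2 norm."""
    if len(reference) != len(other):
        raise ValueError("prediction vectors must have the same length")
    ref_norm = math.sqrt(sum(r * r for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference predictions have zero norm")
    gap_norm = math.sqrt(sum((r - o) ** 2 for r, o in zip(reference, other)))
    return gap_norm / ref_norm


# delta_offon uses offline predictions as the reference,
# delta_dev uses device A as the reference.
delta_offon = relative_l2_difference([3.0, 4.0], [3.0, 4.5])
```

Because the reference appears in the denominator, the measure is not symmetric: swapping the offline and online vectors (or devices A and B) generally gives a different value.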

IV. Data & Manifest Conventions

Mandatory fields (in addition to Chapter 9 cards):
dataset_id, split_policy, sampling: H→{pi_h, N_min(h)}; ts_window with alpha, beta; rng.seed, rng_family; artifact_fingerprint; driver/runtime_version.
Metric inventory & units: ACC, AUC, NLL, ECE, MCE, TS.latency_{p95,p99}, TS.thrpt, TS.error, cost.per.req, power.avg.
Baseline control: anchor_base, score_base, tau_noninf; thresholds vector tau.
Traceability & redaction
Store input hash(•) and the full fingerprint chain. Do not expose raw identifiers. Any externally disclosed statistic must use k-anon or quantile summaries.
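One way to store an input hash without exposing raw identifiers is to hash a canonical serialization of each input record. The sketch below assumes SHA-256 and JSON-serializable records; the chapter does not name a hash function or a serialization, so both are illustrative choices, as is the function name.

```python
import hashlib
import json


def input_hash(record):
    """SHA-256 hex digest of a canonical JSON encoding of one input record."""
    # Sorted keys and fixed separators make the encoding independent of
    # dictionary ordering, so equal records always hash to the same value.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A plain hash of a low-entropy identifier can still be reversed by enumeration, so this addresses traceability only; the k-anon or quantile-summary rule for disclosed statistics still applies.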

V. Algorithms & Implementation Bindings

New I40-* prototypes
- I40-60 make_acceptance_plan(spec:dict) -> Plan
- I40-61 run_acceptance(plan:Plan) -> AcceptanceReport
- I40-62 compose_score(metrics:dict, weights:dict, method:str) -> {score:float, score_LCB:float, var:float}
- I40-63 decide_gate(report:AcceptanceReport, tau:dict, noninf:dict) -> {Gate:boolean, reasons:list}
- I40-64 build_announcement(report:AcceptanceReport, templates:any) -> AnnBundle
- I40-65 notarize_and_archive(bundle:AnnBundle, anchors:list) -> ArchiveReceipt
- I40-66 third_party_verify(bundle:any, policy:dict) -> VerifyReport
Decision pseudocode (abridged)
- Compute metrics with stratified aggregation; build CI_{1-delta} for each component.
- score_pack = I40-62(metrics, w, method="bootstrap").
- Non-inferiority: ok_noninf = ( score_pack.score_LCB >= score_base - tau_noninf ).
- Thresholds: ok_tau = Π_k [ s_k >= tau_k ] (logical AND).
- Evidence: validate CertEq, delta_offon, and audit log integrity.
- Gate = ok_noninf AND ok_tau AND evidence_complete; emit the rejection vector reasons.
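The decision steps above can be sketched as a single function. This is a simplified stand-in for I40-63: it takes the already computed sub-scores and lower bound directly instead of an AcceptanceReport, and the argument names and reason labels are hypothetical.

```python
def decide_gate(sub_scores, tau, score_lcb, score_base, tau_noninf,
                evidence_complete):
    """Return {"Gate": bool, "reasons": [...]} from the three gate conditions."""
    reasons = []

    # Non-inferiority: score_LCB >= score_base - tau_noninf.
    if score_lcb < score_base - tau_noninf:
        reasons.append("noninf")

    # Conjunctive thresholds: every s_k must reach its tau_k.
    for name, threshold in tau.items():
        value = sub_scores.get(name)
        if value is None or value < threshold:
            reasons.append(f"tau_{name}")

    # Evidence: CertEq, delta_offon, and audit-log checks passed upstream.
    if not evidence_complete:
        reasons.append("evidence")

    return {"Gate": not reasons, "reasons": reasons}
```

Collecting every failed condition rather than stopping at the first one keeps the rejection vector complete, which is what the remediation advice in Mx-43 relies on.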

VI. Metrology Flows & Run Diagram (Mx-41 → Mx-44)

Mx-41 Acceptance preparation
Lock EnvLock; freeze IPC/PC and anchor. Build the Plan (I40-60): define stratification, thresholds tau, non-inferiority control score_base, target power, and the number of bootstrap replicates B (not to be confused with the calibration bucket set B in S42-83).
Mx-42 Scoring & confidence construction
Run batch evaluation and online shadow traffic. For ACC/AUC/NLL/ECE/MCE/TS.*/cost.*, construct CI_{1-delta}. Execute I40-62 to obtain score and score_LCB; compile the AcceptanceReport.
Mx-43 Gate decision & sign-off
Call I40-63 to produce Gate and reasons. If failed, generate rollback & remediation advice (e.g., recalibration, window replay, resource quotas). If passed, collect CertEq, ConsistencyReport, DriftReport, and AuditLog.
Mx-44 Announcement & archiving
I40-64 builds the AnnBundle (executive summary, key plots, methodological conventions, limitations & risks, SLO statements, replay guide). I40-65 signs and archives the bundle; then publish the external announcement and the third-party verification endpoints. Optionally, invoke I40-66 for independent verification.

VII. Verification & Test Matrix

Strata stability: for each h ∈ H, test sub-scores s_k(h) vs. overall; if
LCB_{1-delta}( s_k(h) - s_k ) < -eps, flag skew risk.
Non-inferiority power: choose delta and tau_noninf; size the sample to guarantee power >= 1 - beta_err. If insufficient, enlarge sample or extend the observation window.
Calibration review: build CI_{1-delta} for ECE/MCE/NLL and compare Delta_ECE, Delta_NLL before vs. after temperature scaling.
Joint consistency: verify delta_offon (Chapter 6) and delta_dev/CertEq (Chapter 11); any threshold breach sets Gate = false.
SLO stress gate: validate TS.latency_{p99}, TS.thrpt, and TS.error simultaneously at the committed load profile.
Replay idempotence: rerun with identical anchor, rng.seed, and inputs; differences must be within statistical noise.
Cost compliance: cost.per.req and power.avg must satisfy tau_cost jointly with SLOs; stability cannot be traded off for cost.
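For the non-inferiority power item, a rough sample size can be obtained from the standard one-sided normal approximation, N >= ((z_{1-delta} + z_{1-beta_err}) * sigma / tau_noninf)^2. The sketch below assumes independent samples, a known per-sample standard deviation `sigma` of the composite score, and a true score equal to score_base; these are simplifying assumptions for illustration, not part of the chapter's specification.

```python
import math
from statistics import NormalDist


def noninferiority_sample_size(sigma, tau_noninf, delta, beta_err):
    """Smallest N with power >= 1 - beta_err under the normal approximation."""
    if sigma <= 0 or tau_noninf <= 0:
        raise ValueError("sigma and tau_noninf must be positive")
    z_level = NormalDist().inv_cdf(1.0 - delta)     # one-sided test level
    z_power = NormalDist().inv_cdf(1.0 - beta_err)  # target power
    return math.ceil(((z_level + z_power) * sigma / tau_noninf) ** 2)


# Example: sigma = 0.1, tau_noninf = 0.02, delta = 0.05, beta_err = 0.2
n_required = noninferiority_sample_size(0.1, 0.02, 0.05, 0.2)  # 155
```

Halving tau_noninf roughly quadruples the required sample, which is why a tight tolerance may force a longer observation window. Correlated or drifting samples reduce the effective sample size, so the figure should be read as a lower bound.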

VIII. Cross-References & Dependencies

Score components & weights: Chapter 8.
TS.*, alerting & rollback: Chapter 10.
Offline/online consistency: Chapter 6.
Calibration & uncertainty: Chapter 7.
Cross-device certificate: Chapter 11.
Cards & fingerprints: Chapter 9.
Time & spectral conventions: Chapter 3 and Core.Metrology.

IX. Risks, Limitations & Open Questions

Business subjectivity in choosing tau_noninf may raise ramp-up risk; disclose both the conservative score_LCB and the gate vector tau.
Sample drift and data dependence can make CI_{1-delta} too narrow and so overstate confidence; use stratified bootstrap and time-block resampling.
Third-party kernel/environment gaps may break replay equivalence; include reference tolerances and CertEq scope in the announcement.
Composite scores reduce interpretability; publish component curves and weight-sensitivity analysis in the AnnBundle.

X. Deliverables & Versioning

Deliverables
- AcceptancePlan.yaml (from Mx-41, with tau, score_base, stratification, and power targets).
- AcceptanceReport.md (from Mx-42, metrics, CI_{1-delta}, score/score_LCB, rejection reasons).
- SLO.Proof.json (key TS.* evidence and sampling windows).
- CalibrationReport.json (ECE/MCE/NLL with methods).
- ConsistencyReport.json (includes delta_offon, Chapter 6).
- CertEq.pdf (cross-device equivalence, Chapter 11).
- AnnBundle.zip (announcement pack, replay scripts, fingerprint/signature, methods & limitations).
- ArchiveReceipt.txt (archive signature and storage URI).
Versioning policy
- Any change to thresholds tau, weights w_k, or baseline score_base must generate a new AcceptancePlan and re-run Mx-41 → Mx-44.
- AnnBundle uses semantic versioning major.minor.patch; a minor update must not lower score_LCB or breach any threshold in tau.
- All changes must update PC.meta.parent_fingerprint and append to the CHANGELOG (see Appendix C).

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05