18-EFT.WP.Methods.CrossStats v1.0
Chapter 9 | Calibration Transfer and Domain Adaptation (Platt/Isotonic/BBQ)
One-sentence goal: Under distribution shift and cross-domain migration, provide monotonic score→probability calibration and an auditable transfer mapping so that ECE/NLL/Brier (and related metrics) meet their target SLOs in the new domain.
I. Scope and Objects
- Scope
Applies to binary and multi-class models for cross-domain (src → dst), cross-time, cross-device, and traffic-stratified probability calibration and transfer; covers both offline static calibration and online streaming incremental calibration.
- Objects
- Inputs: source-domain validation set D_src = { (s_i, y_i, x_i) }; a small labeled or pseudo-labeled target set D_dst; base-model outputs s (scores) or z (logits); the candidate method, method ∈ {Platt, Isotonic, BBQ, Temp, Vector/Dirichlet}; and optional importance weights w_i = p_dst(x_i) / p_src(x_i).
- Outputs: a monotone mapping f_hat and its parameters; cal_metrics = {ECE, NLL, Brier, KS}; a contract report; and manifest.stats.calib.*.
- Constraints: evaluate over windows on the tau_mono time base and publish with ts; when metrics depend on T_arr, record both formulations and delta_form in parallel.
II. Terms and Variables
- Scores and probabilities: s ∈ R (score), z ∈ R^C (logits), p_hat = sigmoid(a s + b), p = softmax(z / T).
- Weights and binning: w_i ≥ 0, bin k, counts n_k, Beta prior hyperparameters alpha, beta.
- Metrics and errors: ECE, NLL, Brier, ACE, and auxiliary budget U = k * u_c.
- Transfer and shift: p_src(x), p_dst(x), ps(x) (propensity or density-ratio proxy), drift_level.
- Constraints: monotonicity f' ≥ 0, probability simplex ∑_c p_c = 1, temperature T > 0.
III. Postulates P309-*
- P309-1 (Monotonicity & rank fidelity): scalar calibration f must be non-decreasing, preserving score order.
- P309-2 (Calibration–decision separation): calibration adjusts only the probability scale; the decision-threshold learner is not retrained; evaluation uses a holdout or cross-validation split.
- P309-3 (Weight coherence): under covariate shift, minimize the weighted risk on the target; if w_i cannot be estimated stably, fall back to stratified matching or a conservatively bounded weight range.
- P309-4 (Multiclass conservation): after multi-class calibration, probabilities must lie on the simplex with | ∑_c p_c - 1 | ≤ tol_sum.
- P309-5 (Time base & arrival time): statistical windows roll on tau_mono and are published with ts; if T_arr is used, maintain both formulations and assert delta_form.
- P309-6 (Auditability & rollback): any production calibration must ship with manifest.stats.calib.* and a rollback mapping f_prev.
- P309-7 (Overfitting guard): limit bin/parameter degrees of freedom, validate across folds, and enforce minimum samples per bin.
IV. Minimal Equations S309-*
- S309-1 (Platt scaling)
- p = sigmoid( a * s + b ), with (a, b) = argmin ∑_i w_i * ( - y_i * log p_i - (1 - y_i) * log(1 - p_i) ).
- Monotonicity (for positively oriented s): a ≥ 0 is necessary and sufficient.
- S309-2 (Isotonic regression, PAV)
- Find f to minimize ∑_i w_i * ( y_i - f(s_i) )^2 subject to non-decreasing f; the solution is piecewise-constant, with PAV merging adjacent violations.
- Laplace smoothing in bin k:
f_k = ( ∑_{i∈bin k} w_i y_i + lambda ) / ( ∑_{i∈bin k} w_i + 2 lambda ).
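A minimal sketch of S309-2, assuming strictly positive weights: tied scores are pooled first so f stays a function of s, then a stack-based PAV merges adjacent blocks. Because the Laplace-smoothed value shrinks light blocks toward 0.5 more than heavy ones, smoothing applied after plain PAV can reintroduce order violations; this sketch therefore runs the merge test on the smoothed block value f_k itself, which keeps the output monotone and reduces to ordinary weighted PAV when lambda = 0. The name fit_isotonic follows I30-93; the block format (s_lo, s_hi, f_k) is an assumption, and n_min_bin is not enforced here.

```python
from itertools import groupby

def fit_isotonic(scores, labels, weights=None, lam=0.0):
    """Weighted PAV with per-block Laplace smoothing.

    Returns blocks (s_lo, s_hi, f_k), sorted by score, with increasing f_k.
    Assumes strictly positive weights (or lam > 0) so every block value is defined.
    """
    w = [1.0] * len(scores) if weights is None else list(weights)

    def value(blk):
        # f_k = (sum w*y + lam) / (sum w + 2*lam); lam = 0 gives the plain weighted mean.
        return (blk[0] + lam) / (blk[1] + 2.0 * lam)

    stack = []  # each block: [sum_wy, sum_w, s_lo, s_hi]
    for s, grp in groupby(sorted(zip(scores, labels, w)), key=lambda t: t[0]):
        grp = list(grp)
        # Tied scores are pooled up front so f stays a function of s.
        stack.append([sum(wi * yi for _, yi, wi in grp), sum(wi for _, _, wi in grp), s, s])
        # Pool adjacent violators until the block values are strictly increasing.
        while len(stack) > 1 and value(stack[-2]) >= value(stack[-1]):
            top = stack.pop()
            stack[-1][0] += top[0]
            stack[-1][1] += top[1]
            stack[-1][3] = top[3]
    return [(lo, hi, value([wy, sw, lo, hi])) for wy, sw, lo, hi in stack]
```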
- S309-3 (BBQ: Bayesian Binning into Quantiles)
- Quantize s into K quantile bins; posterior bin mean:
p_k = ( alpha + ∑_{i∈k} w_i y_i ) / ( alpha + beta + ∑_{i∈k} w_i ).
- Choose K by minimizing a weighted NLL criterion with a BIC-style penalty.
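A minimal sketch of the single-K selection described in S309-3, assuming binary labels: for each K up to K_max, cut at score quantiles, compute the Beta posterior mean per bin, and keep the K minimizing weighted NLL plus a penalty of 0.5 · (number of bins) · log(∑ w_i). That penalty form and the name fit_bbq (from I30-94) are assumptions. Note that the original BBQ method averages over many binnings rather than selecting one, and that posterior bin means are not guaranteed to be non-decreasing, so the C30-911 check still applies to the returned p_k.

```python
import bisect
import math

def fit_bbq(scores, labels, weights=None, K_max=10, alpha=1.0, beta=1.0, n_min_bin=1):
    """Quantile binning with Beta(alpha, beta) posterior bin means; K chosen by penalized NLL."""
    w = [1.0] * len(scores) if weights is None else list(weights)
    s_sorted = sorted(scores)
    n, W = len(s_sorted), sum(w)
    best = None
    for K in range(1, K_max + 1):
        # Interior cut points at ranks n*k/K; duplicate cut points (ties) collapse bins.
        edges = sorted({s_sorted[(n * k) // K] for k in range(1, K)})
        nb = len(edges) + 1
        bins = [bisect.bisect_right(edges, s) for s in scores]
        wy, sw, cnt = [0.0] * nb, [0.0] * nb, [0] * nb
        for k, y, wi in zip(bins, labels, w):
            wy[k] += wi * y
            sw[k] += wi
            cnt[k] += 1
        if min(cnt) < n_min_bin:
            continue  # enforce minimum samples per bin (C30-912)
        p = [(alpha + wy[k]) / (alpha + beta + sw[k]) for k in range(nb)]
        nll = -sum(wi * math.log(p[k] if y == 1 else 1.0 - p[k])
                   for k, y, wi in zip(bins, labels, w))
        crit = nll + 0.5 * nb * math.log(W)  # BIC-style penalty (assumed form)
        if best is None or crit < best[0]:
            best = (crit, edges, p)
    return {"edges": best[1], "p_k": best[2]}
```

A new score s maps to p_k[bisect_right(edges, s)]; the sketch assumes at least one K passes the n_min_bin check.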
- S309-4 (Temperature / Vector / Dirichlet, multi-class)
- Temperature scaling: p = softmax( z / T ), with T = argmin_T ∑_i w_i * ( - log p_{y_i} ), T > 0.
- Vector scaling: p = softmax( W z + b ) with W diagonal; learn W, b by weighted NLL under order-preserving constraints.
- Dirichlet calibration (simple form): learn g(p_hat) = softmax( A * log p_hat + b ) to minimize weighted NLL.
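A minimal sketch of temperature scaling from S309-4: the weighted NLL of softmax(z / T) is convex in 1/T, hence unimodal in log T, so a golden-section search over log T is enough. The search bracket [0.05, 20] and the name temperature_scaling (from I30-95) are assumptions; Vector and Dirichlet calibration are not covered here.

```python
import math

def _weighted_nll(logits, labels, weights, T):
    """sum_i w_i * (-log softmax(z_i / T)[y_i]), computed with a stable log-sum-exp."""
    total = 0.0
    for z, y, w in zip(logits, labels, weights):
        m = max(z)
        lse = m / T + math.log(sum(math.exp((zc - m) / T) for zc in z))
        total += w * (lse - z[y] / T)
    return total

def temperature_scaling(logits, labels, weights=None, T_lo=0.05, T_hi=20.0, iters=100):
    """Golden-section search for T > 0 over u = log T in [log T_lo, log T_hi]."""
    w = [1.0] * len(labels) if weights is None else list(weights)
    f = lambda u: _weighted_nll(logits, labels, w, math.exp(u))
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(T_lo), math.log(T_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return math.exp(0.5 * (a + b))
```

Dividing every logit by the same T > 0 leaves the argmax unchanged, so accuracy is unaffected, consistent with the rank-fidelity postulate P309-1.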
- S309-5 (Importance-weighted risk)
- R_dst(f) = E_{(x,y)∼p_src} [ w(x) * l( f(s(x)), y ) ], with w(x) = p_dst(x)/p_src(x).
- Constraints: W_norm = ( ∑ w_i ) / N ≈ 1, var(w) ≤ tol_wvar.
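A minimal sketch of how w_i might be obtained and checked, assuming a domain classifier that outputs ps = P(dst | x) on source samples: the odds ps / (1 - ps) are proportional to p_dst(x) / p_src(x), and dividing by their mean sets W_norm = 1 before clipping. Clipping at w_max is deliberately not followed by renormalization, so heavy clipping surfaces as W_norm < 1. The function names, the eps guard, and the tolerance tol_norm for "W_norm ≈ 1" are hypothetical; the default thresholds are placeholders only.

```python
def importance_weights(ps, w_max=10.0, eps=1e-6):
    """Density-ratio weights from domain-classifier probabilities ps = P(dst | x)."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in ps]
    odds = [p / (1.0 - p) for p in clipped]   # proportional to p_dst(x) / p_src(x)
    mean = sum(odds) / len(odds)
    return [min(o / mean, w_max) for o in odds]  # mean-1 weights, then clip at w_max

def weight_diagnostics(w):
    """W_norm, population variance, and max of the weights (inputs to C30-913)."""
    n = len(w)
    w_norm = sum(w) / n
    var_w = sum((x - w_norm) ** 2 for x in w) / n
    return {"W_norm": w_norm, "var_w": var_w, "max_w": max(w)}

def weights_ok(diag, tol_norm=0.05, tol_wvar=4.0, w_max=10.0):
    # tol_norm is a hypothetical tolerance for "W_norm ≈ 1".
    return (abs(diag["W_norm"] - 1.0) <= tol_norm
            and diag["var_w"] <= tol_wvar
            and diag["max_w"] <= w_max)
```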
- S309-6 (Calibration error metrics)
- ECE = ∑_{k=1}^K ( n_k / n ) * | acc(k) - conf(k) |; in the weighted version, replace the counts n_k and n with the sums of w_i over bin k and over all samples, respectively.
- Brier = ( 1 / n ) * ∑ ( y_i - p_i )^2, NLL = - ( 1 / n ) * ∑ log p_{i,y_i}.
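A minimal sketch of S309-6 for the binary case, assuming probs[i] = P(y_i = 1): ECE uses the top-label confidence max(p, 1 - p) in equal-width bins on [0, 1], and every count is replaced by a weight sum. The bin count, the eps clamp for the log, and the name evaluate_calibration (from I30-98) are assumptions; KS is omitted.

```python
import math

def evaluate_calibration(probs, labels, weights=None, n_bins=10, eps=1e-12):
    """Binary metrics with probs[i] = P(y_i = 1): weighted top-label ECE, NLL, and Brier."""
    w = [1.0] * len(probs) if weights is None else list(weights)
    W = sum(w)
    acc = [0.0] * n_bins    # per bin: sum of w * 1[prediction correct]
    conf = [0.0] * n_bins   # per bin: sum of w * confidence
    brier = nll = 0.0
    for p, y, wi in zip(probs, labels, w):
        c = max(p, 1.0 - p)
        k = min(int(c * n_bins), n_bins - 1)
        acc[k] += wi * (1.0 if (p >= 0.5) == (y == 1) else 0.0)
        conf[k] += wi * c
        brier += wi * (y - p) ** 2
        q = min(max(p, eps), 1.0 - eps)
        nll -= wi * math.log(q if y == 1 else 1.0 - q)
    # sum_k (W_k / W) * |acc_k/W_k - conf_k/W_k| simplifies to sum_k |acc_k - conf_k| / W.
    ece = sum(abs(a - c) for a, c in zip(acc, conf)) / W
    return {"ECE": ece, "NLL": nll / W, "Brier": brier / W}
```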
Unified arrival-time & path-measure convention
- Constant-factored: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- General: T_arr = ( ∫ ( n_eff / c_ref ) d ell )
Always declare path gamma(ell) and measure d ell; maintain delta_form where applicable.
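A minimal sketch of computing both T_arr formulations side by side, assuming the path gamma(ell) is represented by increasing arc-length samples ell with n_eff sampled at the same points, c_ref constant as written in both formulas, and trapezoidal quadrature. Under those assumptions the two forms agree analytically, so a delta_form above tol_Tarr (C30-916) would point to a numerical or implementation discrepancy. The function name is hypothetical.

```python
def t_arr_both(ell, n_eff, c_ref):
    """Trapezoidal T_arr along path samples ell[0..m], returned in both formulations."""
    seg = list(zip(ell, ell[1:], n_eff, n_eff[1:]))
    # Constant-factored: (1 / c_ref) * integral of n_eff d ell
    t_const = sum(0.5 * (n0 + n1) * (l1 - l0) for l0, l1, n0, n1 in seg) / c_ref
    # General: integral of (n_eff / c_ref) d ell
    t_general = sum(0.5 * (n0 / c_ref + n1 / c_ref) * (l1 - l0) for l0, l1, n0, n1 in seg)
    return t_const, t_general, abs(t_const - t_general)  # last value is delta_form
```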
V. Statistical Workflow M30-9 (Ready → Fit → Validate → Launch → Persist)
- Readiness
Align the time base (tau_mono) and build source/target slices and stratifications; estimate w_i or approximate it heuristically; set NLL as the primary metric and ECE/Brier as guardrails.
- Method selection and fitting
- Binary priority: Platt → Isotonic → BBQ.
- Multiclass priority: Temperature → Vector/Dirichlet.
- Optimize weighted objectives, enforce monotonicity and regularization (lambda, minimum bin width and bin count).
- Cross-validation & overfitting defense
Use K-fold or time-block validation; report delta_metric = metric_post - metric_pre together with its variance. If delta_metric fails the threshold, fall back.
- Pre-launch compliance
Check W_norm, var(w), monotonicity, and probability-sum conservation; produce a rollback mapping and a canary/gradual rollout plan.
- Launch & monitoring
Maintain stratified online ECE/NLL dashboards; on drift, automatically trigger rebinning or temperature re-estimation.
- Persist
Output f_hat, parameters, cal_metrics, summary stats of w_i, and manifest.stats.calib.* with signatures.
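A minimal sketch of the time-block validation step, assuming rows are already sorted by tau_mono and that the metric is lower-is-better (for example NLL): contiguous validation blocks, an optional forward-only mode that trains only on earlier blocks, and a per-fold delta_metric summary whose fallback flag mirrors the improvement threshold of C30-915. All names and the forward_only option are hypothetical.

```python
import statistics

def time_block_folds(n, k, forward_only=False):
    """Yield (train_idx, val_idx) for k contiguous validation blocks over rows sorted by tau_mono.

    forward_only=True trains only on blocks that precede the validation block.
    """
    bounds = [(i * n) // k for i in range(k + 1)]
    for i in range(k):
        lo, hi = bounds[i], bounds[i + 1]
        val = list(range(lo, hi))
        train = list(range(0, lo)) if forward_only else list(range(0, lo)) + list(range(hi, n))
        if train and val:
            yield train, val

def delta_report(metric_pre, metric_post, tol):
    """delta_metric = metric_post - metric_pre per fold (lower is better, e.g. NLL)."""
    deltas = [post - pre for pre, post in zip(metric_pre, metric_post)]
    mean = statistics.fmean(deltas)
    var = statistics.variance(deltas) if len(deltas) > 1 else 0.0
    return {"delta_mean": mean, "delta_var": var, "fallback": mean > -tol}
```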
VI. Contracts and Assertions (C30-91x)
- C30-911 (Monotonicity): f is non-decreasing; for Platt, assert a ≥ 0.
- C30-912 (Min samples / bins): n_k ≥ n_min_bin and K ≤ K_max; for BBQ, alpha, beta ≥ alpha_min.
- C30-913 (Weighted stability): W_norm ≈ 1, var(w) ≤ tol_wvar, max(w) ≤ w_max.
- C30-914 (Multiclass conservation): | ∑_c p_c - 1 | ≤ tol_sum, T > 0.
- C30-915 (Improvement threshold): enforce NLL_post ≤ NLL_pre - tol_nll or ECE_post ≤ ECE_pre - tol_ece; otherwise rollback.
- C30-916 (Arrival-time delta): when using T_arr, assert delta_form ≤ tol_Tarr.
- C30-917 (Retraining throttling): rolling re-estimation frequency ≤ freq_max; each change must retain a TraceID and full audit trail.
- C30-918 (Rank fidelity): post-calibration AUC drop must not exceed tol_auc_drop.
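A minimal sketch of a contract gate in the spirit of I30-99, assuming all metrics are precomputed and passed in a report dict. Every key name (f_grid for calibrated values on an increasing score grid, min_bin_count, max_sum_err, tol_wnorm, and so on) is hypothetical, and C30-916/C30-917 are left out because they depend on time-alignment and audit state not modeled here. Any failed check maps to rollback.

```python
def enforce_calibration_contracts(report, rules):
    """Evaluate a subset of C30-91x on precomputed report fields; any failure maps to rollback."""
    f = report["f_grid"]  # calibrated values on an increasing score grid
    checks = {
        "C30-911": all(x <= y for x, y in zip(f, f[1:])) and report.get("platt_a", 0.0) >= 0.0,
        "C30-912": report["min_bin_count"] >= rules["n_min_bin"] and report["K"] <= rules["K_max"],
        "C30-913": abs(report["W_norm"] - 1.0) <= rules["tol_wnorm"]
                   and report["var_w"] <= rules["tol_wvar"]
                   and report["max_w"] <= rules["w_max"],
        "C30-914": report["max_sum_err"] <= rules["tol_sum"] and report.get("T", 1.0) > 0.0,
        "C30-915": report["NLL_post"] <= report["NLL_pre"] - rules["tol_nll"]
                   or report["ECE_post"] <= report["ECE_pre"] - rules["tol_ece"],
        "C30-918": report["AUC_pre"] - report["AUC_post"] <= rules["tol_auc_drop"],
    }
    failed = [cid for cid, ok in checks.items() if not ok]
    return {"checks": checks, "failed": failed, "action": "rollback" if failed else "release"}
```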
VII. Implementation Bindings I30-*
- I30-91 calibration_transfer(src, dst, method, weights=None, params) -> f_hat
- I30-92 fit_platt(scores, labels, weights) -> {a, b}
- I30-93 fit_isotonic(scores, labels, weights, lambda, n_min_bin) -> f_hat
- I30-94 fit_bbq(scores, labels, weights, K_max, alpha, beta) -> {bins, p_k}
- I30-95 temperature_scaling(logits, labels, weights) -> T
- I30-96 dirichlet_calibration(p_hat, labels, weights, reg) -> {A, b}
- I30-97 apply_calibration(f_hat, scores_or_logits) -> probs
- I30-98 evaluate_calibration(probs, labels, weights) -> {ECE, NLL, Brier, KS}
- I30-99 enforce_calibration_contracts(report, rules) -> contract_report
- I30-90 time_align_for_stats(ds, sync_ref) -> ds' (carries offset/skew/J and two T_arr formulations)
Invariants: sum(weights)/N ≈ 1; probs ∈ simplex; monotone(f_hat); manifest carries a version and a rollback pointer.
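A minimal sketch of apply_calibration (I30-97) plus a simplex check for the invariants, assuming f_hat is a dict tagged by kind; that dict layout is hypothetical. For isotonic blocks, scores below the first block clamp to it and scores between blocks take the value of the block on the left.

```python
import bisect
import math

def apply_calibration(f_hat, x):
    """Map a score (binary kinds) or a logit vector (temperature) to probabilities."""
    kind = f_hat["kind"]
    if kind == "platt":
        t = f_hat["a"] * x + f_hat["b"]
        return 1.0 / (1.0 + math.exp(-t)) if t >= 0 else math.exp(t) / (1.0 + math.exp(t))
    if kind == "isotonic":  # blocks: (s_lo, s_hi, f_k) sorted by s_lo
        los = [blk[0] for blk in f_hat["blocks"]]
        k = max(bisect.bisect_right(los, x) - 1, 0)
        return f_hat["blocks"][k][2]
    if kind == "bbq":  # edges: interior cut points; p_k: one value per bin
        return f_hat["p_k"][bisect.bisect_right(f_hat["edges"], x)]
    if kind == "temperature":  # stable softmax(z / T)
        m = max(x)
        e = [math.exp((z - m) / f_hat["T"]) for z in x]
        s = sum(e)
        return [v / s for v in e]
    raise ValueError(f"unknown calibration kind: {kind}")

def on_simplex(p, tol_sum=1e-9):
    """Invariant check: non-negative entries and |sum(p) - 1| <= tol_sum."""
    return all(v >= 0.0 for v in p) and abs(sum(p) - 1.0) <= tol_sum
```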
VIII. Cross-References
- Sampling weights and importance weighting: see Chapter 3 of this volume.
- Multiple comparisons & sequential budgeting (controlling multi-metric impacts of calibration): see Chapter 6.
- Drift detection and triggers for recalibration: see Chapter 7.
- A/B guardrails and launch gates: see Chapter 8.
- Time bases and the two T_arr formulations: see Methods.Cleaning v1.0, Chapters 5 and 6.
IX. Quality & Risk Control
- SLI/SLO (examples)
ECE_post_p95 ≤ SLO_ece; NLL_post ≤ NLL_pre - tol_nll; latency_ms_p99 ≤ SLO_latency; recalib_frequency ≤ freq_max.
- Risk controls
- If drift triggers but var(w) is unstable: degrade to temperature scaling.
- If BBQ bins are sparse: fall back to Platt.
- If the multi-class tol_sum check is violated: force renormalization.
- If tol_auc_drop is violated: block release and rollback.
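A minimal sketch of the risk controls above, assuming the monitoring layer exposes boolean signals under hypothetical keys. The precedence (a tol_auc_drop violation overrides all other actions) is an assumption of this sketch; the renormalization step simply rescales a non-negative vector back onto the simplex.

```python
def renormalize(p, floor=0.0):
    """Force a probability vector back onto the simplex (after optional flooring)."""
    q = [max(v, floor) for v in p]
    s = sum(q)
    return [v / s for v in q]

def risk_actions(signals):
    """Map risk signals (hypothetical boolean flags) to the listed control actions."""
    if signals.get("auc_drop_violated"):
        return ["block_release", "rollback"]  # assumed to override all other actions
    actions = []
    if signals.get("drift") and signals.get("var_w_unstable"):
        actions.append("degrade_to_temperature")
    if signals.get("bbq_sparse"):
        actions.append("fallback_to_platt")
    if signals.get("sum_violated"):
        actions.append("renormalize")
    return actions
```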
Summary
This chapter presents a unified convention for Platt, Isotonic, BBQ, and multi-class Temperature/Vector/Dirichlet calibration under cross-domain transfer. Through weighted objectives, compliance contracts C30-91x, and manifest.stats.calib.*, calibration transfer becomes reproducible, monitorable, and rollbackable, closing the loop from fitting to production audit.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/