HomeDocs-Technical WhitePaper14-EFT.WP.Methods.Inference v1.0

Chapter 8: Performance Metrics & SLO


I. Scope & Objectives

  1. Unify the object model, computation conventions, and publishing format for inference performance metrics and Service-Level Objectives (SLOs), covering offline load tests and online observability, single-instance and distributed inference, and CPU/GPU/accelerator modalities.
  2. Provide reusable score synthesis, gate.slo, and error-budget allocation that can be exercised in parallel with Chapter 6 (online/offline consistency) and Chapter 7 (calibration gates).
  3. Target outputs
    • Metrics & conventions: TS.latency_{p50,p95,p99}, TS.thrpt, TS.error, tail_ampl, cost_u, R_infer.
    • SLO spec: SLO = { name, sli, target, window, objective, budget }.
    • Score synthesis: score = Σ w_k * s_k, with ScoreReport and SLOReport.
    • Metrology flow: Mx-47 → Mx-52.

II. Terms & Symbols

  1. Metrics & decompositions
    • Latency breakdown: TS.lat_total = TS.lat_io + TS.lat_queue + TS.lat_sched + TS.lat_model.
    • Throughput: TS.thrpt = N_req / W; concurrency approximation: WIP ≈ TS.arrival_rate * E[T].
    • Tail amplification: tail_ampl = TS.latency_p99 / TS.latency_p50.
    • Availability: avail = 1 - ( N_err / N_req ), where N_err includes timeout, 5xx, policy_denied.
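The decompositions above can be written down directly. A minimal sketch, with the TS.* fields passed as plain arguments (an illustrative assumption, not a fixed schema):

```python
def lat_total(lat_io: float, lat_queue: float,
              lat_sched: float, lat_model: float) -> float:
    # TS.lat_total = TS.lat_io + TS.lat_queue + TS.lat_sched + TS.lat_model
    return lat_io + lat_queue + lat_sched + lat_model

def tail_ampl(latency_p99: float, latency_p50: float) -> float:
    # tail_ampl = TS.latency_p99 / TS.latency_p50
    return latency_p99 / latency_p50

def avail(n_err: int, n_req: int) -> float:
    # avail = 1 - N_err / N_req; N_err counts timeout, 5xx, policy_denied
    return 1.0 - n_err / n_req
```

For example, a p99 of 180 ms over a p50 of 30 ms gives tail_ampl = 6, a common signal of queueing or cold-start trouble.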
  2. SLI / SLO / SLA
    SLI is an observable (e.g., TS.latency_p99); SLO is the target (e.g., TS.latency_p99 <= L_target over window W); SLA (external contract) is out of scope in this volume.
  3. Cost & budgets
    • Unit cost: cost_u = ( cost_cpu + cost_gpu + cost_mem + cost_io + cost_net ) / N_req.
    • Resource budgets: budget.cpu/gpu/mem/power; error budget: budget.err = 1 - target.avail.
  4. Normalization & scoring
    • Linear downwards normalization: norm_down(x; a,b) = clamp( ( b - x ) / ( b - a ), 0, 1 ) (smaller is better).
    • Linear upwards normalization: norm_up(x; a,b) = clamp( ( x - a ) / ( b - a ), 0, 1 ) (larger is better).
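The two normalizers are pure functions and can be sketched without dependencies:

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def norm_down(x: float, a: float, b: float) -> float:
    # 1.0 at x <= a (best), 0.0 at x >= b (worst): smaller is better
    return clamp((b - x) / (b - a), 0.0, 1.0)

def norm_up(x: float, a: float, b: float) -> float:
    # 0.0 at x <= a, 1.0 at x >= b: larger is better
    return clamp((x - a) / (b - a), 0.0, 1.0)
```

Note that norm_down(x; a, b) = 1 - norm_up(x; a, b) for all x, so either form may appear in a score as long as the inversion is applied exactly once.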

III. Postulates & Minimal Equations

  1. P41-21 Invariant observability postulate
    With EnvLock locked and the aggregator fixed, the computation convention of the same SLI is equivalent offline/online: SLI_off ≡ SLI_on.
  2. P41-22 Multi-objective monotonicity postulate
    If any sub-metric s_k improves (others unchanged), the overall score does not decrease: ∂score/∂s_k >= 0.
  3. S42-31 Score synthesis
    score = w_acc * acc + w_cal * ( 1 - ECE_norm ) + w_lat * ( 1 - lat_p99_norm ) + w_thr * thrpt_norm + w_cost * ( 1 - cost_u_norm ) + w_cons * R_infer, with Σ w_* = 1.
    lat_p99_norm = norm_up( TS.latency_p99; L_target, L_worst );
    thrpt_norm = norm_up( TS.thrpt; QPS_min, QPS_goal );
    cost_u_norm = norm_up( cost_u; C_min, C_max );
    ECE_norm = norm_up( ECE; 0, ECE_max ).
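S42-31 as a sketch. All smaller-is-better SLIs are mapped with norm_up and inverted inside the sum, which is equivalent to applying norm_down directly, since 1 - norm_up(x; a, b) = norm_down(x; a, b). The dict keys are illustrative assumptions:

```python
def _clamp(x, lo, hi):
    return max(lo, min(hi, x))

def _norm_up(x, a, b):
    return _clamp((x - a) / (b - a), 0.0, 1.0)

def compose_score(sli: dict, t: dict, w: dict) -> float:
    # S42-31: weighted sum of normalized sub-metrics; requires sum(w) == 1
    assert abs(sum(w.values()) - 1.0) < 1e-9
    lat_n  = _norm_up(sli["lat_p99"], t["L_target"], t["L_worst"])
    thr_n  = _norm_up(sli["thrpt"],   t["QPS_min"],  t["QPS_goal"])
    cost_n = _norm_up(sli["cost_u"],  t["C_min"],    t["C_max"])
    ece_n  = _norm_up(sli["ECE"],     0.0,           t["ECE_max"])
    return (w["acc"]  * sli["acc"]
          + w["cal"]  * (1 - ece_n)
          + w["lat"]  * (1 - lat_n)
          + w["thr"]  * thr_n
          + w["cost"] * (1 - cost_n)
          + w["cons"] * sli["R_infer"])
```

A system at every target simultaneously (acc = 1, ECE = 0, p99 at L_target, throughput at QPS_goal, cost at C_min, R_infer = 1) scores exactly 1.0, which is a quick sanity check on any weight vector.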
  4. S42-32 SLO decision & error budget
    • Latency-type: pass_lat = 1[ TS.latency_p99 <= L_target ].
    • Availability-type: pass_avail = 1[ avail >= A_target ].
    • Budget consumption: budget.used = violations / opportunities, with violations = Σ 1[ SLI_i fails ].
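S42-32 reduces to indicator checks. A minimal sketch; the returned dict shape is an assumption:

```python
def gate_slo(lat_p99: float, availability: float,
             L_target: float, A_target: float) -> dict:
    # S42-32: pass_lat = 1[lat_p99 <= L_target], pass_avail = 1[avail >= A_target]
    pass_lat = lat_p99 <= L_target
    pass_avail = availability >= A_target
    return {"pass_lat": pass_lat, "pass_avail": pass_avail,
            "pass": pass_lat and pass_avail}

def budget_used(violations: int, opportunities: int) -> float:
    # budget.used = violations / opportunities
    return violations / opportunities
```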
  5. S42-33 Cost model
    • cost_cpu = price_cpu * cpu_time; cost_gpu = price_gpu * gpu_time;
      cost_mem = price_mem * mem_GB * time; similarly for cost_io/net.
    • cost_u = ( cost_cpu + cost_gpu + cost_mem + cost_io + cost_net ) / N_req.
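S42-33 as a sketch; the price and usage keys are illustrative, and io/net are priced per GB by analogy with the "similarly for cost_io/net" clause:

```python
def cost_components(price: dict, usage: dict) -> dict:
    # Per-component costs per S42-33
    return {
        "cpu": price["cpu"] * usage["cpu_time"],
        "gpu": price["gpu"] * usage["gpu_time"],
        "mem": price["mem"] * usage["mem_GB"] * usage["time"],
        "io":  price["io"]  * usage["io_GB"],
        "net": price["net"] * usage["net_GB"],
    }

def cost_u(components: dict, n_req: int) -> float:
    # cost_u = (cost_cpu + cost_gpu + cost_mem + cost_io + cost_net) / N_req
    return sum(components.values()) / n_req
```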
  6. S42-34 Queue consistency & Little’s-law approximation
    WIP ≈ λ * E[T], where λ = TS.arrival_rate and E[T] is approximated by TS.latency_p50 (the median standing in for the mean), for capacity and backpressure checks.
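S42-34 as a one-line check; comparing the estimate against a configured concurrency limit for backpressure is a usage assumption:

```python
def expected_wip(arrival_rate: float, latency_s: float) -> float:
    # WIP ~= lambda * E[T] (Little's law); TS.latency_p50 stands in for E[T]
    return arrival_rate * latency_s

# 200 req/s at a 50 ms median implies roughly 10 requests in flight;
# if the deployment caps in-flight requests below that, queueing is expected
```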

IV. Data & Manifest Conventions

  1. Per-request minimal observability fields
    • ts_start, ts_end, route, batch_size, device, dtype_policy, quant_scheme, status, bytes_in/out, retries, cold_start, z_logit_opt.
    • Resource samples: cpu_pct, gpu_util, mem_GB, power_W, sm_occupancy, bw_in/out.
    • Bucketing & aggregation: hist.latency (supports kll/tdigest), window W, step Δt.
  2. Convention consistency
    Measure all latency on tau_mono and map to ts: ts = alpha + beta * tau_mono. Use the same quantile approximator and compression parameters for percentiles.
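A sketch of the convention, substituting a simple nearest-rank percentile for a kll/tdigest sketch (the substance of the rule is only that every party pins the same approximator and parameters):

```python
def map_to_ts(tau_mono: float, alpha: float, beta: float) -> float:
    # ts = alpha + beta * tau_mono (Section IV.2)
    return alpha + beta * tau_mono

def percentile(samples: list, q: float) -> float:
    # Fixed nearest-rank convention: a stand-in for kll/tdigest with
    # pinned compression parameters
    s = sorted(samples)
    idx = min(len(s) - 1, max(0, round(q * (len(s) - 1))))
    return s[idx]
```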
  3. Cost conventions
    Declare unit-price baselines and currency. For mixed tenancy, record share_ratio to apportion cost_mem and cost_net.

V. Algorithms & Implementation Bindings

  1. Prototypes
    • I40-11 compute_sli(stream:any, spec:dict) -> SLIReport
    • I40-12 compose_score(sli:dict, weights:dict) -> ScoreReport
    • I40-13 plan_capacity(target:dict, priors:dict) -> Plan
    • I40-10 compare_offline_online(off:any, on:any, policy:dict) -> ConsistencyReport
  2. compute_sli highlights
    Maintain TS.latency_{p50,p95,p99} via kll/tdigest; windowed aggregation over W with step Δt; slice by route/device.
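A toy version of the I40-11 aggregation path, keeping raw latency lists where a real implementation would hold a kll/tdigest sketch per (window, route) bucket; the event tuple layout is an assumption:

```python
from collections import defaultdict

def compute_sli(events, window_s: float, q: float = 0.99) -> dict:
    # events: iterable of (ts, latency, route); returns the q-quantile
    # latency keyed by (window_index, route)
    buckets = defaultdict(list)
    for ts, lat, route in events:
        buckets[(int(ts // window_s), route)].append(lat)
    report = {}
    for key, lats in buckets.items():
        s = sorted(lats)
        idx = min(len(s) - 1, round(q * (len(s) - 1)))
        report[key] = s[idx]
    return report
```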
  3. compose_score highlights
    Normalize per S42-31 and synthesize; return the overall score, per-dimension s_k, gate.slo, and sensitivities ∂score/∂s_k.
  4. plan_capacity
    Produce the feasible region over ( λ, batch_size, replica ) satisfying pass_lat ∧ pass_avail ∧ cost_u <= C_cap. If infeasible, return E_RESOURCE_EXCEEDED.
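A grid-search sketch of I40-13; predict is any model mapping an operating point to (lat_p99, avail, cost_u), whether a queueing formula or a profiled lookup table (both assumptions here):

```python
def plan_capacity(grid, predict, L_target, A_target, C_cap):
    # Feasible region over (lambda, batch_size, replicas) satisfying
    # pass_lat AND pass_avail AND cost_u <= C_cap
    feasible = []
    for lam, batch, replicas in grid:
        lat_p99, availability, cost_u = predict(lam, batch, replicas)
        if lat_p99 <= L_target and availability >= A_target and cost_u <= C_cap:
            feasible.append((lam, batch, replicas))
    if not feasible:
        raise RuntimeError("E_RESOURCE_EXCEEDED")
    return feasible
```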

VI. Metrology Flows & Run Diagram (Mx-47 → Mx-52)


VII. Verification & Test Matrix


VIII. Cross-References & Dependencies

This chapter shares TS.*, R_infer, rollback, and canary orchestration with Chapter 6; shares ECE_norm and the calibration gates used in the score with Chapter 7; aligns with the scoring and publication conventions of EFT.WP.Methods.Repro Chapter 8; and adheres to the hb/bp/makespan/critical-path semantics from Core.Threads.

IX. Risks, Limitations & Open Questions


X. Deliverables & Versioning

  1. Deliverables
    • SLOSpec.yaml (SLO definitions and budgets);
    • SLIReport.json (windowed statistics with approximator parameters across dimensions);
    • ScoreReport.json (score, s_k, sensitivities and gates);
    • Plan.yaml (capacity plan and rollback thresholds);
    • Audit bundle (aggregator fingerprint, signatures, and release fingerprints).
  2. Versioning policy
    • Changes to SLI definitions, aggregators or window W/Δt, w_*, or any target/budget must bump the minor version and be recorded in Appendix C.
    • If the scoring structure or cost-model terms change, bump the major version and update
      fingerprint = hash( SLOSpec || ScoreSpec ).
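The fingerprint can be computed over the serialized specs; reading "||" as byte concatenation and choosing SHA-256 as the hash are assumptions:

```python
import hashlib

def spec_fingerprint(slo_spec: bytes, score_spec: bytes) -> str:
    # fingerprint = hash( SLOSpec || ScoreSpec )
    return hashlib.sha256(slo_spec + score_spec).hexdigest()
```

Because the order of concatenation matters, the SLOSpec-then-ScoreSpec order must itself be part of the published convention.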

Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/