13-EFT.WP.Methods.SimStack v1.0

Chapter 7: Parallelization, Scheduling & Resources


I. Scope & Objectives


II. Terms & Symbols

  1. Graph & paths
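    • G=(V,E), w(v) (nominal work), c(e) (edge communication cost), crit(G) (critical path), D(G) = length of crit(G) (the largest total w along any source-to-sink path), W(G)=Σ_v w(v) (total work).
    • dist_in(v): longest-path length (sum of w) from any source to v, excluding w(v); dist_out(v): longest-path length from v to any sink, including w(v); slack(v) = D(G) - dist_in(v) - dist_out(v), which is 0 exactly for nodes on a critical path.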
  2. Readiness & priority
    R(t): ready set at time t; p(v): scheduling priority score; batch(v): minimal mergeable unit for batching.
  3. Resources & quotas
    • Resource vector cap = (cpu, mem, gpu, nic); demand vector req(v) has the same components as cap.
    • Quota quota(k): per-tenant or per-stage limit; dominant-resource share dom_ratio(k) (written dom_k in S62-45).
  4. Rates & utilization
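    • lambda(v) (arrival rate), mu(v) (service rate), rho(v)=lambda(v)/mu(v) (utilization); rho_hat(v;W): estimate of rho(v) over a sliding window W.
    • Cost metric cost(res): a linear or piecewise-linear combination of resource usage weighted by unit prices.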
  5. Metrics & alerts
    TS.latency.*, TS.throughput.*, TS.util.*, TS.queue.*, TS.hb.violations, TS.sli.success_rate.
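The path quantities in item 1 can be computed in two passes over a topological order. The sketch below is a minimal illustration. It assumes the graph is a DAG given as a dict of w(v) values plus a list of (u, v) edges, and it ignores c(e). The helper name path_metrics is hypothetical. It follows the convention that dist_in(v) excludes w(v) and dist_out(v) includes it, so dist_in(v) + dist_out(v) is the longest source-to-sink path through v.

```python
from graphlib import TopologicalSorter


def path_metrics(work, edges):
    """Compute W(G), D(G), dist_in, dist_out and slack for a DAG.

    work:  dict node -> w(v) (nominal work)
    edges: list of (u, v) pairs; communication cost c(e) is ignored here.
    Assumed convention: dist_in(v) excludes w(v), dist_out(v) includes it.
    """
    preds = {v: [] for v in work}
    succs = {v: [] for v in work}
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    order = list(TopologicalSorter(preds).static_order())

    dist_in = {}
    for v in order:  # longest path from any source up to (not including) v
        dist_in[v] = max((dist_in[u] + work[u] for u in preds[v]), default=0.0)
    dist_out = {}
    for v in reversed(order):  # longest path from v (inclusive) to any sink
        dist_out[v] = work[v] + max((dist_out[s] for s in succs[v]), default=0.0)

    W = sum(work.values())
    D = max(dist_in[v] + dist_out[v] for v in work)
    slack = {v: D - dist_in[v] - dist_out[v] for v in work}
    return W, D, dist_in, dist_out, slack
```

Take the diamond graph a→{b, c}→d with w = (2, 3, 1, 2). For it, the sketch yields W(G) = 8 and D(G) = 7. Node c gets slack 2 because it lies off the critical path a→b→d.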

III. Postulates & Minimal Equations (P61-/S62-)

  1. P61-13 (Budget compliance)
    At any time t, the allocation alloc(t) must satisfy Σ_v use(v,t) ≤ cap componentwise and, for every tenant k, use_k(t) ≤ quota(k).
  2. P61-14 (Fairness & starvation freedom)
    For an active tenant k, when req_k is satisfiable, waiting time is bounded. Use Dominant Resource Fairness (DRF) or an equivalent approximation.
  3. P61-15 (Causality & idempotency guardrails)
    Scheduling, batching, and deferral must not violate the happens-before (hb) order. If retries change the execution path, the affected steps must honor their idempotency or compensation contracts.
  4. S62-40 (Lower bound on makespan vs. parallelism)
    T_make(G,P) ≥ max( W(G)/P , D(G) ), where P is effective parallelism.
  5. S62-41 (Amdahl & Gustafson bounds)
    S_amdahl(P) = 1 / ( s + (1 - s)/P ), S_gustafson(P) = s + (1 - s) * P, with serial fraction s (for Amdahl, measured on the single-processor run; for Gustafson, on the P-way run).
  6. S62-42 (G/G/m approximate waiting time)
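    For m parallel servers:
    W_q ≈ ( C * rho^( sqrt(2(m+1)) - 1 ) / ( m * (1 - rho) ) ) * ( 1 / mu ),
    with C = ( c_a^2 + c_s^2 ) / 2, rho = lambda / ( m * mu ) < 1, and c_a^2, c_s^2 the squared coefficients of variation of interarrival and service times. For m = 1 this reduces to Kingman's formula W_q ≈ C * rho / (1 - rho) * (1 / mu).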
  7. S62-43 (Priority score)
    • p(v) = ω1 * ( 1 / ( slack(v) + ε ) ) + ω2 * w(v) + ω3 * risk(v) + ω4 * age(v), where ε > 0 is a small constant that keeps the first term finite when slack(v) = 0.
    • Recommend ω1 ≥ ω2 ≥ 0; risk(v) is derived from the failure probability or the retry cost.
  8. S62-44 (Communication cut cost)
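    For the cross-host cut E_cut, with link(e) the link carrying edge e:
    CommCost = Σ_{e∈E_cut} ( lat(link(e)) + vol(e) / bw(link(e)) ),
    i.e., each cut edge pays one link latency plus its transfer volume divided by the link bandwidth.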
  9. S62-45 (DRF shares)
    For tenant k, the share of resource r is sh_k^r = use_k^r / cap^r, and the dominant share is dom_k = max_r sh_k^r. Offer the next allocation to the tenant with the smallest dom_k (i.e., schedule in ascending order of dom_k).
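Evaluating the closed-form relations above numerically helps when sanity-checking measurements against them. The following is a minimal Python sketch. The function names, the default weights in priority, and the dict-based tenant and capacity representation are illustrative assumptions, not part of the specification. The G/G/m function uses the Sakasegawa-style exponent sqrt(2(m+1)) - 1, which reduces to Kingman's single-server formula at m = 1.

```python
import math


def makespan_lower_bound(W, D, P):
    """S62-40: T_make(G, P) >= max(W(G)/P, D(G))."""
    return max(W / P, D)


def amdahl(s, P):
    """S62-41: fixed-size speedup with serial fraction s."""
    return 1.0 / (s + (1.0 - s) / P)


def gustafson(s, P):
    """S62-41: scaled speedup with serial fraction s."""
    return s + (1.0 - s) * P


def ggm_wait(lam, mu, m, ca2, cs2):
    """Approximate mean queueing delay W_q for G/G/m (Sakasegawa/Kingman form)."""
    rho = lam / (m * mu)
    if not 0.0 < rho < 1.0:
        raise ValueError("G/G/m approximation needs 0 < rho < 1")
    C = (ca2 + cs2) / 2.0
    return C * rho ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - rho)) / mu


def priority(slack, w, risk, age, omega=(4.0, 1.0, 1.0, 0.1), eps=1e-6):
    """S62-43: p(v). The default weights are placeholders with w1 >= w2 >= 0."""
    w1, w2, w3, w4 = omega
    return w1 / (slack + eps) + w2 * w + w3 * risk + w4 * age


def drf_order(use, cap):
    """S62-45: dominant shares dom_k and tenants sorted by ascending dom_k.

    use: tenant -> {resource: amount in use}; cap: resource -> capacity.
    """
    dom = {k: max(u[r] / cap[r] for r in cap) for k, u in use.items()}
    return sorted(dom, key=dom.get), dom
```

Consider the classic DRF example on a 9 CPU / 18 GB pool, where tenant A uses 2 CPU and 8 GB and tenant B uses 6 CPU and 2 GB. A's dominant share is 4/9 (memory) and B's is 2/3 (CPU), so A is served first.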

IV. Cost Model & Ready Set


V. Heuristics & Placement Rules


VI. Quotas, Isolation & Elasticity


VII. Data & Manifest Conventions (Scheduling & Resources)


VIII. Algorithms & Implementation Bindings (I60- Extensions)*


IX. Metrology Flows & Run Diagrams (Aligned with Mx-6*)

  1. Mx-67 schedule-compile-run
    • build_exec_graph → estimate w(v), c(e), and dist_in/out;
    • plan_schedule to produce p(v) and initial placement;
    • allocate_resources to enforce quotas and affinity;
    • Event loop: run → emit_metrics → apply_backpressure → autoscale as needed.
  2. Mx-68 rescale-and-rebalance
    • Monitor rho_hat and TS.latency.p99;
    • Trigger autoscale and migrations;
    • Validate hb invariants and idempotent compensation.
  3. Mx-69 incident-and-rollback
    • Escalate alerts and apply protective load shedding;
    • Roll back to safe placement and quotas;
    • Produce a postmortem with recommended baseline updates.
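Classic greedy list scheduling illustrates the plan_schedule step of Mx-67: whenever a worker is free, start the node in R(t) with the highest priority. This is a minimal sketch under stated assumptions: P identical workers, c(e) and placement ignored, and priority supplied as a dict (for example p(v) from S62-43, or the negated slack). It is a stand-in for illustration, not the strategy library's CP-first or HEFT implementation, and list_schedule is a hypothetical name.

```python
import heapq


def list_schedule(work, edges, P, prio):
    """Greedy list scheduling on P identical workers, ignoring c(e).

    prio: dict node -> priority; among ready nodes the highest value starts first.
    Returns (makespan, start_times).
    """
    preds_left = {v: 0 for v in work}
    succs = {v: [] for v in work}
    for u, v in edges:
        preds_left[v] += 1
        succs[u].append(v)

    ready = [(-prio[v], v) for v in work if preds_left[v] == 0]  # R(0)
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, node)
    t, free, start = 0.0, P, {}
    while ready or running:
        # Dispatch as many ready nodes as there are free workers.
        while ready and free > 0:
            _, v = heapq.heappop(ready)
            start[v] = t
            heapq.heappush(running, (t + work[v], v))
            free -= 1
        # Advance time to the next completion and release its successors.
        t, v = heapq.heappop(running)
        free += 1
        for s in succs[v]:
            preds_left[s] -= 1
            if preds_left[s] == 0:
                heapq.heappush(ready, (-prio[s], s))
    return t, start
```

Run it on the diamond graph a→{b, c}→d with w = (2, 3, 1, 2), using priority -slack(v). With two workers it finishes at t = 7, which meets the S62-40 bound max(8/2, 7). With one worker it finishes at t = 8 = W(G).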

X. Observability, SLOs & Alerting

  1. SLIs & SLOs
    • End-to-end: TS.latency.p99, TS.sli.success_rate.
    • System: TS.util.*, TS.queue.backlog, TS.hb.violations.
  2. Alert rules
    • If TS.latency.p99 exceeds target for K consecutive windows, trigger scale_out.
    • If TS.hb.violations > 0, trigger block-and-audit.
  3. Dashboards & traceability
    Publish to ts; persist tau_mono and alpha/beta; record every scaling action and migration in audit.trail.
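The two alert rules can be evaluated once per metrics window. The sketch below assumes each window delivers one TS.latency.p99 sample and one TS.hb.violations count. Two details are illustrative assumptions: the class name AlertEvaluator, and re-arming the scale-out rule after it fires, so that K fresh breaching windows are needed before the next trigger.

```python
from collections import deque


class AlertEvaluator:
    """Evaluate the scale_out and block-and-audit rules over per-window samples.

    Hypothetical helper: the p99 target and K are supplied by configuration.
    """

    def __init__(self, p99_target, k):
        self.p99_target = p99_target
        self.k = k
        self.recent = deque(maxlen=k)  # breach flags for the last K windows

    def on_window(self, latency_p99, hb_violations):
        actions = []
        if hb_violations > 0:
            actions.append("block-and-audit")
        self.recent.append(latency_p99 > self.p99_target)
        if len(self.recent) == self.k and all(self.recent):
            actions.append("scale_out")
            self.recent.clear()  # re-arm: require K fresh breaches before firing again
        return actions
```

Take a target of 200 (e.g., ms) and K = 3. Three consecutive windows above 200 return ["scale_out"]. Any window with hb violations returns "block-and-audit", whatever its latency.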

XI. Verification & Test Matrix

  1. Minimum required
    • Amdahl/Gustafson: vary s and verify speedup bounds.
    • G/G/m: inject controlled c_a^2 and c_s^2, then validate W_q estimates and P99 predictions against measurements.
    • Placement cut: compare CommCost and TS.latency.p99 before/after heuristics.
  2. Boundary & extreme
    Burst traffic and hotspot migration; DRF fairness under heavy quota contention; GPU scarcity and NUMA constraints.
  3. Regression & thresholds
    With a fixed baseline, compare ΔT_make, ΔTS.util.*, ΔTS.queue.backlog, ΔTS.hb.violations, and Δcost against the regression thresholds.
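For the G/G/m row of the minimum matrix, a small discrete-event simulation gives a reference W_q for comparison with the S62-42 estimate. The sketch rests on several assumptions: gamma-distributed interarrival and service times, so that c_a^2 and c_s^2 can be set directly (both must be positive; 1.0 gives the exponential M/M/m case), FCFS service, and a start from an empty system. simulate_wq is a hypothetical name.

```python
import heapq
import random


def simulate_wq(lam, mu, m, ca2=1.0, cs2=1.0, n=200_000, seed=1):
    """Estimate the mean FCFS queueing delay W_q of a G/G/m queue by simulation.

    Interarrival and service times are gamma-distributed with the requested
    squared coefficients of variation (ca2, cs2); ca2 = cs2 = 1 gives M/M/m.
    """
    rng = random.Random(seed)

    def draw(mean, c2):
        # Gamma(shape=1/c2, scale=mean*c2) has the given mean and CV^2 = c2.
        return rng.gammavariate(1.0 / c2, mean * c2)

    free_at = [0.0] * m  # min-heap: time at which each server next becomes free
    t = total_wait = 0.0
    for _ in range(n):
        t += draw(1.0 / lam, ca2)  # arrival time of the next job
        earliest = heapq.heappop(free_at)  # FCFS: take the earliest-free server
        start = max(t, earliest)
        total_wait += start - t
        heapq.heappush(free_at, start + draw(1.0 / mu, cs2))
    return total_wait / n
```

Two reference cases have exact answers. For M/M/1 at rho = 0.5 (lam = 0.5, mu = 1), W_q is 1.0. For M/M/2 at lam = 1.2, mu = 1, the Erlang-C value is 0.5625. The simulated means should land close to those values. Raising ca2 or cs2 above 1 probes the burstier regimes the matrix targets.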

XII. Cross-References & Dependencies


XIII. Risks, Limitations & Open Questions


XIV. Deliverables & Versioning

  1. Deliverables
    • Scheduling strategy library (CP-first, HEFT, network-aware partitioning, DRF), placement & migration tools, autoscale policies, dashboard configs.
    • Benchmark graph suites, load generators, A/B scripts, and regression gates.
  2. Versioning
    From v1.0, freeze manifest keys and metric sets; add new strategies via feature flags with migration notes.

XV. New Terms & Symbols (to memorize)


Copyright & License (CC BY 4.0)

Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.

First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/