13-EFT.WP.Methods.SimStack v1.0 | Chapter 7: Parallelization, Scheduling & Resources

Chapter 7: Parallelization, Scheduling & Resources

I. Scope & Objectives

Define a unified pipeline from the object graph G=(V,E) to executable plans and resource allocation. The chapter covers cost models, ready sets, heuristics and placement rules, quotas and isolation, and observability and alerting, all consistent with the TS.* metrics, hb semantics, and the tau_mono ↔ ts mapping.
Objective: under conservation gates and SLO targets, minimize T_make(G), control P99 and overall cost, and produce reusable implementation bindings and run flows.

II. Terms & Symbols

Graph & paths
- G=(V,E), w(v) (nominal work), c(e) (edge communication cost), crit(G) (critical path), D(G)=crit(G), W(G)=Σ_v w(v).
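- dist_in(v): longest-path length from the sources to v; dist_out(v): longest-path length from v to the sinks; slack(v) = D(G) - dist_in(v) - dist_out(v), with w(v) counted in exactly one of the two distances so that slack(v) = 0 on crit(G).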
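A minimal sketch of how slack(v) might be computed, assuming dist_in and dist_out are longest-path lengths measured in nominal work w(v) only (edge costs c(e) are ignored here), with w(v) counted in dist_out(v) but not in dist_in(v). The function name `slack_table` and its input layout are illustrative, not part of the specification.

```python
from collections import defaultdict


def slack_table(work, edges):
    """Return (D, slack) for a DAG.

    work:  dict node -> w(v)
    edges: list of (u, v) pairs meaning u must finish before v starts
    """
    succs, preds = defaultdict(list), defaultdict(list)
    indeg = {v: 0 for v in work}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indeg[v] += 1

    # Kahn's algorithm: predecessors always appear before successors.
    order, frontier = [], [v for v in work if indeg[v] == 0]
    while frontier:
        u = frontier.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)

    # dist_in(v): longest chain of work that must complete before v starts.
    dist_in = {}
    for v in order:
        dist_in[v] = max((dist_in[u] + work[u] for u in preds[v]), default=0.0)

    # dist_out(v): longest chain of work from v to a sink, including w(v).
    dist_out = {}
    for v in reversed(order):
        dist_out[v] = work[v] + max((dist_out[s] for s in succs[v]), default=0.0)

    d = max(dist_out.values())  # D(G), the critical-path length
    slack = {v: d - dist_in[v] - dist_out[v] for v in work}
    return d, slack
```

For a diamond graph a → {b, c} → d with w = 2, 3, 1, 2, this yields D(G) = 7, zero slack on a, b, d, and slack 2 on c.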
Readiness & priority
R(t): ready set at time t; p(v): scheduling priority score; batch(v): minimal mergeable unit for batching.
Resources & quotas
- Resource vector cap = (cpu, mem, gpu, nic); demand req(v) conformal to cap.
- Quota quota(k): per-tenant or per-stage limit; dominant-resource share dom_ratio(k).
Rates & utilization
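- lambda(v) (arrival rate), mu(v) (service rate), rho(v)=lambda(v)/mu(v) (utilization); window estimate rho_hat(v;W), where W here denotes the estimation window, not the total work W(G).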
- Cost metric cost(res): linear or piecewise-linear combination by unit prices.
Metrics & alerts
TS.latency.*, TS.throughput.*, TS.util.*, TS.queue.*, TS.hb.violations, TS.sli.success_rate.

III. Postulates & Minimal Equations (P61-*/S62-*)

P61-13 (Budget compliance)
At any time, allocation alloc(t) must satisfy Σ_v use(v,t) ≤ cap and per-tenant use_k(t) ≤ quota(k).
P61-14 (Fairness & starvation freedom)
For an active tenant k, when req_k is satisfiable, waiting time is bounded. Use Dominant Resource Fairness (DRF) or an equivalent approximation.
P61-15 (Causality & idempotency guardrails)
Scheduling, batching, and deferrals must not violate hb. If retry paths change, obey idempotency or compensation contracts.
S62-40 (Lower bound on makespan vs. parallelism)
T_make(G,P) ≥ max( W(G)/P , D(G) ), where P is effective parallelism.
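As a worked instance of S62-40, the bound can be evaluated directly from W(G), D(G), and P. This is a small sketch; the function name `makespan_lower_bound` is hypothetical.

```python
def makespan_lower_bound(total_work, critical_path, parallelism):
    """Lower bound on T_make(G, P): max(W(G) / P, D(G))."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    return max(total_work / parallelism, critical_path)


# With W(G) = 8 and D(G) = 7, adding workers beyond P = 2 cannot help:
# the critical path dominates the bound.
bound_p1 = makespan_lower_bound(8, 7, 1)   # work-bound
bound_p4 = makespan_lower_bound(8, 7, 4)   # path-bound
```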
S62-41 (Amdahl & Gustafson bounds)
S_amdahl(P) = 1 / ( s + (1 - s)/P ), S_gustafson(P) = s + (1 - s) * P, with serial fraction s.
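The two bounds in S62-41 can be compared numerically. The sketch below evaluates both for a given serial fraction s; the function names are illustrative.

```python
def s_amdahl(s, p):
    """Fixed-size speedup: 1 / (s + (1 - s) / P)."""
    return 1.0 / (s + (1.0 - s) / p)


def s_gustafson(s, p):
    """Scaled-size speedup: s + (1 - s) * P."""
    return s + (1.0 - s) * p


# For s = 0.1 and P = 16, Amdahl gives about 6.4 and Gustafson about 14.5.
# Amdahl's speedup can never exceed 1 / s = 10, however large P becomes.
fixed = s_amdahl(0.1, 16)
scaled = s_gustafson(0.1, 16)
```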
S62-42 (G/G/m approximate waiting time)
For m parallel servers:
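W_q ≈ ( C * rho^( sqrt(2(m+1)) - 1 ) / ( m * (1 - rho) ) ) * ( 1 / mu ),
with C = ( c_a^2 + c_s^2 ) / 2 and rho = lambda / ( m * mu ) < 1, where c_a^2 and c_s^2 are the squared coefficients of variation of inter-arrival and service times. For m = 1 this reduces to Kingman's formula W_q ≈ C * rho / ( mu * (1 - rho) ).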
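A hedged sketch of the S62-42 estimate, using the Sakasegawa-style exponent sqrt(2(m+1)) - 1 so that the m = 1 case matches Kingman's formula (and the exact M/M/1 result when c_a^2 = c_s^2 = 1). The function name `wq_ggm` is hypothetical.

```python
import math


def wq_ggm(lam, mu, m, ca2, cs2):
    """Approximate mean queueing delay W_q for a G/G/m station.

    lam: arrival rate, mu: per-server service rate, m: number of servers,
    ca2 / cs2: squared coefficients of variation of arrivals / service.
    """
    rho = lam / (m * mu)
    if not 0.0 <= rho < 1.0:
        raise ValueError("queue is unstable: rho must be in [0, 1)")
    c = (ca2 + cs2) / 2.0
    exponent = math.sqrt(2.0 * (m + 1)) - 1.0
    return c * rho ** exponent / (m * (1.0 - rho)) / mu


# M/M/1 sanity check: lam = 0.8, mu = 1 gives W_q = rho / (mu - lam) = 4.
mm1 = wq_ggm(0.8, 1.0, 1, 1.0, 1.0)
```

Halving the variability term C halves the estimate, which is one way to sanity-check injected c_a^2 and c_s^2 values in tests.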
S62-43 (Priority score)
- p(v) = ω1 * ( 1 / ( slack(v) + ε ) ) + ω2 * w(v) + ω3 * risk(v) + ω4 * age(v).
- Recommend ω1 ≥ ω2 ≥ 0; risk(v) derives from failure probability or retry cost.
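The score in S62-43 might be implemented as below. The default weights and ε are placeholder assumptions chosen only to satisfy ω1 ≥ ω2 ≥ 0; real values are scenario-defined.

```python
def priority(slack, work, risk, age, weights=(1.0, 0.5, 0.25, 0.1), eps=1e-6):
    """p(v) = w1 / (slack + eps) + w2 * work + w3 * risk + w4 * age."""
    w1, w2, w3, w4 = weights
    return w1 / (slack + eps) + w2 * work + w3 * risk + w4 * age


# Hypothetical node attributes: (slack, work, risk, age).
nodes = {
    "b": (0.0, 3.0, 0.1, 0.0),   # on the critical path
    "c": (2.0, 1.0, 0.1, 0.0),   # two units of slack
}
ranked = sorted(nodes, key=lambda v: priority(*nodes[v]), reverse=True)
```

Because the slack term grows without bound as slack(v) approaches zero, critical-path nodes dominate unless ε is large or ω1 is small.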
S62-44 (Communication cut cost)
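For cross-host cut E_cut, charging each cut edge one link latency plus its transfer time:
CommCost = Σ_{e∈E_cut} ( lat(link(e)) + vol(e) / bw(link(e)) ),
with, for example, vol(e) in bytes, lat in seconds, and bw in bytes per second, so that CommCost is a time.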
S62-45 (DRF shares)
For tenant k, share on resource r: sh_k^r = use_k^r / cap^r; dominant share dom_k = max_r sh_k^r. Serve tenants in ascending order of dom_k, i.e. the tenant with the smallest dominant share is allocated next.
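A minimal progressive-filling sketch of S62-45, assuming each tenant submits identical tasks with a fixed per-task demand vector. The names `dominant_share` and `drf_allocate` are illustrative, and a tenant is simply dropped from consideration once its next task no longer fits.

```python
def dominant_share(use, cap):
    """dom_k = max over resources r of use_k^r / cap^r."""
    return max(use[r] / cap[r] for r in cap)


def drf_allocate(cap, demands):
    """Launch tasks one at a time, always for the tenant with the lowest dom_k.

    cap:     dict resource -> capacity
    demands: dict tenant -> dict resource -> per-task demand
    Returns dict tenant -> number of tasks launched.
    """
    used = {r: 0.0 for r in cap}
    use_k = {k: {r: 0.0 for r in cap} for k in demands}
    tasks = {k: 0 for k in demands}
    active = sorted(demands)
    while active:
        k = min(active, key=lambda t: dominant_share(use_k[t], cap))
        d = demands[k]
        if any(used[r] + d[r] > cap[r] for r in cap):
            active.remove(k)  # next task of k would exceed capacity
            continue
        for r in cap:
            used[r] += d[r]
            use_k[k][r] += d[r]
        tasks[k] += 1
    return tasks


# Two tenants on 9 CPUs and 18 GB: A needs (1 CPU, 4 GB), B needs (3 CPU, 1 GB).
result = drf_allocate({"cpu": 9, "mem": 18},
                      {"A": {"cpu": 1, "mem": 4}, "B": {"cpu": 3, "mem": 1}})
```

In this example A ends with three tasks and B with two, so both tenants hold a dominant share of 2/3.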

IV. Cost Model & Ready Set

Objective function
J = α * T_make + β * P99 + γ * cost(res) + δ * penalty(hb, retry) with scenario-defined α,β,γ,δ.
Ready-set maintenance
R(t) = { v | preds(v) completed ∧ resources satisfy req(v) }; order by p(v); support batch(v) merges to increase mu.
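The ready-set rule can be sketched as a filter followed by a sort. This is an assumption-laden illustration: `ready_set` and its arguments are hypothetical, and each node is checked against the free capacity on its own, without reserving resources for nodes ranked ahead of it.

```python
def ready_set(preds, done, running, req, free, score):
    """Return R(t) ordered by descending p(v).

    preds:   dict node -> list of predecessor nodes
    done:    set of completed nodes
    running: set of nodes currently executing
    req:     dict node -> dict resource -> demand
    free:    dict resource -> currently free capacity
    score:   callable node -> priority p(v)
    """
    ready = [
        v for v in preds
        if v not in done and v not in running
        and all(u in done for u in preds[v])
        and all(amount <= free.get(r, 0) for r, amount in req[v].items())
    ]
    return sorted(ready, key=score, reverse=True)


preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
req = {"a": {"cpu": 1}, "b": {"cpu": 2}, "c": {"cpu": 1}, "d": {"cpu": 1}}
p = {"a": 0.0, "b": 2.0, "c": 1.0, "d": 3.0}
tight = ready_set(preds, {"a"}, set(), req, {"cpu": 1}, p.get)
roomy = ready_set(preds, {"a"}, set(), req, {"cpu": 4}, p.get)
```

With one free CPU only c is ready, since b needs two; with four free CPUs both b and c are ready and b is ranked first. Node d stays out of R(t) until both predecessors complete.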
Rates & batching
Batch-size effect on service rate: mu(b) ≈ mu0 * f(b), where f(b) is monotone with saturation; record f’s fit convention in the manifest.
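To make the saturation behaviour concrete, the sketch below uses one possible form, f(b) = b_sat * b / (b_sat + b - 1), which satisfies f(1) = 1, increases with b, and approaches b_sat. This particular f and the parameter b_sat are assumptions for illustration; the actual fit convention is whatever the manifest records.

```python
def f_sat(b, b_sat):
    """Hypothetical saturating batch gain: f(1) = 1, f(b) -> b_sat as b grows."""
    return b_sat * b / (b_sat + b - 1)


def mu_batch(mu0, b, b_sat):
    """Service rate at batch size b: mu(b) = mu0 * f(b)."""
    return mu0 * f_sat(b, b_sat)


def smallest_batch(mu0, b_sat, target_rate, b_max=1024):
    """Smallest batch size whose service rate reaches target_rate, else None."""
    for b in range(1, b_max + 1):
        if mu_batch(mu0, b, b_sat) >= target_rate:
            return b
    return None


# With mu0 = 100 and b_sat = 8, a target of 250 needs b = 4 (mu(3) = 240).
b_needed = smallest_batch(100.0, 8.0, 250.0)
```

Choosing the smallest sufficient batch keeps per-item waiting inside the batch low, which matters when P99 is part of the objective.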

V. Heuristics & Placement Rules

Critical-path first (CP-first)
Schedule the ready node with the smallest slack(v) first; tie-break by larger w(v).
HEFT-style heterogeneous placement
Estimate EFT(v,h) (earliest finish time of v on host h) and greedily minimize it over h:
EFT(v,h) = max( avail(h), max_{u∈preds(v)} ( finish(u) + xfer(u→h) ) ) + exec(v,h),
where xfer(u→h) = 0 if u already runs on h.
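A simplified sketch of greedy EFT placement. It assumes the usual HEFT reading in which a task starts at the later of host availability and the arrival of its last predecessor's output, with no transfer cost when both run on the same host. Full HEFT also ranks tasks by upward rank and uses an insertion policy; here the caller supplies any topological order. All names are illustrative.

```python
def eft(v, h, avail, exec_time, preds, finish, placed, xfer):
    """Earliest finish time of task v on host h."""
    data_ready = max(
        (finish[u] + (0.0 if placed[u] == h else xfer[(u, v)]) for u in preds[v]),
        default=0.0,
    )
    return max(avail[h], data_ready) + exec_time[(v, h)]


def greedy_place(order, hosts, exec_time, preds, xfer):
    """Place tasks in the given topological order, each on its min-EFT host."""
    avail = {h: 0.0 for h in hosts}
    finish, placed = {}, {}
    for v in order:
        best = min(
            hosts,
            key=lambda h: eft(v, h, avail, exec_time, preds, finish, placed, xfer),
        )
        finish[v] = eft(v, best, avail, exec_time, preds, finish, placed, xfer)
        placed[v] = best
        avail[best] = finish[v]  # host is busy until v finishes
    return placed, finish


preds = {"a": [], "b": ["a"], "c": ["a"]}
exec_time = {("a", "h1"): 2, ("a", "h2"): 3,
             ("b", "h1"): 3, ("b", "h2"): 3,
             ("c", "h1"): 2, ("c", "h2"): 2}
xfer = {("a", "b"): 1, ("a", "c"): 1}
placed, finish = greedy_place(["a", "b", "c"], ["h1", "h2"], exec_time, preds, xfer)
```

Here a and b share h1, while c moves to h2: paying one unit of transfer lets it finish at time 5 instead of waiting behind b until time 7.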
Network-aware partitioning
Minimize CommCost per S62-44; map tightly connected clusters to the same domain to reduce E_cut.
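To compare candidate placements, the cut cost can be evaluated directly. The sketch below is one reading of S62-44 rather than the only one: it assumes each cut edge pays one link latency plus vol(e) / bw(link), so the result is in seconds. The function name and the per-host-pair link tables are hypothetical.

```python
def comm_cost(edges, placement, lat, bw):
    """Sum latency + transfer time over edges whose endpoints sit on different hosts.

    edges:     dict (u, v) -> data volume in bytes
    placement: dict node -> host
    lat, bw:   dict (host_u, host_v) -> latency in seconds / bandwidth in bytes per second
    """
    total = 0.0
    for (u, v), vol in edges.items():
        link = (placement[u], placement[v])
        if link[0] != link[1]:  # edge belongs to E_cut
            total += lat[link] + vol / bw[link]
    return total


edges = {("a", "b"): 100e6, ("a", "c"): 10e6}
lat = {("h1", "h2"): 1e-3}
bw = {("h1", "h2"): 1e9}
# Co-locating the heavy edge (a, b) leaves only the light edge in the cut.
good = comm_cost(edges, {"a": "h1", "b": "h1", "c": "h2"}, lat, bw)
bad = comm_cost(edges, {"a": "h1", "b": "h2", "c": "h1"}, lat, bw)
```

Keeping the 100 MB edge inside one host reduces the cut cost roughly tenfold in this example, which is the effect network-aware partitioning aims for.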
Hotspot/backpressure coupling
When upstream B(u) stays high, prefer placing v near the congested domain to shorten buffer residency.
Constraints & isolation
Hard constraints: req(v) ≤ cap(h) componentwise. Soft constraints: encode as penalties in J. Enforce tenant isolation via quota(k) and dom_k.

VI. Quotas, Isolation & Elasticity

Quota models
Static: fixed quota(k) per tenant. Dynamic: when rho_hat(k;W) exceeds thresholds, trigger reallocation.
Elastic scaling
Triggers: if rho_hat(v;W) > rho_hi and TS.latency.p99 exceeds its target, request a capacity increase Δcap. Reclaim capacity under sustained low load.
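The trigger logic can be sketched as a small decision function. The thresholds rho_hi and rho_lo, the count of consecutive low-load windows, and the returned action names are all assumptions for illustration; the text specifies only the scale-out condition and "sustained low load" for reclaim.

```python
def autoscale_decision(rho_hat, p99, p99_target,
                       rho_hi=0.8, rho_lo=0.3,
                       low_windows=0, min_low_windows=5):
    """Return 'scale_out', 'scale_in', or 'hold' for one evaluation window.

    low_windows counts consecutive windows (including this one) with low load.
    """
    if rho_hat > rho_hi and p99 > p99_target:
        return "scale_out"   # request a capacity increase
    if rho_hat < rho_lo and low_windows >= min_low_windows:
        return "scale_in"    # reclaim only after sustained low load
    return "hold"
```

Keeping a gap between rho_lo and rho_hi, and requiring several low windows before reclaiming, is one simple way to damp the scaling oscillations that could otherwise trigger retry storms.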
Isolation levels
Process-level, container-level, NUMA affinity, and GPU MIG. Record binding policy and affinity domains in the manifest.

VII. Data & Manifest Conventions (Scheduling & Resources)

Plans & placement
plan.id, placement[v] = host, start/finish(tau_mono), p(v), batch.size, preds/dists.
Resources & quotas
host.cap, alloc.timeline, quota(k), dom_k, violations.
Cost & objectives
objectives.J, weights(α,β,γ,δ), CommCost, cost(res) summary.
Metrics & alerts
TS.util.cpu/gpu/mem/nic, TS.queue.backlog, TS.latency.p99, TS.hb.violations, alerts.
Time bases & replay
Record all times on tau_mono and publish ts together with the tau_mono ↔ ts mapping parameters alpha/beta (not to be confused with the weights α, β in J); ensure replayability.

VIII. Algorithms & Implementation Bindings (I60-* Extensions)

I60-6 build_exec_graph(spec:any) -> GraphRef (see Chapter 3)
I60-7 plan_schedule(graph:GraphRef, policy:dict, resources:dict) -> SchedPlanRef (see Chapter 3)
I60-15 allocate_resources(plan:SchedPlanRef, quotas:dict) -> AllocationRef
Output host mapping, affinity bindings, and evidence of quota enforcement.
I60-16 autoscale(allocation:AllocationRef, signals:dict) -> Actions
Trigger scale-out/in and migrations based on rho_hat and TS.*.
I60-8 apply_backpressure(graph:GraphRef, strategy:dict) -> bp.Report (see Chapter 3)
I60-10 eval_slo(trace:any, targets:dict) -> TS.Report (see Chapter 3)

IX. Metrology Flows & Run Diagrams (Aligned with Mx-6*)

Mx-67 schedule-compile-run
- build_exec_graph → estimate w(v), c(e), and dist_in/out;
- plan_schedule to produce p(v) and initial placement;
- allocate_resources to enforce quotas and affinity;
- Event loop: run → emit_metrics → apply_backpressure → autoscale as needed.
Mx-68 rescale-and-rebalance
- Monitor rho_hat, TS.latency.p99;
- Trigger autoscale and migrations;
- Validate hb invariants and idempotent compensation.
Mx-69 incident-and-rollback
- Escalate alerts and apply protective load shedding;
- Roll back to safe placement and quotas;
- Produce postmortem with baseline update recommendations.

X. Observability, SLOs & Alerting

SLIs & SLOs
- End-to-end: TS.latency.p99, TS.sli.success_rate.
- System: TS.util.*, TS.queue.backlog, TS.hb.violations.
Alert rules
- If TS.latency.p99 exceeds target for K consecutive windows, trigger scale_out.
- If TS.hb.violations > 0, trigger block-and-audit.
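The two alert rules above can be sketched as a single evaluation step. The function name `evaluate_alerts` and the list-of-windows input are assumptions; the action strings mirror the rule names in the text.

```python
def evaluate_alerts(p99_windows, p99_target, k, hb_violations):
    """Return the list of actions triggered by the latest metrics.

    p99_windows:   TS.latency.p99 per window, oldest first
    k:             number of consecutive breaching windows required (k >= 1)
    hb_violations: current value of TS.hb.violations
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions = []
    if hb_violations > 0:
        actions.append("block-and-audit")
    recent = p99_windows[-k:]
    if len(recent) == k and all(x > p99_target for x in recent):
        actions.append("scale_out")
    return actions
```

A single window back under target resets the streak, so a latency spike shorter than K windows does not cause scaling.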
Dashboards & traceability
Publish in ts, persist tau_mono and alpha/beta; record all scaling and migrations in audit.trail.

XI. Verification & Test Matrix

Minimum required
- Amdahl/Gustafson: vary s and verify speedup bounds.
- G/G/m: inject c_a^2/c_s^2, validate W_q estimates and P99 predictions.
- Placement cut: compare CommCost and TS.latency.p99 before/after heuristics.
Boundary & extreme
Burst traffic and hotspot migration; DRF fairness under heavy quota contention; GPU scarcity and NUMA constraints.
Regression & thresholds
With a fixed baseline, compare ΔT_make, ΔTS.util.*, ΔTS.queue.backlog, ΔTS.hb.violations, and cost deltas.

XII. Cross-References & Dependencies

With the Thread Network (Chapter 3)
Share P61-1..4 and S62-10..14; coordinate readiness and backpressure via apply_backpressure.
With Coupled Advancement (Chapter 4)
Step sizes and sync windows may be adjusted via scheduling signals; failures and retries follow idempotency and compensation.
With Time Calibration (Chapter 5)
Migrations and replays execute on tau_mono, publish as ts; keep arrival-time paths and measures consistent.
With Data Persistence (Chapter 6)
Plans, quotas, migrations, and alerts must be persisted to manifest and audit.trail, with fixed field names and units.

XIII. Risks, Limitations & Open Questions

Risks
Misplacement inflates E_cut and worsens tail latency; suboptimal batching raises P99; scaling oscillations break hb and cause retry storms.
Limitations
Heuristic approximations are not globally optimal; some resources (GPU/MIG) are indivisible with long-tail contention.
Open questions
Joint optimal control of step-size × scheduling × placement; robust migrations with provable SLO preservation under cross-domain network jitter.

XIV. Deliverables & Versioning

Deliverables
- Scheduling strategy library (CP-first, HEFT, network-aware partitioning, DRF), placement & migration tools, autoscale policies, dashboard configs.
- Benchmark graph suites, load generators, A/B scripts, and regression gates.
Versioning
From v1.0, freeze manifest keys and metric sets; add new strategies via feature flags with migration notes.

XV. New Terms & Symbols (to memorize)

Planning & priority: R(t), p(v), slack(v), EFT(v,h), batch(v).
Resources & quotas: cap, req(v), quota(k), dom_k, DRF.
Communication & cuts: E_cut, CommCost, vol(e), lat(link), bw(link).
Objectives & cost: J, α,β,γ,δ, cost(res).
Queues & parallelism: m, W_q, rho, c_a^2, c_s^2.
Metrics & alerts: TS.util.*, TS.queue.*, TS.latency.p99, TS.hb.violations, audit.trail.

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Call for verification: Independent and self-funded—no employer and no sponsorship. Next, we will prioritize venues that welcome public discussion, public reproduction, and public critique, with no country limits. Media and peers worldwide are invited to organize verification during this window and contact us.
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05