Chapter 7 Risks, Triggers & Rollback
I. Chapter Goals & Applicability (Mandatory)
- Define a risk grading system, trigger calibers, and a closed rollback loop (detect → decide → execute → verify → postmortem → re-deploy) so that any change that deviates can be rolled back quickly, audited, and restored.
- Applies to changes in cross-volume calibers, parameter/data contracts, methods/processes, and implementation binding; aligned with Chapter 4 (State Machine), Chapter 6 (Impact), and Chapter 9 (Implementation & Verification).
II. Risk Levels (L1–L4, Mandatory)
- L1 Minor: localized impact; no user perception; auto-recovers within window.
- L2 Moderate: single subsystem/region affected; SLO approaching threshold; ops intervention required.
- L3 Severe: multi-subsystem or global impact; SLO/SLA breached; rapid degradation or rollback required.
- L4 Critical: safety/compliance risk or core function outage; immediate full rollback and emergency comms.
III. Trigger Definitions (Mandatory)
- Unified naming: <trigger_name> := <metric><comparator><threshold>@<window>, consistent with gate style.
- Default minimal trigger set:
- t_accuracy_low := gate_accuracy<0.98@7d
- t_latency_high := gate_latency>2h@7d
- t_incident_level := incident_level>=2@24h
- t_data_drift := data_drift>0.03@14d
- t_compat_break := compat_rate<0.99@replay
- t_budget_breach := unit_cost>1.1x@30d
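- Decision logic: trigger = any(policy(t) is satisfied for t in TRIGGERS); the policy of each trigger may be instant (fire on a single breach), consecutive K (fire after K breaches in a row), or moving average (fire when the windowed mean breaches the threshold).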
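The naming rule and decision logic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names Rule, parse_rule, and fires are hypothetical, thresholds with a unit suffix (2h, 1.1x) are split into a number and a unit, and samples are assumed to be ordered oldest to newest and expressed in the same unit as the threshold.

```python
import operator
import re
from dataclasses import dataclass
from typing import Sequence

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}

# <metric><comparator><threshold>@<window>; two-character comparators are tried first.
_RULE = re.compile(
    r"^(?P<metric>[A-Za-z_]\w*)"
    r"(?P<op>>=|<=|>|<)"
    r"(?P<value>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"
    r"(?P<unit>[A-Za-z]*)"
    r"@(?P<window>\w+)$"
)


@dataclass(frozen=True)
class Rule:
    metric: str
    op: str
    threshold: float
    unit: str
    window: str


def parse_rule(text: str) -> Rule:
    """Parse a rule string such as 'gate_accuracy<0.98@7d'."""
    m = _RULE.match(text.strip())
    if m is None:
        raise ValueError(f"not a valid trigger rule: {text!r}")
    return Rule(m["metric"], m["op"], float(m["value"]), m["unit"], m["window"])


def fires(rule: Rule, samples: Sequence[float], mode: str = "instant", k: int = 1) -> bool:
    """Apply one policy to the samples collected over the rule's window."""
    if not samples:
        return False  # assumption: no data means no trigger
    breach = lambda v: _OPS[rule.op](v, rule.threshold)
    if mode == "instant":
        return breach(samples[-1])
    if mode == "consecutive":
        return len(samples) >= k and all(breach(v) for v in samples[-k:])
    if mode == "moving_avg":
        return breach(sum(samples) / len(samples))
    raise ValueError(f"unknown policy mode: {mode!r}")


# Usage: the overall trigger is true if any single trigger fires under its policy.
accuracy = parse_rule("gate_accuracy<0.98@7d")
drift = parse_rule("data_drift>0.03@14d")
trigger = any([
    fires(accuracy, [0.99, 0.97, 0.96], mode="consecutive", k=2),
    fires(drift, [0.01, 0.02], mode="instant"),
])
```

Here only the accuracy trigger fires (two consecutive samples below 0.98), which is enough to make the overall trigger true.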
IV. Monitoring & Alerting (Mandatory)
- Surfaces: performance, latency, error rate, incident level, data drift, compatibility replay, resources & cost.
- Escalation: T0(observe) → T1(oncall) → T2(owner) → T3(release/management) with ack deadlines and actions defined per tier.
- Evidence retention: sampling window, raw-log digest, metric snapshots, script & dataset versions (script@commit, dataset@version).
V. Rollback Strategies & Decision Tree (Mandatory)
- Strategy types:
- Hot rollback: service remains online; switch version/config/feature flags.
- Cold rollback: brief downtime or phased offline; restore to a known-stable release.
- Partial rollback: rollback only affected subsystems/regions.
- Data rollback: restore parameter/model/data-contract snapshots and replay checks.
- Decision tree (condensed):
- Trigger true with level ≥ L2 → assess impact surface and isolatability;
- If isolatable → partial rollback + intensified monitoring; else perform full hot/cold rollback;
- After rollback → run restoration verification; if failing, escalate strategy or enter L4 emergency.
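The condensed decision tree can be written as two small functions. This is a sketch under stated assumptions: the returned labels (monitor, partial_rollback, full_rollback, observe_and_unfreeze, l4_emergency) are illustrative names, L4 is always mapped to a full rollback as Section II requires, and "escalate strategy" is read as moving from a partial to a full rollback.

```python
from enum import IntEnum


class Level(IntEnum):
    L1 = 1
    L2 = 2
    L3 = 3
    L4 = 4


def choose_strategy(triggered: bool, level: Level, isolatable: bool) -> str:
    """First two branches of the tree: pick a rollback strategy."""
    if not triggered or level < Level.L2:
        return "monitor"
    if level == Level.L4:
        return "full_rollback"  # Section II: L4 requires immediate full rollback
    return "partial_rollback" if isolatable else "full_rollback"


def after_verification(strategy: str, verified: bool) -> str:
    """Last branch: what to do once restoration verification has run."""
    if verified:
        return "observe_and_unfreeze"
    if strategy == "partial_rollback":
        return "full_rollback"  # assumption: escalate partial to full
    return "l4_emergency"


# Usage: an isolatable L3 incident whose partial rollback fails verification.
first = choose_strategy(True, Level.L3, isolatable=True)
second = after_verification(first, verified=False)
```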
VI. Rollback Execution Flow (Mandatory)
- Freeze writes (as needed): pause new traffic/writes or switch to read-only.
- Switch path: revert to release-<stable> or flip feature_flag.off.
- Restore artifacts: parameters/calibers/data contracts/models with version & hash checks.
- Restoration verification: run restoration smoke and restoration regression; perform fast health checks on key metrics.
- Observe & unfreeze: track recovery curves over the window; unfreeze/gradually ramp once gates pass.
- Record & communicate: produce RollbackReport, update audit trail, and issue external comms as required.
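The six steps above can be modeled as an ordered list of actions that stops at the first failure, so that unfreezing never happens on an unverified system. The sketch below is illustrative only: run_rollback and artifact_matches are hypothetical names, the step actions are stubs standing in for real deployment tooling, and the hash check uses SHA-256 as an assumed digest.

```python
import hashlib
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def artifact_matches(content: bytes, expected_sha256: str) -> bool:
    """Hash check for a restored artifact (the 'Restore artifacts' step)."""
    return hashlib.sha256(content).hexdigest() == expected_sha256


def run_rollback(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first one that fails."""
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'failed'}")
        if not ok:
            return False, log
    return True, log


# Usage with stub actions; the snapshot bytes are placeholder data.
snapshot = b"params-snapshot"
expected = hashlib.sha256(snapshot).hexdigest()
steps: List[Step] = [
    ("freeze_writes", lambda: True),
    ("switch_path", lambda: True),
    ("restore_artifacts", lambda: artifact_matches(snapshot, expected)),
    ("restoration_verification", lambda: True),
    ("observe_and_unfreeze", lambda: True),
    ("record_and_communicate", lambda: True),
]
ok, log = run_rollback(steps)
```

A production runner would likely still write the RollbackReport when a step fails; this sketch only returns the log so the caller can do so.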
VII. Restoration Verification & Pass Lines (Mandatory)
- Gate naming: gate_<metric><comparator><threshold>@<window>; examples:
gate_accuracy>=0.99@24h, gate_latency<=2h@24h, gate_error_rate<=1e-3@24h, compat_rate>=0.995@replay.
- Evidence caliber: data sources, statistical method, confidence interval, script locator, report ID; no release from freeze if any hard gate fails.
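The rule "no release from freeze if any hard gate fails" can be checked mechanically. The following sketch assumes every gate is a hard gate, that measured values are keyed by the metric name exactly as written in the gate string, and that they are expressed in the threshold's unit (for example hours for 2h). The names failed_gates and may_unfreeze, and the measured values, are hypothetical.

```python
import operator
import re
from typing import Dict, List

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
_GATE = re.compile(
    r"^(\w+)(>=|<=|>|<)(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)([A-Za-z]*)@(\w+)$"
)


def failed_gates(gates: List[str], measured: Dict[str, float]) -> List[str]:
    """Return the gate strings that do not pass for the measured values."""
    failed = []
    for gate in gates:
        m = _GATE.match(gate)
        if m is None:
            raise ValueError(f"not a valid gate: {gate!r}")
        metric, op, number, _unit, _window = m.groups()
        value = measured.get(metric)
        # A missing measurement counts as a failure: no evidence, no release.
        if value is None or not _OPS[op](value, float(number)):
            failed.append(gate)
    return failed


def may_unfreeze(gates: List[str], measured: Dict[str, float]) -> bool:
    """True only when every gate passes."""
    return not failed_gates(gates, measured)


# Usage with the example gates and illustrative measurements.
GATES = [
    "gate_accuracy>=0.99@24h",
    "gate_latency<=2h@24h",
    "gate_error_rate<=1e-3@24h",
    "compat_rate>=0.995@replay",
]
measured = {
    "gate_accuracy": 0.985,   # below the 0.99 pass line
    "gate_latency": 1.5,      # hours
    "gate_error_rate": 5e-4,
    "compat_rate": 0.996,
}
blocked_by = failed_gates(GATES, measured)
```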
VIII. Data & Contract Consistency (Mandatory)
- Contract rollback: specify API/Schema version range and fallback; mark breaking changes with breaking=true and force rollback.
- Replay requirement: provide the minimal replay set and pass-rate threshold; cross-environment consistency must meet the configured floor.
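A version range such as "[2.0,3.0)" can be checked with interval notation, where a square bracket includes the bound and a parenthesis excludes it. The sketch below assumes dotted numeric versions with the same number of components on both sides; in_range and needs_rollback are hypothetical names, and needs_rollback encodes the rule that breaking=true forces a rollback regardless of the range.

```python
import re
from typing import Tuple

_RANGE = re.compile(r"^([\[(])\s*([\d.]+)\s*,\s*([\d.]+)\s*([\])])$")


def _key(version: str) -> Tuple[int, ...]:
    """Turn '2.10' or 'v2.10' into (2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def in_range(version: str, version_range: str) -> bool:
    """Check a version against an interval such as '[2.0,3.0)'."""
    m = _RANGE.match(version_range.strip())
    if m is None:
        raise ValueError(f"not a valid version range: {version_range!r}")
    low_bracket, low, high, high_bracket = m.groups()
    v, low_key, high_key = _key(version), _key(low), _key(high)
    above = v >= low_key if low_bracket == "[" else v > low_key
    below = v <= high_key if high_bracket == "]" else v < high_key
    return above and below


def needs_rollback(version: str, version_range: str, breaking: bool) -> bool:
    """A breaking change always forces rollback; otherwise check the range."""
    return breaking or not in_range(version, version_range)


# Usage: schema v2.3 is inside the range, v3.0 is excluded by the ')'.
inside = in_range("v2.3", "[2.0,3.0)")
outside = in_range("3.0", "[2.0,3.0)")
```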
IX. Communication & Sign-off (Mandatory)
- Internal: Requester/Implementer execute; Approver/Owner signs; Auditor witnesses audit elements.
- External: per release matrix, notify affected parties and mitigation; include buffer window and restoration timeline.
X. Machine-Readable Schema (YAML; JSON equivalent, copy-ready)
risk:
  levels:
    L1: { impact: "localized", action: "monitor", notify: ["oncall"] }
    L2: { impact: "single-subsystem", action: "partial_rollback", notify: ["oncall","owner"] }
    L3: { impact: "multi-subsystem/global", action: "full_rollback", notify: ["oncall","owner","release_mgr"] }
    L4: { impact: "safety/compliance", action: "emergency_shutdown", notify: ["exec","legal","pr"] }
triggers:
  - name: "t_accuracy_low"
    rule: "gate_accuracy<0.98@7d"
    policy: { mode: "consecutive", k: 2 }
  - name: "t_latency_high"
    rule: "gate_latency>2h@7d"
    policy: { mode: "instant" }
  - name: "t_incident_level"
    rule: "incident_level>=2@24h"
    policy: { mode: "moving_avg", window: "24h" }
  - name: "t_data_drift"
    rule: "data_drift>0.03@14d"
    policy: { mode: "instant" }
  - name: "t_compat_break"
    rule: "compat_rate<0.99@replay"
    policy: { mode: "instant" }
rollback_plan:
  type: ["hot","cold","partial","data"]
  freeze_io: true
  steps:
    - "switch_traffic: release-stable"
    - "restore_snapshot: params@2025-09-20"
    - "run_suite: restoration_smoke"
    - "run_suite: restoration_regression"
    - "observe: 24h"
  artifacts:
    snapshots: ["params@hash","schema@v2.3","model@a1b2c3"]
    scripts: ["restore.py@d4e5f6","smoke.sh@a1b2c3","regress.py@9f8e7d"]
success_gates:
  - "gate_accuracy>=0.99@24h"
  - "gate_latency<=2h@24h"
  - "gate_error_rate<=1e-3@24h"
  - "compat_rate>=0.995@replay"
consistency:
  api_schema:
    version_range: "[2.0,3.0)"
    fallback: "adapter_v1_enabled"
    breaking: true
  replay:
    minimal_set: ["cmb_set_v3","lens_v1"]
    pass_rate: ">=0.992"
communication:
  internal: ["oncall","owner","auditor","release_mgr"]
  external: { policy: "as_needed", channels: ["status_page","mailing_list"] }
audit_trail:
  record:
    - "timestamp"
    - "actor"
    - "risk_level"
    - "trigger"
    - "action"
    - "evidence_hash"
    - "notes"
XI. Human × Machine Alignment (Mandatory)
| Human Section | Machine Field | Validation Focus |
|---|---|---|
| Risk levels & definitions | risk.levels.* | Clear L1–L4 semantics and actions |
| Trigger set | triggers[] | Naming & rule caliber consistent; policy present |
| Rollback strategies & flow | rollback_plan.* | Freeze → switch → restore → verify → observe loop complete |
| Restoration verification & gates | success_gates[] | All hard gates; quantifiable & replayable |
| Contract & replay consistency | consistency.* | Version ranges, fallback, pass-rate threshold |
| Comms & sign-off | communication.* | Role coverage and external comms policy |
| Audit trail | audit_trail.record[] | Traceable evidence; complete fields |
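The alignment table lends itself to an automated presence check on a loaded record. The sketch below works on a plain Python dict, as would be produced by any YAML or JSON loader. REQUIRED_PATHS is derived from the Machine Field column, while the function names and the sample record are hypothetical. It checks only that fields exist and that each trigger carries a policy, not that their values are sensible.

```python
from typing import Any, Dict, List

# Dotted paths taken from the "Machine Field" column of the table.
REQUIRED_PATHS = [
    "risk.levels",
    "triggers",
    "rollback_plan",
    "success_gates",
    "consistency",
    "communication",
    "audit_trail.record",
]


def missing_fields(doc: Dict[str, Any]) -> List[str]:
    """Return every required dotted path that is absent from the record."""
    missing = []
    for path in REQUIRED_PATHS:
        node: Any = doc
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


def triggers_without_policy(doc: Dict[str, Any]) -> List[str]:
    """Names of triggers that violate the 'policy present' validation focus."""
    return [
        t.get("name", "<unnamed>")
        for t in doc.get("triggers", [])
        if "policy" not in t
    ]


# Usage on a deliberately incomplete record.
record = {
    "risk": {"levels": {}},
    "triggers": [
        {"name": "t_a", "rule": "x<1@7d", "policy": {"mode": "instant"}},
        {"name": "t_b", "rule": "y>2@1d"},
    ],
    "rollback_plan": {},
    "success_gates": [],
    "consistency": {},
    "communication": {},
    "audit_trail": {},
}
gaps = missing_fields(record)
unpoliced = triggers_without_policy(record)
```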
XII. Minimal Filled Example (copy-ready)
risk:
  current_level: "L3"
  reason: "gate_accuracy<0.98@7d & compat_rate<0.99@replay"
  triggers_fired: ["t_accuracy_low","t_compat_break"]
rollback_plan:
  type: ["hot","data"]
  freeze_io: true
  steps:
    - "switch_traffic: release-stable"
    - "restore_snapshot: schema@v2.2"
    - "run_suite: restoration_smoke"
    - "run_suite: restoration_regression"
    - "observe: 24h"
success_gates:
  - "gate_accuracy>=0.99@24h"
  - "compat_rate>=0.995@replay"
audit_trail:
  record_id: "RB-2025-0915-01"
  timestamp: "2025-09-27T12:00:00Z"
  actor: "Approver/Owner"
  evidence_hash: "sha256:…"
communication:
  internal: ["oncall","owner","auditor","release_mgr"]
  external: { policy: "status_page" }
XIII. Path/Formula Consistency (Mandatory)
- If risks involve arrival-time criteria, use the unified forms:
- Constant factored: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )
- General form: T_arr = ( ∫ ( n_eff / c_ref ) d ell )
- When T_arr appears, in the same or adjacent paragraph declare path gamma(ell) and measure d ell; dimensional checks via check_dim must pass.
- No mixing: T_fil ≠ T_trans, n ≠ n_eff, c ≠ c_ref; no Chinese in formulas/symbols/definitions.
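The two unified forms give the same value whenever c_ref is constant along the path, which a short numerical check makes concrete. The sketch below assumes the path gamma(ell) is parameterized by arc length ell from 0 to ell_max, uses the composite trapezoid rule as the integrator, and picks an illustrative n_eff profile and c_ref value in arbitrary consistent units. None of these choices come from the source, and the function names are hypothetical.

```python
from typing import Callable


def integrate(f: Callable[[float], float], ell_max: float, steps: int = 1000) -> float:
    """Composite trapezoid rule for the integral of f over 0 <= ell <= ell_max."""
    h = ell_max / steps
    total = 0.5 * (f(0.0) + f(ell_max))
    for i in range(1, steps):
        total += f(i * h)
    return total * h


def t_arr_factored(n_eff: Callable[[float], float], c_ref: float, ell_max: float) -> float:
    """Constant factored form: T_arr = ( 1 / c_ref ) * ( ∫ n_eff d ell )."""
    return (1.0 / c_ref) * integrate(n_eff, ell_max)


def t_arr_general(
    n_eff: Callable[[float], float],
    c_ref: Callable[[float], float],
    ell_max: float,
) -> float:
    """General form: T_arr = ( ∫ ( n_eff / c_ref ) d ell )."""
    return integrate(lambda ell: n_eff(ell) / c_ref(ell), ell_max)


# Usage: a linear n_eff profile and a constant c_ref (illustrative values).
n_eff = lambda ell: 1.0 + 0.1 * ell
factored = t_arr_factored(n_eff, 2.0, 10.0)
general = t_arr_general(n_eff, lambda ell: 2.0, 10.0)
```

For this profile the integral of n_eff over the path is 15, so both forms evaluate to 7.5 up to floating-point rounding.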
XIV. Cross-References & Citation Style (Mandatory)
- Fixed in-text format: “See 《 vX.Y》 Ch.x S/P/M/I…”, anchors preferred over whole volumes.
- Provide a machine-readable list in the DR’s references.see (examples):
- "EFT.WP.Core.Terms v1.0:P10-3"
- "EFT.WP.Core.Equations v1.1:S20-1"
- "EFT.WP.Core.Metrology v1.0:check_dim"
- "EFT.WP.Core.DataSpec v1.0:I30-2"
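Entries in references.see can be split into volume, version, and anchor for automated checks. The pattern below is inferred from the four examples above only (a dotted volume name, a space, vX.Y, a colon, then an anchor) and is an assumption, not a published grammar; parse_ref is a hypothetical name.

```python
import re
from typing import Dict

_REF = re.compile(r"^(?P<volume>[\w.]+) v(?P<version>\d+\.\d+):(?P<anchor>\S+)$")


def parse_ref(entry: str) -> Dict[str, str]:
    """Split 'EFT.WP.Core.Terms v1.0:P10-3' into volume, version, and anchor."""
    m = _REF.match(entry.strip())
    if m is None:
        raise ValueError(f"not a valid reference entry: {entry!r}")
    return m.groupdict()


# Usage on two of the listed examples.
terms = parse_ref("EFT.WP.Core.Terms v1.0:P10-3")
metrology = parse_ref("EFT.WP.Core.Metrology v1.0:check_dim")
```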
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/