Chapter 16 Implementation Binding & Evaluation API
I. Chapter Purpose & Scope
Provide evaluation APIs and normative implementation bindings: interface prototypes, request/response envelopes, error codes, auth & idempotency, rate limits, and version negotiation; cover suite loading, task execution, scoring & normalization, significance & uncertainty computation, and leaderboard publish/revoke; align with data contracts, metrology posture, cross-volume anchors, and the export manifest.
II. Service Surface (Normative)
services:
  benchmarks.v1:
    - POST /api/v1/benchmarks/load_suite
    - POST /api/v1/benchmarks/list_tasks
    - POST /api/v1/benchmarks/get_task
    - POST /api/v1/benchmarks/evaluate
    - POST /api/v1/benchmarks/score
    - POST /api/v1/benchmarks/significance
    - POST /api/v1/benchmarks/uncertainty
    - POST /api/v1/benchmarks/robustness
    - POST /api/v1/benchmarks/fairness_ethics
    - POST /api/v1/benchmarks/runtime/metrics
    - POST /api/v1/benchmarks/runtime/lineage
    - POST /api/v1/benchmarks/runtime/replay
    - POST /api/v1/benchmarks/submit
    - POST /api/v1/benchmarks/publish
    - POST /api/v1/benchmarks/revoke
    - POST /api/v1/benchmarks/hash_artifact
    - POST /api/v1/benchmarks/sign_artifact
III. Common Request/Response & Auth
request_envelope:
  headers:
    Authorization: "Bearer <oidc-token> | HMAC <key>:<sig>"
    x-eift-idempotency: "<uuid>"
    content-type: "application/json"
  body:
    suite?: { ... }
    task_id?: "<suite.task>"
    spec?: { ... }
    payload?: {artifacts: [{path, bytes_b64?, sha256?}]}
    options?: {dry_run?: true, strict?: true}
    filters?: {run_id?: "<id>", since?: "<ISO8601>", until?: "<ISO8601>"}
response_envelope:
  status: "ok" | "warn" | "error"
  errors: [{code, message, path?, see?}]
  warnings: [{code, message, path?, see?}]
  metrics: { ... }
  data?: { ... }
  version: "benchmarks.v1"
security:
  auth: "OIDC bearer | HMAC"
  tls: "TLS1.2+"
  scope: ["load", "evaluate", "metrics", "lineage", "submit", "publish", "admin"]
rate_limits:
  per_key_per_minute: 120
  burst: 60
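The envelope above can be exercised from a client with a few lines of Python. The following is a minimal sketch, not a normative binding: the helper names build_headers and check_envelope are hypothetical, only the OIDC bearer form of Authorization is shown, and the choice to raise on status "error" is an assumption about how a caller might want to react.

```python
import uuid
from typing import Optional

VALID_STATUS = {"ok", "warn", "error"}  # values listed in response_envelope.status


def build_headers(token: str, idempotency_key: Optional[str] = None) -> dict:
    """Build request-envelope headers (hypothetical helper, bearer auth only)."""
    return {
        "Authorization": f"Bearer {token}",
        # Generate a fresh UUID per logical operation; pass the same key on retries.
        "x-eift-idempotency": idempotency_key or str(uuid.uuid4()),
        "content-type": "application/json",
    }


def check_envelope(resp: dict) -> dict:
    """Return the data field for ok/warn responses; raise otherwise."""
    status = resp.get("status")
    if status not in VALID_STATUS:
        raise ValueError(f"unexpected status: {status!r}")
    if status == "error":
        codes = [e.get("code") for e in resp.get("errors", [])]
        raise RuntimeError(f"request failed: {codes}")
    return resp.get("data", {})
```

Reusing the same idempotency key on a retry is what lets the service recognize the duplicate; a new key marks a new operation.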
IV. Normative OpenAPI Excerpt
openapi: 3.0.3
info: {title: "EFT Benchmarks API", version: "v1"}
paths:
  /api/v1/benchmarks/load_suite:
    post:
      summary: Validate and load a benchmark suite
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/SuiteEnvelope"}}}}
      responses:
        "200": {description: "Result", content: {"application/json": {schema: {$ref: "#/components/schemas/Result"}}}}
  /api/v1/benchmarks/evaluate:
    post:
      summary: Execute evaluation (offline/online/stream/interactive)
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/EvalRequest"}}}}
      responses:
        "200": {description: "Run accepted", content: {"application/json": {schema: {$ref: "#/components/schemas/EvalResult"}}}}
components:
  schemas:
    SuiteEnvelope: {type: object, properties: {suite: {}, options: {type: object}}}
    EvalRequest:
      type: object
      properties:
        task_id: {type: string}
        spec: {type: object}
        options: {type: object, properties: {mode: {type: string, enum: ["sync", "async"]}}}
    EvalResult:
      type: object
      properties:
        run_id: {type: string}
        state: {type: string, enum: ["queued", "running", "succeeded", "failed"]}
        scores: {type: object}
        ci: {type: object}
        artifacts: {type: array, items: {type: object}}
    Result:
      type: object
      properties:
        status: {type: string, enum: [ok, warn, error]}
        errors: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        warnings: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        metrics: {type: object}
        data: {type: object}
    Issue:
      type: object
      properties:
        code: {type: string}
        message: {type: string}
        path: {type: string}
        see: {type: array, items: {type: string}}
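A client may want to sanity-check an EvalResult before trusting it. The sketch below hand-codes the property types and the state enum from the excerpt; validate_eval_result is a hypothetical helper, and a production client would more likely run a full JSON Schema validator against the published OpenAPI file. Because the excerpt declares no required properties, missing keys are not reported.

```python
EVAL_STATES = {"queued", "running", "succeeded", "failed"}

# Python types corresponding to the EvalResult property types in the excerpt.
EVAL_RESULT_TYPES = {
    "run_id": str,
    "state": str,
    "scores": dict,
    "ci": dict,
    "artifacts": list,
}


def validate_eval_result(obj: dict) -> list:
    """Return (path, message) tuples; an empty list means no problem was found."""
    problems = []
    # Type check for every property that is present.
    for key, expected in EVAL_RESULT_TYPES.items():
        if key in obj and not isinstance(obj[key], expected):
            problems.append((f"$.{key}", f"expected {expected.__name__}"))
    # Enum check for state.
    state = obj.get("state")
    if isinstance(state, str) and state not in EVAL_STATES:
        problems.append(("$.state", f"not one of {sorted(EVAL_STATES)}"))
    # Each artifacts item must be an object.
    artifacts = obj.get("artifacts")
    if isinstance(artifacts, list):
        for i, item in enumerate(artifacts):
            if not isinstance(item, dict):
                problems.append((f"$.artifacts[{i}]", "expected object"))
    return problems
```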
V. Endpoint Semantics (Essentials)
- /benchmarks/load_suite (blocking): validate structure/type/regex and anchor dependencies; require metrology.units="SI" and check_dim=true; enforce frozen splits & leakage guardrails; check scoring/significance/compliance minima.
- /benchmarks/evaluate: execute per Chapter 7 (offline/online/stream/interactive); return run_id and artifacts; online supports shadow/canary and guardrails.
- /benchmarks/score: aggregate & normalize per Chapter 8; unify metric directions before combining; output score_raw/score_norm and tie_break details.
- /benchmarks/significance: significance, power & multiple-comparison correction; output Δ/CI_95/p; link to gates.
- /benchmarks/uncertainty: compose uncertainty per Chapter 9 (GUM|linear|montecarlo|bayes) and report coverage intervals; unify dimensions.
- /benchmarks/robustness & /fairness_ethics: run Chapter 12/13 items and evaluate against thresholds.
- /benchmarks/runtime/*: query runtime perf/energy and lineage/replay consistency.
- /benchmarks/submit|publish|revoke: submit, publish/revoke leaderboard entries; follow stability-line and notice policy; revokes produce tombstones and update mirrors/indexes.
- /benchmarks/hash_artifact|sign_artifact: compute sha256 and sign/verify; reconcile with export_manifest.artifacts[].
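The /benchmarks/score step of unifying metric directions before combining can be illustrated as follows. This is a sketch under stated assumptions: min-max normalization and a weighted mean stand in for whatever Chapter 8 actually prescribes, and the names combine_scores, unify_direction, and min_max are hypothetical.

```python
def unify_direction(value: float, higher_is_better: bool) -> float:
    """Negate lower-is-better metrics so that larger always means better."""
    return value if higher_is_better else -value


def min_max(values: list) -> list:
    """Scale values to [0, 1]; all-equal inputs map to 0.5 (an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_scores(results: dict, metrics: dict, weights: dict) -> dict:
    """results: {system: {metric: raw}}; metrics: {metric: higher_is_better};
    weights: {metric: weight}. Returns {system: score_norm}."""
    systems = sorted(results)
    total_w = sum(weights[m] for m in metrics)
    score_norm = {s: 0.0 for s in systems}
    for m, hib in metrics.items():
        # Direction is unified first, so normalization never has to know it.
        unified = [unify_direction(results[s][m], hib) for s in systems]
        for s, n in zip(systems, min_max(unified)):
            score_norm[s] += weights[m] * n / total_w
    return score_norm


if __name__ == "__main__":
    raw = {"A": {"acc": 0.9, "latency_ms": 120.0},
           "B": {"acc": 0.8, "latency_ms": 80.0}}
    print(combine_scores(raw, {"acc": True, "latency_ms": False},
                         {"acc": 2.0, "latency_ms": 1.0}))
```

Flipping the sign before normalizing means a lower latency and a higher accuracy both push score_norm upward, which is the property the combination step relies on.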
VI. Error Codes (Normative)
errors:
  - {code: "ESCHEMA001", message: "suite schema violation", path: "$.suite"}
  - {code: "EREF001", message: "invalid reference format", path: "$.export_manifest.references[*]"}
  - {code: "EDIM001", message: "units must be SI and check_dim", path: "$.metrology"}
  - {code: "ESPLIT001", message: "splits must be frozen and frozen indices enabled", path: "$.tasks[*].splits"}
  - {code: "ELEAK000", message: "cross-split leakage detected", path: "$.tasks[*].leakage_guard"}
  - {code: "EPROTO001", message: "protocol mode invalid", path: "$.tasks[*].protocol.mode"}
  - {code: "EMETRIC001", message: "metric missing family/unit/higher_is_better", path: "$.tasks[*].metrics[*]"}
  - {code: "ESIG001", message: "significance params incomplete", path: "$.tasks[*].significance"}
  - {code: "EPUB001", message: "publish gate not met", path: "$.scoring.stability"}
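To show how one of these codes might surface in a response, the sketch below checks the metrology block and wraps the outcome in the common response envelope. It assumes the suite keeps units and check_dim directly under a top-level metrology key, as the EDIM001 path suggests; check_metrology and to_envelope are hypothetical helpers.

```python
def check_metrology(suite: dict) -> list:
    """Return an EDIM001 issue unless units is "SI" and check_dim is true."""
    m = suite.get("metrology", {})
    if m.get("units") != "SI" or m.get("check_dim") is not True:
        return [{
            "code": "EDIM001",
            "message": "units must be SI and check_dim",
            "path": "$.metrology",
        }]
    return []


def to_envelope(issues: list) -> dict:
    """Wrap issues in the response envelope; any issue makes the status "error"."""
    return {
        "status": "error" if issues else "ok",
        "errors": issues,
        "warnings": [],
        "metrics": {},
        "version": "benchmarks.v1",
    }
```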
VII. Idempotency, Versioning & Compatibility
idempotency:
  header: "x-eift-idempotency"
  window_hours: 24
versioning:
  api: "benchmarks.v1" # breaking change → bump MAJOR
  minor: "backward-compatible additions"
compatibility:
  request_backward: "minor+patch"
  response_fields: "additive only; no removals"
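One way a server could honor the 24-hour idempotency window is to cache the first response per key and replay it for duplicates. The sketch below assumes that replay semantics; the chapter states only the header name and the window length. IdempotencyCache is a hypothetical in-memory class, with an injectable clock so the window can be tested without waiting.

```python
import time

WINDOW_SECONDS = 24 * 3600  # window_hours: 24


class IdempotencyCache:
    """Replay the stored response for a key seen within the window."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._store = {}  # key -> (stored_at, response)

    def run(self, key: str, handler):
        """Call handler() once per key per window; otherwise return the cached response."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < WINDOW_SECONDS:
            return hit[1]
        response = handler()
        self._store[key] = (now, response)
        return response
```

A real deployment would need shared storage and eviction of expired keys; both are omitted here to keep the window logic visible.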
VIII. Security, Audit & Compliance
- Auth: OIDC/HMAC; Transport: TLS1.2+; Least privilege by scope.
- Audit: record request_id, idempotency_key, caller, timestamps, summary; logs feed the compliance module and appear in the export manifest.
- Compliance: regional limits and data-subject rights per Chapter 14; publish/revoke per stability-line governance.
IX. Machine-Readable Implementation Snippets (Ixx-? Prototypes)
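from __future__ import annotations  # lets the str|None and list[dict] hints below parse on Python 3.7+
def load_suite(suite: dict) -> dict: ...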
def list_tasks(suite_id: str) -> dict: ...
def get_task(suite_id: str, task_id: str) -> dict: ...
def evaluate(task_id: str, spec: dict, mode: str = "async") -> dict: ...
def score(results: dict, aggregation: dict, normalization: dict) -> dict: ...
def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000) -> dict: ...
def uncertainty(model: str, components: list[dict], policy: dict) -> dict: ...
def robustness(spec: dict) -> dict: ...
def fairness_ethics(spec: dict) -> dict: ...
def runtime_metrics(run_id: str, since: str|None=None, until: str|None=None) -> dict: ...
def lineage(spec: dict|None=None, run_id: str|None=None) -> dict: ...
def replay(run_id: str, policy: str="strict") -> dict: ...
def hash_artifact(path: str|bytes) -> dict: ...
def sign_artifact(path: str|bytes, key_id: str) -> dict: ...
def submit(payload: dict) -> dict: ...
def publish(entry: dict) -> dict: ...
def revoke(tag: str, reason: str) -> dict: ...
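The significance prototype defaults to method="bootstrap" with B=10000 resamples. As an illustration of what such a call might compute, here is a paired percentile bootstrap over per-example scores. It is a sketch, not the chapter's procedure: the name paired_bootstrap, the list-of-floats inputs, the seed parameter, and the doubled-tail p-value approximation are all assumptions, and power analysis and multiple-comparison correction are left out.

```python
import random


def paired_bootstrap(a: list, b: list, B: int = 10000, seed: int = 0) -> dict:
    """Estimate the mean difference a-b, a 95% percentile CI, and an approximate p."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    delta = sum(diffs) / n
    rng = random.Random(seed)  # seeded for reproducible runs
    boots = []
    for _ in range(B):
        # Resample paired differences with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boots.append(sum(sample) / n)
    boots.sort()
    lo = boots[int(0.025 * B)]
    hi = boots[min(B - 1, int(0.975 * B))]
    # Approximate two-sided p: twice the smaller tail mass at or beyond zero.
    frac_le0 = sum(1 for m in boots if m <= 0) / B
    frac_ge0 = sum(1 for m in boots if m >= 0) / B
    p = min(1.0, 2 * min(frac_le0, frac_ge0))
    return {"delta": delta, "ci_95": [lo, hi], "p": p}
```

Resampling the differences rather than the two score lists independently keeps the pairing intact, which matters when both systems are evaluated on the same examples.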
X. Example Invocations (Ready-to-use)
curl -s -X POST https://api.eift.org/api/v1/benchmarks/load_suite \
-H "Authorization: Bearer <token>" \
-H "x-eift-idempotency: 7b7a0b1e-0a21-4f3f-9d0b-3b1e9b1f3c22" \
-H "Content-Type: application/json" \
-d @benchmark.json
curl -s -X POST https://api.eift.org/api/v1/benchmarks/evaluate \
-H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
-d '{"task_id":"cls.binary","spec":{...},"options":{"mode":"async"}}'
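curl -s -X POST https://api.eift.org/api/v1/benchmarks/score \
-H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
-d @scores.json
curl -s -X POST https://api.eift.org/api/v1/benchmarks/significance \
-H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
-d @pair.json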
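The same calls can be prepared from Python with the standard library alone. The sketch below only builds the request object, so it can be inspected without network access; build_request is a hypothetical helper, and the empty spec in the example is a placeholder rather than a valid evaluation spec.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://api.eift.org/api/v1/benchmarks"  # host taken from the curl examples


def build_request(endpoint: str, body: dict, token: str) -> urllib.request.Request:
    """Prepare a POST with the envelope headers; nothing is sent here."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "x-eift-idempotency": str(uuid.uuid4()),
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_request(
        "evaluate",
        {"task_id": "cls.binary", "spec": {}, "options": {"mode": "async"}},
        token="<token>",
    )
    print(req.get_method(), req.full_url)
```

Passing the prepared object to urllib.request.urlopen(req) would perform the call; the response body can then be decoded with json.loads.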
XI. Coupling with Export Manifest (Normative)
export_manifest:
  artifacts:
    - {path: "api/openapi.yaml", sha256: "..."}
    - {path: "api/clients/python.tar.gz", sha256: "..."}
    - {path: "runs/RUN-123/scores.json", sha256: "..."}
    - {path: "runs/RUN-123/ci.json", sha256: "..."}
    - {path: "runs/RUN-123/leaderboard.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.6"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.8"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.9"
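Reconciling artifacts against the manifest comes down to recomputing each sha256 and comparing it with the listed digest. The sketch below works on in-memory bytes to stay self-contained; sha256_hex and reconcile are hypothetical helpers, and signing/verification is not covered.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def reconcile(manifest: dict, blobs: dict) -> list:
    """blobs maps path -> bytes. Return paths that are missing or whose digest differs."""
    mismatched = []
    for art in manifest.get("artifacts", []):
        data = blobs.get(art["path"])
        if data is None or sha256_hex(data) != art.get("sha256"):
            mismatched.append(art["path"])
    return mismatched
```

An empty return value means every listed artifact was present and matched its recorded digest.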
XII. Chapter Compliance Checklist
- Blocking endpoints (load_suite/evaluate/score/significance/uncertainty) implemented with auth, idempotency, and rate limits.
- Citations use “Volume vX.Y:Anchor” and appear in export_manifest.references[]; no shortcodes or versionless refs.
- Metrology checks active (units="SI", check_dim=true); frozen splits/leakage guardrails/scoring–normalization–significance minima pass.
- Publish/revoke follow stability-line governance; OpenAPI/SDK and scores/CI/leaderboard artifacts listed in the export manifest and verifiable.
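The citation rule in the checklist ("Volume vX.Y:Anchor", no versionless refs) lends itself to a mechanical check. The regular expression below is an assumption inferred from the example references in the export manifest, not a pattern defined by this chapter; invalid_references is a hypothetical helper.

```python
import re

# Assumed shape: dotted volume name, space, vMAJOR.MINOR, colon, non-empty anchor.
REFERENCE_RE = re.compile(r"[A-Za-z][\w.]* v\d+\.\d+:\S+")


def invalid_references(references: list) -> list:
    """Return the references that do not match the assumed citation pattern."""
    return [r for r in references if REFERENCE_RE.fullmatch(r) is None]


if __name__ == "__main__":
    refs = [
        "EFT.WP.Core.DataSpec v1.0:EXPORT",
        "EFT.WP.Data.Benchmarks v1.0:Ch.6",
        "EFT.WP.Core.Metrology:check_dim",  # versionless, should be flagged
    ]
    print(invalid_references(refs))
```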
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/