46-EFT.WP.Data.Benchmarks v1.0 | Chapter 16 Implementation Binding & Evaluation API


Chapter 16 Implementation Binding & Evaluation API

I. Chapter Purpose & Scope

Provide evaluation APIs and normative implementation bindings: interface prototypes, request/response envelopes, error codes, auth & idempotency, rate limits, and version negotiation; cover suite loading, task execution, scoring & normalization, significance & uncertainty computation, and leaderboard publish/revoke; align with data contracts, metrology posture, cross-volume anchors, and the export manifest.

II. Service Surface (Normative)

services:
  benchmarks.v1:
    - POST /api/v1/benchmarks/load_suite
    - POST /api/v1/benchmarks/list_tasks
    - POST /api/v1/benchmarks/get_task
    - POST /api/v1/benchmarks/evaluate
    - POST /api/v1/benchmarks/score
    - POST /api/v1/benchmarks/significance
    - POST /api/v1/benchmarks/uncertainty
    - POST /api/v1/benchmarks/robustness
    - POST /api/v1/benchmarks/fairness_ethics
    - POST /api/v1/benchmarks/runtime/metrics
    - POST /api/v1/benchmarks/runtime/lineage
    - POST /api/v1/benchmarks/runtime/replay
    - POST /api/v1/benchmarks/submit
    - POST /api/v1/benchmarks/publish
    - POST /api/v1/benchmarks/revoke
    - POST /api/v1/benchmarks/hash_artifact
    - POST /api/v1/benchmarks/sign_artifact

III. Common Request/Response & Auth

request_envelope:
  headers:
    Authorization: "Bearer <oidc-token> | HMAC <key>:<sig>"
    x-eift-idempotency: "<uuid>"
    content-type: "application/json"
  body:
    suite?: { ... }
    task_id?: "<suite.task>"
    spec?: { ... }
    payload?: {artifacts: [{path, bytes_b64?, sha256?}]}
    options?: {dry_run?: true, strict?: true}
    filters?: {run_id?: "<id>", since?: "<ISO8601>", until?: "<ISO8601>"}
response_envelope:
  status: "ok" | "warn" | "error"
  errors: [{code, message, path?, see?}]
  warnings: [{code, message, path?, see?}]
  metrics: { ... }
  data?: { ... }
  version: "benchmarks.v1"
security:
  auth: "OIDC bearer | HMAC"
  tls: "TLS1.2+"
  scope: ["load","evaluate","metrics","lineage","submit","publish","admin"]
rate_limits:
  per_key_per_minute: 120
  burst: 60
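The envelope above can be illustrated with a small client-side helper. This is a minimal sketch, not a reference client: the function names `build_request` and `parse_response` are hypothetical, only the OIDC bearer variant of the Authorization header is shown, and rejecting unknown body fields is an assumption rather than something the envelope mandates.

```python
import json
import uuid

# Top-level body fields listed in request_envelope.body above.
ALLOWED_BODY_KEYS = {"suite", "task_id", "spec", "payload", "options", "filters"}


def build_request(token, body, idempotency_key=None):
    """Return (headers, encoded_body) for a benchmarks.v1 call (hypothetical helper)."""
    unknown = set(body) - ALLOWED_BODY_KEYS
    if unknown:
        # Assumption: fields outside the envelope are treated as caller errors.
        raise ValueError(f"unknown body fields: {sorted(unknown)}")
    headers = {
        "Authorization": f"Bearer {token}",
        # A fresh UUID is generated when the caller does not supply a key.
        "x-eift-idempotency": idempotency_key or str(uuid.uuid4()),
        "content-type": "application/json",
    }
    return headers, json.dumps(body).encode("utf-8")


def parse_response(raw):
    """Decode a response envelope and raise if its status is "error"."""
    envelope = json.loads(raw)
    status = envelope.get("status")
    if status not in ("ok", "warn", "error"):
        raise ValueError(f"unexpected status: {status!r}")
    if status == "error":
        codes = [issue.get("code") for issue in envelope.get("errors", [])]
        raise RuntimeError(f"request failed: {codes}")
    return envelope
```

Supplying the same `idempotency_key` on a retry keeps the header stable across attempts, which is what makes a retried call recognizable as a duplicate.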

IV. Normative OpenAPI Excerpt

openapi: 3.0.3
info: {title: "EFT Benchmarks API", version: "v1"}
paths:
  /api/v1/benchmarks/load_suite:
    post:
      summary: Validate and load a benchmark suite
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/SuiteEnvelope"}}}}
      responses:
        "200": {description: "Result", content: {"application/json": {schema: {$ref: "#/components/schemas/Result"}}}}
  /api/v1/benchmarks/evaluate:
    post:
      summary: Execute evaluation (offline/online/stream/interactive)
      requestBody: {required: true, content: {"application/json": {schema: {$ref: "#/components/schemas/EvalRequest"}}}}
      responses:
        "200": {description: "Run accepted", content: {"application/json": {schema: {$ref: "#/components/schemas/EvalResult"}}}}
components:
  schemas:
    SuiteEnvelope: {type: object, properties: {suite: {}, options: {type: object}}}
    EvalRequest:
      type: object
      properties:
        task_id: {type: string}
        spec: {type: object}
        options: {type: object, properties: {mode: {type: string, enum: ["sync", "async"]}}}
    EvalResult:
      type: object
      properties:
        run_id: {type: string}
        state: {type: string, enum: ["queued", "running", "succeeded", "failed"]}
        scores: {type: object}
        ci: {type: object}
        artifacts: {type: array, items: {type: object}}
    Result:
      type: object
      properties:
        status: {type: string, enum: [ok, warn, error]}
        errors: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        warnings: {type: array, items: {$ref: "#/components/schemas/Issue"}}
        metrics: {type: object}
        data: {type: object}
    Issue:
      type: object
      properties:
        code: {type: string}
        message: {type: string}
        path: {type: string}
        see: {type: array, items: {type: string}}

V. Endpoint Semantics (Essentials)

/benchmarks/load_suite (blocking): structure/type/regex, anchor dependencies; metrology.units="SI"&check_dim=true; frozen splits & leakage guardrails; scoring/significance/compliance minima.
/benchmarks/evaluate: execute per Chapter 7 (offline/online/stream/interactive); return run_id and artifacts; online supports shadow/canary and guardrails.
/benchmarks/score: aggregate & normalize per Chapter 8; unify metric directions before combining; output score_raw/score_norm and tie_break details.
/benchmarks/significance: significance, power & multiple-comparison correction; output Δ/CI_95/p; link to gates.
/benchmarks/uncertainty: compose uncertainty per Chapter 9 (GUM|linear|montecarlo|bayes) and report coverage intervals; unify dimensions.
/benchmarks/robustness & /fairness_ethics: run Chapter 12/13 items and evaluate against thresholds.
/benchmarks/runtime/*: query runtime perf/energy and lineage/replay consistency.
/benchmarks/submit|publish|revoke: submit, publish/revoke leaderboard entries; follow stability-line and notice policy; revokes produce tombstones and update mirrors/indexes.
/benchmarks/hash_artifact|sign_artifact: compute sha256 and sign/verify; reconcile with export_manifest.artifacts[].
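The /benchmarks/score step of unifying metric directions before combining can be sketched as follows. This is an illustration under stated assumptions, not the Chapter 8 procedure: min-max scaling with caller-supplied bounds `lo`/`hi`, a weighted mean as the aggregation, and the function names are all hypothetical. Only `higher_is_better`, `score_raw`, and `score_norm` come from this chapter.

```python
def to_unit_interval(value, lo, hi, higher_is_better):
    """Scale a raw metric to [0, 1] so that larger always means better."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    x = (value - lo) / (hi - lo)
    x = min(1.0, max(0.0, x))  # clamp values that fall outside the bounds
    # Flip lower-is-better metrics (e.g. latency) so directions agree.
    return x if higher_is_better else 1.0 - x


def combine_scores(metrics, weights):
    """Return score_raw (per-metric values) and score_norm (weighted mean)."""
    total = sum(weights[m["name"]] for m in metrics)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    score_raw = {m["name"]: m["value"] for m in metrics}
    score_norm = sum(
        weights[m["name"]]
        * to_unit_interval(m["value"], m["lo"], m["hi"], m["higher_is_better"])
        for m in metrics
    ) / total
    return {"score_raw": score_raw, "score_norm": score_norm}


example = combine_scores(
    [
        {"name": "accuracy", "value": 0.9, "lo": 0.0, "hi": 1.0, "higher_is_better": True},
        {"name": "latency_ms", "value": 200.0, "lo": 0.0, "hi": 1000.0, "higher_is_better": False},
    ],
    {"accuracy": 1.0, "latency_ms": 1.0},
)
```

With equal weights, the accuracy term contributes 0.9 and the flipped latency term 0.8, so `example["score_norm"]` comes out at approximately 0.85.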

VI. Error Codes (Normative)

errors:
  - {code: "ESCHEMA001", message: "suite schema violation", path: "$.suite"}
  - {code: "EREF001", message: "invalid reference format", path: "$.export_manifest.references[*]"}
  - {code: "EDIM001", message: "units must be SI and check_dim", path: "$.metrology"}
  - {code: "ESPLIT001", message: "splits must be frozen and frozen indices enabled", path: "$.tasks[*].splits"}
  - {code: "ELEAK000", message: "cross-split leakage detected", path: "$.tasks[*].leakage_guard"}
  - {code: "EPROTO001", message: "protocol mode invalid", path: "$.tasks[*].protocol.mode"}
  - {code: "EMETRIC001", message: "metric missing family/unit/higher_is_better", path: "$.tasks[*].metrics[*]"}
  - {code: "ESIG001", message: "significance params incomplete", path: "$.tasks[*].significance"}
  - {code: "EPUB001", message: "publish gate not met", path: "$.scoring.stability"}
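One way a server might keep these codes consistent with the response envelope is a small catalog lookup. The sketch below is illustrative only: `ERROR_CATALOG`, `make_issue`, and `error_response` are hypothetical names, and the choice to default `path` to the catalog value is an assumption. The codes, messages, and paths are copied from the list above.

```python
# Code -> (message, default JSONPath), copied from the normative list above.
ERROR_CATALOG = {
    "ESCHEMA001": ("suite schema violation", "$.suite"),
    "EREF001": ("invalid reference format", "$.export_manifest.references[*]"),
    "EDIM001": ("units must be SI and check_dim", "$.metrology"),
    "ESPLIT001": ("splits must be frozen and frozen indices enabled", "$.tasks[*].splits"),
    "ELEAK000": ("cross-split leakage detected", "$.tasks[*].leakage_guard"),
    "EPROTO001": ("protocol mode invalid", "$.tasks[*].protocol.mode"),
    "EMETRIC001": ("metric missing family/unit/higher_is_better", "$.tasks[*].metrics[*]"),
    "ESIG001": ("significance params incomplete", "$.tasks[*].significance"),
    "EPUB001": ("publish gate not met", "$.scoring.stability"),
}


def make_issue(code, path=None, see=None):
    """Build an Issue object shaped like the items of errors[] / warnings[]."""
    if code not in ERROR_CATALOG:
        raise KeyError(f"unknown error code: {code}")
    message, default_path = ERROR_CATALOG[code]
    issue = {"code": code, "message": message, "path": path or default_path}
    if see:
        issue["see"] = list(see)
    return issue


def error_response(issues):
    """Wrap issues in a response envelope with status "error"."""
    return {
        "status": "error",
        "errors": list(issues),
        "warnings": [],
        "metrics": {},
        "version": "benchmarks.v1",
    }
```

A caller can pass a concrete path such as `"$.tasks[0].splits"` to narrow the wildcard in the catalog entry to the offending element.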

VII. Idempotency, Versioning & Compatibility

idempotency:
  header: "x-eift-idempotency"
  window_hours: 24
versioning:
  api: "benchmarks.v1" # breaking change → bump MAJOR
  minor: "backward-compatible additions"
compatibility:
  request_backward: "minor+patch"
  response_fields: "additive only; no removals"
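The 24-hour idempotency window can be pictured with an in-memory store keyed by the `x-eift-idempotency` value. This is a minimal sketch under assumptions: the class name `IdempotencyStore` is hypothetical, replaying the stored response for a duplicate key (rather than rejecting it) is an assumed behavior, and a real service would need shared, persistent storage.

```python
import time


class IdempotencyStore:
    """Replay the first result for a key while it is inside the window."""

    def __init__(self, window_hours=24, clock=time.monotonic):
        self._ttl = window_hours * 3600  # window length in seconds
        self._clock = clock              # injectable for testing
        self._seen = {}                  # key -> (first_seen_time, result)

    def run_once(self, key, handler):
        now = self._clock()
        # Drop entries whose window has elapsed.
        self._seen = {
            k: v for k, v in self._seen.items() if now - v[0] < self._ttl
        }
        if key in self._seen:
            return self._seen[key][1]    # duplicate: return the stored result
        result = handler()               # first sighting: do the work
        self._seen[key] = (now, result)
        return result
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass.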

VIII. Security, Audit & Compliance

Auth: OIDC/HMAC; Transport: TLS1.2+; Least privilege by scope.
Audit: record request_id, idempotency_key, caller, timestamps, summary; logs feed the compliance module and appear in the export manifest.
Compliance: regional limits and data-subject rights per Chapter 14; publish/revoke per stability-line governance.

IX. Machine-Readable Implementation Snippets (Ixx-? Prototypes)

# Postponed annotation evaluation lets `str|None` and `list[dict]` parse on Python < 3.10.
from __future__ import annotations

def load_suite(suite: dict) -> dict: ...
def list_tasks(suite_id: str) -> dict: ...
def get_task(suite_id: str, task_id: str) -> dict: ...
def evaluate(task_id: str, spec: dict, mode: str = "async") -> dict: ...
def score(results: dict, aggregation: dict, normalization: dict) -> dict: ...
def significance(a: dict, b: dict, method: str = "bootstrap", B: int = 10000) -> dict: ...
def uncertainty(model: str, components: list[dict], policy: dict) -> dict: ...
def robustness(spec: dict) -> dict: ...
def fairness_ethics(spec: dict) -> dict: ...
def runtime_metrics(run_id: str, since: str|None=None, until: str|None=None) -> dict: ...
def lineage(spec: dict|None=None, run_id: str|None=None) -> dict: ...
def replay(run_id: str, policy: str="strict") -> dict: ...
def hash_artifact(path: str|bytes) -> dict: ...
def sign_artifact(path: str|bytes, key_id: str) -> dict: ...
def submit(payload: dict) -> dict: ...
def publish(entry: dict) -> dict: ...
def revoke(tag: str, reason: str) -> dict: ...
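As an illustration of what the `significance` prototype with `method="bootstrap"` might compute, the sketch below runs a paired percentile bootstrap over per-example scores. It is a sketch under assumptions: the function name `paired_bootstrap`, the use of plain score lists instead of the prototype's dict arguments, the seed parameter, and the two-sided p-value convention are all hypothetical choices, and power analysis and multiple-comparison correction are left out.

```python
import random


def paired_bootstrap(a, b, B=10000, seed=0):
    """Estimate delta, a 95% percentile interval, and a two-sided p-value.

    a and b are per-example scores of two systems on the same examples.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)  # seeded for reproducible resampling
    # Resample the paired differences with replacement, B times.
    boot = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(B))
    lo = boot[int(0.025 * (B - 1))]
    hi = boot[int(0.975 * (B - 1))]
    # Two-sided p: twice the smaller tail mass at or beyond zero, capped at 1.
    le = sum(1 for d in boot if d <= 0) / B
    ge = sum(1 for d in boot if d >= 0) / B
    p = min(1.0, 2 * min(le, ge))
    return {"delta": sum(diffs) / n, "ci_95": [lo, hi], "p": p}
```

When every paired difference has the same sign, no resample can cross zero, so this convention reports `p` as 0.0; a production implementation would more likely report a floor such as 1/B.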

X. Example Invocations (Ready-to-use)

curl -s -X POST https://api.eift.org/api/v1/benchmarks/load_suite \
  -H "Authorization: Bearer <token>" \
  -H "x-eift-idempotency: 7b7a0b1e-0a21-4f3f-9d0b-3b1e9b1f3c22" \
  -H "Content-Type: application/json" \
  -d @benchmark.json

curl -s -X POST https://api.eift.org/api/v1/benchmarks/evaluate \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"task_id":"cls.binary","spec":{...},"options":{"mode":"async"}}'

curl -s -X POST https://api.eift.org/api/v1/benchmarks/score \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @scores.json

curl -s -X POST https://api.eift.org/api/v1/benchmarks/significance \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d @pair.json

XI. Coupling with Export Manifest (Normative)

export_manifest:
  artifacts:
    - {path: "api/openapi.yaml", sha256: "..."}
    - {path: "api/clients/python.tar.gz", sha256: "..."}
    - {path: "runs/RUN-123/scores.json", sha256: "..."}
    - {path: "runs/RUN-123/ci.json", sha256: "..."}
    - {path: "runs/RUN-123/leaderboard.csv", sha256: "..."}
  references:
    - "EFT.WP.Core.DataSpec v1.0:EXPORT"
    - "EFT.WP.Core.Metrology v1.0:check_dim"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.6"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.8"
    - "EFT.WP.Data.Benchmarks v1.0:Ch.9"
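Reconciling artifacts against `export_manifest.artifacts[]` amounts to recomputing each sha256 and comparing it with the listed digest. The sketch below shows that check using Python's standard `hashlib`; the function names and the in-memory `blobs` mapping (path to bytes) are hypothetical, and signature verification is not covered.

```python
import hashlib


def sha256_hex(data):
    """Return the lowercase hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def reconcile(manifest, blobs):
    """Compare manifest digests with recomputed ones.

    manifest: dict with an "artifacts" list of {"path", "sha256"} entries.
    blobs: mapping of path -> artifact bytes (an assumption for this sketch).
    """
    report = {"ok": [], "mismatch": [], "missing": []}
    for entry in manifest["artifacts"]:
        path = entry["path"]
        if path not in blobs:
            report["missing"].append(path)
        elif sha256_hex(blobs[path]) == entry["sha256"]:
            report["ok"].append(path)
        else:
            report["mismatch"].append(path)
    return report
```

An export would be considered verifiable only when both the `mismatch` and `missing` lists come back empty.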

XII. Chapter Compliance Checklist

Blocking endpoints (load_suite/evaluate/score/significance/uncertainty) implemented with auth, idempotency, and rate limits.
Citations use “Volume vX.Y:Anchor” and appear in export_manifest.references[]; no shortcodes or versionless refs.
Metrology checks active (units="SI", check_dim=true); frozen splits/leakage guardrails/scoring–normalization–significance minima pass.
Publish/revoke follow stability-line governance; OpenAPI/SDK and scores/CI/leaderboard artifacts listed in the export manifest and verifiable.

Copyright & License: Unless otherwise stated, the copyright of “Energy Filament Theory” (including text, charts, illustrations, symbols, and formulas) is held by the author (屠广林).
License (CC BY 4.0): With attribution to the author and source, you may copy, repost, excerpt, adapt, and redistribute.
Attribution (recommended): Author: 屠广林｜Work: “Energy Filament Theory”｜Source: energyfilament.org｜License: CC BY 4.0
Version info: First published: 2025-11-11 ｜ Current version: v6.0+5.05