06-EFT.WP.Core.DataSpec v1.0
Chapter 9 — Privacy, Security, and Governance
I. Scope and Objectives
- Define a unified approach for data classification, access control, encryption, de-identification, and compliance auditing, yielding executable governance workflows and gating criteria.
- Bind the I60 8 interfaces anonymize / mask_fields / enforce_retention to the manifest extension fields, implementing an end-to-end privacy and security loop: ingest → process → publish → archive/delete.
- Provide specialized governance requirements for cross-volume anchors (e.g., T_arr, gamma(ell), n_eff(x,t)), ensuring performance and traceability without compromising the principle of privacy minimization.
II. Terms, Symbols, and Dependencies
- Data tiers & labels: pii_level ∈ {"P0","P1","P2","P3"}, gov_tag ∈ {"internal","confidential","restricted","public"}.
- Identity & access: uid (subject identifier), role, policy, ABAC (attribute-based), RBAC (role-based).
- Keys & crypto: K_enc (data key), K_wrap (wrapped form of K_enc), IV (initialization vector), AEAD (authenticated encryption with associated data), rotate(K, t) (time-window rotation).
- De-identification & differential privacy: k (k-anonymity), l (l-diversity), t (t-closeness), epsilon, delta (DP budget).
- Retention & deletion: ttl_days, legal_hold ∈ {0,1}, deletion_log.
- Evidence & audit: hash_sha256(blob), signature, Trace = [source -> method -> artifact].
- Manifest extensions: manifest.privacy, manifest.governance, manifest.cryptography, manifest.retention.
III. Governance Postulates (P69-*)
- P69-1 Minimization: collect and retain only fields necessary to achieve stated S/P/M/I objectives; drop redundant identifiers by setting m=0 or serving derived views.
- P69-2 Classification-First: every field_i must declare pii_level and gov_tag at the schema layer; unclassified fields cannot enter production.
- P69-3 Authenticated Identity: all read/write operations must execute under {uid, role, policy} and be fully auditable.
- P69-4 Encryption Always-On: use AEAD in transit and at rest; keep K_enc physically separate from data, manage K_enc via K_wrap, and rotate keys per policy.
- P69-5 Revocable & Deletable: upon ttl_days expiry or valid withdrawal, trigger irreversible deletion and record deletion_log with hash_sha256 attestation.
- P69-6 Two-Form Consistency (Arrival-Time): any T_arr publication must demonstrate that delta_form after de-identification does not grow beyond the threshold tol_Tarr, preserving scientific comparability.
IV. Data Classification and Field-Dictionary Constraints (S69-1)
- Illustrative classification:
- P3 direct identifiers: name, phone, email, gov_id, precise_location (lon, lat at a resolution finer than 100 m), biometric.
- P2 indirect identifiers: device_id, cookie, ip, coarse_location (grid ≥ 1 km), high-precision timestamp.
- P1 sensitive non-identifiers: financial_metric, aggregated health_indicator.
- P0 non-sensitive: environmental_sensor, public_reference.
- New field-entry keys: privacy.pii_level, privacy.sensitivity_note, governance.owner, governance.steward, governance.policy_ref.
- Constraints: P3 fields must specify a mask_strategy and a minimization mapping; timestamp (ts) precision must match pii_level (e.g., for P3, downsample to Delta_t >= 1 min and scrub seconds where needed).
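A minimal sketch of a schema-layer gate for P69-2 and the S69-1 constraints, assuming field entries are flat Python dicts. The keys "gov_tag" and "minimization_map" and the helper names are hypothetical; only privacy.pii_level and the label sets come from the text above.

```python
from datetime import datetime

# Allowed labels from Section II.
PII_LEVELS = {"P0", "P1", "P2", "P3"}
GOV_TAGS = {"internal", "confidential", "restricted", "public"}


def validate_field_entry(name: str, entry: dict) -> list:
    """Return the P69-2 / S69-1 violations for one field entry (empty list if none).

    Assumption: entries are flat dicts; "gov_tag" and "minimization_map" are
    hypothetical key names chosen for this sketch.
    """
    problems = []
    level = entry.get("privacy.pii_level")
    if level not in PII_LEVELS:
        problems.append(f"{name}: missing or invalid privacy.pii_level")
    if entry.get("gov_tag") not in GOV_TAGS:
        problems.append(f"{name}: missing or invalid gov_tag")
    if level == "P3":
        # P3 fields must declare a mask strategy and a minimization mapping.
        if not entry.get("mask_strategy"):
            problems.append(f"{name}: P3 field without mask_strategy")
        if not entry.get("minimization_map"):
            problems.append(f"{name}: P3 field without minimization mapping")
    return problems


def downsample_ts(ts: datetime, pii_level: str) -> datetime:
    """For P3 data, truncate timestamps to whole minutes (Delta_t >= 1 min)."""
    if pii_level == "P3":
        return ts.replace(second=0, microsecond=0)
    return ts


schema = {
    "email": {"privacy.pii_level": "P3", "gov_tag": "restricted",
              "mask_strategy": "hash", "minimization_map": "drop_local_part"},
    "temperature": {"privacy.pii_level": "P0", "gov_tag": "public"},
    "device_id": {"gov_tag": "internal"},  # unclassified: must not reach production
}
violations = [p for name, e in schema.items() for p in validate_field_entry(name, e)]
```

Collecting every violation instead of failing on the first keeps the report useful for the steward who has to fix the schema.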
V. Access Control and Auditing (S69-2)
- Access admission:
- RBAC: role ∈ {"producer","consumer","steward","admin"}.
- ABAC conditions: env ∈ {"prod","staging"}, purpose ∈ {"ops","research","billing"}, pii_level_max.
- Decision function:
allow(uid, action, resource) = evaluate(policy, {role, env, purpose, pii_level(resource)}).
- Audit minimums:
ts, uid, role, action, resource, pii_level, purpose, decision, hash_sha256(manifest), signature.
- Core.Errors alignment: access denial or decrypt failure must log via log_event(E.*, "ERROR", context) with traceback_summary attached.
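A minimal sketch of the decision function combining RBAC and ABAC, assuming an in-memory policy dict. The per-role action grants and pii_level_max values are hypothetical, and role, env, and purpose are passed explicitly here, whereas a real deployment would resolve them from the identity context. Every call emits an audit record carrying the minimum fields listed above.

```python
import hashlib
import json
from datetime import datetime, timezone

PII_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

# Hypothetical policy: RBAC grants per role plus the ABAC attributes from S69-2.
POLICY = {
    "roles": {
        "producer": {"actions": {"read", "write"}, "pii_level_max": "P3"},
        "consumer": {"actions": {"read"}, "pii_level_max": "P1"},
        "steward": {"actions": {"read", "write"}, "pii_level_max": "P2"},
        "admin": {"actions": {"read", "write", "delete"}, "pii_level_max": "P3"},
    },
    "env": {"prod", "staging"},
    "purpose": {"ops", "research", "billing"},
}


def evaluate(policy: dict, action: str, attrs: dict) -> bool:
    grant = policy["roles"].get(attrs["role"])
    if grant is None or action not in grant["actions"]:
        return False  # RBAC: unknown role, or action not granted to the role
    if attrs["env"] not in policy["env"] or attrs["purpose"] not in policy["purpose"]:
        return False  # ABAC: environment or purpose not admitted
    return PII_ORDER[attrs["pii_level"]] <= PII_ORDER[grant["pii_level_max"]]


def allow(uid: str, role: str, action: str, resource: dict, env: str, purpose: str, manifest: dict):
    """Return (decision, audit_record); every call yields an audit record, allow or deny."""
    attrs = {"role": role, "env": env, "purpose": purpose, "pii_level": resource["pii_level"]}
    decision = evaluate(POLICY, action, attrs)
    audit = {
        "ts": datetime.now(timezone.utc).isoformat(), "uid": uid, "role": role,
        "action": action, "resource": resource["id"], "pii_level": resource["pii_level"],
        "purpose": purpose, "decision": "allow" if decision else "deny",
        "manifest_sha256": hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).hexdigest(),
        "signature": None,  # placeholder: signing the record is out of scope here
    }
    return decision, audit


ok, rec = allow("u-17", "consumer", "read", {"id": "ds/weather", "pii_level": "P0"},
                "prod", "research", {"manifest_id": "m-001"})
```

The default is deny: any unknown role, action, environment, or purpose fails closed before the pii_level ceiling is even compared.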
VI. Encryption and Key Management (S69-3)
- In transit: TLS ≥ v1.2.
- At rest: AEAD(K_enc, IV, aad=manifest_id); choose file- or column-level encryption according to pii_level.
- Keys:
- Generation & rotation: K_enc <- rotate(K_enc, t_rotate); for highly sensitive fields, enforce t_rotate <= 30 d.
- Wrapping: K_wrap = KMS_wrap(K_enc); store only K_wrap and aad; never co-locate plaintext K_enc with ciphertext.
- Verification: on each decrypt, verify AEAD tag and record failure-rate baselines.
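The rotation rule and the decrypt-failure baseline can be sketched with the standard library alone. This deliberately does not implement AEAD itself (Python's standard library has no AES-GCM; a vetted library such as the `AESGCM` class in the `cryptography` package would do that). The assumption that "highly sensitive" means P3, and the 90-day default for other tiers, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "highly sensitive" = P3 (30-day cap from S69-3); the 90-day
# default for other tiers is a hypothetical value, not from the spec.
MAX_ROTATE_DAYS = {"P3": 30}
DEFAULT_ROTATE_DAYS = 90


def rotation_due(created: datetime, pii_level: str, now: datetime,
                 rotate_days: Optional[int] = None) -> bool:
    """True when K_enc has reached its rotation window t_rotate."""
    cap = MAX_ROTATE_DAYS.get(pii_level, DEFAULT_ROTATE_DAYS)
    # A per-dataset rotate_days (manifest.cryptography) may tighten the cap but never loosen it.
    days = min(rotate_days, cap) if rotate_days else cap
    return now - created >= timedelta(days=days)


@dataclass
class DecryptStats:
    """Counts AEAD tag-verification outcomes to keep a failure-rate baseline."""
    ok: int = 0
    failed: int = 0

    def record(self, tag_valid: bool) -> None:
        if tag_valid:
            self.ok += 1
        else:
            self.failed += 1

    @property
    def failure_rate(self) -> float:
        total = self.ok + self.failed
        return self.failed / total if total else 0.0


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
stats = DecryptStats()
for valid in (True, True, True, False):
    stats.record(valid)
```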
VII. De-identification and Minimization (S69-4)
- k-anonymity: partition equivalence classes E_j, require min_j |E_j| >= k.
- l-diversity: for sensitive attribute S, distinct(S in E_j) >= l.
- t-closeness (distributional distance):
distance( P_S(E_j), P_S(global) ) <= t, where the recommended distance is JSD or W1 / IQR_global.
- Common strategies:
- Generalization: age -> age_band, lon,lat -> geohash(r).
- Redaction: drop(P3).
- Percentile clipping: winsorize(p_low, p_high) on extreme tails.
- Quality binding: recompute q_score and drift before/after de-identification; record deltas in manifest.privacy.impact.
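A minimal sketch of measuring the three properties on a list of dict rows, assuming categorical sensitive values. It uses base-2 JSD, which lies in [0, 1]; treating that as the "JSD_norm" of Section XV is an assumption. The helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def distribution(values) -> dict:
    counts = Counter(values)
    n = sum(counts.values())
    return {k: c / n for k, c in counts.items()}


def jsd(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(a: dict) -> float:
        # Terms with a[k] == 0 contribute nothing; m[k] > 0 whenever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def privacy_report(rows, quasi_ids, sensitive) -> dict:
    """Observed k, l and t (max JSD to the global distribution) over equivalence classes E_j."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].append(row)
    global_dist = distribution(r[sensitive] for r in rows)
    return {
        "k": min(len(ec) for ec in classes.values()),
        "l": min(len({r[sensitive] for r in ec}) for ec in classes.values()),
        "t": max(jsd(distribution(r[sensitive] for r in ec), global_dist) for ec in classes.values()),
    }


rows = [
    {"age_band": "30-39", "zip3": "100", "dx": "flu"},
    {"age_band": "30-39", "zip3": "100", "dx": "cold"},
    {"age_band": "40-49", "zip3": "101", "dx": "flu"},
    {"age_band": "40-49", "zip3": "101", "dx": "asthma"},
]
report = privacy_report(rows, quasi_ids=["age_band", "zip3"], sensitive="dx")
```

Running the same report before and after generalization gives the deltas that the quality-binding rule asks to record in manifest.privacy.impact.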
VIII. Differential Privacy (S69-5)
- Mechanism ((epsilon, delta)-DP):
For any adjacent datasets D, D' and any output set S,
Pr[M(D) ∈ S] <= exp(epsilon) * Pr[M(D') ∈ S] + delta.
- Sensitivity & noise:
- Laplace (counts/sums): noise ~ Laplace(b), b = sensitivity / epsilon.
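- Gaussian (means/ratios): noise ~ Normal( 0, sigma^2 ), with sigma calibrated from (epsilon, delta) and the L2 sensitivity; the classical bound is sigma >= sqrt(2 ln(1.25/delta)) * sensitivity / epsilon, valid for epsilon < 1.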
- Budget ledger:
epsilon_total = sum epsilon_i (sequential composition upper bound); manage it via manifest.privacy.epsilon_ledger and disallow further releases once the budget is exhausted.
- Disclosure labeling:
All DP outputs must annotate epsilon_used, delta, sensitivity, and effective N_eff.
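A minimal sketch of a Laplace count release charged against a sequential-composition ledger, plus the classical Gaussian calibration. The class and function names are hypothetical. It uses Python's `random` module for readability, which is not a cryptographically secure source, and does not address floating-point attacks on naive Laplace sampling, so it is illustrative only.

```python
import math
import random
from typing import Optional


class EpsilonLedger:
    """Sequential-composition ledger: epsilon_total = sum(epsilon_i) <= budget."""

    def __init__(self, budget: float):
        self.budget = budget
        self.entries = []  # (release_id, epsilon) pairs, mirroring manifest.privacy.epsilon_ledger

    @property
    def epsilon_total(self) -> float:
        return sum(eps for _, eps in self.entries)

    def charge(self, release_id: str, epsilon: float) -> None:
        if self.epsilon_total + epsilon > self.budget:
            raise RuntimeError(f"epsilon budget exhausted; refusing release {release_id}")
        self.entries.append((release_id, epsilon))


def laplace_noise(b: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); scaling by b gives Laplace(b).
    return b * (rng.expovariate(1.0) - rng.expovariate(1.0))


def dp_count(true_count: int, epsilon: float, ledger: EpsilonLedger, release_id: str,
             rng: random.Random, n_eff: Optional[int] = None) -> dict:
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    ledger.charge(release_id, epsilon)  # charge first: no output if the budget is exhausted
    return {
        "value": true_count + laplace_noise(sensitivity / epsilon, rng),
        "epsilon_used": epsilon, "delta": 0.0, "sensitivity": sensitivity, "N_eff": n_eff,
    }


def gaussian_sigma(epsilon: float, delta: float, l2_sensitivity: float) -> float:
    # Classical Gaussian-mechanism calibration, stated for 0 < epsilon < 1.
    if not 0 < epsilon < 1:
        raise ValueError("classical bound requires 0 < epsilon < 1")
    return math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


rng = random.Random(42)  # seeded only so the sketch is reproducible
ledger = EpsilonLedger(budget=3.0)
release = dp_count(1234, epsilon=1.0, ledger=ledger, release_id="q1", rng=rng)
```

Charging the ledger before computing the noisy value means a refused release never produces output, which is what "disallow further releases once exhausted" requires.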
IX. Masking, Hashing, and Tokenization (S69-6)
- mask_fields(ds, fields, mode) recommended modes:
- "hash": hash_sha256( salt || value ); salt ≥ 128 bits and rotated regularly; avoid input domains small enough (e.g., phone numbers, short IDs) for hashes to be reversed by exhaustive enumeration.
- "token": random one-to-one token_id; store mapping in a separately encrypted vault with admin-only access.
- "redact": set to null or m=0; never use dummy placeholders.
- "generalize": bin(field, bins) or quantile_bucket(q).
- "noise": add zero-mean noise to numeric fields; record sigma and applicability.
- Caliber alignment: after masking, update unit/dim and re-run check_dim(expr) to prevent semantic drift.
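A minimal sketch of the five modes applied to a list of dict rows. This is a hypothetical stand-in for the I60 8 mask_fields: the keyword parameters (salt, vault, bins, rng, sigma) are assumptions, the plain dict standing in for the token vault is not encrypted, and manifest updates (field entries, pii_level downgrade, check_dim) are not shown.

```python
import bisect
import hashlib
import random
import secrets


def mask_fields(rows, fields, mode="hash", **params):
    """Apply one S69-6 masking mode to `fields` of a list of dict rows (sketch)."""
    out = []
    for row in rows:
        new = dict(row)  # never mutate the caller's rows
        for f in fields:
            v = new.get(f)
            if v is None:
                continue
            if mode == "hash":
                # salt || value; params["salt"] should be >= 16 random bytes (128 bits).
                new[f] = hashlib.sha256(params["salt"] + str(v).encode()).hexdigest()
            elif mode == "token":
                # One-to-one random token; the vault dict stands in for an encrypted, admin-only store.
                vault = params["vault"]
                if v not in vault:
                    vault[v] = secrets.token_hex(16)
                new[f] = vault[v]
            elif mode == "redact":
                new[f] = None  # null, never a dummy placeholder
            elif mode == "generalize":
                edges = params["bins"]  # sorted lower edges; values below edges[0] are not handled
                i = bisect.bisect_right(edges, v) - 1
                new[f] = f"[{edges[i]}, {edges[i + 1]})" if i + 1 < len(edges) else f">={edges[i]}"
            elif mode == "noise":
                new[f] = v + params["rng"].gauss(0.0, params["sigma"])  # zero-mean; record sigma
            else:
                raise ValueError(f"unknown mode: {mode}")
        out.append(new)
    return out


rows = [{"email": "a@example.org", "age": 37}, {"email": "a@example.org", "age": 70}]
hashed = mask_fields(rows, ["email"], "hash", salt=secrets.token_bytes(16))
banded = mask_fields(rows, ["age"], "generalize", bins=[0, 18, 30, 45, 65])
noisy = mask_fields(rows, ["age"], "noise", rng=random.Random(7), sigma=2.0)
```

Within one salt period, equal inputs hash to equal outputs, so joins still work; rotating the salt deliberately breaks linkage across periods.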
X. Retention, Freeze, and Deletion (S69-7)
- Retention contract: enforce_retention(ds, ttl_days) must honor legal_hold; frozen datasets use freeze_release(tag).
- Deletion workflow (Mx-5):
- Compute and record hash_sha256 fingerprint inventory;
- Disassociate token maps and revoke K_enc;
- Securely erase primaries and replicas, including indexes (idx_k);
- Record deletion_log = {ts, uid, resources, method, hash_before, signature};
- Write completion status under manifest.retention.
- Irreversibility boundary: after P3 deletion, equivalent identifiers must not be recoverable from Trace or from auxiliary data.
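A minimal sketch of the bookkeeping around Mx-5, assuming resources are dicts with id, created, and blob keys. The function signature is hypothetical and differs from enforce_retention(ds, ttl_days) in Section XIV. HMAC-SHA256 stands in for the signature, and the storage-specific steps (unlinking token maps, revoking K_enc, erasing replicas and idx_k) are only marked.

```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone


def enforce_retention(resources, ttl_days, legal_hold, uid, now, signing_key):
    """Sketch of the Mx-5 bookkeeping: returns (expired_ids, deletion_log or None)."""
    if legal_hold:
        return [], None  # legal_hold = 1 suspends deletion
    cutoff = now - timedelta(days=ttl_days)
    expired = [r for r in resources if r["created"] < cutoff]
    if not expired:
        return [], None
    # Step 1: fingerprint inventory taken before anything is erased.
    hash_before = {r["id"]: hashlib.sha256(r["blob"]).hexdigest() for r in expired}
    # Steps 2-3 (unlink token maps, revoke K_enc, erase primaries/replicas/idx_k)
    # depend on the storage stack and are not shown.
    log = {"ts": now.isoformat(), "uid": uid, "resources": sorted(hash_before),
           "method": "secure_erase", "hash_before": hash_before}
    # HMAC-SHA256 stands in for the signature; a real deployment would likely use an asymmetric key.
    payload = json.dumps(log, sort_keys=True).encode()
    log["signature"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return [r["id"] for r in expired], log


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
resources = [
    {"id": "a", "created": datetime(2025, 1, 1, tzinfo=timezone.utc), "blob": b"row-a"},
    {"id": "b", "created": datetime(2025, 5, 20, tzinfo=timezone.utc), "blob": b"row-b"},
]
ids, deletion_log = enforce_retention(resources, ttl_days=90, legal_hold=0,
                                      uid="steward-01", now=now, signing_key=b"demo-key")
```

Signing a canonical (sorted-key) JSON payload lets an auditor recompute the signature from the stored deletion_log alone.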
XI. Governance Roles and Responsibilities (S69-8)
- owner: define purpose and approve collection/retention policies.
- steward: maintain schema, classifications, manifest, and quality gates.
- producer: execute collection, masking, and encryption; implement I60 8 interfaces.
- consumer: access derived views under least privilege.
- auditor: periodically audit policies, logs, and the epsilon_ledger; issue compliance reports.
XII. Privacy Notes for Cross-Volume Anchors (S69-9)
- Path data gamma(ell) and pid are P2 or higher; upon publication, generalize coordinates to geohash precision r <= 6 (cells of roughly 1.2 km × 0.6 km or coarser, consistent with the P2 coarse_location grid) or sparsify ell.
- T_arr publication should prefer intervals and statistics (median, IQR, RMSE) over per-record details; where details are necessary, inject noise under a W1 fidelity bound and record epsilon_used.
- No processing may increase delta_form beyond tol_Tarr; otherwise, roll back the transform or publish aggregates instead.
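A minimal sketch of the two publication aids, under stated assumptions: a standard geohash encoder (a longer string means a finer cell; precision 6 cells are roughly 1.2 km × 0.6 km) and an aggregate T_arr summary. The function names and the optional reference series for RMSE are hypothetical, and no DP noise is added here.

```python
import math
import statistics

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet (no a, i, l, o)


def geohash(lat: float, lon: float, precision: int) -> str:
    """Encode (lat, lon) as a geohash of `precision` characters (5 bits each)."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, nbits, use_lon = [], 0, 0, True  # bits interleave, starting with longitude
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits = bits << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:
            chars.append(_BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


def summarize_tarr(values, reference=None):
    """Aggregate T_arr values into median / IQR (and RMSE against a reference, if given)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    summary = {"n": len(values), "median": statistics.median(values), "iqr": q3 - q1}
    if reference is not None:
        summary["rmse"] = math.sqrt(statistics.fmean((v - r) ** 2 for v, r in zip(values, reference)))
    return summary


cell = geohash(57.64911, 10.40744, 6)  # "u4pruy"
```

Truncating an existing geohash string to fewer characters yields its enclosing coarser cell, so generalization can also be applied after the fact.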
XIII. Manifest Extensions (S69-10)
- manifest.privacy = {pii_map, mask_strategy, epsilon_ledger, dp_method, sensitivity_ref, impact}.
- manifest.governance = {owner, steward, policy_ref, approvals, audit_log_ref}.
- manifest.cryptography = {enc: "AEAD", key_ref, wrap_ref, rotate_days, aad}.
- manifest.retention = {ttl_days, legal_hold, frozen_tags, deletion_log_ref}.
- manifest.access = {rbac_roles, abac_attrs, pii_level_max}.
XIV. Implementation Bindings and Interface Contracts (Aligned with I60 8)
- anonymize(ds:any, policy:dict) -> any
- policy = {k, l, t, strategies: {field -> {mode, params}}, dp?: {epsilon, delta, method}}.
- Returns ds' with manifest.privacy augmented and an impact report.
- mask_fields(ds:any, fields:list[str], mode:str="hash") -> any
- mode ∈ {"hash","token","redact","generalize","noise"}; update field entries and downgrade pii_level.
- enforce_retention(ds:any, ttl_days:int) -> any
- Validate legal_hold, execute Mx-5, and produce deletion_log.
- Contractual assertions (to combine with assert_contract):
k_anonymity_ok >= k_min, epsilon_total <= budget, pii_level(field) <= pii_level_max(role), rotate(K, t) <= t_rotate_max.
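A minimal sketch of an anonymize policy dict in the shape given above, checked against the four contractual assertions. The assert_contract helper here is a hypothetical stand-in (its real signature is not given in this chapter), and the state fields (k_observed, field_levels, rotate_days) are illustrative names.

```python
PII_ORDER = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
MODES = {"hash", "token", "redact", "generalize", "noise"}


class ContractError(AssertionError):
    """Raised when one or more named contract checks fail."""


def assert_contract(checks: dict) -> None:
    # Hypothetical stand-in for the spec's assert_contract: report every failed check at once.
    failed = [name for name, ok in checks.items() if not ok]
    if failed:
        raise ContractError("contract violated: " + ", ".join(failed))


def privacy_contract(policy: dict, state: dict, role_max: dict) -> dict:
    """Evaluate the Section XIV assertions for one release; all inputs are illustrative."""
    return {
        "policy_modes": all(s["mode"] in MODES for s in policy["strategies"].values()),
        "k_anonymity": state["k_observed"] >= policy["k"],
        "epsilon_budget": state["epsilon_total"] <= state["budget"],
        "pii_ceiling": all(PII_ORDER[lvl] <= PII_ORDER[role_max[state["role"]]]
                           for lvl in state["field_levels"].values()),
        "key_rotation": state["rotate_days"] <= state["t_rotate_max"],
    }


policy = {"k": 10, "l": 2, "t": 0.2,
          "strategies": {"age": {"mode": "generalize", "params": {"bins": [0, 18, 30, 45, 65]}},
                         "email": {"mode": "hash", "params": {}}},
          "dp": {"epsilon": 0.5, "delta": 1e-6, "method": "gaussian"}}
state = {"k_observed": 12, "epsilon_total": 2.0, "budget": 3.0, "role": "consumer",
         "field_levels": {"age": "P1", "email": "P1"}, "rotate_days": 30, "t_rotate_max": 30}
assert_contract(privacy_contract(policy, state, role_max={"consumer": "P1"}))  # passes silently
```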
XV. Metrics and Gating Thresholds (Recommendations)
- De-identification sufficiency: min_j |E_j| >= k_min (default k_min=10), l >= 2, t <= 0.2 (using JSD_norm).
- DP budget: epsilon_total <= 3.0 (per-subject annual cap), delta <= 1e-6.
- Crypto operations: decrypt failure rate < 1e-6; zero keys in use past their rotation (expiry) date.
- Access policy: a denial rate above p95_ref * 1.5 is treated as abnormal and triggers an audit.
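A minimal sketch of a release gate using the recommended defaults above. The metric key names are hypothetical, and "key expiry rate = 0" is read here as "no keys in use past their expiry", which is an interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gates:
    """Default thresholds from Section XV (recommendations, tunable per deployment)."""
    k_min: int = 10
    l_min: int = 2
    t_max: float = 0.2          # JSD_norm
    epsilon_cap: float = 3.0    # per-subject annual cap
    delta_max: float = 1e-6
    decrypt_fail_max: float = 1e-6
    denial_factor: float = 1.5  # audit if denial_rate > p95_ref * 1.5


def evaluate_gates(m: dict, g: Gates = Gates()) -> list:
    """Return the names of failed gates; an empty list means the release may proceed."""
    checks = [
        ("k_anonymity", m["k"] >= g.k_min),
        ("l_diversity", m["l"] >= g.l_min),
        ("t_closeness", m["t"] <= g.t_max),
        ("epsilon_total", m["epsilon_total"] <= g.epsilon_cap),
        ("delta", m["delta"] <= g.delta_max),
        ("decrypt_failure_rate", m["decrypt_failure_rate"] < g.decrypt_fail_max),
        ("expired_keys", m["expired_keys"] == 0),
        ("denial_rate", m["denial_rate"] <= g.denial_factor * m["p95_ref"]),
    ]
    return [name for name, ok in checks if not ok]


metrics = {"k": 12, "l": 3, "t": 0.08, "epsilon_total": 2.5, "delta": 1e-7,
           "decrypt_failure_rate": 0.0, "expired_keys": 0,
           "denial_rate": 0.02, "p95_ref": 0.01}
failed = evaluate_gates(metrics)  # ["denial_rate"]: 0.02 > 1.5 * 0.01 triggers an audit
```

A frozen dataclass keeps the shared default thresholds immutable, so one caller cannot silently loosen the gates for everyone.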
XVI. Incident Response and Postmortem (S69-11)
- Severity: Info (policy drift), Warn (possible exfiltration), Error (confirmed breach or policy violation).
- Response flow:
- Immediately freeze_release(tag) and revoke affected K_enc;
- Switch to de-identified views and read-only access;
- Produce incident_report = {ts, scope, fields, pii_level, blast_radius, keys_revoked, epsilon_state};
- Audit & remedy: update policy and harden tests under assert_contract;
- Regression check: re-evaluate quality and privacy impact on D_ref and D_new.
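A minimal sketch of severity classification and the incident_report record, assuming freeze_release(tag) and key revocation are executed by external systems and only their outcome is recorded here. The boolean inputs to classify, the function names, and the reading of blast_radius as a count of affected records are hypothetical.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class Severity(Enum):
    INFO = "Info"    # policy drift
    WARN = "Warn"    # possible exfiltration
    ERROR = "Error"  # confirmed breach or policy violation


def classify(breach_confirmed: bool, policy_violated: bool, exfiltration_suspected: bool) -> Severity:
    if breach_confirmed or policy_violated:
        return Severity.ERROR
    if exfiltration_suspected:
        return Severity.WARN
    return Severity.INFO


@dataclass
class IncidentReport:
    ts: str
    scope: str
    fields: list
    pii_level: str
    blast_radius: int       # assumed: number of affected records or subjects
    keys_revoked: list
    epsilon_state: dict


def open_incident(scope, fields, pii_level, blast_radius, revoked_keys, epsilon_total, budget):
    # freeze_release(tag) and K_enc revocation run elsewhere; this only records their outcome.
    return IncidentReport(
        ts=datetime.now(timezone.utc).isoformat(), scope=scope, fields=list(fields),
        pii_level=pii_level, blast_radius=blast_radius, keys_revoked=list(revoked_keys),
        epsilon_state={"epsilon_total": epsilon_total, "budget": budget,
                       "remaining": max(budget - epsilon_total, 0.0)},
    )


report = open_incident("ds/clinic_visits@v3", ["email", "device_id"], "P3",
                       blast_radius=1200, revoked_keys=["kenc-2025-05"],
                       epsilon_total=2.5, budget=3.0)
record = asdict(report)
```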
XVII. Interfaces with Neighboring Chapters
- With Chapter 6: index keys for high-tier fields should use de-identified keys (e.g., token_id); avoid reversible composite indexes.
- With Chapter 7: any schema change that affects privacy semantics requires a major version bump (major+1) and must update manifest.privacy and the migration scripts.
- With Chapter 8: recompute q_score and drift pre/post privacy transforms to avoid degrading utility via excessive noise.
XVIII. Executive Summary
Any dataset D entering production must:
- Complete pii_level / gov_tag classification and minimization;
- Enforce AEAD encryption and key rotation;
- Pass anonymize / mask_fields and contract assertions;
- Establish retention and deletion_log;
- Persist full governance metadata and audit trails in the manifest.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/