09-EFT.WP.Core.Density v1.0
Chapter 7 — Discretization and Histograms
I. Objectives and Scope
- Establish a unified notation, normalization, and conservation convention for histograms and voxelization across this volume, distinguishing discretized representations of probability density p(x) versus physical density rho(x).
- Provide density estimators for equal-width and variable-width bins, weighted samples, and 1D/2D/3D settings; define conservation checks and regridding (refine/coarsen) via workflow Mx-96.
- Align sampling assumptions with Core.Sea Chapters 2/4/5 (sampling, anti-alias H(f), and S_xx(f)), and enforce Jacobian-bearing variable transforms per this volume’s Chapter 9.
II. Notation and Conventions
- Data and samples: sample count N, samples x_i, nonnegative weights w_i ≥ 0 (default w_i = 1 if omitted).
- Edges and bins: edges = {e_0, ..., e_K}, bin j is [e_j, e_{j+1}) (the last bin may be closed); width Delta_j = e_{j+1} - e_j; equal-width bins use Delta.
- Counts and voxels: count_j for bin counts; voxel volumes V_i; physical density grid values rho_i.
- Missing mask: m_i ∈ {0,1}; only m_i = 1 contribute to sample counts and mass.
III. One-Dimensional Histogram Density Estimation (Probability View)
- Equal-width histogram density:
S92-10 : p_hat(x ∈ bin j) = count_j / ( N * Delta ).
- Variable-width histogram density:
S92-18 : p_hat(x ∈ bin j) = count_j / ( N * Delta_j ).
- Weighted samples (with total weight W = ∑ w_i) in variable-width bins:
S92-19 : p_hat(x ∈ bin j) = ( ∑_{i ∈ bin j} w_i ) / ( W * Delta_j ).
- Normalization check:
∑_{j=0}^{K-1} p_hat_j * Delta_j = 1; otherwise the implementation is incorrect or edge overflow/underflow is unhandled.
- Suggested metadata: {K, edges, Delta_j or Delta, weighted: bool, W, underflow, overflow}.
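The estimators and the normalization check in this section can be sketched as follows. This is a minimal standard-library illustration, not a reference implementation: the function name `hist_density` and the sample values are hypothetical, the terminal bin is treated as closed, and W is assumed to be accumulated over in-range samples only so that the check sums to 1.

```python
from bisect import bisect_right

def hist_density(xs, edges, weights=None):
    """Weighted variable-width histogram density in the form of S92-19."""
    if weights is None:
        weights = [1.0] * len(xs)            # default w_i = 1
    k = len(edges) - 1
    sums = [0.0] * k                         # per-bin weight sums
    for x, w in zip(xs, weights):
        if x < edges[0] or x > edges[-1]:
            continue                         # out-of-range samples are skipped here
        # bisect_right gives left-closed, right-open bins; min() closes the last bin
        j = min(bisect_right(edges, x) - 1, k - 1)
        sums[j] += w
    total = sum(sums)                        # assumption: W over in-range samples
    widths = [edges[j + 1] - edges[j] for j in range(k)]
    p_hat = [s / (total * d) for s, d in zip(sums, widths)]
    return p_hat, widths

xs = [0.1, 0.4, 0.6, 1.5, 2.0, 3.9, 4.0]     # illustrative samples
edges = [0.0, 1.0, 2.0, 4.0]                 # variable-width bins
p_hat, widths = hist_density(xs, edges)
sum_pDelta = sum(p * d for p, d in zip(p_hat, widths))  # normalization check
```

With unit weights the same function reduces to S92-18, and with a constant width to S92-10. In NumPy-based code, `numpy.histogram` with `density=True` would be the usual starting point.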
IV. Multivariate Histograms and Voxelization (Probability and Physical Views)
- 2D probability histogram (equal-width example):
S92-20 : p_hat[j,k] = count_{j,k} / ( N * Delta_x * Delta_y ).
- Physical density voxel conservation:
S92-11 : mass_preserve = ( ∑_i rho_i * V_i ); publish the target M_ref = ( ∫ rho dV ) or a baseline mass for reconciliation.
- Cell mass and piecewise-constant reconstruction:
  - S92-25 : M_c = ( ∫_{cell c} rho(x) dV );
  - S92-26 : rho_tilde(x ∈ cell c) = M_c / V_c.
- Curvilinear coordinates (discrete Jacobian):
rho_u(u) = rho_x( x(u) ) * | det( ∂x/∂u ) |; compute V_i in the target coordinate system.
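As an illustration of S92-11, S92-25, and S92-26 on a curvilinear grid, the sketch below uses spherical shells, where the cell volume comes from integrating the Jacobian r^2 sin(theta) over each shell. The radii, the cell masses, and the helper names `shell_volumes` and `reconstruct` are hypothetical choices made for the example.

```python
import math

def shell_volumes(r_edges):
    """Volumes of spherical shells between consecutive radii."""
    return [4.0 / 3.0 * math.pi * (b**3 - a**3)
            for a, b in zip(r_edges, r_edges[1:])]

def reconstruct(masses, volumes):
    """Piecewise-constant density rho_tilde = M_c / V_c (S92-26)."""
    return [m / v for m, v in zip(masses, volumes)]

r_edges = [0.0, 1.0, 2.0, 4.0]       # illustrative radial edges
masses = [2.0, 5.0, 3.0]             # hypothetical cell masses M_c (S92-25)
volumes = shell_volumes(r_edges)     # V_i in the target (spherical) coordinates
rho = reconstruct(masses, volumes)

mass_preserve = sum(r * v for r, v in zip(rho, volumes))   # S92-11
m_ref = sum(masses)
mass_rel_err = abs(mass_preserve - m_ref) / m_ref
```

Using the radial widths in place of the shell volumes would not reproduce M_ref, which is why V_i has to be computed in the target coordinate system.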
V. Bin-Width Selection Rules and Recommendations
- Freedman–Diaconis (robust to tails):
S92-22 : Delta_FD = 2 * IQR_x / N^(1/3), with IQR_x = Q3 - Q1.
- Scott (asymptotically minimizes integrated MSE for Gaussian data; 3.5 rounds the constant ≈ 3.49):
S92-23 : Delta_Scott = 3.5 * sigma_x / N^(1/3).
- Sturges (rule-of-thumb bin count):
S92-24 : K_Sturges = ceil( log2(N) + 1 ).
- Guidance: use Delta_FD for unknown shapes and moderate N; use Delta_Scott for near-Gaussian data; use Sturges to initialize small samples, then refine via CV(h) or likelihood criteria.
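The three rules can be written directly from S92-22 to S92-24. The sketch below makes two assumptions the text does not fix: quartiles use the "inclusive" (linear interpolation) method of `statistics.quantiles`, and sigma_x is the sample standard deviation (N - 1 denominator). The function names are hypothetical.

```python
import math
import statistics

def delta_fd(xs):
    """Freedman-Diaconis bin width (S92-22)."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return 2.0 * (q3 - q1) / len(xs) ** (1.0 / 3.0)

def delta_scott(xs):
    """Scott bin width (S92-23), using the sample standard deviation."""
    return 3.5 * statistics.stdev(xs) / len(xs) ** (1.0 / 3.0)

def k_sturges(n):
    """Sturges bin count (S92-24)."""
    return math.ceil(math.log2(n) + 1)

xs = [float(i) for i in range(1, 65)]   # 64 evenly spaced illustrative values
widths = {"fd": delta_fd(xs), "scott": delta_scott(xs)}
k = k_sturges(len(xs))                  # log2(64) + 1 = 7
```

A different quantile convention or a population standard deviation shifts the widths slightly, so the chosen convention is worth recording alongside the rule name.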
VI. Boundaries, Overflows, and Missingness
- Boundary convention: default [e_j, e_{j+1}) (left-closed, right-open); the terminal bin may be closed to include the maximum.
- Overflow accounting: record underflow = count( x < e_0 ) and overflow = count( x ≥ e_K ), or count( x > e_K ) when the terminal bin is closed; either map these to extra bins or exclude them, with justification in the report.
- Missing: samples with m_i = 0 do not contribute to N or W; report missing_rate = 1 - ( ∑ m_i / N_raw ).
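- De-trending and scaling: if you apply z = ( x - mu_x ) / sigma_x (see S92-14, Chapter 9), convert back before publishing, using p_x(x) = p_z(z) / sigma_x, so that edges and densities are reported in original units and dimension confusion is avoided.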
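The boundary, overflow, and missingness conventions above can be tallied in a single pass. This is a sketch under stated assumptions: `partition_samples` and its `closed_last` flag are hypothetical names, and a value equal to e_K is kept only when the terminal bin is closed.

```python
def partition_samples(xs, mask, e0, eK, closed_last=True):
    """Split raw samples into in-range values plus underflow/overflow/missing tallies."""
    kept, underflow, overflow, n_valid = [], 0, 0, 0
    for x, m in zip(xs, mask):
        if m == 0:
            continue                      # missing: contributes to neither N nor W
        n_valid += 1
        if x < e0:
            underflow += 1
        elif x > eK or (x == eK and not closed_last):
            overflow += 1
        else:
            kept.append(x)
    report = {
        "N": n_valid,                                  # valid samples (m_i = 1)
        "underflow": underflow,
        "overflow": overflow,
        "missing_rate": 1.0 - n_valid / len(xs),       # 1 - sum(m_i) / N_raw
    }
    return kept, report

xs   = [-0.5, 0.2, 1.0, 2.5, 3.0, 3.0, 7.0, 1.1]       # illustrative raw samples
mask = [1,    1,   0,   1,   1,   1,   1,   0]         # m_i, 0 = missing
kept, report = partition_samples(xs, mask, e0=0.0, eK=3.0)
_, report_open = partition_samples(xs, mask, e0=0.0, eK=3.0, closed_last=False)
```

With the closed terminal bin the two samples at x = 3.0 are kept; with a right-open terminal bin they move into the overflow tally, which is why the convention in use should be published with the counts.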
VII. Error, Bias, and Uncertainty (Histogram Convention)
- Leading-order binning bias (1D, smooth p(x), equal-width), evaluated at the bin center; away from the center a first-order term in p'(x) appears:
S92-27 : bias(x) ≈ ( Delta^2 / 24 ) * p''(x).
- Single-bin variance (binomial counting, variable width):
S92-28 : var( p_hat_j ) ≈ p_j * ( 1 - p_j ) / ( N * Delta_j^2 ), where p_j = ( ∫_{bin j} p(x) dx ).
- Weighted case: replace N by the effective sample size N_eff = W^2 / ∑ w_i^2 (equal to N for unit weights) and interpret p_j as weight-normalized probability mass; state the approximate-independence assumption (per the volume’s “quality loop”).
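A plug-in use of S92-28 for the unweighted case is sketched below: the unknown p_j is replaced by its estimate count_j / N, which is an assumption of this example rather than something the formula prescribes. The function name and the counts are hypothetical.

```python
import math

def bin_std_errors(counts, widths):
    """Per-bin standard error of p_hat_j from S92-28 with p_j ~ count_j / N."""
    n = sum(counts)
    errors = []
    for c, d in zip(counts, widths):
        p_j = c / n                                   # plug-in bin probability
        errors.append(math.sqrt(p_j * (1.0 - p_j) / n) / d)
    return errors

counts = [10, 40, 50]          # illustrative bin counts, N = 100
widths = [1.0, 1.0, 2.0]       # Delta_j
se = bin_std_errors(counts, widths)
```

The wider third bin has the smallest standard error even though its p_j is largest, which reflects the 1 / Delta_j^2 factor in the variance.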
VIII. Conservation and Resampling (Refine / Coarsen)
- Refinement (coarse→fine): keep cell mass invariant; distribute M_c over child cells c' by volume or overlap, ensuring ∑ M_{c'} = M_c.
- Coarsening (fine→coarse): aggregate fine-grid rho_i * V_i into coarse cells and divide by V_c to obtain rho_coarse, preserving S92-11.
- Cross-grid transforms (with coordinate changes): compute overlap volumes or use conservative remapping; ensure | M_target - M_source | / M_source ≤ tol (recommend tol ≤ 1e-6).
- Mass check: report mass_rel_err = | ( ∑ rho_i V_i ) - M_ref | / M_ref; if > tol, flag and roll back.
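The refine and coarsen rules can both be expressed as an overlap-weighted remap. The sketch below is a 1D illustration only, assuming piecewise-constant density and source and target grids that cover the same interval; `remap_conservative` and `total_mass` are hypothetical helper names.

```python
def remap_conservative(src_edges, rho_src, dst_edges):
    """Remap a piecewise-constant 1D density between grids by overlap length."""
    mass_dst = [0.0] * (len(dst_edges) - 1)
    for i, rho in enumerate(rho_src):
        a, b = src_edges[i], src_edges[i + 1]
        for j in range(len(dst_edges) - 1):
            lo, hi = max(a, dst_edges[j]), min(b, dst_edges[j + 1])
            if hi > lo:
                mass_dst[j] += rho * (hi - lo)        # mass carried by the overlap
    # divide the accumulated mass by the target cell size to get density
    return [m / (dst_edges[j + 1] - dst_edges[j]) for j, m in enumerate(mass_dst)]

def total_mass(edges, rho):
    """Sum of rho_i * V_i, with V_i the 1D cell length."""
    return sum(r * (edges[i + 1] - edges[i]) for i, r in enumerate(rho))

fine_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
rho_fine = [1.0, 3.0, 2.0, 6.0]                       # illustrative densities
coarse_edges = [0.0, 2.0, 4.0]

rho_coarse = remap_conservative(fine_edges, rho_fine, coarse_edges)     # coarsen
rho_refined = remap_conservative(coarse_edges, rho_coarse, fine_edges)  # refine

m_ref = total_mass(fine_edges, rho_fine)
mass_rel_err = abs(total_mass(coarse_edges, rho_coarse) - m_ref) / m_ref
tol = 1e-6
ok = mass_rel_err <= tol                              # flag and roll back if False
```

Refining the coarsened grid returns a piecewise-constant field, not the original fine values: total mass is preserved, but sub-cell structure lost in coarsening is not recovered.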
IX. Engineering Workflow Mx-96 (Conservative Refinement/Coarsening)
- Inputs. mode ∈ {"hist-pdf","grid-phys"}; edges or target grid; m_i and w_i; reference mass M_ref (physical view).
- Preprocessing. Drop m_i = 0; handle overflows; optional normalization/de-trending (publish in original units).
- Bin/grid selection. Choose edges via Delta_FD / Delta_Scott / K_Sturges or external policy; compute Delta_j and metadata.
- Statistics and estimation.
  - Probability view: p_hat_j via S92-18/19.
  - Physical view: compute rho_i and M_c / M_ref.
- Conservation checks.
  - Probability: ∑ p_hat_j * Delta_j = 1.
  - Physical: verify S92-11, output mass_rel_err.
- Resampling (optional). Refine/coarsen to target grid with conservative remapping; repeat conservation checks.
- Report and persistence. Emit hist.parquet|nc with suggested fields: edges, centers, Delta_j, counts, p_hat, weighted, W, underflow, overflow, mass_rel_err, qc{}.
- Provenance and cross-volume alignment. Record ts, tau_mono, fmt, chan; if the time axis is aligned by T_arr, attach delta_form (see Core.Sea Chapter 8).
X. Interface Contract (Aligned with I90 4)
- bin_edges(domain:any, rule:str="fd") -> array
rule ∈ {"fd","scott","sturges","fixed"}; return edges and Delta_j (array for variable-width).
- hist_density(data:any, edges:any, normalize:bool=True) -> PdfRef
Input may include weights, mask; suggested output fields:
{"edges":..., "centers":..., "counts":..., "p_hat":..., "Delta":..., "weighted":..., "underflow":..., "overflow":..., "qc":{"sum_pDelta":..., "mass_rel_err":...}}.
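To make the suggested output fields concrete, the sketch below assembles such a record from per-bin counts. It is not the I90 contract itself: `make_record` is a hypothetical helper, a plain dict stands in for PdfRef, normalization uses the in-range total, and mass_rel_err is omitted because it applies to the physical view.

```python
def make_record(edges, counts, underflow=0, overflow=0, weighted=False):
    """Assemble the suggested output fields from per-bin counts (illustrative)."""
    widths = [b - a for a, b in zip(edges, edges[1:])]
    centers = [0.5 * (a + b) for a, b in zip(edges, edges[1:])]
    total = sum(counts)                     # in-range N (or W when weighted)
    p_hat = [c / (total * d) for c, d in zip(counts, widths)]
    return {
        "edges": list(edges),
        "centers": centers,
        "counts": list(counts),
        "p_hat": p_hat,
        "Delta": widths,
        "weighted": weighted,
        "underflow": underflow,
        "overflow": overflow,
        "qc": {"sum_pDelta": sum(p * d for p, d in zip(p_hat, widths))},
    }

record = make_record([0.0, 1.0, 2.0, 4.0], [3, 1, 3], underflow=1, overflow=2)
```

Keeping underflow and overflow as separate fields, rather than folding them into the counts, lets a consumer re-derive either normalization convention from the published record.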
XI. Cross-Volume and Cross-Chapter Consistency
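- Sampling and anti-alias constraints per Core.Sea Chapters 2/4; a histogram is closely related to a kernel density estimator with a rectangular kernel, except that bin positions are fixed by the edges rather than centered on the samples; its bin width Delta plays the role of the KDE bandwidth h (Chapter 4).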
- Variable transforms and normalization follow this volume’s Chapter 9 (S92-14/15); uncertainty propagation ties into Chapter 10.
- If you “spectralize” histograms, use the PSD conventions S92-30..S92-38 from Chapter 6, and publish ENBW_Hz and U_w alongside.
XII. Minimal Publication Manifest (Suggested)
N, K, edges, Delta_j or Delta, counts, p_hat, weighted, W, underflow, overflow, sum_pDelta, mode, unit(x), unit(p_hat), ts, tau_mono, fmt, q_score.
XIII. Chapter Highlights
- Histograms/voxelization center on S92-10/18/19/20 for density estimation and S92-11 for mass conservation.
- Bin selection grounded in S92-22/23/24; publication must include edges, widths, and a normalization check.
- Workflow Mx-96 enforces conservative refine/coarsen through explicit mass checks, with traceability and interoperability via interface I90 4.
Copyright & License (CC BY 4.0)
Copyright: Unless otherwise noted, the copyright of “Energy Filament Theory” (text, charts, illustrations, symbols, and formulas) belongs to the author “Guanglin Tu”.
License: This work is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0). You may copy, redistribute, excerpt, adapt, and share for commercial or non‑commercial purposes with proper attribution.
Suggested attribution: Author: “Guanglin Tu”; Work: “Energy Filament Theory”; Source: energyfilament.org; License: CC BY 4.0.
First published: 2025-11-11 | Current version: v5.1
License link: https://creativecommons.org/licenses/by/4.0/