Grid Guess #88
PaulWAyers
started this conversation in
Ideas
When generating a grid for the grid-based selection, we need to construct an appropriate guess for the number of bins.
If we assume that the data are uniformly distributed (a multi-dimensional Poisson process) and there are `N` data points in the volume of the data, `Vol`, then the density of points is `density = N/Vol`. If we divide into `n_bins` bins of equal size, then the volume of a bin is `v_bin = Vol/n_bins`, so the expected number of points in a bin is `density * v_bin = N/n_bins`. The probability that a bin is empty is, assuming a Poisson process, `p(empty) = exp(-N/n_bins)`.
Choosing n_bins < N/3 would correspond to at most roughly 5% of bins being empty.
Choosing n_bins < N/5 would correspond to at most roughly 0.7% of bins being empty.
Choosing n_bins < N/6 would correspond to at most roughly 0.25% of bins being empty; getting down to 1e-4% (ppm) would take roughly n_bins < N/14.
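To make these thresholds easy to check, here is a minimal sketch that evaluates the Poisson estimate `exp(-N/n_bins)` for a few ratios. The function name `empty_bin_probability` is hypothetical and not part of any existing API; it only illustrates the formula under the uniform-data assumption.

```python
import math


def empty_bin_probability(n_points, n_bins):
    """Poisson estimate of the probability that a single bin holds no points.

    Assumes uniformly distributed data and equal-volume bins, so the
    expected number of points per bin is n_points / n_bins.
    """
    if n_points < 0:
        raise ValueError("n_points must be non-negative")
    if n_bins <= 0:
        raise ValueError("n_bins must be positive")
    return math.exp(-n_points / n_bins)


if __name__ == "__main__":
    n_points = 6000  # illustrative value only
    for ratio in (3, 5, 6):
        n_bins = n_points // ratio
        p = empty_bin_probability(n_points, n_bins)
        print(f"n_bins = N/{ratio}: p(empty) = {p:.2%}")
```

With these ratios the loop prints about 4.98%, 0.67%, and 0.25%. The result depends only on the ratio `N/n_bins`, not on `N` itself.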
I'm not sure what the default should be. One could ask the user to specify the probability of an empty bin or select something else.
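If the user specifies the acceptable probability of an empty bin, the formula can be inverted: `p(empty) = exp(-N/n_bins)` gives `n_bins = N / (-ln p)`. The sketch below is one possible way to do that. The function name `guess_n_bins` and the default of 5% are assumptions for illustration, not a decision about what the default should be.

```python
import math


def guess_n_bins(n_points, p_empty=0.05):
    """Largest bin count whose Poisson empty-bin probability is <= p_empty.

    Hypothetical helper: the 5% default is a placeholder, not a
    recommendation. Always returns at least one bin.
    """
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    if not 0.0 < p_empty < 1.0:
        raise ValueError("p_empty must lie strictly between 0 and 1")
    # Required expected number of points per bin, from p = exp(-N/n_bins).
    points_per_bin = -math.log(p_empty)
    # Round down so the empty-bin probability does not exceed the target.
    return max(1, math.floor(n_points / points_per_bin))


if __name__ == "__main__":
    for p in (0.05, 0.01):
        print(f"p_empty = {p}: n_bins = {guess_n_bins(1000, p)}")
```

Rounding down keeps the estimate on the safe side of the target. For very small `N` the function falls back to a single bin, in which case the target probability may not be achievable.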