
bug: probabilities not integrating to 1. #292

Open
mpilosov opened this issue Jun 15, 2017 · 6 comments

Comments

@mpilosov
Contributor

mpilosov commented Jun 15, 2017

I think I have now confirmed that this is a bug (after wrestling with it all day):

Using more than 1 cell per dimension to discretize observed densities can yield input probabilities that do not integrate to 1.

Instead, they integrate to the proportion of bins in the output space that had samples land in them, out of the total number of bins in the output space.


Here's how to reproduce the bug easily and see what I mean:
Go to examples/contaminant/

Change lines 63-65 to

simpleFunP.regular_partition_uniform_distribution_rectangle_scaled(
    data_set=my_discretization,
    Q_ref=Q_ref,
    rect_scale=1.0,
    cells_per_dimension=2)

So that the observed density covers the whole data space and we discretize it with more than 1 cell per dimension. Every input sample will be assigned positive probability; you can check this with print(sum(my_discretization._input_sample_set._probabilities > 0)) and print(my_discretization._input_sample_set._check_num).

Now, launch IPython and %cpaste the first 75 lines of contaminant.py (everything up until the plotting).
Finally, run
print(sum(my_discretization._input_sample_set._probabilities))
and you will get 0.9375 instead of 1 (which is what you get if you run the first 75 lines without the aforementioned change).

In fact, you'll get this same quantity even if you leave rect_scale=0.25 unchanged and only change cells_per_dimension=1 to cells_per_dimension=2 (change it to 3, 4, … and the sum of the probabilities keeps decreasing).

I changed rect_scale=1 just so that I knew I would capture the entire data space, but that is not important for reproducing the bug. (This is a bug, right, and not a feature? Was I wrong to assume the probabilities are supposed to integrate to 1?)


I proposed a fix in the comments of issue #288 and, with your blessing(s), can go ahead and implement this change (feel free to assign it to me) and submit a pull request.

It's not an elegant fix, so I want to make sure there are no objections first before going forward with it. I'm fairly sure (but could use confirmation) this change would need to apply to the other functions in calculateP as well.

Alternatively, we can let the probabilities not integrate to 1 (perhaps this was intentional?) and just say that it's the probability measure up to a constant of proportionality, and let the user scale it themselves with something like
my_discretization._input_sample_set._probabilities /= sum(my_discretization._input_sample_set._probabilities)

Thoughts? This issue consumed most of my workday today. I hope I actually caught a bug and am not just mistaken in my assumption.
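For readers without the contaminant example handy, here is a self-contained toy reproduction of the mechanism. Everything in it (the 1-D model, sample values, and cell count) is made up for illustration and uses plain NumPy rather than BET; it just mimics putting uniform probability on output cells and pushing it back onto the samples:

```python
import numpy as np

# Hypothetical toy model (not from BET): 1-D input mapped through q(x) = x**2.
# The sample values are chosen deliberately so no output exceeds 0.49.
samples = np.array([0.1, 0.2, 0.4, 0.6, 0.7])
outputs = samples**2

# Discretize the output space [0, 1] into 4 cells and put uniform
# probability 1/4 on each cell, mimicking cells_per_dimension > 1.
edges = np.linspace(0.0, 1.0, 5)
cell_prob = 0.25

# Assign each input sample the probability of the output cell it lands in,
# split evenly among the samples in that cell (set-valued inversion sketch).
cell_idx = np.clip(np.digitize(outputs, edges) - 1, 0, 3)
probs = np.zeros_like(samples)
for c in range(4):
    in_cell = cell_idx == c
    n = in_cell.sum()
    if n > 0:
        probs[in_cell] = cell_prob / n

# The two cells above 0.5 receive no samples, so their 0.25 of probability
# each is silently dropped and the probabilities sum to about 0.5, not 1.
print(probs.sum())
```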

@mathematicalmichael
Collaborator

@smattis @eecsu this appears to still be a thing.
Can we please address it so I can close the issue?

@smattis
Contributor

smattis commented May 20, 2019

@mathematicalmichael I know this is really old, but I know exactly what is happening here. When no input samples map under the forward map to a cell in the output probability sample set, the probability of that cell is not distributed to the parameter space. If the probabilities sum to something less than one, it really indicates that you are not sampling (or emulating) enough in the input space for the given output probability sample set. In this case, where the dimension of the output space is 4, changing the number of cells per dimension from 1 to 2 changes the relative volume of the cells from 1.0 to (0.5)^4 = 0.0625. In the case above, one of those 16 cells is not getting mapped to, so its probability is not being distributed.
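The arithmetic behind those numbers can be checked in a few lines (a quick sketch, nothing BET-specific):

```python
# Output dimension 4 with cells_per_dimension = 2 gives 2**4 = 16 cells,
# each with relative volume (0.5)**4 = 0.0625 and, for a uniform observed
# density, probability 1/16. One unmapped cell drops 1/16 of the mass,
# leaving 15/16 = 0.9375 -- exactly the sum reported in the issue.
dim = 4
cells_per_dim = 2
n_cells = cells_per_dim**dim
cell_volume = (1.0 / cells_per_dim)**dim
remaining_mass = 1.0 - 1.0 / n_cells
print(n_cells, cell_volume, remaining_mass)
```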

We should probably keep it how it is, but raise a warning that you are undersampling if the probabilities don't sum to 1.
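A minimal sketch of such a check (the helper name and its placement are hypothetical, not current BET API; it could be called at the end of the calculateP routines):

```python
import warnings

import numpy as np


def warn_if_undersampled(probabilities, tol=1e-8):
    """Hypothetical helper: warn when probability mass has been lost."""
    total = float(np.sum(probabilities))
    if total < 1.0 - tol:
        warnings.warn(
            "Probabilities sum to {:g} < 1; the input space is likely "
            "undersampled for the given output probability sample set. "
            "Consider adding samples or reducing "
            "cells_per_dimension.".format(total)
        )
    return total
```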

@smattis
Contributor

smattis commented May 20, 2019

This also shows that with simpleFunP.regular_partition_uniform_distribution_rectangle_scaled, you probably shouldn't use cells_per_dimension > 1 if the dimension of the output space is bigger than 2 or 3.

@mathematicalmichael
Collaborator

@smattis thanks for the explanation. Dimensionality certainly can hit us, eh? =)
I think raising a warning would be a great idea. I came across a few instances in my new branches where similar nuances called for warnings and it really helped me later on when I was using the code to run some experiments. I think WARN is appropriate here?

Perhaps additionally adding a LOG or WARN to address cells_per_dimension > 1, something like "Volume of cells specified may require additional samples. Be aware that probabilities may not sum to unity." and then reinforced with "Probabilities do not integrate to unity. This is likely due to under-sampling in high dimension. Consider adding samples using merge."
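That two-stage flow could be sketched as follows (the function name, logger name, and placement are hypothetical; only the two messages come from the proposal above):

```python
import logging
import warnings

log = logging.getLogger("bet.calculateP")


def check_partition(probabilities, cells_per_dimension, tol=1e-8):
    """Hypothetical wrapper sketching the proposed LOG + WARN flow."""
    if cells_per_dimension > 1:
        # Early heads-up when the requested partition is fine-grained.
        log.warning(
            "Volume of cells specified may require additional samples. "
            "Be aware that probabilities may not sum to unity."
        )
    total = float(sum(probabilities))
    if total < 1.0 - tol:
        # Reinforced afterwards, when mass was actually lost.
        warnings.warn(
            "Probabilities do not integrate to unity. This is likely due "
            "to under-sampling in high dimension. Consider adding samples."
        )
    return total
```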

@eecsu
Contributor

eecsu commented May 20, 2019

This is basically a violation of a discrete/set-valued form of the "predictability assumption." It is roughly equivalent to the expected value of r not being near 1 in the sample-based density approach. We should probably think of a way to phrase that into the warning, suggesting that the user may want to use a different solution approach (e.g., the sampling-based approach instead of the set-valued approach) to avoid this issue.

@mathematicalmichael
Collaborator

mathematicalmichael commented May 20, 2019

@eecsu any thoughts on the phrasing of what I put up? Keep all/part? Or replace entirely with suggestion to use sampling approach (once that gets merged)? I think it’s helpful to explain some context behind why we raise a warning. Can discuss in this thread.
