[BUG] Sum of sliced histogram #621

Closed
Nollde opened this issue Jul 27, 2021 · 2 comments

Nollde commented Jul 27, 2021

Describe the bug

When I slice a histogram and then sum over the sliced axis, the values that were "sliced away" still enter the sum.
I am not sure whether this is a bug or expected behaviour.
If it is expected, I would greatly appreciate it if you could tell me how this operation should be done properly.

Issue may be linked to #281.

Steps to reproduce


In [1]: import boost_histogram as bh

In [2]: bh.__version__
Out[2]: '1.0.2'

In [3]: hist = bh.Histogram(
   ...:     bh.axis.StrCategory(["a", "b"]),
   ...:     bh.axis.Regular(1, 0, 2),
   ...:     storage=bh.storage.Int64(),
   ...: )

In [4]: hist.fill(["a", "a", "a", "b", "b"], [1, 1, 3, 1, 1])
Out[4]:
Histogram(
  StrCategory(['a', 'b']),
  Regular(1, 0, 2),
  storage=Int64()) # Sum: 4.0 (5.0 with flow)

In [5]: hist[:1, :]
Out[5]:
Histogram(
  StrCategory(['a']),
  Regular(1, 0, 2),
  storage=Int64()) # Sum: 2.0 (5.0 with flow)

In [6]: hist[:1, :][::sum,:]
Out[6]: Histogram(Regular(1, 0, 2), storage=Int64()) # Sum: 4.0 (5.0 with flow)  # sum includes values from bin "b"

In [7]: hist[:1, :].view()
Out[7]: array([[2]], dtype=uint64)

In [8]: hist[:1, :][::sum,:].view()
Out[8]: array([4], dtype=uint64)  # sum includes values from bin "b"

henryiii (Member) commented

This is the expected behavior. Leaving an endpoint off will include underflow or overflow; you should think of no endpoint as "infinite". If you don't want this behavior, make the endpoints explicit (such as [0:len:sum]), or use an axis that does not have flow bins (underflow=False, overflow=False when you make the axis). For a category axis, there is only an overflow bin, and we don't have a no flow version (IIRC), so you should use the first suggestion.

Nollde commented Aug 19, 2021

Hi @henryiii, thanks for your explanation.
As I cannot easily change the code which does the sum, I circumvented the problem by using two histograms.
Thank you for your help.
Closing the issue as this appears to be the expected behavior.
