[FEATURE] Add a function to integrate axes (including over partial ranges) #501

Closed
Dominic-Stafford opened this issue Jun 7, 2023 · 1 comment · Fixed by #505
Labels: enhancement (New feature or request)

Comments

Dominic-Stafford commented Jun 7, 2023

It would be useful to have an integrate function, which could be used to do the following:

  1. Remove a single axis from a histogram, reducing its dimension by 1: h.integrate("y")
  2. Integrate over a range of an axis: h.integrate("y", i, j)
  3. Sum certain entries from a category axis: h.integrate("y", ["cats", "dogs"])

All of these are currently possible, but the syntax is unclear and there are a number of pitfalls:

  1. This can be achieved fairly easily with h[{"y": sum}] or h[{"y": slice(None, None, sum)}], though a dedicated method would be nice for completeness.
  2. This can be achieved with h[{"y": slice(i, j, sum)}]. However, the more obvious h[:, i:j][{"y": sum}] gives the wrong result, since sum includes the overflow, as noted in "[BUG] Sum of sliced histogram" (boost-histogram#621).
  3. For this, the corresponding h[{"y": ["cats", "dogs"]}][{"y": sum}] almost works, since with this slice the other categories don't seem to be added to the overflow. However, if the overflow already contains entries, these are added to the sum, so seemingly the only way to get the correct result is to do the sum by hand, h[{"y": "cats"}] + h[{"y": "dogs"}], which could quickly become laborious. (It could also be done as h[{"y": ["cats", "dogs"]}][{"y": slice(0, len, sum)}].)
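
The overflow pitfall in item 2 can be illustrated with a toy model in plain Python. This is only a sketch, not boost-histogram code: the flat list layout and the helper names `slice_keeping_flow`, `sum_all` and `sum_range` are made up for illustration, under the assumption that slicing folds the removed bins into the flow bins and that a bare `sum` includes them.

```python
# Toy model of a 1D axis with flow bins; not the boost-histogram API.
# Layout assumed here: [underflow, bin 0, ..., bin N-1, overflow].

def slice_keeping_flow(counts, i, j):
    """Mimic h[i:j]: bins outside [i, j) are folded into the flow bins."""
    under, bins, over = counts[0], counts[1:-1], counts[-1]
    return [under + sum(bins[:i]), *bins[i:j], sum(bins[j:]) + over]


def sum_all(counts):
    """Mimic a bare `sum`: every bin is added, flow bins included."""
    return sum(counts)


def sum_range(counts, i, j):
    """Mimic slice(i, j, sum): only the regular bins in [i, j) are added."""
    return sum(counts[1:-1][i:j])


counts = [0, 1, 2, 4, 8, 0]  # four regular bins, empty flow bins

sliced = slice_keeping_flow(counts, 1, 3)
two_step = sum_all(sliced)           # slice first, then sum
one_step = sum_range(counts, 1, 3)   # sum restricted to the range
```

In this model the two-step version returns the total of the whole axis, because the bins cut away by the slice come back in through the flow bins, while the one-step version returns only the contents of bins 1 and 2.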

Related to this, it would be helpful if one could specify whether to include the overflows when projecting out axes with the project method. If adding a new function is not desired, this would at least make some of the workarounds easier.
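
As a rough picture of what such an option might mean, here is a toy projection over nested lists. The function `project_out_y` and its `flow` parameter are hypothetical and are not part of the existing project method; the row layout is likewise an assumption made for the sketch.

```python
# Toy 2D histogram as nested lists; not the hist or boost-histogram API.
# Each row is one x bin; within a row the y layout is assumed to be
# [underflow, bin 0, ..., bin N-1, overflow].

def project_out_y(grid, flow=True):
    """Sum over y for every x bin, optionally skipping the y flow bins."""
    if flow:
        return [sum(row) for row in grid]
    return [sum(row[1:-1]) for row in grid]


grid = [
    [1, 10, 20, 2],  # x bin 0
    [0, 30, 40, 5],  # x bin 1
]

with_flow = project_out_y(grid)
without_flow = project_out_y(grid, flow=False)
```

With the flag off, the entries sitting in the y underflow and overflow no longer leak into the projected x bins.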

Dominic-Stafford added the enhancement label Jun 7, 2023
henryiii (Member) commented Jun 9, 2023

@fabriceMUKARAGE, here is a rough draft of what the method of BaseHist would look like.

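# Loc is int | str | ...
def integrate(self, name: int | str, i_or_list: Loc | list[str | int] | None = None, j: Loc | None = None) -> Self:
    # A list picks out categories; sum only the regular bins so the overflow is left out
    if isinstance(i_or_list, list):
        return self[{name: i_or_list}][{name: slice(0, len, sum)}]

    # Otherwise sum over the range i..j in one step (the whole axis if both are None)
    return self[{name: slice(i_or_list, j, sum)}]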
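
To see which index expressions each call form of the draft turns into, the dispatch logic can be exercised without a histogram. This is a minimal sketch: `IndexRecorder` is a hypothetical stand-in that only records what it is indexed with, and `integrate` is written here as a free function rather than as a method of BaseHist.

```python
# Standalone sketch of the draft's dispatch logic. `IndexRecorder` is a
# hypothetical stand-in for a histogram, used only to capture the index
# expressions produced by each call form.

class IndexRecorder:
    def __init__(self, seen=()):
        self.seen = tuple(seen)

    def __getitem__(self, index):
        # Return a new recorder so chained indexing is captured in order.
        return IndexRecorder((*self.seen, index))


def integrate(h, name, i_or_list=None, j=None):
    if isinstance(i_or_list, list):
        # Pick the listed categories, then sum the regular bins only.
        return h[{name: i_or_list}][{name: slice(0, len, sum)}]
    # Sum over the (possibly open-ended) range in one step.
    return h[{name: slice(i_or_list, j, sum)}]


whole_axis = integrate(IndexRecorder(), "y").seen
partial = integrate(IndexRecorder(), "y", 2, 5).seen
picked = integrate(IndexRecorder(), "y", ["cats", "dogs"]).seen
```

Note that the whole-axis form ends up as slice(None, None, sum), the same expression as in item 1 of the request, while the list form is the only one that needs two indexing steps.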

Rough draft of tests:

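import hist

def test_integrate_simple_cat():
    # Double storage, so that weighted fills are accepted
    h = hist.new.IntCat([4, 1, 2], name="x").StrCat(["AB", "BCC", "BC"], name="y").Double()
    # Powers of two as weights, so the sum shows which categories were added
    h.fill(4, "AB", weight=1)
    h.fill(4, "BCC", weight=2)
    h.fill(4, "BC", weight=4)
    h.fill(4, "X", weight=8)  # "X" is not on the axis, so it lands in the overflow
    h1 = h.integrate("y", ["AB", "BC"])
    assert h1[4j] == 5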
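
The expected value of 5 can be checked by hand with a toy model of the "y" axis. This is a sketch in plain Python, not hist code: the helpers `pick` and `total` are invented for illustration, and the behaviour they mimic (picking keeps the existing overflow, a bare sum includes it) is taken from the description in the issue.

```python
# Toy model of the "y" category axis from the test; not the hist API.
# `overflow` holds fills whose category is not on the axis (here "X").

def pick(categories, overflow, keep):
    """Mimic h[{"y": keep}]: unlisted categories are dropped, and the
    existing overflow content is carried along unchanged."""
    return {name: categories[name] for name in keep}, overflow


def total(categories, overflow, flow):
    """Sum the category bins, optionally adding the overflow bin."""
    return sum(categories.values()) + (overflow if flow else 0)


categories = {"AB": 1, "BCC": 2, "BC": 4}
overflow = 8  # the fill with "X"

picked, picked_overflow = pick(categories, overflow, ["AB", "BC"])
with_flow = total(picked, picked_overflow, flow=True)      # like a bare `sum`
without_flow = total(picked, picked_overflow, flow=False)  # like slice(0, len, sum)
```

Only the version without the overflow gives 1 + 4 = 5, which is why the draft sums with slice(0, len, sum) after picking the categories.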

henryiii added a commit that referenced this issue Jun 16, 2023
Closes #501.

---------

Signed-off-by: Henry Schreiner <[email protected]>
Co-authored-by: fabriceMUKARAGE <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Henry Schreiner <[email protected]>