[FEATURE] Add a function to integrate axes (including over partial ranges) #501

Dominic-Stafford · 2023-06-07T17:56:18Z

It would be useful to have an integrate function, which could be used to do the following:

1. Remove a single axis from a histogram, reducing its dimension by 1: h.integrate("y")
2. Integrate over a range of an axis: h.integrate("y", i, j)
3. Sum certain entries from a category axis: h.integrate("y", ["cats", "dogs"])

All of these are currently possible, but the syntax is unclear and there are a number of pitfalls:

1. Removing a single axis is reasonably easy with h[{"y": sum}] or h[{"y": slice(None, None, sum)}], though a dedicated method would be nice to add for completeness.
2. Integrating over a range can be achieved with h[{"y": slice(i, j, sum)}]. However, the more obvious h[:, i:j][{"y": sum}] gives the wrong result, since sum includes the overflow, as noted in boost-histogram#621 ("[BUG] Sum of sliced histogram").
3. Summing selected categories with the corresponding h[{"y": ["cats", "dogs"]}][{"y": sum}] almost works, since with this selection the other categories do not seem to be added to the overflow. However, if the overflow already contains entries, they are included in the sum. The obvious way to get the correct result is then to do the sum by hand, h[{"y": "cats"}] + h[{"y": "dogs"}], which could quickly become laborious. (It could also be done as h[{"y": ["cats", "dogs"]}][{"y": slice(0, len, sum)}].)
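To make the overflow pitfall concrete, here is a small pure-Python sketch. It is a toy model, not hist's or boost-histogram's implementation: ToyCategoryCounts and its methods are hypothetical names invented for illustration. It only mimics the behaviour described above, where selecting categories keeps whatever was already in the overflow, so a sum that includes the flow bin over-counts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyCategoryCounts:
    """Toy stand-in for one category axis: in-range bins plus one overflow bin."""

    labels: list[str]
    counts: list[float]
    overflow: float = 0.0

    def fill(self, label: str, weight: float = 1.0) -> None:
        # Unknown labels land in the overflow bin.
        if label in self.labels:
            self.counts[self.labels.index(label)] += weight
        else:
            self.overflow += weight

    def pick(self, wanted: list[str]) -> ToyCategoryCounts:
        # Keep only the requested categories. Dropped categories are discarded,
        # but whatever was already in the overflow is carried along.
        kept = [self.counts[self.labels.index(w)] for w in wanted]
        return ToyCategoryCounts(list(wanted), kept, self.overflow)

    def sum_with_flow(self) -> float:
        # Toy analogue of h[{"y": sum}]: the overflow is included.
        return sum(self.counts) + self.overflow

    def sum_in_range(self) -> float:
        # Toy analogue of h[{"y": slice(0, len, sum)}]: in-range bins only.
        return sum(self.counts)


toy = ToyCategoryCounts(["AB", "BCC", "BC"], [0.0, 0.0, 0.0])
toy.fill("AB", 1)
toy.fill("BCC", 2)
toy.fill("BC", 4)
toy.fill("X", 8)  # not a known category, so it goes to the overflow

picked = toy.pick(["AB", "BC"])
print(picked.sum_with_flow())  # 13.0: the pre-existing overflow entry is included
print(picked.sum_in_range())   # 5.0: only AB and BC
```

In this model the two sums differ by exactly the pre-existing overflow content, which is why the slice(0, len, sum) form is the safer spelling when flow bins may be populated.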

Related to this issue, it would be helpful to be able to specify whether the overflows are included when projecting out axes with the project method. If adding a new function is not desired, this would at least make some of the other workarounds easier.


henryiii · 2023-06-09T15:09:16Z

@fabriceMUKARAGE, here is a rough draft of what the method of BaseHist would look like.

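# Loc is int | str | ...
def integrate(self, name: int | str, i_or_list: Loc | list[str | int] | None = None, j: Loc | None = None) -> Self:
    # A list selects specific categories; summing with slice(0, len, sum)
    # covers only the in-range bins, so the flow bin is left out.
    if isinstance(i_or_list, list):
        return self[{name: i_or_list}][{name: slice(0, len, sum)}]

    # Otherwise sum over the slice i_or_list:j (None for both covers the full axis).
    return self[{name: slice(i_or_list, j, sum)}]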

Rough draft of tests:

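import hist

def test_integrate_simple_cat():
    # Assumption: the third fill value in the draft was meant as a weight,
    # so Double storage is used here because it accepts weighted fills.
    h = hist.new.IntCat([4, 1, 2], name="x").StrCat(["AB", "BCC", "BC"], name="y").Double()
    h.fill(4, "AB", weight=1)
    h.fill(4, "BCC", weight=2)
    h.fill(4, "BC", weight=4)
    h.fill(4, "X", weight=8)  # "X" is not a listed category, so it lands in the overflow
    h1 = h.integrate("y", ["AB", "BC"])
    assert h1[4j] == 5  # 1 (AB) + 4 (BC); BCC and the overflow are excluded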

Closes #501. Signed-off-by: Henry Schreiner. Co-authored-by: fabriceMUKARAGE, pre-commit-ci[bot], Henry Schreiner.

Dominic-Stafford added the enhancement New feature or request label Jun 7, 2023

henryiii mentioned this issue Jun 16, 2023

feat: Add a function to integrate axes #505

Merged

henryiii closed this as completed in #505 Jun 16, 2023
