Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support under-/overflows in YODA2 writer #56

Merged
merged 11 commits into from
Mar 12, 2024

Conversation

20DM
Copy link
Collaborator

@20DM 20DM commented Mar 7, 2024

Add support for under/overflow bins in YODA writer + unit test.

Skips points with infinity edges in YODA1 writer.

Needed for #55

@20DM
Copy link
Collaborator Author

20DM commented Mar 7, 2024

Hi Graeme,

apologies for the delay, I had a few more urgent deadlines squeezing in, but I've finally managed to take a look at this.

This changes set ensures that under/overflow bins are supported with the default YODA writer.

For the YODA1 writer this is less trivial: In Rivet we'd work out the binning from the reference-data objects, so adding two more "dummy points" to the Scatters will likely cause mayhem. In the YODA1 series we would also typically drop the under/overflow bins from the MC histograms when converting them to Scatters, so not adding the overflows to the reference data seems more consistent with that, even though we'd be removing "good data points". This is what I've done in this PR now. Hopefully an additional incentive to upgrade to the Rivet4/YODA series where this is properly supported.

@coveralls
Copy link

coveralls commented Mar 8, 2024

Coverage Status

coverage: 91.841% (+0.1%) from 91.731%
when pulling ac0b9d1 on 20DM:yoda_inf_support
into 1e5b10f on HEPData:main.

Copy link
Member

@GraemeWatt GraemeWatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is great! After I merge this PR, I'll look into making similar changes for the CSV and ROOT writers, but I've also got other things to work on first, so it might take a while.

I made some separate comments about the naming of the new test functions and that it would be good to reduce the size of the test YAML data file. Looking at the YODA2 output, I don't see any explicit labels for the underflow/overflow bins, so how does the YODA code identify them? Does it just count the number of edges (n_e) and values (n_v) and assume that the first and last bins are underflow and overflow if n_v = n_e + 1? Does this mean that the case of an overflow bin without an underflow bin (or vice versa) is not supported (i.e. n_v = n_e)? If so, I think the YODA2 code needs to handle this case in the conversion from YAML, perhaps by just omitting the single underflow/overflow bin as in the YODA1 case or maybe by writing a value of zero for the missing underflow/overflow bin.

hepdata_converter/testsuite/test_yodawriter.py Outdated Show resolved Hide resolved
@20DM
Copy link
Collaborator Author

20DM commented Mar 9, 2024

Looking at the YODA2 output, I don't see any explicit labels for the underflow/overflow bins, so how does the YODA code identify them? Does it just count the number of edges (n_e) and values (n_v) and assume that the first and last bins are underflow and overflow if n_v = n_e + 1? Does this mean that the case of an overflow bin without an underflow bin (or vice versa) is not supported (i.e. n_v = n_e)? If so, I think the YODA2 code needs to handle this case in the conversion from YAML, perhaps by just omitting the single underflow/overflow bin as in the YODA1 case or maybe by writing a value of zero for the missing underflow/overflow bin.

Hi Graeme, continuous axes in YODA2 will always range from -inf to +inf and the list of edges that define the visible range will then partition the infinitely long axis. As a result a continuous axis will always have an under- and overflow bins, irrespective of whether these are needed in the end, so the cases you're concerned about are implicitly supported.

Copy link
Member

@GraemeWatt GraemeWatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to nitpick, but there is still an issue with the test function naming (see separate comments).

Also, I don't think you understood my previous comment, so let me be more explicit with a minimal example. Consider two YAML data files with only two bins:

  1. An underflow bin and a finite bin (0-100):
independent_variables:
- header:
    name: x
  values:
  - low: -.inf
    high: 0.0
  - low: 0.0
    high: 100.0
dependent_variables:
- header:
    name: data
  values:
  - value: 0.1
  - value: 0.2
  1. A finite bin (0-100) and an overflow bin.
independent_variables:
- header:
    name: x
  values:
  - low: 0.0
    high: 100.0
  - low: 100.0
    high: .inf
dependent_variables:
- header:
    name: data
  values:
  - value: 0.1
  - value: 0.2

I checked that both these examples give the same YODA output with the current code. I think it is a legitimate use case to give an overflow bin without an underflow bin (or vice versa) in the YAML format, so the YODA output would need to write a zero value (with no errors) for the missing underflow/overflow bin to remove the ambiguity.

hepdata_converter/testsuite/test_yodawriter.py Outdated Show resolved Hide resolved
hepdata_converter/testsuite/test_yodawriter.py Outdated Show resolved Hide resolved
@20DM
Copy link
Collaborator Author

20DM commented Mar 11, 2024

Hi Graeme,

I'm afraid I'm still a bit confused as to what the issue actually is, but it's probably just me. :-)

Let me try elaborate on my original response: All continuous axes in YODA will always come with an underflow and overflow whether they're needed or not. For the reference data we use BinnedEstimtates, which are initialised with a central value nan. The two edge cases that you point out will then look like this (for example):

IsRef: 1
Path: /REF/TEST/d01-x01-y01
Title: "Figure-aux_001"
Type: Estimate1D
---
Edges(A1): [1.200000e+03, 1.220000e+03]
ErrorLabels: ["statistical"]
# value      	errDn(1)     	errUp(1)
6.790900e-01 	-2.116800e-04	2.117400e-04
6.402400e-02 	-6.499500e-05	6.506200e-05
nan          	---          	---
END YODA_ESTIMATE1D_V3


BEGIN YODA_ESTIMATE1D_V3 /REF/TEST/d02-x01-y01
IsRef: 1
Path: /REF/TEST/d02-x01-y01
Title: "Figure-aux_002-a"
Type: Estimate1D
---
Edges(A1): [2.080000e+03, 2.100000e+03]
ErrorLabels: ["statistical"]
# value      	errDn(1)     	errUp(1)
nan          	---          	---
6.719900e-06 	-5.133800e-07	5.541800e-07
1.222500e-04 	-2.191800e-06	2.231400e-06
END YODA_ESTIMATE1D_V3

Here, there first one corresponds to an object with underflow but no overflow (and hence the nan line comes last, i.e. the "overflow" position) and the second one corresponds to an object with overflow but no underflow (and hence the nan line comes first, i.e. the "underflow" position). The two cases that you suggested do not result in the same output with the current code for me. 🤷

* test_parse_no_independent is more accurate.
Copy link
Member

@GraemeWatt GraemeWatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I must have made a mistake when checking the YODA output yesterday. Checking again more carefully, the two YAML examples from my previous message give the following YODA output:

  1. An underflow bin and a finite bin (0-100):
BEGIN YODA_ESTIMATE1D_V3 /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
IsRef: 1
Path: /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
Title: "Figure-aux_001"
Type: Estimate1D
---
Edges(A1): [0.000000e+00, 1.000000e+02]
# value         
1.000000e-01    
2.000000e-01    
nan             
END YODA_ESTIMATE1D_V3
  1. A finite bin (0-100) and an overflow bin.
BEGIN YODA_ESTIMATE1D_V3 /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
IsRef: 1
Path: /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
Title: "Figure-aux_001"
Type: Estimate1D
---
Edges(A1): [0.000000e+00, 1.000000e+02]
# value         
nan             
1.000000e-01    
2.000000e-01    
END YODA_ESTIMATE1D_V3

So the nan value is in a different position for the two cases as you kindly explained. Sorry, the misunderstanding was on my side!

I'll merge now. Many thanks again for your contribution!

@GraemeWatt GraemeWatt merged commit d543b76 into HEPData:main Mar 12, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants