support under-/overflows in YODA2 writer #56

20DM · 2024-03-07T23:29:41Z

Add support for under/overflow bins in YODA writer + unit test.

Skips points with infinity edges in YODA1 writer.

Needed for #55

20DM · 2024-03-07T23:50:41Z

Hi Graeme,

apologies for the delay, I had a few more urgent deadlines squeezing in, but I've finally managed to take a look at this.

This changes set ensures that under/overflow bins are supported with the default YODA writer.

For the YODA1 writer this is less trivial: In Rivet we'd work out the binning from the reference-data objects, so adding two more "dummy points" to the Scatters will likely cause mayhem. In the YODA1 series we would also typically drop the under/overflow bins from the MC histograms when converting them to Scatters, so not adding the overflows to the reference data seems more consistent with that, even though we'd be removing "good data points". This is what I've done in this PR now. Hopefully an additional incentive to upgrade to the Rivet4/YODA series where this is properly supported.

coveralls · 2024-03-08T00:22:47Z

coverage: 91.841% (+0.1%) from 91.731%
when pulling ac0b9d1 on 20DM:yoda_inf_support
into 1e5b10f on HEPData:main.

hepdata_converter/testsuite/test_yoda1writer.py

GraemeWatt

Thanks, this is great! After I merge this PR, I'll look into making similar changes for the CSV and ROOT writers, but I've also got other things to work on first, so it might take a while.

I made some separate comments about the naming of the new test functions and that it would be good to reduce the size of the test YAML data file. Looking at the YODA2 output, I don't see any explicit labels for the underflow/overflow bins, so how does the YODA code identify them? Does it just count the number of edges (n_e) and values (n_v) and assume that the first and last bins are underflow and overflow if n_v = n_e + 1? Does this mean that the case of an overflow bin without an underflow bin (or vice versa) is not supported (i.e. n_v = n_e)? If so, I think the YODA2 code needs to handle this case in the conversion from YAML, perhaps by just omitting the single underflow/overflow bin as in the YODA1 case or maybe by writing a value of zero for the missing underflow/overflow bin.

hepdata_converter/testsuite/test_yodawriter.py

hepdata_converter/testsuite/testdata/yaml_inf/figure-aux_001.yaml

20DM · 2024-03-09T15:47:55Z

Looking at the YODA2 output, I don't see any explicit labels for the underflow/overflow bins, so how does the YODA code identify them? Does it just count the number of edges (n_e) and values (n_v) and assume that the first and last bins are underflow and overflow if n_v = n_e + 1? Does this mean that the case of an overflow bin without an underflow bin (or vice versa) is not supported (i.e. n_v = n_e)? If so, I think the YODA2 code needs to handle this case in the conversion from YAML, perhaps by just omitting the single underflow/overflow bin as in the YODA1 case or maybe by writing a value of zero for the missing underflow/overflow bin.

Hi Graeme, continuous axes in YODA2 will always range from -inf to +inf and the list of edges that define the visible range will then partition the infinitely long axis. As a result a continuous axis will always have an under- and overflow bins, irrespective of whether these are needed in the end, so the cases you're concerned about are implicitly supported.

GraemeWatt

Sorry to nitpick, but there is still an issue with the test function naming (see separate comments).

Also, I don't think you understood my previous comment, so let me be more explicit with a minimal example. Consider two YAML data files with only two bins:

An underflow bin and a finite bin (0-100):

independent_variables:
- header:
    name: x
  values:
  - low: -.inf
    high: 0.0
  - low: 0.0
    high: 100.0
dependent_variables:
- header:
    name: data
  values:
  - value: 0.1
  - value: 0.2

A finite bin (0-100) and an overflow bin.

independent_variables:
- header:
    name: x
  values:
  - low: 0.0
    high: 100.0
  - low: 100.0
    high: .inf
dependent_variables:
- header:
    name: data
  values:
  - value: 0.1
  - value: 0.2

I checked that both these examples give the same YODA output with the current code. I think it is a legitimate use case to give an overflow bin without an underflow bin (or vice versa) in the YAML format, so the YODA output would need to write a zero value (with no errors) for the missing underflow/overflow bin to remove the ambiguity.

hepdata_converter/testsuite/test_yodawriter.py

20DM · 2024-03-11T21:39:42Z

Hi Graeme,

I'm afraid I'm still a bit confused as to what the issue actually is, but it's probably just me. :-)

Let me try elaborate on my original response: All continuous axes in YODA will always come with an underflow and overflow whether they're needed or not. For the reference data we use BinnedEstimtates, which are initialised with a central value nan. The two edge cases that you point out will then look like this (for example):

IsRef: 1
Path: /REF/TEST/d01-x01-y01
Title: "Figure-aux_001"
Type: Estimate1D
---
Edges(A1): [1.200000e+03, 1.220000e+03]
ErrorLabels: ["statistical"]
# value      	errDn(1)     	errUp(1)
6.790900e-01 	-2.116800e-04	2.117400e-04
6.402400e-02 	-6.499500e-05	6.506200e-05
nan          	---          	---
END YODA_ESTIMATE1D_V3


BEGIN YODA_ESTIMATE1D_V3 /REF/TEST/d02-x01-y01
IsRef: 1
Path: /REF/TEST/d02-x01-y01
Title: "Figure-aux_002-a"
Type: Estimate1D
---
Edges(A1): [2.080000e+03, 2.100000e+03]
ErrorLabels: ["statistical"]
# value      	errDn(1)     	errUp(1)
nan          	---          	---
6.719900e-06 	-5.133800e-07	5.541800e-07
1.222500e-04 	-2.191800e-06	2.231400e-06
END YODA_ESTIMATE1D_V3

Here, there first one corresponds to an object with underflow but no overflow (and hence the nan line comes last, i.e. the "overflow" position) and the second one corresponds to an object with overflow but no underflow (and hence the nan line comes first, i.e. the "underflow" position). The two cases that you suggested do not result in the same output with the current code for me. 🤷

* test_parse_no_independent is more accurate.

GraemeWatt

Apologies, I must have made a mistake when checking the YODA output yesterday. Checking again more carefully, the two YAML examples from my previous message give the following YODA output:

An underflow bin and a finite bin (0-100):

BEGIN YODA_ESTIMATE1D_V3 /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
IsRef: 1
Path: /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
Title: "Figure-aux_001"
Type: Estimate1D
---
Edges(A1): [0.000000e+00, 1.000000e+02]
# value         
1.000000e-01    
2.000000e-01    
nan             
END YODA_ESTIMATE1D_V3

A finite bin (0-100) and an overflow bin.

BEGIN YODA_ESTIMATE1D_V3 /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
IsRef: 1
Path: /REF/RIVET_ANALYSIS_NAME/d01-x01-y01
Title: "Figure-aux_001"
Type: Estimate1D
---
Edges(A1): [0.000000e+00, 1.000000e+02]
# value         
nan             
1.000000e-01    
2.000000e-01    
END YODA_ESTIMATE1D_V3

So the nan value is in a different position for the two cases as you kindly explained. Sorry, the misunderstanding was on my side!

I'll merge now. Many thanks again for your contribution!

20DM added 3 commits March 6, 2024 19:38

support under-/overflows in YODA2 writer

70e7908

add unit test

8071515

protect YODA1 writer against under/overflows

cd72583

20DM added 2 commits March 7, 2024 19:14

protect against negative nans

9edc748

more protection against negative nans

4bacfc7

GraemeWatt reviewed Mar 8, 2024

View reviewed changes

hepdata_converter/testsuite/test_yoda1writer.py Outdated Show resolved Hide resolved

GraemeWatt requested changes Mar 8, 2024

View reviewed changes

hepdata_converter/testsuite/test_yodawriter.py Outdated Show resolved Hide resolved

hepdata_converter/testsuite/testdata/yaml_inf/figure-aux_001.yaml Outdated Show resolved Hide resolved

20DM added 3 commits March 9, 2024 10:03

simplifying tests

7c2c6df

fix name

ec36997

fix name

f740354

GraemeWatt requested changes Mar 11, 2024

View reviewed changes

hepdata_converter/testsuite/test_yodawriter.py Outdated Show resolved Hide resolved

hepdata_converter/testsuite/test_yodawriter.py Outdated Show resolved Hide resolved

20DM added 2 commits March 11, 2024 16:01

fix function names

76a3b19

Merge branch 'main' into yoda_inf_support

dce72c0

testsuite: rename test_parse_no_dependent function

ac0b9d1

* test_parse_no_independent is more accurate.

GraemeWatt approved these changes Mar 12, 2024

View reviewed changes

GraemeWatt merged commit d543b76 into HEPData:main Mar 12, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support under-/overflows in YODA2 writer #56

support under-/overflows in YODA2 writer #56

20DM commented Mar 7, 2024 •

edited

Loading

20DM commented Mar 7, 2024

coveralls commented Mar 8, 2024 •

edited

Loading

GraemeWatt left a comment •

edited

Loading

20DM commented Mar 9, 2024

GraemeWatt left a comment •

edited

Loading

20DM commented Mar 11, 2024

GraemeWatt left a comment •

edited

Loading

support under-/overflows in YODA2 writer #56

support under-/overflows in YODA2 writer #56

Conversation

20DM commented Mar 7, 2024 • edited Loading

20DM commented Mar 7, 2024

coveralls commented Mar 8, 2024 • edited Loading

GraemeWatt left a comment • edited Loading

Choose a reason for hiding this comment

20DM commented Mar 9, 2024

GraemeWatt left a comment • edited Loading

Choose a reason for hiding this comment

20DM commented Mar 11, 2024

GraemeWatt left a comment • edited Loading

Choose a reason for hiding this comment

20DM commented Mar 7, 2024 •

edited

Loading

coveralls commented Mar 8, 2024 •

edited

Loading

GraemeWatt left a comment •

edited

Loading

GraemeWatt left a comment •

edited

Loading

GraemeWatt left a comment •

edited

Loading