BUG: generate valid EBMModel when merging #578

Draft · wants to merge 18 commits into develop

Conversation

@DerWeh (Contributor) commented Oct 19, 2024

This PR is meant to fix #576. Upon merging, we average numerical parameters.
I would add a warning to the documentation that we do not guarantee that EBMs produced by merge_ebms can be fitted. You allow for so many different types of arguments that we can hardly catch every edge case meaningfully.

@paulbkoch, please take a look at _initialize_ebm. Is this how you imagine handling the parameters? I would appreciate some early feedback. If this is in general what you have in mind, I'll try cleaning up the code some more. But it's quite messy, as your API is so flexible.
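To illustrate the averaging of numerical parameters mentioned above, here is a minimal sketch (the helper name and the fallback behaviour are hypothetical, not the actual _initialize_ebm logic):

    import numpy as np

    def average_numerical_parameter(values):
        """Hypothetical helper: average one numerical hyperparameter across models."""
        values = list(values)
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            mean = float(np.mean(values))
            # keep integer-valued parameters (e.g. max_bins) integral
            return round(mean) if all(isinstance(v, int) for v in values) else mean
        # non-numerical parameters cannot be averaged; fall back to the first value
        return values[0]

    print(average_numerical_parameter([256, 256, 512]))     # -> 341
    print(average_numerical_parameter([0.01, 0.02, 0.03]))  # -> approximately 0.02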

TODO

  • Cover the special cases monotonize and exclude
  • Document limitations of parameter inference

@DerWeh marked this pull request as draft October 19, 2024 18:35

codecov bot commented Oct 19, 2024

Codecov Report

Attention: Patch coverage is 42.59259% with 62 lines in your changes missing coverage. Please review.

Project coverage is 73.99%. Comparing base (ef742fd) to head (cbf4507).

Files with missing lines Patch % Lines
...erpret-core/interpret/glassbox/_ebm/_merge_ebms.py 42.59% 62 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    76.73%   73.99%   -2.74%     
===========================================
  Files           72       72              
  Lines         9622     9634      +12     
===========================================
- Hits          7383     7129     -254     
- Misses        2239     2505     +266     
Flag Coverage Δ
bdist_linux_310_python 69.58% <42.59%> (-6.76%) ⬇️
bdist_linux_311_python 69.58% <42.59%> (-6.84%) ⬇️
bdist_linux_312_python 69.48% <42.59%> (-6.87%) ⬇️
bdist_linux_39_python 69.57% <42.99%> (-6.77%) ⬇️
bdist_mac_310_python 73.33% <42.59%> (-3.18%) ⬇️
bdist_mac_311_python 73.78% <42.59%> (-2.74%) ⬇️
bdist_mac_312_python 73.33% <42.59%> (-3.18%) ⬇️
bdist_mac_39_python 73.33% <42.99%> (-3.23%) ⬇️
bdist_win_310_python 72.78% <42.59%> (-3.76%) ⬇️
bdist_win_311_python 65.50% <42.59%> (-11.11%) ⬇️
bdist_win_312_python 58.31% <42.59%> (-18.13%) ⬇️
bdist_win_39_python 71.46% <42.99%> (-5.12%) ⬇️
sdist_linux_310_python 69.52% <42.59%> (-6.84%) ⬇️
sdist_linux_311_python 69.52% <42.59%> (-6.84%) ⬇️
sdist_linux_312_python 69.52% <42.59%> (-6.75%) ⬇️
sdist_linux_39_python 69.51% <42.99%> (-6.75%) ⬇️
sdist_mac_310_python 73.75% <42.59%> (-2.74%) ⬇️
sdist_mac_311_python 73.20% <42.59%> (-3.22%) ⬇️
sdist_mac_312_python 16.21% <9.25%> (-60.11%) ⬇️
sdist_mac_39_python 69.45% <42.99%> (-6.95%) ⬇️
sdist_win_310_python 60.04% <42.59%> (-16.50%) ⬇️
sdist_win_311_python ?
sdist_win_312_python ?
sdist_win_39_python ?

Flags with carried forward coverage won't be shown.


@paulbkoch (Collaborator)

Yes, @DerWeh, this approach looks great. Let me know when you feel it's ready to be merged.

@paulbkoch force-pushed the develop branch 6 times, most recently from f8501a4 to e0369a7 on December 10, 2024 06:25
@paulbkoch force-pushed the develop branch 5 times, most recently from 50d4254 to 5e999b6 on December 26, 2024 01:22
@DerWeh (Contributor, Author) commented Jan 4, 2025

@paulbkoch sorry for the long break.

I think most things work by now, but I would need some help with the test.

  1. The test for merging the monotone_constraints is awfully slow. Do you have an idea for fast data to try it out? I didn't find any existing tests for monotone_constraints that I could borrow… In principle, I can also test the internal function _initialize_ebm instead, avoiding the slow fitting.

  2. If I understand correctly, the test test_merge_exclude uncovers another long-standing issue: merge_ebms doesn't work if some EBMs are missing some features. This is caused by the lines:

    for model in models:
        if any(len(set(map(type, bin_levels))) != 1 for bin_levels in model.bins_):
            msg = "Inconsistent bin types within a model."
            raise Exception(msg)
    
        feature_bounds = getattr(model, "feature_bounds_", None)
        old_bounds.append(None if feature_bounds is None else feature_bounds.copy())

    I encountered this error before; the problem occurs when some model.bins_ entries are missing, i.e., len(...) == 0. So far, I have always synchronized model.bins_ by hand to avoid this error. But while we are at it, it's probably best to fix it now. However, I have no real clue what that code is supposed to do. Can you help me out?


Another point: on failures, bare Exceptions are always raised, which is rather bad practice. I think we should change them to either ValueError or RuntimeError.
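For example, the check quoted above could use

    raise ValueError("Inconsistent bin types within a model.")

instead of a bare Exception.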

@paulbkoch (Collaborator)

Hi @DerWeh -- Thanks for these improvements!

I suspect for the monotone_constraints test, the issue isn't the time spent in the fit function. I've noticed that merge_ebms is extremely slow in general. This isn't related to your PR, but there's something in merge_ebms that takes an inordinate amount of time, and merge_ebms often takes longer than fitting for smaller EBMs. I haven't had time to diagnose that issue yet. If I'm wrong, and it's the fitting time, then it might be that the given monotone constraints somehow violate what the model wants to do. We have a good example in our docs of the shape functions for the synthetic dataset here:

https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html

I might suggest using monotone increasing for feature#6 and feature#3. We don't have a good example of a decreasing function actually, but if you want a feature to test monotone decreasing maybe use feature#7 (the unused one).

The other issue you pointed to is a bug somewhere in my code that allows a single feature to get mixed categorical/continuous bin definitions. The bins_ attribute should contain information that allows discretization of continuous features and bin assignment for categoricals. For EBMs we need to be able to discretize features differently depending on whether the feature is being used as a main, a pair, or a higher-order interaction. For continuous features, you might see something like this for a single feature:

[[1.5, 3.5, 5.5, 7.5, 9.5], [4.5, 6.5]]

This means that if the feature is being used as a main, it is discretized into bins with the ranges:
[-inf, 1.5)
[1.5, 3.5)
[3.5, 5.5)
[5.5, 7.5)
[7.5, 9.5)
[9.5, +inf]

and for pairs:
[-inf, 4.5)
[4.5, 6.5)
[6.5, +inf]

For categoricals we use a dictionary, so you might see:

[{"Canada": 1, "France": 2, "Germany": 3}]

The bug seems to be that we somehow get into a state where we have mixed categorical/continuous bins, so something like this for a single feature:

[[1.5, 3.5, 5.5, 7.5, 9.5], {"4": 1, "5": 2, "BAD": 3}]

This probably arises, I think, when two models are initially merged where one model contains a categorical and the other model contains a corresponding continuous feature. It's easy to see how this might occur given we automatically detect whether a feature is continuous or categorical by default. When this occurs, we are able to convert the categorical into a continuous for one of the models, which then allows the merge to proceed. I think the bug occurs during this conversion and we somehow get inconsistent categorical/continuous bin definitions. If that's true, then it's probably located in this function:

def convert_categorical_to_continuous(categories):
    # we do automagic detection of feature types by default, and sometimes a feature which
    # was really continuous might have most of it's data as one or two values. An example would
    # be a feature that we have "0" and "1" in the training data, but "-0.1" and "3.1" are also
    # possible. If during prediction we see a "3.1" we can magically convert our categories
    # into a continuous range with a cut point at 0.5. Now "-0.1" goes into the [-inf, 0.5) bin
    # and 3.1 goes into the [0.5, +inf] bin.
    #
    # We can't convert a continuous feature that has cuts back into categoricals
    # since the categorical value could have been anything between the cuts that we know about.
If your test_merge_exclude test is hitting this exception, it is probably a limited corner case of that bigger issue: the excluded feature probably contains an empty list [] inside the bins_ attribute. Since the feature is never used, it never gets a bin definition. And since the bin_levels list then contains no objects, len(set(map(type, bin_levels))) would be zero instead of 1. This check should be changed to <= 1 instead of != 1 to handle the excluded-feature case. If the other model doesn't exclude the feature, then we can use whatever categorical/continuous definition is used in the other model.
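One way to express the relaxed check (a sketch of the change described above; as discussed further down, this alone is not sufficient):

    for model in models:
        # An excluded feature can leave an empty bin_levels list, so accept zero or one
        # distinct bin types per feature and only raise when there is more than one.
        if any(len(set(map(type, bin_levels))) > 1 for bin_levels in model.bins_):
            msg = "Inconsistent bin types within a model."
            raise Exception(msg)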

I agree the exception types could be improved. Probably best to do that in a new PR.

@DerWeh (Contributor, Author) commented Jan 5, 2025

I suspect for the monotone_constraints test, the issue isn't the time spent in the fit function. ... If ... it's the fitting time, then it might be that the given monotone constraints somehow violate what the model wants to do. We have a good example in our docs of the shape functions for the synthetic dataset here:

https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html

I might suggest using monotone increasing for feature#6 and feature#3. We don't have a good example of a decreasing function actually, but if you want a feature to test monotone decreasing maybe use feature#7 (the unused one).

Thanks for the info. In this case, the bottleneck is fitting the model with constraints. Switching from classification to regression and focusing only on the features you mentioned, I can get somewhat reasonable test times.
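Roughly, the setup now looks like the following sketch (the data generation, feature indices, and parameter values are illustrative, not the exact test code):

    import numpy as np
    from interpret.glassbox import ExplainableBoostingRegressor, merge_ebms

    # Stand-in for the synthetic dataset: y increases in features 3 and 6;
    # feature 7 is unused, so a decreasing constraint on it is harmless.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(500, 8))
    y = X[:, 3] + 2.0 * X[:, 6] + rng.normal(scale=0.1, size=500)

    constraints = [0] * 8
    constraints[3] = 1   # monotone increasing
    constraints[6] = 1   # monotone increasing
    constraints[7] = -1  # monotone decreasing on the unused feature

    kwargs = dict(monotone_constraints=constraints, interactions=0)
    ebm1 = ExplainableBoostingRegressor(**kwargs).fit(X[:250], y[:250])
    ebm2 = ExplainableBoostingRegressor(**kwargs).fit(X[250:], y[250:])
    merged = merge_ebms([ebm1, ebm2])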

If your test_merge_exclude test is hitting this exception, it is probably a limited corner case of that bigger issue: the excluded feature probably contains an empty list [] inside the bins_ attribute. Since the feature is never used, it never gets a bin definition. And since the bin_levels list then contains no objects, len(set(map(type, bin_levels))) would be zero instead of 1. This check should be changed to <= 1 instead of != 1 to handle the excluded-feature case. If the other model doesn't exclude the feature, then we can use whatever categorical/continuous definition is used in the other model.

This is exactly the case. When setting exclude, a feature might not be used. Sadly, it's not as trivial as allowing len(set(map(type, bin_levels))) <= 1.
The function _get_new_bins has to be adjusted to handle this case. At first, this seems rather easy. We can replace

bin_types = {type(model.bins_[feature_idx][0]) for model in models}

with

feature_bins = [model.bins_[feature_idx] for model in models if model.bins_[feature_idx]]
bin_types = {type(bin_[0]) for bin_ in feature_bins} 

to ignore models that don't define the bins for a feature. If the bins are missing in all models, it becomes more intricate. I planned to catch that case using

if not feature_bins:
    new_feature_types.append("continuous")  # default type
    new_bins.append([]) 

but this leads to crashes in _harmonize_tensor, as we did not set the elements of old_mapping, old_bins, and old_bounds. Can you help me here? I haven't worked through _harmonize_tensor yet. The generality of EBMs tends to make things a little messy…

The common case is just a single element, so the overhead of NumPy is very large.
@paulbkoch (Collaborator) commented Jan 5, 2025

Hi @DerWeh -- I think we might not need to call _harmonize_tensor if a feature is excluded in all models. What that function does is match the tensors between two features, so if you had two mains with the cuts:

model1: [2.5, 4.5, 6.5]
model2: [3.5, 5.5]

The new model would have cuts:
result: [2.5, 3.5, 4.5, 5.5, 6.5]

The _harmonize_tensor function is then used to update the term_scores_ arrays to match the increased number of cuts. In the case where all of the models ignore a feature, there are no tensors that need to be updated, so I think we can just avoid calling _harmonize_tensor in that case. The resulting model shouldn't include this feature as a term, so I think the call to _harmonize_tensor is superfluous and will get ignored in the end.
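For the cut-merging part, here is a minimal sketch (the tensor expansion that _harmonize_tensor actually performs is omitted):

    import numpy as np

    # Cuts for the same feature in two models, as in the example above.
    model1_cuts = np.array([2.5, 4.5, 6.5])
    model2_cuts = np.array([3.5, 5.5])

    # The merged model uses the union of the cut points; the term score tensors
    # then have to be expanded onto this finer grid, which is what _harmonize_tensor does.
    merged_cuts = np.union1d(model1_cuts, model2_cuts)
    print(merged_cuts)  # -> [2.5 3.5 4.5 5.5 6.5]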

@DerWeh, I'd be happy to jump in on your branch if this isn't enough explanation and you'd like some help getting this to the finish.

@DerWeh (Contributor, Author) commented Jan 5, 2025

@paulbkoch, if you feel like you can jump in easily, you are welcome to do so. I pushed all my changes. I have one test case which currently fails due to exclude. The rest seems to be in reasonable shape. I would go through the CI issues once the tests run locally.

Your explanation is fine, but it feels like I need plenty more time to work through the internals to fix it properly.

Labels: None yet

Successfully merging this pull request may close these issues.

merge_ebm produces broken classifiers
2 participants