BUG: generate valid EBMModel when merging #578

Draft · wants to merge 18 commits into develop

Conversation

@DerWeh (Contributor) commented Oct 19, 2024

This PR is meant to fix #576. Upon merging, we average numerical parameters.
I would add a warning to the documentation that we do not guarantee that EBMs produced by merge_ebms can be fitted. You allow for so many different types of arguments that we can hardly catch every edge case meaningfully.

@paulbkoch, please take a look at _initialize_ebm. Is this how you imagine handling the parameters? I would appreciate some early feedback. If this is in general what you have in mind, I'll try cleaning up the code some more. But it's quite messy, as your API is so flexible.
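To illustrate the averaging of numerical parameters mentioned above, here is a minimal sketch (the helper name and the fallback behaviour are hypothetical, not the actual _initialize_ebm logic):

    import numpy as np

    def average_numerical_parameter(values):
        """Hypothetical helper: average one numerical hyperparameter across models."""
        values = list(values)
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            mean = float(np.mean(values))
            # keep integer-valued parameters (e.g. max_bins) integral
            return round(mean) if all(isinstance(v, int) for v in values) else mean
        # non-numerical parameters cannot be averaged; fall back to the first value
        return values[0]

    print(average_numerical_parameter([256, 256, 512]))     # -> 341
    print(average_numerical_parameter([0.01, 0.02, 0.03]))  # -> approximately 0.02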

TODO

  • Cover the special cases monotonize and exclude
  • Document limitations of parameter inference

@DerWeh marked this pull request as draft October 19, 2024 18:35

codecov bot commented Oct 19, 2024

Codecov Report

Attention: Patch coverage is 42.59259% with 62 lines in your changes missing coverage. Please review.

Project coverage is 73.99%. Comparing base (ef742fd) to head (cbf4507).

Files with missing lines Patch % Lines
...erpret-core/interpret/glassbox/_ebm/_merge_ebms.py 42.59% 62 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #578      +/-   ##
===========================================
- Coverage    76.73%   73.99%   -2.74%     
===========================================
  Files           72       72              
  Lines         9622     9634      +12     
===========================================
- Hits          7383     7129     -254     
- Misses        2239     2505     +266     
Flag Coverage Δ
bdist_linux_310_python 69.58% <42.59%> (-6.76%) ⬇️
bdist_linux_311_python 69.58% <42.59%> (-6.84%) ⬇️
bdist_linux_312_python 69.48% <42.59%> (-6.87%) ⬇️
bdist_linux_39_python 69.57% <42.99%> (-6.77%) ⬇️
bdist_mac_310_python 73.33% <42.59%> (-3.18%) ⬇️
bdist_mac_311_python 73.78% <42.59%> (-2.74%) ⬇️
bdist_mac_312_python 73.33% <42.59%> (-3.18%) ⬇️
bdist_mac_39_python 73.33% <42.99%> (-3.23%) ⬇️
bdist_win_310_python 72.78% <42.59%> (-3.76%) ⬇️
bdist_win_311_python 65.50% <42.59%> (-11.11%) ⬇️
bdist_win_312_python 58.31% <42.59%> (-18.13%) ⬇️
bdist_win_39_python 71.46% <42.99%> (-5.12%) ⬇️
sdist_linux_310_python 69.52% <42.59%> (-6.84%) ⬇️
sdist_linux_311_python 69.52% <42.59%> (-6.84%) ⬇️
sdist_linux_312_python 69.52% <42.59%> (-6.75%) ⬇️
sdist_linux_39_python 69.51% <42.99%> (-6.75%) ⬇️
sdist_mac_310_python 73.75% <42.59%> (-2.74%) ⬇️
sdist_mac_311_python 73.20% <42.59%> (-3.22%) ⬇️
sdist_mac_312_python 16.21% <9.25%> (-60.11%) ⬇️
sdist_mac_39_python 69.45% <42.99%> (-6.95%) ⬇️
sdist_win_310_python 60.04% <42.59%> (-16.50%) ⬇️
sdist_win_311_python ?
sdist_win_312_python ?
sdist_win_39_python ?

Flags with carried forward coverage won't be shown.


@paulbkoch (Collaborator)

Yes, @DerWeh, this approach looks great. Let me know when you feel it's ready to be merged.

@paulbkoch force-pushed the develop branch 6 times, most recently from f8501a4 to e0369a7 on December 10, 2024 06:25
@paulbkoch force-pushed the develop branch 5 times, most recently from 50d4254 to 5e999b6 on December 26, 2024 01:22
@DerWeh (Contributor, Author) commented Jan 4, 2025

@paulbkoch sorry for the long break.

I think most things work by now, but I would need some help with the test.

  1. The test for merging the monotone_constraints is awfully slow. Do you have an idea for fast data to try it out? I didn't find any existing tests for monotone_constraints that I could borrow… In principle, I can also test the internal function _initialize_ebm instead, avoiding the slow fitting.

  2. If I understand correctly, the test test_merge_exclude uncovers another long-standing issue: merge_ebms doesn't work if some EBMs are missing some features. This is caused by the lines:

    for model in models:
        if any(len(set(map(type, bin_levels))) != 1 for bin_levels in model.bins_):
            msg = "Inconsistent bin types within a model."
            raise Exception(msg)
    
        feature_bounds = getattr(model, "feature_bounds_", None)
        old_bounds.append(None if feature_bounds is None else feature_bounds.copy())

    I encountered this error before; the problem occurs when some model.bins_ entries are missing, i.e., len(...) == 0. So far, I have always synchronized model.bins_ by hand to avoid this error. But while we are at it, it's probably best to fix it now. However, I have no real clue what that code is supposed to do. Can you help me out?


Another point: on failures, bare Exceptions are always raised, which is rather bad practice. I think we should change them to either ValueError or RuntimeError.
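For example, the check quoted above could use

    raise ValueError("Inconsistent bin types within a model.")

instead of a bare Exception.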

@paulbkoch (Collaborator)

Hi @DerWeh -- Thanks for these improvements!

I suspect for the monotone_constraints test, the issue isn't the time spent in the fit function. I've noticed that merge_ebms is extremely slow in general. This isn't related to your PR, but there's something in merge_ebms that takes an inordinate amount of time, and merge_ebms often takes longer than fitting for smaller EBMs. I haven't had time to diagnose that issue yet. If I'm wrong, and it's the fitting time, then it might be that the given monotone constraints somehow violate what the model wants to do. We have a good example in our docs of the shape functions for the synthetic dataset here:

https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html

I might suggest using monotone increasing for feature#6 and feature#3. We don't have a good example of a decreasing function actually, but if you want a feature to test monotone decreasing maybe use feature#7 (the unused one).

The other issue you pointed to is a bug somewhere in my code that allows a single feature to get mixed categorical/continuous bin definitions. The bins_ attribute should contain information that allows discretization of continuous features and bin assignment for categoricals. For EBMs we need to be able to discretize features differently depending on whether the feature is being used as a main, a pair, or a higher-order interaction. For continuous features, you might see something like this for a single feature:

[[1.5, 3.5, 5.5, 7.5, 9.5], [4.5, 6.5]]

This means that if the feature is being used as a main, it is discretized into bins with the ranges:
[-inf, 1.5)
[1.5, 3.5)
[3.5, 5.5)
[5.5, 7.5)
[7.5, 9.5)
[9.5, +inf]

and for pairs:
[-inf, 4.5)
[4.5, 6.5)
[6.5, +inf]

For categoricals we use a dictionary, so you might see:

[{"Canada": 1, "France": 2, "Germany": 3}]

The bug seems to be that we somehow get into a state where we have mixed categorical/continuous bins, so something like this for a single feature:

[[1.5, 3.5, 5.5, 7.5, 9.5], {"4": 1, "5": 2, "BAD": 3}]

This probably arises, I think, when two models are initially merged where one model contains a categorical and the other model contains a corresponding continuous feature. It's easy to see how this might occur given we automatically detect whether a feature is continuous or categorical by default. When this occurs, we are able to convert the categorical into a continuous for one of the models, which then allows the merge to proceed. I think the bug occurs during this conversion and we somehow get inconsistent categorical/continuous bin definitions. If that's true, then it's probably located in this function:

def convert_categorical_to_continuous(categories):
    # we do automagic detection of feature types by default, and sometimes a feature which
    # was really continuous might have most of it's data as one or two values. An example would
    # be a feature that we have "0" and "1" in the training data, but "-0.1" and "3.1" are also
    # possible. If during prediction we see a "3.1" we can magically convert our categories
    # into a continuous range with a cut point at 0.5. Now "-0.1" goes into the [-inf, 0.5) bin
    # and 3.1 goes into the [0.5, +inf] bin.
    #
    # We can't convert a continuous feature that has cuts back into categoricals
    # since the categorical value could have been anything between the cuts that we know about.
If your test_merge_exclude test is hitting this exception, it is probably a limited corner case of that bigger issue: the excluded feature probably contains an empty list [] inside the bins_ attribute. Since the feature is never used, it never gets a bin definition. And since the bin_levels list then contains no objects, len(set(map(type, bin_levels))) would be zero instead of 1. This check should be changed to <= 1 instead of != 1 to handle the excluded-feature case. If the other model doesn't exclude the feature, then we can use whatever categorical/continuous definition is used in the other model.
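One way to express the relaxed check (a sketch of the change described above; as discussed further down, this alone is not sufficient):

    for model in models:
        # An excluded feature can leave an empty bin_levels list, so accept zero or one
        # distinct bin types per feature and only raise when there is more than one.
        if any(len(set(map(type, bin_levels))) > 1 for bin_levels in model.bins_):
            msg = "Inconsistent bin types within a model."
            raise Exception(msg)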

I agree the exception types could be improved. Probably best to do that in a new PR.

@DerWeh (Contributor, Author) commented Jan 5, 2025

I suspect for the monotone_constraints test, the issue isn't the time spent in the fit function. ... If ... it's the fitting time, then it might be that the given monotone constraints somehow violate what the model wants to do. We have a good example in our docs of the shape functions for the synthetic dataset here:

https://interpret.ml/docs/python/examples/interpretable-regression-synthetic.html

I might suggest using monotone increasing for feature#6 and feature#3. We don't have a good example of a decreasing function actually, but if you want a feature to test monotone decreasing maybe use feature#7 (the unused one).

Thanks for the info. In this case, the bottleneck is fitting the model with constraints. Switching from classification to regression and focusing only on the features you mentioned, I can get somewhat reasonable test times.
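Roughly, the setup now looks like the following sketch (the data generation, feature indices, and parameter values are illustrative, not the exact test code):

    import numpy as np
    from interpret.glassbox import ExplainableBoostingRegressor, merge_ebms

    # Stand-in for the synthetic dataset: y increases in features 3 and 6;
    # feature 7 is unused, so a decreasing constraint on it is harmless.
    rng = np.random.default_rng(0)
    X = rng.uniform(-2.0, 2.0, size=(500, 8))
    y = X[:, 3] + 2.0 * X[:, 6] + rng.normal(scale=0.1, size=500)

    constraints = [0] * 8
    constraints[3] = 1   # monotone increasing
    constraints[6] = 1   # monotone increasing
    constraints[7] = -1  # monotone decreasing on the unused feature

    kwargs = dict(monotone_constraints=constraints, interactions=0)
    ebm1 = ExplainableBoostingRegressor(**kwargs).fit(X[:250], y[:250])
    ebm2 = ExplainableBoostingRegressor(**kwargs).fit(X[250:], y[250:])
    merged = merge_ebms([ebm1, ebm2])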

If your test_merge_exclude test is hitting this exception, it is probably a limited corner case of that bigger issue: the excluded feature probably contains an empty list [] inside the bins_ attribute. Since the feature is never used, it never gets a bin definition. And since the bin_levels list then contains no objects, len(set(map(type, bin_levels))) would be zero instead of 1. This check should be changed to <= 1 instead of != 1 to handle the excluded-feature case. If the other model doesn't exclude the feature, then we can use whatever categorical/continuous definition is used in the other model.

This is exactly the case. When setting exclude, a feature might not be used. Sadly, it's not as trivial as allowing len(set(map(type, bin_levels))) <= 1.
The function _get_new_bins has to be adjusted to handle this case. At first, this seems rather easy. We can replace

bin_types = {type(model.bins_[feature_idx][0]) for model in models}

with

feature_bins = [model.bins_[feature_idx] for model in models if model.bins_[feature_idx]]
bin_types = {type(bin_[0]) for bin_ in feature_bins} 

to ignore models that don't define the bins for a feature. If the bins are missing in all models, it becomes more intricate. I planned to catch that case using

if not feature_bins:
    new_feature_types.append("continuous")  # default type
    new_bins.append([]) 

but this leads to crashes in _harmonize_tensor, as we did not set the elements of old_mapping, old_bins, and old_bounds. Can you help me here? I haven't worked through _harmonize_tensor yet. The generality of EBMs tends to make things a little messy…

The common case is just a single element, so the overhead of NumPy is very large.
@paulbkoch (Collaborator) commented Jan 5, 2025

Hi @DerWeh -- I think we might not need to call _harmonize_tensor if a feature is excluded in all models. What that function does is match the tensors between two features, so if you had two mains with the cuts:

model1: [2.5, 4.5, 6.5]
model2: [3.5, 5.5]

The new model would have cuts:
result: [2.5, 3.5, 4.5, 5.5, 6.5]

The _harmonize_tensor function is then used to update the term_scores_ arrays to match the increased number of cuts. In the case where all of the models ignore a feature, there are no tensors that need to be updated, so I think we can just avoid calling _harmonize_tensor in that case. The resulting model shouldn't include this feature as a term, so I think the call to _harmonize_tensor is superfluous and will get ignored in the end.
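For the cut-merging part, here is a minimal sketch (the tensor expansion that _harmonize_tensor actually performs is omitted):

    import numpy as np

    # Cuts for the same feature in two models, as in the example above.
    model1_cuts = np.array([2.5, 4.5, 6.5])
    model2_cuts = np.array([3.5, 5.5])

    # The merged model uses the union of the cut points; the term score tensors
    # then have to be expanded onto this finer grid, which is what _harmonize_tensor does.
    merged_cuts = np.union1d(model1_cuts, model2_cuts)
    print(merged_cuts)  # -> [2.5 3.5 4.5 5.5 6.5]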

@DerWeh, I'd be happy to jump in on your branch if this isn't enough explanation and you'd like some help getting this to the finish.

@DerWeh (Contributor, Author) commented Jan 5, 2025

@paulbkoch, if you feel like you can jump in easily, you are welcome to do so. I pushed all my changes. I have one test case which currently fails due to exclude. The rest seems to be in reasonable shape. I would go through the CI issues once the tests run locally.

Your explanation is fine, but it feels like I need plenty more time to work through the internals to fix it properly.

Labels: None yet

Successfully merging this pull request may close these issues.

merge_ebm produces broken classifiers
2 participants