
Initial attempt at implementing hard majority voting #126

Merged: 9 commits into TorchEnsemble-Community:master on Oct 6, 2022

Conversation

gardberg (Contributor)

This is a first attempt at implementing hard majority voting, as mentioned in #118.

For a brief description of soft and hard voting, see sklearn's definition.

voting : {‘hard’, ‘soft’}, default=’hard’

If ‘hard’, uses predicted class labels for majority rule voting. Else if ‘soft’, predicts the class label based on the argmax of the sums of the predicted probabilities, which is recommended for an ensemble of well-calibrated classifiers.
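To make the difference concrete, here is a small standalone sketch (illustrative only, not torchensemble code) where soft and hard voting disagree on the same per-estimator probabilities:

import torch

# Class probabilities from three base estimators for one sample,
# over three classes (rows: estimators, columns: classes).
probs = torch.tensor([
    [0.9, 0.1, 0.0],
    [0.4, 0.6, 0.0],
    [0.4, 0.6, 0.0],
])

# Soft voting: average the probabilities, then take the argmax.
soft_pred = probs.mean(dim=0).argmax()   # class 0 (0.57 vs 0.43)

# Hard voting: each estimator votes with its own argmax, majority wins.
votes = probs.argmax(dim=1)              # tensor([0, 1, 1])
hard_pred = torch.mode(votes).values     # class 1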

For hard voting, the vector proba returned by the forward method of the VotingClassifier class now instead becomes a one-hot vector, with a 1 at the class that received the majority of the base_estimators' votes.

One thing worth considering: torch.max is still called in the evaluate method, which for hard voting essentially just retrieves the index of the 1 in the proba vector produced by forward. This works, so perhaps it isn't a problem?
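For reference, one way the one-hot output described above could be produced looks roughly like this (a sketch only; the helper name and tensor shapes are assumptions, not the actual torchensemble code):

import torch
import torch.nn.functional as F

def hard_majority_proba(outputs, n_classes):
    # outputs: list of (batch_size, n_classes) probability tensors,
    # one per base estimator (names/shapes assumed for illustration).
    votes = torch.stack([o.argmax(dim=1) for o in outputs])  # (n_estimators, batch_size)
    majority = torch.mode(votes, dim=0).values                # (batch_size,)
    # One-hot vector with a 1 at the class that won the majority vote.
    return F.one_hot(majority, num_classes=n_classes).float()

# evaluate() then calls torch.max on this tensor, which simply recovers
# the index of the 1 for each sample.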

Any suggestions for improvements are very welcome!

@gardberg mentioned this pull request on Sep 20, 2022
xuyxu (Member) commented Sep 21, 2022

Thanks @LukasGardberg, don't worry about the CI problem. I will leave detailed comments as soon as possible.

@@ -349,8 +357,8 @@ def __init__(self, input_dim, output_dim, depth=5, lamda=1e-3, cuda=False):

         self._validate_parameters()

-        self.internal_node_num_ = 2 ** self.depth - 1
-        self.leaf_node_num_ = 2 ** self.depth
+        self.internal_node_num_ = 2**self.depth - 1
xuyxu (Member)

Several lines here do not meet the coding style requirements. Please consider executing the command black --skip-string-normalization --config pyproject.toml torchensemble/ in the root directory of the project, which will automatically format the code.

gardberg (Contributor, Author)

Weird, I ran the same black command and it automatically made the changes shown above. I have reverted them manually now; maybe it has something to do with my version of black.

def __init__(self, voting_strategy="soft", **kwargs):
    super(VotingClassifier, self).__init__(**kwargs)

    self._implemented_voting_strategies = ["soft", "hard"]
xuyxu (Member)

This attribute is redundant; maybe we could hard-code the supported strategies. For example, change the attribute in line 100 into a set ("soft", "hard"), and replace the latter {} in line 103 with a fixed string.
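Something along these lines, for example (a sketch of the suggestion; the exact variable name and error message are illustrative):

if self.voting_strategy not in ("soft", "hard"):
    msg = (
        "Voting strategy {} is not supported, "
        "please choose from 'soft' and 'hard'."
    )
    raise ValueError(msg.format(self.voting_strategy))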

gardberg (Contributor, Author)

Check if my latest changes are what you meant :)

torchensemble/voting.py (outdated review comment, resolved)
xuyxu (Member) left a comment

The core implementation of the voting strategies looks good after a rough look; we need extra unit tests to ensure correctness. Have you ever written unit tests?

gardberg (Contributor, Author)

Yes, a bit, but never with pytest. I'm thinking I'll just imitate the tests that already exist. Is there anything specific I should test for?

Also, when running the tests locally I seem to be getting an error:

AttributeError: 'NeuralForestClassifier' object has no attribute 'voting_strategy'

I tried adding an __init__ method to the NeuralForestClassifier class, but I'm a bit unsure how to deal with having two superclasses and how to handle this properly. Do you have any advice on how to fix it? :) Sorry for needing some guidance, haha.
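For context, the general Python pattern for initialising two superclasses is cooperative super().__init__ calls that forward **kwargs along the MRO; the classes below are purely illustrative, not the actual torchensemble hierarchy:

class Base:
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

class VotingMixin(Base):
    def __init__(self, voting_strategy="soft", **kwargs):
        super().__init__(**kwargs)
        self.voting_strategy = voting_strategy

class TreeMixin(Base):
    def __init__(self, depth=5, **kwargs):
        super().__init__(**kwargs)
        self.depth = depth

class ForestClassifier(TreeMixin, VotingMixin):
    # Both parent __init__ methods run via the MRO, so each keyword
    # ends up where it belongs and self.voting_strategy is always set.
    pass

model = ForestClassifier(depth=3, voting_strategy="hard")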

xuyxu (Member) commented Sep 26, 2022

Thanks for your great work @LukasGardberg! Kind of busy these days; I will take a look in a few days.

xuyxu (Member) left a comment

As for unit tests: provided that we move the implementation of hard voting to the utils/operator sub-package, it should be enough to test only this function in tests/test_operator.py. You could manually specify several tensors and check that the result of hard voting matches the expected result.
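For example, a test along these lines (the majority_vote name, its location, and the stacked input shape are assumptions about the eventual API, not a given):

import torch
from torchensemble.utils import operator as op

def test_majority_vote():
    # Three estimators, two samples, three classes, assuming the function
    # accepts a stacked (n_estimators, batch_size, n_classes) tensor.
    outputs = torch.tensor([
        [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]],
        [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
        [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],
    ])
    # Sample 1 votes: [1, 1, 0] -> class 1; sample 2 votes: [0, 1, 0] -> class 0.
    expected = torch.tensor([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
    assert torch.equal(op.majority_vote(outputs), expected)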

xuyxu (Member) commented Sep 27, 2022

Meanwhile, please click the Details button of the CI check torchensemble-CI / build (ubuntu-latest, 3.7) (pull_request) and look at the pytest results. Since the neural forest class inherits from voting, its __init__ method should be modified accordingly.

xuyxu (Member) commented Sep 27, 2022

@all-contributors please add @LukasGardberg for code

allcontributors (Contributor)

@xuyxu

I've put up a pull request to add @LukasGardberg! 🎉

gardberg (Contributor, Author) commented Sep 28, 2022

Okay, hard majority voting has now been moved to ops. I also added a simple test, and added soft voting to NeuralForestClassifier and SnapshotEnsembleClassifier, but those have not been tested.

Are there any other models I should add hard voting to? I'm assuming it's only applicable to classifiers, right?

Also, I'm unsure whether I solved the __init__ inheritance problem properly, but the tests are at least passing now @xuyxu :)

xuyxu (Member) left a comment

Please refer to the comments, and I will add some final docs once they are resolved.

@@ -203,7 +203,7 @@ def predict(self, *x):
 class BaseTreeEnsemble(BaseModule):
     def __init__(
         self,
         n_estimators,
xuyxu (Member)

Is there any reason for adding a default value for n_estimators?

gardberg (Contributor, Author)

Hey, no, I accidentally left it in there while trying some other stuff; hope it's okay!

torchensemble/utils/operator.py (outdated review comment, resolved)
xuyxu (Member) left a comment

LGTM.

xuyxu merged commit 061bef7 into TorchEnsemble-Community:master on Oct 6, 2022
xuyxu (Member) commented Oct 6, 2022

Merged, thanks @LukasGardberg

gardberg (Contributor, Author) commented Oct 6, 2022

Thanks a lot @xuyxu for all the help! It was exciting to make my first contribution :)
