
[Multi-Adapter PPO] Fix and Refactor reward model adapter #982

Merged: 4 commits merged into huggingface:main on Nov 21, 2023

Conversation

mnoukhov (Contributor)

A simpler and cleaner version of #472, focused on the main issues:

examples/multi_adapter_ppo.py fails with multiple GPUs (a sketch of both fixes follows this list):

  1. You can't call model.compute_reward_score if the model is wrapped in DDP.
     • Fixed by using unwrap_model.
  2. Running crashes with RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one....
     • Caused by the reward score being computed with gradients enabled.
     • Fixed by adding a torch.no_grad() and making sure the reward adapter's score module has requires_grad=False.
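A minimal sketch of the two fixes, assuming `ppo_trainer` is a trl PPOTrainer wrapping the multi-adapter policy; `query_response_ids` and `query_response_mask` are illustrative names, and this is not the exact code from the example script.

```python
import torch

# 1. Under multi-GPU, Accelerate wraps the policy in DistributedDataParallel,
#    which hides custom methods such as compute_reward_score, so unwrap it first.
unwrapped_model = ppo_trainer.accelerator.unwrap_model(ppo_trainer.model)

# 2. Compute the reward score without building a graph, so DDP never expects a
#    backward pass through the reward adapter (the source of the
#    "Expected to have finished reduction ..." RuntimeError).
with torch.no_grad():
    rewards = unwrapped_model.compute_reward_score(
        input_ids=query_response_ids,        # illustrative: tokenized query + response
        attention_mask=query_response_mask,  # illustrative: matching attention mask
    )
```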

In modeling_base.py, create the reward adapter before initializing the model and pass the score module into __init__ (a rough sketch follows this list):

  • makes adding the reward adapter a class method that can be extended
  • makes the reward adapter name and score module more explicit, setting them in __init__
  • makes use of PeftModel.load_adapter, which includes a lot of useful logic
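A rough sketch of that refactor, assuming `pretrained_model` is already a PeftModel; the method and attribute names here are illustrative, and the real code in trl/models/modeling_base.py loads the score head's weights from the adapter checkpoint rather than creating a fresh layer.

```python
import torch


class PreTrainedModelWrapper(torch.nn.Module):
    @classmethod
    def add_and_load_reward_modeling_adapter(cls, pretrained_model, adapter_model_id,
                                             adapter_name="reward_model_adapter"):
        # PeftModel.load_adapter resolves the adapter weights (local path or Hub id)
        # and registers them on the model under `adapter_name`.
        pretrained_model.load_adapter(adapter_model_id, adapter_name=adapter_name)

        # Simplified score head: the real implementation loads its weights from the
        # adapter checkpoint instead of initializing a new Linear layer.
        score = torch.nn.Linear(pretrained_model.config.hidden_size, 1, bias=False)
        score.requires_grad_(False)  # the reward head must not require gradients
        return score

    def __init__(self, pretrained_model=None, score_module=None,
                 supports_rm_adapter=False, rm_adapter_name=None, **kwargs):
        super().__init__()
        self.pretrained_model = pretrained_model
        # The adapter name and score module are now explicit attributes set in __init__.
        self.supports_rm_adapter = supports_rm_adapter
        self.rm_adapter_name = rm_adapter_name
        self.score = score_module
```

In from_pretrained the flow would then be: load the base (Peft) model, call the class method to attach the reward adapter and build the score head, and pass score_module, supports_rm_adapter, and rm_adapter_name into __init__.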

The changes make the args more flexible and clearer, and unwrap the model since it is wrapped in DDP. One downside: with the reward adapter it seems we need to use find_unused_parameters=True (one way to set it is sketched below).
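A minimal sketch of one way to enable that flag when DDP wrapping goes through Accelerate; this shows the generic Accelerate mechanism and is not necessarily how the example script sets it.

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# find_unused_parameters=True lets DDP tolerate parameters (here, the reward
# adapter's) that receive no gradient during the PPO backward pass.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```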
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@younesbelkada (Contributor) left a comment:


Great cleanup! Thanks a ton for working on this @mnoukhov!
I left only one question; what do you think?

@@ -68,7 +66,7 @@ class PreTrainedModelWrapper(nn.Module):
         The list of arguments that are supported by the wrapper class.
     """
     transformers_parent_class = None
-    supported_args = None
+    supported_args = ("score_module", "supports_rm_adapter", "rm_adapter_name")
@younesbelkada (Contributor):

Do you know if these are used? It seems we always overwrite them: https://github.com/huggingface/trl/blob/main/trl/models/modeling_value_head.py#L90

@mnoukhov (Contributor, Author):

I made this change so the args being passed in would be a bit more obvious. I'm happy to change it back if you'd like.

I actually don't think we use supported_args anywhere; I assumed it would be a future feature for some sort of arg parsing. To be more in line with transformers we would probably want a config class instead of supported args anyway, so it isn't a big deal either way.

@younesbelkada (Contributor):

OK, I see, thanks!
Yes, it's not a big deal; technically I set it to None so that PreTrainedModelWrapper can't be used as is (I should have made that class abstract). It would be great if you could change this back; then I think we can merge.

@mnoukhov (Contributor, Author):

Done!

@younesbelkada (Contributor) suggested a change:

-    supported_args = ("score_module", "supports_rm_adapter", "rm_adapter_name")
+    supported_args = None

@younesbelkada (Contributor) left a comment:

Thanks again!

@younesbelkada merged commit b307faf into huggingface:main on Nov 21, 2023
9 checks passed
lapp0 pushed a commit to lapp0/trl that referenced this pull request May 10, 2024
…e#982)

* reward adapter loaded as part of init

more flexible, clearer args

* fixed script for multi gpu

unwrap model since it is DDP
downside, with reward adapter it seems we need to use
find_unused_parameters=True

* remove gradient from reward score calculation

* change supported_args back to None