[Multi-Adapter PPO] Fix and Refactor reward model adapter #982
Conversation
more flexible, clearer args
unwrap model since it is DDP; downside: with the reward adapter it seems we need to use find_unused_parameters=True
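For reference, the find_unused_parameters flag can be passed to DDP through Accelerate's kwargs handler; a minimal sketch of what that might look like, assuming the trainer is driven by an Accelerator (values are illustrative, not the PR's exact setup):

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# With the reward adapter attached, some parameters never receive gradients during
# the PPO update, so DDP has to be told to tolerate unused parameters.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```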
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Great clean up! Thanks a ton for working on this @mnoukhov !
I left only one question, what do you think?
trl/models/modeling_base.py (Outdated)
@@ -68,7 +66,7 @@ class PreTrainedModelWrapper(nn.Module):
        The list of arguments that are supported by the wrapper class.
    """
    transformers_parent_class = None
-   supported_args = None
+   supported_args = ("score_module", "supports_rm_adapter", "rm_adapter_name")
Do you know if these are used? It seems we always overwrite them: https://github.com/huggingface/trl/blob/main/trl/models/modeling_value_head.py#L90
I made this change so it would be a bit more obvious which args are being passed in. I'm happy to change it back if you'd like.
I actually don't think we use supported_args anywhere; I assumed it would be a future feature for some sort of arg parsing. To be more in line with transformers, we would probably want a config class instead of supported args anyway, so it isn't a big deal either way.
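For context, a rough sketch of how a supported_args tuple could drive keyword-argument routing in the wrapper; the method name and exact behaviour here are assumptions for illustration, not a verbatim copy of trl's code:

```python
class PreTrainedModelWrapper:
    # Subclasses list the kwargs the wrapper itself consumes.
    supported_args = ("score_module", "supports_rm_adapter", "rm_adapter_name")

    @classmethod
    def _split_kwargs(cls, kwargs):
        """Split kwargs into those meant for the wrapper and those for the base model."""
        supported, unsupported = {}, {}
        for key, value in kwargs.items():
            if key in (cls.supported_args or ()):
                supported[key] = value
            else:
                unsupported[key] = value
        return supported, unsupported
```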
OK I see, thanks!
Yes, it is not a big deal. Technically I set it to None so that PreTrainedModelWrapper can't be used as-is (I should have made that class abstract). It would be great if you can change this back, then I think we can merge.
Done!
trl/models/modeling_base.py (Outdated)
@@ -68,7 +66,7 @@ class PreTrainedModelWrapper(nn.Module):
        The list of arguments that are supported by the wrapper class.
    """
    transformers_parent_class = None
-   supported_args = None
+   supported_args = ("score_module", "supports_rm_adapter", "rm_adapter_name")
Suggested change:
-   supported_args = ("score_module", "supports_rm_adapter", "rm_adapter_name")
+   supported_args = None
Thanks again!
…e#982)
* reward adapter loaded as part of init: more flexible, clearer args
* fixed script for multi gpu: unwrap model since it is DDP; downside, with the reward adapter it seems we need to use find_unused_parameters=True
* remove gradient from reward score calculation
* change supported_args back to None
A simpler and cleaner version of #472 focused on the main issues:
* examples/multi_adapter_ppo.py fails with multi-GPU: model.compute_reward_score breaks if model is wrapped in DDP, so unwrap_model first
* to avoid RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one...., wrap the reward score calculation in torch.no_grad() and make sure that the reward adapter's score has requires_grad=False
* in modeling_base.py, create the reward adapter before initializing the model and pass score into init, using PeftModel.load_adapter, which includes a lot of useful logic
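A minimal sketch of the reward-scoring pattern described above; the attribute names (pretrained_model, score) and adapter names are assumptions for illustration, not the exact trl API:

```python
import torch
from torch.nn.parallel import DistributedDataParallel

def compute_reward_score(model, input_ids, attention_mask=None):
    # Unwrap the policy model if the trainer/accelerator wrapped it in DDP.
    if isinstance(model, DistributedDataParallel):
        model = model.module

    # The reward pass must not contribute gradients; otherwise DDP complains that
    # some parameters never finished reduction before the next iteration.
    with torch.no_grad():
        model.pretrained_model.set_adapter("reward_model_adapter")  # assumed reward adapter name
        last_hidden = model.pretrained_model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            output_hidden_states=True,
        ).hidden_states[-1]
        scores = model.score(last_hidden)  # score head passed into init, requires_grad=False
        model.pretrained_model.set_adapter("default")  # switch back to the policy adapter
    return scores
```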