RSO (Statistical Rejection Sampling Improves Preference Optimization) #902
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Hello @kashif, thanks for looking at this. It looks like the RSO authors experimented with both the sigmoid loss (used in DPO) and the hinge loss (used in SLiC). Could you send the link where you posted the questions about the seq2seq models, please? I cannot find it.
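For readers comparing the two objectives mentioned here, below is a minimal, self-contained sketch of the DPO-style sigmoid loss and the SLiC-style hinge loss on a (chosen, rejected) pair. The function name, the `beta` scale, and the log-probability inputs are illustrative assumptions, not code from this PR.

```python
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    beta: float = 0.1,
                    loss_type: str = "sigmoid") -> torch.Tensor:
    """Pairwise preference loss on (chosen, rejected) log-probabilities.

    loss_type="sigmoid" corresponds to the DPO-style logistic loss,
    loss_type="hinge" to the SLiC-style hinge loss.
    """
    # Log-ratio of policy vs. reference model for each response.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Margin between the chosen and rejected responses.
    logits = chosen_logratios - rejected_logratios

    if loss_type == "sigmoid":
        # -log sigmoid(beta * margin): logistic loss (DPO).
        loss = -F.logsigmoid(beta * logits)
    elif loss_type == "hinge":
        # max(0, 1 - beta * margin): hinge loss (SLiC).
        loss = torch.relu(1.0 - beta * logits)
    else:
        raise ValueError(f"Unknown loss_type: {loss_type}")
    return loss.mean()
```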
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Stay calm, stale-bot :D
Looks cool! Are there any recent updates?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Implementation of Statistical Rejection Sampling Improves Preference Optimization

Responds to #816.

- I added the `generate` and `score` functions to `trainer.utils`, as those two functions will also be useful for the RAFT and ReST implementations ([WIP] Reward ranked finetuning (RAFT) and Reinforced Self-Training (ReST) #704). It would also be possible to add those functions to a `utils.py` script inside the examples folder. What do you think?
- […] `conduct_rejection_sampling` function (see the sketch after this list).
- […] `dpo.py`. I don't think adding another similar script is useful. What's your opinion?

cc @kashif @younesbelkada @lvwerra @philschmid
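For context on the sampling step referenced above, here is a minimal sketch of statistical rejection sampling as described in the RSO paper: candidates are repeatedly accepted with probability exp((reward - max_reward) / beta), so higher-reward responses survive more often. The signature and names below are assumptions for illustration and may not match the `conduct_rejection_sampling` function in this PR.

```python
import numpy as np

def conduct_rejection_sampling(response_candidates: list[str],
                               response_rewards: list[float],
                               num_samples: int,
                               beta: float) -> list[str]:
    """Rejection-sample up to `num_samples` responses from reward-scored candidates.

    Each remaining candidate is accepted with probability
    exp((reward - max_reward) / beta), approximating samples from the
    reward-tilted target policy; beta controls how greedy the sampling is.
    """
    # Note: keying by response text deduplicates identical candidates (sketch only).
    candidates = dict(zip(response_candidates, response_rewards))
    accepted = []
    while len(accepted) < num_samples and candidates:
        max_reward = max(candidates.values())
        to_remove = []
        for candidate, reward in candidates.items():
            # Accept with probability exp((reward - max_reward) / beta);
            # the current best candidate is always accepted.
            if np.random.uniform() < np.exp((reward - max_reward) / beta):
                accepted.append(candidate)
                to_remove.append(candidate)
                if len(accepted) == num_samples:
                    break
        # Remove accepted candidates so they are not drawn twice.
        for candidate in to_remove:
            candidates.pop(candidate)
    return accepted
```

The accepted responses would then be scored and paired into (chosen, rejected) examples for the preference loss sketched earlier in this thread.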