Update utils.py #1012
Update compute_accuracy to handle the cases where str_chosen and str_rej receive the same score, which is probably not what developers want.
Thanks for the PR! Can you elaborate a bit more on why this is bad and what side effects it has?
When a model gives exactly the same score to the sentence to accept and the sentence to reject, the current evaluation metric counts the pair as "correct". However, this scenario is usually caused either by the model having trouble encoding sentences (e.g. it is over-quantized) or by the two sentences being identical, and neither is expected. I think emitting a warning here and counting the pair as "wrong" or "random" would help developers notice these problems.
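A minimal sketch of the tie case described above, assuming the metric picks the higher-scoring column with an argmax and treats index 0 (the chosen sentence) as the correct label. The scores and variable names are hypothetical and only illustrate why a tie ends up counted as "correct":

```python
import numpy as np

# Hypothetical reward scores: column 0 = chosen, column 1 = rejected.
scores = np.array([
    [2.0, 1.0],  # chosen scores higher
    [0.5, 0.5],  # exact tie
    [0.1, 0.9],  # rejected scores higher
])

# Assumption: the correct label is always index 0 (the chosen sentence).
labels = np.zeros(len(scores), dtype=int)

# np.argmax returns the first index on a tie, so a tie resolves to 0.
predicted = np.argmax(scores, axis=1)
accuracy = float((predicted == labels).mean())

# Count the pairs whose two scores are exactly equal.
num_ties = int((scores[:, 0] == scores[:, 1]).sum())

print(predicted.tolist(), accuracy, num_ties)
```

Under these assumptions the tied pair is predicted as index 0, so it raises the accuracy even though the model did not actually prefer either sentence.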
I see, thanks! In that case I'd prefer to just add a warning but leave the computation logic the same. Does that make sense?
Sure, that makes sense. Thanks for your suggestions!
Updated so that only the warning is kept.
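For illustration, the warning-only approach agreed on above could look roughly like the following. This is a sketch, not the merged code: the function name `compute_accuracy_with_tie_warning`, the `(predictions, labels)` input layout, and the warning text are all assumptions.

```python
import warnings

import numpy as np


def compute_accuracy_with_tie_warning(eval_pred):
    """Hypothetical sketch: warn on tied scores, leave the accuracy logic unchanged."""
    # Assumption: predictions has shape (n, 2) with columns (chosen, rejected).
    predictions, labels = eval_pred
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)

    # Count the pairs where both sentences received exactly the same score.
    num_ties = int((predictions[:, 0] == predictions[:, 1]).sum())
    if num_ties > 0:
        warnings.warn(
            f"{num_ties} out of {len(predictions)} pairs have equal scores "
            "for both options, so the accuracy can be misleading."
        )

    # Unchanged computation: argmax still resolves a tie to index 0.
    predicted = np.argmax(predictions, axis=1)
    accuracy = float((predicted == labels).mean())
    return {"accuracy": accuracy}
```

With this shape, a tied pair still counts toward the accuracy exactly as before; the only new behavior is that the caller is told how many pairs were tied.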
Co-authored-by: Leandro von Werra <[email protected]>
* Update utils.py

  update compute_accuracy to deal with the cases where str_chosen and str_rej got the same scores, which is probably what the developers don't want

* Update utils.py

  updated so only warning is reserved

* Update trl/trainer/utils.py

  Co-authored-by: Leandro von Werra <[email protected]>

---------

Co-authored-by: Leandro von Werra <[email protected]>