[BUG] Running examples/scripts/dpo.py crashes #914
addresses huggingface#914
This seems to be caused by saving the trainer state while a checkpoint is being saved. We are logging a … I submitted a PR to remove the …
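The comment does not say which value was logged. The sketch below is a minimal illustration that rests on one assumption: the trainer state, including its log history, is written out as JSON when a checkpoint is saved, and the logged object cannot be encoded as JSON. Under that assumption it reproduces the failure mode and shows one possible workaround, which is dropping non-serializable log entries before saving. Every name here (RichTable, TrainerStateSketch, sanitize_logs, samples_table) is hypothetical and is not part of the transformers or TRL API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


class RichTable:
    """Hypothetical stand-in for a rich logging object (e.g. a table) that json cannot encode."""

    def __init__(self, rows):
        self.rows = rows


@dataclass
class TrainerStateSketch:
    """Hypothetical, simplified trainer state: only the log history matters here."""

    log_history: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serializing the whole state fails if any logged value is not JSON-encodable.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def is_json_serializable(value: Any) -> bool:
    """Return True if json.dumps can encode the value."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def sanitize_logs(logs: dict) -> dict:
    """One possible workaround: drop log entries that cannot be written to JSON."""
    return {k: v for k, v in logs.items() if is_json_serializable(v)}


# A scalar metric logged next to a rich object, as in the hypothesized bug.
state = TrainerStateSketch()
state.log_history.append({"loss": 0.69, "samples_table": RichTable([["prompt", "policy", "ref"]])})

try:
    state.to_json()
    crashed = False
except TypeError:
    crashed = True  # json.dumps rejects the RichTable instance

# Saving succeeds once the non-serializable entry is filtered out.
safe_state = TrainerStateSketch(log_history=[sanitize_logs(e) for e in state.log_history])
saved = json.loads(safe_state.to_json())
print(crashed, saved["log_history"])
```

Filtering at save time keeps scalar metrics in the checkpoint while discarding objects that are only meaningful to a specific logger. Removing the offending log call, as the PR above does, avoids the problem at its source instead.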
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
This should be fixed in #919.
I think this problem has resurfaced at some point. I'm running TRL indirectly through Axolotl, and I'm seeing this line trigger the ProgressCallback. Then, pretty naturally, when that callback does … The key parts of my stack trace are:
Important versions of libraries are:
The only solutions I've found are dropping wandb logging or deleting the log line in the DPO Trainer. I feel like the right fix is that when the DPO trainer calls …
Error
Running examples/scripts/dpo.py with no changes to the args crashes with the following error message:
Installed Environment
I installed the dependencies as provided in requirements.txt; specifically, I use … and a source installation of commit 7de7db6.