[Tracking issue] Wrong loss scaling when accumulating gradient #2617

Open
13 of 18 tasks
qgallouedec opened this issue Jan 23, 2025 · 0 comments
Labels
🐛 bug Something isn't working 🏋 DPO Related to DPO 🏋 DPPO Related to DDPO 🏋 GKD Related to GKD 🏋 GRPO Related to GRPO 🏋 Iterative SFT Related to Iterative SFT 🏋 KTO Related to KTO 🏋 Online DPO Related to Online DPO 🏋 ORPO Related to ORPO 🏋 PPO Related to PPO 🏋 PRM Related to PRM 🏋 Reward Related to Reward modelling 🏋 RLOO Related to RLOO 🏋 SFT Related to SFT 🏋 XPO Related to XPO

Comments

qgallouedec (Member) commented Jan 23, 2025

Caused by huggingface/transformers#35651, which adds a new condition for scaling the loss.
Spotted in huggingface/transformers#35856.

For each one, check whether it has the same issue and, if so, fix it.
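The effect of a missing scaling step can be shown with plain arithmetic. This is a hypothetical illustration, not code from transformers or TRL: it assumes a trainer that computes a mean loss per micro-batch and is expected to divide each micro-batch loss by the number of accumulation steps. If that division is skipped, the accumulated gradient comes out `accumulation_steps` times too large. The model (a single weight `w`), the data, and the names `mse_grad`, `accumulated_grad`, and `scale_loss` are all made up for this sketch.

```python
# Hypothetical illustration of gradient-accumulation loss scaling.
# Not taken from transformers or TRL; a 1-parameter model y_hat = w * x
# with a mean-squared-error loss stands in for a real training loss.


def mse_grad(w, xs, ys):
    """d/dw of mean((w * x - y) ** 2) over one micro-batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accumulation_steps, scale_loss):
    """Sum the gradients of equal-sized micro-batches.

    When scale_loss is True, each micro-batch loss (and therefore its
    gradient) is divided by accumulation_steps before being added.
    """
    size = len(xs) // accumulation_steps
    total = 0.0
    for step in range(accumulation_steps):
        batch = slice(step * size, (step + 1) * size)
        grad = mse_grad(w, xs[batch], ys[batch])
        if scale_loss:
            grad /= accumulation_steps
        total += grad
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 1.0

full = mse_grad(w, xs, ys)                        # one batch of 4 samples
scaled = accumulated_grad(w, xs, ys, 2, True)     # 2 micro-batches, divided by 2
unscaled = accumulated_grad(w, xs, ys, 2, False)  # 2 micro-batches, no division

print(full, scaled, unscaled)  # -15.0 -15.0 -30.0
```

With two accumulation steps, the unscaled variant doubles the gradient relative to the full batch; in general it is multiplied by `accumulation_steps`. For plain SGD this behaves like an unintended increase of the learning rate by that factor.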

github-actions bot added the 🏋 DPO Related to DPO 🏋 DPPO Related to DDPO 🏋 GKD Related to GKD 🏋 GRPO Related to GRPO 🏋 Iterative SFT Related to Iterative SFT 🏋 KTO Related to KTO 🏋 Online DPO Related to Online DPO 🏋 ORPO Related to ORPO 🏋 PPO Related to PPO 🏋 PRM Related to PRM 🏋 Reward Related to Reward modelling 🏋 RLOO Related to RLOO 🏋 SFT Related to SFT 🏋 XPO Related to XPO labels Jan 23, 2025
qgallouedec added the 🐛 bug Something isn't working label Jan 23, 2025
qgallouedec pinned this issue Jan 23, 2025