[Tracking issue] Wrong loss scaling when accumulating gradients #2617
Labels
🐛 bug
Something isn't working
🏋 DPO
Related to DPO
🏋 DDPO
Related to DDPO
🏋 GKD
Related to GKD
🏋 GRPO
Related to GRPO
🏋 Iterative SFT
Related to Iterative SFT
🏋 KTO
Related to KTO
🏋 Online DPO
Related to Online DPO
🏋 ORPO
Related to ORPO
🏋 PPO
Related to PPO
🏋 PRM
Related to PRM
🏋 Reward
Related to Reward modelling
🏋 RLOO
Related to RLOO
🏋 SFT
Related to SFT
🏋 XPO
Related to XPO
Caused by huggingface/transformers#35651, which adds a new condition for scaling the loss.
Spotted in huggingface/transformers#35856.
For each trainer labeled on this issue, check whether it has the same problem and, if so, fix it.
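For context, here is a minimal sketch of why the loss scaling matters under gradient accumulation. It is plain Python with a toy one-parameter model, not the `transformers` or TRL code, and every name in it (`grad_mse`, `accumulated_grad`, `scale_loss`) is hypothetical. It assumes equally sized micro-batches and a mean-reduced loss: dividing each micro-batch contribution by the number of accumulation steps reproduces the full-batch gradient, while skipping the division inflates it by that factor.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, accumulation_steps, scale_loss):
    """Sum per-micro-batch gradients, optionally scaled by 1 / accumulation_steps."""
    micro = len(xs) // accumulation_steps  # assumes the batch splits evenly
    total = 0.0
    for i in range(accumulation_steps):
        batch = slice(i * micro, (i + 1) * micro)
        g = grad_mse(w, xs[batch], ys[batch])
        if scale_loss:
            g /= accumulation_steps
        total += g
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)                             # -22.5
scaled = accumulated_grad(w, xs, ys, 2, True)          # -22.5, matches full batch
unscaled = accumulated_grad(w, xs, ys, 2, False)       # -45.0, twice too large

print(full, scaled, unscaled)
```

When checking a trainer, the thing to look for is whether the division happens exactly once: a trainer that overrides the training step or computes its own loss may end up skipping it or applying it twice, depending on which branch of the new condition it falls into.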