I'm using the code from this GitHub repository to train an RLHF model with PPOTrainer. My custom reward model outputs scores ranging from 0 (bad) to 1 (good).
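For context, here is a minimal sketch of the kind of PPOTrainer loop this setup follows (illustrative only; the model name, prompts, and the `reward_model` stub below are placeholders rather than my actual rl_training.py):

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Placeholder model name and hyperparameters; the real run uses the repo's
# rl_training.py defaults plus my custom reward model.
config = PPOConfig(model_name="gpt2", learning_rate=1.41e-5, batch_size=2, mini_batch_size=2)

tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config=config, model=model, ref_model=ref_model, tokenizer=tokenizer)

def reward_model(texts):
    # Hypothetical stand-in for the custom reward model: one scalar in [0, 1] per response.
    return [torch.tensor(min(1.0, len(t) / 100)) for t in texts]

prompts = ["The movie was", "I really think the service"]
query_tensors = [tokenizer(p, return_tensors="pt").input_ids.squeeze(0) for p in prompts]

for step in range(150):
    # Generate response tokens only (no prompt), as PPOTrainer.step expects.
    response_tensors = ppo_trainer.generate(
        query_tensors,
        return_prompt=False,
        max_new_tokens=32,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    responses = tokenizer.batch_decode(response_tensors)

    rewards = reward_model(responses)  # list of scalar tensors in [0, 1]
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, {"query": prompts, "response": responses}, rewards)
```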
After 150 steps, I've observed some issues: objective/kl increases to 10, ppo/loss/value decreases significantly, ppo/loss/total varies around 0 and can be negative, and the mean reward per epoch (which should ideally increase) gradually drops, even going negative against a 0.5 baseline.
I think there might be a problem with the PPOTrainer's loss function, though I'm not fully familiar with the code base. Any advice/pointers would be greatly appreciated, thank you!
(Note: The KL value in the graph is scaled down by a factor of 10 for better visualization.)
Here are my questions:
Why do ppo/loss/total and ppo/loss/value become negative? Could it be due to the high KL value? Shouldn't the loss function keep the KL value from getting too large?
At which point in the current epoch should I have stopped?
I also notice here that ppo/loss/total = ppo/loss/value + ppo/loss/policy, but I'm not sure I'm seeing that in the stats_dict. The ppo/mean_non_score_reward (KL penalty) values also didn't make much sense to me given the high KL values.
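For reference, here is a toy sketch of how I currently understand these logged quantities to relate (my reading of the trainer code may well be off; kl_coef and vf_coef are my guesses at the relevant coefficient names, and the numbers are made up):

```python
import torch

# Toy numbers just to illustrate the relationships I'm assuming,
# not the actual trainer code.
kl_coef, vf_coef = 0.2, 0.1
logprobs = torch.tensor([[-1.0, -2.0, -1.5]])      # policy logprobs for one response
ref_logprobs = torch.tensor([[-1.2, -2.5, -3.0]])  # reference-model logprobs
score = torch.tensor(0.8)                          # reward model output in [0, 1]

per_token_kl = logprobs - ref_logprobs             # objective/kl aggregates this per-token KL
non_score_reward = -kl_coef * per_token_kl         # ppo/mean_non_score_reward ~ its mean
rewards = non_score_reward.clone()
rewards[:, -1] += score                            # the score only enters at the final response token

pg_loss = torch.tensor(0.05)                       # stand-in for ppo/loss/policy
vf_loss = torch.tensor(0.30)                       # stand-in for ppo/loss/value
total_loss = pg_loss + vf_coef * vf_loss           # my reading: the value term is scaled by vf_coef,
                                                   # which might be why total != policy + value exactly

print(per_token_kl.mean(), non_score_reward.mean(), total_loss)
```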
I'm attaching stats_loss.csv for further analysis of the graph and losses. The machine_ranks column indicates the rank of each machine; I use four GPUs, each running the same rl_training.py script but with my custom reward model. File: https://drive.google.com/file/d/1To8POOxcxmHcw298S7M24_ql8f6lyOGV/view?usp=sharing
stats_dict is the output of log_stats, with numpy arrays converted to lists for serialization purposes.
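The conversion is just a recursive numpy-to-list pass, roughly along these lines (an illustrative helper, not the exact code I used):

```python
import json
import numpy as np

def make_serializable(obj):
    # Recursively convert numpy arrays/scalars so json.dump accepts the stats dict.
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, (np.floating, np.integer)):
        return obj.item()
    if isinstance(obj, dict):
        return {k: make_serializable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [make_serializable(v) for v in obj]
    return obj

stats = {"objective/kl": np.float32(10.2), "ppo/loss/total": np.array([-0.03, 0.01])}
with open("stats_dict.json", "w") as f:
    json.dump(make_serializable(stats), f, indent=2)
```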
Thank you! Looking forward to any suggestions!
viethoangtranduong changed the title from "PPOTrainer: PPO loss coming down, but objective/kl increases a lot, and mean_rewards decreases" to "PPOTrainer issue: PPO loss seems coming down, but objective/kl increases a lot, and mean_rewards decreases" on Dec 15, 2023.
Hi @viethoangtranduong, in general things look OK (besides the reward not going up): the KL can sit at ~10 while the loss goes down or hovers around a low value. So I would do the following:
- start with the example script at examples/scripts/ppo.py with the default parameters
- then customize it step by step to your needs: add your model, your reward model, etc. (a minimal starting configuration is sketched below)
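For instance, a starting configuration in the spirit of that example script could look like the sketch below (parameter names follow the TRL PPOConfig of that period; the example script itself is the reference for the actual default values):

```python
from trl import PPOConfig

# Start close to the library defaults, then swap in your own model / reward model.
# Values here are illustrative; check examples/scripts/ppo.py for the real ones.
config = PPOConfig(
    model_name="gpt2",      # replace with your SFT model later
    learning_rate=1.41e-5,
    batch_size=128,
    mini_batch_size=128,
    ppo_epochs=4,
    init_kl_coef=0.2,       # starting KL penalty coefficient
    target=6.0,             # target KL for the adaptive controller
    adap_kl_ctrl=True,      # let the controller raise/lower the KL coefficient
)
```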
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.