PPOTrainer issue: PPO loss seems coming down, but objective/kl increases a lot, and mean_rewards decreases #1098

Closed
viethoangtranduong opened this issue Dec 15, 2023 · 3 comments
Labels
🏋 PPO Related to PPO

Comments

@viethoangtranduong
Contributor

I'm using the code from this GitHub repository to train my RLHF policy with PPOTrainer. My custom reward model outputs scores from 0 (bad) to 1 (good).

After 150 steps, I've observed some issues: objective/kl increases to about 10, ppo/loss/value decreases significantly, ppo/loss/total varies around 0 and can go negative, and the mean reward per epoch (which should ideally increase) gradually drops, even going negative relative to a 0.5 baseline.

I think there might be a problem with the PPOTrainer's loss function, though I'm not fully familiar with the code base. Any advice or pointers would be greatly appreciated, thank you!

[Image: training curves for objective/kl, ppo/loss/total, ppo/loss/value, and mean reward]
(Note: the KL value in the graph is scaled down by a factor of 10 for better visualization.)
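
For context, my training loop roughly follows the standard TRL PPO pattern. The snippet below is a simplified sketch rather than my exact code; the prompts, the "gpt2" model name, and my_reward_fn are placeholders standing in for my real data, policy model, and custom reward model:

```python
import torch
from datasets import Dataset
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# Placeholders standing in for my real prompts and custom reward model.
prompts = ["Explain PPO in one sentence.", "Summarize RLHF briefly."] * 8
def my_reward_fn(query, response):
    return 0.5  # the real reward model returns a score in [0, 1]

model_name = "gpt2"  # placeholder; the real run uses a larger policy model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

dataset = Dataset.from_dict({"query": prompts})
dataset = dataset.map(lambda ex: {"input_ids": tokenizer.encode(ex["query"])})
dataset.set_format(type="torch")

def collator(data):
    # Keep variable-length sequences as lists rather than stacking them into one tensor.
    return {key: [d[key] for d in data] for key in data[0]}

model = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
config = PPOConfig(model_name=model_name, batch_size=16, mini_batch_size=4)
ppo_trainer = PPOTrainer(config, model, tokenizer=tokenizer, dataset=dataset, data_collator=collator)

generation_kwargs = {"max_new_tokens": 32, "do_sample": True, "pad_token_id": tokenizer.eos_token_id}

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, **generation_kwargs)
    batch["response"] = tokenizer.batch_decode(response_tensors)

    # One scalar reward per (prompt, response) pair from the custom reward model.
    rewards = [torch.tensor(my_reward_fn(q, r)) for q, r in zip(batch["query"], batch["response"])]

    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
    ppo_trainer.log_stats(stats, batch, rewards)
```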

Here are my questions:

  • Why do ppo/loss/total and ppo/loss/value become negative? Could this be due to the high KL value? Shouldn't the loss function keep the KL from growing too large?
  • At which point in the current run should I have stopped?
  • I also notice here that ppo/loss/total = ppo/loss/value + ppo/loss/policy, but I don't see that relation holding in the stats_dict. The ppo/mean_non_score_reward (KL penalty) values also don't make much sense to me given the high KL values (my reading of the loss bookkeeping is sketched after this list).
  • I'm attaching stats_loss.csv for further analysis of the graph and losses. The machine_ranks column indicates the rank of each machine; I run the same rl_training.py script on four GPUs, but with my custom reward model.
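
To make the third bullet concrete, here is my (possibly incorrect) reading of how the trainer assembles the KL penalty and the total loss. This is paraphrased, simplified pseudocode based on skimming ppo_trainer.py, not the actual code, and the default coefficients shown are only what I believe they are:

```python
import torch

# Toy tensors standing in for a single response of 5 tokens.
logprobs = torch.tensor([[-1.0, -0.8, -1.2, -0.9, -1.1]])      # policy log-probs
ref_logprobs = torch.tensor([[-1.1, -0.9, -1.0, -1.0, -1.2]])  # reference-model log-probs
score = torch.tensor(0.7)   # reward-model score for this response
kl_coef = 0.2               # PPOConfig.init_kl_coef default, as I understand it
vf_coef = 0.1               # PPOConfig.vf_coef default, as I understand it

# Per-token KL penalty folded into the rewards before the PPO update:
kl = logprobs - ref_logprobs                  # per-token KL estimate
non_score_reward = -kl_coef * kl              # averaged and logged as ppo/mean_non_score_reward
rewards = non_score_reward.clone()
rewards[:, -1] += score                       # reward-model score added only at the last token
                                              # (simplified; the real code uses masks)

# Total loss, as I understand it (the value loss is weighted, not a plain sum):
loss_policy, loss_value = torch.tensor(0.05), torch.tensor(0.3)  # placeholders for the two PPO losses
loss_total = loss_policy + vf_coef * loss_value
```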

File: https://drive.google.com/file/d/1To8POOxcxmHcw298S7M24_ql8f6lyOGV/view?usp=sharing

stats_dict is the output of log_stats, with numpy arrays converted to lists for serialization purposes.
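
(For reference, the numpy-to-list conversion I applied before dumping the stats was roughly the following; to_serializable is my own helper, not a TRL function:)

```python
import json
import numpy as np

def to_serializable(obj):
    """Recursively turn numpy arrays/scalars inside the stats dict into plain Python types."""
    if isinstance(obj, dict):
        return {k: to_serializable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_serializable(v) for v in obj]
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, np.generic):
        return obj.item()
    return obj

# Example with a couple of keys shaped like the log_stats output:
stats = {"ppo/loss/total": np.float32(0.12), "objective/kl": np.array([9.8, 10.2])}
print(json.dumps(to_serializable(stats)))
```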

Thank you! Looking forward to any suggestions!

@viethoangtranduong viethoangtranduong changed the title PPOTrainer: PPO loss coming down, but objective/kl increases a lot, and mean_rewards decreases PPOTrainer issue: PPO loss seems coming down, but objective/kl increases a lot, and mean_rewards decreases Dec 15, 2023
@younesbelkada
Contributor

cc @lvwerra @vwxyzjn 🙏

@younesbelkada younesbelkada added the 🏋 PPO Related to PPO label Dec 20, 2023
@lvwerra
Member

lvwerra commented Dec 21, 2023

Hi @viethoangtranduong, in general things look OK (apart from the reward not going up): the KL can sit around ~10 while the loss goes down or hovers around a low value. So I would do the following:

  • start with the example script at examples/scripts/ppo.py and its default parameters
  • then customize it step by step to your needs: add your model, your reward model, etc.
  • check at which step things go wrong (a rough sketch of this workflow follows below)
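
Something like the following (just a sketch of the workflow; keep the example's defaults and change one piece per run):

```python
from trl import PPOConfig

# Run 1: the stock example (examples/scripts/ppo.py) with all defaults, unchanged.
config = PPOConfig()

# Run 2: change exactly one component, e.g. swap in your reward model while keeping
# the example's policy model, generation settings, and PPO hyperparameters.
# Run 3+: once rewards still go up, swap in your policy model, your dataset, etc.,
# and compare objective/kl, ppo/loss/policy, ppo/loss/value and the mean reward
# between consecutive runs to see which change breaks the training.
```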


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
