I have been trying to implement a PPO agent that solves LunarLander-v2, as in the official example in the GitHub repo:
https://github.com/tensorflow/agents/blob/master/tf_agents/agents/ppo/examples/v2/train_eval_clip_agent.py
In this example, a PPOClipAgent is used. However, I would like to use both clipping and the KL penalty, so I used the PPOAgent class, which provides both options according to the documentation here:
https://www.tensorflow.org/agents/api_docs/python/tf_agents/agents/PPOAgent
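For reference, the setup I'm describing is roughly the sketch below; the network sizes, learning rate, and num_epochs are placeholders of my own, not the exact values from the example script:

```python
# Rough sketch of a PPOAgent setup for LunarLander-v2 (hyperparameter
# values below are placeholders, not the ones from the official example).
import tensorflow as tf
from tf_agents.agents.ppo import ppo_agent
from tf_agents.environments import suite_gym, tf_py_environment
from tf_agents.networks import actor_distribution_network, value_network

# Load the Gym environment and wrap it for TensorFlow.
train_env = tf_py_environment.TFPyEnvironment(suite_gym.load('LunarLander-v2'))

# Actor and value networks used by the agent.
actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=(64, 64))
value_net = value_network.ValueNetwork(
    train_env.observation_spec(),
    fc_layer_params=(64, 64))

# PPOAgent with clipping, entropy regularization and GAE left at their
# defaults, i.e. only the adaptive KL penalty is active.
agent = ppo_agent.PPOAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3),
    actor_net=actor_net,
    value_net=value_net,
    num_epochs=25)
agent.initialize()
```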
As you may notice, all of the KL-penalty parameters already default to the values from the original paper. However, importance_ratio_clipping (clipping), entropy_regularization (entropy coefficient), and use_gae (Generalized Advantage Estimation) are disabled by default.
I left the rest of the parameters as they are, but made the following changes (a sketch of the resulting constructor call follows the list):
importance_ratio_clipping=0.3
entropy_regularization=0.01
use_gae=True
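Concretely, the modified construction is the same call as in the sketch above, with only these three overrides added (the KL-penalty arguments keep their paper defaults). Again, this is a sketch rather than my exact script:

```python
# Same PPOAgent construction as above, with only the three overrides added.
agent = ppo_agent.PPOAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    optimizer=tf.compat.v1.train.AdamOptimizer(learning_rate=1e-3),
    actor_net=actor_net,
    value_net=value_net,
    importance_ratio_clipping=0.3,  # enable PPO-style ratio clipping
    entropy_regularization=0.01,    # enable the entropy bonus
    use_gae=True,                   # enable GAE (lambda_value defaults to 0.95)
    num_epochs=25)
```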
While the original PPOAgent works perfectly when those parameters are left untouched, changing one of them, or all of them, makes the agent diverge quickly and always end up with negative rewards, no matter how long I train it. I ran those experiments many times to see if I could get a better result, but the algorithm always did very poorly.
To test whether this is a bug in the tf-agents PPOAgent class, I ran the same algorithm with the same parameters using RLlib, changing the remaining parameters so that they match the tf-agents defaults. Surprisingly, their implementation has no problem converging while using clipping, the KL penalty, the entropy coefficient, and GAE at the same time! Here are the results:
https://github.com/kochlisGit/DRL-Frameworks/blob/main/rllib/ppo_average_return.png
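For anyone who wants to reproduce the comparison, an equivalent RLlib run can be configured roughly as below. The values are chosen to mirror the tf-agents settings; this is a reconstruction, not a copy of the exact script I used:

```python
# Rough sketch of an RLlib PPO run on LunarLander-v2 that enables clipping,
# the adaptive KL penalty, entropy regularization and GAE at the same time.
import ray
from ray import tune

ray.init()
tune.run(
    "PPO",
    stop={"episode_reward_mean": 200},  # LunarLander-v2 is considered solved at 200
    config={
        "env": "LunarLander-v2",
        "clip_param": 0.3,       # importance-ratio clipping
        "kl_coeff": 1.0,         # initial adaptive KL-penalty coefficient
        "kl_target": 0.01,       # adaptive KL target
        "entropy_coeff": 0.01,   # entropy regularization
        "use_gae": True,
        "lambda": 0.95,          # GAE lambda
        "gamma": 0.99,
    },
)
```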
The new versions of the examples are nightly tested and verified against the numbers reported in the paper (and are therefore more reliable). Once you try the new examples, could you verify: (1) whether you get the expected learning with the schulman17 parameters and just clipping, and (2) whether it stops learning when you add the KL terms? This will help us narrow down where the issues are. My guess is that something in the KL-related implementation might be scaled differently; that logic is less widely used than the clipping version. Thanks in advance!

It is working fine: the agent learns and converges quickly. Then I added entropy_regularization=0.01, which didn't change much in the training process (GAE was already True by default). In order to use the KL parameters, I had to switch from PPOClipAgent to PPOAgent and set importance_ratio_clipping=0.2. This did not produce the expected returns.