-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Question] Why resample SDE noise matrices in PPO optimzation? #1929
Comments
Hello, |
After removing the reset_noise call, I tested it on the half-cheetah env and it worked without any errors. PR: #1933 |
looking at the std value, it doesn't seem you were using gSDE, but I could try with |
The currently published version does give me a std of a little less than 1.0 consitently with sde enabled
|
❓ Question
I'm currently implementing a personal RL library and use SB3 as inspiration. I have recently implemented SDE and I'm confused about line 214/215 in your implementation of PPO. Here, the SDE noise matrices are resampled before each PPO update but as far as I have seen, this doesn't do anything since the noise matrices are not used during the PPO updates, only during exploration. Let me elaborate:
In ppo train, we reset the noise matrices and then call evaluate_actions afterwards:
stable-baselines3/stable_baselines3/ppo/ppo.py
Lines 214 to 217 in 6c00565
reset_noise, which just calls sample_weights, does not cause any state changes except for the exploration matrices
stable-baselines3/stable_baselines3/common/distributions.py
Lines 499 to 512 in 6c00565
The exploration matrices are only used in get_noise which is in turn only used in sample
stable-baselines3/stable_baselines3/common/distributions.py
Lines 593 to 603 in 6c00565
stable-baselines3/stable_baselines3/common/distributions.py
Lines 580 to 585 in 6c00565
However, evaluate_actions does not call sample()
stable-baselines3/stable_baselines3/common/policies.py
Lines 719 to 741 in 6c00565
So in conclusion, the reset_noise call in the ppo train function is useless isn't it?
Checklist
The text was updated successfully, but these errors were encountered: