Mountain car POMCP oscillating solution #435
gdaddi asked this question in Problem Implementation
Answered by zsunberg on Nov 1, 2022
Hi @gdaddi,

The problem is likely that POMCP is just not able to solve this problem very well without a decent rollout policy. Since the rewards for the problem are very sparse, the tree search likely never finds any rewards in the tree if the rollout policy is random. You could try to use a rollout policy that always adds energy so that the rollouts will reach the goal.

Let me know if this explanation makes sense.
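To make the rollout suggestion concrete, here is a minimal, self-contained sketch in plain Python (not POMDPs.jl code). It assumes the classic mountain-car dynamics, a hypothetical sparse reward of +100 at the goal and 0 elsewhere, and one possible reading of "always adds energy": push in the direction of the current velocity. The names `step`, `rollout`, `energy_policy`, and `make_random_policy` are illustrative only, and the sketch assumes the rollout can see the sampled state, including velocity.

```python
import math
import random

GOAL_X = 0.5  # assumed goal position (classic mountain car)


def step(x, v, a):
    """One step of the classic mountain-car dynamics; a is -1, 0, or +1."""
    v += 0.001 * a - 0.0025 * math.cos(3 * x)
    v = max(-0.07, min(0.07, v))
    x += v
    if x < -1.2:  # inelastic left wall
        x, v = -1.2, 0.0
    return x, v


def rollout(policy, x=-0.5, v=0.0, max_steps=200, discount=0.99):
    """Simulate one rollout; return (discounted return, steps taken).

    Assumed sparse reward: +100 on reaching the goal, 0 otherwise.
    """
    for t in range(max_steps):
        x, v = step(x, v, policy(x, v))
        if x >= GOAL_X:
            return 100.0 * discount ** t, t + 1
    return 0.0, max_steps


def energy_policy(x, v):
    """Push along the current velocity, so every step adds energy."""
    return 1 if v >= 0 else -1


def make_random_policy(seed=0):
    """Uniformly random actions, standing in for a default random rollout."""
    rng = random.Random(seed)
    return lambda x, v: rng.choice((-1, 0, 1))


if __name__ == "__main__":
    print("energy-adding rollout:", rollout(energy_policy))
    print("random rollout:       ", rollout(make_random_policy(0)))
```

Under these assumptions the energy-adding rollout reaches the goal and returns a nonzero value estimate, which is the signal the tree search needs. How such a policy is passed to the solver depends on the solver's own rollout options, which this sketch does not cover.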
Answer selected by gdaddi