# Reinforcement-Learning

Training reinforcement learning agents in some classic Gym environments with the PPO algorithm.
The environments are:

- CartPole
- MountainCar
- Montezuma's Revenge (a sparse-reward environment)
  - The two CSV files record the cumulative rewards obtained by the agents during testing over 1000 episodes.
  - Although the agent trained with an entropy coefficient of 0.01 explores the state space and accumulates rewards faster during training, the test results show that the agent with an entropy coefficient of 0 performs far better and reliably accumulates rewards.
  - The other agent failed to learn an efficient policy, possibly because it was disturbed by the incentive for extra exploration. This is not a general conclusion, however: the entropy coefficient would need to be tuned much more thoroughly before drawing conclusions about its effect. A sketch of a comparable training and evaluation setup follows below.
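The comparison above could be reproduced with something like the sketch below. It is a minimal illustration, not this repository's actual code: it assumes the stable-baselines3 implementation of PPO (which exposes the entropy coefficient as `ent_coef`) and the classic pre-0.26 Gym step API, and it uses `CartPole-v1` as a lightweight stand-in for the heavier Montezuma's Revenge setup. The training budget, file names, and episode count are illustrative.

```python
# Minimal sketch, assuming stable-baselines3 PPO and the classic Gym API.
# Trains one agent per entropy coefficient, then logs the cumulative
# reward of each test episode to a CSV file for comparison.
import csv

import gym
from stable_baselines3 import PPO


def train_and_test(ent_coef: float, csv_path: str, episodes: int = 1000) -> None:
    env = gym.make("CartPole-v1")  # stand-in environment for this sketch
    model = PPO("MlpPolicy", env, ent_coef=ent_coef, verbose=0)
    model.learn(total_timesteps=100_000)  # illustrative training budget

    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["episode", "cumulative_reward"])
        for episode in range(episodes):
            obs = env.reset()
            done, total_reward = False, 0.0
            while not done:
                action, _ = model.predict(obs, deterministic=True)
                obs, reward, done, _ = env.step(action)
                total_reward += reward
            writer.writerow([episode, total_reward])
    env.close()


# Compare the two entropy settings discussed above.
train_and_test(ent_coef=0.0, csv_path="rewards_ent0.csv")
train_and_test(ent_coef=0.01, csv_path="rewards_ent0.01.csv")
```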