Reinforcement-Learning

Training reinforcement learning agents in several classic Gym environments using the PPO (Proximal Policy Optimization) algorithm.
The environments are:

Cartpole
Mountain Car
Montezuma's Revenge (a sparse-reward environment)
- The two CSV files record the cumulative rewards obtained by each agent over 1000 test episodes.
- During training, the agent with an entropy coefficient of 0.01 explores the state space and accumulates rewards faster than the agent with an entropy coefficient of 0. In testing, however, the agent with an entropy coefficient of 0 performs much better and does accumulate rewards.
- The other agent did not learn an efficient policy, possibly because the added incentive to explore disrupted it. This is not a general conclusion, however: the entropy coefficient needs more thorough tuning before its effect can be assessed.
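
To make the role of the entropy coefficient concrete, here is a minimal sketch of how it usually enters the PPO policy loss: the clipped surrogate objective plus an entropy bonus weighted by the coefficient. This is not the code from this repository. The function names, the `clip_eps` default of 0.2, and the omission of the value-function term are assumptions made for illustration.

```python
import math

def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def ppo_policy_loss(ratios, advantages, action_probs,
                    entropy_coef=0.01, clip_eps=0.2):
    """Clipped PPO surrogate loss minus an entropy bonus, averaged over a batch.

    ratios:       pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages:   advantage estimate for each sampled action
    action_probs: full action distribution of the new policy at each state
    All names and defaults here are hypothetical, chosen for this sketch.
    """
    n = len(ratios)
    surrogate = 0.0
    entropy = 0.0
    for r, adv, probs in zip(ratios, advantages, action_probs):
        # Clip the probability ratio to [1 - eps, 1 + eps]
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        # Pessimistic (minimum) choice between unclipped and clipped terms
        surrogate += min(r * adv, clipped * adv)
        entropy += categorical_entropy(probs)
    # Minimizing this loss maximizes the surrogate and, when
    # entropy_coef > 0, also rewards more random (higher-entropy) policies.
    return -(surrogate / n) - entropy_coef * (entropy / n)

# Same batch, with and without the entropy bonus
batch = dict(ratios=[1.0], advantages=[1.0], action_probs=[[0.5, 0.5]])
print(ppo_policy_loss(**batch, entropy_coef=0.0))
print(ppo_policy_loss(**batch, entropy_coef=0.01))
```

With `entropy_coef=0` the loss depends only on the surrogate term. With `entropy_coef=0.01` a more uniform action distribution lowers the loss, which is the extra incentive to explore discussed above.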

Repository files:

CartPole.ipynb
GoExploreMontezumaAnalysis.ipynb
Montezuma Test.ipynb
MoutainCar.ipynb
README.md
montezuma_ppo_no_expl.py
ppo_0.01_test_results.csv
ppo_0_test_results.csv
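
The two result files, ppo_0.01_test_results.csv and ppo_0_test_results.csv, could be compared along the following lines. The layout of these files is not documented here, so this sketch assumes one row per test episode with the cumulative reward in the last column and an optional header row. The helper name `summarize_rewards` and the toy data are hypothetical.

```python
import csv
import io

def summarize_rewards(lines):
    """Return (episode count, mean reward, share of episodes with reward > 0).

    `lines` is any iterable of CSV text lines, such as an open file.
    Assumption: the cumulative reward is the last column of each row.
    """
    rewards = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        try:
            rewards.append(float(row[-1]))
        except ValueError:
            continue  # skip a header or any other non-numeric row
    if not rewards:
        return 0, 0.0, 0.0
    mean = sum(rewards) / len(rewards)
    rewarded = sum(1 for r in rewards if r > 0) / len(rewards)
    return len(rewards), mean, rewarded

# Toy data for illustration only, not taken from the repository
demo = io.StringIO("episode,reward\n0,0\n1,100\n2,0\n3,300\n")
print(summarize_rewards(demo))
```

For the real files, the same function can be called on a file opened with `open("ppo_0_test_results.csv", newline="")`, provided the layout assumption holds.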
