Miscellaneous code associated with exercises in Sutton's 'Reinforcement Learning'. A suitable
conda environment is provided in environment.yml.
Hadn't started this repo at this point
Go back and implement at least a bandit framework
Implemented an MDP framework in a minimal-dependencies manner, with reusable abstractions for probability
distributions. The only dependencies are numpy and matplotlib (the latter just to generate chart
results for the exercises).
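As a rough illustration of what a reusable distribution abstraction for an MDP might look like, here is a minimal sketch using only numpy. The names `Categorical`, `expectation`, `sample`, and `bellman_backup` are hypothetical and chosen for this example; they are not taken from the repo's code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

import numpy as np


@dataclass(frozen=True)
class Categorical:
    """Hypothetical finite distribution: outcomes paired with probabilities."""
    outcomes: Tuple[Any, ...]
    probs: Tuple[float, ...]

    def __post_init__(self):
        if len(self.outcomes) != len(self.probs):
            raise ValueError("outcomes and probs must have the same length")
        if not np.isclose(sum(self.probs), 1.0):
            raise ValueError("probabilities must sum to 1")

    def expectation(self, f: Callable[[Any], float]) -> float:
        """Exact expected value of f over the outcomes."""
        return float(sum(p * f(x) for x, p in zip(self.outcomes, self.probs)))

    def sample(self, rng: np.random.Generator) -> Any:
        """Draw one outcome using the supplied numpy random generator."""
        idx = rng.choice(len(self.outcomes), p=np.asarray(self.probs))
        return self.outcomes[idx]


def bellman_backup(dist: Categorical, values: dict, gamma: float) -> float:
    """Expected one-step return, where each outcome is (next_state, reward)."""
    return dist.expectation(lambda sr: sr[1] + gamma * values[sr[0]])


# Assumed toy transition: from some state-action pair, go to "s1" with
# reward 1 or to "s0" with reward 0, each with probability 0.5.
transition = Categorical(outcomes=(("s1", 1.0), ("s0", 0.0)), probs=(0.5, 0.5))
state_values = {"s0": 0.0, "s1": 2.0}
backup = bellman_backup(transition, state_values, gamma=0.9)
```

Keeping the expectation on the distribution object means the same backup function can serve policy evaluation and value iteration, while `sample` supports simulation.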
Should try porting to a probabilistic programming framework such as Pyro
Added code for (4.5) and (4.9) using the MDP framework. The results are interesting with regard to the stability of the policy where multiple optimal (or near-optimal?) policies exist.
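One possible reason for this kind of instability (an assumption, not something confirmed by the results above) is that a plain argmax picks a single action even when several actions have equal or nearly equal values, so the greedy policy can keep switching between them. The sketch below shows a tolerance-based stability check; the function names and the `tol` parameter are hypothetical and not part of the repo.

```python
import numpy as np


def greedy_action_set(q_values, tol=1e-9):
    """Indices of all actions whose value is within tol of the maximum."""
    q = np.asarray(q_values, dtype=float)
    return set(np.flatnonzero(q >= q.max() - tol).tolist())


def policy_is_stable(policy, q_table, tol=1e-9):
    """True if every state's current action is among its near-greedy actions."""
    return all(
        policy[s] in greedy_action_set(q_table[s], tol)
        for s in range(len(policy))
    )


# Assumed toy action values: state 0 has two near-tied best actions,
# state 1 has an exact tie between actions 1 and 2.
q_table = [[1.0, 1.0 + 1e-12, 0.5], [0.0, 2.0, 2.0]]
policy = [0, 2]

# A strict argmax comparison reports a policy change in both states.
naive_stable = all(policy[s] == int(np.argmax(q_table[s])) for s in range(2))
# The tolerance-based check treats tied actions as equally acceptable.
tolerant_stable = policy_is_stable(policy, q_table)
```

With a check like this, policy iteration can stop once the current policy is greedy up to `tol`, rather than flipping between equivalent actions on successive sweeps.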