@Kismuz I've been doing quite a bit of research and experimentation with BTGym lately. You have built a really impressive framework that is very rich and interesting to work with. There are so many possibilities and research directions already present in the library to explore and play with.
The system has a lot of moving parts, so I started looking for ways to make my experimentation and exploration of different architectures and hyperparameters easier.
DeepMind has proposed a framework to efficiently explore and exploit the hyperparameter space (mentioned in #82 under Population Based Training).
Ray's 'Tune' library already implements this framework and is ready to be integrated into RL projects. The general integration steps are as follows (a minimal sketch is shown after the list):
Run the Tune service
Give Tune access to the hyperparameters you want it to control (example: learn_rate = tune_config['lr'])
Register your training class/function with Tune
Expose evaluation metrics so the hyperparameter search has results to optimize on (loss, accuracy, ...)
Configure the Tune framework parameters
Start training
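A minimal sketch of these steps, assuming the old function-style Tune trainable (`run_one_training_iteration` is a placeholder for the real training step, and the exact API depends on the Ray version):

```python
import ray
from ray import tune


def train_btgym(config, reporter):
    # Step 2: Tune controls the hyperparameters through 'config'.
    learn_rate = config['lr']
    for step in range(100):
        # Placeholder for one real training iteration of the model.
        loss = run_one_training_iteration(learn_rate)
        # Step 4: report metrics back so Tune has something to optimize on.
        reporter(training_iteration=step, mean_loss=loss)


ray.init()                                            # step 1: start the Ray/Tune service
tune.register_trainable('btgym_train', train_btgym)   # step 3: register the training function
tune.run_experiments({                                # steps 5-6: configure and start training
    'btgym_tune_demo': {
        'run': 'btgym_train',
        'config': {'lr': tune.grid_search([1e-3, 1e-4])},
    },
})
```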
For small projects this integration is straightforward; for BTGym, having examined the code, it seems to be more complex.
Ideally, we could have a tune_config in the launcher that controls hyperparameters for the other configs (env_config, policy_config, trainer_config, cluster_config), so we can dynamically choose which hyperparameters stay fixed and which ones the system explores.
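For instance, such a tune_config could look something like the sketch below; the nesting and the parameter names are only an assumption about how Tune-controlled values might be mapped onto the existing BTGym configs, not existing code:

```python
from ray import tune

# Hypothetical tune_config: entries Tune is allowed to vary, grouped by the
# target BTGym config; anything not listed here stays fixed by the launcher.
tune_config = {
    'trainer_config': {
        'opt_learn_rate': tune.grid_search([1e-4, 1e-5]),  # explored by Tune
        'model_gamma': 0.99,                                # kept fixed
    },
    'policy_config': {
        'lstm_layers': tune.grid_search([1, 2]),            # illustrative parameter name
    },
}
```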
A section to control the Tune Trial Scheduler parameters (see the sketch after this list):
which scheduler to use (Population Based Training or another one offered by Tune)
how hyperparameters are updated (random, Bayesian, ... and from what distribution/range)
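A possible shape for that section, using Tune's Population Based Training scheduler; the metric name and mutation ranges are illustrative, and depending on the Ray version the scheduler expects either reward_attr or metric/mode:

```python
import random

from ray.tune.schedulers import PopulationBasedTraining

pbt_scheduler = PopulationBasedTraining(
    time_attr='training_iteration',
    reward_attr='episode_reward_mean',    # metric the scheduler optimizes (illustrative)
    perturbation_interval=10,             # how often to exploit/explore, in iterations
    hyperparam_mutations={
        # distributions / value lists sampled from when perturbing a trial
        'lr': lambda: random.uniform(1e-5, 1e-3),
        'entropy_beta': [0.01, 0.05, 0.1],
    },
)
```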
And finally, we need a way for launcher.run() to properly interact with Tune.run_experiments(...).
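One possible shape of that glue, purely as a sketch: make_launcher below is a hypothetical factory that builds a configured BTGym Launcher from the sampled hyperparameters, and the main open question is how to surface intermediate metrics to Tune while launcher.run() is blocking:

```python
from ray import tune


def make_tune_trainable(make_launcher):
    """Wrap a BTGym launcher inside a Tune trainable.

    'make_launcher' is a hypothetical callable that maps the sampled
    hyperparameters onto env/policy/trainer/cluster configs and returns
    a ready BTGym Launcher instance.
    """
    def trainable(config, reporter):
        launcher = make_launcher(config)
        launcher.run()           # BTGym's existing entry point (blocking)
        # Intermediate results would need to be read back (e.g. from the
        # launcher's summaries) and passed to reporter(...) for PBT to work.
        reporter(done=True)
    return trainable


# tune.register_trainable('btgym_tune', make_tune_trainable(my_launcher_factory))
```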
@JacobHanouna, smart hyperparameter search is an excellent idea but extremely computationally expensive in the DRL case; note the chilling comment in the example code you pointed at :) :
Note that this requires a cluster with at least 8 GPUs in order for all trials
to run concurrently,
....
I'll take a closer look to see what can be done here, but no earlier than in 3 - 5 days / a bit busy developing a combined model-based/model-free approach which looks very promising.
@Kismuz actually PBT shouldn't be so computationally expensive. This was part of the DeepMind team's objectives when they created this search optimization framework.
The idea was to find a balance between random search, which is trivial to run in parallel but needs many trials to find good hyperparameters, and sequential methods such as Bayesian optimization, which choose each new hyperparameter setting more intelligently but require many training runs one after another.
PBT uses random hyperparameter selection, but in a smart way: it compares the performance of the current population of models and replaces poorly performing ones with copies of the best performers whose hyperparameters are randomly perturbed, so new candidates stay close to the good ones. A smart, random, evolutionary optimizer.
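In pseudocode, the exploit/explore step looks roughly like this (illustrative only, assuming numeric hyperparameters and simple truncation selection):

```python
import copy
import random


def pbt_step(population):
    """One exploit/explore step over a population of training workers.

    Each worker is assumed to expose .score, .weights and .hyperparams.
    """
    population.sort(key=lambda worker: worker.score)
    cutoff = max(1, len(population) // 4)
    for weak, strong in zip(population[:cutoff], population[-cutoff:]):
        # Exploit: copy weights and hyperparameters from a top performer.
        weak.weights = copy.deepcopy(strong.weights)
        weak.hyperparams = copy.deepcopy(strong.hyperparams)
        # Explore: perturb the copied hyperparameters to stay near good values.
        for name in weak.hyperparams:
            weak.hyperparams[name] *= random.choice([0.8, 1.2])
```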
An example from Ray Tune can be found here.
@Kismuz, if you think this is worth and possible to implement, and we can come up with a good design, I can try to implement it.