[Enhancement] Multiple model iterations per Optuna trial and mean performance objective #204
Regarding the duplicate tag (you are probably referring to issue #151?), I can definitely see your point, but why not implement it and let the user decide via a configurable training script argument? If implemented correctly, I also don't see why this would hinder the use of pruners: they could work based on the mean/median objective performance of the current and past trials.
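A minimal sketch of how a median-style pruning rule could work on aggregated scores, assuming each trial reports one mean/median value (over its several models) per evaluation step. `should_prune` and its `n_startup_trials` parameter are hypothetical; this is a simplified rule loosely modeled on the idea behind Optuna's `MedianPruner`, not its actual implementation.

```python
from statistics import median
from typing import Dict, List


def should_prune(current_value: float,
                 past_trials: List[Dict[int, float]],
                 step: int,
                 n_startup_trials: int = 2) -> bool:
    """Hypothetical rule: prune if the current trial is below the median of past trials.

    `current_value` is the trial's aggregated score (e.g. the mean or median over
    several models) at `step`; each entry of `past_trials` maps step -> aggregated
    score reported by an earlier trial. Higher scores are better.
    """
    # Only compare against past trials that reported a value at this step
    previous = [trial[step] for trial in past_trials if step in trial]
    # Do not prune until enough history exists for the median to mean anything
    if len(previous) < n_startup_trials:
        return False
    return current_value < median(previous)


history = [
    {10_000: 120.0, 20_000: 180.0},  # aggregated scores of trial 0
    {10_000: 90.0, 20_000: 150.0},   # trial 1
    {10_000: 140.0},                 # trial 2 (stopped early)
]
print(should_prune(100.0, history, 10_000))  # True: median of [120, 90, 140] is 120
print(should_prune(200.0, history, 20_000))  # False: median of [180, 150] is 165
```

Because the rule only ever sees one aggregated number per step, pruning works the same way whether that number comes from one model or from several.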
Hello, sorry for the late reply, I was on holiday...
Yes.
Yes, and that comment:
I would be happy to have a draft PR ;) You should also know that this exists: #114
How do you prune a trial before the end of a run if your objective is the mean/median of several runs?
I agree with this.
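By training multiple models simultaneously. Something like:
```python
import numpy as np
import optuna
from stable_baselines3.common.evaluation import evaluate_policy
# ... (create `models`, `eval_env`, `n`, `split_size` and get the Optuna `trial`)
for split in range(n):
    mean_rewards = []
    for model in models:
        # Continue training each model for one more chunk of timesteps
        model.learn(split_size, reset_num_timesteps=False)
        mean_reward, _ = evaluate_policy(model, eval_env)
        mean_rewards.append(mean_reward)
    # Report the median over all models that share these hyperparameters
    median_score = np.median(mean_rewards)
    trial.report(median_score, (split + 1) * split_size)
    if trial.should_prune():
        raise optuna.TrialPruned()
```
I wonder if you can run, say, 50 or so models simultaneously without running into memory problems.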
Please do =)
I was afraid of that answer... Yes, it does work, but not for image-based environments, and it requires a beefy machine anyway (for instance, for DQN on Atari, a single model may require 40GB of RAM).
I would run at most 3-5 models simultaneously, unless the env is very simple and the network small.
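To try out the lockstep scheme from the earlier snippet with only a handful of models per trial, here is a toy, dependency-free simulation. `ToyModel`, `run_trial` and the fixed `threshold` are hypothetical stand-ins: the toy model's score simply grows with training plus seeded Gaussian noise, and the constant threshold replaces a real Optuna pruner.

```python
import random
from statistics import median
from typing import List


class ToyModel:
    """Hypothetical stand-in for an RL model: its score grows noisily with training."""

    def __init__(self, learning_rate: float, seed: int) -> None:
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)  # each model gets its own seed
        self.timesteps = 0

    def learn(self, n_steps: int) -> None:
        self.timesteps += n_steps

    def evaluate(self) -> float:
        # Noisy score: the same hyperparameters give different results per seed
        return self.learning_rate * self.timesteps + self.rng.gauss(0.0, 10.0)


def run_trial(learning_rate: float, n_models: int, n_splits: int,
              split_size: int, threshold: float) -> List[float]:
    """Train `n_models` models in lockstep and record the median after each split.

    Stops early ("prunes") as soon as the median falls below `threshold`.
    Returns the list of recorded medians.
    """
    models = [ToyModel(learning_rate, seed) for seed in range(n_models)]
    reported: List[float] = []
    for _ in range(n_splits):
        scores = []
        for model in models:
            model.learn(split_size)
            scores.append(model.evaluate())
        reported.append(median(scores))
        if reported[-1] < threshold:
            break  # a real setup would raise optuna.TrialPruned() here
    return reported


print(run_trial(learning_rate=1.0, n_models=3, n_splits=5, split_size=1000, threshold=50.0))
```

With `learning_rate=1.0` this should print five increasing medians close to 1000, 2000, ..., 5000, while a trial with `learning_rate=0.0` stops after the first split. Memory grows linearly with `n_models`, since every model stays alive for the whole trial.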
Let's open a draft PR and continue the discussion there.
I currently have the problem that the results produced by the Optuna optimization are often not really optimal, due to the stochastic nature of RL training. For example, training 3 agents with the same set of hyperparameters can result in 3 completely different learning curves (at least for the environment I'm training on).
Might it make sense to implement the optimization code in a way such that, for each trial, multiple agents are trained and their mean or median performance is reported to Optuna instead?
Inside utils/exp_manager.py, in hyperparameter_optimization (line 713), I saw your comment "# TODO: eval each hyperparams several times to account for noisy evaluation". Is that exactly what you mean there? I already had a look at the code and thought a little about how one might do that. If somebody is interested, I could implement it and open a pull request!
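A minimal sketch of the proposal, assuming a configurable number of runs per trial and a user-selectable aggregation. `train_and_evaluate`, `objective_value`, `n_eval_runs` and `aggregation` are hypothetical names, not the zoo's actual API, and the stub returns a seeded noisy number in place of real training.

```python
import random
from statistics import mean, median
from typing import Callable, Dict, Sequence

# Map a user-facing option (e.g. a training script argument) to an aggregation function
AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "median": median,
}


def train_and_evaluate(params: Dict[str, float], seed: int) -> float:
    """Hypothetical placeholder for training one agent and returning its eval reward."""
    rng = random.Random(seed)
    return params["learning_rate"] * 1000 + rng.gauss(0.0, 5.0)


def objective_value(params: Dict[str, float], n_eval_runs: int = 3,
                    aggregation: str = "median", base_seed: int = 0) -> float:
    """Train `n_eval_runs` agents with the same hyperparameters but different
    seeds, and return a single aggregated score for the optimizer."""
    if n_eval_runs < 1:
        raise ValueError("n_eval_runs must be >= 1")
    scores = [train_and_evaluate(params, base_seed + i) for i in range(n_eval_runs)]
    return AGGREGATORS[aggregation](scores)


print(objective_value({"learning_rate": 0.1}, n_eval_runs=3, aggregation="median"))
```

Inside an Optuna objective, one would return `objective_value(sampled_params, ...)` as the trial's result. The median is less sensitive than the mean to a single run that diverges, which is one reason to leave the choice to the user.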