Hi,

when I run (DreamerV3) experiments, especially ones with a replay_ratio > 1.0, training takes quite a long time. During these runs my hardware resources are barely used (e.g. only 1-2 CPU cores at around 50% each), so there is clearly more computational power available.

Is there anything I can do to make SheepRL use more of the available hardware? I am already running multiple environments in parallel. I also tried increasing num_threads, but this seems to have no effect.

Here is a simple example training command:
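(The command below is a representative sketch rather than the exact invocation: it assumes the standard SheepRL Hydra-style CLI and CartPole-v1, suggested by the ~500 reward ceiling; the override keys may differ between versions.)

```bash
# Hypothetical reconstruction -- assumes SheepRL's Hydra CLI, CartPole-v1,
# and the algo.replay_ratio / env.num_envs override keys.
python sheeprl.py exp=dreamer_v3 env=gym env.id=CartPole-v1 \
  algo.replay_ratio=1.0 env.num_envs=4
```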
Training this for up to around 8000 steps, where it reached the ~500 reward threshold, took around 3 hours. The log data lists a Time/sps_train of ~0.046 (which I assume is environment steps per second).
Thanks in advance for this great library!
Hi @defrag-bambino, the slowdown when raising the replay ratio is expected: the higher the replay ratio, the more gradient steps the agent computes per policy step. Since the training steps happen mainly on the GPU, I would look at the GPU stats rather than the CPU (which is used mainly for saving experiences in the buffer and for running a fairly simple env in this case).
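For intuition, the cost scales roughly linearly (assuming replay_ratio is defined as gradient steps per policy step, as in the Dreamer line of work):

```python
# Back-of-the-envelope: wall-clock training cost tracks the number of
# gradient steps, which grows linearly with the replay ratio.
policy_steps = 8000        # env interactions, as in the run reported above
replay_ratio = 2.0         # assumed definition: gradient steps per policy step
gradient_steps = int(policy_steps * replay_ratio)
print(gradient_steps)      # 16000 -> double the replay ratio, double the updates
```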
Furthermore, I suggest not using fabric.strategy=ddp when running on a single device.
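For example, a single-GPU run could look like this (a sketch; the exact keys follow Lightning Fabric's accelerator/devices/strategy options as exposed through SheepRL's Hydra config):

```bash
# Hypothetical single-device invocation -- leaves the strategy at Fabric's
# default ("auto") instead of forcing DDP, which only adds overhead on one device.
python sheeprl.py exp=dreamer_v3 env=gym env.id=CartPole-v1 \
  fabric.accelerator=gpu fabric.devices=1 fabric.strategy=auto
```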
Another suggestion to speed up training is to use this branch, where we have introduced compilation through torch.compile, which should speed up your training on the right GPU.
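For context, torch.compile JIT-compiles a module's forward pass; a minimal standalone illustration (not SheepRL's actual integration) looks like this:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))

# torch.compile returns an optimized wrapper around the module: the first
# call triggers compilation, subsequent calls reuse the compiled graph.
# Speedups are largest on recent GPUs (e.g. Ampere or newer).
compiled = torch.compile(model)

x = torch.randn(8, 64)
y = compiled(x)
```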
If you try out that branch, could you kindly report your findings in this issue?