This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Thank you @feevos. Do you think Horovod should set this variable automatically during startup via horovodrun?
Dear all,
During distributed training with Horovod on our local HPC facilities (~100 nodes with 4xP100 GPUs each), we observed that the GPU assigned to local rank 0 had a different memory footprint from the other GPUs on the same node. This is documented in Horovod issue 2625.
This solution fixed it for me; I am leaving it here as a reference for other users:
Regards,
Foivos