This repository has been archived by the owner on Nov 25, 2024. It is now read-only.
Add horovodrun launch agent for Wholegraph (#200)
Many of our users run the [Kubeflow training operator](https://github.com/kubeflow/training-operator) and are also interested in using Wholegraph. Many of our MPIJob users still use [HorovodRun](https://github.com/horovod/horovod/tree/master) as the startup command. We therefore want to add HorovodRun as a Wholegraph launch agent so that these users can run Wholegraph on top of Kubeflow.

The new function is similar to the existing MPI launch agent: the horovod library is imported only on demand. Because of an issue with `horovod.torch` (see horovod/horovod#4009), `horovod.tensorflow` is used solely for the Horovod initialization call. After Horovod is initialized, each rank continues to run ordinary PyTorch code, just as it does with mpi4py.

Fixes #201

Authors:
- Tommy Li (https://github.com/Tomcli)

Approvers:
- https://github.com/linhu-nv
- Brad Rees (https://github.com/BradReesWork)

URL: #200