-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Add gather/scatter support 1D tensor #74
[Feature] Add gather/scatter support 1D tensor #74
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good to me. Thx!
/ok to test |
@chang-l looks like you have a check style failure here too |
@alexbarghi-nv I think it should be fixed now. Can you pls kick off the CI again? |
/ok to test |
/merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
/merge |
1 similar comment
/merge |
Migrated from: rapidsai/wholegraph#229
This PR is to add gather/scatter support 1D tensor on python level, as WholeGraph should support basic indexing operations for both 1D (array) and 2D (matrix) wholememory tensors. Without this PR, if with 1D wholememory tensor, gather/scatter op does not work, e.g., https://github.com/rapidsai/wholegraph/blob/0efba33835d6e4e104b5d7101a91e0ea55a6ca53/python/pylibwholegraph/pylibwholegraph/torch/tensor.py#L89
To test, run
Remaining issue:
On my local test with single GPU, the test can pass.
For multiGPU setup, gather op works fine, but 1D scatter seems not working as it would crash at:
https://github.com/rapidsai/wholegraph/blob/2e963b98aa6027c300d60e839010d3dd8ca422eb/python/pylibwholegraph/pylibwholegraph/tests/wholegraph_torch/ops/test_wholegraph_gather_scatter.py#L108 with incorrect scatter outputs:
Indices where allclose fails: tensor([0., 0., 0., ..., 0., 0., 0.]) tensor([ 1435., 1439., 1443., ..., 257703., 257707., 257711.])
This would work if this bugfix is merged: #73
cc. @linhu-nv