-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Fix incorrect gpu ids if placement group bundle index specified #44385
[Core] Fix incorrect gpu ids if placement group bundle index specified #44385
Conversation
Signed-off-by: wuxibin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Oh actually I think this PR will cause issue if user uses both bundle index and non-bundle index in the same PG. For example:
It's possible that both actors are assigned to GPU 0. This is not an easy fix unfortunately given how placement group is implemented now. |
@jjyao Yes, I can reproduce this problem, I will take a look how to fix it. |
@wuxibin89 I think this is not easy to fix right now. This problem will be fixed automatically once we re-implemented PG using labels and virtual clusters (ray-project/enhancements#49) |
@jjyao Hi, is there any plan that re-implementing PG with labels and virtual clusters? It seems like there's a lot work to do and may not available in short term. We have a project(will open soon) that heavily relay on this fix to colocate multiple actor groups on the same PG. So can we fix this first and give user a warning not to use bundle index and non-bundle index in the same PG? cc @anyscalesam |
@wuxibin89 grabbed some time to chat about this further as full fledged VC implementation will be difficult and perhaps overkill. I think what would make more sense is for us to sit down for 15m and understand your use case well and than brainstorm on a solution that perhaps you can contribute back to. I setup time for later this week. cc @jjyao |
Why are these changes needed?
This PR fix incorrect
CUDA_VISIBLE_DEVICES
whenplacement_group_bundle_index
is specified.Related issue number
Closes #29811
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.cc @jjyao