[Ray Core] - Add ability to specify gpu memory resources in addition to gpu units #37574
Comments
cc @ericl Do you have any thoughts on this issue?
We've been discussing this for LLM serving use cases, and this would solve some, but not all, of the problems of scheduling large models. It's not a bad idea to add a logical "gpu_memory" resource automatically though, similar to how we add the "memory" logical resource. This could be done in the same code that adds the "accelerator_type" resource.
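To make the "automatic logical resource" idea concrete, here is a minimal pure-Python sketch of what a node might advertise; the function name `node_resources` and the unit choices are hypothetical, and this is not how Ray's internals are actually written:

```python
def node_resources(num_gpus, gpu_memory_per_gpu_gib, memory_gib):
    """Sketch: alongside the logical "memory" resource, a node could
    also advertise an aggregate "gpu_memory" resource derived from its
    detected GPUs (illustrative only, not Ray's real bookkeeping)."""
    return {
        "GPU": num_gpus,
        "memory": memory_gib,
        # Derived automatically, the way "accelerator_type" is added.
        "gpu_memory": num_gpus * gpu_memory_per_gpu_gib,
    }

# A hypothetical node with four 40 GiB GPUs and 512 GiB of host RAM.
print(node_resources(4, 40, 512))
# → {'GPU': 4, 'memory': 512, 'gpu_memory': 160}
```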
@achordia20 could you add more detail here? What kind of workloads are you able to port between GPU types?
Asking because in our experience there's a lot of per-GPU-type configuration that needs to change 😄 . Are there workloads that can easily move between GPU types?
I also would like to be able to schedule on GPU memory. I think it provides a better utilization strategy for Ray users than an admin segmenting off certain GPUs for one task versus another. My team sees all types of configurations, from very advanced GPU rigs to simple ones. GPU memory seems to be the most logical basis for requesting and allocating resources.
Maybe one way we could support this is by translating gpu_memory into GPU requests of a specific accelerator type label(s) under the hood (i.e., it's syntactic sugar for manually specifying accelerator types). That way we wouldn't have to make changes to the scheduler internals. |
It's a bit strange to specify a percentage of a GPU that's required, since you don't know in advance the specs of the GPU the task will be scheduled on. |
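The "syntactic sugar" translation could look roughly like this pure-Python sketch; `GPU_MEMORY_BY_TYPE` and `translate_gpu_memory` are hypothetical names, not Ray APIs, and the memory table is a hard-coded stand-in for real node specs:

```python
# Hypothetical per-accelerator memory table (GiB). In a real
# implementation these values would come from the cluster's node specs.
GPU_MEMORY_BY_TYPE = {
    "V100": 16,
    "A100": 40,
}

def translate_gpu_memory(gpu_memory_gib, accelerator_type):
    """Translate a gpu_memory request into a fractional num_gpus
    request for a specific accelerator type."""
    total = GPU_MEMORY_BY_TYPE[accelerator_type]
    if gpu_memory_gib > total:
        raise ValueError(f"{accelerator_type} only has {total} GiB")
    return gpu_memory_gib / total

# The same 10 GiB request maps to different GPU fractions per type,
# which also illustrates the percentage ambiguity noted above.
print(translate_gpu_memory(10, "V100"))  # 0.625
print(translate_gpu_memory(10, "A100"))  # 0.25
```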
Hi, a quick update on this: we have a REP and prototype ready for review. Please try them out and leave feedback!
@thatcort @martystack @achordia20 have you had a chance to take a look at the REP and try the prototype?
I added a comment on the REP doc. Overall it looks good! It would be a nice improvement to be able to specify that a task needs multiple GPUs with a certain amount of memory on each. |
Description
Rather than allowing only a num_gpus resource, it would be great to also have the ability to specify num_gpu_resources as a logical requirement.
Use case
This would allow us to port workloads across different GPU types easily. Right now, we have to adjust GPU resource requests for each GPU type, since each has a different amount of GPU memory available.
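The portability argument can be sketched with a toy scheduler: if GPU memory is the request unit, the same job spec fits on whichever node has room, regardless of GPU model. The node names, specs, and `schedule` function below are made-up illustrations, not Ray's actual scheduler:

```python
def schedule(request_gib, nodes):
    """Return the first node whose free GPU memory fits the request,
    reserving that memory; return None if no node fits."""
    for name, res in nodes.items():
        if res["gpu_memory_gib"] >= request_gib:
            res["gpu_memory_gib"] -= request_gib  # reserve it
            return name
    return None  # infeasible

# Hypothetical heterogeneous cluster: one 16 GiB GPU, one 40 GiB GPU.
cluster = {
    "node-v100": {"gpu_memory_gib": 16},
    "node-a100": {"gpu_memory_gib": 40},
}

# A 30 GiB request only fits the larger GPU; no per-GPU-type
# num_gpus tuning is needed in the job spec itself.
print(schedule(30, cluster))  # node-a100
```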