[inf] Add config var to enable keeping module on host #6846

oelayan7 · 2024-12-10T12:13:19Z

The `keep_module_on_host` config var controls whether checkpoint weights loaded into the model parameters are moved to the device or kept on the host.
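For illustration, here is a minimal sketch of how the flag might look in an inference config. Only the `keep_module_on_host` key comes from this PR. The rest of the dictionary, the variable name `inference_config`, and the flag's placement at the top level of the config passed to `deepspeed.init_inference` are assumptions.

```python
# Hypothetical inference config. Only "keep_module_on_host" is named in this PR;
# the surrounding keys and the top-level placement of the flag are assumptions.
inference_config = {
    "tensor_parallel": {"tp_size": 2},
    # True: loaded checkpoint weights stay on the host.
    # False (the default in the diff): weights are moved to the device.
    "keep_module_on_host": True,
}

# Assumed usage, not executed here because it needs a model and DeepSpeed installed:
#   engine = deepspeed.init_inference(model, config=inference_config)
```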

tjruwase · 2024-12-11T01:28:56Z

deepspeed/module_inject/auto_tp.py

@@ -17,9 +17,11 @@
 from deepspeed.module_inject.tp_shard import get_shard_size, get_shard_size_list


-def move(tensor, device):
+def move(tensor, device, keep_module_on_host=False):


Wouldn't it be simpler to modify the callers to pass `device='cpu'` when `keep_module_on_host=True`?
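To make the two options in this thread concrete, here is a rough sketch that compares them. `FakeTensor`, `move_with_flag`, and `resolve_device` are hypothetical stand-ins, and the body of the real `move` in `auto_tp.py` is not reproduced. Only the signature `move(tensor, device, keep_module_on_host=False)` is taken from the diff.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; it only tracks a device string."""
    device: str = "meta"

    def to(self, device):
        # Return a copy on the requested device, loosely mimicking tensor.to().
        return FakeTensor(device=device)


def move_with_flag(tensor, device, keep_module_on_host=False):
    """Option A (signature from the diff): the helper itself decides the target."""
    target = "cpu" if keep_module_on_host else device
    return tensor.to(target)


def move(tensor, device):
    """Option B helper: unchanged signature, moves to whatever device it is given."""
    return tensor.to(device)


def resolve_device(accelerator_device, keep_module_on_host):
    """Option B (reviewer's suggestion): callers pick 'cpu' before calling move()."""
    return "cpu" if keep_module_on_host else accelerator_device


# Both options place the tensor on the same device.
on_host_a = move_with_flag(FakeTensor(), "cuda:0", keep_module_on_host=True)
on_host_b = move(FakeTensor(), resolve_device("cuda:0", keep_module_on_host=True))
on_dev_a = move_with_flag(FakeTensor(), "cuda:0")
on_dev_b = move(FakeTensor(), resolve_device("cuda:0", keep_module_on_host=False))
```

Option B leaves the helper's signature alone and makes the decision once at the call site. Option A hides the decision inside the helper, but every caller then has to pass the flag through.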

oelayan7 · 2024-12-31T12:02:43Z

@tjruwase Can you please re-review and retrigger the CI?

tests/unit/inference/test_inference.py

The `keep_module_on_host` config var controls whether checkpoint weights loaded into the model parameters are moved to the device or kept on the host.

Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: siqi <[email protected]>

[inf] Add config var to enable keeping module on host

1635065

The `keep_module_on_host` config var controls whether checkpoint weights loaded into the model parameters are moved to the device or kept on the host.

oelayan7 requested review from awan-10, tjruwase, loadams and tohtana as code owners December 10, 2024 12:13

tjruwase reviewed Dec 11, 2024

oelayan7 and others added 3 commits December 17, 2024 09:38

Merge branch 'master' into keep_module_on_host

e2f8932

Code review fixes

7d39f60

Merge branch 'master' into keep_module_on_host

45b74e4

oelayan7 requested a review from hwchen2017 as a code owner December 24, 2024 06:49

Merge branch 'master' into keep_module_on_host

b6004da

Merge branch 'master' into keep_module_on_host

cf92e87

tjruwase reviewed Jan 6, 2025

tests/unit/inference/test_inference.py (resolved)

oelayan7 and others added 2 commits January 7, 2025 11:09

Merge branch 'master' into keep_module_on_host

934b6a9

code review fix part 2

ff5fc8b

oelayan7 requested a review from tjruwase January 7, 2025 13:51

loadams removed the request for review from awan-10 January 7, 2025 16:08

Merge branch 'master' into keep_module_on_host

aecebd3

tjruwase approved these changes Jan 15, 2025

loadams enabled auto-merge January 15, 2025 19:15

loadams added this pull request to the merge queue Jan 15, 2025

Merged via the queue into deepspeedai:master with commit fae714d Jan 15, 2025
12 checks passed

oelayan7 deleted the keep_module_on_host branch January 16, 2025 06:29
