
[COMPILE] workflow for deepspeed + torch.compile #6570

Merged (3 commits) Sep 27, 2024

Conversation

@YizhouZ (Contributor) commented Sep 25, 2024

We use a simple model + DeepSpeed ZeRO-3 + torch.compile and count the number of graph breaks to demonstrate the current status of combining DeepSpeed with torch.compile.

@tjruwase tjruwase requested review from tohtana and removed request for loadams September 25, 2024 12:37

jobs:
compile-tests:
runs-on: [self-hosted, intel, xpu]
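For context, a minimal GitHub Actions sketch of how such a compile-test job might be wired together. The step names and the test-script path are illustrative assumptions, not the PR's actual workflow file:

```yaml
name: xpu-compile

on:
  pull_request:
    branches: [master]

jobs:
  compile-tests:
    runs-on: [self-hosted, intel, xpu]
    steps:
      # Hypothetical steps; the real workflow file lives under .github/workflows
      - uses: actions/checkout@v4
      - name: Run compile test and report graph breaks
        run: python run_compile_test.py  # assumed script name
```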
Collaborator


@YizhouZ - does it make more sense to add this to its own workflow, or to add it to the existing xpu workflow?

Contributor Author


You mean adding a new worker for this workflow?

dynamo_stats.subtract(start_stats)

if comm.get_rank() == 0:
    print(dynamo_stats)
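The snippet above subtracts the stats captured before the run from the stats captured after it, so only the graph breaks introduced by the test itself remain, and prints them on rank 0. TorchDynamo's counters behave like `collections.Counter` objects, so the diffing pattern can be sketched with the stdlib alone. The counter keys and numbers below are hypothetical, purely to illustrate the `subtract` pattern:

```python
from collections import Counter

# Hypothetical stats captured before and after the compiled run.
start_stats = Counter({"graph_break": 2, "calls_captured": 10})
end_stats = Counter({"graph_break": 5, "calls_captured": 40})

# Subtract in place, as the workflow script does, so only the breaks
# introduced by the run under test remain in the counter.
dynamo_stats = end_stats.copy()
dynamo_stats.subtract(start_stats)

print(dynamo_stats["graph_break"])  # → 3
```

`Counter.subtract` mutates in place and, unlike the `-` operator, keeps zero and negative counts, which is useful when asserting that a change introduced no new graph breaks.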
Collaborator


@YizhouZ - could you run the pre-commit formatter on the PR? That will resolve the formatting/Python issues.

Contributor Author


Sure!

@delock (Collaborator) commented Sep 26, 2024

I also have a question for discussion. Initially I thought of referencing the test result of this workflow (the number of graph breaks) on a page, a bit like the workflow badges on the DeepSpeed README page. But I didn't find a proper tool that lets us do that. Is there a pointer or direction we should look at? Thanks!

@delock (Collaborator) commented Sep 26, 2024

The xpu_compile workflow now passes, and the graph break count shows in the test summary:
https://github.com/microsoft/DeepSpeed/actions/runs/11044896580?pr=6570

@tohtana (Contributor) left a comment:

Thank you @YizhouZ, looks good to me. It is great to have utilities like this in our workflows.

@tohtana tohtana enabled auto-merge September 27, 2024 05:28
@tohtana tohtana added this pull request to the merge queue Sep 27, 2024
Merged via the queue into deepspeedai:master with commit d4e1895 Sep 27, 2024
12 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Oct 14, 2024
This PR extends #6570 by showing a breakdown of graph breaks, so we can see how graph breaks are distributed among different reasons. An example of the graph break output can be seen in the following workflow run:
https://github.com/microsoft/DeepSpeed/actions/runs/11199157962
github-merge-queue bot pushed a commit that referenced this pull request Oct 21, 2024
With intel-extension-for-pytorch=2.3.110 released last month, the max1100 CI workflow can be updated too. Software versions are aligned with #6570.

An increased CI test scope for torch/ipex 2.3 will come in a later PR.

This workflow passed on the self-hosted runner in my cloned repo.