
[COMPILE] workflow for deepspeed + torch.compile #6570

Merged (3 commits) Sep 27, 2024

Conversation

@YizhouZ (Contributor) commented Sep 25, 2024

We use a simple model + DeepSpeed ZeRO-3 + torch.compile and count the number of graph breaks to demonstrate the current status of combining DeepSpeed with torch.compile.

@tjruwase tjruwase requested review from tohtana and removed request for loadams September 25, 2024 12:37

jobs:
compile-tests:
runs-on: [self-hosted, intel, xpu]
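For context, a minimal GitHub Actions sketch of how such a compile-test job might be wired together. The step names and the test-script path are illustrative assumptions, not the PR's actual workflow file:

```yaml
name: xpu-compile

on:
  pull_request:
    branches: [master]

jobs:
  compile-tests:
    runs-on: [self-hosted, intel, xpu]
    steps:
      # Hypothetical steps; the real workflow file lives under .github/workflows
      - uses: actions/checkout@v4
      - name: Run compile test and report graph breaks
        run: python run_compile_test.py  # assumed script name
```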
Collaborator


@YizhouZ - does it make more sense to add this to its own workflow, or to add it to the existing xpu workflow?

Contributor Author


You mean adding a new worker for this workflow?

dynamo_stats.subtract(start_stats)

if comm.get_rank() == 0:
    print(dynamo_stats)
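The snippet above subtracts the stats captured before the run from the stats captured after it, so only the graph breaks introduced by the test itself remain, and prints them on rank 0. TorchDynamo's counters behave like `collections.Counter` objects, so the diffing pattern can be sketched with the stdlib alone. The counter keys and numbers below are hypothetical, purely to illustrate the `subtract` pattern:

```python
from collections import Counter

# Hypothetical stats captured before and after the compiled run.
start_stats = Counter({"graph_break": 2, "calls_captured": 10})
end_stats = Counter({"graph_break": 5, "calls_captured": 40})

# Subtract in place, as the workflow script does, so only the breaks
# introduced by the run under test remain in the counter.
dynamo_stats = end_stats.copy()
dynamo_stats.subtract(start_stats)

print(dynamo_stats["graph_break"])  # → 3
```

`Counter.subtract` mutates in place and, unlike the `-` operator, keeps zero and negative counts, which is useful when asserting that a change introduced no new graph breaks.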
Collaborator


@YizhouZ - could you run the pre-commit formatter on the PR? That will resolve the formatting/Python issues.

Contributor Author


Sure!

@delock (Collaborator) commented Sep 26, 2024

I also have a question for discussion. Initially I thought of referencing the test result of this workflow (the number of graph breaks) on a page, a bit like the workflow badges on the DeepSpeed README page. But I didn't find a proper tool that lets us do that. Is there a pointer or direction we should look at? Thanks!

@delock (Collaborator) commented Sep 26, 2024

The xpu_compile workflow now passes, and the graph break count shows in the test summary:
https://github.com/microsoft/DeepSpeed/actions/runs/11044896580?pr=6570

@tohtana (Contributor) left a comment:

Thank you @YizhouZ, looks good to me. It is great to have utilities like this in our workflows.

@tohtana tohtana enabled auto-merge September 27, 2024 05:28
@tohtana tohtana added this pull request to the merge queue Sep 27, 2024
Merged via the queue into deepspeedai:master with commit d4e1895 Sep 27, 2024
12 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Oct 14, 2024
This PR extends #6570 by showing a breakdown of graph breaks, so we can see how graph breaks are distributed among different reasons. An example of the graph break output can be seen in the following workflow run:
https://github.com/microsoft/DeepSpeed/actions/runs/11199157962
github-merge-queue bot pushed a commit that referenced this pull request Oct 21, 2024
With intel-extension-for-pytorch=2.3.110 released last month, the max1100 CI workflow can be updated too. Software versions are aligned with #6570.

An increased CI test scope for torch/ipex 2.3 will come in a later PR.

This workflow passed on the self-hosted runner in my cloned repo.