[COMPILE] workflow for deepspeed + torch.compile #6570
jobs:
  compile-tests:
    runs-on: [self-hosted, intel, xpu]
@YizhouZ - does it make more sense to add this to its own workflow, or to add it to the existing xpu workflow?
you mean adding a new worker for this workflow?
dynamo_stats.subtract(start_stats)

if comm.get_rank() == 0:
    print(dynamo_stats)
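The snapshot-and-subtract pattern in the snippet above can be illustrated with plain `collections.Counter` objects. This is a minimal sketch, not the PR's code: the helper `stats_delta`, the counter keys, and the values are all made up for illustration, and it only assumes that the stats object behaves like a `Counter`, which is what the `subtract` call suggests.

```python
from collections import Counter


def stats_delta(current: Counter, start: Counter) -> Counter:
    """Return the counts accumulated since `start` was snapshotted."""
    delta = Counter(current)  # copy, so the live counter is not mutated
    delta.subtract(start)     # in-place; keeps zero and negative entries
    return delta


# Hypothetical counters; keys and values are illustrative only.
start_stats = Counter({"graph_breaks": 2, "frames": 5})
live_stats = Counter({"graph_breaks": 7, "frames": 9})

dynamo_stats = stats_delta(live_stats, start_stats)
print(dynamo_stats)
```

Taking the snapshot before the test body runs means that counts left over from earlier compilation in the same process do not leak into the reported number.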
@YizhouZ - could you run the pre-commit formatter on the PR, that will resolve the formatting/python issues.
sure!
I also have a question for discussion. Initially I thought of publishing the test result of this workflow (the number of graph breaks) on a page, somewhat like the workflow badges on the DeepSpeed README page, but I didn't find a suitable tool for that. Is there a pointer or direction we should look at? Thanks!
The xpu_compile workflow now passes, and the graph break count is shown in the test summary.
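One way a count like this can reach a GitHub Actions job summary is to append Markdown to the file named by the `GITHUB_STEP_SUMMARY` environment variable. The sketch below is an assumption about how that could look, not a description of what this workflow does; `format_summary` and `write_summary` are hypothetical helper names.

```python
import os
from typing import Optional


def format_summary(graph_breaks: int) -> str:
    """Render the graph break count as a small Markdown section."""
    return f"### torch.compile graph breaks\n\nGraph breaks: {graph_breaks}\n"


def write_summary(graph_breaks: int, path: Optional[str] = None) -> bool:
    """Append the summary to the job summary file.

    Falls back to stdout and returns False when no path is given and
    GITHUB_STEP_SUMMARY is unset (for example, when run locally).
    """
    path = path or os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        print(format_summary(graph_breaks))
        return False
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_summary(graph_breaks))
    return True
```

Calling `write_summary(n)` at the end of the test, on rank 0 only, would keep the summary to a single entry per job.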
Thank you @YizhouZ, looks good to me. It is great to have utilities like this in our workflows.
This PR extends #6570 by showing a breakdown of graph breaks, so we can see how they are distributed among different reasons. An example of the graph break output can be seen in the following workflow run: https://github.com/microsoft/DeepSpeed/actions/runs/11199157962
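A per-reason breakdown of this kind can be sketched as follows, assuming the per-reason counts are available as a `Counter` keyed by reason string. The function name `format_breakdown` and the reasons and counts in the example are invented for illustration and are not taken from the linked workflow run.

```python
from collections import Counter


def format_breakdown(reasons: Counter) -> str:
    """Format graph break counts per reason, most frequent first."""
    total = sum(reasons.values())
    lines = [f"total graph breaks: {total}"]
    for reason, count in reasons.most_common():
        share = 100.0 * count / total if total else 0.0
        lines.append(f"{count:4d}  {share:5.1f}%  {reason}")
    return "\n".join(lines)


# Hypothetical reasons and counts, for illustration only.
example = Counter({
    "call to unsupported builtin": 3,
    "data-dependent control flow": 1,
})
report = format_breakdown(example)
print(report)
```

Sorting by count puts the dominant reason at the top, which is usually the first place to look when trying to reduce the total.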
With intel-extension-for-pytorch=2.3.110 released last month, the max1100 CI workflow can be updated too. Software versions are aligned with #6570. A wider CI test scope for torch/ipex 2.3 will come in a later PR. This workflow passed on the self-hosted runner in my cloned repo.
We use a simple model + DeepSpeed ZeRO stage 3 + torch.compile and count the number of graph breaks to demonstrate the current status of combining DeepSpeed with torch.compile.