Removed functions iterating over tensors from torch compilation process #224
base: habana-main
@@ -92,7 +92,6 @@ def biggest_single_chunk(offset):
     return 0


-@torch_compile_for_eager
Why not use torch._dynamo.graph_break() in place of mark_step, instead of removing compilation of the graph?
The same question applies to all the cases below.
Does the change cause the ops within the function to be executed eagerly?
> Does the change cause the ops within the function to be executed eagerly?

I assume so, although I have not logged fallback-to-eager events. Functions excluded from torch compile regions, as done in this PR, should now run eagerly; that is, the PyTorch ops in code whose torch compile decorator was removed run in eager mode.
Some internal testing showed that excluding those functions from the torch compile region (as in this PR) had no impact on performance or accuracy.
lgtm
Problem:
Recently, some torch compile graph breaks were removed from dependencies of the tgi-gaudi project. This made some torch-compiled graphs much bigger and more memory-consuming, which in some models could lead to device out-of-memory (OOM) errors.
Solution:
The torch-compiled graphs that were causing the device OOM behaviour contained loops that processed many tensors. The functions containing those loops were excluded from the torch compilation process.
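
For illustration only, the pattern can be sketched without PyTorch. In this minimal sketch, `fake_compile` is a stand-in for a real compiler entry point, and `compile_for_eager_sketch`, `scale` and `sum_chunks` are hypothetical names that do not appear in this PR. The project's actual decorator may behave differently.

```python
import functools

# Names of functions that were routed through the compiler stand-in.
COMPILED = []


def fake_compile(fn):
    """Stand-in for a real compiler entry point (assumption, not a torch API).

    It only records that fn was wrapped and then calls fn unchanged.
    """
    COMPILED.append(fn.__name__)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def compile_for_eager_sketch(fn):
    """Hypothetical decorator, loosely modelled on the one removed in the diff."""
    return fake_compile(fn)


@compile_for_eager_sketch
def scale(value, factor):
    # Small, loop-free body: still sent through the compiler stand-in.
    return value * factor


def sum_chunks(chunks):
    # Loops over many items: the decorator is deliberately omitted,
    # so this function runs as plain (eager) Python.
    total = 0
    for chunk in chunks:
        total += sum(chunk)
    return total
```

Only `scale` is registered as compiled here. `sum_chunks` returns the same result as it would with the decorator, but it never becomes part of a compiled graph.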