Interim fix for op-by-op parser [#313] #314
base: main
Cannot find root cause - will add safeguards to make sure failure happens without interfering with other jobs
e560c11 to 078181e
@@ -843,5 +861,8 @@ def default(shlo_op, md_data):

if __name__ == "__main__":
    generate_status_report()
Is the order not reversed now? Does that make a difference?
It is reversed because I do not want process_json_files to be killed by generate_status_report. The try-except block should already handle this, but I reversed the order as an extra precaution.
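A minimal sketch of the ordering described in this reply, under stated assumptions: the two `*_stub` functions are hypothetical stand-ins, since the real signatures of process_json_files and generate_status_report are not shown in this diff. It only illustrates that work done before the report step survives a failure inside the guarded report step.

```python
def process_json_files_stub(log):
    # Hypothetical stand-in for process_json_files; it only records that it ran.
    log.append("process_json_files")


def generate_status_report_stub(log):
    # Hypothetical stand-in for generate_status_report; it simulates the
    # unreproducible failure seen in the nightly run.
    log.append("generate_status_report")
    raise RuntimeError("simulated report failure")


def main(log):
    # Run the JSON processing first so that its work is finished before the
    # report step has a chance to fail.
    process_json_files_stub(log)
    # Guard the report step so that an exception is recorded, not propagated.
    try:
        generate_status_report_stub(log)
    except Exception as exc:
        log.append(f"report failed: {exc}")
    return log
```

With this ordering, the processing step has already completed by the time the report step raises, and the try-except is a second line of defence.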
            (count / total_ops) * 100 if total_ops > 0 else 0
        )
        status_report[model_name] = status_percentages
    except Exception as e:
We should still fail the job if this doesn't happen. Perhaps we can add a test to see whether any JSONs failed and, if they did, assert at the end.
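One way to read this suggestion is sketched below. It is an assumption about the shape of the fix, not the code in this PR: `build_status_report` and `fail_if_any` are hypothetical names, and the input is assumed to be a mapping from model name to per-status op counts. Failures are collected while every model is still processed, and the job fails only once at the end.

```python
def build_status_report(per_model_counts):
    # per_model_counts is an assumed shape: {model_name: {status: op_count}}.
    status_report = {}
    failures = []
    for model_name, counts in per_model_counts.items():
        try:
            total_ops = sum(counts.values())
            status_report[model_name] = {
                status: (count / total_ops) * 100 if total_ops > 0 else 0
                for status, count in counts.items()
            }
        except Exception as exc:
            # Record the failure and keep going so other models are still reported.
            failures.append((model_name, repr(exc)))
    return status_report, failures


def fail_if_any(failures):
    # Called once at the very end, so the job still fails visibly.
    if failures:
        names = ", ".join(name for name, _ in failures)
        raise AssertionError(f"status report failed for: {names}")
```

Raising only after the loop keeps the safeguard (one bad input does not block the others) while still turning the job red when anything went wrong.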
Codecov Report
All modified and coverable lines are covered by tests. ✅
All tests successful. No failed tests found.
Ticket
#313
Problem description
In the Feb 17 nightly tests, rerun attempts 2 and 3 failed during generate_status_report in tt-torch/results/parse_op_by_op_results.py, while rerun attempt 1 succeeded at the same step. The problem cannot be reproduced, so this PR adds a safeguard so that generate_status_report does not interfere with the execution of the nightly tests.
What's changed