Interim fix for op-by-op parser [#313] #314

Open
wants to merge 1 commit into main from ddilbaz/op_by_op_parser

Conversation

ddilbazTT
Contributor

I could not find the root cause, so this PR adds safeguards to make sure that any failure happens without interfering with other jobs.

Ticket

#313

Problem description

In the Feb 17 nightly tests, rerun attempts 2 and 3 failed in generate_status_report (tt-torch/results/parse_op_by_op_results.py), while rerun attempt 1 succeeded in generate_status_report. The problem cannot be reproduced, so I added a safeguard so that generate_status_report does not interfere with the execution of the nightly tests.

What's changed

  • Wrapped generate_status_report in a try/except block
  • Wrapped JSON file parsing in a try/except block
  • Added print statements for easier debugging
  • Skip parsing if the JSON input is not loaded as a dictionary
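
For readers skimming the conversation, the guarded parsing described in the list above might look roughly like the sketch below. It is not the actual diff: the helper name `load_op_results` and its return convention are assumptions made for illustration.

```python
import json


def load_op_results(path):
    """Return the parsed dict from a JSON file, or None if it should be skipped.

    Hypothetical helper for illustration; not taken from
    parse_op_by_op_results.py.
    """
    try:
        with open(path, "r") as f:
            data = json.load(f)
    except (OSError, ValueError) as e:
        # json.JSONDecodeError is a subclass of ValueError.
        # Print instead of raising so one bad file does not stop the run.
        print(f"Skipping {path}: could not load JSON ({e})")
        return None

    if not isinstance(data, dict):
        # The parser expects a dictionary; anything else is skipped.
        print(f"Skipping {path}: expected a dict, got {type(data).__name__}")
        return None
    return data
```

Returning `None` lets the caller decide whether to count the file as a failure or ignore it.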

@ddilbazTT ddilbazTT changed the title Interim fix for op-by-op parseer [#313] Interim fix for op-by-op parser [#313] Feb 18, 2025
@ddilbazTT ddilbazTT force-pushed the ddilbaz/op_by_op_parser branch from e560c11 to 078181e Compare February 18, 2025 20:45
@@ -843,5 +861,8 @@ def default(shlo_op, md_data):


if __name__ == "__main__":
    generate_status_report()
Contributor

Is the order not reversed now? Does that make a difference?

Contributor Author

It is reversed because I do not want a failure in generate_status_report to kill process_json_files. The try/except block should already handle this, but I reordered the calls as an extra precaution.
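
The ordering described in this reply could be sketched as follows. The function bodies are stand-ins, the wrapper name `main` is hypothetical, and the sketch assumes both functions take no arguments, which the hunk above only shows for `generate_status_report`.

```python
def process_json_files():
    # Stand-in for the real parsing step.
    print("processing json files")


def generate_status_report():
    # Stand-in that always fails, to show the failure stays contained.
    raise RuntimeError("simulated report failure")


def main():
    # Run the parsing step first so a report failure cannot cut it short.
    process_json_files()
    try:
        generate_status_report()
    except Exception as e:
        print(f"generate_status_report failed: {e}")
        return False
    return True


if __name__ == "__main__":
    main()
```

With this ordering, the parsing step has already finished by the time the report step can raise.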

(count / total_ops) * 100 if total_ops > 0 else 0
)
status_report[model_name] = status_percentages
except Exception as e:
Contributor

We should still fail the job if this doesn't happen. Perhaps we can add a check to see whether any JSON files failed and, if they did, assert at the end.
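
One way to read this suggestion is as a deferred failure: attempt every file, remember which ones failed, and assert once at the end. The sketch below assumes a hypothetical `run_all` driver and a per-file callback `parse_one`; neither name comes from the PR.

```python
def run_all(paths, parse_one):
    """Try every path, record failures, and fail once at the end."""
    failed = []
    for path in paths:
        try:
            parse_one(path)
        except Exception as e:
            # Keep going so the remaining files are still processed.
            print(f"Failed to parse {path}: {e}")
            failed.append(path)

    # Deferred failure: every file was attempted before the job is failed.
    assert not failed, f"{len(failed)} JSON file(s) failed to parse: {failed}"
```

Note that `assert` statements are skipped when Python runs with `-O`, so raising an exception or calling `sys.exit(1)` may be a safer way to fail a CI job.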

| | Tests | Passed ✅ | Skipped ⚠️ | Failed |
|---|---|---|---|---|
| TT-Torch Tests | 435 ran | 428 passed | 7 skipped | 0 failed |

| Test | Result |
|---|---|

No test annotations available

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (a93c878) to head (078181e).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@     Coverage Diff     @@
##   main   #314   +/-   ##
===========================
===========================


3 participants