#13363: Surface job errors where Set up runner does not complete successfully #13379

Merged 2 commits on Oct 2, 2024

tt-rkim
Collaborator

@tt-rkim tt-rkim commented Oct 2, 2024

Ticket

#13363

Problem description

We want to surface job errors where a run dies during setup, so that we can more easily see when runs fail to reset the card and possibly cause hosts to die.

What's changed

Search for failing Set up runner steps where the step starts but never completes successfully.

We may need to expand this to Complete runner steps as well.
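
For illustration only, here is a minimal sketch of the kind of check described above. It is not the code in this PR. It assumes each step is a dict shaped loosely like the `steps` entries the GitHub REST API returns for a workflow job (`name`, `status`, `conclusion`, `started_at`); the function name `find_incomplete_steps` and the `step_names` parameter are hypothetical.

```python
from typing import Any, Iterable, List, Mapping, Sequence

# Step name taken from the PR description; which steps to watch is configurable.
SETUP_STEP_NAME = "Set up runner"


def find_incomplete_steps(
    steps: Iterable[Mapping[str, Any]],
    step_names: Sequence[str] = (SETUP_STEP_NAME,),
) -> List[str]:
    """Return the names of watched steps that started but never completed successfully.

    Hypothetical helper: assumes each step carries `name`, `status`,
    `conclusion` and `started_at` keys.
    """
    incomplete = []
    for step in steps:
        if step.get("name") not in step_names:
            continue
        # A step counts as started once it has a start timestamp.
        started = step.get("started_at") is not None
        # It counts as successful only if it finished with a success conclusion.
        succeeded = (
            step.get("status") == "completed"
            and step.get("conclusion") == "success"
        )
        if started and not succeeded:
            incomplete.append(step["name"])
    return incomplete


example_steps = [
    {
        "name": "Set up runner",
        "status": "in_progress",
        "conclusion": None,
        "started_at": "2024-10-02T17:00:00Z",
    },
    {
        "name": "Run tests",
        "status": "queued",
        "conclusion": None,
        "started_at": None,
    },
]
print(find_incomplete_steps(example_steps))
```

Passing `step_names=("Set up runner", "Complete runner")` would extend the same check to Complete runner steps, should that turn out to be needed.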

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • New/Existing tests provide coverage for changes

@tt-rkim tt-rkim changed the title #13363: #13363: Surface job errors where Set up runner does not complete successfully Oct 2, 2024
…tarts the step, but never successfully completes it
@tt-rkim tt-rkim force-pushed the rkim/13363-job-error branch from 0795764 to 7a46c6c Compare October 2, 2024 17:48
@tt-rkim tt-rkim merged commit 3478414 into main Oct 2, 2024
9 checks passed
@tt-rkim tt-rkim deleted the rkim/13363-job-error branch October 2, 2024 17:52