-
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't crash with mulitple tasks of the same name #607
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #607 +/- ##
==========================================
- Coverage 64.50% 64.50% -0.01%
==========================================
Files 280 280
Lines 14112 14113 +1
Branches 1824 1824
==========================================
Hits 9103 9103
- Misses 4348 4349 +1
Partials 661 661 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What situation is this covering? When would there be two tasks with the same name? Aren't the names the engine or build ids or something?
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/ClearMLMonitorService.cs
line 77 at r1 (raw file):
.OrderBy(t => t.Created) .Select((t, i) => (Position: i, Task: t)) .GroupBy(e => e.Task.Name)
The task name is the build id. Build ids are GUIDs. How are we getting duplicate build ids?
The failure (which has actually happened 30+ times) is multiple tasks names "Inferencev3". It causes a failure of the update script. |
Previously, ddaspit (Damien Daspit) wrote…
The name is different - the name is user-defined, while there is a separate internal ID assigned to each task. |
Previously, johnml1135 (John Lambert) wrote…
The name corresponds to our ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @johnml1135)
src/Machine/src/Serval.Machine.Shared/Services/ClearMLMonitorService.cs
line 77 at r1 (raw file):
Previously, johnml1135 (John Lambert) wrote…
The name corresponds to our ID.
Somehow non-Serval tasks end up in the list of tasks. First, we get all tasks that correspond to currently running builds. Then, we get all tasks that are currently in the Serval queues. Are these queries retrieving non-Serval tasks? Do others use the Serval queues?
Previously, ddaspit (Damien Daspit) wrote…
Mark put some jobs on production - I don't know why - I will ask him. That is the source of this issue. |
Previously, johnml1135 (John Lambert) wrote…
I talked with Mark - he is going to limit the number of simultaneous jobs to 2 (to not bomb our GPUs), but they may still have the same name. Let's put this fix in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @johnml1135)
Fix #606.
This change is