Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch panics in collector #1800

Merged
merged 3 commits into from
Jan 15, 2024
Merged

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Jan 12, 2024

The way the site endpoint is currently set up, it expects that each collector bench_next invocation that starts a collection will eventually call the /perf/onpush endpoint. However, if the collector crashes (which happened today because of a DB timeout), it won't ever call the endpoint. This will cause the following successful collection to mark both collections as completed at once when it calls /perf/onpush, which is not intuitive and causes weird situations with GH comments (two comments being posted at once). We should rather just eagerly notify about the error.

We could perhaps somehow store a special error into the DB, letting it know that a panic has happened, but since the panic can happen because of DB not behaving reliably.. it's hard to say if it would work.

@Mark-Simulacrum
Copy link
Member

I'm fine with this but I wonder if we should just poke on push within the wrapper script? Or in addition? That seems more reliable to me than catching panics, and should be equally effective?

@Kobzol
Copy link
Contributor Author

Kobzol commented Jan 12, 2024

We would need to somehow communicate to the bash script that a benchmark took place though (otherwise it shouldn't call onpush). Or make bench_next blocking, but that would not keep the collector in sync with git pushes.

@Kobzol
Copy link
Contributor Author

Kobzol commented Jan 12, 2024

onpush can't be triggered more than once per a minute, so it would probably be harmless to call it a second time from the bash script, although it's quite hacky. If it didn't make the second call in the 60 second interval, it could in theory finish an ongoing perf. run, but there should not be any ongoing perf. run, since the script won't start a new one until onpush is called..

This means that it should be ok to just call the endpoint always, from the script. We will just spam the website with a request every two minutes, but it should work.

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge as-is or with the every-2-minutes-when-idle hit.

@Kobzol
Copy link
Contributor Author

Kobzol commented Jan 15, 2024

I thought about it a bit more and I don't like the idle call, because calling onpush will basically clear the site's cache (both the DB index and the landing page). This would mean that if there's no collection going on, the site would be needlessly refreshing its cache all the time, that doesn't seem good.

So let's try the panic catching approach.

@Kobzol Kobzol merged commit e982ff3 into rust-lang:master Jan 15, 2024
11 checks passed
@Kobzol Kobzol deleted the collector-catch-panic branch January 15, 2024 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants