We should have a general alert that fires when a compute can't reach its pageserver for too long. This would catch issues like #10309, where the pageserver is hung internally and, unless we know which clients failed to get a response, it's not obvious what is wrong or which endpoint is affected.
Ideas:
- a metric for the longest-waiting request currently in flight
- metrics for the ratio of failed to succeeded requests in recent history
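
To make the two ideas concrete, here is a rough sketch of the bookkeeping a compute could do. It is not taken from the existing code: the class `PageserverRequestTracker`, its method names, and the 60-second window are all made up for illustration, and the real thing would presumably live in the compute's pageserver client and be exported through whatever metrics path we already use.

```python
import time
from collections import deque


class PageserverRequestTracker:
    """Hypothetical per-compute tracker; all names here are illustrative."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self._clock = clock            # injectable so tests can fake time
        self._window = window_seconds  # assumed "recent history" length
        self._in_flight = {}           # request id -> start time
        self._outcomes = deque()       # (finish time, succeeded) pairs
        self._next_id = 0

    def start(self):
        """Record that a request was sent; returns its id."""
        req_id = self._next_id
        self._next_id += 1
        self._in_flight[req_id] = self._clock()
        return req_id

    def finish(self, req_id, succeeded):
        """Record that a request completed, successfully or not."""
        self._in_flight.pop(req_id)
        self._outcomes.append((self._clock(), succeeded))

    def longest_wait_seconds(self):
        """Age of the oldest request still in flight (0.0 if none)."""
        if not self._in_flight:
            return 0.0
        return self._clock() - min(self._in_flight.values())

    def failure_ratio(self):
        """Fraction of requests finished within the window that failed."""
        cutoff = self._clock() - self._window
        # Drop outcomes that have aged out of the window.
        while self._outcomes and self._outcomes[0][0] < cutoff:
            self._outcomes.popleft()
        if not self._outcomes:
            return 0.0
        failed = sum(1 for _, ok in self._outcomes if not ok)
        return failed / len(self._outcomes)
```

Note that with a hung pageserver, requests may never finish at all, so the failure ratio would stay flat while the longest-wait value keeps growing. That suggests the in-flight metric is the one to alert on for this class of problem, with the ratio as a complement for requests that fail outright.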
@ololobus we might need your team's help/advice to build this