Untracked follow up item from the 2024-11-19 Post Mortem
Background
We've had issues in the past where solvers were unable to parse our invocation to settle (due to a potentially breaking API change they had not prepared for).
This task covers improving the metric we currently collect so that it carries more nuanced information about the reason for a settlement failure. That will allow us to create more appropriate alerting (e.g. alert immediately when a solver is unable to parse the request, while not alerting on expected settlement failures, such as a revert caused by price volatility).
Details
enum SettleError is currently just a wrapper around an opaque anyhow::Error.
Yet a settlement can fail for many reasons, including:
Timeout (somewhat expected in case solver tries to settle until the last millisecond)
Domain-specific error from the default driver implementation (enum)
Other unexpected parsing error
Network errors
Autopilot's SettleError should become a more strongly typed enum that differentiates between these cases and reports them in the settle failure metric (allowing custom alerting tolerances for unexpected errors).
This will allow us to differentiate between expected failures (settlement timeouts and reverts) and unexpected failures (e.g. parsing).
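As a rough illustration of the direction described above, the enum could look something like the sketch below. This is not the existing autopilot code: the variant names, the DriverError type, the label strings, and the is_expected helper are all hypothetical, and treating network errors as unexpected is an assumption (the issue only names timeouts and reverts as expected, and parsing as unexpected).

```rust
use std::time::Duration;

/// Hypothetical domain-specific errors reported by the default driver
/// implementation. The real set of variants would come from the driver API.
#[derive(Debug)]
enum DriverError {
    /// The settlement transaction reverted (e.g. due to price volatility).
    Reverted(String),
    /// Any other error the driver reported in a well-formed response.
    Other(String),
}

/// Sketch of a strongly typed replacement for the anyhow-based SettleError.
#[derive(Debug)]
enum SettleError {
    /// The solver did not settle before the deadline.
    Timeout(Duration),
    /// The driver answered with a domain-specific error.
    Driver(DriverError),
    /// The request or response could not be parsed.
    Parsing(String),
    /// The request never completed at the transport level.
    Network(String),
}

impl SettleError {
    /// Value for a (hypothetical) "reason" label on the settle failure metric.
    fn label(&self) -> &'static str {
        match self {
            SettleError::Timeout(_) => "timeout",
            SettleError::Driver(DriverError::Reverted(_)) => "revert",
            SettleError::Driver(DriverError::Other(_)) => "driver",
            SettleError::Parsing(_) => "parsing",
            SettleError::Network(_) => "network",
        }
    }

    /// Expected failures (timeouts and reverts) can tolerate a higher alert
    /// threshold; everything else is treated as unexpected in this sketch.
    fn is_expected(&self) -> bool {
        matches!(
            self,
            SettleError::Timeout(_) | SettleError::Driver(DriverError::Reverted(_))
        )
    }
}
```

With a label like this on the failure metric, an alert could filter on the unexpected reasons only and use a much lower threshold for them than for timeouts and reverts.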
Acceptance criteria
An alert is in place that notifies us, with a low tolerance threshold, when solvers fail to settle for unexpected reasons.