chore: alert when solvers fail to parse /settle call #3231

Open
fleupold opened this issue Jan 10, 2025 · 0 comments
Labels
track:maintenance (maintenance track), track:post-mortem (post mortem follow up tasks)

Comments

@fleupold
Contributor

Untracked follow-up item from the 2024-11-19 post-mortem.

Background

We've had issues in the past where solvers are unable to parse our invocation to settle (due to a potentially breaking API change they didn't prepare for).

This task captures improving the metric we currently collect so that it contains more nuanced information about the reason for a settlement failure. That will allow us to create more appropriate alerting (e.g. alert immediately when a solver is unable to parse the request, while not alerting on expected settlement failures, e.g. price volatility making the settlement revert).

Details

  • enum SettleError is currently just a wrapper around an opaque anyhow::Error
  • Yet a settlement can fail for many reasons, including:
    • Timeout (somewhat expected if the solver tries to settle until the last millisecond)
    • Domain-specific error from the default driver implementation (enum)
    • Other unexpected parsing errors
    • Network errors
  • Autopilot's SettleError should become a more strongly typed enum that differentiates between these cases and reports them in the settle failure metric (allowing custom alerting tolerances for unexpected errors)

This will allow us to differentiate between expected failures (settlement timeouts and reverts) and unexpected failures (e.g. parsing).
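
A minimal sketch of what such an enum could look like, using only the standard library. The variant names, the `label` and `is_expected` helpers, and the `record_failure` function are hypothetical and not taken from the codebase. The `HashMap` merely stands in for whatever labeled counter the settle failure metric actually uses. Treating driver errors as expected is also an assumption made for illustration.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical strongly typed replacement for the anyhow-based wrapper.
#[derive(Debug)]
enum SettleError {
    /// The solver did not finish before the deadline (somewhat expected).
    Timeout,
    /// Domain-specific error from the default driver implementation,
    /// e.g. a reverted settlement.
    Driver(String),
    /// The request or response could not be parsed (unexpected).
    Parsing(String),
    /// Transport-level failure.
    Network(String),
}

impl SettleError {
    /// Stable label to attach to the failure metric.
    fn label(&self) -> &'static str {
        match self {
            SettleError::Timeout => "timeout",
            SettleError::Driver(_) => "driver",
            SettleError::Parsing(_) => "parsing",
            SettleError::Network(_) => "network",
        }
    }

    /// Assumption: timeouts and driver errors are the expected failures;
    /// everything else should alert with a low tolerance.
    fn is_expected(&self) -> bool {
        matches!(self, SettleError::Timeout | SettleError::Driver(_))
    }
}

impl fmt::Display for SettleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SettleError::Timeout => write!(f, "settle request timed out"),
            SettleError::Driver(msg) => write!(f, "driver error: {msg}"),
            SettleError::Parsing(msg) => write!(f, "parsing error: {msg}"),
            SettleError::Network(msg) => write!(f, "network error: {msg}"),
        }
    }
}

impl std::error::Error for SettleError {}

/// Stand-in for incrementing a counter labeled by failure reason.
fn record_failure(counts: &mut HashMap<&'static str, u64>, err: &SettleError) {
    *counts.entry(err.label()).or_insert(0) += 1;
}
```

With a per-reason label on the metric, an alert could then use a strict threshold for the `parsing` and `network` series while tolerating a higher rate of `timeout` and `driver` failures.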

Acceptance criteria

An alert exists that notifies us, with a low tolerance threshold, when solvers fail to settle for unexpected reasons.

@fleupold added the track:maintenance and track:post-mortem labels Jan 10, 2025