Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notifications not firing for Analysis Run fails when Analysis Run is part of an Expriment #4009

Open
meeech opened this issue Dec 18, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@meeech
Copy link
Contributor

meeech commented Dec 18, 2024

I am starting this ticket to capture the information as I investigate and try to resolve this issue.
If anyone has any pointers or thoughts, please add them.

When an analysis run fail happens and that analysis run is part of an inline experiment step, we don't get the on-analysis-run-error or on-analysis-run-fail notification.

Analysis Run Error
✅ Background Analysis Run: event: AnalysisRunError object: rollout/basic-rollout
❌ Inline Step Analysis Run: event: AnalysisRunError object: experiment/basic-rollout-exp-steps-b66774df5-3-0

We get the RolloutAborted notification for both, because the event that fires belongs to the rollout/* object in both cases

Analysis Run Fail
✅ Background Analysis Run: event: AnalysisRunFailed object: rollout/basic-rollout
❌ Inline Step Analysis Run: event: AnalysisRunFailed object: experiment/basic-rollout-exp-steps-bd7bdfcc8-4-0

We get the RolloutAborted notification for both, because the event that fires belongs to the rollout/* object in both cases

So this has me thinking theres a few possible options:

  • with the notif engine, we don't give it access to the experiment object events?
  • Or is this a case where the events are being fired off the wrong object - where we use the experiment EventRecorder, when we should find the parent(?) rollout object and use its EventRecorder?

I'll keep digging. Unsure what the ideal would be:
would we like something like on-experiment-analysis-run-failed, on-experiment-analysis-run-error... or would things be better served with them using the already existing triggers? I think when its a step it would make sense to use the existing triggers, and have the rollout object available for the templates, but what about stand alone experiments?

Version

1.7.2 (but this has existed as a problem as long as I've been using experiment step, so at least 1.5/1.6


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@meeech meeech added the bug Something isn't working label Dec 18, 2024
@meeech
Copy link
Contributor Author

meeech commented Dec 19, 2024

More rough notes from a conversation with @zachaller:

Rollouts uses the notification engine in 2 ways

  • As a library (handles on-event triggers)
  • As a controller (handles when triggers)

Path to explore

Create a new notification controller / deployment - this is similar to what Argo CD does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant