Show more information in the UI about why Rollout gets aborted #20849

Open
andrii-korotkov-verkada opened this issue Nov 19, 2024 · 8 comments
Labels: component:ui (User interfaces bugs and enhancements), enhancement (New feature or request)

Comments

@andrii-korotkov-verkada
Contributor

Summary

Extract information from the Argo Rollout or its controller to give users more context on why a rollout was aborted.

Motivation

Right now, if the failure happens on a step before the analysis run is configured to start, there might be no indication of what went wrong: no bad events, and only a generic message in the status, like

    - lastTransitionTime: '2024-11-19T21:48:19Z'
      lastUpdateTime: '2024-11-19T21:48:19Z'
      message: Rollout aborted update to revision 386
      reason: RolloutAborted
      status: 'False'
      type: Progressing

The logs at info level and above weren't helpful either.

Proposal

Add more information to the UI about why the rollout was aborted, showing it in the Events and/or the status.

@andrii-korotkov-verkada andrii-korotkov-verkada added enhancement New feature or request component:ui User interfaces bugs and enhancements labels Nov 19, 2024
@crenshaw-dev
Member

This might be a Rollouts issue, if there's truly nothing else helpful in the status field.

@andrii-korotkov-verkada
Contributor Author

Let me open a Rollouts issue as well, then.

@andrii-korotkov-verkada
Contributor Author

Looks like the health check already has:

    for _, condition in ipairs(obj.status.conditions) do
      if condition.type == "InvalidSpec" then
        hs.status = "Degraded"
        hs.message = condition.message
        return hs
      end
      if condition.type == "Progressing" and condition.reason == "RolloutAborted" then
        hs.status = "Degraded"
        hs.message = condition.message
        return hs
      end
      if condition.type == "Progressing" and condition.reason == "ProgressDeadlineExceeded" then
        hs.status = "Degraded"
        hs.message = condition.message
        return hs
      end
    end

So it seems to be more an issue of Rollouts not providing enough info.
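
To make the point concrete, here is a rough Go sketch of the same check. It is not Argo CD code: the `Condition` and `HealthStatus` types and the `degradedFrom` helper are hypothetical stand-ins for illustration. It shows that the health message is copied verbatim from the condition, so the UI can only be as informative as whatever Rollouts writes there.

```go
package main

import "fmt"

// Condition is a hypothetical, trimmed-down stand-in for a Rollout status condition.
type Condition struct {
	Type    string
	Reason  string
	Message string
}

// HealthStatus is a hypothetical stand-in for the `hs` table in the Lua script.
type HealthStatus struct {
	Status  string
	Message string
}

// degradedFrom returns a Degraded status for the first condition that matches
// one of the three cases in the Lua snippet. The message is passed through
// unchanged; nothing is added to it.
func degradedFrom(conds []Condition) (HealthStatus, bool) {
	for _, c := range conds {
		switch {
		case c.Type == "InvalidSpec",
			c.Type == "Progressing" && c.Reason == "RolloutAborted",
			c.Type == "Progressing" && c.Reason == "ProgressDeadlineExceeded":
			return HealthStatus{Status: "Degraded", Message: c.Message}, true
		}
	}
	return HealthStatus{}, false
}

func main() {
	hs, ok := degradedFrom([]Condition{{
		Type:    "Progressing",
		Reason:  "RolloutAborted",
		Message: "Rollout aborted update to revision 386",
	}})
	fmt.Println(ok, hs.Status, hs.Message)
}
```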

@andrii-korotkov-verkada
Contributor Author

Though I'm not sure why we check for type Progressing there.

@andrii-korotkov-verkada
Contributor Author

Looks like that's how Rollouts reports it:

    - lastTransitionTime: '2024-12-19T17:30:48Z'
      lastUpdateTime: '2024-12-19T17:30:48Z'
      message: >-
        Rollout aborted update to revision 18: Metric "apdex-rate-fill" assessed
        Failed due to failed (4) > failureLimit (3)
      reason: RolloutAborted
      status: 'False'
      type: Progressing

@andrii-korotkov-verkada
Contributor Author

Here's the relevant code in Rollouts:

	if isAborted {
		revision, _ := replicasetutil.Revision(c.rollout)
		message := fmt.Sprintf(conditions.RolloutAbortedMessage, revision)
		if c.pauseContext.abortMessage != "" {
			message = fmt.Sprintf("%s: %s", message, c.pauseContext.abortMessage)
		}
		condition := conditions.NewRolloutCondition(v1alpha1.RolloutProgressing, corev1.ConditionFalse, conditions.RolloutAbortedReason, message)
		if conditions.SetRolloutCondition(&newStatus, *condition) {
			c.recorder.Warnf(c.rollout, record.EventOptions{EventReason: conditions.RolloutAbortedReason}, message)
		}
	}

@andrii-korotkov-verkada
Contributor Author

So it looks like abortMessage can sometimes be empty, in which case only the generic message is reported.
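
A minimal standalone sketch of that message-building logic, to show both outcomes side by side. The format string `rolloutAbortedMessage` is an assumption inferred from the condition messages quoted earlier in this issue, and `buildAbortMessage` is a hypothetical helper, not a function in Rollouts.

```go
package main

import "fmt"

// rolloutAbortedMessage is an assumed format string, inferred from the
// messages quoted in this issue (e.g. "Rollout aborted update to revision 386").
const rolloutAbortedMessage = "Rollout aborted update to revision %d"

// buildAbortMessage is a hypothetical helper that mirrors the branch in the
// snippet above: the detail is appended only when abortMessage is non-empty.
func buildAbortMessage(revision int, abortMessage string) string {
	message := fmt.Sprintf(rolloutAbortedMessage, revision)
	if abortMessage != "" {
		message = fmt.Sprintf("%s: %s", message, abortMessage)
	}
	return message
}

func main() {
	// Empty abortMessage: only the generic text survives.
	fmt.Println(buildAbortMessage(386, ""))
	// Non-empty abortMessage: the reason is appended after a colon.
	fmt.Println(buildAbortMessage(18, `Metric "apdex-rate-fill" assessed Failed due to failed (4) > failureLimit (3)`))
}
```

With an empty abortMessage this yields the generic text from the first status example; with a non-empty one it yields the detailed form from the second.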
