Getting progress and final status for a self-healing fix related to a detected anomaly #2215
Comments
The other way I found was requesting on the Despite that I still think that having the notifier interface being called by the
@CCisGG I was wondering what you think about the above. Thanks!
I'm not entirely sure what you are trying to achieve here. My understanding is that you probably want to know the state of a rebalance triggered by self-healing, during and after the rebalance. Here is what I think:
cc: @mhratson @allenxwang
Hey @CCisGG first of all thank you very much for your prompt answer!
Currently the executor status JSON already has a
In general, I don't think that looking at logs is a great UX. I would expect people using Cruise Control to rely on the REST API, as the Cruise Control UI project does, to show such information. Users could have their own tool to monitor this kind of info via the REST endpoints. That said, I am a maintainer of the Strimzi project (an operator to run Apache Kafka on Kubernetes), which has a full integration with Cruise Control, and I am working on bringing the self-healing feature onboard. I am using a "custom" notifier to notify the operator that an anomaly was detected and the fix started, but then it is not that simple for the operator to know that the fix was done (while it's simple when the user starts a rebalance and we can use the user_tasks endpoint). I was also proposing to have the notifier called when an anomaly was fixed. It needs the corresponding
@CCisGG I guess this issue didn't get enough interest from the Cruise Control maintainers?
Hi @ppatierno sorry for the delay. I started a discussion about this issue within the team and we haven't reached any conclusion yet. I personally think your proposal has solid reasons, but I'm still hesitant since it may break existing users who depend on Cruise Control.
And practically, I think the review for this change may also take a long time. To unblock your case, I think a better idea might be relying on the logs for now. Please also feel free to add more useful logs; I think that would be easier to review and accept.
If you are referring to the addition to the
Of course it will take the time it deserves as any other suggestions. I am running the Strimzi project and I can understand that. Not all proposals and PRs are merged straight away.
It's not a solution. Within Strimzi, Cruise Control is used in a more automated way by the Strimzi operator, and looking at the log means a human doing that. I think that in a cloud-native environment, Cruise Control should aim for more automation, facilitating interaction with operators and not humans. This was my goal.
It will break the build for people who implement the interface but do not implement the method.
It makes sense.
Hi all,
I was looking for a way to get the current status of the self-healing fix in progress for a detected anomaly.

AFAIU from the code (mostly looking at the `Executor` class and the usage of the `_userTaskManager`), when a task runs because it was triggered by a detected anomaly, this task is not a user task (of course!): it doesn't have a corresponding `UserTaskInfo` instance, so it won't show up in the `/user_tasks` endpoint.

So I was looking at using the `/state?json=true&substates=anomaly_detector` endpoint (which provides info via the `AnomalyDetectorState` class), but also in this case, for each anomaly (in the cache) the last status is `FIX_STARTED` and there is no `FIX_DONE`. So when I have the `anomalyId`, the only way I see is to search the reported JSON for the anomaly with that id and `FIX_STARTED` status, and to cross this information with the `ongoingSelfHealingAnomaly` field (if it reports the same anomaly id). But I don't see this as a great workaround to tell whether a fix for an anomaly is running (`ongoingSelfHealingAnomaly` is filled with the anomaly id) or has ended (the anomaly was `FIX_STARTED` but now `ongoingSelfHealingAnomaly` is empty or doesn't exist anymore).

Is there any better way by using the Cruise Control REST API that I can't see?
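
To make the workaround above concrete, here is a minimal sketch of the check in Python. It assumes the caller has already fetched the anomaly detector state and collected the cached anomaly entries into a list of dicts, each with `anomalyId` and `status` keys; that layout, the function name `self_healing_fix_state` and the `RUNNING` / `ENDED` / `UNKNOWN` labels are assumptions for illustration, not part of Cruise Control.

```python
from typing import Any, Iterable, Mapping, Optional


def self_healing_fix_state(
    anomalies: Iterable[Mapping[str, Any]],
    ongoing_anomaly_id: Optional[str],
    anomaly_id: str,
) -> str:
    """Classify the fix for `anomaly_id` as RUNNING, ENDED or UNKNOWN.

    `anomalies` is the (assumed) flattened list of cached anomaly entries,
    `ongoing_anomaly_id` is the value of the ongoingSelfHealingAnomaly field,
    or None / "" when the field is empty or missing.
    """
    # Look up the cached entry for the anomaly we were notified about.
    entry = next((a for a in anomalies if a.get("anomalyId") == anomaly_id), None)

    # Without a FIX_STARTED entry there is nothing to correlate.
    if entry is None or entry.get("status") != "FIX_STARTED":
        return "UNKNOWN"

    # Fix started and the same id is reported as ongoing: still running.
    if ongoing_anomaly_id == anomaly_id:
        return "RUNNING"

    # Fix started and no ongoing self-healing anomaly is reported: it ended.
    if not ongoing_anomaly_id:
        return "ENDED"

    # A different anomaly is being healed; the workaround does not cover this.
    return "UNKNOWN"
```

Note that `ENDED` only says the fix is no longer running. Since there is no `FIX_DONE` status, this check cannot tell whether the fix succeeded or failed.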
Also, is it possible to get the optimization proposal related to the fix for the detected anomaly?
Also, I noticed that the `AnomalyDetectorState` class has a reference to the implementation of a notifier, so I was thinking that even the `AnomalyNotifier` interface could have an additional method being called when the fix was done (through the `markSelfHealingFinished`). What do you think about this as well?

Thanks!