Add support for intra.broker.goals in anomaly detection / self healing #6
@ecojan one aspect that is not yet clear to me is the order in which the INTRA and INTER broker goal violations are handled.
My expectation is GOAL_VIOLATIONS followed by INTRA_BROKER_GOAL_VIOLATIONS. Line 243 in 908ae8f
The detectors are scheduled at a fixed rate and in order (I have not checked how the ScheduledExecutorService from the java.util.concurrent package maintains that order internally).
Thanks for the pointer. Yeah, they all seem to be scheduled concurrently with the same (default) interval.
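To make the ordering question concrete, here is a minimal, self-contained sketch of two tasks scheduled at the same fixed rate on a `ScheduledExecutorService`. The class `DetectorSchedulingSketch`, the method `runOnce`, and the no-op "detectors" are hypothetical stand-ins, not Cruise Control code. With a pool of more than one thread the two tasks can run in parallel, so the relative order of their runs should not be relied on.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not the actual Cruise Control detector manager.
class DetectorSchedulingSketch {

    /**
     * Schedules two no-op "detectors" with the same initial delay and interval,
     * waits until each has run at least once, and returns the order in which
     * their first runs were observed.
     */
    static List<String> runOnce(int poolSize, long intervalMs) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(poolSize);
        CopyOnWriteArrayList<String> order = new CopyOnWriteArrayList<>();
        CountDownLatch firstRuns = new CountDownLatch(2);
        try {
            for (String name : List.of("GOAL_VIOLATION", "INTRA_BROKER_GOAL_VIOLATION")) {
                scheduler.scheduleAtFixedRate(() -> {
                    // Record only the first run of each detector.
                    if (order.addIfAbsent(name)) {
                        firstRuns.countDown();
                    }
                }, 0, intervalMs, TimeUnit.MILLISECONDS);
            }
            firstRuns.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return List.copyOf(order);
    }

    public static void main(String[] args) {
        // With poolSize > 1 the printed order may differ between runs.
        System.out.println(runOnce(2, 60_000));
    }
}
```

If a strict "inter-broker first, intra-broker second" order is required, it would have to be enforced explicitly (for example by chaining the two checks in one task) rather than inferred from the submission order.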
See: linkedin#1242. This is an internal patch that uses SystemCpuLoad to monitor broker pod CPU usage, given that we always run the broker in a container. Because `SystemCpuLoad` reports un-normalized CPU load across all cores, we normalize it to match the value CC expects, in the [0, 1] interval.
In our environment we've seen a case where RackAwareDistributionGoal and MinTopicLeaderPerBrokerGoal conflict. As both are hard goals, the rebalance fails. This change lowers MinTopicLeaderPerBrokerGoal to a soft goal.
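The patch itself is not shown in this conversation, so the following is only a sketch of one plausible normalization. It assumes the raw reading is the fraction of all host cores in use and that the container has a known CPU limit; the class `ContainerCpuNormalizationSketch`, the method `normalize`, and its parameters are hypothetical names chosen for illustration.

```java
// Hypothetical sketch: the actual patch may normalize differently.
class ContainerCpuNormalizationSketch {

    /**
     * Rescales a host-wide CPU load fraction to the share of the container's
     * CPU limit that is in use, clamped to [0, 1].
     *
     * @param hostCpuLoad       raw load as a fraction of all host cores, assumed in [0, 1]
     * @param hostCores         number of logical cores on the host (assumption)
     * @param containerCpuLimit CPU limit of the broker container, in cores (assumption)
     */
    static double normalize(double hostCpuLoad, int hostCores, double containerCpuLimit) {
        if (hostCores <= 0 || containerCpuLimit <= 0.0) {
            throw new IllegalArgumentException("core counts must be positive");
        }
        double coresInUse = hostCpuLoad * hostCores;
        double normalized = coresInUse / containerCpuLimit;
        return Math.max(0.0, Math.min(1.0, normalized));
    }

    public static void main(String[] args) {
        // A container limited to 2 cores on a 16-core host, using 1 core.
        System.out.println(normalize(0.0625, 16, 2.0));
    }
}
```

Under these assumptions, a container limited to 2 cores on a 16-core host that keeps one core busy shows a raw load of 0.0625, which rescales to half of its limit.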
…r intra broker goals
Did a rebase, waiting for build status.
🚢
@@ -553,7 +590,12 @@ public void run() {
// We compute the proposal even if there is not enough modeled partitions.
ModelCompletenessRequirements requirements = _loadMonitor.meetCompletenessRequirements(_defaultModelCompletenessRequirements)
? _defaultModelCompletenessRequirements : _requirementsWithAvailableValidWindows;
ClusterModel clusterModel = _loadMonitor.clusterModel(_time.milliseconds(), requirements, _allowCapacityEstimation, operationProgress);

ClusterModel clusterModel = null;
Is there a reason why the cluster model is first set to null and only afterwards assigned an actual value?
This PR resolves linkedin#1264 by enabling self-healing for intra-broker goals. It does not, however, account for race conditions with cluster-level goals, and will only trigger after those goals have finished.
The scope of this PR is limited to allowing self-healing on intra-broker goals for now; future work that makes intra-broker goals and regular goals work together might make this obsolete.
New configuration:
New Anomaly Type: INTRA_BROKER_GOAL_VIOLATION
A new anomaly type that needs to be treated differently from the regular GOAL_VIOLATIONS.
When configuring the rebalance parameters for this violation, we skip the hard goals check and ignore the proposalsCache.
We skip the hard goals check because intra-broker goals may not be hard goals (at the current time).
We ignore the proposalsCache because the cached proposals are computed against all configured default goals. Adding the intra-broker goals to that set would not work properly, because disk-granularity goals currently do not work together with broker-granularity goals.
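As a rough illustration of the parameter choice described above, the sketch below maps an anomaly type to the two flags mentioned. The names `SelfHealingParamsSketch`, `AnomalyType`, `FixParameters`, and `parametersFor` are hypothetical, and the `false`/`false` values for regular goal violations are an assumption made only for contrast; the real fix path in this PR may be structured differently.

```java
// Hypothetical sketch: names and the GOAL_VIOLATION defaults are assumptions.
class SelfHealingParamsSketch {

    enum AnomalyType { GOAL_VIOLATION, INTRA_BROKER_GOAL_VIOLATION }

    /** The two rebalance flags discussed in the PR description. */
    static final class FixParameters {
        final boolean skipHardGoalCheck;
        final boolean ignoreProposalCache;

        FixParameters(boolean skipHardGoalCheck, boolean ignoreProposalCache) {
            this.skipHardGoalCheck = skipHardGoalCheck;
            this.ignoreProposalCache = ignoreProposalCache;
        }
    }

    static FixParameters parametersFor(AnomalyType type) {
        switch (type) {
            case INTRA_BROKER_GOAL_VIOLATION:
                // Intra-broker goals may not be hard goals, and cached proposals
                // were computed for the default (broker-granularity) goals only.
                return new FixParameters(true, true);
            case GOAL_VIOLATION:
            default:
                // Assumed defaults for the regular path, shown for contrast.
                return new FixParameters(false, false);
        }
    }

    public static void main(String[] args) {
        FixParameters p = parametersFor(AnomalyType.INTRA_BROKER_GOAL_VIOLATION);
        System.out.println(p.skipHardGoalCheck + " " + p.ignoreProposalCache);
    }
}
```

Keeping the decision in one place makes it easy to revisit if intra-broker goals and regular goals are later made to work together.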
New Anomaly detectors: IntraBrokerGoalViolationDetector
This anomaly detector only evaluates the goals configured under anomaly.detection.intra.broker.goal. Unlike the GoalViolationDetector, it uses a cluster model that also generates replica placement on disks.
It is the detector primarily responsible for reporting INTRA_BROKER_GOAL_VIOLATIONS.
What is missing from this feature