
Fixes issue #398: decreasing replicas will make ZooKeeper unrecoverable when ZooKeeper is not running. #399

Closed
wants to merge 9 commits into master from fix_issue_398

Conversation


@stop-coding stop-coding commented Oct 14, 2021

Change log description

Fixes the bug where decreasing replicas makes ZooKeeper unrecoverable when ZooKeeper is not running.

Purpose of the change

Fixes #398

What the code does

Adds protection around updating the StatefulSet while ZooKeeper is not running.
If ZooKeeper is not running, updates to the replica count are blocked until ZooKeeper resumes (a rough sketch of this guard is shown below).
When the user decreases the replica count, the member is first removed from the ensemble via reconfig.
The reconfig-based removal is also performed in the preStop hook, before the pod exits.
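
The following is a minimal sketch of the guard, under the assumption that the reconciler can probe ZooKeeper connectivity before applying a StatefulSet update. The names zkChecker, guardScaleDown, and downZK are illustrative placeholders, not the PR's actual code; in the real controller the retry happens by requeueing the reconcile rather than by an explicit loop.

```go
package main

import (
	"errors"
	"fmt"
)

// zkChecker abstracts "is the ZooKeeper ensemble reachable?". In the operator
// this would be backed by the ZooKeeper client connection; here it is a
// hypothetical interface used only for illustration.
type zkChecker interface {
	Ping() error
}

// guardScaleDown returns the replica count that should actually be applied.
// Scaling up (or no change) is always allowed. Scaling down is only allowed
// while ZooKeeper is running, because the member must first be removed from
// the ensemble via reconfig; otherwise the current count is kept and the
// update is retried later.
func guardScaleDown(current, desired int32, zk zkChecker) (int32, error) {
	if desired >= current {
		return desired, nil
	}
	if err := zk.Ping(); err != nil {
		return current, fmt.Errorf("zookeeper not running, postponing scale down: %w", err)
	}
	return desired, nil
}

// downZK simulates an unreachable ensemble.
type downZK struct{}

func (downZK) Ping() error { return errors.New("connection refused") }

func main() {
	// With ZooKeeper down, a request to shrink from 3 to 1 keeps 3 replicas
	// and reports an error so the reconcile loop can requeue.
	replicas, err := guardScaleDown(3, 1, downZK{})
	fmt.Println(replicas, err)
}
```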

How to verify it

  1. Create a cluster of size 3 (kubectl create -f zk.yaml).
  2. Wait until all pods are running, named zk-0, zk-1, zk-2.
  3. Delete the zk-1 and zk-2 pods so that the ZooKeeper cluster can no longer provide service.
  4. Immediately run "kubectl edit zk" and change replicas to 1.
  5. After some time, the replica count will decrease to 1.
  6. Now check that zk-0 is still healthy.


codecov bot commented Oct 14, 2021

Codecov Report

Merging #399 (15e58cf) into master (e6d84b8) will decrease coverage by 0.07%.
The diff coverage is 85.18%.


@@            Coverage Diff             @@
##           master     #399      +/-   ##
==========================================
- Coverage   84.11%   84.04%   -0.08%     
==========================================
  Files          12       12              
  Lines        1643     1667      +24     
==========================================
+ Hits         1382     1401      +19     
- Misses        177      185       +8     
+ Partials       84       81       -3     
| Impacted Files | Coverage Δ |
| --- | --- |
| pkg/zk/zookeeper_client.go | 82.50% <77.77%> (-1.38%) ⬇️ |
| ...er/zookeepercluster/zookeepercluster_controller.go | 63.50% <88.88%> (+0.47%) ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

- if instance.Spec.Replicas == instance.Status.ReadyReplicas && (!instance.Status.MetaRootCreated) {
+ // instance.Spec.Replicas is only the desired value we set; it may not yet have taken effect in Kubernetes.
+ // So we check instance.Status.Replicas against ReadyReplicas, which reflects the true status of the pods.
+ if instance.Status.Replicas == instance.Status.ReadyReplicas && (!instance.Status.MetaRootCreated) {
Contributor


If we don't compare with spec.Replicas, won't the status be shown incorrectly in the case of scaling?

Contributor


I am afraid replacing instance.Spec.Replicas with instance.Status.Replicas won't work for the initial ZK cluster deployment and scale-up scenarios.
The very next line, which logs "Cluster is Ready", would not be true if we only compare against the status.

Remember, when the cluster is initially created, the pods are created one by one.
If we adopt the instance.Status.Replicas == instance.Status.ReadyReplicas condition, we would move on to the next step after the first pod is created, without waiting for the rest of the replicas.

Author


Yes, I get it.
This condition only takes effect on scale up, so it won't affect scaling down.
Would it be more appropriate to judge whether the cluster is ready with the condition
instance.Spec.Replicas == instance.Status.ReadyReplicas && instance.Spec.Replicas == instance.Status.Replicas?
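
A toy comparison of the two conditions, for illustration only: the helpers below are hypothetical and operate on bare numbers rather than the ZookeeperCluster object, but the values mirror Spec.Replicas, Status.Replicas, and Status.ReadyReplicas discussed above.

```go
package main

import "fmt"

// statusOnlyReady is the condition originally proposed in the diff.
func statusOnlyReady(statusReplicas, readyReplicas int32) bool {
	return statusReplicas == readyReplicas
}

// combinedReady is the condition suggested in the comment above: the cluster is
// ready only when both the observed and the ready replica counts match the spec.
func combinedReady(specReplicas, statusReplicas, readyReplicas int32) bool {
	return specReplicas == readyReplicas && specReplicas == statusReplicas
}

func main() {
	// Initial deployment of a 3-node cluster, first pod just became ready:
	// spec = 3, status = 1, ready = 1.
	fmt.Println(statusOnlyReady(1, 1))  // true  -> would wrongly report "Cluster is Ready"
	fmt.Println(combinedReady(3, 1, 1)) // false -> keeps waiting for the remaining pods

	// Fully deployed: spec = 3, status = 3, ready = 3.
	fmt.Println(combinedReady(3, 3, 3)) // true
}
```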

@anishakj
Contributor

@stop-coding, I have one more question: how did you make the ZooKeeper cluster service stop? Would it be possible to add a test for the same?

hongchunhua added 4 commits October 19, 2021 18:15
…make cluster of zookeeper Unrecoverable

Signed-off-by: hongchunhua <[email protected]>
…recoverable when zookeeper not running.

Signed-off-by: hongchunhua <[email protected]>
Signed-off-by: hongchunhua <[email protected]>
@stop-coding
Author

> @stop-coding, I have one more question: how did you make the ZooKeeper cluster service stop? Would it be possible to add a test for the same?

@anishakj
Delete more than half of the ZooKeeper pods immediately after editing the replicas on the zk resource; with only one of three servers left, the ensemble loses its majority quorum and can no longer serve requests (see the sketch below).
The same situation can also arise when a node whose pod is running has a network failure while we edit the replicas.
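
The failure mode here is ZooKeeper's majority quorum: an ensemble of N servers can only serve requests while more than N/2 of them are running, so deleting two of three pods leaves the remaining server unable to form a quorum. A tiny illustration (hasQuorum is a hypothetical helper, not operator code):

```go
package main

import "fmt"

// hasQuorum reports whether a ZooKeeper ensemble of the given size can still
// form a majority with the given number of running servers.
func hasQuorum(running, ensembleSize int) bool {
	return running > ensembleSize/2
}

func main() {
	fmt.Println(hasQuorum(1, 3)) // false: only zk-0 left after deleting zk-1 and zk-2
	fmt.Println(hasQuorum(2, 3)) // true:  a 3-node ensemble tolerates one failure
}
```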

@stop-coding stop-coding force-pushed the fix_issue_398 branch 2 times, most recently from 72ac80d to 10102fe on October 20, 2021 06:38
realAaronWu and others added 2 commits October 20, 2021 17:10
* add disableFinalizer flag and skip appending finalizers if set to true. Update charts

Signed-off-by: Aaron <[email protected]>

* set DisableFinalizer in main

Signed-off-by: Aaron <[email protected]>

* add README

Signed-off-by: Aaron <[email protected]>

* add UTs

Signed-off-by: Aaron <[email protected]>
@stop-coding stop-coding deleted the fix_issue_398 branch October 21, 2021 02:40
Successfully merging this pull request may close these issues.

If zookeeper is not running, then decreasing replicas will make cluster of zookeeper Unrecoverable