Ensure high availability with 2 regions #11012
-
Hello, I have a question about how to configure my Strimzi Kafka cluster to ensure high availability. The client I'm working with has 2 zones, A and B, and the Kubernetes cluster has nodes in both of them. The particularity of this cluster is that storage is bound to a zone: a PVC in zone A cannot be attached in zone B (underneath, there are 2 distinct vCenters with isolated datastores).

How can I configure my Strimzi Kafka cluster with 3 ZooKeeper nodes (I haven't migrated to KRaft yet, but I guess achieving a quorum works the same way?) so that, if I lose a zone, I still have a valid quorum? From what I've tried so far, the ZooKeeper nodes are distributed 2/1 across zones A/B. If I lose zone A, the only ZooKeeper node left cannot take leadership, and if I lose zone B, the 2 remaining ZooKeeper nodes can't decide which node should take leadership since it is an even number.

I can accept manual intervention, but if an automated solution is possible, it would be better.
-
In short, you cannot have fully automatic availability and reliability with only 2 zones. This applies to ZooKeeper, KRaft and the Kafka brokers themselves. In some scenarios you will always need manual intervention. In the particular scenario you described:
There should be no issue when you lose the zone with 1 ZooKeeper node. The zone with 2 ZooKeeper nodes will be able to elect a leader and continue to work. But if you lose the zone with 2 ZooKeeper nodes, you will have a problem, because the single remaining ZooKeeper node cannot form a quorum (a majority of the 3-node ensemble requires 2 nodes).
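A minimal Python sketch of the majority rule behind this, assuming a static 3-node ensemble: ZooKeeper counts the quorum against the configured ensemble size, not against the nodes that are still alive, which is why 2 surviving nodes out of 3 are enough. The function names are illustrative only. The loop at the end checks the general point made above: with exactly two zones, no split of the nodes survives the loss of either zone.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of voters that forms a quorum (a strict majority)."""
    return ensemble_size // 2 + 1


def survives_zone_loss(nodes_per_zone: dict[str, int], lost_zone: str) -> bool:
    """True if the nodes outside `lost_zone` still form a majority of the ensemble."""
    ensemble_size = sum(nodes_per_zone.values())
    remaining = ensemble_size - nodes_per_zone[lost_zone]
    return remaining >= majority(ensemble_size)


def survives_any_zone_loss(nodes_per_zone: dict[str, int]) -> bool:
    """True if losing any single zone still leaves a quorum."""
    return all(survives_zone_loss(nodes_per_zone, zone) for zone in nodes_per_zone)


layout = {"A": 2, "B": 1}  # the 2/1 split from the question
print(survives_zone_loss(layout, "B"))  # True: 2 of 3 is a majority
print(survives_zone_loss(layout, "A"))  # False: 1 of 3 is not

# With exactly two zones, no split of an ensemble of 1..7 nodes survives the loss of either zone.
for n in range(1, 8):
    for in_a in range(n + 1):
        assert not survives_any_zone_loss({"A": in_a, "B": n - in_a})
```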
So you might need to intervene manually, for example by moving the nodes to the remaining zone (deleting the old, unavailable nodes and starting new ones in the remaining zone, etc.). I do not think KRaft will give you any advantage here. The only real difference is that KafkaNodePools would make it easier to schedule the nodes and decide which zone they should be in.

The same problem also applies to the Kafka brokers (regardless of ZooKeeper or KRaft). You can use something like a replication factor of 4 with min in-sync replicas set to 3, and spread the replicas across the zones. That makes sure you will always have at least one in-sync replica in each zone. But if you lose a zone, you might need to lower the min in-sync replicas or add new replicas in the remaining zone. Or you can use a smaller replication factor, but that will not give you the no-message-loss guarantee.
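A rough Python sketch of the replication factor 4 / min in-sync replicas 3 reasoning, assuming rack awareness places two replicas of each partition in each zone and producers use `acks=all`. The broker IDs and helper names are hypothetical. It checks that every in-sync set large enough to accept writes spans both zones, and that after losing a zone too few replicas remain to keep accepting writes until the setting is lowered or replicas are added.

```python
from itertools import combinations

# Hypothetical placement of one partition's 4 replicas: broker id -> zone (2 per zone).
replica_zone = {1: "A", 2: "A", 3: "B", 4: "B"}


def every_writable_isr_spans_all_zones(replica_zone: dict[int, str], min_isr: int) -> bool:
    """True if every in-sync set with at least min_isr members covers all zones."""
    all_zones = set(replica_zone.values())
    brokers = list(replica_zone)
    for size in range(min_isr, len(brokers) + 1):
        for isr in combinations(brokers, size):
            if {replica_zone[b] for b in isr} != all_zones:
                return False
    return True


def writable_after_zone_loss(replica_zone: dict[int, str], min_isr: int, lost_zone: str) -> bool:
    """True if the replicas outside lost_zone still meet min_isr."""
    live = sum(1 for zone in replica_zone.values() if zone != lost_zone)
    return live >= min_isr


print(every_writable_isr_spans_all_zones(replica_zone, min_isr=3))  # True
print(every_writable_isr_spans_all_zones(replica_zone, min_isr=2))  # False: {1, 2} is zone A only
print(writable_after_zone_loss(replica_zone, min_isr=3, lost_zone="A"))  # False: 2 live < 3
```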
-
Thanks for your answer! I tried what you suggested and, with a manual intervention to add new ZooKeeper nodes, I'm able to get my Kafka cluster and its ZooKeeper nodes back after losing a zone. For the Kafka brokers, I have set a replication factor of 3 and min in-sync replicas of 1. This seems to be OK when 2 brokers are unavailable, but I have to test it again and might make some adjustments.
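To compare the two broker settings discussed in this thread, a small Python sketch, assuming `acks=all` producers and unclean leader election disabled (the class and method names are illustrative, not a Kafka API). Producing stops once fewer than `min.insync.replicas` replicas are in sync, while an acknowledged write is only guaranteed to exist on `min.insync.replicas` replicas, so the two counts pull in opposite directions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicSettings:
    replication_factor: int
    min_insync_replicas: int

    def losses_while_writable(self) -> int:
        """Replica losses an acks=all producer tolerates before writes are rejected."""
        return self.replication_factor - self.min_insync_replicas

    def losses_keeping_acked_writes(self) -> int:
        """Replica losses guaranteed to leave a copy of every acknowledged write.

        An acks=all write is acknowledged only once it is on at least
        min_insync_replicas replicas, so fewer losses than that leave one copy.
        """
        return self.min_insync_replicas - 1


for settings in (TopicSettings(3, 1), TopicSettings(4, 3)):
    print(settings, settings.losses_while_writable(), settings.losses_keeping_acked_writes())
```

With replication factor 3 and min in-sync replicas 1, the topic stays writable after losing 2 replicas, but an acknowledged write may exist on a single replica only. Replication factor 4 with min in-sync replicas 3 keeps every acknowledged write through the loss of a zone holding 2 replicas, at the cost of rejecting writes until the setting is changed.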