
[occm] Ability to delay node removal #2213

Closed
zifeo opened this issue Apr 23, 2023 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@zifeo
Contributor

zifeo commented Apr 23, 2023

/kind feature

What happened:

Currently, occm monitors nodes with a providerID and deletes a node from the cluster when the corresponding OpenStack instance no longer exists.

What you expected to happen:

In most of our setups, we persist the Kubernetes data directory (RKE2 in our case) to handle worst-case scenarios and/or single-master clusters. This means that during a node update, the OpenStack instance is destroyed and a new one is respawned with the same disk/data. The new node then cannot rejoin the cluster because it is identified as a deleted node (e.g. etcd does not accept a deleted member).

How to reproduce it:

NA

Anything else we need to know?:

The best implementation would be a configurable delay before a node is deleted, along with support for updating the providerID.

Environment:

  • openstack-cloud-controller-manager(or other related binary) version: -
  • OpenStack version: -
  • Others: -
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 23, 2023
@mdbooth
Contributor

mdbooth commented Apr 24, 2023

How does this work? Which component is going to force the update of providerID on a node which already exists? The providerID has the property of being unique, so once set it's treated as canonical everywhere I'm aware of. What mechanism would we use to indicate that this should not happen?

Incidentally, anything involving a delay isn't going to be a robust solution. If this use case is accepted then we would need something explicit.

The single-node master case is interesting, though. I wonder if there could be some way to cause the node lifecycle manager to ignore specific nodes, allowing the administrator to delete them manually if required. Looking at the code there doesn't seem to be one currently: https://github.com/kubernetes/kubernetes/blob/08755fe2490beb91b5c2866c3413dec043fd72f9/staging/src/k8s.io/cloud-provider/controllers/nodelifecycle/node_lifecycle_controller.go#L130-L197

This is not an OpenStack-specific issue. I don't think we would address it in occm. There appears to be some related discussion in kubernetes/cloud-provider#35. I would recommend closing this issue and re-opening the discussion there.

@zifeo
Contributor Author

zifeo commented Apr 24, 2023

@mdbooth Thanks for the feedback. I dug deeper into the topic:

  • providerID is immutable once set
  • delay is indeed not robust enough for all potential use cases

Instead, I would suggest that occm not remove nodes annotated with occm.openstack.org/automatic-node-removal=false or similar. This would need to be fixed here in occm, since occm is the component implementing that deletion logic.

kubernetes/cloud-provider#35 sounds different to me (nodes with the suggested annotations should still be supervised, just not deleted).
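As a minimal Go sketch of the proposed opt-out (the annotation name is the one suggested above, but the `Node` type and the `shouldDeleteNode` helper are simplified stand-ins for the real `corev1.Node` and controller logic, not actual occm code):

```go
package main

import "fmt"

// skipRemovalAnnotation is the hypothetical annotation proposed above.
const skipRemovalAnnotation = "occm.openstack.org/automatic-node-removal"

// Node is a minimal stand-in for corev1.Node, carrying only what the
// sketch needs.
type Node struct {
	Name        string
	Annotations map[string]string
}

// shouldDeleteNode sketches the proposed check: a node whose backing
// instance is gone is normally removed, unless the administrator has
// opted it out via the annotation.
func shouldDeleteNode(node Node, instanceExists bool) bool {
	if instanceExists {
		return false // backing instance is still there; nothing to do
	}
	if node.Annotations[skipRemovalAnnotation] == "false" {
		return false // administrator asked to keep this node
	}
	return true
}

func main() {
	pinned := Node{Name: "master-0", Annotations: map[string]string{skipRemovalAnnotation: "false"}}
	plain := Node{Name: "worker-1"}
	fmt.Println(shouldDeleteNode(pinned, false)) // false: kept despite missing instance
	fmt.Println(shouldDeleteNode(plain, false))  // true: normal removal
}
```

An administrator would then opt a node out with something like `kubectl annotate node master-0 occm.openstack.org/automatic-node-removal=false` (hypothetical annotation, for illustration only).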

@jichenjc
Contributor

occm.openstack.org/automatic-node-removal=false

I think you are proposing a way to keep the original node even though the underlying instance is gone (with the flag above set to false). Then, after a new VM instance corresponding to the node is created, the flag could be switched back to true (optional). The overall goal is to keep the node for further use and avoid the rejoin issue?

The new node cannot then rejoin the cluster as it is identified as a deleted node (e.g. etcd does not accept deleted member).

But I don't fully understand why we would need to rejoin as a deleted member; perhaps the new node has exactly the same member name, etc.? Kubernetes allows creating a new resource with the same name as a previously deleted one, doesn't it?

@mdbooth
Contributor

mdbooth commented Apr 25, 2023

@mdbooth Thanks for the feedback. I dug deeper into the topic:

  • providerID is immutable once set
  • delay is indeed not robust enough for all potential use cases

Instead, I would suggest that occm not remove nodes annotated with occm.openstack.org/automatic-node-removal=false or similar. This would need to be fixed here in occm, since occm is the component implementing that deletion logic.

kubernetes/cloud-provider#35 sounds different to me (nodes with the suggested annotations should still be supervised, just not deleted).

We don't actually implement most of the 'architectural' behaviour in occm. For consistency between cloud providers the majority of the controller, including all its high level control flow, is implemented in cloud-provider. We essentially just implement some interfaces that provide cloud-specific behaviour. You can see how that works in the code I linked above. This is the node lifecycle control loop, and you can see that it calls cloud-specific functions via an interface.

This use case does sound interesting, but it's not specific to OpenStack. The logic to implement it would almost certainly have to be in cloud-provider, and would apply to all cloud providers. It might have to extend beyond just 'don't delete this node', because for the cloud provider to continue managing node addresses after the replacement is created we would have to change provider ID, and we'd also need a mechanism to safely do that.
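The division of labour described above can be illustrated with a reduced model: the generic reconcile loop lives in cloud-provider, and occm only supplies the cloud-specific existence check through an interface. The names below are simplified for the sketch; the real interfaces in k8s.io/cloud-provider are richer (e.g. they take a context and return node metadata):

```go
package main

import "fmt"

// Instances is a reduced stand-in for the interface that each cloud
// provider implements; occm supplies the OpenStack implementation.
type Instances interface {
	InstanceExistsByProviderID(providerID string) (bool, error)
}

// fakeOpenStack answers existence queries from an in-memory set,
// mimicking what occm would do against the Nova API.
type fakeOpenStack struct {
	servers map[string]bool
}

func (f fakeOpenStack) InstanceExistsByProviderID(id string) (bool, error) {
	return f.servers[id], nil
}

// reconcileNodes models the generic control flow that lives in
// cloud-provider, not in occm: it only calls through the interface and
// returns the names of nodes whose backing instance is gone.
func reconcileNodes(cloud Instances, nodes map[string]string) []string {
	var deleted []string
	for name, providerID := range nodes {
		exists, err := cloud.InstanceExistsByProviderID(providerID)
		if err != nil {
			continue // the real controller would retry on error
		}
		if !exists {
			deleted = append(deleted, name)
		}
	}
	return deleted
}

func main() {
	cloud := fakeOpenStack{servers: map[string]bool{"openstack:///vm-a": true}}
	nodes := map[string]string{
		"node-a": "openstack:///vm-a",
		"node-b": "openstack:///vm-b",
	}
	fmt.Println(reconcileNodes(cloud, nodes)) // only node-b is slated for removal
}
```

This is why an annotation-based opt-out would most naturally be added to the shared loop in cloud-provider, where it would benefit every cloud provider at once.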

It's also possible that with more discussion at the right level it might be possible to do what you need at a different level.

If you find the right place to ask the question (I suggest trying cloud-provider next, but they may forward you again), I wouldn't be surprised to discover that there's already an accepted way to achieve it, or at least a number of other people with the same requirements.

@jichenjc
Contributor

It's also possible that with more discussion at the right level it might be possible to do what you need at a different level.

I think this is the right direction and suggestion

@zifeo
Contributor Author

zifeo commented Apr 26, 2023

but I don't fully understand why we need rejoin as deleted member, maybe the new node has exactly same member name etc?

The root issue is that a removed etcd member cannot rejoin the same etcd cluster. In short, this rules out any persistence at the node-replacement level.

I agree this is out of scope for this repo. However, going further with that architecture, I am not convinced the need is strong enough to open an upstream issue; I will rather adapt my logic to work around it. Thanks for the comments.
