cloud-controller-manager should be able to ignore nodes #35
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
/lifecycle frozen
@andrewsykim There are some scenarios where the CCM should ignore nodes, e.g. virtual-kubelet, edge nodes, or datacenter nodes in a hybrid cluster.
@timoreimann I recall having a conversation about supporting multiple CCMs in a cluster; this is somewhat related. Are you interested in doing this work?
@andrewsykim yes, I'm very interested as it'd help us at DigitalOcean to ease testing. Though my intent is to go beyond just nodes and include load balancers as well. kubernetes/kubernetes#88820 is the ticket I filed for the wider purpose, and kubernetes/kubernetes#88820 (comment) has the summary of our discussion in one of the SIG meetings. Feel free to assign me to either / all tickets.
Hi everybody. I'm also looking for a way to ignore some nodes on AWS. May I ask, do you know of a solution to that?
AFAIK there's still none!
Bumping this up as it hasn't seen any love in a while. This is super useful to my company, as we would like to be able to operate hybrid clusters (OpenStack and bare metal in our case) while still being able to use cloud-controller-manager. I'd be happy to contribute to this effort; I just don't know where to start. A KEP, perhaps?
It comes down to identifying which CCM owns a given node. AWS has some notion that node names should be prefixed in a particular way. It may be that a KEP is needed to introduce a kubelet flag that adds something to the created Node object hinting at which CCM should own it, with all CCMs then implementing support for ignoring nodes whose hint is set to a value other than their own. Not too unrelated to this is the ability to run multiple AWS CCMs for having nodes in multiple regions or accounts.
That's a good point, it does mesh really nicely with allowing multiple CCMs (AWS or otherwise) to manage a single cluster. I was more approaching the idea of having an annotation on a node that indicates which CCM it should belong to, but we'd need a reproducible(?) way to identify CCMs... could be done as a simple argument to the CCM, or...?
Sounds like something similar to LoadBalancerClass and IngressClass.
Yeah, feels very similar. I like that parallel a lot.
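For reference, here is a minimal sketch of the existing LoadBalancerClass pattern that the parallel points at; the proposal would apply the same matching rule to Nodes. `spec.loadBalancerClass` is a real Service field, while `matchesClass`, `wantClass`, and the class name are purely illustrative.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// matchesClass mirrors how a load-balancer implementation decides whether a
// Service is "for" it: an unset spec.loadBalancerClass belongs to the default
// (cloud) implementation, otherwise the class string must match exactly.
func matchesClass(svc *v1.Service, wantClass string) bool {
	if svc.Spec.LoadBalancerClass == nil {
		return wantClass == "" // unset class: default implementation claims it
	}
	return *svc.Spec.LoadBalancerClass == wantClass
}

func main() {
	class := "example.com/internal-lb" // illustrative class name
	svc := &v1.Service{}
	svc.Spec.LoadBalancerClass = &class

	fmt.Println(matchesClass(svc, "example.com/internal-lb")) // true
	fmt.Println(matchesClass(svc, ""))                        // false
}
```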
Hi all, any update on this issue?
How are people doing multi-cloud Kubernetes clusters without this solved?
Why attempt to do it based on node name? Instead, do it based on a label or annotation; see the sketch after this exchange.
I don't think the underlying machine should be trusted to set this correctly for the same reason other k8s-namespaced labels are not allowed. It should be done by the provisioning/installer mechanism that handles things like the role labels.
If one wanted multi-region AWS, one would need multiple AWS CCMs, so this alone doesn't quite work. But some similar flag certainly could.
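A minimal sketch of the annotation idea from this exchange, assuming a hypothetical annotation key (`ccm.example.com/managed-by`) and an operator-chosen identity string per CCM instance; giving each regional CCM its own identity would also cover the multi-region point above. None of these names exist in any shipped CCM.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// managedByAnnotation is a hypothetical key; per the comment above, it would
// be set by the provisioning/installer mechanism, not by the node itself.
const managedByAnnotation = "ccm.example.com/managed-by"

// shouldManage reports whether this CCM instance, identified by an
// operator-chosen ownID (e.g. "aws-us-east-1" vs "aws-eu-west-1" in a
// multi-region setup), is responsible for the node. Nodes without the
// annotation keep today's behaviour.
func shouldManage(node *v1.Node, ownID string) bool {
	owner, ok := node.Annotations[managedByAnnotation]
	if !ok {
		return true // no ownership claim: fall back to current behaviour
	}
	return owner == ownID
}

func main() {
	node := &v1.Node{}
	node.Annotations = map[string]string{managedByAnnotation: "aws-us-east-1"}

	fmt.Println(shouldManage(node, "aws-us-east-1")) // true: this CCM owns it
	fmt.Println(shouldManage(node, "openstack"))     // false: ignore the node
}
```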
How I solved this issue:

I did not try to use routing/load balancing through Kubernetes resources, and I think it would be very complicated.
Interesting idea @sergelogvinov; I'm already using Talos, so I'm trying to figure out how that would work.
Looking at the AWS v2 code at least. But in the cloud-node-lifecycle controller, doesn't it proceed to delete the node as soon as the instance isn't found?
Do you actually have this working?
I did not try AWS yet; it is next on my to-do list.
Never mind the v2 code in the AWS CCM. That one is on ice and probably should be removed, though I doubt v1 is any better. I am happy to support changes in this direction. However, the more generic support (the mentioned flag and the logic for whether the CCM interface is being interacted with) should be added to this repo. If we are lucky, it might be that no CCMs using this lib need any changes then.
Oh? I didn't realise it wasn't ready for use. Could you share some info on that?
Indeed. If the instance is not found for the current cloud provider, then the node is deleted; see cloud-provider/controllers/nodelifecycle/node_lifecycle_controller.go, lines 235 to 236 in 97fdc45, for the logic in question.
v2 was an idea to make the CCM more modern, using CRDs for configuration and such. But as you can see from the git history, pretty much nothing has happened to it, while v1 is more actively maintained. The AWS CCM should absolutely be used in favour of the in-tree provider; kOps has been using it by default since 1.24.
I am thinking this should not be called if the node has a different label/class than what's passed in the flag.
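To illustrate where such a guard could sit, here is a sketch that paraphrases the cloud-node-lifecycle deletion path rather than quoting the real controller; the label key, the class value passed in, and the stub functions are all hypothetical.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// ccmClassLabel is a hypothetical key, analogous to LoadBalancerClass.
const ccmClassLabel = "ccm.example.com/class"

// Stubs standing in for the controller's cloud lookup and node deletion.
func instanceExists(node *v1.Node) bool { return false } // pretend the cloud lost it
func deleteNode(node *v1.Node)          { fmt.Println("deleting", node.Name) }

// monitorNodes paraphrases the cloud-node-lifecycle deletion path with the
// proposed guard placed in front of it.
func monitorNodes(nodes []*v1.Node, ccmClass string) {
	for _, node := range nodes {
		// Proposed guard: skip nodes claimed by a different CCM class.
		if class, ok := node.Labels[ccmClassLabel]; ok && class != ccmClass {
			fmt.Println("ignoring", node.Name, "owned by class", class)
			continue
		}
		if !instanceExists(node) {
			deleteNode(node) // today this runs unconditionally once the instance is gone
		}
	}
}

func main() {
	onPrem := &v1.Node{}
	onPrem.Name = "onprem-1"
	onPrem.Labels = map[string]string{ccmClassLabel: "baremetal"}

	aws := &v1.Node{}
	aws.Name = "aws-1"
	aws.Labels = map[string]string{ccmClassLabel: "aws"}

	monitorNodes([]*v1.Node{onPrem, aws}, "aws") // deletes only aws-1
}
```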
I have been thinking about this issue, as I currently run an on-prem-plus-Vultr setup and every so often the Vultr CCM deletes all the on-prem nodes. I want to add additional providers/regions. I think the core issue comes down to node provenance and attestation in the node lifecycle controller, i.e. whether it can trust node-supplied data about being in the cloud, being managed by another CCM, or being manually configured and to be left alone. This led me to thinking about SPIRE/SPIFFE attestation; however, a pre-joined node may not have spire-agent deployed, and creating a hard dependency on SPIFFE might not fit all environments. Still, attaching information to the node object and validating it can, I think, be done with the existing machinery of an admission/validating webhook with object filtering (i.e. routing delete requests via a finalizer), possibly hijacking the token review to validate that the node object was modified by a trusted CCM. Thoughts?
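A rough sketch of that validating-webhook idea, under heavy assumptions: the label key, the service-account allowlist, and the endpoint path are illustrative, TLS and the ValidatingWebhookConfiguration are omitted, and the webhook would need to be registered for DELETE operations on Nodes. Only the admission-review core is shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// managedByLabel is hypothetical; set by the provisioner, as discussed above.
const managedByLabel = "ccm.example.com/managed-by"

// trustedDeleter maps an owner value to the only service account allowed to
// delete that owner's nodes. Both sides are illustrative.
var trustedDeleter = map[string]string{
	"vultr": "system:serviceaccount:kube-system:vultr-ccm",
}

func validateNodeDelete(w http.ResponseWriter, r *http.Request) {
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	req := review.Request
	resp := &admissionv1.AdmissionResponse{UID: req.UID, Allowed: true}

	if req.Operation == admissionv1.Delete {
		var node corev1.Node
		// For DELETE requests the object being removed arrives in OldObject.
		if err := json.Unmarshal(req.OldObject.Raw, &node); err == nil {
			owner := node.Labels[managedByLabel]
			if want, ok := trustedDeleter[owner]; ok && req.UserInfo.Username != want {
				resp.Allowed = false
				resp.Result = &metav1.Status{Message: fmt.Sprintf(
					"node owned by %q; delete denied for %s", owner, req.UserInfo.Username)}
			}
		}
	}

	review.Response = resp
	json.NewEncoder(w).Encode(&review)
}

func main() {
	http.HandleFunc("/validate", validateNodeDelete)
	// Real deployments need TLS and webhook registration; plain HTTP here
	// only keeps the sketch short.
	log.Fatal(http.ListenAndServe(":8443", nil))
}
```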
I’ve got a stretch cluster running across AWS and on-prem. The AWS CCM keeps deleting the on-prem nodes when they join the cluster because they’re not part of the cloud provider. Currently, I'm patching the AWS CCM DaemonSet to stop the CCM from running whenever an on-prem node needs to join, then re-patching to start the CCM back up, e.g.:

```
# Stop CCM
kubectl -n kube-system patch daemonset aws-cloud-controller-manager -p '{"spec": {"template": {"spec": {"nodeSelector": {"non-existing": "true"}}}}}'

# Start CCM
kubectl -n kube-system patch daemonset aws-cloud-controller-manager --type json -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector/non-existing"}]'
```
Continuing the discussion from kubernetes/kubernetes#73171, the CCM should have a mechanism to "ignore" a node in a cluster, either because it doesn't belong to a cloud provider or is not a node in the traditional sense (e.g. virtual kubelet). See the PR for more discussion.