Calico and HCC #641
Comments
When you use the private networks from Hetzner Cloud with hcloud-cloud-controller-manager and enable the routes-controller (default), then you should be able to use Calico without any additional overlay networks. You can configure this in Calico with I have never personally tested this configuration though. |
I am also interested in this topic. If you have any knowledge, @medicol69, please let me know :) |
Yes, it works fine with Calico. To run a quick test, use hetzner-k3s. Important warning when running cloud together with bare metal over private networking: Calico requires a /24 VLAN address block per node, which means that when you create a subnet, the VLAN subnet must be at minimum a /23 (1 node max) or ideally a /17 (127 nodes max), allocating the first half to cloud instances and the second half to bare-metal instances. |
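The sizing rule in the comment above can be checked with a little arithmetic. The following is a minimal sketch, not something posted in the thread: the helper names `node_capacity` and `split_halves` are hypothetical, the 10.0.0.0 ranges are example values, and the "minus one" only mirrors the figures quoted in the comment (the thread does not say why one /24 is unavailable).

```python
import ipaddress


def node_capacity(cidr, per_node_prefix=24):
    """Count the per-node blocks in cidr, minus one, matching the figures in the comment."""
    network = ipaddress.ip_network(cidr)
    if network.prefixlen > per_node_prefix:
        raise ValueError(f"{cidr} is smaller than a /{per_node_prefix}")
    # Number of /24 blocks that fit into the subnet.
    blocks = 2 ** (per_node_prefix - network.prefixlen)
    # Assumption taken from the comment: one block is not usable for nodes.
    return blocks - 1


def split_halves(cidr):
    """Split a subnet into two equal halves (e.g. cloud and bare metal)."""
    first, second = ipaddress.ip_network(cidr).subnets(prefixlen_diff=1)
    return str(first), str(second)


if __name__ == "__main__":
    for cidr in ("10.0.0.0/23", "10.0.0.0/17"):
        print(cidr, "->", node_capacity(cidr), "node(s) max")
    cloud, bare_metal = split_halves("10.0.0.0/17")
    print("cloud half:", cloud, "bare-metal half:", bare_metal)
```

With these example values, a /23 yields room for one node and a /17 for 127, which agrees with the numbers quoted above.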
Thanks, but I don't think that the Hetzner private network interfaces are stable enough to use in production. If anyone has got them to work and can give an example of how to use them in prod, I'm all ears. |
I am currently running it just fine with calico and even have ceph working over vlan with pretty good performance. You cannot advertise nodeip with internal so define hostendpoint instead for metrics and etcd to be protected. Load balancers also require you to use public net in this case. |
I am using calico without encapsulation and hccm with routes enabled. Calico uses BPF and replaces kube-proxy. I think this works well, but I haven't tested it enough to be 100% sure. If you have any feedback on this configuration, I would love to discuss it :)
|
I am not sure why, but when using hetzner-k3s the internal network works just fine. A manually bootstrapped cluster, however, has an issue with the cloud controller: it does not recognize the internal IP address, so the taint never gets removed and the labels never get added. I spent a few hours trying to figure out why without being able to find any difference between the two configurations. My only guess is that it is some internal order of configuration where the metadata/private network endpoints are not being parsed in order. So to recap: allocate at least a /16 VLAN range and do not use the hcloud controller (you will not be able to use the load balancer or resolve labels automatically). |
Which Kubernetes version do you use? Kubernetes 1.29 introduced a change where the node IP is left empty if cloud-provider is set to external and --node-ip is not set manually. Maybe that is the case here. From CHANGELOG-1.29: |
I was thinking about private networking on Hetzner. If anyone is doing that in production, please share your config and your experiences. |
I am currently testing this. You can see my calico values above. HCCM configuration is normal with networks enabled. |
I tried both 1.29 and 1.30, here's my init script:
EDIT: added node-ip=$PRIVATE_IP, the configuration before is what I am currently using to get around the issue.
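To illustrate the `node-ip=$PRIVATE_IP` idea mentioned in the edit above, here is a minimal sketch of how one might pick the address that lies inside the private network and turn it into a kubelet `--node-ip` flag. This is not the init script from this comment: the helper names are hypothetical, and the addresses and the 10.224.0.0/16 range are example values only.

```python
import ipaddress


def pick_node_ip(candidates, network_cidr):
    """Return the first candidate address inside network_cidr, or None if there is none."""
    network = ipaddress.ip_network(network_cidr)
    for raw in candidates:
        addr = ipaddress.ip_address(raw)
        if addr in network:
            return str(addr)
    return None


def kubelet_node_ip_flag(candidates, network_cidr):
    """Build the --node-ip flag from the private address, failing loudly if none matches."""
    ip = pick_node_ip(candidates, network_cidr)
    if ip is None:
        raise ValueError(f"no address in {network_cidr} among {candidates}")
    return f"--node-ip={ip}"


if __name__ == "__main__":
    # Example values: one public and one private address on the same node.
    addresses = ["203.0.113.10", "10.224.0.2"]
    print(kubelet_node_ip_flag(addresses, "10.224.0.0/16"))
```

Failing with an error when no address matches is a deliberate choice here: per the earlier comment about Kubernetes 1.29, a kubelet started with an external cloud provider and no node IP set manually may end up with an empty node IP.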
Yes, it does work including networking and routes out of the box when using hetzner-k3s tool. But I had issues with getting HCCM to recognize the nodes when defining an internal ip as the node network when attempting to bootstrap the cluster manually. However, using the public ip works fine (and routes are still created for internal communication). Robot does not support networking from HCCM. |
I am using kubeadm only on hcloud nodes (currently no dedicated / Robot nodes, maybe I will add them later) and this works fine. |
Alright, here's the full guide to replicate the issue:
bash init_master.sh test-cluster cluster.local IP_ADDRESS 10.224.0.0 10.222.0.0/16 10.223.0.0/16 10.223.0.10 IP_ADDRESS
kubectl config set-context test-cluster
Install calico:
Create HCCM secret with the network cidr and hcloud token.
Install hcloud:
Observe the following error:
Edit: the actual hostname doesn't matter since the providerID is specified; usually the hostname would be a domain matching the name of the node. The Calico step is optional. |
If you see |
Ah, that makes sense! You can't enable robot & network at the same time (refuses to start). However, if you change the label to get it to load it does work fine so it's still a weird limitation. |
What needs to be done to enable route controllers with robot support? Is this generally supported by the underlying network and does the support need to be implemented in the hccm or are there any changes required to the Hetzner Cloud network? (see https://github.com/hetznercloud/hcloud-cloud-controller-manager/blob/main/docs/robot.md#unsupported) Edit: We can move this to a new issue if needed, I am interested in this feature and could try to implement (parts of) it. |
As far as I know the routes table in the network configuration is not compatible with vSwitch. |
But I think it should be possible to use private ip addresses for the nodes (so this currently needs route controller enabled) and vswitch WITHOUT cidr routing. |
Yep, it's possible (with calico at least in VXLANCrossSubnet configuration). I've hacked it to recognize the nodes by setting the label alpha.kubernetes.io/provided-node-ip which was working for a short while before it got updated to the real one and broke pod scheduling. |
I found the following configuration: https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs/resources/network#expose_routes_to_vswitch. |
It's the option here: https://luk.cat/24/9L1WVG.png, but the routes are not assignable, which is required for CNIs to function: https://luk.cat/24/LJtyTr.png |
The main problem with Robot & Routing is that there is no way to get the private IPs of the Robot server through the API (see #676 for an example). IIUC there is also no way to have a Route whose Gateway is a private IP of a Robot server behind the vSwitch. It is possible to get the private IP info on the Cloud Servers without using the Routes feature. You need to set |
But is it possible to skip the check for Robot nodes? These errors are very annoying.
|
Which check are you talking about? Do you have Robot nodes in your cluster and |
This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs. |
TL;DR
This is more of an inquiry, since it's not that clear from the documentation: does the Hetzner cloud controller work with the Calico CNI when using the private interfaces on Hetzner? Thanks
Expected behavior
This is an inquiry about the documentation.