[Bug]: unable to create a cluster with 3 control plane replicas #383

Open
pli01 opened this issue Oct 29, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@pli01
Contributor

pli01 commented Oct 29, 2024

What happened

Unable to create a cluster with 3 control plane nodes using any of the templates provided in the templates or examples directory.

Only configurations with 1 control plane node work.

Steps to reproduce

- Choose any template, or the default one: https://github.com/outscale/cluster-api-provider-outscale/blob/main/templates/cluster-template.yaml
- Choose any image ubuntu-2204-2204-kubernetes-v1xxxx
- Set 3 replicas in the control-plane (KubeadmControlPlane) section

Expected to happen

A cluster with 3 control plane nodes.

Add anything

The second control plane node fails to join the cluster:

...
[  388.990481] cloud-init[1077]: [2024-10-29 14:39:23] {"level":"warn","ts":"2024-10-29T14:39:23.610631Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001f8e00/10.0.4.234:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
[  388.990591] cloud-init[1077]: [2024-10-29 14:39:25] [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[  388.990708] cloud-init[1077]: [2024-10-29 14:39:25] The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[  388.990820] cloud-init[1077]: [2024-10-29 14:39:25] [mark-control-plane] Marking the node ip-10-0-4-95 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[  388.990986] cloud-init[1077]: [2024-10-29 14:39:25] [mark-control-plane] Marking the node ip-10-0-4-95 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[  388.991056] cloud-init[1077]: [2024-10-29 14:39:58] [kubelet-check] Initial timeout of 40s passed.
[  388.991170] cloud-init[1077]: [2024-10-29 14:41:25] error execution phase control-plane-join/mark-control-plane: error applying control-plane label and taints: nodes "ip-10-0-4-95" not found
[  388.991285] cloud-init[1077]: [2024-10-29 14:41:25] To see the stack trace of this error execute with --v=5 or higher
[  388.991403] cloud-init[1077]: [2024-10-29 14:41:25] 2024-10-29 14:41:25,857 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
[  388.991514] cloud-init[1077]: [2024-10-29 14:41:25] 2024-10-29 14:41:25,857 - util.py[WARNING]: Running module scripts_user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed

cluster-api output

# logs capi-controller-manager
I1029 14:36:39.251111       1 machine_controller_noderef.go:61] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="484c2cb6-c494-4601-9107-1c965c79ee2a" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"
I1029 14:44:32.319060       1 machine_controller_phases.go:306] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="cd3f6a37-ecb7-4354-bdcb-a64c5d0b8cb4" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"
I1029 14:44:32.319170       1 machine_controller_noderef.go:61] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="cd3f6a37-ecb7-4354-bdcb-a64c5d0b8cb4" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"

Environment

- Kubernetes version: (use `kubectl version`): 
- OS (e.g. from `/etc/os-release`):
- Kernel (e.g. `uname -a`): ubuntu
- cluster-api-provider-outscale version: v0.3.1
- cluster-api version: v1.8.4
- Install tools:
- Kubernetes Distribution:
- Kubernetes Distribution version:
pli01 added the bug label on Oct 29, 2024
@pierreozoux
Contributor

I work with @pli01 and I came to the same conclusion.

I think it is linked to this issue:
#380

In my tests, when I add a public IP to the first and/or the second node, it eventually starts to work.
I didn't manage to find the failing curl :/

@outscale-hmi, I'd love to pair program with you to debug this :)
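One way to narrow down this kind of join failure is to check, from the joining node, whether the first control plane node accepts TCP connections on the ports kubeadm needs. The following is a minimal Python sketch, not part of cluster-api-provider-outscale: `probe` and `CONTROL_PLANE_PORTS` are hypothetical names, and the port list assumes the standard upstream Kubernetes control-plane ports (6443 for kube-apiserver, 2379/2380 for etcd, 10250 for the kubelet). A successful connect only shows the port is reachable; it does not validate TLS or etcd health.

```python
import socket
from typing import Dict, Iterable

# Assumed: standard upstream Kubernetes control-plane ports.
CONTROL_PLANE_PORTS = {
    6443: "kube-apiserver",
    2379: "etcd client",
    2380: "etcd peer",
    10250: "kubelet API",
}


def probe(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Hypothetical helper: attempt a plain TCP connect to host:port for each port.

    Returns {port: True} when the connection is accepted and {port: False} on
    refusal, timeout, or name-resolution failure (all subclasses of OSError).
    """
    results: Dict[int, bool] = {}
    for port in ports:
        try:
            # create_connection resolves the host and connects; the socket is
            # closed immediately because only reachability matters here.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

For example, calling `probe("10.0.4.234", CONTROL_PLANE_PORTS)` on the second control plane node would test the etcd endpoint that appears in the cloud-init log. A `False` entry for a port suggests that a security group or routing rule may be blocking it.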

@rouja

rouja commented Nov 5, 2024

Hello,

The bug seems to be fixed in the main branch. Is it possible to make an official release?

@outscale-hmi
Contributor

Hello
Yes, we will release ASAP.
I still have some work in progress on the LBU reconciliation and some test optimizations, and then we can release.

@pli01
Contributor Author

pli01 commented Jan 19, 2025

Hello

Waiting for the new release #403

@Joseph94m
Contributor

Joseph94m commented Feb 13, 2025

#454 fixed it for us.

@alistarle
Copy link
Contributor

@pli01, can you retest so we can close this issue?

@pli01
Copy link
Contributor Author

pli01 commented Feb 15, 2025

Yes, the security group is OK,
but it only works with the latest image of cluster-api-provider-outscale (not with the older release version).

6 participants