Modified from comments in #3471
This should be the easiest.
You can use --limit=node1
to limit Kubespray to avoid disturbing other nodes in the cluster.
Before using --limit
run playbook facts.yml
without the limit to refresh facts cache for all nodes.
kubectl drain NODE_NAME
With the old node still in the inventory, run remove-node.yml
. You need to pass -e node=NODE_NAME
to the playbook to limit the execution to the node being removed.
That's it.
For some reason, Kubespray will not update the apiserver certificate.
Edit /etc/kubernetes/kubeadm-config.yaml
, include new host in certSANs
list.
Use kubeadm to recreate the certs.
cd /etc/kubernetes/ssl
mv apiserver.crt apiserver.crt.old
mv apiserver.key apiserver.key.old
cd /etc/kubernetes
kubeadm init phase certs apiserver --config kubeadm-config.yaml
Check the certificate, new host needs to be there.
openssl x509 -text -noout -in /etc/kubernetes/ssl/apiserver.crt
Add the new host to the inventory and run cluster.yml.
In all hosts, restart nginx-proxy pod. This pod is a local proxy for the apiserver. Kubespray will update its static config, but it needs to be restarted in order to reload.
# run in every host
docker ps | grep k8s_nginx-proxy_nginx-proxy | awk '{print $1}' | xargs docker restart
If you are replacing a node, remove the old one from the inventory, and remove from the cluster runtime.
kubectl drain NODE_NAME
kubectl delete node NODE_NAME
After that, the old node can be safely shutdown. Also, make sure to restart nginx-proxy in all remaining nodes (step 3)
From any active master that remains in the cluster, re-upload kubeadm-config.yaml
kubeadm config upload from-file --config /etc/kubernetes/kubeadm-config.yaml
You need to make sure there are always an odd number of etcd nodes in the cluster. In such a way, this is always a replace or scale up operation. Either add two new nodes or remove an old one.
Update the inventory and run cluster.yml
passing --limit=etcd,kube-master -e ignore_assert_errors=yes
.
Run upgrade-cluster.yml
also passing --limit=etcd,kube-master -e ignore_assert_errors=yes
. This is necessary to update all etcd configuration in the cluster.
At this point, you will have an even number of nodes. Everything should still be working, and you should only have problems if the cluster decides to elect a new etcd leader before you remove a node. Even so, running applications should continue to be available.
With the node still in the inventory, run remove-node.yml
passing -e node=NODE_NAME
as the name of the node that should be removed.
In each etcd host that remains in the cluster:
cat /etc/etcd.env | grep ETCD_INITIAL_CLUSTER
Only active etcd members should be in that list.
Acquire a shell prompt into one of the etcd containers and use etcdctl to remove the old member.
# list all members
etcdctl member list
# remove old member
etcdctl member remove MEMBER_ID
# careful!!! if you remove a wrong member you will be in trouble
# note: these command lines are actually much bigger, since you need to pass all certificates to etcdctl.
In every master node, edit /etc/kubernetes/manifests/kube-apiserver.yaml
. Make sure only active etcd nodes are still present in the apiserver command line parameter --etcd-servers=...
.