post-start silently ignores uncordon failures: should fail fast instead #78

Open
gberche-orange opened this issue Nov 4, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@gberche-orange

Expected behavior

As an operator
In order to notice a failure to uncordon nodes at startup
I need the bosh job status to surface the failure

Current behavior

post-start silently ignores uncordon failures

#uncordon
/var/vcap/packages/k3s/k3s kubectl --kubeconfig=/var/vcap/data/k3s-agent/drain-kubeconfig.yaml uncordon $K3S_NODE_NAME \
>> $JOB_DIR/post-start.log \
2>> $JOB_DIR/post-start-stderr.log

#wait for k8s api to be available, wait for 5 min max
<% if_p('k3s.master_vip_api') do |vip| %>
timeout 300 sh -c 'until nc -z <%= vip %> 6443; do sleep 1; done'
/var/vcap/packages/k3s/k3s kubectl --kubeconfig=/var/vcap/store/k3s-server/kubeconfig.yml get pods --all-namespaces
<% end %>
#uncordon
/var/vcap/packages/k3s/k3s kubectl --kubeconfig=/var/vcap/store/k3s-server/kubeconfig.yml uncordon $K3S_NODE_NAME
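
A minimal sketch of what failing fast could look like, assuming BOSH reports the job as failed when post-start exits non-zero. The helper name `uncordon_node` and the `KUBECTL` / `KUBECONFIG_PATH` variables are hypothetical and not part of the release; their default values are copied from the first excerpt above.

```shell
#!/bin/bash
# Hypothetical helper (not in the release): uncordon a node and
# propagate the failure instead of ignoring kubectl's exit status.
# KUBECTL and KUBECONFIG_PATH are illustrative names; defaults are
# taken from the post-start excerpt quoted in this issue.
KUBECTL="${KUBECTL:-/var/vcap/packages/k3s/k3s kubectl}"
KUBECONFIG_PATH="${KUBECONFIG_PATH:-/var/vcap/data/k3s-agent/drain-kubeconfig.yaml}"

uncordon_node() {
  local node="$1"
  # $KUBECTL is intentionally unquoted so "k3s kubectl" splits into two words.
  if ! $KUBECTL --kubeconfig="$KUBECONFIG_PATH" uncordon "$node"; then
    echo "post-start: failed to uncordon node ${node}" >&2
    return 1
  fi
}
```

In post-start this might be called as `uncordon_node "$K3S_NODE_NAME" >> $JOB_DIR/post-start.log 2>> $JOB_DIR/post-start-stderr.log || exit 1`, so the error still lands in the existing stderr log while the non-zero exit reaches BOSH.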

@poblin-orange poblin-orange added the bug Something isn't working label Nov 9, 2024