Unroutable worker node #3230

Closed
LavredisG opened this issue Dec 1, 2024 · 11 comments
Assignees
Labels
bug Something isn't working need-info

Comments

@LavredisG

LavredisG commented Dec 1, 2024

What happened:
I have a setup of 3 local kind clusters: host (1 node), member1 (1 node), and member2 (2 nodes), where host acts as the broker. The commands subctl show all and subctl diagnose all seem to work fine and report that all connections are established. The one exception is that member2 fails the "Checking that gateway metrics are accessible from non-gateway nodes" step of the diagnosis once I try to curl from member1 to member2. Uninstalling Submariner and rejoining the cluster to the broker clears this error, but it shows up again as soon as I curl again.

I have created a service on member2 to expose a 1-pod deployment, but when I try to curl it from member1, no results are returned (name resolution works). After further investigation I noticed that this only happens if the pod is deployed on the worker node of the member2 cluster (the non-gateway node). If I manually assign it to the control-plane node (the gateway), I can curl it from member1 as expected.
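
The gateway versus non-gateway distinction described above can be checked with something like the following. This is a minimal sketch, not the exact commands used here: it assumes a kubectl context named kind-member2 (kind's default naming for a cluster called member2), a deployment whose pods carry the label app=nginx in the current namespace, and the submariner.io/gateway=true label that Submariner sets on gateway nodes. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the context name and label selector are assumptions.
CTX="${CTX:-kind-member2}"

# Print the names of the nodes labeled as Submariner gateways.
gateway_nodes() {
  kubectl --context "$CTX" get nodes -l submariner.io/gateway=true \
    -o jsonpath='{.items[*].metadata.name}'
}

# Print the node each nginx pod was scheduled on.
pod_nodes() {
  kubectl --context "$CTX" get pods -l app=nginx \
    -o jsonpath='{.items[*].spec.nodeName}'
}

# Succeed if word $1 appears in the space-separated list $2.
in_list() {
  local needle="$1" item
  for item in $2; do
    [ "$item" = "$needle" ] && return 0
  done
  return 1
}

# For every node that hosts an nginx pod, say whether it is a gateway node.
classify_pod_nodes() {
  local gws node
  gws="$(gateway_nodes)"
  for node in $(pod_nodes); do
    if in_list "$node" "$gws"; then
      echo "$node: gateway node"
    else
      echo "$node: non-gateway node"
    fi
  done
}
```

Calling classify_pod_nodes after sourcing the file prints one line per pod; a pod reported on a non-gateway node is the case that fails for me.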

What you expected to happen:
I would expect to be able to curl the service from member1 after exposing it on member2, regardless of whether the underlying pod runs on a gateway node.

How to reproduce it (as minimally and precisely as possible):
Create 3 kind clusters with non-overlapping CIDRs, deploy nginx on member2 (make sure the pod is scheduled on the non-gateway node), create a service for it and expose it. Create a simple pod in member1 that can curl the service.
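
The steps above can be sketched roughly as follows. Everything specific here is an assumption for illustration rather than my actual configuration: the CIDR ranges, the file names, the nginx and curlimages/curl images, and the helper names run, kind_config, write_configs and reproduce. Deploying the broker and joining the clusters with subctl is assumed to happen between cluster creation and the workload steps and is not shown. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Reproduction sketch; names, CIDRs and images are assumptions.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Emit a kind config. Usage: kind_config <podSubnet> <serviceSubnet> <workers>
kind_config() {
  echo "kind: Cluster"
  echo "apiVersion: kind.x-k8s.io/v1alpha4"
  echo "networking:"
  echo "  podSubnet: \"$1\""
  echo "  serviceSubnet: \"$2\""
  echo "nodes:"
  echo "- role: control-plane"
  local i=0
  while [ "$i" -lt "$3" ]; do
    echo "- role: worker"
    i=$((i + 1))
  done
}

# Non-overlapping CIDRs per cluster; member2 gets one extra worker node.
write_configs() {
  kind_config 10.10.0.0/16 10.110.0.0/16 0 > host.yaml
  kind_config 10.20.0.0/16 10.120.0.0/16 0 > member1.yaml
  kind_config 10.30.0.0/16 10.130.0.0/16 1 > member2.yaml
}

reproduce() {
  [ "$DRY_RUN" = "1" ] || write_configs
  for c in host member1 member2; do
    run kind create cluster --name "$c" --config "$c.yaml"
  done
  # (Submariner broker deployment and cluster joins are assumed here.)
  run kubectl --context kind-member2 create deployment nginx --image=nginx
  run kubectl --context kind-member2 rollout status deployment/nginx
  run kubectl --context kind-member2 expose deployment nginx --port=80
  run subctl export service --namespace default nginx
  run kubectl --context kind-member1 run tmp --rm -i --restart=Never \
    --image=curlimages/curl --command -- \
    curl -sS -m 5 nginx.default.svc.clusterset.local
}
```

In a multi-node kind cluster the control-plane node is tainted, so the nginx pod normally lands on the worker, which is the non-gateway node in my setup. Note that subctl export service acts on the current kubeconfig context, so that context should point at member2 when it runs.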

Anything else we need to know?:

image

image

@LavredisG LavredisG added the bug Something isn't working label Dec 1, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Backlog Dec 3, 2024
@dfarrell07
Member

@LavredisG Can you add the subctl diagnose and subctl gather outputs? Which CNI are you using?
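
One way to collect the requested outputs is sketched below. It is only an illustration: the helper name collect, the output file names and the context names in the usage note are assumptions, and it relies on subctl acting on the current kubeconfig context and on subctl gather creating its output directory under the current working directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Usage: collect <outdir> <context>...
collect() {
  local outdir="$1" ctx
  shift
  mkdir -p "$outdir"
  for ctx in "$@"; do
    # Switch the current context so subctl targets this cluster.
    kubectl config use-context "$ctx" >/dev/null || return 1
    subctl show all     > "$outdir/$ctx-show.txt" 2>&1
    subctl diagnose all > "$outdir/$ctx-diagnose.txt" 2>&1
  done
  # Run gather from inside outdir so its logs land next to the text files.
  (cd "$outdir" && subctl gather)
}
```

For example, collect ./submariner-debug kind-member1 kind-member2 would leave one show and one diagnose file per cluster, which can then be attached to the issue as text instead of screenshots.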

@LavredisG
Author

The first screenshot attached when the issue was created shows the failing part of the subctl diagnose all command; everything else was green-ticked. I am using kindnet as the CNI. Here is the subctl gather output.

image
image
image
image

@yboaron
Contributor

yboaron commented Dec 11, 2024

Hi,

Do things work for you when you use 2 clusters?
Can you try deploying the Kind clusters using Submariner and check if it helps?

@LavredisG
Author

Submariner will use https://github.com/submariner-io/shipyard/blob/devel/scripts/shared/lib/clusters_kind to create the clusters, if I am not mistaken, but I have different YAML files for my clusters that are specific to my use case, so I wouldn't want to use the test clusters.

@yboaron
Contributor

yboaron commented Dec 15, 2024

Which CNI do you use?
Can you elaborate on your Kind configuration?

@LavredisG
Author

> What CNI do you use? can you elaborate on your Kind configuration?

kindnet, which is used by default when creating kind clusters. What further info would help you?

@yboaron
Contributor

yboaron commented Dec 16, 2024

Ack,

It seems like an environment configuration error.

A. Can you check if you get the same behavior when deploying Kind clusters using Submariner? That would give us a reference point.
B. I usually apply script [1] on my host before deploying Submariner on Kind. Can you try it?

[1]
```shell
sudo setenforce 0
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512

sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo dnf install -y iptables-services
sudo touch /etc/sysconfig/iptables
sudo touch /etc/sysconfig/ip6tables
sudo systemctl start iptables
sudo systemctl start ip6tables
sudo systemctl enable iptables
sudo systemctl enable ip6tables
sudo iptables -t filter -F
sudo iptables -t filter -X
sudo sysctl net.bridge.bridge-nf-call-iptables=0
sudo sysctl net.bridge.bridge-nf-call-arptables=0
sudo sysctl net.bridge.bridge-nf-call-ip6tables=0
sudo systemctl restart docker
```
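
Since the script above changes host-wide settings, it may help to look at the current values first. The following is a read-only sketch that changes nothing; the helper names show_setting and show_all and the SYSCTL_ROOT override are hypothetical, and the net.bridge.* entries only exist while the br_netfilter kernel module is loaded.

```shell
#!/usr/bin/env bash
# Read-only check of the kernel settings that script [1] modifies.
# SYSCTL_ROOT is a hypothetical override, handy for testing.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

# Print "<key> = <value>" for a dotted sysctl key, without changing it.
show_setting() {
  local key="$1" path
  path="$SYSCTL_ROOT/$(echo "$key" | tr . /)"
  if [ -r "$path" ]; then
    echo "$key = $(cat "$path")"
  else
    echo "$key = (unavailable)"
  fi
}

# Show every key touched by script [1].
show_all() {
  local key
  for key in \
    fs.inotify.max_user_watches \
    fs.inotify.max_user_instances \
    net.bridge.bridge-nf-call-iptables \
    net.bridge.bridge-nf-call-arptables \
    net.bridge.bridge-nf-call-ip6tables
  do
    show_setting "$key"
  done
}
```

Running show_all before and after applying script [1] gives a simple record of what changed on the host.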

@LavredisG
Author

If I have to recreate my clusters, I could try this. At the moment I can't, because it would take down my whole setup.

@dfarrell07
Member

@LavredisG Any update? Thanks!

@LavredisG
Author

Not yet. I can close this issue if you want and re-open it when needed.

@dfarrell07
Member

@LavredisG Ok, I'll close this one. Please do re-open or open a new one if needed!

Projects
Status: Backlog

3 participants