-
Notifications
You must be signed in to change notification settings - Fork 753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Github Nightly Tests. Ensure Nodes are spread across AZs. #2787
Conversation
group: amazon-vpc-cni-k8s-x86 | ||
availabilityZones: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With only 3 nodes, won't we still have the issue of a node missing in one AZ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is these only carve out the subnets; and node groups will pick in the linear order from each subnet. Since we have 2 node groups with 3 nodes each, I was assuming if it will satisfy and I wanted to test this.
If it did not, then the desirable nodes need to be set to 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the node groups is for ARM64, while the other is for AMD64, so only one will ever be rendered. Instead of setting desirable nodes to 4, can we just remove one of the AZs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove one of the AZs; the static canary tests will fail - The test wanted one node in each AZ.
One of the node groups is for ARM64, while the other is for AMD64, so only one will ever be rendered.
The cluster config has two node groups, I assume two (one with ARM and another with AMD) will be created). ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am testing some changes locally. Will update the PR once with the results of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My theory was correct. This config will launch two managed nodes (one with ARM and one with AMD); and the nodes will be across AZs.
kubectl get nodes -o wide | cut -d " " -f1 | xargs kubectl describe node |grep topology| grep zone |uniq
topology.kubernetes.io/zone=us-west-2a
topology.kubernetes.io/zone=us-west-2d
topology.kubernetes.io/zone=us-west-2a
topology.kubernetes.io/zone=us-west-2b
topology.kubernetes.io/zone=us-west-2c
Cluster Create log
2024-02-06 13:03:47 [ℹ] subnets for us-west-2a - public:192.168.0.0/19 private:192.168.128.0/19
2024-02-06 13:03:47 [ℹ] subnets for us-west-2b - public:192.168.32.0/19 private:192.168.160.0/19
2024-02-06 13:03:47 [ℹ] subnets for us-west-2c - public:192.168.64.0/19 private:192.168.192.0/19
2024-02-06 13:03:47 [ℹ] subnets for us-west-2d - public:192.168.96.0/19 private:192.168.224.0/19
2024-02-06 13:03:47 [ℹ] nodegroup "cni-test-arm64-mng" will use "" [AmazonLinux2/1.28]
2024-02-06 13:03:47 [ℹ] nodegroup "cni-test-x86-mng" will use "" [AmazonLinux2/1.28]
2024-02-06 13:03:47 [ℹ] using Kubernetes version 1.28
2024-02-06 13:03:47 [ℹ] creating EKS cluster "senthilx-test-cluster" in "us-west-2" region with managed nodes
2024-02-06 13:03:47 [ℹ] 2 nodegroups (cni-test-arm64-mng, cni-test-x86-mng) were included (based on the include/exclude rules)
...
2 sequential tasks: { create cluster control plane "senthilx-test-cluster",
2 sequential sub-tasks: {
4 sequential sub-tasks: {
wait for control plane to become ready,
associate IAM OIDC provider,
2 sequential sub-tasks: {
create IAM role for serviceaccount "kube-system/aws-node",
create serviceaccount "kube-system/aws-node",
},
restart daemonset "kube-system/aws-node",
},
2 parallel sub-tasks: {
create managed nodegroup "cni-test-arm64-mng",
create managed nodegroup "cni-test-x86-mng",
},
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it, I did not realize that, thanks for confirming. Approving now
Nightly Tests are successful after this change was merged - https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/7809321113 |
Github Nightly tests were failing on Node AZ presence test introduced recently for static canary. One way to ignore these tests in Github runs. However, just adding AZ subnet requirements will spread the nodes across AZs and can satisfy the test requirements.
This will modify the test cluster used nightly runs and ensure nodes are present across AZs.