Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Github Nightly Tests. Ensure Nodes are spread across AZs. #2787

Merged
merged 1 commit into from
Feb 6, 2024

Conversation

orsenthil
Copy link
Member

Github Nightly tests were failing on Node AZ presence test introduced recently for static canary. One way to ignore these tests in Github runs. However, just adding AZ subnet requirements will spread the nodes across AZs and can satisfy the test requirements.

This will modify the test cluster used nightly runs and ensure nodes are present across AZs.

@orsenthil orsenthil requested a review from a team as a code owner February 6, 2024 20:37
group: amazon-vpc-cni-k8s-x86
availabilityZones:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With only 3 nodes, won't we still have the issue of a node missing in one AZ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is these only carve out the subnets; and node groups will pick in the linear order from each subnet. Since we have 2 node groups with 3 nodes each, I was assuming if it will satisfy and I wanted to test this.

If it did not, then the desirable nodes need to be set to 4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the node groups is for ARM64, while the other is for AMD64, so only one will ever be rendered. Instead of setting desirable nodes to 4, can we just remove one of the AZs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove one of the AZs; the static canary tests will fail - The test wanted one node in each AZ.

One of the node groups is for ARM64, while the other is for AMD64, so only one will ever be rendered.

The cluster config has two node groups, I assume two (one with ARM and another with AMD) will be created). ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am testing some changes locally. Will update the PR once with the results of the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My theory was correct. This config will launch two managed nodes (one with ARM and one with AMD); and the nodes will be across AZs.

kubectl get nodes -o wide | cut -d " " -f1 | xargs kubectl describe node |grep topology| grep zone |uniq
                    topology.kubernetes.io/zone=us-west-2a
                    topology.kubernetes.io/zone=us-west-2d
                    topology.kubernetes.io/zone=us-west-2a
                    topology.kubernetes.io/zone=us-west-2b
                    topology.kubernetes.io/zone=us-west-2c

Cluster Create log

2024-02-06 13:03:47 [ℹ]  subnets for us-west-2a - public:192.168.0.0/19 private:192.168.128.0/19
2024-02-06 13:03:47 [ℹ]  subnets for us-west-2b - public:192.168.32.0/19 private:192.168.160.0/19
2024-02-06 13:03:47 [ℹ]  subnets for us-west-2c - public:192.168.64.0/19 private:192.168.192.0/19
2024-02-06 13:03:47 [ℹ]  subnets for us-west-2d - public:192.168.96.0/19 private:192.168.224.0/19
2024-02-06 13:03:47 [ℹ]  nodegroup "cni-test-arm64-mng" will use "" [AmazonLinux2/1.28]
2024-02-06 13:03:47 [ℹ]  nodegroup "cni-test-x86-mng" will use "" [AmazonLinux2/1.28]
2024-02-06 13:03:47 [ℹ]  using Kubernetes version 1.28
2024-02-06 13:03:47 [ℹ]  creating EKS cluster "senthilx-test-cluster" in "us-west-2" region with managed nodes
2024-02-06 13:03:47 [ℹ]  2 nodegroups (cni-test-arm64-mng, cni-test-x86-mng) were included (based on the include/exclude rules)
...
2 sequential tasks: { create cluster control plane "senthilx-test-cluster",
    2 sequential sub-tasks: {
        4 sequential sub-tasks: {
            wait for control plane to become ready,
            associate IAM OIDC provider,
            2 sequential sub-tasks: {
                create IAM role for serviceaccount "kube-system/aws-node",
                create serviceaccount "kube-system/aws-node",
            },
            restart daemonset "kube-system/aws-node",
        },
        2 parallel sub-tasks: {
            create managed nodegroup "cni-test-arm64-mng",
            create managed nodegroup "cni-test-x86-mng",
        },
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it, I did not realize that, thanks for confirming. Approving now

@jdn5126 jdn5126 merged commit b6734b0 into aws:master Feb 6, 2024
4 checks passed
@orsenthil
Copy link
Member Author

Nightly Tests are successful after this change was merged - https://github.com/aws/amazon-vpc-cni-k8s/actions/runs/7809321113

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants