Add kindnet network plugin #17158

Open: 3 commits to be merged into base branch master


aojea
Member

@aojea aojea commented Dec 29, 2024

Kindnet has been running in Kubernetes CI for a while and some people already use it. I've been adding new features, such as a DNS cache, kernel bypass, and admin network policies, that are not present in all the other common CNIs.

Integrating it with kops will help improve its testing and will also benefit users who are looking for a more minimalistic CNI plugin.

https://github.com/aojea/kindnet

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 29, 2024
@k8s-ci-robot k8s-ci-robot added area/addons cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/api area/documentation size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 29, 2024
@aojea
Member Author

aojea commented Dec 29, 2024

/assign @justinsb @hakman

@aojea
Member Author

aojea commented Dec 30, 2024

Ok, it is working now

 kops validate cluster --wait 10m
Using cluster from kubectl context: myclustername.kindnet.io

Validating cluster myclustername.kindnet.io

I1230 18:02:56.188347 3334459 gce_cloud.go:307] Scanning zones: [us-central1-c us-central1-a us-central1-f us-central1-b]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-c     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-c             Node            e2-medium       1       1       us-central1

NODE STATUS
NAME                                    ROLE            READY
control-plane-us-central1-c-bw8j        control-plane   True
nodes-us-central1-c-3rkx                node            True

Your cluster myclustername.kindnet.io is ready

@aojea aojea changed the title [WIP] add kindnet network plugin add kindnet network plugin Dec 31, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 31, 2024
@aojea
Member Author

aojea commented Dec 31, 2024

failed job is unrelated

Kubernetes e2e suite: [It] [sig-storage] In-tree Volumes [Driver: hostPathSymlink] [Testpattern: Inline-volume (default fs)] subPath should support readOnly directory specified in the volumeMount

This is ready for review

@hakman hakman changed the title add kindnet network plugin Add kindnet network plugin Dec 31, 2024
Member

@hakman hakman left a comment


LGTM, just a few comments. Thanks @aojea!
I will send a PR for a pre-submit for kindnet.

cmd/kops/integration_test.go (outdated, resolved)
pkg/apis/kops/networking.go (resolved)
Member

@rifelpet rifelpet left a comment


We haven't used it in a while but we may want to put this behind a kops feature gate to make it clear that it is experimental and we may make breaking changes. WDYT @hakman ?

pkg/model/components/kindnet.go (outdated, resolved)
@aojea
Member Author

aojea commented Dec 31, 2024

> We haven't used it in a while but we may want to put this behind a kops feature gate to make it clear that it is experimental and we may make breaking changes. WDYT @hakman ?

kindnet is already used in CI for other Kubernetes jobs and projects (https://grep.app/search?q=aojea/kindnet), so I don't expect breaking compatibility within kindnet. However, I need your advice on how to get the best integration with kops.

@hakman
Member

hakman commented Dec 31, 2024

> We haven't used it in a while but we may want to put this behind a kops feature gate to make it clear that it is experimental and we may make breaking changes. WDYT @hakman?

I think we can skip the flag, as long as we add a warning in the doc file.
Also, it may be a good idea to add a mention here:
https://kops.sigs.k8s.io/networking/#supported-networking-options

@aojea
Member Author

aojea commented Jan 3, 2025

Finally, progress. Thanks @rifelpet for the tip, it was the srcDst check thing.

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kops/17158/pull-kops-e2e-cni-kindnet/1875096705037242368 now to debug those 3 failing tests :)

@hakman
Member

hakman commented Jan 3, 2025

> Finally, progress. Thanks @rifelpet for the tip, it was the srcDst check thing.
>
> https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kops/17158/pull-kops-e2e-cni-kindnet/1875096705037242368 now to debug those 3 failing tests :)

The CSI test may be a flake; it's quite common.
For the rest of the tests to pass, I think you need to rebase and re-run hack/update-expected.sh

@aojea
Member Author

aojea commented Jan 3, 2025

> For the rest of the tests to pass,

The 2 network tests "Services should be able to handle large requests:" should not fail ... I want to investigate them.

pkg/model/iam/iam_builder.go (outdated, resolved)
@hakman
Member

hakman commented Jan 3, 2025

> > For the rest of the tests to pass,
>
> The 2 network tests "Services should be able to handle large requests:" should not fail ... I want to investigate them.

I meant the non-e2e test failures 😄

@hakman
Member

hakman commented Jan 3, 2025

/test pull-kops-e2e-cni-kindnet

@aojea
Member Author

aojea commented Jan 3, 2025

Another pod scheduling timeout error.

/test pull-kops-e2e-cni-kindnet

@aojea
Member Author

aojea commented Jan 4, 2025

/test pull-kops-e2e-cni-kindnet

@aojea
Member Author

aojea commented Jan 4, 2025

/test pull-kops-e2e-cni-kindnet

@hakman
Member

hakman commented Jan 4, 2025

/test pull-kops-e2e-cni-kindnet

@aojea
Member Author

aojea commented Jan 4, 2025

OK, fastpath is ruled out; it keeps failing the same two tests.

I0104 17:16:15.391703 1 main.go:284] Skipping fastpathAgent

Now to try to understand what the problem is. It does not fail in other Amazon jobs: https://testgrid.k8s.io/amazon-ec2-al2023#Conformance%20-%20EC2/EKS/AL2023%20-%20master

@hakman
Member

hakman commented Jan 5, 2025

> Now to try to understand what the problem is. It does not fail in other Amazon jobs: https://testgrid.k8s.io/amazon-ec2-al2023#Conformance%20-%20EC2/EKS/AL2023%20-%20master

In the AWS jobs the focus is on Conformance. The 2 failing tests are not run there.

@aojea
Member Author

aojea commented Jan 5, 2025

There are some jobs that replicate the GCE ones and should be running those tests, but it seems that those are skipped: https://testgrid.k8s.io/amazon-ec2#ec2-arm64-ubuntu-master-containerd&include-filter-by-regex=%20be%20able%20to%20handle%20large%20requests%3A ... I need to investigate more. Revisiting the test logic, the connectivity seems to work (otherwise it should log an error, IIUC); it is just that no message is received 🤔

@hakman
Member

hakman commented Jan 6, 2025

@aojea Not sure if Kindnet needs the CNI Network Plugin binaries.
If it doesn't require those binaries in /opt/cni/bin/, please add Kindnet to this function:

func (c *Cluster) InstallCNIAssets() bool {
	return c.Spec.Networking.AmazonVPC == nil &&
		c.Spec.Networking.Calico == nil &&
		c.Spec.Networking.Cilium == nil
}

Besides the few remaining nits and the 3 failing tests, I think this is pretty good as it is.
The 3 tests can also be fixed at a later time, as Kindnet is marked as experimental.
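
For illustration only, the requested change could look roughly like the sketch below. It uses stub types in place of the real kops API so that it compiles on its own, and the `Kindnet` field and `KindnetNetworkingSpec` type names are assumptions, not taken from the PR diff.

```go
package main

import "fmt"

// Stub types standing in for the kops API types; only what this sketch needs.
type AmazonVPCNetworkingSpec struct{}
type CalicoNetworkingSpec struct{}
type CiliumNetworkingSpec struct{}

// KindnetNetworkingSpec is a hypothetical name for the new kindnet spec type.
type KindnetNetworkingSpec struct{}

type NetworkingSpec struct {
	AmazonVPC *AmazonVPCNetworkingSpec
	Calico    *CalicoNetworkingSpec
	Cilium    *CiliumNetworkingSpec
	Kindnet   *KindnetNetworkingSpec // assumed field name
}

type ClusterSpec struct {
	Networking NetworkingSpec
}

type Cluster struct {
	Spec ClusterSpec
}

// InstallCNIAssets reports whether the CNI plugin binaries should be
// installed, i.e. none of the listed providers is configured.
func (c *Cluster) InstallCNIAssets() bool {
	return c.Spec.Networking.AmazonVPC == nil &&
		c.Spec.Networking.Calico == nil &&
		c.Spec.Networking.Cilium == nil &&
		c.Spec.Networking.Kindnet == nil
}

func main() {
	plain := &Cluster{}
	withKindnet := &Cluster{}
	withKindnet.Spec.Networking.Kindnet = &KindnetNetworkingSpec{}
	fmt.Println(plain.InstallCNIAssets(), withKindnet.InstallCNIAssets())
}
```

With this shape, a cluster that sets the kindnet option would skip installing the CNI binaries, which is only appropriate if kindnet really does not need them in /opt/cni/bin/.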

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hakman. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Member Author

aojea commented Jan 7, 2025

> @aojea Not sure if Kindnet needs the CNI Network Plugin binaries. If it doesn't require those binaries in /opt/cni/bin/, please add Kindnet to this function:
>
> func (c *Cluster) InstallCNIAssets() bool {
> 	return c.Spec.Networking.AmazonVPC == nil &&
> 		c.Spec.Networking.Calico == nil &&
> 		c.Spec.Networking.Cilium == nil
> }
>
> Besides the few remaining nits and the 3 failing tests, I think this is pretty good as it is. The 3 tests can also be fixed at a later time, as Kindnet is marked as experimental.

Great, I will address the last comments and work on this today.

/test pull-kops-e2e-cni-kindnet

@aojea aojea force-pushed the kindnet branch 2 times, most recently from 2bdbd59 to a36c140 Compare January 7, 2025 15:18
@aojea
Member Author

aojea commented Jan 7, 2025

/test pull-kops-e2e-cni-kindnet

get more networking information useful to troubleshoot network issues.
aojea added 2 commits January 7, 2025 18:22
add kindnet as an experimental network addon

containerd adds the requirement to use the loopback cni plugin,
kindnet provides that capability and containerd does not require it
since containerd/containerd/pull/10238
Change-Id: I2db75ddc530d16d49da17744291dd79a697c81aa
@k8s-ci-robot
Contributor

@aojea: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                       Commit     Details    Required    Rerun command
pull-kops-e2e-cni-cilium-eni    3827c06    link       true        /test pull-kops-e2e-cni-cilium-eni
pull-kops-e2e-cni-kindnet       3827c06    link       false       /test pull-kops-e2e-cni-kindnet

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
area/addons area/api area/documentation area/nodeup cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.