failed to allocate ip to pod #2814

Closed
songhohoon opened this issue Feb 27, 2024 · 7 comments
songhohoon commented Feb 27, 2024

What happened:

Pods get stuck in Init or ContainerCreating status.

Attach logs

sent log file to [email protected] with email [email protected]

Events:
  Type     Reason                     Age                   From                     Message
  ----     ------                     ----                  ----                     -------
  Normal   Scheduled                  49m                   default-scheduler        Successfully assigned watch/watch-api-79574c44db-klk7z to ip-10-8-58-221.ap-northeast-2.compute.internal
  Warning  FailedCreatePodSandBox     49m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5499422a7c0dd169f1600782ef8b49976c15fa11caad0577706595922ab685e1": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     49m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4595c377d1f382e07dc64fd2a6919f3a6f5dea69c3930c176fe20abf31a0ce16": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     49m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c12bc8fba125d47b071e61ba2184adb1b0acff961d0f21d6702ad6ea3e74ec42": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     48m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "03574e55f179fa15cbcc8023f135ff90f7d4685a36128ffe40376baf0b438d09": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     48m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e0309c2cf07a3484c2d011dab35a5774a48a23a1da1172aac2a46ae04347ccac": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     48m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3146fff70aa3f8441385d5636db9c0aa707c117b5ea128a834989481b8ab2ef6": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     47m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "198c432e48b115c50c802f59c724e622d611ec4ff4bd4508e59a53a110b59bf2": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     47m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3ab9b2a117619c880b6e0e00a94d065b79802e25833bbe3e881c82837be34837": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Warning  FailedCreatePodSandBox     47m                   kubelet                  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "71660542696f0e8ab9eb37a3d6d5b5e77692b3646f4f258e0f6542b708dac4a5": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  Normal   SecurityGroupRequested     5m29s (x21 over 49m)  vpc-resource-controller  Pod will get the following Security Groups [sg-0b91edea7758a450e sg-038ae5f385bc8e045 sg-062ebf6eaba912cb5 sg-0a3384ea78dc93e15 sg-05a98d50e24a79f41]
  Warning  BranchENIAnnotationFailed  5m28s (x21 over 49m)  vpc-resource-controller  failed to annotate pod with branch ENI details: Pod "watch-api-79574c44db-klk7z" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{
    Volumes:        {{Name: "aws-iam-token", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{Audience: "sts.amazonaws.com", ExpirationSeconds: 86400, Path: "token"}}}, DefaultMode: &420}}}, {Name: "apmsocketpath", VolumeSource: {HostPath: &{Path: "/var/run/datadog/", Type: &""}}}, {Name: "heapdumps", VolumeSource: {EmptyDir: &{}}}, {Name: "kube-api-access-6vsfs", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
    InitContainers: nil,
    Containers: []core.Container{
      {
        ... // 13 identical fields
        ReadinessProbe:           &{ProbeHandler: {HTTPGet: &{Path: "/actuator/info", Port: {IntVal: 8080}, Scheme: "HTTP"}}, InitialDelaySeconds: 180, TimeoutSeconds: 2, PeriodSeconds: 5, ...},
        StartupProbe:             nil,
-       Lifecycle:                nil,
+       Lifecycle:                &core.Lifecycle{PreStop: &core.LifecycleHandler{Exec: &core.ExecAction{Command: []string{...}}}},
        TerminationMessagePath:   "/dev/termination-log",
        TerminationMessagePolicy: "File",
        ... // 5 identical fields
      },
    },
    EphemeralContainers: nil,
    RestartPolicy:       "Always",
    ... // 28 identical fields
  }
  Warning  FailedCreatePodSandBox  4m26s (x196 over 47m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "303a2d86590b842a1fea740469947b598964a911914d533558c82c51710bd05f": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

What you expected to happen: I expected the pods to be created normally.

How to reproduce it (as minimally and precisely as possible):

  • Schedule the node group ASG to start at the beginning of working hours and stop after working hours.
  • Pods are created when the nodes start up.
  • The problem occurs for some of the pods.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.27.9-eks-5e0fdde
  • CNI Version : v1.15.1
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): Linux ip-10-8-58-221.ap-northeast-2.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
@songhohoon songhohoon added the bug label Feb 27, 2024
@jdn5126
Contributor

jdn5126 commented Feb 27, 2024

@songhohoon from:

  Warning  BranchENIAnnotationFailed  5m28s (x21 over 49m)  vpc-resource-controller  failed to annotate pod with branch ENI details: Pod "watch-api-79574c44db-klk7z" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)

it looks like the VPC Resource Controller (https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/provider/branch/provider.go#L385) failed to annotate the pod with a branch ENI.

Based on the error message from the k8s API call, it sounds like this patch operation was blocked. Are you installing any pod validation or admission webhooks in your cluster? Are you running any tools that are modifying the ClusterRole objects installed by EKS? Have you ever had this Security Groups for Pods solution working?
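
To illustrate why the API server rejects this update: the diff in the event shows Lifecycle changing from nil to a preStop handler, and lifecycle is not among the fields the error message lists as updatable. The following is a minimal Python sketch of that rule, not the actual Kubernetes validation code. The dict layout, the function name forbidden_spec_changes, the container name, the image, and the preStop command are assumptions for illustration (the real command is elided in the event), and the conditions on tolerations and terminationGracePeriodSeconds are simplified to "any change allowed".

```python
import copy

# Pod-level spec fields the error message lists as updatable. Simplified:
# the real rule only allows *adding* tolerations and only a narrow change
# to terminationGracePeriodSeconds.
MUTABLE_POD_FIELDS = {"activeDeadlineSeconds", "tolerations",
                      "terminationGracePeriodSeconds"}
CONTAINER_LISTS = ("containers", "initContainers")


def forbidden_spec_changes(old_spec, new_spec):
    """Return paths of changed fields that a pod update may not touch."""
    changed = []
    for key in sorted(set(old_spec) | set(new_spec)):
        if key in MUTABLE_POD_FIELDS:
            continue
        old_val, new_val = old_spec.get(key), new_spec.get(key)
        if key in CONTAINER_LISTS:
            old_list, new_list = old_val or [], new_val or []
            if len(old_list) != len(new_list):
                changed.append(key)
                continue
            for i, (old_c, new_c) in enumerate(zip(old_list, new_list)):
                for field in sorted(set(old_c) | set(new_c)):
                    # Only the image of an existing container may change.
                    if field != "image" and old_c.get(field) != new_c.get(field):
                        changed.append(f"{key}[{i}].{field}")
        elif old_val != new_val:
            changed.append(key)
    return changed


# Hypothetical pod spec, reduced to the fields needed for the example.
old = {"restartPolicy": "Always",
       "containers": [{"name": "watch-api", "image": "example:1"}]}
new = copy.deepcopy(old)
# A mutation like the one in the diff: a preStop hook appears on update.
# "placeholder-command" stands in for the command elided in the event.
new["containers"][0]["lifecycle"] = {
    "preStop": {"exec": {"command": ["placeholder-command"]}}}

print(forbidden_spec_changes(old, new))  # ['containers[0].lifecycle']
```

Under this model, an update that only sets annotations leaves the spec untouched and passes, while the same update combined with a mutation that adds a preStop hook is rejected, which matches the event above.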

@songhohoon
Author

Hi @jdn5126,
thanks for the reply.

Are you installing any pod validation or admission webhooks in your cluster?
-> Yes. I installed Kyverno and use it for pod validation and to mutate some configuration, such as the preStop hook.

Are you running any tools that are modifying the ClusterRole objects installed by EKS?
-> no, I don't.

Have you ever had this Security Groups for Pods solution working?
-> Yes. I am using SGP (Security Groups for Pods) for most of my pods.

Additional info:
When this situation occurs, I usually resolve it by spotting the affected pod and deleting it so that it is recreated. If I leave the pod in place without deleting it, the problem persists.
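
The affected pods can be picked out from their events before deleting them. The following is a rough Python sketch, assuming the events are available as a list of dicts shaped like core/v1 Event objects (for example the items list from kubectl get events -o json). The function name pods_with_failed_annotation is hypothetical, and the sample data is copied from the events in this issue.

```python
from collections import Counter


def pods_with_failed_annotation(events, reason="BranchENIAnnotationFailed"):
    """Count events with the given reason per (namespace, pod name)."""
    counts = Counter()
    for event in events:
        obj = event.get("involvedObject", {})
        if event.get("reason") == reason and obj.get("kind") == "Pod":
            # Aggregated events carry a repeat count (shown as "x21" in
            # kubectl describe output); treat a missing count as 1.
            counts[(obj.get("namespace"), obj.get("name"))] += event.get("count") or 1
    return dict(counts)


sample_events = [
    {"reason": "BranchENIAnnotationFailed", "count": 21,
     "involvedObject": {"kind": "Pod", "namespace": "watch",
                        "name": "watch-api-79574c44db-klk7z"}},
    {"reason": "Scheduled", "count": 1,
     "involvedObject": {"kind": "Pod", "namespace": "watch",
                        "name": "watch-api-79574c44db-klk7z"}},
]

print(pods_with_failed_annotation(sample_events))
# {('watch', 'watch-api-79574c44db-klk7z'): 21}
```

The resulting keys are the pods that would need to be deleted and recreated.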

@jdn5126
Contributor

jdn5126 commented Feb 28, 2024

@songhohoon Judging from the error message, it seems very likely that the patch operation is being blocked by a pod validation webhook. It is possible that Kyverno is playing that role, but since this is all happening in the control plane and not in the AWS VPC CNI, I think the best path forward is for you to create an AWS support case. Then we can investigate the control plane logs and figure out what is blocking this patching operation from time to time.

@songhohoon
Author

@jdn5126
Thank you for the reply.
I dug deeper into the problem and found that it comes down to the order in which admission controllers run.
In my case, one of the admission controllers failed to inject its configuration. After that failure, the pod manifest can no longer be annotated, so the CNI controller cannot annotate the pod with the allocated IP address.
The underlying failure was an AWS API limit being exceeded. This is a development environment that is started on a schedule every morning, so a lot of pods are created at the same time.
I tried kubectl annotate pods ${pod_name} test=test: it failed on a stuck pod but succeeded on a regular pod.
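
The annotate check described above could be scripted roughly as follows. This is a sketch only: it shells out to the same kubectl annotate command, with -n and --overwrite added so it can be rerun, and it really does write a test=test annotation on pods that accept it. The function names and the injectable run parameter are assumptions for illustration.

```python
import subprocess


def annotate_command(pod_name, namespace):
    """Build the probe command: the one from the comment, plus -n and --overwrite."""
    return ["kubectl", "annotate", "pods", pod_name, "test=test",
            "-n", namespace, "--overwrite"]


def can_annotate(pod_name, namespace, run=subprocess.run):
    """Return (ok, stderr); ok is False when kubectl exits non-zero."""
    result = run(annotate_command(pod_name, namespace),
                 capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()


# Example call (requires kubectl and cluster access):
#   ok, err = can_annotate("watch-api-79574c44db-klk7z", "watch")
#   print("annotatable" if ok else f"rejected: {err}")
```

A stuck pod would be expected to return ok=False with the "Forbidden: pod updates may not change fields other than ..." message in stderr, while a regular pod returns ok=True.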


github-actions bot commented Mar 5, 2024

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.

@jdn5126
Contributor

jdn5126 commented Mar 5, 2024

@songhohoon ah I see, thank you for explaining, and glad you figured it out!

@nyunyunyunyu

nyunyunyunyu commented Mar 26, 2024

@jdn5126 In README.md, https://github.com/aws/amazon-vpc-cni-k8s/blame/87115cf204dafd148c765ea3c8d184ba73c3a09a/README.md#L498 still mentions:

Setting ENABLE_POD_ENI to true will allow IPAMD to add the vpc.amazonaws.com/has-trunk-attached label to the node if the instance has the capacity to attach an additional ENI.

Is this expected?
