Occasional retrieving IMDS metadata failed on AL2023 #2262

Open
brianrowlett opened this issue Dec 11, 2024 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@brianrowlett

/kind bug

We currently have AL2 nodes and have never had a problem with this.

When switching to AL2023 nodes, the ebs-csi-node occasionally fails to retrieve metadata from IMDS. This only appears to happen at node startup time. If we restart the ebs-csi-node daemonset, it is able to retrieve metadata from IMDS reliably.

It does appear to fall back successfully to getting metadata from Kubernetes, but we think IMDS should not be failing like this.

What happened?

I1211 20:07:09.634316       1 main.go:157] "Initializing metadata"
E1211 20:07:14.635517       1 metadata.go:51] "Retrieving IMDS metadata failed, falling back to Kubernetes metadata" err="could not get EC2 instance identity metadata: operation error ec2imds: GetInstanceIdentityDocument, request canceled, context deadline exceeded"
I1211 20:07:14.645753       1 metadata.go:55] "Retrieved metadata from Kubernetes"
I1211 20:07:14.646110       1 driver.go:69] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.34.0"
I1211 20:07:16.167040       1 node.go:941] "CSINode Allocatable value is set" nodeName="ip-100-64-153-121.ec2.internal" count=31

What you expected to happen?

I1211 20:24:41.226237       1 main.go:157] "Initializing metadata"
I1211 20:24:42.479940       1 metadata.go:48] "Retrieved metadata from IMDS"
I1211 20:24:42.480783       1 driver.go:69] "Driver Information" Driver="ebs.csi.aws.com" Version="v1.34.0"
I1211 20:24:43.497952       1 node.go:941] "CSINode Allocatable value is set" nodeName="ip-100-64-251-153.ec2.internal" count=31

How to reproduce it (as minimally and precisely as possible)?

Anything else we need to know?:

Our launch template looks like:

  NodeLaunchTemplate2023:
    Type: AWS::EC2::LaunchTemplate
    Condition: CreateManagedNodegroup2023
    DependsOn:
    - Cluster
    Properties:
      LaunchTemplateData:
        BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            DeleteOnTermination: true
            Encrypted: true
            VolumeSize: !Ref WorkerVolumeSize
            VolumeType: gp3
        MetadataOptions:
          HttpEndpoint: enabled
          HttpPutResponseHopLimit: 2
          HttpTokens: required
          InstanceMetadataTags: disabled
        NetworkInterfaces:
        - DeviceIndex: 0
          Groups:
          - !GetAtt Cluster.ClusterSecurityGroupId
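
For anyone trying to reproduce this from inside a pod: with HttpTokens: required, as in the launch template above, an IMDS read follows the IMDSv2 two-step flow, a PUT to obtain a session token and then a GET that presents it. HttpPutResponseHopLimit: 2 is relevant because the token response has to survive the extra hop from a pod that is not on the host network. Below is a minimal Python sketch of that flow using only the standard library. The function names and the 5-second timeout (chosen to mirror the gap in the failing log above) are illustrative assumptions; this is not how the driver itself, which uses the AWS SDK for Go, implements the call.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"  # link-local IMDS address


def token_request(ttl_seconds: int = 60) -> urllib.request.Request:
    """Build the IMDSv2 session-token request (a PUT with a TTL header)."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def identity_document_request(token: str) -> urllib.request.Request:
    """Build the GET for the instance identity document, presenting the token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )


def fetch_identity_document(timeout: float = 5.0) -> str:
    """Run both steps; raises an OSError subclass if IMDS is unreachable or times out."""
    with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    with urllib.request.urlopen(identity_document_request(token), timeout=timeout) as resp:
        return resp.read().decode()
```

Calling fetch_identity_document() from a pod on a freshly started node, and again a minute later, would show whether the timeout is specific to early startup.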

And our managed nodegroup looks like:

  ManagedNodegroup2023a:
    Type: AWS::EKS::Nodegroup
    Condition: CreateManagedNodegroup2023
    DependsOn:
    - Cluster
    - NodeInstanceRole
    - NodeLaunchTemplate2023
    Properties:
      AmiType: AL2023_x86_64_STANDARD
      CapacityType: ON_DEMAND
      ClusterName: !Ref Cluster
      InstanceTypes:
      - !Ref WorkerInstanceType
      LaunchTemplate:
        Id: !Ref NodeLaunchTemplate2023
        Version: !GetAtt NodeLaunchTemplate2023.LatestVersionNumber
      NodeRole: !GetAtt NodeInstanceRole.Arn
      ScalingConfig:
        DesiredSize: !Ref NodegroupSizeDesired
        MaxSize: !Ref NodegroupSizeMaximum
        MinSize: !Ref NodegroupSizeMinimum
      Subnets:
      - Fn::ImportValue:
          !Sub "${VpcName}-private-a"
      UpdateConfig:
        MaxUnavailable: 1

Environment

  • Kubernetes version (use kubectl version): v1.30.6-eks-7f9249a
  • Driver version: v1.34.0
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 11, 2024
@AndrewSirenko
Contributor

AndrewSirenko commented Dec 12, 2024

Hi @brianrowlett,

I wonder if there's a race where imds.GetInstanceIdentityDocument times out before pod networking is fully set up on the node.

Will try to reproduce and bring this up with the team. Perhaps there's a more robust way to attempt IMDS metadata retrieval.

If not, we can consider exposing a parameter to NOT fall back to K8s_metadata. With this parameter, ebs-csi-node would keep restarting until IMDS is ready, instead of requiring manual intervention. Would this kind of imdsMetadataOnly parameter be useful to you?

Thanks for raising the issue!

@brianrowlett
Author

Hi @AndrewSirenko , thank you for the quick response.

My intuition was that maybe this was a race condition, but I'm not familiar enough with the codebase to say for sure. It's reassuring that you might be thinking the same thing.

To clarify, manually restarting the pods is not required, and falling back to Kubernetes metadata is likely acceptable for us (we just didn't like seeing IMDS fail without knowing why), so I don't think an imdsMetadataOnly parameter is necessary at this time.

Please let me know if there is anything I can do to help you reproduce the issue or test a fix.

@AndrewSirenko
Contributor

@brianrowlett 3 more questions for you to help us reproduce:

  1. What CNI plugin are you relying on?
  2. If you're relying on VPC CNI, are you using strict mode?
  3. Is hostNetwork enabled/disabled?

Thank you!

@brianrowlett
Author

@AndrewSirenko

  1. We do use the VPC CNI for attaching ENIs and assigning IP addresses to pods, but we don't use it for network policy enforcement (we use Calico instead; however there are no network policies restricting the ebs-csi-node)
  2. We are not using strict mode
  3. hostNetwork is disabled

@AndrewSirenko
Contributor

Thanks @brianrowlett, we'll dive into the current IMDS SDK retry logic and see if there's an improvement we can make in our EC2MetadataInstanceInfo path.

Final question: how often does this happen on your cluster? 1 in how many node startups?

Appreciate you spotting this. I will also mention this AL2 vs AL2023 behavior to the IMDSv2 team.

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Dec 17, 2024
@brianrowlett
Author

Thank you @AndrewSirenko, I was seeing it relatively frequently, maybe 1 in 3 or so (but unfortunately, I didn't actually keep a record).
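
Since no record was kept, one way to put a number on the "1 in 3" estimate would be to classify each ebs-csi-node pod's startup log by which metadata source it ended up using. The sketch below is a hypothetical helper (the function names are not part of the driver); it relies only on the two log messages quoted earlier in this issue and assumes those messages are stable across driver versions.

```python
from typing import Iterable


def classify_startup(log_text: str) -> str:
    """Return which metadata source a node pod's startup log reports."""
    if "Retrieved metadata from IMDS" in log_text:
        return "imds"
    if "falling back to Kubernetes metadata" in log_text:
        return "kubernetes-fallback"
    return "unknown"


def fallback_rate(logs: Iterable[str]) -> float:
    """Fraction of classifiable startups that fell back to Kubernetes metadata."""
    results = [classify_startup(text) for text in logs]
    known = [r for r in results if r != "unknown"]
    if not known:
        return 0.0
    return known.count("kubernetes-fallback") / len(known)
```

Feeding it the saved startup logs of every ebs-csi-node pod after a round of node launches would give a rough per-startup failure rate.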

@asher-lab

asher-lab commented Jan 3, 2025

We are seeing this issue as well on node creation. It caused us to receive a false alert that the ebs-csi-driver pod had restarted. It only happens intermittently. @brianrowlett @AndrewSirenko Would it be possible to add a retry or wait mechanism rather than having the pod restart?

@radirobi

radirobi commented Jan 21, 2025

@AndrewSirenko
This has a major impact on me, since I am using Nitro instances where the ENIs occupy volume attachment slots from the shared pool. Because the driver falls back to Kubernetes metadata, it reports the wrong number of available volumeAttachments. Quoted from the docs here:

Kubernetes metadata does not provide information about the number of ENIs or EBS volumes attached to an instance.
Thus, when performing volume limit calculations, node pods using Kubernetes metadata will assume one ENI and one EBS volume (the root volume) is attached.

My instance type supports 28 attachments in total (disks and ENIs share the same pool). Because the EBS CSI driver falls back to Kubernetes metadata, it assumes one ENI and one root volume and reports 26 free slots. That is wrong in my case: my CNI configuration runs with WARM_ENI_TARGET=2, so by default every EKS node is provisioned with 3 ENIs and one root disk, leaving 28-3-1=24 free slots. The Kubernetes scheduler will therefore try to schedule stateful workloads on this node as if it had room for 26 volumes, when in reality it can handle only 24.
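
The arithmetic above can be written out as a short Python sketch. This is a simplified model built only from the numbers in this comment, not the driver's actual limit calculation; the function name and parameters are illustrative assumptions.

```python
def allocatable_volumes(instance_limit: int, enis: int, attached_volumes: int) -> int:
    """Free attachment slots when ENIs and volumes share one pool."""
    return instance_limit - enis - attached_volumes


# Kubernetes-metadata fallback: the driver assumes 1 ENI and 1 root volume.
assumed = allocatable_volumes(28, enis=1, attached_volumes=1)

# Actual node with WARM_ENI_TARGET=2: 3 ENIs and 1 root volume.
actual = allocatable_volumes(28, enis=3, attached_volumes=1)

# Slots the scheduler believes exist but that cannot actually be attached.
overcommit = assumed - actual
```

Here assumed is 26 and actual is 24, so the node is advertised with two more slots than it can serve.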

@AndrewSirenko
Contributor

@asher-lab @radirobi, thank you for your +1s and noting that there is impact to your stateful workloads.

I'll escalate the priority of this issue internally. Worst case, if my team does not have bandwidth, I will add a short-term workaround that retries IMDS one additional time.

/priority important-soon
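
As an illustration of the short-term workaround described above (retrying the primary metadata source before falling back), here is a generic Python sketch. It is not the driver's code, which is written in Go; retry_then_fallback and its parameters are hypothetical names, and the doubling delay between attempts is an assumption rather than anything proposed in this thread.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_then_fallback(
    primary: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 2,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try primary up to `attempts` times, then use fallback.

    The delay doubles after each failed attempt. No sleep happens after
    the final failure, since the fallback is used immediately.
    """
    for attempt in range(attempts):
        try:
            return primary()
        except Exception:
            if attempt < attempts - 1:
                sleep(delay_seconds * (2 ** attempt))
    return fallback()
```

With attempts=2 this corresponds to "retrying IMDS one additional time": a transient startup timeout gets a second chance, and the Kubernetes metadata path is still used if both attempts fail.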

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 21, 2025

5 participants