2-3x increase in GetMetadata API calls from 1.16.2 -> 1.16.3 release #2813

Closed
diranged opened this issue Feb 26, 2024 · 5 comments

diranged commented Feb 26, 2024

What happened:
This morning we upgraded from 1.16.2 to 1.16.3. There are no errors or problems, but I noticed a sharp increase in both the latency and the count of GetMetadata API calls coming from the CNI pods:

(Graph is a rate ...)

sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_sum{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)

image

Here's the count of total calls:

sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_count{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)

Screenshot 2024-02-26 at 1 06 42 PM
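
To separate per-call latency from call volume, the two rates above can be divided. This is a minimal sketch, not part of the original report. It assumes the same dashboard variables (`$daemonset`, `$node`, `$__rate_interval`) and that the `_sum` and `_count` series carry matching `api`, `error`, and `status` labels.

```promql
# Average latency per call in milliseconds:
# rate of total latency divided by rate of call count.
sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_sum{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
/
sum by (api, error, status) (
    rate(
        awscni_aws_api_latency_ms_count{
            job=~"$daemonset",
            node=~"$node",
            error="false"
        }[$__rate_interval]
    )
)
```

If this ratio stays flat across the upgrade while the count rate rises, the increase in the first graph reflects more calls rather than slower calls.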

Is this expected or known?

Environment:
Kubernetes: 1.28
CNI: 1.16.3
OS: Bottlerocket 1.17.0

@diranged diranged added the bug label Feb 26, 2024
Contributor

jdn5126 commented Feb 26, 2024

@diranged this is definitely not a known issue. Did the volume of these calls change at all? Does reverting to v1.16.2 immediately resolve this issue?

Contributor

jdn5126 commented Mar 5, 2024

@diranged v1.16.4 is now released. Can you try this release?

@jdn5126 jdn5126 closed this as completed Mar 5, 2024
@jdn5126 jdn5126 reopened this Mar 5, 2024
@aws aws deleted a comment from github-actions bot Mar 5, 2024
@jdn5126 jdn5126 added question and removed bug labels Mar 5, 2024
@orsenthil
Member

@diranged - Do you see the same behavior with the v1.16.4 release? This call volume could have come from a transitive dependency, and we want to verify that.

@orsenthil
Member

Later versions v1.16.4 and now v1.17.1 are available, and we haven't seen any reports of this. Closing as fixed.

