
avoid duplicates at PromQL level #11

Open
rptaylor opened this issue May 27, 2021 · 3 comments

Comments

@rptaylor
Owner

rptaylor commented May 27, 2021

The 'instance' and 'pod' labels identify the KSM instance and need to be filtered out to avoid mismatch errors ("many-to-many matching not allowed: matching labels must be unique on one side") in situations where KSM restarts (or runs more than one replica).

This works for one side of the CPU query:

max without (instance,pod) (max_over_time(kube_pod_container_resource_requests_cpu_cores{node != ""}[48h]))

But no matter what I tried, I could not get a vector result for the other side; this is where the mismatch error comes from:

max_over_time(kube_pod_completion_time[48h]) - on (exported_pod) max_over_time(kube_pod_start_time[48h])

I don't understand that, because 'on (exported_pod)' should mean that only the exported_pod label is used for matching.

Particularly odd: both of these queries work, each of which ignores just one label:

max_over_time(kube_pod_completion_time[48h]) - ignoring(pod) max_over_time(kube_pod_start_time[48h])

max_over_time(kube_pod_completion_time[48h]) - ignoring(instance) max_over_time(kube_pod_start_time[48h])

But ignoring both labels causes a many-to-many matching error, just like using on(exported_pod):

max_over_time(kube_pod_completion_time[48h]) - ignoring (pod, instance) max_over_time(kube_pod_start_time[48h])

And this works, but it has problematic duplicate entries:

max_over_time(kube_pod_completion_time[48h]) - max_over_time(kube_pod_start_time[48h])

So that is how it works currently, and it relies on the rearrange function ignoring duplicates.
It might be preferable to avoid duplicates at the PromQL level instead of in the Python code; however, the Prometheus queries are subject to complex vagaries and occasional syntax changes, so deduplicating in Python may be safer.
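
A possible way to deduplicate at the PromQL level (an untested sketch, not what the code currently does) would be to aggregate away the KSM-identity labels on each side before subtracting, following the same without() pattern already used on the CPU side, so that each side has at most one series per exported_pod:

max without (instance, pod) (max_over_time(kube_pod_completion_time[48h])) - max without (instance, pod) (max_over_time(kube_pod_start_time[48h]))

That would also be consistent with the errors above: with more than one KSM instance, each side has several series per exported_pod that differ only in instance and pod, so excluding both labels from matching (via on(exported_pod) or ignoring(pod, instance)) leaves duplicates on both sides and the match becomes many-to-many.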

@rptaylor
Owner Author

Currently the cputime query is somewhat redundant anyway with the endtime, starttime, and cores queries.
They are used as a check against each other, to ensure that cputime = (endtime - starttime) * cores.
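
As a toy numerical illustration of that identity (hypothetical values, not taken from any real query result):

# Hypothetical values, only to illustrate the cross check described above.
starttime = 1622100000.0   # kube_pod_start_time (unix seconds)
endtime = 1622110800.0     # kube_pod_completion_time (unix seconds)
cores = 2.0                # requested CPU cores
walltime = endtime - starttime   # 10800 s = 3 hours
cputime = walltime * cores       # 21600 core-seconds = 6 core-hours
assert cputime == (endtime - starttime) * cores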

It is good to have a cross-check, though it takes a bit more time to run the extra queries.

@rptaylor
Owner Author

rptaylor commented May 31, 2022

With the updates for KSM 2.0 (#22), exported_pod is gone and pod is the right label to use, so it would only be necessary to use ignoring(instance).
However, the Prometheus behaviour seems different now.

max_over_time(kube_pod_completion_time[48h]) - ignoring(instance) max_over_time(kube_pod_start_time[48h])

Shortly after redeploying KSM to trigger the multi-instance issue, this now gives the error "many-to-many matching not allowed: matching labels must be unique on one side". Or maybe I didn't test it correctly before.
So the situation remains the same: it still relies on the rearrange function in Python to remove the duplicates.
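
For reference, here is a minimal sketch of the kind of deduplication this relies on (this is not the project's actual rearrange function; the function name and result structure are assumptions based on the Prometheus HTTP API format). It drops result series that differ only in the KSM 'instance' label:

def drop_ksm_duplicates(results):
    # 'results' is assumed to be the 'result' list returned by the Prometheus
    # query API: dicts with a 'metric' label set and a 'value'/'values' payload.
    seen = {}
    for series in results:
        labels = dict(series["metric"])
        labels.pop("instance", None)  # ignore the KSM-identity label
        key = tuple(sorted(labels.items()))
        seen.setdefault(key, series)  # keep the first copy, skip duplicates
    return list(seen.values())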

@rptaylor
Owner Author

Here is an example of the same pod record, duplicated for each KSM instance that Prometheus collected the record from:

{container="kube-state-metrics", endpoint="http", instance="10.233.96.214:8080", job="kube-prometheus-kube-state-metrics", namespace="harvester", pod="grid-job-14869085-4hdv5", service="kube-prometheus-kube-state-metrics", uid="97ff9f95-3c7c-4306-9509-7838147fce65"}

{container="kube-state-metrics", endpoint="http", instance="10.233.87.156:8080", job="kube-prometheus-kube-state-metrics", namespace="harvester", pod="grid-job-14869085-4hdv5", service="kube-prometheus-kube-state-metrics", uid="97ff9f95-3c7c-4306-9509-7838147fce65"}
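
Since these two series differ only in the instance label, an aggregation along the lines of the following (untested sketch) should collapse them into a single series per pod before any matching is done:

max without (instance) (max_over_time(kube_pod_completion_time[48h]))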

rptaylor changed the title from "improve PromQL query to avoid duplicates" to "avoid duplicates at PromQL level" on Feb 23, 2024