Add k8s metrics for jobs and cronjobs #1660

Merged
merged 7 commits into from
Jan 9, 2025

Conversation

ChrsMark
Member

@ChrsMark ChrsMark commented Dec 9, 2024

Part of #1032

Changes

This PR adds metrics for k8s jobs and cronjobs that are already in use by the OpenTelemetry Collector (k8scluster receiver).

Merge requirement checklist

Contributor

@dashpole dashpole left a comment

It seems like these should all be UpDownCounters, rather than gauges.

@ChrsMark
Member Author

Good catch @dashpole! My only concern is about k8s.job.desired_successful_pods and k8s.job.max_parallel_pods. Should those also be UpDownCounters, given that they are set by users and are mostly desired-state values?

@dashpole
Contributor

The main question is how they should be aggregated. UpDownCounters aggregate by summing; gauges aggregate by averaging. If I were aggregating k8s.job.desired_successful_pods or k8s.job.max_parallel_pods across a cluster, I would probably want to see the sum of all desired_successful_pods or the sum of max_parallel_pods, rather than the average of either. So IMO those should both be UpDownCounters.
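
To illustrate the aggregation argument above, here is a minimal sketch in plain Python. The job names and values are made up for illustration, and the two helper functions are hypothetical; nothing here is taken from the receiver or the semantic conventions themselves.

```python
from statistics import mean

# Hypothetical per-job values of k8s.job.desired_successful_pods.
# Job names and numbers are invented for this example.
desired_successful_pods = {
    "job-a": 5,
    "job-b": 1,
    "job-c": 12,
}


def aggregate_as_sum(values):
    """Cluster-wide aggregation for an additive (UpDownCounter-style) metric."""
    return sum(values.values())


def aggregate_as_mean(values):
    """Averaging, which is how the comment above describes gauge aggregation."""
    return mean(values.values())


cluster_total = aggregate_as_sum(desired_successful_pods)
cluster_average = aggregate_as_mean(desired_successful_pods)

print("sum across cluster:", cluster_total)
print("average across cluster:", cluster_average)
```

The sum answers "how many successful pods does the cluster want in total", which is the question raised in the comment; the average only describes a typical job.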

@jinja2
Contributor

jinja2 commented Dec 10, 2024

This is a very interesting point. These metrics being additive makes sense to me. But the data point type from the receiver is gauge. I think most instrumentation libraries export UpDownCounter as non-monotonic sums, but I am not sure if this is a standard. Would this cause some confusion for users? Should there be a way to clarify export and instrument type in cases like this?

@dashpole
Contributor

I think most instrumentation libraries export UpDownCounter as non-monotonic sums, but I am not sure if this is a standard.

Yes, that is correct. It is a bit odd to specify an instrument type for metrics that are not recorded using an instrument, but that seems mostly editorial here.

IMO we should update the data point type in the receiver to be a non-monotonic sum.

@trask
Member

trask commented Dec 10, 2024

I think most instrumentation libraries export UpDownCounter as non-monotonic sums, but I am not sure if this is a standard.

I think this is the closest we have:

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk_exporters/otlp.md#additional-environment-variable-configuration

which specifies exporting UpDownCounter with Cumulative aggregation temporality in all cases.
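
As a rough illustration of what "non-monotonic sum with cumulative temporality" means for a data point, here is a toy sketch in plain Python. `SumDataPoint` and `UpDownCounterSketch` are hypothetical stand-ins written for this example, not the real SDK or Collector types; only the field names `is_monotonic` and `aggregation_temporality` are borrowed from the OTLP Sum data model. Feeding deltas into k8s.job.max_parallel_pods is also an assumption made purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class SumDataPoint:
    """Simplified stand-in for an OTLP Sum data point (not the real type)."""
    name: str
    value: int
    is_monotonic: bool
    aggregation_temporality: str


class UpDownCounterSketch:
    """Toy UpDownCounter: accepts both positive and negative deltas."""

    def __init__(self, name):
        self.name = name
        self._total = 0

    def add(self, delta):
        # Unlike a monotonic Counter, negative deltas are allowed.
        self._total += delta

    def export(self):
        # Cumulative temporality: report the running total, not the last delta.
        return SumDataPoint(
            name=self.name,
            value=self._total,
            is_monotonic=False,
            aggregation_temporality="CUMULATIVE",
        )


counter = UpDownCounterSketch("k8s.job.max_parallel_pods")
counter.add(4)   # e.g. parallelism set to 4
counter.add(-1)  # e.g. parallelism later lowered by 1
point = counter.export()
print(point)
```

A gauge data point, by contrast, carries no monotonicity or temporality fields, which is why the instrument type and the exported data point type can diverge as discussed above.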

@ChrsMark
Member Author

@dashpole @jinja2 @open-telemetry/semconv-k8s-approvers the types are now switched to UpDownCounters. Is anything else missing here?

@lmolkova
Contributor

lmolkova commented Dec 17, 2024

Do @open-telemetry/semconv-system-approvers want to take a look?

@lmolkova lmolkova merged commit f0c1087 into open-telemetry:main Jan 9, 2025
14 checks passed
lmolkova added a commit to lmolkova/semantic-conventions that referenced this pull request Jan 21, 2025