Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prometheus exporter should allow conflicting help and unit metadata #28617

Closed
dashpole opened this issue Oct 25, 2023 · 16 comments · Fixed by #36356
Closed

Prometheus exporter should allow conflicting help and unit metadata #28617

dashpole opened this issue Oct 25, 2023 · 16 comments · Fixed by #36356
Labels
bug Something isn't working exporter/prometheus help wanted Extra attention is needed

Comments

@dashpole
Copy link
Contributor

Component(s)

No response

What happened?

Description

Original report: istio/istio#47397

The prometheus exporter fails to gather metrics if the description differs between two metrics with the same name.

Steps to Reproduce

With a prometheus receiver, scrape two endpoints with the same metric name, but different HELP (description). Use the prometheus exporter to export the metrics.

Expected Result

No error. Per the OTel specification:

Exporters MUST drop entire metrics to prevent conflicting TYPE comments, but SHOULD NOT drop metric points as a result of conflicting UNIT or HELP comments. Instead, all but one of the conflicting UNIT and HELP comments (but not metric points) SHOULD be dropped. If dropping a comment or metric points, the exporter SHOULD warn the user through error logging.

I.e. we should be accepting the conflicting description, choosing one description to keep, and one to drop, and warning the user.

Actual Result

The prometheus exporter fails incoming scrapes.

Collector version

Don't know

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

2023-09-29T15:24:02.049Z        error   [email protected]/log.go:23    error gathering metrics: 2 error(s) occurred:
* collected metric istio_build label:{name:"app"  value:"istiod"}  label:{name:"cluster"  value:"glooplatform-central"}  label:{name:"component"  value:"pilot"}  label:{name:"install_operator_istio_io_owning_resource"  value:"unknown"}  label:{name:"instance"  value:"10.108.2.7:15014"}  label:{name:"istio"  value:"istiod"}  label:{name:"istio_io_rev"  value:"1-18"}  label:{name:"job"  value:"istiod-metrics"}  label:{name:"namespace"  value:"istio-system"}  label:{name:"operator_istio_io_component"  value:"Pilot"}  label:{name:"pod_name"  value:"istiod-1-18-5f7964bccd-jlfwh"}  label:{name:"pod_template_hash"  value:"5f7964bccd"}  label:{name:"sidecar_istio_io_inject"  value:"false"}  label:{name:"tag"  value:"1.18.2-solo"}  gauge:{value:1} has help "Istio component build info" but should have ""
* collected metric istio_build label:{name:"app"  value:"istiod"}  label:{name:"cluster"  value:"glooplatform-east"}  label:{name:"component"  value:"pilot"}  label:{name:"install_operator_istio_io_owning_resource"  value:"unknown"}  label:{name:"instance"  value:"10.104.3.14:15014"}  label:{name:"istio"  value:"istiod"}  label:{name:"istio_io_rev"  value:"1-18"}  label:{name:"job"  value:"istiod-metrics"}  label:{name:"namespace"  value:"istio-system"}  label:{name:"operator_istio_io_component"  value:"Pilot"}  label:{name:"pod_name"  value:"istiod-1-18-7d66654c46-c5frb"}  label:{name:"pod_template_hash"  value:"7d66654c46"}  label:{name:"sidecar_istio_io_inject"  value:"false"}  label:{name:"tag"  value:"1.18.2-solo"}  gauge:{value:1} has help "Istio component build info" but should have ""
        {"kind": "exporter", "data_type": "metrics", "name": "prometheus"}
github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter.(*promLogger).Println
        github.com/open-telemetry/opentelemetry-collector-contrib/exporter/[email protected]/log.go:23
github.com/prometheus/client_golang/prometheus/promhttp.HandlerForTransactional.func1
        github.com/prometheus/[email protected]/prometheus/promhttp/http.go:144
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2109
net/http.(*ServeMux).ServeHTTP
        net/http/server.go:2487
go.opentelemetry.io/collector/config/confighttp.(*decompressor).wrap.func1
        go.opentelemetry.io/collector/config/[email protected]/compression.go:111
net/http.HandlerFunc.ServeHTTP
        net/http/server.go:2109
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*Handler).ServeHTTP
        go.opentelemetry.io/contrib/instrumentation/net/http/[email protected]/handler.go:212
go.opentelemetry.io/collector/config/confighttp.(*clientInfoHandler).ServeHTTP
        go.opentelemetry.io/collector/config/[email protected]/clientinfohandler.go:28
net/http.serverHandler.ServeHTTP
        net/http/server.go:2947
net/http.(*conn).serve
        net/http/server.go:1991


### Additional context

_No response_
@dashpole dashpole added bug Something isn't working needs triage New item requiring triage exporter/prometheus and removed needs triage New item requiring triage labels Oct 25, 2023
@github-actions
Copy link
Contributor

Pinging code owners for exporter/prometheus: @Aneurysm9. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@krisztianfekete
Copy link

Hey @Aneurysm9, I am happy to take a look at this if a maintainer can assist me!

@jroper
Copy link

jroper commented Dec 8, 2023

To work around this bug, does anyone know if it's possible to define a processor that can update the description of a metric? I can't see anything on any processor that allows editing it.

@krisztianfekete
Copy link

I think you can enforce a description with a transformprocessor, although I would prefer to solve this properly.

Copy link
Contributor

github-actions bot commented Feb 7, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Feb 7, 2024
@mjarosie
Copy link

mjarosie commented Feb 8, 2024

Hi, I believe this issue is still relevant (pinging code owner: @Aneurysm9). It's causing issues with using one of the OTel libraries (linked above). Same as @krisztianfekete, I would also be happy to take a look at this, but would need guidance.

@github-actions github-actions bot removed the Stale label Feb 9, 2024
@martinrw
Copy link

martinrw commented Apr 2, 2024

We are having a similar issue... We are collecting metrics from some python and some Java Otel agents and depending on which one we get conflicting errors in our log:

the Python metrics give this error:
http_client_request_size_bytes has help "Measures the size of HTTP request messages." but should have "The size of HTTP request messages"

the Java ones give:
http_client_request_size_bytes has help "The size of HTTP request messages" but should have "Measures the size of HTTP request messages."

These errors appear to contradict each other.
At one point we added the following to our collector but it doesn't seem to have helped for this particular metrics

    transform:
      error_mode: ignore
      metric_statements:
        - context: metric
          statements:
            - set(description, "The size of HTTP request messages") where name == "http.client.request.size_bytes"
            - set(description, "The size of HTTP response messages") where name == "http.client.response.size_bytes"

It seems strange to me the a tiny difference in the description would cause metrics to be dropped... but if that's the way it has to be does anyone know another workaround or the reason I am getting this problem?

Copy link
Contributor

github-actions bot commented Jun 3, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 3, 2024
@martinrw
Copy link

martinrw commented Jun 3, 2024

Commenting to keep this open as it's still a problem for us

@dashpole dashpole removed the Stale label Jun 3, 2024
Copy link
Contributor

github-actions bot commented Aug 5, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 5, 2024
@martinrw
Copy link

martinrw commented Aug 5, 2024

Commenting to keep this alive

@dashpole dashpole removed the Stale label Aug 5, 2024
@eemehmet
Copy link

eemehmet commented Aug 8, 2024

I 've faced the same issue.

@oncilla
Copy link

oncilla commented Sep 5, 2024

FWIW, the problem actually happens in prometheus/client_golang.

https://github.com/prometheus/client_golang/blob/89f21b2cba4ea0f9cf449b454945f53047697158/prometheus/registry.go#L638

We are looking into forking client_golang to allow a clash of the description. But to solves this long term, I think we will need to file an issue with prometheus/client_golang instead.

@lukedirtwalker
Copy link

Created an upstream PR prometheus/client_golang#1621 to fix it client_golang.

Copy link
Contributor

github-actions bot commented Nov 5, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 5, 2024
@martinrw
Copy link

martinrw commented Nov 5, 2024

commenting to keep it open

@github-actions github-actions bot removed the Stale label Nov 6, 2024
@dashpole dashpole added the help wanted Extra attention is needed label Nov 12, 2024
RutvikS-crest pushed a commit to RutvikS-crest/opentelemetry-collector-contrib that referenced this issue Dec 9, 2024
open-telemetry#36356)


#### Description
Fixes bug where exporting fails due to different help messages for the
same metric. With this solution, the exporter will always export metrics
of the same name with the first description it receives. This also
rejects metrics whose types have changed. These changes follow the
[spec](https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#metric-metadata-1):
>Exporters MUST drop entire metrics to prevent conflicting TYPE
comments, but SHOULD NOT drop metric points as a result of conflicting
UNIT or HELP comments. Instead, all but one of the conflicting UNIT and
HELP comments (but not metric points) SHOULD be dropped. If dropping a
comment or metric points, the exporter SHOULD warn the user through
error logging.

Based on open-telemetry/opentelemetry-go#3469

#### Link to tracking issue
Fixes
open-telemetry#28617

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Unit test cases added.

---------

Co-authored-by: David Ashpole <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this issue Dec 17, 2024
open-telemetry#36356)


#### Description
Fixes bug where exporting fails due to different help messages for the
same metric. With this solution, the exporter will always export metrics
of the same name with the first description it receives. This also
rejects metrics whose types have changed. These changes follow the
[spec](https://opentelemetry.io/docs/specs/otel/compatibility/prometheus_and_openmetrics/#metric-metadata-1):
>Exporters MUST drop entire metrics to prevent conflicting TYPE
comments, but SHOULD NOT drop metric points as a result of conflicting
UNIT or HELP comments. Instead, all but one of the conflicting UNIT and
HELP comments (but not metric points) SHOULD be dropped. If dropping a
comment or metric points, the exporter SHOULD warn the user through
error logging.

Based on open-telemetry/opentelemetry-go#3469

#### Link to tracking issue
Fixes
open-telemetry#28617

<!--Describe what testing was performed and which tests were added.-->
#### Testing
Unit test cases added.

---------

Co-authored-by: David Ashpole <[email protected]>
Co-authored-by: Alex Boten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exporter/prometheus help wanted Extra attention is needed
Projects
None yet
8 participants