PrometheusTargetMissingWithWarmupTime check issue with multiple jobs on the same target #424

Open
gskornowicz opened this issue Jun 19, 2024 · 5 comments


@gskornowicz

Hi, I'm wondering if I can do something to avoid this error:

Error executing query: found duplicate series for the match group {instance="server.company.com"} on the left hand-side of the operation: [{__name__="up", instance="server.company.com", job="bind", type="dns"}, {__name__="up", instance="server.company.com", job="node", os="Linux", type="vm"}];many-to-many matching not allowed: matching labels must be unique on one side

It's related to the PrometheusTargetMissingWithWarmupTime alert and its expression sum by (instance, job) ((up == 0) * on (instance) group_right(job) (node_time_seconds - node_boot_time_seconds > 600)), which, if I understand correctly, can match multiple up == 0 series when more than one job scrapes the same target. Is there any way to avoid or fix that?
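
To make the error concrete, here is a minimal Python sketch of the uniqueness rule behind it. With `group_right`, the left-hand operand (`up == 0`) is the "one" side, so it may contain at most one series per `on (instance)` match group. The helper `duplicate_match_groups` is a hypothetical name used only for illustration; it models the rule and is not Prometheus's actual implementation.

```python
from collections import defaultdict


def duplicate_match_groups(one_side, on_labels):
    """Return the match groups that hold more than one series on the "one" side.

    Hypothetical helper: it only models the uniqueness rule for vector
    matching, it is not Prometheus code.
    """
    groups = defaultdict(list)
    for labels in one_side:
        # The match group is built only from the labels listed in on(...).
        key = tuple((name, labels.get(name, "")) for name in on_labels)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# The two series from the error message, both with up == 0 at evaluation time.
up_down = [
    {"instance": "server.company.com", "job": "bind", "type": "dns"},
    {"instance": "server.company.com", "job": "node", "os": "Linux", "type": "vm"},
]

# on (instance): both series land in the same group, so the "one" side is not unique.
dups_instance = duplicate_match_groups(up_down, ["instance"])

# on (instance, job): every series gets its own group.
dups_instance_job = duplicate_match_groups(up_down, ["instance", "job"])

print(len(dups_instance), len(dups_instance_job))  # 1 0
```

Since the job and type labels are ignored by `on (instance)`, differing in those labels does not help. One possible direction, untested here, is to make `up == 0` the "many" side instead, for example `(up == 0) * on (instance) group_left () (node_time_seconds - node_boot_time_seconds > 600)`, assuming there is only one node exporter series per instance on the right-hand side.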

@samber
Owner

samber commented Jun 19, 2024

Do you have multiple Prometheus instances in federation mode or a remote-write setup?

If so, add a label to differentiate the jobs or the Prometheus instances.

@gskornowicz
Author

Hi @samber
No, I have neither multiple Prometheus instances nor a remote-write setup.
It seems the problem is caused by having multiple jobs on one instance?

@samber
Owner

samber commented Jun 20, 2024

Yes, you may have multiple series with identical labels.

Do you use service discovery? Did you check whether an exporter endpoint is declared twice in prometheus.yml?

@gskornowicz
Author

I don't use service discovery.

I don't see any duplicated exporter endpoints.

It's not clear to me what caused it, because the labels are not identical:
{__name__="up", instance="server.company.com", job="bind", type="dns"} <- this is the blackbox-dns exporter
{__name__="up", instance="server.company.com", job="node", os="Linux", type="vm"} <- this is the node exporter
The job and the type are different. Is it because the instance label is the same?

Attaching prometheus.yml:

global:
  scrape_interval: 20s
  scrape_timeout: 20s
  evaluation_interval: 15s

  external_labels:
    environment: prometheus.company.com




rule_files:
  - /etc/prometheus/rules/*.rules

alerting:
  alertmanagers:
  - scheme: http
    static_configs:
    - targets:
      - prometheus.company.com:9093


scrape_configs:
  - job_name: prometheus
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com:9090
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: grafana
    static_configs:
    - targets:
      - prometheus.company.com:3000
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: alertmanager
    static_configs:
    - targets:
      - prometheus.company.com:9093
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: metrics-snmp
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com
    relabel_configs:
    - target_label: instance
      replacement: exporter
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: metrics-vmware
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com
    relabel_configs:
    - target_label: instance
      replacement: exporter
    - target_label: __address__
      replacement: 127.0.0.1:9272
  - job_name: metrics-blackbox
    metrics_path: /metrics
    static_configs:
    - targets:
      - prometheus.company.com
    relabel_configs:
    - target_label: instance
      replacement: exporter
    - target_label: __address__
      replacement: 127.0.0.1:9115
  - job_name: node
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/node.yml
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: bind
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/bind.yml
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
  - job_name: snmp-idrac
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-idrac.yml
    params:
      module:
      - idrac
      auth:
      - idrac
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-synology
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-synology.yml
    params:
      module:
      - synology
      auth:
      - synology
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-wlan
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-wlan.yml
    params:
      module:
      - cisco
      auth:
      - cisco
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-firewall
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-firewall.yml
    params:
      module:
      - barracuda
      auth:
      - barracuda
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - source_labels:
      - instance
      regex: (.*).*1$
      replacement: primary
      target_label: boxrole
    - source_labels:
      - instance
      regex: (.*).*2$
      replacement: secondary
      target_label: boxrole
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-powerwalker
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-powerwalker.yml
    params:
      module:
      - powerwalker
      auth:
      - powerwalker
    relabel_configs:
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-switch
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-switch.yml
    params:
      module:
      - switch
      auth:
      - switch
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: snmp-brocade
    metrics_path: /snmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/snmp-brocade.yml
    scrape_interval: 30s
    scrape_timeout: 30s
    params:
      module:
      - brocade
      auth:
      - brocade
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9116
  - job_name: blackbox-http
    metrics_path: /probe
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/blackbox-http.yml
    relabel_configs:
    - source_labels:
      - module
      target_label: __param_module
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
  - job_name: blackbox-icmp
    metrics_path: /probe
    params:
      module:
      - icmp
    file_sd_configs:
    - files:
      - /etc/prometheus/file_sd/blackbox-icmp.yml
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __address__
      target_label: instance
    - source_labels:
      - __address__
      regex: (.*):.*$
      replacement: $1
      target_label: instance
    - source_labels:
      - __address__
      regex: .*:(.*)$
      replacement: $1
      target_label: __param_target
    - target_label: __address__
      replacement: 127.0.0.1:9115
  - job_name: blackbox-dns
    metrics_path: /probe
    params:
      module:
      - dns
    static_configs:
    - targets:
      - one.one.one.one
      - ns1.company.com
      - ns2.company.com
    relabel_configs:
    - source_labels:
      - __address__
      target_label: __param_target
    - source_labels:
      - __param_target
      target_label: instance
    - target_label: __address__
      replacement: 127.0.0.1:9115
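
As a rough illustration of why the node and bind jobs in the config above end up with the same instance label, the following Python sketch applies their shared relabel rule (regex `(.*):.*$`, replacement `$1`, target label `instance`) to two addresses. The ports 9100 and 9119 are assumptions, since the file_sd target files are not shown in the thread, and `relabel_instance` is a hypothetical helper name.

```python
import re


def relabel_instance(address, regex=r"(.*):.*$"):
    """Imitate the 'replace' relabel rule that writes $1 into the instance label."""
    # Prometheus anchors relabel regexes at both ends; re.fullmatch imitates that.
    match = re.fullmatch(regex, address)
    # Without a match the rule is skipped and instance keeps its default, the address.
    return match.group(1) if match else address


# Assumed addresses: same host, different exporter ports (ports are hypothetical).
node_instance = relabel_instance("server.company.com:9100")
bind_instance = relabel_instance("server.company.com:9119")

print(node_instance == bind_instance)  # True
```

Because the port is stripped, both jobs report `instance="server.company.com"`, and only the job label (plus any file_sd labels such as type) tells the two `up` series apart.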

@W1zzardTPU

I ran into the same issue; using the admin API to delete the problematic series solved it.

Something like: curl -X POST -g 'http://10.0.0.1:9090/api/v1/admin/tsdb/delete_series?match[]={instance="10.0.0.22:9090"}'

You have to enable the admin API in Prometheus first.
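
For anyone scripting this, here is a small Python sketch that builds the same request URL with the selector percent-encoded, so that curl's `-g` globbing workaround is not needed. It only constructs the URL and does not send the POST. The host and selector are the example values from the command above, `delete_series_url` is a hypothetical helper name, and the admin API is assumed to have been enabled by starting Prometheus with `--web.enable-admin-api`.

```python
from urllib.parse import urlencode


def delete_series_url(base_url, selector):
    """Build the delete_series URL for one series selector (hypothetical helper)."""
    # urlencode percent-encodes the brackets, braces, quotes and colon.
    query = urlencode([("match[]", selector)])
    return f"{base_url}/api/v1/admin/tsdb/delete_series?{query}"


# Example values taken from the curl command above.
url = delete_series_url("http://10.0.0.1:9090", '{instance="10.0.0.22:9090"}')
print(url)
```

The resulting URL still has to be sent as a POST request, and deleting series permanently removes their stored samples, so it is worth checking the selector with a normal query first.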
