
unable to load specified CA cert in target allocator #3572

Open
oszlak opened this issue Dec 23, 2024 · 3 comments
Labels
bug, needs triage

Comments

@oszlak

oszlak commented Dec 23, 2024

Component(s)

target allocator

What happened?

Description

I'm trying to run the target allocator (TA) with Prometheus CRs, with autoGenerateCert set to true and certManager set to false.
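
For reference, a sketch of the Helm values I mean, assuming the opentelemetry-operator chart layout (key paths vary between chart versions; in some older versions autoGenerateCert is a plain boolean rather than a map):

admissionWebhooks:
  certManager:
    enabled: false        # don't rely on cert-manager for the webhook cert
  autoGenerateCert:
    enabled: true         # let Helm generate a self-signed cert instead
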
I see the secret is populated:
apiVersion: v1
data:
  ca.crt: ++++++++
  tls.crt: ++++++++
  tls.key: ++++++++
kind: Secret
metadata:
  annotations:
    helm.sh/hook: 'pre-install,pre-upgrade'
    helm.sh/hook-delete-policy: before-hook-creation
    kubectl.kubernetes.io/last-applied-configuration: >-
      {"apiVersion":"v1","data":{"ca.crt":"++++++++","tls.crt":"++++++++","tls.key":"++++++++"},"kind":"Secret","metadata":{"annotations":{"helm.sh/hook":"pre-install,pre-upgrade","helm.sh/hook-delete-policy":"before-hook-creation"},"labels":{"app.kubernetes.io/component":"webhook","app.kubernetes.io/instance":"<cluster_name>-opentelemetry-operator","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"opentelemetry-operator","app.kubernetes.io/version":"0.94.0","argocd.argoproj.io/instance":"<cluster_name>-opentelemetry-operator","helm.sh/chart":"opentelemetry-operator-0.48.0"},"name":"<cluster_name>-opentelemetry-operator-controller-manager-service-cert","namespace":"opentelemetry"},"type":"kubernetes.io/tls"}
  creationTimestamp: '2024-12-23T08:55:47Z'
  labels:
    app.kubernetes.io/component: webhook
    app.kubernetes.io/instance: <cluster_name>-opentelemetry-operator
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: opentelemetry-operator
    app.kubernetes.io/version: 0.94.0
    argocd.argoproj.io/instance: <cluster_name>-opentelemetry-operator
    helm.sh/chart: opentelemetry-operator-0.48.0
  name: <cluster_name>-opentelemetry-operator-controller-manager-service-cert
  namespace: opentelemetry
  resourceVersion: '665456594'
  uid: a5c19d0f-414c-40b5-a4da-7da52cde746a
type: kubernetes.io/tls
but I still can't get it to work.
I also tried to mount the cert in the collector CRD:
volumes:
  - name: prometheus-certs
    secret:
      secretName: {{ .Values.scraper.prometheusSecretName }}
      items:
        - key: ca.crt
          path: {{ .Values.scraper.prometheusSecretPath }}
containers:
  - name: otel-scraper
    volumeMounts:
      - name: prometheus-certs
        mountPath: /etc/prometheus/certs/
        readOnly: true

and I'm still getting the same error.
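
For what it's worth, the filename in the error in the log output below follows a secret_<namespace>_<secret-name>_<key> convention, so a mount like the one above can only satisfy the generated scrape config if the item path reproduces that filename exactly. A hypothetical sketch (the path is copied from the error message; whether the mounted CA is the one the target actually serves is a separate question):

volumes:
  - name: prometheus-certs
    secret:
      secretName: {{ .Values.scraper.prometheusSecretName }}
      items:
        - key: ca.crt
          path: secret_monitoring_<cluster_name>-admission_ca   # must match the generated filename exactly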

Steps to Reproduce

Install the operator and enable the target allocator with self-signed certs.

Expected Result

I'm able to scrape targets over HTTPS.

Actual Result

Getting "error creating new scrape pool" errors (see log output below).

Kubernetes Version

1.30.0

Operator version

0.94.0

Collector version

0.94.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")

Log output

2024-12-23T09:19:20.836Z	error	scrape/manager.go:219	error creating new scrape pool	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "error creating HTTP client: unable to load specified CA cert /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: open /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: no such file or directory", "errorVerbose": "unable to load specified CA cert /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: open /etc/prometheus/certs/secret_monitoring_<cluster_name>-admission_ca: no such file or directory\nerror creating HTTP client\ngithub.com/prometheus/prometheus/scrape.newScrapePool\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:293\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:217\ngithub.com/prometheus/prometheus/scrape.(*Manager).reloader\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:199\nruntime.goexit\n\truntime/asm_amd64.s:1650", "scrape_pool": "serviceMonitor/monitoring/<cluster_name>-operator/0"}

Additional context

No response

oszlak added the bug and needs triage labels on Dec 23, 2024
@mtthwcmpbll

I'm noticing this error too, and I think it's related to enabling the Target Allocator's mTLS feature flag. I have kube-state-metrics deployed, which deploys a ServiceMonitor with the TLS section filled out. When the Target Allocator discovers this ServiceMonitor, the receiver throws this error while trying to load the certificate described in it.
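
For illustration, a ServiceMonitor whose TLS section has roughly this shape is what ends up referenced as a secret_<namespace>_<secret-name>_<key> file in the generated scrape config (the names here are hypothetical, reconstructed from the path in the original error):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-admission-metrics   # hypothetical
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example                  # hypothetical
  endpoints:
    - port: https
      scheme: https
      tlsConfig:
        ca:
          secret:
            name: <cluster_name>-admission   # -> secret_monitoring_<cluster_name>-admission_ca
            key: ca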

I had this halfway configured from some earlier work on this feature, and I had missed a few key steps:

  1. I was missing the cert-manager RBAC shown in the mTLS documentation on my operator (I had incorrectly applied it to the target allocator itself rather than the operator controller); see the sketch after this list.
  2. I was seeing an error in the operator stating "Cert-Manager is not available to the operator, skipping adding to scheme." This issue pointed me toward setting specific environment variables on the operator so that cert-manager was successfully autodiscovered.
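
A minimal sketch of the cert-manager RBAC in question, reconstructed from the permissions listed later in this thread (the ClusterRole name is hypothetical, and it needs to be bound to the operator controller's service account):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opentelemetry-operator-cert-manager   # hypothetical name
rules:
  - apiGroups: [cert-manager.io]
    resources: [certificaterequests, certificates, issuers]
    verbs: [create, get, list, watch, update, patch, delete]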

After fixing those two issues, I no longer see the periodic issue when the TA discovers a ServiceMonitor with a TLS block.

@oszlak
Author

oszlak commented Jan 7, 2025

Thank you @mtthwcmpbll.
I have followed the steps above.
The role now has:

  certificaterequests.cert-manager.io  []  []  [create get list watch update patch delete]
  certificates.cert-manager.io         []  []  [create get list watch update patch delete]
  issuers.cert-manager.io              []  []  [create get list watch update patch delete]

and also in the operator logs I can see:

{"level":"INFO","timestamp":"2025-01-07T10:57:54Z","logger":"setup","message":"Cert-Manager is available to the operator, adding to scheme."}
{"level":"INFO","timestamp":"2025-01-07T10:57:54Z","logger":"setup","message":"Securing the connection between the target allocator and the collector"}

but I still see the issue in the collectors:

2025-01-07T11:06:17.598Z	warn	internal/transaction.go:129	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1736247977598, "target_labels": "{__name__=\"up\", instance=\"{ip}:10249\", job=\"kube-proxy\"}"}
2025-01-07T11:06:17.645Z	info	Metrics	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 5, "data points": 5}
2025-01-07T11:06:18.600Z	warn	internal/transaction.go:129	Failed to scrape Prometheus endpoint	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1736247968598, "target_labels": "{__name__=\"up\", endpoint=\"https-metrics\", instance=\"{ip}:10250\", job=\"kubelet\", metrics_path=\"/metrics\", namespace=\"kube-system\", node=\"ip-{ip}.ec2.internal\", service=\"test-chart-prom-rw-kube-pr-kubelet\"}"}

Could it be related to the fact that the operator logs this?

{"level":"INFO","timestamp":"2025-01-07T10:58:30Z","logger":"collector-upgrade","message":"no instances to upgrade"}

@oszlak
Author

oszlak commented Jan 7, 2025

and the main issue still exists:

2025-01-07T11:16:44.095Z	error	scrape/manager.go:180	error creating new scrape pool	{"kind": "receiver", "name": "prometheus", "data_type": "metrics", "error": "error creating HTTP client: unable to read CA cert: unable to read file /etc/prometheus/certs/0_monitoring_{cluster}-admission_ca: open /etc/prometheus/certs/0_monitoring_isr-playground-k8s-cen-dev-admission_ca: no such file or directory", "scrape_pool": "serviceMonitor/monitoring/i{cluster}-operator/0"}
