Architecture suggestion for publishing metrics to Prometheus from many ephemeral workers #484
Replies: 2 comments
-
This smells like the "distributed counter" use case; see the non-goals section in the README.md. Quoting from there: "If you need distributed counting, you could either use the actual statsd in combination with the Prometheus statsd exporter, or have a look at Weavework's aggregation gateway."

One could also argue that the workers aren't actually short-lived: "autoscaled" doesn't mean "short-lived" per se. IIUC, the workers aren't terminating themselves after each processed workload; they are only shut down when the queue doesn't hold enough work for the current number of workers. Therefore, you could just scrape the workers normally and accept that you'll underreport a bit whenever a downscaling happens.

If you need exact reporting, one might argue that Prometheus isn't the right system for that. You would rather need an event-processing story with some guarantees of completeness (which could then be monitored by Prometheus, in turn). Another way would be to instrument the binaries constituting the queueing service themselves. In any case, the Pushgateway as it is is not really made for this kind of use case.
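For illustration, a minimal sketch of the "scrape the workers normally" approach, assuming the workers are written in Go with client_golang (the metric names, the port, and the queue loop are hypothetical):

```go
// A worker that exposes its own /metrics endpoint instead of pushing.
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	jobsProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "worker_jobs_processed_total",
		Help: "Jobs this worker has processed.",
	})
	jobDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name: "worker_job_duration_seconds",
		Help: "Time spent processing a single job.",
	})
)

func main() {
	// Serve metrics in the background; Prometheus (e.g. via a PodMonitor)
	// can then discover and scrape each worker pod on this port.
	go func() {
		http.Handle("/metrics", promhttp.Handler())
		http.ListenAndServe(":2112", nil)
	}()

	for {
		start := time.Now()
		processNextJob() // hypothetical: pop one message from the queue and run it
		jobDuration.Observe(time.Since(start).Seconds())
		jobsProcessed.Inc()
	}
}

// processNextJob stands in for the real queue pop + work execution.
func processNextJob() {
	time.Sleep(100 * time.Millisecond)
}
```

Each worker then carries its own counters for its whole lifetime; counter resets on pod restarts are handled by Prometheus's usual rate() semantics, and only the samples since the last scrape of a downscaled pod are lost.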
-
Thank you for pointing towards the "non-goals" section. I will also look into the Weaveworks aggregation gateway and check it for correctness with distributed metric reporting; we do not want to introduce another system. What I also understand from your comment is that we could run an additional HTTP server on an async consumer (a reasonably long-living process) to expose a metrics endpoint to be scraped by Prometheus. The discovery of the async consumer pods can be facilitated using the PodMonitor feature of the Prometheus Operator.

In the meanwhile, we have chosen to run a cronjob that deletes groups that have not been updated within some threshold time, based on push_time_seconds. This helps us avoid building up a large number of stale metric groups over time.
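For reference, a sketch of such a cleanup job in Go: it reads push_time_seconds from the Pushgateway's own /metrics endpoint and issues a DELETE against the documented `/metrics/job/<job>{/<label>/<value>}` endpoint for each group older than a threshold. The Pushgateway address and the threshold are assumptions.

```go
// Cleanup cronjob: delete Pushgateway groups whose last push is too old.
package main

import (
	"log"
	"net/http"
	"net/url"
	"time"

	"github.com/prometheus/common/expfmt"
)

const (
	pushgateway = "http://pushgateway:9091" // assumed address
	maxAge      = 30 * time.Minute          // assumed staleness threshold
)

func main() {
	resp, err := http.Get(pushgateway + "/metrics")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	pt, ok := families["push_time_seconds"]
	if !ok {
		return // nothing has been pushed yet
	}
	for _, m := range pt.GetMetric() {
		last := time.Unix(int64(m.GetGauge().GetValue()), 0)
		if time.Since(last) < maxAge {
			continue
		}
		// Rebuild the grouping-key URL path: /metrics/job/<job>/<label>/<value>...
		jobPart, labelParts := "", ""
		for _, lp := range m.GetLabel() {
			if lp.GetName() == "job" {
				jobPart = "/job/" + url.PathEscape(lp.GetValue())
			} else {
				labelParts += "/" + url.PathEscape(lp.GetName()) + "/" + url.PathEscape(lp.GetValue())
			}
		}
		req, err := http.NewRequest(http.MethodDelete, pushgateway+"/metrics"+jobPart+labelParts, nil)
		if err != nil {
			log.Printf("building delete request: %v", err)
			continue
		}
		delResp, err := http.DefaultClient.Do(req)
		if err != nil {
			log.Printf("deleting stale group %s%s: %v", jobPart, labelParts, err)
			continue
		}
		delResp.Body.Close()
		log.Printf("deleted stale group %s%s", jobPart, labelParts)
	}
}
```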
-
Hello. We have an execution model that is typical of a producer-consumer pattern with a queue/topic in the middle. Currently, the queue holds work of the same type from multiple customers/tenants. The consumers/workers are non-HTTP applications that pop a message from the queue and execute the work. These consumers are Kubernetes pods spun up by a Deployment and configured to autoscale based on the work available on the queue. We would like to track at least two metrics about the performance of the workers and the backlog burn-up.
We were trying to send these through the Prometheus Pushgateway, but it looks like it has a design philosophy which, for us,
a) can result in metric overwrites from multiple workers/pods if they try to use the same job/group name, or
b) can result in garbage build-up if the job/group name is based on the instance/pod name, as pods come and go over longer periods of time (both pitfalls are sketched below).
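For concreteness, a minimal sketch of the two failure modes, assuming Go and client_golang's push package; the Pushgateway URL and the counter are hypothetical.

```go
package main

import (
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/push"
)

var jobsProcessed = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "worker_jobs_processed_total",
	Help: "Jobs this worker has processed.",
})

func main() {
	jobsProcessed.Inc()

	// (a) Shared group {job="queue-worker"}: every pod pushes to the same
	// group, so each Push replaces the values pushed by the other pods.
	push.New("http://pushgateway:9091", "queue-worker").
		Collector(jobsProcessed).
		Push()

	// (b) Per-pod group (in Kubernetes, HOSTNAME is the pod name): no
	// overwrites, but the groups of terminated pods linger on the
	// Pushgateway until something deletes them.
	push.New("http://pushgateway:9091", "queue-worker").
		Grouping("pod", os.Getenv("HOSTNAME")).
		Collector(jobsProcessed).
		Push()
}
```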
What is the best way to publish metrics from these ephemeral workers, or from this kind of offline-processing pattern in general? We could not find an architecture/design pattern documented for this problem.
Possible options:
- push_time_seconds
- gauge metric