-
We run multiple Pushgateway pods in our Kubernetes clusters for redundancy and use a Kubernetes Service to load-balance requests across them. However, this makes querying metrics difficult: each Pushgateway exposes only the metrics it has received, and a push updates only one instance. Is there a recommended way to run multiple Pushgateway instances for redundancy and still get only the most recently pushed metrics, as with a single Pushgateway instance?
-
tl;dr: Sadly, there is no good way of running multiple Pushgateway instances in a proper HA mode.

Details:

I guess you could just push to n Pushgateway instances, scrape them all, and then do some PromQL magic to extract the relevant metric. But this is quite specific to what you are querying and quite cumbersome.

Generally, the Pushgateway is meant for things like infrequently occurring batch jobs (e.g. your daily DB backup), for which you would set up merely ticketing alerts and not pages that wake someone up. In those cases, it's also OK not to make a broken Pushgateway a page-worthy issue. Then it's kind of OK to rely on K8s to replace a dead Pushgateway instance eventually, and to intervene manually during work hours if something breaks hard and brings the Pushgateway down for good.

This is of course not great. But the Pushgateway is fundamentally designed for a fairly niche use case in a fairly simplistic way. The many problems users run into with it are often caused by using the Pushgateway for more involved use cases for which it was never intended, including but not limited to monitoring serverless applications, pushing metrics with the expectation of true persistence/guaranteed delivery, turning Prometheus into a push-based metrics collection system, distributed counting, …
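To make the "scrape them all and pick the relevant metric" idea concrete, here is a minimal Python sketch of the selection step only. It assumes you have already collected, for each scraped Pushgateway instance, the value of a metric together with that group's `push_time_seconds` (which the Pushgateway exposes per group). The `Sample` type, its field names, and the `pgw-0`/`pgw-1` instance names are hypothetical and only for illustration; this is not how Prometheus itself resolves the duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    pushgateway: str   # hypothetical name of the scraped Pushgateway instance
    job: str           # grouping key of the pushed metrics
    push_time: float   # push_time_seconds reported for that group
    value: float       # value of the metric you actually care about


def latest_per_job(samples):
    """Keep, for each job, the sample with the most recent push time."""
    latest = {}
    for sample in samples:
        current = latest.get(sample.job)
        if current is None or sample.push_time > current.push_time:
            latest[sample.job] = sample
    return latest


# Two instances hold different pushes of the same job; the newer one wins.
samples = [
    Sample("pgw-0", "db_backup", push_time=1_700_000_000.0, value=1.0),
    Sample("pgw-1", "db_backup", push_time=1_700_086_400.0, value=0.0),
    Sample("pgw-0", "log_rotate", push_time=1_700_050_000.0, value=1.0),
]
winners = latest_per_job(samples)
```

Even in this toy form the caveat from above shows: the logic depends on which label identifies "the same" metric (here just `job`), so it has to be adapted to each query.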
-
Relevant discussion (which happened in an issue before we had GH discussions for this repo): #241