Periodic restarts #510

vobukh · 2022-11-09T09:37:44Z

Our Pushgateway deployment restarts periodically (once every 1-2 hours).
The liveness probe uses the HTTP endpoint /-/healthy with a timeout of 10 seconds.
Does that mean our write queue (with a hardcoded capacity of 1000 in the code) is full?

// Healthy implements the MetricStore interface.
func (dms *DiskMetricStore) Healthy() error {
	// By taking the lock we check that there is no deadlock.
	dms.lock.Lock()
	defer dms.lock.Unlock()

	// A pushgateway that cannot be written to should not be
	// considered as healthy.
	if len(dms.writeQueue) == cap(dms.writeQueue) {
		return fmt.Errorf("write queue is full")
	}

	return nil
}
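
For readers unfamiliar with the idiom, the check above compares len and cap of a buffered channel. The following is only a minimal standalone sketch of that comparison: the helper name queueFull, the int element type, and the capacity of 3 are illustrative assumptions and are not taken from the Pushgateway source.

```go
package main

import "fmt"

// queueFull reports whether a buffered channel currently holds as many
// pending items as its capacity allows. Hypothetical helper that mirrors
// the len/cap comparison in Healthy.
func queueFull(q chan int) bool {
	return len(q) == cap(q)
}

func main() {
	// Small capacity for illustration; the thread mentions 1000.
	q := make(chan int, 3)
	for i := 0; i < 3; i++ {
		fmt.Println("full before send:", queueFull(q))
		q <- i // does not block while there is free buffer space
	}
	fmt.Println("full after 3 sends:", queueFull(q))

	<-q // a consumer draining one item makes room again
	fmt.Println("full after one receive:", queueFull(q))
}
```

The point of the sketch is that the check only fails while the buffer is completely occupied, so a failing health check would suggest that the consumer of the queue is not draining it fast enough at that moment.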

Our clients send metric families every 10 seconds, and there are up to 130 running jobs. This results in ~13 WriteRequests per second.
The CPU consumption of our Pushgateway deployment is ~0.4 cores according to the k8s metrics; the container's allocation bounds are [1, 4] cores (min, max).

What could be the reason why the Pushgateway cannot keep up with inbound requests?

beorn7 (Maintainer) · 2022-11-14T13:24:14Z

You can play with lower push rates to see if things improve.

Generally, the Pushgateway is designed for rarely running batch jobs to push at the end of their runtime, so more like a handful of pushes per hour or per day, not regular pushes every 10 seconds.

If you are pushing that often, you might investigate other ways of delivering your metrics, such as enabling the usual pull-based scraping somehow (e.g. via https://github.com/prometheus-community/PushProx ) or using remote write to a long-term storage.
