agent: Decrease LFC metrics fetch frequency from 1/15hz -> 1/60hz #1187

Open
wants to merge 2 commits into
base: main

Conversation

sharnoff
Member

@sharnoff sharnoff commented Dec 25, 2024

I.e., instead of fetching LFC metrics every 15 seconds, we will now fetch them every 60s.

The reason is that 15s intervals cause oscillations, due to the 1-minute buckets in the metrics we're fetching. Second-to-second changes in load can cause small, periodic changes as they shift their location within the minute-long buckets. Fetching metrics every minute should fix this, because we get better alignment. It's still possible for something to straddle the boundary between buckets and cause strange results, but that's much less likely than what we're currently seeing in practice.
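As a rough illustration of why the polling interval matters, here is a toy model in Python. It is not how the LFC metrics are actually computed: it assumes wall-clock-aligned 60s buckets, a perfectly constant load, and a windowed value that sums the newest (partially filled) bucket with the previous full one. `windowed_count` and `poll` are hypothetical names made up for this sketch.

```python
BUCKET_SECONDS = 60  # assumed bucket width, matching the 1-minute buckets


def windowed_count(now: int, rate_per_second: int, num_buckets: int) -> int:
    """Events seen across the last `num_buckets` buckets at time `now`.

    Toy assumption: the newest bucket only holds the seconds elapsed since
    the last bucket boundary, so the covered time span depends on where
    `now` falls within the minute.
    """
    seconds_in_newest_bucket = now % BUCKET_SECONDS
    covered_seconds = (num_buckets - 1) * BUCKET_SECONDS + seconds_in_newest_bucket
    return rate_per_second * covered_seconds


def poll(interval: int, start: int, polls: int) -> list[int]:
    """Sample the windowed value `polls` times, `interval` seconds apart."""
    return [windowed_count(start + i * interval, 10, 2) for i in range(polls)]


# Polling every 15s lands on four different positions within each bucket,
# so the reading cycles even though the load never changes.
every_15s = poll(interval=15, start=0, polls=8)

# Polling every 60s always lands on the same position within a bucket,
# so the reading is stable.
every_60s = poll(interval=60, start=30, polls=4)

print(every_15s)
print(every_60s)
```

In this model the 15s series repeats a four-step cycle while the 60s series is flat, which is the kind of alignment effect described above.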

However, going up to a minute without LFC metrics wouldn't be good, because we require all sources of metrics to be available to downscale. So this commit adds new functionality to the metrics fetching config to guarantee that the first metrics fetch is done within a smaller time interval, even if the rest are evenly distributed over the full range of the refresh period.
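One possible reading of that scheduling, sketched in Python rather than the agent's own code. The function name, the parameters, and the 15s bound are all hypothetical; the real config fields and values are whatever this PR defines. The sketch assumes the first fetch is placed at a random point within a short initial window, and that later fetches run on a fixed phase chosen uniformly over the whole refresh period.

```python
import random


def fetch_times(period: float, first_within: float, count: int,
                rng: random.Random) -> list[float]:
    """Return `count` fetch times, in seconds since startup.

    The first fetch happens within `first_within` seconds. Every later
    fetch sits on a steady phase drawn from the full `period`, so that
    many instances spread their steady-state fetches evenly.
    """
    if not 0 < first_within <= period:
        raise ValueError("first_within must be in (0, period]")
    if count < 1:
        raise ValueError("count must be at least 1")

    first = rng.uniform(0, first_within)
    phase = rng.uniform(0, period)
    # The first steady-state tick is the earliest phase-aligned time
    # that comes after the initial fetch.
    next_tick = phase if phase > first else phase + period
    return [first] + [next_tick + k * period for k in range(count - 1)]


# Hypothetical values: 60s refresh period, first fetch within 15s.
times = fetch_times(period=60.0, first_within=15.0, count=4,
                    rng=random.Random(0))
print([round(t, 1) for t in times])
```

With this shape, downscaling is blocked for at most `first_within` seconds after startup, and the gap between the first and second fetch never exceeds one full period.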

Broadly part of neondatabase/cloud#22214.


Another option here is smoothing -- e.g., using the average of the LFC goal sizes from the last minute, rather than just the most recent one. There are trade-offs either way; I'm not sure which one is better.
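For comparison, the smoothing alternative could look roughly like the following Python sketch. `WindowedAverage` is a hypothetical helper, not something in the agent, and the sample values are made up. It assumes goal sizes keep arriving every 15s and are averaged over a trailing 60s window.

```python
from collections import deque


class WindowedAverage:
    """Mean of the samples observed within the trailing time window."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._samples: deque[tuple[float, float]] = deque()

    def add(self, timestamp: float, value: float) -> float:
        """Record a sample and return the mean over the current window."""
        self._samples.append((timestamp, value))
        cutoff = timestamp - self.window_seconds
        # Drop samples that are a full window old or older.
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()
        return sum(v for _, v in self._samples) / len(self._samples)


# Made-up goal sizes that oscillate with a one-minute cycle when sampled
# every 15 seconds.
smoother = WindowedAverage(window_seconds=60.0)
raw = [600, 750, 900, 1050] * 2
smoothed = [smoother.add(t, v) for t, v in zip(range(0, 120, 15), raw)]
print(smoothed)
```

Once the window is full, an oscillation whose period matches the window averages out to a constant. The cost is that a real change in the goal size takes up to a minute to be fully reflected.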

I wrote about this a little bit here: https://www.notion.so/neondatabase/162f189e004780baa0f2f2c982735554?pvs=4#167f189e0047801aa2dde11c06c432bb


github-actions bot commented Dec 25, 2024

No changes to the coverage.
