Add Pressure Stall Information Metrics #3649

Open: wants to merge 7 commits into master

Conversation


@xinau xinau commented Jan 26, 2025

issues: #3052, #3083, kubernetes/enhancements#4205

This change adds metrics for pressure stall information (PSI), which
indicate how long some or all tasks of a cgroup v2 have waited due to
resource congestion (cpu, memory, io). The change exposes this information
by including the PSIStats of each controller in its stats, i.e.
CPUStats.PSI, MemoryStats.PSI and DiskStats.PSI.
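
For readers unfamiliar with the underlying data, here is a minimal sketch of how a cgroup v2 pressure file (for example `cpu.pressure`) could be parsed into a stats struct. The type and function names (`PSIData`, `PSIStats`, `parsePSI`) are assumptions for illustration and are not necessarily the ones used in this change; only the file format (`some`/`full` lines with `avg10`, `avg60`, `avg300` and `total` in microseconds) comes from the kernel documentation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// PSIData holds one line ("some" or "full") of a *.pressure file.
// Hypothetical type for illustration.
type PSIData struct {
	Avg10, Avg60, Avg300 float64 // percentage of time stalled per window
	Total                uint64  // cumulative stall time in microseconds
}

// PSIStats groups the "some" and "full" lines of one controller.
// Hypothetical type for illustration.
type PSIStats struct {
	Some PSIData
	Full PSIData
}

// parsePSI parses the textual content of a cgroup v2 pressure file.
func parsePSI(content string) (PSIStats, error) {
	var stats PSIStats
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		var d *PSIData
		switch fields[0] {
		case "some":
			d = &stats.Some
		case "full":
			d = &stats.Full
		default:
			continue // ignore line kinds this sketch does not know
		}
		for _, kv := range fields[1:] {
			key, val, ok := strings.Cut(kv, "=")
			if !ok {
				return stats, fmt.Errorf("malformed field %q", kv)
			}
			var err error
			switch key {
			case "avg10":
				d.Avg10, err = strconv.ParseFloat(val, 64)
			case "avg60":
				d.Avg60, err = strconv.ParseFloat(val, 64)
			case "avg300":
				d.Avg300, err = strconv.ParseFloat(val, 64)
			case "total":
				d.Total, err = strconv.ParseUint(val, 10, 64)
			}
			if err != nil {
				return stats, fmt.Errorf("parsing %q: %w", kv, err)
			}
		}
	}
	return stats, sc.Err()
}

func main() {
	sample := "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n" +
		"full avg10=0.00 avg60=0.00 avg300=0.00 total=789\n"
	stats, err := parsePSI(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", stats)
}
```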

The information is additionally exposed as Prometheus metrics. The
metrics follow the naming used by prometheus/node_exporter, where
stalled corresponds to full and waiting corresponds to some.

container_pressure_cpu_stalled_seconds_total
container_pressure_cpu_waiting_seconds_total
container_pressure_memory_stalled_seconds_total
container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
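
As a rough illustration of the naming scheme, the sketch below maps a resource and a kernel PSI kind to one of the metric names listed above and converts the kernel's microsecond total into seconds. The helper names `pressureMetricName` and `totalSeconds` are hypothetical and are not taken from this change.

```go
package main

import "fmt"

// pressureMetricName builds a metric name from a resource ("cpu",
// "memory", "io") and a kernel PSI kind ("some" or "full").
// Hypothetical helper for illustration.
func pressureMetricName(resource, kind string) (string, error) {
	var state string
	switch kind {
	case "some":
		state = "waiting"
	case "full":
		state = "stalled"
	default:
		return "", fmt.Errorf("unknown PSI kind %q", kind)
	}
	return fmt.Sprintf("container_pressure_%s_%s_seconds_total", resource, state), nil
}

// totalSeconds converts the kernel's cumulative stall time, reported in
// microseconds, into the seconds value expected by a *_seconds_total counter.
func totalSeconds(totalMicros uint64) float64 {
	return float64(totalMicros) / 1e6
}

func main() {
	for _, resource := range []string{"cpu", "memory", "io"} {
		for _, kind := range []string{"full", "some"} {
			name, err := pressureMetricName(resource, kind)
			if err != nil {
				panic(err)
			}
			fmt.Println(name)
		}
	}
	fmt.Println(totalSeconds(1500000))
}
```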

This change rebases the work done in #3083 and resolves its review comments.

This adds two new sets of metrics:
- `psi_total`: the total number of seconds a resource has been under pressure
- `psi_avg`: the ratio of time a resource has been under pressure over a
  sliding time window.

For more details about these definitions, see:
- https://www.kernel.org/doc/html/latest/accounting/psi.html
- https://facebookmicrosites.github.io/psi/docs/overview

Signed-off-by: Daniel Dao <[email protected]>
This adds support for reading PSI metrics via Prometheus. We expose the
following for `psi_total`:

```
container_cpu_psi_total_seconds
container_memory_psi_total_seconds
container_io_psi_total_seconds
```

And for `psi_avg`:

```
container_cpu_psi_avg10_ratio
container_cpu_psi_avg60_ratio
container_cpu_psi_avg300_ratio

container_memory_psi_avg10_ratio
container_memory_psi_avg60_ratio
container_memory_psi_avg300_ratio

container_io_psi_avg10_ratio
container_io_psi_avg60_ratio
container_io_psi_avg300_ratio
```

Signed-off-by: Daniel Dao <[email protected]>
Author

xinau commented Jan 26, 2025

@rexagod, @SuperQ Could you please give this a review and advise me on how to get this change merged?

issues: google#3052, google#3083, kubernetes/enhancements#4205

This change adds metrics for pressure stall information (PSI), which
indicate how long some or all tasks of a cgroup v2 have waited due to
resource congestion (cpu, memory, io). The change exposes this information
by including the _PSIStats_ of each controller in its stats, i.e.
_CPUStats.PSI_, _MemoryStats.PSI_ and _DiskStats.PSI_.

The information is additionally exposed as Prometheus metrics. The
metrics follow the naming used by prometheus/node_exporter, where
_stalled_ corresponds to _full_ and _waiting_ corresponds to _some_.

```
container_pressure_cpu_stalled_seconds_total
container_pressure_cpu_waiting_seconds_total
container_pressure_memory_stalled_seconds_total
container_pressure_memory_waiting_seconds_total
container_pressure_io_stalled_seconds_total
container_pressure_io_waiting_seconds_total
```

Signed-off-by: Felix Ehrenpfort <[email protected]>
xinau force-pushed the xinau/add-psi-metrics branch from 8b41ec5 to 103b4be on January 26, 2025 17:30
Review thread on cmd/go.mod (outdated, resolved)
Review thread on metrics/prometheus.go (outdated, resolved)
Contributor

SuperQ commented Jan 26, 2025

Looking great so far, the metric names and other conventions look fine.

Author

xinau commented Jan 26, 2025

@SuperQ Thanks for the quick review. I've added the improvements.

Review thread on metrics/prometheus.go (outdated, resolved)
@xinau xinau requested a review from SuperQ January 26, 2025 20:34
Author

xinau commented Jan 27, 2025

@SuperQ I'm going to take another look at the CPU PSI metrics today. It seems that the CPU PSI full metric can be non-zero. I stumbled upon this while reading kubernetes/enhancements#5062

Author

xinau commented Jan 27, 2025

@SuperQ I'm going to re-add the CPU full metric, as it's actively being reported by the kernel for cgroups.

* Naturally, the FULL state doesn't exist for the CPU resource at the
* system level, but exist at the cgroup level, means all non-idle tasks
* in a cgroup are delayed on the CPU resource which used by others outside
* of the cgroup or throttled by the cgroup cpu.max configuration.

See
https://lore.kernel.org/all/[email protected]/
https://lore.kernel.org/all/[email protected]/


rexagod commented Jan 27, 2025

Thank you for your work (and investigation) on this, @xinau!

Not sure, but after a quick look I can see we dropped container_%s_psi_avg%s_ratio here. Was this intentional?

Ah, never mind. I believe these can be derived.

Contributor

SuperQ commented Jan 27, 2025

@rexagod Yup. With Prometheus we can derive arbitrary averages as they're just rate(container_..._total[Xm]).
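
To make the derivation concrete, here is a simplified sketch of the arithmetic behind such a rate: the increase of the cumulative stall-seconds counter divided by the elapsed wall-clock time gives the fraction of time under pressure for that window. This is not how Prometheus implements `rate()` (which also extrapolates and works over many samples); the `sample` type and `pressureRatio` function are hypothetical names.

```go
package main

import "fmt"

// sample is one scrape of a *_seconds_total pressure counter.
// Hypothetical type for illustration.
type sample struct {
	UnixSeconds  float64 // scrape timestamp
	StallSeconds float64 // cumulative counter value at that time
}

// pressureRatio returns the fraction of time spent under pressure between
// two samples. It reports false when the window is empty or the counter
// went backwards (for example after a container restart).
func pressureRatio(older, newer sample) (float64, bool) {
	elapsed := newer.UnixSeconds - older.UnixSeconds
	delta := newer.StallSeconds - older.StallSeconds
	if elapsed <= 0 || delta < 0 {
		return 0, false
	}
	return delta / elapsed, true
}

func main() {
	older := sample{UnixSeconds: 0, StallSeconds: 10}
	newer := sample{UnixSeconds: 60, StallSeconds: 13}
	if ratio, ok := pressureRatio(older, newer); ok {
		// 3 stall seconds over a 60 second window.
		fmt.Printf("ratio over window: %.3f\n", ratio)
	}
}
```

Choosing the window at query time is what makes fixed avg10/avg60/avg300 gauges unnecessary on the exporter side.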

Signed-off-by: Felix Ehrenpfort <[email protected]>
Author

xinau commented Jan 27, 2025

@rexagod, @SuperQ all good from my side now.


@rexagod rexagod left a comment


@dims Could you please approve the pending workflow here, or ping someone who could? The patch builds on top of the original PR while additionally following the community guidelines, and looks good to go in.
