check_systemd
is a Nagios /
Icinga monitoring plugin to check
systemd. This Python script will report a
degraded system to your monitoring solution. It can also be used to
monitor individual systemd services (with the -u, --unit
parameter)
and timers units (with the -t, --dead-timers
parameter). The only
dependency the plugin needs is the Python library
nagiosplugin.
pip3 install check_systemd
- Debian (package, source code): in unstable
- NixOS
(package,
source
code):
nix-env -iA nixos.check_systemd
- Fedora
(package,
source code):
dnf install nagios-plugins-systemd
usage: check_systemd [-h] [-v] [-V] [-I REGEXP] [-u UNIT_NAME] [--include-type UNIT_TYPE [UNIT_TYPE ...]] [-e REGEXP] [--exclude-unit UNIT_NAME [UNIT_NAME ...]] [--exclude-type UNIT_TYPE] [--required REQUIRED_STATE] [-t] [-W SECONDS] [-C SECONDS] [-n] [-w SECONDS] [-c SECONDS] [--dbus | --cli] [--user] [-P | -p] Copyright (c) 2014-18 Andrea Briganti <[email protected]> Copyright (c) 2019-23 Josef Friedrich <[email protected]> Nagios / Icinga monitoring plugin to check systemd. options: -h, --help show this help message and exit -v, --verbose Increase output verbosity (use up to 3 times). -V, --version show program's version number and exit Options related to unit selection: By default all systemd units are checked. Use the option '-e' to exclude units by a regular expression. Use the option '-u' to check only one unit. -I REGEXP, --include REGEXP Include systemd units to the checks. This option can be applied multiple times, for example: -I mnt-data.mount -I task.service. Regular expressions can be used to include multiple units at once, for example: -i 'user@\d+\.service'. For more informations see the Python documentation about regular expressions (https://docs.python.org/3/library/re.html). -u UNIT_NAME, --unit UNIT_NAME, --include-unit UNIT_NAME Name of the systemd unit that is being tested. --include-type UNIT_TYPE [UNIT_TYPE ...] One or more unit types (for example: 'service', 'timer') -e REGEXP, --exclude REGEXP Exclude a systemd unit from the checks. This option can be applied multiple times, for example: -e mnt- data.mount -e task.service. Regular expressions can be used to exclude multiple units at once, for example: -e 'user@\d+\.service'. For more informations see the Python documentation about regular expressions (https://docs.python.org/3/library/re.html). --exclude-unit UNIT_NAME [UNIT_NAME ...] Name of the systemd unit that is being tested. --exclude-type UNIT_TYPE One or more unit types (for example: 'service', 'timer') --required REQUIRED_STATE Set the state that the systemd unit must have (for example: active, inactive) Timers related options: -t, --timers, --dead-timers Detect dead / inactive timers. See the corresponding options '-W, --dead-timer-warning' and '-C, --dead- timers-critical'. Dead timers are detected by parsing the output of 'systemctl list-timers'. Dead timer rows displaying 'n/a' in the NEXT and LEFT columns and the time span in the column PASSED exceeds the values specified with the options '-W, --dead-timer-warning' and '-C, --dead-timers-critical'. -W SECONDS, --timers-warning SECONDS, --dead-timers-warning SECONDS Time ago in seconds for dead / inactive timers to trigger a warning state (by default 6 days). -C SECONDS, --timers-critical SECONDS, --dead-timers-critical SECONDS Time ago in seconds for dead / inactive timers to trigger a critical state (by default 7 days). Startup time related options: -n, --no-startup-time Don’t check the startup time. Using this option the options '-w, --warning' and '-c, --critical' have no effect. Performance data about the startup time is collected, but no critical, warning etc. states are triggered. -w SECONDS, --warning SECONDS Startup time in seconds to result in a warning status. The default is 60 seconds. -c SECONDS, --critical SECONDS Startup time in seconds to result in a critical status. The default is 120 seconds. Monitoring data acquisition: --dbus Use the systemd’s D-Bus API instead of parsing the text output of various systemd related command line interfaces to monitor systemd. At the moment the D-Bus backend of this plugin is only partially implemented. --cli Use the text output of serveral systemd command line interface (cli) binaries to gather the required data for the monitoring process. --user Also show user (systemctl --user) units. Performance data: -P, --performance-data Attach no performance data to the plugin output. -p, --no-performance-data Attach performance data to the plugin output. Performance data: - count_units - startup_time - units_activating - units_active - units_failed - units_inactive
- on github.com
- on icinga.com
- on nagios.org
To detect failed units this monitoring script runs:
systemctl list-units --all
To get the startup time it executes:
systemd-analyze
To find dead timers this plugin launches:
systemctl list-timers --all
To learn how systemd
produces the text output on the command line,
it is worthwhile to take a look at systemd
’s source code. Files
relevant for text output are:
basic/time-util.c,
analyze/analyze.c.
pyenv install 3.6.12 pyenv install 3.7.9 pyenv local 3.6.12 3.7.9 pip3 install tox tox
Test a single test case:
tox -e py38 -- test/test_scope_timers.py:TestScopeTimers.test_all_n_a
Edit the version number in check_systemd.py (without v
). Use the
-s
option to sign the tag (required for the Debian package).
git tag -s v2.0.11 git push --tags