Ganglia GMond Python Modules
One of the new features of Ganglia 3.1.x is the ability to create C/Python metric gathering modules. These modules can be plugged directly into gmond to monitor user-specified metrics.
In previous versions (2.5.x, 3.0.x), the only way to add user-specified metrics was via a command line tool called gmetric, and the way to inject metrics into gmond was simply to run gmetric via a cronjob or some other process. While this works for most people, it makes user-specified metrics difficult to manage.
This document will dive into the specifics for writing a Python metric monitoring module.
The following are prerequisites for building/using Python module support:
- Ganglia 3.1.x
- Python 2.3.4+ (this is the oldest tested version which comes with Red Hat Enterprise Linux 4, older 2.3 versions should work as well)
- Python development headers (usually in the form of python-devel binary packages)
If you are trying to install Python metric modules support on an RPM-based system, install the ganglia-gmond-modules-python RPM. This includes everything needed for Python metric modules support to work.
On a Debian-based system, install the ganglia-monitor package:
apt-get install ganglia-monitor
Also see additional notes below.
If you are building from source, please make sure that you include the --with-python option during configure. If the Python interpreter is detected, this option will be added automatically.
To confirm that your Ganglia installation has Python support correctly set up, double check the following:
- gmond.conf has a line which reads something along the lines of include ("/etc/ganglia/conf.d/*.conf"). This is the directory where you should place configuration files for your Python modules as .pyconf files
- modpython.conf exists in /etc/ganglia/conf.d - it contains a directive which will include the pyconf files
- You have modpython.so in /usr/lib{64}/ganglia
- The directory /usr/lib{64}/ganglia/python_modules exists. This is the directory where Python modules should be placed as .py files.
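The file-system checks in the list above can be scripted. The following is a minimal sketch, not a tool shipped with Ganglia: check_python_support is a hypothetical helper, and the default directories are assumptions taken from the paths quoted above (both the /usr/lib and /usr/lib64 variants are tried). It does not inspect gmond.conf for the include line.

```python
import os


def check_python_support(conf_dir="/etc/ganglia/conf.d",
                         lib_dirs=("/usr/lib/ganglia", "/usr/lib64/ganglia")):
    """Return a dict mapping each checked item to True (found) or False.

    check_python_support is a hypothetical helper, not part of Ganglia.
    """
    def exists_in_any(relative_path, test):
        # True if the path exists under at least one candidate lib directory
        return any(test(os.path.join(d, relative_path)) for d in lib_dirs)

    return {
        "modpython.conf": os.path.isfile(os.path.join(conf_dir, "modpython.conf")),
        "modpython.so": exists_in_any("modpython.so", os.path.isfile),
        "python_modules": exists_in_any("python_modules", os.path.isdir),
    }


if __name__ == "__main__":
    for check, ok in sorted(check_python_support().items()):
        print("%-16s %s" % (check, "ok" if ok else "MISSING"))
```

Any item reported as MISSING points at the corresponding entry in the list above.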
These things should be automatically done for you if you installed Python modules support via binary packages. If that is not the case please file a bug at the distribution's corresponding bug tracker.
Ubuntu 10.10 does not come with Python support for gmond fully set up. You will need to:
- Create /etc/ganglia/conf.d/modpython.conf and make it look like https://sourceforge.net/apps/trac/ganglia/browser/trunk/monitor-core/gmond/modules/conf.d/modpython.conf.in, for instance:
    modules {
      module {
        name = "python_module"
        path = "/usr/lib(64)/ganglia/modpython.so"
        params = "/usr/lib(64)/ganglia/python_modules"
      }
    }
    include('/etc/ganglia/conf.d/*.pyconf')
- Create the directory /usr/lib(64)/ganglia/python_modules
- Ensure that /usr/lib(64)/ganglia/modpython.so already exists (Ubuntu 10.10 gets this one right when you install ganglia via apt)
Writing a Python module is very simple. You just need to write it following a template and put the resulting Python module (.py) in /usr/lib(64)/ganglia/python_modules. A corresponding Python Configuration (.pyconf) file needs to reside in /etc/ganglia/conf.d/.
If your Python module needs to access certain files on the server, keep in mind that the module will be executed as the user which runs gmond. In other words, if gmond runs as user nobody then your module will also run as nobody. So make sure that the user which runs gmond has the correct permissions to access the files in question.
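One way to check this ahead of time is to run a small script as the gmond user and see which files it cannot read. This is only a sketch: unreadable_paths is a hypothetical helper, and the account name in the comment (nobody) is just the example used above.

```python
import os
import sys


def unreadable_paths(paths):
    """Return the subset of paths that the current user cannot read."""
    return [p for p in paths if not os.access(p, os.R_OK)]


if __name__ == "__main__":
    # Run this as the gmond user (for example via sudo -u nobody),
    # passing the files your module needs as command line arguments.
    for path in unreadable_paths(sys.argv[1:]):
        print("cannot read: %s" % path)
```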
The Ganglia distribution comes with an example Python module in /usr/lib(64)/ganglia/python_modules/example.py. Alternatively, this file is also viewable from our SVN repository: http://ganglia.svn.sourceforge.net/viewvc/ganglia/branches/monitor-core-3.1/gmond/python_modules/example/example.py?view=markup. There are many more modules you can look at for inspiration in the github repo: https://github.com/ganglia/gmond_python_modules.
Let's look at a real-life example of a Python module which monitors the temperature of the host by reading a file in the /proc file system. We'll call it temp.py:
acpi_file = "/proc/acpi/thermal_zone/THRM/temperature"

def temp_handler(name):
    '''Return the host temperature, or 0 if it cannot be read.'''
    try:
        f = open(acpi_file, 'r')
    except IOError:
        return 0
    try:
        for l in f:
            line = l.split()
            # The value is the second whitespace-separated field
            if len(line) > 1:
                return int(line[1])
    finally:
        f.close()
    return 0

def metric_init(params):
    '''Build and return the metric descriptors for this module.'''
    global descriptors, acpi_file

    # Allow the pyconf to override the default file location
    if 'acpi_file' in params:
        acpi_file = params['acpi_file']

    d1 = {'name': 'temp',
          'call_back': temp_handler,
          'time_max': 90,
          'value_type': 'uint',
          'units': 'C',
          'slope': 'both',
          'format': '%u',
          'description': 'Temperature of host',
          'groups': 'health'}

    descriptors = [d1]
    return descriptors

def metric_cleanup():
    '''Clean up the metric module.'''
    pass

# This code is for debugging and unit testing
if __name__ == '__main__':
    metric_init({})
    for d in descriptors:
        v = d['call_back'](d['name'])
        print('value for %s is %u' % (d['name'], v))
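The handler takes the second whitespace-separated field of the file's first line. The sketch below illustrates that parsing step in isolation, assuming the file holds a line of the form "temperature: 45 C" (an assumption about the ACPI file layout, which may differ on your system). parse_temperature is a hypothetical helper and is not part of temp.py.

```python
def parse_temperature(text):
    """Return the integer temperature from ACPI-style text, or 0 if absent.

    parse_temperature is a hypothetical helper used only for illustration.
    """
    for line in text.splitlines():
        fields = line.split()
        # Expect fields such as ['temperature:', '45', 'C']
        if len(fields) >= 2 and fields[1].isdigit():
            return int(fields[1])
    return 0


sample = "temperature:             45 C\n"
print(parse_temperature(sample))  # prints 45
```

Returning 0 for unparseable input mirrors what the module does when the file cannot be opened.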
There are three functions that must exist in every python metric module. These functions are:
- def metric_init(params):
- def metric_cleanup():
- def metric_handler(name):
While the first two functions above must exist explicitly (i.e. they must be named as specified above), the metric_handler() function can actually be named anything. The functions are explored in detail below.
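As a rough skeleton, a module with these three functions might look like the sketch below. The metric name, the callback name constant_handler and the returned value are all placeholders chosen for illustration; the descriptor elements are the same ones used in temp.py.

```python
descriptors = []


def constant_handler(name):
    """Callback: return the current value for the metric called name."""
    return 42


def metric_init(params):
    """Called once when gmond starts; returns the metric descriptors."""
    global descriptors
    descriptors = [{
        'name': 'example_constant',  # hypothetical metric name
        'call_back': constant_handler,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'N',
        'slope': 'both',
        'format': '%u',
        'description': 'Constant example metric',
        'groups': 'example',
    }]
    return descriptors


def metric_cleanup():
    """Called once when gmond shuts down."""
    pass


if __name__ == '__main__':
    # Exercise the module outside gmond
    for d in metric_init({}):
        print('%s = %s' % (d['name'], d['call_back'](d['name'])))
```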
The metric_init function must exist and be explicitly named 'metric_init' in your module. It will be called once at initialization time - that is, once when gmond starts up. It can be used to do any kind of initialization that the module requires in order to properly gather the intended metric.
metric_init() also takes a single dictionary type parameter which contains configuration directives that were designated for this module in the gmond.conf file. In addition to any other initialization that is done, the function must also create, populate and return a metric description dictionary or a list of such dictionaries. Each description dictionary must contain the following elements:
- name: name of the metric
- call_back: The function in your module to call when collecting metric data
  - If your metric module supports multiple metrics, each being defined through their own metric descriptor, your module may actually implement more than one metric_handler function.
- time_max: maximum time in seconds between metric collection calls
  - The exact nature of this element is unclear, as is its relationship to the 'collect_every' configuration directive in your pyconf for the module. For all intents and purposes, this element seems... useless.
- value_type: string | uint | float | double
- units: unit of your metric
- slope: zero | positive | negative | both
  - This value maps to the data source types defined for RRDTool
    - If 'positive', RRD file generated will be of COUNTER type (calculating the rate of change)
    - If 'negative', ????
    - 'both' will be of GAUGE type (no calculations are performed, graphing only the value reported)
    - If 'zero', the metric will appear in the "Time and String Metrics" or the "Constant Metrics" depending on the value_type of the metric
- format: format string of your metric
  - Must correspond to value_type otherwise value of your metric will be unpredictable (reference: http://docs.python.org/library/stdtypes.html#string-formatting)
- description: description of your metric
  - Visible in web frontend if you hover over host metric graph
- groups (optional): groups your metric belongs to
  - The group(s) in the web frontend with which this metric will be associated
These elements are basically the same type of data that must be supplied to the gmetric commandline utility with the exception of the call_back function. See the gmetric help document for more information.
The metric descriptor can also include additional attributes and values which will be attached to the metric metadata as extra data. The extra data will be ignored by Ganglia itself but can be used by the web front end as additional display or metric handling data. (The SPOOF_HOST and SPOOF_NAME extra attributes are examples that will be described in a later version.)
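Because a mismatch between value_type and format is easy to introduce, it can help to sanity-check descriptors before deploying a module. The sketch below is one possible check, not something Ganglia provides: descriptor_problems is a hypothetical helper, and the pairing of value types with printf-style conversion characters is an assumption based on the usual conventions.

```python
REQUIRED_KEYS = ('name', 'call_back', 'time_max', 'value_type',
                 'units', 'slope', 'format', 'description')
SLOPES = ('zero', 'positive', 'negative', 'both')

# Expected final conversion character per value_type (an assumption,
# not a list taken from Ganglia itself).
CONVERSIONS = {'string': 's', 'uint': 'u', 'float': 'f', 'double': 'f'}


def descriptor_problems(d):
    """Return a list of human-readable problems found in descriptor d."""
    problems = ['missing element: %s' % k for k in REQUIRED_KEYS if k not in d]
    if problems:
        return problems
    if not callable(d['call_back']):
        problems.append('call_back is not callable')
    if d['slope'] not in SLOPES:
        problems.append('unknown slope: %s' % d['slope'])
    expected = CONVERSIONS.get(d['value_type'])
    if expected is None:
        problems.append('unknown value_type: %s' % d['value_type'])
    elif not d['format'].endswith(expected):
        problems.append('format %s does not suit value_type %s'
                        % (d['format'], d['value_type']))
    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that gmond will accept the descriptor.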
The metric_cleanup function must exist and be explicitly named 'metric_cleanup' in your module. It will be called only once, when gmond is shutting down. Any module clean-up code can be executed here, and the function must not return a value.
The 'metric_handler' function can actually be called anything you want, as long as it matches the name of the function you defined in the corresponding 'call_back' element in your metric descriptor. It takes one parameter, 'name', which is the value defined in the 'name' element in your metric descriptor.
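Since the handler receives the metric name, a single callback can serve several metrics by dispatching on that name. The sketch below illustrates the idea under stated assumptions: the metric names, the directories and the entries_handler function are all hypothetical.

```python
import os

# Map each metric name to the directory it reports on.
# Both the metric names and the paths are hypothetical.
DIRECTORIES = {
    'tmp_entries': '/tmp',
    'var_tmp_entries': '/var/tmp',
}


def entries_handler(name):
    """Shared callback: name tells us which metric gmond is asking for."""
    try:
        return len(os.listdir(DIRECTORIES[name]))
    except OSError:
        return 0


def metric_init(params):
    """Return one descriptor per metric, all pointing at the same callback."""
    return [{
        'name': name,
        'call_back': entries_handler,
        'time_max': 90,
        'value_type': 'uint',
        'units': 'entries',
        'slope': 'both',
        'format': '%u',
        'description': 'Number of entries in %s' % DIRECTORIES[name],
        'groups': 'example',
    } for name in sorted(DIRECTORIES)]


def metric_cleanup():
    """Nothing to clean up in this sketch."""
    pass
```

The alternative is one handler function per metric, each named in its own descriptor's call_back element; which is clearer depends on how much logic the metrics share.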
The corresponding config file for the module, temp.pyconf, lives in /etc/ganglia/conf.d/temp.pyconf and looks like this:
modules {
  module {
    name = "temp"
    language = "python"
    # The following params are examples only
    # They are not actually used by the temp module
    param RandomMax {
      value = 600
    }
    param ConstantValue {
      value = 112
    }
  }
}
collection_group {
  collect_every = 10
  time_threshold = 50
  metric {
    name = "temp"
    title = "Temperature"
    value_threshold = 70
  }
}
The above configuration file contains two major sections with various sub-sections: modules and collection_group.
The modules section contains configuration data that is specific to each module being loaded. It may contain either a single module sub-section or multiple sub-sections. Within each module sub-section is the name of the metric module, the language in which the module was written and zero or more module-specific param sub-sections.
The name of the module corresponds to the filename of the module you created (without the ".py").
Unless you've written your module in C/C++, you MUST explicitly declare the language of your module in the pyconf. Declaring 'python' as your language instructs gmond to look in the python_modules directory for your module.
Each param sub-section has a name and a value. These name/value pairs are passed to your module's metric_init() function as a dictionary, where the name of the parameter is the key and its value is the value. You can therefore access your custom params with something like this:
RandomMax = 500

def metric_init(params):
    global RandomMax

    if 'RandomMax' in params:
        # Values from the pyconf arrive as strings, so convert before use
        RandomMax = int(params['RandomMax'])
    ...
The rest of the configuration file follows the same format as for any other collection_group or metric. Looking at the man page for gmond.conf is particularly instructive, but we'll go over the example collection_group directives here:
collect_every tells gmond the frequency (in seconds) with which to collect data from the metrics defined in this collection_group. In the example, the 'temp' metric will be collected every 10 seconds.
You can also instruct gmond to collect 'static metrics', which should be collected only once (at gmond startup), with collect_once=yes. This is useful for things that shouldn't change on the server between reboots (e.g. number of CPUs).
time_threshold is the maximum interval (in seconds) between reports of metric data to Ganglia. In the case of the example, the temp module will report to Ganglia at least every 50 seconds.
This directive is superseded when the value of a collected metric changes by more than the metric's defined 'value_threshold' (see below).
The metric sub-section is where you define metric-specific settings:
- name: The name of a specific metric, as defined in the descriptor dictionary in your module
- title: An optional human-readable title for your metric that will be displayed in the Ganglia front-end
- value_threshold: If the difference between the newly collected value of your metric and the previously collected one (in the units defined in your metric descriptor) is greater than the value defined here, the metric will be reported to Ganglia regardless of the 'time_threshold' defined for the collection_group
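The interplay of collect_every, time_threshold and value_threshold can be illustrated with a small simulation. This is a deliberately simplified model and not gmond's actual code: it assumes the rule described in the gmond.conf man page, where the change between consecutive collected values is compared with value_threshold, and it ignores any other scheduling gmond performs. The function names and sample values are made up for illustration.

```python
def should_send(now, last_sent, last_value, new_value,
                time_threshold, value_threshold):
    """Simplified model of when a collected value is reported."""
    if now - last_sent >= time_threshold:
        return True
    return abs(new_value - last_value) > value_threshold


def simulate(samples, collect_every=10, time_threshold=50, value_threshold=70):
    """Return the collection times (in seconds) at which a report is sent."""
    sent = []
    last_sent = None
    last_value = None
    for i, value in enumerate(samples):
        now = i * collect_every
        # The first collected value is always reported in this model
        if last_sent is None or should_send(now, last_sent, last_value, value,
                                            time_threshold, value_threshold):
            sent.append(now)
            last_sent = now
        last_value = value
    return sent


# Temperature jumps from 42 to 120 at t=30, then stays roughly flat
print(simulate([40, 41, 42, 120, 121, 122, 123, 124, 125]))  # prints [0, 30, 80]
```

With the example settings, the jump at 30 seconds triggers an early report, and the next report follows 50 seconds later when time_threshold expires.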
Additional information about Python modules can be found in the README file: http://ganglia.svn.sourceforge.net/viewvc/ganglia/branches/monitor-core-3.1/gmond/modules/python/README.in?view=markup
Some helpful user-contributed resources: