Riemann Integration
Riemann is an event stream processor that accepts metric events and processes them in real time. It can be used to apply thresholds and raise alerts on Ganglia metric data at scale.
Ganglia metrics map to Riemann events as follows:
Ganglia | Riemann |
---|---|
grid | grid |
cluster | cluster |
host | host* |
ip | ip |
metric | service* |
value(int,float) | metric* |
type | (internal) |
units | description* |
value(string) | state* |
reported | time* |
tags(comma-sep) | tags* |
location | location |
tmax | ttl* |
Note: attributes marked with an asterisk (*) are standard Riemann event fields; the others are sent as custom event attributes.
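The mapping is a field-by-field translation, with the Ganglia type deciding whether the value lands in the Riemann metric or state field. The Python sketch below illustrates it under stated assumptions: each Ganglia metric is taken to be a dict of XML attribute strings keyed by the names in the left column, and the Ganglia type names are assumed to be int8/uint8/int16/uint16/int32/uint32 for integers, string for strings, and float/double otherwise. The function name is hypothetical and this is not gmetad's C implementation, which may differ in detail (for example in which numeric Riemann field it fills).

```python
# Illustrative sketch only: the dict layout and type names are assumptions,
# not gmetad's actual C structures.
INT_TYPES = {"int8", "uint8", "int16", "uint16", "int32", "uint32"}


def ganglia_to_riemann(g):
    """Map one Ganglia metric (dict of XML attribute strings) to a Riemann-style event dict."""
    event = {
        "grid": g["grid"],
        "cluster": g["cluster"],
        "host": g["host"],
        "ip": g["ip"],
        "service": g["metric"],          # metric name -> service
        "description": g["units"],       # units -> description
        "time": int(g["reported"]),      # reported -> time
        "ttl": float(g["tmax"]),         # tmax -> ttl
        "location": g["location"],
        # comma-separated Ganglia tags -> list of Riemann tags
        "tags": [t.strip() for t in g.get("tags", "").split(",") if t.strip()],
    }
    # The Ganglia type is only used internally, to decide where the value goes.
    if g["type"] == "string":
        event["state"] = g["value"]
    elif g["type"] in INT_TYPES:
        event["metric"] = int(g["value"])
    else:
        event["metric"] = float(g["value"])
    return event
```

A string metric such as gexec therefore produces an event with a state and no metric, while part_max_used produces a numeric metric and no state.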
The native Riemann protocol is based on Google's Protocol Buffers, so the protobuf tools and libraries need to be installed before gmetad will build with Riemann support.
For Debian ...
$ sudo apt-get install autoconf automake libtool
$ sudo apt-get install libapr1-dev libconfuse-dev librrd-dev
$ sudo apt-get install protobuf-c-compiler libprotobuf-c0-dev # <= required for Riemann
On Red Hat, CentOS ...
$ sudo yum install autoconf automake libtool
$ sudo yum install apr-devel libconfuse-devel rrdtool-devel expat-devel pcre-devel
$ sudo yum install protobuf-compiler protobuf-c-devel # <= required for Riemann
Note: the protobuf RPM packages are provided by the EPEL repository, so add the EPEL repo first.
Then configure and build using the --with-riemann option ...
$ ./bootstrap
$ ./configure --with-gmetad --with-riemann
$ make
$ sudo make install
Riemann is written in Clojure, which runs on the JVM, so an up-to-date Java runtime is required:
$ sudo apt-get install openjdk-7-jre # => Debian, Ubuntu
$ sudo yum install java-1.7.0-openjdk # => RedHat, CentOS
To get a minimal installation of Riemann running, use the following commands:
$ wget http://aphyr.com/riemann/riemann-0.2.2.tar.bz2
$ tar xvfj riemann-0.2.2.tar.bz2
$ cd riemann-0.2.2
$ sudo mkdir /var/log/riemann
$ bin/riemann etc/riemann.config
The following basic gmetad.conf configuration will forward Ganglia metrics to Riemann running on my.riemann.box and add the two attributes customer and environment to all metrics:
riemann_server "my.riemann.box"
riemann_port 5555
riemann_protocol udp
riemann_attributes "customer=Acme Corp,environment=PROD"
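The riemann_attributes value is a comma-separated list of key=value pairs. The sample output further down, where every event carries :customer "Acme Corp" and :environment "PROD", suggests that spaces inside a value are preserved. A hypothetical Python parser with that behaviour is sketched below; gmetad's own C parser may treat surrounding whitespace or malformed pairs differently.

```python
def parse_riemann_attributes(spec):
    """Hypothetical parser for a riemann_attributes value such as
    "customer=Acme Corp,environment=PROD" (not gmetad's actual C code)."""
    attributes = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue  # tolerate empty entries such as a trailing comma
        key, sep, value = pair.partition("=")  # split at the first '=' only
        if not sep:
            raise ValueError(f"attribute {pair!r} has no '='")
        attributes[key.strip()] = value.strip()  # inner spaces are kept
    return attributes
```

Applied to the configuration above, it yields the two pairs customer = "Acme Corp" and environment = "PROD".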
Modify the example riemann.config file that comes with the installation as follows:
; -*- mode: clojure; -*-
; vim: filetype=clojure
(logging/init :file "riemann.log")
; Listen on the local interface over TCP (5555), UDP (5555), and websockets
; (5556)
(let [host "127.0.0.1"]
  (tcp-server :host host)
  (udp-server :host host)
  (ws-server :host host))
; Expire old events from the index every 5 seconds.
(periodically-expire 5)
; Keep events in the index for 5 minutes by default.
(let [index (default :ttl 300 (update-index (index)))]
  ; Inbound events will be passed to these streams:
  (streams
    ; Index all events immediately.
    index
    ; Calculate an overall rate of events.
    (with {:metric 1 :host nil :state "ok" :service "events/sec"}
      (rate 5 index))
    ; Log expired events.
    (expired
      (fn [event] (info "expired" event)))
    ; Ganglia-Riemann demo config.
    (where (not (state "expired"))
      ; Compare Ganglia agent heartbeats against a sliding time window.
      (match :service "heartbeat"
        (splitp < (- (unix-time) metric)
          120 (with :state "critical" prn)
          60 (with :state "major" prn)
          (with :state "normal" prn)))
      ; Alert on different values of a string metric (using state).
      (match :service "gexec"
        (where (= state "OFF")
          (with {:state "warning" :description "gexec is OFF"} prn)
          (else (with {:state "normal" :description "gexec is ON"} prn))))
      ; Various thresholds against disk space utilisation.
      (match :service "part_max_used"
        (splitp < metric
          95 (with :state "critical" prn)
          90 (with :state "major" prn)
          80 (with :state "minor" prn)
          (with :state "normal" prn)))
    )
))
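Note: the only change from the example riemann.config is the Ganglia-Riemann demo section near the end. Make sure the brackets still balance after copying and pasting. Because the servers bind to 127.0.0.1, Riemann accepts only local connections; to receive events from gmetad running on another host, set host to an address reachable from that host (or "0.0.0.0" to listen on all interfaces).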
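For readers less familiar with Clojure, the demo rules can be restated as plain conditionals. splitp tries its clauses in order and picks the first threshold v for which (< v expr) is true, that is, the first threshold the value exceeds; the last, unpaired clause is the default. The Python function below is a hypothetical re-expression of the three rules for illustration only. It is not part of Riemann and simply returns the state the config would assign.

```python
def demo_state(event, now):
    """Return the state the demo config would assign, or None if no rule applies.

    Hypothetical restatement of the Clojure rules; `now` plays the role of (unix-time).
    """
    if event.get("state") == "expired":
        return None                          # (where (not (state "expired")) ...)
    service = event.get("service")
    if service == "heartbeat":
        age = now - event["metric"]          # seconds since the heartbeat timestamp
        if age > 120:
            return "critical"
        if age > 60:
            return "major"
        return "normal"
    if service == "gexec":
        return "warning" if event.get("state") == "OFF" else "normal"
    if service == "part_max_used":
        for limit, state in ((95, "critical"), (90, "major"), (80, "minor")):
            if event["metric"] > limit:      # splitp with <: first limit exceeded wins
                return state
        return "normal"
    return None
```

Because the comparisons are strict, a partition at exactly 90% used is classified as minor rather than major.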
To test the configuration, run the following command in a terminal session ...
$ bin/riemann etc/riemann.config
You should see output similar to the following ...
#riemann.codec.Event{:host "localhost", :service "heartbeat", :state "normal", :description "seconds", :metric 1384182379, :tags nil, :time 692091191493/500, :ttl 80.0, :environment "PROD", :customer "Acme Corp", :location "unspecified", :ip "::1", :cluster "unspecified", :grid "unspecified"}
#riemann.codec.Event{:host "localhost", :service "gexec", :state "warning", :description "gexec is OFF", :metric nil, :tags nil, :time 692091191551/500, :ttl 300.0, :environment "PROD", :customer "Acme Corp", :location "unspecified", :ip "::1", :cluster "unspecified", :grid "unspecified"}
#riemann.codec.Event{:host "localhost", :service "part_max_used", :state "normal", :description "%", :metric 11.6, :tags nil, :time 692091191559/500, :ttl 180.0, :environment "PROD", :customer "Acme Corp", :location "unspecified", :ip "::1", :cluster "unspecified", :grid "unspecified"}
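The :time values print as Clojure ratios. They appear to be millisecond timestamps divided by 1000 and reduced (692091191493/500 equals 1384182382986/1000), which is an inference from the numbers rather than something taken from Riemann's documentation. This small Python sketch converts the first event's :time to seconds and compares it with the heartbeat metric:

```python
from fractions import Fraction


def riemann_time_to_seconds(ratio_text):
    """Convert a printed Clojure ratio such as "692091191493/500" to float seconds."""
    return float(Fraction(ratio_text))


event_time = riemann_time_to_seconds("692091191493/500")
# Approximate heartbeat age: event :time minus the heartbeat :metric timestamp.
heartbeat_age = event_time - 1384182379
```

The resulting age of roughly 4 seconds is well under the 60-second threshold, consistent with the "normal" state in the first line of output. Strictly, the config compares against Riemann's clock at processing time, (unix-time), rather than the event's :time field, so this is only an approximation.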
For more information regarding Riemann, please refer to the Riemann website.
This work relied heavily on the C examples of Google Protocol Buffers.
The Riemann Protocol Buffers .proto file was originally sourced from the Riemann GitHub repo.