Riemann Integration

Nick Satterly edited this page Jul 16, 2014 · 4 revisions

Integrating Ganglia with Riemann

Introduction

Riemann is a powerful event stream processor that accepts metric events and processes them in real time. It can be used to apply thresholds to Ganglia metric data and alert on it, at scale.

Ganglia metrics map to Riemann events as follows:

Ganglia             Riemann
------------------  ------------
grid                grid
cluster             cluster
host                host*
ip                  ip
metric              service*
value (int, float)  metric*
type                (internal)
units               description*
value (string)      state*
reported            time*
tags (comma-sep)    tags*
location            location
tmax                ttl*

Note: attributes with a star (*) are standard Riemann fields.
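
To make the mapping concrete, here is a minimal Python sketch that turns a Ganglia metric record into a Riemann-style event dictionary. The function name `ganglia_to_riemann` and the input dictionary keys are hypothetical and chosen only for illustration; gmetad performs the real conversion internally, and details such as how it fills `state` for numeric metrics are assumptions here.

```python
def ganglia_to_riemann(m):
    """Map a Ganglia metric (dict) to a Riemann-style event (dict).

    Illustrative only: follows the mapping table, not gmetad's actual code.
    """
    event = {
        "grid": m.get("grid"),
        "cluster": m.get("cluster"),
        "host": m.get("host"),
        "ip": m.get("ip"),
        "service": m["metric"],           # Ganglia metric name -> Riemann service
        "description": m.get("units"),    # units -> description
        "time": m.get("reported"),        # reported -> time
        "tags": [t for t in (m.get("tags") or "").split(",") if t],
        "location": m.get("location"),
        "ttl": m.get("tmax"),             # tmax -> ttl (copied as-is in this sketch)
    }
    value = m.get("value")
    if isinstance(value, str):
        # String values travel in the state field.
        event["state"] = value
        event["metric"] = None
    else:
        # Numeric values travel in the metric field (state left unset: assumption).
        event["metric"] = value
        event["state"] = None
    return event
```

With this sketch, a numeric metric such as `part_max_used` ends up in `metric`, while a string metric such as `gexec` ends up in `state`.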

Building Gmetad with Riemann Support

The native Riemann protocol is based on Google's Protocol Buffers, so the protobuf tools and libraries must be installed before gmetad will build with Riemann support.

On Debian ...

$ sudo apt-get install autoconf automake libtool
$ sudo apt-get install libapr1-dev libconfuse-dev librrd-dev
$ sudo apt-get install protobuf-c-compiler libprotobuf-c0-dev  # <= required for Riemann

On RedHat, CentOS ...

$ sudo yum install autoconf automake libtool
$ sudo yum install apr-devel libconfuse-devel rrdtool-devel expat-devel pcre-devel
$ sudo yum install protobuf-compiler protobuf-c-devel  # <= required for Riemann

Note: Enable the EPEL repository to get the protobuf RPM packages.

Then configure and build using the --with-riemann option ...

$ ./bootstrap
$ ./configure --with-gmetad --with-riemann
$ make
$ sudo make install

Setting up Riemann

Riemann is written in Clojure, which runs on the JVM, so an up-to-date version of Java is required:

$ sudo apt-get install openjdk-7-jre  # => Debian, Ubuntu

$ sudo yum install java-1.7.0-openjdk  # => RedHat, CentOS

To get a minimal installation of Riemann running, use the following commands:

$ wget http://aphyr.com/riemann/riemann-0.2.2.tar.bz2
$ tar xvfj riemann-0.2.2.tar.bz2
$ cd riemann-0.2.2
$ sudo mkdir /var/log/riemann
$ bin/riemann etc/riemann.config

Ganglia Configuration

The following basic gmetad.conf configuration will forward Ganglia metrics to Riemann running on my.riemann.box and add the two attributes customer and environment to all metrics:

riemann_server "my.riemann.box"
riemann_port 5555
riemann_protocol udp
riemann_attributes "customer=Acme Corp,environment=PROD"
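
The riemann_attributes value is a comma-separated list of key=value pairs. As a rough illustration of how such a string breaks down into per-event attributes, here is a small Python sketch. The helper name `parse_attributes` is hypothetical, and this is not gmetad's actual parser; it only shows the result one would expect, with values allowed to contain spaces.

```python
def parse_attributes(spec):
    """Split 'k1=v1,k2=v2' into a dict; values may contain spaces."""
    attrs = {}
    for pair in spec.split(","):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in attribute: {pair!r}")
        attrs[key.strip()] = value.strip()
    return attrs


print(parse_attributes("customer=Acme Corp,environment=PROD"))
# → {'customer': 'Acme Corp', 'environment': 'PROD'}
```

Each key then shows up as an extra field on every forwarded event, for example :customer "Acme Corp" and :environment "PROD".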

Riemann Configuration

Modify the example riemann.config file that comes with the installation as follows:

; -*- mode: clojure; -*-
; vim: filetype=clojure

(logging/init :file "riemann.log")

; Listen on the local interface over TCP (5555), UDP (5555), and websockets
; (5556)
(let [host "127.0.0.1"]
  (tcp-server :host host)
  (udp-server :host host)
  (ws-server  :host host))

; Expire old events from the index every 5 seconds.
(periodically-expire 5)

; Keep events in the index for 5 minutes by default.
(let [index (default :ttl 300 (update-index (index)))]

  ; Inbound events will be passed to these streams:
  (streams

    ; Index all events immediately.
    index

    ; Calculate an overall rate of events.
    (with {:metric 1 :host nil :state "ok" :service "events/sec"}
      (rate 5 index))

    ; Log expired events.
    (expired
      (fn [event] (info "expired" event)))

    ; Ganglia-Riemann demo config.
    (where (not (state "expired"))

      ; Compare ganglia agent heartbeats against sliding time window
      (match :service "heartbeat"
        (splitp < (- (unix-time) metric)
          120 (with :state "critical" prn)
          60 (with :state "major" prn)
          (with :state "normal" prn)))

      ; Alert on different values of string metric (using state).
      (match :service "gexec"
        (where (= state "OFF")
          (with {:state "warning" :description "gexec is OFF"} prn)
          (else (with {:state "normal" :description "gexec is ON"} prn))))

      ; Various thresholds against disk space utilisation.
      (match :service "part_max_used"
        (splitp < metric
          95 (with :state "critical" prn)
          90 (with :state "major" prn)
          80 (with :state "minor" prn)
          (with :state "normal" prn)))
    )
))

Note: The only difference from the example file is the addition of the Ganglia-Riemann demo config near the end. Make sure all brackets still match after copying and pasting.
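
The splitp clauses in the config above pick the first threshold that the value exceeds and fall through to a default otherwise. The following Python sketch mirrors that logic for the disk space and heartbeat rules. It is an illustration of the thresholds only, not Riemann code, and the function names `splitp_lt`, `disk_state` and `heartbeat_state` are made up for this example.

```python
def splitp_lt(value, clauses, default):
    """Return the result of the first (threshold, result) clause
    for which threshold < value, else the default."""
    for threshold, result in clauses:
        if threshold < value:
            return result
    return default


def disk_state(percent_used):
    # Mirrors the part_max_used rule: >95 critical, >90 major, >80 minor.
    return splitp_lt(
        percent_used,
        [(95, "critical"), (90, "major"), (80, "minor")],
        "normal",
    )


def heartbeat_state(now, last_heartbeat):
    # Mirrors the heartbeat rule: age in seconds since the last heartbeat.
    age = now - last_heartbeat
    return splitp_lt(age, [(120, "critical"), (60, "major")], "normal")
```

Note that the comparison is strict: in this sketch a partition that is exactly 95% full is classed as "major", not "critical", because 95 < 95 is false.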

Testing & Debug

To test the above configuration run the following command in a terminal session ...

$ bin/riemann etc/riemann.config

You should see output similar to the following ...

#riemann.codec.Event{:host "localhost", :service "heartbeat", :state "normal", :description "seconds", :metric 1384182379, :tags nil, :time 692091191493/500, :ttl 80.0, :environment "PROD", :customer "Acme Corp", :location "unspecified", :ip "::1", :cluster "unspecified", :grid "unspecified"}
#riemann.codec.Event{:host "localhost", :service "gexec", :state "warning", :description "gexec is OFF", :metric nil, :tags nil, :time 692091191551/500, :ttl 300.0, :environment "PROD", :customer "Acme Corp", :location "unspecified", :ip "::1", :cluster "unspecified", :grid "unspecified"}
#riemann.codec.Event{:host "localhost", :service "part_max_used", :state "normal", :description "%", :metric 11.6, :tags nil, :time 692091191559/500, :ttl 180.0, :environment "PROD", :customer "Acme Corp", :location "unspecified", :ip "::1", :cluster "unspecified", :grid "unspecified"}
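
In this output the :time field appears as a ratio (692091191493/500) rather than a plain number. The short Python sketch below converts it to seconds and works out the heartbeat age, which helps explain why the heartbeat event is reported as "normal". It uses the event's own :time as a stand-in for the (unix-time) call in the config, which is an approximation.

```python
from fractions import Fraction

event_time = Fraction(692091191493, 500)  # :time of the heartbeat event
heartbeat = 1384182379                    # :metric of the heartbeat event

# Exact age is 1993/500 seconds, i.e. just under 4 seconds.
age = event_time - heartbeat
print(float(age))

# Below the 60-second "major" threshold, so the state stays "normal".
print(age < 60)
```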

Additional Info

For more information regarding Riemann, please refer to the Riemann website.

This work relied heavily on the C examples of Google Protocol Buffers.

The Riemann Protocol Buffers .proto file was originally sourced from the Riemann GitHub repo.