rsyslogd-aggregation: Add revised rsyslogd-aggregation article
Prior article is https://www.fluentd.org/guides/recipes/rsyslogd-aggregation.
Related to fluent#566.

Signed-off-by: Hiroshi Hatake <[email protected]>
cosmo0920 committed Feb 6, 2019
1 parent c206eb4 commit bae3ee7
Showing 4 changed files with 131 additions and 0 deletions.
130 changes: 130 additions & 0 deletions docs/v1.0/rsyslogd-aggregation.txt
@@ -0,0 +1,130 @@
# Aggregating Rsyslogd Output into a Central Fluentd

rsyslogd is a tried and true piece of middleware to collect and aggregate syslogs.

Once aggregated into the central server (which is also running rsyslogd), the syslog data
is periodically bulk loaded into various data backends like databases, search indexers
and object storage systems.

<img src="/images/before-fluentd-rsyslogd.png"/>

The above architecture has a couple of problems:

1. **Adding a new data consumer requires scripting**: each new data source requires a data load script
that needs to be written and maintained. This means an engineering overhead that grows linearly with the
number of data consumers.
2. **Data is pulled, not pushed**: because data is pulled by data consumers and not
pushed by the aggregator rsyslogd, the scripts need to run very frequently to get fresh data.
A better alternative is to have the aggregator push data to each data consumer, but there is
no out-of-the-box way to do this with rsyslogd.

Replacing the central rsyslogd aggregator with Fluentd addresses both 1. and 2.

<img src="/images/after-fluentd-rsyslogd.png"/>

1. Fluentd supports many data consumers out of the box. By installing an appropriate output plugin,
one can add a new data consumer with a few configuration changes.
2. Fluentd pushes data to each consumer with tunable frequency and buffering settings.

The rest of this article shows how to set up Fluentd as the central syslog aggregator and
stream the aggregated logs into Elasticsearch.

## Prerequisites

- A basic understanding of Fluentd and rsyslogd
- A running instance of Elasticsearch

**In this guide, we assume we are running [td-agent](/download) on Ubuntu Xenial.**

## Setup: rsyslogd

If remote rsyslogd instances are already collecting data into the aggregator rsyslogd,
the settings for rsyslog should remain unchanged. However, if this is a brand-new setup,
start forwarding syslog output by adding the following line to `/etc/rsyslog.conf`:

```
*.* @182.39.20.2:42185
```

Replace "182.39.20.2" with the IP address of your aggregator server. The single `@` forwards
over UDP; use `@@` to forward over TCP. There is nothing special about port 42185, but do make
sure this port is open on the aggregator.
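Before reconfiguring rsyslog, you can sanity-check the forwarding path with `logger` from util-linux, which sends a single syslog datagram to an arbitrary host and port. A sketch; `127.0.0.1` is a stand-in for your aggregator's address, and since UDP is fire-and-forget you should also tail the aggregator's logs to confirm receipt:

```shell
# send one test syslog message at the forwarding port;
# 127.0.0.1 is a stand-in -- substitute your aggregator's address
logger --server 127.0.0.1 --port 42185 --udp "rsyslog forwarding test"
```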

## Setup: Fluentd

On your aggregator server, set up Fluentd. [See here](/download) for the details.

```
$ curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-xenial-td-agent3.sh | sh
```

Next, install the Elasticsearch output plugin:

```
/usr/sbin/td-agent-gem install fluent-plugin-elasticsearch
```

If you are using vanilla Fluentd, run

```
fluent-gem install fluent-plugin-elasticsearch
```

You might need to `sudo` to install the plugin.

Finally, configure `/etc/td-agent/td-agent.conf` as follows.

```
<source>
  @type syslog
  port 42185
  tag rsyslog
</source>

<match rsyslog.**>
  @type copy
  <store>
    # for debug (see /var/log/td-agent.log)
    @type stdout
  </store>
  <store>
    @type elasticsearch
    logstash_format true
    host YOUR_ES_HOST
    port YOUR_ES_PORT
    <buffer>
      @type memory
      flush_interval 10s # for testing
      flush_thread_count 2
    </buffer>
  </store>
</match>
```
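If you want buffered events to survive a td-agent restart, the memory buffer above can be swapped for a file buffer. A minimal sketch of the `<buffer>` section only; the `path` below is an assumption and must be writable by the td-agent user:

```
<buffer>
  @type file
  path /var/log/td-agent/buffer/es  # assumed path; must be writable by td-agent
  flush_interval 10s
  flush_thread_count 2
</buffer>
```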

## Restart and Confirm That Data Flows into Elasticsearch

Restart td-agent with `sudo systemctl restart td-agent`. Then, run `tail` against `/var/log/td-agent.log`. You should see lines like the following:

```
2014-06-01 19:41:28 +0000 rsyslog.kern.info: {"host":"precise64","ident":"kernel","message":"[49851.032200] docker0: port 2(veth6091) entering disabled state"}
```

Then, query Elasticsearch to confirm the data has arrived. For example, you can aggregate and filter the data by hostname.
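A command-line spot check might look like the sketch below. Replace `YOUR_ES_HOST`/`YOUR_ES_PORT` with your values; the index pattern assumes `logstash_format true`, which writes to daily `logstash-YYYY.MM.DD` indices:

```
# search recent rsyslog events for a given hostname across logstash-* indices
curl -s "http://YOUR_ES_HOST:YOUR_ES_PORT/logstash-*/_search?q=host:precise64&size=3&pretty"
```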

## What's Next?

In production, you will likely want to stop writing output to stdout. Use the following output configuration instead:

```
<match rsyslog.**>
  @type elasticsearch
  logstash_format true
  host YOUR_ES_HOST
  port YOUR_ES_PORT
  <buffer>
    @type memory
    flush_thread_count 2 # can be raised up to the number of logical CPU cores
  </buffer>
</match>
```

Do you wish to store rsyslogd logs in other systems? Check out other [monitoring service logs](/categories/monitoring-service-logs)!
1 change: 1 addition & 0 deletions lib/toc.en.v1.0.rb
@@ -47,6 +47,7 @@
article 'free-alternative-to-splunk-by-fluentd', 'Free Alternative to Splunk by Fluentd + Elasticsearch', ['Splunk', 'Free Alternative']
article 'splunk-like-grep-and-alert-email', 'Email Alerts like Splunk', ['Splunk', 'Alerting']
article 'parse-syslog', 'Parse Syslog Messages Robustly'
article 'rsyslogd-aggregation', 'Aggregating Rsyslogd Output into a Central Fluentd'
end
category 'data-analytics', 'Data Analytics' do
article 'http-to-td', 'Data Analytics with Treasure Data', ['Treasure Data', 'Hadoop', 'Hive']
Binary file added public/images/after-fluentd-rsyslogd.png
Binary file added public/images/before-fluentd-rsyslogd.png
