Initial commit config-manage/consul
odyslam committed Oct 22, 2020
1 parent e068b92 commit 7634e10
Showing 3 changed files with 127 additions and 0 deletions.
69 changes: 69 additions & 0 deletions configuration-management/Consul/README.md
@@ -0,0 +1,69 @@
# Configuration Management with Consul


Consul is a HashiCorp tool for discovering and configuring services across your infrastructure.

**Main Features:**
- Service Discovery
- Health Check Status
- Key/Value (KV) Store
- Multi Datacenter Deployment
- Web UI

**Documentation:** [Introduction to Consul](https://www.consul.io/docs/intro).

# Consul & Netdata

Consul has many features, and many users already run it in their infrastructure. Here we focus on a single feature: the `KV store`.

We will populate the `KV store` with the configuration variables that we want to set and change dynamically in the Netdata Agent.

## Consul-Template

The configuration variables live inside the Consul Agent. To turn them into Netdata Agent configuration files, we need [consul-template](https://github.com/hashicorp/consul-template).

It's a simple CLI tool that renders a template file from the `KV Store` of a Consul Agent and writes the result to a pre-defined destination. Any change to the `KV Store` is picked up and a new file is written. The Netdata Agent is then restarted so that the change takes effect.

**Disclaimer**: Read the documentation of Consul-Template. It's a powerful tool that allows for all sorts of customization and dynamic control of the configuration, not just simple `Key/Value` substitutions. This example is deliberately minimal and does not do justice to the tool.

## Consul best-practices

According to the Consul [reference architecture](https://learn.hashicorp.com/tutorials/consul/reference-architecture), every system is assumed to run its own Consul Agent. Together the agents form a cluster in which some Consul Agents are designated as Servers and participate in the quorum.

The cluster shares the KV store and other state, so changes propagate to every node.

This tutorial therefore assumes that every Netdata Agent has its own consul-template process, which manages the configuration for that particular Netdata Agent. It also assumes that every Netdata Agent has a Consul Agent reachable via `localhost`.

# Instructions

- **System:** Ubuntu 18.04.4 LTS
- **Scenario:** Dynamically change the warning/critical levels for the `10min_cpu_usage` alarm of `health.d/cpu.conf`

Configuration for Consul-Template (`configuration.hcl`):
```
template {
source = "<absolute_path>/template.ctmpl"
destination = "/etc/netdata/health.d/cpu.conf"
command = "systemctl restart netdata "
}
log_level = "debug"
```
**Comment:** The `command` attribute sets the command that `consul-template` runs every time it rewrites the destination file. Depending on how you run Netdata, you may need to change it.
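
The `command` above assumes that Netdata runs as a systemd service. As a rough sketch of how the block might look in other setups, the variants below change only the `command` value. Both are assumptions that do not come from this repository: the Docker container name `netdata` is hypothetical, and `netdatacli reload-health` is suggested here as a lighter alternative to a full restart. Check both against your own installation.

```hcl
template {
  source      = "<absolute_path>/template.ctmpl"
  destination = "/etc/netdata/health.d/cpu.conf"

  # Assumption: Netdata runs in a Docker container named "netdata" and
  # the destination above is a host path mounted into that container.
  command = "docker restart netdata"

  # Alternative (assumption): reload only the health configuration
  # instead of restarting the whole Agent.
  # command = "netdatacli reload-health"
}
log_level = "debug"
```

Only one `command` should be active per `template` block, which is why the second variant is commented out.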


1. Install [Netdata Agent](https://learn.netdata.cloud/docs/agent/packaging/installer)
2. Install [Consul](https://www.consul.io/docs/install#install-consul)
3. Install [Consul-Template](https://github.com/hashicorp/consul-template)
4. Run Consul in `dev` mode: `consul agent -dev`
5. Clone this repository
6. Navigate to `netdata-community/configuration-management/Consul`
7. Change the `<absolute_path>` placeholder inside `configuration.hcl`
8. Populate the KV Store by running `consul kv put warning_value_low 10`
    1. Repeat this step for every variable you see in `template.ctmpl`
9. Run `sudo ./consul-template -config "./configuration.hcl"`
    1. As per our [documentation](https://learn.netdata.cloud/guides/step-by-step/step-04), `sudo` is required to edit the Netdata configuration files.
10. Open `localhost:19999` and check that the thresholds for this alarm are no longer the defaults but the values we added to the `KV Store`
11. Change a value by running the same command as in step (8) with a different value, and observe that it propagates to the Netdata Agent.
12. Share your setup back with the Community by [making a PR](https://github.com/netdata/netdata-community/compare) and joining the discussion in the [Netdata Community](https://community.netdata.cloud/topic/162/configuration-management-with-consul).
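
The step that populates the KV Store has to be repeated for each of the four keys that `template.ctmpl` reads. A minimal sketch of how that could be scripted is shown here. Only `warning_value_low 10` comes from the instructions above; the other three values are arbitrary placeholders, and the `populate_kv` function and the `CONSUL_BIN` override are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Write every key that template.ctmpl reads via {{ key "..." }}.
# Only warning_value_low=10 is taken from the README; the remaining
# values are arbitrary placeholders chosen for illustration.

populate_kv() {
    # CONSUL_BIN is a hypothetical override that allows a dry run
    # (for example CONSUL_BIN=echo); it defaults to the consul binary.
    consul_bin="${CONSUL_BIN:-consul}"
    for pair in \
        warning_value_low=10 \
        warning_value_high=20 \
        crit_value_low=20 \
        crit_value_high=30
    do
        key="${pair%%=*}"    # text before the first '='
        value="${pair#*=}"   # text after the first '='
        "$consul_bin" kv put "$key" "$value" || return 1
    done
}
```

With the dev agent running, source the file and call `populate_kv`; `consul kv get warning_value_low` can then be used to confirm that a value was stored. Setting `CONSUL_BIN=echo` first prints the commands instead of running them.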



6 changes: 6 additions & 0 deletions configuration-management/Consul/configuration.hcl
@@ -0,0 +1,6 @@
template {
source = "<absolute_path>/template.ctmpl"
destination = "/etc/netdata/health.d/cpu.conf"
command = "systemctl restart netdata "
}
log_level = "debug"
52 changes: 52 additions & 0 deletions configuration-management/Consul/template.ctmpl
@@ -0,0 +1,52 @@
template: 10min_cpu_usage
on: system.cpu
os: linux
hosts: *
lookup: average -10m unaligned of user,system,softirq,irq,guest
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? ({{ key "warning_value_low" }}) : ({{ key "warning_value_high" }}))
crit: $this > (($status == $CRITICAL) ? ({{ key "crit_value_low" }}) : ({{ key "crit_value_high" }}))
delay: down 15m multiplier 1.5 max 1h
info: average cpu utilization for the last 10 minutes (excluding iowait, nice and steal)
to: sysadmin

template: 10min_cpu_iowait
on: system.cpu
os: linux
hosts: *
lookup: average -10m unaligned of iowait
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? (20) : (40))
crit: $this > (($status == $CRITICAL) ? (40) : (50))
delay: down 15m multiplier 1.5 max 1h
info: average CPU wait I/O for the last 10 minutes
to: sysadmin

template: 20min_steal_cpu
on: system.cpu
os: linux
hosts: *
lookup: average -20m unaligned of steal
units: %
every: 5m
warn: $this > (($status >= $WARNING) ? (5) : (10))
crit: $this > (($status == $CRITICAL) ? (20) : (30))
delay: down 1h multiplier 1.5 max 2h
info: average CPU steal time for the last 20 minutes
to: sysadmin

## FreeBSD
template: 10min_cpu_usage
on: system.cpu
os: freebsd
hosts: *
lookup: average -10m unaligned of user,system,interrupt
units: %
every: 1m
warn: $this > (($status >= $WARNING) ? (75) : (85))
crit: $this > (($status == $CRITICAL) ? (85) : (95))
delay: down 15m multiplier 1.5 max 1h
info: average cpu utilization for the last 10 minutes (excluding nice)
to: sysadmin
