This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

Merge pull request #99 from madflojo/develop
Release 2017.04
madflojo authored Apr 9, 2017
2 parents 8369cfa + 24b1c71 commit 55da07d
Showing 27 changed files with 901 additions and 825 deletions.
3 changes: 1 addition & 2 deletions CONTRIBUTING.md
@@ -95,6 +95,5 @@ $ sudo docker-compose up --build mkdocs
To wipe and reset the `docker-compose` environment simply run the following.

```console
$ sudo docker-compose kill automatron redis
$ sudo docker-compose rm automatron redis tests mkdocs
$ sudo docker-compose down
```
134 changes: 62 additions & 72 deletions README.md
@@ -1,80 +1,70 @@
[![Build Status](https://travis-ci.org/madflojo/automatron.svg?branch=master)](https://travis-ci.org/madflojo/automatron) [![Coverage Status](https://coveralls.io/repos/github/madflojo/automatron/badge.svg?branch=master)](https://coveralls.io/github/madflojo/automatron?branch=master)


![Automatron](https://raw.githubusercontent.com/madflojo/automatron/master/docs/img/logo_huge.png)

Automatron **(Ah-Tom-a-tron)** is an open source framework designed to detect and remediate IT system issues. In other words, it can monitor systems and, when it detects issues, correct them.
Automatron is a framework for creating self-healing infrastructure. Simply put, it detects system events & takes action to correct them.

The goal of Automatron is to allow users to automate the execution of common tasks performed during system events. These tasks can be as simple as **sending an email** to as complicated as **restarting services across multiple hosts**.

## Features

* Automatically detect and add new systems to monitor
* Monitoring is executed over SSH and completely agent-less
* Policy based Runbooks allow for monitoring policies rather than server specific configurations
* Supports Nagios compliant health check scripts
* Allows arbitrary shell commands for both checks and actions
* Runbook flexibility with **Jinja2** templating support
* Pluggable Architecture that simplifies customization
* Automatically detect and add new systems to monitor
* Monitoring is executed over SSH and completely **agent-less**
* Policy based Runbooks allow for monitoring policies rather than server specific configurations
* Supports Nagios compliant health check scripts
* Allows dead simple **arbitrary shell commands** for both checks and actions
* Runbook flexibility with **Jinja2** templating support
* Pluggable Architecture that simplifies customization
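"Nagios compliant" checks follow the Nagios plugin convention, in which a check reports its state through its exit code: 0 is OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The sketch below assumes Automatron interprets both `plugin` and `cmd` checks this way; it runs an arbitrary shell command and maps the exit status to the state names used in the runbooks' `call_on` lists. `run_check` is a hypothetical helper, not part of Automatron.

```python
import subprocess

# Nagios plugin convention: exit code -> state name.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(cmd: str, timeout: float = 30.0) -> str:
    """Run a shell command and translate its exit code into a Nagios state.

    Hypothetical helper: exit codes outside 0-3, and timeouts, count as UNKNOWN.
    """
    try:
        result = subprocess.run(cmd, shell=True, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "UNKNOWN"
    return NAGIOS_STATES.get(result.returncode, "UNKNOWN")


if __name__ == "__main__":
    # 'exit 2' makes the shell return status 2, so this prints CRITICAL.
    print(run_check("exit 2"))
```

One consequence worth knowing: LSB-style init scripts exit with status 3 from `service <name> status` when the service is not running, which this mapping reports as UNKNOWN. That is a good reason to keep UNKNOWN in `call_on` for an action such as `restart_nginx`.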

## Runbooks

Automatron's actions are driven by policies called **Runbooks**. These runbooks are used to define what health checks should be executed on a target host and what to do about those health checks when they fail.
The core of Automatron is based around **Runbooks**. Runbooks are policies that define health checks and actions. You can think of them in the same way you would think of a printed runbook, except that with Automatron the actions are automated.

### A simple Runbook
### A simple Runbook example

The below example is a Runbook that will execute a monitoring plugin to determine the amount of free space on `/var/log` and based on the results execute a corrective action.
The runbook below is a very basic example: it checks whether NGINX is running every 2 minutes and restarts it after 2 unsuccessful checks.

```yaml
name: Verify /var/log
schedule: "*/5 * * * *"
nodes:
  - "*"
```yaml+jinja
name: Check NGINX
schedule: "*/2 * * * *"
checks:
  mem_free:
    # Check for the % of disk free create warning with 20% free and critical for 10% free
  nginx_is_running:
    execute_from: target
    type: plugin
    plugin: systems/disk_free.py
    args: --warn=20 --critical=10 --filesystem=/var/log
    type: cmd
    cmd: service nginx status
actions:
  logrotate_nicely:
  restart_nginx:
    execute_from: target
    trigger: 0
    trigger: 2
    frequency: 300
    call_on:
      - WARNING
    type: cmd
    cmd: bash /etc/cron.daily/logrotate
  logrotate_forced:
    execute_from: target
    trigger: 5
    frequency: 300
    call_on:
      - CRITICAL
      - UNKNOWN
    type: cmd
    cmd: bash /etc/cron.daily/logrotate --force
    cmd: service nginx restart
```
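The `schedule` strings use standard five-field cron syntax (minute, hour, day of month, month, day of week), so `"*/2 * * * *"` fires every second minute and `"*/5 * * * *"` every fifth minute. The hypothetical `expand_field` helper below illustrates how a single cron field expands into concrete values; it is a sketch of the syntax only, not Automatron's scheduler.

```python
from __future__ import annotations


def expand_field(field: str, low: int, high: int) -> list[int]:
    """Expand one cron field such as '*/2', '1-5', '10-20/5' or '0,30'.

    `low` and `high` are the inclusive bounds of the field, e.g. 0-59 for minutes.
    """
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        elif not step_text:
            start = end = int(base)
        else:
            # A bare 'N/step' is not handled by this sketch.
            raise ValueError(f"unsupported cron field: {field!r}")
        if step < 1 or not low <= start <= end <= high:
            raise ValueError(f"out-of-range cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For the minute field, `expand_field("*/2", 0, 59)` yields `[0, 2, 4, ..., 58]`, that is, 30 runs per hour.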

### A Runbook with Jinja2
The above actions will be performed every 300 seconds (5 minutes) until the health check returns an OK status. This delay allows time for NGINX to restart after each execution.
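Read against the README text, `trigger` is the number of unsuccessful checks required before an action runs, and `frequency` is the minimum number of seconds between repeated executions. The hypothetical `ActionState` class below sketches that gating. It assumes, without confirmation from the source, that failures are counted consecutively, that an OK result resets the count, and that the current status must appear in `call_on`; Automatron's real bookkeeping may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ActionState:
    """Hypothetical gate deciding whether a runbook action should execute."""

    trigger: int  # failed checks required before acting
    frequency: float  # minimum seconds between executions
    call_on: tuple[str, ...] = ("WARNING", "CRITICAL", "UNKNOWN")
    failures: int = 0  # consecutive non-OK results seen so far
    last_run: float | None = None  # time of the last execution, if any

    def should_run(self, status: str, now: float) -> bool:
        """Record one check result taken at time `now` and decide whether to act."""
        if status == "OK":
            # A healthy result resets the counter, so the action stops repeating.
            self.failures = 0
            return False
        self.failures += 1
        if status not in self.call_on or self.failures < self.trigger:
            return False
        if self.last_run is not None and now - self.last_run < self.frequency:
            return False  # executed too recently
        self.last_run = now
        return True
```

With the NGINX runbook's values (`trigger: 2`, `frequency: 300`, a check every 2 minutes), this sketch first fires on the second failed check and then at most once every 300 seconds until the check reports OK.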

Jinja2 support was added to Runbooks to allow for extensive customization. The below example shows using Jinja2 to determine which `cmd` to execute based on Automatron's **facts** system.
### A complex Runbook with Jinja2

This example will detect if `nginx` is running and if not, restart it.
This next runbook example is a more complex version of the above. In this example we will use Jinja2 and Automatron's Facts to enhance our runbook further.

```yaml
name: Verify nginx is running
```yaml+jinja
name: Check NGINX
{% if "prod" in facts['hostname'] %}
schedule:
  second: "*/30"
nodes:
  - "*web*"
  second: "*/20"
{% else %}
schedule: "*/2 * * * *"
{% endif %}
checks:
  nginx_is_running:
    # Check if nginx is running
    execute_from: target
    type: cmd
    {% if "Linux" in facts['os'] %}
    cmd: service nginx status
    {% else %}
    cmd: /usr/local/etc/rc.d/nginx status
    {% endif %}
actions:
  restart_nginx:
    execute_from: target
@@ -83,46 +73,46 @@ actions:
    call_on:
      - WARNING
      - CRITICAL
      - UNKNOWN
    type: cmd
    {% if "Linux" in facts['os'] %}
    cmd: service nginx restart
    {% else %}
    cmd: /usr/local/etc/rc.d/nginx restart
    {% endif %}
  remove_from_dns:
    execute_from: remote
    trigger: 0
    frequency: 0
    call_on:
      - WARNING
      - CRITICAL
      - UNKNOWN
    type: plugin
    plugin: cloudflare/dns.py
    args: remove [email protected] apikey123 example.com --content {{ facts['network']['eth0']['v4'][0] }}
```

For more examples and information on getting started, check out the Automatron [wiki](https://github.com/madflojo/automatron/wiki).
The above example uses **Jinja2** and **Facts** to create a conditional schedule. If the target server's hostname contains the word "prod", the health check runs every 20 seconds; otherwise, it runs every 2 minutes.
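Because the runbook contains `{% if %}` blocks, it has to be rendered by Jinja2 before it can be parsed as YAML; the template logic simply decides which lines survive. The plain-Python function below mirrors the branches of this example for a given `facts` dictionary. Only the fact keys `hostname` and `os` come from the runbook; the function name `nginx_runbook_choices` and the sample fact values are illustrative.

```python
from __future__ import annotations


def nginx_runbook_choices(facts: dict) -> dict:
    """Mirror the Jinja2 branches of the 'Check NGINX' runbook in plain Python."""
    if "prod" in facts["hostname"]:
        schedule: dict | str = {"second": "*/20"}  # every 20 seconds
    else:
        schedule = "*/2 * * * *"  # every 2 minutes (cron syntax)
    if "Linux" in facts["os"]:
        check_cmd = "service nginx status"
        restart_cmd = "service nginx restart"
    else:
        # rc.d scripts under /usr/local/etc, as used on e.g. FreeBSD
        check_cmd = "/usr/local/etc/rc.d/nginx status"
        restart_cmd = "/usr/local/etc/rc.d/nginx restart"
    return {"schedule": schedule, "check_cmd": check_cmd, "restart_cmd": restart_cmd}
```

For example, a Linux host named `web01.prod` gets the 20-second schedule and the `service nginx ...` commands, while a non-Linux host named `web01.dev` gets the 2-minute cron schedule and the `rc.d` commands.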

## Deploying with Docker
Another addition is the `remove_from_dns` action, which will remove the target server's DNS entry using the **CloudFlare DNS** plugin.

Deploying Automatron within Docker is quick and easy. Since Automatron by default uses `redis` as a datastore we must first start a `redis` instance.

```console
$ sudo docker run -d --restart=always --name redis redis
```

Once `redis` is up and running you can start an Automatron instance.

```console
$ sudo docker run -d --link redis:redis -v /path/to/config:/config --restart=always --name automatron madflojo/automatron
```
By using **Facts** and **Jinja2** together you can customize a single runbook to cover unique actions for multiple hosts and environments.

## Stay in the loop

Follow [@Automatronio on Twitter](https://twitter.com/automatronio) for the latest Automatron news and join the community in [#Automatron on Gitter](https://gitter.im/madflojo/automatron).
[![Twitter Follow](https://img.shields.io/twitter/follow/automatronio.svg?style=flat-square)](https://twitter.com/automatronio) [![Gitter](https://badges.gitter.im/madflojo/automatron.svg)](https://gitter.im/madflojo/automatron?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

## License

Copyright 2016 Benjamin Cane
```
Copyright 2016 Benjamin Cane
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
37 changes: 23 additions & 14 deletions config/config.yml.example
@@ -2,61 +2,70 @@
config_path: config
runbook_path: config/runbooks
plugin_path: plugins/
ssh: # SSH Configuration
ssh:
  user: root
  gateway: False
  key: |
    -----BEGIN RSA PRIVATE KEY-----
    fdlkfjasldjfsaldkjflkasjflkjaflsdlkfjs
    -----END RSA PRIVATE KEY-----

monitoring: # Monitoring configuration
## Checks
monitoring:
  upload_path: /tmp

actioning: # Actioning configuration
## Actions
actioning:
  upload_path: /tmp

logging: # Logging Configurations
## Logging
logging:
  debug: True
  plugins:
    console: True
    syslog:
      facility: local0

discovery: # Discovery Configurations
## Host Discovery
discovery:
  upload_path: /tmp/
  vetting_interval: 30
  plugins:
    webping: # Web Service for HTTP GET or POST requests
    # Web Service for HTTP PINGs
    webping:
      ip: 0.0.0.0
      port: 20000
    # nmap: # NMAP Scanning for new hosts
      port: 8000
    # nmap:
    # # NMAP Scanning for new hosts
    # target: 10.0.0.1/8
    # flags: -sP
    # interval: 40
    # digitalocean: # Query DO's API
    # digitalocean:
    # # Query DO's API
    # url: https://api.digitalocean.com/v2
    # api_key: example
    # interval: 60
    # aws: # Query AWS' API
    # aws:
    # # Query AWS' API
    # aws_access_key_id: example
    # aws_secret_access_key: example
    # interval: 60
    # filter:
    # - PublicIpAddress
    # - PrivateIpAddress
    # linode:
    # # Query Linode's API
    # url: https://api.linode.com
    # api_key: example
    # interval: 60


datastore: # Datastore Configurations
  ## Default Datastore Engine
## Datastore
datastore:
  ## Default Datastore Engine
  engine: redis
  ## Datastore Specific configuration
  plugins:
    ## Redis
    # Redis
    redis:
      db: 0
      host: redis
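In the example configuration, `datastore.engine` names the backend and `datastore.plugins` holds backend-specific settings keyed by the same name. Assuming the file has already been loaded into nested dictionaries (for instance with a YAML parser), the hypothetical `datastore_settings` helper below resolves the active backend's settings. The `CONFIG` dictionary reproduces only the keys visible in this diff, since the hunk is cut off after `host: redis`.

```python
from __future__ import annotations

# Python equivalent of the (truncated) datastore section shown above.
CONFIG = {
    "datastore": {
        "engine": "redis",
        "plugins": {
            "redis": {"db": 0, "host": "redis"},
        },
    },
}


def datastore_settings(config: dict) -> tuple[str, dict]:
    """Return (engine name, engine-specific settings) from a loaded config.

    Hypothetical helper; raises KeyError if the engine has no plugin section.
    """
    datastore = config["datastore"]
    engine = datastore["engine"]
    return engine, datastore["plugins"][engine]
```

Keying the plugin settings by engine name lets one file carry settings for several backends while `engine` selects the active one.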
2 changes: 0 additions & 2 deletions config/runbooks/examples/disk_free/init.yml
@@ -1,7 +1,5 @@
name: Verify /var/log
schedule: "*/2 * * * *"
nodes:
  - "*"
checks:
  disk_free:
    # Check for the % of disk free create warning with 20% free and critical for 10% free
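The `disk_free` check warns when 20% of the filesystem is free and goes critical at 10%. The plugin it calls (`systems/disk_free.py` in the README example) is not shown in this commit, so the following is a hypothetical stand-in rather than that script: it reads free space with `shutil.disk_usage`, accepts `--warn`, `--critical` and `--filesystem` flags like those in the README's `args` line, and returns a Nagios exit code. Treating a value exactly at a threshold as breaching it is an assumption.

```python
from __future__ import annotations

import argparse
import shutil


def classify(free_pct: float, warn: float, critical: float) -> int:
    """Map a free-space percentage to a Nagios exit code: 0 OK, 1 WARNING, 2 CRITICAL."""
    if free_pct <= critical:
        return 2
    if free_pct <= warn:
        return 1
    return 0


def main(argv: list[str] | None = None) -> int:
    """Hypothetical disk_free check; prints a status line and returns the exit code."""
    parser = argparse.ArgumentParser(description="Check free disk space (sketch)")
    parser.add_argument("--warn", type=float, default=20.0, help="warn at or below this %% free")
    parser.add_argument("--critical", type=float, default=10.0, help="critical at or below this %% free")
    parser.add_argument("--filesystem", default="/", help="path on the filesystem to inspect")
    args = parser.parse_args(argv)
    try:
        usage = shutil.disk_usage(args.filesystem)
    except OSError as exc:
        print(f"UNKNOWN - {exc}")
        return 3
    free_pct = usage.free / usage.total * 100
    code = classify(free_pct, args.warn, args.critical)
    label = ("OK", "WARNING", "CRITICAL")[code]
    print(f"{label} - {free_pct:.1f}% free on {args.filesystem}")
    return code
```

Used as a real plugin, the script would end with `sys.exit(main())` so the caller receives the exit code, and could be run as, for example, `python disk_free_sketch.py --warn=20 --critical=10 --filesystem=/var/log` (the file name is hypothetical).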
2 changes: 0 additions & 2 deletions config/runbooks/examples/docker/clear_dangling_images.yml
@@ -1,7 +1,5 @@
name: Clean up dangling images if they are found
schedule: "*/2 * * * *"
nodes:
  - "*"
checks:
  find_danglers:
    execute_from: ontarget
2 changes: 0 additions & 2 deletions config/runbooks/examples/docker/clear_dead_containers.yml
@@ -1,7 +1,5 @@
name: Clean up dead containers if they are found
schedule: "*/2 * * * *"
nodes:
  - "*"
checks:
  find_dead_containers:
    execute_from: ontarget
2 changes: 0 additions & 2 deletions config/runbooks/examples/docker/init.yml
@@ -1,7 +1,5 @@
name: Verify Docker is running
schedule: "* * * * *"
nodes:
  - "*"
checks:
  docker_running:
    execute_from: ontarget
2 changes: 0 additions & 2 deletions config/runbooks/examples/nginx/init.yml
@@ -1,7 +1,5 @@
name: Verify nginx is running
schedule: "* * * * *"
nodes:
  - "*"
checks:
  nginx_is_running:
    # Check if nginx is running
7 changes: 4 additions & 3 deletions docker-compose.yml
@@ -10,12 +10,13 @@ services:
    build: .
    command: python /tests.py
  mkdocs:
    image: thinkcube/mkdocs
    build:
      context: .
      dockerfile: docs/Dockerfile
    volumes:
      - ./:/automatron
      - ./:/tmp/mkdocs
    ports:
      - 8000:8000
    working_dir: /automatron
    command: mkdocs serve -a 0.0.0.0:8000
  coverage:
    build: .
4 changes: 4 additions & 0 deletions docs/Dockerfile
@@ -0,0 +1,4 @@
FROM thinkcube/mkdocs
RUN pip install mkdocs-material "pygments>=2.2" "pymdown-extensions>=2.0"
RUN mkdir -p /tmp/mkdocs
WORKDIR /tmp/mkdocs