Monitoring, Alerting, and Centralized Logging integration support with Chef Automate HA

Chef Automate HA provides reliability, efficiency, and productivity through redundancy and failover. It helps address significant issues such as service failure and zone failure. Refer to the public Automate HA documentation for more information.

This document provides guided steps for building and integrating monitoring, alerting, and centralized logging tools with Chef Automate HA. Based on our analysis, we have selected a few tools that we recommend.

Abstract:

Monitoring Recommendations

Tools for Monitoring

Datadog
Prometheus
AWS CloudWatch

Tools for Alerting

PagerDuty
Slack
Microsoft Teams

Tools for Centralized Logging

ELK (Elasticsearch, Logstash, and Kibana)
Datadog
AWS CloudWatch

Introduction

The Chef engineering team has comprehensively documented the recommended monitoring metrics to offer visibility into the operational health of the Chef Automate HA solution.

As part of the guided integration steps for the tools listed above, we cover the following use cases:

Agent download and configuration

This use case covers the steps to download and configure the tool's agent, which runs on the nodes of the Automate HA infrastructure and is responsible for scraping metrics and logs from those nodes. This section also covers the configurations needed to scrape the various component-level metrics.

Agent Installation

This use case covers the steps to install the agent and any extra setup required to ensure that metrics and logs are collected from each node and component of Automate HA.

Server setup and configuration

This use case covers the server setup and installation steps, along with configuration recommendations.

Dashboard Setup and Configuration

This use case covers the list of recommended dashboards and how to set them up in each tool. It also covers the configuration required to bring up the dashboards.

Metrics Configuration and Monitoring Rules Setup

This use case covers the list of recommended metrics for the Automate HA system and the recommended rules for building monitors on those metrics. These are recommendations only; organizations can add rules, update these rules, and adjust alerting mechanisms based on their requirements.
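As a rough illustration of what a monitoring rule does, the sketch below compares metric samples against warning and critical thresholds. The metric names (`cpu_percent`, `disk_used_percent`) and the threshold values are hypothetical placeholders, not the recommended Automate HA metrics or limits. In practice, Datadog, Prometheus, or CloudWatch evaluates such rules for you.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """A threshold rule for one metric (names and values are illustrative)."""
    metric: str
    warning: float
    critical: float


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> Dict[str, str]:
    """Return 'ok', 'warning', 'critical', or 'no_data' for each rule."""
    result = {}
    for rule in rules:
        value = sample.get(rule.metric)
        if value is None:
            # A missing metric is worth surfacing: the agent may be down.
            result[rule.metric] = "no_data"
        elif value >= rule.critical:
            result[rule.metric] = "critical"
        elif value >= rule.warning:
            result[rule.metric] = "warning"
        else:
            result[rule.metric] = "ok"
    return result


# Hypothetical rules; tune thresholds to your organization's requirements.
RULES = [
    Rule("cpu_percent", warning=80.0, critical=90.0),
    Rule("disk_used_percent", warning=75.0, critical=90.0),
]

if __name__ == "__main__":
    print(evaluate(RULES, {"cpu_percent": 85.0, "disk_used_percent": 95.0}))
```

Treating a missing sample as its own `no_data` state, rather than as `ok`, is a design choice that helps catch a stopped agent.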

Slack Integration with the tool

This use case covers permissions and configuration required for allowing Slack to connect with the tool. This also covers the step-wise setup of alerting groups/channels under monitoring rules to receive alerts based on the threshold logic.
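The monitoring tools normally deliver Slack alerts through their own integrations. To show what such an alert amounts to, here is a minimal sketch that posts a message to a Slack incoming webhook using only the Python standard library. The helper names, the message wording, and the webhook URL are assumptions for illustration. The only Slack-specific detail relied on is that an incoming webhook accepts a JSON body with a `text` field.

```python
import json
import urllib.request
from typing import Dict


def build_slack_message(rule_name: str, severity: str,
                        value: float, threshold: float) -> Dict[str, str]:
    """Build the JSON body for a Slack incoming webhook (hypothetical wording)."""
    text = (f"[{severity.upper()}] {rule_name}: "
            f"value {value} crossed threshold {threshold}")
    return {"text": text}


def post_to_slack(webhook_url: str, message: Dict[str, str],
                  timeout: float = 5.0) -> int:
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Printing only; call post_to_slack() with your own webhook URL to send.
    print(build_slack_message("cpu_percent", "critical", 95.0, 90.0))
```

Keeping message construction separate from delivery makes the alert text testable without network access.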

PagerDuty Integration with the tool

This use case covers the permissions and configuration required to allow PagerDuty to connect with the tool. It also covers the step-by-step setup of alerting groups/channels under monitoring rules to receive alerts based on the threshold logic.
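For PagerDuty, the monitoring tools usually rely on their built-in integrations. The sketch below shows the shape of a trigger event for the PagerDuty Events API v2, in case an alert needs to be raised from a custom script. The function names, the routing key placeholder, and the sample summary are hypothetical; the event fields (`routing_key`, `event_action`, `payload.summary`, `payload.source`, `payload.severity`) follow the Events API v2 format.

```python
import json
import urllib.request
from typing import Any, Dict

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_pagerduty_event(routing_key: str, summary: str,
                          source: str, severity: str) -> Dict[str, Any]:
    """Build an Events API v2 'trigger' event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    return {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }


def send_event(event: Dict[str, Any], timeout: float = 5.0) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder key and node name; nothing is sent unless send_event() is called.
    event = build_pagerduty_event("YOUR_ROUTING_KEY", "Disk usage above 90%",
                                  "automate-node-1", "critical")
    print(json.dumps(event, indent=2))
```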

Datadog integration with Automate HA - Monitoring

Datadog Agent Configuration and Installation for Chef Managed nodes
Datadog Metrics configuration and Integration with AWS for AWS Managed services
Metrics Monitor Configuration and Monitoring Rules Setup
Dashboard Setup and Configuration

Datadog integration with Automate HA - Alerting

Slack Integration
PagerDuty Integration
MS Teams Integration

Datadog integration with Automate HA - Centralized Logging

Datadog Centralized Logs Management

Prometheus integration with Automate HA - Monitoring

Prometheus Server Configuration and Installation
Prometheus Agent Configuration and Installation
Prometheus Metrics and Alertmanager configuration
Dashboard Setup and Configuration

Prometheus integration with Automate HA - Alerting

Slack Integration
PagerDuty Integration
MS Teams Integration

ELK integration with Automate HA - Centralized Logging

ELK - Configuration and Installation
ELK Agent - Filebeat Configuration, Installation, and Logging

CloudWatch integration with Automate HA - Monitoring

Metrics Monitor Configuration and Monitoring Rules Setup
Dashboard Setup and Configuration

CloudWatch integration with Automate HA - Alerting

Slack Integration
PagerDuty Integration

CloudWatch integration with Automate HA - Centralized Logging

AWS CloudWatch Centralized Logs Management
