Reference Metrics List

This section documents the metrics collected by the exporters used in a Chef Managed Automate HA implementation. Similar metrics may be collected from AWS-hosted deployments.

Disclaimer

The following metrics are recommended for monitoring a Chef Automate HA implementation. They are intended as a guide for building monitoring rules and dashboards. Actual usage and adoption depend on each organization's infrastructure monitoring policy.

System Metrics

Refer to the following exporter for the metric details.
- Node-Exporter

The following metrics are configured to generate alerts.

Component	Metrics Expr
CPU Usage	100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance,job) * 100) > 95
CPU Steal	(avg(irate(node_cpu_seconds_total{mode="steal"}[5m]) * 100) by(instance,job))> 20
System Memory Usage	100 - (node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes*100) > 95
Disk Utilization	100 - (node_filesystem_avail_bytes{mountpoint="/"}/node_filesystem_size_bytes{mountpoint="/"}*100) > 85
Disk Utilization	100 - (node_filesystem_avail_bytes{mountpoint="/"}/node_filesystem_size_bytes{mountpoint="/"}*100) > 90
Disk Utilization	100 - (node_filesystem_avail_bytes{mountpoint="/hab"}/node_filesystem_size_bytes{mountpoint="/hab"}*100) > 85
Disk Utilization	100 - (node_filesystem_avail_bytes{mountpoint="/hab"}/node_filesystem_size_bytes{mountpoint="/hab"}*100) > 90
Disk Utilization	100 - (node_filesystem_avail_bytes{mountpoint="/tmp"}/node_filesystem_size_bytes{mountpoint="/tmp"}*100) > 85
Disk Utilization	100 - (node_filesystem_avail_bytes{mountpoint="/tmp"}/node_filesystem_size_bytes{mountpoint="/tmp"}*100) > 90
Host Monitoring	up == 0
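
As a minimal sketch of how expressions from the table above might be loaded into Prometheus, the rule file below wraps the CPU usage expression and the two `/hab` disk thresholds in alerting rules. The group name, alert names, `for` durations, `severity` labels, and annotation text are illustrative assumptions, not part of the Chef Automate HA configuration. Mapping the 85% threshold to a warning and the 90% threshold to a critical alert is likewise an assumption.

```yaml
# Hypothetical rule file (for example system_alerts.yml).
# Only the expr values come from the table; everything else is illustrative.
groups:
  - name: system-alerts
    rules:
      - alert: HighCpuUsage
        expr: |
          100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance,job) * 100) > 95
        for: 5m            # assumed: condition must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "CPU usage above 95% on {{ $labels.instance }}"

      - alert: HabDiskUtilizationWarning
        expr: |
          100 - (node_filesystem_avail_bytes{mountpoint="/hab"}/node_filesystem_size_bytes{mountpoint="/hab"}*100) > 85
        for: 10m
        labels:
          severity: warning   # assumed mapping for the 85% threshold
        annotations:
          summary: "/hab is more than 85% full on {{ $labels.instance }}"

      - alert: HabDiskUtilizationCritical
        expr: |
          100 - (node_filesystem_avail_bytes{mountpoint="/hab"}/node_filesystem_size_bytes{mountpoint="/hab"}*100) > 90
        for: 10m
        labels:
          severity: critical  # assumed mapping for the 90% threshold
        annotations:
          summary: "/hab is more than 90% full on {{ $labels.instance }}"
```

A file like this can be syntax-checked with `promtool check rules system_alerts.yml` before Prometheus loads it.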

Chef Automate Health Metrics

Refer to the following exporters for the metric details.
- Black-Box Exporter
- Nginx-Exporter

The following metrics are configured to generate alerts.

Component	Metrics Expr
Hab Service Status	probe_http_status_code{job=~"chef-server-services.*
Hab Service Status	probe_http_status_code{job=~"chef-server-services.*
Automate LB 5XX Alert	probe_http_status_code{job=~"chef-server-url
Chef-Server LB 5XX Alert	probe_http_status_code{job=~"chef-server-url

OpenSearch Metrics

Refer to the following OpenSearch plugin for the metric details.
- OpenSearch Plug-in

The following metrics are configured to generate alerts.

Component	Metrics Expr
ES Cluster Health Check	opensearch_cluster_nodes_number < 2
ES Heap Usage Factor	opensearch_jvm_mem_heap_used_percent > 95
ES Performance Alert	opensearch_index_search_fetch_time_seconds > 30
ES Performance Alert	opensearch_index_search_fetch_time_seconds > 60
ES Indexing latency Alert	opensearch_index_indexing_index_time_seconds > 500
Elasticsearch Search latency Alert	opensearch_index_search_query_time_seconds > 60
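
The cluster health and heap expressions above could be packaged the same way. This is only a sketch: the group name, alert names, `for` durations, and labels are assumptions chosen for illustration, and only the `expr` values are taken from the table.

```yaml
# Hypothetical rule group for OpenSearch; only the expr values come from the table.
groups:
  - name: opensearch-alerts
    rules:
      - alert: OpenSearchClusterNodesLow
        expr: opensearch_cluster_nodes_number < 2
        for: 2m            # assumed: a short delay so a single missed scrape does not fire the alert
        labels:
          severity: critical
        annotations:
          summary: "OpenSearch cluster reports fewer than 2 nodes"

      - alert: OpenSearchHeapUsageHigh
        expr: opensearch_jvm_mem_heap_used_percent > 95
        for: 5m            # assumed: heap usage fluctuates with garbage collection, so require a sustained breach
        labels:
          severity: critical
        annotations:
          summary: "JVM heap above 95% on {{ $labels.instance }}"
```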

PostgreSQL Metrics

Refer to the following exporter for the metric details.
- PostgreSQL Exporter

The following metrics are configured to generate alerts.

Component	Metrics Expr
PG Can Connect	pg_up != 1
Connection Exhaustion	(sum(pg_stat_database_numbackends{server="10.100.12.36:5432"}) by(instance,job))/(avg(pg_settings_max_connections{server="10.100.12.36:5432"}) by(instance,job)) * 100 > 90
Connection Exhaustion	(sum(pg_stat_database_numbackends{server="10.100.12.36:5432"}) by(instance))/(avg(pg_settings_max_connections{server="10.100.12.36:5432"}) by(instance)) * 100 > 95
Managed PostgreSQL Write Latency	irate(node_disk_write_time_seconds_total{instance=~".*pg.*"}[5m]) / irate(node_disk_writes_completed_total{instance=~".*pg.*"}[5m]) > 300
Managed PostgreSQL Read Latency	irate(node_disk_read_time_seconds_total{instance=~".*pg.*"}[5m]) / irate(node_disk_reads_completed_total{instance=~".*pg.*"}[5m]) > 300
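
A sketch of the 90% connection exhaustion expression as an alerting rule follows. The `server` address is the one shown in the table and is specific to that environment, so it would need to be replaced with the address of the monitored PostgreSQL instance. The group name, alert name, `for` duration, label, and annotation are assumptions.

```yaml
# Hypothetical rule group for PostgreSQL; only the expr value comes from the table.
groups:
  - name: postgresql-alerts
    rules:
      - alert: PostgresConnectionExhaustion
        # Percentage of max_connections currently in use by backends.
        expr: |
          (sum(pg_stat_database_numbackends{server="10.100.12.36:5432"}) by(instance,job))/(avg(pg_settings_max_connections{server="10.100.12.36:5432"}) by(instance,job)) * 100 > 90
        for: 5m            # assumed duration
        labels:
          severity: warning   # assumed: the 95% variant could carry a higher severity
        annotations:
          summary: "{{ $value }}% of max_connections in use on {{ $labels.instance }}"
```

When adapting the two latency expressions, note that dividing the rate of `node_disk_write_time_seconds_total` (or its read counterpart) by the rate of completed operations yields seconds per operation, so the threshold should be chosen with that unit in mind.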
