Merge pull request #58 from stakater/update-alerting-and-monitoring
Update alerting and monitoring
rasheedamir authored May 26, 2023
2 parents c22fed5 + ded4d02 commit 170ac26
263 changes: 203 additions & 60 deletions content/for-developers/enable-alerts-for-your-application.md
@@ -2,12 +2,12 @@

Now that we have enabled metrics for our application in the previous section, let's create alerts for it.

Metrics endpoints are scraped via ServiceMonitor by Prometheus.
Prometheus is already installed on your SAAP cluster.
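
If you want to confirm the monitoring CRDs are available before creating any resources, a quick optional check from the CLI (assuming you are logged in with `oc`) looks like this:

```bash
# Optional sanity check: list the Prometheus Operator CRDs used in this guide
oc get crd servicemonitors.monitoring.coreos.com \
  prometheusrules.monitoring.coreos.com \
  alertmanagerconfigs.monitoring.coreos.com
```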

## Metrics endpoints are scraped via ServiceMonitor by Prometheus

The Prometheus Operator includes a Custom Resource Definition (CRD) that allows the definition of a ServiceMonitor. A ServiceMonitor describes the application you wish to scrape metrics from; the operator acts on the ServiceMonitors we define and automatically builds the required Prometheus configuration.

To scrape metrics from an endpoint, we use a ServiceMonitor. Example ServiceMonitor:

```yaml
# … ServiceMonitor example collapsed in the diff view (@@ -28,9 +28,9 @@)
```
### Defining PrometheusRule CustomResource
If you want to generate an alert based on a metric, you will need a PrometheusRule Custom Resource for it. A PrometheusRule defines when an alert should fire, for example when a metric value falls below or rises above a certain threshold (depending on the use case).

Example PrometheusRule:
```yaml
apiVersion: monitoring.coreos.com/v1
# … remaining template lines collapsed in the diff view (@@ -56,65 +56,208 @@)
namespace: < NAME_OF_NAMESPACE >
```
## Adding Alerts for a Spring Boot Application
Let's take our Spring Boot application again and add alerts for it.
Add a ServiceMonitor in the namespace in which your Nordmart application is deployed.
Replace `<namespace>` in the manifest below with the namespace in which your application is deployed and `<app-name>` with the name of your application.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: review-svc-monitor
  namespace: <namespace>
spec:
  endpoints:
    - interval: 5s
      path: /actuator/prometheus # path where your metrics are exposed
      port: http
  namespaceSelector:
    matchNames:
      - <namespace>
  selector:
    matchLabels:
      app: <app-name>
```
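
Once the placeholders are filled in, you can apply the manifest and confirm it was created; a minimal sketch, assuming you saved it as `review-svc-monitor.yaml` (the filename is just an example):

```bash
# Create the ServiceMonitor and confirm it exists in your namespace
oc apply -f review-svc-monitor.yaml
oc get servicemonitor review-svc-monitor -n <namespace>
```
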
If you have deployed your application using [Stakater's application chart](https://github.com/stakater/application), you need to add the following lines to your helm values file:
```yaml
serviceMonitor:
  enabled: true
```
By default, we have set the path for the ServiceMonitor to `/actuator/prometheus`. In case you want to change the endpoint to monitor, you can override the `endpoints` key:

```yaml
serviceMonitor:
  enabled: true
  endpoints:
    - interval: 5s
      path: /actuator/prometheus # path where your metrics are exposed
      port: http
```
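
If you want to see what the chart will render before deploying, you can template it locally; this is only a sketch and assumes the Stakater chart repository is added under the alias `stakater`, the release name `my-app` is arbitrary, and your overrides live in `values.yaml`:

```bash
# Render the chart locally and inspect the generated ServiceMonitor
helm template my-app stakater/application -f values.yaml | grep -A 15 "kind: ServiceMonitor"
```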

Now let's add a PrometheusRule for the application. In the previous section we added a custom metric that records the reviews. We are going to use this custom metric to write a Prometheus rule that fires when we get too many low ratings.
Replace `<namespace>` with the namespace in which your application is deployed.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: stakater-workload-monitoring
    role: alert-rules
  name: review
  namespace: <namespace>
spec:
  groups:
    - name: nordmart-review-low-rating-warning
      rules:
        - alert: NordmartReviewLowRatingsCritical # Name of the alert
          annotations:
            message: >- # Message that is sent with the alert.
              Total ratings of 2 or below have crossed the threshold of 8. Total reviews:
              {{ $value }}.
          expr: > # Condition that fires the alert
            sum by (namespace) (nordmart_review_ratings_total{rating="2"} or
            nordmart_review_ratings_total{rating="1"}) > 8
          labels:
            severity: critical
```
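
You can apply the rule the same way and confirm it was created; a minimal sketch, assuming the manifest is saved as `review-prometheusrule.yaml` (hypothetical filename):

```bash
# Create the PrometheusRule and confirm it exists in your namespace
oc apply -f review-prometheusrule.yaml
oc get prometheusrule review -n <namespace>
```

You can also paste the `expr` query into the Observe section of the console to see the current value of the expression before the alert ever fires.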

If you have deployed your application using [Stakater's application chart](https://github.com/stakater/application), you need to add the following lines to your helm values file:

```yaml
prometheusRule:
  enabled: true
  additionalLabels:
    prometheus: stakater-workload-monitoring
  groups:
    - name: nordmart-review-low-rating-warning
      rules:
        - alert: NordmartReviewLowRatingsCritical
          annotations:
            message: >-
              Total ratings of 2 or below have crossed the threshold of 8. Total reviews: {{ $value }}.
          expr: >
            sum by (namespace) (nordmart_review_ratings_total{rating="2"} or nordmart_review_ratings_total{rating="1"}) > 8
          labels:
            severity: critical
```

Now we need to tell Alertmanager where to send the alert. For this we will need to add an AlertmanagerConfig. If you want to send alerts to a Slack channel, you will first need to [add a webhook for that channel in Slack](https://docs.stakater.com/saap/managed-addons/monitoring-stack/log-alerts.html).
Once you have the webhook URL, you can proceed to adding the AlertmanagerConfig. Alertmanager uses a Kubernetes Secret to pick up details of the endpoint to send the alerts to. Let's create the secret first.
Replace `<namespace>` with the namespace in which your application is deployed and `<api_url>` with the base64-encoded webhook URL.

```yaml
kind: Secret
apiVersion: v1
metadata:
  name: review-slack-webhook
  namespace: <namespace>
data:
  api_url: >-
    <api_url>
type: Opaque
```
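
If you prefer to create the secret from the CLI instead of writing the manifest by hand, a minimal sketch looks like this; `oc create secret generic` base64-encodes the literal value for you, and the webhook URL shown is a placeholder:

```bash
# Create the secret directly; the literal value is base64-encoded automatically
oc create secret generic review-slack-webhook \
  --from-literal=api_url='https://hooks.slack.com/services/XXXX/YYYY/ZZZZ' \
  -n <namespace>
```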

You can also use the application Helm chart to deploy the secret, as shown in the combined Helm values at the end of this section.
Let's add the AlertmanagerConfig.
Remember to replace `<namespace>` and `<channel-name>`.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: review
  namespace: <namespace>
spec:
  receivers:
    - name: nordmart-review-receiver
      slackConfigs:
        - apiURL:
            key: api_url
            name: review-slack-webhook # name of the secret that contains the webhook URL to send the alert to
          channel: '<channel-name>' # Slack channel where the alert should be sent
          httpConfig:
            tlsConfig:
              insecureSkipVerify: true
          sendResolved: true
          text: >-
            {{ range .Alerts }}
            *Alert:* `{{ .Labels.severity | toUpper }}` - {{ .Annotations.summary }}

            *Description:* {{ .Annotations.description }}

            *Details:*
            {{ range .Labels.SortedPairs }} *{{ .Name }}:* `{{ .Value }}`
            {{ end }}
            {{ end }}
          title: >-
            [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] SAAP Alertmanager Event Notification
  route:
    groupBy:
      - alertname
      - severity
    groupInterval: 3m
    groupWait: 30s
    matchers:
      - name: alertname
        value: NordmartReviewLowRatingsCritical # name of the alert that this config relates to
    receiver: nordmart-review-receiver # created above in the same manifest
    repeatInterval: 1h
```
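
As with the other resources, apply the manifest and confirm it was created; a minimal sketch, assuming it is saved as `review-alertmanagerconfig.yaml` (hypothetical filename):

```bash
# Create the AlertmanagerConfig and confirm it exists in your namespace
oc apply -f review-alertmanagerconfig.yaml
oc get alertmanagerconfig review -n <namespace>
```
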
You can also add this through the application Helm chart:
```yaml
alertmanagerConfig:
  enabled: true
  selectionLabels:
    alertmanagerConfig: workload
  spec:
    receivers:
      - name: nordmart-review-receiver
        slackConfigs:
          - apiURL:
              key: api_url
              name: review-slack-webhook
            channel: '#sno8-nordmart-alerts-test'
            sendResolved: true
            text: |2-
              {{ range .Alerts }}
              *Alert:* `{{ .Labels.severity | toUpper }}` - {{ .Annotations.summary }}
              *Description:* {{ .Annotations.description }}
              *Details:*
              {{ range .Labels.SortedPairs }} *{{ .Name }}:* `{{ .Value }}`
              {{ end }}
              {{ end }}
            title: '[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] SAAP Alertmanager Event Notification'
            httpConfig:
              tlsConfig:
                insecureSkipVerify: true
    route:
      groupBy:
        - alertname
        - severity
      groupInterval: 3m
      groupWait: 30s
      repeatInterval: 1h
      matchers:
        - name: alertname
          value: NordmartReviewLowRatingsCritical
      receiver: nordmart-review-receiver

secret:
  enabled: true
  files:
    slack-webhook:
      data:
        api_url: https://hooks.slack.com/services/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
```
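
After updating the values file, redeploy the release so the chart creates these resources; a minimal sketch, assuming the release name `review` and the chart repo alias `stakater` (both hypothetical):

```bash
# Upgrade (or install) the release with the updated values
helm upgrade --install review stakater/application -f values.yaml -n <namespace>
```
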
Now that we have created everything we need, let's see the alerts firing.
Log in to the SAAP cluster console and change the view to "Developer". You will see the "Observe" tab in the left panel.
