extend webhook data
Signed-off-by: Maksim Paskal <[email protected]>
maksim-paskal committed Jan 29, 2024
1 parent a270d83 commit 4128d94
Showing 21 changed files with 466 additions and 94 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -2,4 +2,5 @@ kubeconfig
dist
/aks-node-termination-handler
simulateEviction
coverage.out
coverage.out
*.tmp
2 changes: 1 addition & 1 deletion Makefile
@@ -49,7 +49,7 @@ run:
-gracePeriodSeconds=0 \
-endpoint=http://localhost:28080/pkg/types/testdata/ScheduledEventsType.json \
-webhook.url=http://localhost:9091/metrics/job/aks-node-termination-handler \
-webhook.template='node_termination_event{node="{{ .Node }}"} 1' \
-webhook.template='node_termination_event{node="{{ .NodeName }}"} 1' \
-telegram.token=${telegramToken} \
-telegram.chatID=${telegramChatID} \
-web.address=127.0.0.1:17923
78 changes: 72 additions & 6 deletions README.md
@@ -72,21 +72,87 @@ aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical
```

## Alerting
## Send notification events

To make alerts to Telegram or Slack or Webhook
You can compose your payload with markers that are described [here](pkg/template/README.md)

<details>
<summary>Send Telegram notification</summary>

```bash
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--set priorityClassName=system-node-critical \
--set args[0]=-telegram.token=<telegram token> \
--set args[1]=-telegram.chatID=<telegram chatid> \
--set args[2]=-webhook.url=http://prometheus-pushgateway.prometheus.svc.cluster.local:9091/metrics/job/aks-node-termination-handler \
--set args[3]=-webhook.template='node_termination_event{node="{{ .Node }}"} 1'
--set 'args[0]=-telegram.token=<telegram token>' \
--set 'args[1]=-telegram.chatID=<telegram chatid>'
```
</details>

<details>
<summary>Send Slack notification</summary>

```bash
# create payload file
cat <<EOF | tee values.yaml
priorityClassName: system-node-critical
args:
- -webhook.url=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
- -webhook.template-file=/files/slack-payload.json
- -webhook.contentType=application/json
- -webhook.method=POST
- -webhook.timeout=30s
configMap:
data:
slack-payload.json: |
{
"channel": "#mychannel",
"username": "webhookbot",
"text": "This is message for {{ .NodeName }}, {{ .InstanceType }} from {{ .NodeRegion }}",
"icon_emoji": ":ghost:"
}
EOF

# install/upgrade helm chart
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--values values.yaml
```
</details>

<details>
<summary>Send Prometheus Pushgateway event</summary>

```bash
cat <<EOF | tee values.yaml
priorityClassName: system-node-critical
args:
- -webhook.url=http://prometheus-pushgateway.prometheus.svc.cluster.local:9091/metrics/job/aks-node-termination-handler
- -webhook.template-file=/files/prometheus-pushgateway-payload.txt
- -webhook.contentType=text/plain
- -webhook.method=POST
- -webhook.timeout=30s
configMap:
data:
prometheus-pushgateway-payload.txt: |
node_termination_event{node="{{ .NodeName }}"} 1
EOF

# install/upgrade helm chart
helm upgrade aks-node-termination-handler \
--install \
--namespace kube-system \
aks-node-termination-handler/aks-node-termination-handler \
--values values.yaml
```
</details>

## Simulate eviction

2 changes: 1 addition & 1 deletion charts/aks-node-termination-handler/Chart.yaml
@@ -1,7 +1,7 @@
apiVersion: v2
icon: https://helm.sh/img/helm.svg
name: aks-node-termination-handler
version: 1.1.1
version: 1.1.2
description: Gracefully handle Azure Virtual Machines shutdown within Kubernetes
maintainers:
- name: maksim-paskal # Maksim Paskal
8 changes: 8 additions & 0 deletions charts/aks-node-termination-handler/templates/configmap.yaml
@@ -0,0 +1,8 @@
{{ if .Values.configMap.create }}
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ .Values.configMap.name }}
data:
{{ toYaml .Values.configMap.data | indent 2 }}
{{ end }}
16 changes: 15 additions & 1 deletion charts/aks-node-termination-handler/templates/daemonset.yaml
@@ -36,6 +36,13 @@ spec:
nodeSelector:
{{- toYaml .Values.nodeSelector | nindent 8 }}
{{- end }}
volumes:
- name: files
configMap:
name: {{ .Values.configMap.name }}
{{ if .Values.extraVolumes }}
{{ toYaml .Values.extraVolumes | indent 6 }}
{{ end }}
containers:
- name: aks-node-termination-handler
resources:
@@ -75,4 +82,11 @@ spec:
ports:
- name: http
containerPort: 17923
protocol: TCP
protocol: TCP
volumeMounts:
- name: files
mountPath: {{ .Values.configMap.mountPath }}
readOnly: true
{{ if .Values.extraVolumeMounts}}
{{ toYaml .Values.extraVolumeMounts | indent 8 }}
{{ end }}
18 changes: 18 additions & 0 deletions charts/aks-node-termination-handler/values.yaml
@@ -8,6 +8,24 @@ priorityClassName: ""
annotations: {}
labels: {}

configMap:
create: true
name: aks-node-termination-handler-files
mountPath: /files
data: {}
# slack-payload.json: |
# {
# "channel": "#mychannel",
# "username": "webhookbot",
# "text": "This is message for {{ .NodeName }}, {{ .InstanceType }} from {{ .NodeRegion }}",
# "icon_emoji": ":ghost:"
# }
# prometheus-pushgateway-payload.txt: |
# node_termination_event{node="{{ .NodeName }}"} 1

extraVolumes: []
extraVolumeMounts: []

metrics:
addAnnotations: true

4 changes: 2 additions & 2 deletions e2e/main_test.go
@@ -25,7 +25,7 @@ import (
"github.com/maksim-paskal/aks-node-termination-handler/pkg/types"
)

var ctx = context.Background()
var ctx = context.TODO()

func TestDrain(t *testing.T) {
t.Parallel()
@@ -46,7 +46,7 @@ func TestDrain(t *testing.T) {
t.Fatal(err)
}

if err := alert.SendTelegram(template.MessageType{Template: "e2e"}); err != nil {
if err := alert.SendTelegram(&template.MessageType{Template: "e2e"}); err != nil {
t.Fatal(err)
}

16 changes: 1 addition & 15 deletions pkg/alert/alert.go
@@ -13,13 +13,11 @@ limitations under the License.
package alert

import (
"context"
"strconv"

tgbotapi "github.com/go-telegram-bot-api/telegram-bot-api"
"github.com/maksim-paskal/aks-node-termination-handler/pkg/config"
"github.com/maksim-paskal/aks-node-termination-handler/pkg/template"
"github.com/maksim-paskal/aks-node-termination-handler/pkg/webhook"
"github.com/pkg/errors"
log "github.com/sirupsen/logrus"
)
@@ -56,19 +54,7 @@ func Ping() error {
return nil
}

func SendALL(ctx context.Context, obj template.MessageType) error {
if err := SendTelegram(obj); err != nil {
return errors.Wrap(err, "error in sending to telegram")
}

if err := webhook.SendWebHook(ctx, obj); err != nil {
return errors.Wrap(err, "error in sending to webhook")
}

return nil
}

func SendTelegram(obj template.MessageType) error {
func SendTelegram(obj *template.MessageType) error {
if len(*config.Get().TelegramToken) == 0 {
return nil
}
2 changes: 1 addition & 1 deletion pkg/cache/cache_test.go
@@ -23,7 +23,7 @@ import (
func TestCache(t *testing.T) {
t.Parallel()

ctx, cancel := context.WithCancel(context.Background())
ctx, cancel := context.WithCancel(context.TODO())
defer cancel()

go cache.SheduleCleaning(ctx)
8 changes: 5 additions & 3 deletions pkg/config/config.go
@@ -27,13 +27,13 @@ import (

const (
azureEndpoint = "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
defaultAlertMessage = "Draining node={{ .Node }}, type={{ .Event.EventType }}"
defaultAlertMessage = "Draining node={{ .NodeName }}, type={{ .Event.EventType }}"
defaultPeriod = 5 * time.Second
defaultPodGracePeriodSeconds = -1
defaultNodeGracePeriodSeconds = 120
defaultGracePeriodSecond = 10
defaultRequestTimeout = 1 * time.Second
defaultWebHookTimeout = 5 * time.Second
defaultWebHookTimeout = 30 * time.Second
)

var (
@@ -58,6 +58,7 @@ type Type struct {
WebHookContentType *string
WebHookURL *string
WebHookTemplate *string
WebHookTemplateFile *string
WebHookMethod *string
WebHookTimeout *time.Duration
SentryDSN *string
@@ -86,7 +87,8 @@ var config = Type{
WebHookContentType: flag.String("webhook.contentType", "application/json", "request content-type header"),
WebHookURL: flag.String("webhook.url", os.Getenv("WEBHOOK_URL"), "send alerts to webhook"),
WebHookTimeout: flag.Duration("webhook.timeout", defaultWebHookTimeout, "request timeout"),
WebHookTemplate: flag.String("webhook.template", "test", "request body"),
WebHookTemplate: flag.String("webhook.template", os.Getenv("WEBHOOK_TEMPLATE"), "request body"),
WebHookTemplateFile: flag.String("webhook.template-file", os.Getenv("WEBHOOK_TEMPLATE_FILE"), "path to request body template file"), //nolint:lll
SentryDSN: flag.String("sentry.dsn", "", "sentry DSN"),
WebHTTPAddress: flag.String("web.address", ":17923", ""),
TaintNode: flag.Bool("taint.node", false, "Taint the node before cordon and draining"),
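
Reviewer note: the two webhook template flags above now take their defaults from environment variables, so a value passed on the command line overrides the environment. A minimal sketch of that pattern follows. The function name `parseTemplateFile` and the injected `getenv` parameter are hypothetical and exist only to make the example testable; the project itself registers its flags on the global flag set.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseTemplateFile is a hypothetical helper: it registers a flag whose
// default value comes from an environment lookup, then parses args.
// An explicit flag on the command line wins over the environment value.
func parseTemplateFile(args []string, getenv func(string) string) (string, error) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	templateFile := fs.String(
		"webhook.template-file",
		getenv("WEBHOOK_TEMPLATE_FILE"),
		"path to request body template file",
	)

	if err := fs.Parse(args); err != nil {
		return "", err
	}

	return *templateFile, nil
}

func main() {
	value, err := parseTemplateFile(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}

	fmt.Printf("template file: %q\n", value)
}
```

Because the default is read once at registration time, changing the environment variable after startup has no effect in this sketch.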
31 changes: 24 additions & 7 deletions pkg/events/events.go
@@ -27,6 +27,7 @@ import (
"github.com/maksim-paskal/aks-node-termination-handler/pkg/template"
"github.com/maksim-paskal/aks-node-termination-handler/pkg/types"
"github.com/maksim-paskal/aks-node-termination-handler/pkg/utils"
"github.com/maksim-paskal/aks-node-termination-handler/pkg/webhook"
"github.com/pkg/errors"
log "github.com/sirupsen/logrus"
)
@@ -131,13 +132,8 @@ func readEndpoint(ctx context.Context, azureResource string) (bool, error) { //n
continue
}

err := alert.SendALL(ctx, template.MessageType{
Event: event,
Node: azureResource,
Template: *config.Get().AlertMessage,
})
if err != nil {
log.WithError(err).Error("error in alerts.Send")
if err := sendEvent(ctx, event); err != nil {
log.WithError(err).Error("error in sendEvent")
}

err = api.DrainNode(ctx, *config.Get().NodeName, string(event.EventType), event.EventId)
@@ -159,3 +155,24 @@ func getsharedMetricsLabels(resourceName string) []string {
resourceName,
}
}

func sendEvent(ctx context.Context, event types.ScheduledEventsEvent) error {
message, err := template.NewMessageType(ctx, *config.Get().NodeName, event)
if err != nil {
return errors.Wrap(err, "error in template.NewMessageType")
}

log.Infof("Message: %+v", message)

message.Template = *config.Get().AlertMessage

if err := alert.SendTelegram(message); err != nil {
log.WithError(err).Error("error in alert.SendTelegram")
}

if err := webhook.SendWebHook(ctx, message); err != nil {
log.WithError(err).Error("error in webhook.SendWebHook")
}

return nil
}
2 changes: 1 addition & 1 deletion pkg/metrics/metrics_test.go
@@ -27,7 +27,7 @@ import (
var (
client = &http.Client{}
ts = httptest.NewServer(metrics.GetHandler())
ctx = context.Background()
ctx = context.TODO()
)

func TestMetricsInc(t *testing.T) {
23 changes: 23 additions & 0 deletions pkg/template/README.md
@@ -0,0 +1,23 @@
# Templating Options

| Template | Description | Example |
| --------- | ----------- | ------- |
| `{{ .Event.EventId }}` | Globally unique identifier for this event. | 602d9444-d2cd-49c7-8624-8643e7171297 |
| `{{ .Event.EventType }}` | Impact this event causes. | Reboot |
| `{{ .Event.ResourceType }}` | Type of resource this event affects. | VirtualMachine |
| `{{ .Event.Resources }}` | List of resources this event affects. | [ FrontEnd_IN_0 ...] |
| `{{ .Event.EventStatus }}` | Status of this event. | Scheduled |
| `{{ .Event.NotBefore }}` | Time after which this event can start. The event is guaranteed to not start before this time. Will be blank if the event has already started | Mon, 19 Sep 2016 18:29:47 GMT |
| `{{ .Event.Description }}` | Description of this event. | Host server is undergoing maintenance |
| `{{ .Event.EventSource }}` | Initiator of the event. | Platform |
| `{{ .Event.DurationInSeconds }}` | The expected duration of the interruption caused by the event. | -1 |
| `{{ .NodeLabels }}` | Node labels | kubernetes.azure.com/agentpool:spotcpu4m16n ... |
| `{{ .NodeName }}` | Node name | aks-spotcpu4m16n-41289323-vmss0000ny |
| `{{ .ClusterName }}` | Node label kubernetes.azure.com/cluster | MC_EAST-US-RC-STAGE_stage-cluster_eastus |
| `{{ .InstanceType }}` | Node label node.kubernetes.io/instance-type | Standard_D4as_v5 |
| `{{ .NodeArch }}` | Node label kubernetes.io/arch | amd64 |
| `{{ .NodeOS }}` | Node label kubernetes.io/os | linux |
| `{{ .NodeRole }}` | Node label kubernetes.io/role | agent |
| `{{ .NodeRegion }}` | Node label topology.kubernetes.io/region | eastus |
| `{{ .NodeZone }}` | Node label topology.kubernetes.io/zone | 0 |
| `{{ .NodePods }}` | List of pods on node | [ pod1 ...] |
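
Reviewer note: the markers in the table use Go template syntax. The sketch below shows how such a payload could be rendered, assuming the standard `text/template` package. The `message` and `event` structs are hypothetical stand-ins for the project's `template.MessageType` and model only a few of the fields listed above.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"text/template"
)

// event is a hypothetical subset of the scheduled event fields.
type event struct {
	EventId   string
	EventType string
}

// message is a hypothetical stand-in for template.MessageType.
type message struct {
	NodeName     string
	InstanceType string
	NodeRegion   string
	Event        event
}

// render parses tmpl as a Go text/template and executes it against msg.
func render(tmpl string, msg message) (string, error) {
	t, err := template.New("payload").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}

	var buf bytes.Buffer
	if err := t.Execute(&buf, msg); err != nil {
		return "", fmt.Errorf("execute template: %w", err)
	}

	return buf.String(), nil
}

func main() {
	msg := message{
		NodeName:     "aks-spotcpu4m16n-41289323-vmss0000ny",
		InstanceType: "Standard_D4as_v5",
		NodeRegion:   "eastus",
		Event:        event{EventId: "602d9444-d2cd-49c7-8624-8643e7171297", EventType: "Reboot"},
	}

	out, err := render(`node_termination_event{node="{{ .NodeName }}"} 1`, msg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	fmt.Println(out)
}
```

A single `{` in the Pushgateway payload is treated as literal text; only `{{ ... }}` pairs are evaluated as template actions.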