chore: Migration from Datadog to Grafana (#42)
* migration from Datadog to grafana

* migration from Datadog to grafana

* you dont need to build your own agent anymore

* how to configure the logs with fluentbit

* how to configure the logs with fluentbit

* how to configure the logs with fluentbit
eedygreen authored Nov 21, 2024
1 parent f5509b8 commit 0842bd4
Showing 7 changed files with 160 additions and 586 deletions.
118 changes: 101 additions & 17 deletions README.md
@@ -242,27 +242,111 @@ Relayer configuration is done with `--config-url` flag on Relayer start and can
This flag sets up shared configuration IPNS URL that is used by all Relayers in the MPC network and provided by Sygma.
More on [shared configuration](https://github.com/sygmaprotocol/sygma-shared-configuration)

## Logs and Metrics

### OTLP AGENT
We currently use the OpenTelemetry Agent as a sidecar container to aggregate relayer metrics. Read the following to build the OpenTelemetry Agent.
### Logs
Configure Fluent Bit in two parts:
- Log Router
- Log Configuration

**Two stages are required for the configuration**
- Building OpenTelemetry Agent
- Configuring Task Definition for ecs users

#### Building OpenTelemetry Agent
See the otlp-agent directory [here](https://github.com/sygmaprotocol/sygma-relayer-deployment/tree/main/otlp-agent).
The agent requires three major files:
- Builder: `otlp-builder.yml`
- Config File: `otlp-config.yml`
- Dockerfile
1. Log Router
```
{
  "name": "log_router",
  "image": "grafana/fluent-bit-plugin-loki:2.9.3-amd64",
  "cpu": 0,
  "memoryReservation": 50,
  "portMappings": [],
  "essential": true,
  "environment": [],
  "mountPoints": [],
  "volumesFrom": [],
  "user": "0",
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/relayer-{{ relayerId }}-TESTNET",
      "awslogs-create-group": "true",
      "awslogs-region": "{{ awsRegion }}",
      "awslogs-stream-prefix": "ecs"
    }
  },
  "systemControls": [],
  "firelensConfiguration": {
    "type": "fluentbit",
    "options": {
      "enable-ecs-log-metadata": "true"
    }
  }
},
```
2. Log Configuration - configure the Relayer container with these lines of code.
See here for an example.
```
"logConfiguration": {
  "logDriver": "awsfirelens",
  "options": {
    "tls.verify": "on",
    "remove_keys": "container_id,ecs_task_arn",
    "label_keys": "$source,$container_name,$ecs_task_definition,$ecs_cluster",
    "Port": "443",
    "host": " { request for the endpoint } ",
    "http_user": " { request for the userID } ",
    "tls": "on",
    "line_format": "json",
    "Name": "loki",
    "labels": "job=fluent-bit,env=testnet,project=sygma,service_name=relayer-{{ relayerId }}-container-TESTNET,image={{ imageTag }}"
  },
  "secretOptions": [
    {
      "name": "http_passwd",
      "valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/logs/grafana"
    }
  ]
},
```
### OTLP AGENT for Metrics
We currently use the OpenTelemetry Agent as a sidecar container to aggregate relayer metrics.
#### The OTLP Agent
Configure the OTLP Agent as a sidecar container in the ECS task definition file:
```
{
  "name": "otel-collector",
  "image": "ghcr.io/sygmaprotocol/sygma-opentelemetry-collector:v1.0.3",
  "essential": true,
  "secrets": [
    {
      "name": "GRAFANA_CLOUD",
      "valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/auth/secrets"
    },
    {
      "name": "USER_ID",
      "valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/auth/userid"
    },
    {
      "name": "ENDPOINT",
      "valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/logs/grafana/endpoint"
    }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "/ecs/{{ relayerName }}-{{ relayerId }}-{{ TESTNET }}",
      "awslogs-create-group": "True",
      "awslogs-region": "{{ awsRegion }}",
      "awslogs-stream-prefix": "ecs"
    }
  }
}

#### Build The OTLP Agent
The otlp-agent directory contains a CI workflow in the .github directory that automates the build process. [Here](https://github.com/sygmaprotocol/sygma-relayer-deployment/blob/main/otlp-agent/.github/workflows/opentelemetry.yaml) is the GitHub CI workflow that builds the image.
You can use it as an example or use your build system of choice.
```
For K8s or other environments, use the image `ghcr.io/sygmaprotocol/sygma-opentelemetry-collector:v1.0.3`:
- Run the image as a sidecar container.
- Set these variables: `GRAFANA_CLOUD`, `USER_ID`, `ENDPOINT`.
- Sygma will share the values of these variables through secure channel(s).
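
For Kubernetes, the sidecar steps above could look roughly like the Pod manifest below, written in JSON (which `kubectl apply -f` accepts alongside YAML). This is a minimal sketch, not a tested deployment: the Pod name, the Secret name `sygma-otel-credentials`, and its key names are hypothetical, and the Secret is assumed to already hold the three values shared by Sygma. The relayer container is abbreviated to its image only.

```json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "sygma-relayer",
    "annotations": {
      "note": "Illustrative only: pod name, Secret name and Secret keys are assumptions, not part of the Sygma deployment repo"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "relayer",
        "image": "ghcr.io/sygmaprotocol/sygma-relayer:{{ set Sygma release version }}"
      },
      {
        "name": "otel-collector",
        "image": "ghcr.io/sygmaprotocol/sygma-opentelemetry-collector:v1.0.3",
        "env": [
          {
            "name": "GRAFANA_CLOUD",
            "valueFrom": { "secretKeyRef": { "name": "sygma-otel-credentials", "key": "GRAFANA_CLOUD" } }
          },
          {
            "name": "USER_ID",
            "valueFrom": { "secretKeyRef": { "name": "sygma-otel-credentials", "key": "USER_ID" } }
          },
          {
            "name": "ENDPOINT",
            "valueFrom": { "secretKeyRef": { "name": "sygma-otel-credentials", "key": "ENDPOINT" } }
          }
        ]
      }
    ]
  }
}
```

Reading the values from a Secret rather than writing them inline keeps the shared credentials out of the manifest itself.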
After you have built your image, change the image path [here](https://github.com/sygmaprotocol/sygma-relayer-deployment/blob/main/ecs/task_definition_PARTNERS.j2#L200) to point to your image.
#### The Integration of the OpenTelemetry Agent
See the task Definition section for the integration [here](https://github.com/sygmaprotocol/sygma-relayer-deployment/blob/main/ecs/task_definition_PARTNERS.j2#L199)
@@ -282,4 +366,4 @@ Configure [this](https://github.com/sygmaprotocol/sygma-relayer-deployment/blob/
You may choose to remove [this](https://github.com/sygmaprotocol/sygma-relayer-deployment/blob/main/ecs/task_definition_PARTNERS.j2#L201), which is used for accessing a private repository.
The Sygma team highly recommends using a private repository for the OTLP agent.
The Sygma team highly recommends being security conscious when storing the shared credentials: store them in a private, secure environment with least privilege, for example Vault or AWS Secrets Manager.
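
On ECS, one way to apply least privilege is to let the task execution role read only the parameters that the task definition references. ECS needs the `ssm:GetParameters` action to inject Parameter Store values as container secrets. The policy below is a sketch under assumptions: the `Sid` is made up, the parameter paths are copied from the README examples above and must match wherever you actually store the values, and a customer managed KMS key would additionally require `kms:Decrypt`.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AssumedSygmaTelemetryParameterPaths",
      "Effect": "Allow",
      "Action": ["ssm:GetParameters"],
      "Resource": [
        "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/logs/grafana",
        "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/logs/grafana/endpoint",
        "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/auth/secrets",
        "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/auth/userid"
      ]
    }
  ]
}
```

Listing each parameter ARN explicitly, instead of a wildcard over `parameter/sygma/*`, limits what a compromised task can read.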
171 changes: 59 additions & 112 deletions ecs/task_definition_PARTNERS.j2
@@ -1,8 +1,8 @@
{
"family": "{{ relayerName }}-{{ relayerId }}-container-{{ appTag }}",
"family": "{{ relayerName }}-{{ relayerId }}-container-{{ TESTNET }}",
"containerDefinitions": [
{
"name": "{{ relayerName }}-{{ relayerId }}-container-{{ appTag }}",
"name": "{{ relayerName }}-{{ relayerId }}-container-{{ TESTNET }}",
"image": "ghcr.io/sygmaprotocol/sygma-relayer:{{ set Sygma release version }}",
"portMappings": [
{
@@ -40,7 +40,7 @@
},
{
"name": "SYG_RELAYER_ID",
"value": "5"
"value": "{{ relayerId }}"
},
{
"name": "SYG_RELAYER_ENV",
@@ -87,138 +87,85 @@
"logConfiguration": {
"logDriver": "awsfirelens",
"options": {
"provider": "ecs",
"dd_service": "{{ env }}-relayers-{{ relayerId }}",
"dd_tags": "env:{{ env }},project:chainbridge,relayerid:{{ relayerId }},image:{{ set Sygma release version }}",
"dd_message_key": "log",
"Host": "http-intake.logs.datadoghq.com",
"TLS": "on",
"dd_source": "{{ relayerName }}-{{ relayerId }}-container-{{ appTag }}",
"Name": "datadog"
},
"secretOptions": [
{
"name": "apikey",
"valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/common/datadog/key"
}
"tls.verify": "on",
"remove_keys": "container_id,ecs_task_arn",
"label_keys": "$source,$container_name,$ecs_task_definition,$ecs_cluster",
"Port": "443",
"host": " { request for the Logging ENDPOINT } ",
"http_user": " { request for the USER_ID } ",
"tls": "on",
"line_format": "json",
"Name": "loki",
"labels": "job=fluent-bit,env=testnet,project=sygma,service_name=relayer-{{ relayerId }}-container-TESTNET,image={{ imageTag }}"
},
"secretOptions": [
{
"name": "http_passwd",
"valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/logs/grafana"
}
]
},
"dependsOn": [
{
"containerName": "log_router",
"condition": "START"
}
]
},
{
"name": "datadog-agent",
"image": "gcr.io/datadoghq/agent:latest",
"essential": true,
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/{{ relayerName }}-{{ relayerId }}-{{ appTag }}",
"awslogs-region": "{{ awsRegion }}",
"awslogs-stream-prefix": "ecs"
}
},
"healthCheck": {
"retries": 3,
"command": ["CMD-SHELL","agent health"],
"timeout": 5,
"interval": 30,
"startPeriod": 15
},
"portMappings": [
{
"hostPort": 8126,
"protocol": "tcp",
"containerPort": 8126
}
],
"command": [],
"cpu": 0,
"environment": [
{
"name": "DD_APM_ENABLED",
"value": "true"
},
{
"name": "DD_APM_NON_LOCAL_TRAFFIC",
"value": "true"
},
{
"name": "DD_TAGS",
"value": "env:{{ env }},project:relayer-{{ relayerId }}"
},
{
"name": "DD_LOG_LEVEL",
"value": "INFO"
},
{
"name": "ECS_FARGATE",
"value": "true"
},
{
"name": "ENV",
"value": "{{ env }}"
}
],
"secrets": [
{
"name": "DD_API_KEY",
"valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/common/datadog/key"
}
],
"mountPoints": [],
"volumesFrom": []
},
{
"name": "log_router",
"image": "amazon/aws-for-fluent-bit:latest",
"essential": true,
"firelensConfiguration": {
"type": "fluentbit",
"options": {
"enable-ecs-log-metadata": "true"
}
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/{{ relayerName }}-{{ relayerId }}-{{ appTag }}",
"awslogs-region": "{{ awsRegion }}",
"awslogs-stream-prefix": "ecs"
}
},
"portMappings": [],
"command": [],
"name": "log_router",
"image": "grafana/fluent-bit-plugin-loki:2.9.3-amd64",
"cpu": 0,
"memoryReservation": 50,
"portMappings": [],
"essential": true,
"environment": [],
"mountPoints": [],
"volumesFrom": [],
"user": "0",
"volumesFrom": []
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/relayer-{{ relayerId }}-TESTNET",
"awslogs-create-group": "true",
"awslogs-region": "{{ awsRegion }}",
"awslogs-stream-prefix": "ecs"
}
},
"systemControls": [],
"firelensConfiguration": {
"type": "fluentbit",
"options": {
"enable-ecs-log-metadata": "true"
}
}
},
{
"name": "otel-collector",
"image": "ghcr.io/sygmaprotocol/sygma-opentelemetry-collector:latest",
"repositoryCredentials": {
"credentialsParameter": "arn:aws:secretsmanager:{{ awsRegion }}:{{ awsAccountId }}:secret:sygma/opentelemetry-Z1wcYA"
},
"image": "ghcr.io/sygmaprotocol/sygma-opentelemetry-collector:v1.0.3",
"cpu": 0,
"portMappings": [],
"essential": true,
"environment": [],
"mountPoints": [],
"volumesFrom": [],
"secrets": [
{
"name": "GRAFANA_CLOUD",
"valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/basicauth/secrets"
},
{
"name": "USER_ID",
"valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/basicauth/userid"
},
{
"name": "ENDPOINT",
"valueFrom": "arn:aws:ssm:{{ awsRegion }}:{{ awsAccountId }}:parameter/sygma/logs/grafana/endpoint"
}
],
"dockerLabels": {},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/{{ relayerName }}-{{ relayerId }}-{{ appTag }}",
"awslogs-group": "/ecs/{{ relayerName }}-{{ relayerId }}-{{ TESTNET }}",
"awslogs-create-group": "True",
"awslogs-region": "{{ awsRegion }}",
"awslogs-stream-prefix": "ecs"
}
}
}
}
],