Skip to content

Commit

Permalink
feat!: Enable eventbridge by default (#4320)
Browse files Browse the repository at this point in the history
Promote eventbridge to standard. The eventbridge can be disable by
setting:

```hcl
  eventbridge = {
    enable = false
  }
```

---------

Co-authored-by: philips-labs-pr|bot <philips-labs-pr[bot]@users.noreply.github.com>
  • Loading branch information
npalm and philips-labs-pr|bot authored Dec 20, 2024
1 parent 7171215 commit 142bb61
Show file tree
Hide file tree
Showing 6 changed files with 14 additions and 78 deletions.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,7 +149,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
| <a name="input_enable_ssm_on_runners"></a> [enable\_ssm\_on\_runners](#input\_enable\_ssm\_on\_runners) | Enable to allow access to the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances. | `bool` | `false` | no |
| <a name="input_enable_user_data_debug_logging_runner"></a> [enable\_user\_data\_debug\_logging\_runner](#input\_enable\_user\_data\_debug\_logging\_runner) | Option to enable debug logging for user-data, this logs all secrets as well. | `bool` | `false` | no |
| <a name="input_enable_userdata"></a> [enable\_userdata](#input\_enable\_userdata) | Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI. | `bool` | `true` | no |
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling.<br/><br/> `enable`: Enable the EventBridge feature.<br/> `accept_events`: List can be used to only allow specific events to be putted on the EventBridge. By default all events, empty list will be be interpreted as all events. | <pre>object({<br/> enable = optional(bool, false)<br/> accept_events = optional(list(string), null)<br/> })</pre> | `{}` | no |
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling.<br/><br/> `enable`: Enable the EventBridge feature.<br/> `accept_events`: List can be used to only allow specific events to be putted on the EventBridge. By default all events, empty list will be be interpreted as all events. | <pre>object({<br/> enable = optional(bool, true)<br/> accept_events = optional(list(string), null)<br/> })</pre> | `{}` | no |
| <a name="input_ghes_ssl_verify"></a> [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no |
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br/> key_base64 = string<br/> id = string<br/> webhook_secret = string<br/> })</pre> | n/a | yes |
Expand Down
64 changes: 4 additions & 60 deletions docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ To be able to support a number of use-cases, the module has quite a lot of confi

- Org vs Repo level. You can configure the module to connect the runners in GitHub on an org level and share the runners in your org, or set the runners on repo level and the module will install the runner to the repo. There can be multiple repos but runners are not shared between repos.
- Multi-Runner module. This modules allows you to create multiple runner configurations with a single webhook and single GitHub App to simplify deployment of different types of runners. Check the detailed module [documentation](modules/public/multi-runner.md) for more information or checkout the [multi-runner example](examples/multi-runner.md).
- Webhook mode, the module can be deployed in `direct` mode or `EventBridge` (Experimental) mode. The `direct` mode is the default and will directly distribute to SQS for the scale-up lambda. The `EventBridge` mode will publish the events to a eventbus, the rule then directs the received events to a dispatch lambda. The dispatch lambda will send the event to the SQS queue. The `EventBridge` mode is useful when you want to have more control over the events and potentially filter them. The `EventBridge` mode is disabled by default. An example of what the `EventBridge` mode could be used for is building a data lake, build metrics, act on `workflow_job` job started events, etc.
- Webhook mode, the module can be deployed in `direct` mode or `EventBridge` (Experimental) mode. The `direct` mode is the default and will directly distribute to SQS for the scale-up lambda. The `EventBridge` mode will publish the events to a eventbus, the rule then directs the received events to a dispatch lambda. The dispatch lambda will send the event to the SQS queue. The `EventBridge` mode is the default and allows to have more control over the events and potentially filter them. The `EventBridge` mode can be disabled, messages are sent directed to queues in that case. An example of what the `EventBridge` mode could be used for is building a data lake, build metrics, act on `workflow_job` job started events, etc.
- Linux vs Windows. You can configure the OS types linux and win. Linux will be used by default.
- Re-use vs Ephemeral. By default runners are re-used, until detected idle. Once idle they will be removed from the pool. To improve security we are introducing ephemeral runners. Those runners are only used for one job. Ephemeral runners only work in combination with the workflow job event. For ephemeral runners the lambda requests a JIT (just in time) configuration via the GitHub API to register the runner. [JIT configuration](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-just-in-time-runners) is limited to ephemeral runners (and currently not supported by GHES). For non-ephemeral runners, a registration token is always requested. In both cases the configuration is made available to the instance via the same SSM parameter. To disable JIT configuration for ephemeral runners set `enable_jit_config` to `false`. We also suggest using a pre-build AMI to improve the start time of jobs for ephemeral runners.
- Job retry (**Beta**). By default the scale-up lambda will discard the message when it is handled. Meaning in the ephemeral use-case an instance is created. The created runner will ask GitHub for a job, no guarantee it will run the job for which it was scaling. Result could be that with small system hick-up the job is keeping waiting for a runner. Enable a pool (org runners) is one option to avoid this problem. Another option is to enable the job retry function. Which will retry the job after a delay for a configured number of times.
Expand Down Expand Up @@ -263,9 +263,9 @@ Below an example of the the log messages created.

### EventBridge

This module can be deployed in using the mode `EventBridge` (Experimental). The `EventBridge` mode will publish an event to a eventbus. Within the eventbus, there is a target rule set, sending events to the dispatch lambda. The `EventBridge` mode is disabled by default.
This module can be deployed in using the mode `EventBridge`. The `EventBridge` mode will publish an event to a eventbus. Within the eventbus, there is a target rule set, sending events to the dispatch lambda. The `EventBridge` mode is enabled by default.

Example to use the EventBridge:
Example to extend the EventBridge:

```hcl
Expand All @@ -274,7 +274,7 @@ module "runners" {
...
eventbridge = {
enable = true
enable = false
}
...
}
Expand Down Expand Up @@ -332,61 +332,5 @@ resource "aws_iam_role_policy" "event_rule_firehose_role" {
}
```

### Queue to publish workflow job events

!!! warning "Removed

This feaTure will be removed since we introducing the EventBridge. Same functionality can be implemented by adding a rule to the EventBridge to forward `workflow_job` events to the SQS queue.

Below an example how you can sent all `workflow_job` with action `in_progress` to a SQS queue.

```hcl
resource "aws_cloudwatch_event_rule" "workflow_job_in_progress" {
name = "workflow-job-in-progress"
event_bus_name = modules.runners.webhook.eventbridge.name # The name of the event bus output by the module
event_pattern = <<EOF
{
"detail-type": ["workflow_job"],
"detail": {
"action": ["in_progress"]
}
}
EOF
}
resource "aws_sqs_queue" "workflow_job_in_progress" {
name = "workflow_job_in_progress
}
resource "aws_sqs_queue_policy" "workflow_job_in_progress" {
queue_url = aws_sqs_queue.workflow_job_in_progress.id
policy = data.aws_iam_policy_document.sqs_policy.json
}
data "aws_iam_policy_document" "sqs_policy" {
statement {
sid = "AllowFromEventBridge"
actions = ["sqs:SendMessage"]
principals {
type = "Service"
identifiers = ["events.amazonaws.com"]
}
resources = [aws_sqs_queue.workflow_job_in_progress.arn]
condition {
test = "ArnEquals"
variable = "aws:SourceArn"
values = [aws_cloudwatch_event_rule.workflow_job_in_progress.arn]
}
}
}
```



NOTE: By default, a runner AMI update requires a re-apply of this terraform config (the runner AMI ID is looked up by a terraform data source). To avoid this, you can use `ami_id_ssm_parameter_name` to have the scale-up lambda dynamically lookup the runner AMI ID from an SSM parameter at instance launch time. Said SSM parameter is managed outside of this module (e.g. by a runner AMI build workflow).
10 changes: 4 additions & 6 deletions examples/default/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -92,12 +92,10 @@ module "runners" {
# prefix GitHub runners with the environment name
runner_name_prefix = "${local.environment}_"

# webhook supports two modes, either direct or via the eventbridge, uncomment to enable eventbridge
# eventbridge = {
# enable = true
# # adjust the allow events to only allow specific events, like workflow_job
# # allowed_events = ['workflow_job']
# }
# by default eventbridge is used, see multi-runner example. Here we disable the eventbridge
eventbridge = {
enable = false
}

# Enable debug logging for the lambda functions
# log_level = "debug"
Expand Down
4 changes: 2 additions & 2 deletions modules/multi-runner/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -128,12 +128,12 @@ module "multi-runner" {
| <a name="input_cloudwatch_config"></a> [cloudwatch\_config](#input\_cloudwatch\_config) | (optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details. | `string` | `null` | no |
| <a name="input_enable_ami_housekeeper"></a> [enable\_ami\_housekeeper](#input\_enable\_ami\_housekeeper) | Option to disable the lambda to clean up old AMIs. | `bool` | `false` | no |
| <a name="input_enable_managed_runner_security_group"></a> [enable\_managed\_runner\_security\_group](#input\_enable\_managed\_runner\_security\_group) | Enabling the default managed security group creation. Unmanaged security groups can be specified via `runner_additional_security_group_ids`. | `bool` | `true` | no |
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling. | <pre>object({<br/> enable = optional(bool, false)<br/> accept_events = optional(list(string), [])<br/> })</pre> | `{}` | no |
| <a name="input_eventbridge"></a> [eventbridge](#input\_eventbridge) | Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling. | <pre>object({<br/> enable = optional(bool, true)<br/> accept_events = optional(list(string), [])<br/> })</pre> | `{}` | no |
| <a name="input_ghes_ssl_verify"></a> [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no |
| <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
| <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br/> key_base64 = string<br/> id = string<br/> webhook_secret = string<br/> })</pre> | n/a | yes |
| <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
| <a name="input_instance_termination_watcher"></a> [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the spot termination watcher lambda function. This feature is Beta, changes will not trigger a major release as long in beta.<br/><br/>`enable`: Enable or disable the spot termination watcher.<br/>`memory_size`: Memory size linit in MB of the lambda.<br/>`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.<br/>`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.<br/>`timeout`: Time out of the lambda in seconds.<br/>`zip`: File location of the lambda zip file. | <pre>object({<br/> enable = optional(bool, false)<br/> enable_metrics = optional(string, null) # deprecated<br/> features = optional(object({<br/> enable_spot_termination_handler = optional(bool, true)<br/> enable_spot_termination_notification_watcher = optional(bool, true)<br/> }), {})<br/> memory_size = optional(number, null)<br/> s3_key = optional(string, null)<br/> s3_object_version = optional(string, null)<br/> timeout = optional(number, null)<br/> zip = optional(string, null)<br/> })</pre> | `{}` | no |
| <a name="input_instance_termination_watcher"></a> [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the spot termination watcher lambda function. This feature is Beta, changes will not trigger a major release as long in beta.<br/><br/>`enable`: Enable or disable the spot termination watcher.<br/>`memory_size`: Memory size linit in MB of the lambda.<br/>`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.<br/>`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.<br/>`timeout`: Time out of the lambda in seconds.<br/>`zip`: File location of the lambda zip file. | <pre>object({<br/> enable = optional(bool, false)<br/> features = optional(object({<br/> enable_spot_termination_handler = optional(bool, true)<br/> enable_spot_termination_notification_watcher = optional(bool, true)<br/> }), {})<br/> memory_size = optional(number, null)<br/> s3_key = optional(string, null)<br/> s3_object_version = optional(string, null)<br/> timeout = optional(number, null)<br/> zip = optional(string, null)<br/> })</pre> | `{}` | no |
| <a name="input_key_name"></a> [key\_name](#input\_key\_name) | Key pair name | `string` | `null` | no |
| <a name="input_kms_key_arn"></a> [kms\_key\_arn](#input\_kms\_key\_arn) | Optional CMK Key ARN to be used for Parameter Store. | `string` | `null` | no |
| <a name="input_lambda_architecture"></a> [lambda\_architecture](#input\_lambda\_architecture) | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions. | `string` | `"arm64"` | no |
Expand Down
10 changes: 2 additions & 8 deletions modules/multi-runner/variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -618,8 +618,7 @@ variable "instance_termination_watcher" {
EOF

type = object({
enable = optional(bool, false)
enable_metrics = optional(string, null) # deprecated
enable = optional(bool, false)
features = optional(object({
enable_spot_termination_handler = optional(bool, true)
enable_spot_termination_notification_watcher = optional(bool, true)
Expand All @@ -631,11 +630,6 @@ variable "instance_termination_watcher" {
zip = optional(string, null)
})
default = {}

validation {
condition = var.instance_termination_watcher.enable_metrics == null
error_message = "The feature `instance_termination_watcher` is deprecated and will be removed in a future release. Please use the `termination_watcher` variable instead."
}
}

variable "lambda_tags" {
Expand Down Expand Up @@ -671,7 +665,7 @@ variable "metrics" {
variable "eventbridge" {
description = "Enable the use of EventBridge by the module. By enabling this feature events will be put on the EventBridge by the webhook instead of directly dispatching to queues for scaling."
type = object({
enable = optional(bool, false)
enable = optional(bool, true)
accept_events = optional(list(string), [])
})

Expand Down
2 changes: 1 addition & 1 deletion variables.tf
Original file line number Diff line number Diff line change
Expand Up @@ -946,7 +946,7 @@ variable "eventbridge" {
`accept_events`: List can be used to only allow specific events to be putted on the EventBridge. By default all events, empty list will be be interpreted as all events.
EOF
type = object({
enable = optional(bool, false)
enable = optional(bool, true)
accept_events = optional(list(string), null)
})

Expand Down

0 comments on commit 142bb61

Please sign in to comment.