feat!: autoscaler with scaling schedules #139

Open: wants to merge 1 commit into base: main

Conversation

d-costa
Collaborator

@d-costa d-costa commented Jan 3, 2024

what

  • Add the ability to use an autoscaler to scale down to zero outside the defined schedules.
  • At most one instance runs at any time.
  • Since autoscalers can only be attached to non-stateful MIGs, this commit also moves the creation of the home folder disk (atlantis-disk-0) out of the MIG, effectively making it a stateless MIG. Destroying the group on its own will not destroy the disk.
  • Add resources for the disk and the autoscaler, and a usage example. Update the README.

BREAKING CHANGE: the 50GB stateful disk is no longer created by the MIG, so the MIG is no longer stateful. Additionally, because the disk is now a separate resource, running terraform destroy destroys the disk as well.
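
For readers unfamiliar with the pattern, here is a minimal sketch of what a disk managed outside the MIG can look like. It is not the code in this PR: the resource labels, zone, disk type, machine type, and boot image are all assumptions made for illustration; only the disk name (atlantis-disk-0) and size (50GB) come from the description above.

```hcl
# Illustrative sketch only; every name and value except the disk name
# and size is an assumption.

# The data disk is its own resource, so it outlives any instance the MIG
# creates or deletes. It is still removed by `terraform destroy`.
resource "google_compute_disk" "atlantis" {
  name = "atlantis-disk-0"
  type = "pd-balanced"    # assumed disk type
  zone = "europe-west1-b" # assumed zone
  size = 50               # GB
}

resource "google_compute_instance_template" "atlantis" {
  name_prefix  = "atlantis-"
  machine_type = "n2-standard-2" # assumed machine type

  # Boot disk, recreated with every new instance.
  disk {
    source_image = "cos-cloud/cos-stable" # assumed image
    boot         = true
    auto_delete  = true
  }

  # Existing data disk, attached by name and kept when the instance goes away.
  disk {
    source      = google_compute_disk.atlantis.name
    device_name = "atlantis-disk-0"
    boot        = false
    auto_delete = false
    mode        = "READ_WRITE"
  }

  network_interface {
    network = "default" # assumed network
  }
}
```

Attaching an existing disk in read-write mode also means only one instance can use it at a time, which is consistent with capping the group at a single instance.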

why

  • Teams often use Atlantis only on certain weekdays and at certain times of day (e.g. 8AM to 7PM). This feature lets the MIG scale down to zero outside those periods. Scaling down on weekends alone (2 of 7 days) reduces instance costs by ~28%.
  • Cost is usually the main concern when adopting Atlantis vs GitHub Actions.
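
As a rough illustration of the 8AM to 7PM example, a schedule-driven autoscaler could be declared along these lines. This is a sketch under stated assumptions, not the module's actual interface: the resource labels, zone, time zone, and the referenced instance group manager (assumed to be defined elsewhere) are hypothetical.

```hcl
# Illustrative sketch only; names, zone, and time zone are assumptions.
resource "google_compute_autoscaler" "atlantis" {
  name = "atlantis-autoscaler"
  zone = "europe-west1-b"

  # Assumes a stateless MIG with this label is defined elsewhere.
  target = google_compute_instance_group_manager.atlantis.id

  autoscaling_policy {
    min_replicas = 0 # scale to zero outside every schedule
    max_replicas = 1 # never more than one Atlantis instance

    scaling_schedules {
      name                  = "weekday-working-hours"
      description           = "Keep one instance up on weekdays, 8AM to 7PM"
      min_required_replicas = 1
      schedule              = "0 8 * * 1-5" # cron: 08:00, Monday to Friday
      time_zone             = "Europe/Lisbon"
      duration_sec          = 39600 # 11 hours, i.e. until 7PM
    }
  }
}
```

Outside the window no schedule requires a replica, so the group can fall back to min_replicas, which is zero here.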

Notes:

  • How responsive is the scaling with respect to the schedule?
    • ~2-3 minutes to scale up, ~10 minutes to scale down after the window ends.
  • Will the MIG scale down even if an apply is executing?
    • Yes. The interrupted apply leaves the plan stale; running atlantis plan followed by atlantis apply again fixes the drift.
  • What happens if the instance is destroyed after a plan is calculated?
    • When the instance is brought back up, the disk is attached and you can atlantis apply as usual.

Let me know if you find this useful, and whether it fits your vision for the module! 😄

Sidenote: We also tried to implement on-demand scale-up (to deploy an instance outside the schedules) using Monitoring metrics based on the load balancer. This should be technically possible, but we could not get it to work. While the group does scale from 0 to 1 when requests arrive, it never scales back down: in the absence of requests, the metric keeps its last value. For reference, we tried the following:

resource "google_compute_autoscaler" "default" {
  # ...
  autoscaling_policy {
    dynamic "metric" {
      for_each = var.autoscaling.scale_up_on_demand ? [
        # You can only use the AND operator for joining selectors. You can only use direct equality comparison operator (=) without any functions for each selector.
        # Metric types must be unique within the scaling configuration.
        {
          # Keep instance up when used
          name   = "loadbalancing.googleapis.com/https/request_bytes_count"
          filter = "metric.labels.response_code_class = \"200\" AND resource.type = \"https_lb_rule\" AND resource.labels.project_id = \"${var.project}\" AND resource.labels.forwarding_rule_name = \"${var.name}\""
          target = 1
        },
        {
          # Scale up when needed
          name   = "loadbalancing.googleapis.com/https/request_count"
          filter = "metric.labels.response_code = \"503\" AND resource.type = \"https_lb_rule\" AND resource.labels.project_id = \"${var.project}\" AND resource.labels.forwarding_rule_name = \"${var.name}\""
          target = 0.001
        }
      ] : []
      content {
        name   = metric.value.name
        target = metric.value.target
        type   = "DELTA_PER_SECOND"
        filter = metric.value.filter
      }
    }
  }
}

references

@bschaatsbergen
Member

bschaatsbergen commented Jan 3, 2024

Thanks for opening this extensive PR @d-costa, it looks really good at first sight. Allow me to review it as soon as possible, together with the other PR (which I have already reviewed and just need to merge). End of year was really hectic for me, unfortunately!

@d-costa
Collaborator Author

d-costa commented Jan 3, 2024

No problem at all! And happy new year 😄

@bschaatsbergen
Member

bschaatsbergen commented Jan 5, 2024

Happy new year and best wishes @d-costa 😃
Could you please resolve the conflicts @d-costa, I've just merged the Shared VPC PR from you (#137)

@dgteixeira

hey @bschaatsbergen , how are you?
Any idea on when you might be able to review this and #131 ?

We are currently using a local version of this but we would love to point all the way to your upstream :)

@bschaatsbergen
Member

> hey @bschaatsbergen , how are you? Any idea on when you might be able to review this and #131 ?
>
> We are currently using a local version of this but we would love to point all the way to your upstream :)

Hi @dgteixeira, thanks for getting in touch. Now that the repository has been transferred to the runatlantis organization, and you, @d-costa, @DanielRieske, and @cblkwell are maintainers of this repository, I encourage you to collaborate on addressing these PRs together :)

@cblkwell
Contributor

cblkwell commented Apr 1, 2024

This one looks good to me if we can resolve the conflicts -- once that is done I don't think I'll have a problem signing off.

@d-costa d-costa requested a review from a team as a code owner April 1, 2024 15:29
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 1, 2024
@cblkwell
Contributor

cblkwell commented Apr 2, 2024

Hrm. What's up with the ci? :/

@d-costa d-costa closed this Apr 6, 2024
@d-costa d-costa reopened this Apr 6, 2024
@d-costa d-costa self-assigned this Apr 7, 2024
@d-costa d-costa force-pushed the autoscaler branch 2 times, most recently from c03e6c8 to 69bf12c Compare April 9, 2024 08:55
@d-costa d-costa force-pushed the autoscaler branch 2 times, most recently from 3a04ef4 to d37e3c9 Compare August 6, 2024 15:29
@d-costa d-costa force-pushed the autoscaler branch 2 times, most recently from 3c43c83 to 050fbff Compare November 11, 2024 16:47
Add the ability to use an autoscaler to scale down to zero outside the defined schedules.

Only non-stateful MIGs can be used with autoscalers, so this commit also removes the responsibility of creating the home folder disk (atlantis-disk-0) from the MIG, effectively making it a stateless MIG.
Nonetheless, destroying the group will not destroy the disk.

Add resources for the disk and the autoscaler, and a usage example.
Update the README.

BREAKING CHANGE: the 50GB stateful disk is no longer created by the mig, which makes the mig no longer stateful. Additionally, if terraform destroy is executed, the disk is destroyed.

Signed-off-by: David Costa <[email protected]>
@d-costa
Collaborator Author

d-costa commented Jan 9, 2025

@bschaatsbergen could we release this as well? Since we are already bumping the major version

@bschaatsbergen
Member

While I like the proposal, could you clarify what scaling is actually happening apart from the existing managed instance group that handles the lifecycle of the VM? It seems that the max instance is set to 1, and aside from the disk, I’m not seeing any additional scaling. Could you provide more details on that?

@d-costa
Collaborator Author

d-costa commented Jan 10, 2025

The idea is only to scale the number of VMs between 0 and 1 through schedules.
This makes it possible, for example, to destroy the VM during the weekend (or whenever teams know they won't need it) to reduce unnecessary costs.

Labels
documentation Improvements or additions to documentation

4 participants