Enable budget alerts on azure clusters #4476

Merged
GeorgianaElena merged 7 commits into 2i2c-org:main from azure-alerts on Jul 24, 2024

Conversation

GeorgianaElena
Member

@GeorgianaElena GeorgianaElena commented Jul 23, 2024

Fixes https://github.com/2i2c-org/meta/issues/1278

Important notes

  1. dynamic budget
    I don't think there is a way to automatically adjust the amount on Azure, as was done for GCP and AWS, because the amount field must be a plain number. Based on my reading of the documentation, there is no way to have this number adjust automatically.
  2. recent hub
    Because the hub was only deployed at the end of May, the budget is calculated from the cost of a single month, June, which is the only full month the hub has been operating.

Todos before merging

  • verify that an alert is triggered
  • document this

@sgibson91
Member

sgibson91 commented Jul 23, 2024

2. Because the hub has only been deployed at the end of May, the budget is calculated based on the cost of one month-June, which is the only full month that the hub has been operating for

In another issue somewhere, we agreed to use Kitware as the reference for setting a threshold for new hubs. I remember adding this to the docs, but Erik probably overwrote it when he made everything automatically adjustable.

ETA: https://github.com/2i2c-org/meta/issues/1258

@GeorgianaElena
Member Author

GeorgianaElena commented Jul 23, 2024

terraform plan output:

Terraform will perform the following actions:

  # azurerm_consumption_budget_resource_group.budget will be created
  + resource "azurerm_consumption_budget_resource_group" "budget" {
      + amount            = 600
      + etag              = (known after apply)
      + id                = (known after apply)
      + name              = "BudgetResourceGroup"
      + resource_group_id = "/subscriptions/4ca0b08a-26e1-482f-bca6-f4eb0926124a/resourceGroups/2i2c-pchub"
      + time_grain        = "Monthly"

      + notification {
          + contact_emails = [
              + "[email protected]",
            ]
          + contact_groups = []
          + contact_roles  = []
          + enabled        = true
          + operator       = "GreaterThanOrEqualTo"
          + threshold      = 120
          + threshold_type = "Forecasted"
        }

      + time_period {
          + end_date   = "2029-07-01T00:00:00Z"
          + start_date = "2024-07-01T00:00:00Z"
        }
    }

  # azurerm_kubernetes_cluster.jupyterhub will be updated in-place
  ~ resource "azurerm_kubernetes_cluster" "jupyterhub" {
        id                                  = "/subscriptions/4ca0b08a-26e1-482f-bca6-f4eb0926124a/resourceGroups/2i2c-pchub/providers/Microsoft.ContainerService/managedClusters/hub-cluster"
        name                                = "hub-cluster"
        tags                                = {}
        # (35 unchanged attributes hidden)

      ~ default_node_pool {
            name                          = "core"
            tags                          = {}
            # (33 unchanged attributes hidden)

          - upgrade_settings {
              - drain_timeout_in_minutes      = 0 -> null
              - max_surge                     = "10%" -> null
              - node_soak_duration_in_minutes = 0 -> null
            }
        }

        # (5 unchanged blocks hidden)
    }

Plan: 1 to add, 1 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────


@GeorgianaElena
Member Author

In another issue somewhere, and I remember adding it to docs (but Erik probably overwrote it when he made everything automatically adjustable), we agreed to use Kitware to set a threshold for new hubs

Thank you @sgibson91! I will add that piece of info back to the docs. However, since this hub is not new and has been operating for a full month, I think it's best to use its own cost rather than that of another hub. Does this sound ok to you?

@GeorgianaElena
Member Author

Update

I had to replace the consumption_budget_resource_group mentioned in the issue with consumption_budget_subscription, because there is more than one resource group and we only define one through Terraform; the others are created automatically.

@sgibson91 has provided more context about this and a possible solution in #4478. Thank you @sgibson91!
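
For reference, the switch to a subscription-level budget might look roughly like the following in Terraform. This is a minimal sketch, not the configuration merged in this PR: the resource label, the budget name, the `budget_alert_amount` and `budget_alert_emails` variables, and the `azurerm_subscription` data source are assumptions for illustration. The remaining values mirror the plan output posted earlier in this thread.

```hcl
# Hypothetical sketch: a subscription-wide budget, so that costs from the
# automatically created resource groups are counted as well.
data "azurerm_subscription" "current" {}

variable "budget_alert_amount" {
  type        = number
  description = "Monthly budget; Azure needs a fixed number here (assumed variable name)."
  default     = 600
}

variable "budget_alert_emails" {
  type        = list(string)
  description = "Addresses that receive the alert (assumed variable name)."
}

resource "azurerm_consumption_budget_subscription" "budget" {
  name            = "BudgetSubscription" # assumed name
  subscription_id = data.azurerm_subscription.current.id
  amount          = var.budget_alert_amount
  time_grain      = "Monthly"

  time_period {
    start_date = "2024-07-01T00:00:00Z"
    end_date   = "2029-07-01T00:00:00Z"
  }

  notification {
    enabled        = true
    threshold      = 120 # percentage of the budget amount
    operator       = "GreaterThanOrEqualTo"
    threshold_type = "Forecasted"
    contact_emails = var.budget_alert_emails
  }
}
```

With an amount of 600 and a threshold of 120, the notification would fire once the forecasted monthly cost reaches 720 in the billing currency.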

@GeorgianaElena
Member Author

I've set this to a lower budget limit to see if it triggers an email. The docs say an email should come through some time after the evaluation, but I'm not sure how often the evaluation happens.

If this takes a long time, I will merge it and check again tomorrow, making further fixes if it still hasn't triggered.

@GeorgianaElena
Member Author

🎉 Yuhoo, it triggered

Screenshot 2024-07-24 at 11 22 12

@GeorgianaElena GeorgianaElena merged commit ee431e8 into 2i2c-org:main Jul 24, 2024
4 checks passed
@GeorgianaElena GeorgianaElena deleted the azure-alerts branch July 24, 2024 09:06