basehub: rely on a single user-scheduler replica #3869
If the scheduler is down, no user pods will be scheduled, so I'm concerned about moving this to 1 replica. Figuring out whether this is acceptable and which failure modes we need to cover requires more research, as node failure is not necessarily the only reason to make a tool HA.
Please bring this up in the next sprint planning meeting so we can prioritize this accordingly with the rest of our tasks.
I'll try to do that, but if we don't manage to allocate time to review this in the next sprint, I'd like to propose we merge it without further review. I'm very confident this change is fine to make without regressions. I'm looking at shared cluster cost in AWS, and found that we have two core nodes running due to too many pods, where each node can schedule up to 57 pods. With this change, we can almost (still 2 pods too many) go below the need to have two core nodes.
With this PR merged and #3999 resolved, we would cut core node costs by half in 2i2c-aws-us.
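To make the pod-count argument concrete, here is a minimal sketch of the arithmetic, assuming the 57-pods-per-node limit mentioned above is the only binding constraint (CPU and memory requests are ignored). The function name `core_nodes_needed` is hypothetical and not part of this repository.

```python
import math


def core_nodes_needed(pod_count: int, pods_per_node: int = 57) -> int:
    """Return the minimum number of nodes needed to fit pod_count pods.

    Assumption: only the per-node pod limit matters; CPU and memory
    requests are not modelled here.
    """
    if pod_count < 0 or pods_per_node <= 0:
        raise ValueError("pod_count must be >= 0 and pods_per_node must be > 0")
    return math.ceil(pod_count / pods_per_node)


# Being 2 pods over the limit of a single node still forces a second node.
print(core_nodes_needed(57 + 2))  # 2
# Removing those 2 pods would let everything fit on one node.
print(core_nodes_needed(57))      # 1
```

This is why shaving off only a couple more pods matters: the node count only drops once the total fits within a single node's pod limit.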
af2bfd8 to 8c4e472
I'll go for a merge here to save 2i2c-aws-us and catalystproject-africa almost 200 USD a month per cluster in core node pool costs. I'm very confident this change is safe to make, @yuvipanda, and I have now self-reviewed my thinking about it roughly four times.
🎉🎉🎉🎉 Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/8986088624
The issue being fixed (#3865) includes the motivation and why I think it's safe enough to do.