WIP: improve replica counting on openshift #278

Draft
wants to merge 1 commit into
base: master
from

Conversation

elmiko

@elmiko elmiko commented Jan 5, 2024

DO NOT MERGE THIS YET

This change adds logic to count the machines owned by each machineset when calculating the replica count reported to the core autoscaler. It is needed because the machine-api controllers do not include machines in the deleting phase when updating their replica field. This causes a problem for the core autoscaler, because its count of nodes will not match the resources reported by the cloud provider.

This can be removed when the machine-api controllers have been fully removed from openshift.
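
To illustrate the idea described above, here is a minimal sketch in Go. It is not the code in this PR: the `Machine` and `MachineSet` types, their fields, and the `ownedMachineCount` helper are hypothetical, simplified stand-ins for the machine-api resources. It only assumes that ownership can be determined by matching an owner UID, and that the replica field excludes machines in the deleting phase.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a machine-api Machine.
type Machine struct {
	Name     string
	OwnerUID string // UID of the owning machineset; empty if unowned (assumption)
	Phase    string // for example "Running" or "Deleting"
}

// MachineSet is a hypothetical, simplified stand-in for a machine-api MachineSet.
type MachineSet struct {
	Name     string
	UID      string
	Replicas int // assumed to exclude machines in the deleting phase
}

// ownedMachineCount counts every machine owned by ms, regardless of phase,
// so machines that are still being deleted are included in the total.
func ownedMachineCount(ms MachineSet, machines []Machine) int {
	count := 0
	for _, m := range machines {
		if m.OwnerUID == ms.UID {
			count++
		}
	}
	return count
}

func main() {
	ms := MachineSet{Name: "workers", UID: "ms-1", Replicas: 2}
	machines := []Machine{
		{Name: "worker-a", OwnerUID: "ms-1", Phase: "Running"},
		{Name: "worker-b", OwnerUID: "ms-1", Phase: "Running"},
		{Name: "worker-c", OwnerUID: "ms-1", Phase: "Deleting"},
		{Name: "other-a", OwnerUID: "ms-2", Phase: "Running"},
	}
	fmt.Printf("replica field: %d, owned machines: %d\n",
		ms.Replicas, ownedMachineCount(ms, machines))
}
```

In this sketch the replica field says 2 while three machines are still owned, which is the kind of mismatch with the cloud provider's view that the description refers to.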

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 5, 2024

openshift-ci bot commented Jan 5, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Jan 5, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from elmiko. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elmiko
Author

elmiko commented Jan 5, 2024

/test all

@elmiko elmiko force-pushed the update-openshift-replica-counting branch from e5c96ba to 32770e2 Compare January 8, 2024 16:40
@elmiko
Author

elmiko commented Jan 8, 2024

/test all

@elmiko
Author

elmiko commented Jan 8, 2024

/test e2e-aws-operator


openshift-ci bot commented Jan 8, 2024

@elmiko: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@JoelSpeed

Change appears to be what I had expected. How's the testing going?

@elmiko
Author

elmiko commented Jan 15, 2024

everything seems to be working for me, but i don't have a good handle on testing the case where a machineset goes into a failed mode.

@JoelSpeed

Is this something QE have a better grip on? Can they help with the testing of this?

@elmiko
Author

elmiko commented Jan 23, 2024

i think this is helping, but Zhaohua did a qe run and it didn't solve the problem. it will take some deeper investigation, but i think we want this change.

@elmiko
Author

elmiko commented Apr 19, 2024

@JoelSpeed i think it would be worthwhile to get this in, maybe make a jira card to capture what it's doing. what do you think?

@JoelSpeed

@elmiko Was this not raised as part of a bug investigation? If we don't have a bug for it, I would expect this to be counted as one. We probably want to backport the fix, right?

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 23, 2024
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 22, 2024
@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Sep 22, 2024

openshift-ci bot commented Sep 22, 2024

@openshift-bot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@elmiko
Author

elmiko commented Sep 24, 2024

/reopen
/remove-lifecycle rotten

i need to figure out what to do with this, i think we want it but it doesn't seem to affect the bug we were chasing.

@openshift-ci openshift-ci bot reopened this Sep 24, 2024

openshift-ci bot commented Sep 24, 2024

@elmiko: Reopened this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 24, 2024
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2024
Labels
do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
3 participants