Question: Option to configure node-reaper differently for specific node group #74
Comments
At present, no. Such an option doesn't exist. Just a thought: won't the older nodes that must be terminated go out of compliance while they wait for the newer nodes to age?
node-reaper is a cronjob that runs every 10 minutes. If the state of the cluster is relatively stable and no new nodes are being added over a 7-day period, maybe you can configure it to run once every 7 days?
You can explore upgrade-manager and see if it fits your requirements. upgrade-manager is a controller that will perform node rotations. You can find more details in these flow charts and additional explanation in this comment. Note: upgrade-manager isn't a cronjob like node-reaper. Instead, it is a controller that reconciles on a custom resource -
Thanks @shreyas-badiger for the quick response. I think we have some grace period to terminate older nodes. Our main concern is not reaping too many times a day in a specific node group, as each recycle event triggers a restart of the entire job. I believe we scan every 20 minutes to find nodes for reaping and reap one node at a time. It would be nicer if we could adjust the configuration for certain node group(s) and reap all candidates together. But as you said, that is not possible because node-reaper is not aware of node group information. Do you think this is something we could add support for by reading node labels? Thanks for the details on upgrade-manager. I will explore its capabilities.
We can surely talk about this further. In my opinion, this would probably be a whole new feature, and it could change how we list and parse the nodes. We can connect on the keikoproj Slack, in #proj-governor-reapers, to discuss this more.
Currently, we use upgrade-manager to upgrade clusters that run Flink.
Thanks again. I will join the Slack channel and continue the conversation there.
Is this a BUG REPORT or FEATURE REQUEST?:
QUESTION
What happened:
We are running node-reaper in our kube cluster to reap nodes older than 7d for security and compliance reasons. Some of our workloads are ML workloads (Apache Flink jobs), and they run on a specific node group. When we reap a node in that node group, the entire ML job needs to be restarted due to the architecture of Flink (the same is true of many ML architectures). So if we reap the nodes in that node group one by one (many times a day once nodes are 7d old), the job is restarted many times a day, which adds up to significant processing lag. As reaping one node older than 7d has the same effect on the job as reaping all nodes in that node group, we are wondering:
As noted above, we are looking for advanced configuration for a few special node groups, in addition to the regular options for all other nodes.
Thanks in advance for any help on this.
What you expected to happen:
Option to configure a specific node group differently
How to reproduce it (as minimally and precisely as possible):
N/A
Anything else we need to know?:
N/A
Environment:
Other debugging information (if applicable):