Following #4318 and the migration of ci.jenkins.io to AWS, it looks like the EKS Windows container agents are not working as expected.
Our tests were successful, but real conditions (multiple builds of multiple plugins at the same time) show a lot of memory management errors: #4552.
Short term: we are reverting from containers back to VMs for Windows agents. The rationale is that the scope of projects impacted by build or test failures is far smaller with Windows VM agents in their current state.
Medium term: additional work is required to fix the failing plugin builds. This includes the remoting component, which is particularly important.
Long term: we have to reconsider whether to use Windows containers for agents at all. It was a useful technique years ago to provide Windows agents (when using ACI), but Windows VMs are easier to operate (for the same cost as containers) and are even faster.
We are facing the following problems regarding Windows container agents:
The operational cost in AWS (compared to Azure) is too high.
Kubernetes nodes do not behave the same between AWS and Azure when running JVM builds.
Container performance is not that good: a node needs to be spun up (5 to 9 minutes currently), then the image pulled.
Only if there has been no recent request for a Windows container agent, surely? Because you are running a Windows node pool that can scale to zero? Once a node is running, it ought to be able to create fresh pods very quickly.
Only if the node is statically sized to handle multiple pods (so not with Karpenter, which selects an optimized instance for the current workload on demand). Then it comes down to billing: either you increase the probability of a pod coming up quickly (e.g. by keeping the node running a few minutes longer, which costs more) with the same image already present (no additional pull), or the node is recycled and you're back to step 0.
=> It is a non-deterministic cycle, and even ci.jenkins.io does not have the critical mass for the Windows-only case.
However, these elements could be optimized with some effort (making Windows nodes faster to start, unifying the container images, pre-caching the container image, etc.). The main concern is around Windows memory management: it is not clear why we hit these OOMs even on huge machines.
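One hypothesis worth checking (a sketch, not something we have verified): if the JVM inside the Windows container sizes its default heap from the node's physical RAM instead of the pod's memory limit, a few concurrent plugin builds on one node will overcommit and get killed. A minimal diagnostic along these lines could be run inside an agent; the class name and flag values below are illustrative only:

```java
// Hypothetical diagnostic (not an existing ci.jenkins.io tool): print the memory
// and CPU budget the JVM has detected inside the Windows container agent.
public class ContainerLimits {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Detected max heap (MiB): " + rt.maxMemory() / (1024 * 1024));
        System.out.println("Detected processors:     " + rt.availableProcessors());
        // If the max heap tracks the node's total RAM rather than the pod limit,
        // several concurrent Maven builds can exceed that limit and trigger OOMs.
        // Capping the heap explicitly (for example -Xmx2g, or -XX:MaxRAMPercentage
        // with a conservative value) would be one mitigation to test.
    }
}
```

Running this with and without an explicit -Xmx on both the AWS and Azure node types would also help confirm the "nodes do not behave the same" observation above.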
wip