intermittent winrm connection failures with large hosts count #597
Comments
There's unfortunately no real resolution for these problems, as they are symptoms of another problem. This could be something like an unreliable network, or the host being under high load and breaking the WinRM service in some way. Fixing this is not simple, and I cannot really give you a one-shot solution. I'll try to explain the errors you are getting a bit more, though.
This is when the client tries to resolve the specified hostname into an IP address. It's purely a DNS task and happens before any WinRM operations occur. Why it might fail I'm not sure, but it's a mandatory first step to figure out how to communicate with the target.
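To check whether name resolution really is flaky from the control node, one option is to repeat the same kind of lookup outside of Ansible. The following is only a sketch: `probe_dns` is a hypothetical helper, and port 5986 is assumed because it is the default WinRM HTTPS port.

```python
import socket
import time


def probe_dns(hostname, attempts=5, delay=0.5):
    """Resolve `hostname` repeatedly and collect any resolver errors."""
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            # The same kind of lookup a client performs before connecting.
            socket.getaddrinfo(hostname, 5986, type=socket.SOCK_STREAM)
        except socket.gaierror as exc:
            failures.append((attempt, str(exc)))
        if attempt < attempts:
            time.sleep(delay)
    return failures
```

Calling `probe_dns("winhost01.example.com", attempts=100)` (a placeholder name, not a host from this issue) returns a list of `(attempt, error)` tuples. An empty list means every lookup succeeded during that window, which does not rule out a rarer failure.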
This one is a bit trickier as it could be the result of two things: a firewall is explicitly blocking the connection, or nothing is listening on that endpoint port. As it is an intermittent problem I'm not sure what the cause could be, but this is the error the TCP stack returns when it fails to open the socket connection because the server rejected it.
You can try the psrp connection plugin, which also operates over WinRM but with a newer protocol on top. It has a few configurable knobs for connection retries, but it's not guaranteed to solve the problem. Ultimately these problems sit a few layers below where Ansible sits, so there is little we can really do to solve them.
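The difference between a rejected connection and no answer at all can be observed with a plain TCP probe, independent of Ansible and WinRM. This is a rough sketch: `probe_tcp` is a hypothetical helper, and the default port 5986 (WinRM over HTTPS) is an assumption about the setup.

```python
import socket


def probe_tcp(host, port=5986, timeout=5.0):
    """Classify a single TCP connection attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The attempt was actively rejected: nothing is listening on the
        # port, or a firewall answered with a reset.
        return "refused"
    except socket.timeout:
        # No answer within `timeout`, typical of silently dropped packets.
        return "timeout"
    except OSError as exc:
        # Anything else, such as an unreachable network or a DNS failure.
        return f"error: {exc}"
```

Running this in a loop against an affected host while a play is executing might show whether the refusals line up with the failed tasks, although a once-in-two-thousand event will be hard to catch this way.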
Two questions:
I'm not aware of any migration guide between the two, but in this case the option is now ansible_psrp_negotiate_delegate. The
In all honesty probably not. I've not really seen much of a benefit over the connection retry mechanism as usually when a retry is needed the underlying service is in a bad state where the retry won't help. The bar for touching
Hi Jordan, thanks! I will create a POC for migrating to psrp.
Would you mind elaborating on these benefits? I'd love to learn.
The main benefit would be speed. It's nothing substantial for one-off tasks, but you should see a clear improvement when running a looped task. General tasks should still be a tiny bit quicker, but you'll only really see the gains when the connection is reused (a loop reuses the connection). There are a few improvements in the authentication process too, but honestly probably nothing you would notice.
While I can't give a definitive date, it is part of my current documentation goals, so hopefully I'll have something soon.
Thanks! We have migrated to pypsrp now and will evaluate the results in the upcoming weeks. Will close this for now.
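For anyone attempting the same switch, most of the change comes down to connection variables. The sketch below is a tiny dynamic inventory script that puts a `windows` group on the psrp plugin and sets the retry and timeout knobs mentioned earlier in the thread. The group name, the host name, and all numeric values are assumptions for illustration, not settings taken from this issue.

```python
import json

# Illustrative values only (assumptions); tune them for your environment.
PSRP_VARS = {
    "ansible_connection": "psrp",
    "ansible_psrp_reconnection_retries": 3,
    "ansible_psrp_reconnection_backoff": 2.0,
    "ansible_psrp_connection_timeout": 30,
    "ansible_psrp_read_timeout": 60,
}


def build_inventory(hosts):
    """Return an inventory in the JSON shape Ansible expects from `--list`."""
    return {
        "windows": {"hosts": list(hosts), "vars": dict(PSRP_VARS)},
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    # Placeholder host name, not one of the servers from this issue.
    print(json.dumps(build_inventory(["winhost01.example.com"]), indent=2))
```

The same variables can be set in a static inventory or in `group_vars`. A script is used here only to keep the example self-contained.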
Any follow-up feedback on the result of that transition (for others like myself) would be appreciated.
@agibson2 We are very happy: switching to pypsrp has fixed the connection issues and made our CI/CD execution much more reliable.
SUMMARY
As our number of Ansible-managed Windows hosts grows (currently 80 Windows Server 2022 machines), I am running into intermittent WinRM connection issues more and more often. These happen roughly once every two thousand WinRM task executions, but with this many hosts it is starting to be problematic.
These issues are intermittent and not reproducible. They do not occur at the beginning of a play, but after previous WinRM tasks for the same host have executed successfully, and running the play again always fixes the issue (even though it has happened that another host fails on the next run).
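To put a number on why a one-in-two-thousand failure rate hurts at this scale, here is a quick back-of-the-envelope calculation. It assumes failures are independent, and the figure of 25 WinRM tasks per host per run is an assumption made up for the example, not a measurement.

```python
def chance_of_failure(per_task_failure_rate, task_executions):
    """Probability that at least one of the independent task executions fails."""
    return 1.0 - (1.0 - per_task_failure_rate) ** task_executions


rate = 1 / 2000       # roughly the failure rate reported above
hosts = 80            # host count reported above
tasks_per_host = 25   # assumption, not stated in the report

print(f"{chance_of_failure(rate, hosts * tasks_per_host):.0%}")  # prints 63%
```

Under those assumptions, close to two out of three runs would hit at least one failed host, even though each individual task almost always succeeds.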
Here are some sample errors:
I found this issue discussing the same problem, but it appears it was closed with no real solution, except for modifying the winrm Python files by hand or switching to SSH, neither of which is really a viable option for us. Would love your opinion on this @jborean93 :)
ISSUE TYPE
COMPONENT NAME
winrm
ANSIBLE VERSION
COLLECTION VERSION
CONFIGURATION
OS / ENVIRONMENT
Target OS: windows server 2022
pywinrm-0.4.3
pykerberos-1.2.4