WinRM: Bails out with "[Errno 111] Connection refused" #25532
The same issue appears consistently when installing SCVMM (in async mode). If you have the following tasks in a playbook:

```yaml
- name: Transfer System-Center ISO
  win_get_url:
    url: '{{ binaries_source }}/mu_system_center_2012_r2_virtual_machine_manager_x86_and_x64_dvd_2913737.iso'
    dest: C:\Windows\Temp\mu_system_center_2012_r2_virtual_machine_manager_x86_and_x64_dvd_2913737.iso
    force: no
    skip_certificate_validation: yes

- name: Mount System-Center ISO image
  win_disk_image:
    image_path: 'C:\Windows\Temp\mu_system_center_2012_r2_virtual_machine_manager_x86_and_x64_dvd_2913737.iso'
    state: present
  register: iso

- name: Run System-Center installer
  win_command: >
    {{ iso.mount_path }}setup.exe /server /i /f "C:\Windows\Temp\VMServer.ini"
    /SqlDBAdminDomain "{{ dc }}" /SqlDBAdminName "{{ windows_admin_user }}" /SqlDBAdminPassword "{{ windows_admin_password }}"
    /VmmServiceDomain "{{ dc }}" /VmmServiceUserName "scvmmsvc" /VmmServiceUserPassword "{{ windows_admin_password }}"
    /IACCEPTSCEULA
  args:
    creates: 'C:\Program Files\Microsoft System Center 2012 R2\Virtual Machine Manager\bin\VmmAdminUi.exe'
  vars:
    ansible_user: '{{ dc }}\{{ windows_admin_user }}'
  when: not vmadminui.stat.exists
  register: systemcenter
  async: 1000
  poll: 15
  ignore_errors: yes

- name: Run System-Center installer (again)
  win_command: >
    {{ iso.mount_path }}setup.exe /server /i /f "C:\Windows\Temp\VMServer.ini"
    /SqlDBAdminDomain "{{ dc }}" /SqlDBAdminName "{{ windows_admin_user }}" /SqlDBAdminPassword "{{ windows_admin_password }}"
    /VmmServiceDomain "{{ dc }}" /VmmServiceUserName "scvmmsvc" /VmmServiceUserPassword "{{ windows_admin_password }}"
    /IACCEPTSCEULA
  args:
    creates: 'C:\Program Files\Microsoft System Center 2012 R2\Virtual Machine Manager\bin\VmmAdminUi.exe'
  vars:
    ansible_user: '{{ dc }}\{{ windows_admin_user }}'
  when: not vmadminui.stat.exists and systemcenter|failed
  async: 1000
  poll: 15
```

It fails consistently with a "Connection refused" error.
So I wrote a simple implementation that retries 5 times, waiting 5 seconds each time, when it hits "Connection refused", and after that I got reproducible HTTP 500 errors instead (on Windows 2012R2). When I then upgraded WMF/PowerShell 4.0 to WMF/PowerShell 5.1, the HTTP 500 errors disappeared and the task succeeded! However, the task only succeeded when run in async mode; otherwise I got a problem related to a None value being passed to the ElementTree parser, which I haven't looked into. So it seems that the HTTP 500 errors from WinRM (likely caused by the WinRM service being restarted) disappear with WMF 5.1, at least in this scenario and potentially in others.
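The fixed-delay workaround described above (retry a few times with a constant pause when the connection is refused) can be sketched in Python as follows. `retry_on_refused` is a hypothetical helper, not part of pywinrm or Ansible; `attempts=5` and `delay=5.0` are an assumed reading of "5 times for 5 seconds", and `sleep` is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_refused(
    operation: Callable[[], T],
    attempts: int = 5,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``operation``, retrying only on ConnectionRefusedError.

    Hypothetical helper: makes at most ``attempts`` calls with a fixed
    ``delay`` (seconds) between them. Any other exception propagates
    immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for _ in range(attempts - 1):
        try:
            return operation()
        except ConnectionRefusedError:
            sleep(delay)  # constant pause, no backoff
    return operation()  # last attempt: let any error propagate
```

Only `ConnectionRefusedError` is retried here, so genuine failures (for example authentication errors) are not masked by the retry loop.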
I updated my patch to pywinrm to recover from this at: diyan/pywinrm#174. Now you can set reconnection_retries and reconnection_backoff (e.g. to 4 retries and 2.0 seconds, respectively) to recover from temporary "Connection refused" situations. This can recover from, e.g., installing SCVMM (which apparently makes WinRM unavailable for a short while). The backoff periods are 2, 4, 8 and 16 seconds (30 seconds in total).
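As a quick check of the arithmetic above, the doubling schedule for 4 retries with a 2.0-second backoff can be computed as below. `backoff_schedule` is a hypothetical helper that assumes the schedule exactly as described in this comment (first delay equals the backoff, each later delay doubles); it is not pywinrm's code. urllib3's `Retry` (used by the quick-fix patch elsewhere in this thread) documents its own formula, under which the first retry happens without a sleep, so actual waits there may differ; check the `Retry` documentation for the installed urllib3 version.

```python
from typing import List


def backoff_schedule(retries: int, backoff: float) -> List[float]:
    """Return the delay (in seconds) before each retry.

    Assumes the doubling schedule described above: the first delay is
    ``backoff`` and every subsequent delay is twice the previous one.
    """
    return [backoff * 2 ** attempt for attempt in range(retries)]


delays = backoff_schedule(retries=4, backoff=2.0)
print(delays)       # [2.0, 4.0, 8.0, 16.0]
print(sum(delays))  # 30.0
```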
I also implemented the same solution for pypsrp at: jborean93/pypsrp#10
Here is a quick fix for hand-editing your pywinrm installation:

```diff
--- a/winrm/transport.py
+++ b/winrm/transport.py
@@ -158,6 +158,16 @@ class Transport(object):
         settings = session.merge_environment_settings(url=self.endpoint, proxies={}, stream=None,
                                                       verify=None, cert=None)
 
+        # Retry on connection errors, with a backoff factor
+        retries = requests.packages.urllib3.util.retry.Retry(total=4,
+                                                             connect=4,
+                                                             status=4,
+                                                             read=0,
+                                                             backoff_factor=2.0,
+                                                             status_forcelist=(413, 425, 429, 503))
+        session.mount('http://', requests.adapters.HTTPAdapter(max_retries=retries))
+        session.mount('https://', requests.adapters.HTTPAdapter(max_retries=retries))
+
         # get proxy settings from env
         # FUTURE: allow proxy to be passed in directly to supersede this value
         session.proxies = settings['proxies']
```

Or your pypsrp installation:

```diff
--- a/pypsrp/wsman.py
+++ b/pypsrp/wsman.py
@@ -773,6 +773,18 @@ class _TransportHTTP(object):
         elif self.no_proxy:
             session.proxies = orig_proxy
 
+        # Retry on connection errors, with a backoff factor
+        retries = requests.packages.urllib3.util.retry.Retry(
+            total=4,
+            connect=4,
+            status=4,
+            read=0,
+            backoff_factor=2.0,
+            status_forcelist=(413, 425, 429, 503),
+        )
+        session.mount('http://', requests.adapters.HTTPAdapter(max_retries=retries))
+        session.mount('https://', requests.adapters.HTTPAdapter(max_retries=retries))
+
         # set cert validation config
         session.verify = self.cert_validation
```

This implements 4 retries with an exponential backoff (backoff factor 2.0). Please test and report back.
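The `Retry(...)` arguments in the patch split the retry budget by failure type. The Python sketch below is a simplified, hypothetical model of how those settings interact (`RetryBudget` is not a urllib3 class and this is not urllib3's implementation): connection failures and the listed HTTP statuses are retried, read failures never are because `read=0`, and every retry also consumes the shared `total`. It deliberately ignores request methods. One caveat worth checking: per urllib3's documentation, status-based retries only apply when the request method is in `Retry`'s allowed-methods list, which by default excludes POST, and WinRM requests are POSTs; connection errors are retried regardless of method.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Statuses listed in status_forcelist in the patch above.
RETRY_STATUSES: Tuple[int, ...] = (413, 425, 429, 503)


@dataclass
class RetryBudget:
    """Hypothetical, simplified model of the patch's Retry(...) settings."""

    total: int = 4
    connect: int = 4
    status: int = 4
    read: int = 0

    def allow(self, error: str, status_code: Optional[int] = None) -> bool:
        """Consume one retry for ``error`` ('connect', 'read' or 'status').

        Returns True if another attempt is allowed, False otherwise.
        """
        if self.total <= 0:
            return False  # shared budget exhausted
        if error == "connect" and self.connect > 0:
            self.connect -= 1
        elif error == "read" and self.read > 0:
            self.read -= 1  # never reached with read=0
        elif (error == "status" and status_code in RETRY_STATUSES
              and self.status > 0):
            self.status -= 1
        else:
            return False  # not a retryable failure, or its budget is spent
        self.total -= 1
        return True


budget = RetryBudget()
print([budget.allow("connect") for _ in range(5)])  # [True, True, True, True, False]
```

Regarding `read=0`: urllib3's documentation notes that read errors are raised after the request was sent to the server, so the request may have side effects. For WinRM that presumably means a command could already be running, which is a reasonable argument for not retrying reads.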
How can I implement or ensure 3-5 retries for WinRM playbooks? I'm getting random 104 errors across deploys in Azure, and I can only see reconnection_retries under the psrp options.
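The "104" errors above are presumably `[Errno 104] Connection reset by peer` (ECONNRESET on Linux), a sibling of the `[Errno 111] Connection refused` (ECONNREFUSED) in this issue's title; a retry wrapper would typically need to treat both as transient. A minimal Python sketch, assuming only these two errors should be retried; `is_transient` is a hypothetical helper, not part of pywinrm, pypsrp or Ansible.

```python
import errno

# Treated as transient in this sketch: the WinRM listener refusing
# (ECONNREFUSED, errno 111 on Linux) or dropping (ECONNRESET, errno 104
# on Linux) a connection, e.g. while the service restarts. Errno numbers
# are platform-specific; the exception classes are not.
TRANSIENT = (ConnectionRefusedError, ConnectionResetError)


def is_transient(exc: BaseException) -> bool:
    """Return True for connection errors that are worth retrying."""
    return isinstance(exc, TRANSIENT)


# OSError picks the matching subclass from the errno value (PEP 3151),
# so a raw OSError carrying ECONNRESET is recognised too.
print(is_transient(OSError(errno.ECONNRESET, "Connection reset by peer")))  # True
print(is_transient(ConnectionRefusedError()))  # True
print(is_transient(TimeoutError()))  # False
```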
Thanks to @dagwieers!
@ullibo You mentioned you patched transport.py. What was the patch? Many thanks.
@AL71B: the lines between #UKI ....
Hello everyone. Since OpenSSH can now be installed on Windows, my team is in the process of switching over to OpenSSH on Windows, as switching to SSH resolves these issues. I just wanted to let everyone know, to ease their pain with WinRM. Running Ansible over SSH to Windows is a dramatic improvement, given everything that comes with SSH. Please also check out Mitogen for Ansible, a performance add-on for SSH.

Mitogen: https://mitogen.networkgenomics.com/ansible_detailed.html
Read up: https://docs.microsoft.com/en-us/windows-server/administration/openssh/openssh_overview
GitHub project: https://github.com/PowerShell/openssh-portable
Closing, as we have a few workarounds for WinRM: the retry options for psrp, and SSH is now a solution as well. Unfortunately there is not much else we can do for this issue.
ISSUE TYPE
COMPONENT NAME
WinRM
ANSIBLE VERSION
v2.4
OS / ENVIRONMENT
Control master: RHEL7
Target nodes: Windows 2012R2 (with PowerShell 4.0, also tried PowerShell 5.1)
SUMMARY
I just experienced another "Connection refused". The task was waiting for 3 VMs to come up (wait_for_connection doing a win_ping test); the last VM to come online then gave me a "Connection refused" in the next task, which runs setup.
We are using CredSSP.
I wonder if we could retry for longer, or with a delay, on "Connection refused", so that playbooks survive such intermittent issues better.
It is plausible that during the first boot the WinRM service starts, stops and then starts again, causing the "Connection refused"; however, we should recover from this situation when it occurs.
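If the WinRM service can flap during first boot, a single successful probe (like one win_ping) does not prove the listener will still be there for the next task. The Python sketch below is a hypothetical helper, `wait_for_stable_port`, not an Ansible feature. It waits until a TCP port accepts several connections in a row and resets the streak on any failure. It assumes that probing the WinRM listener port (5985 for HTTP, 5986 for HTTPS by default) is a good enough proxy for WinRM being usable.

```python
import socket
import time


def wait_for_stable_port(host: str, port: int, consecutive: int = 3,
                         interval: float = 5.0, timeout: float = 300.0) -> bool:
    """Wait until ``host:port`` accepts ``consecutive`` connections in a row.

    Hypothetical helper: one successful probe is not enough if the
    service starts, stops and starts again, so any failed probe resets
    the streak. Returns False if ``timeout`` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    streak = 0
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a TCP connection as a probe.
            with socket.create_connection((host, port), timeout=interval):
                streak += 1
        except OSError:
            streak = 0  # refused, reset or timed out: start over
        if streak >= consecutive:
            return True
        time.sleep(interval)
    return False
```

For example, `wait_for_stable_port("winhost.example.com", 5986, consecutive=3, interval=5.0)` (hostname hypothetical) would need about 10 seconds of uninterrupted availability, three successful probes 5 seconds apart, before returning True.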
This relates to #23320 (which has more examples from others).
The playbook looks like this, and it fails on the setup task.