Spurious failures in CI #3793
Comments
In #3789, simply rerunning the Circle CI test succeeded.
The reported error is a My guess at what is happening is that the SE API is rate limiting the IP address. This may, or may not, have been preceded by the SE API sending a There was a bug in the
The code in that section of the
Spurious failures in CI because of flaky DNS
(Giftedly copy/pasted a spelling error. Let's not make more changes on mobile.)
So let me break this down a little. The CI failures have happened with select domains. DNS lookups for these domains return SERVFAIL, which means that the authoritative DNS server for the domain could not be reached or is misconfigured. This isn't necessarily a DNS issue in CircleCI or Travis; it can be an issue with the nameservers of the specific domains being tested. We need to adjust our tests to catch the SERVFAIL cases. Let me poke around some tests myself and see if we can catch these SERVFAILs and pass over those tests without hard failing...
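One way to pass over such tests can be sketched with the standard library's unittest. This is only an illustration under assumptions: SmokeDetector's own test runner and helpers may differ, and `ResolverUnavailable`, `resolve_domain`, and `BROKEN_DOMAINS` are hypothetical stand-ins for a real SERVFAIL from a real resolver.

```python
import unittest


class ResolverUnavailable(Exception):
    """Hypothetical stand-in for a SERVFAIL-style resolver error."""


# Hypothetical: domains whose authoritative nameservers are unreachable.
BROKEN_DOMAINS = {"broken.example"}


def resolve_domain(domain):
    """Hypothetical resolver: fails for broken domains, else returns a fixed address."""
    if domain in BROKEN_DOMAINS:
        raise ResolverUnavailable(domain)
    return ["192.0.2.1"]


class DomainLookupTest(unittest.TestCase):
    def check(self, domain):
        try:
            addresses = resolve_domain(domain)
        except ResolverUnavailable as exc:
            # Report the test as skipped instead of failing the whole CI run.
            self.skipTest("DNS unavailable for %s" % exc)
        self.assertTrue(addresses)

    def test_healthy_domain(self):
        self.check("ok.example")

    def test_broken_domain(self):
        self.check("broken.example")


def run_suite():
    """Run both tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DomainLookupTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A skipped test still shows up in the test report, so a domain with broken nameservers stays visible without turning the whole run red.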
A DNS SERVFAIL should result in a warning in the test environment. It's not something critical that will stop Smokey running, but should be logged so that someone can come around and remove the failing domains.
Agreed. However, what we've got in CI is a hard failure: the lookup raises dns.resolver.NoNameservers, which is an uncaught exception because the existing tests don't handle it.
Keep an eye on https://github.com/Charcoal-SE/SmokeDetector/tree/dns-tests which adds an except handler for NoNameservers errors. These were previously uncaught in the tests. The handler now catches the error and debug-logs it just like the other errors we catch, but without resolving any details, because there are no DNS nameservers available for the request.
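The shape of such a handler, as a minimal sketch rather than the code on that branch: `lookup_or_log`, `SoftDNSFailure`, and the logger name are hypothetical, and the resolver is passed in so the example runs without any DNS library installed.

```python
import logging

log = logging.getLogger("dns-tests")  # hypothetical logger name


class SoftDNSFailure(Exception):
    """Hypothetical stand-in for resolver errors such as 'no nameservers' or a timeout."""


def lookup_or_log(domain, resolve, soft_errors=(SoftDNSFailure,)):
    """Return resolve(domain); on a soft DNS error, log it and return None.

    Any exception type not listed in soft_errors still propagates, so
    genuine bugs keep failing the tests.
    """
    try:
        return resolve(domain)
    except soft_errors as exc:
        # No record details are available here, so only the domain and
        # the kind of failure are logged.
        log.debug("DNS lookup for %s failed: %s", domain, type(exc).__name__)
        return None
```

With dnspython 2.x, the assumption is that one would pass `resolve=dns.resolver.resolve` and `soft_errors=(dns.resolver.NoNameservers, dns.exception.Timeout)`. Swapping `log.debug` for `log.warning` would make the failing domains easier to spot in CI output.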
@makyen yet another uncaught exception, which we can fix for the tests too. I'll write up a bit for those failures shortly and get that pushed in.
I pushed an additional handler for DNS timeouts: a DNS timeout now logs a warning but doesn't error out with an unhandled exception. It's probable that Travis or Circle CI has janky DNS, so we'll have to gracefully handle DNS lookup errors instead of letting them go uncaught, which is what caused the spurious CI errors.
What problem has occurred? What issues has it caused?
Two of my recent PRs had Circle CI failures in code which I had not touched. One of them failed, then succeeded after I rebased my commits -- the other exhibited the opposite behavior: test passed, then I rebased the commits and force pushed, and the same code failed.
For the record: #3789, #3790
What would you like to happen/not happen?
The test should not fail spuriously.
I think I had this on my laptop occasionally too, so I don't think it's specific to Circle CI.
The failing code has some comments which vaguely hint at what might be wrong, but then why would they fail only some of the time?