DNS Outages #99
Update: The RDS servers used for the database back end of pdns were recently upgraded to have double the RAM, but the registration server went down again 9 days later. Some more complete logs... System log of docker image:
/var/log/supervisor/supervisord.log inside docker image:
See also:
STR:
Expected:
Actual:
This has been happening regularly for many months now, and requires a reboot of the registration server EC2 instances in order to fix it. We believe it is caused by PowerDNS crashing so that the registration server no longer resolves DNS lookups.
In the logs of the registration server Docker container there is an error which says "5001 questions waiting for database/backend attention. Limit is 5000, respawning". pdns then respawns, and once that has happened enough times, the init system in the Docker container gives up and kills it. This is happening on both EC2 instances.
We think that the DNS servers are occasionally getting overwhelmed by traffic, but we don't know where it's coming from. I suspect it isn't WebThings users, because the logs contain lots of failed lookups for subdomains that don't exist.
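One way to narrow down where the traffic comes from might be to tally queries per source IP from the pdns query log. The following is only a rough sketch: the log line shape, the `QUERY_RE` pattern, the `top_sources` helper and the sample lines are all assumptions for illustration, not taken from the actual registration server logs, so the pattern would need adjusting to whatever the real logs look like.

```python
import re
from collections import Counter

# Assumed log line shape (hypothetical, adjust to the real logs):
#   Remote 198.51.100.7 wants 'nosuch.example.org|A', ...
QUERY_RE = re.compile(
    r"Remote (?P<ip>\S+) wants '(?P<name>[^|']+)\|(?P<qtype>\w+)'"
)


def top_sources(lines, n=10):
    """Count queries per source IP and return the n busiest sources."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts.most_common(n)


# Made-up sample lines using documentation-only IP ranges.
sample = [
    "Remote 198.51.100.7 wants 'aaa.example.org|A', do = 0",
    "Remote 198.51.100.7 wants 'bbb.example.org|A', do = 0",
    "Remote 192.0.2.10 wants 'real.example.org|AAAA', do = 0",
    "some unrelated log line",
]
print(top_sources(sample))
```

If a handful of addresses account for most of the lookups for non-existent subdomains, that would support the theory that the spikes are not coming from WebThings users.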
Some potential solutions:
My personal preference is to start with option 1 and see if it helps. I suspect the spikes in traffic are not coming from WebThings users and if we cut off the source of the excessive traffic the service would hopefully go back to being stable again.
If anyone has experience of configuring rate limiting for pdns, I would be grateful for some help.
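For anyone unfamiliar with what per-source rate limiting would do here, this is a minimal token-bucket sketch. It is not pdns configuration and not a proposal for how to implement it; in practice the limiting would more likely live in a front end such as dnsdist or in a firewall rule. The class names, the `rate` and `burst` parameters and the values used are all hypothetical.

```python
import time


class TokenBucket:
    """Allows up to `burst` queries at once, refilled at `rate` per second."""

    def __init__(self, rate, burst, now=None):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per source IP so one noisy client cannot starve the rest."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def allow(self, client_ip, now=None):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, now)
            self.buckets[client_ip] = bucket
        return bucket.allow(now)


# Hypothetical limits: 1 query per second per client, bursts of 2.
limiter = PerClientLimiter(rate=1, burst=2)
print([limiter.allow("198.51.100.7", now=0.0) for _ in range(3)])
```

The point of limiting per source rather than globally is that a single client hammering non-existent subdomains would be dropped before its queries reach the database backend, while other clients keep their own allowance.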