TCP server keep-alive times out even though data is being received. #235

Open
dbrnz opened this issue May 20, 2024 · 2 comments

Comments


dbrnz commented May 20, 2024

We have a Quart-based application server running under Hypercorn that, upon receipt of a POST, runs a time-consuming process (using multiprocessing in a separate thread). The initiating client then polls every second using GET to obtain information about its long-running process. All well and good, and everything works as expected in a local network environment.
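Roughly, the server side looks like the sketch below (a minimal, hypothetical reduction -- the endpoint names, the in-memory jobs dict and the worker function are placeholders, not our actual flatmap-server code):

```python
import multiprocessing
import threading
import uuid

from quart import Quart

app = Quart(__name__)
jobs = {}            # job id -> status; a stand-in for real job tracking

def _do_heavy_work():
    ...              # placeholder for the time-consuming processing

def _supervise(job_id):
    # Run the heavy work in a separate process, supervised from this thread.
    process = multiprocessing.Process(target=_do_heavy_work)
    process.start()
    process.join()
    jobs[job_id] = 'finished'

@app.route('/process', methods=['POST'])
async def start_process():
    job_id = str(uuid.uuid4())
    jobs[job_id] = 'running'
    threading.Thread(target=_supervise, args=(job_id,), daemon=True).start()
    return {'job': job_id}

@app.route('/process/<job_id>', methods=['GET'])
async def poll_process(job_id):
    # The client polls this endpoint every second until the job has finished.
    return {'job': job_id, 'status': jobs.get(job_id, 'unknown')}
```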

Things start to break, though, when the server is deployed behind an Apache proxy, with the client occasionally getting 502 proxy errors. After debugging and tracing, and comparing with the working local client, it appears that:

  1. In the localhost case, a new TCPServer is created for each GET request. This server enters idle wait after reading the request and responding -- the next read finds EOF (because the requests-based client has closed the connection after receiving its response), which closes the task group and the server's writer stream, cancels the idle wait, and terminates the server instance.
  2. The proxy case differs in that only two TCPServer instances are created, as Apache keeps socket connections open and reuses them for subsequent requests. Depending on how requests are shared between the servers, one can time out on keep-alive. For example, with a three-second timeout and one-second polling, if server B handles three consecutive GET requests after server A goes into idle wait, then server A will time out. This timeout results in the underlying socket connection being closed and a subsequent 502 from Apache.

@pgjones does this reasoning make sense? It certainly is consistent with my observations and traces. The workaround for me is to increase the keep-alive timeout to something like 10 minutes.
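For reference, the workaround amounts to something like this when serving programmatically (a sketch; `myapp` is a placeholder for our application module, and `keep_alive_timeout` is the Hypercorn config setting in question):

```python
import asyncio

from hypercorn.asyncio import serve
from hypercorn.config import Config

from myapp import app       # placeholder for our Quart application module

config = Config()
config.bind = ["127.0.0.1:8000"]
config.keep_alive_timeout = 600   # seconds; well above Apache's 5-second default

asyncio.run(serve(app, config))
```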

dbrnz added a commit to AnatomicMaps/flatmap-server that referenced this issue May 20, 2024

pgjones commented May 26, 2024

Could the issue be the one explained here? If so, I think the default keep-alive timeout for both Apache and Hypercorn is 5 seconds, so I'd try setting Hypercorn's to 6 and see if that solves the issue?


dbrnz commented May 27, 2024

Thanks for that link -- it certainly explains things! I now retry proxy failures (502, 503 and 504 status codes) and all seems to be well with the default 5-second timeouts, although I will try your suggestion of a six-second Hypercorn timeout to see if that works instead.
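For what it's worth, the client-side retry is along these lines (a sketch using requests with urllib3's Retry; the polling URL is a placeholder, and older urllib3 releases call `allowed_methods` `method_whitelist`):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry GETs that come back with a proxy error, with a short backoff.
retries = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=[502, 503, 504],
    allowed_methods=["GET"],
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retries))
session.mount("http://", HTTPAdapter(max_retries=retries))

response = session.get("https://example.org/process/<job-id>")  # placeholder polling URL
response.raise_for_status()
```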

BTW, is there a way to have Hypercorn log when it closes a connection because of a keep-alive timeout?
