-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
when retrying, send to the same proxy that sent the accept #47896
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkarneges can you please explain why its important to retry on the same proxy that sent the accept
It has to do with connection max stats accuracy. When the handler passes a connection to the proxy for retrying, it discards everything it knows about the connection except for the info in StatsManager which is put into "linger" mode. When the proxy later sends conn-max stats to the handler for aggregation, the handler clears out its lingered entry. The reason for this lingering is to avoid undercounting, in case the handler happens to generate a stats report right after sending a retry but before the proxy has processed the retry and sent new stats. For example, if the proxy is managing 3 connections and the handler is managing 7, the handler should report a total of 10 (it will receive conn-max packets from the proxy with a value of 3, which it aggregates with its own local count of 7). If the handler then passes a connection to the proxy for retrying, reducing its count to 6, and the proxy sends a conn-max value of 4, before the handler generates a new report, then the next report will still be 10, which is good. Or if the handler passes a connection to the proxy, reducing its count to 6, and the proxy fully processes/discards that connection and sends a conn-max value of 3, before the handler generates a new report, then the next report will be 9, which is also good. However, if the handler passes a connection to the proxy, reducing its count to 6, and the proxy is still processing that connection and has not yet sent a conn-max value, such that the last proxy conn-max value known by the handler is 3, then without this lingering behavior the next report from the handler would be 9, which is not right. The lingering behavior ensures the report would be 10. The tricky part is knowing when to clear out the lingered entry. The handler does this by keeping a sequence number (the "retry seq"), which it increments whenever it sends a retry and it includes this value in the retry packet. The proxy keeps a copy of this value, updating it whenever it processes a retry, and includes the latest value in any conn-max packets it sends to the handler. When the handler receives a conn-max packet, it uses the included retry-seq value as a kind of "ack" to know which lingering connections to clear out. The proxy doesn't know the meaning if the seq value, it's basically just echoing it back to the handler, as a way for the handler to know what the proxy has seen. To support multiple proxies, the handler will need to maintain a retry seq value with each one. This is because if the seq value were shared, for example if the handler sends seq 1 to proxy A, and seq 2 to proxy B, then if the next received conn-max packet is from proxy B with seq 2, it would imply that seq 1 has been processed but this is actually not known. So there needs to be a separately incremented seq value per-proxy. With that context, one way to manage this per-proxy communication is to always retry with the same proxy that sent the accept. :) Technically, these seq values have nothing to do with accepts. Keying on accept is just a simple way to ensure each proxy receives independent incrementing seqs. Alternatively, it could be possible to round robin the retries, and decide which seq value to include in the retry packet based on which proxy was selected as the destination, but this is harder to do. |
3ab04d6
to
27fecfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great
Previously, retries would round robin between multiple proxies. Note that stats will still be incorrect when multiple proxies are used since the handler shares a single retry-seq value among all proxies. To complete multiple proxy support for accept/retry interactions, the handler will need to be updated to keep a retry-seq value per proxy.