when retrying, send to the same proxy that sent the accept #47896

Merged
merged 1 commit into main from jkarneges/directed-retries on Jan 29, 2024

Conversation

jkarneges
Member

Previously, retries would round-robin between multiple proxies. Note that stats will still be incorrect when multiple proxies are used, since the handler shares a single retry-seq value among all proxies. To complete multiple-proxy support for accept/retry interactions, the handler will need to be updated to keep a retry-seq value per proxy.

[DEBUG] 2024-01-26 14:08:35.403 [handler] accepting 1 requests from pushpin-proxy_43696
[DEBUG] 2024-01-26 14:08:36.555 [handler] OUT retry: to=pushpin-proxy_43696 { "request-data": { "headers": [ [ "Host", "localhost:7999" ], [ "User-Agent", "curl/8.4.0" ], [ "Accept", "*/*" ] ], "method": "GET", "uri": "http://localhost:7999/longpoll.php", "body": "" }, "retry-seq": 0, "inspect": { "no-proxy": false }, "requests": [ { "rid": { "sender": "condure", "id": "0-0-0" }, "peer-address": "127.0.0.1", "in-seq": 3, "out-seq": 4, "out-credits": 8192 } ] }
[DEBUG] 2024-01-26 14:08:36.556 [proxy] retry: IN { "inspect": { "no-proxy": false }, "request-data": { "body": "", "headers": [ [ "Host", "localhost:7999" ], [ "User-Agent", "curl/8.4.0" ], [ "Accept", "*/*" ] ], "method": "GET", "uri": "http://localhost:7999/longpoll.php" }, "requests": [ { "in-seq": 3, "rid": { "id": "0-0-0", "sender": "condure" }, "out-seq": 4, "out-credits": 8192, "peer-address": "127.0.0.1" } ], "retry-seq": 0 }
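A minimal sketch of the routing change shown in the logs above, using hypothetical names (the real handler's types and APIs differ): remember which proxy each accept came from, and address any later retry to that same proxy instead of round-robining.

```rust
use std::collections::HashMap;

// Hypothetical identifiers for illustration only; not Pushpin's actual handler code.
type ConnId = String;
type ProxyAddr = String;

#[derive(Default)]
struct RetryRouter {
    // Which proxy sent the accept for each connection the handler is holding.
    accept_source: HashMap<ConnId, ProxyAddr>,
}

impl RetryRouter {
    fn on_accept(&mut self, conn: ConnId, from_proxy: ProxyAddr) {
        self.accept_source.insert(conn, from_proxy);
    }

    // The retry packet is addressed to the proxy that sent the accept,
    // rather than being round-robined across all known proxies.
    fn retry_destination(&self, conn: &ConnId) -> Option<&ProxyAddr> {
        self.accept_source.get(conn)
    }
}
```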

Contributor

@sima-fastly sima-fastly left a comment


@jkarneges can you please explain why it's important to retry on the same proxy that sent the accept?

@jkarneges
Member Author

can you please explain why it's important to retry on the same proxy that sent the accept?

It has to do with the accuracy of connection-max (conn-max) stats.

When the handler passes a connection to the proxy for retrying, it discards everything it knows about the connection except for the info in StatsManager, which is put into "linger" mode. When the proxy later sends conn-max stats to the handler for aggregation, the handler clears out its lingered entry. The reason for this lingering is to avoid undercounting, in case the handler happens to generate a stats report right after sending a retry but before the proxy has processed the retry and sent new stats.

For example, if the proxy is managing 3 connections and the handler is managing 7, the handler should report a total of 10 (it will receive conn-max packets from the proxy with a value of 3, which it aggregates with its own local count of 7). Now suppose the handler passes a connection to the proxy for retrying, reducing its own count to 6:

* If the proxy sends a conn-max value of 4 before the handler generates a new report, the next report will still be 10, which is good.
* If the proxy fully processes/discards that connection and sends a conn-max value of 3 before the handler generates a new report, the next report will be 9, which is also good.
* However, if the proxy is still processing that connection and has not yet sent a new conn-max value, such that the last proxy conn-max value known by the handler is 3, then without the lingering behavior the handler's next report would be 9, which is not right. The lingering behavior ensures the report would be 10.
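A rough sketch of the lingering arithmetic, under the assumption of a much-simplified stats structure (the real StatsManager is more involved): a connection handed off for retry moves into a lingered count rather than disappearing, which is what keeps the report at 10 in the last scenario above.

```rust
// Simplified, hypothetical model of the handler-side counting; not the real StatsManager.
struct StatsSketch {
    local_conns: u64,         // connections the handler is still managing
    lingered_conns: u64,      // connections handed to the proxy, not yet acked
    last_proxy_conn_max: u64, // latest conn-max value reported by the proxy
}

impl StatsSketch {
    // Handing a connection to the proxy moves it from the local count to the
    // lingered count instead of dropping it outright.
    fn hand_off_for_retry(&mut self) {
        self.local_conns -= 1;
        self.lingered_conns += 1;
    }

    // Lingered connections still count until the proxy's conn-max packet
    // acknowledges the retry, so a report generated in the meantime doesn't
    // undercount: 6 local + 1 lingered + proxy's stale 3 = 10.
    fn report_total(&self) -> u64 {
        self.local_conns + self.lingered_conns + self.last_proxy_conn_max
    }
}
```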

The tricky part is knowing when to clear out the lingered entry. The handler does this by keeping a sequence number (the "retry seq"), which it increments whenever it sends a retry and includes in the retry packet. The proxy keeps a copy of this value, updating it whenever it processes a retry, and includes the latest value in any conn-max packets it sends to the handler. When the handler receives a conn-max packet, it uses the included retry-seq value as a kind of "ack" to know which lingered connections to clear out. The proxy doesn't know the meaning of the seq value; it basically just echoes it back to the handler, as a way for the handler to know what the proxy has seen.
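In code form, the seq/ack handshake might look roughly like this (hypothetical names; the real packet handling differs): the handler stamps each retry with the next seq, and when a conn-max packet echoes a seq back, every entry lingered under that seq or an earlier one can be cleared.

```rust
use std::collections::HashMap;

// Hypothetical sketch of the retry-seq handshake on the handler side.
struct HandlerSeqState {
    next_retry_seq: u64,
    // connection id -> retry-seq that was current when the connection was handed off
    lingered: HashMap<String, u64>,
}

impl HandlerSeqState {
    // Called when a connection is handed to the proxy for retrying; the
    // returned seq is included in the retry packet.
    fn send_retry(&mut self, conn_id: String) -> u64 {
        let seq = self.next_retry_seq;
        self.next_retry_seq += 1;
        self.lingered.insert(conn_id, seq);
        seq
    }

    // Called when a conn-max packet arrives. The echoed seq acts as an ack:
    // the proxy has processed every retry up to and including that seq, so
    // those connections now show up in the proxy's own conn-max values and
    // can be dropped from the handler's lingered set.
    fn on_conn_max(&mut self, echoed_retry_seq: u64) {
        self.lingered.retain(|_, seq| *seq > echoed_retry_seq);
    }
}
```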

To support multiple proxies, the handler will need to maintain a separate retry seq value for each one. If the seq value were shared (for example, the handler sends seq 1 to proxy A and then seq 2 to proxy B), a conn-max packet from proxy B echoing seq 2 would appear to imply that seq 1 has also been processed, which is not actually known. So there needs to be a separately incremented seq value per proxy.
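A per-proxy version of the counter could be as simple as keying the seq state by proxy identity (again just a sketch with made-up names):

```rust
use std::collections::HashMap;

// Hypothetical per-proxy seq counters; each proxy gets its own independently
// incremented sequence, so an ack from proxy B says nothing about proxy A.
#[derive(Default)]
struct PerProxySeqs {
    next_seq: HashMap<String, u64>, // proxy id -> next retry seq for that proxy
}

impl PerProxySeqs {
    fn next_for(&mut self, proxy_id: &str) -> u64 {
        let seq = self.next_seq.entry(proxy_id.to_string()).or_insert(0);
        let current = *seq;
        *seq += 1;
        current
    }
}
```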

With that context, one way to manage this per-proxy communication is to always retry with the same proxy that sent the accept. :)

Technically, these seq values have nothing to do with accepts. Keying on accept is just a simple way to ensure each proxy receives independently incrementing seqs. Alternatively, it could be possible to round-robin the retries and decide which seq value to include in the retry packet based on which proxy was selected as the destination, but this is harder to do.

@jkarneges jkarneges force-pushed the jkarneges/directed-retries branch from 3ab04d6 to 27fecfb on January 29, 2024 18:57
Contributor

@sima-fastly sima-fastly left a comment


looks great

@jkarneges jkarneges merged commit e36e2c6 into main Jan 29, 2024
1 check passed
@jkarneges jkarneges deleted the jkarneges/directed-retries branch January 29, 2024 21:47