when retrying, send to the same proxy that sent the accept #47896

jkarneges · 2024-01-26T22:14:25Z

Previously, retries would round robin between multiple proxies. Note that stats will still be incorrect when multiple proxies are used since the handler shares a single retry-seq value among all proxies. To complete multiple proxy support for accept/retry interactions, the handler will need to be updated to keep a retry-seq value per proxy.

[DEBUG] 2024-01-26 14:08:35.403 [handler] accepting 1 requests from pushpin-proxy_43696
[DEBUG] 2024-01-26 14:08:36.555 [handler] OUT retry: to=pushpin-proxy_43696 { "request-data": { "headers": [ [ "Host", "localhost:7999" ], [ "User-Agent", "curl/8.4.0" ], [ "Accept", "*/*" ] ], "method": "GET", "uri": "http://localhost:7999/longpoll.php", "body": "" }, "retry-seq": 0, "inspect": { "no-proxy": false }, "requests": [ { "rid": { "sender": "condure", "id": "0-0-0" }, "peer-address": "127.0.0.1", "in-seq": 3, "out-seq": 4, "out-credits": 8192 } ] }
[DEBUG] 2024-01-26 14:08:36.556 [proxy] retry: IN { "inspect": { "no-proxy": false }, "request-data": { "body": "", "headers": [ [ "Host", "localhost:7999" ], [ "User-Agent", "curl/8.4.0" ], [ "Accept", "*/*" ] ], "method": "GET", "uri": "http://localhost:7999/longpoll.php" }, "requests": [ { "in-seq": 3, "rid": { "id": "0-0-0", "sender": "condure" }, "out-seq": 4, "out-credits": 8192, "peer-address": "127.0.0.1" } ], "retry-seq": 0 }

sima-fastly

@jkarneges can you please explain why its important to retry on the same proxy that sent the accept

jkarneges · 2024-01-29T18:52:38Z

can you please explain why its important to retry on the same proxy that sent the accept

It has to do with connection max stats accuracy.

When the handler passes a connection to the proxy for retrying, it discards everything it knows about the connection except for the info in StatsManager which is put into "linger" mode. When the proxy later sends conn-max stats to the handler for aggregation, the handler clears out its lingered entry. The reason for this lingering is to avoid undercounting, in case the handler happens to generate a stats report right after sending a retry but before the proxy has processed the retry and sent new stats.

For example, if the proxy is managing 3 connections and the handler is managing 7, the handler should report a total of 10 (it will receive conn-max packets from the proxy with a value of 3, which it aggregates with its own local count of 7). If the handler then passes a connection to the proxy for retrying, reducing its count to 6, and the proxy sends a conn-max value of 4, before the handler generates a new report, then the next report will still be 10, which is good. Or if the handler passes a connection to the proxy, reducing its count to 6, and the proxy fully processes/discards that connection and sends a conn-max value of 3, before the handler generates a new report, then the next report will be 9, which is also good. However, if the handler passes a connection to the proxy, reducing its count to 6, and the proxy is still processing that connection and has not yet sent a conn-max value, such that the last proxy conn-max value known by the handler is 3, then without this lingering behavior the next report from the handler would be 9, which is not right. The lingering behavior ensures the report would be 10.

The tricky part is knowing when to clear out the lingered entry. The handler does this by keeping a sequence number (the "retry seq"), which it increments whenever it sends a retry and it includes this value in the retry packet. The proxy keeps a copy of this value, updating it whenever it processes a retry, and includes the latest value in any conn-max packets it sends to the handler. When the handler receives a conn-max packet, it uses the included retry-seq value as a kind of "ack" to know which lingering connections to clear out. The proxy doesn't know the meaning if the seq value, it's basically just echoing it back to the handler, as a way for the handler to know what the proxy has seen.

To support multiple proxies, the handler will need to maintain a retry seq value with each one. This is because if the seq value were shared, for example if the handler sends seq 1 to proxy A, and seq 2 to proxy B, then if the next received conn-max packet is from proxy B with seq 2, it would imply that seq 1 has been processed but this is actually not known. So there needs to be a separately incremented seq value per-proxy.

With that context, one way to manage this per-proxy communication is to always retry with the same proxy that sent the accept. :)

Technically, these seq values have nothing to do with accepts. Keying on accept is just a simple way to ensure each proxy receives independent incrementing seqs. Alternatively, it could be possible to round robin the retries, and decide which seq value to include in the retry packet based on which proxy was selected as the destination, but this is harder to do.

sima-fastly

looks great

sima-fastly reviewed Jan 29, 2024

View reviewed changes

when retrying, send to the same proxy that sent the accept

27fecfb

jkarneges force-pushed the jkarneges/directed-retries branch from 3ab04d6 to 27fecfb Compare January 29, 2024 18:57

sima-fastly approved these changes Jan 29, 2024

View reviewed changes

jkarneges merged commit e36e2c6 into main Jan 29, 2024
1 check passed

jkarneges deleted the jkarneges/directed-retries branch January 29, 2024 21:47

sima-fastly pushed a commit that referenced this pull request Mar 14, 2024

when retrying, send to the same proxy that sent the accept (#47896)

14ab91b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

when retrying, send to the same proxy that sent the accept #47896

when retrying, send to the same proxy that sent the accept #47896

jkarneges commented Jan 26, 2024

sima-fastly left a comment

jkarneges commented Jan 29, 2024

sima-fastly left a comment

when retrying, send to the same proxy that sent the accept #47896

when retrying, send to the same proxy that sent the accept #47896

Conversation

jkarneges commented Jan 26, 2024

sima-fastly left a comment

Choose a reason for hiding this comment

jkarneges commented Jan 29, 2024

sima-fastly left a comment

Choose a reason for hiding this comment