Egress Connection Timeouts Design #63

Open
wants to merge 4 commits into
base: main

@rari459 rari459 commented Jan 16, 2025

As mentioned in the discussion with the Cilium Community on 1/8/2025, we propose adding a timeouts field to CiliumEgressGatewayPolicy so that users can control egress connection timeout values at the CEGP level. This document lays out approaches for designing this feature.

Signed-off-by: rari459 <[email protected]>
@rari459 rari459 changed the title Create Egress Connection Timeouts Design for Review Egress Connection Timeouts Design Jan 16, 2025
Replace the struct of connection timeouts with a single __u32 value which represents the lifetime of the connection in seconds.

Signed-off-by: rari459 <[email protected]>
Change back to struct of connection timeout values instead of only one uint32 timeout value.

Signed-off-by: rari459 <[email protected]>
Member

@julianwiedmann julianwiedmann left a comment

Thank you, this sort of low-level datapath optimization is an interesting idea! I left some initial thoughts inline to better understand the proposal.

@@ -0,0 +1,319 @@
# CFP-TBD: Egress Connection Timeouts

**SIG: SIG-POLICY**

I'm not sure this is the right mailbox. I'd expect these changes to primarily touch on @cilium/egress-gateway and @cilium/sig-datapath territory.

Comment on lines +20 to +25

Currently, Cilium relies heavily on default operating system connection timeout settings and offers control over connection timeouts at cluster or node level. It is not optimal for all workloads to be bound to these node level timeouts, especially with respect to egress gateways where prolonged idle connections can contribute to port exhaustion on the NAT gateway.

Modifying CiliumEgressGatewayPolicy to include an optional timeout field would allow us to ingest custom timeouts and give users additional control over egress connections.



Some high-level comments:

  1. do you have feedback from potential users on whether they're comfortable fiddling with such low-level values? Making this a per-CEGP configuration feels like a reasonable granularity, but I'm unsure whether admins will know what timeouts to use. And how to tune them.
  2. I haven't grasped whether you're proposing to use the custom timeouts only on the GW node, or also for the CT entry on the worker nodes?
  3. As your motivation is to avoid port exhaustion on the gateway - what interaction do you see with the CT GC engine? Getting reliable results from configuring low CT lifetimes would require an appropriate GC timer, no?
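
To make point 3 concrete, here is a minimal sketch in C of why a short lifetime alone may not free resources quickly. It is a simplified model and not Cilium's actual GC code: it assumes a stale entry keeps its slot (and, by extension, its SNAT port) until a periodic GC pass removes it. The names `ct_entry_sketch`, `ct_gc_pass` and `worst_case_reclaim_delay` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified conntrack entry: only the expiry matters here. */
struct ct_entry_sketch {
    uint32_t expires_at; /* absolute time in seconds at which the entry is stale */
    bool in_use;         /* true while the entry still occupies a slot */
};

/* Hypothetical GC pass: frees every stale entry and returns how many it freed. */
static unsigned int ct_gc_pass(struct ct_entry_sketch *entries, unsigned int n,
                               uint32_t now)
{
    unsigned int freed = 0;

    assert(entries != NULL || n == 0);
    for (unsigned int i = 0; i < n; i++) {
        if (entries[i].in_use && entries[i].expires_at <= now) {
            entries[i].in_use = false;
            freed++;
        }
    }
    return freed;
}

/*
 * Worst case under this model: the entry goes stale right after a GC pass,
 * so it is held for its lifetime plus one full GC interval.
 */
static uint32_t worst_case_reclaim_delay(uint32_t lifetime_secs,
                                         uint32_t gc_interval_secs)
{
    return lifetime_secs + gc_interval_secs;
}
```

Under these assumptions, a 30 second lifetime combined with a 60 second GC interval could still hold a slot for up to 90 seconds, which is why the GC timer likely needs to be considered together with any per-policy timeout.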



* Add an optional timeout field to CEGP.
* Modify CEGP ingestion logic to add new timeout fields to EGRESS_POLICY_MAP

I believe extending the map value will require a map migration. At that point, let's consider what additional values we should place into the map value - I've wanted to store the ifindex of the egress interface for a long time :).
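
As a rough illustration of the migration concern, the sketch below compares the value layouts. A BPF map's value size is fixed when the map is created, so a value that grows cannot be written into an existing map of the old size. The sketch assumes the current value holds only the two IPv4 fields quoted in the proposal; the `_v1`/`_v2` suffixes and `migrate_entry` are hypothetical names, and only the flag field from the proposal is added (the rest of the proposed struct is not shown in this thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed current layout: the two IPv4 fields quoted in the proposal. */
struct egress_gw_policy_entry_v1 {
    uint32_t egress_ip;
    uint32_t gateway_ip;
};

/* Hypothetical extended layout: only the proposal's flag is added here. */
struct egress_gw_policy_entry_v2 {
    uint32_t egress_ip;
    uint32_t gateway_ip;
    uint8_t  custom_timeouts_specified;
    /* the compiler typically pads this struct to a multiple of 4 bytes */
};

/* Copy an old value into the new layout, leaving custom timeouts disabled. */
static void migrate_entry(const struct egress_gw_policy_entry_v1 *old_val,
                          struct egress_gw_policy_entry_v2 *new_val)
{
    assert(old_val != NULL && new_val != NULL);
    memset(new_val, 0, sizeof(*new_val)); /* also zeroes the padding bytes */
    new_val->egress_ip = old_val->egress_ip;
    new_val->gateway_ip = old_val->gateway_ip;
    new_val->custom_timeouts_specified = 0;
}
```

On common ABIs the value grows from 8 to 12 bytes even though only one byte of data is added, so every existing entry would need to be rewritten along the lines of `migrate_entry` during an upgrade.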

<tr>
<td style="background-color: null">Engineering Investment
</td>
<td style="background-color: #ffe599">Modify SNAT Datapath and conntrack creation to ingest and write custom timeouts to egress nat conntrack entries.

In the context of EGW connections, what would be the implication of a conntrack entry timing out via one of these timeouts?

Without a timeout at the socket level, wouldn't we possibly see hanging connections?

To broaden a bit, I think we'd need to detail how the various mechanisms for enforcing timeouts would play out in practice, especially with regard to the intended use cases.

For example, are we looking for timeout functionality on par with the kernel sysctl settings (e.g. the ipv4 timeout), such that the connection is terminated immediately after expiry?

That might help with making a decision.

struct egress_gw_policy_entry {
__u32 egress_ip;
__u32 gateway_ip;
__u8 custom_timeouts_specified;

Migrating the entry code could add some complexity when upgrading Cilium. A possible alternative approach would be to enable the lookup code only if the feature is enabled.
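
One possible reading of that alternative, sketched in C: leave the existing map value untouched and compile in an extra timeout lookup only when the feature is switched on. Everything here is hypothetical, including the macro `ENABLE_EGRESS_CT_TIMEOUTS_SKETCH`, the array standing in for a separate timeout map, and both function names. It is not how Cilium gates features today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time switch: uncomment to compile the extra lookup in. */
/* #define ENABLE_EGRESS_CT_TIMEOUTS_SKETCH 1 */

#ifdef ENABLE_EGRESS_CT_TIMEOUTS_SKETCH
/* Stand-in for a separate, hypothetical timeout map keyed by policy index. */
static const uint32_t policy_timeouts_sketch[] = { 0, 30, 120 };

static uint32_t lookup_policy_timeout(uint32_t policy_idx)
{
    uint32_t n = sizeof(policy_timeouts_sketch) /
                 sizeof(policy_timeouts_sketch[0]);

    /* 0 means "no custom timeout configured for this policy" */
    return policy_idx < n ? policy_timeouts_sketch[policy_idx] : 0;
}
#endif

/* Pick the lifetime for a new egress CT entry. */
static uint32_t egress_ct_lifetime(uint32_t node_default_secs,
                                   uint32_t policy_idx)
{
    assert(node_default_secs > 0);
#ifdef ENABLE_EGRESS_CT_TIMEOUTS_SKETCH
    uint32_t custom = lookup_policy_timeout(policy_idx);

    if (custom != 0)
        return custom;
#else
    (void)policy_idx; /* feature compiled out: no extra lookup, no new layout */
#endif
    return node_default_secs;
}
```

With the switch left undefined, `egress_ct_lifetime` always returns the node-level default, so clusters that do not use the feature would see neither a changed map layout nor an additional lookup.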

</td>
<td style="background-color: #b6d7a8">Watch CEGP and modify CiliumNodeConfig for the respective gateway node
</td>
<td style="background-color: #ea9999"> New bpf program to hook into sockopt syscall event

From what I know, this would be a novel approach in Cilium and would require building out new bpf infrastructure that we don't currently have. We may want to take that into account when evaluating this approach.

3 participants