Egress Connection Timeouts Design #63

Status: Open. Wants to merge 6 commits into base: main.
Conversation

@rari459 rari459 commented Jan 16, 2025

As mentioned in the discussion with the Cilium community on 1/8/2025, we propose an additional timeouts field on CiliumEgressGatewayPolicy to allow users to control egress connection timeout values at the CEGP level. This document lays out approaches for designing this feature.

Signed-off-by: rari459 <[email protected]>
@rari459 rari459 changed the title Create Egress Connection Timeouts Design for Review Egress Connection Timeouts Design Jan 16, 2025
Replace struct of connection timeouts with a single __u32 value which represents the lifetime of the connection in seconds.

Signed-off-by: rari459 <[email protected]>
Change back to a struct of connection timeout values instead of a single uint32 timeout value.

Signed-off-by: rari459 <[email protected]>
@julianwiedmann (Member) left a comment

Thank you, this sort of low-level datapath optimization is an interesting idea! I left some initial thoughts inline to better understand the proposal.

cilium/egress-connection-timeouts.md (Outdated)
Comment on lines +20 to +25

Currently, Cilium relies heavily on default operating-system connection timeout settings and offers control over connection timeouts only at the cluster or node level. It is not optimal for all workloads to be bound to these node-level timeouts, especially with respect to egress gateways, where prolonged idle connections can contribute to port exhaustion on the NAT gateway.

Modifying CiliumEgressGatewayPolicy to include an optional timeout field would allow us to ingest custom timeouts and give users additional control over egress connections.


Member

Some high-level comments:

  1. Do you have feedback from potential users on whether they're comfortable fiddling with such low-level values? Making this a per-CEGP configuration feels like a reasonable granularity, but I'm unsure whether admins will know what timeouts to use, or how to tune them.
  2. I haven't grasped whether you're proposing to use the custom timeouts only on the GW node, or also for the CT entry on the worker nodes?
  3. As your motivation is to avoid port exhaustion on the gateway - what interaction do you see with the CT GC engine? Getting reliable results from configuring low CT lifetimes would require an appropriate GC timer, no?

Author

  1. I will try to collect some data on potential users via the Slack channel and update.
  2. Since the egress policy map is available on both nodes, I think it makes the most sense to update timeouts for CT entries on both the source and gateway nodes.
  3. Right now we do not make any changes to the GC engine. The report interval for GC is set to 5s by default and is configurable. I think this configuration can be left to the user and out of scope for this feature; if the user sets low CT lifetimes, they should configure the CT report interval to be lower.

Member

> Right now we do not make any changes to the GC engine. The report interval for GC is set to 5s by default and is configurable.

huh, I thought we were defaulting to 0s (== dynamic interval) ?

> I think this config can be left for the user and out of the scope for this feature, if the user sets low CT lifetimes they should configure the CT report interval to be lower.

But we're dealing with different personas here? Setting a per-policy GC option will only be effective if the node-wide configuration is also tuned accordingly. So I don't think we can just declare this aspect out-of-scope.

@rari459 (Author) commented Feb 5, 2025

Ah yes, I confused the GC interval with the CT report interval.
The CT GC interval is set dynamically and proportionally, taking into account the percentage of entries deleted from the ct_table in each cycle.

I believe this would be sufficient for the initial implementation of this feature, as it keeps the footprint of this change small. If a large number of CT entries are being deleted in each interval on the GW node due to low CT timeouts, then GC should run more frequently.

However, I agree that a more comprehensive review of the GC interval mechanism could be beneficial down the road. Perhaps we can explore that as a follow-up enhancement to this feature?



* Add an optional timeout field to CEGP.
* Modify CEGP ingestion logic to add new timeout fields to EGRESS_POLICY_MAP.
Member

I believe extending the map value will require a map migration. At that point, let's consider what additional values we should place into the map value - I've wanted to store the ifindex of the egress interface for a long time :).

Author

Sounds good, where can I get a list of proposed additional values to add for this map?

Member

I don't think such a list exists; it's something we can raise with @cilium/egress-gateway once work on an implementation has started.

Quoted row from the design doc's comparison table:

Engineering Investment: Modify SNAT datapath and conntrack creation to ingest and write custom timeouts to egress NAT conntrack entries.


In the context of EGW connections, what would be the implication of a conntrack entry timing out via one of these timeouts?

Without a timeout at the socket level, wouldn't we possibly see hanging connections?


To broaden a bit, I think we'd need to detail how the various mechanisms for enforcing timeouts would play out in practice - especially with regards to intended use cases.

For example, are we looking for timeout functionality in parity with kernel sysctl settings (e.g. the ipv4 timeout), such that the connection is immediately terminated following expiry?

That might help with making a decision.

Author

We want to keep the behavior consistent with the current Cilium implementation. Today, CT timeouts are configurable at the cluster/node level, and these timeouts are not passed up to the socket. This feature just looks to improve the granularity of the existing functionality.

CT timeouts are especially useful on the EGW node because they can free up ports held by stale connections and prevent exhaustion. This is the primary motivator for the feature, so passing these timeouts up to the socket level is not a goal.

struct egress_gw_policy_entry {
	__u32 egress_ip;
	__u32 gateway_ip;
	__u8 custom_timeouts_specified;


Migrating the entry code could present some complexity with regard to upgrading Cilium. A possible alternative approach could be to enable the lookup code only if the feature is enabled.

Author

Makes sense, we can introduce a new cilium config flag to enable this feature.

cilium/egress-connection-timeouts.md
@tommyp1ckles tommyp1ckles self-requested a review January 30, 2025 18:29
Co-authored-by: Bill Mulligan <[email protected]>
Signed-off-by: rari459 <[email protected]>
4 participants