
Feature Request: Support EndpointSlices Without In-cluster Pod Targets in Ingress #4017

Open
kahirokunn opened this issue Jan 15, 2025 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@kahirokunn
Member

kahirokunn commented Jan 15, 2025

Related Problem

When deploying a multi-cluster EKS environment that shares services via the Multi-Cluster Services (MCS) API, multiple EndpointSlices may be created for a single Service. Currently, in “target-type: ip” mode, the AWS Load Balancer Controller only registers Pod IPs of locally running Pods. It does not register:

  1. Pod IPs from other clusters exposed via the MCS API and listed in EndpointSlices; or
  2. External IPs included in EndpointSlices whose TargetRef.Kind is not "Pod."

This behavior forces users to employ workarounds—such as using “target-type: instance” and routing traffic through NodePorts—which can introduce suboptimal routing and increase the risk of disruptions if a Node is scaled in or replaced.

Proposed Unified Solution

Enhance the AWS Load Balancer Controller to directly register IP addresses from EndpointSlices in “target-type: ip” mode, even if those addresses are intended for multi-cluster usage (MCS) or represent external endpoints. This can be done by:

  • Recognizing that an EndpointSlice may contain additional or external IP addresses (for instance, based on TargetRef.Kind != "Pod").
  • Incorporating these addresses into the Target Group, alongside the local cluster Pod IPs already handled.

A relevant part of the AWS Load Balancer Controller’s current design is the following check:

// endpoints whose TargetRef is nil or not a Pod are skipped today
if ep.TargetRef == nil || ep.TargetRef.Kind != "Pod" {
    continue
}

Here, the logic could be extended to handle these alternative address types. For example, if the endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io label is missing, the Controller might treat the EndpointSlice’s IP addresses as external IPs; or if EndpointSlice.Endpoints[].TargetRef.Kind != "Pod", the Controller might interpret them as external endpoints.
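
As a rough sketch of these two criteria (the helper names and package placement here are illustrative, not taken from the controller source):

package backend // hypothetical placement, for illustration only

import (
    discoveryv1 "k8s.io/api/discovery/v1"
)

// isCustomEndpointSlice reports whether the slice was not produced by the
// in-tree EndpointSlice controller (the managed-by label is absent or set to
// something else), which suggests it carries MCS-imported or external IPs.
func isCustomEndpointSlice(eps *discoveryv1.EndpointSlice) bool {
    return eps.Labels["endpointslice.kubernetes.io/managed-by"] != "endpointslice-controller.k8s.io"
}

// isNonPodEndpoint reports whether an individual endpoint does not reference
// an in-cluster Pod and would therefore be treated as an external target.
func isNonPodEndpoint(ep discoveryv1.Endpoint) bool {
    return ep.TargetRef == nil || ep.TargetRef.Kind != "Pod"
}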

In both cases, the goal remains the same: provide direct integration with new or external IP addresses listed in EndpointSlices, reducing complexity and offering more efficient traffic routing.

Alternatives Considered

Using “target-type: instance”

  • This solution leads to indirect routing (through NodePorts) and higher susceptibility to disruptions upon Node scale-in or replacement.

Example: MCS with Additional Cluster IPs

Below is a sample configuration demonstrating how MCS might export a Service, creating an EndpointSlice in one cluster with Pod IPs from another cluster:

apiVersion: v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  selector:
    app: example
  ports:
    - name: http
      port: 80
      protocol: TCP
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80}]'
spec:
  rules:
    - http:
        paths:
          - path: /*
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-service
                port:
                  number: 80
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: example-service-remotecluster
  namespace: default
  labels:
    kubernetes.io/service-name: example-service
addressType: IPv4
ports:
  - name: "http"
    port: 80
    protocol: TCP
endpoints:
  - addresses:
      - 10.11.12.13   # Pod IP on a remote EKS cluster
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: remote-node-1
    zone: remote-az-1

With the proposed feature enabled, the IP “10.11.12.13” would be recognized by the AWS Load Balancer Controller and automatically registered in the Target Group.

References

@kahirokunn changed the title from "FeatureRequest: Support EndpointSlices Without In-cluster Pod Targets in Ingress" to "Feature Request: Support EndpointSlices Without In-cluster Pod Targets in Ingress" on Jan 15, 2025
@shraddhabang added the kind/feature label and removed triage/needs-investigation on Jan 15, 2025
@zac-nixon
Collaborator

Could you expand further on this point:

This solution leads to indirect routing (through NodePorts) and higher susceptibility to disruptions upon Node scale-in or replacement.

Later versions of Kubernetes and the controller have made using NodePorts for traffic a lot more reliable. For example, when using cluster autoscaler: #1688

@kahirokunn
Member Author

@zac-nixon
Thank you for your insight and all the work you've done on this project. I wanted to share my experience using Karpenter instead of the Cluster Autoscaler. In my tests, when running ab (ApacheBench) or other load-testing tools while a node scales in, I often observe connections that do not return any response (instead of a 5xx error). After multiple rounds of verification, I suspect the following factors may be playing a role:

  1. Karpenter may terminate a node before it is fully deregistered from the ALB’s Target Group.
  2. There may be insufficient coordination between Karpenter and the AWS Load Balancer Controller during node termination.
  3. Any long-lived connections—such as WebSockets, long polling, or HTTP/2—remain open on nodes that are about to be terminated. Moreover, slower requests and long-running processes also stay active. As a result, when Karpenter scales in a node, these open connections or requests can be abruptly severed, causing no response to return to the client.

Additionally, by supporting direct IP-based communication as described in the Kubernetes documentation—rather than routing traffic exclusively through Nodes—we can further improve interoperability with existing controllers, foster additional integrations, and enable even more significant innovation in the future.

@kahirokunn
Member Author

I've created a separate issue regarding the problem we discussed about AWS Load Balancer Controller not handling Karpenter taints:
#4023
Along with this, I've also created a related PR:
#4022
However, I still want to continue the discussion about Ingress resources supporting custom EndpointSlices, as I believe this is a needed feature.
Thx 🙏

@zac-nixon
Collaborator

Sorry for the delayed response. What automation are you using to populate the custom endpoint slice? I wonder if you can use a Multicluster Target Group Binding (https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/targetgroupbinding/targetgroupbinding/#multicluster-target-group) and then point your automation to just register the targets directly into the Target Group?
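
For reference, registering the targets directly from your automation could look roughly like the following with the AWS SDK for Go v2 (the target group ARN and IP are placeholders):

package main

import (
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    elbv2 "github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2"
    elbv2types "github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2/types"
)

func main() {
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Fatal(err)
    }
    client := elbv2.NewFromConfig(cfg)

    // Register a remote cluster's Pod IP into the shared target group.
    // Replace the ARN and address with real values from your environment.
    _, err = client.RegisterTargets(context.TODO(), &elbv2.RegisterTargetsInput{
        TargetGroupArn: aws.String("arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/example/0123456789abcdef"),
        Targets: []elbv2types.TargetDescription{
            {Id: aws.String("10.11.12.13"), Port: aws.Int32(80)},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
}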

@kahirokunn
Member Author

I am currently trying to implement an MCS controller using Sveltos (Related Issue: projectsveltos/sveltos#435 (comment)).
While the proposed Multicluster Target Group Binding could achieve something similar, I believe there are challenges in the following areas:

  1. The ALB and Listener need to be managed by separate tools such as Terraform or Crossplane
  2. The AWS Load Balancer Controller, along with the information required for its operation, must be distributed to every cluster, which adds setup and management cost
  3. It is not compatible with sig-multicluster, making it difficult to extend and apply in the long term

On the other hand, if the AWS Load Balancer Controller directly supported custom EndpointSlices, which are a standard Kubernetes API, the complicated setup described above would become unnecessary. I believe this approach is preferable because it achieves the configuration users ultimately need in a simpler way.

@kahirokunn
Member Author

Hi @zac-nixon ,

I hope you’re doing well. I’d like to follow up on the feature request discussed earlier in this thread and get your input on a couple of points:

  1. Feature Request Validity: Do you feel that supporting endpoints beyond just in‑cluster pods is a worthwhile direction?
  2. Implementation Approach: My current proposal is to introduce a new function—tentatively named resolveNonPodEndpointsWithEndpointsData—in addition to the existing resolvePodEndpointsWithEndpointsData. This new function would be activated via a feature toggle (provisionally called nonPodEndpoints). What are your thoughts on this approach? Are there any modifications or improvements you’d suggest?
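
To make the second point concrete, here is a very rough sketch of the intended split; everything apart from the two function names and the toggle name is a placeholder, and the real resolver signatures in the controller differ:

package backend // illustrative only; not the controller's actual package layout

import (
    discoveryv1 "k8s.io/api/discovery/v1"
)

// nonPodEndpointsGate is the provisional feature toggle name from this proposal.
const nonPodEndpointsGate = "nonPodEndpoints"

// resolvePodEndpointsWithEndpointsData stands in for the existing behavior:
// only endpoints backed by in-cluster Pods are returned.
func resolvePodEndpointsWithEndpointsData(slices []discoveryv1.EndpointSlice) []string {
    var addrs []string
    for _, s := range slices {
        for _, ep := range s.Endpoints {
            if ep.TargetRef != nil && ep.TargetRef.Kind == "Pod" {
                addrs = append(addrs, ep.Addresses...)
            }
        }
    }
    return addrs
}

// resolveNonPodEndpointsWithEndpointsData is the proposed addition: it gathers
// addresses whose TargetRef is missing or is not a Pod (MCS imports, external IPs).
func resolveNonPodEndpointsWithEndpointsData(slices []discoveryv1.EndpointSlice) []string {
    var addrs []string
    for _, s := range slices {
        for _, ep := range s.Endpoints {
            if ep.TargetRef == nil || ep.TargetRef.Kind != "Pod" {
                addrs = append(addrs, ep.Addresses...)
            }
        }
    }
    return addrs
}

// resolveAllEndpoints shows how the toggle would gate the new path.
func resolveAllEndpoints(slices []discoveryv1.EndpointSlice, enabledGates map[string]bool) []string {
    addrs := resolvePodEndpointsWithEndpointsData(slices)
    if enabledGates[nonPodEndpointsGate] {
        addrs = append(addrs, resolveNonPodEndpointsWithEndpointsData(slices)...)
    }
    return addrs
}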

Once we have consensus on both the overall feature request and the implementation plan, my goal would be to update the feature request status to “implementation pending” so we can move forward with development.

Given your extensive contributions and deep understanding of aws‑load‑balancer‑controller, your feedback is extremely valuable. Looking forward to hearing your thoughts.

Best regards,
kahirokunn

@zac-nixon
Collaborator

Hi @kahirokunn,

I apologize for the delayed response. While we do have existing solutions in place, such as instance-based targets or a multicluster target group, I think your proposed solution makes sense. This new endpoint discovery would have to be completely feature-flagged, which your proposal already suggests. One caveat we can't support is the use of public IPs as registered targets; doing so would block target registration. Are you OK with this caveat?

Thank you for putting together this feature idea. We can work together to implement it and make sure it fits your use case.

@kahirokunn
Member Author

Dear @zac-nixon ,

Thank you for your response. I am delighted to receive your feedback.

One caveat we can't support is the use of public IPs as registered targets; doing so would block target registration. Are you OK with this caveat?

Yes, I agree with the restriction on public IPs.

Upon investigating why public IPs cannot be allowed, I found the following AWS documentation:

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#target-type

According to this documentation, the allowed CIDRs are:

  • Subnets in the target group's VPC
  • 10.0.0.0/8 (RFC 1918)
  • 100.64.0.0/10 (RFC 6598)
  • 172.16.0.0/12 (RFC 1918)
  • 192.168.0.0/16 (RFC 1918)

Given these restrictions, it follows that public IPs cannot be registered. This limitation does not pose any functional issue, as multi-cluster functionality can still be achieved with private IPs.
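
As an illustration only (the function name and placement are hypothetical, and the target group's own VPC subnets are not checked here), validating candidate addresses against these ranges could look like this:

package main

import (
    "fmt"
    "net/netip"
)

// allowedTargetCIDRs mirrors the ranges permitted for ip-type target groups
// in the ELB documentation linked above (VPC subnets are additionally allowed
// but are environment-specific, so they are omitted here).
var allowedTargetCIDRs = []netip.Prefix{
    netip.MustParsePrefix("10.0.0.0/8"),     // RFC 1918
    netip.MustParsePrefix("100.64.0.0/10"),  // RFC 6598
    netip.MustParsePrefix("172.16.0.0/12"),  // RFC 1918
    netip.MustParsePrefix("192.168.0.0/16"), // RFC 1918
}

// isRegistrableTargetIP reports whether an address falls inside one of the
// allowed private ranges.
func isRegistrableTargetIP(raw string) bool {
    addr, err := netip.ParseAddr(raw)
    if err != nil {
        return false
    }
    for _, p := range allowedTargetCIDRs {
        if p.Contains(addr) {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(isRegistrableTargetIP("10.11.12.13"))  // true: private IP, can be registered
    fmt.Println(isRegistrableTargetIP("203.0.113.10")) // false: public IP, registration would be rejected
}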

As a user-friendly enhancement, we could reflect error conditions in the Ingress status when IPs outside these ranges are specified. Here's an example of how the status condition could be formatted in YAML:

status:
  conditions:
  - type: ValidIPRange
    status: "False"
    reason: "IPOutOfAllowedRange"
    message: "One or more IP addresses are outside the allowed private IP ranges. Allowed ranges are: 10.0.0.0/8 (RFC1918), 100.64.0.0/10 (RFC6598), 172.16.0.0/12 (RFC1918), and 192.168.0.0/16 (RFC1918)."
    lastTransitionTime: "2025-02-14T12:00:00Z"

In this example, the ValidIPRange condition would be set to "False" whenever target IPs fall outside the allowed ranges, accompanied by the reason IPOutOfAllowedRange and a descriptive message. This would let users identify the issue from the Ingress status and set up alerts accordingly.

I believe that implementing the feature along these lines will enable integration with on-premises and multi-cloud environments through private IP registration.

Thank you for putting together this feature idea. We can work together to implement it and make sure it fits your use case.

I deeply appreciate your support.

Best regards,
kahirokunn
