
Feature Request: Support EndpointSlices Without In-cluster Pod Targets in Ingress #4017

Open
kahirokunn opened this issue Jan 15, 2025 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@kahirokunn
Member

kahirokunn commented Jan 15, 2025

Related Problem

When deploying a multi-cluster EKS environment that shares services via the Multi-Cluster Services (MCS) API, multiple EndpointSlices may be created for a single Service. Currently, in “target-type: ip” mode, the AWS Load Balancer Controller only registers Pod IPs of locally running Pods. It does not register:

  1. Pod IPs from other clusters exposed via the MCS API and listed in EndpointSlices; or
  2. External IPs included in EndpointSlices whose TargetRef.Kind is not "Pod."

This behavior forces users to employ workarounds—such as using “target-type: instance” and routing traffic through NodePorts—which can introduce suboptimal routing and increase the risk of disruptions if a Node is scaled in or replaced.

Proposed Unified Solution

Enhance the AWS Load Balancer Controller to directly register IP addresses from EndpointSlices in “target-type: ip” mode, even if those addresses are intended for multi-cluster usage (MCS) or represent external endpoints. This can be done by:

  • Recognizing that an EndpointSlice may contain additional or external IP addresses (for instance, based on TargetRef.Kind != "Pod").
  • Incorporating these addresses into the Target Group, alongside the local cluster Pod IPs already handled.

A relevant part of the AWS Load Balancer Controller’s current design is the following check:

// endpoints whose TargetRef is nil or not a Pod are skipped today
if ep.TargetRef == nil || ep.TargetRef.Kind != "Pod" {
    continue
}

Here, the logic could be extended to handle these alternative address types. For example, if the endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io label is missing, the Controller might treat the EndpointSlice’s IP addresses as external IPs; or if EndpointSlice.Endpoints[].TargetRef.Kind != "Pod", the Controller might interpret them as external endpoints.
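
As a rough sketch of these two criteria (the helper names and package placement here are illustrative, not taken from the controller source):

package backend // hypothetical placement, for illustration only

import (
    discoveryv1 "k8s.io/api/discovery/v1"
)

// isCustomEndpointSlice reports whether the slice was not produced by the
// in-tree EndpointSlice controller (the managed-by label is absent or set to
// something else), which suggests it carries MCS-imported or external IPs.
func isCustomEndpointSlice(eps *discoveryv1.EndpointSlice) bool {
    return eps.Labels["endpointslice.kubernetes.io/managed-by"] != "endpointslice-controller.k8s.io"
}

// isNonPodEndpoint reports whether an individual endpoint does not reference
// an in-cluster Pod and would therefore be treated as an external target.
func isNonPodEndpoint(ep discoveryv1.Endpoint) bool {
    return ep.TargetRef == nil || ep.TargetRef.Kind != "Pod"
}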

In both cases, the goal remains the same: provide direct integration with new or external IP addresses listed in EndpointSlices, reducing complexity and offering more efficient traffic routing.

Alternatives Considered

Using “target-type: instance”

  • This solution leads to indirect routing (through NodePorts) and higher susceptibility to disruptions upon Node scale-in or replacement.

Example: MCS with Additional Cluster IPs

Below is a sample configuration demonstrating how MCS might export a Service, creating an EndpointSlice in one cluster with Pod IPs from another cluster:

apiVersion: v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  selector:
    app: example
  ports:
    - name: http
      port: 80
      protocol: TCP
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80}]'
spec:
  rules:
    - http:
        paths:
          - path: /*
            pathType: ImplementationSpecific
            backend:
              service:
                name: example-service
                port:
                  number: 80
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: example-service-remotecluster
  namespace: default
  labels:
    kubernetes.io/service-name: example-service
addressType: IPv4
ports:
  - name: "http"
    port: 80
    protocol: TCP
endpoints:
  - addresses:
      - 10.11.12.13   # Pod IP on a remote EKS cluster
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: remote-node-1
    zone: remote-az-1

With the proposed feature enabled, the IP “10.11.12.13” would be recognized by the AWS Load Balancer Controller and automatically registered in the Target Group.

References

@kahirokunn changed the title from "FeatureRequest: Support EndpointSlices Without In-cluster Pod Targets in Ingress" to "Feature Request: Support EndpointSlices Without In-cluster Pod Targets in Ingress" on Jan 15, 2025
@shraddhabang added the kind/feature label and removed triage/needs-investigation on Jan 15, 2025
@zac-nixon
Collaborator

Could you expand further on this point:

This solution leads to indirect routing (through NodePorts) and higher susceptibility to disruptions upon Node scale-in or replacement.

Later versions of Kubernetes and the controller have made using NodePorts for traffic a lot more reliable. For example, when using cluster autoscaler: #1688

@kahirokunn
Member Author

@zac-nixon
Thank you for your insight and all the work you've done on this project. I wanted to share my experience using Karpenter instead of the Cluster Autoscaler. In my tests, when running ab (ApacheBench) or other load-testing tools while a node scales in, I often observe connections that do not return any response (instead of a 5xx error). After multiple rounds of verification, I suspect the following factors may be playing a role:

  1. Karpenter may terminate a node before it is fully deregistered from the ALB’s Target Group.
  2. There may be insufficient coordination between Karpenter and the AWS Load Balancer Controller during node termination.
  3. Any long-lived connections—such as WebSockets, long polling, or HTTP/2—remain open on nodes that are about to be terminated. Moreover, slower requests and long-running processes also stay active. As a result, when Karpenter scales in a node, these open connections or requests can be abruptly severed, causing no response to return to the client.

Additionally, by supporting direct IP-based communication as described in the Kubernetes documentation—rather than routing traffic exclusively through Nodes—we can further improve interoperability with existing controllers, foster additional integrations, and enable even more significant innovation in the future.

@kahirokunn
Member Author

I've created a separate issue regarding the problem we discussed about AWS Load Balancer Controller not handling Karpenter taints:
#4023
Along with this, I've also created a related PR:
#4022
However, I still want to continue the discussion about Ingress resources supporting custom EndpointSlices, as I believe this is a needed feature.
Thx 🙏

@zac-nixon
Collaborator

Sorry for the delayed response. What automation are you using to populate the custom endpoint slice? I wonder if you can use a Multicluster Target Group Binding (https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/guide/targetgroupbinding/targetgroupbinding/#multicluster-target-group) and then point your automation to just register the targets directly into the Target Group?
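
For reference, registering the targets directly from your automation could look roughly like the following with the AWS SDK for Go v2 (the target group ARN and IP are placeholders):

package main

import (
    "context"
    "log"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    elbv2 "github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2"
    elbv2types "github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2/types"
)

func main() {
    cfg, err := config.LoadDefaultConfig(context.TODO())
    if err != nil {
        log.Fatal(err)
    }
    client := elbv2.NewFromConfig(cfg)

    // Register a remote cluster's Pod IP into the shared target group.
    // Replace the ARN and address with real values from your environment.
    _, err = client.RegisterTargets(context.TODO(), &elbv2.RegisterTargetsInput{
        TargetGroupArn: aws.String("arn:aws:elasticloadbalancing:us-west-2:123456789012:targetgroup/example/0123456789abcdef"),
        Targets: []elbv2types.TargetDescription{
            {Id: aws.String("10.11.12.13"), Port: aws.Int32(80)},
        },
    })
    if err != nil {
        log.Fatal(err)
    }
}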

@kahirokunn
Member Author

I am currently trying to implement an MCS controller using Sveltos (Related Issue: projectsveltos/sveltos#435 (comment)).
While the proposed Multicluster Target Group Binding could achieve something similar, I believe there are challenges in the following areas:

  1. The ALB and Listener need to be managed by separate tools such as Terraform or Crossplane
  2. The AWS Load Balancer Controller, along with the information required for its operation, must be distributed to every cluster, which adds setup and management cost
  3. It is not compatible with sig-multicluster, making it difficult to extend and apply in the long term

On the other hand, if the AWS Load Balancer Controller directly supported custom EndpointSlices, which are a standard Kubernetes API, the complicated setup described above would become unnecessary. I believe this approach is preferable because it achieves the configuration users ultimately need in a simpler way.

@kahirokunn
Member Author

Hi @zac-nixon ,

I hope you’re doing well. I’d like to follow up on the feature request discussed earlier in this thread and get your input on a couple of points:

  1. Feature Request Validity: Do you feel that supporting endpoints beyond just in‑cluster pods is a worthwhile direction?
  2. Implementation Approach: My current proposal is to introduce a new function—tentatively named resolveNonPodEndpointsWithEndpointsData—in addition to the existing resolvePodEndpointsWithEndpointsData. This new function would be activated via a feature toggle (provisionally called nonPodEndpoints). What are your thoughts on this approach? Are there any modifications or improvements you’d suggest?
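
To make the second point concrete, here is a very rough sketch of the intended split; everything apart from the two function names and the toggle name is a placeholder, and the real resolver signatures in the controller differ:

package backend // illustrative only; not the controller's actual package layout

import (
    discoveryv1 "k8s.io/api/discovery/v1"
)

// nonPodEndpointsGate is the provisional feature toggle name from this proposal.
const nonPodEndpointsGate = "nonPodEndpoints"

// resolvePodEndpointsWithEndpointsData stands in for the existing behavior:
// only endpoints backed by in-cluster Pods are returned.
func resolvePodEndpointsWithEndpointsData(slices []discoveryv1.EndpointSlice) []string {
    var addrs []string
    for _, s := range slices {
        for _, ep := range s.Endpoints {
            if ep.TargetRef != nil && ep.TargetRef.Kind == "Pod" {
                addrs = append(addrs, ep.Addresses...)
            }
        }
    }
    return addrs
}

// resolveNonPodEndpointsWithEndpointsData is the proposed addition: it gathers
// addresses whose TargetRef is missing or is not a Pod (MCS imports, external IPs).
func resolveNonPodEndpointsWithEndpointsData(slices []discoveryv1.EndpointSlice) []string {
    var addrs []string
    for _, s := range slices {
        for _, ep := range s.Endpoints {
            if ep.TargetRef == nil || ep.TargetRef.Kind != "Pod" {
                addrs = append(addrs, ep.Addresses...)
            }
        }
    }
    return addrs
}

// resolveAllEndpoints shows how the toggle would gate the new path.
func resolveAllEndpoints(slices []discoveryv1.EndpointSlice, enabledGates map[string]bool) []string {
    addrs := resolvePodEndpointsWithEndpointsData(slices)
    if enabledGates[nonPodEndpointsGate] {
        addrs = append(addrs, resolveNonPodEndpointsWithEndpointsData(slices)...)
    }
    return addrs
}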

Once we have consensus on both the overall feature request and the implementation plan, my goal would be to update the feature request status to “implementation pending” so we can move forward with development.

Given your extensive contributions and deep understanding of aws‑load‑balancer‑controller, your feedback is extremely valuable. Looking forward to hearing your thoughts.

Best regards,
kahirokunn

@zac-nixon
Collaborator

Hi @kahirokunn,

I apologize for the delayed response. While we do have existing solutions in place, such as instance-based targets or a multicluster target group, I think your proposed solution makes sense. This new endpoint discovery would have to be completely feature-flagged, which your proposal already suggests. One caveat we can't support is the use of public IPs as registered targets; doing so would block target registration. Are you OK with this caveat?

Thank you for putting together this feature idea. We can work together to implement it and make sure it fits your use case.

@kahirokunn
Member Author

Dear @zac-nixon ,

Thank you for your response. I am delighted to receive your feedback.

One caveat we can't support is the use of public IPs as registered targets; doing so would block target registration. Are you OK with this caveat?

Yes, I agree with the restriction on public IPs.

Upon investigating why public IPs cannot be allowed, I found the following AWS documentation:

https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-target-groups.html#target-type

According to this documentation, the allowed CIDRs are:

  • Subnets in the target group's VPC
  • 10.0.0.0/8 (RFC 1918)
  • 100.64.0.0/10 (RFC 6598)
  • 172.16.0.0/12 (RFC 1918)
  • 192.168.0.0/16 (RFC 1918)

Given these restrictions, it follows that public IPs cannot be registered. This limitation does not pose any functional issue, as multi-cluster functionality can still be achieved with private IPs.
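
As an illustration only (the function name and placement are hypothetical, and the target group's own VPC subnets are not checked here), validating candidate addresses against these ranges could look like this:

package main

import (
    "fmt"
    "net/netip"
)

// allowedTargetCIDRs mirrors the ranges permitted for ip-type target groups
// in the ELB documentation linked above (VPC subnets are additionally allowed
// but are environment-specific, so they are omitted here).
var allowedTargetCIDRs = []netip.Prefix{
    netip.MustParsePrefix("10.0.0.0/8"),     // RFC 1918
    netip.MustParsePrefix("100.64.0.0/10"),  // RFC 6598
    netip.MustParsePrefix("172.16.0.0/12"),  // RFC 1918
    netip.MustParsePrefix("192.168.0.0/16"), // RFC 1918
}

// isRegistrableTargetIP reports whether an address falls inside one of the
// allowed private ranges.
func isRegistrableTargetIP(raw string) bool {
    addr, err := netip.ParseAddr(raw)
    if err != nil {
        return false
    }
    for _, p := range allowedTargetCIDRs {
        if p.Contains(addr) {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(isRegistrableTargetIP("10.11.12.13"))  // true: private IP, can be registered
    fmt.Println(isRegistrableTargetIP("203.0.113.10")) // false: public IP, registration would be rejected
}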

As a user-friendly enhancement, we could reflect error conditions in the Ingress status when IPs outside these ranges are specified. Here's an example of how the status condition could be formatted in YAML:

status:
  conditions:
  - type: ValidIPRange
    status: "False"
    reason: "IPOutOfAllowedRange"
    message: "One or more IP addresses are outside the allowed private IP ranges. Allowed ranges are: 10.0.0.0/8 (RFC1918), 100.64.0.0/10 (RFC6598), 172.16.0.0/12 (RFC1918), and 192.168.0.0/16 (RFC1918)."
    lastTransitionTime: "2025-02-14T12:00:00Z"

In this example, the ValidIPRange condition would be set to "False" whenever target IPs fall outside the allowed ranges, accompanied by the reason IPOutOfAllowedRange and a descriptive message. This would let users identify the issue from the Ingress status and set up alerts accordingly.

I believe that implementing the feature along these lines will enable integration with on-premises and multi-cloud environments through private IP registration.

Thank you for putting together this feature idea. We can work together to implement it and make sure it fits your use case.

I deeply appreciate your support.

Best regards,
kahirokunn
