ClusterQueue not reclaiming resources from high-weight ClusterQueue in cohort #4333

Open
yuvalaz99 opened this issue Feb 20, 2025 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@yuvalaz99

What happened:

When a ClusterQueue is configured with a nominalQuota and reclaimWithinCohort: any, it cannot reclaim resources from another ClusterQueue in the same cohort that has nominalQuota: 0 but an extremely high weight.

Setup:

ClusterQueueA:

  • nominalQuota: 4
  • reclaimWithinCohort: any

ClusterQueueB:

  • nominalQuota: 0
  • weight: 9999999

What you expected to happen:

Since ClusterQueueB is using more than its nominal quota (which is 0), preemption should occur, allowing ClusterQueueA to reclaim resources. This aligns with the reclaimWithinCohort definition:

"Determines whether a pending Workload can preempt Workloads from other ClusterQueues in the cohort that are using more than their nominal quota."

However, no preemption occurs.
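The expectation above reduces to a simple check taken from the quoted definition: a ClusterQueue is a reclaim target when its usage exceeds its nominal quota, and weight does not enter into it. The minimal Go sketch below uses a hypothetical single-resource model. The clusterQueue type, the borrowing helper, and the usage values are assumptions for illustration only (the issue does not state actual usage) and are not Kueue's API.

```go
package main

import "fmt"

// clusterQueue is a hypothetical, simplified single-resource model of a
// ClusterQueue. It is NOT Kueue's actual type.
type clusterQueue struct {
	name         string
	nominalQuota int64
	usage        int64 // assumed usage; not stated in the issue
	weight       int64 // fair-sharing weight; plays no part in borrowing()
}

// borrowing reports whether the queue uses more than its nominal quota,
// which is the condition the reclaimWithinCohort description refers to.
func borrowing(cq clusterQueue) bool {
	return cq.usage > cq.nominalQuota
}

func main() {
	queues := []clusterQueue{
		{name: "ClusterQueueA", nominalQuota: 4, usage: 0, weight: 1},
		{name: "ClusterQueueB", nominalQuota: 0, usage: 4, weight: 9999999},
	}
	for _, cq := range queues {
		fmt.Printf("%s: nominal=%d usage=%d weight=%d borrowing=%v\n",
			cq.name, cq.nominalQuota, cq.usage, cq.weight, borrowing(cq))
	}
}
```

Under this reading, ClusterQueueB is borrowing regardless of its weight, so it should be eligible for reclaim by ClusterQueueA.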

Observed behavior:

  • Despite ClusterQueueB exceeding its nominal quota, ClusterQueueA is unable to reclaim resources.
  • The weightedShare of ClusterQueueB remains 0, which might be preventing preemption.
  • No scheduler logs indicate an attempt to preempt.

Is this expected behavior, or is there a misalignment in how weighted share interacts with reclaimWithinCohort?

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.30
  • Kueue version (use git describe --tags --dirty --always): 0.10.0
  • Cloud provider or hardware configuration: GCP
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
yuvalaz99 added the kind/bug label Feb 20, 2025
@gabesaba (Contributor)

Quoting: "The weightedShare of ClusterQueueB remains 0, which might be preventing preemption."

I suspect this is the root cause: when the weight is high, the weightedShare collapses to zero. This is a known issue, which we are tracking in #4247.
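To illustrate how a large weight can drive the share to zero, here is a minimal Go sketch of a share computed entirely with integer arithmetic. It is an assumption-based model, not Kueue's actual code: the weightedShare function, the scaling by 1000, and expressing the weight in thousandths are hypothetical choices made for illustration.

```go
package main

import "fmt"

// weightedShare is a HYPOTHETICAL integer-only share computation, not
// Kueue's implementation. borrowed and lendable are quantities of one
// resource; weightMilli is the weight in thousandths (weight 1 == 1000).
func weightedShare(borrowed, lendable, weightMilli int64) int64 {
	if borrowed <= 0 || lendable <= 0 || weightMilli <= 0 {
		return 0
	}
	// Fraction of the lendable amount being borrowed, scaled by 1000.
	ratio := borrowed * 1000 / lendable
	// Divide by the weight; Go integer division truncates toward zero,
	// so a large enough weight makes the result 0.
	return ratio * 1000 / weightMilli
}

func main() {
	// Assumed scenario: ClusterQueueB borrows all 4 units lent by ClusterQueueA.
	fmt.Println(weightedShare(4, 4, 1*1000))       // weight 1: prints 1000
	fmt.Println(weightedShare(4, 4, 9999999*1000)) // weight 9999999: prints 0
}
```

Under these assumptions, even a queue borrowing the entire lendable amount reports a share of 0 once its weight exceeds 1000, which would match the symptom described in the issue. Carrying more precision (for example, floating-point shares) avoids the truncation in this sketch; how #4247 is ultimately resolved is not covered here.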
