ClusterQueue not reclaiming resources from high-weight ClusterQueue in cohort #4333

yuvalaz99 · 2025-02-20T08:31:07Z

What happened:

When a ClusterQueue is configured with a nominalQuota and reclaimWithinCohort: any, it cannot reclaim resources from another ClusterQueue in the same cohort that has nominalQuota: 0 and an extremely high weight.

Setup:

ClusterQueueA:

nominalQuota: 4
reclaimWithinCohort: any

ClusterQueueB:

nominalQuota: 0
weight: 9999999

What you expected to happen:

Since ClusterQueueB is using more than its nominal quota (which is 0), preemption should occur, allowing ClusterQueueA to reclaim resources. This aligns with the reclaimWithinCohort definition:

"Determines whether a pending Workload can preempt Workloads from other ClusterQueues in the cohort that are using more than their nominal quota."

However, no preemption occurs.
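A minimal sketch of the expected behavior follows. It assumes a single resource and illustrative usage numbers, because the report does not give ClusterQueueB's actual usage. Under the reclaimWithinCohort definition quoted above, any other queue in the cohort whose usage exceeds its nominal quota is a reclaim candidate, so ClusterQueueB should qualify. The clusterQueue type and the overNominal and reclaimCandidates helpers are hypothetical and are not Kueue's API. The sketch also ignores fair-sharing weights entirely.

```go
package main

import "fmt"

// clusterQueue is a hypothetical, simplified model of the two queues in this
// report (one resource only); it is not Kueue's API type.
type clusterQueue struct {
	name         string
	nominalQuota int64 // nominal quota for a single resource, e.g. CPUs
	usage        int64 // quota currently consumed by admitted workloads
}

// overNominal reports whether a queue uses more than its nominal quota,
// which is the condition the reclaimWithinCohort definition describes.
func overNominal(cq clusterQueue) bool {
	return cq.usage > cq.nominalQuota
}

// reclaimCandidates returns the other queues in the cohort that a pending
// workload in preemptor could reclaim from under that definition alone.
func reclaimCandidates(preemptor string, cohort []clusterQueue) []string {
	var out []string
	for _, cq := range cohort {
		if cq.name != preemptor && overNominal(cq) {
			out = append(out, cq.name)
		}
	}
	return out
}

func main() {
	// Assumed usage values: ClusterQueueB is borrowing all 4 units.
	cohort := []clusterQueue{
		{name: "ClusterQueueA", nominalQuota: 4, usage: 0},
		{name: "ClusterQueueB", nominalQuota: 0, usage: 4},
	}
	fmt.Println(reclaimCandidates("ClusterQueueA", cohort)) // [ClusterQueueB]
}
```

By this check alone, ClusterQueueB's weight plays no role: a nominal quota of 0 with any positive usage is enough to make it reclaimable.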

Observed behavior:

Despite ClusterQueueB exceeding its nominal quota, ClusterQueueA is unable to reclaim resources.
The weightedShare of ClusterQueueB remains 0, which might be preventing preemption.
No scheduler logs indicate an attempt to preempt.

Is this expected behavior, or is there a misalignment in how weighted share interacts with reclaimWithinCohort?

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.30
Kueue version (use git describe --tags --dirty --always): 0.10.0
Cloud provider or hardware configuration: GCP
OS (e.g: cat /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

gabesaba · 2025-02-20T11:48:06Z

> The weightedShare of ClusterQueueB remains 0, which might be preventing preemption.

I suspect this is the root cause: when the weight is very high, the weightedShare collapses to zero. This is a known issue, which we are tracking in #4247.
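To illustrate how a large weight can collapse the share to zero, here is a sketch using a hypothetical, simplified formula that is not Kueue's actual code. The borrowed amount is expressed as a per-mille share of the cohort's lendable capacity and then divided by the weight using integer arithmetic. The weightedShare function and the input values are assumptions chosen for illustration.

```go
package main

import "fmt"

// weightedShare is a hypothetical simplification of a fair-sharing score;
// it is not Kueue's implementation. It assumes lendable > 0 and weight > 0.
func weightedShare(borrowed, lendable, weight int64) int64 {
	// Borrowed amount as a per-mille share of the cohort's lendable capacity.
	share := borrowed * 1000 / lendable
	// Integer division truncates toward zero, so any weight larger than
	// the share yields 0.
	return share / weight
}

func main() {
	// Assumed values: ClusterQueueB borrows all 4 lendable units.
	fmt.Println(weightedShare(4, 4, 1))       // 1000
	fmt.Println(weightedShare(4, 4, 9999999)) // 0
}
```

In this sketch the per-mille share never exceeds 1000 while a queue borrows at most the lendable capacity. Any weight above 1000 therefore truncates it to 0, and ClusterQueueB looks as if it were not borrowing at all.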

yuvalaz99 added the kind/bug label on Feb 20, 2025.