Schedules in budgets are ignored completely #1939

Closed
nantiferov opened this issue Jan 28, 2025 · 5 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. triage/solved Indicates an issue that has been considered solved by the maintainers.

Comments

@nantiferov

Description

Observed Behavior:

Hi,
We're running Karpenter 1.0.8 in multiple EKS clusters and noticed that the schedule in disruption.budgets is completely ignored. We have several different disruption.budgets configurations, and all of them behave as if schedule were not set.

We weren't using schedule in 0.37, so I can't say whether this is a regression in v1.0 or whether schedule never worked.
There are no errors in the Karpenter logs, and everything else works as expected; only schedule is affected.

Expected Behavior:

The schedule in disruption.budgets is honored, and no disruptions happen at arbitrary times outside the schedule.

Reproduction Steps (Please include YAML):

  • Install Karpenter 1.0.8 into an EKS cluster
  • Create an EC2NodeClass and a NodePool with a schedule in disruption.budgets, for example:
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: some-pool
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 0s
    budgets:
      - schedule: 0 8 * * mon-fri # working days and hours
        duration: 8h
        nodes: 10%
        reasons: [Empty, Underutilized]
      - nodes: "0"
        reasons: [Drifted] # disable drift disruption

⏫ With this budget, Empty/Underutilized disruptions happen at any time instead of only during schedule: 0 8 * * mon-fri

---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: other-pool
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 0s
    budgets:
      - schedule: 0 8 * * mon-fri # working days and hours
        duration: 8h
        nodes: 10%

⏫ With this budget, all disruptions, including Drift, happen at any time. For example, when a new AL2023 AMI is released, all nodes are re-provisioned regardless of the schedule.

Other:

This has been reported a couple of times in other issues, but I couldn't find one specifically about schedules not working:

  1. AMI drift triggers node recreation outside Disruption schedule aws/karpenter-provider-aws#7592
  2. Pod disruption schedule #1719 (comment)
  3. Karpenter does not respect reasons in budgets #1691 (comment)

Versions:

  • Chart Version: 1.0.8
  • Kubernetes Version: v1.30.8-eks-2d5f260
@nantiferov nantiferov added the kind/bug Categorizes issue or PR as related to a bug. label Jan 28, 2025
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 28, 2025
@jmdeal
Member

jmdeal commented Jan 31, 2025

There's room to clarify the disruption budget docs, but I think you're reasoning about disruption budgets backwards. Each budget restricts the number of disruptions allowed while it is active; it does not enable disruptions. When several budgets are active at the same time, the most restrictive one wins. So in this case, your budget actually specifies that at most 10% of nodes can be disrupted during the 8h window that starts at 0 8 * * mon-fri, but outside of that window 100% of nodes are eligible for disruption. It might be easier to visualize with the implicit base case added:

apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  disruption:
    budgets:
      # Unless otherwise constrained, allow all nodes to be disrupted
      - nodes: 100%
      # During the schedule, restrict the number of allowed disruptions to 10%
      - schedule: 0 8 * * mon-fri # working days and hours
        duration: 8h
        nodes: 10%
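
The combination rule can be sketched in Python: every active budget that applies to a given reason caps disruptions, the smallest cap wins, and the implicit base case allows everything. This is a minimal illustration, not Karpenter's code. The Budget class, the active flag (standing in for "the schedule window is open right now"), and the helper names are hypothetical; percentages are assumed to round up, and Karpenter's handling of nodes that are already deleting or NotReady is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Budget:
    """Hypothetical stand-in for one entry of spec.disruption.budgets."""
    nodes: str                            # "10%" or an absolute count such as "0"
    active: bool = True                   # assumption: True while the schedule window is open (or no schedule)
    reasons: Optional[List[str]] = None   # None means the budget applies to every reason


def budget_limit(nodes: str, total_nodes: int) -> int:
    """Convert a budget's nodes value into a node count, rounding percentages up (assumed)."""
    if nodes.endswith("%"):
        return math.ceil(total_nodes * int(nodes[:-1]) / 100)
    return int(nodes)


def allowed_disruptions(budgets: List[Budget], total_nodes: int, reason: str) -> int:
    """Smallest cap among active budgets that apply to `reason`; unrestricted if none apply."""
    limit = total_nodes  # the implicit "nodes: 100%" base case
    for budget in budgets:
        if not budget.active:
            continue  # a closed schedule window imposes no cap at all
        if budget.reasons is not None and reason not in budget.reasons:
            continue  # budget is scoped to other disruption reasons
        limit = min(limit, budget_limit(budget.nodes, total_nodes))
    return limit


# other-pool from the issue, with 20 nodes:
print(allowed_disruptions([Budget("10%", active=True)], 20, "Drifted"))   # inside the window: 2
print(allowed_disruptions([Budget("10%", active=False)], 20, "Drifted"))  # outside the window: 20
```

Outside the window the only remaining cap is the implicit 100%, which matches the disruptions at arbitrary times described in the issue.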

To allow up to 10% disruption during business hours and none outside of those hours, you could model it like this:

apiVersion: karpenter.sh/v1
kind: NodePool
spec:
  disruption:
    budgets:
      # Set a baseline disruption budget of 10%
      - nodes: 10%
      # Restrict disruption from 00:00-8:00 and 16:00-00:00 on weekdays
      - schedule: 0 0,16 * * mon-fri
        duration: 8h
        nodes: "0"
      # Restrict disruption on weekends
      - schedule: 0 0 * * sat,sun
        duration: 24h
        nodes: "0"

To verify that a disruption budget is working as expected, you can reference the karpenter_nodepools_allowed_disruptions metric.
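
To check which hours a set of scheduled budgets actually covers, the schedule side can also be sketched in Python. This is a hypothetical, simplified cron matcher written for illustration, not the parser Karpenter uses: it handles only *, plain numbers, comma lists, ranges and three-letter day names, ANDs day-of-month with day-of-week, treats naive datetimes as UTC, and assumes a window stays open from each cron firing until duration later (end exclusive). window_open and fully_blocked are made-up helper names.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Set

DAY_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}


def _to_int(token: str, names: Optional[Dict[str, int]]) -> int:
    return names[token] if names and token in names else int(token)


def parse_field(expr: str, lo: int, hi: int, names: Optional[Dict[str, int]] = None) -> Set[int]:
    """Expand a cron field such as '*', '0,16' or 'mon-fri' into a set of integers."""
    values: Set[int] = set()
    for part in expr.lower().split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif "-" in part:
            start, end = part.split("-")
            values.update(range(_to_int(start, names), _to_int(end, names) + 1))
        else:
            values.add(_to_int(part, names))
    return values


def cron_matches(schedule: str, t: datetime) -> bool:
    """Simplified 5-field cron match: no step values, and day-of-month/day-of-week are ANDed."""
    minute, hour, dom, month, dow = schedule.split()
    cron_dow = (t.weekday() + 1) % 7  # Python counts Monday as 0; cron counts Sunday as 0
    return (
        t.minute in parse_field(minute, 0, 59)
        and t.hour in parse_field(hour, 0, 23)
        and t.day in parse_field(dom, 1, 31)
        and t.month in parse_field(month, 1, 12)
        and cron_dow in parse_field(dow, 0, 6, DAY_NAMES)
    )


def window_open(schedule: str, duration: timedelta, now: datetime) -> bool:
    """True if the schedule fired within the last `duration` (end treated as exclusive)."""
    start = now.replace(second=0, microsecond=0)
    minutes = int(duration.total_seconds() // 60)
    return any(cron_matches(schedule, start - timedelta(minutes=m)) for m in range(minutes))


def fully_blocked(now: datetime) -> bool:
    """True while one of the suggested nodes: "0" budgets is active."""
    return window_open("0 0,16 * * mon-fri", timedelta(hours=8), now) or window_open(
        "0 0 * * sat,sun", timedelta(hours=24), now
    )


# Sample every hour of one week (Monday 2025-01-27 00:30 onward) and keep the unblocked hours.
week_start = datetime(2025, 1, 27, 0, 30)
open_hours = [t for t in (week_start + timedelta(hours=h) for h in range(7 * 24)) if not fully_blocked(t)]
assert all(t.weekday() < 5 and 8 <= t.hour < 16 for t in open_hours)
print(len(open_hours))  # 40: Monday to Friday, 08:00 to 16:00
```

Running window_open against the other-pool budget (0 8 * * mon-fri, 8h) shows the same window from the other side: it is open on weekdays from 08:00 to 16:00 and closed otherwise, so that budget only caps disruption during business hours.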

@jmdeal
Member

jmdeal commented Jan 31, 2025

/triage solved
/kind support

@k8s-ci-robot k8s-ci-robot added triage/solved Indicates an issue that has been considered solved by the maintainers. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 31, 2025
@nantiferov
Author

Hmm, thank you for the clarification and the suggested change.

I'll update the budgets in our NodePools and check the metrics.

@jonathan-innis
Member

/close

@k8s-ci-robot
Contributor

@jonathan-innis: Closing this issue.

In response to this:

/close

