What would you like to be added:
Currently the RayJob admission webhook returns an error when the job sets enableInTreeAutoscaling: true. It makes sense that autoscaling up isn't supported, but I would expect autoscaling down could be supported via the "dynamic reclaim" feature.
I'd potentially be down to implement this - I just wanted to open an issue to see how people feel about this feature before spending time on it.
Why is this needed:
We want to run hyperparameter sweeps with Ray Tune, which has a feature that stops trials early based on metric values. With this, it's possible that at some point the job no longer needs all the resources it was initially given (because most trials have finished), and we'd like to be able to reclaim those resources.
Completion requirements: RayJobs support dynamic reclaiming if the Ray autoscaler indicates the job should scale down
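To make the requirement concrete, here is a rough sketch of the scale-down-only logic, assuming each worker group exposes the replica count that was admitted and the count the Ray autoscaler currently wants. The names `WorkerGroup` and `reclaimable_pods` are hypothetical and only illustrate the idea; this is not Kueue's or KubeRay's actual API, and a real implementation would live in the Go integration code.

```python
from dataclasses import dataclass


@dataclass
class WorkerGroup:
    # Hypothetical stand-in for a RayJob worker group; not a real API type.
    name: str
    admitted_replicas: int  # pod count that was admitted for this group
    desired_replicas: int   # pod count the Ray autoscaler currently wants


def reclaimable_pods(groups):
    """Return {group name: number of admitted pods no longer needed}.

    Only scale-down is considered: a group whose desired count meets or
    exceeds its admitted count contributes nothing, so scaling up is ignored.
    """
    result = {}
    for group in groups:
        surplus = group.admitted_replicas - group.desired_replicas
        if surplus > 0:
            result[group.name] = surplus
    return result
```

For example, a group admitted with 8 replicas that the autoscaler now wants at 3 would report 5 reclaimable pods, while a group the autoscaler wants to grow would report none.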
This enhancement requires the following artifacts:
Design doc
API change
Docs update
The artifacts should be linked in subsequent comments.
Yeah, I think so too. Could we prioritize this after the next minor release (0.12)? We have a lot of alpha features and are trying to fix bugs in this release cycle, and this seems to be a fairly big feature.