Jobs are waiting too long for a runner to come online. #3704
Comments
I was having a similar issue, though not quite as bad. The helm chart accepts a "minRunners" parameter that you can set to keep warm runners available to service jobs immediately. For my architecture, setting it to 1 for a small team and 2 for a big team has proven sufficient. If you're not using helm, it looks like minRunners is set in the AutoscalingRunnerSet spec.
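As a rough illustration, here is what that might look like in the scale set's helm values; the repository URL and runner counts below are placeholders, not values taken from this thread:

```yaml
# values.yaml for the gha-runner-scale-set chart (illustrative values only)
githubConfigUrl: "https://github.com/my-org/my-repo"  # placeholder org/repo
minRunners: 2    # keep two warm runners registered and idle
maxRunners: 10   # cap on how far the set can scale up
```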
I agree that setting warm runners could resolve the problem.
1 - job is created on GitHub
Warm runners have one drawback: their pods keep the node they run on alive, because cluster-autoscaler cannot terminate an instance with running runner pods. That rules out the cost savings of minRunners=0, where builder instances run only while they are needed. Ideally, there would be an option to start a certain number of extra runners whenever a job arrives, e.g.:
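The original example is not preserved in this thread; as a purely hypothetical sketch (neither the controller nor the helm chart currently exposes such a field), it could look something like this:

```yaml
# Hypothetical addition to the scale set values -- not an existing option
minRunners: 0        # scale all the way down when idle
maxRunners: 10
scaleUpBuffer: 2     # hypothetical: whenever a job arrives, start this many extra runners
```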
This way, when a new job arrives there is an initial wait for a build node and the first runners to start, but a certain number of extra runners is then kept ready to take the next jobs as they arrive. Eventually all of them scale back to zero after a period without new jobs. Besides, the following is true; I am observing it too:
There are significant delays which are not related to the normal pod startup routine.
It would also be nice to make it possible to "preheat" a pool of runners. For example:
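The original example is missing here; one workaround along these lines (a sketch, assuming a scale set named arc-runner-set in the arc-runners namespace and a service account allowed to patch AutoscalingRunnerSet resources) is to bump minRunners up ahead of busy hours and drop it again afterwards:

```yaml
# Sketch: "preheat" runners on a schedule by patching minRunners (names are assumptions)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: preheat-runners
  namespace: arc-runners
spec:
  schedule: "0 8 * * 1-5"   # weekday mornings
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: runner-scaler   # assumed to have patch rights on the CRD
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl
              args:
                - patch
                - autoscalingrunnerset
                - arc-runner-set
                - --type=merge
                - -p
                - '{"spec":{"minRunners":5}}'
```

A matching CronJob that patches minRunners back to 0 in the evening would let the node pool scale down overnight.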
I tried something similar to work around this issue, but one thing I noticed is that the runners cannot be reused across multiple jobs. Here's an example:
What I expect
What actually happens
I would prefer not to have to "preheat" 10 runners at the start of the workflow.
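For context, a minimal sketch of the kind of workflow where this shows up (the gha-runners label and job layout are assumptions, not taken from this thread): because scale-set runners register as ephemeral, each fan-out job below gets a fresh pod instead of reusing the one that ran prepare.

```yaml
# Sketch: each job waits for its own runner; pods are not reused between jobs
name: build
on: [push]
jobs:
  prepare:
    runs-on: gha-runners        # assumed scale-set label
    steps:
      - run: echo "warm-up / preheat step"
  build:
    needs: prepare
    runs-on: gha-runners        # gets a new ephemeral runner, not the one from 'prepare'
    strategy:
      matrix:
        target: [a, b, c]
    steps:
      - run: echo "building ${{ matrix.target }}"
```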
Checks
Controller Version
0.9.3
Deployment Method
ArgoCD
Checks
To Reproduce
Describe the bug
Some jobs are waiting from 30 seconds to more than 90 seconds to be scheduled on a runner.
Describe the expected behavior
Jobs should not have to wait that long, in my opinion.
Additional Context
Controller Logs
Runner Pod Logs