Add Customizable Failure Threshold for Ephemeral Runner Retries #3700
Labels
bug
Something isn't working
gha-runner-scale-set
Related to the gha-runner-scale-set mode
needs triage
Requires review from the maintainers
Checks
Controller Version
0.9.3
Deployment Method
Helm
To Reproduce
Describe the bug
Related to this issue: #3300
Related to this line of code: https://github.com/actions/actions-runner-controller/blob/master/controllers/actions.github.com/ephemeralrunner_controller.go#L202
If an ephemeral runner fails to start up more than 5 times, it is marked as failed. If multiple runners fail to start up, the failed runners count against the max runner limit and block new runners from starting. We need this threshold to be configurable, and the failed runners should also be cleaned up after some time.
Describe the expected behavior
We want to be able to set the failure threshold, so that we have more time to catch these failing ephemeral runners before they are marked as failed. Something like this would be great:
We should be able to set it in the Helm chart for the actions runner controller. It would also be great if the controller automatically cleaned up failed runners, perhaps once a day.
Additional Context
N/A
Controller Logs
Runner Pod Logs