checkout v4 fails with ACTIONS_RUNNER_CONTAINER_HOOKS #145
Comments
Hey @cgundy, I am unable to reproduce the issue. I forced the runner's internal node version as you specified, and this is the output: …
Hi @nikola-jokic, thank you very much for testing it out. Yes, I am using the latest runner image. Did you also test on a runner that uses a hook pod template?
I have not, but it would depend on what the template is, right? The hook template modifies the job pod that you specify, so if the spec for the new pod is invalid, the action would fail. But I'm more worried about the …
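For readers following along: in Kubernetes mode the hook template is a YAML file that the runner picks up (via the ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE environment variable, if I recall the hooks' docs correctly) and merges into the job pod it creates. Below is a minimal sketch of what such a template can look like; the kind, the $job container name, and all values follow my reading of the docs and are not the template used in this issue.

```yaml
# Hypothetical hook extension template (e.g. /home/runner/pod-template.yaml),
# referenced from the runner via ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE.
# Field names are an assumption based on the runner-container-hooks docs.
apiVersion: v1
kind: PodTemplate
metadata:
  labels:
    app: example-job-pod        # hypothetical label
spec:
  containers:
    - name: "$job"              # targets the job container created by the hook
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
```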
Hi, thanks for the quick response. I think you're onto something. I tested checkout v4 without using … For completeness, here is my pod template: …

And I am using cephfs as a storage class for …
I'd rather not change our StorageClass since it has been working well with this setup otherwise, but I am open to any suggestions or debugging steps I can take.
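For context on where the storage class plugs in: with the gha-runner-scale-set Helm chart, Kubernetes mode declares the shared work volume (mounted by both the runner pod and the job pod) under containerMode. The following is only a sketch under that assumption; the storage class name and size are placeholders, not the values from this setup.

```yaml
# Hypothetical excerpt from a gha-runner-scale-set values.yaml.
# Structure follows the chart's documented containerMode block; the
# storageClassName and size are placeholders for illustration only.
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteMany"]   # needed when runner and job pods can land on different nodes
    storageClassName: "cephfs"       # placeholder: the CephFS-backed StorageClass discussed above
    resources:
      requests:
        storage: 5Gi
```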
@nikola-jokic this is still an ongoing issue for us. We've tried to use checkout@v3, but now we're in a situation where we need to upgrade. I've checked that the permissions are all correct. If you have any suggestions for debugging steps, please let me know; the only options we may have left are to stop using the kube scheduler or to move to dind.
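On the dind fallback mentioned above: in the same chart, switching container mode is a small values change, at the cost of running jobs via Docker-in-Docker on the runner pod itself instead of as separate job pods. A minimal sketch, assuming the gha-runner-scale-set chart:

```yaml
# Hypothetical values.yaml excerpt: Docker-in-Docker mode instead of Kubernetes mode.
# In dind mode the job runs in containers on the runner pod itself, so the
# shared work volume (and the CephFS multi-attach path) is no longer involved.
containerMode:
  type: "dind"
```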
Could you share your workflow file? Did you manage to create a reproducible example? I'm wondering if the node binary we mount is the issue, but I'm not sure. It works for the Ubuntu image, so maybe the check for which node build to mount is wrong (we compile node for Alpine in order to mount it into Alpine-based containers).
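To make the node-mount question concrete: the hook mounts the runner's bundled node binary into the job container so that JavaScript actions such as actions/checkout can run there, and a musl-based (Alpine) job image needs the Alpine build of node. Below is a hedged sketch of a workflow that exercises exactly that path; the runner label and image tag are hypothetical.

```yaml
# Hypothetical workflow: a container job on an Alpine image, where checkout
# runs on the node binary the hook mounts into the job container.
name: checkout-on-alpine
on: [push]
jobs:
  build:
    runs-on: arc-runner-set      # hypothetical runner scale set label
    container:
      image: alpine:3.19         # musl-based image; needs the Alpine node build
    steps:
      - uses: actions/checkout@v4
      - run: ls -la .git         # simple sanity check that the checkout landed
```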
Just wanted to add a note here that I have been able to observe this issue under very similar circumstances: running in kube scheduler mode with Rook CephFS for multi-attach. We're attempting to do some debugging on our end in this area, as we're not seeing a consistent link between the checkout and this issue; that is, sometimes checkouts succeed and tasks following the checkout fail (for example). I will ping back here again if we find additional information that may help.
We've done some validation on this issue and have some interesting insights. The tests we performed, with results, are listed below:

1. Conditions: …
2. Conditions: …
3. Conditions: …
4. Conditions: …
5. Conditions: …

Conclusion: It looks like there is some kind of filesystem-level cache, or slight file lag, when workloads running on two different nodes read and write the same file (perhaps some kind of stale data). We have seen some examples where checkouts succeed, but we aren't able to reproduce these successes to narrow down exactly what is different in those cases; for now we're assuming this is just good luck, as the successful runs seem to be independent of any changes we make and are extremely uncommon.

Todo: investigate mount options, sync/cache options, and possibly locking options available in Ceph.

Hopefully this information is useful / not too unnecessary.
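On the "investigate mount options" todo: with the CephFS CSI driver (e.g. under Rook), mount options can be set per StorageClass, which is one convenient place to experiment with cache/sync behaviour. The sketch below only shows where such options go; the names are placeholders, the required secret parameters are omitted, and the specific options worth trying should be taken from the Ceph/CSI documentation rather than from here.

```yaml
# Hypothetical CephFS StorageClass showing where mount options are declared.
# Names (cephfs-debug, rook-ceph, myfs) are placeholders; provisioner secret
# parameters are omitted for brevity.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs-debug
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  clusterID: rook-ceph
  fsName: myfs
mountOptions:
  - wsync            # example: synchronous namespace operations; verify against CephFS docs
reclaimPolicy: Delete
```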
Original issue description:

When trying to upgrade the GitHub checkout action from v3 to v4 using self-hosted runners with Kubernetes mode, I consistently get the following error: …

I've tried upgrading the internal runner node version from 16 to 20 using: …

But I still see the same error. I believe this is a somewhat urgent issue, as GitHub Actions won't support node16 after Spring 2024 anymore (post), and we will need to upgrade the checkout action from v3 to v4. Thank you!
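For completeness, the change that triggers the error is just the action tag bump in the workflow; nothing else in the job needs to change. A minimal sketch (the runner label is hypothetical):

```yaml
# Hypothetical workflow excerpt: checkout v3 -> v4 on a self-hosted,
# Kubernetes-mode runner scale set.
jobs:
  build:
    runs-on: arc-runner-set          # hypothetical runner scale set label
    steps:
      # - uses: actions/checkout@v3  # worked previously
      - uses: actions/checkout@v4    # fails with the error reported in this issue
```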