K8s mode without PV #160
Conversation
Is there any work happening on this one? @DenisPalnitsky @nikola-jokic
@anlesk I tested it on my use case and it worked. Let me rebase this PR, give it another test, and mark it as ready for review if all is good.
Force-pushed 6426853 to 1a2ba2f.
Force-pushed 1a2ba2f to f6d616b.
I ran some tests and overall the approach works. However, when I run a fairly heavy workflow with 60 jobs and 6 containers per job, I get the error below in ~50% of cases.
The memory consumption is ~1 GB for both successful and failed runners, which is less than the pod limit. I could not figure out the root cause; I'm not a JS expert, so maybe I'm missing something obvious. I would appreciate a review and help from the community.
The goal of this PR is to remove Kubernetes mode's dependency on a Persistent Volume (PV).
The approach
Currently, a PV is used to share files between the Runner and Workflow pods. These files include the scripts that should be executed on the runners and required software such as Node.
My assumption is that this is one-way communication from Runner to Workflow: the Runner pod "sends" files to the Workflow pod, and the Workflow pod does not need to share any files back with the Runner.
Based on that assumption, we can copy the required files directly to the Workflow pod without provisioning shared storage, using the cpToPod function for direct file copying.
This will address the issue described here.
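To illustrate the copy mechanism (this is a sketch, not the PR's actual code): `kubectl cp`-style copies work by streaming a tar archive from one side to the other. In the real hook the receiving `tar xf -` would run inside the Workflow pod via the Kubernetes exec API; in this self-contained example both ends are local processes, and the helper name `copyViaTarStream` is illustrative.

```typescript
import { spawn } from 'child_process';
import { mkdtempSync, writeFileSync, readFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

// Hypothetical helper: pack srcDir into a tar stream and pipe it into an
// untar process. In the real hook the unpacking side would be `tar xf -`
// executed in the Workflow pod; here both ends are local so the mechanism
// can be demonstrated on its own.
function copyViaTarStream(srcDir: string, destDir: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const pack = spawn('tar', ['cf', '-', '-C', srcDir, '.']);
    const unpack = spawn('tar', ['xf', '-', '-C', destDir]);
    pack.stdout.pipe(unpack.stdin);
    pack.on('error', reject);
    unpack.on('error', reject);
    unpack.on('close', (code) =>
      code === 0 ? resolve() : reject(new Error(`untar exited with ${code}`))
    );
  });
}

// Example: copy a script the way the Runner would push job files.
async function demo(): Promise<void> {
  const src = mkdtempSync(join(tmpdir(), 'runner-'));
  const dest = mkdtempSync(join(tmpdir(), 'workflow-'));
  writeFileSync(join(src, 'run.sh'), 'echo hi');
  await copyViaTarStream(src, dest);
  console.log(readFileSync(join(dest, 'run.sh'), 'utf8'));
}
```

Because the copy is a one-shot stream, no storage has to outlive the transfer, which is what lets the shared PV be dropped.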
Feedback Request
This PR is a POC that was tested on workflows that run in a container, and it worked fine. I wonder if there are any flaws that would make this approach unviable.
Implementation Notes
I had to rework the `cpToPod` function to wait for the operation to finish.
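One way to sketch that "wait for completion" rework, assuming a callback-based exec like the one in @kubernetes/client-node where a status callback fires with `'Success'` or `'Failure'` when the remote command finishes (the helper name `execToPromise` and the trimmed status shape are illustrative):

```typescript
// Minimal shape of the status object the Kubernetes exec API reports
// when the remote command finishes (illustrative subset).
interface V1StatusLike {
  status?: string; // 'Success' or 'Failure'
  message?: string;
}

// Wrap a callback-style exec so callers can `await` completion of the
// copy instead of returning before the files have actually landed.
function execToPromise(
  runExec: (onStatus: (status: V1StatusLike) => void) => void
): Promise<void> {
  return new Promise((resolve, reject) => {
    runExec((status) => {
      if (status.status === 'Success') {
        resolve();
      } else {
        reject(new Error(status.message ?? 'exec failed'));
      }
    });
  });
}
```

With @kubernetes/client-node, `runExec` would invoke `Exec.exec(...)` with the status callback as its final argument while piping the tar stream into the pod's stdin, so the promise settles only once the pod-side untar has reported its result.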