K8s mode without PV #160

Open · wants to merge 10 commits into main
Conversation

DenisPalnitsky

The goal of this PR is to remove the dependency on a PV in Kubernetes mode.

The Approach

Currently, a PV is used to share files between the Runner and Workflow pods. Those files include the scripts that should be executed on the runner and required software such as Node.

My assumption is that the communication is one-way, from Runner to Workflow: the Runner pod "sends" files to the Workflow pod, and the Workflow pod does not need to share any files back with the Runner.

```mermaid
flowchart LR
    A[Runner Pod] -->|Sends files to| B[Workflow Pod]
    A -. Pull logs from .-> B
```

Based on that assumption, we can copy the required files directly to the Workflow pod without provisioning common storage, using the cpToPod function for direct file copying.
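
For illustration, here is a minimal sketch of the idea using the `Cp` helper from `@kubernetes/client-node`. The namespace, pod, container, and paths below are hypothetical, and the PR's actual cpToPod implementation may differ:

```typescript
import * as k8s from '@kubernetes/client-node'

async function copyWorkToWorkflowPod(): Promise<void> {
  const kc = new k8s.KubeConfig()
  kc.loadFromDefault()
  const cp = new k8s.Cp(kc)

  // Copy the runner's working directory (job scripts, Node externals, ...)
  // straight into the workflow pod instead of mounting a shared PV.
  await cp.cpToPod(
    'runners',            // hypothetical namespace
    'workflow-pod',       // hypothetical workflow pod name
    'job',                // hypothetical container name
    '/home/runner/_work', // source path on the runner
    '/__w'                // target path inside the workflow pod
  )
}
```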

This will address the issue described here.

Feedback Request

This PR is a POC that was tested on workflows that run in a container, and it worked fine. I wonder if there are any flaws that would make this approach unviable.

Implementation Notes

I had to rework the cpToPod function to wait for the copy operation to finish; a sketch of the pattern follows.
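
Conceptually, the rework amounts to wrapping the exec status callback in a Promise so the hook can await completion. A minimal sketch of that pattern, assuming a tar-over-exec copy (not the PR's exact code; names are illustrative):

```typescript
import * as k8s from '@kubernetes/client-node'
import * as stream from 'stream'
import * as tar from 'tar'

async function cpToPod(
  exec: k8s.Exec,
  namespace: string,
  podName: string,
  containerName: string,
  srcDir: string,
  targetDir: string
): Promise<void> {
  // Pack the source directory into a tar stream on the runner side ...
  const archive = tar.c({ cwd: srcDir }, ['.']) as unknown as stream.Readable

  // ... unpack it inside the workflow pod, and resolve only once the
  // remote command reports its final status.
  await new Promise<void>((resolve, reject) => {
    exec
      .exec(
        namespace,
        podName,
        containerName,
        ['tar', 'xf', '-', '-C', targetDir],
        process.stdout, // remote stdout
        process.stderr, // remote stderr
        archive,        // remote stdin: the tar archive
        false,          // no TTY
        status =>
          status.status === 'Success'
            ? resolve()
            : reject(new Error(status.message))
      )
      .catch(reject)
  })
}
```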

@anlesk

anlesk commented Aug 12, 2024

Is there any work happening on this one? @DenisPalnitsky @nikola-jokic
This got mentioned in a couple of issues as a potential way of resolving them, but it is still in draft, without any information about its future.

@DenisPalnitsky
Author

@anlesk I tested it on my use case and it worked. Let me rebase this PR, give it another test, and mark it as ready for review if all is good.

@DenisPalnitsky
Author

DenisPalnitsky commented Aug 18, 2024

I ran some tests and overall the approach works. However, when I try to run a fairly heavy workflow with 60 jobs and 6 containers per job, I get the error below in ~50% of cases.

```
##[debug]Archiving _work to /tmp/3b04107c-1297-42dd-9b15-e50a7db6af96
##[debug]Transferring: 486,784,512 Bytes
##[debug]Exec cpToPod
(node:72) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
Error: Process completed with exit code 137.
Error: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug]System.Exception: Executing the custom container implementation failed. Please contact your self hosted runner administrator.
##[debug] ---> System.Exception: The hook script at '/home/runner/k8s/index.js' running command 'PrepareJob' did not execute successfully
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   --- End of inner exception stack trace ---
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.ExecuteHookScript[T](IExecutionContext context, HookInput input, ActionRunStage stage, String prependPath)
##[debug]   at GitHub.Runner.Worker.Container.ContainerHooks.ContainerHookManager.PrepareJobAsync(IExecutionContext context, List`1 containers)
##[debug]   at GitHub.Runner.Worker.ContainerOperationProvider.StartContainersAsync(IExecutionContext executionContext, Object data)
##[debug]   at GitHub.Runner.Worker.JobExtensionRunner.RunAsync()
##[debug]   at GitHub.Runner.Worker.StepsRunner.RunStepAsync(IStep step, CancellationToken jobCancellationToken)
##[debug]Finishing: Initialize containers
```

Memory consumption is ~1 GB for both successful and failed runners, which is less than the pod limit.

I could not figure out the root cause. I'm not a JS expert, so maybe I'm missing something obvious; I would appreciate a review and help from the community.
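
One possibly relevant detail: exit code 137 is SIGKILL (128 + 9), which on Kubernetes is most often the OOM killer, and the `Buffer()` deprecation warning in the log suggests the ~486 MB archive may be held in memory during transfer. A hypothetical illustration of the difference between a buffered and a streamed transfer (neither snippet is the PR's actual code; the path is illustrative):

```typescript
import * as fs from 'fs'

// Buffer-based transfer: the entire archive is resident in memory at once,
// so a ~486 MB archive costs ~486 MB of heap per concurrent copy.
const whole = fs.readFileSync('/tmp/archive.tar')
process.stdout.write(whole)

// Stream-based transfer: data flows through in small chunks with
// backpressure, so memory use stays flat regardless of archive size.
fs.createReadStream('/tmp/archive.tar').pipe(process.stdout)
```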

@DenisPalnitsky DenisPalnitsky changed the title K8s mode without PV [DRAFT] K8s mode without PV Aug 18, 2024
@DenisPalnitsky DenisPalnitsky marked this pull request as ready for review August 18, 2024 16:59
@DenisPalnitsky DenisPalnitsky requested review from a team as code owners August 18, 2024 16:59