
Never terminating workflow step #179

Open
luka5 opened this issue Sep 25, 2024 · 1 comment
luka5 commented Sep 25, 2024

Today we debugged a never-terminating workflow step and ended up here. It turns out that if you configure a container workflow, a single container step will hang forever if it does not print anything for a while. We were able to reproduce this easily with the following workflow:

name: Debug issue
on: [workflow_dispatch]
jobs:
  debug:
    container:
      image: **some-image-path**
    runs-on: [**some-runner**]
    steps:
      - name: step1
        shell: bash
        run: |
          echo "Hello"

      - name: Wait for 5m
        shell: bash
        run: |
          sleep 5m

      - name: step3
        shell: bash
        run: |
          echo "Ciao"

If we instead change the second step to print continuously, the step completes normally:

      - name: Wait for 5m
        shell: bash
        run: |
          for ((i=0; i<300; i++)); do
            echo "."
            sleep 1
          done
          echo "done"
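As a more general workaround along the same lines, the periodic printing can be factored into a small wrapper that emits a heartbeat in the background while a quiet command runs. This is only a sketch of the idea above, not something from the runner's documentation; the function name and intervals are our own choices, and `sleep 3` stands in for the real silent step:

```shell
#!/bin/sh
# Sketch: keep stdout active while a long-running, silent command executes.
run_with_heartbeat() {
  # Background loop prints a heartbeat line, first immediately, then every 10s.
  ( while true; do echo "still running..."; sleep 10; done ) &
  hb_pid=$!
  "$@"                       # run the actual (quiet) command
  status=$?
  kill "$hb_pid" 2>/dev/null # stop the heartbeat once the command finishes
  return "$status"
}

run_with_heartbeat sleep 3   # stand-in for the silent 5-minute step
echo "done"
```

In a workflow this could be dropped into a `run:` block, wrapping whatever command would otherwise stay silent for minutes.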

We do not get any errors or other useful information when re-running in debug mode. We suspect the issue is in execPodStep. The version of @kubernetes/client-node is quite old (0.18.1 vs. 0.22.0), but to be fair, nothing in the release notes or the changes suggests a related bug was fixed.

Have you seen similar behavior before? Do you have a recommendation other than simply being more verbose and writing progress to stdout?

Thanks!

oradwell (Contributor) commented Oct 9, 2024

I believe I'm having a similar issue, but ours occurs only with Docker actions. No matter the result of the Docker action step, the step gets stuck forever. It doesn't happen with composite or Node.js actions. I assume you're using the Kubernetes mode of actions-runner-controller. Is that right?
