My issue is the same as the one reported in issue 2624, which was closed without a solution.
We use AWS CodeBuild as the self-hosted platform.
This issue happens frequently on my repository too. I don't think the EC2 instance is being starved of resources and dying, because the failure occurs in different steps. We use an EC2 large instance (8 vCPUs, 15 GB of memory) to run the workflow.
Sometimes a step runs to completion; sometimes it is aborted partway through.
The workflow is below:
name: Validate Code
on:
  push:
    branches:
      - feature/*
      - bugfix/*
      - docs/*
      - dependabot/**
  merge_group:
    branches:
      - main
  workflow_call:
    inputs:
      since:
        type: string
concurrency:
  group: "validate-code-${{ github.ref }}"
  cancel-in-progress: ${{ inputs.since == '' }}
jobs:
  unit-tests:
    name: Unit Tests (Backend)
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Run unit tests
        run: yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} --exclude "{agent,appview,authn,canvas,imagine,usage,voicegen,sidekick,taskbuilder}-service" --exclude "{imagine-base,imagine-api,auth-fe,parent-link-fe,resource-fe,ai-hub-fe,artifact-cards,feedback-fe,taskconfig-fe}" run test
  unit-tests-fe:
    name: Unit Tests (Frontend)
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Run unit tests
        run: yarn workspaces foreach -vvR --since=${{ inputs.since || 'main' }} --include "{agent,appview,authn,canvas,imagine,usage,voicegen,sidekick,taskbuilder}-service" --include "{imagine-base,imagine-api,auth-fe,parent-link-fe,resource-fe,ai-hub-fe,artifact-cards}" run test
  linting:
    name: Linting
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Lint changed workspaces
        run: yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} run lint
        env:
          # increased memory because we were getting an out of memory error when running lint
          NODE_OPTIONS: "--max-old-space-size=8192"
      - name: Find unused files, dependencies and exports
        run: yarn knip
  type-checking:
    name: Type checking
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Cache TypeScript build info
        uses: actions/cache@v4
        with:
          path: |
            **/tsconfig.tsbuildinfo
          key: ${{ runner.os }}-tsbuildinfo-${{ hashFiles('tsconfig.base.json', '**/tsconfig.json') }}
          restore-keys: |
            ${{ runner.os }}-tsbuildinfo-
      - name: Type check changed workspaces
        run: yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} run type-check --incremental
        env:
          # increased memory because we were getting an out of memory error when running lint
          NODE_OPTIONS: "--max-old-space-size=8192"
It may be just a coincidence, but each time I saw this issue happen, one of the jobs had finished with an error (it ran to completion, but ended with an error because, say, a unit test failed). Then the other job, which was running on a different runner, stopped executing partway through. The "Annotations" section then shows this message: "The self-hosted runner: b4ac7d30-8387-4499-a899-f75d06e2941f lost communication with the server."
When I check the logs of that runner on AWS, there is no error message. The build just stops partway through.
The error is not tied to a single job. It can happen in any of the jobs in that workflow. Some jobs complete successfully, one completes with an error (such as a unit test, type checking, or linting error), and another job is apparently aborted partway through.
This was the log on the AWS runner (for one of the occurrences of this issue, this time in the linting job):
We have also been getting the same error for the past couple of days: "The self-hosted runner: lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."
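Since the error message names CPU or memory starvation as one possible cause, a diagnostic step might help confirm or rule that out. The following is only a minimal sketch and is not part of the workflow posted above: the step name is hypothetical, and it assumes a Linux runner image where `nproc`, `free`, `uptime`, and `df` are available. It could be placed right before a heavy step such as linting or type checking.

```yaml
# Hypothetical diagnostic step (not in the original workflow); assumes a Linux runner.
- name: Snapshot host resources
  run: |
    nproc      # number of CPUs visible to the job
    free -m    # memory and swap usage in MiB
    uptime     # load averages over the last 1, 5 and 15 minutes
    df -h .    # free disk space for the workspace
```

This only captures the state at the moment the step runs, so it would not show a spike that occurs in the middle of a later step.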