The self-hosted runner: xxx lost communication with the server #3539

Open
bruno-zica opened this issue Nov 5, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@bruno-zica

bruno-zica commented Nov 5, 2024

My issue is the same as the one reported in issue 2624. That issue was closed without a solution.

We use AWS CodeBuild as the self-hosted runner platform.

This issue happens frequently on my repository too. I don't think the EC2 instance is being starved and dying, because the issue occurs in different steps. We use an EC2 large instance (8 vCPUs, 15 GB of memory) to run the workflow.

Sometimes a step runs to completion; sometimes it is aborted partway through.

The workflow is below:

name: Validate Code

on:
  push:
    branches:
      - feature/*
      - bugfix/*
      - docs/*
      - dependabot/**
  merge_group:
    branches:
      - main
  workflow_call:
    inputs:
      since:
        type: string

concurrency:
  group: "validate-code-${{ github.ref }}"
  cancel-in-progress: ${{ inputs.since == '' }}

jobs:
  unit-tests:
    name: Unit Tests (Backend)
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Run unit tests
        run: yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} --exclude "{agent,appview,authn,canvas,imagine,usage,voicegen,sidekick,taskbuilder}-service" --exclude "{imagine-base,imagine-api,auth-fe,parent-link-fe,resource-fe,ai-hub-fe,artifact-cards,feedback-fe,taskconfig-fe}" run test

  unit-tests-fe:
    name: Unit Tests (Frontend)
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Run unit tests
        run: yarn workspaces foreach -vvR --since=${{ inputs.since || 'main' }} --include "{agent,appview,authn,canvas,imagine,usage,voicegen,sidekick,taskbuilder}-service" --include "{imagine-base,imagine-api,auth-fe,parent-link-fe,resource-fe,ai-hub-fe,artifact-cards}" run test

  linting:
    name: Linting
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Lint changed workspaces
        run: yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} run lint
        env:
          # increased memory because we were getting an out of memory error when running lint
          NODE_OPTIONS: "--max-old-space-size=8192"
      - name: Find unused files, dependencies and exports
        run: yarn knip

  type-checking:
    name: Type checking
    runs-on:
      - codebuild-UniteGithubRunner-${{ github.run_id }}-${{ github.run_attempt }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 100
      - name: Fetch main to compare
        if: github.ref != 'refs/heads/main'
        run: git fetch origin main:main --depth=50
      - name: Create swap space
        uses: ./.github/actions/create-swap-space
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Monorepo install
        uses: ./.github/actions/yarn-nm-install
      - name: Cache TypeScript build info
        uses: actions/cache@v4
        with:
          path: |
            **/tsconfig.tsbuildinfo
          key: ${{ runner.os }}-tsbuildinfo-${{ hashFiles('tsconfig.base.json', '**/tsconfig.json') }}
          restore-keys: |
            ${{ runner.os }}-tsbuildinfo-
      - name: Type check changed workspaces
        run: yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} run type-check --incremental
        env:
          # increased memory because we were getting an out of memory error when running type check
          NODE_OPTIONS: "--max-old-space-size=8192"

It may be just a coincidence, but when I saw this issue happen, one of the jobs finished with an error (it ran to completion, but finished with an error because, say, some unit test failed). Then another job that was running on a different runner stopped executing in the middle, and the "Annotations" section showed this message: "The self-hosted runner: b4ac7d30-8387-4499-a899-f75d06e2941f lost communication with the server."

When I go check the logs of that runner on AWS, there is no error message. The build just stops running in the middle.
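
Since the AWS log shows nothing, one way to test the starvation hypothesis might be to sample memory and load from inside the heavy step itself, so the last readings before an abort appear in the step log streamed to GitHub. This is only a sketch based on the lint step above: the step name, the `[monitor]` prefix and the 15-second interval are made up for illustration, and it assumes a Linux image where `free` and `/proc/loadavg` are available. Whether the final lines reach GitHub before the runner disappears is not guaranteed.

```yaml
      - name: Lint changed workspaces (with resource sampling)
        run: |
          # Hypothetical sampler: print load average and memory every 15 seconds
          # in the background while the real command runs in the foreground.
          ( while true; do
              echo "[monitor] $(date -u +%H:%M:%S) load: $(cut -d' ' -f1-3 /proc/loadavg)"
              free -m | sed 's/^/[monitor] /'
              sleep 15
            done ) &
          MONITOR_PID=$!
          # Stop the sampler when the step exits, whatever the outcome.
          trap 'kill "$MONITOR_PID" 2>/dev/null || true' EXIT
          yarn workspaces foreach -vvRp --since=${{ inputs.since || 'main' }} run lint
        env:
          NODE_OPTIONS: "--max-old-space-size=8192"
```

If available memory in the `[monitor]` lines falls toward zero just before the log stops, that would point to memory pressure; if the readings stay healthy, the cause is more likely elsewhere.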

The error is not tied to a single job; it can happen in any of the jobs in that workflow. Some jobs complete successfully, one completes with an error (such as a unit test, type-checking, or linting error), and another is apparently aborted in the middle.

This was the log on the AWS runner for one of the cases where this issue happened (this time, on the linting job):

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   timestamp   |                                                                                                                                                            message                                                                                                                                                            |
|---------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1730730597587 | [Container] 2024/11/04 14:29:52.994950 Running on CodeBuild On-demand                                                                                                                                                                                                                                                         |
| 1730730597587 | [Container] 2024/11/04 14:29:52.994965 Waiting for agent ping                                                                                                                                                                                                                                                                 |
| 1730730597587 | [Container] 2024/11/04 14:29:53.095829 Waiting for DOWNLOAD_SOURCE                                                                                                                                                                                                                                                            |
| 1730730597587 | [Container] 2024/11/04 14:29:53.542561 Phase is DOWNLOAD_SOURCE                                                                                                                                                                                                                                                               |
| 1730730597587 | [Container] 2024/11/04 14:29:53.579585 CODEBUILD_SRC_DIR=/codebuild/output/src838862602/src                                                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:53.579709 YAML location is /codebuild/readonly/buildspec.yml                                                                                                                                                                                                                                     |
| 1730730597587 | [Container] 2024/11/04 14:29:53.581659 Processing environment variables                                                                                                                                                                                                                                                       |
| 1730730597587 | [Container] 2024/11/04 14:29:53.707724 No runtime version selected in buildspec.                                                                                                                                                                                                                                              |
| 1730730597587 | [Container] 2024/11/04 14:29:53.886788 Moving to directory /codebuild/output/src838862602/src                                                                                                                                                                                                                                 |
| 1730730597587 | [Container] 2024/11/04 14:29:53.889496 Unable to initialize cache download: no paths specified to be cached                                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:54.128661 Configuring ssm agent with target id: codebuild:177c3ec4-a435-4fd8-966c-1d337021976b                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:54.164327 Successfully updated ssm agent configuration                                                                                                                                                                                                                                           |
| 1730730597587 | [Container] 2024/11/04 14:29:54.164650 Registering with agent                                                                                                                                                                                                                                                                 |
| 1730730597587 | [Container] 2024/11/04 14:29:54.198405 Phases found in YAML: 1                                                                                                                                                                                                                                                                |
| 1730730597587 | [Container] 2024/11/04 14:29:54.198427  BUILD: 1 commands                                                                                                                                                                                                                                                                     |
| 1730730597587 | [Container] 2024/11/04 14:29:54.198642 Phase complete: DOWNLOAD_SOURCE State: SUCCEEDED                                                                                                                                                                                                                                       |
| 1730730597587 | [Container] 2024/11/04 14:29:54.198655 Phase context status code:  Message:                                                                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:54.265345 Entering phase INSTALL                                                                                                                                                                                                                                                                 |
| 1730730597587 | [Container] 2024/11/04 14:29:54.266546 Phase complete: INSTALL State: SUCCEEDED                                                                                                                                                                                                                                               |
| 1730730597587 | [Container] 2024/11/04 14:29:54.266561 Phase context status code:  Message:                                                                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:54.307895 Entering phase PRE_BUILD                                                                                                                                                                                                                                                               |
| 1730730597587 | [Container] 2024/11/04 14:29:54.309323 Phase complete: PRE_BUILD State: SUCCEEDED                                                                                                                                                                                                                                             |
| 1730730597587 | [Container] 2024/11/04 14:29:54.309336 Phase context status code:  Message:                                                                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:54.342911 Entering phase BUILD                                                                                                                                                                                                                                                                   |
| 1730730597587 | [Container] 2024/11/04 14:29:54.342930 Ignoring BUILD phase commands for self-hosted runner build.                                                                                                                                                                                                                            |
| 1730730597587 | [Container] 2024/11/04 14:29:54.378406 Checking if docker is running. Running command: docker version                                                                                                                                                                                                                         |
| 1730730597587 | GHA self-hosted runner build triggered by /actions/runs/11666276634/job/32480845110                                                                                                                                                                                                                                           |
| 1730730597587 | Creating GHA self-hosted runner workspace folder: actions-runner                                                                                                                                                                                                                                                              |
| 1730730597587 | Downloading GHA self-hosted runner binary                                                                                                                                                                                                                                                                                     |
| 1730730597587 |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current                                                                                                                                                                                                                                               |
| 1730730597587 |                                  Dload  Upload   Total   Spent    Left  Speed                                                                                                                                                                                                                                                 |
| 1730730599632 |    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  18  136M   18 24.9M    0     0  45.9M      0  0:00:02 --:--:--  0:00:02 45.9M  62  136M   62 84.9M    0     0  57.2M      0  0:00:02  0:00:01  0:00:01 57.2M 100  136M  100  136M    0     0  57.0M      0  0:00:02  0:00:02 --:--:-- 57.0M  |
| 1730730601648 | Configuring GHA self-hosted runner                                                                                                                                                                                                                                                                                            |
| 1730730615670 | --------------------------------------------------------------------------------                                                                                                                                                                                                                                              |
| 1730730615670 | |        ____ _ _   _   _       _          _        _   _                      |                                                                                                                                                                                                                                              |
| 1730730615670 | |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |                                                                                                                                                                                                                                              |
| 1730730615670 | |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |                                                                                                                                                                                                                                              |
| 1730730615670 | |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |                                                                                                                                                                                                                                              |
| 1730730615670 | |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |                                                                                                                                                                                                                                              |
| 1730730615670 | |                       Self-hosted runner registration                        |                                                                                                                                                                                                                                              |
| 1730730615670 | # Authentication                                                                                                                                                                                                                                                                                                              |
| 1730730615670 | √ Connected to GitHub                                                                                                                                                                                                                                                                                                         |
| 1730730617717 | # Runner Registration                                                                                                                                                                                                                                                                                                         |
| 1730730617717 | √ Runner successfully added                                                                                                                                                                                                                                                                                                   |
| 1730730617717 | √ Runner connection is good                                                                                                                                                                                                                                                                                                   |
| 1730730617717 | # Runner settings                                                                                                                                                                                                                                                                                                             |
| 1730730617717 | √ Settings Saved.                                                                                                                                                                                                                                                                                                             |
| 1730730617717 | Running GHA self-hosted runner binary                                                                                                                                                                                                                                                                                         |
| 1730730619730 | √ Connected to GitHub                                                                                                                                                                                                                                                                                                         |
| 1730730619730 | Current runner version: '2.320.0'                                                                                                                                                                                                                                                                                             |
| 1730730619730 | 2024-11-04 14:30:18Z: Listening for Jobs                                                                                                                                                                                                                                                                                      |
| 1730730621746 | 2024-11-04 14:30:19Z: Running job: Linting                                                                                                                                                                                                                                                                                    |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@nagarjunareddysomu

We have also been getting the same error for the past couple of days:
The self-hosted runner: lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
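
Of the causes the message lists, blocked network access can be checked with a small step placed before the long-running ones. The following is a rough sketch, not a confirmed fix: the step name and the 10-second limit are assumptions, and it only shows whether the runner can reach the GitHub API at that moment.

```yaml
      - name: Check connectivity to GitHub
        run: |
          # Print the HTTP status and total time for one request to the GitHub API.
          # --max-time bounds the check so a blocked network fails fast.
          curl -sS --max-time 10 -o /dev/null \
            -w 'api.github.com -> HTTP %{http_code} in %{time_total}s\n' \
            https://api.github.com
```

A timeout or connection error here would suggest a network problem on the runner side, while a quick response leaves process termination or CPU/memory starvation as the more likely candidates.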
