Multiple issues with Service Fabric Runtime #10920

Open
2 of 15 tasks
scale-tone opened this issue Nov 7, 2024 · 1 comment
scale-tone commented Nov 7, 2024

Description

We're running integration tests on a Service Fabric dev cluster provisioned on an Azure DevOps build pipeline.
We're using an internal Windows Server 2022-based agent pool.
Everything worked until this Saturday, 02.11.2024 (2 November 2024).

Before that we were getting this image: 20240922
Since Saturday we started getting this image: 20241021

Since Saturday the dev cluster has failed to reach a healthy state because FaultAnalysisService (a non-configurable part of the Service Fabric runtime) keeps failing. We have no visibility into why exactly it is failing.

According to this repo, the image should contain this (two-year-old) version of the Service Fabric runtime: 9.1.1436.9590.

That is not the case: the Service Fabric runtime that now appears on our agents is this (two-month-old) one: 9.1.2718.9590. We established that by dumping FabricHost.exe from an agent.
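
For anyone who wants to repeat the comparison, here is a minimal sketch that compares the two four-part version strings quoted above. The helper names (`parse_version`, `describe_mismatch`) are made up for illustration, and the sketch assumes the versions compare field by field as integers. How the observed string is obtained on an agent is outside the sketch.

```python
# Illustrative only: helper names are hypothetical, not part of any SF tooling.

DOCUMENTED = "9.1.1436.9590"  # version listed in this repo
OBSERVED = "9.1.2718.9590"    # version found on the agent


def parse_version(text):
    """Turn a dotted version such as '9.1.2718.9590' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def describe_mismatch(documented, observed):
    """Compare two dotted versions field by field (assumed ordering)."""
    d, o = parse_version(documented), parse_version(observed)
    if d == o:
        return "match"
    return "agent is newer" if o > d else "agent is older"


print(describe_mismatch(DOCUMENTED, OBSERVED))  # prints "agent is newer"
```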

We're not able to prove or disprove that the SF runtime version is the actual culprit (we cannot go back and try the previous one, because we always get the latest agent image and cannot control its version), but it looks highly likely.

Question 1: is there any workaround for our failing SF cluster? E.g. is there a way to override which SF runtime version is used? (Keep in mind that the SF runtime installer requires elevated privileges, so simply running it as part of the pipeline does not work.)

Question 2: why does this repo's change history not reflect the actual picture, and can this be fixed?

Question 3: is there a chance of getting the SF runtime updated on the agent image? I cannot say which exact version it should be updated to (since we have no way to try them out), but could it at least be reverted to the previous, stable one?

Platforms affected

  • [x] Azure DevOps
  • [ ] GitHub Actions - Standard Runners
  • [ ] GitHub Actions - Larger Runners

Runner images affected

  • [ ] Ubuntu 20.04
  • [ ] Ubuntu 22.04
  • [ ] Ubuntu 24.04
  • [ ] macOS 12
  • [ ] macOS 13
  • [ ] macOS 13 Arm64
  • [ ] macOS 14
  • [ ] macOS 14 Arm64
  • [ ] macOS 15
  • [ ] macOS 15 Arm64
  • [ ] Windows Server 2019
  • [x] Windows Server 2022

Image version and build link

20241021

Is it regression?

yes

Expected behavior

The SF dev cluster starts and reaches a healthy state on a build agent.

Actual behavior

The SF dev cluster never reaches a healthy state (we waited for up to 1 hour).

Repro steps

  1. Run a build agent from this image: 20241021.
  2. Set up a Service Fabric dev cluster on it.
  3. Observe that it never reaches a healthy state (by periodically querying it with PowerShell).
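
The periodic check in the last step can be sketched as a generic poll-with-timeout loop. This is only an illustration, not the script we actually run: `wait_until_healthy` and the `is_healthy` callback are hypothetical names, and the 30-second interval is an assumed value. The 1-hour default mirrors the wait mentioned above. In practice `is_healthy` would wrap whatever query is used against the cluster, for example a PowerShell health query.

```python
import time


def wait_until_healthy(is_healthy, timeout_s=3600.0, interval_s=30.0):
    """Poll is_healthy() until it returns True or timeout_s elapses.

    is_healthy is a caller-supplied (hypothetical) callable that returns
    True once the cluster reports a healthy state.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if is_healthy():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Demo with a stub check that never succeeds and a very short timeout.
print(wait_until_healthy(lambda: False, timeout_s=0.2, interval_s=0.05))  # prints False
```

A monotonic clock is used so that wall-clock adjustments on the agent do not shorten or extend the wait.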

vidyasagarnimmagaddi (Contributor) commented

Hi @scale-tone, we're looking into this issue and will post an update ASAP. Thank you!
