Multiple issues with Service Fabric Runtime #10920

scale-tone · 2024-11-07T12:03:11Z

Description

We're running integration tests on a Service Fabric dev cluster provisioned on an Azure DevOps build pipeline.
We're using an internal Windows Server 2022-based agent pool.
Everything worked until this Saturday, 2 November 2024.

Before that we were getting this image: 20240922
Since Saturday we have been getting this image: 20241021

Since Saturday the dev cluster has failed to reach a healthy state because FaultAnalysisService (a non-configurable part of the Service Fabric runtime) is failing. We have no visibility into why exactly it fails.

According to this repo, the image should contain this (two-year-old) version of the Service Fabric runtime: 9.1.1436.9590.

That is not the case: the Service Fabric runtime that now actually appears on our agents is this (two-month-old) one: 9.1.2718.9590. We established that by dumping FabricHost.exe from an agent.
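To make the mismatch explicit, here is a minimal Python sketch that compares the two version strings quoted above. The helper name `parse_version` is made up for illustration, and the sketch assumes the four dot-separated parts compare numerically from left to right; it says nothing about how the versions were read from the agent.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Version listed in the repo vs. version found on the agent (both from this report).
documented = parse_version("9.1.1436.9590")
observed = parse_version("9.1.2718.9590")

# Tuples compare element by element, so this is True when the agent runs a newer build.
agent_is_newer = observed > documented
```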

We can neither prove nor disprove that the SF runtime version is the actual culprit (we cannot go back and try the previous one, since we always get the latest agent image and cannot control its version), but it looks highly likely.

Question 1: Is there any workaround for our failing SF cluster? E.g. is there a way to override which SF runtime version is used? (Note that the SF runtime installer requires root privileges, so just running it as part of the pipeline does not work.)

Question 2: Why does this repo's change history not reflect the actual picture, and can this be fixed?

Question 3: Is there a chance to have the SF runtime updated on the agent image? I cannot say which exact version it should be (since we have no way to try them out), but maybe it could just be reverted to the previous, stable one?

Platforms affected

Azure DevOps
GitHub Actions - Standard Runners
GitHub Actions - Larger Runners

Runner images affected

Image version and build link

20241021

Is it regression?

yes

Expected behavior

The SF dev cluster starts and reaches a healthy state on a build agent.

Actual behavior

The SF dev cluster never reaches a healthy state (we waited for up to 1 hour).

Repro steps

Run a build agent from this image: 20241021
Set up a Service Fabric dev cluster on it.
Observe that it never reaches a healthy state (by periodically querying it with PowerShell).
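The "periodically querying" step above is essentially a poll-with-timeout loop. Below is a rough Python sketch of that pattern, not the actual pipeline script. The `is_healthy` callable is a hypothetical stand-in for whatever query is run against the cluster (the real PowerShell query is not shown in this report), and the 30-second interval is an assumption; only the 1-hour limit comes from the description above.

```python
import time
from typing import Callable


def wait_until_healthy(
    is_healthy: Callable[[], bool],   # hypothetical health probe, supplied by the caller
    timeout_s: float = 3600.0,        # 1 hour, as in the report
    interval_s: float = 30.0,         # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_healthy() until it returns True or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_healthy():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `clock` and `sleep` as parameters keeps the loop testable without actually waiting; in a pipeline the defaults would be used and a `False` result would fail the step.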

vidyasagarnimmagaddi · 2024-11-07T13:32:22Z

Hi @scale-tone, we're looking into this issue and will update you ASAP. Thank you!

scale-tone added bug report needs triage labels Nov 7, 2024

vidyasagarnimmagaddi assigned subir0071 Nov 7, 2024

vidyasagarnimmagaddi added OS: Windows and removed needs triage labels Nov 7, 2024
