DowngradeUpgrade failpoint timeout #19306

Open · 3 tasks

siyuanfoundation opened this issue Jan 29, 2025 · 2 comments
Comments

@siyuanfoundation (Contributor)

Which Github Action / Prow Jobs are flaking?

https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-etcd-robustness-main-amd64/1884310668266442752

Which tests are flaking?

DowngradeUpgrade

Github Action / Prow Job link

No response

Reason for failure (if possible)

The DowngradeUpgrade failpoint stops and restarts etcd servers up to 6 times, and a new server can take a long time to join the cluster. Since failpoints are expected to finish within 60s, the test is likely to fail from time to time.

We should

  • make Failpoint respect context,
  • reduce the time DowngradeUpgrade takes, or
  • make it possible to increase the timeout for specific failpoints

Anything else we need to know?

No response

@siyuanfoundation (Contributor, Author)

/cc @henrybear327 @serathius

@gangli113 (Contributor)

/assign

2 participants