Add retries and confirmations to ensure CNCF runners and machines are removed. #58

gyohuangxin · 2022-07-15T02:17:54Z

Description

Some CNCF runners are not removed after tests finish, and their number gradually increases over time.
We can delete them manually, but it's better to make sure they are properly removed.

The same thing happens with Equinix server deletion:

Expected Behavior

We should add retries and confirmations to ensure CNCF runners and machines are removed.

Screenshots/Logs

Environment:

Meshery Version:
Kubernetes Version:
Host OS:
Browser:

stale · 2022-09-09T01:54:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2022-09-21T00:40:14Z

This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue.

stale · 2022-11-12T14:45:41Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2022-11-22T21:38:40Z

This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue.

stale · 2023-01-07T12:39:15Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

leecalcote · 2023-01-08T18:41:58Z

Uh-oh. We do need to complete this item.

vielmetti · 2023-02-28T20:44:36Z

It's possible to create machines on Equinix Metal in such a way that there's a termination time associated with them. See the "termination_time" field at

https://deploy.equinix.com/developers/api/metal/#tag/Devices/operation/createDevice

in the Equinix Metal API reference.

(That's not a substitute for cleanup, but it could backstop any other efforts if there's a bug somewhere else).
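As a rough sketch of that backstop, the snippet below builds a create-device request body that carries a `termination_time`. Only the `termination_time` field comes from the comment above; the helper names, the four-hour window, the example plan/metro/OS values, and the `$METAL_AUTH_TOKEN` and `$PROJECT_ID` variables are assumptions for illustration, and `date -d` assumes GNU date. Check the field names against the API reference before relying on it.

```shell
# Hypothetical helper: print a UTC ISO 8601 timestamp N hours from now.
# Assumes GNU date (the -d flag is not available in BSD date).
termination_time() {
  date -u -d "+$1 hours" +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical helper: build a JSON body for the create-device request.
# Arguments: hostname, plan, metro, operating system, lifetime in hours.
device_payload() {
  printf '{"hostname":"%s","plan":"%s","metro":"%s","operating_system":"%s","termination_time":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$(termination_time "$5")"
}

# Example call (commented out so the snippet has no side effects).
# The plan, metro, and OS values are placeholders, not the project's settings.
# curl --silent --show-error --fail -X POST \
#   -H "X-Auth-Token: $METAL_AUTH_TOKEN" \
#   -H "Content-Type: application/json" \
#   -d "$(device_payload cil-runner-1 c3.small.x86 da ubuntu_22_04 4)" \
#   "https://api.equinix.com/metal/v1/projects/$PROJECT_ID/devices"
```

With a lifetime slightly longer than the longest expected test run, a machine that escapes the normal cleanup would still be reclaimed on its own.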

vielmetti · 2023-03-01T15:24:54Z

There was a short-lived API outage yesterday, described at

https://status.equinixmetal.com/incidents/h30n2jlr5d3p

which may have impacted manual deletion of these systems. Please retry if you were affected by this. As of this writing, there are 48 systems deployed.

gyohuangxin · 2023-03-01T16:46:00Z

@vielmetti I'm still having trouble accessing the management UI:

vielmetti · 2023-03-01T17:05:49Z

@gyohuangxin can you open up a ticket with our support team? I'll share your UI issue with the team, but it may be something specific to your account.

vielmetti · 2023-03-07T12:09:05Z

@gyohuangxin Can you please task someone else on the project to assist you with cleaning up the idle and stranded resources while we sort out your access problems?

vielmetti · 2023-03-22T15:37:50Z

The code that notices that a deprovision failed is here

https://github.com/layer5io/meshery-smp-action/blob/862c5283953f1b5a3a607c9e1f00461f98a4b4d5/.github/workflows/scripts/stop-cil-runner.sh#L19

It logs an error:

echo "ERROR: Failed to remove CNCF CIL machine: $hostname, device id: $device_id."

and then exits without retrying. If anything fails for a temporary reason, the machines will live on indefinitely until someone gives them manual attention.

Where does this error log go? If it's published somewhere we could look for patterns.
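One possible shape for the missing retry and confirmation is sketched below. It is not the code in `stop-cil-runner.sh`: the `retry`, `delete_device`, and `device_gone` functions, the attempt count, the delays, and the `$METAL_AUTH_TOKEN` variable are all assumptions, and the endpoint paths should be checked against the Equinix Metal API reference. Treating a 404 on the follow-up GET as confirmation is likewise an assumption, since a device may remain visible for a while as it is deprovisioned.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay after each failed attempt.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "ERROR: '$*' failed after $attempt attempts." >&2
      return 1
    fi
    echo "WARN: attempt $attempt of $max_attempts failed; retrying in ${delay}s." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Hypothetical: ask the API to delete one device by id.
delete_device() {
  curl --silent --show-error --fail -X DELETE \
    -H "X-Auth-Token: $METAL_AUTH_TOKEN" \
    "https://api.equinix.com/metal/v1/devices/$1"
}

# Hypothetical confirmation: succeed only once the device lookup returns 404.
device_gone() {
  local status
  status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
    -H "X-Auth-Token: $METAL_AUTH_TOKEN" \
    "https://api.equinix.com/metal/v1/devices/$1")
  [ "$status" = "404" ]
}

# Example use in place of the single attempt (commented out: needs real credentials).
# retry 5 10 delete_device "$device_id" && retry 5 10 device_gone "$device_id" ||
#   echo "ERROR: Failed to remove CNCF CIL machine: $hostname, device id: $device_id."
```

Backing off between attempts gives a short-lived API outage time to clear, and the separate confirmation step means the job only reports success once the machine is no longer listed.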

leecalcote · 2023-03-22T15:40:23Z

@Revolyssup, will you please add this to tomorrow’s CI meeting? @edwvilla’s help here is much appreciated. Let’s ensure that we have a quick review and resolution. // @gyohuangxin

leecalcote · 2023-03-23T02:50:31Z

All existing servers were manually deprovisioned today. A fresh batch of newly provisioned servers is now running from the workflow schedule. Let's see if those servers are automatically deprovisioned on completion of their task.

leecalcote · 2023-03-23T03:05:01Z

Yes, it seems that the test servers are successfully deprovisioned at end of test. 👍

gyohuangxin added the kind/bug Something isn't working label Jul 15, 2022

stale bot added the issue/stale Issue has not had any activity for an extended period of time label Sep 9, 2022

stale bot closed this as completed Sep 21, 2022

gyohuangxin reopened this Sep 21, 2022

stale bot removed the issue/stale Issue has not had any activity for an extended period of time label Sep 21, 2022

stale bot added the issue/stale Issue has not had any activity for an extended period of time label Nov 12, 2022

stale bot closed this as completed Nov 22, 2022

leecalcote reopened this Nov 22, 2022

stale bot removed the issue/stale Issue has not had any activity for an extended period of time label Nov 22, 2022

stale bot added the issue/stale Issue has not had any activity for an extended period of time label Jan 7, 2023

stale bot removed the issue/stale Issue has not had any activity for an extended period of time label Jan 8, 2023

leecalcote added issue/willfix This issue will be worked on area/performance Performance management labels Feb 11, 2023

leecalcote added the priority/high High priority issue label Feb 28, 2023
