Add retries and confirmations to ensure CNCF runners and machines are removed. #58
Comments
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue. |
Uh-oh. We do need to complete this item. |
It's possible to create machines on Equinix Metal in such a way that there's a termination time associated with them. See the "termination_time" field at https://deploy.equinix.com/developers/api/metal/#tag/Devices/operation/createDevice in the Equinix Metal API reference. (That's not a substitute for cleanup, but it could backstop any other efforts if there's a bug somewhere else). |
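As a rough illustration of that backstop, the sketch below builds a createDevice request body whose `termination_time` is set a fixed number of hours ahead. The helper name `build_device_request`, the `max_lifetime_hours` parameter, and the example values are assumptions made for illustration; the exact timestamp format the API accepts should be checked against the reference linked above.

```python
from datetime import datetime, timedelta, timezone


def build_device_request(hostname, plan, metro, operating_system,
                         max_lifetime_hours, now=None):
    """Build a createDevice request body with a termination_time backstop.

    Hypothetical helper: it only assembles the JSON body and does not call
    the API. max_lifetime_hours should comfortably exceed the longest test run.
    """
    now = now or datetime.now(timezone.utc)
    terminate_at = (now + timedelta(hours=max_lifetime_hours)).astimezone(timezone.utc)
    return {
        "hostname": hostname,
        "plan": plan,
        "metro": metro,
        "operating_system": operating_system,
        # Timestamp after which the platform should terminate the device,
        # even if the workflow's own cleanup step never runs.
        "termination_time": terminate_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


if __name__ == "__main__":
    # Placeholder values; substitute the ones the workflow actually uses.
    print(build_device_request("perf-test-1", "PLAN", "METRO", "OS", 6))
```

The regular cleanup step would still run; the termination time only limits how long a stranded machine can survive.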
There was a short-lived API outage yesterday, described at https://status.equinixmetal.com/incidents/h30n2jlr5d3p which may have impacted manual deletion of these systems. Please retry if you were affected by this. As of this writing, there are 48 systems deployed. |
@vielmetti I'm still having trouble accessing the management UI: |
@gyohuangxin can you open up a ticket with our support team? I'll share your UI issue with the team, but it may be something specific to your account. |
@gyohuangxin Can you please ask someone else on the project to help clean up the idle and stranded resources while we sort out your access problems? |
The code that notices that a deprovision failed is here. It logs an error:
and then exits without retrying. If anything fails for a temporary reason, the machines will live forever until someone gives them manual attention. Where does this error log go? If it's published somewhere, we could look for patterns. |
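One way to avoid exiting on the first failure is to wrap the deprovision call in a retry loop with exponential backoff. This is a minimal sketch, not the project's code: `retry`, its parameters, and the logger name are hypothetical, and `operation` stands in for whatever call currently performs the deprovision.

```python
import logging
import time

log = logging.getLogger("deprovision")


def retry(operation, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries.
    Re-raises the last exception so a persistent failure still fails the job.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            log.error("deprovision attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look like `retry(lambda: delete_device(device_id))`, where `delete_device` is a placeholder for the existing deletion call. Logging each attempt also leaves a trail that can be searched for patterns later.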
@Revolyssup, will you please add this to tomorrow’s CI meeting? @edwvilla’s help here is much appreciated. Let’s ensure that we have a quick review and resolution. // @gyohuangxin |
All existing servers were manually deprovisioned today. A fresh batch of servers, provisioned by the scheduled workflow, is running now. Let's see if those servers are automatically deprovisioned when their tasks complete. |
Yes, it seems that the test servers are successfully deprovisioned at end of test. 👍 |
Description
Some CNCF runners are not being removed after tests finish, and their number gradually increases over time.
We can delete them manually, but it would be better to make sure they are removed properly.
The same thing happens when deleting Equinix servers:
Expected Behavior
We should add retries and confirmations to ensure CNCF runners and machines are removed.
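A minimal sketch of what "confirmation" could mean in practice: after requesting deletion, check the listing and only report success once the runner or machine no longer appears. The function `remove_and_confirm` and the `delete` and `list_ids` callables are hypothetical stand-ins for the real runner and Equinix API calls.

```python
import logging
import time

log = logging.getLogger("cleanup")


def remove_and_confirm(resource_id, delete, list_ids,
                       attempts=5, delay=10.0, sleep=time.sleep):
    """Delete a resource, then confirm it is absent from the listing.

    delete(resource_id) and list_ids() are caller-supplied callables.
    Returns True once the resource is confirmed gone, or False if it is
    still listed after all attempts.
    """
    for attempt in range(attempts):
        try:
            delete(resource_id)
        except Exception as exc:
            # A failed delete is not final; the listing check below decides.
            log.warning("delete of %s failed: %s", resource_id, exc)
        if resource_id not in list_ids():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A workflow step could fail the job when this returns `False`, so that stranded resources are reported instead of accumulating silently.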
Screenshots/Logs
Environment: