Multiple intermittent restarts in ebs-csi driver #2122

Open
Neha130 opened this issue Aug 20, 2024 · 1 comment
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


Neha130 commented Aug 20, 2024

/kind bug

What happened?

Almost all containers in the ebs-csi-controller pods have been restarting intermittently. Logs from the previous (terminated) containers are attached below:

Container: csi-provisioner

[Aug 08 2024 21:04:06 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:34:06.841330       1 controller.go:811] Starting provisioner controller ebs.csi.aws.com_ebs-csi-controller-5c7698687-mqxfp_33cd4205-586f-42f2-a13b-ee18c5cf8f67!
[Aug 08 2024 21:04:06 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:34:06.942386       1 controller.go:860] Started provisioner controller ebs.csi.aws.com_ebs-csi-controller-5c7698687-mqxfp_33cd4205-586f-42f2-a13b-ee18c5cf8f67!
[Aug 18 2024 23:56:45 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: E0818 18:26:45.933586       1 leaderelection.go:367] Failed to update lock: etcdserver: request timed out
[Aug 18 2024 23:56:48 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0818 18:26:48.927115       1 leaderelection.go:283] failed to renew lease utils/ebs-csi-aws-com: timed out waiting for the condition
[Aug 18 2024 23:56:48 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: F0818 18:26:48.927147       1 leader_election.go:182] stopped leading
[Aug 18 2024 23:56:49 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0818 18:26:48.932146       1 volume_store.go:104] Stopped save volume queue
[Aug 18 2024 23:56:49 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0818 18:26:48.932146       1 volume_store.go:104] Stopped save volume queue
[Aug 18 2024 23:56:49 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0818 18:26:48.932146       1 volume_store.go:104] Stopped save volume queue

[Aug 18 2024 18:33:53 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: I0818 13:03:53.350772       1 leaderelection.go:285] failed to renew lease kube-system/ebs-csi-aws-com: timed out waiting for the condition
[Aug 18 2024 18:33:53 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: F0818 13:03:53.350826       1 leader_election.go:181] stopped leading

Container: csi-attacher

[Aug 08 2024 21:04:35 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:34:35.508399       1 csi_handler.go:282] Detaching "csi-16e4c15998045a1b011d7e7f034740d248b227d0d2d1d485d8b81949996fc8d1"
[Aug 08 2024 21:04:35 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:34:35.513470       1 csi_handler.go:251] Attaching "csi-80c174c723a6f7e84710cba88b5b4ca36286b06f8808379a063da772ce0b598a"
[Aug 08 2024 21:04:35 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:34:35.912392       1 csi_handler.go:581] Detached "csi-16e4c15998045a1b011d7e7f034740d248b227d0d2d1d485d8b81949996fc8d1"
[Aug 08 2024 21:04:37 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:34:37.460209       1 csi_handler.go:264] Attached "csi-80c174c723a6f7e84710cba88b5b4ca36286b06f8808379a063da772ce0b598a"
[Aug 08 2024 21:05:32 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:35:32.159139       1 csi_handler.go:251] Attaching "csi-1f84f2fd70d26f1d43500d0b05aaef3a05a7964e36770d94304267568902fc90"
[Aug 08 2024 21:05:34 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0808 15:35:34.184115       1 csi_handler.go:264] Attached "csi-1f84f2fd70d26f1d43500d0b05aaef3a05a7964e36770d94304267568902fc90"
[Aug 18 2024 23:56:46 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: E0818 18:26:46.780871       1 leaderelection.go:367] Failed to update lock: etcdserver: request timed out
[Aug 18 2024 23:56:49 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: I0818 18:26:49.769493       1 leaderelection.go:283] failed to renew lease utils/external-attacher-leader-ebs-csi-aws-com: timed out waiting for the condition
[Aug 18 2024 23:56:49 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: F0818 18:26:49.769524       1 leader_election.go:182] stopped leading

Container: csi-resizer

[Aug 18 2024 18:33:54 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: I0818 13:03:54.034790       1 controller.go:262] "Shutting down external resizer" controller="ebs.csi.aws.com"
[Aug 18 2024 18:33:54 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: E0818 13:03:54.034653       1 leaderelection.go:332] error retrieving resource lock kube-system/external-resizer-ebs-csi-aws-com: Get "https://10.100.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/external-resizer-ebs-csi-aws-com": context deadline exceeded
[Aug 18 2024 18:33:54 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: I0818 13:03:54.034699       1 leaderelection.go:285] failed to renew lease kube-system/external-resizer-ebs-csi-aws-com: timed out waiting for the condition
[Aug 18 2024 18:33:54 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: F0818 13:03:54.034731       1 leader_election.go:181] stopped leading
[Aug 18 2024 18:33:54 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: I0818 13:03:54.034790       1 controller.go:262] "Shutting down external resizer" controller="ebs.csi.aws.com"

Container: csi-snapshotter

[Aug 18 2024 18:33:58 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: I0818 13:03:58.527728       1 leaderelection.go:285] failed to renew lease kube-system/external-snapshotter-leader-ebs-csi-aws-com: timed out waiting for the condition
[Aug 18 2024 18:33:58 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: F0818 13:03:58.527782       1 leader_election.go:181] stopped leading
[Aug 18 2024 18:33:58 GMT+0530] ebs-csi-controller-c76b64f95-fvnnt: I0818 13:03:58.527728       1 leaderelection.go:285] failed to renew lease kube-system/external-snapshotter-leader-ebs-csi-aws-com: timed out waiting for the condition

Environment

  • Kubernetes version (use kubectl version): v1.30.0

  • Driver version:
    Image versions in use:
    csi-attacher: v4.5.1-eks-1-30-2
    csi-provisioner: v4.0.1-eks-1-30-2
    csi-snapshotter: v7.0.2-eks-1-30-2
    csi-resizer: v1.10.1-eks-1-30-2

k8s-ci-robot added the kind/bug label Aug 20, 2024
Contributor

ConnorJC3 commented Aug 21, 2024

Hi @Neha130 - the errors you are experiencing indicate an issue with your Kubernetes control plane. Based on the logs, the Kubernetes API server appears to be timing out when the sidecars attempt to update their leader-election leases.

In particular, these errors indicate a likely issue with your cluster's etcd installation:

[Aug 18 2024 23:56:45 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: E0818 18:26:45.933586       1 leaderelection.go:367] Failed to update lock: etcdserver: request timed out
[Aug 18 2024 23:56:46 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: E0818 18:26:46.780871       1 leaderelection.go:367] Failed to update lock: etcdserver: request timed out

You will need to resolve this issue for the EBS CSI Driver to function properly. Neither the EBS CSI Driver nor the Kubernetes CSI sidecars it uses are designed to work when the Kubernetes API server is failing or timing out requests. In such an environment they may behave abnormally, for example by restarting as you are seeing.
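As a rough illustration of why a slow API server turns into container restarts, here is a minimal sketch of the lease-renewal timing. It is a simplified model, not the actual client-go implementation: the function name `time_leadership_lost` is hypothetical, and the 10-second renew deadline and 5-second retry period are assumed values for illustration only, so check the settings on your own sidecar versions. Each entry in `attempts` represents whether one lease update reached the API server successfully.

```python
from typing import Iterable, Optional


def time_leadership_lost(
    attempts: Iterable[bool],
    renew_deadline: float = 10.0,  # assumed value, for illustration only
    retry_period: float = 5.0,     # assumed value, for illustration only
) -> Optional[float]:
    """Return the simulated time (seconds) at which the leader gives up.

    Simplified model: the leader retries a lease update every
    `retry_period` seconds. If no update has succeeded for at least
    `renew_deadline` seconds, it stops leading. In the sidecars this is
    the point where "stopped leading" is logged at fatal severity and
    the container exits.
    """
    last_success = 0.0
    now = 0.0
    for ok in attempts:
        now += retry_period
        if ok:
            last_success = now
        elif now - last_success >= renew_deadline:
            return now
    return None  # leadership was never lost in this run


# A single failed update is tolerated; consecutive failures that span
# the renew deadline are not.
transient_blip = time_leadership_lost([True, True, False, True])
sustained_outage = time_leadership_lost([True, False, False, False])
```

In this model a single timed-out request is absorbed by the next retry, while a run of timeouts longer than the renew deadline ends leadership. That matches the pattern in the logs above, where `Failed to update lock` is followed a few seconds later by `failed to renew lease` and `stopped leading`.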

I recommend contacting whoever operates your Kubernetes cluster or provides your Kubernetes distribution, if applicable, for assistance.
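When gathering evidence for that support request, it may help to pull only the error and fatal lines out of the sidecar logs. The following is a minimal sketch, assuming the logs use the klog header layout seen above (severity letter, `mmdd`, time, thread id, `file:line]`) and may carry an arbitrary prefix such as the bracketed timestamp and pod name. The names `KLOG`, `parse`, and `errors_and_fatals` are hypothetical helpers, not part of any Kubernetes tooling.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# klog header: <severity><mmdd> <hh:mm:ss.micros> <thread id> <file:line>] <msg>
KLOG = re.compile(
    r"(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<tid>\d+) "
    r"(?P<src>[\w.-]+:\d+)\] (?P<msg>.*)"
)


def parse(line: str) -> Optional[Dict[str, str]]:
    """Parse one log line; `search` skips any prefix before the klog header."""
    m = KLOG.search(line)
    return m.groupdict() if m else None


def errors_and_fatals(lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (severity, time, source, message) for every E or F line."""
    out = []
    for line in lines:
        rec = parse(line)
        if rec and rec["sev"] in ("E", "F"):
            out.append((rec["sev"], rec["time"], rec["src"], rec["msg"]))
    return out


# Three lines copied from the csi-provisioner log in this issue.
sample = [
    "[Aug 18 2024 23:56:45 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: "
    "E0818 18:26:45.933586       1 leaderelection.go:367] "
    "Failed to update lock: etcdserver: request timed out",
    "[Aug 18 2024 23:56:48 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: "
    "I0818 18:26:48.927115       1 leaderelection.go:283] "
    "failed to renew lease utils/ebs-csi-aws-com: timed out waiting for the condition",
    "[Aug 18 2024 23:56:48 GMT+0530] ebs-csi-controller-5c7698687-mqxfp: "
    "F0818 18:26:48.927147       1 leader_election.go:182] stopped leading",
]
findings = errors_and_fatals(sample)
```

Note that the `failed to renew lease` line is logged at info severity, so a filter on `E` and `F` alone skips it. Widen the severity check if you want the full sequence leading up to each exit.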
