Panic on WaitAllPodVolumesProcessed in velero 1.15.2 #8657

Open
Gui13 opened this issue Jan 27, 2025 · 1 comment
Comments

@Gui13

Gui13 commented Jan 27, 2025

What steps did you take and what happened:

During a scheduled backup, velero rebooted due to a panic waiting for all podVolumeBackups to be finished.

What did you expect to happen:

No panic ;-)

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle and attach it to this issue; for more options, refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Here are the logs:

2025-01-27T04:01:42+01:00    {"time":"2025-01-27T03:01:42.51626203Z","_p":"F","log":"time=\"2025-01-27T03:01:42Z\" level=info msg=\"auth with the storage account access key\" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync logSource=\"/go/pkg/mod/github.com/vmware-tanzu/[email protected]/pkg/util/azure/storage.go:101\" pluginName=velero-plugin-for-microsoft-azure"}
2025-01-27T04:01:42+01:00    {"time":"2025-01-27T03:01:42.591304615Z","_p":"F","log":"time=\"2025-01-27T03:01:42Z\" level=info msg=\"plugin process exited\" backupLocation=velero/default cmd=/plugins/velero-plugin-for-microsoft-azure controller=backup-sync id=26381 logSource=\"pkg/plugin/clientmgmt/process/logrus_adapter.go:80\" plugin=/plugins/velero-plugin-for-microsoft-azure"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.658807814Z","_p":"F","log":"panic: sync: WaitGroup is reused before previous Wait has returned"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.658841385Z","_p":"F","log":""}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.658845486Z","_p":"F","log":"goroutine 180044 [running]:"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.659631212Z","_p":"F","log":"sync.(*WaitGroup).Wait(0xc0012a0fd0?)"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.659640691Z","_p":"F","log":"\t/usr/local/go/src/sync/waitgroup.go:118 +0x74"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.659644046Z","_p":"F","log":"github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).WaitAllPodVolumesProcessed.func2()"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.659647509Z","_p":"F","log":"\t/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:332 +0x53"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.6596526Z","_p":"F","log":"created by github.com/vmware-tanzu/velero/pkg/podvolume.(*backupper).WaitAllPodVolumesProcessed in goroutine 484"}
2025-01-27T04:01:47+01:00    {"time":"2025-01-27T03:01:47.659663401Z","_p":"F","log":"\t/go/src/github.com/vmware-tanzu/velero/pkg/podvolume/backupper.go:330 +0x113"}
2025-01-27T04:01:48+01:00    {"time":"2025-01-27T03:01:48.28284558Z","_p":"F","log":"time=\"2025-01-27T03:01:48Z\" level=info msg=\"setting log-level to INFO\" logSource=\"pkg/cmd/server/server.go:110\""}
2025-01-27T04:01:48+01:00    {"time":"2025-01-27T03:01:48.282876929Z","_p":"F","log":"time=\"2025-01-27T03:01:48Z\" level=info msg=\"Starting Velero server v1.15.1 (32499fc287815058802c1bc46ef620799cca7392-dirty)\" logSource=\"pkg/cmd/server/server.go:112\""}
2025-01-27T04:01:48+01:00    {"time":"2025-01-27T03:01:48.282880497Z","_p":"F","log":"time=\"2025-01-27T03:01:48Z\" level=info msg=\"1 feature flags enabled [EnableCSI]\" logSource=\"pkg/cmd/server/server.go:114\""}
2025-01-27T04:01:48+01:00    {"time":"2025-01-27T03:01:48.323417205Z","_p":"F","log":"time=\"2025-01-27T03:01:48Z\" level=info msg=\"plugin process exited\" cmd=/velero id=16 logSource=\"pkg/plugin/clientmgmt/process/logrus_adapter.go:80\" plugin=/velero"}

Anything else you would like to add:

Environment:

  • Velero version (use velero version): 1.15.2
  • Velero features (use velero client config get features): DataMover is used
  • Kubernetes version (use kubectl version): 1.30
  • Kubernetes installer & version: Azure AKS
  • Cloud provider or hardware configuration: Azure
  • OS (e.g. from /etc/os-release):

Vote on this issue!

This is an invitation to the Velero community to vote on issues, you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@Gui13 changed the title to Panic on WaitAllPodVolumesProcessed in velero 1.15.2 on Jan 27, 2025
@Lyndon-Li
Contributor

Lyndon-Li commented Jan 27, 2025

First of all, the running release is 1.15.1, not 1.15.2; see the logs:
level=info msg=\"Starting Velero server v1.15.1

Secondly, at first glance, I don't think this problem is specific to 1.15.1 or 1.15.2. In other words, it may be a long-standing problem that also occurs in previous releases.
The b.wg logic in WaitAllPodVolumesProcessed is unchanged in 1.15.

We need further checks, but I guess the cause of the problem is here:

		volumeBackup := newPodVolumeBackup(backup, pod, volume, repoIdentifier, b.uploaderType, pvc)
		if err := veleroclient.CreateRetryGenerateName(b.crClient, b.ctx, volumeBackup); err != nil {
			errs = append(errs, err)
			continue
		}
		b.wg.Add(1)

The PVB is created by CreateRetryGenerateName, and only after that is b.wg.Add called.
If the PVB succeeds or fails immediately, b.wg.Done may be called before b.wg.Add.

@Gui13 Please let us know under what circumstances you saw this panic, and whether it is reproducible in your environment. Please also share the velero log bundle from when the problem happened.

cc @reasonerjt @ywk253100
