Backup of VM 134 failed - CT is locked (snapshot-delete)

UweOhse

Hello,

The backup of one container fails with the message in the title.
When I run "pct unlock" on it, the next backup works (I use the backups for development purposes, and I'm quite sure that the backup is complete).

That container is the largest one on the machine, one of the two important ones, and the one which has two filesystems (for quite stupid reasons). It is also replicated to another machine.

In the log files (log.txt, attached) I found this:
zfs error: cannot destroy snapshot tank/compressed/subvol-134-disk-1@vzdump: dataset is busy
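
A sketch of what one could check when a snapshot refuses to go away with "dataset is busy" (the dataset name is taken from the log line above; zfs holds is a standard OpenZFS command, and an in-flight send/receive, e.g. from replication, can also keep a snapshot busy):
Bash:
# is there an explicit hold on the snapshot?
zfs holds tank/compressed/subvol-134-disk-1@vzdump
# is a send/receive (e.g. pvesr replication) currently referencing it?
ps aux | grep -E 'zfs (send|recv)' | grep -v grep
# once nothing references the snapshot any more, it can be removed manually
zfs destroy tank/compressed/subvol-134-disk-1@vzdump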

Status as of now:
Bash:
# zfs list -t all -r /containers/compressed/subvol-134-disk-1
NAME                                                               USED  AVAIL     REFER  MOUNTPOINT
tank/compressed/subvol-134-disk-1                                  244G   157G      243G  /containers/compressed/subvol-134-disk-1
tank/compressed/subvol-134-disk-1@vzdump                           873M      -      243G  -
tank/compressed/subvol-134-disk-1@__replicate_134-0_1620709385__   102M      -      243G  -

System:
pve 6.3-3 (Update coming, possibly even soon)
Linux 5.4.78-2-pve
Backup Server pbs 1.1-5

There was nothing in the current kernel/system logs, there are no ZFS entries, and there is just one log entry related to the volume in this month's kernel logs:
May 3 17:45:07 x9 pvesr[5687]: 134-0: got unexpected replication job error - command 'set -o pipefail && pvesm export compressed:subvol-134-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_134-0_1620055939__ -base __replicate_134-0_1620054180__ | /usr/bin/cstream -t 50000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=PBSHOST' root@IPV6 -- pvesm import compressed:subvol-134-disk-1 zfs - -with-snapshots 1 -allow-rename 0 -base __replicate_134-0_1620054180__' failed: exit code 255
The replication target server went down for hardware reasons.
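
For completeness, the state of that replication job can be inspected with the standard PVE tooling (job ID 134-0 is assumed from the log message above):
Bash:
# last sync time, duration and failure state of the replication jobs on this node
pvesr status
# the job definition (target, schedule, rate limit) lives in the cluster config
cat /etc/pve/replication.cfg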

Will updating to 6.4 help? If not, what can I do to debug this?

Regards, Uwe
 


Can reproduce: there is a race window between backup and replication where the latter can block the vzdump snapshot from being removed. Filed https://bugzilla.proxmox.com/show_bug.cgi?id=3424 for tracking, feel free to subscribe for updates.

Do you have very frequent replication set up, or a low bwlimit for the replication job?
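
(For anyone hitting the same issue: the schedule and bandwidth limit being asked about here can be read off the replication job itself. A sketch, assuming job ID 134-0 from the log earlier in the thread; pvesr list should show the configured schedule and rate per job:)
Bash:
# list configured replication jobs with their schedule and rate limit
pvesr list
# or check the raw entry for this job in the cluster-wide config
grep -A 4 '134-0' /etc/pve/replication.cfg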
 
Replication is set to */30, and the bwlimit to 50 MB/s.
The backup of that container takes 50 to 60 minutes (too many file changes, but that's hard to get rid of).

So a schedule of */90 would avoid that problem?
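
(Side note: a sketch of how the schedule and rate limit could be adjusted, assuming job ID 134-0. The value passed to --schedule is a PVE calendar event; a minute-field step larger than 59, such as */90, may not give a true 90-minute cadence, so an hour-based spec is probably the safer way to stretch the interval. Worth checking against the calendar-event documentation:)
Bash:
# replicate every two hours instead of every 30 minutes, keep the 50 MB/s limit
pvesr update 134-0 --schedule '*/2:00' --rate 50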

Thanks in advance, Uwe
 
It might make it more rare (by reducing the chance of a replication starting between vzdump snapshot creation and removal), but getting rid of it 100% requires a code fix.
 
I changed the schedule of the replication so it doesn't run between 00:30 and 01:59. This keeps the backup window replication-free, and I will not lose too much data at that time anyway.
This seems to work well enough for now; I'd likely forget to pct unlock the container more often than that schedule will fail.
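
(For reference, one way such a window-avoiding schedule could look; this is only a sketch in PVE's calendar-event syntax, not the exact value used above, and it assumes hour ranges are accepted in the schedule field:)
Bash:
# run replication every 30 minutes, but only from 02:00 to 23:30,
# keeping the nightly backup window free of replication
pvesr update 134-0 --schedule '2..23:0/30'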

Thank you.
 
