Backup of VM 134 failed - CT is locked (snapshot-delete)

UweOhse

Hello,

the backup of one container fails with the message in the title.
When I "pct unlock" it, the next backup works (I used the backup for development purposes, and I'm quite sure the backup is complete).
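As a stop-gap, the manual unlock can be scripted; a minimal sketch, assuming the stuck lock shows up as a `lock: snapshot-delete` line in `pct config` output (the helper name is made up, not a PVE command):

```shell
# stuck_in_snapshot_delete reads `pct config <ctid>` output on stdin and
# succeeds only if the container is locked with "snapshot-delete".
stuck_in_snapshot_delete() {
    grep -q '^lock: snapshot-delete$'
}

# Intended usage on the PVE node (unverified recipe, CT 134 from this thread):
#   pct config 134 | stuck_in_snapshot_delete && pct unlock 134
```

Checking the lock reason before unlocking seems safer than unlocking blindly, since other lock states can indicate a real in-progress operation.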

That container is the largest one on the machine, one of the two important ones, and the one that has two filesystems (for quite stupid reasons). It is also replicated to another machine.

In the log files (log.txt, attached) I found this:
zfs error: cannot destroy snapshot tank/compressed/subvol-134-disk-1@vzdump: dataset is busy
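For what it's worth, when `zfs destroy` reports "dataset is busy" on a snapshot, the usual suspects are user holds, clones, or an in-flight send (which is what replication does); a hedged check sequence (the `has_holds` helper is my own shorthand, not a zfs subcommand):

```shell
# has_holds reads `zfs holds <snapshot>` output on stdin and succeeds if
# any hold is listed below the header line.
has_holds() {
    tail -n +2 | grep -q .
}

# On the node (not executed here), using the dataset name from the log:
#   zfs holds tank/compressed/subvol-134-disk-1@vzdump | has_holds && echo held
#   zfs get -H clones tank/compressed/subvol-134-disk-1@vzdump
```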

Status as of now:
Bash:
# zfs list -tall -r /containers/compressed/subvol-134-disk-1
NAME                                                               USED  AVAIL     REFER  MOUNTPOINT
tank/compressed/subvol-134-disk-1                                  244G   157G      243G  /containers/compressed/subvol-134-disk-1
tank/compressed/subvol-134-disk-1@vzdump                           873M      -      243G  -
tank/compressed/subvol-134-disk-1@__replicate_134-0_1620709385__   102M      -      243G  -

System:
pve 6.3-3 (Update coming, possibly even soon)
Linux 5.4.78-2-pve
Backup Server pbs 1.1-5

There was nothing in the current kernel/system logs, there are no ZFS entries, and there is just one log entry related to the volume in this month's kernel logs:
May 3 17:45:07 x9 pvesr[5687]: 134-0: got unexpected replication job error - command 'set -o pipefail && pvesm export compressed:subvol-134-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_134-0_1620055939__ -base __replicate_134-0_1620054180__ | /usr/bin/cstream -t 50000000 | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=PBSHOST' root@IPV6 -- pvesm import compressed:subvol-134-disk-1 zfs - -with-snapshots 1 -allow-rename 0 -base __replicate_134-0_1620054180__' failed: exit code 255
The replication target server went down for hardware reasons.

Will updating to 6.4 help? If not, what can I do to debug this?

Regards, Uwe
 

Attachments

  • log.txt (2 KB)
  • log2.txt (243 bytes)
Can reproduce; there is a race window between backup and replication where the latter can block a vzdump snapshot from being removed. Filed https://bugzilla.proxmox.com/show_bug.cgi?id=3424 for tracking, feel free to subscribe for updates.

Do you have very frequent replication set up? Or a low bwlimit for the replication job?
 
Replication is set to */30, and the bwlimit to 50 MB/s.
The backup of that container takes 50 to 60 minutes (too many file changes, but that's hard to get rid of).

So a schedule of */90 would avoid that problem?

Thanks in advance, Uwe
 
It might make it rarer (by reducing the chance of a replication starting between vzdump snapshot creation and removal), but getting rid of it 100% requires a code fix.
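To make that concrete, a toy calculation, assuming replication ticks on a fixed minute grid and a strict backup window (the function and model are mine, not how vzdump/pvesr actually schedule):

```shell
# Print the replication ticks (minutes past midnight) that fall strictly
# inside a backup window. Args: backup start, duration, tick interval.
ticks_during() {
    start=$1; dur=$2; ival=$3
    end=$((start + dur))
    t=0
    out=""
    while [ "$t" -lt 1440 ]; do
        if [ "$t" -gt "$start" ] && [ "$t" -lt "$end" ]; then
            out="$out $t"
        fi
        t=$((t + ival))
    done
    echo "$out"
}

ticks_during 30 60 30    # a 60-min backup at 00:30 with */30 meets tick 60
ticks_during 30 60 90    # with */90 this particular alignment is lucky
ticks_during 80 60 90    # ...but a later start still meets tick 90
```

In this toy model a 50-60 minute backup with */30 replication always has at least one tick inside the window, while */90 only wins for some start times, which matches "rarer, not gone".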
 
I changed the schedule of the replication so it doesn't run between 00:30 and 01:59. This keeps the backup window replication-free, and I won't lose too much data at that time anyway.
This seems to work well enough for now. I'd likely forget to pct unlock the container more often than that schedule will fail.
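For anyone copying this approach: PVE replication jobs take a calendar-event style schedule, and one possible spelling of "every 30 minutes, but only from 02:00 onward" might be the following (syntax from memory of the PVE calendar-event format, so please verify against the documentation for your version):

```
2..23:0/30
```

Any early-morning runs you still want to keep would need to be listed as additional time specs.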

Thank you.
 
