Container locked after backup, stuck at delete vzdump backup snapshot

rholighaus

I'm running a nightly backup of VMs and LXCs to PBS.

root@carrier:~# pveversion -v
proxmox-ve: 6.2-2 (running kernel: 5.4.65-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.2-7
pve-kernel-helper: 6.2-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-9
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-2
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-19
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve2

Almost every night, one of the containers is left with a hanging vzdump snapshot that I have to remove manually; I also have to edit the LXC config file to clear the lock and the leftover snapshot entry there.
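
For reference, my manual cleanup looks roughly like this (CT 100 as the example; treat it as a sketch, the dataset name obviously depends on the storage layout):

root@carrier:~# pct unlock 100
root@carrier:~# pct delsnapshot 100 vzdump --force
root@carrier:~# zfs destroy rpool/data/subvol-100-disk-1@vzdump

The last step is only needed if the ZFS snapshot survives the delsnapshot; when even that fails, I remove the [vzdump] section and the lock/parent lines from /etc/pve/lxc/100.conf by hand.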

Containers and VMs are stored on zfs.
Any idea what additional information could help to debug this?
 
What does the log of the backup of that container say? Does the snapshot removal run into a timeout? Does the backup itself work?
 
After cleaning up the snapshot and the LXC config, the vzdump snapshot of this container was left over again after last night's backup. And the container is locked again. No error message, as you can see:

100: 2020-11-24 22:00:07 INFO: Starting Backup of VM 100 (lxc)
100: 2020-11-24 22:00:07 INFO: status = running
100: 2020-11-24 22:00:07 INFO: CT Name: zucker
100: 2020-11-24 22:00:07 INFO: including mount point rootfs ('/') in backup
100: 2020-11-24 22:00:07 INFO: backup mode: snapshot
100: 2020-11-24 22:00:07 INFO: ionice priority: 7
100: 2020-11-24 22:00:07 INFO: create storage snapshot 'vzdump'
100: 2020-11-24 22:00:07 INFO: creating Proxmox Backup Server archive 'ct/100/2020-11-24T21:00:07Z'
100: 2020-11-24 22:00:07 INFO: run: /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp21497_100/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --backup-type ct --backup-id 100 --backup-time 1606251607 --repository admin@pbs@backup-server.xxxxx.com:backup-test
100: 2020-11-24 22:00:07 INFO: Starting backup: ct/100/2020-11-24T21:00:07Z
100: 2020-11-24 22:00:07 INFO: Client name: carrier
100: 2020-11-24 22:00:07 INFO: Starting backup protocol: Tue Nov 24 22:00:07 2020
100: 2020-11-24 22:00:12 INFO: Upload config file '/var/tmp/vzdumptmp21497_100/etc/vzdump/pct.conf' to 'admin@pbs@backup-server.xxxxxxxx.com:8007:backup-test' as pct.conf.blob
100: 2020-11-24 22:00:12 INFO: Upload directory '/mnt/vzsnap0' to 'admin@pbs@backup-server.xxxxxxxx.com:8007:backup-test' as root.pxar.didx
100: 2020-11-24 22:02:54 INFO: root.pxar: had to upload 2.15 GiB of 12.97 GiB in 161.86s, average speed 13.60 MiB/s).
100: 2020-11-24 22:02:54 INFO: root.pxar: backup was done incrementally, reused 10.82 GiB (83.4%)
100: 2020-11-24 22:02:54 INFO: Uploaded backup catalog (13.51 MiB)
100: 2020-11-24 22:02:54 INFO: Duration: 167.07s
100: 2020-11-24 22:02:54 INFO: End Time: Tue Nov 24 22:02:54 2020
100: 2020-11-24 22:03:06 INFO: running 'proxmox-backup-client prune' for 'ct/100'
100: 2020-11-24 22:03:13 INFO: remove vzdump snapshot
100: 2020-11-24 22:03:14 INFO: Finished Backup of VM 100 (00:03:07)
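
For illustration, this is roughly what the container config ends up looking like afterwards (shortened sketch, not the literal file, and the storage name is just an example); the lock, the parent line and the [vzdump] section are what I have to clean up by hand:

root@carrier:~# cat /etc/pve/lxc/100.conf
arch: amd64
hostname: zucker
lock: backup
parent: vzdump
rootfs: local-zfs:subvol-100-disk-1

[vzdump]
hostname: zucker
rootfs: local-zfs:subvol-100-disk-1
snaptime: 1606251607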

I am running znapzend to take hourly snapshots, replicate them to one or more remote sites, and keep the offsite snapshots for longer than the onsite ones (onsite storage runs on SSDs, which are expensive). I have not yet found an alternative solution for this requirement that is officially supported by PVE.

Could this cause a conflict?
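
To check for an overlap, something like this should show whether znapzend was creating or destroying snapshots on that dataset right around the backup window (a sketch; adjust dataset and pool names as needed):

root@carrier:~# zfs list -t snapshot -o name,creation -s creation rpool/data/subvol-100-disk-1
root@carrier:~# zpool history -il rpool | grep subvol-100-disk-1 | grep '2020-11-24'

The first command shows the exact creation times of the hourly snapshots next to the vzdump one; the second lists the snapshot/destroy operations ZFS logged for that dataset on the day in question.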

root@carrier:/var/log# zfs list -tall -r rpool/data/subvol-100-disk-1
NAME USED AVAIL REFER MOUNTPOINT
rpool/data/subvol-100-disk-1 15.5G 1.38T 8.02G /rpool/data/subvol-100-disk-1
rpool/data/subvol-100-disk-1@2020-11-18-000000 625M - 8.01G -
rpool/data/subvol-100-disk-1@2020-11-19-000000 517M - 8.01G -
rpool/data/subvol-100-disk-1@2020-11-20-000000 518M - 8.01G -
rpool/data/subvol-100-disk-1@2020-11-21-000000 490M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-22-000000 491M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-23-000000 517M - 8.01G -
rpool/data/subvol-100-disk-1@2020-11-23-210000 28.8M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-23-220000 23.0M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-23-230000 23.3M - 8.01G -
rpool/data/subvol-100-disk-1@2020-11-24-000000 22.9M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-010000 23.0M - 8.01G -
rpool/data/subvol-100-disk-1@2020-11-24-020000 23.4M - 7.98G -
rpool/data/subvol-100-disk-1@2020-11-24-030000 23.9M - 7.99G -
rpool/data/subvol-100-disk-1@2020-11-24-040000 29.3M - 7.98G -
rpool/data/subvol-100-disk-1@2020-11-24-050000 31.3M - 7.99G -
rpool/data/subvol-100-disk-1@2020-11-24-060000 31.8M - 7.99G -
rpool/data/subvol-100-disk-1@2020-11-24-070000 45.4M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-080000 50.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-090000 27.5M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-100000 52.6M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-110000 78.5M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-120000 78.2M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-130000 74.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-140000 75.2M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-150000 78.3M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-160000 74.2M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-170000 74.2M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-180000 77.4M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-24-190000 24.7M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-200000 23.1M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-210000 24.5M - 8.02G -
rpool/data/subvol-100-disk-1@vzdump 5.75M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-220000 5.80M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-24-230000 47.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-000000 73.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-010000 23.8M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-25-020000 24.8M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-25-030000 25.3M - 7.99G -
rpool/data/subvol-100-disk-1@2020-11-25-040000 29.5M - 7.99G -
rpool/data/subvol-100-disk-1@2020-11-25-050000 31.0M - 8.00G -
rpool/data/subvol-100-disk-1@2020-11-25-060000 31.6M - 8.00G -
rpool/data/subvol-100-disk-1@2020-11-25-070000 31.8M - 8.02G -
rpool/data/subvol-100-disk-1@2020-11-25-080000 48.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-090000 78.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-100000 76.9M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-110000 76.7M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-120000 74.8M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-130000 75.7M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-140000 76.7M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-150000 76.7M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-160000 75.1M - 8.04G -
rpool/data/subvol-100-disk-1@2020-11-25-170000 68.6M - 8.04G -
rpool/data/subvol-100-disk-1@__replicate_100-1_1606321801__ 3.93M - 8.02G -
rpool/data/subvol-100-disk-1@__replicate_100-2_1606321835__ 3.65M - 8.02G -
 
Very strange. I don't see any way that removal could fail without printing an error. Is there anything in the system log around that time?
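
Something along these lines, with the window adjusted to the failing run (the 'remove vzdump snapshot' step was at 22:03:13 in the log above):

root@carrier:~# journalctl --since "2020-11-24 22:02:50" --until "2020-11-24 22:03:30"

The full task log of that backup job (under /var/log/pve/tasks/ or in the GUI task viewer) would also help.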
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!