[SOLVED] Delete stuck snapshot

krystofr

Member
Dec 30, 2016
Hi there,

I have an old snapshot on my system which won't delete. It's not being used and its status says 'delete', but it appears stuck. If I try to delete it again in the GUI, the delete fails and the VM is locked.
I have to unlock the VM manually on the command line.


Could someone tell me how to manually delete the snapshot from the command line?

Thank you in advance
 
Hi,
is there any output in the task log? What kind of disks are/were attached to the VM?

With qm delsnapshot <ID> <snapname> --force you can force removal of the snapshot from the configuration, but it can happen that snapshots on the disks remain.
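If snapshots do remain on ZFS-backed disks afterwards, they can be listed and destroyed manually with the zfs tool. A minimal sketch, assuming a default local-zfs setup where guest disks live under rpool/data (the dataset and snapshot names are placeholders):

Code:
# list ZFS snapshots belonging to the guest's disks
zfs list -t snapshot | grep <ID>-disk
# destroy a leftover snapshot by its full name
zfs destroy rpool/data/vm-<ID>-disk-0@<snapname>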
 
Hi Fabian,

Thanks for the response. Sorry, I should have included the message from the task log. It is this: TASK ERROR: zfs error: could not find any snapshots to destroy; check snapshot names.

So I'm assuming it was correctly removed from the disk but not from the configuration, so I'll give the qm command above a go.
 
I ran qm delsnapshot <ID> <snapname> --force and was given this error message: zfs error: could not find any snapshots to destroy; check snapshot names

However, the problem is resolved and the snapshot is gone from the GUI. Thanks so much!
 
Still a problem on 8.3.3, and it affects CTs as well. Given how long this has been an issue, I don't think there's any fix coming soon, if ever.
 
Hi,
Still a problem on 8.3.3, and it affects CTs as well. Given how long this has been an issue, I don't think there's any fix coming soon, if ever.
the situation here is only a symptom of an earlier failure, so nothing that can be "fixed". If you can provide details about why the snapshot removal task failed/couldn't complete, that can be looked into and improved. Please share the full task log of the initial failed snapshot removal task as well as the system logs/journal from around that time and the output of pveversion -v.
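A minimal sketch for collecting that information, assuming the failed task happened around a known time (the timestamps below are placeholders to adjust):

Code:
# journal excerpt around the failed snapshot removal task
journalctl --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM" > journal-excerpt.txt
# package versions
pveversion -v > pveversion.txt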
 
Hi,

the situation here is only a symptom of an earlier failure, so nothing that can be "fixed". If you can provide details about why the snapshot removal task failed/couldn't complete, that can be looked into and improved. Please share the full task log of the initial failed snapshot removal task as well as the system logs/journal from around that time and the output of pveversion -v.

The backup output that caused the failure:

Code:
INFO: Starting Backup of VM 116 (lxc)
INFO: Backup started at 2025-02-04 16:02:07
INFO: status = running
INFO: CT Name: autobrr-backend
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating Proxmox Backup Server archive 'ct/116/2025-02-04T21:02:07Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=encrypt --keyfd=11 pct.conf:/var/tmp/vzdumptmp318742_116/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 116 --backup-time 1738702927 --change-detection-mode metadata --entries-max 1048576 --repository pve@pbs!pve-token@pbs-backend.unspecnet.com:local_store --ns pve
INFO: Starting backup: [pve]:ct/116/2025-02-04T21:02:07Z   
INFO: Client name: pve   
INFO: Starting backup protocol: Tue Feb  4 16:02:13 2025   
INFO: Using encryption key from file descriptor..   
INFO: Encryption key fingerprint: 0e:cd:81:7f:fc:c9:de:62   
INFO: Downloading previous manifest (Tue Feb  4 15:02:08 2025)   
INFO: Upload config file '/var/tmp/vzdumptmp318742_116/etc/vzdump/pct.conf' to 'pve@pbs!pve-token@pbs-backend.unspecnet.com:8007:local_store' as pct.conf.blob   
INFO: Upload directory '/mnt/vzsnap0' to 'pve@pbs!pve-token@pbs-backend.unspecnet.com:8007:local_store' as root.mpxar.didx   
INFO: Using previous index as metadata reference for 'root.mpxar.didx'   
INFO: Change detection summary:
INFO:  - 18981 total files (7 hardlinks)
INFO:  - 16767 unchanged, reusable files with 1.768 GiB data
INFO:  - 2207 changed or non-reusable files with 172.882 MiB data
INFO:  - 16.537 MiB padding in 52 partially reused chunks
INFO: root.mpxar: had to backup 3.292 MiB of 3.292 MiB (compressed 611.007 KiB) in 3.37 s (average 1001.245 KiB/s)
INFO: root.ppxar: reused 1.784 GiB from previous snapshot for unchanged files (611 chunks)
INFO: root.ppxar: had to backup 139.721 MiB of 1.953 GiB (compressed 27.362 MiB) in 3.37 s (average 41.476 MiB/s)
INFO: root.ppxar: backup was done incrementally, reused 1.817 GiB (93.0%)
INFO: Duration: 3.52s   
INFO: End Time: Tue Feb  4 16:02:16 2025   
INFO: adding notes to backup
INFO: cleanup temporary 'vzdump' snapshot
snapshot 'vzdump' was not (fully) removed - zfs error: cannot destroy snapshot rpool/data/subvol-116-disk-0@vzdump: dataset is busy
INFO: Finished Backup of VM 116 (00:00:09)
INFO: Backup finished at 2025-02-04 16:02:16

System log at time of backup:

Code:
Feb 04 16:00:01 pve pvescheduler[318741]: <root@pam> starting task UPID:pve:0004DD16:001998FD:67A27FD1:vzdump::root@pam:
Feb 04 16:00:01 pve pvescheduler[318742]: INFO: starting new backup job: vzdump --pbs-change-detection-mode metadata --quiet 1 --fleecing 0 --notes-template 'snapshot: {{cluster}}_{{node}}_{{vmid}}_{{guestname}}' --prune-backups 'keep-all=1' --mode snapshot --all 1 --storage pbs_local
Feb 04 16:00:01 pve pvescheduler[318742]: INFO: Starting Backup of VM 100 (qemu)
Feb 04 16:00:04 pve pvescheduler[318742]: INFO: Finished Backup of VM 100 (00:00:03)
Feb 04 16:00:05 pve pvescheduler[318742]: INFO: Starting Backup of VM 101 (lxc)
Feb 04 16:00:14 pve pvescheduler[318742]: INFO: Finished Backup of VM 101 (00:00:09)
Feb 04 16:00:14 pve pvescheduler[318742]: INFO: Starting Backup of VM 102 (lxc)
Feb 04 16:00:17 pve pvescheduler[318742]: INFO: Finished Backup of VM 102 (00:00:03)
Feb 04 16:00:17 pve pvescheduler[318742]: INFO: Starting Backup of VM 103 (qemu)
Feb 04 16:00:23 pve pvescheduler[318742]: INFO: Finished Backup of VM 103 (00:00:06)
Feb 04 16:00:23 pve pvescheduler[318742]: INFO: Starting Backup of VM 104 (qemu)
Feb 04 16:00:25 pve pvescheduler[318742]: INFO: Finished Backup of VM 104 (00:00:02)
Feb 04 16:00:25 pve pvescheduler[318742]: INFO: Starting Backup of VM 105 (lxc)
Feb 04 16:00:35 pve pvescheduler[318742]: INFO: Finished Backup of VM 105 (00:00:10)
Feb 04 16:00:35 pve pvescheduler[318742]: INFO: Starting Backup of VM 106 (lxc)
Feb 04 16:00:39 pve pvescheduler[318742]: INFO: Finished Backup of VM 106 (00:00:04)
Feb 04 16:00:39 pve pvescheduler[318742]: INFO: Starting Backup of VM 107 (lxc)
Feb 04 16:00:44 pve pvescheduler[318742]: INFO: Finished Backup of VM 107 (00:00:05)
Feb 04 16:00:44 pve pvescheduler[318742]: INFO: Starting Backup of VM 108 (lxc)
Feb 04 16:01:00 pve pvescheduler[318742]: INFO: Finished Backup of VM 108 (00:00:16)
Feb 04 16:01:00 pve pvescheduler[318742]: INFO: Starting Backup of VM 109 (qemu)
Feb 04 16:01:00 pve systemd[1]: Started 109.scope.
Feb 04 16:01:02 pve kernel: tap109i0: entered promiscuous mode
Feb 04 16:01:02 pve kernel: vmbr20: port 12(fwpr109p0) entered blocking state
Feb 04 16:01:02 pve kernel: vmbr20: port 12(fwpr109p0) entered disabled state
Feb 04 16:01:02 pve kernel: fwpr109p0: entered allmulticast mode
Feb 04 16:01:02 pve kernel: fwpr109p0: entered promiscuous mode
Feb 04 16:01:02 pve kernel: vmbr20: port 12(fwpr109p0) entered blocking state
Feb 04 16:01:02 pve kernel: vmbr20: port 12(fwpr109p0) entered forwarding state
Feb 04 16:01:02 pve kernel: fwbr109i0: port 1(fwln109i0) entered blocking state
Feb 04 16:01:02 pve kernel: fwbr109i0: port 1(fwln109i0) entered disabled state
Feb 04 16:01:02 pve kernel: fwln109i0: entered allmulticast mode
Feb 04 16:01:02 pve kernel: fwln109i0: entered promiscuous mode
Feb 04 16:01:02 pve kernel: fwbr109i0: port 1(fwln109i0) entered blocking state
Feb 04 16:01:02 pve kernel: fwbr109i0: port 1(fwln109i0) entered forwarding state
Feb 04 16:01:02 pve kernel: fwbr109i0: port 2(tap109i0) entered blocking state
Feb 04 16:01:02 pve kernel: fwbr109i0: port 2(tap109i0) entered disabled state
Feb 04 16:01:02 pve kernel: tap109i0: entered allmulticast mode
Feb 04 16:01:02 pve kernel: fwbr109i0: port 2(tap109i0) entered blocking state
Feb 04 16:01:02 pve kernel: fwbr109i0: port 2(tap109i0) entered forwarding state
Feb 04 16:01:02 pve pvescheduler[318742]: VM 109 started with PID 320151.
Feb 04 16:01:05 pve kernel: tap109i0: left allmulticast mode
Feb 04 16:01:05 pve kernel: fwbr109i0: port 2(tap109i0) entered disabled state
Feb 04 16:01:05 pve kernel: fwbr109i0: port 1(fwln109i0) entered disabled state
Feb 04 16:01:05 pve kernel: vmbr20: port 12(fwpr109p0) entered disabled state
Feb 04 16:01:05 pve kernel: fwln109i0 (unregistering): left allmulticast mode
Feb 04 16:01:05 pve kernel: fwln109i0 (unregistering): left promiscuous mode
Feb 04 16:01:05 pve kernel: fwbr109i0: port 1(fwln109i0) entered disabled state
Feb 04 16:01:05 pve kernel: fwpr109p0 (unregistering): left allmulticast mode
Feb 04 16:01:05 pve kernel: fwpr109p0 (unregistering): left promiscuous mode
Feb 04 16:01:05 pve kernel: vmbr20: port 12(fwpr109p0) entered disabled state
Feb 04 16:01:05 pve qmeventd[1048]: read: Connection reset by peer
Feb 04 16:01:05 pve systemd[1]: 109.scope: Deactivated successfully.
Feb 04 16:01:05 pve systemd[1]: 109.scope: Consumed 3.852s CPU time.
Feb 04 16:01:06 pve pvescheduler[318742]: INFO: Finished Backup of VM 109 (00:00:06)
Feb 04 16:01:07 pve pvescheduler[318742]: INFO: Starting Backup of VM 110 (lxc)
Feb 04 16:01:07 pve qmeventd[320295]: Starting cleanup for 109
Feb 04 16:01:07 pve qmeventd[320295]: Finished cleanup for 109
Feb 04 16:01:16 pve pvescheduler[318742]: INFO: Finished Backup of VM 110 (00:00:10)
Feb 04 16:01:16 pve pvescheduler[318742]: INFO: Starting Backup of VM 111 (lxc)
Feb 04 16:01:27 pve pvescheduler[318742]: INFO: Finished Backup of VM 111 (00:00:11)
Feb 04 16:01:28 pve pvescheduler[318742]: INFO: Starting Backup of VM 112 (lxc)
Feb 04 16:01:44 pve pvescheduler[318742]: INFO: Finished Backup of VM 112 (00:00:16)
Feb 04 16:01:44 pve pvescheduler[318742]: INFO: Starting Backup of VM 113 (lxc)
Feb 04 16:01:52 pve pvescheduler[318742]: INFO: Finished Backup of VM 113 (00:00:08)
Feb 04 16:01:52 pve pvescheduler[318742]: INFO: Starting Backup of VM 114 (lxc)
Feb 04 16:01:56 pve pvescheduler[318742]: INFO: Finished Backup of VM 114 (00:00:04)
Feb 04 16:01:56 pve pvescheduler[318742]: INFO: Starting Backup of VM 115 (lxc)
Feb 04 16:02:07 pve pvescheduler[318742]: INFO: Finished Backup of VM 115 (00:00:11)
Feb 04 16:02:07 pve pvescheduler[318742]: INFO: Starting Backup of VM 116 (lxc)
Feb 04 16:02:16 pve pvescheduler[318742]: snapshot 'vzdump' was not (fully) removed - zfs error: cannot destroy snapshot rpool/data/subvol-116-disk-0@vzdump: dataset is busy
Feb 04 16:02:16 pve pvescheduler[318742]: INFO: Finished Backup of VM 116 (00:00:09)

pveversion -v output:

Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-8-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-8
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20241112.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1

This issue occurred this morning, and then again at 4pm EST. It's slowly getting worse: it used to occur maybe once or twice every month or two, then weekly, then daily, and now it appears to be happening twice a day. It's rendering automated backups entirely worthless since I have to babysit the server.
 
Hello. When I see:
snapshot 'vzdump' was not (fully) removed - zfs error: cannot destroy snapshot rpool/data/subvol-116-disk-0@vzdump: dataset is busy
it seems to me that the filesystem is overly busy.
What does the IO wait graph (MAX) show during the backup?
Maybe you can try to limit the bandwidth of the backup process. And you can try to set "ionice: 8" in /etc/vzdump.conf.
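A minimal sketch of what that could look like in /etc/vzdump.conf (the bwlimit value is only an example; it is given in KiB/s):

Code:
# /etc/vzdump.conf - node-wide vzdump defaults
ionice: 8
# bandwidth limit in KiB/s (example: ~50 MiB/s)
bwlimit: 51200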
 
I think IO delay hits somewhere around 10-15% during the problematic backup. It might very well be due to an overly busy FS - my VM/CT drive and Proxmox boot drive are one and the same.

Any recommendations for what to limit the bandwidth to? These are mirrored SATA III SSDs.
 
Look at "advanced" tab in the backup configuration, and change bandwidth limit (first, try to take 75% of the used bandwidth during backups).
But with containers, I fear that IOs will be the problem (plenty of small files)... So maybe try to set the IO workers to 4 ? I never tried this.
 
Look at "advanced" tab in the backup configuration, and change bandwidth limit (first, try to take 75% of the used bandwidth during backups).
But with containers, I fear that IOs will be the problem (plenty of small files)... So maybe try to set the IO workers to 4 ? I never tried this.
I'll try setting ionice: 8 and 50% bandwidth for now. It looks like backups run at ~100MiB/s, so roughly 50 MiB/s.

The tooltip for IO workers says "I/O workers in the QEMU process (VMs only)" - doesn't this mean setting IO workers to 4 won't do anything for containers?
 
So, setting ionice: 8 and 50% bandwidth absolutely tanks performance. A single VM/CT backup takes 5-6 minutes. Going to see if limiting bandwidth alone is enough.

Edit: ionice doesn't even do anything, actually. Proxmox by default uses Deadline, and ionice is only for CFQ
 
I'll try setting ionice: 8 and 50% bandwidth for now. It looks like backups run at ~100MiB/s, so roughly 50 MiB/s.

The tooltip for IO workers says "I/O workers in the QEMU process (VMs only)" - doesn't this mean setting IO workers to 4 won't do anything for containers?
OK.
Indeed, this won't help you when backing up containers.
 
Bandwidth
So, setting ionice: 8 and 50% bandwidth absolutely tanks performance. A single VM/CT backup takes 5-6 minutes. Going to see if limiting bandwidth alone is enough.

Edit: ionice doesn't even do anything, actually. Proxmox by default uses Deadline, and ionice is only for CFQ
Bandwidth should apply but maybe IOs will still be high.

Why do you think ionice 8 won't do anything? From the documentation: https://pve.proxmox.com/pve-docs/vzdump.1.html :
ionice: <integer> (0 - 8) (default = 7)
Set IO priority when using the BFQ scheduler. For snapshot and suspend mode backups of VMs, this only affects the compressor. A value of 8 means the idle priority is used, otherwise the best-effort priority is used with the specified value.
 
Bandwidth

Bandwidth should apply but maybe IOs will still be high.

Why do you think ionice 8 won't do anything? From the documentation: https://pve.proxmox.com/pve-docs/vzdump.1.html :
ionice: <integer> (0 - 8) (default = 7)
Set IO priority when using the BFQ scheduler. For snapshot and suspend mode backups of VMs, this only affects the compressor. A value of 8 means the idle priority is used, otherwise the best-effort priority is used with the specified value.
Note that the BFQ scheduler is often not used; you can check, e.g. for /dev/sda, with cat /sys/block/sda/queue/scheduler.
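For example (sda is just a placeholder device; the scheduler shown in square brackets is the active one):

Code:
cat /sys/block/sda/queue/scheduler
# output lists the available schedulers, e.g.: none [mq-deadline] kyber bfq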
 
So Unspec, maybe you should try changing the scheduler algorithm to BFQ?

Code:
nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... elevator=bfq"

update-grub
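As a side note, on recent kernels the elevator= boot parameter may be ignored, so a quick, non-persistent way to test BFQ is switching the scheduler at runtime via sysfs (sda is a placeholder; this does not survive a reboot):

Code:
# if bfq is not listed as available, load the module first: modprobe bfq
# switch /dev/sda to bfq for testing (non-persistent)
echo bfq > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler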
 