Snapshot function does not clean up properly?

Ahmet Bas

Hello,

I came across a strange issue. On a few servers we are unable to create or delete snapshots. In most cases snapshots work, but the deletion makes the VM unavailable until the server gets a hard reset. Below is some info.

VM config
Code:
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0
cores: 4
cpu: Haswell-noTSX,flags=+aes
cpuunits: 1000
ide2: none,media=cdrom
memory: 8192
name: example
net0: virtio=da:68:a7:22:54:7c,bridge=vmbrxxx,firewall=1,rate=125
numa: 0
onboot: 1
ostype: l26
scsi0: data:5800/vm-5800-disk-0.qcow2,cache=writeback,discard=on,size=400G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=473f7c95-87d4-4758-816f-7881dc119a38
sockets: 1
vmgenid: f65dcd1c-41ab-463a-acb8-d4cce14114f5

QEMU is running:

Code:
qm status 5800 --verbose | grep running-qemu
running-qemu: 7.2.0

Snapshots
Code:
# qm listsnapshot 5800
`-> current                                             You are here!

If I check it with the following command, it shows this:
Code:
# qemu-img snapshot -l /mnt/data/images/5800/vm-5800-disk-0.qcow2
Snapshot list:
ID        TAG               VM SIZE                DATE     VM CLOCK     ICOUNT
1         Voorderename          0 B 2024-01-08 19:01:32 02:18:30.527

But when I try to force-delete the snapshot, it seems that there is no snapshot:
Code:
~# qm delsnapshot 5800 Voorderename --force
snapshot 'Voorderename' does not exist

We have assigned 400 GB to the VM, but when I check the host I see this:
Code:
--- /mnt/data/images/5800 ----------------------------------------------------------------------------------------------------------------------------------
  529.5 GiB [##########]  vm-5800-disk-0.qcow2

When I create a Proxmox backup, the actual usage is:
Code:
INFO: 100% (400.0 GiB of 400.0 GiB) in 42m 39s, read: 170.9 MiB/s, write: 91.4 MiB/s
INFO: backup is sparse: 182.67 GiB (45%) total zero data
INFO: transferred 400.00 GiB in 2559 seconds (160.1 MiB/s)
INFO: archive file size: 172.33GB
INFO: adding notes to backup
INFO: Finished Backup of VM 5800 (00:43:44)
INFO: Backup finished at 2024-01-08 21:09:52
INFO: Backup job finished successfully
TASK OK

The disk usage inside the VM is around 232 GB.

So there are multiple questions:
1. Why is the actual usage on the host bigger than the assigned storage? (Running trim inside the VM did not help; see the qemu-img info sketch below.)
2. Why does it fail to create/delete snapshots?
3. Based on the usage, could it be that there is a snapshot left over?
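
For reference, the assigned size and the real on-disk allocation can be compared with qemu-img info against the same image path as above (a sketch of the check, not actual output from this host):
Code:
# "virtual size" is what is assigned to the VM (400G here),
# "disk size" is what the qcow2 file actually occupies on the host,
# and "Snapshot list" shows any internal snapshots still stored in the image
qemu-img info /mnt/data/images/5800/vm-5800-disk-0.qcow2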
 
Hi,
is this an NFS storage? Unfortunately, qcow2 snapshots (and deletion) are implemented synchronously, so the VM has to be blocked until the storage part of the snapshot is created/removed. And with NFS+qcow2 that might take a while. It's better to only remove snapshots while the VM is shut down if you face such issues.

While the VM is shut down, you can remove the left-over snapshot with qemu-img snapshot -d Voorderename /mnt/data/images/5800/vm-5800-disk-0.qcow2
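
Roughly, the full cleanup sequence would look like this (a sketch, assuming VMID 5800 and the image path from above):
Code:
# shut down the guest so nothing holds the image open
qm shutdown 5800
# delete the left-over internal snapshot inside the qcow2 image
qemu-img snapshot -d Voorderename /mnt/data/images/5800/vm-5800-disk-0.qcow2
# verify that no internal snapshots remain
qemu-img snapshot -l /mnt/data/images/5800/vm-5800-disk-0.qcow2
# start the guest again
qm start 5800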
 
While the VM is shut down, you can remove the left-over snapshot with qemu-img snapshot -d Voorderename /mnt/data/images/5800/vm-5800-disk-0.qcow2
Thanks for the info. Will this remove all leftovers? After every failed snapshot the disk usage gets bigger and bigger. We have similar setups that don't have this issue; it only happens on a few of the servers.

So in our case, if I understand it right, disabling the freeze will not help?
Code:
freeze-fs-on-backup=0
 
Thanks for the info. Will this remove all leftovers? After every failed snapshot the disk usage gets bigger and bigger.
It will remove the snapshot on the disk. But since the configuration doesn't mention that snapshot anymore, it should be safe to remove. Even if the snapshot operation as a whole failed (e.g. timeout/canceled at a bad time), apparently the disk snapshot does exist, so it does require some space.

So in our case, if I understand it right, disabling the freeze will not help?
Code:
freeze-fs-on-backup=0
No, that option does not affect creating/removing snapshots, only backups. And it doesn't solve the issue of snapshots with qcow2 on slow storage being slow.
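
For completeness, that switch is a sub-option of the agent property in the VM config, so it only controls whether the guest agent freezes the filesystems during backup (a sketch for this VM; snapshot removal is not affected):
Code:
# freeze-fs-on-backup only influences vzdump backups, not snapshots
qm set 5800 --agent enabled=1,fstrim_cloned_disks=1,freeze-fs-on-backup=0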
 
It will remove the snapshot on the disk. But since the configuration doesn't mention that snapshot anymore, it should be safe to remove. Even if the snapshot operation as a whole failed (e.g. timeout/canceled at a bad time), apparently the disk snapshot does exist, so it does require some space.
Is there a way to remove the disk snapshot and reclaim the unused space? One option would be to remove the disk and restore it from the backup, but that is not our preferred way to get rid of the snapshot on the disk.
 
Is there a way to remove the disk snapshot and reclaim the unused space? One option would be to remove the disk and restore it from the backup, but that is not our preferred way to get rid of the snapshot on the disk.
While the VM is shut down, you can remove the left-over snapshot with qemu-img snapshot -d Voorderename /mnt/data/images/5800/vm-5800-disk-0.qcow2
 
Yes, but this will only remove the leftover of "Voorderename" and not the other failed snapshots, right?
 
Hmm, I'm not sure if there are any others on the disk if qemu-img doesn't list any.
 
Hmm, I'm not sure if there are any others on the disk if qemu-img doesn't list any.
Exactly, but the disk image is now around 800 GB and the actual data is < 200 GB. Every time a snapshot fails, it increases.
 
Oh, okay. If you don't have any current snapshots, what you can do is move the image to a different storage (or same storage with raw format if you have enough space) and back again. This can also be done while the VM is running.
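
With qm move-disk that would look roughly like this (a sketch, assuming a second storage named 'otherstore' exists; on current versions the command is also available as qm disk move):
Code:
# copy the disk to another storage; the rewrite leaves the leftover allocations behind
qm move-disk 5800 scsi0 otherstore --format qcow2 --delete 1
# then move it back to the original storage
qm move-disk 5800 scsi0 data --format qcow2 --delete 1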
 
