Snapshot removal "jams" the VM

hac3ru

Hello,

I'm running Jenkins inside a VM on Proxmox VE 7.1.8. When I delete a snapshot, the VM enters a locked state and stops responding to requests. Is this normal behavior and, if so, why does it work this way? In my opinion, removing a snapshot should not freeze the VM until the removal finishes.
P.S. I'm not talking about snapshots taken weeks or months ago. I had just taken a snapshot, updated the OS and Jenkins, and when I removed the snapshot the VM froze for about 1 to 1.5 minutes, until the snapshot was removed.

Any advice is welcome.

Thank you!
 

Dunuin

You didn't tell us what storage you are using or whether the snapshot includes a RAM dump. There is no universal snapshotting. PVE uses the snapshot features of the underlying storage, so depending on whether you are using LVM, qcow2, ZFS, Ceph and so on, you get snapshots that work completely differently, with different features and limitations.
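
Both questions can usually be answered from the VM config alone. A minimal sketch, assuming the config lives at /etc/pve/qemu-server/<vmid>.conf and that snapshots taken with RAM carry a `vmstate:` line in their `[name]` section; `describe_vm_config` is a made-up helper name, not a PVE tool:

```shell
#!/bin/sh
# Hypothetical helper: for each disk line in the main part of a Proxmox VM
# config, print the storage ID and volume, and flag snapshot sections that
# carry a RAM dump (a "vmstate:" line).
# Usage: describe_vm_config /etc/pve/qemu-server/<vmid>.conf
describe_vm_config() {
    awk '
        /^\[/ { section = $0; next }          # entering a [section], e.g. a snapshot
        section == "" && /^(scsi|virtio|sata|ide|efidisk)[0-9]+: / {
            split($2, vol, ",")               # drop the options after the first comma
            split(vol[1], parts, ":")         # the storage ID is the part before the colon
            print $1 " storage=" parts[1] " volume=" parts[2]
        }
        section != "" && /^vmstate: / { print "snapshot " section " includes RAM state" }
    ' "$1"
}
```

The storage type behind a storage ID (NFS, LVM, ZFS and so on) can then be looked up in the Type column of `pvesm status`.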
 
Jul 18, 2019
18
0
6
44
Hello,

I have what looks to be the exact same issue.

Usually, when we create a snapshot before doing some updates in a VM and then remove it once everything has gone well, the removal takes a few seconds.

Today I did the same routine again but when deleting the snapshot, the VM froze for 2 minutes until the snapshot was finally removed.

I found this strange so I did a test on another VM and the issue was even worse. The VM froze until the snapshot delete command timed out.
As the snapshot delete command timed out the snapshot was still listed and the VM was in a locked state.

I was able to unlock it, but when I tried to delete the snapshot again, it reported that an error occurred and the VM became locked again.
When I try to remove the snapshot from the command line, qemu reports that the snapshot doesn't exist in the related qcow2 file.

But in the <vmid>.conf there was still an entry for the snapshot, so I cleaned it manually and everything went back to normal.
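
For a case like this, it can help to list which snapshots the config still records and compare that against what the qcow2 image itself reports. A rough sketch, assuming the usual /etc/pve/qemu-server/<vmid>.conf layout where each snapshot is a `[name]` section; `list_config_snapshots` is a hypothetical helper, not part of PVE:

```shell
#!/bin/sh
# Hypothetical helper: print the snapshot names recorded in a Proxmox VM config.
# Snapshot sections are header lines of the form "[name]". The [PENDING]
# section holds pending config changes, not a snapshot, so it is skipped;
# any other non-snapshot section would need the same treatment.
# Usage: list_config_snapshots /etc/pve/qemu-server/<vmid>.conf
list_config_snapshots() {
    awk '
        /^\[.*\]$/ {
            name = substr($0, 2, length($0) - 2)   # strip the surrounding brackets
            if (name != "PENDING") print name
        }
    ' "$1"
}
```

Comparing that list with the output of `qemu-img snapshot -l` on the disk image shows entries that exist only in the config. `qm delsnapshot <vmid> <snapname> --force` is meant to remove such an entry from the config even when removing the disk snapshot fails, which may be safer than editing the file by hand.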

It looks like there is something wrong with snapshots in the latest releases.

The storage is an NFS share on a fast Huawei SAN, and we never had such an issue with snapshots before.

Sample VM config:

Code:
agent: 1
bios: ovmf
boot: order=scsi0
cores: 12
cpu: SandyBridge
efidisk0: Huawei_NFS:121/vm-121-disk-0.qcow2,size=128K
machine: q35
memory: 12228
name: es-node03
net0: virtio=5E:32:B0:13:AB:50,bridge=vmbr0,tag=10
numa: 1
ostype: l26
protection: 1
scsi0: Huawei_NFS:121/vm-121-disk-1.qcow2,discard=on,iothread=1,size=40G,ssd=1
scsi1: Huawei_NFS:121/vm-121-disk-2.qcow2,discard=on,iothread=1,size=300G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=5c3d90f9-90cf-4761-9c7e-f3aa7d1ae38a
sockets: 1
tablet: 0
vga: qxl
vmgenid: 44b49263-be54-46d4-b5ef-953f8f80b3f7


Quick and dirty write test on the storage:

Code:
root@pm04:/mnt/pve/Huawei_NFS# dd if=/dev/zero of=test bs=4k oflag=direct count=1000
1000+0 records in
1000+0 records out
4096000 bytes (4.1 MB, 3.9 MiB) copied, 0.244436 s, 16.8 MB/s

Code:
root@pm04:/mnt/pve/Huawei_NFS# dd if=/dev/zero of=test bs=1024k oflag=direct count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 2.03062 s, 516 MB/s
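
As a back-of-the-envelope reading of the two runs above, the dd figures can be turned into an average time per write. This only restates the numbers already shown; `per_write_stats` is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: average latency and rate per write for a dd run.
# Arguments: number of records written, elapsed seconds reported by dd.
per_write_stats() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        printf "%.3f ms per write, %.0f writes/s\n", t / n * 1000, n / t
    }'
}

per_write_stats 1000 0.244436   # the bs=4k run
per_write_stats 1000 2.03062    # the bs=1024k run
```

That works out to roughly 0.24 ms per 4 KiB direct write (about 4,100 writes/s) and roughly 2 ms per 1 MiB write. It gives a feel for small synchronous write latency on the share but does not by itself explain the freeze.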

Code:
pve-manager 7.2-5
Linux pm04 5.15.35-3-pve #1 SMP PVE 5.15.35-6 (Fri, 17 Jun 2022 13:42:35 +0200) x86_64 GNU/Linux

Any ideas? :)
 
