The other day, I had to do some updates on several VMs across our network and wanted to take snapshots including RAM beforehand. The machines run on different PVE clusters, each consisting of three nodes with PVE 7.2-4 and hyperconverged ceph 16.2.9 underneath. All VMs have 32GB of RAM. When the snapshot tasks of the machines with high RAM utilization didn't complete after some time, I took a look at the logs and saw that they were still trying to save their RAM, with little to no progress over several minutes. A look at the ceph logs revealed that at some point during this process, IOPS went up to about 3000 while write speed dropped to ~3MB/s. When I stopped the snapshot tasks, the VMs crashed, leaving them stopped and locked.
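For reference, the CLI equivalent of such a snapshot including RAM should be something like this (VM ID and snapshot name are just placeholders):

qm snapshot <vmid> pre-update --vmstate 1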
Although the snapshot processes never finished, the snapshots would still be listed, but trying to remove them through the GUI didn't work. The only way to remove the faulty snapshots was through the CLI with
qm delsnapshot <vmid> <snapshotname> --force
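Since the crashed VMs are also left locked, the lock presumably has to be cleared as well before they can be started again, e.g.:

qm unlock <vmid>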
So, I'm facing two major issues at the moment:
1. Stopping a snapshot process under PVE 7.2-4 ends up crashing and locking the VM.
2. While snapshotting the RAM of a VM, ceph starts to underperform heavily after some time.
I did some testing with another PVE cluster in our network, still running PVE 6.4 and ceph 14.2.22, and didn't encounter either of those issues. Furthermore, I was unable to replicate issue #2 with any storage backend other than ceph.
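In case someone wants to check issue #2 on their own setup, a minimal repro attempt could look roughly like this (tool, VM ID and sizes are placeholders, not our exact setup): keep a large part of the guest RAM busy, take a RAM snapshot, and watch the ceph client IO while the vmstate is being written.

# inside the guest: keep ~24G of RAM allocated and constantly re-dirtied
stress-ng --vm 4 --vm-bytes 24G --vm-keep

# on the PVE node, in a second shell: snapshot including RAM, same call as above
qm snapshot <vmid> ramtest --vmstate 1

# on any ceph node: watch client IO / write throughput during the snapshot
watch -n 2 ceph -s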