qcow2 snapshot creation takes too long

ThomasK

Member
May 2, 2016
We have a Proxmox 6.1-5 server running a VM with 3 hard disks configured. The virtual disks are all qcow2 files hosted on the local filesystem. VM performance is OK, even under heavy load.

My problem is that it takes ages to create a snapshot of this VM, even if the VM is powered off. The "/usr/bin/qemu-img snapshot -c ..." process runs for minutes (> 10min) for every qcow2 file of the VM. One of the qcow2 files is ~1.5TB in size, and its snapshot was not finished after 15min. Once again: the VM is powered down...
While creating the snapshot I see heavy read and write activity on the local disks (iostat...).

This is not the first time I have seen these extremely long snapshot creation times; it makes snapshotting almost useless. Is this expected behaviour? How could I improve snapshot creation performance?
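For reference, the slow step can be reproduced and timed in isolation with qemu-img while the VM is off (the image path below is a placeholder for an actual disk file):

```shell
# Time an offline internal-snapshot creation on a single qcow2 image
# (the path is a placeholder for your actual disk file).
time qemu-img snapshot -c test-snap /var/lib/vz/images/103/vm-103-disk-0.qcow2

# List the internal snapshots now stored inside the image
qemu-img snapshot -l /var/lib/vz/images/103/vm-103-disk-0.qcow2

# Time deleting the test snapshot again
time qemu-img snapshot -d test-snap /var/lib/vz/images/103/vm-103-disk-0.qcow2
```

Comparing the create and delete timings per image separates qemu-img's own behaviour from anything Proxmox adds on top.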

Regards
Thomas
 

Pifouney

Member
Oct 17, 2021
Hi,

I need some information:

Which host filesystem are you using?

How is the VM disk storage declared?

How many snapshots do you have (snapshots only, not backups stored as snapshots)?

Best regards,
 

ThomasK

Member
May 2, 2016
Hi,

thanks for answering.
The host filesystems are ZFS (for 2 of the qcow2 disks) and ext4 (for the 1.5TB qcow2). The snapshot of the 1.5TB qcow2 eventually finished, but it took ~20min. The 2 smaller qcow2 disks on the local ZFS store took ~3min each.

There were no other snapshots on this VM.

The VM stores are of type "directory" (2 different mountpoints).
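One property worth checking on an image this large is the qcow2 cluster size, since internal snapshots have to rewrite per-cluster refcount metadata and a 1.5TB image at the 64k default carries a lot of it. A sketch, with placeholder paths:

```shell
# Show image details, including cluster_size and the snapshot list
# (the path is a placeholder for your actual disk file).
qemu-img info /var/lib/vz/images/103/vm-103-disk-1.qcow2

# A larger cluster size (e.g. 1M instead of the 64k default) reduces the
# refcount metadata qemu-img must touch per snapshot; changing it requires
# converting the image to a new file while the VM is off:
qemu-img convert -O qcow2 -o cluster_size=1048576 \
    /var/lib/vz/images/103/vm-103-disk-1.qcow2 \
    /var/lib/vz/images/103/vm-103-disk-1-new.qcow2
```

Whether this helps in your case is an assumption to verify by timing a snapshot on the converted copy before swapping it in.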

Thanks and regards
Thomas
 

ThomasK

Member
May 2, 2016
Some more details:
This also happens on Proxmox 6.4-13, and there the problem is even worse: not only does creating a snapshot take too long, but deleting a snapshot also takes minutes and locks the VM completely (the VM cannot perform IO to its disks), so it eventually crashes.

The log looks like this:

Nov 10 09:07:21 auriga pvedaemon[3996]: <root@pam> delete snapshot VM 103: update
Nov 10 09:07:21 auriga pvedaemon[1352]: <root@pam> starting task UPID:auriga:00000F9C:0ECD885A:618B7DB9:qmdelsnapshot:103:root@pam:
Nov 10 09:07:36 auriga pvedaemon[65109]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:38 auriga pvestatd[1659]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:39 auriga pvestatd[1659]: status update time (6.397 seconds)
Nov 10 09:07:48 auriga pvestatd[1659]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:48 auriga pvestatd[1659]: status update time (6.287 seconds)
Nov 10 09:07:55 auriga pvedaemon[65109]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:58 auriga pvestatd[1659]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:58 auriga pvestatd[1659]: status update time (6.252 seconds)
...

Interestingly, qemu-img info reported no snapshots right after I issued the "delete snapshot" operation through the GUI. However, there was quite heavy IO going on for minutes on the store where the qcow2 files are located.

I had to stop the VM (through the GUI) and manually delete the snapshot entry from the VM config. Then I could power up the VM again, and everything seems to be running fine now...

This does not happen for all VMs, though! It seems to affect only "larger" VMs (those with large qcow2 images) under heavy IO... How could I debug this issue? It is not too hard for me to trigger the problem...
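Since the problem is reproducible, one way to see where the time goes is to watch device-level IO and the syscall profile of the process doing the work while a snapshot delete runs (assuming standard sysstat/strace tooling; `<pid>` is a placeholder for the qemu-img or kvm process ID found via ps):

```shell
# Per-device throughput and utilisation, refreshed every 2 seconds,
# while the snapshot operation is in flight
iostat -xm 2

# Attach to the process doing the snapshot work and summarise its
# syscalls on detach (Ctrl-C); replace <pid> with the real process ID
strace -c -f -p <pid>
```

If strace shows the time dominated by small reads/writes, that points at qcow2 metadata churn rather than Proxmox's management layer.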

Regards
Thomas
 

Dunuin

Famous Member
Jun 30, 2020
Shouldn't qcow2 on top of ZFS be slow anyway? You are running CoW on top of CoW, with an additional filesystem in between, so the overhead should be far greater than with ZFS using zvols and "raw" disks.
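For the ZFS-backed disks, the alternative Dunuin describes can be sketched like this (pool, dataset, and snapshot names are placeholders; in Proxmox this is normally set up by declaring a "zfspool" storage rather than by hand):

```shell
# Create a zvol to hold a raw VM disk (names are placeholders)
zfs create -V 100G rpool/data/vm-103-disk-0

# Snapshots are then ZFS snapshots, which are near-instant regardless of
# disk size, instead of qcow2 internal snapshots:
zfs snapshot rpool/data/vm-103-disk-0@before-upgrade
zfs destroy rpool/data/vm-103-disk-0@before-upgrade
```

The trade-off is that ZFS snapshots belong to the storage layer, so this only applies to the two disks on ZFS, not the 1.5TB image on ext4.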
 

ThomasK

Member
May 2, 2016
Hi Dunuin,

you might be right that the setup is suboptimal from a performance point of view. However, the performance of the VM is just fine, even under heavy IO load. It is only that VM snapshots are no longer really working for me, and this was never a problem before I started upgrading to the Proxmox 6.x versions.

I wonder if this is related to the problem described in this thread: https://forum.proxmox.com/threads/qmp-command-query-proxmox-support-failed.85564/ (but there is no PBS involved in my setup)
The error "VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries" seems similar...

Regards
Thomas
 
