qcow2 snapshot creation takes too long

ThomasK

Member
May 2, 2016
We have a Proxmox 6.1-5 server running a VM with 3 hard disks configured. The virtual disks are all qcow2 files hosted on the local filesystem. VM performance is OK, even under heavy load.

My problem is that it takes ages to create a snapshot of this VM, even if the VM is powered off. The "/usr/bin/qemu-img snapshot -c ..." process runs for minutes (> 10min) for every qcow2 file of the VM. One of the qcow2 files is ~1.5TB in size, and its snapshot was not finished after 15min. Once again: the VM is powered down...
While creating the snapshot I see heavy read and write activity on the local disks (iostat...).

This is not the first time I have seen these extremely long snapshot creation times; it makes snapshotting almost useless. Is this expected behaviour? How could I improve snapshot creation performance?
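For reference, the slow step can be reproduced and timed in isolation with qemu-img while the VM is off (the image path below is a placeholder for an actual disk file):

```shell
# Time an offline internal-snapshot creation on a single qcow2 image
# (the path is a placeholder for your actual disk file).
time qemu-img snapshot -c test-snap /var/lib/vz/images/103/vm-103-disk-0.qcow2

# List the internal snapshots now stored inside the image
qemu-img snapshot -l /var/lib/vz/images/103/vm-103-disk-0.qcow2

# Time deleting the test snapshot again
time qemu-img snapshot -d test-snap /var/lib/vz/images/103/vm-103-disk-0.qcow2
```

Comparing the create and delete timings per image separates qemu-img's own behaviour from anything Proxmox adds on top.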

Regards
Thomas
 

Pifouney

Member
Oct 17, 2021
Hi,

I need some information:

Which host filesystem are you using?

How is the VM disk storage declared?

How many snapshots do you have (snapshots only, not backups stored as snapshots)?

Best regards,
 

ThomasK

Member
May 2, 2016
Hi,

thanks for answering.
The host filesystems are ZFS (for 2 of the qcow2 disks) and ext4 (for the 1.5TB qcow2). The snapshot of the 1.5TB qcow2 eventually finished, but it took ~20min. The 2 smaller qcow2 disks on the local ZFS store took ~3min each.

There were no other snapshots on this VM.

The VM stores are of type "directory" (2 different mountpoints).
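One property worth checking on an image this large is the qcow2 cluster size, since internal snapshots have to rewrite per-cluster refcount metadata and a 1.5TB image at the 64k default carries a lot of it. A sketch, with placeholder paths:

```shell
# Show image details, including cluster_size and the snapshot list
# (the path is a placeholder for your actual disk file).
qemu-img info /var/lib/vz/images/103/vm-103-disk-1.qcow2

# A larger cluster size (e.g. 1M instead of the 64k default) reduces the
# refcount metadata qemu-img must touch per snapshot; changing it requires
# converting the image to a new file while the VM is off:
qemu-img convert -O qcow2 -o cluster_size=1048576 \
    /var/lib/vz/images/103/vm-103-disk-1.qcow2 \
    /var/lib/vz/images/103/vm-103-disk-1-new.qcow2
```

Whether this helps in your case is an assumption to verify by timing a snapshot on the converted copy before swapping it in.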

Thanks and regards
Thomas
 

ThomasK

Member
May 2, 2016
Some more details:
This also happens on Proxmox 6.4-13, and there the problem is even worse: not only does creating a snapshot take too long, but deleting a snapshot also takes minutes and locks the VM completely (the VM cannot perform IO to its disks), so it eventually crashes.

The log looks like this:

Nov 10 09:07:21 auriga pvedaemon[3996]: <root@pam> delete snapshot VM 103: update
Nov 10 09:07:21 auriga pvedaemon[1352]: <root@pam> starting task UPID:auriga:00000F9C:0ECD885A:618B7DB9:qmdelsnapshot:103:root@pam:
Nov 10 09:07:36 auriga pvedaemon[65109]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:38 auriga pvestatd[1659]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:39 auriga pvestatd[1659]: status update time (6.397 seconds)
Nov 10 09:07:48 auriga pvestatd[1659]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:48 auriga pvestatd[1659]: status update time (6.287 seconds)
Nov 10 09:07:55 auriga pvedaemon[65109]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:58 auriga pvestatd[1659]: VM 103 qmp command failed - VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries
Nov 10 09:07:58 auriga pvestatd[1659]: status update time (6.252 seconds)
...

Interestingly, qemu-img info reported no snapshots right after I issued the "delete snapshot" operation through the GUI. However, there was quite heavy IO going on for minutes on the store where the qcow2 files are located.

I had to stop the VM (through the GUI) and manually delete the snapshot entry from the VM config. Then I could power up the VM again, and everything seems to be running fine now...

This does not happen for all VMs, though! It seems to affect only "larger" VMs (those with large qcow2 images) under heavy IO... How could I debug this issue? It is not too hard for me to trigger the problem...
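Since the problem is reproducible, one way to see where the time goes is to watch device-level IO and the syscall profile of the process doing the work while a snapshot delete runs (assuming standard sysstat/strace tooling; `<pid>` is a placeholder for the qemu-img or kvm process ID found via ps):

```shell
# Per-device throughput and utilisation, refreshed every 2 seconds,
# while the snapshot operation is in flight
iostat -xm 2

# Attach to the process doing the snapshot work and summarise its
# syscalls on detach (Ctrl-C); replace <pid> with the real process ID
strace -c -f -p <pid>
```

If strace shows the time dominated by small reads/writes, that points at qcow2 metadata churn rather than Proxmox's management layer.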

Regards
Thomas
 

Dunuin

Famous Member
Jun 30, 2020
Shouldn't qcow2 on top of ZFS be slow anyway? You are running CoW on top of CoW, with an additional filesystem in between, so the overhead should be far greater than with ZFS using zvols and "raw" disks.
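For the ZFS-backed disks, the alternative Dunuin describes can be sketched like this (pool, dataset, and snapshot names are placeholders; in Proxmox this is normally set up by declaring a "zfspool" storage rather than by hand):

```shell
# Create a zvol to hold a raw VM disk (names are placeholders)
zfs create -V 100G rpool/data/vm-103-disk-0

# Snapshots are then ZFS snapshots, which are near-instant regardless of
# disk size, instead of qcow2 internal snapshots:
zfs snapshot rpool/data/vm-103-disk-0@before-upgrade
zfs destroy rpool/data/vm-103-disk-0@before-upgrade
```

The trade-off is that ZFS snapshots belong to the storage layer, so this only applies to the two disks on ZFS, not the 1.5TB image on ext4.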
 

ThomasK

Member
May 2, 2016
Hi Dunuin,

you might be right that the setup is suboptimal from a performance point of view. However, the performance of the VM is just fine, even under heavy IO load. It is only that VM snapshots are no longer really working for me, and this was never a problem before I started upgrading to the Proxmox 6.x versions.

I wonder if this is related to the problem described in this thread: https://forum.proxmox.com/threads/qmp-command-query-proxmox-support-failed.85564/ (but there is no PBS involved in my setup)
The error "VM 103 qmp command 'query-proxmox-support' failed - unable to connect to VM 103 qmp socket - timeout after 31 retries" seems similar...

Regards
Thomas
 
