Snapshot crashed VMs

bryambalan

Member
Jul 25, 2020
Scenario:

Two clustered servers, with Storage-ZFS replicating between them.

Problem:
When taking a snapshot of VM 108, several other VMs crashed, and I had to reset them to bring them back.
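
For reference, the snapshot was taken from the web UI; the equivalent CLI call (a sketch only, using the snapshot name that appears in the task log below) would be something like:
Code:
# hypothetical CLI equivalent of the web UI snapshot action
qm snapshot 108 snapshotmade4it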


Error log:
Jun 19 09:24:25 pve02 kernel: [10160562.899184] debugfs: Directory 'zd112' with parent 'block' already present!


Logs around that time:
Jun 19 09:18:42 pve02 pvedaemon[8291]: starting 1 worker(s)
Jun 19 09:18:42 pve02 pvedaemon[8291]: worker 1409810 started
Jun 19 09:21:26 pve02 pvedaemon[387952]: <root@pam> successful auth for user 'root@pam'
Jun 19 09:22:18 pve02 pvedaemon[1269805]: <root@pam> starting task UPID:pve02:00169815:3C8FA0B7:6490487A:vncproxy:108:root@pam:
Jun 19 09:22:18 pve02 pvedaemon[1480725]: starting vnc proxy UPID:pve02:00169815:3C8FA0B7:6490487A:vncproxy:108:root@pam:
Jun 19 09:24:12 pve02 pvedaemon[1269805]: <root@pam> end task UPID:pve02:00169815:3C8FA0B7:6490487A:vncproxy:108:root@pam: OK
Jun 19 09:24:14 pve02 pvedaemon[1515422]: starting vnc proxy UPID:pve02:00171F9E:3C8FCDE0:649048EE:vncproxy:108:root@pam:
Jun 19 09:24:14 pve02 pvedaemon[1269805]: <root@pam> starting task UPID:pve02:00171F9E:3C8FCDE0:649048EE:vncproxy:108:root@pam:
Jun 19 09:24:16 pve02 pveproxy[3771496]: worker exit
Jun 19 09:24:16 pve02 pveproxy[8300]: worker 3771496 finished
Jun 19 09:24:16 pve02 pveproxy[8300]: starting 1 worker(s)
Jun 19 09:24:16 pve02 pveproxy[8300]: worker 1515672 started
Jun 19 09:24:21 pve02 pvedaemon[1517070]: <root@pam> snapshot VM 108: snapshotmade4it
Jun 19 09:24:21 pve02 pvedaemon[387952]: <root@pam> starting task UPID:pve02:0017260E:3C8FD0B9:649048F5:qmsnapshot:108:root@pam:
Jun 19 09:24:25 pve02 kernel: [10160562.899184] debugfs: Directory 'zd112' with parent 'block' already present!
Jun 19 09:25:22 pve02 pveproxy[3847103]: worker exit
Jun 19 09:25:22 pve02 pveproxy[8300]: worker 3847103 finished
Jun 19 09:25:22 pve02 pveproxy[8300]: starting 1 worker(s)
Jun 19 09:25:22 pve02 pveproxy[8300]: worker 1621163 started
Jun 19 09:27:07 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - got timeout
Jun 19 09:27:12 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - got timeout
Jun 19 09:27:17 pve02 pvestatd[8238]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - got timeout
Jun 19 09:27:17 pve02 pvestatd[8238]: status update time (18.683 seconds)
Jun 19 09:27:28 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - unable to connect to VM 115 qmp socket - timeout after 51 retries
Jun 19 09:27:34 pve02 pvestatd[8238]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 51 retries
Jun 19 09:27:39 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 51 retries
Jun 19 09:27:39 pve02 pvestatd[8238]: status update time (21.753 seconds)
Jun 19 09:27:48 pve02 pvedaemon[1269805]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Jun 19 09:27:53 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 51 retries
Jun 19 09:27:58 pve02 pvestatd[8238]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 51 retries
Jun 19 09:28:03 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - unable to connect to VM 115 qmp socket - timeout after 51 retries
Jun 19 09:28:04 pve02 pvestatd[8238]: status update time (24.777 seconds)

Could this be related to taking a snapshot of a production VM, and it affecting all the other VMs on the same Storage-ZFS?
 
Do you have anything installed that automatically mounts devices?
This sounds like something is actually mounting your ZVols, including snapshots:
Code:
 Jun 19 09:24:25 pve02 kernel: [10160562.899184] debugfs: Directory 'zd112' with parent 'block' already present!

udisks2 is one such daemon that tends to interfere with normal usage, but there may be others.
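
A quick way to check (a generic sketch, not an official procedure) is to see whether udisks2 or a similar automounter is installed on the node, and stop it if it is not needed:
Code:
# check whether udisks2 is installed
dpkg -l | grep udisks2
# if it is present and not needed, stop and disable it
systemctl disable --now udisks2.service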
 
Negative, all configuration was done only through the Proxmox front end.


From what I understand:

We cannot snapshot a VM that is being replicated via ZFS.

The ZFS replication already creates its own snapshot, which is not visible in the front end and is used only to manage the replication (see the listing sketch below).

And when we create a snapshot via the front end, it "apparently" causes a conflict and puts heavy load on Storage-ZFS.
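
The hidden replication snapshots can be listed on the host to confirm this (a sketch; Proxmox storage replication normally names them __replicate_<vmid>-<job>_<timestamp>__, and the grep pattern assumes the disks are named vm-108-disk-*):
Code:
# list all ZFS snapshots belonging to VM 108's disks;
# replication snapshots should show up as __replicate_108-..._...__
zfs list -t snapshot -o name,used,creation | grep vm-108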
 
