Snapshot Crash VMs

bryambalan · Jun 19, 2023

Scenario:

Two clustered servers, with ZFS storage replicating between them.

Problem:
When I took a snapshot of VM 108, several VMs crashed and I had to reset them to bring them back.

Error log:
Jun 19 09:24:25 pve02 kernel: [10160562.899184] debugfs: Directory 'zd112' with parent 'block' already present!

Logs from that time:
Jun 19 09:18:42 pve02 pvedaemon[8291]: starting 1 worker(s)
Jun 19 09:18:42 pve02 pvedaemon[8291]: worker 1409810 started
Jun 19 09:21:26 pve02 pvedaemon[387952]: <root@pam> successful auth for user 'root@pam'
Jun 19 09:22:18 pve02 pvedaemon[1269805]: <root@pam> starting task UPID:pve02:00169815:3C8FA0B7:6490487A:vncproxy:108:root@pam:
Jun 19 09:22:18 pve02 pvedaemon[1480725]: starting vnc proxy UPID:pve02:00169815:3C8FA0B7:6490487A:vncproxy:108:root@pam:
Jun 19 09:24:12 pve02 pvedaemon[1269805]: <root@pam> end task UPID:pve02:00169815:3C8FA0B7:6490487A:vncproxy:108:root@pam: OK
Jun 19 09:24:14 pve02 pvedaemon[1515422]: starting vnc proxy UPID:pve02:00171F9E:3C8FCDE0:649048EE:vncproxy:108:root@pam:
Jun 19 09:24:14 pve02 pvedaemon[1269805]: <root@pam> starting task UPID:pve02:00171F9E:3C8FCDE0:649048EE:vncproxy:108:root@pam:
Jun 19 09:24:16 pve02 pveproxy[3771496]: worker exit
Jun 19 09:24:16 pve02 pveproxy[8300]: worker 3771496 finished
Jun 19 09:24:16 pve02 pveproxy[8300]: starting 1 worker(s)
Jun 19 09:24:16 pve02 pveproxy[8300]: worker 1515672 started
Jun 19 09:24:21 pve02 pvedaemon[1517070]: <root@pam> snapshot VM 108: snapshotmade4it
Jun 19 09:24:21 pve02 pvedaemon[387952]: <root@pam> starting task UPID:pve02:0017260E:3C8FD0B9:649048F5:qmsnapshot:108:root@pam:
Jun 19 09:24:25 pve02 kernel: [10160562.899184] debugfs: Directory 'zd112' with parent 'block' already present!
Jun 19 09:25:22 pve02 pveproxy[3847103]: worker exit
Jun 19 09:25:22 pve02 pveproxy[8300]: worker 3847103 finished
Jun 19 09:25:22 pve02 pveproxy[8300]: starting 1 worker(s)
Jun 19 09:25:22 pve02 pveproxy[8300]: worker 1621163 started
Jun 19 09:27:07 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - got timeout
Jun 19 09:27:12 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - got timeout
Jun 19 09:27:17 pve02 pvestatd[8238]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - got timeout
Jun 19 09:27:17 pve02 pvestatd[8238]: status update time (18.683 seconds)
Jun 19 09:27:28 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - unable to connect to VM 115 qmp socket - timeout after 51 retries
Jun 19 09:27:34 pve02 pvestatd[8238]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 51 retries
Jun 19 09:27:39 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 51 retries
Jun 19 09:27:39 pve02 pvestatd[8238]: status update time (21.753 seconds)
Jun 19 09:27:48 pve02 pvedaemon[1269805]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries
Jun 19 09:27:53 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - unable to connect to VM 106 qmp socket - timeout after 51 retries
Jun 19 09:27:58 pve02 pvestatd[8238]: VM 104 qmp command failed - VM 104 qmp command 'query-proxmox-support' failed - unable to connect to VM 104 qmp socket - timeout after 51 retries
Jun 19 09:28:03 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - unable to connect to VM 115 qmp socket - timeout after 51 retries
Jun 19 09:28:04 pve02 pvestatd[8238]: status update time (24.777 seconds)
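To line the snapshot task up with the QMP timeouts, a small Python sketch can decode the task UPID and collect the VMs whose QMP socket stopped answering. It assumes the UPID layout `UPID:<node>:<pid>:<pstart>:<starttime>:<type>:<id>:<user>:` with hexadecimal PID and start time; `decode_upid` and `qmp_failures` are hypothetical helper names, not part of any Proxmox tool.

```python
import re
from datetime import datetime, timezone

# Assumed UPID layout (hex PID, process start, and start time):
# UPID:<node>:<pid>:<pstart>:<starttime>:<type>:<id>:<user>:
UPID_RE = re.compile(
    r"UPID:(?P<node>[^:]+):(?P<pid>[0-9A-F]{8}):(?P<pstart>[0-9A-F]{8}):"
    r"(?P<start>[0-9A-F]{8}):(?P<type>[^:]*):(?P<id>[^:]*):(?P<user>[^:]*):"
)
# Matches the inner "VM <id> qmp command '<cmd>' failed - <reason>" part.
QMP_FAIL_RE = re.compile(r"VM (\d+) qmp command '([^']+)' failed - (.+)$")


def decode_upid(upid: str) -> dict:
    """Split a UPID into its fields, converting the hex PID and start time."""
    m = UPID_RE.search(upid)
    if m is None:
        raise ValueError(f"not a UPID: {upid!r}")
    return {
        "node": m["node"],
        "pid": int(m["pid"], 16),
        "start_utc": datetime.fromtimestamp(int(m["start"], 16), tz=timezone.utc),
        "type": m["type"],
        "id": m["id"],
        "user": m["user"],
    }


def qmp_failures(lines):
    """Map VM ID -> list of (syslog timestamp, reason) for failed QMP commands."""
    failures = {}
    for line in lines:
        m = QMP_FAIL_RE.search(line)
        if m:
            vmid, _cmd, reason = m.groups()
            # The first 15 characters are the syslog stamp, e.g. "Jun 19 09:27:07".
            failures.setdefault(int(vmid), []).append((line[:15], reason))
    return failures


if __name__ == "__main__":
    task = decode_upid("UPID:pve02:0017260E:3C8FD0B9:649048F5:qmsnapshot:108:root@pam:")
    # prints: 1517070 qmsnapshot 108 2023-06-19T12:24:21+00:00
    print(task["pid"], task["type"], task["id"], task["start_utc"].isoformat())
    log = [
        "Jun 19 09:27:07 pve02 pvestatd[8238]: VM 106 qmp command failed - VM 106 qmp command 'query-proxmox-support' failed - got timeout",
        "Jun 19 09:27:12 pve02 pvestatd[8238]: VM 115 qmp command failed - VM 115 qmp command 'query-proxmox-support' failed - got timeout",
        "Jun 19 09:27:48 pve02 pvedaemon[1269805]: VM 108 qmp command failed - VM 108 qmp command 'query-proxmox-support' failed - unable to connect to VM 108 qmp socket - timeout after 51 retries",
    ]
    print(sorted(qmp_failures(log)))  # prints: [106, 108, 115]
```

For the qmsnapshot UPID, the decoded PID is 1517070, which matches `pvedaemon[1517070]`, and the start time decodes to 12:24:21 UTC; that lines up with the 09:24:21 syslog stamp if the host clock is UTC-3 (an assumption). The first QMP timeout follows at 09:27:07, under three minutes later.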

Could taking a snapshot of a running production VM affect all the VMs on the same ZFS storage?

mira (Proxmox Staff Member) · Jun 19, 2023

Do you have anything installed that automatically mounts devices?
This sounds like something is actually mounting your ZVols, including snapshots:

Code:

 Jun 19 09:24:25 pve02 kernel: [10160562.899184] debugfs: Directory 'zd112' with parent 'block' already present!

udisks2 is one such daemon that likes to interfere with normal usage, but there may be others.
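To see which zvol the kernel message refers to, and whether udisks2 is running, a Linux-only Python sketch can resolve the `/dev/zvol` symlinks and scan `/proc/<pid>/comm`. It assumes the usual udev layout, where each `/dev/zvol/<pool>/<dataset>` entry is a symlink to a `zdN` device, and that the udisks2 daemon shows up under the process name `udisksd`; `zd_to_zvol` and `running_processes` are hypothetical helper names.

```python
import os
from pathlib import Path


def zd_to_zvol(zvol_root: str = "/dev/zvol") -> dict:
    """Map kernel device names such as 'zd112' to zvol paths under zvol_root.

    Assumes udev's usual layout: every /dev/zvol/<pool>/<dataset> entry is a
    symlink whose target ends in the zdN device name.
    """
    mapping = {}
    root = Path(zvol_root)
    if not root.is_dir():
        return mapping
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            link = Path(dirpath) / name
            if link.is_symlink():
                device = os.path.basename(os.readlink(link))
                mapping.setdefault(device, []).append(str(link.relative_to(root)))
    return mapping


def running_processes(names, proc_root: str = "/proc") -> set:
    """Return which of `names` appear as a process name in /proc/<pid>/comm."""
    found = set()
    root = Path(proc_root)
    if not root.is_dir():
        return found
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:  # process exited or is not readable
            continue
        if comm in names:
            found.add(comm)
    return found


if __name__ == "__main__":
    # 'udisksd' is assumed to be the udisks2 daemon's process name.
    print("zd112 ->", zd_to_zvol().get("zd112"))
    print("automount daemons running:", running_processes({"udisksd"}))
```

If `zd112` resolves to a path containing `@`, it is a snapshot device, which would be consistent with something picking up snapshot zvols as described above.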

bryambalan · Jun 19, 2023

No, all configuration was done only through the Proxmox web interface.

From what I understand, we cannot snapshot a VM that is being replicated via ZFS.

ZFS replication already creates its own snapshot, but that snapshot is not visible in the web interface; it is only used to manage the replication.

When we then create a snapshot from the web interface, it "apparently" causes a conflict and severely affects the ZFS storage.
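To check which snapshots actually exist on the replicated zvols, including the replication snapshots the web interface hides, a Python sketch can split the output of `zfs list -t snapshot -H -o name` by name. It assumes Proxmox replication snapshots are named `__replicate_<jobid>_<timestamp>__`; the dataset name and replication snapshot in the sample are hypothetical, while `snapshotmade4it` is the snapshot name from the log.

```python
import re

# Assumption: Proxmox storage replication names its snapshots
# "__replicate_<jobid>_<unix timestamp>__". Everything else is treated as a
# user snapshot (for example one taken from the web interface).
REPLICATION_RE = re.compile(r"^__replicate_(?P<job>[^_]+)_(?P<ts>\d+)__$")


def classify_snapshots(zfs_list_output: str) -> dict:
    """Split `zfs list -t snapshot -H -o name` output into replication and user snapshots."""
    result = {"replication": [], "user": []}
    for line in zfs_list_output.splitlines():
        line = line.strip()
        if "@" not in line:
            continue  # not a snapshot name
        dataset, snap = line.split("@", 1)
        kind = "replication" if REPLICATION_RE.match(snap) else "user"
        result[kind].append((dataset, snap))
    return result


if __name__ == "__main__":
    # Hypothetical sample; on a node the text would come from
    # `zfs list -t snapshot -H -o name`.
    sample = (
        "rpool/data/vm-108-disk-0@__replicate_108-0_1687176000__\n"
        "rpool/data/vm-108-disk-0@snapshotmade4it\n"
    )
    for kind, snaps in classify_snapshots(sample).items():
        print(kind, snaps)
```

This only lists what exists on each zvol; it does not by itself show whether the two kinds of snapshot interfere with each other.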
