Hi
We have a serious problem with our Proxmox cluster.
We are getting a lot of errors like this in syslog:
qmp command 'query-proxmox-support' failed - got timeout
Below is an example from one of the logs. This time it started with the deletion of a qcow2 snapshot, though we see it at random on all updated Proxmox hosts.
Each updated host gets a QMP failure at some random point, and then the VMs start crashing one by one. They appear as powered on but they are stuck. Some of them can be migrated to another host and then keep working, while others need to be stopped, migrated, and then started on another Proxmox host.
This started happening after we updated to 6.3.4+.
Mar 21 10:17:30 ilnode07 pvedaemon[83218]: <root@pam> starting task UPID:ilnode07:0000CDE5:04F7A306:60570F2A:qmdelsnapshot:1421:root@pam:
Mar 21 10:17:30 ilnode07 pvedaemon[52709]: <root@pam> delete snapshot VM 1421: test
Mar 21 10:17:36 ilnode07 pvedaemon[52036]: VM 1421 qmp command failed - VM 1421 qmp command 'query-proxmox-support' failed - got timeout
Mar 21 10:17:36 ilnode07 systemd[1]: Stopping User Manager for UID 0...
Mar 21 10:17:36 ilnode07 systemd[44126]: Stopped target Default.
Mar 21 10:17:36 ilnode07 systemd[44126]: Stopped target Basic System.
Mar 21 10:17:36 ilnode07 systemd[44126]: Stopped target Sockets.
Mar 21 10:17:36 ilnode07 systemd[44126]: gpg-agent.socket: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[44126]: Closed GnuPG cryptographic agent and passphrase cache.
Mar 21 10:17:36 ilnode07 systemd[44126]: gpg-agent-browser.socket: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[44126]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
Mar 21 10:17:36 ilnode07 systemd[44126]: dirmngr.socket: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[44126]: Closed GnuPG network certificate management daemon.
Mar 21 10:17:36 ilnode07 systemd[44126]: Stopped target Timers.
Mar 21 10:17:36 ilnode07 systemd[44126]: Stopped target Paths.
Mar 21 10:17:36 ilnode07 systemd[44126]: gpg-agent-extra.socket: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[44126]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
Mar 21 10:17:36 ilnode07 systemd[44126]: gpg-agent-ssh.socket: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[44126]: Closed GnuPG cryptographic agent (ssh-agent emulation).
Mar 21 10:17:36 ilnode07 systemd[44126]: Reached target Shutdown.
Mar 21 10:17:36 ilnode07 systemd[44126]: systemd-exit.service: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[44126]: Started Exit the Session.
Mar 21 10:17:36 ilnode07 systemd[44126]: Reached target Exit the Session.
Mar 21 10:17:36 ilnode07 systemd[1]: user@0.service: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[1]: Stopped User Manager for UID 0.
Mar 21 10:17:36 ilnode07 systemd[1]: Stopping User Runtime Directory /run/user/0...
Mar 21 10:17:36 ilnode07 systemd[1]: run-user-0.mount: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[1]: user-runtime-dir@0.service: Succeeded.
Mar 21 10:17:36 ilnode07 systemd[1]: Stopped User Runtime Directory /run/user/0.
Mar 21 10:17:36 ilnode07 systemd[1]: Removed slice User Slice of UID 0.
Mar 21 10:17:43 ilnode07 pvestatd[2604]: VM 1421 qmp command failed - VM 1421 qmp command 'query-proxmox-support' failed - unable to connect to VM 1421 qmp socket - timeout after 31 retries
Mar 21 10:17:44 ilnode07 pvedaemon[83218]: <root@pam> end task UPID:ilnode07:0000CDE5:04F7A306:60570F2A:qmdelsnapshot:1421:root@pam: OK
Mar 21 10:17:46 ilnode07 pvestatd[2604]: VM 1529 qmp command failed - VM 1529 qmp command 'query-proxmox-support' failed - got timeout
Mar 21 10:17:46 ilnode07 pvestatd[2604]: status update time (9.486 seconds)
Mar 21 10:17:53 ilnode07 pvestatd[2604]: VM 1529 qmp command failed - VM 1529 qmp command 'query-proxmox-support' failed - unable to connect to VM 1529 qmp socket - timeout after 31 retries
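For anyone who wants to check which VMs are affected in their own syslog, here is a rough Python sketch that counts the failed QMP commands per VM ID. The regular expression is only an assumption based on the log lines above, and the names `QMP_FAIL`, `count_qmp_failures`, and `SAMPLE` are made up for this example; none of this is part of Proxmox.

```python
import re
from collections import Counter

# Assumed pattern, derived from the pvedaemon/pvestatd lines in the log above:
#   VM <id> qmp command '<command>' failed - <reason>
QMP_FAIL = re.compile(r"VM (\d+) qmp command '([\w-]+)' failed - (.+)$")

# A few lines copied from the log above, used as sample input.
SAMPLE = [
    "Mar 21 10:17:36 ilnode07 pvedaemon[52036]: VM 1421 qmp command failed - VM 1421 qmp command 'query-proxmox-support' failed - got timeout",
    "Mar 21 10:17:36 ilnode07 systemd[1]: Stopping User Manager for UID 0...",
    "Mar 21 10:17:43 ilnode07 pvestatd[2604]: VM 1421 qmp command failed - VM 1421 qmp command 'query-proxmox-support' failed - unable to connect to VM 1421 qmp socket - timeout after 31 retries",
    "Mar 21 10:17:46 ilnode07 pvestatd[2604]: VM 1529 qmp command failed - VM 1529 qmp command 'query-proxmox-support' failed - got timeout",
    "Mar 21 10:17:53 ilnode07 pvestatd[2604]: VM 1529 qmp command failed - VM 1529 qmp command 'query-proxmox-support' failed - unable to connect to VM 1529 qmp socket - timeout after 31 retries",
]


def count_qmp_failures(lines):
    """Return a Counter mapping VM ID (as a string) to its number of failed QMP commands."""
    counts = Counter()
    for line in lines:
        match = QMP_FAIL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Print the affected VMs, most failures first.
for vmid, failures in count_qmp_failures(SAMPLE).most_common():
    print(f"VM {vmid}: {failures} failed QMP commands")
```

To run it against a real log, pass the lines of the syslog file to `count_qmp_failures` instead of `SAMPLE`.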
This is our Proxmox version:
pveversion
pve-manager/6.3-6/2184247e (running kernel: 5.4.103-1-pve)
root@ilnode07:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.103-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-3
pve-xtermjs: 4.7.0-3
pve-zsync: 2.0-4
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2
By the way, we are using Proxmox Backup Server to back up all the VMs, which are stored as qcow2 images on NFS.
Thanks