Hello all,
after latest upgrade of no-subscription 3-node cluster system, that contained only several pve packages (pve/ha manager and kernel among them), almost every migration of any virtual machine will result in successful migration (task finished fine) but in reality the (virtual) machine would be hanging and it's CPU would be on 100%
The only way to recover is to reset the virtual machine (qm reset <vmid> on the command line of the affected node). Please note that this system was running for over a year without ever experiencing something remotely similar to this issue. The direction of migration is also not important and this issue would appear regardless of source or destination pve host.
After the reset, affected VM would run without any issues and continue running stable for days.
The system is based on Ryzen 9 CPU's and there were no recent BIOS or microcode changes that can be tied to the problem. System runs under very low utilization anyway (it is a lab).
Any pointer on where to look for relevant information or how to rollback to previous versions of relevant packages would be appreciated.
Kind regards.
Sasha
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
after latest upgrade of no-subscription 3-node cluster system, that contained only several pve packages (pve/ha manager and kernel among them), almost every migration of any virtual machine will result in successful migration (task finished fine) but in reality the (virtual) machine would be hanging and it's CPU would be on 100%
The only way to recover is to reset the virtual machine (qm reset <vmid> on the command line of the affected node). Please note that this system was running for over a year without ever experiencing something remotely similar to this issue. The direction of migration is also not important and this issue would appear regardless of source or destination pve host.
After the reset, affected VM would run without any issues and continue running stable for days.
The system is based on Ryzen 9 CPU's and there were no recent BIOS or microcode changes that can be tied to the problem. System runs under very low utilization anyway (it is a lab).
Any pointer on where to look for relevant information or how to rollback to previous versions of relevant packages would be appreciated.
Kind regards.
Sasha
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1