[SOLVED] Problems after upgrade from pve 6 to pve 7

mrbeanzg

Hi all,

I have a 4-node cluster connected to an HPE 3PAR via a 10Gb Nexus network switch. These 4 nodes are at version pve-manager/6.0-4.

I have added a fifth server to the same cluster, with version pve-manager/7.1-4.

If I migrate a high-load Windows VM to that fifth server, after some time (at random intervals, an hour or so) the VM becomes unresponsive and I can't do anything but stop it, and even then the VM won't stop. I have to cancel that action, after which the VM is in the stopped state, so I can migrate it back to one of the 4 older nodes. The migrated VM works without problems on PVE 6.

I have 2 low-load Windows VMs on that PVE 7 server that work perfectly.

In dmesg I don't see anything that points to a problem with this high-load VM.

I need help: what should I do, and what other logs can I review?

Thanks and best regards
 
This is the only error that is recorded when the VM becomes unresponsive:
Jan 10 20:00:34 prsvr5 pvedaemon[341452]: VM 115 qmp command failed - VM 115 qmp command 'guest-network-get-interfaces' failed - got timeout

What can I do about that?
 
Hi,
if the VM has a SATA controller, it's likely the issue described here. Try changing the Async IO setting (or the controller to IDE) as a workaround, or upgrade to kernel package pve-kernel-5.13.19-2-pve, which solves (a big part of) the issue. Or use kernel 5.15 to be really sure.
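For example, assuming the affected VM is 115 (the ID from your log) and its disk is sata0 on a storage called local-lvm (adjust both to your actual disk and storage, which you can see with qm config 115), the Async IO mode can be switched on the CLI:

qm set 115 --sata0 local-lvm:vm-115-disk-0,aio=threads

The change only takes effect after the VM is stopped and started again.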
 
The thread you linked describes the same errors I have. How can I check the version and upgrade if needed?

Thanks
 
You can use pveversion -v to display the installed Proxmox-VE-related packages. Then run apt update to get the list of available updates and apt dist-upgrade to actually upgrade.

EDIT: After upgrading QEMU, you need to stop/start the VM or migrate it away and back for it to pick up the new version.
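Concretely, on the node in question:

pveversion -v      # list installed Proxmox VE packages and versions
apt update         # refresh the package lists
apt dist-upgrade   # install the available upgrades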
 
This is what I have:

root@prsvr5:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-4 (running version: 7.1-4/ca457116)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-2
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3


So I need to upgrade?
 
If you upgraded the kernel, it's recommended; otherwise, it shouldn't be necessary.
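One way to confirm whether the node is actually running the fixed kernel (your pveversion output shows running kernel 5.13.19-1-pve, while the fix mentioned above landed in pve-kernel-5.13.19-2-pve) is to check after the upgrade and a reboot:

uname -r    # should report 5.13.19-2-pve (or newer) once the node has rebooted into the new kernel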
 
Hi all, just to report that the upgrade to pve-qemu-kvm: 6.1.0-3 works. The high-load Windows VM doesn't crash anymore.
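In case it helps others, one way I know of to double-check which QEMU version a VM picked up after the stop/start is the QEMU monitor (VM ID 115 here, from my log above):

qm monitor 115
info version      # typed at the monitor prompt; prints the QEMU version the VM is running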

Best regards
 
Just in case this helps someone else: I had a similar issue today. I moved a bunch of VMs from 12G Dell servers to newer 14G Dell servers. All the high-load Windows VMs would lose networking and require a reboot after a few minutes of load. After a day of banging my head against the wall, it turned out that changing "Async IO" from "io_uring" to "threads" fixed it. I can't explain it; I found another post where someone suggested it.
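If anyone wants to check what a given VM is currently set to, the aio option shows up in the disk line of the VM config, e.g. for a hypothetical VM 100:

qm config 100 | grep aio

If nothing is printed, the disk is on the default, which is io_uring on PVE 7 as far as I know.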