[SOLVED] Windows 11 VM stops responding but still shows as running when using nested virtualization

h0t1ce

New Member
Apr 10, 2021
Hi,

I have an issue where my Win11 VM hard-stalls at seemingly random times, but only when nested virtualization is enabled (svm=on). I use nested virtualization for Docker Desktop and WSL, and it works well as long as the VM doesn't stall. If I turn off svm, the VM is stable (no crashes over multiple days), but I lose access to Docker Desktop and WSL.

The stalls can happen after a few minutes or sometimes after hours of running. At that point, if I was connected via RDP, the connection closes; the web GUI shows the VM as running with CPU usage constantly stuck above 100%, but the QEMU agent is no longer working. The noVNC console shows a static image of the lock screen, and the VM doesn't respond to pings on its usual static IP. The only way I can get it working again is to do a "stop", as it won't respond to an ACPI shutdown.

There is not much to work with in the syslog. There are no errors on startup, but when the VM stalls I get one or more lines similar to this:
Aug 22 07:27:14 dolos pvedaemon[335364]: VM 106 qmp command failed - VM 106 qmp command 'guest-ping' failed - got timeout
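
To line up stall times with other events, the syslog can be filtered for this message. The following is only a minimal sketch: the regular expression is modelled on the single line quoted above, the function name find_guest_ping_timeouts is made up for illustration, and the second sample line is a made-up filler entry rather than real log output.

```python
import re

# Pattern modelled on the one pvedaemon line quoted in the post (an assumption:
# other Proxmox versions may word the message differently).
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+pvedaemon\[\d+\]:\s+"
    r"VM (?P<vmid>\d+) qmp command failed - VM \d+ qmp command 'guest-ping' "
    r"failed - got timeout"
)


def find_guest_ping_timeouts(lines):
    """Return (timestamp, vmid) pairs for every guest-ping timeout line."""
    hits = []
    for line in lines:
        match = PATTERN.match(line)
        if match:
            hits.append((match.group("ts"), int(match.group("vmid"))))
    return hits


sample = [
    # Line taken from the post.
    "Aug 22 07:27:14 dolos pvedaemon[335364]: VM 106 qmp command failed - "
    "VM 106 qmp command 'guest-ping' failed - got timeout",
    # Hypothetical unrelated line, only here to show that it is skipped.
    "Aug 22 07:27:20 dolos example[1]: unrelated message",
]

for timestamp, vmid in find_guest_ping_timeouts(sample):
    print(f"VM {vmid} stopped answering guest-ping at {timestamp}")
```

On a real node the same function could be fed the lines of the syslog file instead of the sample list.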

proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.11.22-3-pve: 5.11.22-6
pve-kernel-5.11.22-2-pve: 5.11.22-4
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.8-1
proxmox-backup-file-restore: 2.0.8-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

acpi: 1
agent: 1
args: -cpu 'host,hypervisor=off,svm=on,kvm=off'
balloon: 8192
bios: ovmf
boot: order=scsi0
cores: 8
description: Win11 Dev
efidisk0: slowpool2:vm-106-disk-0,size=4M
hotplug: 0
kvm: 1
machine: pc-q35-6.0
memory: 16384
name: Win11PC
net0: virtio=<mac addr>,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-106-disk-1,discard=on,iothread=1,size=120G,ssd=1
scsihw: virtio-scsi-single
sockets: 1
tablet: 1
smbios1: uuid=<long uuid>
vmgenid: <long guid>

root 392198 84.5 10.8 18018804 5334788 ? SLl 09:47 0:14 /usr/bin/kvm -id 106 -name Win11PC -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/106.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/106.pid -daemonize -smbios type=1,uuid=<long uuid> -drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd -drive if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/slowpool2/vm-106-disk-0 -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/106.vnc,password=on -no-hpet -cpu kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 16384 -object iothread,id=iothread-virtioscsi0 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device vmgenid,guid=<long guid> -device usb-tablet,id=tablet,bus=ehci.0,port=1 -device VGA,id=vga,bus=pcie.0,addr=0x1 -chardev socket,path=/var/run/qemu-server/106.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi initiator-name=iqn.1993-08.org.debian:01:169ddee05d95 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/dev/pve/vm-106-disk-1,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100 -netdev type=tap,id=net0,ifname=tap106i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=<mac addr>,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,base=localtime -machine type=pc-q35-6.0+pve0 -global kvm-pit.lost_tick_policy=discard -cpu host,hypervisor=off,svm=on,kvm=off
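
The process listing above carries two -cpu options: the one Proxmox generated (kvm64 plus the hv_* flags) and the one appended through args. A minimal sketch for pulling them out of such a command line follows; the helper name cpu_options is made up for illustration, and the excerpt is an abbreviated copy of the listing, not the full command. The sketch assumes the last -cpu is the one that takes effect, which the listing itself does not confirm.

```python
import shlex


def cpu_options(cmdline):
    """Return the value of every -cpu option, in order of appearance."""
    tokens = shlex.split(cmdline)
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == "-cpu"]


# Abbreviated excerpt of the kvm command line from the post.
excerpt = (
    "/usr/bin/kvm -id 106 -name Win11PC "
    "-cpu kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,"
    "hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,"
    "+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep "
    "-m 16384 -cpu host,hypervisor=off,svm=on,kvm=off"
)

options = cpu_options(excerpt)
last_flags = options[-1].split(",")

print(f"{len(options)} -cpu options found")
print("last one:", options[-1])
# Under the stated assumption, none of the hv_* flags from the generated
# option survive in the last -cpu value.
print("hv_* flags in last option:", [f for f in last_flags if f.startswith("hv_")])
```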

quiet amd_iommu=on iommu=pt video=vesa:off video=efifb:off nomodeset nofb

options kvm ignore_msrs=1 report_ignored_msrs=0
options kvm-amd npt=1 nested=1
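
Whether the nested=1 option above is actually in effect can be read back from sysfs on the host. This is a small sketch, assuming the usual /sys/module/kvm_amd/parameters/nested path; the function name nested_enabled is made up for illustration, and both "1" and "Y" are accepted because the reported value differs between kernel versions.

```python
from pathlib import Path


def nested_enabled(param_file="/sys/module/kvm_amd/parameters/nested"):
    """Return True/False for the nested parameter, or None if it is absent.

    None means the file does not exist, e.g. kvm_amd is not loaded or the
    host is not an AMD system.
    """
    path = Path(param_file)
    if not path.exists():
        return None
    return path.read_text().strip() in ("1", "Y", "y")


state = nested_enabled()
if state is None:
    print("kvm_amd nested parameter not found")
else:
    print("nested virtualization enabled:", state)
```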
 

Attachments

  • syslog.txt

Stefan_R

Proxmox Retired Staff
Jun 4, 2019
Vienna
Hm, not saying it has to be related, but is there a reason you disable the hypervisor flag (args: -cpu 'host,hypervisor=off,svm=on,kvm=off')? This disables the entire Hyper-V enlightenment stack and can cause the VM to run a lot slower. I also wouldn't be surprised if it causes issues with nested virt. I suggest setting cpu: host instead.
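
As an illustration of that suggestion, the sketch below rewrites a VM config given as plain text: it drops the args line carrying the custom -cpu override and sets cpu: host. The function name apply_cpu_host is made up for this example, and it assumes args contains nothing but the -cpu override, as in the config posted above. It only works on a string and does not touch any file on the node.

```python
def apply_cpu_host(config_text):
    """Return the config text with the -cpu args override replaced by cpu: host."""
    kept = []
    for line in config_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if key == "args" and "-cpu" in line:
            continue  # drop the custom -cpu override (assumed to be the only arg)
        if key == "cpu":
            continue  # any existing cpu line is replaced below
        kept.append(line)
    kept.append("cpu: host")
    return "\n".join(kept) + "\n"


# Shortened excerpt of the VM config from the first post.
original = (
    "agent: 1\n"
    "args: -cpu 'host,hypervisor=off,svm=on,kvm=off'\n"
    "cores: 8\n"
)

print(apply_cpu_host(original))
```

In practice the same change would normally be made through the web GUI or the qm tool rather than by editing text.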
 

h0t1ce

Hi Stefan,

I think the CPU flags were leftovers from a few tests I did in the past and forgot to remove. I now use only -cpu 'host' and the VM has been stable for more than 12 hours.

I'm not sure whether removing the flags did the trick or whether it's related to updates applied to the Proxmox node or the Windows guest since I last tried nested virtualization with this VM.

In any case, WSL2 and Docker are both running fine now without any crashes, so you can mark this thread as resolved.

Thanks for your time!
 
