[SOLVED] Windows 11 VM w/ Passthrough Hangs on Reboot

Apr 3, 2022
A little backstory to start things off...

I recently decided to convert my existing Windows workstation into a Proxmox host. I upped the RAM to 64GB, found a good deal on a used 10850K, and added an additional NVMe drive to run Proxmox from. I planned to use passthrough with my RTX 3060 Ti and the existing NVMe drive to keep my "desktop experience", but move my 4-drive RAID10 over to ZFS on Proxmox and run things like Plex or Roon from LXC containers.

To be honest, it went a lot better than I expected. Everything just works for the most part. I had to get a couple of USB controllers to handle all of my various DACs and peripherals. I've learned so much and I'm excited to start playing around with technology like Kubernetes in my new "lab environment".

I've got one weird problem that I haven't been able to sort out. When I reboot Windows from inside the VM, it hangs when coming back up and I need to reboot Proxmox to get it running again. However, if I shut down and then use my smartphone (or something else available) to restart the VM... no problems at all. I guess I'm looking for recommendations on how I should go about troubleshooting this.

Here's a bunch of stuff I dumped from my setup...

Code:
IOMMU Group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:9b33] (rev 05)
IOMMU Group 1 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2489] (rev a1)
IOMMU Group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
IOMMU Group 2 00:02.0 Display controller [0380]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc5] (rev 05)
IOMMU Group 3 00:14.0 USB controller [0c03]: Intel Corporation Comet Lake USB 3.1 xHCI Host Controller [8086:06ed]
IOMMU Group 3 00:14.2 RAM memory [0500]: Intel Corporation Comet Lake PCH Shared SRAM [8086:06ef]
IOMMU Group 4 00:15.0 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH Serial IO I2C Controller #0 [8086:06e8]
IOMMU Group 4 00:15.1 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH Serial IO I2C Controller #1 [8086:06e9]
IOMMU Group 5 00:16.0 Communication controller [0780]: Intel Corporation Comet Lake HECI Controller [8086:06e0]
IOMMU Group 6 00:17.0 SATA controller [0106]: Intel Corporation Device [8086:06d2]
IOMMU Group 7 00:1b.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #17 [8086:06c0] (rev f0)
IOMMU Group 8 00:1b.4 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #21 [8086:06ac] (rev f0)
IOMMU Group 9 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:06b8] (rev f0)
IOMMU Group 10 00:1c.4 PCI bridge [0604]: Intel Corporation Device [8086:06bc] (rev f0)
IOMMU Group 11 00:1c.5 PCI bridge [0604]: Intel Corporation Device [8086:06bd] (rev f0)
IOMMU Group 12 00:1c.7 PCI bridge [0604]: Intel Corporation Device [8086:06bf] (rev f0)
IOMMU Group 13 00:1d.0 PCI bridge [0604]: Intel Corporation Comet Lake PCI Express Root Port #9 [8086:06b0] (rev f0)
IOMMU Group 14 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:0685]
IOMMU Group 14 00:1f.4 SMBus [0c05]: Intel Corporation Comet Lake PCH SMBus Controller [8086:06a3]
IOMMU Group 14 00:1f.5 Serial bus controller [0c80]: Intel Corporation Comet Lake PCH SPI Controller [8086:06a4]
IOMMU Group 15 03:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 16 05:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 02)
IOMMU Group 17 06:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller [1912:0014] (rev 03)
IOMMU Group 18 07:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller [1912:0014] (rev 03)
IOMMU Group 19 08:00.0 Non-Volatile memory controller [0108]: Phison Electronics Corporation E12 NVMe Controller [1987:5012] (rev 01)
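For anyone wanting to reproduce a listing like the one above on their own host, here is a minimal sketch that walks sysfs. The function name `list_iommu_groups` is made up for illustration, and it only prints the group number and PCI address (with the `0000:` domain prefix), not the device description; `lspci -nns <address>` can supply that part.

```shell
#!/bin/sh
# Hypothetical helper: print "IOMMU Group <n> <pci-address>" for every
# device found under the given sysfs root (default: the live system).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when the directory is missing or empty.
        [ -e "$dev" ] || continue
        group="${dev#"$root"/}"   # strip the root prefix
        group="${group%%/*}"      # keep only the group number
        echo "IOMMU Group $group ${dev##*/}"
    done
}

# Sort numerically by group number (third field).
list_iommu_groups | sort -k3,3n
```

Devices that share a group (like the GPU, its audio function and the PCI bridge in group 1 above) show up with the same group number.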

Code:
root@pve:~# cat /proc/cmdline
initrd=\EFI\proxmox\5.13.19-6-pve\initrd.img-5.13.19-6-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt video=efifb:off

root@pve:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2489,10de:228b,1987:5012,1912:0014 disable_vga=1

root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Code:
root@pve:~# cat /etc/pve/qemu-server/100.conf
agent: 1
balloon: 0
bios: ovmf
boot: order=hostpci0
cores: 8
cpu: host
hostpci0: 0000:08:00,pcie=1
hostpci1: 0000:01:00,pcie=1,x-vga=1
hostpci2: 0000:06:00,pcie=1
hostpci3: 0000:07:00,pcie=1
machine: pc-q35-6.1
memory: 16384
meta: creation-qemu=6.1.1,ctime=1648601835
name: raven
net0: virtio=5E:42:11:7E:16:FA,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
scsihw: virtio-scsi-pci
smbios1: uuid=5d6ae88a-76b6-4ab0-8aac-9ae36dcb518c
sockets: 1
tpmstate0: local-zfs:vm-100-disk-0,size=4M,version=v2.0
vmgenid: f3f742fa-3520-452f-98d6-1c6c5fbb2fb5
 
Got a similar problem here when rebooting Win10 (for example when a restart is needed to install a Windows update). Somehow the VM gets stuck and won't respond any longer. To fix it I need to hard stop that VM and start it again. So maybe that works for you too instead of rebooting the complete server.
 
What's the process you use to hard stop the VM? Because if I just try a stop through the GUI, I get the following error:
TASK ERROR: VM quit/powerdown failed
 
First you need to stop all running "shutdown", "start" and "reboot" tasks by clicking the task in the task list at the bottom and then hitting the "stop" button. As long as such a task is running (it might get stuck when the VM, the QEMU guest agent or ACPI isn't responding any longer), no other task will start. Once no other task is running, you can start a "stop" or "reset" task using the webUI.
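If the webUI isn't handy, the same stop-then-start can be done from a shell on the host with `qm`. This is only a sketch, not something from the posts above: the `run` and `hard_restart_vm` helpers are made up, VM ID 100 is taken from the config posted earlier, and it defaults to a dry run that just prints the commands. A stuck shutdown or reboot task may still have to be stopped first, as described above.

```shell
#!/bin/sh
# Assumptions: VMID 100 (from the posted config); DRY_RUN=1 only prints.
VMID="${VMID:-100}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Hard stop followed by a fresh start, the CLI counterpart of the
# webUI "stop" and "start" actions.
hard_restart_vm() {
    run qm status "$1"   # show whether QEMU still thinks the VM is running
    run qm stop "$1"     # immediate stop, no guest shutdown
    run qm start "$1"    # cold start
}

hard_restart_vm "$VMID"
```

Run it with `DRY_RUN=0` on the Proxmox host to actually execute the commands.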
 
I'll give that a try next time... I'm all for "less annoying" alternatives. :)
 
So, I was finally able to resolve this. It turned out the problem was that I didn't have an EFI Disk provisioned. I was passing through an NVMe drive with its own EFI partition, so I didn't think I needed one. I saw another thread where someone mentioned that rebooting a VM wasn't working until they recreated the EFI Disk. I decided to give it a try and now I can soft reboot without the guest locking up!
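For anyone landing here with the same symptom, a quick way to see whether a VM config is missing its EFI Disk is to look for an `efidisk0` line. The sketch below is an assumption-laden example: the `has_efidisk` helper is made up, VM ID 100 and the `local-zfs` storage come from the config posted above, and the `efitype=4m` option in the suggested `qm set` command is a guess that may need adjusting for your setup. It only prints the suggestion and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the VM config's main section
# (everything before the first "[snapshot]" header) has an efidisk0 entry.
has_efidisk() {
    awk '/^\[/ { exit } /^efidisk0:/ { found = 1 } END { exit !found }' "$1"
}

VMID="${VMID:-100}"                               # assumption: ID from the posted config
CONF="${CONF:-/etc/pve/qemu-server/$VMID.conf}"

if [ ! -r "$CONF" ]; then
    echo "cannot read $CONF"
elif has_efidisk "$CONF"; then
    echo "VM $VMID already has an EFI disk"
else
    echo "VM $VMID has no EFI disk; one possible way to add it:"
    # Assumption: storage name and options, adjust before running.
    echo "  qm set $VMID --efidisk0 local-zfs:1,efitype=4m"
fi
```

The EFI Disk can also be added from the webUI under the VM's Hardware tab, which avoids having to pick the options by hand.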
 