I have come across numerous similar posts but none that have helped me to resolve my current dilemma. Thanks in advance - I deeply appreciate any help from this excellent community.
What did I do: Attempted GPU passthrough from Proxmox (bare metal) to a Windows 10 QEMU VM. I followed a few different guides:
- this guide
- more recent guide
- also this one
- I also verified various steps against the official wiki 1 and 2
What was the exact result: The guest does not boot. My system load jumps to ~100%, seemingly while attempting to boot Windows 10, until the PC eventually restarts.
What did I expect to happen: Boot Windows 10, configure the GPU from within the guest system.
Hardware specifics:
- Motherboard: ASUS ROG Strix B550-F
- CPU: AMD Ryzen 5 5600X
- GPU: MSI GEFORCE GTX 1080 ARMOR 8G OC
- SSD: SAMSUNG 980 PRO 500GB PCIe NVMe Gen4 SSD M.2 (MZ-V8P500B)
- RAM: Crucial 32GB Kit (2x16GB) DDR4 2666 MHz CL19 CT2K16G4DFRA266
- PSU: Seasonic FOCUS GX-650, 650W 80+ Gold, SSR-650FX
- Host: Proxmox v6.4-8 (additional versions below)
- Guest: Windows 10 Pro
In the BIOS:
- IOMMU was set to Auto by default but I changed it to Enabled
- Fast Boot is Disabled
- CSM is described as "Launch CSM" and is set to Enabled
- Boot Device Control was "UEFI and Legacy OPROM" but I recently changed it to "UEFI only" (I'm not sure whether this matters, i.e. whether the original bare metal Proxmox install somehow ended up booting in Legacy mode)
- The only options I see for Secure Boot are OS Type ("Other OS" or "Windows UEFI mode") and Key Management, which I will refrain from typing out or photographing unless someone specifically asks for it
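On the question of whether the host itself was booted in UEFI or Legacy mode: a minimal sketch of one way to check from the Proxmox shell, relying on the fact that the Linux kernel only exposes /sys/firmware/efi when it was started by UEFI firmware. The function name `boot_mode` and its optional path argument are my own, added only so the check can be pointed at another directory for testing.

```shell
#!/bin/sh
# Report whether the running Linux host was booted via UEFI or legacy BIOS.
# The kernel only creates the efi directory when started by UEFI firmware.
# $1: optional path to check (hypothetical override, defaults to the real one)
boot_mode() {
    efi_dir="${1:-/sys/firmware/efi}"
    if [ -d "$efi_dir" ]; then
        echo "UEFI"
    else
        echo "Legacy BIOS (or CSM)"
    fi
}

boot_mode
```

This only reports how the currently running kernel was started; it says nothing about how the guest boots.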
Code:
root@pve:~# tail -50 /var/log/syslog
Jun 12 16:39:00 pve pvestatd[1195]: status update time (6.174 seconds)
Jun 12 16:39:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 12 16:39:07 pve systemd[1]: pvesr.service: Succeeded.
Jun 12 16:39:07 pve systemd[1]: Started Proxmox VE replication runner.
Jun 12 16:39:09 pve pvedaemon[1223]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 12 16:39:10 pve pvestatd[1195]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 12 16:39:10 pve pvestatd[1195]: status update time (6.362 seconds)
Jun 12 16:39:15 pve pvedaemon[2590]: start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=e8de33c1-da40-40a5-9950-a1ad71ba3445' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=13d5191c-b441-49d5-810e-102d2538b91e' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:1f2c5c3ff70' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 
'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=16:8A:C1:28:3E:1A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
Jun 12 16:39:15 pve pvedaemon[1221]: <root@pam> end task UPID:amdatacenter:00000A1E:00009303:60C54583:qmstart:102:root@pam: start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=e8de33c1-da40-40a5-9950-a1ad71ba3445' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=13d5191c-b441-49d5-810e-102d2538b91e' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:1f2c5c3ff70' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' 
-netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=16:8A:C1:28:3E:1A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
Jun 12 16:39:20 pve pvestatd[1195]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 12 16:39:21 pve pvestatd[1195]: status update time (7.263 seconds)
Code:
root@pve:~# pveversion --verbose
proxmox-ve: 6.4-1 (running kernel: 5.11.21-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.11: 7.0-2~bpo10
pve-kernel-5.4: 6.4-3
pve-kernel-helper: 6.4-3
pve-kernel-5.11.21-1-pve: 5.11.21-1~bpo10
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.9-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
Code:
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
root@pve:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1
root@pve:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
root@pve:~# cat /etc/modprobe.d/blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE
# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
root@pve:~# lspci -n -s 07:00
07:00.0 0300: 10de:1b80 (rev a1)
07:00.1 0403: 10de:10f0 (rev a1)
root@pve:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1b80,10de:10f0 disable_vga=1
root@pve:~# cat /etc/pve/qemu-server/102.conf
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 12
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-102-disk-1,size=4M
hostpci0: 0000:07:00,pcie=1,x-vga=on
ide2: local:iso/Windows10.iso,media=cdrom
machine: pc-q35-5.2
memory: 32768
name: Win10
net0: e1000=4A:21:7E:B7:C8:CA,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-102-disk-0,cache=writeback,discard=on,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=1bc261f6-3745-484d-9d35-1c264fc4e3c6
sockets: 1
vmgenid: f4088c04-17d2-462b-97d6-fdd6c268c6ac
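Since the vfio.conf above is meant to hand both functions of the GPU to vfio-pci, a quick way to confirm which driver actually holds them is to read the driver symlink in sysfs. This is only a sketch: `pci_driver` is a helper name I made up, and the second argument exists purely so the lookup can be pointed at a different directory for testing. If the binding took effect, I would expect both functions to report vfio-pci.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI function, or "none".
# $1: PCI address such as 0000:07:00.0
# $2: optional sysfs PCI device root (hypothetical override for testing)
pci_driver() {
    dev="${2:-/sys/bus/pci/devices}/$1"
    if [ -e "$dev/driver" ]; then
        # The driver entry is a symlink; its target's name is the driver.
        basename "$(readlink -f "$dev/driver")"
    else
        echo "none"
    fi
}

# The two functions of the GTX 1080 listed by lspci above.
for fn in 0000:07:00.0 0000:07:00.1; do
    echo "$fn: $(pci_driver "$fn")"
done
```

The same information is shown by `lspci -nnk -s 07:00` on the "Kernel driver in use" line.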
Code:
root@amdatacenter:~# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.11.21-1-pve
Found initrd image: /boot/initrd.img-5.11.21-1-pve
Found linux image: /boot/vmlinuz-5.4.119-1-pve
Found initrd image: /boot/initrd.img-5.4.119-1-pve
Found linux image: /boot/vmlinuz-5.4.106-1-pve
Found initrd image: /boot/initrd.img-5.4.106-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
Adding boot menu entry for EFI firmware configuration
done
root@amdatacenter:~# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-5.11.21-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.119-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.106-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Code:
root@pve:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.21-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off
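To double-check that the parameters I set in GRUB really reached the running kernel, something like the following can be used. It is a small sketch, not an official tool: `check_params` is a name I chose, and it does an exact whole-word match so that `amd_iommu=on` is not mistaken for `iommu=on`.

```shell
#!/bin/sh
# Report which of the given kernel parameters appear on a command line.
# $1: the kernel command line as a single string
# remaining arguments: parameters to look for (exact whole-word match)
check_params() {
    cmdline="$1"
    shift
    for want in "$@"; do
        # Pad with spaces so only complete words can match.
        case " $cmdline " in
            *" $want "*) echo "present: $want" ;;
            *)           echo "missing: $want" ;;
        esac
    done
}

# Check the two IOMMU-related parameters from the output above.
if [ -r /proc/cmdline ]; then
    check_params "$(cat /proc/cmdline)" amd_iommu=on iommu=pt
fi
```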
Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU
[ 0.954451] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.957843] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.958577] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
root@pve:~# dmesg | grep 'remapping'
[ 0.957846] AMD-Vi: Interrupt remapping enabled
Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.991152] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.991906] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.991906] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294a5a):
[ 0.991910] AMD-Vi: Interrupt remapping enabled
[ 0.992002] AMD-Vi: Lazy IO/TLB flushing enabled
[ 0.992367] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
root@pve:~# dmesg | grep -i vfio
[ 5.250670] VFIO - User Level meta-driver version: 0.3
[ 5.254631] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 5.277813] vfio_pci: add [10de:1b80[ffffffff:ffffffff]] class 0x000000/00000000
[ 5.297921] vfio_pci: add [10de:10f0[ffffffff:ffffffff]] class 0x000000/00000000
root@pve:~# lspci -nn | grep NVID
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
07:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
root@pve:~# find /sys/kernel/iommu_groups/ -type l | grep 07:00
/sys/kernel/iommu_groups/23/devices/0000:07:00.1
/sys/kernel/iommu_groups/22/devices/0000:07:00.0
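The `find` above only shows where the two GPU functions landed. To see everything that shares a group with them, the whole tree can be listed per group. Below is a rough sketch; `list_iommu_groups` and its optional root argument are my own naming, the argument being there only so the function can be run against a test directory. Keep in mind that `pcie_acs_override` is on my kernel command line, so the grouping it prints reflects the override rather than the board's native isolation.

```shell
#!/bin/sh
# List every device in every IOMMU group, one line per device,
# ordered by group number.
# $1: optional iommu_groups root (hypothetical override for testing)
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # glob did not match: no groups exposed
        group="${dev#"$root"/}"        # strip the root prefix
        group="${group%%/*}"           # keep only the group number
        echo "group $group: ${dev##*/}"
    done | sort -k2,2n
}

list_iommu_groups
```

If the IOMMU is not active, the directory is empty and the function prints nothing.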