GPU Passthrough Issues

InvaderGur

I have come across numerous similar posts, but none has helped me resolve my current dilemma. Thanks in advance - I deeply appreciate any help from this excellent community.

What did I do: Attempted GPU passthrough on Proxmox (bare metal) to a Windows 10 QEMU VM. I followed a few different guides:
  1. this guide
  2. more recent guide
  3. also this one
  4. I also verified various steps against the official wiki 1 and 2

What was the exact result: The guest fails to boot. Also, my system load jumps to ~100% while it seemingly attempts to boot Windows 10 until, eventually, the PC restarts.

What did I expect to happen: Boot Windows 10, configure the GPU from within the guest system.

Hardware specifics:
  • Motherboard: ASUS ROG STRIX B550-F
  • CPU: AMD Ryzen 5 5600X (no integrated graphics)
  • GPU: MSI GeForce GTX 1080 ARMOR 8G OC (first PCIe slot)

In the BIOS:
  • IOMMU was set to Auto by default but I changed it to Enabled
  • Fast Boot is Disabled
  • CSM is described as "Launch CSM" and is set to Enabled
  • Boot Device Control was "UEFI and Legacy OPROM" but I recently changed it to "UEFI only" (I'm not sure if this matters, i.e., whether the original bare-metal Proxmox install somehow ended up Legacy Boot based; see the check below)
  • The only options I see for Secure Boot are OS Type ("Other OS" and "Windows UEFI mode") and Key Management, which I will refrain from typing out or photographing unless someone specifically asks for it
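(For what it's worth, a quick way to check from the running system whether Proxmox itself booted via UEFI or legacy BIOS is to look for the EFI variables directory; this is a generic Linux check, not something from the guides above:)
Code:
root@pve:~# test -d /sys/firmware/efi && echo "booted via UEFI" || echo "booted via legacy BIOS"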
Code:
root@pve:~# tail -50 /var/log/syslog
Jun 12 16:39:00 pve pvestatd[1195]: status update time (6.174 seconds)
Jun 12 16:39:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 12 16:39:07 pve systemd[1]: pvesr.service: Succeeded.
Jun 12 16:39:07 pve systemd[1]: Started Proxmox VE replication runner.
Jun 12 16:39:09 pve pvedaemon[1223]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 12 16:39:10 pve pvestatd[1195]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 12 16:39:10 pve pvestatd[1195]: status update time (6.362 seconds)
Jun 12 16:39:15 pve pvedaemon[2590]: start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=e8de33c1-da40-40a5-9950-a1ad71ba3445' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=13d5191c-b441-49d5-810e-102d2538b91e' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:1f2c5c3ff70' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=16:8A:C1:28:3E:1A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
Jun 12 16:39:15 pve pvedaemon[1221]: <root@pam> end task UPID:amdatacenter:00000A1E:00009303:60C54583:qmstart:102:root@pam: start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=e8de33c1-da40-40a5-9950-a1ad71ba3445' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=13d5191c-b441-49d5-810e-102d2538b91e' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:1f2c5c3ff70' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=16:8A:C1:28:3E:1A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=101' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout
Jun 12 16:39:20 pve pvestatd[1195]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 12 16:39:21 pve pvestatd[1195]: status update time (7.263 seconds)

Code:
root@pve:~# pveversion --verbose
proxmox-ve: 6.4-1 (running kernel: 5.11.21-1-pve)
pve-manager: 6.4-8 (running version: 6.4-8/185e14db)
pve-kernel-5.11: 7.0-2~bpo10
pve-kernel-5.4: 6.4-3
pve-kernel-helper: 6.4-3
pve-kernel-5.11.21-1-pve: 5.11.21-1~bpo10
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.9-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-6
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
Code:
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

root@pve:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

root@pve:~# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1

root@pve:~# cat /etc/modprobe.d/blacklist.conf
# This file contains a list of modules which are not supported by Proxmox VE 

# nidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb

root@pve:~# lspci -n -s 07:00
07:00.0 0300: 10de:1b80 (rev a1)
07:00.1 0403: 10de:10f0 (rev a1)

root@pve:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1b80,10de:10f0 disable_vga=1

root@pve:~# cat /etc/pve/qemu-server/102.conf 
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 12
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-102-disk-1,size=4M
hostpci0: 0000:07:00,pcie=1,x-vga=on
ide2: local:iso/Windows10.iso,media=cdrom
machine: pc-q35-5.2
memory: 32768
name: Win10
net0: e1000=4A:21:7E:B7:C8:CA,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-102-disk-0,cache=writeback,discard=on,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=1bc261f6-3745-484d-9d35-1c264fc4e3c6
sockets: 1
vmgenid: f4088c04-17d2-462b-97d6-fdd6c268c6ac
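
(A related sanity check here is to confirm that vfio-pci, and not nouveau or nvidia, has actually claimed both functions of the card; lspci -nnk shows the kernel driver bound to each device:)
Code:
root@pve:~# lspci -nnk -s 07:00
# expected to include a "Kernel driver in use: vfio-pci" line under both 07:00.0 and 07:00.1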
Code:
root@amdatacenter:~# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.11.21-1-pve
Found initrd image: /boot/initrd.img-5.11.21-1-pve
Found linux image: /boot/vmlinuz-5.4.119-1-pve
Found initrd image: /boot/initrd.img-5.4.119-1-pve
Found linux image: /boot/vmlinuz-5.4.106-1-pve
Found initrd image: /boot/initrd.img-5.4.106-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
Adding boot menu entry for EFI firmware configuration
done
root@amdatacenter:~# update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-5.11.21-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.119-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
update-initramfs: Generating /boot/initrd.img-5.4.106-1-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Code:
root@pve:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.21-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off

Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU
[ 0.954451] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.957843] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.958577] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

root@pve:~# dmesg | grep 'remapping'
[ 0.957846] AMD-Vi: Interrupt remapping enabled

Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.991152] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.991906] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.991906] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294a5a):
[    0.991910] AMD-Vi: Interrupt remapping enabled
[    0.992002] AMD-Vi: Lazy IO/TLB flushing enabled
[    0.992367] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

root@pve:~# dmesg | grep -i vfio
[    5.250670] VFIO - User Level meta-driver version: 0.3
[    5.254631] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    5.277813] vfio_pci: add [10de:1b80[ffffffff:ffffffff]] class 0x000000/00000000
[    5.297921] vfio_pci: add [10de:10f0[ffffffff:ffffffff]] class 0x000000/00000000

root@pve:~# lspci -nn | grep NVID
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
07:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)

root@pve:~# find /sys/kernel/iommu_groups/ -type l | grep 07:00
/sys/kernel/iommu_groups/23/devices/0000:07:00.1
/sys/kernel/iommu_groups/22/devices/0000:07:00.0
 
What was the exact result: The guest fails to boot. Also, my system load jumps to ~100% while it seemingly attempts to boot Windows 10 until, eventually, the PC restarts.
That's a bit vague - do you mean the host resets? How do you start the VM, via the GUI or the CLI? Are there any error messages printed if you do so via the latter ('qm start <vmid>')? Anything in the logs (keep 'dmesg -w' and 'journalctl -f' open in a second terminal session, or look at '/var/log/kern.log' after it reboots)?

Config looks good at first glance, though I'd recommend trying without the 'pcie_acs_override' kernel command line argument first, as it can cause instability and security issues - there's a reason it's not the default. Then check your IOMMU groups again (find /sys/kernel/iommu_groups/ -type l), and please post the full output; it's important to see whether your GPU shares a group with anything else.
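
A slightly friendlier variant of that command, pairing each group number with the device it contains (a small sketch using only lspci and the sysfs layout):
Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}   # extract the group number from the path
    printf 'IOMMU group %s: ' "$g"
    lspci -nns "${d##*/}"                          # print the device at that address
done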

Then of course there's always hardware issues that can crop up - have you tried a different PCIe slot? Different GPU/PCIe device (often even an onboard USB controller or such might work)?
 
Thanks very much for the reply, Stefan.

My apologies for the vague information. Yes, I did mean the host resets. I have attempted to start the VM via both the GUI and the CLI, but the CLI is much more informative. With that said, I have followed your advice in reverse order: first I removed the 'pcie_acs_override' argument, then I ran 'qm start 102' and have included the output from that, as well as from 'dmesg -w' and 'journalctl -f', below:

Code:
root@pve:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.21-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off
root@pve:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:08:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:05.0
/sys/kernel/iommu_groups/15/devices/0000:03:00.0
/sys/kernel/iommu_groups/15/devices/0000:02:00.2
/sys/kernel/iommu_groups/15/devices/0000:02:00.0
/sys/kernel/iommu_groups/15/devices/0000:03:09.0
/sys/kernel/iommu_groups/15/devices/0000:06:00.0
/sys/kernel/iommu_groups/15/devices/0000:03:08.0
/sys/kernel/iommu_groups/15/devices/0000:02:00.1
/sys/kernel/iommu_groups/5/devices/0000:00:03.1
/sys/kernel/iommu_groups/13/devices/0000:00:18.3
/sys/kernel/iommu_groups/13/devices/0000:00:18.1
/sys/kernel/iommu_groups/13/devices/0000:00:18.6
/sys/kernel/iommu_groups/13/devices/0000:00:18.4
/sys/kernel/iommu_groups/13/devices/0000:00:18.2
/sys/kernel/iommu_groups/13/devices/0000:00:18.0
/sys/kernel/iommu_groups/13/devices/0000:00:18.7
/sys/kernel/iommu_groups/13/devices/0000:00:18.5
/sys/kernel/iommu_groups/3/devices/0000:00:02.0
/sys/kernel/iommu_groups/21/devices/0000:09:00.4
/sys/kernel/iommu_groups/11/devices/0000:00:08.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.1
/sys/kernel/iommu_groups/18/devices/0000:09:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:07.0
/sys/kernel/iommu_groups/16/devices/0000:07:00.0
/sys/kernel/iommu_groups/16/devices/0000:07:00.1
/sys/kernel/iommu_groups/6/devices/0000:00:04.0
/sys/kernel/iommu_groups/14/devices/0000:01:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:03.0
/sys/kernel/iommu_groups/12/devices/0000:00:14.3
/sys/kernel/iommu_groups/12/devices/0000:00:14.0
/sys/kernel/iommu_groups/2/devices/0000:00:01.2
/sys/kernel/iommu_groups/20/devices/0000:09:00.3
/sys/kernel/iommu_groups/10/devices/0000:00:08.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/19/devices/0000:09:00.1
/sys/kernel/iommu_groups/9/devices/0000:00:07.1

Code:
root@pve:~# qm start 102
start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=1bc261f6-3745-484d-9d35-1c264fc4e3c6' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=f4088c04-17d2-462b-97d6-fdd6c268c6ac' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:9c6ab4b52fd6' -drive 'file=/var/lib/vz/template/iso/Windows10.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=4A:21:7E:B7:C8:CA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard' -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'' failed: got timeout

I have not yet swapped between different PCIe slots but can do that this evening. Also, I am unsure what the physical configuration would look like when plugging a PCIe GPU into an onboard USB controller.
 
Splitting into multiple messages as the original was too long.

I don't have much experience reading dmesg output, but here is the output that I believe came after the 'qm start 102' command. Please let me know if additional information is required.
Code:
root@pve:~# dmesg -w
[ 1483.861852] device tap102i0 entered promiscuous mode
[ 1483.875080] fwbr102i0: port 1(fwln102i0) entered blocking state
[ 1483.875082] fwbr102i0: port 1(fwln102i0) entered disabled state
[ 1483.875121] device fwln102i0 entered promiscuous mode
[ 1483.875143] fwbr102i0: port 1(fwln102i0) entered blocking state
[ 1483.875144] fwbr102i0: port 1(fwln102i0) entered forwarding state
[ 1483.876980] vmbr0: port 4(fwpr102p0) entered blocking state
[ 1483.876981] vmbr0: port 4(fwpr102p0) entered disabled state
[ 1483.877007] device fwpr102p0 entered promiscuous mode
[ 1483.877019] vmbr0: port 4(fwpr102p0) entered blocking state
[ 1483.877020] vmbr0: port 4(fwpr102p0) entered forwarding state
[ 1483.878761] fwbr102i0: port 2(tap102i0) entered blocking state
[ 1483.878762] fwbr102i0: port 2(tap102i0) entered disabled state
[ 1483.878803] fwbr102i0: port 2(tap102i0) entered blocking state
[ 1483.878804] fwbr102i0: port 2(tap102i0) entered forwarding state
 
IOMMU group 16 looks fine. Is this the only GPU in the system, and does it also show the BIOS boot and the Proxmox console screen? If it is, you might want to search for single GPU passthrough, which tends to be trickier to get working.
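
The usual sticking point there is that the host console still owns the GPU; most single GPU passthrough guides therefore detach the virtual consoles and the EFI framebuffer before starting the VM, along these lines (a sketch only - vtcon numbering and the framebuffer name vary by system):
Code:
# release the GPU from the host console before the VM starts
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind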
 
Splitting into multiple messages as the original was too long.

Code:
root@pve:~# journalctl -f
-- Logs begin at Wed 2021-06-16 08:55:57 PDT. --
Jun 16 09:18:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 16 09:18:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 16 09:18:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 16 09:19:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 16 09:19:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 16 09:19:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 16 09:19:12 pve sshd[5854]: Accepted publickey for root from 192.168.1.123 port 55063 ssh2: RSA SHA256:JJOZKFlAp1xlffa7ZA/zRllQ997ezwapySZl8PONTto
Jun 16 09:19:12 pve sshd[5854]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 16 09:19:12 pve systemd-logind[1066]: New session 4 of user root.
Jun 16 09:19:12 pve systemd[1]: Started Session 4 of user root.
Jun 16 09:20:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 16 09:20:01 pve systemd[1]: pvesr.service: Succeeded.
Jun 16 09:20:01 pve systemd[1]: Started Proxmox VE replication runner.
Jun 16 09:20:26 pve sshd[6091]: Accepted publickey for root from 192.168.1.123 port 55152 ssh2: RSA SHA256:JJOZKFlAp1xlffa7ZA/zRllQ997ezwapySZl8PONTto
Jun 16 09:20:26 pve sshd[6091]: pam_unix(sshd:session): session opened for user root by (uid=0)
Jun 16 09:20:26 pve systemd-logind[1066]: New session 5 of user root.
Jun 16 09:20:26 pve systemd[1]: Started Session 5 of user root.
Jun 16 09:20:36 pve qm[6127]: <root@pam> starting task UPID:pve:000017F0:0002437A:60CA24D4:qmstart:102:root@pam:
Jun 16 09:20:36 pve qm[6128]: start VM 102: UPID:pve:000017F0:0002437A:60CA24D4:qmstart:102:root@pam:
Jun 16 09:20:36 pve systemd[1]: Started 102.scope.
Jun 16 09:20:36 pve systemd-udevd[6145]: Using default interface naming scheme 'v240'.
Jun 16 09:20:36 pve systemd-udevd[6145]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 16 09:20:36 pve systemd-udevd[6145]: Could not generate persistent MAC address for tap102i0: No such file or directory
Jun 16 09:20:36 pve kernel: device tap102i0 entered promiscuous mode
Jun 16 09:20:36 pve systemd-udevd[6145]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 16 09:20:36 pve systemd-udevd[6145]: Could not generate persistent MAC address for fwbr102i0: No such file or directory
Jun 16 09:20:36 pve systemd-udevd[6144]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 16 09:20:36 pve systemd-udevd[6144]: Using default interface naming scheme 'v240'.
Jun 16 09:20:36 pve systemd-udevd[6144]: Could not generate persistent MAC address for fwpr102p0: No such file or directory
Jun 16 09:20:36 pve systemd-udevd[6164]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Jun 16 09:20:36 pve systemd-udevd[6164]: Using default interface naming scheme 'v240'.
Jun 16 09:20:36 pve systemd-udevd[6164]: Could not generate persistent MAC address for fwln102i0: No such file or directory
Jun 16 09:20:36 pve kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Jun 16 09:20:36 pve kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Jun 16 09:20:36 pve kernel: device fwln102i0 entered promiscuous mode
Jun 16 09:20:36 pve kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Jun 16 09:20:36 pve kernel: fwbr102i0: port 1(fwln102i0) entered forwarding state
Jun 16 09:20:36 pve kernel: vmbr0: port 4(fwpr102p0) entered blocking state
Jun 16 09:20:36 pve kernel: vmbr0: port 4(fwpr102p0) entered disabled state
Jun 16 09:20:36 pve kernel: device fwpr102p0 entered promiscuous mode
Jun 16 09:20:36 pve kernel: vmbr0: port 4(fwpr102p0) entered blocking state
Jun 16 09:20:36 pve kernel: vmbr0: port 4(fwpr102p0) entered forwarding state
Jun 16 09:20:36 pve kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Jun 16 09:20:36 pve kernel: fwbr102i0: port 2(tap102i0) entered disabled state
Jun 16 09:20:36 pve kernel: fwbr102i0: port 2(tap102i0) entered blocking state
Jun 16 09:20:36 pve kernel: fwbr102i0: port 2(tap102i0) entered forwarding state
Jun 16 09:20:48 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - got timeout
Jun 16 09:20:49 pve pvestatd[1384]: status update time (6.661 seconds)
Jun 16 09:20:58 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 16 09:20:58 pve pvestatd[1384]: status update time (6.238 seconds)
Jun 16 09:21:00 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 16 09:21:03 pve systemd[1]: pvesr.service: Succeeded.
Jun 16 09:21:03 pve systemd[1]: Started Proxmox VE replication runner.
Jun 16 09:21:08 pve qm[6128]: start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=1bc261f6-3745-484d-9d35-1c264fc4e3c6' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=f4088c04-17d2-462b-97d6-fdd6c268c6ac' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:9c6ab4b52fd6' -drive 'file=/var/lib/vz/template/iso/Windows10.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=4A:21:7E:B7:C8:CA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard' -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'' failed: got timeout
Jun 16 09:21:08 pve qm[6127]: <root@pam> end task UPID:pve:000017F0:0002437A:60CA24D4:qmstart:102:root@pam: start failed: command '/usr/bin/kvm -id 102 -name Win10 -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=1bc261f6-3745-484d-9d35-1c264fc4e3c6' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-102-disk-1' -smp '12,sockets=1,cores=12,maxcpus=12' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid' -m 32768 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=f4088c04-17d2-462b-97d6-fdd6c268c6ac' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:07:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=0000:07:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:9c6ab4b52fd6' -drive 'file=/var/lib/vz/template/iso/Windows10.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/pve/vm-102-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=4A:21:7E:B7:C8:CA,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-5.2+pve0' -global 'kvm-pit.lost_tick_policy=discard' -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'' failed: got timeout
Jun 16 09:21:08 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 16 09:21:13 pve pvestatd[1384]: status update time (10.826 seconds)
Jun 16 09:21:19 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 31 retries
Jun 16 09:21:20 pve pvestatd[1384]: status update time (7.436 seconds)
Jun 16 09:21:32 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 26 retries
Jun 16 09:21:37 pve pve-firewall[1385]: firewall update time (5.064 seconds)
Jun 16 09:21:41 pve pvestatd[1384]: status update time (17.814 seconds)
Jun 16 09:21:56 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 24 retries
Jun 16 09:21:59 pve pvestatd[1384]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - got timeout
Jun 16 09:22:02 pve systemd[1]: Starting Proxmox VE replication runner...
Jun 16 09:22:13 pve pve-firewall[1385]: firewall update time (10.382 seconds)
Jun 16 09:22:25 pve pve-firewall[1385]: firewall update time (10.199 seconds)
Jun 16 09:22:37 pve pve-firewall[1385]: firewall update time (12.548 seconds)
Jun 16 09:22:39 pve pvestatd[1384]: got timeout
Jun 16 09:22:45 pve pvestatd[1384]: status update time (63.588 seconds)
Jun 16 09:22:45 pve pve-firewall[1385]: firewall update time (6.588 seconds)
Jun 16 09:22:58 pve pve-firewall[1385]: firewall update time (9.417 seconds)
Jun 16 09:23:12 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 17 retries
Jun 16 09:23:19 pve pve-firewall[1385]: firewall update time (18.819 seconds)
Jun 16 09:23:30 pve pve-firewall[1385]: firewall update time (8.586 seconds)
Jun 16 09:23:39 pve pve-firewall[1385]: firewall update time (8.006 seconds)
Jun 16 09:23:52 pve pve-firewall[1385]: firewall update time (8.780 seconds)
Jun 16 09:24:01 pve pvestatd[1384]: got timeout
Jun 16 09:24:04 pve pve-firewall[1385]: firewall update time (12.163 seconds)
Jun 16 09:24:05 pve pvestatd[1384]: status update time (79.482 seconds)
Jun 16 09:24:35 pve pvestatd[1384]: VM 101 qmp command failed - VM 101 qmp command 'query-proxmox-support' failed - got timeout
Jun 16 09:24:40 pve pve-firewall[1385]: firewall update time (31.807 seconds)
Jun 16 09:24:44 pve pvestatd[1384]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 5 retries
Jun 16 09:24:49 pve pve-firewall[1385]: firewall update time (8.873 seconds)
Jun 16 09:25:08 pve pve-firewall[1385]: firewall update time (16.707 seconds)
Jun 16 09:25:19 pve pvestatd[1384]: got timeout
Jun 16 09:25:43 pve pvestatd[1384]: got timeout
Jun 16 09:25:44 pve pvestatd[1384]: unable to activate storage 'local' - directory '/var/lib/vz' does not exist or is unreachable
Jun 16 09:26:06 pve smartd[1054]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 232 to 216
Jun 16 09:26:08 pve smartd[1054]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 232 to 209
Jun 16 09:26:25 pve pve-firewall[1385]: firewall update time (75.775 seconds)
Jun 16 09:27:47 pve pve-ha-lrm[1426]: loop take too long (33 seconds)
Jun 16 09:28:19 pve pve-firewall[1385]: firewall update time (113.805 seconds)
Jun 16 09:28:37 pve pve-ha-crm[1417]: loop take too long (33 seconds)
Jun 16 09:28:39 pve pve-firewall[1385]: status update error: can't lock file '/var/lock/pvefw.lck' - got timeout
Jun 16 09:28:39 pve pve-firewall[1385]: firewall update time (19.004 seconds)
Jun 16 09:28:46 pve pve-ha-lrm[1426]: loop take too long (38 seconds)
 
IOMMU group 16 looks fine. Is this the only GPU in the system, and does it also show the BIOS boot and the Proxmox console screen? If it is, you might want to search for single GPU passthrough, which tends to be trickier to get working.
Thanks for the feedback.

Yes it is the only GPU in the system.

I'm not sure what you mean by
and does it also show the BIOS boot and the Proxmox console screen?

By "search for single GPU passthrough", I understand you're suggesting I look up a step-by-step guide for this - thanks for the advice, and I will work on that later today.
 
I think the problem is with your GPU. I can't see anywhere that you mention a GPU added to your motherboard; the problem comes from a GPU soldered to the motherboard. Add an NVIDIA GPU to your server and you'll see that everything will work fine. Look at my datacenter with NVIDIA GPUs added to K8S VM workers - everything is working fine.
 

Attachments

  • Cluster-K8S+GPU.png
  • GPU-NVIDIA-PCI-E-1.png
  • GPU-NVIDIA-VM-Worker.png
  • Nvidia-GPU-Worker.png
I think the problem is with your GPU. I can't see anywhere that you mention a GPU added to your motherboard; the problem comes from a GPU soldered to the motherboard. Add an NVIDIA GPU to your server and you'll see that everything will work fine. Look at my datacenter with NVIDIA GPUs added to K8S VM workers - everything is working fine.
Thanks very much for the input. I apologize if I was unclear on any details, but I am having trouble understanding what you are suggesting.

I do not have a GPU soldered to my motherboard (ASUS ROG STRIX B550-F), my CPU does not have an integrated GPU (AMD Ryzen 5 5600X), and I am using an NVIDIA GPU (MSI GeForce GTX 1080 ARMOR 8G OC) inserted in the first PCIe slot of my motherboard.

Based on your second screenshot, "GPU-NVIDIA-PCI-E-1.png", I believe you are asking for information that was included in my first post, from the 102.conf file. I have copied this below for reference and also attached a screenshot from the GUI in case that helps.
Code:
root@pve:~# cat /etc/pve/qemu-server/102.conf
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 12
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-102-disk-1,size=4M
hostpci0: 0000:07:00,pcie=1,x-vga=on
ide2: local:iso/Windows10.iso,media=cdrom
machine: pc-q35-5.2
memory: 32768
name: Win10
net0: e1000=4A:21:7E:B7:C8:CA,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-102-disk-0,cache=writeback,discard=on,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=1bc261f6-3745-484d-9d35-1c264fc4e3c6
sockets: 1
vmgenid: f4088c04-17d2-462b-97d6-fdd6c268c6ac


If this response has not addressed your suggestion, can you please clarify? Thanks in advance.
 

Attachments

  • Screen Shot 2021-06-16 at 11.45.36 AM.png
Hi,
I see that your GPU card is detected on PCIe and well configured. Have you tried installing a GNU/Linux distribution (Debian or Ubuntu, for example) with the NVIDIA drivers? I know absolutely nothing about Windows :D
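(On Ubuntu, for example, assuming the passthrough itself works, something like this usually suffices inside the guest:)
Code:
# inside the Linux guest, detect and install the recommended NVIDIA driver
root@guest:~# ubuntu-drivers autoinstall
root@guest:~# reboot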
Remember that you can only use one VM with the GPU.

Thanks.
 
IOMMU group 16 looks fine. Is this the only GPU in the system, and does it also show the BIOS boot and the Proxmox console screen? If it is, you might want to search for single GPU passthrough, which tends to be trickier to get working.
I agree with this, though it should be fine with nofb nomodeset video=vesafb:off set - that is, unless the BIOS itself corrupts the state already, in which case maybe a ROM file might help?

Also, in general, remove the line args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off' from your config - it is not necessary, we do all that automatically. I don't think this causes your issues though.
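
(For example, something like this should drop it, though editing the file directly works just as well:)
Code:
root@pve:~# qm set 102 --delete args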

I have not yet swapped between different PCIe slots but can do that this evening. Also, I am unsure what the physical configuration would look like when plugging a PCIe GPU into an onboard USB controller.
No, I didn't mean "plug the GPU into USB", that won't work - I meant pass the USB controller through to the VM and see if it starts. Passthrough can theoretically handle any PCIe device, not just GPUs, and USB controllers are in general a lot simpler than a graphics card, so the chances are it works better. If it also doesn't work, you know that it isn't related to your GPU, but your mainboard/CPU/software/whatever else.
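
(A sketch of what that test could look like, assuming the 0000:09:00.3 device sitting alone in group 20 of your listing is indeed a USB controller - verify with lspci first, and remove the GPU's hostpci0 line for the test so only the USB controller is passed through:)
Code:
root@pve:~# lspci -nn | grep -i usb     # find the controller's PCI address
root@pve:~# qm set 102 -hostpci1 0000:09:00.3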
 
and does it also show the BIOS boot and the Proxmox console screen
I am just now understanding what this means. It does show the BIOS boot but does not make it to the Proxmox console screen. As long as I have the vfio.conf file in place, it stays on a screen showing the following:
Code:
[ 1.179286 ] igc 0000:06:00.0 no suspend buffer for PTM
Reading all physical volumes. This may take a while...
Found volume group "pve" using metadata type lvm2
9 logical volume(s) in volume group "pve" now active
/dev/mapper/pve-root: clean, 67420/6291456 files, 4558317/25165824 blocks

I agree with this, though it should be fine with nofb nomodeset video=vesafb:off set - that is, unless the BIOS itself corrupts the state already, in which case maybe a ROM file might help?
I have attempted to define a ROM file using three methods:
1. Dumping using the following commands:
Code:
cd /sys/bus/pci/devices/0000:07:00.0/
echo 1 > rom
cat rom > /usr/share/kvm/vbios.bin
echo 0 > rom
2. Downloading a ROM file from here.
3. Patching both ROM files using this script.
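
(As a sanity check on each candidate file, a valid PCI expansion ROM image should start with the signature bytes 55 aa, which can be verified with od:)
Code:
root@pve:~# od -An -tx1 -N2 /usr/share/kvm/vbios.bin
# should print: 55 aa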

Also, in general, remove the line args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off' from your config - it is not necessary, we do all that automatically. I don't think this causes your issues though.
I removed that line, thank you for the input.

No, I didn't mean "plug the GPU into USB", that won't work - I meant pass the USB controller through to the VM and see if it starts. Passthrough can theoretically handle any PCIe device, not just GPUs, and USB controllers are in general a lot simpler than a graphics card, so the chances are it works better. If it also doesn't work, you know that it isn't related to your GPU, but your mainboard/CPU/software/whatever else.
Great troubleshooting step, I tested that I can indeed pass through my mouse and keyboard successfully.

What's the next step?

To chase down the single-GPU issue, I plugged a spare HD5450 into my second PCIe slot but had trouble configuring my motherboard to boot from it. So, more recently, I swapped the HD5450 into the first PCIe slot and put my 1080 in the second PCIe slot. This, along with the various suggestions you provided above, did not resolve my guest boot problem.

It may be noteworthy that I received a different initial response during one of these configurations with a second GPU. That response was this: kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
 
It may be noteworthy that I received a different initial response during one of these configurations with a second GPU. That response was this: kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
Disable the 'pcid' flag in your CPU config (via the GUI, or remove it from the config file). It only shows up now because previously your 'args' line would override it. But this is only a warning and shouldn't cause your issue.
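
(Concretely, in /etc/pve/qemu-server/102.conf that means changing the cpu line as follows:)
Code:
# before
cpu: host,hidden=1,flags=+pcid
# after
cpu: host,hidden=1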

I have attempted to define a ROM file using three methods:
Just to make sure, have you then also specified the ROM in the VM config? You need to edit the config in '/etc/pve/qemu-server/' and add the 'romfile' parameter to your 'hostpci' line for the GPU, after putting your extracted version into '/usr/share/kvm'.

At this point I think we may have to assume faulty hardware is involved somewhere (mainboard, CPU, GPU, maybe power supply?). Especially the fact that your machine hard reboots would usually point to a more serious error somewhere, since it happens even when passing through a non-boot GPU, as you demonstrated with your AMD card installed.

Maybe as a last shot: I've had PCIe AER do some funky stuff on my boards in the past, so you could try disabling it (at least its automatic recovery, AFAIU) by appending pci=noaer to your kernel command line - but do note that this might compromise the stability of your system in other areas.
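
(That would mean appending it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and re-running update-grub, roughly like this, given your current command line:)
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off pci=noaer"
root@pve:~# update-grub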
 
Just to make sure, have you then also specified the ROM in the VM config?
Yes, I did. My apologies for not being clearer on this. The config now looks something like this: hostpci0: 0000:07:00,pcie=1,x-vga=on,romfile=vbios.bin, where vbios.bin is located in /usr/share/kvm.

At this point I think we may have to assume faulty hardware is involved somewhere (mainboard, CPU, GPU, maybe power supply?). Especially the fact that your machine hard reboots would usually point to a more serious error somewhere, since it happens even when passing through a non-boot GPU, as you demonstrated with your AMD card installed.
I agree that it feels hardware-related. However, I will note that the host has not hard rebooted since the beginning of this thread. And it has definitely not happened when passing through a non-boot GPU. It does still load to 100% and become non-responsive, though.

I have a question on this line of thought. It appears to me that Proxmox has booted properly, with 'Kernel driver in use: vfio-pci' as well as the outputs below. Therefore, is it not worth investigating the guest configuration further?

root@pve:~# dmesg | grep -e DMAR -e IOMMU
[ 0.954451] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.957843] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.958577] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

root@pve:~# dmesg | grep 'remapping'
[ 0.957846] AMD-Vi: Interrupt remapping enabled

root@pve:~# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.991152] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.991906] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.991906] pci 0000:00:00.2: AMD-Vi: Extended features (0x58f77ef22294a5a):
[ 0.991910] AMD-Vi: Interrupt remapping enabled
[ 0.992002] AMD-Vi: Lazy IO/TLB flushing enabled
[ 0.992367] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

root@pve:~# dmesg | grep -i vfio
[ 5.250670] VFIO - User Level meta-driver version: 0.3
[ 5.254631] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none: owns=io+mem
[ 5.277813] vfio_pci: add [10de:1b80[ffffffff:ffffffff]] class 0x000000/00000000
[ 5.297921] vfio_pci: add [10de:10f0[ffffffff:ffffffff]] class 0x000000/00000000

root@pve:~# lspci -nn | grep NVID
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
07:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)

root@pve:~# find /sys/kernel/iommu_groups/ -type l | grep 07:00
/sys/kernel/iommu_groups/23/devices/0000:07:00.1
/sys/kernel/iommu_groups/22/devices/0000:07:00.0

I will try pci=noaer and report back!
 
I will try pci=noaer and report back!
How did it go?
 
