GPU Passthrough - Out of ideas

WolfpacK

Member
Nov 11, 2018
I've hit a wall as far as ideas for getting GPU passthrough working. Everything I can find suggests that it should work and be supported by my system. The error I'm receiving when trying to launch a Windows 10 guest with an Nvidia GTX 1060 passed through is:
Code:
"failed to set iommu for container: Operation not permitted"
From dmesg after attempting to launch the VM:
Code:
vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
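
As far as I understand, when the kernel actually activates interrupt remapping (rather than the BIOS merely advertising support for it), dmesg should also contain a "DMAR-IR: Enabled IRQ remapping in ..." line. A quick way to check:
Code:
dmesg | grep -i remapping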

Here's the complete output:
Code:
root@pve:~# qm start 102
kvm: -device vfio-pci,host=04:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio error: 0000:04:00.0: failed to setup container for group 13: failed to set iommu for container: Operation not permitted
start failed: command '/usr/bin/kvm -id 102 -name Win10GPU -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qemu-server/102-event.qmp,server,nowait' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=ac42b263-2a6f-41c2-b7fd-18ade5b74fae' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=qcow2,id=drive-efidisk0,file=/var/lib/vz/data/images/102/vm-102-disk-2.qcow2' -smp '8,sockets=1,cores=8,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,kvm=off' -m 16384 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -readconfig /usr/share/qemu-server/pve-q35.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=04:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=04:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:78dc0d4a33b' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/var/lib/vz/data/images/102/vm-102-disk-1.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -drive 'file=/var/lib/vz/data/images/102/vm-102-disk-0.qcow2,if=none,id=drive-scsi1,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/var/lib/vz/template/iso/virtio-win.iso,if=none,id=drive-sata0,media=cdrom,aio=threads' -device 'ide-drive,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=200' -drive 'if=none,id=drive-sata1,media=cdrom,aio=threads' -device 'ide-drive,bus=ahci0.1,drive=drive-sata1,id=sata1,bootindex=201' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=76:40:4A:FB:EB:C4,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=q35' -global 'kvm-pit.lost_tick_policy=discard'' failed: exit code 1

Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 0x00000000AF3B3668 0001A8 (v01 DELL   PE_SC3   00000001 DELL 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.000000] DMAR-IR: This system BIOS has enabled interrupt remapping
[    1.130145] DMAR: Host address width 40
[    1.130146] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    1.130169] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020fe
[    1.130170] DMAR: RMRR base: 0x000000af4c8000 end: 0x000000af4dffff
[    1.130171] DMAR: RMRR base: 0x000000af4b1000 end: 0x000000af4bffff
[    1.130172] DMAR: RMRR base: 0x000000af4a1000 end: 0x000000af4a1fff
[    1.130173] DMAR: RMRR base: 0x000000af4a3000 end: 0x000000af4a3fff
[    1.130174] DMAR: RMRR base: 0x000000af4a5000 end: 0x000000af4a5fff
[    1.130175] DMAR: RMRR base: 0x000000af4a7000 end: 0x000000af4a7fff
[    1.130176] DMAR: RMRR base: 0x000000af4c0000 end: 0x000000af4c0fff
[    1.130177] DMAR: RMRR base: 0x000000af4c2000 end: 0x000000af4c2fff
[    1.130178] DMAR: ATSR flags: 0x0
[    1.130456] DMAR: dmar0: Using Queued invalidation
[    1.130470] DMAR: Setting RMRR:
[    1.130687] DMAR: Setting identity map for device 0000:00:1d.7 [0xaf4c2000 - 0xaf4c2fff]
[    1.130924] DMAR: Setting identity map for device 0000:00:1a.7 [0xaf4c0000 - 0xaf4c0fff]
[    1.131145] DMAR: Setting identity map for device 0000:00:1d.1 [0xaf4a7000 - 0xaf4a7fff]
[    1.131399] DMAR: Setting identity map for device 0000:00:1d.0 [0xaf4a5000 - 0xaf4a5fff]
[    1.131668] DMAR: Setting identity map for device 0000:00:1a.1 [0xaf4a3000 - 0xaf4a3fff]
[    1.131947] DMAR: Setting identity map for device 0000:00:1a.0 [0xaf4a1000 - 0xaf4a1fff]
[    1.131963] DMAR: Setting identity map for device 0000:00:1a.0 [0xaf4b1000 - 0xaf4bffff]
[    1.131965] DMAR: Setting identity map for device 0000:00:1a.1 [0xaf4b1000 - 0xaf4bffff]
[    1.131967] DMAR: Setting identity map for device 0000:00:1d.0 [0xaf4b1000 - 0xaf4bffff]
[    1.131968] DMAR: Setting identity map for device 0000:00:1d.1 [0xaf4b1000 - 0xaf4bffff]
[    1.131970] DMAR: Setting identity map for device 0000:00:1a.7 [0xaf4c8000 - 0xaf4dffff]
[    1.131972] DMAR: Setting identity map for device 0000:00:1d.7 [0xaf4c8000 - 0xaf4dffff]
[    1.131975] DMAR: Prepare 0-16MiB unity mapping for LPC
[    1.132236] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    1.132402] DMAR: Intel(R) Virtualization Technology for Directed I/O
[25690.312900] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[30307.456765] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[31016.561766] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[62769.291934] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform

Code:
root@pve:~# lspci -v -n -s 04:
04:00.0 0300: 10de:1c03 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: 3842:6267
    Flags: fast devsel, IRQ 15
    Memory at db000000 (32-bit, non-prefetchable) [disabled] [size=16M]
    Memory at c0000000 (64-bit, prefetchable) [disabled] [size=256M]
    Memory at be000000 (64-bit, prefetchable) [disabled] [size=32M]
    I/O ports at ec80 [disabled] [size=128]
    Expansion ROM at da000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
04:00.1 0403: 10de:10f1 (rev a1)
    Subsystem: 3842:6267
    Flags: bus master, fast devsel, latency 0, IRQ 14
    Memory at daffc000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel

Code:
root@pve:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:fe:00.1
/sys/kernel/iommu_groups/17/devices/0000:fe:00.0
/sys/kernel/iommu_groups/7/devices/0000:03:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/25/devices/0000:ff:03.4
/sys/kernel/iommu_groups/25/devices/0000:ff:03.2
/sys/kernel/iommu_groups/25/devices/0000:ff:03.0
/sys/kernel/iommu_groups/25/devices/0000:ff:03.1
/sys/kernel/iommu_groups/15/devices/0000:07:00.0
/sys/kernel/iommu_groups/15/devices/0000:07:00.1
/sys/kernel/iommu_groups/15/devices/0000:06:02.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.1
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/23/devices/0000:ff:00.0
/sys/kernel/iommu_groups/23/devices/0000:ff:00.1
/sys/kernel/iommu_groups/13/devices/0000:04:00.1
/sys/kernel/iommu_groups/13/devices/0000:04:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:07.0
/sys/kernel/iommu_groups/21/devices/0000:fe:05.0
/sys/kernel/iommu_groups/21/devices/0000:fe:05.3
/sys/kernel/iommu_groups/21/devices/0000:fe:05.1
/sys/kernel/iommu_groups/21/devices/0000:fe:05.2
/sys/kernel/iommu_groups/11/devices/0000:01:00.0
/sys/kernel/iommu_groups/11/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/28/devices/0000:ff:06.1
/sys/kernel/iommu_groups/28/devices/0000:ff:06.2
/sys/kernel/iommu_groups/28/devices/0000:ff:06.0
/sys/kernel/iommu_groups/28/devices/0000:ff:06.3
/sys/kernel/iommu_groups/18/devices/0000:fe:02.5
/sys/kernel/iommu_groups/18/devices/0000:fe:02.3
/sys/kernel/iommu_groups/18/devices/0000:fe:02.1
/sys/kernel/iommu_groups/18/devices/0000:fe:02.4
/sys/kernel/iommu_groups/18/devices/0000:fe:02.2
/sys/kernel/iommu_groups/18/devices/0000:fe:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:1d.1
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/8/devices/0000:00:1d.7
/sys/kernel/iommu_groups/26/devices/0000:ff:04.2
/sys/kernel/iommu_groups/26/devices/0000:ff:04.0
/sys/kernel/iommu_groups/26/devices/0000:ff:04.3
/sys/kernel/iommu_groups/26/devices/0000:ff:04.1
/sys/kernel/iommu_groups/16/devices/0000:08:00.0
/sys/kernel/iommu_groups/16/devices/0000:06:04.0
/sys/kernel/iommu_groups/16/devices/0000:08:00.1
/sys/kernel/iommu_groups/6/devices/0000:00:1a.1
/sys/kernel/iommu_groups/6/devices/0000:00:1a.0
/sys/kernel/iommu_groups/6/devices/0000:00:1a.7
/sys/kernel/iommu_groups/24/devices/0000:ff:02.5
/sys/kernel/iommu_groups/24/devices/0000:ff:02.3
/sys/kernel/iommu_groups/24/devices/0000:ff:02.1
/sys/kernel/iommu_groups/24/devices/0000:ff:02.4
/sys/kernel/iommu_groups/24/devices/0000:ff:02.2
/sys/kernel/iommu_groups/24/devices/0000:ff:02.0
/sys/kernel/iommu_groups/14/devices/0000:05:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:09.0
/sys/kernel/iommu_groups/22/devices/0000:fe:06.3
/sys/kernel/iommu_groups/22/devices/0000:fe:06.1
/sys/kernel/iommu_groups/22/devices/0000:fe:06.2
/sys/kernel/iommu_groups/22/devices/0000:fe:06.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/20/devices/0000:fe:04.2
/sys/kernel/iommu_groups/20/devices/0000:fe:04.0
/sys/kernel/iommu_groups/20/devices/0000:fe:04.3
/sys/kernel/iommu_groups/20/devices/0000:fe:04.1
/sys/kernel/iommu_groups/10/devices/0000:00:1f.2
/sys/kernel/iommu_groups/10/devices/0000:00:1f.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/19/devices/0000:fe:03.1
/sys/kernel/iommu_groups/19/devices/0000:fe:03.4
/sys/kernel/iommu_groups/19/devices/0000:fe:03.2
/sys/kernel/iommu_groups/19/devices/0000:fe:03.0
/sys/kernel/iommu_groups/9/devices/0000:09:03.0
/sys/kernel/iommu_groups/9/devices/0000:00:1e.0
/sys/kernel/iommu_groups/27/devices/0000:ff:05.3
/sys/kernel/iommu_groups/27/devices/0000:ff:05.1
/sys/kernel/iommu_groups/27/devices/0000:ff:05.2
/sys/kernel/iommu_groups/27/devices/0000:ff:05.0

As shown in the output above, IOMMU appears to be enabled, the GPU is bound to the vfio-pci driver, and the GPU is isolated in its own IOMMU group (group 13). The "DMAR: Intel(R) Virtualization Technology for Directed I/O" line indicates VT-d is active.

The only hints I see pointing at the cause of the problem are "failed to set iommu for container: Operation not permitted" and, in dmesg, "vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform".

The easy fix is to enable allow_unsafe_interrupts, which I believe will work. However, I've read that this can lead to data loss and possible system stability issues. I have been able to get passthrough working on this system before, but with different settings. I'm not really sure what the difference is, other than that I followed this guide during this setup attempt: https://forum.proxmox.com/threads/gpu-passthrough-tutorial-reference.34303/
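
For completeness, the way I understand that workaround is normally applied (the .conf file name here is arbitrary) is a modprobe option plus an initramfs rebuild:
Code:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
update-initramfs -u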

This is my third install of Proxmox (for reasons) attempting this, so I'm getting pretty good at configuring and troubleshooting it. This attempt is the closest I've come to getting GPU passthrough set up just right (I hope). Previous attempts suffered from poor guest performance, Code 43 errors, or various other problems. In those attempts, I did not set "machine: q35" and did not use "pcie=1" on the hostpci line in the VM config file. Also, I may have been using allow_unsafe_interrupts.
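
For comparison, reconstructing from the kvm command line above, the relevant part of the VM config this time around is roughly:
Code:
# /etc/pve/qemu-server/102.conf (abridged)
bios: ovmf
machine: q35
hostpci0: 04:00,pcie=1,x-vga=on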

I'm going to try allow_unsafe_interrupts, but if there is any way to get this working without it, I'd prefer that: from what I've read, both stability and possibly performance would be better. In other words, I would rather not need allow_unsafe_interrupts.

Additional system information:
Code:
pve-manager/5.2-10/6f892b40 (running kernel: 4.15.18-8-pve)

System: Dell R610 Gen II, Dual Intel Xeon X5650, 48GB RAM
GPU: Nvidia GTX 1060 6GB
Host: Proxmox 5.2
Guest: Windows 10

I also have a Windows 7 guest and am testing both. Obviously, I only run one at a time with GPU passthrough.

Also, I don't think it's relevant, but the original Dell R610 BIOS (version 3.0.0) gave an error in Linux that VT-d was disabled due to a BIOS bug. I upgraded the BIOS to version 6.6.0, which resolved that error.

Hopefully someone has an idea on what to try next. I feel that I am so close - I've spent weeks trying to get this tuned just right. Let me know if there are any other details I can provide which might be helpful.

Related:
https://forum.proxmox.com/threads/problems-with-pcie-passthrough-dell-r510-h310.35811/
https://forums.unraid.net/topic/57218-gpu-for-dell-r710-vm/
 
Update: I tried adding "iommu=pt" to GRUB even though it wasn't specified as a fix for this particular problem, but it didn't help. Same result.
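
For reference, the kernel command line I'm using now looks roughly like this:
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# apply and reboot:
update-grub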

I tried enabling allow_unsafe_interrupts. This works - the VM boots, and it's installing the most recent Nvidia drivers and software as I write this.
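
If I'm reading the module right, whether the option actually took effect can be verified (and, since the parameter is root-writable, even toggled for testing) at runtime:
Code:
cat /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts   # Y when active
echo 1 > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts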

Still... I don't understand why this was needed on my system. All my research suggests that allow_unsafe_interrupts shouldn't be necessary.

This isn't resolved, since I pretty much already knew the unsafe-interrupts workaround would work. If anyone has any ideas or additional information for me, that would be helpful.
 
Still... I don't understand why this was needed on my system. All my research suggests that allow_unsafe_interrupts shouldn't be necessary.
mhmm... looking at your post, I could not identify why you would need this either.
Searching around, I found some people with Dell systems (R610 and R510) who have the same problem, so I guess the platform does not really support it.
 
Super unfortunate.

At least the system seems to be running great now. Gameplay is smooth with Parsec. The only noticeable slowdown is when moving or resizing application windows - SUPER LAGGY. I'm not sure what causes that. I'm hoping it's the use of an x1 riser - not enough GPU bandwidth. If that's not it, I would lean towards an issue with the hypervisor. Eventually I'd like to build a new type of PCIe riser I have an idea for - whenever I get around to making it.
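
If it is the riser, the negotiated link width should be visible from the host; a "Width x1" under LnkSta would confirm the bandwidth theory:
Code:
lspci -s 04:00.0 -vv | grep -E 'LnkCap|LnkSta'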
 
Hello, did you manage to get it working without "allow_unsafe_interrupts"? Regards
 
