I've hit a wall and am out of ideas for getting GPU passthrough working. Everything I know suggests it should be working and supported on my system. The error I receive when launching a Windows 10 guest with an Nvidia GTX 1060 passed through is:
Code:
"failed to set iommu for container: Operation not permitted"
From dmesg after attempting to launch the VM:
Code:
vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Here's the complete output:
Code:
root@pve:~# qm start 102
kvm: -device vfio-pci,host=04:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio error: 0000:04:00.0: failed to setup container for group 13: failed to set iommu for container: Operation not permitted
start failed: command '/usr/bin/kvm -id 102 -name Win10GPU -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qemu-server/102-event.qmp,server,nowait' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=ac42b263-2a6f-41c2-b7fd-18ade5b74fae' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=qcow2,id=drive-efidisk0,file=/var/lib/vz/data/images/102/vm-102-disk-2.qcow2' -smp '8,sockets=1,cores=8,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -no-hpet -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=proxmox,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_reset,hv_vpindex,hv_runtime,hv_relaxed,hv_synic,hv_stimer,kvm=off' -m 16384 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -readconfig /usr/share/qemu-server/pve-q35.cfg -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=04:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on' -device 'vfio-pci,host=04:00.1,id=hostpci0.1,bus=ich9-pcie-port-1,addr=0x0.1' -chardev 'socket,path=/var/run/qemu-server/102.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:78dc0d4a33b' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/var/lib/vz/data/images/102/vm-102-disk-1.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -drive 'file=/var/lib/vz/data/images/102/vm-102-disk-0.qcow2,if=none,id=drive-scsi1,format=qcow2,cache=none,aio=native,detect-zeroes=on' 
-device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/var/lib/vz/template/iso/virtio-win.iso,if=none,id=drive-sata0,media=cdrom,aio=threads' -device 'ide-drive,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=200' -drive 'if=none,id=drive-sata1,media=cdrom,aio=threads' -device 'ide-drive,bus=ahci0.1,drive=drive-sata1,id=sata1,bootindex=201' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=76:40:4A:FB:EB:C4,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=q35' -global 'kvm-pit.lost_tick_policy=discard'' failed: exit code 1
Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU
[ 0.000000] ACPI: DMAR 0x00000000AF3B3668 0001A8 (v01 DELL PE_SC3 00000001 DELL 00000001)
[ 0.000000] DMAR: IOMMU enabled
[ 0.000000] DMAR-IR: This system BIOS has enabled interrupt remapping
[ 1.130145] DMAR: Host address width 40
[ 1.130146] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[ 1.130169] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020fe
[ 1.130170] DMAR: RMRR base: 0x000000af4c8000 end: 0x000000af4dffff
[ 1.130171] DMAR: RMRR base: 0x000000af4b1000 end: 0x000000af4bffff
[ 1.130172] DMAR: RMRR base: 0x000000af4a1000 end: 0x000000af4a1fff
[ 1.130173] DMAR: RMRR base: 0x000000af4a3000 end: 0x000000af4a3fff
[ 1.130174] DMAR: RMRR base: 0x000000af4a5000 end: 0x000000af4a5fff
[ 1.130175] DMAR: RMRR base: 0x000000af4a7000 end: 0x000000af4a7fff
[ 1.130176] DMAR: RMRR base: 0x000000af4c0000 end: 0x000000af4c0fff
[ 1.130177] DMAR: RMRR base: 0x000000af4c2000 end: 0x000000af4c2fff
[ 1.130178] DMAR: ATSR flags: 0x0
[ 1.130456] DMAR: dmar0: Using Queued invalidation
[ 1.130470] DMAR: Setting RMRR:
[ 1.130687] DMAR: Setting identity map for device 0000:00:1d.7 [0xaf4c2000 - 0xaf4c2fff]
[ 1.130924] DMAR: Setting identity map for device 0000:00:1a.7 [0xaf4c0000 - 0xaf4c0fff]
[ 1.131145] DMAR: Setting identity map for device 0000:00:1d.1 [0xaf4a7000 - 0xaf4a7fff]
[ 1.131399] DMAR: Setting identity map for device 0000:00:1d.0 [0xaf4a5000 - 0xaf4a5fff]
[ 1.131668] DMAR: Setting identity map for device 0000:00:1a.1 [0xaf4a3000 - 0xaf4a3fff]
[ 1.131947] DMAR: Setting identity map for device 0000:00:1a.0 [0xaf4a1000 - 0xaf4a1fff]
[ 1.131963] DMAR: Setting identity map for device 0000:00:1a.0 [0xaf4b1000 - 0xaf4bffff]
[ 1.131965] DMAR: Setting identity map for device 0000:00:1a.1 [0xaf4b1000 - 0xaf4bffff]
[ 1.131967] DMAR: Setting identity map for device 0000:00:1d.0 [0xaf4b1000 - 0xaf4bffff]
[ 1.131968] DMAR: Setting identity map for device 0000:00:1d.1 [0xaf4b1000 - 0xaf4bffff]
[ 1.131970] DMAR: Setting identity map for device 0000:00:1a.7 [0xaf4c8000 - 0xaf4dffff]
[ 1.131972] DMAR: Setting identity map for device 0000:00:1d.7 [0xaf4c8000 - 0xaf4dffff]
[ 1.131975] DMAR: Prepare 0-16MiB unity mapping for LPC
[ 1.132236] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[ 1.132402] DMAR: Intel(R) Virtualization Technology for Directed I/O
[25690.312900] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[30307.456765] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[31016.561766] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[62769.291934] vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Code:
root@pve:~# lspci -v -n -s 04:
04:00.0 0300: 10de:1c03 (rev a1) (prog-if 00 [VGA controller])
Subsystem: 3842:6267
Flags: fast devsel, IRQ 15
Memory at db000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at c0000000 (64-bit, prefetchable) [disabled] [size=256M]
Memory at be000000 (64-bit, prefetchable) [disabled] [size=32M]
I/O ports at ec80 [disabled] [size=128]
Expansion ROM at da000000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
04:00.1 0403: 10de:10f1 (rev a1)
Subsystem: 3842:6267
Flags: bus master, fast devsel, latency 0, IRQ 14
Memory at daffc000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
Code:
root@pve:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:fe:00.1
/sys/kernel/iommu_groups/17/devices/0000:fe:00.0
/sys/kernel/iommu_groups/7/devices/0000:03:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:1c.0
/sys/kernel/iommu_groups/25/devices/0000:ff:03.4
/sys/kernel/iommu_groups/25/devices/0000:ff:03.2
/sys/kernel/iommu_groups/25/devices/0000:ff:03.0
/sys/kernel/iommu_groups/25/devices/0000:ff:03.1
/sys/kernel/iommu_groups/15/devices/0000:07:00.0
/sys/kernel/iommu_groups/15/devices/0000:07:00.1
/sys/kernel/iommu_groups/15/devices/0000:06:02.0
/sys/kernel/iommu_groups/5/devices/0000:00:14.1
/sys/kernel/iommu_groups/5/devices/0000:00:14.2
/sys/kernel/iommu_groups/5/devices/0000:00:14.0
/sys/kernel/iommu_groups/23/devices/0000:ff:00.0
/sys/kernel/iommu_groups/23/devices/0000:ff:00.1
/sys/kernel/iommu_groups/13/devices/0000:04:00.1
/sys/kernel/iommu_groups/13/devices/0000:04:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:07.0
/sys/kernel/iommu_groups/21/devices/0000:fe:05.0
/sys/kernel/iommu_groups/21/devices/0000:fe:05.3
/sys/kernel/iommu_groups/21/devices/0000:fe:05.1
/sys/kernel/iommu_groups/21/devices/0000:fe:05.2
/sys/kernel/iommu_groups/11/devices/0000:01:00.0
/sys/kernel/iommu_groups/11/devices/0000:01:00.1
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/28/devices/0000:ff:06.1
/sys/kernel/iommu_groups/28/devices/0000:ff:06.2
/sys/kernel/iommu_groups/28/devices/0000:ff:06.0
/sys/kernel/iommu_groups/28/devices/0000:ff:06.3
/sys/kernel/iommu_groups/18/devices/0000:fe:02.5
/sys/kernel/iommu_groups/18/devices/0000:fe:02.3
/sys/kernel/iommu_groups/18/devices/0000:fe:02.1
/sys/kernel/iommu_groups/18/devices/0000:fe:02.4
/sys/kernel/iommu_groups/18/devices/0000:fe:02.2
/sys/kernel/iommu_groups/18/devices/0000:fe:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:1d.1
/sys/kernel/iommu_groups/8/devices/0000:00:1d.0
/sys/kernel/iommu_groups/8/devices/0000:00:1d.7
/sys/kernel/iommu_groups/26/devices/0000:ff:04.2
/sys/kernel/iommu_groups/26/devices/0000:ff:04.0
/sys/kernel/iommu_groups/26/devices/0000:ff:04.3
/sys/kernel/iommu_groups/26/devices/0000:ff:04.1
/sys/kernel/iommu_groups/16/devices/0000:08:00.0
/sys/kernel/iommu_groups/16/devices/0000:06:04.0
/sys/kernel/iommu_groups/16/devices/0000:08:00.1
/sys/kernel/iommu_groups/6/devices/0000:00:1a.1
/sys/kernel/iommu_groups/6/devices/0000:00:1a.0
/sys/kernel/iommu_groups/6/devices/0000:00:1a.7
/sys/kernel/iommu_groups/24/devices/0000:ff:02.5
/sys/kernel/iommu_groups/24/devices/0000:ff:02.3
/sys/kernel/iommu_groups/24/devices/0000:ff:02.1
/sys/kernel/iommu_groups/24/devices/0000:ff:02.4
/sys/kernel/iommu_groups/24/devices/0000:ff:02.2
/sys/kernel/iommu_groups/24/devices/0000:ff:02.0
/sys/kernel/iommu_groups/14/devices/0000:05:00.0
/sys/kernel/iommu_groups/4/devices/0000:00:09.0
/sys/kernel/iommu_groups/22/devices/0000:fe:06.3
/sys/kernel/iommu_groups/22/devices/0000:fe:06.1
/sys/kernel/iommu_groups/22/devices/0000:fe:06.2
/sys/kernel/iommu_groups/22/devices/0000:fe:06.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:02:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/20/devices/0000:fe:04.2
/sys/kernel/iommu_groups/20/devices/0000:fe:04.0
/sys/kernel/iommu_groups/20/devices/0000:fe:04.3
/sys/kernel/iommu_groups/20/devices/0000:fe:04.1
/sys/kernel/iommu_groups/10/devices/0000:00:1f.2
/sys/kernel/iommu_groups/10/devices/0000:00:1f.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/19/devices/0000:fe:03.1
/sys/kernel/iommu_groups/19/devices/0000:fe:03.4
/sys/kernel/iommu_groups/19/devices/0000:fe:03.2
/sys/kernel/iommu_groups/19/devices/0000:fe:03.0
/sys/kernel/iommu_groups/9/devices/0000:09:03.0
/sys/kernel/iommu_groups/9/devices/0000:00:1e.0
/sys/kernel/iommu_groups/27/devices/0000:ff:05.3
/sys/kernel/iommu_groups/27/devices/0000:ff:05.1
/sys/kernel/iommu_groups/27/devices/0000:ff:05.2
/sys/kernel/iommu_groups/27/devices/0000:ff:05.0
As shown in the spoiler details above, IOMMU appears to be enabled, the GPU is bound to the vfio-pci driver, and the GPU is isolated in its own IOMMU group (group 13). The dmesg line "DMAR: Intel(R) Virtualization Technology for Directed I/O" also indicates that VT-d is active.
The only hints I see pointing at the cause are "failed to set iommu for container: Operation not permitted" and, in dmesg, "vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform".
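As a sanity check on that remapping message, the VT-d extended capability register is exposed in sysfs, and (if I'm reading the VT-d spec correctly; the bit position is my assumption) interrupt remapping support is bit 3 of that value. A quick sketch, assuming a single DMAR unit named dmar0 as in my dmesg output:

```shell
# Sketch: read the VT-d extended capability register and test bit 3
# (IR, interrupt remapping). Path assumes the unit is named dmar0.
ecap=$(cat /sys/class/iommu/dmar0/intel-iommu/ecap)
if [ $(( (0x$ecap >> 3) & 1 )) -eq 1 ]; then
    echo "ecap $ecap: interrupt remapping advertised"
else
    echo "ecap $ecap: interrupt remapping NOT advertised"
fi
```

For what it's worth, bit 3 is set in the ecap value from my dmesg (f020fe), so the hardware seems to advertise interrupt remapping even though vfio rejects it.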
The easy fix is to enable "allow_unsafe_interrupts", which I believe will work. However, I've read that it can lead to data loss and possible system stability issues. I have gotten passthrough working on this system before, but with different settings. I'm not really sure what the difference is, other than that I followed this guide during this setup attempt: https://forum.proxmox.com/threads/gpu-passthrough-tutorial-reference.34303/
This is my third install of Proxmox (for reasons) attempting this, so I'm getting pretty good at configuring and troubleshooting it. This attempt is the closest I've come to getting GPU passthrough set up just right (I hope). Previous attempts had poor performance in the guest, Code 43 errors, or various other problems. In those attempts I did not set "machine: q35" and did not use "pcie=1" in the VM config file, and I may have been using allow_unsafe_interrupts.
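For reference, the passthrough-related lines in the VM config (/etc/pve/qemu-server/102.conf in my case) for this attempt look roughly like the following. This is a sketch of what the guide produces, not a verbatim copy of my file, and option spellings may vary by Proxmox version:

```
bios: ovmf
machine: q35
hostpci0: 04:00,pcie=1
```

Passing "04:00" without a function number hands over both 04:00.0 (VGA) and 04:00.1 (HDMI audio), which matches the two vfio-pci devices in the failing kvm command line above.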
I'm going to try allow_unsafe_interrupts, but if there is any way to get this working without it, I'm led to believe performance or stability would be better. In other words, I would prefer not to need allow_unsafe_interrupts.
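If I do end up needing it, my understanding is that this is a parameter of the vfio_iommu_type1 module, set persistently through modprobe.d. A minimal sketch (the file name is my choice; the MODPROBE_DIR override exists only so the snippet can be exercised outside the host):

```shell
# Write the vfio_iommu_type1 option to a modprobe.d file so it applies
# at every boot. MODPROBE_DIR defaults to the real location.
conf="${MODPROBE_DIR:-/etc/modprobe.d}/iommu_unsafe_interrupts.conf"
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > "$conf"
grep "allow_unsafe_interrupts" "$conf"   # confirm the line landed
```

followed by update-initramfs -u and a reboot so the option is in effect when the vfio modules load.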
Additional system information:
Code:
pve-manager/5.2-10/6f892b40 (running kernel: 4.15.18-8-pve)
System: Dell R610 Gen II, dual Intel Xeon X5650, 48 GB RAM
GPU: Nvidia GTX 1060 6GB
Host: Proxmox 5.2
Guest: Windows 10
I also have a Windows 7 guest and am testing both, though obviously only one runs at a time with GPU passthrough.
Also, I don't think it's relevant, but the original Dell R610 BIOS (version 3.0.0) produced a Linux error saying VT-d was disabled due to a BIOS bug. Upgrading the BIOS to version 6.6.0 resolved that error.
Hopefully someone has an idea on what to try next. I feel that I am so close - I've spent weeks trying to get this tuned just right. Let me know if there are any other details I can provide which might be helpful.
Related:
https://forum.proxmox.com/threads/problems-with-pcie-passthrough-dell-r510-h310.35811/
https://forums.unraid.net/topic/57218-gpu-for-dell-r710-vm/