GPU passthrough: still problems

Ecoll

Active Member
Jan 31, 2019
Hi everybody

I know there are plenty of posts on this subject, but I can't seem to solve my problem.

My system:
HP DL380 G7
2 x Xeon X5687 CPUs (4 cores / 8 threads each, 3.6 GHz)
48 GB DDR3
AMD FirePro W2100

I am trying to pass the FirePro through to a Windows 10 VM.

Output of qm config ID:
Code:
balloon: 0
bios: ovmf
bootdisk: sata0
cores: 16
cpu: host
description: unused0%3A local-lvm%3Avm-110-disk-0
efidisk0: local-lvm:vm-400-disk-0,size=4M
hostpci0: 08:00.0,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 49152
name: Windows
net0: e1000=96:BC:36:7D:DF:C3,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
sata0: local-lvm:vm-400-disk-1,size=64G
sata1: local-lvm:vm-400-disk-2,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=e6a84121-8202-412c-a01a-d5e0035adf72
sockets: 1
vmgenid: c6451afa-43b5-4b23-8f3b-2b5816f51d9f

I followed this wiki page:
https://pve.proxmox.com/wiki/Pci_passthrough

When I run dmesg | grep 'remapping', I get:
Code:
[    0.388593] DMAR-IR: This system BIOS has enabled interrupt remapping
               on a chipset that contains an erratum making that
               feature unstable.  To maintain system stability
               interrupt remapping is being disabled.  Please
               contact your BIOS vendor for an update

When I run find /sys/kernel/iommu_groups/ -type l, I get:
Code:
/sys/kernel/iommu_groups/17/devices/0000:00:1f.2
/sys/kernel/iommu_groups/17/devices/0000:00:1f.0
/sys/kernel/iommu_groups/7/devices/0000:00:07.0
/sys/kernel/iommu_groups/25/devices/0000:3e:06.3
/sys/kernel/iommu_groups/25/devices/0000:3e:06.1
/sys/kernel/iommu_groups/25/devices/0000:3e:06.2
/sys/kernel/iommu_groups/25/devices/0000:3e:06.0
/sys/kernel/iommu_groups/15/devices/0000:00:1d.3
/sys/kernel/iommu_groups/15/devices/0000:00:1d.1
/sys/kernel/iommu_groups/15/devices/0000:00:1d.2
/sys/kernel/iommu_groups/15/devices/0000:00:1d.0
/sys/kernel/iommu_groups/15/devices/0000:00:1d.7
/sys/kernel/iommu_groups/5/devices/0000:00:05.0
/sys/kernel/iommu_groups/23/devices/0000:3e:04.2
/sys/kernel/iommu_groups/23/devices/0000:3e:04.0
/sys/kernel/iommu_groups/23/devices/0000:3e:04.3
/sys/kernel/iommu_groups/23/devices/0000:3e:04.1
/sys/kernel/iommu_groups/13/devices/0000:00:14.1
/sys/kernel/iommu_groups/13/devices/0000:00:14.2
/sys/kernel/iommu_groups/13/devices/0000:00:14.0
/sys/kernel/iommu_groups/31/devices/0000:3f:06.1
/sys/kernel/iommu_groups/31/devices/0000:3f:06.2
/sys/kernel/iommu_groups/31/devices/0000:3f:06.0
/sys/kernel/iommu_groups/31/devices/0000:3f:06.3
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/21/devices/0000:3e:02.5
/sys/kernel/iommu_groups/21/devices/0000:3e:02.3
/sys/kernel/iommu_groups/21/devices/0000:3e:02.1
/sys/kernel/iommu_groups/21/devices/0000:3e:02.4
/sys/kernel/iommu_groups/21/devices/0000:3e:02.2
/sys/kernel/iommu_groups/21/devices/0000:3e:02.0
/sys/kernel/iommu_groups/11/devices/0000:00:0d.0
/sys/kernel/iommu_groups/11/devices/0000:00:0d.5
/sys/kernel/iommu_groups/11/devices/0000:00:0d.3
/sys/kernel/iommu_groups/11/devices/0000:00:0d.1
/sys/kernel/iommu_groups/11/devices/0000:00:0d.6
/sys/kernel/iommu_groups/11/devices/0000:00:0d.4
/sys/kernel/iommu_groups/11/devices/0000:00:0d.2
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/28/devices/0000:3f:03.4
/sys/kernel/iommu_groups/28/devices/0000:3f:03.2
/sys/kernel/iommu_groups/28/devices/0000:3f:03.0
/sys/kernel/iommu_groups/28/devices/0000:3f:03.1
/sys/kernel/iommu_groups/18/devices/0000:05:00.0
/sys/kernel/iommu_groups/8/devices/0000:00:08.0
/sys/kernel/iommu_groups/26/devices/0000:3f:00.0
/sys/kernel/iommu_groups/26/devices/0000:3f:00.1
/sys/kernel/iommu_groups/16/devices/0000:01:03.0
/sys/kernel/iommu_groups/16/devices/0000:00:1e.0
/sys/kernel/iommu_groups/6/devices/0000:00:06.0
/sys/kernel/iommu_groups/24/devices/0000:3e:05.0
/sys/kernel/iommu_groups/24/devices/0000:3e:05.3
/sys/kernel/iommu_groups/24/devices/0000:3e:05.1
/sys/kernel/iommu_groups/24/devices/0000:3e:05.2
/sys/kernel/iommu_groups/14/devices/0000:03:00.0
/sys/kernel/iommu_groups/14/devices/0000:00:1c.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.2
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:04:00.1
/sys/kernel/iommu_groups/14/devices/0000:03:00.1
/sys/kernel/iommu_groups/14/devices/0000:00:1c.4
/sys/kernel/iommu_groups/14/devices/0000:04:00.0
/sys/kernel/iommu_groups/14/devices/0000:00:1c.2
/sys/kernel/iommu_groups/14/devices/0000:02:00.4
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/22/devices/0000:3e:03.1
/sys/kernel/iommu_groups/22/devices/0000:3e:03.4
/sys/kernel/iommu_groups/22/devices/0000:3e:03.2
/sys/kernel/iommu_groups/22/devices/0000:3e:03.0
/sys/kernel/iommu_groups/12/devices/0000:00:0e.3
/sys/kernel/iommu_groups/12/devices/0000:00:0e.1
/sys/kernel/iommu_groups/12/devices/0000:00:0e.4
/sys/kernel/iommu_groups/12/devices/0000:00:0e.2
/sys/kernel/iommu_groups/12/devices/0000:00:0e.0
/sys/kernel/iommu_groups/30/devices/0000:3f:05.3
/sys/kernel/iommu_groups/30/devices/0000:3f:05.1
/sys/kernel/iommu_groups/30/devices/0000:3f:05.2
/sys/kernel/iommu_groups/30/devices/0000:3f:05.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/20/devices/0000:3e:00.0
/sys/kernel/iommu_groups/20/devices/0000:3e:00.1
/sys/kernel/iommu_groups/10/devices/0000:00:0a.0
/sys/kernel/iommu_groups/29/devices/0000:3f:04.2
/sys/kernel/iommu_groups/29/devices/0000:3f:04.0
/sys/kernel/iommu_groups/29/devices/0000:3f:04.3
/sys/kernel/iommu_groups/29/devices/0000:3f:04.1
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/19/devices/0000:08:00.0 <- firepro W2100
/sys/kernel/iommu_groups/19/devices/0000:08:00.1
/sys/kernel/iommu_groups/9/devices/0000:00:09.0
/sys/kernel/iommu_groups/27/devices/0000:3f:02.5
/sys/kernel/iommu_groups/27/devices/0000:3f:02.3
/sys/kernel/iommu_groups/27/devices/0000:3f:02.1
/sys/kernel/iommu_groups/27/devices/0000:3f:02.4
/sys/kernel/iommu_groups/27/devices/0000:3f:02.2
/sys/kernel/iommu_groups/27/devices/0000:3f:02.0
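The raw find output is hard to scan because the groups come out unsorted. As a rough sketch (the function name list_iommu_groups and its optional root argument are my own, not part of any Proxmox tooling), the same information can be printed one device per line, ordered by group number. This makes it easier to see that group 19 holds only the two functions of the card, 08:00.0 and 08:00.1.

```shell
#!/bin/sh
# Print "group <n>: <pci address>" for every device, sorted by group number.
# $1 is an optional sysfs root (hypothetical parameter, handy for trying the
# function against a copied directory tree instead of the live system).
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<address>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done | LC_ALL=C sort -k2,2n -k3,3
}
```

On the host, calling list_iommu_groups with no argument reads the live /sys/kernel/iommu_groups tree.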

And dmesg | grep -e DMAR -e IOMMU gives:
Code:
[    0.007175] ACPI: DMAR 0x00000000CF62FE80 000168 (v01 HP     ProLiant 00000001 \xd2?   0000162E)
[    0.196656] DMAR: IOMMU enabled
[    0.388593] DMAR-IR: This system BIOS has enabled interrupt remapping
[    1.395177] DMAR: Host address width 39
[    1.395178] DMAR: DRHD base: 0x000000d7ffe000 flags: 0x1
[    1.395201] DMAR: dmar0: reg_base_addr d7ffe000 ver 1:0 cap c90780106f0462 ecap f0207e
[    1.395202] DMAR: RMRR base: 0x000000cf7fc000 end: 0x000000cf7fdfff
[    1.395203] DMAR: RMRR base: 0x000000cf7f5000 end: 0x000000cf7fafff
[    1.395204] DMAR: RMRR base: 0x000000cf63e000 end: 0x000000cf63ffff
[    1.395205] DMAR: ATSR flags: 0x0
[    1.395498] DMAR: dmar0: Using Queued invalidation
[    1.408137] DMAR: Intel(R) Virtualization Technology for Directed I/O
[   10.116356] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[   10.116357] AMD-Vi: AMD IOMMUv2 functionality not available on this system

Now I start the VM with the command printed by qm showcmd ID:
Code:
/usr/bin/kvm -id 400 -name Windows -chardev 'socket,id=qmp,path=/var/run/qemu-server/400.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/400.pid -daemonize -smbios 'type=1,uuid=e6a84121-8202-412c-a01a-d5e0035adf72' -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/pve/vm-400-disk-0' -smp '16,sockets=1,cores=16,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/400.vnc,password -no-hpet -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt' -m 49152 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=c6451afa-43b5-4b23-8f3b-2b5816f51d9f' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:08:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'VGA,id=vga,bus=pcie.0,addr=0x1' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:685c1e4fedc9' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=/dev/pve/vm-400-disk-1,if=none,id=drive-sata0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' -drive 'file=/dev/pve/vm-400-disk-2,if=none,id=drive-sata1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ahci0.1,drive=drive-sata1,id=sata1' -netdev 'type=tap,id=net0,ifname=tap400i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 
'e1000,mac=96:BC:36:7D:DF:C3,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -machine 'type=q35+pve0' -global 'kvm-pit.lost_tick_policy=discard'

and the result is:
Code:
kvm: -device vfio-pci,host=0000:08:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:08:00.0: failed to open /dev/vfio/19: No such file or directory

Can somebody help me?
 
lspci -v -s 08:00.0
Code:
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland GL [FirePro W2100] (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Oland GL [FirePro W2100]
        Physical Slot: 1
        Flags: fast devsel, IRQ 10
        Memory at e0000000 (64-bit, prefetchable) [disabled] [size=256M]
        Memory at fbfc0000 (64-bit, non-prefetchable) [disabled] [size=256K]
        I/O ports at 5000 [disabled] [size=256]
        [virtual] Expansion ROM at fbf00000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Kernel modules: radeon, amdgpu
 
Hello,

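The radeon driver is loaded for your GPU, so I think vfio-pci cannot bind to it.

In the guide, did you follow this section, using your own card's vendor:device IDs instead of the NVIDIA example IDs (10de:...) shown here?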
Code:
echo "options vfio-pci ids=10de:1381,10de:0fbc" > /etc/modprobe.d/vfio.conf

And have you blacklisted the radeon driver?
Code:
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf

If so, try adding these options to the kernel command line in GRUB:
Code:
 video=efifb:off video=vesafb:off
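Note that the IDs in the vfio.conf example above (10de:...) belong to an NVIDIA card, so they will not match a FirePro. A minimal sketch for building the options line from the card's own IDs follows. It assumes the usual lspci -n layout, where the third field of each line is the vendor:device pair, and the helper name vfio_ids_line is made up for this example.

```shell
#!/bin/sh
# Read `lspci -n` lines on stdin and print one vfio-pci options line that
# lists every vendor:device pair found (assumed to be the third field).
vfio_ids_line() {
    awk '{ ids = ids (ids ? "," : "") $3 }
         END { if (ids) print "options vfio-pci ids=" ids }'
}
```

Run as root on the host, something like lspci -n -s 08:00 | vfio_ids_line > /etc/modprobe.d/vfio.conf would cover both functions of the card (08:00.0 and 08:00.1). Check the printed line before rebooting.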
 
Also, please try with UEFI. Most cards work with UEFI, but not with legacy BIOS.

bios: ovmf
I have tried both solutions.

Thank you.
I have checked the whole config and added the GRUB options.
After a reboot, lspci -v -s 08:00.0 shows:

Code:
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland GL [FirePro W2100] (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company Oland GL [FirePro W2100]
        Physical Slot: 1
        Flags: fast devsel, IRQ 10
        Memory at e0000000 (64-bit, prefetchable) [disabled] [size=256M]
        Memory at fbfc0000 (64-bit, non-prefetchable) [disabled] [size=256K]
        I/O ports at 5000 [disabled] [size=256]
        [virtual] Expansion ROM at fbf00000 [disabled] [size=128K]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [200] #15
        Capabilities: [270] #19
        Kernel driver in use: vfio-pci
        Kernel modules: radeon, amdgpu
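The "Kernel driver in use: vfio-pci" line is the part that matters here. As a small sketch, the same check can be done straight from sysfs, where each PCI device directory has a driver symlink while a driver is bound. The function name bound_driver and its optional second argument are illustrative only.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI device, or "none".
# $1 = full PCI address (for example 0000:08:00.0)
# $2 = optional sysfs device root (hypothetical parameter for dry runs)
bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Both functions of the card should be checked (0000:08:00.0 and 0000:08:00.1), since they sit in the same IOMMU group.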

But now I get new messages from
dmesg | grep -e DMAR -e IOMMU
Code:
[    0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[    0.006584] ACPI: DMAR 0x00000000CF62FE80 000168 (v01 HP     ProLiant 00000001 \xd2?   0000162E)
[    0.196639] DMAR: IOMMU enabled
[    0.386734] DMAR-IR: This system BIOS has enabled interrupt remapping
[    1.353991] DMAR: Host address width 39
[    1.353993] DMAR: DRHD base: 0x000000d7ffe000 flags: 0x1
[    1.354008] DMAR: dmar0: reg_base_addr d7ffe000 ver 1:0 cap c90780106f0462 ecap f0207e
[    1.354009] DMAR: RMRR base: 0x000000cf7fc000 end: 0x000000cf7fdfff
[    1.354009] DMAR: RMRR base: 0x000000cf7f5000 end: 0x000000cf7fafff
[    1.354010] DMAR: RMRR base: 0x000000cf63e000 end: 0x000000cf63ffff
[    1.354011] DMAR: ATSR flags: 0x0
[    1.354289] DMAR: dmar0: Using Queued invalidation
[    1.368037] DMAR: Intel(R) Virtualization Technology for Directed I/O
[   10.087329] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[   10.087331] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[ 3058.155053] vfio-pci 0000:08:00.1: DMAR: Device was ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
[ 3058.155065] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[ 3083.175705] vfio-pci 0000:08:00.1: DMAR: Device was ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
[ 3083.175718] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[ 3290.597677] vfio-pci 0000:08:00.1: DMAR: Device was ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
[ 3290.597690] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[ 3301.877433] vfio-pci 0000:08:00.1: DMAR: Device was ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
[ 3301.877445] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
[ 3320.508995] vfio-pci 0000:08:00.1: DMAR: Device was ineligible for IOMMU domain attach due to platform RMRR requirement. Patch is in effect.
[ 3320.509008] vfio_iommu_type1_attach_group: No interrupt remapping support.  Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
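The vfio_iommu_type1_attach_group lines name the module parameter themselves. A hedged sketch of setting it persistently is below. The file name iommu_unsafe_interrupts.conf and the helper name are arbitrary choices, and the optional directory argument exists only so the function can be tried outside /etc. As the parameter name suggests, this gives up the protection that interrupt remapping would provide, so it is a trade-off and not a clean fix.

```shell
#!/bin/sh
# Write the modprobe option named in the kernel message.
# $1 = optional target directory (defaults to /etc/modprobe.d).
# The file name is arbitrary; modprobe reads any *.conf file in that directory.
write_unsafe_interrupts_conf() {
    dir="${1:-/etc/modprobe.d}"
    echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" \
        > "$dir/iommu_unsafe_interrupts.conf"
}
```

It needs to run as root, and the option only takes effect once the vfio_iommu_type1 module is loaded again, for example after a reboot.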

I forgot to mention that I also applied this patch:
https://forum.proxmox.com/threads/c...ntel-iommu-driver-to-remove-rmrr-check.36374/
Be careful: in the lspci output for your GPU I can see
Code:
       Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+

That could lead to a Code 43 error in the guest.
Check other posts on how to enable MSI in a Windows guest VM.
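To watch that flag without reading the whole capability list, here is a small sketch that pulls the MSI state out of lspci -v output (the function name msi_state is made up for this example). One caveat, as far as I understand it: on the host the flag reflects whatever driver currently owns the device, so it is probably most meaningful while the VM is running.

```shell
#!/bin/sh
# Read `lspci -v` output on stdin and report the state of the MSI capability.
# MSI-X lines are not matched, because the pattern requires "MSI: Enable".
msi_state() {
    awk '/Capabilities: .*MSI: Enable[+-]/ {
             print (index($0, "MSI: Enable+") ? "enabled" : "disabled")
             found = 1
             exit
         }
         END { if (!found) print "no MSI capability" }'
}
```

Usage on the host would be lspci -v -s 08:00.0 | msi_state, run as root so that the capability list is readable.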