IOMMU GPU Passthrough: KVM segfault

margau

Dec 17, 2020
Hello everyone,

TLDR:
kvm segfault when starting VM with PCIe-GPU.

I'm currently trying to pass through a GeForce GTX 1050 Ti via IOMMU, using an E3-1240 V2 on a Supermicro X9-SCA F board.
The IOMMU appears to be working, as far as I can tell from dmesg:
Code:
[    0.069454] DMAR: IOMMU enabled
[    0.133808] DMAR: Host address width 36
[    0.133809] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.133812] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f010da
[    0.133813] DMAR: RMRR base: 0x000000cdd4d000 end: 0x000000cdd69fff
[    0.133814] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed90000 IOMMU 0
[    0.133815] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[    0.133816] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.134038] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.692264] DMAR: No ATSR found
[    0.692301] DMAR: dmar0: Using Queued invalidation
[    0.692634] DMAR: Intel(R) Virtualization Technology for Directed I/O

The card is mapped to the correct driver:
Code:
lspci -nnk
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
    Subsystem: ASUSTeK Computer Inc. GP107 [GeForce GTX 1050 Ti] [1043:862a]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
    Subsystem: ASUSTeK Computer Inc. GP107GL High Definition Audio Controller [1043:862a]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
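
For anyone who wants to check the driver binding without reading the lspci output by eye, here is a minimal sketch that reads the `driver` symlink from sysfs. The function name `bound_driver` and the `sysfs_root` parameter are made up for illustration; the only assumption about the system is the usual `/sys/bus/pci/devices/<address>/driver` layout.

```python
import os

def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    bdf is the full PCI address, e.g. "0000:01:00.0".
    sysfs_root is a hypothetical parameter so the function can be tested
    against a fake directory tree.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", bdf, "driver")
    if not os.path.islink(link):
        return None  # device missing or no driver bound
    # The symlink points at .../bus/pci/drivers/<driver-name>
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    # Both functions of the GPU from the lspci output above
    for bdf in ("0000:01:00.0", "0000:01:00.1"):
        print(bdf, bound_driver(bdf))
```

On a host set up like the one in this post, both functions should report `vfio-pci`.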

The card sits in its own IOMMU group, which I achieved by adding "intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction" to the kernel cmdline. Without these options, the card is placed in a group together with some CPU components.
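
The grouping can be verified by walking `/sys/kernel/iommu_groups`. The following is only a sketch of that check; `iommu_groups` and its `sysfs_root` parameter are hypothetical names, and it assumes the standard sysfs layout of one numbered directory per group, each with a `devices` subdirectory.

```python
import os

def iommu_groups(sysfs_root="/sys"):
    """Map each IOMMU group number to the sorted list of PCI addresses in it."""
    base = os.path.join(sysfs_root, "kernel", "iommu_groups")
    groups = {}
    if not os.path.isdir(base):
        return groups  # IOMMU disabled or sysfs layout not available
    for name in os.listdir(base):
        devices_dir = os.path.join(base, name, "devices")
        if name.isdigit() and os.path.isdir(devices_dir):
            groups[int(name)] = sorted(os.listdir(devices_dir))
    return groups

if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        print(group, " ".join(devices))
```

For passthrough, the group containing 0000:01:00.0 should hold only the GPU and its audio function (0000:01:00.1).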

The following VM config is used (extract):
Code:
cpu: host,hidden=1,flags=+pcid
hostpci0: 01:00,pcie=1
machine: q35
numa: 0
vga: none

So far, so good. When I'm starting the VM, I get the following errors in the kernel log:
Code:
[   66.844527] vfio-pci 0000:01:00.0: enabling device (0000 -> 0001)
[   66.844780] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[   66.845516] kvm[6449]: segfault at a8 ip 000055ec62bb8e27 sp 00007ffc6f417e20 error 4 in qemu-system-x86_64[55ec62ab8000+4af000]
[   66.845521] Code: 0f 1f 00 55 53 48 89 fb 48 83 ec 08 48 8b 6f 58 e8 1e b0 f0 ff 48 8b 7b 40 83 05 47 5c 89 00 01 48 85 ff 74 05 e8 79 51 27 00 <48> 8b 85 a8 00 00 00 48 85 c0 74 29 8b 93 a0 00 00 00 39 90 a0 00
Obviously, the start does not succeed.
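
To make the segfault line easier to read, here is a small sketch that splits it into its fields. The helper `decode_segfault` is hypothetical; the error-code bits follow the x86 page-fault convention (bit 0: page present, bit 1: write, bit 2: user mode). The computed offset is relative to the mapping the kernel prints in brackets, so turning it into a function name would still need matching debug symbols for the QEMU binary.

```python
import re

# Matches the x86 kernel message "segfault at ADDR ip IP sp SP error ERR in OBJ[BASE+SIZE]"
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Instruction pointer relative to the start of the reported mapping
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }

if __name__ == "__main__":
    line = ("kvm[6449]: segfault at a8 ip 000055ec62bb8e27 sp 00007ffc6f417e20 "
            "error 4 in qemu-system-x86_64[55ec62ab8000+4af000]")
    info = decode_segfault(line)
    print(hex(info["fault_address"]), hex(info["offset"]), info)
```

For the line above, error 4 means a user-mode read of a page that is not present, and the fault address 0xa8 is very close to zero. The marked bytes in the Code line (48 8b 85 a8 00 00 00) look like a load from rbp+0xa8, so this is most likely a NULL pointer dereference inside QEMU rather than a hardware fault.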

Of course, I'm using the latest available Proxmox kernel (5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100)).

Does anyone have an idea where the problem might be?

Thanks!
Best regards
margau
 
Hi,
of course, the drivers are blacklisted (as you can see in the "lspci -nnk" output, vfio-pci is in use for the card).
I also tried the options from the thread above, without success: still the same segfault.

Thanks!
margau
 
Yes, I know the wiki article and have worked through it.
I have also downgraded the kernel and KVM:
Code:
uname -a
Linux x 5.0.12-1-pve

Code:
kvm -version
QEMU emulator version 4.0.0 (pve-qemu-kvm_4.0.0)

But I'm still getting the same kind of segfault: segfault at a8 ip 0000557a331ab3a7 sp 00007fffa910bd80 error 4 in qemu-system-x86_64[557a33144000+486000].

Best regards
margau
 
hostpci0: 01:00,pcie=1
machine: q35
vga: none
This is probably not related to the segfault, but you could set x-vga=on on the hostpci0 line, so that the vga parameter is ignored, and also add rombar=off.
 
Hi,
an update:
I tried it with SeaBIOS instead of OVMF, and changed the cmdline to contain only pcie_acs_override=downstream. No success yet, I'm still stuck on the segfault in KVM.

Best regards
margau