GPU PCIe Passthrough stopped working after update

Feb 21, 2019
I have been using PCIe passthrough with a Windows 10 guest for an NVIDIA GeForce GTX 1650 GPU without any issues until yesterday. After updating to pve-manager/6.2-4/9824574a, I am suddenly getting the following error when trying to start the VM:

Code:
kvm: -device vfio-pci,host=0000:65:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-vga=on,multifunction=on: vfio 0000:65:00.0: failed getting region info for VGA region index 8: Invalid argument
device does not support requested feature x-vga

I went back through the wiki and checked that all of my grub/modules/modprobe.d settings are still intact. My VM conf file hasn't changed:

Code:
agent: 1
bootdisk: sata0
cores: 10
cpu: host
hostpci0: 65:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: Windows10
net0: e1000=CE:B6:D8:EC:F1:E5,bridge=vmbr0
net1: virtio=A6:FA:FB:8E:7F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
sata0: tank:vm-107-disk-0,size=150G,ssd=1
sata1: tank:vm-107-disk-1,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=17efdf00-377c-4ac4-b337-3ff7c4fe1d79
sockets: 1
usb0: host=1-3.1.2.4,usb3=1
vmgenid: 74cbd8bf-841a-4b27-b2ca-07cecfaffd53
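
Since nothing in the VM config changed, the host-side settings are the next thing to compare. A minimal sketch for dumping the settings the wiki steps touch (kernel command line, /etc/modules, /etc/modprobe.d) so a known-good state can be diffed against the state after an upgrade. The function name and its optional root-directory argument are hypothetical; the argument only exists so the sketch can be pointed at a copy of the files.

```shell
# Hypothetical helper: print the host settings GPU passthrough depends on.
# ROOT defaults to "" (the live system); pass a directory to inspect a copy.
dump_passthrough_settings() {
    local root="${1:-}" f
    echo "== kernel command line =="
    cat "$root/proc/cmdline" 2>/dev/null
    echo "== /etc/modules =="
    # Skip comments and blank lines
    grep -Ev '^[[:space:]]*(#|$)' "$root/etc/modules" 2>/dev/null
    echo "== /etc/modprobe.d =="
    for f in "$root"/etc/modprobe.d/*.conf; do
        [ -e "$f" ] || continue          # glob matched nothing
        echo "-- ${f#"$root"}"           # show the path without the root prefix
        grep -Ev '^[[:space:]]*(#|$)' "$f"
    done
}
```

For example, run `dump_passthrough_settings > /root/passthrough-$(uname -r).txt` after each upgrade and `diff` two of those files.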

The relevant output of lspci -nnk:
Code:
65:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU107 [10de:1f82] (rev a1)
    Subsystem: ZOTAC International (MCO) Ltd. TU107 [19da:1546]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
65:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
    Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1546]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
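
Both functions of the card (65:00.0 and 65:00.1) are bound to vfio-pci, and the dmesg output later in the thread puts them in IOMMU group 32. A sketch for listing a device's IOMMU group and the driver bound to each member straight from sysfs, assuming the standard /sys/bus/pci/devices/<addr>/iommu_group layout; `iommu_group_of` is a hypothetical helper name.

```shell
# Hypothetical helper: list every device in the same IOMMU group as a PCI
# address, with the driver currently bound to it. SYSFS defaults to /sys;
# it is a parameter only so the function can be tried on a copy of the tree.
iommu_group_of() {
    local addr="$1" sysfs="${2:-/sys}" grp dev drv
    # <device>/iommu_group is a symlink to /sys/kernel/iommu_groups/<N>
    grp=$(readlink "$sysfs/bus/pci/devices/$addr/iommu_group") || return 1
    grp=${grp##*/}
    echo "IOMMU group $grp:"
    for dev in "$sysfs/kernel/iommu_groups/$grp/devices/"*; do
        [ -e "$dev" ] || continue
        drv="(none)"
        # The driver symlink only exists while a driver is bound
        if [ -L "$dev/driver" ]; then
            drv=$(readlink "$dev/driver"); drv=${drv##*/}
        fi
        echo "  ${dev##*/} driver=$drv"
    done
}
```

On this host, `iommu_group_of 0000:65:00.0` should list both 65:00.0 and 65:00.1 with driver=vfio-pci.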


I've played with other hostpci0 options without any luck. I can deselect Primary GPU and get the VM to boot, but the guest then fails to allocate resources for the GPU with the following error (from the Windows guest):
Code:
This device cannot find enough free resources that it can use. (Code 12)
 
Try this:
Code:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
update-initramfs -u

Then edit the VM config file:
change: cpu: host,hidden=1,flags=+pcid
add: args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
 
Done, rebooted host - same result, same error message.

Code:
> cat /etc/modprobe.d/blacklist.conf
blacklist nvidia
blacklist nvidiafb
blacklist nouveau

Code:
> cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

Code:
> cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
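
Options in /etc/modprobe.d only take effect when the module is loaded; for anything loaded from the initramfs, that means after update-initramfs -u and a reboot. A sketch to confirm from the running kernel that the options are active and that the blacklisted GPU drivers stayed unloaded. `check_module_opts` and its sysfs/proc arguments are hypothetical, the arguments only there for testing.

```shell
# Hypothetical helper: show the live values of the module options set above
# and warn if a blacklisted GPU driver got loaded anyway.
check_module_opts() {
    local sysfs="${1:-/sys}" proc="${2:-/proc}" p m
    for p in kvm/parameters/ignore_msrs \
             vfio_iommu_type1/parameters/allow_unsafe_interrupts; do
        if [ -r "$sysfs/module/$p" ]; then
            echo "$p = $(cat "$sysfs/module/$p")"
        else
            echo "$p = (module not loaded)"
        fi
    done
    # /proc/modules lines start with "<name> <size> ..."
    for m in nouveau nvidiafb nvidia; do
        if grep -q "^$m " "$proc/modules" 2>/dev/null; then
            echo "WARNING: $m is loaded"
        fi
    done
}
```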

Code:
> cat /etc/pve/qemu-server/107.conf
agent: 1
bootdisk: sata0
cores: 10
cpu: host,hidden=1,flags=+pcid
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
hostpci0: 65:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: Windows10
net0: e1000=CE:B6:D8:EC:F1:E5,bridge=vmbr0
net1: virtio=A6:FA:FB:8E:7F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
sata0: tank:vm-107-disk-0,size=150G,ssd=1
sata1: tank:vm-107-disk-1,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=17efdf00-377c-4ac4-b337-3ff7c4fe1d79
sockets: 1
usb0: host=1-3.1.2.4,usb3=1
vmgenid: 74cbd8bf-841a-4b27-b2ca-07cecfaffd53
 
Did you update the GPU drivers too? I read that the newest ones block passthrough.

You should put this in your VM config: hostpci0: 65:00,pcie=1,x-vga=1,romfile=yourbiosname.rom
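
One common way to get a ROM image for romfile= is to read it from the card through sysfs. A minimal sketch, assuming it runs as root on the host while the VM is stopped; `dump_vbios` and the output name `vbios.bin` are placeholders. Proxmox looks up romfile= images in /usr/share/kvm/. A ROM read from a card that also drives the host console (efifb is attached to this card in the dmesg output) may not be a clean copy, so an image obtained another way could still be needed.

```shell
# Hypothetical helper (assumption: root, VM stopped). Copies the option ROM
# exposed by sysfs into /usr/share/kvm/ where romfile= images are looked up.
dump_vbios() {
    local dev="${1:-/sys/bus/pci/devices/0000:65:00.0}"
    local out="${2:-/usr/share/kvm/vbios.bin}"
    local rc
    [ -e "$dev/rom" ] || { echo "no rom attribute under $dev" >&2; return 1; }
    echo 1 > "$dev/rom"      # enable reading the ROM through sysfs
    cat "$dev/rom" > "$out"
    rc=$?
    echo 0 > "$dev/rom"      # disable it again
    [ "$rc" -eq 0 ] && [ -s "$out" ]
}
```

The VM config would then reference it as `hostpci0: 65:00,pcie=1,x-vga=1,romfile=vbios.bin`.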
 
> Did you update the GPU drivers too? I read that the newest ones block passthrough.
To be honest, I didn't notice whether the NVIDIA drivers got updated when I ran apt dist-upgrade, but the driver is blacklisted, so it shouldn't matter.

> You should put this in your VM config: hostpci0: 65:00,pcie=1,x-vga=1,romfile=yourbiosname.rom
The setup worked without that before, although in my experiments since then I have tried it as well, with the same result.
 
Code:
> dmesg | grep -e BAR -e iommu -e IOMMU -e vfio -e "[b|B]"u -e DMAR
[    0.000000] Linux version 5.4.41-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.41-1 (Fri, 15 May 2020 15:06:08 +0200) ()
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.41-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
[    0.000000] [Firmware Bug]: TSC ADJUST: CPU0: -463055971124404 force to 0
[    0.010397] ACPI: DMAR 0x000000003BF59C30 0000E8 (v01 ALASKA A M I    00000001 INTL 20091013)
[    0.128966] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.128967] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.129493] Built 1 zonelists, mobility grouping on.  Total pages: 16439979
[    0.129495] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.41-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
[    0.129586] DMAR: IOMMU enabled
[    0.270626] DMAR: Host address width 46
[    0.270627] DMAR: DRHD base: 0x000000b5ffc000 flags: 0x0
[    0.270631] DMAR: dmar0: reg_base_addr b5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270632] DMAR: DRHD base: 0x000000d8ffc000 flags: 0x0
[    0.270635] DMAR: dmar1: reg_base_addr d8ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270636] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    0.270638] DMAR: dmar2: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270639] DMAR: DRHD base: 0x00000092ffc000 flags: 0x1
[    0.270641] DMAR: dmar3: reg_base_addr 92ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270642] DMAR: RMRR base: 0x0000003bf8d000 end: 0x0000003bf8ffff
[    0.270643] DMAR: ATSR flags: 0x0
[    0.270645] DMAR-IR: IOAPIC id 12 under DRHD base  0xfbffc000 IOMMU 2
[    0.270645] DMAR-IR: IOAPIC id 11 under DRHD base  0xd8ffc000 IOMMU 1
[    0.270646] DMAR-IR: IOAPIC id 10 under DRHD base  0xb5ffc000 IOMMU 0
[    0.270646] DMAR-IR: IOAPIC id 8 under DRHD base  0x92ffc000 IOMMU 3
[    0.270647] DMAR-IR: IOAPIC id 9 under DRHD base  0x92ffc000 IOMMU 3
[    0.270647] DMAR-IR: HPET id 0 under DRHD base 0x92ffc000
[    0.270648] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[    0.270648] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[    0.271451] DMAR-IR: Enabled IRQ remapping in xapic mode
[    0.292732] TAA: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.292732] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.000564] [Firmware Bug]: TSC ADJUST differs within socket(s), fixing all errors
[    0.308051] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.308051] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[    0.328428] EISA bus registered
[    0.330706] ACPI: bus type PCI registered
[    0.330740] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x60000000-0x6fffffff] (base 0x60000000)
[    0.417246] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.440337] ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-15])
[    0.441265] PCI host bridge to bus 0000:00
[    0.441270] pci_bus 0000:00: root bus resource [bus 00-15]
[    0.446696] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    0.447047] pci 0000:00:1c.2: PCI bridge to [bus 02]
[    0.447321] pci 0000:00:1d.0: PCI bridge to [bus 03]
[    0.447337] pci_bus 0000:00: on NUMA node 0
[    0.448055] ACPI: PCI Root Bridge [PC01] (domain 0000 [bus 16-63])
[    0.448870] PCI host bridge to bus 0000:16
[    0.462694] pci 0000:16:00.0: PCI bridge to [bus 17]
[    0.462713] pci_bus 0000:16: on NUMA node 0
[    0.462870] ACPI: PCI Root Bridge [PC02] (domain 0000 [bus 64-b1])
[    0.463642] PCI host bridge to bus 0000:64
[    0.465532] pci 0000:65:00.0: BAR 3: assigned to efifb
[    0.474689] pci 0000:64:00.0: PCI bridge to [bus 65]
[    0.474699] pci_bus 0000:64: on NUMA node 0
[    0.474791] ACPI: PCI Root Bridge [PC03] (domain 0000 [bus b2-ff])
[    0.475579] PCI host bridge to bus 0000:b2
[    0.475580] pci_bus 0000:b2: root bus resource [io  0xc000-0xffff window]
[    0.475581] pci_bus 0000:b2: root bus resource [mem 0xd9000000-0xfbffffff window]
[    0.475582] pci_bus 0000:b2: root bus resource [mem 0x3800c0000000-0x3800ffffffff window]
[    0.475583] pci_bus 0000:b2: root bus resource [bus b2-ff]
[    0.476140] pci_bus 0000:b2: on NUMA node 0
[    0.476774] iommu: Default domain type: Translated
[    0.476774] ACPI: bus type USB registered
[    0.481764] e820: reserve RAM buffer [mem 0x36590000-0x37ffffff]
[    0.481765] e820: reserve RAM buffer [mem 0x38b0b000-0x3bffffff]
[    0.481766] e820: reserve RAM buffer [mem 0x39ee2000-0x3bffffff]
[    0.481766] e820: reserve RAM buffer [mem 0x3b1f6000-0x3bffffff]
[    0.499853] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    0.499861] pci 0000:00:1c.2: PCI bridge to [bus 02]
[    0.499867] pci 0000:00:1d.0: PCI bridge to [bus 03]
...
[    0.500021] pci_bus 0000:65: resource 0 [io  0xb000-0xbfff]
[    0.500021] pci_bus 0000:65: resource 1 [mem 0xd7000000-0xd80fffff]
[    0.500022] pci_bus 0000:65: resource 2 [mem 0xc0000000-0xd1ffffff 64bit pref]
...
[    0.967421] DMAR: dmar1: Using Queued invalidation
[    0.967423] DMAR: dmar0: Using Queued invalidation
[    0.967424] DMAR: dmar3: Using Queued invalidation
[    0.967689] pci 0000:00:00.0: Adding to iommu group 0
...
[    0.974003] pci 0000:65:00.0: Adding to iommu group 32
[    0.974041] pci 0000:65:00.1: Adding to iommu group 32
...
[    0.974831] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    0.978954] workingset: timestamp_bits=36 max_order=24 bucket_order=0
[    0.979860] zbud: loaded
[    0.986480] efifb: framebuffer at 0xd1000000, using 3072k, total 3072k
[    0.986509] fb0: EFI VGA frame buffer device
[    0.987133] input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input0
[    0.987154] ACPI: Sleep Button [SLPB]
[    0.987193] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
[    0.987204] ACPI: Power Button [PWRB]
[    0.987222] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[    0.987245] ACPI: Power Button [PWRF]
[    0.992448] libphy: Fixed MDIO Bus: probed
[    0.993169] platform eisa.0: Probing EISA bus 0
[    1.022423] zswap: loaded using pool lzo/zbud
[    1.032700] evm: Initialising EVM extended attributes:
[    1.337198] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    1.337232] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[    1.338245] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[    1.340650] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[    1.784337] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 3
[    1.840205] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 4
[    2.204794] Console: switching to colour frame buffer device 128x48
[    3.720258] vfio-pci 0000:65:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    3.738794] vfio_pci: add [10de:1f82[ffffffff:ffffffff]] class 0x000000/00000000
[    3.758753] vfio_pci: add [10de:10fa[ffffffff:ffffffff]] class 0x000000/00000000
[    3.775437] Disabling lock debugging due to kernel taint
[   17.983477] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[   21.946913] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[   21.946930] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[   61.344254] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[   61.344271] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[48414.287166] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[48414.287183] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
 
Thanks, interesting. A few assumptions and observations.

  • Based on the output, I'd suggest looking for a BIOS update and installing it; I'm referring to the bug notifications about TSC, TAA, MDS and others.
  • Because [ 0.476774] iommu: Default domain type: Translated is present, I assume your system is not properly configured just yet.
  • Configure vfio-pci with the IDs of the GPU and its audio function.
  • If you are using cpu: host, it may not be necessary to set any options for KVM at all.
  • Consider adding the following boot options to /etc/kernel/cmdline, putting them at the start of the file:
iommu=pt intel_iommu=on video=efifb:off
Next, run pve-efiboot-tool refresh
and reboot.
Afterwards, running dmesg | grep -e BAR -e iommu -e IOMMU -e vfio -e "[b|B]"ug -e DMAR
should at some point show iommu: Default domain type: Passthrough (set via kernel command line)
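
A sketch of the cmdline step, assuming the host boots via systemd-boot with /etc/kernel/cmdline (the pve-efiboot-tool setup). The command line in the dmesg output above starts with BOOT_IMAGE=, which suggests GRUB instead; on a GRUB host the equivalent is adding the options to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running update-grub. `prepend_cmdline_opts` and `iommu_domain_type` are hypothetical helpers.

```shell
# Hypothetical helper: prepend options to a one-line kernel command line file
# (e.g. /etc/kernel/cmdline), skipping any option that is already present.
prepend_cmdline_opts() {
    local file="$1"; shift
    local line prefix="" opt
    line=$(cat "$file") || return 1
    for opt in "$@"; do
        case " $line " in
            *" $opt "*) continue ;;     # already there, leave it where it is
        esac
        prefix="$prefix$opt "
    done
    printf '%s%s\n' "$prefix" "$line" > "$file"
}

# Hypothetical helper: report the IOMMU default domain type from dmesg text
# on stdin (last match wins).
iommu_domain_type() {
    sed -n 's/.*iommu: Default domain type: \([A-Za-z]*\).*/\1/p' | tail -n 1
}
```

Usage: `prepend_cmdline_opts /etc/kernel/cmdline iommu=pt intel_iommu=on video=efifb:off && pve-efiboot-tool refresh`, then after the reboot `dmesg | iommu_domain_type`.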
 
> Because [ 0.476774] iommu: Default domain type: Translated is present, I assume your system is not properly configured just yet.
> Configure vfio-pci with the IDs of the GPU and its audio function.
Thanks for the additional suggestions. Could you expand on those two points a bit more? I'm not clear on what action you have in mind.
 

Eh, just read what I wrote; it's all there. iommu: Default domain type: Translated must read iommu: Default domain type: Passthrough for it to work, at least IMHO.

and something like
options vfio-pci ids=10de:1f82,10de:10fa
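
A small sketch that derives that ids= line from `lspci -nn` output for every function of the card, so the vendor:device pairs are not typed by hand. `vfio_ids_line` is a hypothetical name; it reads only the first line of each function, so indented Subsystem lines (as in -nnk output) are skipped.

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print a
# vfio-pci ids= option built from each function's [vendor:device] pair.
vfio_ids_line() {
    local ids
    # Only unindented device lines; greedy .* picks the last [xxxx:xxxx] group
    ids=$(sed -n '/^[^[:space:]]/s/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
          | paste -s -d, -)
    [ -n "$ids" ] || return 1
    echo "options vfio-pci ids=$ids"
}
```

For this card, `lspci -nn -s 65:00 | vfio_ids_line` should print the same ids as above, which can then be compared with what is already in /etc/modprobe.d.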
 
Yeah, this was already part of my configuration; like I said, the config was working fine for the last two months:

Code:
> cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1f82,10de:10fa disable_vga=1
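
One detail that may be worth ruling out: disable_vga=1 tells vfio-pci not to expose the legacy VGA region, and x-vga=on needs exactly that region (the error in the first post mentions VGA region index 8). If vfio-pci is loaded from the initramfs, a modprobe.d change only takes effect once the initramfs is rebuilt, which a kernel update does automatically, so an option added earlier could have become active only with this update. That is an assumption, not something confirmed in this thread. A sketch for reading the live value; `check_vfio_vga` is a hypothetical helper.

```shell
# Hypothetical helper: report whether the running vfio-pci has VGA access
# disabled. SYSFS defaults to /sys; it is a parameter only for testing.
check_vfio_vga() {
    local sysfs="${1:-/sys}" p
    p="$sysfs/module/vfio_pci/parameters/disable_vga"
    if [ ! -r "$p" ]; then
        echo "disable_vga not present (vfio_pci not loaded or built without VGA support)"
        return 1
    fi
    case "$(cat "$p")" in
        Y|1) echo "disable_vga=Y: vfio-pci will not expose the VGA region x-vga=on needs"
             return 1 ;;
        *)   echo "disable_vga=N"
             return 0 ;;
    esac
}
```

If it reports disable_vga=Y, removing disable_vga=1 from /etc/modprobe.d/vfio.conf, running update-initramfs -u and rebooting would be the obvious thing to try; `lsinitramfs /boot/initrd.img-$(uname -r) | grep modprobe` should show which modprobe.d files the current initramfs contains.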
 
I looked at some older kernel logs, and "iommu: Default domain type: Translated" was already there back when everything was working fine. It's really frustrating that things suddenly broke without any configuration change :(
 
Btw, while adding iommu=pt to the kernel command line does yield
Code:
[    0.476514] iommu: Default domain type: Passthrough (set via kernel command line)
in dmesg, the KVM error still remains the same:
Code:
kvm: -device vfio-pci,host=0000:65:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-vga=on,multifunction=on,romfile=/usr/share/kvm/vbios.bin: vfio 0000:65:00.0: failed getting region info for VGA region index 8: Invalid argument
 

Ah! I completely missed that error message.

Code:
> cat /etc/pve/qemu-server/107.conf
agent: 1
bootdisk: sata0
cores: 10
cpu: host,hidden=1,flags=+pcid
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
hostpci0: 65:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: Windows10
net0: e1000=CE:B6:D8:EC:F1:E5,bridge=vmbr0
net1: virtio=A6:FA:FB:8E:7F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
sata0: tank:vm-107-disk-0,size=150G,ssd=1
sata1: tank:vm-107-disk-1,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=17efdf00-377c-4ac4-b337-3ff7c4fe1d79
sockets: 1
usb0: host=1-3.1.2.4,usb3=1
vga: virtio
vmgenid: 74cbd8bf-841a-4b27-b2ca-07cecfaffd53


Something like this, with the vga: virtio line added?
 
Try changing the line
Code:
machine: q35
to
Code:
machine: pc-q35-3.1
and leave x-vga=1 and vga: none as you had them (technically you didn't have vga: none; you just omitted the vga line, which amounts to the same thing).

If that doesn't do it, keep the machine: pc-q35-3.1 line and add the following line at the end of the XXX.conf file:
Code:
args: -machine type=q35,kernel_irqchip=on
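
Since this VM config already contains an args: line, it may be worth checking what actually reaches QEMU after these changes. `qm showcmd <vmid>` prints the KVM command line Proxmox would generate, and the machine change itself can also be made with `qm set 107 --machine pc-q35-3.1`. A sketch that pulls the -machine and -cpu arguments out of such a command line; `show_machine_cpu_args` is a hypothetical helper, and splitting on spaces is a simplification that holds for these two options.

```shell
# Hypothetical helper: print every -machine and -cpu argument found in a
# QEMU/KVM command line read on stdin, e.g.  qm showcmd 107 | show_machine_cpu_args
show_machine_cpu_args() {
    tr ' ' '\n' \
      | awk 'prev == "-machine" || prev == "-cpu" { print prev, $0 } { prev = $0 }'
}
```

Two -machine lines in the output would mean both the generated machine type and the one from args: are being passed.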