GPU PCIe Passthrough stopped working after update

Feb 21, 2019
I have been using PCIe passthrough with a Windows 10 guest for an NVIDIA GeForce 1650 GPU without any issues until yesterday. After updating to pve-manager/6.2-4/9824574a, I suddenly get the following error when trying to start the VM:

Code:
kvm: -device vfio-pci,host=0000:65:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-vga=on,multifunction=on: vfio 0000:65:00.0: failed getting region info for VGA region index 8: Invalid argument
device does not support requested feature x-vga

I went back through the wiki and checked that all of my grub/modules/modprobe.d settings are still intact. My VM conf file hasn't changed:

Code:
agent: 1
bootdisk: sata0
cores: 10
cpu: host
hostpci0: 65:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: Windows10
net0: e1000=CE:B6:D8:EC:F1:E5,bridge=vmbr0
net1: virtio=A6:FA:FB:8E:7F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
sata0: tank:vm-107-disk-0,size=150G,ssd=1
sata1: tank:vm-107-disk-1,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=17efdf00-377c-4ac4-b337-3ff7c4fe1d79
sockets: 1
usb0: host=1-3.1.2.4,usb3=1
vmgenid: 74cbd8bf-841a-4b27-b2ca-07cecfaffd53

The relevant output of lspci -nnk:
Code:
65:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU107 [10de:1f82] (rev a1)
    Subsystem: ZOTAC International (MCO) Ltd. TU107 [19da:1546]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
65:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
    Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1546]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
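
The binding shown in this output can also be checked mechanically. The following is only a minimal sketch: the function name driver_per_function is hypothetical, and it assumes the usual lspci -nnk layout where device lines start in the first column and detail lines are indented.

```shell
#!/bin/sh
# Hypothetical helper: reads lspci -nnk text on stdin and prints, for every
# PCI function whose address starts with the given prefix, the address and
# the value of its "Kernel driver in use" line.
driver_per_function() {
    slot="$1"
    awk -v slot="$slot" '
        # A device line starts in column 1 with the bus address.
        /^[0-9a-f]/ { dev = $1; track = (index(dev, slot) == 1) }
        # Indented detail lines belong to the most recent device line.
        track && /Kernel driver in use:/ { print dev, $NF }
    '
}

# Usage on the host (not executed here):
#   lspci -nnk | driver_per_function 65:00
```

For passthrough, both functions of the card (65:00.0 and 65:00.1) would be expected to report vfio-pci, as they do in the output above.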


I've played with other hostpci0 options without any luck. I can deselect Primary GPU and get the VM to boot, but the guest then fails to allocate resources for the GPU, with the following error (from the Windows guest):
Code:
This device cannot find enough free resources that it can use. (Code 12)
 
do:
Code:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
update-initramfs -u
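
After the reboot, it may help to confirm that the modules actually picked these options up. This is a sketch only: show_module_param is a hypothetical helper name, and it assumes the parameters are exposed under /sys/module/<module>/parameters/ on the running kernel.

```shell
#!/bin/sh
# Hypothetical helper: prints "module.param=value" for a loaded module parameter.
# The optional third argument overrides the sysfs root, which allows a dry run
# against a scratch directory.
show_module_param() {
    module="$1"; param="$2"; root="${3:-/sys/module}"
    file="$root/$module/parameters/$param"
    if [ -r "$file" ]; then
        printf '%s.%s=%s\n' "$module" "$param" "$(cat "$file")"
    else
        # Module not loaded, or the parameter is not exported on this kernel.
        printf '%s.%s=unavailable\n' "$module" "$param"
    fi
}

# Usage on the host (not executed here):
#   show_module_param kvm ignore_msrs
#   show_module_param vfio_iommu_type1 allow_unsafe_interrupts
```

Boolean parameters are normally reported as Y or N.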

Then edit the VM conf file:
change: cpu: host,hidden=1,flags=+pcid
add: args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
 
Done, rebooted host - same result, same error message.

Code:
> cat /etc/modprobe.d/blacklist.conf
blacklist nvidia
blacklist nvidiafb
blacklist nouveau

Code:
>cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

Code:
> cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0

Code:
> cat /etc/pve/qemu-server/107.conf
agent: 1
bootdisk: sata0
cores: 10
cpu: host,hidden=1,flags=+pcid
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
hostpci0: 65:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: Windows10
net0: e1000=CE:B6:D8:EC:F1:E5,bridge=vmbr0
net1: virtio=A6:FA:FB:8E:7F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
sata0: tank:vm-107-disk-0,size=150G,ssd=1
sata1: tank:vm-107-disk-1,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=17efdf00-377c-4ac4-b337-3ff7c4fe1d79
sockets: 1
usb0: host=1-3.1.2.4,usb3=1
vmgenid: 74cbd8bf-841a-4b27-b2ca-07cecfaffd53
 
Did you update the GPU drivers too? I read that the newest ones block passthrough.

You should put this in your VM conf: hostpci0: 65:00,pcie=1,x-vga=1,romfile=yourbiosname.rom
 
Did you update the GPU drivers too? I read that the newest ones block passthrough.
To be honest, I didn't notice whether the NVIDIA drivers got updated when I ran apt dist-upgrade, but the driver is blacklisted, so it shouldn't matter.

You should put this in your VM conf: hostpci0: 65:00,pcie=1,x-vga=1,romfile=yourbiosname.rom
The setup worked without that before, although I have since tried that as well, with the same result.
 
Code:
> dmesg | grep -e BAR -e iommu -e IOMMU -e vfio -e "[b|B]"u -e DMAR
[    0.000000] Linux version 5.4.41-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.41-1 (Fri, 15 May 2020 15:06:08 +0200) ()
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.41-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
[    0.000000] [Firmware Bug]: TSC ADJUST: CPU0: -463055971124404 force to 0
[    0.010397] ACPI: DMAR 0x000000003BF59C30 0000E8 (v01 ALASKA A M I    00000001 INTL 20091013)
[    0.128966] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.128967] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.129493] Built 1 zonelists, mobility grouping on.  Total pages: 16439979
[    0.129495] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.4.41-1-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
[    0.129586] DMAR: IOMMU enabled
[    0.270626] DMAR: Host address width 46
[    0.270627] DMAR: DRHD base: 0x000000b5ffc000 flags: 0x0
[    0.270631] DMAR: dmar0: reg_base_addr b5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270632] DMAR: DRHD base: 0x000000d8ffc000 flags: 0x0
[    0.270635] DMAR: dmar1: reg_base_addr d8ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270636] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    0.270638] DMAR: dmar2: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270639] DMAR: DRHD base: 0x00000092ffc000 flags: 0x1
[    0.270641] DMAR: dmar3: reg_base_addr 92ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.270642] DMAR: RMRR base: 0x0000003bf8d000 end: 0x0000003bf8ffff
[    0.270643] DMAR: ATSR flags: 0x0
[    0.270645] DMAR-IR: IOAPIC id 12 under DRHD base  0xfbffc000 IOMMU 2
[    0.270645] DMAR-IR: IOAPIC id 11 under DRHD base  0xd8ffc000 IOMMU 1
[    0.270646] DMAR-IR: IOAPIC id 10 under DRHD base  0xb5ffc000 IOMMU 0
[    0.270646] DMAR-IR: IOAPIC id 8 under DRHD base  0x92ffc000 IOMMU 3
[    0.270647] DMAR-IR: IOAPIC id 9 under DRHD base  0x92ffc000 IOMMU 3
[    0.270647] DMAR-IR: HPET id 0 under DRHD base 0x92ffc000
[    0.270648] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[    0.270648] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[    0.271451] DMAR-IR: Enabled IRQ remapping in xapic mode
[    0.292732] TAA: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.292732] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[    0.000564] [Firmware Bug]: TSC ADJUST differs within socket(s), fixing all errors
[    0.308051] MDS CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html for more details.
[    0.308051] TAA CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html for more details.
[    0.328428] EISA bus registered
[    0.330706] ACPI: bus type PCI registered
[    0.330740] PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0x60000000-0x6fffffff] (base 0x60000000)
[    0.417246] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.440337] ACPI: PCI Root Bridge [PC00] (domain 0000 [bus 00-15])
[    0.441265] PCI host bridge to bus 0000:00
[    0.441270] pci_bus 0000:00: root bus resource [bus 00-15]
[    0.446696] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    0.447047] pci 0000:00:1c.2: PCI bridge to [bus 02]
[    0.447321] pci 0000:00:1d.0: PCI bridge to [bus 03]
[    0.447337] pci_bus 0000:00: on NUMA node 0
[    0.448055] ACPI: PCI Root Bridge [PC01] (domain 0000 [bus 16-63])
[    0.448870] PCI host bridge to bus 0000:16
[    0.462694] pci 0000:16:00.0: PCI bridge to [bus 17]
[    0.462713] pci_bus 0000:16: on NUMA node 0
[    0.462870] ACPI: PCI Root Bridge [PC02] (domain 0000 [bus 64-b1])
[    0.463642] PCI host bridge to bus 0000:64
[    0.465532] pci 0000:65:00.0: BAR 3: assigned to efifb
[    0.474689] pci 0000:64:00.0: PCI bridge to [bus 65]
[    0.474699] pci_bus 0000:64: on NUMA node 0
[    0.474791] ACPI: PCI Root Bridge [PC03] (domain 0000 [bus b2-ff])
[    0.475579] PCI host bridge to bus 0000:b2
[    0.475580] pci_bus 0000:b2: root bus resource [io  0xc000-0xffff window]
[    0.475581] pci_bus 0000:b2: root bus resource [mem 0xd9000000-0xfbffffff window]
[    0.475582] pci_bus 0000:b2: root bus resource [mem 0x3800c0000000-0x3800ffffffff window]
[    0.475583] pci_bus 0000:b2: root bus resource [bus b2-ff]
[    0.476140] pci_bus 0000:b2: on NUMA node 0
[    0.476774] iommu: Default domain type: Translated
[    0.476774] ACPI: bus type USB registered
[    0.481764] e820: reserve RAM buffer [mem 0x36590000-0x37ffffff]
[    0.481765] e820: reserve RAM buffer [mem 0x38b0b000-0x3bffffff]
[    0.481766] e820: reserve RAM buffer [mem 0x39ee2000-0x3bffffff]
[    0.481766] e820: reserve RAM buffer [mem 0x3b1f6000-0x3bffffff]
[    0.499853] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    0.499861] pci 0000:00:1c.2: PCI bridge to [bus 02]
[    0.499867] pci 0000:00:1d.0: PCI bridge to [bus 03]
...
[    0.500021] pci_bus 0000:65: resource 0 [io  0xb000-0xbfff]
[    0.500021] pci_bus 0000:65: resource 1 [mem 0xd7000000-0xd80fffff]
[    0.500022] pci_bus 0000:65: resource 2 [mem 0xc0000000-0xd1ffffff 64bit pref]
...
[    0.967421] DMAR: dmar1: Using Queued invalidation
[    0.967423] DMAR: dmar0: Using Queued invalidation
[    0.967424] DMAR: dmar3: Using Queued invalidation
[    0.967689] pci 0000:00:00.0: Adding to iommu group 0
...
[    0.974003] pci 0000:65:00.0: Adding to iommu group 32
[    0.974041] pci 0000:65:00.1: Adding to iommu group 32
...
[    0.974831] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    0.978954] workingset: timestamp_bits=36 max_order=24 bucket_order=0
[    0.979860] zbud: loaded
[    0.986480] efifb: framebuffer at 0xd1000000, using 3072k, total 3072k
[    0.986509] fb0: EFI VGA frame buffer device
[    0.987133] input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input0
[    0.987154] ACPI: Sleep Button [SLPB]
[    0.987193] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input1
[    0.987204] ACPI: Power Button [PWRB]
[    0.987222] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
[    0.987245] ACPI: Power Button [PWRF]
[    0.992448] libphy: Fixed MDIO Bus: probed
[    0.993169] platform eisa.0: Probing EISA bus 0
[    1.022423] zswap: loaded using pool lzo/zbud
[    1.032700] evm: Initialising EVM extended attributes:
[    1.337198] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[    1.337232] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
[    1.338245] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
[    1.340650] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
[    1.784337] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 3
[    1.840205] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 4
[    2.204794] Console: switching to colour frame buffer device 128x48
[    3.720258] vfio-pci 0000:65:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    3.738794] vfio_pci: add [10de:1f82[ffffffff:ffffffff]] class 0x000000/00000000
[    3.758753] vfio_pci: add [10de:10fa[ffffffff:ffffffff]] class 0x000000/00000000
[    3.775437] Disabling lock debugging due to kernel taint
[   17.983477] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/l1tf.html for details.
[   21.946913] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[   21.946930] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[   61.344254] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[   61.344271] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[48414.287166] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[48414.287183] vfio-pci 0000:65:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
 
Thanks, interesting. A few assumptions and observations.

  • Based on the output, I'd suggest you look for a BIOS upgrade and install it; I'm referring to the bug notifications for TSC, TAA, MDS and others.
  • Because [ 0.476774] iommu: Default domain type: Translated is present, I assume your system is not properly configured just yet.
  • Configure vfio-pci to work with the IDs for the GPU and audio devices.
  • If you are using cpu: host, it may not be required to load options for KVM at all.
  • Consider adding these boot options at the start of /etc/kernel/cmdline:
iommu=pt intel_iommu=on video=efifb:off
Next, run the command pve-efiboot-tool refresh and reboot.
Afterwards, running dmesg | grep -e BAR -e iommu -e IOMMU -e vfio -e "[b|B]"ug -e DMAR
should at some point show iommu: Default domain type: Passthrough (set via kernel command line)
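
Checking for these boot options can be sketched as a small shell helper. Two observations from the posted dmesg output seem relevant, though neither is confirmed as the cause. First, the command line contains BOOT_IMAGE=, which usually indicates a GRUB boot; in that case the options would go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub, rather than into /etc/kernel/cmdline. Second, efifb reports a framebuffer at 0xd1000000, which lies inside the 0xc0000000-0xd1ffffff window of bus 65, and that is the situation video=efifb:off is meant to avoid. The function name missing_cmdline_opts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints each wanted option that is absent from the
# kernel command line passed as the first argument.
missing_cmdline_opts() {
    cmdline="$1"; shift
    for opt in "$@"; do
        case " $cmdline " in
            *" $opt "*) ;;                 # present as a whole word
            *) printf '%s\n' "$opt" ;;     # missing
        esac
    done
}

# Usage on the host after the reboot (not executed here):
#   missing_cmdline_opts "$(cat /proc/cmdline)" iommu=pt intel_iommu=on video=efifb:off
```

Empty output would mean that all three options reached the running kernel.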
 
  • Because [ 0.476774] iommu: Default domain type: Translated is present, I assume your system is not properly configured just yet.
  • Configure vfio-pci to work with the IDs for the GPU and audio devices.
Thanks for the additional suggestions. Could you expand on those two points a bit more? I'm not clear what action you have in mind.
 
Eh, just read what I wrote; it's all there. iommu: Default domain type: Translated must read iommu: Default domain type: Passthrough for it to work, at least IMHO.

and something like
options vfio-pci ids=10de:1f82,10de:10fa
 
Yeah, this was already part of my configuration. Like I said, the config was working fine for the last two months.

Code:
> cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1f82,10de:10fa disable_vga=1
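
One thing that may be worth checking here, offered as an unconfirmed assumption rather than a diagnosis: disable_vga=1 asks vfio-pci not to expose legacy VGA access, while x-vga=1 on the hostpci0 line asks QEMU to use exactly that region, and index 8 in the error message is the VGA region in the VFIO PCI region numbering. The sketch below only compares the two files; check_vga_conflict is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reports when a modprobe file sets disable_vga=1 for
# vfio-pci while a VM config requests x-vga=1 on a hostpci line.
check_vga_conflict() {
    modprobe_conf="$1"; vm_conf="$2"
    if grep -Eq '^options[[:space:]]+vfio[-_]pci[[:space:]].*disable_vga=1' "$modprobe_conf" &&
       grep -Eq '^hostpci[0-9]+:.*x-vga=1' "$vm_conf"; then
        echo "possible conflict: disable_vga=1 together with x-vga=1"
    else
        echo "no conflict found"
    fi
}

# Usage on the host (not executed here):
#   check_vga_conflict /etc/modprobe.d/vfio.conf /etc/pve/qemu-server/107.conf
```

If the helper reports a possible conflict, dropping disable_vga=1, running update-initramfs -u and rebooting would be one experiment to try.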
 
I looked at some older kernel logs, and iommu: Default domain type: Translated was still there when everything was working fine. It is really frustrating that things are suddenly broken without any configuration change :(
 
Btw, while adding iommu=pt to the kernel command line does yield
Code:
[    0.476514] iommu: Default domain type: Passthrough (set via kernel command line)
in dmesg, the KVM error still remains the same:
Code:
kvm: -device vfio-pci,host=0000:65:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-vga=on,multifunction=on,romfile=/usr/share/kvm/vbios.bin: vfio 0000:65:00.0: failed getting region info for VGA region index 8: Invalid argument
 
Ah! I completely missed that error message.


Code:
> cat /etc/pve/qemu-server/107.conf
agent: 1
bootdisk: sata0
cores: 10
cpu: host,hidden=1,flags=+pcid
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
hostpci0: 65:00,pcie=1,x-vga=1
ide2: none,media=cdrom
machine: q35
memory: 24576
name: Windows10
net0: e1000=CE:B6:D8:EC:F1:E5,bridge=vmbr0
net1: virtio=A6:FA:FB:8E:7F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
sata0: tank:vm-107-disk-0,size=150G,ssd=1
sata1: tank:vm-107-disk-1,size=200G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=17efdf00-377c-4ac4-b337-3ff7c4fe1d79
sockets: 1
usb0: host=1-3.1.2.4,usb3=1
vga: virtio
vmgenid: 74cbd8bf-841a-4b27-b2ca-07cecfaffd53


?
 
Try changing the line
Code:
machine: q35
to
Code:
machine: pc-q35-3.1
and leave x-vga=1 and vga: none as you had them (technically you didn't have vga: none, you just omitted it, which is the same thing).

If that doesn't do it, keep the machine: pc-q35-3.1 line and add the following line at the end of the XXX.conf file:
Code:
args: -machine type=q35,kernel_irqchip=on
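
Because the config posted earlier already has an args: line, the two changes above could be previewed with a sketch like the following. It is written under two assumptions: that the config should carry a single args: entry, so the extra arguments are appended to the existing one, and that the file has no snapshot sections. The helper name pin_machine is hypothetical, and the function only prints the result instead of editing the file.

```shell
#!/bin/sh
# Hypothetical helper: prints a copy of a VM config with the machine type
# replaced and extra QEMU arguments appended to the existing args: line.
# Missing machine: or args: lines are added at the end.
pin_machine() {
    conf="$1"; machine="$2"; extra="$3"
    awk -v machine="$machine" -v extra="$extra" '
        /^machine:/ { print "machine: " machine; seen_machine = 1; next }
        /^args:/    { print $0 " " extra; seen_args = 1; next }
        { print }
        END {
            if (!seen_machine) print "machine: " machine
            if (!seen_args)    print "args: " extra
        }
    ' "$conf"
}

# Usage on the host (not executed here); review the output before copying it back:
#   pin_machine /etc/pve/qemu-server/107.conf pc-q35-3.1 '-machine type=q35,kernel_irqchip=on'
```

Printing to standard output keeps the original file untouched until the result has been reviewed.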
 
