[SOLVED] Problem with GPU Passthrough

krafit · May 8, 2022

jart said:
Solved!

Using:
echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/remove
echo 1 > /sys/bus/pci/rescan

Thank you very much, it solved my issue too. I got the same "vfio-pci BAR 0: can't reserve memory" issue right after the PVE 7.2-3 upgrade. I also needed to add "video=efifb:off" to the kernel boot options so that I could even pass through the primary GPU in the x16 slot.

blackpaw · May 10, 2022

jart said:
Solved!

Using:
echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/remove
echo 1 > /sys/bus/pci/rescan

You can create a .sh chmod +x and add it to cr

Thank you! That actually fixed my BAR 3 problems where all else failed.

ZaxLofful · May 10, 2022

I just want to reiterate what @leesteken said: adding the parameters with commas is not valid. It is a misconception carried over from how the pci.ids work, as well as other command-line software.

I saw other posts afterwards that still have it wrong (looking at @Tardar [sorry!])

leesteken said:
You don't need amd_iommu=on because it is on by default. Also video=vesafb,efifb:off is invalid. You probably want video=vesafb:off video=efifb:off (and with 5.15 also video=simplefb:off).

They have to be separated like this (otherwise it won't work):

Code:

video=vesafb:off video=efifb:off video=simplefb:off
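For anyone who wants to double-check their own command line, here is a small sketch, assuming bash. The helper name check_video_params is made up for illustration and is not part of any tool. It splits a kernel command line on whitespace and flags any video= option that tries to switch off several framebuffers with one comma-joined value. It only looks for that one pattern, nothing else.

```shell
#!/bin/bash
# Sketch only: check_video_params is a hypothetical helper, not an existing command.
# It flags video= options such as "video=vesafb,efifb:off" that join several
# framebuffer names with a comma instead of using one video=<fb>:off each.
check_video_params() {
    local cmdline="$1" token status=0
    local -a tokens
    # Kernel parameters are separated by whitespace, so split on it.
    read -r -a tokens <<< "$cmdline"
    for token in "${tokens[@]}"; do
        case "$token" in
            video=*,*:off)
                echo "suspicious: $token (use one video=<fb>:off per framebuffer)"
                status=1
                ;;
        esac
    done
    return "$status"
}

# Usage on a running host:
#   check_video_params "$(cat /proc/cmdline)"
```

The function returns non-zero when it finds such a token, so it can be used in an if statement. Comma-separated vfio-pci.ids values are left alone on purpose.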

EDIT:
I have an AMD so the end result looks like this (yes the PCI.ids can be comma separated [they made this hard, rofl]):

Code:

amd_iommu=on iommu=pt textonly vfio_iommu_type1.allow_unsafe_interrupts=1 nofb nomodeset vfio-pci.ids=00XX:00XX,00XX:00XX,00XX:00XX,00XX:00XX video=vesafb:off video=efifb:off video=simplefb:off

leesteken · May 10, 2022

ZaxLofful said:
I have an AMD so the end result looks like this (yes the PCI.ids can be comma separated [they made this hard, rofl]):

Code:

amd_iommu=on iommu=pt textonly vfio_iommu_type1.allow_unsafe_interrupts=1 nofb nomodeset vfio-pci.ids=00XX:00XX,00XX:00XX,00XX:00XX,00XX:00XX video=vesafb:off video=efifb:off video=simplefb:off

amd_iommu=on is not necessary and actually invalid because amd_iommu is on by default. You probably don't need textonly nofb nomodeset because you already disable the framebuffers and do early binding to vfio_pci. I doubt that you see any performance improvement for not passed through devices by using iommu=pt. Sorry, couldn't resist after you summoned me ;-)

ZaxLofful · May 11, 2022

leesteken said:
amd_iommu=on is not necessary and actually invalid because amd_iommu is on by default. You probably don't need textonly nofb nomodeset because you already disable the framebuffers and do early binding to vfio_pci. I doubt that you see any performance improvement for not passed through devices by using iommu=pt. Sorry, couldn't resist after you summoned me ;-)

It cannot be both unnecessary and invalid, especially if it's already on. It just means it's redundant; as already mentioned, it doesn't cause any harm, and it actually teaches people about what is happening.

"Probably" and "doubt" are the key words in your post, because with my setup I had to do all of this and more to get it to actually work!

Explicitly defining these parameters has no foreseeable downsides if a headless configuration is desired.

jbattermann · May 13, 2022

After upgrading to 7.2.3 I saw the same problem with one of my VMs (the BAR 1: can't reserve .. messages) and when checking the dmesg output, I see the following two to three lines:

Code:

[    0.000000] Command line: initrd=\EFI\proxmox\5.15.35-1-pve\initrd.img-5.15.35-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt pcie_acs_override=downstream pcie_aspm=off pci=noaer video=vesafb:off video=efifb:off video=simplefb:off
[    0.192647] Kernel command line: initrd=\EFI\proxmox\5.15.35-1-pve\initrd.img-5.15.35-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt pcie_acs_override=downstream pcie_aspm=off pci=noaer video=vesafb:off video=efifb:off video=simplefb:off
[    0.802065] pci 0000:47:00.0: BAR 1: assigned to efifb

That last one is interesting. I assumed video=efifb:off disabled efifb and kept it from 'reserving' that graphics card, but apparently it does not. This system has an AST2500 (for the BMC/IPMI) and ideally I'd want to hard-code the console output to that one, so I also tried the following kernel command line:

Code:

[    0.000000] Command line: initrd=\EFI\proxmox\5.15.35-1-pve\initrd.img-5.15.35-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs amd_iommu=on iommu=pt pcie_acs_override=downstream pcie_aspm=off pci=noaer textonly video=astdrmfb video=efifb:off video=vesafb:off video=efifb:off video=simplefb:off

..to no avail:

Code:

[    0.798432] pci 0000:47:00.0: BAR 1: assigned to efifb
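A quick way to pull such lines out of the kernel log is sketched below, assuming bash and sed. The function name efifb_claims is invented for this example. It reads log text on standard input and prints the PCI address and BAR number for every "assigned to efifb" message, in the format shown above.

```shell
#!/bin/bash
# Sketch only: efifb_claims is a hypothetical helper name.
# Reads kernel log text on stdin and prints "<pci address> BAR <n>" for each
# line of the form "pci 0000:47:00.0: BAR 1: assigned to efifb".
efifb_claims() {
    sed -n 's/.*pci \([0-9a-f:.]*\): \(BAR [0-9]*\): assigned to efifb.*/\1 \2/p'
}

# Usage on a running host:
#   dmesg | efifb_claims
```

Empty output means no such message was logged, which by itself does not prove that nothing else holds the BAR.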

What's going on / what might I be missing here? Does anyone see what's wrong/going on? This used to work perfectly fine pre 7.2.x :-/

Thanks!

Lefuneste · May 14, 2022

jart said:
Solved!

Using:
echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/remove
echo 1 > /sys/bus/pci/rescan

You can create a .sh chmod +x and add it to cron

File: /root/fix_gpu_pass.sh

//Note Change "0000\:0X\:00.0" for your GPU PCI ID

#!/bin/bash
echo 1 > /sys/bus/pci/devices/0000\:0X\:00.0/remove
echo 1 > /sys/bus/pci/rescan

Add to cron:

crontab -e

add:

@reboot /root/fix_gpu_pass.sh

Published in other post

Thank you for your post. I have just spent 2 days and 2 nights trying to eradicate this nasty BOOTFB memory lock brought by the broken 5.15 kernels. Obviously this awful regression happens (as usual) when I am in the process of installing and configuring a new GPU with pass-through, making the entire process much more time consuming and painful than it should be.

I pinpointed the problem, as many others, to the new kernel just ignoring any boot argument, and locking a GPU reserved memory range whatever the cmdline parameter you throw at it.

root=ZFS=rpool/ROOT/pve-1 boot=zfs iommu=pt amd_iommu=on kvm_amd.npt=1 kvm_amd.avic=1 pcie_acs_override=downstream,multifunction vfio_iommu_type1 allow_unsafe_interrupts=1 video=efifb:off video=vesafb:off video=simplefb:off nofb nomodeset quiet

I downgraded to 5.13 and realized that I had wasted another week-end to kernel regressions, as the VM would get its GPU passthrough working straight away. Not willing to pin the kernel version, I was looking for a way to eradicate this ridiculous BOOTFB lock, as I already had to do the same in a previous life. But I could not get it to work again as the command was different.

To make things a bit clearer for desperate souls wandering this forum, here is what you get when you are plagued by this bug once you've done all you can to get the pass-through working (enabling IOMMU, changing boot parameters, getting Dev ID into vfio.conf...) :

Symptom : in dmesg you get one instance (or several thousand) of the super infamous

vfio-pci 0000:06:00.0: BAR 0: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]

It means that the vfio-pci pass-through driver is trying to lock the device (GPU) but is unable to do so, because something else has already claimed the same memory range.

Important to know that the actual memory range is indicated in the log message with 0x HEX prefix (0xd0000000-0xdfffffff)
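To avoid copying the range by hand, a minimal sketch (bash and sed assumed, function name bar_conflict_range made up for illustration) can cut it out of the error message. It prints the range without the 0x prefixes, which is the way /proc/iomem writes addresses.

```shell
#!/bin/bash
# Sketch only: bar_conflict_range is a hypothetical helper name.
# Reads kernel log text on stdin and prints "<start>-<end>" (hex, no 0x prefix)
# for each "can't reserve [mem 0x...-0x...]" message.
bar_conflict_range() {
    sed -n "s/.*can't reserve \[mem 0x\([0-9a-f]*\)-0x\([0-9a-f]*\).*/\1-\2/p"
}

# Usage on a running host:
#   dmesg | bar_conflict_range | sort -u
```

The sort -u in the usage line is only there because the message is often repeated many times.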

To dig a bit further the secret command required is :

cat /proc/iomem which returns the memory reservation ranges

Then you look at the memory range used by your GPU PCIe device (in my case 0000:06:00.0)

c0000000-fec2ffff : PCI Bus 0000:00 <-- Start of the PCIE bus reservation
c0000000-c13fffff : PCI Bus 0000:0f
c0000000-c00fffff : 0000:0f:00.0
c0100000-c01fffff : 0000:0f:00.0
c0200000-c11fffff : 0000:0f:00.0
c1200000-c120ffff : 0000:0f:00.0
c1210000-c130ffff : 0000:0f:00.0
d0000000-e01fffff : PCI Bus 0000:02
d0000000-e01fffff : PCI Bus 0000:03
d0000000-e01fffff : PCI Bus 0000:04
d0000000-e01fffff : PCI Bus 0000:05
d0000000-e01fffff : PCI Bus 0000:06 <-- Hey here's my GPU!
d0000000-dfffffff : 0000:06:00.0 <-- Look Same address as in the error message !
d0000000-d02fffff : BOOTFB <-- WHAT THE Fùµ) is this?
e0000000-e01fffff : 0000:06:00.0

This BOOTFB framebuffer is NOT there when using the 5.13 kernel. It is supposed to be blocked by the cmdline parameter video=efifb:off
Another of these framebuffers is simplefb, which has the same behavior but gets correctly blocked by video=simplefb:off
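The same check can be scripted. The sketch below assumes bash and awk, and the function name bootfb_ranges is invented here. It reads text in the /proc/iomem format on standard input, prints every range owned by BOOTFB, and returns non-zero when there is none. Note that /proc/iomem only shows real addresses to root.

```shell
#!/bin/bash
# Sketch only: bootfb_ranges is a hypothetical helper name.
# Reads /proc/iomem-style lines ("  d0000000-d02fffff : BOOTFB") on stdin,
# prints each range whose owner is exactly BOOTFB, and exits 1 if none is found.
bootfb_ranges() {
    awk -F' : ' '
        $2 == "BOOTFB" {
            gsub(/^[ \t]+/, "", $1)   # drop the nesting indentation
            print $1
            found = 1
        }
        END { exit !found }'
}

# Usage on a running host (as root):
#   bootfb_ranges < /proc/iomem && echo "BOOTFB is still holding memory"
```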

This is what it should look like after applying your deep cleansing method.
No more BOOTFB BS!

c0000000-fec2ffff : PCI Bus 0000:00
c0000000-c13fffff : PCI Bus 0000:0f
c0000000-c00fffff : 0000:0f:00.0
c0100000-c01fffff : 0000:0f:00.0
c0200000-c11fffff : 0000:0f:00.0
c1200000-c120ffff : 0000:0f:00.0
c1210000-c130ffff : 0000:0f:00.0
d0000000-e01fffff : PCI Bus 0000:02
d0000000-e01fffff : PCI Bus 0000:03
d0000000-e01fffff : PCI Bus 0000:04
d0000000-e01fffff : PCI Bus 0000:05
d0000000-e01fffff : PCI Bus 0000:06
d0000000-dfffffff : 0000:06:00.0
e0000000-e01fffff : 0000:06:00.0

And here's what happens once the GPU gets reserved for the pass-through driver (vfio-pci) and the VM is running in the background

c0000000-fec2ffff : PCI Bus 0000:00
c0000000-c13fffff : PCI Bus 0000:0f
c0000000-c00fffff : 0000:0f:00.0
c0100000-c01fffff : 0000:0f:00.0
c0200000-c11fffff : 0000:0f:00.0
c1200000-c120ffff : 0000:0f:00.0
c1210000-c130ffff : 0000:0f:00.0
d0000000-e01fffff : PCI Bus 0000:02
d0000000-e01fffff : PCI Bus 0000:03
d0000000-e01fffff : PCI Bus 0000:04
d0000000-e01fffff : PCI Bus 0000:05
d0000000-e01fffff : PCI Bus 0000:06
d0000000-dfffffff : 0000:06:00.0
d0000000-dfffffff : vfio-pci <-- Pass-through driver correctly grabbing the PCIE device!
e0000000-e01fffff : 0000:06:00.0
e0000000-e01fffff : vfio-pci

Your post saved the day and I could happily kick this nasty BOOTFB bugger out of my GPU reserved memory.

Obviously, as any flame thrower, it is not perfect to light up a cigarette, but it'll do the job for the moment...

Last important note for AMD GPU users : /etc/modprobe.d/vfio.conf does not need to be set to the GPU IDs. Apparently, the pass-through still works even with the amdgpu driver being used by the host at boot time (hence not requiring it to be blacklisted). I have my RX 6500 XT running fine inside a Windows VM with such a configuration.

Nvidia GPUs behave differently and usually require the proper settings to be added to /etc/modprobe.d/vfio.conf + an unlocked ROM file passed at VM launch, for some GPUs at least.

nienna · May 23, 2022

uiffiu said:
Since kernel 5.15, the Windows VM only boots if the host was booted with HDMI disconnected. Disconnecting the monitor's power supply is not enough; you need to unplug the HDMI cable.

This was the final tip that got it to work for me. Had to unplug the HDMI dummy from the graphics card during reboot of the actual server. Then everything worked perfectly. Thank you so much for pointing this out, @uiffiu !

Does anyone know if this is something that can be fixed in the future, or will I always need to go digging at the back of my rack when I need to reboot the server?

guillaumeb · May 27, 2022

Is there any bug tracker that can be watched for this issue ? I have pinned the server to 5.13 in which GPU passthrough was working fine but would like the ability to upgrade to 5.15 as soon as it is fixed.

Lefuneste · May 27, 2022

guillaumeb said:
Is there any bug tracker that can be watched for this issue ? I have pinned the server to 5.13 in which GPU passthrough was working fine but would like the ability to upgrade to 5.15 as soon as it is fixed.

You can forcefully unload the BOOTFB framebuffer by using jart's proposed method. This should work fine with the 5.15 kernel. This is what I am using on 2 servers while the bug is still an issue with 5.15 kernels. Also see my post above about diagnosing whether the BOOTFB lock is indeed your issue.

1/ Create a .sh file : nano /root/fix_gpu_pass.sh

#!/bin/bash
echo 1 > /sys/bus/pci/devices/0000\:0X\:00.0/remove
echo 1 > /sys/bus/pci/rescan

--> //Note Change "0000\:0X\:00.0" for your GPU PCI ID

2/ make it executable : chmod +x /root/fix_gpu_pass.sh

3/ Add this to your crontab so that it runs after reboot :

crontab -e

add:

@reboot /root/fix_gpu_pass.sh
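If you want the script to fail loudly when the PCI address is wrong, here is a slightly more defensive sketch of the same two writes, assuming bash. The function name reset_pci_device and its optional second argument (an alternative sysfs root, only useful for dry runs against a fake directory) are my own additions for illustration and are not part of jart's method.

```shell
#!/bin/bash
# Sketch only: reset_pci_device and its optional sysfs-root argument are
# hypothetical additions; the two echo lines are the method from this thread.
# Usage: reset_pci_device <pci address> [sysfs root, default /sys]
reset_pci_device() {
    local addr="$1" sysfs="${2:-/sys}"
    local dev="$sysfs/bus/pci/devices/$addr"
    # Refuse to continue if the address does not match an existing device.
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi
    echo 1 > "$dev/remove"              # detach the device from the PCI bus
    echo 1 > "$sysfs/bus/pci/rescan"    # let the kernel rediscover it
}

# Usage as root (replace the address with your GPU's):
#   reset_pci_device 0000:0X:00.0
```

Inside the function the colons need no backslashes because the path is quoted.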

uiffiu · May 27, 2022

Lefuneste said:
You can forcefully unload the BOOTFB framebuffer by using jart's proposed method. This should work fine with the 5.15 kernel. This is what I am using on 2 servers while the bug is still an issue with 5.15 kernels. Also see my post above about diagnosing whether the BOOTFB lock is indeed your issue.

1/ Create a .sh file : nano /root/fix_gpu_pass.sh

#!/bin/bash
echo 1 > /sys/bus/pci/devices/0000\:0X\:00.0/remove
echo 1 > /sys/bus/pci/rescan

--> //Note Change "0000\:0X\:00.0" for your GPU PCI ID

2/ make it executable : chmod +x /root/fix_gpu_pass.sh

3/ Add this to your crontab so that it runs after reboot :

crontab -e

add:

@reboot /root/fix_gpu_pass.sh

BOOTFB was my issue; fixed it with this script. Now I don't need to have HDMI unplugged during boot or use 5.13 xD

guillaumeb · May 27, 2022

Lefuneste said:
You can forcefully unload the BOOTFB framebuffer by using jart's proposed method. This should work fine with the 5.15 kernel. This is what I am using on 2 servers while the bug is still an issue with 5.15 kernels. Also see my post above about diagnosing whether the BOOTFB lock is indeed your issue.

1/ Create a .sh file : nano /root/fix_gpu_pass.sh

#!/bin/bash
echo 1 > /sys/bus/pci/devices/0000\:0X\:00.0/remove
echo 1 > /sys/bus/pci/rescan

--> //Note Change "0000\:0X\:00.0" for your GPU PCI ID

2/ make it executable : chmod +x /root/fix_gpu_pass.sh

3/ Add this to your crontab so that it runs after reboot :

crontab -e

add:

@reboot /root/fix_gpu_pass.sh

Yes, thank you for your detailed post. It helped me confirm BOOTFB was indeed the issue. Applying the script does release the memory allocation from BOOTFB, and vfio seems to be correctly grabbing the PCI devices.
I do seem to have occasions where the PCI rescan conflicts with the start of the guest VM on startup, though. I don't have a compelling reason to upgrade to 5.15 at this point, so I might stick with 5.13.

03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
Subsystem: eVga.com. Corp. GP107 [GeForce GTX 1050 Ti] [3842:6253]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
03:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
Subsystem: eVga.com. Corp. GP107GL High Definition Audio Controller [3842:6253]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

e0000000-f1ffffff : PCI Bus 0000:03
e0000000-efffffff : 0000:03:00.0
e0000000-efffffff : vfio-pci <= would have been allocated to BOOTFB before
f0000000-f1ffffff : 0000:03:00.0
f0000000-f1ffffff : vfio-pci
fa000000-fb0fffff : PCI Bus 0000:03
fa000000-faffffff : 0000:03:00.0
fa000000-faffffff : vfio-pci
fb000000-fb07ffff : 0000:03:00.0
fb080000-fb083fff : 0000:03:00.1
fb080000-fb083fff : vfio-pci

sublightnova · May 30, 2022

So glad I finally found this thread. Like many others, I have been pulling my hair out over the last 3 days. I had GPU passthrough working months ago and couldn't figure out why it doesn't work on a fresh install.

The cron job script did the job. No more BOOTFB or whatever ROM reservation errors, and passthrough works!!

Thanks to the person who originally came up with this solution or posted it here!! You saved me a bundle!!

My 3060 Ti is now happily passed through. Now I'm holding my breath for the vGPU unlocking for this card to happen at some point in the (near) future...

jiuntian · Jun 10, 2022

This post saved my day! I had been scratching my head trying different methods to make GPU passthrough work, but none of them actually made a difference until I found this. I was getting the BAR 1 error due to BOOTFB, and the cron job is just the right solution for it.

dece03 · Jun 17, 2022

What a nice post! I've been looking for a solution to these BAR 3 issues all night. Big thanks!

By the way, I also found a better solution in a Reddit post: adding "initcall_blacklist=sysfb_init" to the kernel parameters. There is no need for "video=efifb:off" or "video=simplefb:off" in the kernel parameters. I tested it as well, and it does solve the problem!

Reference:
https://www.reddit.com/r/VFIO/comme...let_simplefb_stay_away_from_the_gpu/?sort=old
https://www.reddit.com/r/Proxmox/comments/vc9hw3/latest_proxmox_7_the_kernel_breaks_my_gpu/?sort=old

kevinyu211 · Jul 3, 2022

dece03 said:
What a nice post! I've been looking for a solution to these BAR 3 issues all night. Big thanks!

By the way, I also found a better solution in a Reddit post: adding "initcall_blacklist=sysfb_init" to the kernel parameters. There is no need for "video=efifb:off" or "video=simplefb:off" in the kernel parameters. I tested it as well, and it does solve the problem!

Reference:
https://www.reddit.com/r/VFIO/comme...let_simplefb_stay_away_from_the_gpu/?sort=old
https://www.reddit.com/r/Proxmox/comments/vc9hw3/latest_proxmox_7_the_kernel_breaks_my_gpu/?sort=old

Thanks for your kind notes! I encountered a similar issue in Proxmox 7.2.5. Replacing video=efifb:off with initcall_blacklist=sysfb_init in /etc/default/grub indeed resolves the issue.
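For those who prefer to script that replacement, here is a rough sketch, assuming bash, sed and a GRUB_CMDLINE_LINUX_DEFAULT line that is wrapped in double quotes with nothing after the closing quote. The function name use_sysfb_blacklist is made up for this example. It drops the video=efifb:off, video=vesafb:off and video=simplefb:off entries from that line and appends initcall_blacklist=sysfb_init, and it does nothing if the parameter is already there.

```shell
#!/bin/bash
# Sketch only: use_sysfb_blacklist is a hypothetical helper name.
# Assumes GRUB_CMDLINE_LINUX_DEFAULT="..." uses double quotes and ends the line.
# Usage: use_sysfb_blacklist <path to a grub defaults file>
use_sysfb_blacklist() {
    local file="$1" tmp rc
    # Already configured: leave the file untouched.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*initcall_blacklist=sysfb_init' "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # First expression removes the video=<fb>:off entries,
    # second appends the new parameter before the closing quote.
    sed -E \
        -e '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/ ?video=(efifb|vesafb|simplefb):off//g' \
        -e '/^GRUB_CMDLINE_LINUX_DEFAULT=/ s/"$/ initcall_blacklist=sysfb_init"/' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage as root, on a copy first:
#   cp /etc/default/grub /root/grub.bak
#   use_sysfb_blacklist /etc/default/grub && update-grub
```

Hosts that boot through systemd-boot (for example UEFI with ZFS root) keep their command line in /etc/kernel/cmdline instead and apply it with proxmox-boot-tool refresh, so this sketch does not cover them.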

duxnobis13 · Jul 12, 2022

Could you guys please share a picture of your grub entries, and the same for modules, modprobe.d, and blacklist? That would be great. Can't get my GPU through. Tried everything.

leesteken · Jul 12, 2022

duxnobis13 said:
Could you guys please share a picture of your grub entries, and the same for modules, modprobe.d, and blacklist? That would be great. Can't get my GPU through. Tried everything.

Can you please share the make and model of your GPU and motherboard? What is the version of Proxmox and kernel (uname -a). Which bootloader does your system use? What is the output of cat /proc/cmdline? What do your IOMMU groups (

for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done

) look like?

duxnobis13 · Jul 12, 2022

Hi. Sure! Let's see what I can provide...

Setup:
AsRock Taichi Z690
Intel i9 12900K
2x32GB DDR5 RAM
ASUS AMD Radeon RX 6800 XT 16GB

Kernel:
Linux hci01 5.17.14-edge #1 SMP PREEMPT PVE Edge 5.17.14-1 (2022-06-09) x86_64 GNU/Linux

Bootloader:
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0002,0003,0004
Boot0001* UEFI OS HD(2,GPT,76ddea56-ef64-4p33-b78e-e29b213a7121,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0002* UEFI OS HD(2,GPT,e5158e13-ab5b-4b1e-f3ea-82d6afdac10f,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0003 ubuntu HD(1,GPT,c06371f8-f8f4-42ed-9361-1641fb1d2c25,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)..BO
Boot0004 UEFI: USB, Partition 1 PciRoot(0x0)/Pci(0x14,0x0)/USB(17,0)/HD(1,MBR,0x0,0x800,0x394d800)..BO

Cmdline output:
initrd=\EFI\proxmox\5.17.14-edge\initrd.img-5.17.14-edge root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=efifb:off

Iommu Groups:


IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers [8086:4660] (rev 02)
IOMMU group 10 00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
IOMMU group 11 00:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:7abb] (rev 11)
IOMMU group 12 00:1c.5 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #6 [8086:7abd] (rev 11)
IOMMU group 13 00:1c.7 PCI bridge [0604]: Intel Corporation Device [8086:7abf] (rev 11)
IOMMU group 14 00:1d.0 PCI bridge [0604]: Intel Corporation Device [8086:7ab0] (rev 11)
IOMMU group 15 00:1f.0 ISA bridge [0601]: Intel Corporation Z690 Chipset LPC/eSPI Controller [8086:7a84] (rev 11)
IOMMU group 15 00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
IOMMU group 15 00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
IOMMU group 15 00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
IOMMU group 15 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (17) I219-V [8086:1a1d] (rev 11)
IOMMU group 16 01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU group 17 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU group 18 03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c1)
IOMMU group 19 03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 [8086:460d] (rev 02)
IOMMU group 20 04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU group 21 05:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU group 22 07:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:3042]
IOMMU group 23 08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. Killer E3000 2.5GbE Controller [10ec:3000] (rev 06)
IOMMU group 24 0a:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU group 25 0b:00.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU group 26 0b:01.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU group 27 0b:02.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU group 28 0b:03.0 PCI bridge [0604]: Intel Corporation Thunderbolt 4 Bridge [Maple Ridge 4C 2020] [8086:1136] (rev 02)
IOMMU group 29 0c:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 NHI [Maple Ridge 4C 2020] [8086:1137]
IOMMU group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation AlderLake-S GT1 [8086:4680] (rev 0c)
IOMMU group 30 40:00.0 USB controller [0c03]: Intel Corporation Thunderbolt 4 USB Controller [Maple Ridge 4C 2020] [8086:1138]
IOMMU group 3 00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 02)
IOMMU group 4 00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
IOMMU group 4 00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
IOMMU group 5 00:14.3 Network controller [0280]: Intel Corporation Alder Lake-S PCH CNVi WiFi [8086:7af0] (rev 11)
IOMMU group 6 00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH I2C Controller #0 [8086:7acc] (rev 11)
IOMMU group 7 00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
IOMMU group 8 00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
IOMMU group 9 00:1a.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac8] (rev 11)
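To make a listing like this easier to read, the sketch below (bash, awk and sort assumed, function name shared_iommu_groups invented for illustration) takes the same "IOMMU group N address ..." lines on standard input and prints only the groups that hold more than one device, sorted by group number. Keep in mind that a listing taken with pcie_acs_override active does not show the real hardware grouping.

```shell
#!/bin/bash
# Sketch only: shared_iommu_groups is a hypothetical helper name.
# Reads lines such as "IOMMU group 15 00:1f.0 ISA bridge ..." on stdin and
# prints "group <n>: <addr> <addr> ..." for every group with 2+ devices.
shared_iommu_groups() {
    awk '
        $1 == "IOMMU" && $2 == "group" {
            count[$3]++
            members[$3] = members[$3] " " $4   # keep devices in input order
        }
        END {
            for (g in count)
                if (count[g] > 1)
                    printf "group %s:%s\n", g, members[g]
        }' | sort -t' ' -k2,2n
}

# Usage: pipe the output of the IOMMU group loop from this thread into it.
```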

Hope you can see something strange which could help.

Thanks for your time
Dux

leesteken · Jul 12, 2022

duxnobis13 said:
AsRock Taichi Z690

I can't find whether Intel Z690 supports VT-d, but I assume so since you have IOMMU groups (please retest without pcie_acs_override).

duxnobis13 said:
Intel i9 12900K

This CPU supports VT-d.

duxnobis13 said:
ASUS AMD Radeon RX 6800 XT 16GB

AMD GPUs from the 6000 series should have no problems resetting. I suggest not blacklisting amdgpu and not early binding (any part of) it to vfio-pci, and letting amdgpu take care of releasing the boot framebuffer before vfio-pci takes over. Also don't use any video=...:off. After making changes and rebooting, but before starting the VM, check with lspci -k that amdgpu is active for the VGA part of the GPU. It's probably fine if the other functions are early bound to vfio-pci.
Maybe attach the files from /etc/modprobe.d/ to your next reply so I can point out some changes that might help?
An lspci -nnk of the full GPU device and your VM configuration file might also help.
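The lspci -k check can be reduced to one word with a small sketch like the one below, assuming bash and awk. The function name driver_in_use is made up for illustration. It expects lspci -k output on standard input with slots written as bus:device.function (no domain prefix) and prints the driver currently bound to the given slot, returning non-zero when no driver is bound.

```shell
#!/bin/bash
# Sketch only: driver_in_use is a hypothetical helper name.
# Reads `lspci -k` style text on stdin; device header lines start with the slot
# (e.g. "03:00.0"), detail lines such as "Kernel driver in use:" follow them.
# Usage: lspci -k | driver_in_use <slot>
driver_in_use() {
    awk -v slot="$1" '
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-9a-f]/ { current = $1 }
        current == slot && /Kernel driver in use:/ { print $NF; found = 1 }
        END { exit !found }'
}

# Usage on a running host (replace the slot with your GPU's):
#   lspci -k | driver_in_use 03:00.0
```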

duxnobis13 said:
Kernel:
Linux hci01 5.17.14-edge #1 SMP PREEMPT PVE Edge 5.17.14-1 (2022-06-09) x86_64 GNU/Linux

Sorry, I only have experience with the official PVE 5.15 kernel (and earlier). I have no idea what problems or improvements Edge 5.17 has. Can you please try with the most up-to-date pve-kernel-5.15?

duxnobis13 said:
Bootloader:
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0002,0003,0004
Boot0001* UEFI OS HD(2,GPT,76ddea56-ef64-4p33-b78e-e29b213a7121,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0002* UEFI OS HD(2,GPT,e5158e13-ab5b-4b1e-f3ea-82d6afdac10f,0x800,0x100000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0003 ubuntu HD(1,GPT,c06371f8-f8f4-42ed-9361-1641fb1d2c25,0x800,0x100000)/File(\EFI\ubuntu\shimx64.efi)..BO
Boot0004 UEFI: USB, Partition 1 PciRoot(0x0)/Pci(0x14,0x0)/USB(17,0)/HD(1,MBR,0x0,0x800,0x394d800)..BO

Weird. Looks like you are booting in UEFI from ZFS but according to the manual systemd-boot is not used. Maybe CSM is enabled? What is the output of proxmox-boot-tool status?

duxnobis13 said:
Cmdline output:
initrd=\EFI\proxmox\5.17.14-edge\initrd.img-5.17.14-edge root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=efifb:off

Because of the use of pcie_acs_override=downstream,multifunction any IOMMU group information is useless. And I really don't think you need pcie_acs_override. Please remove it and show the groups again. Maybe move the GPU to the PCIe x16 slot closest to the CPU. Please also remove video=efifb:off because it does not even help since kernel 5.15.

duxnobis13 said:
Hope you can see something strange which could help.

Yes: pcie_acs_override, the (unsupported?) kernel, it being unclear whether it is booting in UEFI mode, and the fact that an AMD 6800 XT should not need work-arounds (even when used during POST/boot).
