Solved!
Using:
echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/remove
echo 1 > /sys/bus/pci/rescan
You can put this in a .sh script, make it executable with chmod +x, and add it to cron
File: /root/fix_gpu_pass.sh
#!/bin/bash
# Note: change "0000\:0X\:00.0" to your GPU's PCI ID
echo 1 > /sys/bus/pci/devices/0000\:0X\:00.0/remove
echo 1 > /sys/bus/pci/rescan
Add it to cron:
crontab -e
and add the line:
@reboot /root/fix_gpu_pass.sh
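If you want the script to fail loudly when the PCI ID is wrong, here is a minimal hardened sketch of the same idea (the GPU_ID value is a placeholder, substitute your own device address):

#!/bin/bash
# Hypothetical hardened variant of /root/fix_gpu_pass.sh
GPU_ID="0000:0X:00.0"   # <-- replace with your GPU's PCI ID
DEV="/sys/bus/pci/devices/$GPU_ID"
# Bail out if the device node is missing (wrong ID, or already removed)
[ -e "$DEV/remove" ] || { echo "No such PCI device: $GPU_ID" >&2; exit 1; }
echo 1 > "$DEV/remove"
sleep 1                 # give the kernel a moment before rescanning the bus
echo 1 > /sys/bus/pci/rescan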
(The solution above was originally published in another post.)
Thank you for your post. I have just spent 2 days and 2 nights trying to eradicate this nasty BOOTFB memory lock brought by the broken 5.15 kernels. Obviously this awful regression happens (as usual) when I am in the process of installing and configuring a new GPU with pass-through, making the entire process much more time consuming and painful than it should be.
I pinpointed the problem, as many others have, to the new kernel simply ignoring the relevant boot arguments and keeping a reserved memory range locked on the GPU, whatever cmdline parameters you throw at it:
root=ZFS=rpool/ROOT/pve-1 boot=zfs iommu=pt amd_iommu=on kvm_amd.npt=1 kvm_amd.avic=1 pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off video=vesafb:off video=simplefb:off nofb nomodeset quiet
I downgraded to 5.13 and realized that I had wasted another weekend on kernel regressions, as the VM would get its GPU passthrough working straight away. Not willing to pin the kernel version, I went looking for a way to eradicate this ridiculous BOOTFB lock, having already had to do the same in a previous life. But I could not get it to work again, as the command was different.
To make things a bit clearer for desperate souls wandering this forum, here is what you get when you are plagued by this bug, once you've done everything you can to get the pass-through working (enabling IOMMU, changing boot parameters, putting the device IDs into vfio.conf...):
Symptom: in dmesg you get one instance (or several thousand) of the super infamous
vfio-pci 0000:06:00.0: BAR 0: can't reserve [mem 0xd0000000-0xdfffffff 64bit pref]
It means that the vfio-pci pass-through driver is trying to claim the device (the GPU) but is unable to do so, because something else has already reserved the same memory range. Important to know: the actual memory range is given in the log message with the 0x hex prefix (0xd0000000-0xdfffffff).
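A quick way to check whether you are affected (the device address 0000:06:00.0 is from my setup, yours will differ):
dmesg | grep "can't reserve"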
To dig a bit further, the secret command required is:
cat /proc/iomem
which returns the memory reservation ranges. Then you look at the range used by your GPU PCIe device (in my case 0000:06:00.0):
c0000000-fec2ffff : PCI Bus 0000:00          <-- Start of the PCIe bus reservation
  c0000000-c13fffff : PCI Bus 0000:0f
    c0000000-c00fffff : 0000:0f:00.0
    c0100000-c01fffff : 0000:0f:00.0
    c0200000-c11fffff : 0000:0f:00.0
    c1200000-c120ffff : 0000:0f:00.0
    c1210000-c130ffff : 0000:0f:00.0
  d0000000-e01fffff : PCI Bus 0000:02
    d0000000-e01fffff : PCI Bus 0000:03
      d0000000-e01fffff : PCI Bus 0000:04
        d0000000-e01fffff : PCI Bus 0000:05
          d0000000-e01fffff : PCI Bus 0000:06 <-- Hey, here's my GPU!
            d0000000-dfffffff : 0000:06:00.0  <-- Look, same address as in the error message!
              d0000000-d02fffff : BOOTFB      <-- WHAT THE Fùµ) is this?
            e0000000-e01fffff : 0000:06:00.0
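A quick filter to spot the culprit without reading the whole tree (adjust 06:00 to your own device address):
grep -E "BOOTFB|06:00" /proc/iomem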
This BOOTFB frame buffer is NOT there when using the 5.13 kernel. It is supposed to be blocked by the cmdline parameter video=efifb:off. Another of these frame buffers is simplefb, which has the same behavior but gets correctly blocked by video=simplefb:off.
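If in doubt, you can verify that your video= parameters actually reached the running kernel (on Proxmox they only take effect after update-grub, or proxmox-boot-tool refresh on systemd-boot setups, plus a reboot):
cat /proc/cmdline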
This is what it should look like after applying your deep cleansing method.
No more BOOTFB BS!
c0000000-fec2ffff : PCI Bus 0000:00
  c0000000-c13fffff : PCI Bus 0000:0f
    c0000000-c00fffff : 0000:0f:00.0
    c0100000-c01fffff : 0000:0f:00.0
    c0200000-c11fffff : 0000:0f:00.0
    c1200000-c120ffff : 0000:0f:00.0
    c1210000-c130ffff : 0000:0f:00.0
  d0000000-e01fffff : PCI Bus 0000:02
    d0000000-e01fffff : PCI Bus 0000:03
      d0000000-e01fffff : PCI Bus 0000:04
        d0000000-e01fffff : PCI Bus 0000:05
          d0000000-e01fffff : PCI Bus 0000:06
            d0000000-dfffffff : 0000:06:00.0
            e0000000-e01fffff : 0000:06:00.0
And here's what happens once the GPU gets reserved for the pass-through driver (vfio-pci) and the VM is running in the background
c0000000-fec2ffff : PCI Bus 0000:00
  c0000000-c13fffff : PCI Bus 0000:0f
    c0000000-c00fffff : 0000:0f:00.0
    c0100000-c01fffff : 0000:0f:00.0
    c0200000-c11fffff : 0000:0f:00.0
    c1200000-c120ffff : 0000:0f:00.0
    c1210000-c130ffff : 0000:0f:00.0
  d0000000-e01fffff : PCI Bus 0000:02
    d0000000-e01fffff : PCI Bus 0000:03
      d0000000-e01fffff : PCI Bus 0000:04
        d0000000-e01fffff : PCI Bus 0000:05
          d0000000-e01fffff : PCI Bus 0000:06
            d0000000-dfffffff : 0000:06:00.0
              d0000000-dfffffff : vfio-pci   <-- Pass-through driver correctly grabbing the PCIe device!
            e0000000-e01fffff : 0000:06:00.0
              e0000000-e01fffff : vfio-pci
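You can confirm the same thing without digging through /proc/iomem (the -s address is mine, use your own):
lspci -nnk -s 06:00.0
The "Kernel driver in use:" line should report vfio-pci once the VM has claimed the GPU.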
Your post saved the day and I could happily kick this nasty BOOTFB bugger out of my GPU reserved memory.
Obviously, like any flamethrower, it is not the ideal tool for lighting a cigarette, but it'll do the job for the moment...
Last important note for AMD GPU users: /etc/modprobe.d/vfio.conf does not need to be set to the GPU IDs. Apparently, the pass-through still works even with the amdgpu driver being used by the host at boot time (hence no need to blacklist it). I have my RX 6500 XT running fine inside a Windows VM with such a configuration.
NVIDIA GPUs behave differently and usually require the proper settings to be added to /etc/modprobe.d/vfio.conf, plus an unlocked ROM file passed at VM launch, for some GPUs at least.
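For reference, a sketch of what such a vfio.conf typically contains; the IDs below are placeholders, get your real vendor:device pairs (the GPU plus its HDMI audio function) from lspci -nn:
# /etc/modprobe.d/vfio.conf -- hypothetical IDs, replace with your own
options vfio-pci ids=10de:xxxx,10de:yyyy
After editing, run update-initramfs -u and reboot for it to take effect.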