[SOLVED] GPU Passthrough Issues After Upgrade to 7.2

nick.kopas

After upgrading to 7.2, my GPU Passthrough was no longer functioning.

I have /etc/kernel/cmdline set up as follows:
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt video=efifb:off

Under the previous kernel (5.13.19-6-pve), I wouldn't see any output on the screen once the bootloader screen completed. After switching to 5.15.30-2-pve, I would see output up until a certain point. It's almost like it's not honoring the video=efifb:off statement from above.

Switching back to 5.13.19-6-pve resolved the issue.

Any thoughts on how I can resolve this?
 

leesteken

With 7.2 (and kernel 5.15 on 7.1), I would expect video=simplefb:off to fix this (where video=efifb:off worked in 5.13), but in my experience this does not work for every GPU, and BOOTFB does not release its memory (as can be seen in /proc/iomem).
I found that, for AMD GPUs, un-blacklisting amdgpu, not early binding to vfio_pci, and removing video=efifb:off and similar parameters works best. amdgpu just takes over from the BOOTFB and releases the GPU nicely to vfio_pci when starting the VM. (Of course, for AMD, vendor-reset and reset_method=device_specific might be required.)
I don't know if this also works for nouveau or i915.
 
nick.kopas

Grrr... that didn't work. I read through some of the other posts regarding the 5.15 kernel and it seems like I'll just stick with 5.13 for now.

Setting video=simplefb:off did prevent the startup messages from displaying... but it's still loading something into memory.

This is a snippet from /proc/iomem:
6000000000-6201ffffff : PCI Bus 0000:01
  6000000000-61ffffffff : 0000:01:00.0
    6000000000-60002fffff : BOOTFB
  6200000000-6201ffffff : 0000:01:00.0
    6200000000-6201ffffff : vfio-pci
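To check whether the boot framebuffer is still holding part of the GPU's memory, the BOOTFB entries in /proc/iomem can be filtered out. This is only a minimal sketch: the function name `bootfb_ranges` and its optional file argument are made up for illustration, and the real /proc/iomem only shows actual addresses when read as root.

```shell
# Print the /proc/iomem lines that are claimed by the boot framebuffer (BOOTFB).
# The optional first argument (an assumption, handy for testing) overrides the file.
bootfb_ranges() {
    grep 'BOOTFB' "${1:-/proc/iomem}"
}

# Example (as root): bootfb_ranges || echo "no BOOTFB reservation found"
```

grep exits non-zero when nothing matches, so the function can be used directly in a condition.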
 
nick.kopas

leesteken said:
I'm sorry it did not work for you. What brand/type of GPU are you using?
EDIT: The only workaround that I could find was forcefully-remove-bootfb, but I have not tried it (as amdgpu works for me).

I'm running an EVGA 3060TI. I was actually just looking through that same workaround on Github.

EDIT: Think I'll just keep an eye out for a newer fix as this workaround is pretty old.
 

mirthrandir

I'm experiencing the same problem after upgrading to Proxmox 7.2 with the 5.15.30-2-pve kernel. I have a TRX40 motherboard with two GPUs:
  1. AMD RX 580 for Linux
  2. NVIDIA RTX 2070 Super for Windows
With the 5.15 kernel, passthrough of the NVIDIA GPU works fine for Windows; however, Linux with the AMD GPU repeatedly displays the following error message just before the graphical display manager would normally appear:

vfio-pci 0000:4a:00.0: BAR 0: can’t reserve [mem 0xc0000000-0xcffffffff 64bit pref]

Setting video=simplefb:off hid the Proxmox and Linux boot messages, but it didn’t fix the problem.

Booting with the 5.13 kernel works fine.
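
To see which devices are hit by this error, the kernel log can be filtered for the message. A rough sketch, assuming the log lines look like the one quoted above; the function name `bar_reserve_failures` is hypothetical.

```shell
# Read kernel log text on stdin and print each PCI address for which
# vfio-pci reported "BAR <n>: can't reserve", one address per line.
bar_reserve_failures() {
    sed -n "s/.*vfio-pci \([0-9a-f:.]*\): BAR [0-9]*: can[^ ]*t reserve.*/\1/p" | sort -u
}

# Example (as root): dmesg | bar_reserve_failures
```

The pattern accepts any apostrophe style between "can" and "t", since copied log lines sometimes carry a typographic quote.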
 

leesteken

For the RX580 I expect that un-blacklisting amdgpu and not early binding to vfio_pci and removing all video=... parameters would work fine. This works for my RX570 but you also need vendor-reset and reset_method=device_specific for this to work.
 
Aug 1, 2019
21
4
8
42
Same problem here with 5.15.30 using AMD CPUs and Nvidia cards. I've noticed that the mainline kernels 5.15.33 to .36 mention a lot of IOMMU and VFIO changes/fixes, but I'm unsure if any are relevant. I'm going to test 5.15.37 on my home lab this weekend and will report my findings to the Proxmox devs if successful. Also confirming that going back to 5.13 'fixes' the issue for me.
 

CodingGuy

I had the same problem yesterday. I am also using an AMD CPU with an Nvidia GPU. I got "BAR 1: can't reserve", so mine might be different.

Code:
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt video=efifb:off video=simplefb:off video=vesafb:off"

I ended up changing back to 5.13, as mentioned by @WesC; it also worked for me.
Code:
GRUB_DEFAULT="gnulinux-advanced-d3b0c179-b5c2-4a5c-9692-7541e17557bc>gnulinux-5.13.19-6-pve-advanced-d3b0c179-b5c2-4a5c-969>
I ended up adding this option in my grub file, and now it boots just fine and my VMs can use the GPU again.

Here is the resource I used to find the correct grub setting: https://unix.stackexchange.com/questions/198003/set-default-kernel-in-grub

Thanks btw, I thought I was the only one with this issue, you guys rock!
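
For anyone looking up the value for GRUB_DEFAULT: the ids come from the generated grub.cfg, and a nested entry is written as the submenu id, a ">", and the menu entry id. The sketch below lists those ids. It assumes the usual Debian location /boot/grub/grub.cfg and the `$menuentry_id_option '...'` form that grub-mkconfig writes; the function name `grub_entry_ids` is made up.

```shell
# Print the id of every submenu/menuentry in a grub.cfg, in file order.
# GRUB_DEFAULT for a nested entry is "<submenu id>><menuentry id>".
# The optional first argument (an assumption, handy for testing) overrides the path.
grub_entry_ids() {
    grep -E '^[[:space:]]*(submenu|menuentry) ' "${1:-/boot/grub/grub.cfg}" |
        sed -n "s/.*menuentry_id_option '\([^']*\)'.*/\1/p"
}

# Example: grub_entry_ids | grep 5.13
```

After changing /etc/default/grub, update-grub has to be run for the new default to take effect.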
 

Stoiko Ivanov

Proxmox Staff Member
just as a side note - with PVE 7.2 you can also use
Code:
proxmox-boot-tool kernel pin 5.13.19-6-pve

for the same effect :)
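
A cautious way to script the pin is to print the command first and only run it on request. This is just a sketch: the wrapper name `pin_kernel` and the `APPLY` switch are invented here, and the version check only covers names shaped like the kernels mentioned in this thread.

```shell
# Pin a kernel with proxmox-boot-tool, but only print the command unless
# APPLY=1 is set (APPLY is a made-up safety switch for this sketch).
pin_kernel() {
    # Only accept names shaped like 5.13.19-6-pve.
    case $1 in
        [0-9]*.[0-9]*.[0-9]*-[0-9]*-pve) ;;
        *) echo "unexpected kernel version: $1" >&2; return 1 ;;
    esac
    if [ "${APPLY:-0}" = 1 ]; then
        proxmox-boot-tool kernel pin "$1"
    else
        echo "proxmox-boot-tool kernel pin $1"
    fi
}

# Example: pin_kernel 5.13.19-6-pve          (dry run)
#          APPLY=1 pin_kernel 5.13.19-6-pve  (really pin, as root)
```

`proxmox-boot-tool kernel list` shows the installed and pinned kernels, and `proxmox-boot-tool kernel unpin` removes the pin again once a fixed kernel is available.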
 

StephenM64

Frustratingly, I also have the "BAR 0: can’t reserve" issue and can no longer do GPU passthrough after the 7.2 update.

I tried all fixes except for pinning the kernel to the older version, which I would prefer not to do. Annoying as hell to use it as a sacrifice, but I have ordered a CPU upgrade for the system that has an integrated GPU. Meanwhile, the sketchy workaround of removing the device and doing a rescan before starting the VM does work:
Code:
echo 1 > /sys/bus/pci/devices/0000\:XX\:00.0/remove
echo 1 > /sys/bus/pci/rescan
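
The two commands can be wrapped in a small function that first checks that the device exists, so a mistyped address does not go unnoticed. A minimal sketch; the function name `pci_remove_rescan` and the optional sysfs-root argument are assumptions added for illustration and testing.

```shell
# Remove one PCI device and rescan the bus, as in the workaround above.
# $1: full PCI address, e.g. 0000:01:00.0
# $2: sysfs root; defaults to /sys (the override is an assumption for testing)
pci_remove_rescan() {
    dev="${2:-/sys}/bus/pci/devices/$1"
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $1" >&2
        return 1
    fi
    echo 1 > "$dev/remove"
    echo 1 > "${2:-/sys}/bus/pci/rescan"
}

# Example (as root, before starting the VM): pci_remove_rescan 0000:01:00.0
```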
 

leesteken

I just remembered that I also run echo 0 | tee /sys/class/vtconsole/vtcon*/bind >/dev/null before starting the VM. Maybe that helps in releasing the iomem?
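
The tee form matters here: a plain redirect such as `echo 0 > /sys/class/vtconsole/vtcon*/bind` is rejected by bash as an ambiguous redirect when the glob matches more than one file. A loop does the same job without tee. This is only a sketch; the function name `unbind_vtconsoles` and the sysfs-root argument are made up.

```shell
# Write 0 to every vtcon*/bind file, one file at a time.
# $1: sysfs root; defaults to /sys (the override is an assumption for testing)
unbind_vtconsoles() {
    for bind in "${1:-/sys}"/class/vtconsole/vtcon*/bind; do
        # Skip the unexpanded pattern when no console exists.
        [ -e "$bind" ] || continue
        echo 0 > "$bind"
    done
}

# Example (as root, before starting the VM): unbind_vtconsoles
```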
 
nick.kopas

Hmmm... when I get some time this week I'm going to try these last two suggestions in the thread. If anything works, I'm just going to build it into a hook script that runs before the VM launches. Hacky? For sure, but it would give me a chance to see if running 5.15 has any other big problems for my setup.
 

Kodey

The remove-and-rescan workaround from @StephenM64 worked for me when nothing else suggested did.
Some interesting things I tried that I thought had potential (among many other things I tried out of desperation):
  1. Explicitly setting the PCIe generation in the BIOS
  2. Setting the cmdline
    Code:
    root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt video=vesafb:off video=efifb:off video=simplefb:off nofb nomodeset kvm.ignore_msrs=1 vfio-pci.ids=1002:683f,1002:aab0,1022:1482,1022:1483 default_hugepagesz=1G hugepagesz=1G hugepages=64
  3. Blacklisting all the framebuffer modules
  4. Unbinding the consoles and framebuffers
    Code:
    echo 0 > /sys/class/vtconsole/vtcon*/bind
    echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
    echo simple-framebuffer.0 > /sys/bus/platform/drivers/simple-framebuffer/unbind

I imagine pinning the kernel might work, judging by the comments here, but I didn't try that.
Absolutely nothing I did would prevent the simplefb module from loading, not even blacklisting it. When I turned it off in the boot cmdline I no longer saw the console, but the module loaded anyway, and it can't be unloaded even after the device has been removed and rescanned and Windows has started.

I'm passing through a VisionTek Radeon 7750 as the only GPU to a Windows 11 VM.
 
nick.kopas

The only thing that worked for me was the "sketchy workaround" offered by @StephenM64.

Here's my hookscript:
Bash:
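#!/bin/bash
# Proxmox calls a hookscript with the VM ID as $1 and the phase as $2.

if [ "$2" = "pre-start" ]
then
    echo "gpu-hookscript: Resetting GPU for Virtual Machine $1"
    # Remove the GPU from the PCI bus, then rescan so it is re-added cleanly.
    echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove
    echo 1 > /sys/bus/pci/rescan
fi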

Here's how I deployed it:
Bash:
# Create the snippets folder (if it does not exist yet)
mkdir -p /var/lib/vz/snippets

# Create the script with the content above
nano /var/lib/vz/snippets/gpu-hookscript.sh

# Make it executable
chmod +x /var/lib/vz/snippets/gpu-hookscript.sh

# Apply the script to the VM (ID 100 here)
qm set 100 --hookscript local:snippets/gpu-hookscript.sh

EDIT: I am now using the method documented here. By adding initcall_blacklist=sysfb_init to the kernel parameters, simple framebuffer won't load. This negates the need for my hookscript.
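
On a system that boots via /etc/kernel/cmdline (as in the first post), the parameter has to be appended to the single line in that file. The sketch below does this idempotently. The function name `add_cmdline_param` is made up, and the file argument exists mainly so it can be tried on a copy first.

```shell
# Append a kernel parameter to a one-line cmdline file unless it is already there.
# $1: parameter, e.g. initcall_blacklist=sysfb_init
# $2: cmdline file; defaults to /etc/kernel/cmdline
add_cmdline_param() {
    file=${2:-/etc/kernel/cmdline}
    current=$(cat "$file") || return 1
    case " $current " in
        *" $1 "*) return 0 ;;   # already present, nothing to do
    esac
    printf '%s %s\n' "$current" "$1" > "$file"
}

# Example (as root): add_cmdline_param initcall_blacklist=sysfb_init
```

After editing /etc/kernel/cmdline, `proxmox-boot-tool refresh` writes the change to the boot entries. On GRUB-booted hosts the parameter goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub instead, followed by `update-grub`. After a reboot, `cat /proc/cmdline` shows whether the parameter is active.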
 

Mirmanium

The hookscript posted above by @nick.kopas was the only thing I needed to solve my GPU passthrough issue after upgrading from 7.1 to 7.2 (an Nvidia 1065 super passed through to Windows 11).
It was NOT necessary to add any extra parameters to GRUB, such as video=simplefb:off.
Thank you so much @StephenM64 and @nick.kopas
 

netnem

Code:
echo 1 > /sys/bus/pci/devices/0000\:XX\:00.0/remove
echo 1 > /sys/bus/pci/rescan

This is also the only thing that worked for me after upgrading to 5.15 (PVE 7.2). I didn't downgrade the kernel to 5.13, but I suspect that would have worked.

Relevant file configs:

Code:
cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on video=efifb:off video=vesafb:off video=simplefb:off video=astdrmfb"

cat /etc/modprobe.d/blacklist.conf
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel

cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:xxxx,10de:xxx
options vfio-pci ids=10de:xxxx,10de:xxxx
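
With a configuration like this, it can help to confirm that vfio-pci really is the driver bound to the GPU before starting the VM. A small sketch that reads the driver link in sysfs; the function name `pci_driver` and the sysfs-root argument are assumptions for illustration.

```shell
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:01:00.0
# $2: sysfs root; defaults to /sys (the override is an assumption for testing)
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Example: pci_driver 0000:01:00.0
```

`lspci -nnk -s 01:00.0` reports the same information under "Kernel driver in use".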
 

celemine1gig

Also would like to thank StephenM64 for his suggested workaround.
Although it seems kind of hacky, it does indeed work.

Confirmed here with an Nvidia GTX GPU.
With kernel 5.15, pass-through did not work by default due to the "BAR" issue mentioned above.

After applying the workaround, it is working like a charm, passed-through to Windows.
 

mirthrandir

@StephenM64 & @nick.kopas, thank you very much for the workaround and the detailed usage instructions. Your workaround enabled GPU passthrough to work on my AMD RX 580 using Proxmox 7.2 with the 5.15.35-1-pve kernel.

Your workaround also works better than the gnif vendor-reset I was previously using as described on Nick Sherlock's blog. The gnif vendor-reset caused the fans on my AMD RX 580 to run loudly after shutting down a VM that used the AMD GPU. Your workaround doesn't have that issue. :)

Thanks again for your help!
 

Astraea

I have tried the commands/script by @nick.kopas, but I still can't get the GPU to work. It is now detected in the VM again but shows error 43. I also tried the script with the older 5.13 kernel and with the latest kernel from the test repo. I will try a fresh VM setup to see if that fixes my issues and, if it does not, a fresh install of both 7.2 and 7.1, and see where I get to.
 
