[SOLVED] Problem with GPU Passthrough

rumble06

New Member
Jul 12, 2019
I successfully installed Proxmox and did a GPU Passthrough on a test HDD (80GB) without any trouble following this guide: https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/

However, once I moved to the main storage (Proxmox installed on a 750GB HDD, with an LVM-Thin storage on a 512GB NVMe SSD where I chose to install the VM), I started having trouble. Whenever I try to install my GPU driver I get disconnected from RDP, and instead of the running icon the VM shows an exclamation icon with "Status: internal-error". My syslog is spammed with this message: "Jul 14 02:00:01 proxmox kernel: [ 6044.433981] vfio-pci 0000:07:00.0: BAR 0: can't reserve [mem 0xe0000000-0xefffffff 64bit pref]". I even tried the vBIOS stuff. The wiki lists a similar error with a fix, although for BAR 3 instead of BAR 0, but that didn't work either. Here is that line from my GRUB config:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction video=vesafb:off,efifb:off"
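
A "can't reserve" message usually means something else on the host already holds that memory range. A rough sketch for checking whether a boot framebuffer is the holder, by scanning /proc/iomem. The function name framebuffer_claims is made up for illustration, and the list of labels (efifb, vesafb, simplefb, BOOTFB) is an assumption about what your kernel prints, not an exhaustive set:

```shell
# Print the /proc/iomem lines that mention a boot framebuffer.
# $1: file to scan (defaults to /proc/iomem; read it as root to see real addresses)
# The label list below is an assumption and may need extending for your kernel.
framebuffer_claims() {
    local iomem="${1:-/proc/iomem}"
    grep -E 'efifb|vesafb|simplefb|BOOTFB' "$iomem"
}
```

If a line printed by framebuffer_claims falls inside the range from the syslog message (0xe0000000-0xefffffff here), the host framebuffer is probably what blocks vfio-pci.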
 
Why do you think it's the HDD?
I don't. However, I reproduced the same steps and couldn't get it to work. I mentioned it because it may help someone figure out what kind of problem I have, knowing that the hardware supports passthrough and that it works great with another Proxmox installation.
 
I also had a problem with "BAR 0: can't reserve" when creating my Gamer-VM. The problem was resolved when I added this line to /etc/default/grub:
Code:
GRUB_CMDLINE_LINUX="textonly video=astdrmfb video=efifb:off"

This basically tells your kernel to run in text mode (which is what I want for the console anyway :D) and to use astdrmfb, the built-in graphics adapter of my mainboard, as the video device. With efifb:off you tell your system not to use the EFI framebuffer, which would otherwise access your graphics card before the kernel parameters are applied.

Combining multiple video parameters, as in video=vesafb:off,efifb:off, did not work for me.

Also double-check that the modules for your graphics card are blacklisted!
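
One way to check the blacklisting is to see which kernel driver is actually bound to the card. A minimal sketch, assuming the GPU sits at 0000:07:00.0 as in the first post (yours will differ). The function name driver_in_use and its optional second argument are made up for illustration:

```shell
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: PCI address such as 0000:07:00.0
# $2: sysfs root (defaults to /sys; hypothetical parameter so it can be tried on a copy)
driver_in_use() {
    local addr="$1" sysfs="${2:-/sys}"
    local link="$sysfs/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Running driver_in_use 0000:07:00.0 on the host should print vfio-pci for a card set up for passthrough. If it prints a regular graphics driver instead, the blacklist is probably not taking effect.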
 
You are heaven on earth. I kept my GRUB command line, but instead of 'video=vesafb:off,efifb:off' I put 'video=vesafb:off video=efifb:off' and it works. I was able to install the GPU driver, the monitor is on, and there is no more error log spam.
 
This also seems to have worked for GPU passthrough (of an RTX 3090) on an Asus KRPA-U16 (AMD EPYC 7002) motherboard with an AMD EPYC 7302P CPU. I could find no way in the BIOS to select or force the built-in Aspeed AST2500 graphics adapter as the default boot graphics adapter: the BIOS insists on initializing the AST2500 first and then switching to PCIe (GeForce RTX 3090).

Adding the kernel command line arguments textonly video=astdrmfb seems to have done the trick - the complete command line now being:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on textonly video=astdrmfb video=efifb:off"

The GeForce is, however, still being set as the boot VGA device (c1:00 is the GeForce GPU, c3:00 is the Aspeed AST2500):
Code:
Jan  5 18:20:27 aether kernel: [    2.226513] pci 0000:c1:00.0: vgaarb: setting as boot VGA device
Jan  5 18:20:27 aether kernel: [    2.226513] pci 0000:c1:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
Jan  5 18:20:27 aether kernel: [    2.226513] pci 0000:c3:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
Jan  5 18:20:27 aether kernel: [    2.226513] pci 0000:c1:00.0: vgaarb: bridge control possible
Jan  5 18:20:27 aether kernel: [    2.226513] pci 0000:c3:00.0: vgaarb: bridge control possible
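
The same information can be read from sysfs without digging through the boot log, since VGA-class devices expose a boot_vga attribute. A small sketch, where the function name is_boot_vga and the optional sysfs root argument are made up for illustration:

```shell
# Succeed (exit status 0) if the PCI device was chosen as the boot VGA device.
# $1: PCI address, e.g. 0000:c1:00.0
# $2: sysfs root (defaults to /sys; hypothetical parameter for trying it on a copy)
is_boot_vga() {
    local addr="$1" sysfs="${2:-/sys}"
    local attr="$sysfs/bus/pci/devices/$addr/boot_vga"
    # The attribute holds "1" for the boot VGA device and "0" otherwise.
    [ -r "$attr" ] && [ "$(cat "$attr")" = "1" ]
}
```

For example, is_boot_vga 0000:c1:00.0 && echo "GeForce is the boot VGA device".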

And I am getting some quite troubling messages:
Code:
Jan  5 18:20:27 aether kernel: [    2.836270] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
Jan  5 18:20:27 aether kernel: [    2.836271] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
Jan  5 18:20:27 aether kernel: [    2.836272] {1}[Hardware Error]: event severity: corrected
Jan  5 18:20:27 aether kernel: [    2.836273] {1}[Hardware Error]:  Error 0, type: corrected
Jan  5 18:20:27 aether kernel: [    2.836273] {1}[Hardware Error]:  fru_text: PcieError
Jan  5 18:20:27 aether kernel: [    2.836274] {1}[Hardware Error]:   section_type: PCIe error
Jan  5 18:20:27 aether kernel: [    2.836274] {1}[Hardware Error]:   port_type: 1, legacy PCI end point
Jan  5 18:20:27 aether kernel: [    2.836275] {1}[Hardware Error]:   version: 0.2
Jan  5 18:20:27 aether kernel: [    2.836275] {1}[Hardware Error]:   command: 0x0007, status: 0x0010
Jan  5 18:20:27 aether kernel: [    2.836276] {1}[Hardware Error]:   device_id: 0000:c1:00.0
Jan  5 18:20:27 aether kernel: [    2.836276] {1}[Hardware Error]:   slot: 0
Jan  5 18:20:27 aether kernel: [    2.836276] {1}[Hardware Error]:   secondary_bus: 0x00
Jan  5 18:20:27 aether kernel: [    2.836277] {1}[Hardware Error]:   vendor_id: 0x10de, device_id: 0x2204
Jan  5 18:20:27 aether kernel: [    2.836277] {1}[Hardware Error]:   class_code: 030000
Jan  5 18:20:27 aether kernel: [    2.836278] {1}[Hardware Error]:   bridge: secondary_status: 0xc000, control: 0x0000
Jan  5 18:20:27 aether kernel: [    2.836278] {1}[Hardware Error]:  Error 1, type: corrected
Jan  5 18:20:27 aether kernel: [    2.836279] {1}[Hardware Error]:   section_type: PCIe error
Jan  5 18:20:27 aether kernel: [    2.836279] {1}[Hardware Error]:   port_type: 0, PCIe end point
Jan  5 18:20:27 aether kernel: [    2.836279] {1}[Hardware Error]:   version: 0.2
Jan  5 18:20:27 aether kernel: [    2.836280] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
Jan  5 18:20:27 aether kernel: [    2.836280] {1}[Hardware Error]:   device_id: 0000:c1:00.1
Jan  5 18:20:27 aether kernel: [    2.836280] {1}[Hardware Error]:   slot: 0
Jan  5 18:20:27 aether kernel: [    2.836281] {1}[Hardware Error]:   secondary_bus: 0x00
Jan  5 18:20:27 aether kernel: [    2.836281] {1}[Hardware Error]:   vendor_id: 0x10de, device_id: 0x1aef
Jan  5 18:20:27 aether kernel: [    2.836281] {1}[Hardware Error]:   class_code: 040300
Jan  5 18:20:27 aether kernel: [    2.836282] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
Jan  5 18:20:27 aether kernel: [    2.836282] {1}[Hardware Error]:  Error 2, type: corrected
Jan  5 18:20:27 aether kernel: [    2.836283] {1}[Hardware Error]:   section_type: PCIe error
Jan  5 18:20:27 aether kernel: [    2.836283] {1}[Hardware Error]:   port_type: 1, legacy PCI end point
Jan  5 18:20:27 aether kernel: [    2.836283] {1}[Hardware Error]:   version: 0.2
Jan  5 18:20:27 aether kernel: [    2.836284] {1}[Hardware Error]:   command: 0x0007, status: 0x0010
Jan  5 18:20:27 aether kernel: [    2.836284] {1}[Hardware Error]:   device_id: 0000:c1:00.0
Jan  5 18:20:27 aether kernel: [    2.836284] {1}[Hardware Error]:   slot: 0
Jan  5 18:20:27 aether kernel: [    2.836285] {1}[Hardware Error]:   secondary_bus: 0x00
Jan  5 18:20:27 aether kernel: [    2.836285] {1}[Hardware Error]:   vendor_id: 0x10de, device_id: 0x2204
Jan  5 18:20:27 aether kernel: [    2.836285] {1}[Hardware Error]:   class_code: 030000
Jan  5 18:20:27 aether kernel: [    2.836286] {1}[Hardware Error]:   bridge: secondary_status: 0xc000, control: 0x0000
...
Jan  5 18:20:27 aether kernel: [    2.836324] {1}[Hardware Error]:  Error 13, type: corrected
Jan  5 18:20:27 aether kernel: [    2.836324] {1}[Hardware Error]:   section_type: PCIe error
Jan  5 18:20:27 aether kernel: [    2.836324] {1}[Hardware Error]:   port_type: 0, PCIe end point
Jan  5 18:20:27 aether kernel: [    2.836324] {1}[Hardware Error]:   version: 0.2
Jan  5 18:20:27 aether kernel: [    2.836325] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
Jan  5 18:20:27 aether kernel: [    2.836325] {1}[Hardware Error]:   device_id: 0000:c1:00.1
Jan  5 18:20:27 aether kernel: [    2.836326] {1}[Hardware Error]:   slot: 0
Jan  5 18:20:27 aether kernel: [    2.836326] {1}[Hardware Error]:   secondary_bus: 0x00
Jan  5 18:20:27 aether kernel: [    2.836326] {1}[Hardware Error]:   vendor_id: 0x10de, device_id: 0x1aef
Jan  5 18:20:27 aether kernel: [    2.836327] {1}[Hardware Error]:   class_code: 040300
Jan  5 18:20:27 aether kernel: [    2.836327] {1}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
Jan  5 18:20:27 aether kernel: [    2.836336] pci 0000:c1:00.0: AER: aer_status: 0x00008000, aer_mask: 0x00000000
Jan  5 18:20:27 aether kernel: [    2.836343] fbcon: Taking over console
Jan  5 18:20:27 aether kernel: [    2.836345] pci 0000:c1:00.0: AER:    [15] HeaderOF              
Jan  5 18:20:27 aether kernel: [    2.836347] pci 0000:c1:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
Jan  5 18:20:27 aether kernel: [    2.836350] pci 0000:c1:00.1: AER: aer_status: 0x00008000, aer_mask: 0x00000000
...

But vfio seems to load and run without fuss:
Code:
root@aether:~# grep vfio /var/log/syslog
Jan  5 18:20:27 aether systemd-modules-load[591]: Inserted module 'vfio'
Jan  5 18:20:27 aether systemd-modules-load[591]: Inserted module 'vfio_pci'
Jan  5 18:20:28 aether kernel: [    8.831383] vfio-pci 0000:c1:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Jan  5 18:20:28 aether kernel: [    8.850787] vfio_pci: add [10de:2204[ffffffff:ffffffff]] class 0x000000/00000000
Jan  5 18:20:28 aether kernel: [    8.874835] vfio_pci: add [10de:1aef[ffffffff:ffffffff]] class 0x000000/00000000
Jan  5 18:26:26 aether kernel: [  370.819453] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Jan  5 18:26:26 aether kernel: [  370.819474] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Jan  5 18:26:26 aether kernel: [  370.819482] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Jan  5 18:26:26 aether kernel: [  370.819483] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Jan  5 18:26:26 aether kernel: [  370.819484] vfio-pci 0000:c1:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Jan  5 18:26:26 aether kernel: [  370.855332] vfio-pci 0000:c1:00.0: No more image in the PCI ROM
Jan  5 18:26:26 aether kernel: [  370.875424] vfio-pci 0000:c1:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
Jan  5 18:26:27 aether kernel: [  372.448090] vfio-pci 0000:c1:00.0: No more image in the PCI ROM
Jan  5 18:26:27 aether kernel: [  372.448115] vfio-pci 0000:c1:00.0: No more image in the PCI ROM
 
The combination video=vesafb:off,efifb:off didn't work for me either. Just using video=efifb:off did work.
 
For me it only worked the following way:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on nomodeset video=vesafb:off video=efifb:off"

The expression video=vesafb:off,efifb:off didn't work.

Also, I had a problem connecting to the graphics card when nomodeset was missing.

Additionally, I set iommu=pt after amd_iommu=on because the Proxmox PCI passthrough howto recommends it, and there is no harm in doing so, at least in my case.
 
Please allow me to explain some of these parameters, and don't take it as criticism of your (working) setup.
Indeed, video=vesafb:off,efifb:off is no longer valid and you need to split it into two parameters, but many guides have not been updated.
amd_iommu=on is not necessary (because the AMD IOMMU is on by default) and is actually invalid. But it does no harm, as you said, because invalid parameters are ignored.
iommu=pt sets the IOMMU mapping for non-passthrough devices to the identity mapping and therefore has nothing to do with passthrough. I don't know why people keep adding it when they enable passthrough on AMD, as the IOMMU is already on regardless. Maybe it does something useful on Intel systems, like disabling DMAR?
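
For anyone with an old guide's line still in their config, the split can be done mechanically. A minimal sketch that only rewrites the exact combined string discussed in this thread; the function name split_video_off is made up, and it deliberately leaves every other video= option alone because those can legitimately contain commas:

```shell
# Rewrite "video=vesafb:off,efifb:off" into two separate video= parameters.
# Reads text on stdin and writes the result to stdout; nothing is modified in place.
split_video_off() {
    sed 's/video=vesafb:off,efifb:off/video=vesafb:off video=efifb:off/g'
}
```

Something like split_video_off < /etc/default/grub prints the corrected file so you can compare it before editing. After changing /etc/default/grub you still need to run update-grub and reboot.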
 
In the end it also worked perfectly with:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset video=vesafb:off video=efifb:off"
 
Thanks mate. Here is the updated GRUB line, as I updated to kernel 5.15:

Code:
video=vesafb:off video=efifb:off video=simplefb:off
[Attached image: 1651444045028.png]

I updated and ran these:
[Attached image: 1651444153113.png]

How do I test it? Is running the command below and seeing output enough?
Code:
dmesg | grep -e dmar -e iommu
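
Output from that command shows the IOMMU was detected. A common follow-up check is to list the IOMMU groups, since an active IOMMU populates /sys/kernel/iommu_groups. A minimal sketch; the function name list_iommu_groups and its optional argument are made up for illustration:

```shell
# Print every PCI device together with its IOMMU group number.
# $1: directory holding the groups (defaults to /sys/kernel/iommu_groups;
#     the parameter only exists so the function can be tried on a copy)
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue      # nothing matched: IOMMU is likely not active
        group="${dev%/devices/*}"      # strip "/devices/<address>"
        group="${group##*/}"           # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}
```

If list_iommu_groups prints nothing, the IOMMU is probably not enabled. If it prints your GPU (and its audio function) in a group of their own, that part of the setup looks fine.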
 
Hello all,

Since kernel 5.15, booting the Windows VM only works if the host was booted with HDMI disconnected. Disconnecting the monitor's power supply is not enough; the HDMI cable has to be unplugged.

Error messages in dmesg (lots of them!) when trying to boot the VM after HDMI was connected during host boot:
Code:
BAR 1: can't reserve [mem 0x7fe0000000-0x7fefffffff 64bit pref]

ASRock Rack X570 with IPMI
GTX 960 VGA passthrough to a Windows Server 2022 VM

Output of cat /proc/cmdline:
Code:
initrd=\EFI\proxmox\5.15.35-1-pve\initrd.img-5.15.35-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs video=vesafb:off video=efifb:off video=simplefb:off vfio-pci.ids=10ec:8161,10de:1401,10de:0fba iommu=pt
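
Since the parameters in this thread keep changing between kernel versions, it can help to confirm that a given one really reached the running kernel. A small sketch; the function name has_cmdline_param and its optional file argument are made up for illustration:

```shell
# Succeed (exit status 0) if the exact parameter is on the kernel command line.
# $1: parameter to look for, e.g. video=simplefb:off
# $2: file to read (defaults to /proc/cmdline; hypothetical parameter for testing)
has_cmdline_param() {
    local param="$1" file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then match whole lines literally.
    tr -s ' \t' '\n\n' < "$file" | grep -Fxq -- "$param"
}
```

For example, has_cmdline_param video=simplefb:off || echo "not applied". On a host booted like the one above (ZFS root, initrd under \EFI\proxmox) the command line is, as far as I know, taken from /etc/kernel/cmdline and applied with proxmox-boot-tool refresh rather than from /etc/default/grub, so edits to the GRUB file alone may never show up here.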
 
I get the same errors.
When I try to start my Win11 VM, the log is flooded with:
[Attached image: 2022-05-07_13-44-28.jpg]

My GRUB config:
[Attached image: 2022-05-07_13-46-41.jpg]

With the old PVE version and kernel it worked well.

I would be happy if someone could address this better than I can ;)
 
Same error here, and I am using the same GRUB_CMDLINE_LINUX_DEFAULT. Check whether your /var/log files have grown very large, because mine are eating up my remaining disk space.
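
A quick way to see which log files are growing. This is only a sketch; the function name largest_logs and its arguments are made up for illustration:

```shell
# Print the largest files under a directory, biggest first, sizes in KiB.
# $1: directory to scan (defaults to /var/log)
# $2: how many files to show (defaults to 5)
largest_logs() {
    local dir="${1:-/var/log}" count="${2:-5}"
    find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}
```

With the "can't reserve" spam I would expect syslog and kern.log to be at the top of the output of largest_logs.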
 
Solved!

Using:
Code:
echo 1 > /sys/bus/pci/devices/0000\:09\:00.0/remove
echo 1 > /sys/bus/pci/rescan

You can put these commands in a .sh script, make it executable with chmod +x, and add it to cron.

File: /root/fix_gpu_pass.sh
Note: change "0000\:0X\:00.0" to the PCI address of your GPU.
Code:
#!/bin/bash
# Remove the GPU from the PCI bus, then rescan so it is detected again.
echo 1 > /sys/bus/pci/devices/0000\:0X\:00.0/remove
echo 1 > /sys/bus/pci/rescan

Add it to cron by running crontab -e and adding this line:
Code:
@reboot /root/fix_gpu_pass.sh

Also published in another post.
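
A slightly more defensive variant of the same idea, as a sketch: the address is passed as an argument and checked before anything is written. The function name reset_pci_device and the optional sysfs root argument are made up for illustration; on a real host this must run as root.

```shell
# Remove a PCI device and trigger a bus rescan, checking the address first.
# $1: PCI address, e.g. 0000:09:00.0
# $2: sysfs root (defaults to /sys; hypothetical parameter so the logic can be tried safely)
reset_pci_device() {
    local addr="$1" sysfs="${2:-/sys}"
    local dev="$sysfs/bus/pci/devices/$addr"
    if [ ! -e "$dev/remove" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi
    echo 1 > "$dev/remove"                 # detach the device from the bus
    echo 1 > "$sysfs/bus/pci/rescan"       # let the kernel find it again
}
```

In the cron script you would then call reset_pci_device 0000:09:00.0 with your own address, and a typo in the address gives an error message instead of a shell redirection failure.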
 