MSIX PBA outside of specified BAR | Adding GPU PCI device (no vGPU) to a VM

robseb

New Member
Apr 17, 2025
Hi everybody,

we have a Dell PowerEdge R7525 with 2x AMD EPYC 74F3 24-core processors.
Inside this machine are two Nvidia A16 graphics cards.
On this PVE host we want to run 8 virtual machines; each VM should have one GPU PCI device assigned (this is the configuration we had with VMware).

Every GPU has 4 PCI addresses:
0000:29:00.0, 0000:2a:00.0, 0000:2b:00.0, 0000:2c:00.0
0000:85:00.0, 0000:86:00.0, 0000:87:00.0, 0000:88:00.0

What did we check?

- IOMMU is activated (see the verification snippet after this list)

- In /etc/modules we added the following lines:
- vfio
- vfio_iommu_type1
- vfio_virqfd
- vfio_pci

- In /etc/modprobe.d/pve-blacklist.conf we added the following lines:
- blacklist nouveau
- blacklist nvidia
- blacklist nvidiafb
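To double-check the IOMMU part, the kernel log should contain AMD-Vi/IOMMU messages and the GPU functions should each show up in an IOMMU group, e.g.:
Code:
# confirm AMD-Vi / IOMMU is enabled
dmesg | grep -i -e 'amd-vi' -e 'iommu'

# list all IOMMU groups and the devices they contain
for g in /sys/kernel/iommu_groups/*; do
    echo "Group ${g##*/}:"
    ls "$g/devices"
done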

The VM is configured as shown in the attached screenshot.


Now let's get to the problem:

Whenever we add one of the graphics card's PCI devices to a VM, the following error occurs:

Task viewer: VM 133 - Start

Code:
kvm: -device vfio-pci,host=0000:29:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0,rombar=0: vfio 0000:29:00.0: hardware reports invalid configuration, MSIX PBA outside of specified BAR
TASK ERROR: start failed: QEMU exited with code 1
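
For diagnosis, the BAR sizes and the MSI-X table/PBA location that QEMU complains about can be inspected on the host with lspci, for example for the first GPU function:
Code:
# BARs of the function ("Region" lines, including their sizes)
lspci -vvv -s 0000:29:00.0 | grep -i 'Region'
# MSI-X capability: vector table and PBA location (BAR number and offset)
lspci -vvv -s 0000:29:00.0 | grep -i -A2 'MSI-X'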


Does anyone have any ideas or suggestions as to what could be causing this?
Many thanks in advance!
 
The GRUB configuration now looks like this:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"
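
For the new command line to take effect, the GRUB config has to be regenerated and the host rebooted (assuming the host boots via GRUB):
Code:
update-grub
reboot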

After this, we can start the VM with the PCI (GPU) device:
Code:
echo 1 > /sys/bus/pci/devices/0000:29:00.0/remove
echo 1 > /sys/bus/pci/rescan
 
Hi,
maybe there is also a BIOS setting for the BAR size? E.g. somebody here solved it with such a setting:
 
Thank you.
Unfortunately, we cannot change the BAR size in the BIOS.
The core problem is that the Nvidia GPU is in use by the host system.
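
Which driver currently claims a GPU function can be checked with lspci, e.g. for the first one:
Code:
# "Kernel driver in use" should show vfio-pci (or nothing), not nouveau/nvidia
lspci -nnk -s 0000:29:00.0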

So, the solution or workaround (thanks Michi):
Add the following settings:

/etc/modules
Code:
vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci

/etc/modprobe.d/pve-blacklist.conf
Code:
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
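
So that the blacklist also applies in the initramfs, it is usually necessary to rebuild it afterwards (standard Debian/Proxmox step) and reboot:
Code:
update-initramfs -u -k all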

/etc/default/grub
Code:
#for AMD
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pci=realloc=off"

#for Intel
#GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pci=realloc=off"

Then regenerate the GRUB config and reboot:
Code:
update-grub
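
After the reboot, the active kernel parameters can be verified with:
Code:
cat /proc/cmdline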


Create a systemd service:
/etc/systemd/system/nvidia-rescan.service (this fires BEFORE Proxmox starts any VMs)


Code:
[Unit]
Description=Remove Nvidia GPU from Devices and Rescan
After=multi-user.target
Before=pve-guests.service

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'echo 1 > /sys/bus/pci/devices/0000:29:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:2a:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:2b:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:2c:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:85:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:86:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:87:00.0/remove && \
                        echo 1 > /sys/bus/pci/devices/0000:88:00.0/remove && \
                        echo 1 > /sys/bus/pci/rescan'
RemainAfterExit=no

[Install]
WantedBy=multi-user.target
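
As a variation, the eight addresses don't have to be hard-coded; the same remove/rescan could be done with a small loop over all Nvidia functions (untested sketch; 10de is the Nvidia PCI vendor ID):
Code:
#!/bin/bash
# remove every Nvidia PCI function, then rescan the bus so the BARs get reassigned
for dev in $(lspci -D -d 10de: | awk '{print $1}'); do
    echo 1 > "/sys/bus/pci/devices/$dev/remove"
done
echo 1 > /sys/bus/pci/rescan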

Enable the service:
Code:
systemctl enable nvidia-rescan.service

#systemctl start nvidia-rescan.service
#systemctl status nvidia-rescan.service
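
To confirm that the service really ran before the guests were started, check the journal of the current boot:
Code:
journalctl -b -u nvidia-rescan.service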
 