Another Post About Passing Through A 7900 XTX - PCI Errors - IO_PAGE_FAULT

dizzydre21

Apr 10, 2023

Hello,

I have been struggling for a couple of days to get a new 7900 XTX working in an EndeavourOS VM. I've read through many posts on this forum as well as on Reddit.

I also have a 4070ti Super in my system that has been working for months. I got tired of the Nvidia issues in EndeavourOS, so I added the 7900XTX for it.

Anyway, at first I was getting tons of correctable PCI errors in the Proxmox syslog and my IPMI logs. The errors only occurred when booting the VM. When EndeavourOS did boot, it would not load amdgpu at all and was only accessible via SSH. Other times it would not boot and would eventually time out with exit code 1. After adding the romfile to my VM config and rebooting, the PCI errors seem to have gone away, but now I have two other errors:

pve kernel: vfio-pci 0000:84:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x006a address=0xf7900747000 flags=0x0020]
pve kernel: vfio-pci 0000:84:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x006a address=0xf7900767000 flags=0x0020]
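To compare faults across reboots, it can help to pull the fields out of each event line. The following is a minimal Python sketch, assuming the log format shown above (it may differ between kernel versions); the regex, `Fault` and `parse_fault` are illustrative names, not part of Proxmox or the kernel.

```python
import re
from typing import NamedTuple, Optional

# Matches AMD-Vi IO_PAGE_FAULT lines in the format seen in the Proxmox syslog
# above (assumption: other kernel versions may word this slightly differently).
FAULT_RE = re.compile(
    r"vfio-pci (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


class Fault(NamedTuple):
    bdf: str      # PCI address of the passed-through device
    domain: int   # IOMMU domain ID
    address: int  # I/O virtual address that faulted
    flags: int    # raw event flags


def parse_fault(line: str) -> Optional[Fault]:
    """Return the parsed fault, or None if the line is not an IO_PAGE_FAULT event."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    return Fault(
        bdf=m["bdf"],
        domain=int(m["domain"], 16),
        address=int(m["address"], 16),
        flags=int(m["flags"], 16),
    )


if __name__ == "__main__":
    log = [
        "pve kernel: vfio-pci 0000:84:00.0: AMD-Vi: Event logged "
        "[IO_PAGE_FAULT domain=0x006a address=0xf7900747000 flags=0x0020]",
        "pve kernel: vfio-pci 0000:84:00.0: AMD-Vi: Event logged "
        "[IO_PAGE_FAULT domain=0x006a address=0xf7900767000 flags=0x0020]",
    ]
    faults = [f for f in map(parse_fault, log) if f is not None]
    for f in faults:
        # Masking off the low 12 bits gives the 4 KiB page that faulted
        print(f.bdf, hex(f.domain), hex(f.address & ~0xFFF), hex(f.flags))
```

Grouping the results by `bdf` and `domain` makes it easy to see whether new faults keep coming from the same passed-through device.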

EndeavourOS now boots and the 7900 XTX is recognized, with amdgpu being the driver used. I can remote in via Sunshine/Moonlight and it seems happy. I just want to resolve the errors and make sure I'm good to go.

I have CSM, Above 4G decoding, and Re-Size BAR Support enabled in my BIOS.

My GRUB params:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt initcall_blacklist=acpi_cpufreq_init amd_pstate.shared_mem=1 amd_pstate=active"

VM Config:
Code:
affinity: 0-11
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 12
cpu: host
efidisk0: tank_nvme:vm-104-disk-0,efitype=4m,size=1M
hostpci0: 0000:42:11.2,pcie=1
hostpci1: 0000:84:00,pcie=1,romfile=7900xtx.rom
machine: q35
memory: 16384
meta: creation-qemu=8.1.2,ctime=1707322107
name: EndeavourOS-AMD
numa: 0
ostype: l26
scsi0: tank_nvme:vm-104-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsi1: tank_nvme:vm-104-disk-2,discard=on,iothread=1,size=64G,ssd=1
scsi2: storage_nvme:vm-104-disk-0,backup=0,discard=on,iothread=1,size=512G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=2b620f67-9099-476f-b8a4-6b845bc8524d
sockets: 1
vga: none
vmgenid: 9fb59840-4171-437f-b034-d459120f4f11
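A quick way to double-check which devices a VM config passes through, and with which options, is to parse its `hostpciN` lines. Below is a minimal Python sketch, assuming the `key: value` layout shown above; `parse_hostpci` is a hypothetical helper, not a Proxmox tool. Per the Proxmox documentation, a `romfile=` name is looked up under /usr/share/kvm/, which the sketch reports as well.

```python
from typing import Dict

ROM_DIR = "/usr/share/kvm"  # directory where Proxmox expects romfile= images


def parse_hostpci(config_text: str) -> Dict[str, Dict[str, str]]:
    """Map each hostpciN key to its host device address and options."""
    devices: Dict[str, Dict[str, str]] = {}
    for raw in config_text.splitlines():
        # Split on the first colon only; PCI addresses contain colons too
        key, sep, value = raw.partition(":")
        key = key.strip()
        if not sep or not key.startswith("hostpci"):
            continue
        host, *options = value.strip().split(",")
        entry = {"host": host}
        for option in options:
            name, _, val = option.partition("=")
            entry[name] = val
        devices[key] = entry
    return devices


if __name__ == "__main__":
    config = """\
hostpci0: 0000:42:11.2,pcie=1
hostpci1: 0000:84:00,pcie=1,romfile=7900xtx.rom
vga: none
"""
    for key, entry in parse_hostpci(config).items():
        # An address without a ".N" function suffix passes all functions of the device
        all_functions = "." not in entry["host"]
        rom = entry.get("romfile")
        print(key, entry["host"],
              "all functions" if all_functions else "single function",
              f"{ROM_DIR}/{rom}" if rom else "no romfile")
```

For the config above, this shows that `hostpci1` passes every function of the 7900 XTX (GPU and its audio device) with the custom ROM.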

OS and Hardware:
Proxmox 8.1.4
Motherboard - Asrock Rack Rome8d-2t
CPU - Epyc 7443p
RAM - 256GB 3200MHz ECC
GPU1 - Asus TUF RTX-4070ti Super, PCIe7
GPU2 - Sapphire Pulse 7900 XTX, PCIe1
OS Drive - Samsung 980 Pro 500GB
VM OS Drives - ZFS Mirror 2x960GB Samsung P9A3, Oculink 1/2
VM Storage - ZFS Striped Mirror 4x1TB WD SN850, bifurcation card PCIe5
TrueNAS Drives - 4x12TB Seagate x16, SATA 4-7
PCIe NIC - 82599ES 10GbE - passed through to TrueNAS, PCIe4
 
Resizable BAR is not (yet) supported, and the current advice is to disable it.
amd_iommu=on is nonsense as it is on by default (and officially is not a valid value).
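Dropping a redundant parameter such as `amd_iommu=on` means editing the `GRUB_CMDLINE_LINUX_DEFAULT` line in /etc/default/grub and then running `update-grub`. As an illustration, here is a minimal Python sketch of just the string edit, assuming the single-line, double-quoted form shown above; `drop_kernel_param` is a hypothetical helper and does not touch any file.

```python
def drop_kernel_param(grub_line: str, name: str) -> str:
    """Return the GRUB_CMDLINE_LINUX_DEFAULT line without any `name` or `name=value` token."""
    key, sep, value = grub_line.partition("=")
    if not sep or key.strip() != "GRUB_CMDLINE_LINUX_DEFAULT":
        raise ValueError("expected a GRUB_CMDLINE_LINUX_DEFAULT=... line")
    params = value.strip().strip('"').split()
    # Compare only the part before "=", so "amd_iommu" matches "amd_iommu=on"
    # while e.g. "amd_pstate.shared_mem=1" is left alone when dropping "amd_pstate".
    kept = [p for p in params if p.split("=", 1)[0] != name]
    joined = " ".join(kept)
    return f'{key}="{joined}"'


if __name__ == "__main__":
    line = ('GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt '
            'initcall_blacklist=acpi_cpufreq_init amd_pstate.shared_mem=1 amd_pstate=active"')
    print(drop_kernel_param(line, "amd_iommu"))
```

After writing the result back and running `update-grub`, the change only takes effect after a reboot; `cat /proc/cmdline` shows the command line the running kernel actually received.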
 
Thanks for the quick response. I was aware that amd_iommu=on didn't do anything, though I wasn't when I added it. I just removed it and ran update-grub. I also disabled Re-Size BAR Support in my BIOS.

I did not get any errors on boot and haven't for several reboots. I ran a couple of games and the GPU seems to be doing its thing. I should mention that I did have a reboot just after posting that did not have any errors either, so perhaps the improvement is a coincidence.

Is there any indication of when Resizable BAR will be supported? I've had it enabled in my BIOS since I built the machine. At one point I did have some PCIe errors on the same PCIe slot the GPU is in now, but at the time it held a bifurcation card. It was a cheap card, so I replaced it with a good one and the errors went away. Do you think there could be a relation?
 
You got me to search the internet for this and I found: https://angrysysadmins.tech/index.p...eable-bar-rebar-in-your-vfio-virtual-machine/ and https://gitlab.com/qemu-project/qemu/-/issues/703 . Maybe there is more on this Proxmox forum as well?

EDIT: The script from the first link seems to work (with a 6950XT in a Linux VM).
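Before experimenting with ReBAR inside a VM, it may help to confirm how large the GPU's BARs actually are on the host. A minimal Python sketch, assuming the standard sysfs layout where /sys/bus/pci/devices/<BDF>/resource lists one `start end flags` hex triple per region and unused regions are all zeros; `bar_sizes` is an illustrative name, and 0000:84:00.0 is the 7900 XTX address from the VM config in the first post. With Resizable BAR active, the VRAM BAR is typically far larger than the classic 256 MiB window.

```python
from pathlib import Path
from typing import List, Tuple


def bar_sizes(resource_text: str) -> List[Tuple[int, int]]:
    """Return (line index, size in bytes) for each used region in a sysfs `resource` file.

    Each line holds three hex numbers: start, end and flags; unused regions are all zeros.
    """
    sizes = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end = int(fields[0], 16), int(fields[1], 16)
        if start == 0 and end == 0:
            continue  # unused region
        sizes.append((index, end - start + 1))  # end address is inclusive
    return sizes


if __name__ == "__main__":
    # 7900 XTX address from the VM config; adjust for your system
    path = Path("/sys/bus/pci/devices/0000:84:00.0/resource")
    if path.exists():
        for index, size in bar_sizes(path.read_text()):
            print(f"region {index}: {size / 2**20:.0f} MiB")
    else:
        print(f"{path} not found")
```

Running it once with ReBAR enabled and once with it disabled in the BIOS shows whether the setting actually changes the BAR the host assigns.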
I will have to read up on those.

I don't know if I missed it or it just came back up after rebooting EndeavourOS a few times, but I still have this error:

AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x006a address=0xf7900746f40 flags=0x0020]

My config is still the same as my previous post, so I don't know what's causing it.
 
Today I ran a new game in a VM and got a similar error for my passed-through 6950XT:

kernel: vfio-pci 0000:0d:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x002d address=0xf7d3a384100 flags=0x0000]

I'm going to disable Resizable BAR and Above 4G decoding again to see if it makes a difference.

EDIT: The game is The TALOS Principle II, and Resizable BAR does not seem to make a difference.
 
To be honest, I can't remember if I changed the BIOS configuration after my last post, but the fault doesn't seem to pop up anymore.

I have two GPUs in my system on an 850 W PSU that only has four PCIe power ports. I don't use the GPUs at the same time, so it hasn't been much of an issue. I originally had a splitter on one of the three 8-pin power connectors on the 7900XTX, though. My other GPU is a 4070ti Super, which is less power-hungry, so I moved the splitter to it and now have three separate power cables to the 7900XTX. I'm pretty sure that is all I changed.
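A rough budget by connector rating (75 W from the x16 slot and 150 W per 8-pin PCIe connector; actual draw depends on the card and the load) suggests why moving the splitter could matter:

$$P_{\max} = 75\,\text{W} + 3 \times 150\,\text{W} = 525\,\text{W}$$

$$P_{\text{split cable}} = 2 \times 150\,\text{W} = 300\,\text{W}$$

With the splitter on the 7900XTX, a single PSU cable could be asked to carry the load of two connectors, up to 300 W by rating; on the lower-draw 4070ti Super that shared run is under less stress, and each 7900XTX connector gets its own cable.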
 
Interesting. The new game might be more power-hungry, but I have both PCIe power cables run separately from a recently bought 850 W PSU, and the game is limited by V-sync. The other GPU only takes power from the PCIe slot (less than 75 W). Thank you for letting me know; I'll have a look at power distribution in my system.

EDIT: The game works fine in a Windows VM (same config as the Linux one), so it's probably a game or Steam Proton issue. I found some information online that suggests a regression in Mesa 23.x that might have been fixed, but my Mint VM might be a little behind. I guess this rules out a hardware issue, thankfully.
 