AMD NAVI GPU Passthrough (5700 XT)

drEd

Jan 9, 2023
Hi All,

After reinstalling the OS I am having issues with GPU passthrough. The error reported is: stopped: unable to read tail (got 0 bytes)

The GRUB configuration is: GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt initcall_blacklist=sysfb_init"

The /etc/modules file has the following entries:
vendor-reset
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
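
Listing a module in /etc/modules does not prove it actually loaded. A minimal sketch for checking this follows; the function name missing_modules is hypothetical, and both file arguments exist only so the check can be pointed at test files.

```shell
#!/bin/sh
# Print every module named in a modules list that is not currently loaded.
# $1: modules list (default /etc/modules)
# $2: loaded-modules table (default /proc/modules)
missing_modules() {
    list=${1:-/etc/modules}
    loaded=${2:-/proc/modules}
    # Drop comments and blank lines, keep the first word of each entry.
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$list" | awk '{print $1}' |
    while read -r name; do
        # /proc/modules uses underscores, e.g. vendor-reset -> vendor_reset.
        norm=$(printf '%s' "$name" | tr '-' '_')
        grep -q "^$norm " "$loaded" || printf '%s\n' "$name"
    done
}
```

Running missing_modules with no arguments prints nothing when everything is loaded. A module that is built into the kernel would also be printed, since it never appears in /proc/modules.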


Before starting the VM, I go to the shell and run: echo 'device_specific' > /sys/bus/pci/devices/0000:03:00.0/reset_method
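
A plain echo gives no feedback about whether the setting took effect. The sketch below wraps the same write in a check and reads the value back; the function name set_device_specific_reset and the optional sysfs-root argument are assumptions added for illustration and testing.

```shell
#!/bin/sh
# Select the device_specific reset method for one PCI device and verify it.
# $1: PCI address, e.g. 0000:03:00.0
# $2: sysfs root (default /sys, overridable for testing)
set_device_specific_reset() {
    dev=$1
    file="${2:-/sys}/bus/pci/devices/$dev/reset_method"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (wrong address, or not root?)" >&2
        return 1
    fi
    echo device_specific > "$file" || return 1
    # Read the value back to confirm the kernel accepted it.
    [ "$(cat "$file")" = "device_specific" ]
}
```

For the card in this thread the call would be set_device_specific_reset 0000:03:00.0, run as root before starting the VM. A non-zero return suggests vendor-reset is not loaded or does not handle the device.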

kernel version: Linux 5.15.83-1-pve #1 SMP PVE 5.15.83-1

IOMMU is showing as enabled in the shell.
AMD-Vi is showing as enabled in the shell.
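
One way to double-check this beyond the boot messages is to count the IOMMU groups the kernel created. This is only a sketch: the function name iommu_group_count is hypothetical, and the sysfs-root argument is there purely for testing.

```shell
#!/bin/sh
# Count IOMMU groups; zero means the kernel did not set up the IOMMU.
# $1: sysfs root (default /sys, overridable for testing)
iommu_group_count() {
    dir="${1:-/sys}/kernel/iommu_groups"
    [ -d "$dir" ] || { echo 0; return; }
    # Each group is a numbered directory entry.
    ls "$dir" | wc -l | tr -d ' '
}
```

A result of 0 from iommu_group_count would mean the IOMMU is not actually active, whatever the kernel parameters say.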
 
Just a quick update: I was able to get the AMD GPU to dismount automatically by adding the following parameters to the GRUB command line:
Code:
 video=simplefb:off video=efifb:off video=vesafb:off

I am still getting the tail error. The VM is using SeaBIOS on a Q35 machine.
I will try creating another VM with a UEFI configuration on Q35 to see whether the result is the same. I also found a good command, posted by a user named "leesteken", that unloads the amdgpu driver from the shell: echo 0 | tee /sys/class/vtconsole/vtcon*/bind; sleep 3; rmmod amdgpu
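
The one-liner carries on even if a step fails. A slightly more defensive sketch of the same sequence is below; the function names unbind_vtconsoles and release_amdgpu are hypothetical, and the sysfs-root argument exists only so the first step can be tested without touching real consoles.

```shell
#!/bin/sh
# Unbind every virtual console so the framebuffer console lets go of the GPU.
# $1: sysfs root (default /sys, overridable for testing)
unbind_vtconsoles() {
    for bind in "${1:-/sys}"/class/vtconsole/vtcon*/bind; do
        [ -e "$bind" ] || continue   # glob matched nothing
        echo 0 > "$bind" || return 1
    done
}

# Unbind the consoles, wait briefly, then unload amdgpu.
release_amdgpu() {
    unbind_vtconsoles "$1" || return 1
    sleep 3
    # rmmod fails if something still holds the module; report it.
    rmmod amdgpu || { echo "amdgpu is still in use" >&2; return 1; }
}
```

Calling release_amdgpu as root does the same work as the original command, but stops with an error message if the driver cannot be unloaded.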
 
If your GPU is used during boot, then you need this work-around since kernel version 5.15 instead of the old ones. Check with cat /proc/cmdline if you are editing the right boot loader.
Very interesting. I am still experimenting and will read through these in detail; thank you for the links. I ended up removing vendor-reset, as it appeared to do nothing. On my original build I was able to pass through my card even though it is reportedly not supported, since it is an XT and suffers from a reset issue. It worked fine previously, which is strange and I cannot explain why; I probably had a good kernel version for it and got lucky.

My output is showing:
Code:
BOOT_IMAGE=/boot/vmlinuz-5.15.83-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt video=simplefb:off video=efifb:off video=vesafb:off
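
To compare the running command line against what was put in the boot loader configuration, a small check like the following could help. It is a sketch; the function name has_kernel_param is hypothetical, and the file argument is only there for testing.

```shell
#!/bin/sh
# Succeed if the exact parameter $1 is present in the kernel command line.
# $2: command-line file (default /proc/cmdline, overridable for testing)
has_kernel_param() {
    # One parameter per line, then match the whole line literally.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

Against the output shown above, has_kernel_param video=efifb:off would succeed, while has_kernel_param initcall_blacklist=sysfb_init would fail, because that parameter is not in the running command line.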

The very strange thing is that as soon as I try to start a VM with the PCI card attached and get the stopped: unable to read tail (got 0 bytes) message, I am unable to stop the VM from trying to start, and the host locks up when I try to reboot or shut it down. This makes troubleshooting very annoying, as I constantly have to hard reset the host, even after unloading the amdgpu kernel driver before starting the VM.

I can still start other VMs just fine while the problem VM is constantly trying to start, but the machine locks up when I try to reboot or shut down. I have a feeling the kernel is ignoring my parameters and trying to reserve memory on the card. I will keep playing with GRUB flags to try to keep amdgpu from being loaded at boot.

One thing I keep thinking about is the kernel config flags that vendor-reset lists as prerequisites. I could not find where these need to go in Proxmox, so I ended up not adding them, which probably explains why vendor-reset did not do anything:

CONFIG_FTRACE=y
CONFIG_KPROBES=y
CONFIG_PCI_QUIRKS=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_FUNCTION_TRACER=y
 
Did you update to a 5.15 kernel after it was working fine with version 5.13? IIRC, that is when you started needing to activate vendor-reset after each host reboot and when the video=efifb:off video=vesafb:off work-around stopped working. That's where the new initcall_blacklist=sysfb_init work-around comes in.
And for a 5700 XT you definitely need vendor-reset loaded and activated.
I don't know this specific error, sorry. But I do have an AMD GPU that requires vendor-reset, and I have had better luck with the kernel version 5.19 preview: there I do nothing except enable device_specific for vendor-reset, and the GPU passes from the amdgpu driver to the vfio-pci driver without any problems. This still works with the latest kernel version 6.1 preview, but you do get ugly stack traces in the system logs (which are probably harmless).
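
To see which driver currently holds the card, before and after starting the VM, the driver link in sysfs can be read. A minimal sketch, with the hypothetical function name bound_driver and a sysfs-root argument that is only there for testing:

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: PCI address, e.g. 0000:03:00.0
# $2: sysfs root (default /sys, overridable for testing)
bound_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The link points at the driver directory; its last component is the name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

For a working hand-over, bound_driver 0000:03:00.0 would be expected to print amdgpu (or none) before the VM starts and vfio-pci while it is running.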
Are you editing the right boot loader configuration? What is the output of cat /proc/cmdline?
Those kernel config options are not the problem: vendor-reset works fine with the Proxmox kernels, but you do need to activate it on each boot since kernel 5.15. Maybe update the vendor-reset sources and rebuild it? Make sure it is loaded. Are there any errors when activating it for your GPU?
 

Mate, I just tried again, but this time I blacklisted the driver in /etc/modprobe.d instead of GRUB. This was shown on the Wiki page; I just didn't think it applied to my setup. I added blacklist amdgpu, then ran update-initramfs -u, and now the card is passed through and working!
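
For anyone repeating this, the blacklist step can be sketched as a small idempotent helper. The function name ensure_blacklisted and the default file name under /etc/modprobe.d are assumptions; any .conf file in that directory should work.

```shell
#!/bin/sh
# Add "blacklist <module>" to a modprobe.d file unless it is already there.
# $1: module name
# $2: config file (the default name is an assumption)
ensure_blacklisted() {
    mod=$1
    conf=${2:-/etc/modprobe.d/blacklist-$mod.conf}
    line="blacklist $mod"
    # Nothing to do if the exact line already exists.
    grep -qxF -- "$line" "$conf" 2>/dev/null && return 0
    printf '%s\n' "$line" >> "$conf"
}
```

Run as root, this would be followed by update-initramfs -u and a host reboot, for example: ensure_blacklisted amdgpu && update-initramfs -u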

I will play around and make sure it is working 100%. I have a feeling I am going to have to install vendor-reset again, but I am relieved that the tail error and hard freezing have disappeared!