GPU Passthrough for an AMD system

ocrate
New Member · Oct 24, 2024
Yes, this is another GPU passthrough help request...

I've been trying to get this to work for hours over multiple days, and for the life of me I cannot get it to work. I've followed the official documentation, read and watched several guides/tutorials and a whole bunch of forum posts both here and other places.

I was really hoping I could get this to work. The idea is that I have a secondary computer that functions both as a server for hosting various software and as a secondary gaming machine connected to my TV. Running Proxmox with GPU passthrough seemed like the ideal option: it would keep my gaming machine logically separated from everything else and would let me run Home Assistant OS instead of the Docker variant.

Now, before I give up and go back to just running plain old Linux, I'm really hoping someone can help me out.

Here are my specs:
  • ASUS TUF Gaming B650-PLUS WIFI
  • AMD Ryzen 9 7900X
  • Asus TUF Gaming Radeon RX 7800 XT OC
I am running Proxmox 8.2.7 with Linux 6.8.12-2-pve kernel.

I've made sure IOMMU is enabled and CSM is disabled in the BIOS. Later I also tried disabling Above 4G Decoding, as per a suggestion I read somewhere.

Output of dmesg | grep -e IOMMU:
Code:
[    0.379094] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.408686] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

Output of dmesg | grep 'remapping'
Code:
[    0.382141] AMD-Vi: Interrupt remapping enabled

Relevant output from lspci -nn
Code:
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 32 [Radeon RX 7700 XT / 7800 XT] [1002:747e] (rev c8)
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]

The pvesh get command shows that my GPU (identified by device ID 0000:03:00.0; the name is not included in the output) is in its own IOMMU group (group 14), separate from the audio controller (which is in group 15).
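
For reference, one way to double-check the grouping straight from sysfs (independent of pvesh) is a small shell loop like the one below; the exact formatting is just an example.
Code:
# List every PCI device together with its IOMMU group
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=$(basename "$(dirname "$(dirname "$dev")")")
    printf 'IOMMU group %s: ' "$group"
    lspci -nns "$(basename "$dev")"
done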


Configuring the host

For the host configuration I've tried a lot of different settings and combinations of settings. For starters, I followed the documentation and the other guides that all seemed to do the same thing.


Updating /etc/default/grub
Code:
# Initial attempt. I later found out that iommu is on by default and that some of these parameters are unnecessary.
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt video=vesafb:off video=efifb:off"
GRUB_CMDLINE_LINUX=""

# Later I tried this instead as per some post here on the forum
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt initcall_blacklist=sysfb_init"
GRUB_CMDLINE_LINUX=""

Blacklisting drivers in /etc/modprobe.d/blacklist.conf
Code:
blacklist amdgpu
blacklist radeon

Adding vfio modules in /etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd # only found out much later that I didn't need this one

And finally configuring early loading in /etc/modprobe.d/vfio.conf
Code:
# This is the option I originally tried
options vfio-pci ids=1002:747e,1002:ab30 disable_vga=1

# I also tried something like this later on
options vfio-pci ids=1002:747e,1002:ab30
softdep amdgpu pre: vfio-pci

Naturally the configurations were reloaded using update-grub and update-initramfs -u before rebooting.
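
A quick sanity check after rebooting (not something any particular guide requires, just a generic check) is to confirm which driver actually claimed the card:
Code:
# Both 03:00.0 and 03:00.1 should report "Kernel driver in use: vfio-pci"
lspci -nnk -s 03:00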


Setting up the VM

I've tried running both Windows 11 and Arch Linux as the guest OS, both with very similar setups:
  • 16 GB of RAM
  • 8 cores (CPU type: host)
  • OVMF (UEFI)
  • q35 machine type
The rest is pretty much the default.

Then I added my 7800 XT as a raw PCI device, picking the aforementioned ID 0000:03:00 and selecting All Functions.
Note that I did not initially select Primary GPU or PCI-Express; I tried those later, but I'll get back to that.
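
For completeness, the passthrough-related entries in the VM config (/etc/pve/qemu-server/<vmid>.conf) end up looking roughly like the sketch below, with the values described above; this is just a sketch, not a full config dump.
Code:
bios: ovmf
machine: q35
cpu: host
cores: 8
memory: 16384
hostpci0: 0000:03:00
(Ticking PCI-Express and Primary GPU later changes the last line to hostpci0: 0000:03:00,pcie=1,x-vga=1.)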


Results

Initially I would see the Proxmox boot menu on my TV (plugged into the 7800 XT via HDMI) and the screen after that, but it froze at that point, before getting to the actual TTY and login prompt. If I started the VM from the web interface, the screen would go black (meaning the frozen text disappeared) and nothing more happened. After a minute or so the computer would restart by itself, presumably after crashing. There was no output on the VNC console in the browser either.

This is the behavior I see with any setting that involves blacklisting the GPU drivers; every combination I've tried has resulted in the same thing: no output on either the connected screen or the console in the browser, and an eventual reboot after a minute or two.

After a lot of trial and error, as well as more reading, I saw a post saying that the GPU should be able to reset just fine and that blacklisting actually wasn't necessary. So I removed most of the configuration I had done, essentially ending up back in the state I started from: no blacklisting, no vfio modules or settings specified.

Now when I boot either of the two VMs I actually do get some output in the browser console, but not on the screen connected via HDMI (I left the raw device mapping in place). Like before, the TV goes black the minute I start a VM with the GPU mapped into it as a raw device. But I was able to install both the Arch and Windows VMs (not at the same time, of course), and I was able to install the 7800 XT driver in Windows. However, as I've seen with many others, I do get the error 34 message. At least it seems the card is recognized.
After this I tried the PCI-Express and Primary GPU options, both separately and together. What always happens is that the VM either freezes on the Proxmox boot screen or simply stays black in the browser console window (still no output to the TV).


Some final comments


I have not tried mounting the GPU in another slot. It is currently mounted in the top slot.

I still have my RTX 3080 lying around. I could test with that one as well, but I haven't done so yet. Let me know if you think that would be a good idea.

If there are any logs or other things you'd like me to provide, please let me know, as I'm not entirely sure what would be useful to include.


Any help or suggestions would be much appreciated. Thanks.
 
Can you try adding nomodeset to GRUB_CMDLINE_LINUX_DEFAULT?
I'll double-check my settings later. Most of your setup looks good to me.
 
Thank you for your suggestion. I can try that. Which of the following scenarios are you running?
  1. No driver blacklisting or early loading. Your system is handing over the GPU to the VM just fine.
  2. No driver blacklisting, but with early loading and softdep for vfio configured.
  3. Drivers are blacklisted and early loading is configured.
The main downside of blacklisting the drivers is that it also affects the iGPU, which uses the amdgpu driver as well. That's why I'd prefer scenario 1 or 2.

Any updates? I am in the same situation.

Unfortunately, I'd mostly given up and concluded that my system may not be fully compatible with GPU passthrough. However, I will try @frankmanzhu's suggestion and see if that makes a difference. I wanted to keep Proxmox despite being unable to pass through a GPU, so I've already moved quite a lot of services over to it. Because of that I don't want to do too much testing, but I'll take backups of all the machines first, just in case.
 
I tried adding nomodeset to the grub file, with early loading configured like so:

/etc/default/grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"
GRUB_CMDLINE_LINUX=""

Vfio modules in /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci

Early loading and softdep in /etc/modprobe.d/vfio.conf:
Code:
options vfio-pci ids=1002:747e,1002:ab30
softdep amdgpu pre: vfio-pci

Proxmox booted as expected. After displaying some boot messages the screen stops updating, stuck on the last message before the Proxmox console itself would come up. Booting a VM with the GPU passed through causes the screen to go blank, and nothing further happens at that point. When I then tried opening the shell, the server crashed and rebooted.

This is the typical result I've experienced so far.
 
Sorry, I probably hadn't made it very clear. Here are my settings:

In the BIOS:

Above 4G = Disable (Must)
IOMMU = Enable (Must)


/etc/default/grub:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset amd_iommu=on iommu=pt initcall_blacklist=sysfb_init"

Vfio modules in /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd


For /etc/modprobe.d/vfio.conf, use the following (I've got multiple GPUs, including the iGPU, all configured like this):
Code:
options vfio-pci ids=1002:747e,1002:ab30 disable_vga=1

For the blacklist: I have many GPUs and basically blacklisted all of their drivers in /etc/modprobe.d/pve-blacklist.conf

Code:
blacklist nvidiafb
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist amdgpu
blacklist snd_hda_intel

softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci

I pass through GPUs and PCI USB controllers to different VMs.
I know you don't want to blacklist drivers, so that when the VM shuts down the GPU can return to the host and work with the fan controller? But the reality is that this is not very stable. The workaround is to spin up another small VM with the otherwise unused GPU if you just want to keep its fans quiet.

In addition, when passing through the GPU you need to tick All Functions, ROM-Bar, and PCI-Express. If you want output from the GPU, you also need to tick Primary GPU and set Display to none; if you don't need GPU output, you can leave those off.
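
If you prefer the CLI over the web UI, that corresponds to roughly the following (VMID 100 is just a placeholder):
Code:
qm set 100 -hostpci0 0000:03:00,pcie=1,rombar=1,x-vga=1
qm set 100 -vga none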

After the changes, run update-grub and update-initramfs -u -k all before rebooting.

Please try to stick with my settings and get it working first; then you can tweak or remove whatever you believe is unnecessary. By the way, passthrough also depends on the kernel. Windows and Linux guests work OK with 6.x (so you should be safe there), but macOS is very picky about the IOMMU layout; I'm stuck on 5.15 for that reason.
Let me know if it works for you.
 
Thank you for the comprehensive reply.

I double-checked the following values in the BIOS:

Above 4G = Disable
IOMMU = Enable

They were set correctly.

I then tried your exact settings and then booted a VM with the GPU passed through to it.
Booting the VM without the PCIe option enabled simply results in no output to the screen. With PCIe enabled, the VM is unable to boot and the entire Proxmox server crashes and reboots.

At this point I am wondering whether this is simply not possible on my system. I don't think I'll be spending more time trying to get this working, so I will instead build another system, move the GPU over to that as my secondary gaming rig, and keep Proxmox running on this one without any GPUs.

Thank you for taking the time to try and help me.
 
Sorry to hear what happened. Normally it won't crash the host (unless some device wasn't cleanly detached from the host before the passthrough was attempted). Are there any logs indicating what's going on?
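
If you want to collect logs after one of those crashes, something like this on the host should show what happened (the journalctl command needs a persistent journal to reach back to the previous, crashed boot):
Code:
# VFIO/IOMMU related kernel messages from the current boot
dmesg | grep -iE 'vfio|iommu|amdgpu'
# Errors logged during the previous (crashed) boot
journalctl -b -1 -p err --no-pager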

But I know every PC is different; motherboards especially all have different layouts, and some have very poorly grouped IOMMU. Your CPU/motherboard/GPU all look good to me; I wouldn't expect this much trouble getting it working.

One guess is that, for some reason, a newer driver (one not in the blacklist?) gets loaded for the GPU and can't be unloaded and handed over to the VM properly (which causes the host crash?).
I'm still on the old 5.15 kernel because it's dumb and doesn't have a lot of driver support (so I can pass those devices into VMs easily). I'm having more hiccups with passthrough on the latest kernels.
 
I think this might be due to changes in newer pve-qemu-kvm package versions. I have a setup with 3 RTX 4090s passed through to a VM, which was working like a charm. Suddenly it stopped working: it threw exceptions during boot and I wasn't able to boot it anymore. I tracked the issue down to pve-qemu-kvm packages newer than 8.1.5-6... Try installing that version on the Proxmox host: apt-get install pve-qemu-kvm=8.1.5-6
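
For reference (holding the package afterwards is optional, just to keep apt from pulling the newer version back in):
Code:
apt-get install pve-qemu-kvm=8.1.5-6
apt-mark hold pve-qemu-kvm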

Good luck!
 
Thanks for the info. I had installed the latest kernel, 6.8.12-4, from the test repo to try to resolve a bug with AMD Zen 4 CPUs, and it broke my setup: Windows 11 VMs configured for GPU passthrough no longer boot.

My posted comments in the Zen 4 kernel bug thread are at https://forum.proxmox.com/threads/sudden-bulk-stop-of-all-vms.139500/post-718651.

I tried reverting to the older pve-qemu-kvm package, but it will not work with the latest test kernel. The next step is to reinstall Proxmox and test with a clean system that does not use the testing repository and its newer kernel. I will report back with my findings.
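
One possible intermediate step (not something I have tried yet) would be pinning the previously working kernel with proxmox-boot-tool; the version below is only an example:
Code:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.12-2-pve
reboot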
 
Try downgrading libpve-common-perl to v8.2.5. That fixed it for me.

First downgrade libpve-common-perl to v8.2.5: apt-get install libpve-common-perl=8.2.5. Then, to prevent it from being upgraded again, mark the package as held: apt-mark hold libpve-common-perl.
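
In other words, on the Proxmox host:
Code:
apt-get install libpve-common-perl=8.2.5
apt-mark hold libpve-common-perl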

That worked for me, and I now have PCI passthrough working again for my RX 6650 XT GPU.
 
