PCI GPU update; passthrough

levix

Member
Jul 8, 2020
So I got a new GPU to upgrade my gaming Windows VM. I had a Radeon RX 560 and got a Radeon RX 5500 XT. The card is installed correctly and bound to the vfio-pci driver, and I have been able to add the device to my Windows VM. Then I have to give the whole server a reboot, and the VM will load up, showing the card in Device Manager as a basic display adapter. I install the card's drivers, and the device finally switches over to a recognized device in Device Manager as the 5500 XT and shows that it is working properly, but the AMD software says "No AMD graphics driver is installed".
So I restart my VM, thinking it just needs a good reboot after installation to get working. Rebooting the VM gets it stuck at the Windows loading screen with the spinning dots, and it never recovers. So I give the whole server a reboot again, which finally lets the VM boot up, but the device is not working (Code 43) and AMD still says the drivers are not installed.
So I manually install the driver again; the card says it's working for a few seconds, then goes right back to not working.
I also can't shut down the VM without needing a whole server reboot to get the VM running again, none of which I ever had with my RX 560.

Additionally, I've created a whole new VM and reinstalled Windows, with the same issues. It seems everything goes wrong once the drivers are installed in the guest. And I'm still seeing the host's terminal output from the card, so I know it's also not fully attached to the VM.
And yes, I added all the conf files for this and blacklisted all the drivers that could be an issue.

I'm at a loss. If anyone has any insight, it would be much appreciated.

Running this on the latest Proxmox 6.2 with:
AMD Ryzen 5 1600
on an MSI B450 mobo

Thanks!
 
As discussed across the internet (for example here), the AMD Navi cards are in a bit of a troubling state when it comes to VFIO passthrough. The only success stories I've seen (and encountered myself) are on the latest and greatest kernels with some patches (i.e. this one at least), but even then accompanied by crashes and instability.

It's certainly possible, but AFAICT probably not on a stock PVE. You can of course build your own kernels with patches applied, see our git repositories for that.
 
You know, I was doing more troubleshooting on this with little success and had a feeling something with the new Navi was making it more difficult than it needed to be. I mean, I know some people have had a world of pain just getting them to run fine on Windows. I've exhausted pretty much everything I can think of to make this work, to be honest, and might just chalk it up as a loss. Guess I can say I gave it a good try.
 
As I'm using some older HP servers (DL160G6 and DL390pG8), I usually recompile the current Proxmox kernel with an intel-iommu driver patch to remove the RMRR checks. After some painful and frustrating attempts to pass through an older Nvidia card (750 Ti) to a Win10 VM, ending up with the famous error 43, I swapped this card for a Radeon 5500 XT. Then I ran into this AMD hell because of the very odd way of resetting the GPU. Fortunately I found this thread and the link to this kernel patch. Even though this patch targets Navi10 _only_, I found it works for the 5500 XT (Navi14) as well! I just needed to add the PCI ID of my card to the list of IDs at the end of the patch:

Code:
 static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
                 reset_intel_82599_sfp_virtfn },
@@ -3836,6 +3961,14 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
        { PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
                reset_chelsio_generic_dev },
+       { PCI_VENDOR_ID_ATI, 0x7310, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7312, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7318, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7319, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x731a, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x731b, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x731f, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7340, reset_amd_navi10 },  /* my addition to the original patch (see lspci output) */
        { 0 }
 };

Now I'm able to restart my Win10 VM like a charm and the reset procedure can be monitored by watching the fans of my card - upon VM startup they nicely stop and restart...
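
For anyone wanting to add another card the same way, here is a minimal sketch of how the table line could be derived from a line of `lspci -nn` output. The helper name `navi_table_entry` is made up for illustration, and the sample line in the usage note is only an assumption of what a 5500 XT typically reports; the trailing `[vendor:device]` pair is what matters. Vendor `1002` corresponds to `PCI_VENDOR_ID_ATI` in the kernel headers.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): turn one `lspci -nn` line
# into an entry for the pci_dev_reset_methods[] table used by the patch.
navi_table_entry() {
    # $1: one line of `lspci -nn` output.
    # The greedy ".*" makes sed pick the LAST [vvvv:dddd] pair on the line,
    # which is the vendor:device ID (the class code like [0300] has no colon).
    id=$(printf '%s\n' "$1" | sed -nE 's/.*\[([0-9a-f]{4}:[0-9a-f]{4})\].*/\1/p')
    [ -n "$id" ] || return 1

    vendor=${id%%:*}
    device=${id##*:}

    # Only AMD/ATI devices (vendor 0x1002) make sense for reset_amd_navi10.
    [ "$vendor" = "1002" ] || return 1

    printf '{ PCI_VENDOR_ID_ATI, 0x%s, reset_amd_navi10 },\n' "$device"
}
```

Something like `navi_table_entry "$(lspci -nn | grep -i 'vga' | head -n 1)"` would then print the line to paste into the patch, assuming the GPU is the first VGA device listed. Whether the Navi10 reset sequence actually works for a given ID still has to be tested on the card itself.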

@levix: if you're interested, I could provide my kernel deb package (5.4.44-2-pve) for testing purposes...

best regards,
Ralf
 
I just went through the process of figuring out how to create a patch file that takes into account the already existing pve patches in 'pve-kernel/patches/kernel'. My thought was to write it up and help the next person be a little bit faster.

To create the patch file, I cloned pve-kernel with "git clone git://git.proxmox.com/git/pve-kernel.git" and ran "make" once (to get all source files and apply that one stupid patch that also touches "quirks.c", because I was too lazy to figure it out another way). Then I added the necessary code to a copy of "quirks.c", created the patch with "diff -u quirks.c.ORIGINAL quirks.c.PATCHED >> 0008-fix-navi-reset-bug-v2.patch" and fixed up the wrong file paths inside the patch file.
Next I deleted everything except the patch and started over with a fresh clone, added the patch file to "./pve-kernel/patches/kernel/", and ran "make" again to get a couple of debs, which I installed with "dpkg -i *.deb". After a reboot I was finally able to start, stop and restart my Windows VM with 5500 XT (as in "0x7340") passthrough (as my only GPU in the PVE system) like a normal person ;)
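
The manual fix-up of the file paths can probably be avoided by letting diff write the headers directly. A minimal sketch, assuming GNU diff (its `--label` option replaces the file name in the `---`/`+++` header lines); the function name `make_quirks_patch` is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: create a unified diff whose headers already use the
# a/ and b/ paths the kernel tree expects, so no hand editing is needed.
# Usage: make_quirks_patch ORIGINAL PATCHED OUTPUT
make_quirks_patch() {
    orig=$1
    patched=$2
    out=$3
    target=drivers/pci/quirks.c

    # diff exits 0 if the files are identical, 1 if they differ, 2 on error.
    status=0
    diff -u --label "a/$target" --label "b/$target" \
        "$orig" "$patched" > "$out" || status=$?

    # Succeed only if a real difference was written to the patch file.
    [ "$status" -eq 1 ]
}
```

For example: `make_quirks_patch quirks.c.ORIGINAL quirks.c.PATCHED 0008-fix-navi-reset-bug-v2.patch`. Note that this overwrites the output file instead of appending to it like `>>` does.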

Thanks to @gnif on the level1techs forum for creating the code in the first place; you will find his v2 patch here in the navi reset bug thread. Also thanks to everyone else for all the information I could leech off the internet during that process. Too numerous to list, but thanks nevertheless to all the people sharing their knowledge (even after they have solved their problem).


To patch the kernel yourself, create the patch file by copying its contents (see below) and run the following commands:

NOTE, for reference: the pve-kernel commit I built from is "ceee458 (HEAD -> master, origin/master, origin/HEAD) bump version to 5.4.60-2". So far all I can say is that it works at that commit.

Bash:
cd /usr/src/
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
cp /PATH/TO/0008-fix-navi-reset-bug-v2.patch patches/kernel/
make
# wait and look at bpytop ;) netdata and any other nice tool to let you see the machine sweating
dpkg -i *.deb
reboot
# profit

patch file "0008-fix-navi-reset-bug-v2.patch"
Diff:
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4086,6 +4086,133 @@
        return 0;
 }

+
+/*
+ * AMD Navi 10 series GPUs require a vendor specific reset procedure.
+ * According to AMD a PSP mode 2 reset should be enough however at this
+ * time the details of how to perform this are not available to us.
+ * Instead we can signal the SMU to enter and exit BACO which has the same
+ * desired effect.
+ */
+static int reset_amd_navi10(struct pci_dev *dev, int probe)
+{
+       const int mmMP0_SMN_C2PMSG_81 = 0x16091;
+       const int mmMP1_SMN_C2PMSG_66 = 0x16282;
+       const int mmMP1_SMN_C2PMSG_82 = 0x16292;
+       const int mmMP1_SMN_C2PMSG_90 = 0x1629a;
+
+       u16 cfg;
+       resource_size_t mmio_base, mmio_size;
+       uint32_t __iomem * mmio;
+       unsigned int sol;
+       unsigned int timeout;
+
+       /*
+        * if the device has FLR return -ENOTTY indicating that we have no
+        * device-specific reset method.
+        */
+       if (pcie_has_flr(dev))
+               return -ENOTTY;
+
+       /* bus resets still cause navi to flake out */
+       dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
+
+       if (probe)
+               return 0;
+
+       /* map BAR5 */
+       mmio_base = pci_resource_start(dev, 5);
+       mmio_size = pci_resource_len(dev, 5);
+       mmio = ioremap_nocache(mmio_base, mmio_size);
+       if (mmio == NULL) {
+               pci_disable_device(dev);
+               pci_err(dev, "Navi10: cannot iomap device\n");
+               return 0;
+       }
+
+       /* save the PCI state and enable memory access */
+       pci_read_config_word(dev, PCI_COMMAND, &cfg);
+       pci_write_config_word(dev, PCI_COMMAND, cfg | PCI_COMMAND_MEMORY);
+
+       #define SMU_WAIT() \
+       for(timeout = 1000; timeout && (readl(mmio + mmMP1_SMN_C2PMSG_90) & 0xFFFFFFFFL) == 0; --timeout) \
+               udelay(1000); \
+       if (readl(mmio + mmMP1_SMN_C2PMSG_90) != 0x1) \
+               pci_info(dev, "Navi10: SMU error 0x%x (line %d)\n", \
+                               readl(mmio + mmMP1_SMN_C2PMSG_90), __LINE__);
+
+       pci_set_power_state(dev, PCI_D0);
+
+       /* it's important we wait for the SOC to be ready */
+       for(timeout = 1000; timeout; --timeout) {
+               sol = readl(mmio + mmMP0_SMN_C2PMSG_81);
+               if (sol != 0xFFFFFFFF)
+                       break;
+               udelay(1000);
+       }
+
+       if (sol == 0xFFFFFFFF)
+               pci_warn(dev, "Navi10: timeout waiting for wakeup, continuing anyway\n");
+
+       /* check the sign of life indicator */
+       if (sol == 0x0) {
+               goto out;
+       }
+
+       pci_info(dev, "Navi10: performing BACO reset\n");
+
+       /* save the state around the reset */
+       pci_save_state(dev);
+
+       /* the SMU might be busy already, wait for it */
+       SMU_WAIT();
+
+       /* send PPSMC_MSG_ArmD3 with param */
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_90);
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_82); // BACO_SEQ_BACO
+       writel(0x46, mmio + mmMP1_SMN_C2PMSG_66);
+       SMU_WAIT();
+
+       /* send PPSMC_MSG_EnterBaco with param */
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_90);
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_82); // BACO_SEQ_BACO
+       writel(0x18, mmio + mmMP1_SMN_C2PMSG_66);
+       SMU_WAIT();
+
+       /* wait for the regulators to shutdown */
+       mdelay(1000);
+
+       /* send PPSMC_MSG_ExitBaco */
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_90);
+       writel(0x19, mmio + mmMP1_SMN_C2PMSG_66);
+       SMU_WAIT();
+
+       #undef SMU_WAIT
+
+       /* wait for the SOC register to become valid */
+       for(timeout = 1000; timeout; --timeout) {
+               sol = readl(mmio + mmMP0_SMN_C2PMSG_81);
+               if (sol != 0xFFFFFFFF)
+                       break;
+               udelay(1000);
+       }
+
+       if (sol != 0x0) {
+               pci_err(dev, "Navi10: sol register = 0x%x\n", sol);
+               goto out;
+       }
+
+out:
+       /* unmap BAR5 */
+       iounmap(mmio);
+
+       /* restore the state and command register */
+       pci_restore_state(dev);
+       pci_write_config_word(dev, PCI_COMMAND, cfg);
+       return 0;
+}
+
+
 static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
                 reset_intel_82599_sfp_virtfn },
@@ -4097,9 +4224,18 @@
        { PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
        { PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
                reset_chelsio_generic_dev },
+        { PCI_VENDOR_ID_ATI, 0x7310, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7312, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7318, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7319, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x731a, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x731b, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x731f, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7340, reset_amd_navi10 },
        { 0 }
 };

+
 /*
  * These device-specific reset methods are here rather than in a driver
  * because when a host assigns a device to a guest VM, the host may need
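
After rebooting into the patched kernel, one way to see whether the reset path is being hit is to look for the "Navi10:" messages the patch prints to the kernel log. A minimal sketch, with `navi_reset_summary` being a made-up helper name; it reads log text on stdin and only knows the message strings that appear in the patch above:

```shell
#!/bin/sh
# Hypothetical helper: summarise the Navi10 reset messages found in kernel
# log text given on stdin (e.g. the output of dmesg).
navi_reset_summary() {
    log=$(cat)

    # Any of the patch's warning/error messages counts as a problem.
    if printf '%s\n' "$log" | grep -Eq 'Navi10: (SMU error|sol register|cannot iomap|timeout waiting)'; then
        echo "reset attempted, errors logged"
    elif printf '%s\n' "$log" | grep -q 'Navi10: performing BACO reset'; then
        echo "reset performed"
    else
        echo "no Navi10 reset messages"
    fi
}
```

Typical use would be `dmesg | navi_reset_summary` on the host after starting and stopping the VM once (dmesg may need root). Getting "no Navi10 reset messages" does not prove the patch is missing: the function in the patch returns quietly when the sign-of-life register reads 0, so nothing is logged in that case.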
 
Maybe people who are successful with this (the v2 in general) could add their own identifier (e.g. 0x7340) here.
Then it helps more people and makes more sense to add (e.g. I don't know the ID of the 5700 XT).

Although I wouldn't mind if it were already compiled into the kernel deb distributed via the repo.

PS @r.jochum: the 5700 XT is Navi 10 and the 5500 XT is Navi 14;
the patch just works for both, and "Navi10" is just a name/comment in the code,
so there is no actual hardware specificity from that.

[edited] wording and just trying to make more sense
 
@t.lamprecht any chance to have this patch in the pve-kernel ?

IIRC, I commented on that somewhere else: for such things I'd much rather see this go upstream. I'd be happy to backport a patch from Linus' git tree or the graphics/AMD maintainers' tree. This has the following benefits:
* blessed by people with in-depth knowledge of that subsystem and/or hardware
* actually helps more people than a distro-specific backport

So what's the status of the upstream effort of this patch?
 
