PCI GPU update; passthrough

levix · Jul 8, 2020

So I got a new GPU to upgrade my Windows gaming VM: I had a Radeon RX 560 and got a Radeon RX 5500 XT. The card is installed and bound to the vfio-pci driver, and I have been able to add the device to my Windows VM. When I reboot the whole server, the VM boots and shows the card in Device Manager as a basic display adapter. I then install the card's drivers, and the device finally switches over to a recognized device in Device Manager, listed as the 5500 XT and reported as working properly, but the AMD software says "No AMD graphics driver is installed".
So I restarted the VM, thinking it just needed a good reboot after the installation to get working. Rebooting the VM gets it stuck at the Windows loading screen with the spinning dots, and it never recovers. So I rebooted the whole server again, which finally lets the VM boot, but the device is not working (Code 43) and the AMD software still says the drivers are not installed.
So I manually installed the driver again; the card reports that it is working for a few seconds, then goes right back to not working.
I also can't shut down the VM without needing a whole server reboot to get the VM running again, none of which ever happened with my RX 560.

Additionally, I've created a whole new VM and reinstalled Windows, with the same issues. It seems that everything goes wrong once the drivers are installed in the guest. I'm also still seeing the host's terminal output on the card, so I know it's not being fully handed over to the VM.
And yes, I added all the conf files for all of this and blacklisted all the drivers that could be an issue.
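
For anyone wanting to double-check the binding part, here is a minimal sketch of how the driver bound to the card could be read from `lspci -k` output. The function name `driver_in_use`, the bus address `0a:00.0` and the sample text are made up for illustration; they are not taken from my machine.

```shell
#!/bin/sh
# Print the kernel driver bound to a device, given `lspci -k` output on stdin.
# On the host this would be fed from something like:  lspci -k -s 0a:00.0
# (0a:00.0 is a hypothetical bus address; use the one of your own card).
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# Illustrative sample text, not captured from a real machine.
sample='0a:00.0 VGA compatible controller: example device line
    Kernel driver in use: vfio-pci
    Kernel modules: amdgpu'

printf '%s\n' "$sample" | driver_in_use   # prints: vfio-pci
```

If this prints amdgpu (or nothing at all) instead of vfio-pci, the host has not handed the card over to vfio-pci.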

I'm at a loss. If anyone has any insight, it would be much appreciated.

Running this on the latest Proxmox 6.2 with:
AMD Ryzen 5 1600
on an MSI B450 mobo

Thanks!

Stefan_R · Jul 8, 2020

As discussed across the internet (for example here), the AMD Navi cards are in a bit of a troubling state when it comes to VFIO passthrough. The only success stories I've seen (and encountered myself) are on the latest and greatest kernels with some patches (i.e. this one at least), but even then accompanied by crashes and instability.

It's certainly possible, but AFAICT probably not on a stock PVE. You can of course build your own kernels with patches applied, see our git repositories for that.

levix · Jul 8, 2020

Stefan_R said:
As discussed across the internet (for example here), the AMD Navi cards are in a bit of a troubling state when it comes to VFIO passthrough. The only success stories I've seen (and encountered myself) are on the latest and greatest kernels with some patches (i.e. this one at least), but even then accompanied by crashes and instability.

It's certainly possible, but AFAICT probably not on a stock PVE. You can of course build your own kernels with patches applied, see our git repositories for that.

You know, I was doing more troubleshooting on this with little success and had a feeling that something about the new Navi cards was making it more difficult than it needed to be. I mean, I know some people have had a world of pain just getting them to run fine on Windows. I've exhausted pretty much everything I can think of to make this work, to be honest, and might just chalk it up as a loss. Guess I can say I gave it a good try.

bonix · Jul 26, 2020

As I'm using some older HP servers (DL160G6 and DL390pG8), I usually recompile the current Proxmox kernel with an intel-iommu driver patch that removes the RMRR checks. After some painful and frustrating attempts to pass an older Nvidia card (750 Ti) through to a Win10 VM, always ending up with the famous error 43, I swapped that card for a Radeon 5500 XT. Then I ran into this AMD hell because of the very odd way the GPU has to be reset. Fortunately, I found this thread and the link to this kernel patch. Even though the patch targets Navi10 _only_, I found that it works for the 5500 XT (Navi14) as well! I just needed to add the PCI ID of my card to the list of IDs at the end of the patch:

Code:

 static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
                 reset_intel_82599_sfp_virtfn },
@@ -3836,6 +3961,14 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
        { PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
                reset_chelsio_generic_dev },
+       { PCI_VENDOR_ID_ATI, 0x7310, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7312, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7318, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7319, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x731a, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x731b, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x731f, reset_amd_navi10 },
+       { PCI_VENDOR_ID_ATI, 0x7340, reset_amd_navi10 },  /* my addition to the original patch (see lspci output)! */
        { 0 }
 };

Now I'm able to restart my Win10 VM like a charm and the reset procedure can be monitored by watching the fans of my card - upon VM startup they nicely stop and restart...

@levix: if you're interested, I could provide my kernel deb package (5.4.44-2-pve) for testing purposes...

best regards,
Ralf

jw-it · Sep 18, 2020

I just went through the process of figuring out how to create a patch file that takes into account the already existing PVE patches in 'pve-kernel/patches/kernel'. My thought was to write it up and help the next person be a little bit faster.

To create the patch file, I cloned pve-kernel with "git clone git://git.proxmox.com/git/pve-kernel.git" and ran "make" once (to fetch all the source files and apply that one stupid patch that also touches "quirks.c", because I was too lazy to figure out another way). I then added the necessary code to a copy of "quirks.c", created the patch with "diff -u quirks.c.ORIGINAL quirks.c.PATCHED >> 0008-fix-navi-reset-bug-v2.patch" and fixed up the wrong file paths inside the patch file.
Then I deleted everything except the patch, started over with a fresh clone, added the patch file to "./pve-kernel/patches/kernel/", and ran "make" again to get a couple of debs, which I installed with "dpkg -i *.deb". After a reboot I was finally able to start, stop and restart my Windows VM with 5500 XT (as in "0x7340") passthrough (as the only GPU in the PVE system) like a normal person.
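
The manual path fix-up can probably be avoided by letting diff write the headers itself. A minimal sketch, assuming GNU diffutils (the `--label` option is not POSIX); the function name `make_labeled_patch` and the two tiny stand-in files are only for illustration, the real inputs would be the two copies of quirks.c:

```shell
#!/bin/sh
# Write a unified diff whose headers already carry the in-tree paths,
# so the file names inside the patch need no editing afterwards.
# Assumes GNU diff (for --label).
make_labeled_patch() {
    # diff exits with 1 when the files differ; treat that as success here.
    diff -u \
        --label a/drivers/pci/quirks.c \
        --label b/drivers/pci/quirks.c \
        "$1" "$2" || [ $? -eq 1 ]
}

# Stand-in files for the demonstration (not real kernel sources).
work=$(mktemp -d)
printf 'line one\nline two\n' > "$work/quirks.c.ORIGINAL"
printf 'line one\nline two\nline three\n' > "$work/quirks.c.PATCHED"

make_labeled_patch "$work/quirks.c.ORIGINAL" "$work/quirks.c.PATCHED" | head -n 2
rm -rf "$work"
```

The two lines printed are the "--- a/drivers/pci/quirks.c" and "+++ b/drivers/pci/quirks.c" headers, the same ones the finished patch below starts with.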

Thanks to @gnif on the level1techs forum for creating the code in the first place; you will find his v2 patch here in the navi reset bug thread. Thanks also to everyone else for all the information I could leech off the internet during the process. They are too numerous to list, but thanks nevertheless to all the people sharing their knowledge (even after they have solved their own problem).

To patch the kernel yourself, just create the patch file by copying it from below and run the following commands:

NOTE, for reference: the pve-kernel commit is "ceee458 (HEAD -> master, origin/master, origin/HEAD) bump version to 5.4.60-2"; so far I can only say that it works with this one.

Bash:

cd /usr/src/
git clone git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
cp /PATH/TO/0008-fix-navi-reset-bug-v2.patch patches/kernel/
make
# wait and look at bpytop ;) netdata and any other nice tool to let you see the machine sweating
dpkg -i *.deb
reboot
# profit

patch file "0008-fix-navi-reset-bug-v2.patch"

Diff:

--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4086,6 +4086,133 @@
        return 0;
}

+
+/*
+ * AMD Navi 10 series GPUs require a vendor specific reset procedure.
+ * According to AMD a PSP mode 2 reset should be enough however at this
+ * time the details of how to perform this are not available to us.
+ * Instead we can signal the SMU to enter and exit BACO which has the same
+ * desired effect.
+ */
+static int reset_amd_navi10(struct pci_dev *dev, int probe)
+{
+       const int mmMP0_SMN_C2PMSG_81 = 0x16091;
+       const int mmMP1_SMN_C2PMSG_66 = 0x16282;
+       const int mmMP1_SMN_C2PMSG_82 = 0x16292;
+       const int mmMP1_SMN_C2PMSG_90 = 0x1629a;
+
+       u16 cfg;
+       resource_size_t mmio_base, mmio_size;
+       uint32_t __iomem * mmio;
+       unsigned int sol;
+       unsigned int timeout;
+
+       /*
+        * if the device has FLR return -ENOTTY indicating that we have no
+        * device-specific reset method.
+        */
+       if (pcie_has_flr(dev))
+               return -ENOTTY;
+
+       /* bus resets still cause navi to flake out */
+       dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
+
+       if (probe)
+               return 0;
+
+       /* map BAR5 */
+       mmio_base = pci_resource_start(dev, 5);
+       mmio_size = pci_resource_len(dev, 5);
+       mmio = ioremap_nocache(mmio_base, mmio_size);
+       if (mmio == NULL) {
+               pci_disable_device(dev);
+               pci_err(dev, "Navi10: cannot iomap device\n");
+               return 0;
+       }
+
+       /* save the PCI state and enable memory access */
+       pci_read_config_word(dev, PCI_COMMAND, &cfg);
+       pci_write_config_word(dev, PCI_COMMAND, cfg | PCI_COMMAND_MEMORY);
+
+       #define SMU_WAIT() \
+       for(timeout = 1000; timeout && (readl(mmio + mmMP1_SMN_C2PMSG_90) & 0xFFFFFFFFL) == 0; --timeout) \
+               udelay(1000); \
+       if (readl(mmio + mmMP1_SMN_C2PMSG_90) != 0x1) \
+               pci_info(dev, "Navi10: SMU error 0x%x (line %d)\n", \
+                               readl(mmio + mmMP1_SMN_C2PMSG_90), __LINE__);
+
+       pci_set_power_state(dev, PCI_D0);
+
+       /* it's important we wait for the SOC to be ready */
+       for(timeout = 1000; timeout; --timeout) {
+               sol = readl(mmio + mmMP0_SMN_C2PMSG_81);
+               if (sol != 0xFFFFFFFF)
+                       break;
+               udelay(1000);
+       }
+
+       if (sol == 0xFFFFFFFF)
+               pci_warn(dev, "Navi10: timeout waiting for wakeup, continuing anyway\n");
+
+       /* check the sign of life indicator */
+       if (sol == 0x0) {
+               goto out;
+       }
+
+       pci_info(dev, "Navi10: performing BACO reset\n");
+
+       /* save the state around the reset */
+       pci_save_state(dev);
+
+       /* the SMU might be busy already, wait for it */
+       SMU_WAIT();
+
+       /* send PPSMC_MSG_ArmD3 with param */
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_90);
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_82); // BACO_SEQ_BACO
+       writel(0x46, mmio + mmMP1_SMN_C2PMSG_66);
+       SMU_WAIT();
+
+       /* send PPSMC_MSG_EnterBaco with param */
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_90);
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_82); // BACO_SEQ_BACO
+       writel(0x18, mmio + mmMP1_SMN_C2PMSG_66);
+       SMU_WAIT();
+
+       /* wait for the regulators to shutdown */
+       mdelay(1000);
+
+       /* send PPSMC_MSG_ExitBaco */
+       writel(0x00, mmio + mmMP1_SMN_C2PMSG_90);
+       writel(0x19, mmio + mmMP1_SMN_C2PMSG_66);
+       SMU_WAIT();
+
+       #undef SMU_WAIT
+
+       /* wait for the SOC register to become valid */
+       for(timeout = 1000; timeout; --timeout) {
+               sol = readl(mmio + mmMP0_SMN_C2PMSG_81);
+               if (sol != 0xFFFFFFFF)
+                       break;
+               udelay(1000);
+       }
+
+       if (sol != 0x0) {
+               pci_err(dev, "Navi10: sol register = 0x%x\n", sol);
+               goto out;
+       }
+
+out:
+       /* unmap BAR5 */
+       iounmap(mmio);
+
+       /* restore the state and command register */
+       pci_restore_state(dev);
+       pci_write_config_word(dev, PCI_COMMAND, cfg);
+       return 0;
+}
+
+
static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
        { PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
                 reset_intel_82599_sfp_virtfn },
@@ -4097,9 +4224,18 @@
        { PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
        { PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
                reset_chelsio_generic_dev },
+        { PCI_VENDOR_ID_ATI, 0x7310, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7312, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7318, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7319, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x731a, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x731b, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x731f, reset_amd_navi10 },
+        { PCI_VENDOR_ID_ATI, 0x7340, reset_amd_navi10 },
        { 0 }
};

+
/*
  * These device-specific reset methods are here rather than in a driver
  * because when a host assigns a device to a guest VM, the host may need
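
To see whether the patched reset path actually runs, one could look for the message the patch prints via pci_info ("Navi10: performing BACO reset") in the kernel log. A rough sketch; the function name `count_navi_resets` and the sample log lines are invented for illustration, only the message text comes from the patch above:

```shell
#!/bin/sh
# Count how often the reset message from the patch shows up in a kernel log
# given on stdin. On the host:  dmesg | count_navi_resets
count_navi_resets() {
    # grep -c exits non-zero when the count is 0; normalise the exit status.
    grep -c 'Navi10: performing BACO reset' || true
}

# Invented sample log lines, only to show the call.
sample='[  100.000000] vfio-pci 0000:0a:00.0: enabling device
[  101.000000] vfio-pci 0000:0a:00.0: Navi10: performing BACO reset'

printf '%s\n' "$sample" | count_navi_resets   # prints: 1
```

A count of 0 after starting the VM would suggest that the running kernel is not the patched one, or that the device ID is missing from the table.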

r.jochum · Sep 18, 2020

I also have a Navi 10 based graphics card (5700 XT); your findings will come in handy once I want to pass it through.

r.jochum · Sep 19, 2020

@t.lamprecht any chance to have this patch in the pve-kernel ?

jw-it · Sep 20, 2020

Maybe people who are successful with this (the v2 in general) could add their own identifier (e.g. 0x7340) here.
That would help more people and make it more sensible to add (e.g. I don't know the ID of the 5700 XT).

Although I wouldn't mind if it were already compiled into the kernel deb distributed from the repo.

PS @r.jochum: the 5700 XT is Navi 10 and the 5500 XT is Navi 14;
the patch simply works for both, and "navi10" is just a name/comment in the code,
so it implies no actual hardware specificity.

[edited] wording and just trying to make more sense
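
To make adding an identifier less error-prone, a small helper could print the table line for a given device ID in the same form the patch uses. This is only a sketch; the function name `navi_reset_entry` is made up, and the output still has to be placed into the pci_dev_reset_methods table of the patch by hand:

```shell
#!/bin/sh
# Print a pci_dev_reset_methods entry for an AMD device ID such as 731f or 0x7340.
navi_reset_entry() {
    # Drop an optional 0x prefix and normalise to lower-case hex.
    id=$(printf '%s' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    case "$id" in
        [0-9a-f][0-9a-f][0-9a-f][0-9a-f]) ;;
        *) echo "not a 4-digit hex device ID: $1" >&2; return 1 ;;
    esac
    printf '\t{ PCI_VENDOR_ID_ATI, 0x%s, reset_amd_navi10 },\n' "$id"
}

navi_reset_entry 0x7340
```

Anything that is not exactly four hex digits is rejected, so a full "1002:7340" pair has to be cut down to the device part first.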

r.jochum · Sep 20, 2020

It's: 1002:731f

Code:

2d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1)
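
For others looking up their ID: `lspci -nn` prints the vendor:device pair in square brackets at the end of the line, and it can be pulled out with sed. A small sketch; the function name `pci_id_from_line` is made up, and the sample line is assembled from the output and ID above in the format `lspci -nn` uses, not pasted from a terminal:

```shell
#!/bin/sh
# Extract the "vendor:device" pair from one line of `lspci -nn` output.
# On the host:  lspci -nn | grep -i vga
pci_id_from_line() {
    printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p'
}

# Sample line assembled for illustration.
sample='2d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)'

pci_id_from_line "$sample"   # prints: 1002:731f
```

The part after the colon (731f here) is what goes into the table of the patch as 0x731f.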

t.lamprecht · Sep 21, 2020

r.jochum said:
@t.lamprecht any chance to have this patch in the pve-kernel ?

IIRC, I commented on that somewhere else: for such things I'd much rather bring this upstream. I'd be happy to backport a patch from Linus' git tree or the graphics/AMD maintainers' tree. This has the following benefits:
* it is blessed by people with in-depth knowledge of that subsystem and/or hardware
* it actually helps more people than a distro-specific backport

So what's the status of the upstream effort of this patch?

r.jochum · Sep 21, 2020

Upstream is delayed as people have problems with the patch, see: [1].

1: https://forum.level1techs.com/t/navi-reset-kernel-patch/147547/31

t.lamprecht · Sep 21, 2020

We certainly do not want to include a patch which is known to make some systems hang on boot for a feature that is rather experimental.

r.jochum · Sep 21, 2020

Kernel 5.10 will have "a reset handling change", see [1]

1: https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-Linux-5.10-First
