Problems with PCIe passthrough with two identical devices

Figured I'd chime in to add myself to the list of people having this problem. My newly acquired UGreen DXP8800 Plus has two ASMedia ASM1164 controllers in it, and the second one is not visible to a VM (tried Unraid and DSM/Arc Loader) no matter what kind of passthrough I try. I tried the solutions on this page of the thread and nothing works. Like @elvito noticed, the controllers disappear from the host UI once the VM is started, but the VM cannot see any drives attached to them.

I have not put this node into "production" (it's just a homelab, really) yet, so I am still at a point where I can experiment with any potential solutions there are. Hoping someone has the magic one for all of us!
 
If you try to reset a device that has problems with resets (during operation), the logical consequence is that the device will no longer be accessible.

So either someone is actually willing to try out what was proposed, or most likely nothing will happen.
If everyone is just waiting for someone else to fix the issue, this could take a LONG time. :)
You all obviously chose to use open source software, so now you actually have the chance to contribute.
 
I'm happy to try just about anything (especially while still in the return period for the UGreen), but after looking at your kernel quirk and firmware suggestions I decided they were over my head without a step-by-step guide.
 
My answer was not directed at any one person in particular. So it is completely fair to say that you cannot do this.

The main point is:
Someone will have to try it. My suggestion is far from proven. The only way to find out, however, is for someone to actually try it or come up with another idea.
As long as everyone just says "same here", it will stay the way it is, which is not working, if I am not mistaken. So if someone knows how to do this and is actually affected, maybe think about it. If I had the time, I would build you a test kernel. However, I sadly have bigger fish to fry at the moment.
 
Hello,

I did test the early isolation, also in combination with the boot option
Code:
pcie_aspm=off
and resource mapping, but the suspend still happens after Proxmox has booted up, and the HDDs on the second controller won't spin up again. Maybe the suspend could be avoided with different BIOS settings (e.g. turning off power saving/hot plugging) or with the quirk solution? My device is in a cabinet (no monitor/no keyboard), and because of the effort and downtime I didn't want to take it out to test different BIOS settings. In the meantime I also added a 6th HDD and passed it to the TrueNAS VM (like the 5th HDD) as a single device, which is also working without problems.
I tried the early isolation method this morning - same results. Even though it does seem to force vfio-pci to grab the device at host boot, the second SATA controller still doesn't show in the VM. And on subsequent attempts, the VM fails to boot because the PCI device (SATA controller) can't be reset. As we know, this can only be fixed with a host reboot. I tried tinkering with any BIOS settings I thought would be related, but it still produces the same result. Hoping that maybe you see a BIOS setting that I didn't, but I'm pretty sure @celemine1gig is right in that this has to be fixed at the firmware and/or kernel level - both of which are above my pay grade, currently.
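For anyone repeating this test, here is a minimal sketch for confirming from the host that vfio-pci really has grabbed both controllers and that the boot option is active. It only reads standard sysfs/procfs entries. The two PCI addresses are hypothetical placeholders and the function names are mine, so adjust them to your host.

```shell
#!/bin/sh
# bound_driver DEVICE_DIR
# Prints the name of the driver bound to a PCI function, or "none".
bound_driver() {
    if [ -L "$1/driver" ]; then
        basename "$(readlink "$1/driver")"
    else
        echo none
    fi
}

# has_boot_option CMDLINE_FILE OPTION
# Succeeds if OPTION appears as a whole word on the kernel command line.
has_boot_option() {
    [ -r "$1" ] || return 1
    tr ' ' '\n' < "$1" | grep -qx -- "$2"
}

# Hypothetical addresses; substitute the ones your host reports.
for addr in 0000:04:00.0 0000:05:00.0; do
    echo "$addr: $(bound_driver "/sys/bus/pci/devices/$addr")"
done

if has_boot_option /proc/cmdline pcie_aspm=off; then
    echo "pcie_aspm=off is active"
else
    echo "pcie_aspm=off is not active"
fi
```

If both controllers report vfio-pci here and the second one still does not show up in the VM, that points away from the binding step and towards the reset behaviour discussed in this thread.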
 
Just FYI, for the quirk to apply you would have to build a test Linux kernel for your machine, with the PCI ID of the controller in question added to the list of devices that "quirk_no_bus_reset" should be applied to. To my knowledge, there is no other/obvious way to test it.
This is simply a heads-up that it won't be an easy or fast test and/or workaround, quite apart from the question of whether it even helps at all.

Edit:
As further explanation, you could for example add the following information after line 3775 in the "quirks.c" file (https://git.kernel.org/pub/scm/linu...x.git/tree/drivers/pci/quirks.c?h=v6.14#n3775):
C:
/*
 * Test patch for Asmedia SATA controller issues with PCI-pass-through
 * Some Asmedia ASM1164 controllers do not seem to successfully
 * complete a bus reset.
 */
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ASMEDIA, 0x1164, quirk_no_bus_reset);

This example obviously applies to the above-mentioned ASMedia ASM1164 controller.
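To check which devices on a given host such a fixup line would match, a small sketch like the following can walk sysfs and list every PCI function with a given vendor/device ID. 0x1b21 is ASMedia's PCI vendor ID, and 0x1164 is the device ID from the patch above. The function name is my own. I am also assuming a kernel recent enough to expose the per-device reset_method attribute; on older kernels the script just prints "unknown".

```shell
#!/bin/sh
# list_pci_by_id SYSFS_DIR VENDOR DEVICE
# Prints "<address> <reset methods>" for every PCI function whose IDs match.
list_pci_by_id() {
    for dev in "$1"/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        [ "$(cat "$dev/vendor")" = "$2" ] || continue
        [ "$(cat "$dev/device")" = "$3" ] || continue
        # reset_method is only present on newer kernels (assumption).
        if [ -r "$dev/reset_method" ]; then
            methods=$(cat "$dev/reset_method")
        else
            methods="unknown"
        fi
        echo "$(basename "$dev") ${methods:-none}"
    done
}

# Both identical controllers should be listed, since they share one ID.
list_pci_by_id /sys/bus/pci/devices 0x1b21 0x1164
```

Because the fixup is keyed on the vendor/device ID, it applies to both controllers at once. On a kernel built with the quirk I would expect "bus" to be missing from the listed reset methods, but treat that as an assumption to verify rather than a guarantee.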
After spending the better part of three days with Gemini learning how to compile a Linux kernel and working through some errors and differences between Proxmox and standard Linux kernels, I got this to work. The second SATA controller has survived multiple host and VM reboots. I couldn't have done it without being able to copy & paste this code into the quirks file, so thank you very much for posting it here. Hopefully, this becomes a more permanent fix in the future. Having never done this before, can I assume that I need to avoid kernel updates for the foreseeable future? Unless of course I want to re-build a kernel every time with the quirk added, right?
 
Great to read that it worked out like I had hoped.

Now the thing is: that was the easy part.
If you want this taken care of permanently, it should go into the mainline kernel as a fix.

That means one would have to contact the official PCI subsystem maintainer about it.
See here:
https://docs.kernel.org/process/maintainers.html#pci-subsystem

The last time I tried something like this (mind you, with a trivial fix just like this one), it took several months to get it accepted. And then it takes additional time until the fix is actually shipped in standard distributions.
So, for the time being, you will most likely have to continue building your own kernels with the fix until this is finally done.
Unless you can convince the developers at Proxmox to implement it in the meantime.
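Until then, a small guard like the one below can warn you when the host has booted a stock kernel without the quirk, for example after an update. It is only a sketch: "-asm1164" is a hypothetical local version suffix that you would have to give your own build, and the function name is mine.

```shell
#!/bin/sh
# is_custom_kernel RELEASE MARKER
# Succeeds if the kernel release string contains the marker of the patched build.
is_custom_kernel() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# "-asm1164" is a hypothetical suffix; use whatever identifies your build.
if is_custom_kernel "$(uname -r)" "-asm1164"; then
    echo "running patched kernel $(uname -r)"
else
    echo "WARNING: running $(uname -r), which does not look like the patched build" >&2
fi

# To keep booting the patched build across updates (this assumes the host's
# boot entries are managed by proxmox-boot-tool):
#   proxmox-boot-tool kernel list
#   proxmox-boot-tool kernel pin <version-of-your-patched-kernel>
```

Pinning only controls which installed kernel gets booted. Each newer kernel version still has to be rebuilt with the quirk before you switch to it.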