To Blacklist or Not to Blacklist...

I am glad swapping the GPUs did the trick. I have found that the first PCIe slot has issues and that moving the card to another slot solves them. If you have a third PCIe slot, you might consider putting the Quadro there.

I would just create a new post and put it all there. It would be immensely helpful to the community.
 
If you have any x8 slot, use it. I don't think you are giving up that much on the Quadro, but I could be wrong.
 
Another way is to early-bind the device to vfio-pci (see the documentation). However, you need to make sure vfio-pci is loaded before the actual driver, which may require a softdep (unfortunately not covered in the documentation). This will still affect all identical devices (if you have two NICs with the same vendor:device ID, for example) but not other devices that merely use the same driver.
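For reference, early binding plus the softdep mentioned above could look something like this in a file such as `/etc/modprobe.d/vfio.conf`. This is only a sketch: the vendor:device IDs and driver names below are placeholders, so substitute your own values from `lspci -nn`:

```
# Hypothetical /etc/modprobe.d/vfio.conf -- the IDs below are examples only.
# Tell vfio-pci to claim these vendor:device IDs when it loads.
options vfio-pci ids=10de:1bb1,10de:10f0

# Ensure vfio-pci is loaded before the regular drivers get a chance to bind.
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
```

On Proxmox/Debian, run `update-initramfs -u` afterwards so the change is picked up in the initramfs at the next boot.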
Nice clarification on the sequencing of events.

I am doing a lot of NVMe passthrough, and the issue in bold is definitely a problem. I wish we could blacklist by IOMMU group, as that would fix it. I was also looking through the logs at startup and noticed that without blacklisting, ZFS starts before the cluster services reset the device, giving a race condition where ZFS will import the VM's ZFS pool if a) Proxmox thinks it ever managed it, or b) it is marked as exported.

I have found the only way to mitigate this was to create a script that runs as part of initramfs (via the init-top script handler) to make vfio-pci load as the driver as early as possible. Of course, messing this deeply in initramfs is, umm, 'fun' :-)
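A minimal sketch of what such an init-top hook might look like, assuming initramfs-tools (as on Proxmox/Debian). The file name and the PCI address are placeholders; the script would go in `/etc/initramfs-tools/scripts/init-top/`, be made executable, and be baked in with `update-initramfs -u`:

```shell
#!/bin/sh
# Hypothetical /etc/initramfs-tools/scripts/init-top/vfio_bind
# PREREQ boilerplate required by the initramfs-tools script framework.
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac

# Placeholder PCI address -- substitute the NVMe device you pass through.
DEV="0000:01:00.0"

# Load vfio-pci as early as possible, before nvme can claim the device.
# Guarded so the script degrades gracefully if the module is unavailable.
modprobe vfio-pci 2>/dev/null || true

# driver_override restricts which driver may bind this one device, so it
# only affects this PCI address rather than every device with the same ID.
if [ -w "/sys/bus/pci/devices/$DEV/driver_override" ]; then
    echo vfio-pci > "/sys/bus/pci/devices/$DEV/driver_override"
    # Ask the kernel to (re)probe the device so vfio-pci picks it up now.
    echo "$DEV" > /sys/bus/pci/drivers_probe 2>/dev/null || true
fi

echo "vfio_bind: done"
```

Because the binding steps are guarded, the script is safe to run even when the device is absent; it simply does nothing and lets boot continue.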