[SOLVED] Passthrough of MCIO based SATA controller not working. MSIX PBA outside of specified BAR

scyto

Well-Known Member
Aug 8, 2023
I have an AMD Epyc 9155 Turin CPU on an ASROCK GENOAD8UD-2T/X550 motherboard.

Two MCIO ports can be configured for SATA mode (each port is then seen as two 4 lane SATA controllers).

When passed through to a VM, the VM would not start, failing with:

Code:
error writing '1' to '/sys/bus/pci/devices/0000:42:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:42:00.0', but trying to continue as not all devices need a reset
kvm: -device vfio-pci,host=0000:42:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio 0000:42:00.0: hardware reports invalid configuration, MSIX PBA outside of specified BAR
TASK ERROR: start failed: QEMU exited with code 1

This occurred for both 42:00.0 and 42:00.1.

This is likely a bug caused by the device/firmware on the motherboard advertising invalid MSI-X PBA settings, as described in this QEMU bug comment (https://bugs.launchpad.net/qemu/+bug/1894869/comments/4)

and in this thread: https://forum.proxmox.com/threads/pci-passtrough-a100.143838/
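
You can see the layout QEMU is objecting to on the host with lspci: compare the BAR and offset the PBA claims against the Region sizes reported for the device. 42:00.0 is my SATA function, so substitute your own address:

Bash:
lspci -s 42:00.0 -vv | grep -E 'Region|MSI-X|Vector table|PBA'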

Amending my VM conf file to add the following was the solution:

Code:
args: -set device.hostpci0.x-msix-relocation=bar2 -set device.hostpci1.x-msix-relocation=bar2

where the two SATA devices were defined in the conf like this:
Code:
hostpci0: 0000:42:00.0,pcie=1
hostpci1: 0000:42:00.1,pcie=1
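
To sanity-check that the extra args actually end up on the QEMU command line before booting the guest, qm showcmd works (VM ID 100 is just an example here):

Bash:
qm showcmd 100 --pretty | grep -i msix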

I am logging this in case anyone else hits this, and in case I ever forget it and do a Google search later :-)
 
I literally forgot I had hit this and fixed it; I found this thread by searching when I hit the same issue an hour or so ago!
(The Feb TrueNAS virtualization blew up shortly after that post, IIRC.)

So, to make this an additive reply:

It is also important that devices like 42:00.0 and 42:00.1 are added as discrete devices; do not use 42:00 and choose to pass all functions, as that does not work on this motherboard.
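
In other words, each function gets its own hostpciN entry. The commented-out line shows the all-functions form that did not work for me on this board:

Code:
# each function passed as its own discrete device (works)
hostpci0: 0000:42:00.0,pcie=1
hostpci1: 0000:42:00.1,pcie=1
# passing 42:00 with 'All Functions' did NOT work on this board
#hostpci0: 0000:42:00,pcie=1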

Once I had done this <grin>

1746581309339.png
 
Another note to myself and others on how Proxmox seems to boot with no blacklisting, and why it's dangerous to do that with ZFS:
  1. First the system starts booting.
  2. It enumerates all the disks and loads all the normal drivers.
  3. All ZFS processes start; ZFS looks at the pools and chooses not to import a pool unless this instance of Proxmox has managed the pool before, or the pool is in an exported state.
  4. The system continues to boot.
  5. Once the zfs / ceph / network targets are complete, the Proxmox cluster services start.
  6. At this point Proxmox tells vfio-pci to take over as the driver for each passed-through device, resetting the device, effectively unloading the nvme / sata (ahci) driver for it and loading vfio-pci instead.
This means there is absolutely a window during boot where ZFS on the host can snatch / affect metadata on a ZFS pool. This should rarely happen, and should always be avoidable so long as you never put the TrueNAS VM's pools into an exported state; but if you mistakenly do that and reboot, you will be hosed. Worse still, I don't know what happens if Proxmox imports a pool and then resets the device driver on all the disks because the devices are passed through, but I posit it is why I and others have had weird corruption issues.

tl;dr: never export a pool in a TrueNAS VM and then reboot the host without having re-imported the pool in the VM first.
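
A quick way to check after a host reboot (before starting the VM) that the host has not imported any of the VM's pools, and that the controllers are already bound to vfio-pci; the addresses are from my setup:

Bash:
zpool list
lspci -nnk -s 42:00.0 | grep 'Kernel driver in use'
lspci -nnk -s 42:00.1 | grep 'Kernel driver in use'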

Blacklisting device IDs is a VERY good idea, and is best done with one of the methods documented in the wiki. However, this doesn't work if you have devices with the same ID and want to pass some through while leaving others for the host (in my case, two pairs of Kingston DC2000Bs). The only way I have found to block by PCI address (bus:device.function) is this: scyto/virtio-fs-detection-and-exlusion. I warn folks, this is a last resort, as you can break the initramfs and leave yourself unbootable, but it is the only 100% guaranteed way to ensure the ZFS module on the host never sees the pools on the passed-through devices.
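
For reference, a minimal sketch of the core mechanism that kind of script relies on: unbind the device from its current driver and set a driver_override so only vfio-pci can claim it. The addresses below are examples from my host, and the real script linked above does more (initramfs integration, logging, error handling):

Bash:
#!/bin/sh
# Sketch only: bind specific PCI devices (by bus address) to vfio-pci
# before the host's nvme/ahci drivers (and therefore ZFS) can touch them.
DEVICES="0000:42:00.0 0000:42:00.1"

modprobe vfio-pci

for dev in $DEVICES; do
    sys="/sys/bus/pci/devices/$dev"
    [ -e "$sys" ] || continue
    # unbind from whatever driver currently owns the device (ahci, nvme, ...)
    [ -e "$sys/driver" ] && echo "$dev" > "$sys/driver/unbind"
    # only vfio-pci may bind this device from now on
    echo vfio-pci > "$sys/driver_override"
    # ask the driver core to re-probe so vfio-pci picks it up
    echo "$dev" > /sys/bus/pci/drivers_probe
done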

Basically this (where ZFS is allowed to scan disks that it shouldn't be scanning) is the default behaviour when an NVMe drive or HBA is passed through to a VM without blacklisting:

Bash:
May 09 19:04:06 pve-nas1 systemd[1]: Finished ifupdown2-pre.service - Helper to synchronize boot up for ifupdown.
May 09 19:04:06 pve-nas1 systemd[1]: Finished systemd-udev-settle.service - Wait for udev To Complete Device Initialization.
May 09 19:04:06 pve-nas1 systemd[1]: zfs-import-cache.service - Import ZFS pools by cache file was skipped because of an unmet condition check (ConditionFileNotEmpty=/etc/zfs/zpool.cache).
May 09 19:04:06 pve-nas1 systemd[1]: Starting zfs-import-scan.service - Import ZFS pools by device scanning...
May 09 19:04:06 pve-nas1 zpool[1777]: cannot import 'Fast': pool was previously in use from another system.
May 09 19:04:06 pve-nas1 zpool[1777]: Last accessed by pve-nas1 (hostid=58a1c56c) at Fri May  9 16:12:17 2025
May 09 19:04:06 pve-nas1 zpool[1777]: The pool can be imported, use 'zpool import -f' to import the pool.
May 09 19:04:06 pve-nas1 zpool[1777]: cannot import 'Rust': pool was previously in use from another system.
May 09 19:04:06 pve-nas1 zpool[1777]: Last accessed by pve-nas1 (hostid=58a1c56c) at Fri May  9 16:12:17 2025
May 09 19:04:06 pve-nas1 zpool[1777]: The pool can be imported, use 'zpool import -f' to import the pool.
May 09 19:04:06 pve-nas1 systemd[1]: zfs-import-scan.service: Main process exited, code=exited, status=1/FAILURE
May 09 19:04:06 pve-nas1 systemd[1]: zfs-import-scan.service: Failed with result 'exit-code'.
May 09 19:04:06 pve-nas1 systemd[1]: Failed to start zfs-import-scan.service - Import ZFS pools by device scanning.
May 09 19:04:06 pve-nas1 systemd[1]: Reached target zfs-import.target - ZFS pool import target.
May 09 19:04:06 pve-nas1 systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
May 09 19:04:06 pve-nas1 systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
May 09 19:04:06 pve-nas1 zvol_wait[2125]: Testing 3 zvol links
May 09 19:04:06 pve-nas1 zvol_wait[2125]: All zvol links are now present.
May 09 19:04:06 pve-nas1 systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
May 09 19:04:06 pve-nas1 systemd[1]: Reached target zfs-volumes.target - ZFS volumes are ready.
May 09 19:04:06 pve-nas1 systemd[1]: Finished zfs-mount.service - Mount ZFS filesystems.


vs

this, where a script runs in the initramfs to override the driver much earlier ('unknow' is the log tag used by my script):

Code:
May 09 20:06:41 pve-nas1 unknow:  vfio-pci-bind starting
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:05:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:05:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:06:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:06:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:07:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:07:00.0
May 09 20:06:41 pve-nas1 kernel: sd 19:0:0:0: [sda] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: ata20.00: Entering standby power mode
May 09 20:06:41 pve-nas1 kernel: sd 20:0:0:0: [sdb] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: sd 21:0:0:0: [sdc] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: sd 22:0:0:0: [sdd] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: ata23.00: Entering standby power mode
May 09 20:06:41 pve-nas1 kernel: sd 23:0:0:0: [sde] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: ata24.00: Entering standby power mode
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:42:00.0 from ahci
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:42:00.0
May 09 20:06:41 pve-nas1 kernel: sd 29:0:0:0: [sdf] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: ata30.00: Entering standby power mode
May 09 20:06:41 pve-nas1 kernel: sd 30:0:0:0: [sdg] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: ata31.00: Entering standby power mode
May 09 20:06:41 pve-nas1 kernel: sd 31:0:0:0: [sdh] Synchronizing SCSI cache
May 09 20:06:41 pve-nas1 kernel: ata32.00: Entering standby power mode
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:42:00.1 from ahci
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:42:00.1
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:a1:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:a1:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:a3:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:a3:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:a5:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:a5:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:a7:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:a7:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:e1:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:e1:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:e2:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:e2:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:e3:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:e3:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:e4:00.0 from nvme
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:e4:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:e6:00.0 from ahci
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:e6:00.0
May 09 20:06:41 pve-nas1 unknow:  Unbound 0000:e6:00.1 from ahci
May 09 20:06:41 pve-nas1 unknow:  Set override to vfio-pci for 0000:e6:00.1
May 09 20:06:41 pve-nas1 kernel: VFIO - User Level meta-driver version: 0.3
May 09 20:06:41 pve-nas1 unknow: ✅ vfio-pci modprobe completed
May 09 20:06:41 pve-nas1 kernel: Btrfs loaded, zoned=yes, fsverity=yes
May 09 20:06:41 pve-nas1 kernel: spl: loading out-of-tree module taints kernel.
May 09 20:06:41 pve-nas1 kernel: zfs: module license 'CDDL' taints kernel.
May 09 20:06:41 pve-nas1 kernel: Disabling lock debugging due to kernel taint
May 09 20:06:41 pve-nas1 kernel: zfs: module license taints kernel.
May 09 20:06:41 pve-nas1 kernel: ZFS: Loaded module v2.2.7-pve2, ZFS pool version 5000, ZFS filesystem version 5

<cut>
May 09 20:07:24 pve-nas1 systemd[1]: Finished ifupdown2-pre.service - Helper to synchronize boot up for ifupdown.
May 09 20:07:24 pve-nas1 systemd[1]: Finished systemd-udev-settle.service - Wait for udev To Complete Device Initialization.
May 09 20:07:24 pve-nas1 systemd[1]: zfs-import-cache.service - Import ZFS pools by cache file was skipped because of an unmet condition check (ConditionFileNotEmpty=/etc/zfs/zpool.cache).
May 09 20:07:24 pve-nas1 systemd[1]: Starting zfs-import-scan.service - Import ZFS pools by device scanning...
May 09 20:07:24 pve-nas1 zpool[1463]: no pools available to import
May 09 20:07:24 pve-nas1 systemd[1]: Finished zfs-import-scan.service - Import ZFS pools by device scanning.
May 09 20:07:24 pve-nas1 systemd[1]: Reached target zfs-import.target - ZFS pool import target.
May 09 20:07:24 pve-nas1 systemd[1]: Starting zfs-mount.service - Mount ZFS filesystems...
May 09 20:07:24 pve-nas1 systemd[1]: Starting zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev...
May 09 20:07:24 pve-nas1 zvol_wait[1510]: Testing 3 zvol links
May 09 20:07:24 pve-nas1 zvol_wait[1510]: All zvol links are now present.
May 09 20:07:24 pve-nas1 systemd[1]: Finished zfs-volume-wait.service - Wait for ZFS Volume (zvol) links in /dev.
May 09 20:07:24 pve-nas1 systemd[1]: Reached target zfs-volumes.target - ZFS volumes ar
 