PCI Passthrough Selection with Identical Devices

radensb

PCI Passthrough has been discussed many times, but I cannot find information on my use case. I got PCI passthrough to work just fine on my previous system when passing an HBA to my OpenMediaVault VM with PVE 5.3.

On my current system, I installed PVE 6.1 and hit a snag. I now have two of the same HBA controllers in the system. Both are good ol' SAS2008 LSI controllers that use the same driver and have the same vendor IDs. The only way to distinguish them is by their PCI bus IDs. One of the SAS2008 controllers holds the Proxmox boot and vmfs drives. The other is passed to OpenMediaVault. In my initial attempt, I went through the PCI Passthrough How To, as I did previously, and blacklisted the driver for the device... OOPS. There went access to my boot drives! After removing the driver blacklist via a LiveCD, I gave PCI passthrough to OMV a try, and it worked! The VM booted and I had access to the drives on the passed HBA.

However, I got disconcerting messages on the server's login screen (log messages?) indicating that drives from the Linux RAID on the passed HBA were failing and that the RAID was operating with a reduced drive count. I got a few of these errors back-to-back, with the available drive count in the RAID decreasing by one with each message. I confirmed that with the OMV VM off, Proxmox can see drives from both HBA controllers, and as soon as I boot OMV, the drives on the passed HBA controller are no longer visible in Proxmox. My guess is that when the OMV VM boots and takes control of the passed HBA and its attached drives, Proxmox sees the disappearance of the drives in the RAID as some kind of failure? An uncomfortable scenario.

Since blacklisting the driver is not an option (I need to be able to boot), is there any other way to tell Proxmox to ignore a device based on its PCI bus ID (address?) so that it never sees the HBA being passed in the first place? That seems like the way this should be configured, and it is how my old system effectively operated: the HBA for the boot drives was a different model than the HBA passed to OMV, so the driver blacklist was not a problem then. Or is there no real concern in stripping the drives from Proxmox when the HBA is passed through, and can I safely ignore the error/warning messages since I know (or think I know) what is happening? Since the OMV VM is set to boot automatically and the server is not restarted often in normal use, this situation doesn't come up often. However, if it could be destructive in some way to the drives on the passed HBA, I would like to correct it.

Appreciate the help in understanding this better! Thanks!
 
this is a tricky situation, and solving it requires quite a bit of linux knowledge...
basically what you need is to execute a script in the initramfs which overrides the driver for specific devices in sysfs before the vfio-pci module is loaded
this way that device gets bound to vfio-pci and the other one to the regular driver

there is an example/guide here: http://vfio.blogspot.com/2015/05/vfio-gpu-how-to-series-part-3-host.html
although i never needed to do that myself, and the blog is about centos/fedora, so they use dracut instead of update-initramfs etc..
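the core of it is really just writing to the device's driver_override file and then loading vfio-pci, something along these lines (the address is only an example, take yours from lspci):
Code:
# example only - substitute the address of the HBA you want to pass through
echo vfio-pci > /sys/bus/pci/devices/0000:01:00.0/driver_override
modprobe -i vfio-pci
the whole trick is getting this to run early enough (hence initramfs), before the normal storage driver has bound the device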
 
@dcsapak
Thanks for the response. I am not super familiar with Linux, but a good buddy of mine is, and between him and the link provided, I have gotten this far.
Note: it doesn't work..., but I think it's close. Any input or suggestions (especially anything that may be PVE-specific) are greatly appreciated.

Process:
Find device:
Code:
#lspci

01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

#find /sys/devices/pci* | grep 01:00.0

...

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/…

...
Make vfio-pci load first:
Code:
#nano /etc/default/grub
      GRUB_CMDLINE_LINUX="rd.driver.pre=vfio-pci" # modify this line
#update-grub
Add the install option to iommu.conf (create or modify file)
Code:
#nano /etc/modprobe.d/iommu.conf
      options vfio_iommu_type1 allow_unsafe_interrupts=1
      install vfio-pci /sbin/vfio-pci-override-sas2008-addr1.sh
Add override to initramfs via script:
Code:
#nano /etc/initramfs-tools/hooks/vfio-pci-override-sas2008-addr1-hook.sh
      #!/bin/sh -e
      PREREQS=""
      case $1 in
      prereqs) echo "${PREREQS}"; exit 0;;
      esac
      . /usr/share/initramfs-tools/hook-functions
      copy_exec /sbin/vfio-pci-override-sas2008-addr1.sh /sbin
#chmod 755 vfio-pci-override-sas2008-addr1-hook.sh
Add modules to initramfs to make it the same as /etc/modules? (not sure if this is even needed??)
Code:
#nano /etc/initramfs-tools/modules
      vfio_pci
      vfio
      vfio_iommu_type1
      vfio_virqfd
Make override script
Code:
#nano /sbin/vfio-pci-override-sas2008-addr1.sh
      #!/bin/sh -e
      echo "vfio-pci"  >
      /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/driver_override
      modprobe -i vfio-pci
#chmod 755 vfio-pci-override-sas2008-addr1.sh
#update-initramfs -u -k all

Then, check that the file is there:
Code:
#lsinitramfs -l /boot/initrd.img-5.3.13-1-pve | grep sas2008
-rwxr-xr-x 1 root root 196 Jan 10 15:55 usr/sbin/vfio-pci-override-sas2008-addr1.sh (YAY!!!)

#reboot

#lspci | grep 01.00
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02) --dammit, still there!

EDIT:
I was informed that even if this was working, I would still see the device with lspci. So, I am checking with fdisk -l and making sure that the disk on that HBA is not listed. Currently, it still is. I have, however, confirmed that my scripts are running.

#device that I am trying to pass:
root@pve:/etc/initramfs-tools/hooks# cat /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/driver_override
vfio-pci

#device that the host needs:
root@pve:~# cat /sys/devices/pci0000:00/0000:00:09.0/0000:04:00.0/driver_override
(null)

So clearly, the scripts are running. However, I still see the disk on that HBA and running:
Code:
root@pve:~#  lspci -s 01:00.0 -v | grep driver
        Kernel driver in use: mpt3sas
indicates that the device I am trying to pass is still using the kernel driver.
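
(Side note: in principle the override can also be exercised at runtime, without a reboot, by unbinding the device from mpt3sas and re-probing it. Untested sketch, using my device's address; only safe if nothing on the host is using the disks behind this HBA:)
Code:
# untested sketch: detach 01:00.0 from mpt3sas and let driver_override pick vfio-pci on re-probe
modprobe vfio-pci
echo 0000:01:00.0 > /sys/bus/pci/drivers/mpt3sas/unbind
echo vfio-pci > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/driver_override
echo 0000:01:00.0 > /sys/bus/pci/drivers_probe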
 
I figured it out.

Turns out the process can be simplified a great deal by adding your bind script to:
Code:
/etc/initramfs-tools/scripts/init-top/
init-top: the scripts in this directory are the first scripts to be executed after
sysfs and procfs have been mounted. It also runs the udev hook for populating the
/dev tree (udev will keep running until init-bottom).

The process I followed:

Find desired device

Code:
# lspci
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

# find /sys/devices/pci* | grep 01:00.0
...
/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/…
...

Make bind script
Code:
# nano /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
      #!/bin/sh -e
      echo "vfio-pci"  > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/driver_override

      modprobe -i vfio-pci

# chmod 755 bind_vfio.sh
# chown root:root /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
# update-initramfs -u -k all
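
A quick way to double-check that the script actually made it into the image (same kind of check as in my earlier post, adjust the kernel version to yours):
Code:
# lsinitramfs /boot/initrd.img-$(uname -r) | grep bind_vfio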

Add vfio module to initramfs
Code:
# nano /etc/initramfs-tools/modules
    vfio-pci

# update-initramfs -u -k all
# reboot

Check
root@pve:~# lspci -s 01:00.0 -v
01:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)
Subsystem: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
...
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas

root@pve:~# lspci -s 04:00.0 -v
04:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
Subsystem: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
...
Kernel driver in use: mpt3sas
Kernel modules: mpt3sas

Profit.

So we can see that the SAS2008 at 01:00.0 is assigned the vfio-pci driver. While I can still see the device with lspci, PVE cannot see any of the drives attached to it. The SAS2008 at 04:00.0 is assigned the mpt3sas driver, and all the drives attached to it (my boot and vmfs drives!) are available in PVE. Following the PVE published directions to get passthrough running now results in the desired passthrough operation to the VM.
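
For completeness, the passthrough itself is then just the usual PVE step of adding the device to the VM config, e.g. (VMID 100 is only a placeholder, use your OMV VM's ID):
Code:
# qm set 100 -hostpci0 01:00.0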

Interesting (infuriating) Note: The name of the bind script seems to matter (WHY!?). My original script (vfio-pci-override-sas2008-addr1.sh) contained the exact same contents as the bind_vfio.sh script I refer to above. However, when the script was named vfio-pci-override-sas2008-addr1.sh, it didn't work, even though I confirmed that it did run; I would get mpt3sas loaded instead of vfio-pci as the kernel driver. I then shortened the name to vfio_addr1.sh, but THAT DIDN'T WORK EITHER!? I recalled seeing bind_vfio.sh written somewhere, so I tried that on a whim - and it worked. Was it because the others had numbers in them? Who knows... but bind_vfio.sh works. Crazy...
 
@radensb: You are my hero!! I was fighting this the whole weekend, but I wasn't paying close attention to your infuriating note at the end of your post regarding the name of the script. Once again for anybody reading this, it should be named exactly like this:

bind_vfio.sh

Funny thing is that although the second LSI SAS card in my system was still using the mpt3sas driver after doing all the configuration (with the exception of the script name), the vfio-pci driver did get loaded once I turned on the related VM. Anyhow, now everything is spot on in my Proxmox installation.

It would be good to put this information in the wiki/Proxmox documentation. Again, many thanks to radensb :)
 
IMPORTANT: see EDIT #3 as to why this script doesn't work.

I modified the script from @radensb a bit to make it more dynamic, support several devices, and keep the configuration separate from the script.

My main motivation is to install the script using Saltstack on every PVE host. Then on each PVE host I will define the host-specific PCI devices to be passed through to the guest(s).

/etc/initramfs-tools/scripts/init-top/bind_vfio.sh
Bash:
#!/bin/bash

# Configuration file
config="/etc/default/vfio-pci-devices"

if [[ -f "$config" ]]; then
    # Load configuration
    echo "Loading VFIO_PCI configuration from $config."
    source $config
else
    # Configuration file doesn't exist
    echo "No configuration $config exists."
    echo "No driver will be overridden."
fi

# Check if array is empty
if [[ -z "${VFIO_PCI_DEVICES}" ]]; then
   echo "No device specified in $config"
   echo "No driver will be overridden"
else
   # Device address found by lspci
   for device in "${VFIO_PCI_DEVICES[@]}"
   do
      # Need to swap the :00 part with what is in front
      # e.g.03:00.0 -> 00:03.0
      device_swapped=$(echo $device | sed  -e 's#\(..\)\:\(..\).\(.\)#\2:\1\.\3#g')

      # Define target
      target="/sys/devices/pci0000:00/0000:${device_swapped}/driver_override"

      # Echo
      echo "Setting up PCI device $device to use VFIO_PCI driver"
      echo "   Setting configuration in ${target}."

      # Override default driver with vfio-pci
      # Allows the device to be ignored by the host and be passed-through to the guest
      echo "vfio-pci"  > "${target}"
    done
fi

# (Re)load module
modprobe -i vfio-pci

/etc/default/vfio-pci-devices
Bash:
#!/bin/bash

# Pass devices to guest
VFIO_PCI_DEVICES=("03:00.0")

# Don't pass any devices to guest
#VFIO_PCI_DEVICES=()

I use only bash in my environment and don't have much experience with other shells.
Feel free to adapt ;).


EDIT #1: Even though the script is clearly in the initramfs, it's not being executed for whatever reason, since the mpt3sas driver is loaded for both cards (one of which I want for the host, one for the guest).


lsinitramfs -l /boot/initrd.img-5.3.18-3-pve | grep vfio
Bash:
-rw-r--r--   1 root     root           51 May  1 09:35 etc/modprobe.d/vfio.conf
-rwxr-xr-x   1 root     root         1196 May  1 10:28 scripts/init-top/bind_vfio.sh
drwxr-xr-x   3 root     root            0 May  1 10:43 usr/lib/modules/5.3.18-3-pve/kernel/drivers/vfio
drwxr-xr-x   2 root     root            0 May  1 10:43 usr/lib/modules/5.3.18-3-pve/kernel/drivers/vfio/pci
-rw-r--r--   1 root     root        89464 Mar 17 16:33 usr/lib/modules/5.3.18-3-pve/kernel/drivers/vfio/pci/vfio-pci.ko
-rw-r--r--   1 root     root        57736 Mar 17 16:33 usr/lib/modules/5.3.18-3-pve/kernel/drivers/vfio/vfio.ko
-rw-r--r--   1 root     root        36848 Mar 17 16:33 usr/lib/modules/5.3.18-3-pve/kernel/drivers/vfio/vfio_iommu_type1.ko
-rw-r--r--   1 root     root        11064 Mar 17 16:33 usr/lib/modules/5.3.18-3-pve/kernel/drivers/vfio/vfio_virqfd.ko

I guess that maybe bash isn't available in the early phase of the startup sequence?

EDIT #2: I guess the problem is actually also the configuration file. Nothing external to the script can be used :(.

EDIT #3: confirmed two problems: one is the wrong PCI address (you don't need to swap the two numbers), the other is that most likely neither grep nor sed can be used in the pre-boot shell environment. Therefore I will just go back to the solution proposed by @radensb, write the addresses from Saltstack directly into a template, and do the for loop there. It's a bit of a shame, but it appears to be the only way.
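
For anyone who still wants the loop inside the initramfs script itself, a plain /bin/sh variant with the full addresses hardcoded (no external config file, no bash/sed/grep) should avoid both problems. Untested sketch, the addresses are placeholders:
Code:
#!/bin/sh -e
# untested sketch: full PCI addresses hardcoded, nothing but sh built-ins and modprobe needed

VFIO_PCI_DEVICES="0000:01:00.0 0000:04:00.0"

for addr in $VFIO_PCI_DEVICES; do
    echo "Overriding driver for $addr with vfio-pci"
    echo vfio-pci > "/sys/bus/pci/devices/$addr/driver_override"
done

modprobe -i vfio-pci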
 
@radensb

Thank you, I passed through two SAS HBAs in IT mode with firmware ROM, one for my Proxmox machine, one for a VM itself.
When I set those up a few weeks ago, everything went fine; they had their own device IDs.
After I changed the SSDs for bigger ones and had to do some fiddling to get them working properly, the device IDs of the Dell H310 and the H210 apparently got set to their LSI 9211-8i firmware or something?
Not immediately, but after a few reboots and a bit of time without power.

The device ID in VFIO changed to something completely different, which was now the same on both cards.
lspci -nn shows me following output:
04:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
05:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)

The steps in your post work fine.
I thought I'd add this for anyone else who is googling strange passthrough issues, since it did take me a while to find this post.

Some of the keywords I have been searching with, since the results apparently only cover GPUs?
Maybe this will help out someone else as well.

pci passthrough identical device id
vfio identical device id
vfio identical cards
pci stub
 
