PCI passthrough (qla2xxx) crashes host

moxmox
Aug 14, 2019
I am trying to pass my QLogic Corp. ISP2432-based 4Gb Fibre Channel card through to a CentOS 7 guest. I have PCI passthrough set up and I can see the qla device in CentOS, so the passthrough itself seems to be working.

However, when the CentOS guest starts up and initialises the Fibre Channel target, the whole host crashes.

I have blacklisted the qla2xxx module on the host and have confirmed in the host's dmesg that it's not using the device directly.
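For reference, the blacklist is just the usual modprobe.d entry plus an initramfs rebuild, something like this (the file name is arbitrary):

Code:
# /etc/modprobe.d/blacklist-qla2xxx.conf
blacklist qla2xxx

# rebuild the initramfs so the blacklist applies at boot
update-initramfs -u -k all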

I have also tried the other method of stopping the host from using the card:

Code:
options vfio-pci ids=1234:5678

Anything else I can do?

One thing to note is that I had to run this:

Code:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

because without it the guest would not start after adding the PCI card.

It seemed like I shouldn't need to add that, since the result of the script from https://pve.proxmox.com/wiki/Pci_passthrough ("Alternatively, run the following script to determine if your system has interrupt remapping support:") was OK.
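I don't have the exact script in front of me, but the equivalent quick check is something like this - lines such as "DMAR-IR: Enabled IRQ remapping in x2apic mode" (Intel) or "AMD-Vi: Interrupt remapping enabled" would indicate support:

Code:
dmesg | grep 'remapping'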

Also, the dmesg message here suggested that I should not have to allow unsafe interrupts?

Code:
[    1.816549] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap c90780106f0462 ecap f020fe
 
Quick response :)

Code:
05:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:2432] (rev 03)
05:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:2432] (rev 03)

They are in group 16, which contains only that device:

Code:
root@r710-pve:~#  find /sys/kernel/iommu_groups/ -type l | grep 16
/sys/kernel/iommu_groups/16/devices/0000:05:00.1
/sys/kernel/iommu_groups/16/devices/0000:05:00.0

Full list of IOMMU groups:

Code:
/sys/kernel/iommu_groups/17/devices/0000:fe:00.1
/sys/kernel/iommu_groups/17/devices/0000:fe:00.0
/sys/kernel/iommu_groups/7/devices/0000:00:09.0
/sys/kernel/iommu_groups/25/devices/0000:ff:03.4
/sys/kernel/iommu_groups/25/devices/0000:ff:03.2
/sys/kernel/iommu_groups/25/devices/0000:ff:03.0
/sys/kernel/iommu_groups/25/devices/0000:ff:03.1
/sys/kernel/iommu_groups/15/devices/0000:03:00.0
/sys/kernel/iommu_groups/5/devices/0000:00:06.0
/sys/kernel/iommu_groups/23/devices/0000:ff:00.0
/sys/kernel/iommu_groups/23/devices/0000:ff:00.1
/sys/kernel/iommu_groups/13/devices/0000:01:00.0
/sys/kernel/iommu_groups/13/devices/0000:01:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:04.0
/sys/kernel/iommu_groups/21/devices/0000:fe:05.0
/sys/kernel/iommu_groups/21/devices/0000:fe:05.3
/sys/kernel/iommu_groups/21/devices/0000:fe:05.1
/sys/kernel/iommu_groups/21/devices/0000:fe:05.2
/sys/kernel/iommu_groups/11/devices/0000:08:03.0
/sys/kernel/iommu_groups/11/devices/0000:00:1e.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/28/devices/0000:ff:06.1
/sys/kernel/iommu_groups/28/devices/0000:ff:06.2
/sys/kernel/iommu_groups/28/devices/0000:ff:06.0
/sys/kernel/iommu_groups/28/devices/0000:ff:06.3
/sys/kernel/iommu_groups/18/devices/0000:fe:02.5
/sys/kernel/iommu_groups/18/devices/0000:fe:02.3
/sys/kernel/iommu_groups/18/devices/0000:fe:02.1
/sys/kernel/iommu_groups/18/devices/0000:fe:02.4
/sys/kernel/iommu_groups/18/devices/0000:fe:02.2
/sys/kernel/iommu_groups/18/devices/0000:fe:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:14.1
/sys/kernel/iommu_groups/8/devices/0000:00:14.2
/sys/kernel/iommu_groups/8/devices/0000:00:14.0
/sys/kernel/iommu_groups/26/devices/0000:ff:04.2
/sys/kernel/iommu_groups/26/devices/0000:ff:04.0
/sys/kernel/iommu_groups/26/devices/0000:ff:04.3
/sys/kernel/iommu_groups/26/devices/0000:ff:04.1
/sys/kernel/iommu_groups/16/devices/0000:05:00.1
/sys/kernel/iommu_groups/16/devices/0000:05:00.0
/sys/kernel/iommu_groups/6/devices/0000:00:07.0
/sys/kernel/iommu_groups/24/devices/0000:ff:02.5
/sys/kernel/iommu_groups/24/devices/0000:ff:02.3
/sys/kernel/iommu_groups/24/devices/0000:ff:02.1
/sys/kernel/iommu_groups/24/devices/0000:ff:02.4
/sys/kernel/iommu_groups/24/devices/0000:ff:02.2
/sys/kernel/iommu_groups/24/devices/0000:ff:02.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.0
/sys/kernel/iommu_groups/14/devices/0000:02:00.1
/sys/kernel/iommu_groups/4/devices/0000:00:05.0
/sys/kernel/iommu_groups/22/devices/0000:fe:06.3
/sys/kernel/iommu_groups/22/devices/0000:fe:06.1
/sys/kernel/iommu_groups/22/devices/0000:fe:06.2
/sys/kernel/iommu_groups/22/devices/0000:fe:06.0
/sys/kernel/iommu_groups/12/devices/0000:00:1f.2
/sys/kernel/iommu_groups/12/devices/0000:00:1f.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/20/devices/0000:fe:04.2
/sys/kernel/iommu_groups/20/devices/0000:fe:04.0
/sys/kernel/iommu_groups/20/devices/0000:fe:04.3
/sys/kernel/iommu_groups/20/devices/0000:fe:04.1
/sys/kernel/iommu_groups/10/devices/0000:00:1d.1
/sys/kernel/iommu_groups/10/devices/0000:00:1d.0
/sys/kernel/iommu_groups/10/devices/0000:00:1d.7
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/19/devices/0000:fe:03.1
/sys/kernel/iommu_groups/19/devices/0000:fe:03.4
/sys/kernel/iommu_groups/19/devices/0000:fe:03.2
/sys/kernel/iommu_groups/19/devices/0000:fe:03.0
/sys/kernel/iommu_groups/9/devices/0000:00:1a.1
/sys/kernel/iommu_groups/9/devices/0000:00:1a.0
/sys/kernel/iommu_groups/9/devices/0000:00:1a.7
/sys/kernel/iommu_groups/27/devices/0000:ff:05.3
/sys/kernel/iommu_groups/27/devices/0000:ff:05.1
/sys/kernel/iommu_groups/27/devices/0000:ff:05.2
/sys/kernel/iommu_groups/27/devices/0000:ff:05.0
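(For readability, the same information with device names can be printed with a small loop, roughly like this:)

Code:
#!/bin/bash
# list every IOMMU group with the lspci description of its devices
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done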
 
options vfio-pci ids=1234:5678
Just to be on the safe side - you need to add the actual device ID here (not literally 1234:5678). The `lspci -nnk` output needs to show that the driver in use for the FC card is vfio.
 
Yes, I just copied that from the help.

In my config file (/etc/modprobe.d/vfio-pci.conf) I actually have:

Code:
options vfio-pci ids=1077:2432

Code:
lspci -nnk

05:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:2432] (rev 03)

        Subsystem: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:0138]

        Kernel modules: qla2xxx

05:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:2432] (rev 03)

        Subsystem: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:0138]

        Kernel modules: qla2xxx

Are you saying that, because it says "Kernel modules: qla2xxx", that is incorrect? Should it say vfio?
 
After starting the VM it does say this:

Code:
05:00.0 Fibre Channel [0c04]: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:2432] (rev 03)

        Subsystem: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:0138]

        Kernel driver in use: vfio-pci

        Kernel modules: qla2xxx

05:00.1 Fibre Channel [0c04]: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:2432] (rev 03)

        Subsystem: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA [1077:0138]

        Kernel driver in use: vfio-pci

        Kernel modules: qla2xxx
 
But then the VM still crashes the whole host once it initialises the FC target.
 
hmm - that sounds about correct (no driver in use until you start the guest - then vfio-pci)
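(If you want vfio-pci to claim the card already at boot instead of only at guest start, a softdep entry usually does it - not required for passthrough to work, just a sketch using your card's ids:)

Code:
# /etc/modprobe.d/vfio.conf - sketch; ids must match the card
options vfio-pci ids=1077:2432
softdep qla2xxx pre: vfio-pci

# rebuild the initramfs afterwards so it applies at boot
update-initramfs -u -k all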

Leaves the question why you need the allow_unsafe_interrupts option - see question 8 in http://vfio.blogspot.com/2014/08/vfiovga-faq.html

because without it the guest would not start after adding the PCI card.
What's the error message without the parameter?

Otherwise, the logs from when the host crashes would be interesting (maybe you can see something on the console, or configuring remote syslog can also help to catch messages when a machine crashes).
(in any case make sure to enable persistent journalling (`mkdir /var/log/journal; systemctl restart systemd-journald`))
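For the remote syslog route, a minimal rsyslog forwarding rule is usually enough - something like this, with the receiver address being just a placeholder:

Code:
# /etc/rsyslog.d/remote.conf - forward everything via UDP to a log receiver
*.* @192.0.2.10:514

# then: systemctl restart rsyslog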

I hope this helps!
 
Seems this is the problem:

failed to setup container for group 16: Failed to set iommu for container: Operation not permitted

Just so you know, I have used passthrough with the same FC card on this machine under VMware ESXi and it worked fine (the hardware is a Dell R710, so pretty standard I assume).


Virtual Environment 6.0-6
Node 'r710-pve'
Code:
kvm: -device vfio-pci,host=05:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,multifunction=on: vfio 0000:05:00.0: failed to setup container for group 16: Failed to set iommu for container: Operation not permitted
TASK ERROR: start failed: command '/usr/bin/kvm -id 102 -name FreeNAS -chardev 'socket,id=qmp,path=/var/run/qemu-server/102.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/102.pid -daemonize -smbios 'type=1,uuid=9eb8ceda-9bfe-40c6-ba5b-17c9b0bd17dc' -smp '4,sockets=4,cores=1,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/102.vnc,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 4096 -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'vmgenid,guid=74aac906-c300-469b-91ed-f32e363984c5' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'vfio-pci,host=05:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,multifunction=on' -device 'vfio-pci,host=05:00.1,id=hostpci0.1,bus=pci.0,addr=0x10.1' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:56a2acb136d0' -drive 'file=/dev/zvol/rpool/data/vm-102-disk-0,if=none,id=drive-ide0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -drive 'file=/dev/zvol/pool_3tb/vm-102-disk-0,if=none,id=drive-ide1,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=1,drive=drive-ide1,id=ide1' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=CE:33:17:5D:9A:8C,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc'' failed: exit code 1
 
kvm: -device vfio-pci,host=05:00.0,id=hostpci0.0,bus=pci.0,addr=0x10.0,multifunction=on: vfio 0000:05:00.0: failed to setup container for group 16: Failed to set iommu for container: Operation not permitted
hmm - seems that it might be an incompatibility with the system - if there's a BIOS update available it's worth updating to see if this fixes the problem - also check the various BIOS settings regarding VT-d, SR-IOV, ...
 
I updated to the latest Dell BIOS and also turned on SR-IOV - still the same issue. I have also tried the pcie=on flag for the PCI device.
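For reference, I believe the pcie flag ends up as something like this in /etc/pve/qemu-server/102.conf, and as far as I understand it only takes effect with a q35 machine type (just an excerpt, not my full config):

Code:
machine: q35
hostpci0: 05:00,pcie=1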

It's strange - I have had it working fine on VMware.

I have just read about using the pci-stub module, as per the thread here - is that still something that might work?

https://forum.proxmox.com/threads/pci-passthrough-issues.21889/
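If I read that thread right, the pci-stub variant boils down to something like this (untested on my side; the ids are my card's):

Code:
# /etc/default/grub - have pci-stub claim the card early at boot
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pci-stub.ids=1077:2432"

# then apply and reboot
update-grub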

I seem to be running out of things to try.

thanks!
 
Did you manage to get this resolved? I'm hitting exactly the same problem here.
 
Not sure if this is exactly relevant to your case, but I had an issue where PCI passthrough of a NIC was causing the host to crash.
I had only one VM on the host at the time and had assigned all available memory to it (16GB out of 16GB). As soon as the VM started, the host would crash. And because I had set the VM to auto-start on boot, I thought the host was crashing all the time, when it was actually the boot of the VM that was triggering the crash.
Before finding the solution I tried:
1. Trying different NICs / different IOMMU groups
2. Disabling/enabling ROM-BAR
3. Leaving one NIC to the host (I read somewhere that passing through all of your NICs may cause crashes)

In the end, what solved it for me was reducing the memory.
When I reduced it (16GB -> 12GB, with ballooning on), the VM started and the device was passed through successfully.
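Presumably this works because with passthrough all of the guest's RAM gets pinned on the host, so the host needs some headroom left. In config terms the change is just one line, something like:

Code:
# /etc/pve/qemu-server/<vmid>.conf - relevant line only (a sketch)
memory: 12288
# ballooning is enabled by default; balloon: 0 would disable it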
 
