Proxmox goes offline when an HBA card is passed through to a VM

ZHS

Member
Jul 29, 2022
Hello,

I am using Proxmox VE v7.2.3 and I want to pass through a PCIe RAID controller card (an LSI MegaRAID SATA-SAS 9260-8i, to be exact) with two 12 TB IronWolf NAS drives attached.

When I tried to pass this card through to my TrueNAS Core VM, the whole Proxmox host froze and went offline, and I had to restart the entire server.

I initially couldn't even boot the Proxmox node, since that TrueNAS VM was set to start automatically at boot, but I managed to fix it by switching IOMMU off in Proxmox recovery mode.
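A side note for anyone hitting the same boot loop: instead of switching IOMMU off, it should also work to disable the VM's autostart from the recovery shell with qm. A minimal sketch, where the VM ID 100 is a placeholder:

Code:
qm set 100 --onboot 0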

When everything was back up and running, I ran this command:

Code:
dmesg | grep 'remapping'

and I got this message:

[    0.880005] AMD-Vi: Interrupt remapping enabled

I also ran this command:

Code:
find /sys/kernel/iommu_groups/ -type l

and I got this output:

/sys/kernel/iommu_groups/7/devices/0000:00:08.0
/sys/kernel/iommu_groups/5/devices/0000:00:07.0
/sys/kernel/iommu_groups/13/devices/0000:09:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:04.0
/sys/kernel/iommu_groups/11/devices/0000:08:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:08.1
/sys/kernel/iommu_groups/6/devices/0000:00:07.1
/sys/kernel/iommu_groups/14/devices/0000:09:00.3
/sys/kernel/iommu_groups/4/devices/0000:00:05.0
/sys/kernel/iommu_groups/12/devices/0000:09:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.1
/sys/kernel/iommu_groups/2/devices/0000:07:00.2
/sys/kernel/iommu_groups/2/devices/0000:07:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/2/devices/0000:07:00.3
/sys/kernel/iommu_groups/2/devices/0000:07:00.1
/sys/kernel/iommu_groups/10/devices/0000:00:18.3
/sys/kernel/iommu_groups/10/devices/0000:00:18.1
/sys/kernel/iommu_groups/10/devices/0000:00:18.6
/sys/kernel/iommu_groups/10/devices/0000:00:18.4
/sys/kernel/iommu_groups/10/devices/0000:00:18.2
/sys/kernel/iommu_groups/10/devices/0000:00:18.0
/sys/kernel/iommu_groups/10/devices/0000:00:18.7
/sys/kernel/iommu_groups/10/devices/0000:00:18.5
/sys/kernel/iommu_groups/0/devices/0000:03:00.0
/sys/kernel/iommu_groups/0/devices/0000:02:00.2
/sys/kernel/iommu_groups/0/devices/0000:00:01.2
/sys/kernel/iommu_groups/0/devices/0000:02:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.0
/sys/kernel/iommu_groups/0/devices/0000:06:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:08.0
/sys/kernel/iommu_groups/0/devices/0000:02:00.1
/sys/kernel/iommu_groups/0/devices/0000:00:01.1
/sys/kernel/iommu_groups/0/devices/0000:05:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:04.0
/sys/kernel/iommu_groups/0/devices/0000:04:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:14.3
/sys/kernel/iommu_groups/9/devices/0000:00:14.0

When I try to pass it through to that same VM, I get a screen like this:

[Screenshot attachment: Proxmox Screenshot.png]

I am new to Proxmox, but my guess is that this means the card has its own dedicated IOMMU group.

I saw in another forum topic that one person solved this by placing the HBA in a different PCIe slot, but I don't have any empty slots left on my motherboard.

Is there any way to solve this problem? Also, if I get a different card from another manufacturer, could I face the same issue?
 
As you can see, there are many more devices in IOMMU group 0. IOMMU groups cannot be split/shared between VMs or between VMs and the Proxmox host. As soon as you start the VM with device 04:00.0, the Proxmox host loses all devices in group 0 including at least a network device and an SSD. This will make Proxmox unreachable and probably crash.
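For reference, a quick way to see what shares each group is a small shell loop over sysfs with lspci; a sketch, assuming standard sysfs paths:

Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU group %s: ' "$n"
    lspci -nns "${d##*/}"
done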

You did not share the make and model of your motherboard, but it looks like a Ryzen platform. That means that, unless it is an X570 chipset, only the devices provided by the CPU, one M.2 slot, and one or two (depending on chipset/CPU) x16 PCIe slots (running at x8) are in separate groups. Everything else sits in one big "chipset group".

You might be able to ignore ACS and "break up" the groups (and their security isolation) by using the pcie_acs_override=downstream,multifunction kernel parameter, at your own risk.
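For GRUB-booted Proxmox installs, the usual way to add such a kernel parameter is to edit /etc/default/grub and regenerate the boot config. A minimal sketch of the standard procedure:

Code:
# in /etc/default/grub, extend the default command line, e.g.:
# GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_acs_override=downstream,multifunction"
nano /etc/default/grub
update-grub
reboot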
 
Thank you so much for your answers and for clarifying things for me, @Tahsin and @leesteken.

What motherboard are you using? It looks like that PCI-E group is shared with many other devices. Also, can you post your settings for modules, modprobe, cmdline, etc.

I am currently using an ASRock B550 Steel Legend. Can you please explain how I would post my settings for modules, modprobe, and cmdline?
I am very new to Proxmox and the whole HomeLab virtualization thing. :)
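For reference, the settings usually asked for in passthrough threads can be collected with a few commands; a sketch assuming a standard Proxmox install:

Code:
cat /etc/modules
cat /etc/modprobe.d/*.conf
cat /proc/cmdline
pveversion -v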


Oops, I thought the second set of numbers after the ":" was the group. In hindsight, your explanation of IOMMU groups makes much more sense.
Yes, as you guessed, I am using a Ryzen platform with the motherboard I mentioned. Since the B550 chipset can't separate the groups any further, how would I apply this kernel parameter? From the terminal, or as a parameter while Proxmox is booting?
I will probably buy either an ASRock Rack X570D4U-2L2T or an ASRock X570M Pro4, but it seems availability could be a problem.
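To answer the "terminal or boot parameter" question in general terms: it is a boot parameter, so it has to be added to the kernel command line rather than run from a shell. On GRUB-booted installs it goes into GRUB_CMDLINE_LINUX_DEFAULT as sketched above; on UEFI installs with ZFS on root, Proxmox boots via systemd-boot instead, and the parameter is appended to the single line in /etc/kernel/cmdline, followed by (a sketch of the standard procedure):

Code:
proxmox-boot-tool refresh

Reboot afterwards for the new command line to take effect.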
 
I can confirm the same issue with a Gigabyte B350 Gaming 3 and an LSI 9217-8i. Did you ever solve this?
There is no "issue"; this is common behavior on any Ryzen motherboard except X570, as explained in post #3 of this thread. Just put the PCIe add-in card in a PCIe slot that is connected to the CPU (like the first x16 slot) so that it is in an IOMMU group without the motherboard's network, SATA, and USB controllers.
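After moving the card, its group can be checked directly in sysfs; a sketch, where the PCI address 0000:04:00.0 is taken from the earlier output and may differ on other boards:

Code:
readlink /sys/bus/pci/devices/0000:04:00.0/iommu_group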
 
Yep, learned that about 20 minutes ago. I had to pick between passthrough of my GPU or my HBA. Fortunately, it is possible to pass through the physical disk (why is that not in the web GUI?), so I picked that option. Thanks for confirming what I managed to get working (I landed here through Google, so hopefully this will help others).
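For others landing here the same way: passing a whole physical disk through is done with qm set on the CLI. A minimal sketch, where the VM ID 100 and the by-id path are placeholders:

Code:
# use a stable /dev/disk/by-id path rather than /dev/sdX
ls -l /dev/disk/by-id/
qm set 100 --scsi1 /dev/disk/by-id/<your-disk-id>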

In my case I had to put it in the first x16 slot. My mobo has x1, x16 (only this one has its own group), x16, x1, and x16 slots.

Btw I'm on Gigabyte AB350 Gaming 3, BIOS F52e, Ryzen 5 1600
 
So you can't use your GPU in passthrough mode?
 
Now I can, but I had to use the ACS patch to split my IOMMU groups even further in software.

I added this to my GRUB config:

Code:
quiet amd_iommu=on pcie_acs_override=downstream,multifunction

Some say this is insecure, but I haven't found another way to do it.