Proxmox goes offline when an HBA card is passed through to a VM

ZHS

Member
Jul 29, 2022
Hello,

I am using Proxmox VE v7.2.3 and I want to pass through a PCIe RAID controller card (an LSI MegaRAID SATA-SAS 9260-8i, to be exact) with two 12 TB IronWolf NAS drives attached.

When I tried to pass this card through to my TrueNAS Core VM, the whole Proxmox host froze and went offline, and I had to restart the server.

I initially couldn't even boot the Proxmox node, since that TrueNAS VM was set to start at boot, but I managed to fix it by switching IOMMU off in Proxmox recovery mode.

When everything was back up and running, I ran this command:

Code:
dmesg | grep 'remapping'

and I got this message:

[    0.880005] AMD-Vi: Interrupt remapping enabled

I also ran this command:

Code:
find /sys/kernel/iommu_groups/ -type l

and I got this output:

/sys/kernel/iommu_groups/7/devices/0000:00:08.0
/sys/kernel/iommu_groups/5/devices/0000:00:07.0
/sys/kernel/iommu_groups/13/devices/0000:09:00.1
/sys/kernel/iommu_groups/3/devices/0000:00:04.0
/sys/kernel/iommu_groups/11/devices/0000:08:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:02.0
/sys/kernel/iommu_groups/8/devices/0000:00:08.1
/sys/kernel/iommu_groups/6/devices/0000:00:07.1
/sys/kernel/iommu_groups/14/devices/0000:09:00.3
/sys/kernel/iommu_groups/4/devices/0000:00:05.0
/sys/kernel/iommu_groups/12/devices/0000:09:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.1
/sys/kernel/iommu_groups/2/devices/0000:07:00.2
/sys/kernel/iommu_groups/2/devices/0000:07:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:03.0
/sys/kernel/iommu_groups/2/devices/0000:07:00.3
/sys/kernel/iommu_groups/2/devices/0000:07:00.1
/sys/kernel/iommu_groups/10/devices/0000:00:18.3
/sys/kernel/iommu_groups/10/devices/0000:00:18.1
/sys/kernel/iommu_groups/10/devices/0000:00:18.6
/sys/kernel/iommu_groups/10/devices/0000:00:18.4
/sys/kernel/iommu_groups/10/devices/0000:00:18.2
/sys/kernel/iommu_groups/10/devices/0000:00:18.0
/sys/kernel/iommu_groups/10/devices/0000:00:18.7
/sys/kernel/iommu_groups/10/devices/0000:00:18.5
/sys/kernel/iommu_groups/0/devices/0000:03:00.0
/sys/kernel/iommu_groups/0/devices/0000:02:00.2
/sys/kernel/iommu_groups/0/devices/0000:00:01.2
/sys/kernel/iommu_groups/0/devices/0000:02:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:01.0
/sys/kernel/iommu_groups/0/devices/0000:01:00.0
/sys/kernel/iommu_groups/0/devices/0000:06:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:08.0
/sys/kernel/iommu_groups/0/devices/0000:02:00.1
/sys/kernel/iommu_groups/0/devices/0000:00:01.1
/sys/kernel/iommu_groups/0/devices/0000:05:00.0
/sys/kernel/iommu_groups/0/devices/0000:03:04.0
/sys/kernel/iommu_groups/0/devices/0000:04:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:14.3
/sys/kernel/iommu_groups/9/devices/0000:00:14.0
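(That raw list is hard to read on its own; a small helper script, a sketch using only standard tools, prints each device together with its group number and a human-readable name:)

Code:
#!/bin/bash
# list every PCI device with its IOMMU group and a readable lspci description
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=${d#/sys/kernel/iommu_groups/}; g=${g%%/*}
    printf 'IOMMU group %s:\t%s\n' "$g" "$(lspci -nns "${d##*/}")"
done | sort -V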

When I try to pass it through to that same VM, I get a screen like this:

Proxmox Screenshot.png

I am new to Proxmox, but my guess is that this means the card has its own dedicated IOMMU group.

I saw in another forum topic that someone solved this by placing the HBA in a different PCIe slot, but I don't have any empty slots left on my motherboard.

Is there any way to solve this problem? Also, if I get a different card from another manufacturer, could I face the same issue?
 
As you can see, there are many more devices in IOMMU group 0. IOMMU groups cannot be split/shared between VMs or between VMs and the Proxmox host. As soon as you start the VM with device 04:00.0, the Proxmox host loses all devices in group 0 including at least a network device and an SSD. This will make Proxmox unreachable and probably crash.

You did not share the make and model of your motherboard, but it looks like a Ryzen platform. That means that, unless it uses the X570 chipset, only the devices provided by the CPU, one M.2 slot, and one or two (depending on chipset/CPU) x16 PCIe slots (running at x8) are in separate groups. Everything else is in one big "chipset group".

You might be able to ignore ACS and "break up" the groups (and their security isolation) by using the pcie_acs_override=downstream,multifunction kernel parameter, at your own risk.
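If you decide to try it, the parameter goes on the kernel command line; on a default (GRUB-booted) Proxmox VE install that usually means something like this (a sketch, adjust to your own setup):

Code:
# edit the kernel command line used by GRUB
nano /etc/default/grub
# set, for example:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_acs_override=downstream,multifunction"
update-grub
# systemd-boot installs (e.g. ZFS root) use /etc/kernel/cmdline instead,
# followed by: proxmox-boot-tool refresh
reboot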
 
Thank you so much for your answers and for clarifying things for me @Tahsin and @leesteken.

What motherboard are you using? It looks like that PCI-E group is shared with many other devices. Also, can you post your settings for modules, modprobe, cmdline, etc.

I am currently using an ASRock B550 Steel Legend. Can you please explain how I would post my settings for modules, modprobe, and cmdline?
I am very new to Proxmox and the whole HomeLab virtualization thing. :)
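(For anyone wondering how to collect those: on a stock Proxmox VE install the relevant settings usually live in a handful of files; the paths below are the common defaults and may vary on your system.)

Code:
cat /proc/cmdline           # kernel parameters the host actually booted with
cat /etc/modules            # modules loaded at boot (vfio, vfio_pci, ...)
cat /etc/modprobe.d/*.conf  # module options, e.g. vfio-pci device ids
cat /etc/default/grub       # GRUB_CMDLINE_LINUX_DEFAULT lives here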

As you can see, there are many more devices in IOMMU group 0. IOMMU groups cannot be split/shared between VMs or between VMs and the Proxmox host. [...]

Oops, I was thinking that the second set of numbers after the ":" was the group. In hindsight, your explanation of IOMMU groups makes much more sense.
Yes, as you guessed, I am using a Ryzen platform with the motherboard I mentioned. Since the B550 chipset can't separate the groups, how would I apply this kernel parameter? From the terminal, or would I add it as a boot parameter while Proxmox is booting?
I will probably buy either an ASRock Rack X570D4U-2L2T or an ASRock X570M Pro4, but it seems like availability could be a problem.
 
I can confirm the same issue with a Gigabyte B350 Gaming 3 and an LSI 9217-8i. Did you ever solve this?
There is no "issue", this is common behavior on any Ryzen motherboard except X570, as explained in post #3 of this thread. Just put the PCIe add-in card in a PCIe slot that is connected to the CPU (like the first x16 slot) so that it is in an IOMMU group without the motherboard network, SATA and USB controllers.
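(If in doubt about which slots are wired to the CPU, the PCIe topology gives a quick hint; this is a generic check, not specific to any one board:)

Code:
# show the PCI(e) tree; devices behind the chipset's upstream bridge
# are the ones that end up sharing its big IOMMU group
lspci -tv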
 
There is no "issue", this is common behavior on any Ryzen motherboard except X570, as explained in post #3 of this thread. [...]
Yep, learned that about 20 minutes ago. I had to pick between passthrough of my GPU or my HBA. Fortunately it is possible to pass through the physical disk (why is that not in the web GUI?), so I picked that. Thanks for confirming what I managed to get working (I landed here through Google, so hopefully this will help others).
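(For reference, passing a whole physical disk into a VM is done from the CLI; in this sketch the VM ID 100, the bus slot scsi1, and the disk ID are placeholders for your own values:)

Code:
# find the stable by-id path of the disk to pass through
ls -l /dev/disk/by-id/
# attach it to the VM as a SCSI disk (VM ID and disk ID are examples)
qm set 100 -scsi1 /dev/disk/by-id/ata-YOURDISK_SERIAL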

In my case I had to put it in the first x16 slot. My mobo's slots are x1, x16 (only this one has its own group), x16, x1, and x16.

Btw I'm on Gigabyte AB350 Gaming 3, BIOS F52e, Ryzen 5 1600
 
Yep, learned that about 20 minutes ago. I had to pick between passthrough of my GPU or my HBA. [...]
so you can't use your GPU in passthrough mode?
 
so you can't use your GPU in passthrough mode?
Now I can, but I had to use the ACS patch to split my IOMMU groups even further in software

Added this to my grub config:

Code:
quiet amd_iommu=on pcie_acs_override=downstream,multifunction
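(After updating GRUB and rebooting, it's worth a quick sanity check that the override actually took effect; a sketch:)

Code:
# confirm the parameter is on the running kernel's command line
cat /proc/cmdline
# re-check the groups; the HBA should now be in its own group
find /sys/kernel/iommu_groups/ -type l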

Some say this is insecure, but I haven't found another way to do it
 