IOMMU Issues

BiscottiMuncher

New Member
Sep 25, 2023
Hello! I've been trying to get IOMMU groups and PCIe passthrough (mainly GPU) to work properly on my new server hardware for quite a while. My DMAR output looks like this. Thank you in advance, and sorry for the long post; this has been a multi-week, multi-reinstall battle.

1699427902923.png

I haven't been able to get interrupt remapping to work no matter what I do. I've tried enabling it in "iommu_unsafe_interrupts.conf" with
- echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
and also
- echo "Y" > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts

The only change I've made to my GRUB config is adding "amd_iommu=on", after which I updated GRUB. That's it, so I don't know if there is anything else I should do there.

I've also added these lines to my "/etc/modules":
1699428126398.png

The main thing that has been bugging me is this: some PCIe and board-side devices get an IOMMU group, fully numbered and usable, and I can pass a GbE NIC through to any machine. But I cannot pass ANY GPU through to a VM, as they have an IOMMU group number of "-1". I don't know why that is and I would love to have an answer lol.
1699428463573.png
- output from: pvesh get /nodes/pve/hardware/pci --pci-class-blacklist ""
None of these GPUs have any displays plugged into the back of them; they both function as intended when they need to display.
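The same grouping can be cross-checked straight from sysfs, independent of pvesh. The following is a minimal sketch, assuming the standard Linux sysfs layout in which a PCI device directory carries an `iommu_group` symlink once the device has been assigned a group. The function name `list_iommu_groups` and its optional root argument are made up for illustration and are not part of Proxmox.

```shell
# Print "<PCI address> <IOMMU group>" for every PCI device.
# A device without an iommu_group symlink is reported as group -1.
# The optional first argument (default /sys) is a hypothetical override
# so the function can be tried against a test directory tree.
list_iommu_groups() {
    root="${1:-/sys}"
    for dev in "$root"/bus/pci/devices/*; do
        # Skip the unexpanded glob when no devices are present.
        [ -e "$dev" ] || continue
        if [ -L "$dev/iommu_group" ]; then
            # The symlink target ends in the group number.
            group=$(basename "$(readlink "$dev/iommu_group")")
        else
            group=-1
        fi
        printf '%s %s\n' "$(basename "$dev")" "$group"
    done
}
```

Called with no argument, `list_iommu_groups` reads the live /sys tree. Devices that print -1 here have no group symlink at all, which should correspond to the "-1" entries in the pvesh table.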

Here are the specs for my machine. The motherboard is at the latest version, and everything that I can turn on in the AMD CBS menu is turned on.
1699428600326.png

Here is all the relevant command history for what I have tried:
1699429151483.png
 
Hello! I've been trying to get IOMMU groups and PCIe passthrough (mainly GPU) to work properly on my new server hardware for quite a while. My DMAR output looks like this. Thank you in advance, and sorry for the long post; this has been a multi-week, multi-reinstall battle.

View attachment 57746
This looks fine, but I have no experience myself with Threadripper. It should have lots of PCIe lanes from the CPU and I would expect good IOMMU groups.
I haven't been able to get interrupt remapping to work no matter what I do. I've tried enabling it in "iommu_unsafe_interrupts.conf" with
- echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
and also
- echo "Y" > /sys/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts
Why do you need unsafe_interrupts? Usually stuff just works fine without it.
The only change I've made to my GRUB config is adding "amd_iommu=on", after which I updated GRUB. That's it, so I don't know if there is anything else I should do there.

I've also added these lines to my "/etc/modules":
View attachment 57747
amd_iommu=on is not necessary because the IOMMU is always on for AMD hardware that supports it. vfio_virqfd is no longer a module (but the manual has not been fixed yet).
Note that Proxmox can also use systemd-boot instead of GRUB. Are all IOMMU settings Enabled (not just Auto)? ACS, SR-IOV, AER, ARI, etc.
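Whichever bootloader is in use, the parameters the running kernel actually received are visible in /proc/cmdline, so it is possible to confirm whether an edit took effect. A minimal sketch; the helper name `cmdline_has` and its optional file argument are hypothetical and exist only for this illustration.

```shell
# Succeed (exit status 0) if the exact parameter appears on the
# kernel command line, fail otherwise.
# $1: parameter to look for, e.g. amd_iommu=on
# $2: file to read (default /proc/cmdline); only there to allow testing
cmdline_has() {
    # Put each parameter on its own line, then look for a whole-line,
    # fixed-string match.
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `cmdline_has amd_iommu=on && echo present`. On installs that boot through systemd-boot, Proxmox takes the command line from /etc/kernel/cmdline rather than /etc/default/grub, so a GRUB-only edit would not show up here.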
The main thing that has been bugging me is this: some PCIe and board-side devices get an IOMMU group, fully numbered and usable, and I can pass a GbE NIC through to any machine. But I cannot pass ANY GPU through to a VM, as they have an IOMMU group number of "-1". I don't know why that is and I would love to have an answer lol.
View attachment 57748
- output from: pvesh get /nodes/pve/hardware/pci --pci-class-blacklist ""
None of these GPUs have any displays plugged into the back of them; they both function as intended when they need to display.
Looks like IOMMU is working fine. But it is strange that some devices are not put in an IOMMU group. I would like to read the motherboard manual and know which slots are used.
Maybe they are behind a PCIe multiplexer with limited IOMMU functionality. Can you put the devices in other PCIe slots? Anything in journalctl -b | grep -i iommu?
 
This looks fine, but I have no experience myself with Threadripper. It should have lots of PCIe lanes from the CPU and I would expect good IOMMU groups.
Sorry for the late reply. That's what I've seen and expected. I don't know if this board is actually going to do what I want it to do, as I've only seen about one other thread referencing it on the internet.
Why do you need unsafe_interrupts? Usually stuff just works fine without it.
Every guide that I've seen calls for interrupts to be enabled, and this install, kernel-wise, is completely stock. I have only started to edit things for IOMMU this time around. I don't know why they would be disabled, or why the unsafe_interrupts setting didn't work after reboots and GRUB updates.
amd_iommu=on is not necessary because the IOMMU is always on for AMD hardware that supports it. vfio_virqfd is no longer a module (but the manual has not been fixed yet).
Note that Proxmox can also use systemd-boot instead of GRUB. Are all IOMMU settings Enabled (not just Auto)? ACS, SR-IOV, AER, ARI, etc.
I didn't even know that about amd_iommu. I'll go through my systemd-boot options and see what's turned on when I have some spare time.
Looks like IOMMU is working fine. But it is strange that some devices are not put in an IOMMU group. I would like to read the motherboard manual and know which slots are used.
Maybe they are behind a PCIe multiplexer with limited IOMMU functionality. Can you put the devices in other PCIe slots? Anything in journalctl -b | grep -i iommu?
Here's the output from my journalctl. It looks pretty standard, although it's only generating 20 IOMMU groups.
When I had a GPU in the next slot down, it also showed a -1 IOMMU group, which makes me think that it might just be targeting GPUs, as I have a GbE PCIe NIC in a slot and it has a good IOMMU group. Thank you again for your time, I appreciate it more than you know.
1699597141516.png
 
Sorry for the late reply. That's what I've seen and expected. I don't know if this board is actually going to do what I want it to do, as I've only seen about one other thread referencing it on the internet.
Don't worry about timeliness, I'm not waiting for anything.
Every guide that I've seen calls for interrupts to be enabled, and this install, kernel-wise, is completely stock. I have only started to edit things for IOMMU this time around. I don't know why they would be disabled, or why the unsafe_interrupts setting didn't work after reboots and GRUB updates.
But why do you or they need it? How do you notice that unsafe_interrupts works or does not work? I really don't know what problem is fixed by using that, or why you would need it. It's not standard: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_pci_passthrough
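Whether the option is currently set can at least be read back from the same sysfs file that the echo command earlier in the thread writes to. A minimal sketch, assuming that path; the function name `unsafe_interrupts_state` and its optional root argument are illustrative only.

```shell
# Print the current value of allow_unsafe_interrupts (Y or N), or a
# note when the vfio_iommu_type1 module is not loaded.
# The optional first argument (default /sys) is a hypothetical override
# so the function can be tried against a test directory tree.
unsafe_interrupts_state() {
    f="${1:-/sys}/module/vfio_iommu_type1/parameters/allow_unsafe_interrupts"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "vfio_iommu_type1 not loaded"
    fi
}
```

Whether interrupt remapping itself is active is a separate question; the kernel log (for example `dmesg | grep -i remapping`) is where that would normally show up.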
I didn't even know that about amd_iommu. I'll go through my systemd-boot options and see what's turned on when I have some spare time.
The manual does not tell you to add it, only for Intel. Invalid parameters are ignored, so they at least should not interfere. No need to make it a priority.
Here's the output from my journalctl. It looks pretty standard, although it's only generating 20 IOMMU groups.
When I had a GPU in the next slot down, it also showed a -1 IOMMU group, which makes me think that it might just be targeting GPUs, as I have a GbE PCIe NIC in a slot and it has a good IOMMU group. Thank you again for your time, I appreciate it more than you know.
Interesting that it only applies to GPUs, which I have not seen before. Did you disable Resizable BAR (or maybe your system does not have that)? Maybe try disabling Above 4G Decoding? I really don't know what's happening here.
The higher PCI IDs of your devices (42:00.0 and 43:00.0) are indeed not mentioned in the "adding to IOMMU group" messages. I would suspect the motherboard, but there is also no mention of IOMMU chips that don't support all features, and you noticed that it follows the GPUs, regardless of the PCIe slots. It's probably something Threadripper-specific, or maybe NUMA?
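Spotting which addresses are missing from those messages is easier when the log is reduced to address and group pairs. A minimal sketch, assuming the kernel logs lines of the form "pci 0000:00:01.0: Adding to iommu group 0"; the function name `iommu_log_groups` is made up for illustration.

```shell
# Read kernel log text on stdin and print "<PCI address> <group>" for
# each "Adding to iommu group" message; all other lines are dropped.
iommu_log_groups() {
    sed -n 's/.*\([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\): Adding to iommu group \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example, `journalctl -b -k | iommu_log_groups | grep -e 0000:42:00.0 -e 0000:43:00.0` prints nothing if the kernel never assigned those two devices to a group during this boot.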
 
