Asus z8pe series and IOMMU support

HarryNerd

New Member
Mar 25, 2024
I've been through everything in the BIOS line by line and enabled everything I could find that seemed even remotely related, but I still get the same error when I try to add a PCI device to a VM in Proxmox VE:
Code:
No IOMMU detected, please activate it. See Documentation for further information.

There's only a single CPU installed at the moment, so there might be some MPS options disabled in the BIOS, but that shouldn't be hiding IOMMU or VT-d features?

I knew there would be no AVX support and such with an older board, but wasn't expecting PCI passthrough problems.

Web searches hint that I may not have had the right CPU features enabled when I did the initial install and that the bootloader may need to be reconfigured to enable IOMMU, but none of the tips (i.e. add intel_iommu=on to /etc/kernel/cmdline) seem to apply to the PVE server I've got running (that cmdline file does not exist)?

TIA for any insights/experience.
 
Hi,
you need to add intel_iommu=on to the kernel parameters[1] if you have an Intel CPU. You can find all the steps needed to correctly enable PCI pass-through, including what comes after this, in our wiki [2].

[1] https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_edit_kernel_cmdline
[2] https://pve.proxmox.com/wiki/PCI(e)_Passthrough
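In short, depending on which bootloader the installation uses, the edit looks roughly like this (a minimal sketch following the linked docs; adjust to your setup):
Code:
# GRUB: append the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# then regenerate the GRUB config:
update-grub

# systemd-boot / proxmox-boot-tool setups: append intel_iommu=on to the single
# line in /etc/kernel/cmdline, then refresh the boot entries:
proxmox-boot-tool refresh

# reboot and confirm the parameter is active:
cat /proc/cmdline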
Thanks.

This did help me find that I'm on kernel 6.5.x, but it doesn't explain an easy way to work out whether I'm using GRUB, so I guessed and ran the update commands for both bootloaders, and that didn't progress the situation.

Next up it says:

You have to make sure the following modules are loaded. This can be achieved by adding them to ‘/etc/modules’.

Code:
 vfio
 vfio_iommu_type1
 vfio_pci

Which seems valid, since the command to test for those modules (lsmod | grep vfio) came back empty.

After using vi to add the modules above to /etc/modules I tried the update command:
update-initramfs -u -k all

... which gave a few warnings:
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi.
Alternatively, use --esp-path= to specify path to mount point.

Mind you, after a reboot, the lsmod | grep vfio command now lists the modules, and this test also succeeds now:
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
It shows me that interrupt remapping is enabled in the BIOS.
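For anyone following along, the whole sequence boils down to roughly this (I edited /etc/modules with vi, but the append one-liner below is equivalent):
Code:
# add the vfio modules so they load at boot
printf 'vfio\nvfio_iommu_type1\nvfio_pci\n' >> /etc/modules
# rebuild the initramfs for all installed kernels
update-initramfs -u -k all
reboot
# after the reboot, confirm the modules are loaded and check the IOMMU messages
lsmod | grep vfio
dmesg | grep -e DMAR -e IOMMU -e AMD-Vi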

I still get the error in the web GUI that no IOMMU is detected, but the next steps of the instructions involve setting up config for a specific device, so I have to shut the machine down and physically put a GPU into one of the slots, since I keep pessimistically removing the extra hardware when I hit roadblocks.
 
What is the output of cat /proc/cmdline?
Same thing as the other command in the docs: it confirms the kernel version is 6.5.11-4, which the shell also reports when connecting, now that I pay closer attention.

I don't see any 'IOMMU' hits, but I do see this line in the grepped dmesg output:
DMAR-IR: This system BIOS has enabled interrupt remapping

Which seems to be a success but the web admin is still quite adamant that I have no IOMMU detected.
I also don't see the GPU when I try lspci or pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist "" ... so I need to go double check the hardware is happy.
 
I still get the error in the web GUI that no IOMMU is detected
What is the output of cat /proc/cmdline?
Which seems to be a success but the web admin is still quite adamant that I have no IOMMU detected.
The first step in diagnosing this issue is to show the output of cat /proc/cmdline. If you insist on not showing it, please confirm that it contains intel_iommu=on.
If it does not, something did not go right in the step for adding it. Then please show cat /etc/kernel/cmdline and cat /etc/default/grub (or not, your choice of course).
If it is present, VT-d is not enabled by the motherboard BIOS (or CPU) or it is a PCIe/IOMMU hardware issue and you'll have to look into that.
 
The first step in diagnosing this issue is to show the output of cat /proc/cmdline. If you insist on not showing it, please confirm that it contains intel_iommu=on.
If it does not, something did not go right in the step for adding it. Then please show cat /etc/kernel/cmdline and cat /etc/default/grub (or not, your choice of course).
If it is present, VT-d is not enabled by the motherboard BIOS (or CPU) or it is a PCIe/IOMMU hardware issue and you'll have to look into that.
Ah sorry for the confusion. I ran it, thought it seemed odd we're still checking the kernel version, and didn't capture the whole message.

Of course, I feel like if I had seen intel_iommu=on, or if you'd stated the goal originally, I'd certainly have spoken up, as I do get the concept of "help me help you", having done a ton of support work.

I found another problem where the GPU seems to cause a shuffle that knocks out the network in PVE. I can trigger the same issue with a USB3.1 host adapter in either of the primary PCIe slots, but the GPU causes it in all the slots. In my case en4 becomes en5, but both onboard LAN ports get disabled by the OS for some reason instead of auto re-configuring? I can literally watch the primary LAN link/status LEDs switch off when PVE boots up, even though they had been on in the BIOS. If I connect a screen/keyboard so I can log in to the console on the server directly, I can reset the network config to get the LAN ports back online and resume using the web admin, but that's a hassle and partly why I'm extra distracted.

I should probably splurge and just buy a more modern system but this is already adding up to an expensive hobby? :D
 
Ah sorry for the confusion. I ran it, thought it seemed odd we're still checking the kernel version, and didn't capture the whole message.

Of course, I feel like if I had seen intel_iommu=on, or if you'd stated the goal originally, I'd certainly have spoken up, as I do get the concept of "help me help you", having done a ton of support work.
Is the "No IOMMU" warning fixed now, or can you provide the requested information for troubleshooting? Sorry but it is not clear to me if you need help with this part.
I found another problem where the GPU seems to cause a shuffle that knocks out the network in PVE.
That's very common (and there are many threads about this): adding or removing a PCI(e) device can change the PCI IDs of other (onboard) PCI(e) devices (by one), and the names of the network devices depend on the PCI ID. You'll need to adjust /etc/network/interfaces accordingly (check the new names with ip a).
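For example (interface names and addresses here are made up; check the real ones with ip a), if the onboard NIC moved from enp4s0 to enp5s0 you would update the bridge port accordingly:
Code:
# /etc/network/interfaces (excerpt)
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        # bridge-ports enp4s0   <- old name
        bridge-ports enp5s0
        bridge-stp off
        bridge-fd 0
# apply with "ifreload -a" or a reboot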
I should probably splurge and just buy a more modern system but this is already adding up to an expensive hobby? :D
That won't fix the PCI ID issues when adding/enabling or removing/disabling (onboard) PCI(e) devices. It will also not fix enabling VT-d/IOMMU on Intel systems.
 
Is the "No IOMMU" warning fixed now, or can you provide the requested information for troubleshooting? Sorry but it is not clear to me if you need help with this part.
I would loudly declare success on this point as I'm aware there's been some generous investment to help here. No worries, no lack of due respect.
That's very common (and there are many threads about this): adding or removing a PCI(e) device can change the PCI IDs of other (onboard) PCI(e) devices (by one), and the names of the network devices depend on the PCI ID. You'll need to adjust /etc/network/interfaces accordingly (check the new names with ip a).
That's what I was trying to explain.

To me it seems like my local arrangement, where I have to set up a screen/keyboard every time a device is added or moved, should have steered me away from this path entirely.

That won't fix the PCI ID issues when adding/enabling or removing/disabling (onboard) PCI(e) devices. It will also not fix enabling VT-d/IOMMU on Intel systems.
On my system it's $60 with shipping for a used dongle that sets up the spare onboard management NIC to host RDP access so I can admin the server remotely. If you *always* have to get an add-on dongle to enable RDP access with any server board, and IOMMU issues are *always* this frustrating after all the visibly relevant BIOS options are enabled (and a few that probably didn't need to be), then I guess my frustration is mainly with how hard this web-based OS is to manage? But that seems wildly misplaced, and it's probably smarter to chalk it up to old-hardware hassles?
 
Auto-negotiation for connectivity failures seems like a really nice option to have available for installations like mine where I'm bridging all the VMs and leaning on headless web management harder than most people might?

This morning I got a GPU and a USB3.1 host adapter in PCIe slots and I'm still on the network. I've even almost memorized the root password.

So now I can get the output of cat /proc/cmdline:
BOOT_IMAGE=/boot/vmlinuz-6.5.11-4-pve root=/dev/mapper/pve-root ro quiet

For some reason, grepping dmesg (dmesg | grep -e DMAR -e IOMMU -e AMD-Vi) no longer shows the crucial message I saw earlier about interrupt remapping being enabled. Now the exact same command produces just one line of filtered output:
[ 12.223166] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.

So I need to make another pass through the BIOS and perhaps double-check the charge of the BIOS battery? So strange!
 
Okay, that's so strange. I got into the BIOS and found lots of things (e.g. CPU power options) set up the way I'd left them. CPU virtualization and VT-d are still enabled, but there was a disabled northbridge feature I'm unfamiliar with called "Crystal Beach DMA" that I enabled while I was there, and that unlocked a new "Crystal Beach DCA" option that I also enabled before saving and rebooting. Looking online, both features are just for performance and are disabled by default, but I didn't want to leave any stone unturned, so I left them enabled.

Here's the result of running dmesg | grep -e DMAR -e IOMMU -e AMD-Vi:
Code:
[    0.012073] ACPI: DMAR 0x00000000BF79E0C0 000120 (v01 AMI    OEMDMAR  00000001 MSFT 00000097)
[    0.012115] ACPI: Reserving DMAR table memory at [mem 0xbf79e0c0-0xbf79e1df]
[    0.279415] DMAR-IR: This system BIOS has enabled interrupt remapping
[    0.531671] DMAR: Host address width 40
[    0.531672] DMAR: DRHD base: 0x000000fbffe000 flags: 0x1
[    0.531683] DMAR: dmar0: reg_base_addr fbffe000 ver 1:0 cap c90780106f0462 ecap f020fe
[    0.531685] DMAR: RMRR base: 0x000000000e4000 end: 0x000000000e7fff
[    0.531687] DMAR: RMRR base: 0x000000bf7ec000 end: 0x000000bf7fffff
[    0.531688] DMAR: ATSR flags: 0x0
[   11.692531] AMD-Vi: AMD IOMMUv2 functionality not available on this system - This is not a bug.

And cat /proc/cmdline is still not mentioning IOMMU:
Code:
BOOT_IMAGE=/boot/vmlinuz-6.5.11-4-pve root=/dev/mapper/pve-root ro quiet

I still have the IOMMU modules inside /etc/modules ... but the update command update-initramfs -u -k all still gives a few warnings I have not resolved:
Code:
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi.
Alternatively, use --esp-path= to specify path to mount point.

So I think that's the next step?
 
So now I can get the output of cat /proc/cmdline:
BOOT_IMAGE=/boot/vmlinuz-6.5.11-4-pve root=/dev/mapper/pve-root ro quiet
And cat /proc/cmdline is still not mentioning IOMMU:
BOOT_IMAGE=/boot/vmlinuz-6.5.11-4-pve root=/dev/mapper/pve-root ro quiet
I'm not a native speaker and I'm not always following the nuances of what you are so eloquently saying, but I think you want help with this. Please follow the manual carefully to add intel_iommu=on to the kernel command line: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot_edit_kernel_cmdline . Without this, IOMMU won't be enabled on Proxmox even if VT-d is enabled in the motherboard BIOS. Please show cat /etc/kernel/cmdline and cat /etc/default/grub, as asked before, if you need more detailed help with this.

still gives a few warnings I have not resolved:
Code:
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Couldn't find EFI system partition. It is recommended to mount it to /boot or /efi.
Alternatively, use --esp-path= to specify path to mount point.

So I think that's the next step?
I guess your installation does not use an ESP and proxmox-boot-tool, and that is not a problem per se. I do think you want to enable IOMMU (see above), and this warning/informational message is not the issue here, unless the bootloader configuration is correct and this is what prevents it from becoming active (but I would need more information to determine that, as asked above, if you would be so kind). What version of Proxmox are you running, and what (old) version did you initially install? Maybe reinstalling PVE 8.1 (which will match the manual) and restoring VMs from backup might be easier than switching to an ESP and proxmox-boot-tool.
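If it helps, a rough way to see which boot setup is in use on a standard PVE install:
Code:
# shows configured ESPs (if any) and whether GRUB or systemd-boot is used
proxmox-boot-tool status
# no /sys/firmware/efi means a legacy BIOS boot, which on PVE means GRUB
ls /sys/firmware/efi 2>/dev/null || echo "legacy BIOS boot (GRUB)"
# which of these config files exist is another hint
ls -l /etc/kernel/cmdline /etc/default/grub 2>/dev/null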
 
Please follow the manual carefully to add intel_iommu=on to the kernel command line: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot_edit_kernel_cmdline . Without this, IOMMU won't be enabled on Proxmox even if VT-d is enabled in the motherboard BIOS.
Okay so I went back and started following those instructions again and nothing new was coming of it. So I treated the instructions like they were written maliciously:
The kernel commandline needs to be placed in the variable GRUB_CMDLINE_LINUX_DEFAULT in the file /etc/default/grub. Running update-grub appends its content to all linux entries in /boot/grub/grub.cfg.
... what "kernel commandline"? The instructions have not mentioned what this is? So I googled and this is where the instructions should say:
The kernel commandline intel_iommu=on needs to be appended inside the variable GRUB_CMDLINE_LINUX_DEFAULT value in the file /etc/default/grub.
ie: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
And that was it. I'm now much warmer!
[screenshot attachment: warmer.gz.jpg]
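For completeness, the apply-and-verify steps after that edit (assuming GRUB is the active bootloader, as it is here) are roughly:
Code:
update-grub
reboot
# afterwards intel_iommu=on should show up in:
cat /proc/cmdline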
Please show cat /etc/kernel/cmdline and cat /etc/default/grub, as asked before, if you need more detailed help with this.

FWIW: That's the first time you've asked for /etc/default/grub in this conversation. I did try to explain (in English sadly) above why it took a moment to see why you wanted the /etc/kernel/cmdline information. It was supplied above once that was cleared up.

At this point it looks like I need to map some devices for Proxmox to see them as shared? I sure picked a really awful test for Proxmox considering how many of the essential steps aren't inside the web admin?

EDIT: Oh I see that I can use "Raw Device" entries. Neat! Moving along! :D
 
Okay so now it looks like an issue with QEMU configuration/permissions?
Code:
kvm: -device vfio-pci,host=0000:04:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-vga=on,multifunction=on: vfio 0000:04:00.0: failed to setup container for group 28: Failed to set iommu for container: Operation not permitted
TASK ERROR: start failed: QEMU exited with code 1

I'll try rolling out a fresh VM from scratch before I panic, but if I can reconfigure that VM it'd be nice, as I had a few things installed and set up the way I like them.
 
Okay so I went back and started following those instructions again and nothing new was coming of it. So I treated the instructions like they were written maliciously:
That's a bit harsh. Proxmox can use one of two bootloaders and they need to be configured differently.
... what "kernel commandline"? The instructions have not mentioned what this is? So I googled and this is where the instructions should say:
It's literally described in the manual which I linked. Here it is again: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot_edit_kernel_cmdline
And that was it. I'm now much warmer!
I'm assuming you got the "No IOMMU" warning fixed.
Mapped devices are just a cluster-wide user configuration for raw devices (when you want to move VMs between cluster nodes with similar hardware): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#resource_mapping
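A raw (non-mapped) device can also be attached from the CLI, for example (VM ID and PCI address are just placeholders, and pcie=1 requires a q35 machine type):
Code:
# attach PCI device 0000:04:00.0 to VM 100 as a raw hostpci entry
qm set 100 --hostpci0 0000:04:00.0,pcie=1,x-vga=1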
FWIW: That's the first time you've asked for /etc/default/grub in this conversation. I did try to explain (in English sadly) above why it took a moment to see why you wanted the /etc/kernel/cmdline information. It was supplied above once that was cleared up.
Yeah, we are clearly not understanding each other. Or you are just moving much faster than me and I get confused easily when I don't see command outputs and comments about other stuff.
At this point it looks like I need to map some devices for Proxmox to see them as shared?
I don't know what you mean by that, since the terms "shared" and "devices" can mean different things in Proxmox depending on the context.
I sure picked a really awful test for Proxmox considering how many of the essential steps aren't inside the web admin?
PCI(e) passthrough is error prone, not guaranteed, and often needs various work-arounds. It's the exact opposite of virtualization, because you expose the VM to actual hardware (with all its issues).
EDIT: Oh I see that I can use "Raw Device" entries. Neat! Moving along! :D
Nice!
 
That's a bit harsh.
But totally on point. I'm not a seasoned Proxmox user, and the manual assumes I already know things about kernel options?

It's literally described in the manual which I linked. Here it is again: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot_edit_kernel_cmdline
Yeah that's the part I was quoting above. I even carefully explained what was missing. You sending me back there just confirms I was in the right spot and the information I required is not there/not explained in that section.

The suggestion still stands.

I'm assuming you got the "No IOMMU" warning fixed.
Yes. The screenshots do confirm that I'm past that issue. Thanks for helping me spot the missing information in the documentation.

Mapped devices are just a cluster-wide user configuration for raw devices (when you want to move VMs between cluster nodes with similar hardware): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#resource_mapping

Yeah, we are clearly not understanding each other. Or you are just moving much faster than me and I get confused easily when I don't see command outputs and comments about other stuff.

I don't know what you mean by that, since the terms "shared" and "devices" can mean different things in Proxmox depending on the context.

PCI(e) passthrough is error prone, not guaranteed, and often needs various work-arounds. It's the exact opposite of virtualization, because you expose the VM to actual hardware (with all its issues).

Nice!
Yeah, I've got to know what to censor and what to include. You don't really need to know all the hurdles I have getting to the machine physically, for example, but those are things I'm dealing with each time that wouldn't be helpful to share... though you never know, someone might have good advice for sticky door locks??

I still have to try deploying a fresh VM but I will circle back and mark this as solved in a bit I hope! :fingers-crossed:
 
Just an update that I have been stuck for a few days on:
Code:
kvm: ../hw/pci/pci.c:1637: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.

But I keep finding help on this topic, so I haven't given up yet. I've gone back and nuked old VMs and I still get this issue, but it looks like we barely scratched the surface of the GRUB command line topic and all the parameters it's GOING to need to get this working.
 
