Problems with GPU Passthrough since 8.2

I also seem to get this randomly on 6.5.13-5-pve, though infrequently... and at times it stalls out waiting for the NVMe device...
 
Maybe I have something else happening here. After a sudden crash and some 50 restarts, I was only able to get it working by resetting the BIOS (I then chose essentially all the same settings again).

Maybe I'll try moving all my NVMe drives to my Hyper card...
 
So, I put all my NVMe drives in my Hyper card; now Proxmox finishes booting but I get an rpool I/O error. That's hard to believe, as the boot pretty much finishes. Is there something in Proxmox that doesn't like the Hyper card?
 
Hello everybody!

Found this post while searching for help with the 1:1 mapping problem - passthrough of an HBA.
My system (Supermicro M11SDV-8C+-LN4F) is on kernel 6.8.12-2-pve and the error is still the same:

Code:
vfio-pci 0000:01:00.0: Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.

Has anyone made progress or resolved the problem on an AMD platform?
@athurdent

Kind regards!
 
My last contact with Supermicro indicated that they are now speaking to AMD about this. It seems they have been able to reproduce the issue; I helped with that as best I could.
 
Concerning the error
Code:
kvm: ../hw/pci/pci.c:1637: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
TASK ERROR: start failed: QEMU exited with code 1

I also seem to be affected. I originally reported this in another thread -> https://forum.proxmox.com/threads/g...p-with-pci_num_pins-error.143481/#post-706128

Removing [one of these options] from /etc/default/grub.d/powersave.cfg seems to stop the system from crashing as soon as I run any command, at least:
Code:
cpufreq.default_governor=powersave initcall_blacklist=acpi_cpufreq_init amd_pstate.shared_mem=1 amd_pstate=guided
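For anyone wanting to script that change, here is a hedged sketch. The drop-in path and the option names are taken from the post above, and the helper name is mine; verify against your own config before running, and regenerate the bootloader config afterwards.

```shell
# Sketch only: strip the amd_pstate / initcall_blacklist options from a
# GRUB drop-in file in place. Option names are assumptions taken from
# the post above -- adjust for your system, and back up the file first.
strip_pstate_opts() {
    # $1 = drop-in file, e.g. /etc/default/grub.d/powersave.cfg
    sed -i -E 's/\b(initcall_blacklist=acpi_cpufreq_init|amd_pstate\.shared_mem=1|amd_pstate=guided)\b ?//g' "$1"
}
# usage (then regenerate the bootloader config and reboot):
# strip_pstate_opts /etc/default/grub.d/powersave.cfg && update-grub
```

On ZFS installs booting via systemd-boot, `proxmox-boot-tool refresh` takes the place of `update-grub`.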
 
I am on AMD systems and I'm still on PVE 7, with all sorts of passthrough and nested VMs working perfectly. I want to upgrade to 8 at some stage.
From what I have gathered here, upgrading via the 8.2 ISO will give me issues? Is the issue also present in the default 8.1.2 ISO (I'm not sure which kernel the default ISO ships with)?
 
Hello mate, I have exactly the same board (Supermicro M11SDV-8C+-LN4F) and I am also struggling with the same issue.
I use the M.2 port on the board and am unable to pass it through on Proxmox 8.2.
A second, identical node in the cluster running a lower version is happily passing through.
Latest kernel update: Linux 6.8.12-2-pve (2024-09-05T10:03Z).
I have tried changing NVME and M.2 from 'Vendor Defined' to 'AMI Native', but had no luck. The issue remains the same:
kvm: -device vfio-pci,host=0000:01:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:01:00.0: failed to setup container for group 13: Failed to set group container: Invalid argument
TASK ERROR: start failed: QEMU exited with code 1
I would love to see this fixed as well, so I don't have to go back to a lower version...
 
Hi

Has anybody solved the problem with PCI/NVMe passthrough?

Maybe in the latest kernel?

Still fighting with it (Win11 VM with NVMe and GTX passthrough):

Code:
swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 746888) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1
 
OK, so every group where there is a region marked as 'direct' is not eligible for passthrough. Sadly, that seems to be all groups in your case...

Is there maybe a BIOS option to change some IOMMU settings?
Could you please elaborate on what this means? I have a system with a RAID card set to JBOD mode that I am trying to pass through. Just my luck that this one device is the only device in the entire system reporting "direct".

I am also seeing the following when trying to start a VM with the device attached:
Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.

I performed the steps from pve-kernel-builder, but I still never see the message "DMAR: Intel-IOMMU: assuming all RMRRs are relaxable. This can lead to instability or data loss". Everything else appears correct as far as I can tell, though.

Code:
root@carbon:~# dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
[    0.015419] ACPI: DMAR 0x0000000067F40000 0001F2 (v01 INTEL  S2600WF  00000001 INTL 20091013)
[    0.015482] ACPI: Reserving DMAR table memory at [mem 0x67f40000-0x67f401f1]
[    0.744425] DMAR: Host address width 46
[    0.744429] DMAR: DRHD base: 0x000000d37fc000 flags: 0x0
[    0.744443] DMAR: dmar0: reg_base_addr d37fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744450] DMAR: DRHD base: 0x000000e0ffc000 flags: 0x0
[    0.744458] DMAR: dmar1: reg_base_addr e0ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744464] DMAR: DRHD base: 0x000000ee7fc000 flags: 0x0
[    0.744471] DMAR: dmar2: reg_base_addr ee7fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744477] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[    0.744484] DMAR: dmar3: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744489] DMAR: DRHD base: 0x000000aaffc000 flags: 0x0
[    0.744496] DMAR: dmar4: reg_base_addr aaffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744502] DMAR: DRHD base: 0x000000b87fc000 flags: 0x0
[    0.744509] DMAR: dmar5: reg_base_addr b87fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744514] DMAR: DRHD base: 0x000000c5ffc000 flags: 0x0
[    0.744523] DMAR: dmar6: reg_base_addr c5ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744528] DMAR: DRHD base: 0x0000009d7fc000 flags: 0x1
[    0.744535] DMAR: dmar7: reg_base_addr 9d7fc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[    0.744540] DMAR: RMRR base: 0x00000067339000 end: 0x0000006733bfff
[    0.744545] DMAR: RMRR base: 0x000000531c8000 end: 0x0000005b1cffff
[    0.744549] DMAR: ATSR flags: 0x0
[    0.744554] DMAR: ATSR flags: 0x0
[    0.744560] DMAR-IR: IOAPIC id 12 under DRHD base  0xc5ffc000 IOMMU 6
[    0.744565] DMAR-IR: IOAPIC id 11 under DRHD base  0xb87fc000 IOMMU 5
[    0.744570] DMAR-IR: IOAPIC id 10 under DRHD base  0xaaffc000 IOMMU 4
[    0.744574] DMAR-IR: IOAPIC id 18 under DRHD base  0xfbffc000 IOMMU 3
[    0.744579] DMAR-IR: IOAPIC id 17 under DRHD base  0xee7fc000 IOMMU 2
[    0.744583] DMAR-IR: IOAPIC id 16 under DRHD base  0xe0ffc000 IOMMU 1
[    0.744587] DMAR-IR: IOAPIC id 15 under DRHD base  0xd37fc000 IOMMU 0
[    0.744591] DMAR-IR: IOAPIC id 8 under DRHD base  0x9d7fc000 IOMMU 7
[    0.744596] DMAR-IR: IOAPIC id 9 under DRHD base  0x9d7fc000 IOMMU 7
[    0.744600] DMAR-IR: HPET id 0 under DRHD base 0x9d7fc000
[    0.744604] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.746793] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    1.716859] DMAR: No SATC found
[    1.716865] DMAR: dmar6: Using Queued invalidation
[    1.716873] DMAR: dmar5: Using Queued invalidation
[    1.716879] DMAR: dmar3: Using Queued invalidation
[    1.716885] DMAR: dmar0: Using Queued invalidation
[    1.716890] DMAR: dmar7: Using Queued invalidation
[    1.878502] DMAR: Intel(R) Virtualization Technology for Directed I/O

Code:
root@carbon:~# grep '' /sys/kernel/iommu_groups/*/reserved_regions
...
/sys/kernel/iommu_groups/37/reserved_regions:0x0000000067339000 0x000000006733bfff direct-relaxable
/sys/kernel/iommu_groups/37/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/38/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/39/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/3/reserved_regions:0x00000000531c8000 0x000000005b1cffff direct # THIS IS THE DEVICE I WANT
/sys/kernel/iommu_groups/3/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/40/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/41/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/42/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/43/reserved_regions:0x0000000000000000 0x0000000000ffffff direct-relaxable
/sys/kernel/iommu_groups/43/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/44/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
/sys/kernel/iommu_groups/45/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
...
/sys/kernel/iommu_groups/9/reserved_regions:0x00000000fee00000 0x00000000feefffff msi
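To see at a glance which IOMMU groups carry a strict 'direct' reserved region, and which PCI devices sit in them, a small helper like the one below can be used. The sysfs layout (`reserved_regions` and `devices/` per group) is standard, but treat this as a sketch rather than a definitive tool.

```shell
# Print IOMMU groups whose reserved_regions contain a strict 'direct'
# entry (deliberately NOT matching 'direct-relaxable'), followed by the
# PCI addresses of the devices in each such group.
# $1 optionally overrides the sysfs root (useful for testing).
list_direct_groups() {
    local root="${1:-/sys/kernel/iommu_groups}" grp
    for grp in "$root"/*; do
        grep -q ' direct$' "$grp/reserved_regions" 2>/dev/null || continue
        echo "group ${grp##*/}: $(ls "$grp/devices" 2>/dev/null | tr '\n' ' ')"
    done
}
```

Any device printed here will hit the "Firmware has requested this device have a 1:1 IOMMU mapping" rejection on an unpatched kernel.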
 
Solved my own problem. I don't think I actually had to do anything within Proxmox; I just needed to flash my controller to IT mode using this. Turns out IT mode and IR mode with JBOD are not the same.
 
Hi, same problem here. I upgraded my Nvidia Tesla K20X to an Nvidia RTX 3060 and I get this error:
Code:
kvm: -device vfio-pci,host=0000:24:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio 0000:24:00.0: failed to setup container for group 19: Failed to set group container: Invalid argument
TASK ERROR: start failed: QEMU exited with code 1

Should I use the 6.5.13-5-pve kernel?

Edit: for me, it worked with the relax-rmrr parameter in /etc/default/grub:

https://github.com/kiler129/relax-intel-rmrr/blob/master/README.md#other-distros
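For reference, the relax-rmrr route from the linked README boils down to booting the patched kernel with one extra parameter. Below is a hedged sketch of the GRUB edit; the helper name is mine, and you should double-check the README for the exact parameter spelling and kernel packages.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in place.
# $1 = grub config file, $2 = option to append (must not contain / or &).
append_cmdline_opt() {
    sed -i "s/\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $2\"/" "$1"
}
# usage, per the README, after installing the patched kernel:
# append_cmdline_opt /etc/default/grub intel_iommu=relax_rmrr && update-grub
```

Note that the parameter only has an effect on a kernel built with the relax-intel-rmrr patch; on a stock kernel it is silently ignored.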
 
Today’s update on the M11SDV-8C-LN4F from Supermicro support sounds promising:

Is there any news?
Does it make sense for us to establish contact with Supermicro?
Is there a ticket number we could reference, or a "special" email address?
Happy holidays!
 
I too own a Supermicro M11SDV-8C-LN4F, and it had been working fine up until I switched all the other guests over to one node a couple of weeks ago. It was running a kernel newer than 6.8.5, but then I realized that I had upgraded the BIOS to the current version after taking it out of active duty.
I can now confirm that BIOS 1.0a works with the current 6.8.12-5 kernel.
 
That's wild, I want to give that a try too.
Is it possible to downgrade the BIOS version? If so, do you know if there is an official way to download older BIOS versions from Supermicro?
 
Sadly, I was unable to find any links to it online. In the end I had saved 1.0a and 1.0b in my backups, and that's where I got it from.
I've posted it here: http://gokapi.fail.pm/downloadFile?id=5klzoBSNajr3tsz (Edit: changed the link to one that doesn't auto-expire.)
I haven't been able to find a CRC for the file, so it can't actually be verified.

Yes, it's possible to downgrade. I went from the current version back to 1.0a without issues using the UEFI flasher. I haven't bothered testing 1.0b, as my notes said I had issues rebooting with that version.
 
