Failing SAS Passthrough on Proxmox 9 with 6.14 kernel

Mindbang

Hey folks,
I've recently updated my cluster to Proxmox 9. Most things went without issues, but I'm stuck on one problem.
In one machine I have an LSI 9207-8i that I'm passing through to a VM. On Proxmox 8 with a 6.8 kernel this worked fine.
After upgrading to Proxmox 9 with kernel 6.14.8-2-pve, the passthrough eventually fails. The issue is somewhere in the handoff of the controller to the VM (see the attached screenshot for the errors).
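For anyone hitting the same symptoms, this is roughly how I check the host-side state (the device match pattern is just an example; adjust it for your controller):

```shell
# Find the SAS controller and which driver currently claims it
# (it should be vfio-pci while the VM is running, mpt3sas otherwise).
lspci -nnk | grep -A3 -i 'LSI\|SAS2308'

# Follow the kernel log for IOMMU/DMAR faults while starting the VM
dmesg -w | grep -iE 'dmar|iommu|vfio'
```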

I've done a bit of digging and there's been some movement in the mpt3sas driver in the kernel (https://github.com/torvalds/linux/commits/master/drivers/scsi/mpt3sas), with some interesting commits that could be related:
- https://github.com/torvalds/linux/commit/3f5eb062e8aa335643181c480e6c590c6cedfd22
- https://github.com/torvalds/linux/commit/5612d6d51ed2634a033c95de2edec7449409cbb9

Long story short: I'm on Proxmox 9, but I've pinned the kernel to 6.8.12-13-pve, and things are working fine. That leaves me running an unsupported configuration, though, which I don't like.
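For reference, this is how the pinning works with the stock proxmox-boot-tool (the version string is the one from my system):

```shell
# Pin the known-good kernel so it stays the boot default across updates
proxmox-boot-tool kernel pin 6.8.12-13-pve

# Verify which kernels are installed and which one is pinned
proxmox-boot-tool kernel list

# To undo the pin later:
# proxmox-boot-tool kernel unpin
```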

Has anybody else run into this issue, or does anyone have pointers on where I should dig further?
 

Attachments

  • Screenshot 2025-08-09 at 7.16.14 AM.png (252.4 KB)
I built a few custom patches for the mpt3sas driver before I realized that this is not the right approach: the host isn't using the hardware, it's passed through, so the issue must be somewhere in the vfio, iommu or kvm subsystems. I'll probably try to bisect it by stepping up through kernel versions; 6.11 also works.
 
Unfortunately, I claimed victory too early. "transparent_hugepage=never" does allow me to start the VM with the SAS passthrough, but as soon as I try to copy data onto the drives in the VM I get constant errors and the transfer speed stays at 0. I had to revert to a 6.8.xx to 6.11.xx kernel again.
 

Attachments

  • 1774303967995.png (57.5 KB)
With further digging I found a solution for my case as well. The "hugepage" hint was very helpful, but did not fix it for me.
The explanation I put together for my case: the patch mentioned above changes how the kernel allocates IOVAs (IO virtual addresses) for DMA mappings. It introduces huge pages for IOMMU mappings: instead of individual 4 KiB pages, 2 MiB huge pages are used for DMA translations. This is intended to improve performance (fewer IOTLB misses), but:
The problem with older Intel IOMMU implementations (such as the Z97 I have; other older chipsets may be affected as well): the IOMMU hardware expects specific PTE formats. When the kernel generates huge-page PTEs, older IOMMU units cannot interpret them correctly, the "reserved fields" in the PTE end up non-zero, and that triggers the exact error "fault reason 0x0c: non-zero reserved fields in PTE."
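As a rough way to check whether your IOMMU hardware even advertises 2 MiB super pages, the SLLPS field (bit 34 of the Intel VT-d capability register) can be decoded. A minimal sketch; the sysfs path and the sample values are just illustrations:

```shell
# Decode bit 34 (SLLPS: second-level 2 MiB page support) of the Intel VT-d
# capability register. On a live system the raw hex value can be read from
# /sys/devices/virtual/iommu/dmar0/intel-iommu/cap (path may vary per unit).
sllps_2m_supported() {
  cap=$1
  if [ $(( (cap >> 34) & 1 )) -eq 1 ]; then echo yes; else echo no; fi
}

sllps_2m_supported 0x400000000   # hypothetical value with bit 34 set -> yes
sllps_2m_supported 0x0           # -> no
```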
Why "transparent_hugepage=never" only partially helped in my case: this parameter disables THP for regular memory (MMU), but not for IOMMU DMA mappings. For small DMA transfers (such as the SAS controller's initialization, I assume), 4 KiB pages are sufficient, but for large transfers (moving data via the SAS controller) the kernel still falls back on larger mappings. That is why I could boot TrueNAS with the option set, but any data traffic failed.

Solution: Browsing through Red Hat documentation... https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/8.8_release_notes/kernel_parameters_changes
... I stumbled over the option intel_iommu=sp_off, whose description reads: "By default, super page will be supported if Intel IOMMU has the capability. With this option, super page will not be supported."
And "super pages" are exactly the IOMMU huge pages introduced by the problematic patch f9e54c3a2f5b. With "sp_off", I think only 4 KiB pages are used for IOMMU mappings, which is the same behavior as on the old 6.8 kernel. Tested, and now I can use the 6.17 kernel without DMAR errors.
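For completeness, this is roughly how the option ends up on the kernel command line on a GRUB-based Proxmox host (the kernel docs describe intel_iommu= options as a comma-separated list, hence combining it with "on"; systemd-boot installs edit /etc/kernel/cmdline and run "proxmox-boot-tool refresh" instead):

```shell
# /etc/default/grub (excerpt): add sp_off alongside the existing setting
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on,sp_off"

# Apply the bootloader config and reboot
update-grub
reboot

# After reboot, confirm the option actually took effect:
grep -o 'intel_iommu=[^ ]*' /proc/cmdline
```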
Untested further side benefit: with sp_off I no longer need "transparent_hugepage=never", which avoids that option's performance cost for the entire system (it disables huge pages for all regular memory, not just IOMMU mappings).
 