Hi folks,
I ran into an issue where an X550 NIC (ixgbe driver) in my PVE host (R740xd) would hang and drop traffic whenever I pushed even a modest amount of traffic through it. Note that I did not have pass-through enabled, nor was I attempting to pass any devices through. The hang was preceded by a burst of DMAR faults, as seen below:
Code:
Feb 12 21:22:54 cocytus kernel: DMAR: DRHD: handling fault status reg 2
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Write NO_PASID] Request device [86:00.0] fault addr 0x0 [fault reason 0x05] P>
Feb 12 21:22:54 cocytus kernel: DMAR: DRHD: handling fault status reg 102
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e78000 [fault reason 0>
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e7b000 [fault reason 0>
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e7c000 [fault reason 0>
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e71000 [fault reason 0>
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e73000 [fault reason 0>
Feb 12 21:22:54 cocytus kernel: DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e75000 [fault reason 0>
Feb 12 21:23:00 cocytus kernel: ixgbe 0000:86:00.0 ens5f0: Detected Tx Unit Hang
Tx Queue <9>
TDH, TDT <74>, <7a>
next_to_use <7a>
next_to_clean <4c>
tx_buffer_info[next_to_clean]
time_stamp <115bcc4b4>
jiffies <115bcdb80>
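If you want to check whether your own host shows the same pattern, the DMAR lines can be tallied per device with a short script. This is only a rough sketch and was not part of the original troubleshooting: the regular expression is an assumption based solely on the line format shown above, and `count_dmar_faults` is a made-up helper name.

```python
import re
from collections import Counter

# Assumed line shape, taken from the log excerpt above:
#   DMAR: [DMA Read NO_PASID] Request device [86:00.0] fault addr 0xe3e78000 [fault reason ...
DMAR_FAULT = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)[^\]]*\] "
    r"Request device \[(?P<dev>[0-9a-fA-F:.]+)\] "
    r"fault addr (?P<addr>0x[0-9a-fA-F]+)"
)


def count_dmar_faults(log_text: str) -> Counter:
    """Count DMAR faults per (device, operation) pair in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DMAR_FAULT.search(line)
        if match:
            counts[(match["dev"], match["op"])] += 1
    return counts
```

Feeding it the kernel log as a string (for example the saved output of `journalctl -k`) gives a per-device count, which makes it easy to see whether the faults all come from the NIC's PCI address.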
I looked everywhere but couldn't find anything conclusive, so I had to experiment. I updated the NIC firmware, fiddled with the device offload settings, enabled intel_iommu in the boot parameters, and enabled SR-IOV in the BIOS, but none of that resolved the issue. What finally fixed it was adding iommu=pt to my kernel boot command line: no more DMAR faults, no more ixgbe hangs.
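After rebooting, it is worth confirming that iommu=pt actually made it onto the running kernel's command line, since a missed bootloader refresh is an easy mistake. The snippet below is a minimal sketch of that check: it reads `/proc/cmdline`, and the helper names `has_kernel_param` and `read_cmdline` are made up for illustration.

```python
def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the kernel command line."""
    # Split on whitespace so that "iommu=pt" does not match inside a longer token.
    return param in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running Linux kernel was booted with."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On the host itself, `has_kernel_param(read_cmdline(), "iommu=pt")` should return `True` once the change is live. How the parameter gets added depends on the bootloader. On a GRUB-booted PVE host it usually goes into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`. On a systemd-boot install it goes into `/etc/kernel/cmdline`, followed by `proxmox-boot-tool refresh`. The post does not say which one applied here, so treat that part as an assumption.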
I realize this does not solve the underlying issue, which would likely still show up if you tried to pass this device through, but if you don't need pass-through and this issue pops up, I hope this info helps.