[SOLVED] mcx4121a NIC drops after booting VM with sas2308 passthrough under PVE 8.1.4

appid

New Member
Mar 9, 2024
Environment Details:
- PVE Version: 8.1.4
- Kernel Version: 6.5.13-1-pve
- Network Card Model: MCX4121A-ACAT
- sas2308 Model: LSI SAS 9217-8i

SysLog:
Code:
Mar 11 23:07:25 pve pvedaemon[1449]: <root@pam> end task UPID:pve:00001FE2:00033FFC:65EF1E2D:qmclone:1000:root@pam: OK
Mar 11 23:07:33 pve pvedaemon[1450]: <root@pam> successful auth for user 'root@pam'
Mar 11 23:07:55 pve pvedaemon[1449]: <root@pam> update VM 104: -hostpci0 mapping=sas2308,pcie=1
Mar 11 23:08:00 pve pvedaemon[8277]: start VM 104: UPID:pve:00002055:00034DBC:65EF1E50:qmstart:104:root@pam:
Mar 11 23:08:00 pve pvedaemon[1450]: <root@pam> starting task UPID:pve:00002055:00034DBC:65EF1E50:qmstart:104:root@pam:
Mar 11 23:08:00 pve kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Mar 11 23:08:00 pve kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221107000000)
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221107000000)
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: enclosure logical id(0x500605b009acb4c0), slot(4)
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: sending message unit reset !!
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: message unit reset: SUCCESS
Mar 11 23:08:00 pve kernel: mlx5_core 0000:01:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:00 pve kernel: mlx5_core 0000:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:05 pve kernel: vmbr0: port 1(enp1s0f0np0) entered disabled state
Mar 11 23:08:05 pve kernel: mlx5_core 0000:01:00.0 enp1s0f0np0 (unregistering): left allmulticast mode
Mar 11 23:08:05 pve kernel: mlx5_core 0000:01:00.0 enp1s0f0np0 (unregistering): left promiscuous mode
Mar 11 23:08:05 pve kernel: vmbr0: port 1(enp1s0f0np0) entered disabled state
Mar 11 23:08:05 pve kernel: mlx5_core 0000:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:06 pve kernel: mlx5_core 0000:01:00.0: E-Switch: cleanup
Mar 11 23:08:07 pve kernel: mlx5_core 0000:01:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:07 pve kernel: mlx5_core 0000:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:11 pve kernel: mlx5_core 0000:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:12 pve kernel: mlx5_core 0000:01:00.1: E-Switch: cleanup
Mar 11 23:08:13 pve systemd[1]: Started 104.scope.

Reproduction steps:
1. Created vmbr0 bridged to enp1s0f0np0 (port 1 of the mcx4121a network card).
2. Created a Resource Mapping named "sas2308".
3. Assigned the mapped PCIe device sas2308 to VM 104 and started the VM (command sketch below).
4. Checked the syslog: after VM 104 started, mlx5_core began reporting errors and the network card went offline.
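
For reference, a minimal command sketch of steps 3-4, assuming the Resource Mapping "sas2308" already exists and the VM ID is 104 (the hostpci0 value matches the syslog entry above):
Code:
# Assign the mapped sas2308 controller to VM 104 as a PCIe device
qm set 104 -hostpci0 mapping=sas2308,pcie=1

# Start the VM and follow the kernel log for mlx5_core / mpt2sas messages
qm start 104
journalctl -kf | grep -E 'mlx5_core|mpt2sas'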

Other notes:
1. Tested passthrough of the onboard I210 network card, which did not cause mlx5_core errors. No additional PCIe devices are available for further testing.
2. Starting one or more virtual machines that only use vmbr0-bridged NICs, without the sas2308 PCIe device assigned, does not cause mlx5_core errors.
 
Check your IOMMU groups. Devices in the same group cannot be shared between VMs and/or the Proxmox host. This comes up regularly on the forum.
 
How do I check the IOMMU groups?
I checked the PCI IDs of the devices:
sas2308: 0000:02:00.0
mcx4121a: 0000:01:00.0
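
For context, these addresses can be confirmed with lspci; a minimal sketch (the grep pattern is only an assumption to match the two cards):
Code:
# Show the Mellanox NIC and the LSI SAS controller with bus addresses and vendor/device IDs
lspci -nn | grep -Ei 'mellanox|lsi'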
 
Look in the IOMMU column in the Proxmox web GUI when selecting the (raw) device to pass through.
Or run pvesh get /nodes/NODENAME/hardware/pci --pci-class-blacklist "" where NODENAME is your Proxmox server/node name, and check the iommugroup column.
Or run for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done and check which devices share a group number.
Or follow the Proxmox Wiki page: https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_isolation
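
For readability, here is the same one-liner expanded into a small script (functionally identical; run it on the PVE host):
Code:
#!/bin/bash
# Print every PCI device together with the IOMMU group it belongs to.
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}   # strip the path prefix up to the group number
    n=${n%%/*}                # keep only the group number
    printf 'IOMMU group %s ' "$n"
    lspci -nns "${d##*/}"     # identify the device at this PCI address
done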
 
Thanks. Based on your info, I found the cause of the problem.
 
