[SOLVED] mcx4121a NIC drops after booting VM with sas2308 passthrough under PVE 8.1.4

appid

New Member
Mar 9, 2024
Environment Details:
- PVE Version: 8.1.4
- Kernel Version: 6.5.13-1-pve
- Network Card Model: MCX4121A-ACAT
- sas2308 Model: LSI SAS 9217-8i

SysLog:
Code:
Mar 11 23:07:25 pve pvedaemon[1449]: <root@pam> end task UPID:pve:00001FE2:00033FFC:65EF1E2D:qmclone:1000:root@pam: OK
Mar 11 23:07:33 pve pvedaemon[1450]: <root@pam> successful auth for user 'root@pam'
Mar 11 23:07:55 pve pvedaemon[1449]: <root@pam> update VM 104: -hostpci0 mapping=sas2308,pcie=1
Mar 11 23:08:00 pve pvedaemon[8277]: start VM 104: UPID:pve:00002055:00034DBC:65EF1E50:qmstart:104:root@pam:
Mar 11 23:08:00 pve pvedaemon[1450]: <root@pam> starting task UPID:pve:00002055:00034DBC:65EF1E50:qmstart:104:root@pam:
Mar 11 23:08:00 pve kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Mar 11 23:08:00 pve kernel: sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221107000000)
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221107000000)
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: enclosure logical id(0x500605b009acb4c0), slot(4)
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: sending message unit reset !!
Mar 11 23:08:00 pve kernel: mpt2sas_cm0: message unit reset: SUCCESS
Mar 11 23:08:00 pve kernel: mlx5_core 0000:01:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:00 pve kernel: mlx5_core 0000:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:05 pve kernel: vmbr0: port 1(enp1s0f0np0) entered disabled state
Mar 11 23:08:05 pve kernel: mlx5_core 0000:01:00.0 enp1s0f0np0 (unregistering): left allmulticast mode
Mar 11 23:08:05 pve kernel: mlx5_core 0000:01:00.0 enp1s0f0np0 (unregistering): left promiscuous mode
Mar 11 23:08:05 pve kernel: vmbr0: port 1(enp1s0f0np0) entered disabled state
Mar 11 23:08:05 pve kernel: mlx5_core 0000:01:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:06 pve kernel: mlx5_core 0000:01:00.0: E-Switch: cleanup
Mar 11 23:08:07 pve kernel: mlx5_core 0000:01:00.1: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:07 pve kernel: mlx5_core 0000:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:11 pve kernel: mlx5_core 0000:01:00.1: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0)
Mar 11 23:08:12 pve kernel: mlx5_core 0000:01:00.1: E-Switch: cleanup
Mar 11 23:08:13 pve systemd[1]: Started 104.scope.

Reproduction steps:
1. Created vmbr0 bridged to enp1s0f0np0 (port 1 of the mcx4121a network card).
2. Created a Resource Mapping named "sas2308".
3. Assigned the mapped PCIe device sas2308 to VM 104 and started the VM (command sketch below).
4. Checked the syslog: after VM 104 started, mlx5_core began reporting errors and the network card went offline.
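
For reference, a minimal command sketch of steps 3-4, assuming the Resource Mapping "sas2308" already exists and the VM ID is 104 (the hostpci0 value matches the syslog entry above):
Code:
# Assign the mapped sas2308 controller to VM 104 as a PCIe device
qm set 104 -hostpci0 mapping=sas2308,pcie=1

# Start the VM and follow the kernel log for mlx5_core / mpt2sas messages
qm start 104
journalctl -kf | grep -E 'mlx5_core|mpt2sas'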

Other notes:
1. Tested passthrough of the onboard I210 network card, which did not cause mlx5_core errors. No additional PCIe devices are available for further testing.
2. Starting one or more virtual machines that only use vmbr0-bridged NICs, without the sas2308 PCIe device assigned, does not cause mlx5_core errors.
 
Check your IOMMU groups. Devices in the same group cannot be shared between VMs and/or the Proxmox host. This comes up regularly on the forum.
 
How do I check the IOMMU groups?
I checked the PCI IDs of the devices:
sas2308: 0000:02:00.0
mcx4121a: 0000:01:00.0
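
For context, these addresses can be confirmed with lspci; a minimal sketch (the grep pattern is only an assumption to match the two cards):
Code:
# Show the Mellanox NIC and the LSI SAS controller with bus addresses and vendor/device IDs
lspci -nn | grep -Ei 'mellanox|lsi'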
 
Look in the IOMMU column in the Proxmox web GUI when selecting the (raw) device to pass through.
Or run pvesh get /nodes/NODENAME/hardware/pci --pci-class-blacklist "" where NODENAME is your Proxmox server/node name, and check the iommugroup column.
Or run for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done and check which devices share a group number.
Or follow the Proxmox Wiki page: https://pve.proxmox.com/wiki/PCI_Passthrough#Verify_IOMMU_isolation
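
For readability, here is the same one-liner expanded into a small script (functionally identical; run it on the PVE host):
Code:
#!/bin/bash
# Print every PCI device together with the IOMMU group it belongs to.
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}   # strip the path prefix up to the group number
    n=${n%%/*}                # keep only the group number
    printf 'IOMMU group %s ' "$n"
    lspci -nns "${d##*/}"     # identify the device at this PCI address
done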
 
Thanks. Based on your info, I found the cause of the problem.
 
