Hi,
I've got a PVE cluster of single-socket Supermicro H13SSL-N servers, each with an EPYC 9254 CPU, 2x dual-port Intel XL710 NICs, and 1x dual-port ConnectX-4 Lx NIC.
I'm trying to use resource mappings to pass in the ConnectX-4 NIC's virtual functions.
The problem I'm having is that every time I reboot, the IOMMU groups change for the ConnectX-4 card's virtual functions. This causes errors for resource mappings. VMs won't power on until I re-define the mapping so the IOMMU group gets updated.
The error message appears under Datacenter / Resource Mappings.
The Status field for the mapping on a host that's been rebooted says: "Configuration for iommu group is not correct ('97' != '59')"
The exact group numbers vary from one reboot to the next.
I'm using the default NIC driver (trying to avoid the pain of the OFED driver).
I've used mstconfig to configure the virtual functions:
mstconfig -d 0000:41:00.0 -y set SRIOV_EN=True NUM_OF_VFS=64
This works fine.
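For anyone checking the same thing on their own host, here is a minimal sketch that reads the standard `sriov_totalvfs` and `sriov_numvfs` sysfs attributes for a physical function. The function name `vf_counts` and its optional second argument (an alternate sysfs root, only there so it can be pointed at a test tree) are my own additions and not part of any tool. As far as I understand it, `sriov_totalvfs` reflects the firmware limit and `sriov_numvfs` is the number of VFs currently instantiated.

```shell
#!/bin/sh
# Print the firmware VF limit and the number of VFs currently enabled
# for one PCI physical function.
# $1 = PCI address of the PF, e.g. 0000:41:00.0
# $2 = sysfs root (hypothetical override for testing, defaults to /sys)
vf_counts() {
    dev="$1"
    root="${2:-/sys}"
    d="$root/bus/pci/devices/$dev"
    if [ ! -r "$d/sriov_totalvfs" ]; then
        echo "no SR-IOV attributes for $dev" >&2
        return 1
    fi
    printf 'total=%s active=%s\n' "$(cat "$d/sriov_totalvfs")" "$(cat "$d/sriov_numvfs")"
}
```

Calling `vf_counts 0000:41:00.0` on the host should print one line with both counts.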
The PCI IDs for each virtual function look to be stable between boots; it's just the IOMMU groups that seem to change.
My guess is that the order in which the ports, and therefore their VFs, enumerate during boot is non-deterministic, and that this causes the variability in the IOMMU group numbers.
I've tried changing a bunch of BIOS settings related to PCI device passthrough etc., but that hasn't changed the behaviour.
Device passthrough works perfectly fine if I use the PCI IDs directly, including for the Intel XL710s that we're passing through for some production VMs.
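To compare the assignments between boots, something like the following rough sketch could be used. It walks the standard `/sys/kernel/iommu_groups` tree and prints one "group address" pair per device. The function name `list_iommu_groups` and its optional argument (an alternate sysfs root, only there for testing) are hypothetical helpers of mine, not part of PVE.

```shell
#!/bin/sh
# Print "<iommu group> <PCI address>" for every device that belongs to an
# IOMMU group.
# $1 = sysfs root (hypothetical override for testing, defaults to /sys)
list_iommu_groups() {
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        # Skip the unexpanded pattern when no groups exist.
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<address>"
        group="${group##*/}"        # keep only the group number
        printf '%s %s\n' "$group" "${dev##*/}"
    done
}
```

Saving the output of `list_iommu_groups | grep ' 0000:41:'` to a file after each reboot and diffing the files should show whether the same PCI address lands in a different group.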
Thanks!